The CRG Bioinformatics Core provides bioinformatics and computational expertise to researchers at the CRG and at external organizations, both academic and commercial. We support researchers at every step, from project planning, budgeting, and experimental design to coding (R and Python scripts), pipeline development, data analysis, and interpretation of results. We work in synergy with the other CRG Core Technologies units.

We also provide training in basic and advanced bioinformatics techniques, and we support CRG core facilities and CRG groups in developing and maintaining data and software management resources, including Nextflow pipelines, databases, LIMS, web tools, and Linux containers.

In addition to services provided for a fee, we support fully collaborative grant-funded investigations. This includes preliminary data analysis, planning the grant budget and the experiments provided by the CRG Core facilities, writing relevant sections of the proposal, data analysis and biological inference, custom software development, and co-authored dissemination of the grant results.


To request a service or a free consultation, or to propose a collaborative project or a grant proposal, please contact us by email or phone. We encourage researchers to discuss both experimental and bioinformatics procedures with us before submitting materials for sequencing at the [1]. We can help select the most cost-effective approach.

Once the services and deliverables are agreed, we issue an official quotation, which the user must approve in writing (via e-mail). By approving the quotation, the user enters into a contract with the CRG and agrees to the Terms and Conditions of the service.

Please note that payment of fees for services and authorship are not mutually exclusive. Every member of the Core staff who has participated in the work sufficiently to take public responsibility for the appropriate portions of the content should be recognized as a co-author; co-authorship should follow commonly accepted scientific practice. Recovering Core expenses through the recharge system does not exclude the possibility of authorship for Core personnel. Similarly, authorship does not substitute for payment of Core expenses for services rendered.

All work performed by the CRG Bioinformatics Core should be acknowledged in scholarly publications, posters, and presentations by a direct statement in the acknowledgement section: “The authors would like to thank the Bioinformatics Unit of the Centre for Genomic Regulation (CRG) for assistance with <services performed>.”

Throughout the project, we document our work, track personnel and computational hours, regularly update the researchers on progress, and revise the initial goals if needed. To avoid unnecessary expenses, we report any data quality problem as soon as we spot it.

When the request is completed, we issue an invoice based on the hours actually recorded (please refer to our FEES). We can also provide a final written report, help prepare the relevant sections of publications, and handle the submission of data to public repositories.


All our communication with users, including consultations, meetings, quotations, and e-mails, is free of charge. Please refer to the cost estimates for standard bioinformatics services and to the CRG webpage for our current fees.


The services provided are subject to the CRG Core Technologies Terms and Conditions.


  • Reference-based and de novo assembly of eukaryotic and prokaryotic genomes.
  • Genome re-sequencing and quality assessment of genome assemblies.
  • ChIP-seq (TFs, histone modifications): peak calling, differential binding analysis among sample groups, peak annotation.
  • Whole exome and whole genome analysis: variant calling, CNVs.
  • Identification and annotation of DNA structural variants for common and rare human diseases: individual and family analysis, cancer driver gene mutations.
  • Genome comparison.
  • Genome functional annotation: ab initio gene prediction, annotation of genes, transcripts, DNA motifs, promoters, and other DNA regulatory elements.
  • Analysis of ATAC-seq, CUT&RUN, Nanopore DNA sequencing (assembly, methylation, SVs) and other high-throughput data.


  • Reference-based and de novo assembly of eukaryotic and prokaryotic transcriptomes.
  • Transcriptome functional annotation: ab initio gene prediction, annotation of genes, transcripts, DNA motifs, promoters, and other regulatory elements.
  • Variant calling from transcriptome sequencing data.
  • RNA-seq for mRNA: discovery of new transcripts, differentially expressed genes/transcripts.
  • Functional analysis of differentially expressed genes/transcripts: Gene Ontology terms, DNA motifs, and pathways enrichment analysis.
  • RNA-seq for small and non-coding RNA: differential expression, discovery of new microRNAs, microRNA target prediction.
  • Analysis of OpenArray real-time PCR and other high-throughput experimental data.
  • Identification of batch effects and visualization of data and results: hierarchical clustering, heatmaps, dendrograms, volcano plots, principal component analysis for the overall (dis)similarity among experiments.
  • RNA-target-based sequencing: RIP-seq, iCLIP, CLIP-seq, and others.
  • Analysis of B and T cell repertoires (adaptive immune receptor repertoires, or AIRR) from high-throughput sequencing data: germline allele assignment, identification of clones, visualization of clonal frequencies.
  • Analysis of Nanopore dRNA-seq data (modifications, poly(A) tail length, isoform identification).
  • Analysis of single-cell RNA-seq data.


  • Analysis of amplicon (16S rRNA genes / ITS), whole genome and transcriptome shotgun sequencing data.
  • Identification of microbial communities, taxonomic diversity and abundances at the levels of genus, family, order, class, phylum.
  • Conservation and abundance of bacterial gene functional modules and biochemical pathways.
  • Estimation of microbial diversity and sequence coverage.
  • ORF prediction and functional annotation.
  • Phylogenetic analysis.
  • Comparative analysis of samples: microbial profiles, Gene Ontology terms, metabolic and pathway analyses.


  • Protein functional annotation and prediction.
  • Analysis of the effects of SNPs and other variants on protein structure and function.
  • Multiple sequence alignment.
  • Ortholog and paralog assignment.
  • Phylogenetic analysis and tree construction.
  • Protein structure comparison and 3D homology modeling.
  • Protein-protein and protein-ligand 3D docking.
  • B- and T-cell epitope prediction.


We have extensive experience in the design, development, and support of the following bioinformatics resources (browse our related projects here):

  • Databases: relational and NoSQL.
  • Websites for data submission, search, and analysis.
  • Web tools.
  • LIMS (Laboratory Information Management Systems) for managing laboratory operations, data flow, and communication with users and external collaborators.
  • Software evaluation and benchmarking.
  • Software development: bioinformatics scripts; data processing and analysis pipelines; integrative bioinformatics web applications; customized genome browsers.
  • Nextflow pipeline development, testing, deployment (in the cloud), and maintenance.
  • External and internal data integration solutions.
  • Linux container development, testing, deployment (in the cloud), and maintenance.


Please contact us with any bioinformatics problem you may have, even if it does not fit the services listed here. As members of several bioinformatics communities and alliances, we may be able to find a solution or point you to experts in the field. Our custom and additional services include the following:

  • Support for cloud (AWS) computing (we maintain the AWS infrastructure for online hands-on training).
  • Support in the design and analysis of customized NGS experiments.
  • Statistical analysis and plots: R scripts, descriptive and inferential statistics, hypothesis testing, sample size estimation, PCA, clustering, linear regression, correlation, ANOVA.
  • Statistical modeling (e.g., for biomarker discovery, patient stratification).
  • One-to-one training.
  • Data submission to GEO, ArrayExpress, SRA, and other public data repositories.
  • Mining public databases and publications and re-analyzing published data.
  • Manuscript preparation: writing the methods and results sections, interpreting and visualizing results.
  • Grant support: writing methods, methodology for data management and sharing, experimental and bioinformatics analysis design, obtaining preliminary results, and result interpretation.
Bioinformatics Core Facility @ CRG — 2011-2024