GEM Mapper

From Bioinformatics Core Wiki

First session

Link to the program [1]

Introduction

  • Sequence with quality information
  • Each platform has its own specificities, common errors, and minor format differences. Intermediate data are also exported (e.g. fluorescence intensities), which take more space than the sequence itself: 5 bytes per base.

A typical scenario

  • Problem: short-read mapping

Complications:

  • Huge number of sequences per experiment. The usual alignment algorithms are not viable.

How to make sense of NGS data

  • Comparing results between software pipelines is a problem, for example for measures like RPKM.
  • The protocol for DNA resequencing is quite well established.
  • Quantification is still a problem for RNA-Seq and ChIP-Seq.
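
For reference, the usual definition of RPKM (reads per kilobase of feature per million mapped reads) can be sketched as below. The function name and the example numbers are illustrative assumptions and do not come from any pipeline discussed in the session.

```python
def rpkm(reads_in_feature: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of feature per million mapped reads."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return reads_in_feature * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical numbers: 500 reads on a 2,000 bp transcript, 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))  # 25.0
```

Because both the read count per feature and the total of mapped reads depend on how each pipeline maps and filters reads, the same sample can yield different RPKM values in different pipelines.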

Biological uses of GEM

  • Mapping: gem-mapper.

References are indexed first, and queries are run against the index afterwards.

  • With spliced reads, junctions between exons cannot be mapped directly. gem-split-mapper solves this (even for inter-chromosomal junctions).

Comment from Paolo: BLAT cannot do long-range mapping.

  • Mappability: How many times a k-mer appears in the genome.
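
The k-mer counting idea behind mappability can be illustrated with a small brute-force sketch. It is only a toy under stated assumptions: exact matches, forward strand only, and the whole sequence held in memory. It does not reflect how GEM itself computes mappability, and the function names are hypothetical.

```python
from collections import Counter


def kmer_counts(genome: str, k: int) -> Counter:
    """Count every k-mer that occurs in the sequence (exact matches only)."""
    genome = genome.upper()
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def occurrences_per_position(genome: str, k: int) -> list:
    """For each start position, how many times its k-mer appears in the sequence."""
    genome = genome.upper()
    counts = kmer_counts(genome, k)
    return [counts[genome[i:i + k]] for i in range(len(genome) - k + 1)]


toy_genome = "ACGTACGTTT"
print(occurrences_per_position(toy_genome, 4))  # [2, 1, 1, 1, 2, 1, 1]
```

Positions with a count of 1 are unique at that k-mer length; higher counts mark regions where a read of that length cannot be placed unambiguously.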

Bindings are available for several languages.

GEM as a building block of pipelines:

MIRO -> microRNA-seq analysis seq.crg.es/download/software/Miro

David Gonzalez's longRNA

Pipeline for metagenomics ChIP-Seq

Flux Capacitor (Micha), coupled to the GEM library.

The index is more or less the same size as the reference file. You need as much memory as the size of the index to avoid swapping.

Second session

Exercise

Map Solexa reads to chromosome 18 and display the result in the UCSC Genome Browser.

First index: gem-do-index -i Data/chr18.fasta -o chr18 --reverse-complement emulate

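Difference between emulating the reverse complement and indexing both strands:

  • Indexing both strands is fast, and suitable when the reference is small.
  • If the reference is large, it is better to index only one strand and emulate the other.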
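
One plausible reading of emulation is that only the forward strand is stored and the reverse strand is covered by also searching the reverse complement of each read. The sketch below illustrates that idea with plain exact string search; it is an assumption made for illustration, not a description of GEM internals, and all function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(text: str, pattern: str) -> list:
    """All start positions of exact (possibly overlapping) occurrences."""
    hits = []
    start = text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def search_emulating_reverse_strand(reference: str, read: str) -> list:
    """Search the forward strand only, once with the read and once with its reverse complement."""
    hits = [("+", pos) for pos in find_all(reference, read)]
    rc = reverse_complement(read)
    if rc != read:  # avoid reporting palindromic reads twice
        hits += [("-", pos) for pos in find_all(reference, rc)]
    return hits


print(search_emulating_reverse_strand("TTTACGGA", "TCCG"))  # [('-', 4)]
```

The trade-off matches the notes: storing both strands roughly doubles what has to be kept in memory, while emulation keeps the index small at the cost of extra work per read.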

The quality format can differ between files, so it should be checked with whoever produced the data. There are Phred (Sanger's phred) and Solexa formats, and two phred formats; the qualities are encoded in different ways. Phred: 0 to 40. Solexa: from -40/-60 up to 40.

Example: with values in the range 0-40, a mask threshold can be set (e.g. 26); values higher than the threshold are simply treated as good.
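
To make the difference between the encodings concrete, here is a minimal decoding sketch. The ASCII offsets (33 for Sanger FASTQ, 64 for Solexa and early Illumina files) are the commonly used conventions and are an assumption about the input files, since the notes do not state them. The helper names are hypothetical, and the threshold of 26 is the example value from the notes.

```python
import math


def decode_qualities(quality_string: str, offset: int) -> list:
    """Turn an ASCII quality string into integer scores by subtracting the offset."""
    return [ord(ch) - offset for ch in quality_string]


def phred_to_error_probability(q: float) -> float:
    """Phred definition: Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def solexa_to_phred(q_solexa: float) -> float:
    """Solexa scores use the odds p / (1 - p) instead of p; convert to the Phred scale."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def good_positions(scores: list, threshold: int = 26) -> list:
    """Flag scores at or above the threshold (one interpretation of the mask)."""
    return [q >= threshold for q in scores]


sanger_scores = decode_qualities("II5+", offset=33)  # assumed Sanger (offset 33) input
print(sanger_scores)
print(good_positions(sanger_scores))
print(round(solexa_to_phred(-5), 2))
```

Decoding a file with the wrong offset shifts every score by 31, which is why the format has to be known before mapping with qualities.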

Mapping: gem-mapper -I chr18 -i H_sapiens-XX... -q phred -o H_sapiens-XX...

You can define the maximum number of mismatches. The program does not distinguish between the beginning, middle, and end of the read.
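
What a position-independent mismatch limit means can be shown with a brute-force scan. This is only a toy illustration with hypothetical function names: it compares the read against every window of the reference, which is exactly the approach that does not scale to real data, and it is not GEM's algorithm.

```python
def count_mismatches(read: str, window: str) -> int:
    """Number of positions that differ; every position counts the same."""
    if len(read) != len(window):
        raise ValueError("read and window must have the same length")
    return sum(a != b for a, b in zip(read, window))


def map_with_mismatches(reference: str, read: str, max_mismatches: int) -> list:
    """Return (position, mismatches) for every window within the mismatch limit."""
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        mismatches = count_mismatches(read, reference[pos:pos + len(read)])
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


print(map_with_mismatches("ACGTACGA", "ACGA", max_mismatches=1))  # [(0, 1), (4, 0)]
```

A mismatch in the last base (position 0 in the example) is accepted just like one anywhere else in the read, since only the total count is checked.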

More than one thread can be used, e.g. -t 6. After the run, the per-thread output files (0..5) need to be joined.

For RNA, an extra step is split mapping, since contiguous parts of an RNA may come from distant places in the DNA.

gem-split-mapper

It does not work with FASTQ, only FASTA. Matches can be filtered by consensus (e.g. splice sites: -s splice consensus). This was done to allow searching for non-canonical splits.

Format of the mapping: @XX/YY

YY -> number of mismatches

XX -> total quality of the mismatches: XX = YY*q

The split-map format indicates the split point and the two chromosomes.

gem2sam converts the files. The BAM format is compressed and can be indexed. The file is then passed to UCSC to build a custom track.

Bioinformatics Core Facility @ CRG — 2011-2021