Reference mapping and variant detection

Peter Tsai
Bioinformatics Institute, University of Auckland
Reference mapping
Mapping is the process of aligning each read
to the reference genome.
Many different software tools are available to perform
reference mapping. They differ in features such as:
Handling of reads with multiple placements (multi-hits)
Gapped alignment (gaps allowed)
Ungapped alignment (no gaps allowed at all)
Limits on the number of mismatches
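The options above can be illustrated with a deliberately naive mapper. It scans every reference position without gaps and reports each place where the read fits with at most a given number of mismatches. A read that fits several places shows up as a multi-hit, and a read that fits nowhere is unmapped. This is a teaching sketch only. Real mappers use an index rather than a brute-force scan, and most support gaps. The function name and the `max_mismatches` parameter are illustrative, not taken from any particular tool.

```python
def map_read_ungapped(read: str, reference: str, max_mismatches: int = 2) -> list[int]:
    """Brute-force ungapped mapping (illustrative sketch): return every 0-based
    start position in `reference` where `read` aligns with at most
    `max_mismatches` mismatches."""
    hits = []
    for start in range(len(reference) - len(read) + 1):
        mismatches = 0
        window = reference[start:start + len(read)]
        for ref_base, read_base in zip(window, read):
            if ref_base != read_base:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many mismatches: give up on this position
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


reference = "ACGTACGTTTACGTACGA"
print(map_read_ungapped("TTTACG", reference, max_mismatches=0))   # [7]: unique hit
print(map_read_ungapped("ACGTACG", reference, max_mismatches=0))  # [0, 10]: multi-hit
print(map_read_ungapped("ACGTTCG", reference, max_mismatches=0))  # []: unmapped
print(map_read_ungapped("ACGTTCG", reference, max_mismatches=1))  # [0, 10]
```

Raising the mismatch limit from 0 to 1 turns the last read from unmapped into a multi-hit. This shows how the mismatch limit and multi-hit handling interact.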
Assess your mapping results
◦ % of total reads mapped
◦ % of uniquely mapped reads
◦ Coverage statistics, variance in depth
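The three checks above can be computed from two simple summaries. One is the number of reference locations each read mapped to. The other is the read depth at each reference position. Both inputs and the function name are hypothetical. Two choices here are assumptions, not definitions from the slides: "% uniquely mapped" is taken relative to all reads, and depth variance is the population variance.

```python
from statistics import mean, pvariance


def mapping_summary(hits_per_read: list[int], depth_per_position: list[int]) -> dict:
    """Summarise a mapping run (illustrative sketch).

    hits_per_read: number of reference locations each read mapped to
                   (0 = unmapped, 1 = unique, >1 = multi-hit).
    depth_per_position: number of reads covering each reference position.
    """
    total = len(hits_per_read)
    if total == 0 or not depth_per_position:
        raise ValueError("need at least one read and one position")
    mapped = sum(1 for h in hits_per_read if h > 0)
    unique = sum(1 for h in hits_per_read if h == 1)
    return {
        "pct_mapped": 100.0 * mapped / total,
        "pct_unique": 100.0 * unique / total,  # relative to ALL reads (assumption)
        "mean_depth": mean(depth_per_position),
        "depth_variance": pvariance(depth_per_position),  # population variance
    }


summary = mapping_summary([1, 1, 2, 0, 1, 3, 1, 0, 1, 1], [10, 12, 8, 30, 10])
print(summary)
```

In this toy input, 8 of 10 reads map (80%) and 6 map uniquely (60%). The mean depth is 14x, but the single 30x position inflates the variance. That is the kind of uneven coverage the depth statistics are meant to reveal.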
Mapped read depth
Variant detection
Identification of point mutations, short insertions and deletions (indels).
We go through every column of the alignment and count
how many alleles are found and how many reads
differ from the reference genome.
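The column-by-column count described above can be sketched as a simple pileup. The sketch assumes the reads are already aligned without gaps and given as hypothetical (0-based start, sequence) pairs. Real pileup tools such as samtools mpileup also handle indels, base qualities and mapping qualities, which this sketch ignores.

```python
from collections import Counter


def pileup(reference: str, aligned_reads: list[tuple[int, str]]) -> list[Counter]:
    """Count the bases observed at each reference position (alignment column).

    aligned_reads: (0-based start, sequence) pairs for reads aligned
    without gaps (a simplifying assumption).
    """
    columns = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):  # ignore bases hanging off the ends
                columns[pos][base] += 1
    return columns


def describe_column(ref_base: str, counts: Counter) -> tuple[int, int, int]:
    """Return (depth, number of distinct alleles, reads differing from the reference)."""
    depth = sum(counts.values())
    return depth, len(counts), depth - counts[ref_base]


reference = "ACGTA"
reads = [(0, "ACGTA"), (0, "ACTTA"), (1, "CTTA"), (2, "GTA")]
for pos, counts in enumerate(pileup(reference, reads)):
    print(pos, reference[pos], dict(counts), describe_column(reference[pos], counts))
```

At position 2 (reference G), the sketch reports a depth of 4 with two alleles (G and T). Two of the four reads differ from the reference.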
Complexity of variant detection
2nd generation sequencing is NOT single molecule sequencing
Due to PCR amplification, some DNA fragments are
sequenced more often than others, which results in uneven
coverage across the genome.
These duplicate reads can provide false support in variant
detection, because we are usually more confident in variants
with higher coverage support.
Solution: Mark or remove exact duplicate reads when doing
variant detection.
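A minimal sketch of marking exact duplicates follows the slide's wording literally. A read counts as a duplicate if an earlier read has the same start position, strand and sequence, and the first copy in each group is kept. The `AlignedRead` record and function name are hypothetical. Production tools such as Picard MarkDuplicates use a broader, position-based definition and choose which copy to keep by base quality, so treat this only as an illustration of the idea.

```python
from typing import NamedTuple


class AlignedRead(NamedTuple):
    name: str
    start: int      # 0-based leftmost mapped position
    reverse: bool   # True if the read mapped to the reverse strand
    sequence: str


def find_exact_duplicates(reads: list[AlignedRead]) -> set[str]:
    """Return names of reads that repeat an earlier read's start, strand and
    sequence exactly; the first copy of each group is kept."""
    seen: set[tuple[int, bool, str]] = set()
    duplicates: set[str] = set()
    for read in reads:
        key = (read.start, read.reverse, read.sequence)
        if key in seen:
            duplicates.add(read.name)  # mark; a caller could also drop it
        else:
            seen.add(key)
    return duplicates


reads = [
    AlignedRead("r1", 100, False, "ACGTACGT"),
    AlignedRead("r2", 100, False, "ACGTACGT"),  # PCR copy of r1
    AlignedRead("r3", 100, True, "ACGTACGT"),   # same start, other strand
    AlignedRead("r4", 101, False, "CGTACGTA"),
    AlignedRead("r5", 100, False, "ACGTACGT"),  # another copy of r1
]
print(sorted(find_exact_duplicates(reads)))  # ['r2', 'r5']
```

After marking, r1 alone supports the site instead of three identical copies. This removes the false coverage support described above.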
Complexity of variant detection
Cloning process artifacts (e.g. PCR-induced mutations).
Error rate associated with the sequence reads.
Error rate associated with the mapping.
Reliability of the reference genome.
Calling a variant
A (ref): 0%, G: 100%
A (ref): 7%, T: 93%
A hard cut-off on the percentage of reads differing
from the reference, e.g. 75% as the minimum threshold
for calling a homozygous variant.
A percentage-based cut-off assumes you have sufficient
read depth.
When to call a variant?
A: 18%, C: 0%, G: 55%, T: 27%
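The hard cut-off can be sketched as a small function. Here the 75% threshold is applied to the most common non-reference allele, which is one possible reading of "percentage of difference to reference". The default `min_depth=10` is a hypothetical stand-in for "sufficient depth"; the slides give no number. The slide percentages are treated as counts out of 100 reads. For the mixed column, the reference base is assumed to be A purely for illustration.

```python
from collections import Counter
from typing import Mapping, Optional


def call_homozygous(ref_base: str, counts: Mapping[str, int],
                    cutoff: float = 0.75, min_depth: int = 10) -> Optional[str]:
    """Return the most common non-reference base if it reaches `cutoff` of the
    reads at this column, otherwise None (illustrative sketch).

    min_depth=10 is a hypothetical stand-in for "sufficient depth".
    """
    depth = sum(counts.values())
    if depth < min_depth:
        return None  # percentages are unreliable with too few reads
    alt = {base: n for base, n in counts.items() if base != ref_base and n > 0}
    if not alt:
        return None  # only the reference allele was observed
    best_base, best_n = max(alt.items(), key=lambda item: item[1])
    return best_base if best_n / depth >= cutoff else None


print(call_homozygous("A", Counter(A=0, G=100)))              # G
print(call_homozygous("A", Counter(A=7, T=93)))               # T
print(call_homozygous("A", Counter(A=18, C=0, G=55, T=27)))   # None: 55% < 75%
print(call_homozygous("A", Counter(G=3)))                     # None: depth 3 < 10
```

With A assumed as the reference in the mixed column, 82% of reads differ from the reference. Yet no single alternative allele reaches 75%. A single percentage threshold therefore gives different answers depending on whether it counts total difference or one allele.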
Alignment considerations
Perform local realignment and calculate a score for each
candidate alignment to determine which one is better.
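Comparing candidate alignments can be sketched with a toy scoring scheme: +1 per match, -1 per mismatch, -2 per gap position. These values are assumptions chosen for illustration, not the scoring used by any specific realignment tool. The example read carries a 1-bp deletion. Forced into an ungapped alignment, it picks up a run of mismatches after the deletion; the gapped alignment explains it with a single gap.

```python
def alignment_score(ref_aln: str, read_aln: str,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score a pairwise alignment given as two equal-length strings with '-'
    marking gaps (toy scoring scheme, assumed for illustration)."""
    if len(ref_aln) != len(read_aln):
        raise ValueError("aligned strings must have equal length")
    score = 0
    for r, q in zip(ref_aln, read_aln):
        if r == "-" or q == "-":
            score += gap
        elif r == q:
            score += match
        else:
            score += mismatch
    return score


# The same read (ACGTACTTACG, a 1-bp deletion) aligned two ways.
ungapped = alignment_score("ACGTAGCTTAC", "ACGTACTTACG")   # 6 matches, 5 mismatches
gapped = alignment_score("ACGTAGCTTACG", "ACGTA-CTTACG")   # 11 matches, 1 gap
print(ungapped, gapped)  # 1 9
```

The gapped alignment scores higher, so it would be preferred. The mismatches in the ungapped version could otherwise be mistaken for SNPs.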
What depth do I need?
Factors to consider
Read length
◦ Longer reads are more likely to be mapped with high confidence
Sequencing depth
◦ Require sufficient depth, ~30x
Base call quality of each supporting base
◦ Use high-quality bases, Q30
Mapping quality
Local realignment to improve variant calling
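The quality factors above can be combined when counting support at a column. Base and mapping qualities are Phred-scaled, Q = -10 log10(P_error), so Q30 means roughly a 1-in-1,000 chance that a base call is wrong. The sketch keeps only bases with base quality ≥ 30, as on the slide. It also requires mapping quality ≥ 20, a hypothetical threshold the slides do not specify. The (base, base_quality, mapping_quality) tuples are an assumed input format.

```python
from collections import Counter


def phred_to_error_prob(q: float) -> float:
    """Phred scale: Q = -10 * log10(P_error), so P_error = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def count_supporting_bases(observations, min_base_q: int = 30,
                           min_map_q: int = 20) -> Counter:
    """Count bases at one column, keeping only those with base quality
    >= min_base_q and read mapping quality >= min_map_q (min_map_q=20 is a
    hypothetical default).

    observations: iterable of (base, base_quality, mapping_quality) tuples.
    """
    counts = Counter()
    for base, base_q, map_q in observations:
        if base_q >= min_base_q and map_q >= min_map_q:
            counts[base] += 1
    return counts


print(phred_to_error_prob(30))  # about 0.001, i.e. 1 error in 1,000 bases
column = [("T", 35, 60), ("T", 32, 60), ("T", 20, 60), ("A", 38, 5), ("A", 31, 40)]
print(count_supporting_bases(column))  # Counter({'T': 2, 'A': 1})
```

Filtering drops a low-quality base and a poorly mapped read. The effective depth at this column falls from 5 to 3, which is why the raw depth alone overstates the support for a call.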