Structural Variation Detection
Using NGS technology
Ke Lin
23rd Feb, 2012
Content
• Introduction
• Methods and software used for SV detection
• Exercises
Introduction
What is Structural Variation?
• variation in the structure of chromosomes within a species
• FISH (fluorescence in situ hybridization) can be used to detect and localize the presence or absence of specific DNA sequences
Introduction
What is Structural Variation?
• SVs include inversions, balanced translocations and genomic imbalances (copy number variants, CNVs)
• approximately 1 kb or greater in size
• many SVs are associated with genetic diseases
Introduction
What can NGS do to detect SV?
• assumption: a reference genome of the species is available
• re-sequencing of other individuals of the species at shallow genome coverage (< 30X)
• paired-end sequencing
Methods used for SV detection
1. local (de novo) assembly, then alignment of the assembled sequences to the reference genome
• accurate but costly
• the genomes of individuals within one species should be quite similar at the sequence level
Methods used for SV detection
2. map reads to the reference genome and deduce SVs from the expected insert size of the read pairs
• less accurate but much cheaper
• many methods have been developed
• downstream analysis can help to increase the accuracy
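The idea of deducing SVs from the expected insert size can be sketched in a few lines of Python. This is only an illustration, not the algorithm of any particular tool: the function name `classify_pairs`, the cutoff of three standard deviations and the example numbers are all assumptions made for this sketch.

```python
def classify_pairs(insert_sizes, expected_mean, expected_sd, n_sd=3.0):
    """Label each observed insert size relative to the library expectation.

    The n_sd cutoff is an assumed, illustrative threshold.
    """
    upper = expected_mean + n_sd * expected_sd
    lower = expected_mean - n_sd * expected_sd
    labels = []
    for size in insert_sizes:
        if size > upper:
            # Mates map farther apart than expected: sequence may be
            # missing in the sample relative to the reference.
            labels.append("deletion_candidate")
        elif size < lower:
            # Mates map closer together than expected: the sample may
            # carry extra sequence between them.
            labels.append("insertion_candidate")
        else:
            labels.append("concordant")
    return labels


# Invented insert sizes for a library with mean 300 bp and sd 20 bp.
observed = [310, 295, 1450, 302, 120]
labels = classify_pairs(observed, expected_mean=300, expected_sd=20)
for size, label in zip(observed, labels):
    print(size, label)
```

A single discordant pair is weak evidence. Real callers cluster several pairs that support the same event before reporting it.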
Methods used for SV detection
Signatures used for SV discovery
• PEM (Paired End Mapping)
1. both reads of a pair have to map to the reference
2. reads need to align without gaps
Methods used for SV detection
Signatures used for SV discovery
• DOC (Depth Of Coverage)
1. does not reveal where the additional copies are located
2. cannot detect insertions of novel sequence
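The depth-of-coverage signature can be illustrated with a simple windowed comparison against a baseline. This is a deliberately naive sketch: the window size, the gain and loss ratios and the depth values are assumptions, and real DOC callers use statistical models rather than fixed thresholds.

```python
from statistics import median


def window_means(depths, window):
    """Average per-base depth over consecutive non-overlapping windows."""
    means = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        means.append(sum(chunk) / len(chunk))
    return means


def call_windows(depths, window=4, gain_ratio=1.5, loss_ratio=0.5):
    """Return (window_start, state) pairs; thresholds are illustrative."""
    means = window_means(depths, window)
    baseline = median(means)  # assumes most windows have normal copy number
    calls = []
    for idx, value in enumerate(means):
        ratio = value / baseline if baseline else 0.0
        if ratio >= gain_ratio:
            state = "Gain"
        elif ratio <= loss_ratio:
            state = "Loss"
        else:
            state = "Normal"
        calls.append((idx * window, state))
    return calls


# Invented per-base depths: one duplicated and one deleted stretch.
depths = [20] * 4 + [42] * 4 + [20] * 4 + [3] * 4 + [20] * 4
calls = call_windows(depths)
for start, state in calls:
    print(start, state)
```

The output only says that a window has too much or too little coverage. As noted above, it cannot tell where an extra copy sits in the genome.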
Methods used for SV detection
Signatures used for SV discovery
• Split reads
1. the size of the gaps that can be introduced is limited (only a few base pairs are allowed)
2. novel sequence insertions will be incomplete if the inserted sequence, assembled locally from hanging reads, is substantially larger than the insert size
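One common hint that a read spans a breakpoint is a soft-clipped alignment, which is recorded in the CIGAR string of the SAM/BAM format. The sketch below only illustrates that signal and is not the procedure of any specific split-read tool. The helper names and the `min_clip` threshold are assumptions, and hard clips are ignored for simplicity.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar):
    """Return (left, right) soft-clip lengths of a CIGAR string."""
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return left, right


def is_split_candidate(cigar, min_clip=20):
    """Flag reads with a long clipped end (threshold is illustrative)."""
    return max(soft_clips(cigar)) >= min_clip


for cigar in ["100M", "60M40S", "35S65M", "95M5S"]:
    print(cigar, soft_clips(cigar), is_split_candidate(cigar))
```

The clipped part of such a read can then be re-aligned elsewhere to locate the second breakpoint.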
Software for each SV detection method
• PEM
1. BreakDancer
Input: BWA mapping output in BAM format
Command:
bam2cfg.pl -g -h bamfile1 bamfile2 .. > configure_file
Output: configuration file for the next step
Software for each SV detection method
• PEM
1. BreakDancer
Input: configuration file
Command:
breakdancer_max -h -g int.bed -o chromosome cfg_file > output
Output: tab-delimited file
Software for each SV detection method
BreakDancer output columns:
1. Chromosome 1
2. Position 1
3. Orientation 1
4. Chromosome 2
5. Position 2
6. Orientation 2
7. Type of SV
8. Size of SV
9. Confidence score
10. Total number of supporting read pairs
11. Total number of supporting read pairs from each bam/library
12. Estimated allele frequency (if -h is given)
13-end. Copy number for each bam/library
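The column layout above is enough to read the tab-delimited output with a few lines of Python. This is a minimal sketch: the short column names and the example rows are invented for illustration, and it assumes that header lines start with `#` and that deletions carry the type code `DEL`, both of which should be checked against the BreakDancer documentation.

```python
import csv
import io

# Short names for the first ten columns listed above (names are our own).
COLUMNS = ["chr1", "pos1", "orient1", "chr2", "pos2", "orient2",
           "type", "size", "score", "num_reads"]


def read_calls(handle):
    """Yield one dict per SV call, skipping blank and '#' header lines."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        call = dict(zip(COLUMNS, row))  # extra columns are ignored here
        for key in ("pos1", "pos2", "size", "num_reads"):
            call[key] = int(call[key])
        yield call


def deletions_on(handle, chromosome):
    """Collect deletion calls with both breakpoints on one chromosome."""
    return [c for c in read_calls(handle)
            if c["type"] == "DEL"
            and c["chr1"] == chromosome and c["chr2"] == chromosome]


# Invented example rows, not real BreakDancer results.
example = io.StringIO(
    "#Chr1\tPos1\tOrientation1\tChr2\tPos2\tOrientation2\tType\tSize\tScore\tnum_Reads\n"
    "1\t1000\t10+0-\t1\t1500\t0+10-\tDEL\t480\t99\t10\n"
    "1\t9000\t4+4-\t2\t700\t4+4-\tCTX\t-300\t60\t4\n"
    "2\t5000\t6+0-\t2\t5900\t0+6-\tDEL\t870\t80\t6\n"
)
dels = deletions_on(example, "1")
for call in dels:
    print(call["chr1"], call["pos1"], call["pos2"], call["size"], call["score"])
```

With a real result file, `open("output")` would replace the `io.StringIO` example.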
Software for each SV detection method
• DOC
1. cnD
Input: BWA mapping output in BAM format
Command:
samtools pileup -c bamfile | pileup2win.pl > output_file
Output: windows file for the next step
Software for each SV detection method
• DOC
1. cnD
Input: windows file
Command:
cnD.x86-64 --prefix=lib_name --nohet windows_file1
cat lib*_viterbi.txt > viterbi.txt
metaCaller.pl --threshold=value viterbi.txt > metacalls.txt
extractCNChanges.pl metacalls.txt > output
Output: tab-delimited file
Columns: chr, start pos, end pos, Gain/Loss
Software for each SV detection method
• Split reads
1. Pindel
Input: configuration file
Command:
pindel_x86_64 -f ref.fasta -i cfg_file -c ALL -o name
Output: files with indicative names
D = deletion, SI = short insertion, INV = inversion
TD = tandem duplication, LI = large insertion, BP = unassigned
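The "indicative names" can be handled programmatically with a small lookup. This sketch assumes that Pindel appends the code to the `-o` prefix with an underscore (for example `name_D`). That naming and both helper functions are assumptions to verify against the Pindel documentation.

```python
from pathlib import Path

# Codes and meanings as listed on the slide.
SV_TYPES = {
    "D": "deletion",
    "SI": "short insertion",
    "INV": "inversion",
    "TD": "tandem duplication",
    "LI": "large insertion",
    "BP": "unassigned",
}


def expected_outputs(prefix):
    """Map each assumed output file name (prefix_CODE) to its SV type."""
    return {f"{prefix}_{code}": label for code, label in SV_TYPES.items()}


def sv_type_of(filename):
    """Look up the SV type from the code after the last underscore."""
    code = Path(filename).name.rsplit("_", 1)[-1]
    return SV_TYPES.get(code, "unknown")


for name, label in expected_outputs("name").items():
    print(name, "->", label)
print(sv_type_of("results/run1_TD"))
```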
Downstream analysis after SV detection
• Local assembly of SV regions
• Annotation of novel insertions
• Fine-tuning of potentially changed gene models
Exercises:
Find all deletions on chromosome 1 using BreakDancer.
Then try to do the same using cnD (gene loss) and Pindel.
The input files can be found in:
/mnt/geninf15/work/bif_course_2012/SV/exercises/
The documentation for each program can be found in:
/mnt/geninf15/work/bif_course_2012/SV/DOC/