A case study on the use of bioinformatic tools to analyse next generation sequencing data: whole‐exome sequencing to study predisposition for breast cancer
Daniel Park, PhD(CANTAB)
Genetic Epidemiology Laboratory
Department of Pathology
The University of Melbourne
Familial breast cancer
Using data from twin and cancer registries in Sweden, Denmark and Finland (547 pairs of identical twins and 1075 pairs of non‐identical twins)
Source: Sprecher Institute for Comparative Cancer Research, Cornell University
Familial breast cancer
Proportion of women from the Australian Breast Cancer Family Study with breast cancer diagnosed before age 40 years and a strong family history of breast cancer whose cancers have been explained by currently identified breast cancer susceptibility genes.
Genes and breast cancer (so far)
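[Figure: breast cancer susceptibility genes and variants plotted by penetrance against frequency: BRCA1, BRCA2, TP53, ATM, PALB2, PTEN, CHEK2 and common SNPs, with a region marked "clinically relevant"]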
Study approaches to date
• Linkage
– e.g. BRCA1, BRCA2
• Candidate genes
– e.g. RAD51C, CHEK2, PALB2
• Genome‐wide association studies
– ~20 common SNPs exhibiting risk ratios of ~1.1‐1.2, e.g. FGFR2
Massively parallel sequencing
Decoding coloured blobs
SOLiD chemistry
Our approach
• Highly selected pedigrees
– Number of cases
– Age at onset
– Previously screened for known risk genes
• Second cousins
• Germline DNA
• Exomes
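The reasoning behind sequencing distantly related cases can be sketched in code: second cousins are expected to share only about 1/32 of their genome identical by descent, so intersecting their exome variant calls and dropping known common variants leaves a much shorter candidate list. The following is a minimal sketch under stated assumptions. Variants are represented as (chromosome, position, ref, alt) tuples, and the function name `shared_rare_variants` and all sample data are hypothetical, not taken from the study.

```python
from typing import Iterable, Set, Tuple

# A variant is (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def shared_rare_variants(
    cousin_a: Iterable[Variant],
    cousin_b: Iterable[Variant],
    known_common: Iterable[Variant],
) -> Set[Variant]:
    """Return variants seen in both relatives that are not known common variants."""
    return (set(cousin_a) & set(cousin_b)) - set(known_common)


# Hypothetical calls for two affected second cousins (illustrative only).
cousin_a = {("chr15", 1000, "C", "T"), ("chr1", 500, "G", "A"), ("chr2", 42, "A", "G")}
cousin_b = {("chr15", 1000, "C", "T"), ("chr1", 500, "G", "A"), ("chr7", 77, "T", "C")}
# Hypothetical list of common polymorphisms to exclude.
known_common = {("chr1", 500, "G", "A")}

candidates = shared_rare_variants(cousin_a, cousin_b, known_common)
print(sorted(candidates))
```

In practice the candidate list would still need quality filtering and functional annotation before any variant is followed up.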
Data
Massively parallel sequencing: analysis
“Here, the defective reverse parking gene.”
Supercomputer‐based analysis
[Figure: directory tree under /vlsci/VR0053/. Directories shown: jdavis/, djpark/, fhammet/, fodefrey/, shared/, script/, data/, Exome_analyses/, ref/, results/, bfast/, bioscope/, SAMtools/, Picard/, BEDtools/, with per‐sample directories sample1/ and sample2/]
SIFT
SIFT software applied to remaining variants to predict likelihood of a ‘damaging’ effect based on phylogenetic conservation and nature of amino acid change
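A minimal sketch of this filtering step, assuming each remaining variant already carries a SIFT score. SIFT scores range from 0 to 1, and substitutions scoring 0.05 or below are conventionally predicted to be damaging. The record layout, gene names and scores below are made up for illustration and are not output from the study.

```python
from dataclasses import dataclass
from typing import List

# Conventional SIFT cut-off: scores at or below this are predicted damaging.
SIFT_DAMAGING_THRESHOLD = 0.05


@dataclass
class ScoredVariant:
    gene: str          # hypothetical gene label
    change: str        # amino acid change, e.g. "R100W"
    sift_score: float  # SIFT score between 0 and 1


def predicted_damaging(
    variants: List[ScoredVariant],
    threshold: float = SIFT_DAMAGING_THRESHOLD,
) -> List[ScoredVariant]:
    """Keep only variants whose SIFT score is at or below the threshold."""
    return [v for v in variants if v.sift_score <= threshold]


# Illustrative input only.
variants = [
    ScoredVariant("GENE_A", "R100W", 0.01),
    ScoredVariant("GENE_B", "A20V", 0.40),
    ScoredVariant("GENE_C", "G55D", 0.05),
]

damaging = predicted_damaging(variants)
for v in damaging:
    print(f"{v.gene} {v.change} SIFT={v.sift_score}")
```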
diBayes principle
The joint probability function is:
P(G,S,R) = P(G | S,R) P(S | R) P(R)
where the names of the variables have been abbreviated to G = Grass wet, S = Sprinkler, and R = Rain.
The model can answer questions like "What is the probability that it is raining, given the grass is wet?" by using the conditional probability formula and summing over all nuisance variables:
P(R=T | G=T) = P(G=T, R=T) / P(G=T) = Σ_S P(G=T, S, R=T) / Σ_{S,R} P(G=T, S, R)
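The grass/sprinkler/rain query can be evaluated numerically by enumerating the joint distribution. The sketch below is illustrative only: the conditional probability values are assumed (they follow the commonly used textbook version of this example) and are not given on the slide, and the helper names `bern` and `joint` are hypothetical. It shows the general principle of Bayesian inference, not the diBayes implementation itself.

```python
from itertools import product

# Assumed illustrative probabilities (not from the slide).
P_RAIN = 0.2                                       # P(R = True)
P_SPRINKLER_GIVEN_RAIN = {True: 0.01, False: 0.4}  # P(S = True | R)
P_WET_GIVEN = {                                    # P(G = True | S, R), keyed by (S, R)
    (False, False): 0.0,
    (False, True): 0.8,
    (True, False): 0.9,
    (True, True): 0.99,
}


def bern(p_true: float, value: bool) -> float:
    """Probability of a binary outcome given P(outcome = True)."""
    return p_true if value else 1.0 - p_true


def joint(g: bool, s: bool, r: bool) -> float:
    """P(G, S, R) = P(G | S, R) * P(S | R) * P(R)."""
    return (
        bern(P_WET_GIVEN[(s, r)], g)
        * bern(P_SPRINKLER_GIVEN_RAIN[r], s)
        * bern(P_RAIN, r)
    )


# Numerator: sum over the nuisance variable S with G = True and R = True.
numerator = sum(joint(True, s, True) for s in (True, False))
# Denominator: sum over S and R with G = True.
denominator = sum(joint(True, s, r) for s, r in product((True, False), repeat=2))

posterior = numerator / denominator
print(f"P(R=T | G=T) = {posterior:.4f}")
```

With these assumed values the posterior comes out at roughly 0.36, noticeably higher than the prior of 0.2 for rain, because wet grass is evidence for rain.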
Some locally re‐aligned data
Case study 1: FAN1(R377W)
FAN1 is required for resistance to the DNA‐crosslinking agent mitomycin C (Smogorzewska et al. 2010)
FAN1 binds FANCD2 and is recruited to sites of DNA damage, supporting a role in the FA‐dependent pathway of ICL repair (Kratz et al. 2010)
[Figure: family pedigree for FAN1 (R377W), with individuals annotated by age and by + and * markers]
FAN1 exhibits flap nuclease activity and can cleave branched DNA structures (Liu et al. 2010)
Case study 2: GeneX
• ‘Frontline’ role in homologous recombination repair of DNA damage
• Rare 2‐base (frame‐shifting) deletion found in three families affected by breast cancer but not in thousands of unaffected families
• Rare R>W variant, predicted damaging, found in two families affected by breast cancer but not in thousands of unaffected families
• Ongoing population‐based case‐control screening and further screening in multiple‐case families
Acknowledgements
• Genetic Epidemiology Lab
• University of Utah
• Australian Breast Cancer Family Study
• Breast Cancer Family Registry
• IARC
• Cancer Council Victoria
• Victorian Life Sciences Computation Initiative