Transcriptome Profiling in Human Congenital Heart Disease

Transcriptome Profiling of Human Cardiac Tissues in Hypoplastic Left Heart
Karl D. Stamm, MS
Donna K. Mahnke, MS; Mary A. Goetsch, MS;
D. Woodrow Benson, MD, PhD; Xing Li, PhD;
Aoy Tomita-Mitchell, PhD; Timothy J. Nelson, MD, PhD;
James S. Tweddell, MD; Michael E. Mitchell, MD
September 2013 Research Update
• Medical Research
• Trouble with humans
• Rare diseases are common in a large enough population
• Next-Generation Sequencing Tech
• Illumina HiSeq methodology
• Differential expression
• Further Mining
• Principal component analysis
• Gene profiles and the self-organizing-map
Trouble with Humans
• Small sample sizes
• Low statistical power
• High interpersonal variability
• Ethnic backgrounds imply metabolic differences
• Phenocopy
• Multiple distinct diseases showing identical presentation
• Confounds clustering or association studies
• Ruins Case/Control study power
• PHI – Private/Protected Health Information
• Data security is paramount
• Cross-disciplinary collaborations are limited
• DNA is theoretically but not practically identifiable
Congenital Heart Defect
• Down Syndrome 1:700 live births
• 50-60% have some structural heart defect
• 22qD Syndrome 1:4000 live births
• 75-90% have some structural heart defect
• ‘Healthy’ 99:100 live births
• 0.8% have some structural heart defect
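The three groups above can be pooled into a rough overall rate of structural heart defects per live birth. This is a minimal sketch of that arithmetic, assuming the groups do not overlap and that the quoted percentage ranges are lower and upper bounds; the function and variable names are illustrative only.

```python
# Pool the per-group figures from the slide into an overall CHD rate.
# Each entry: (share of live births, low CHD fraction, high CHD fraction).
# Assumption: the groups are disjoint; births outside these groups are ignored.
GROUPS = {
    "Down syndrome": (1 / 700, 0.50, 0.60),
    "22qD syndrome": (1 / 4000, 0.75, 0.90),
    "Healthy": (99 / 100, 0.008, 0.008),
}


def pooled_chd_rate(groups):
    """Return (low, high) overall CHD fraction across all listed groups."""
    low = sum(share * lo for share, lo, _ in groups.values())
    high = sum(share * hi for share, _, hi in groups.values())
    return low, high


low, high = pooled_chd_rate(GROUPS)
print(f"Overall CHD rate: {low * 1e4:.1f} to {high * 1e4:.1f} per 10,000 live births")
```

Under these assumptions most cases still come from the ‘healthy’ group, simply because it is by far the largest.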
C.H.D. in particular
Hypoplastic Left Heart Syndrome
1 in 40 CHD cases are HLHS
2.5 : 10000 of all births
• Complex developmental
• 100% fatal before the invention of the Norwood Procedure
• No multigenerational
• Spontaneous mutation: immune to detection by genetic linkage
All sequencing costs for this study provided by
Generate Reads – Illumina Tech
10 to 500 million short reads are generated in pairs, 2x50 to 2x100 bp each.
Align Reads to Reference
Which one?
NCBI #37.3 has 3.1 billion bases across 190 contiguous scaffolds
UCSC hg19 has 3.2 billion bases across 163 contiguous scaffolds
Haploid reference contains disease alleles and chimeric sequence, like an A+B+O blood type.
Image of patches modifying the CHR17 reference from 2011 according to Ensembl
Millions of Variants
• The 1000 Genomes project found 38 million SNPs, 1.4 million short insertions or deletions, and more than 14 thousand larger deletions
• The NHLBI Exome Sequencing Project targeted 22 Mbases across 2,440 individuals and found 563,700 variants, 82% of which were novel. They averaged 200 novel coding mutations per person.
• We find about 150-300 thousand SNVs in an exome, 10% of which are nonsynonymous
• SAMTOOLS is the software of choice for variant calling relative to your reference genome.
• CCG/Proline -> CTG/Leucine
• HOPX is a gene known to regulate heart development!
• Very common mutation
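The CCG to CTG change above is a nonsynonymous substitution: one base changes and so does the encoded amino acid. A minimal sketch of that check follows, using the standard genetic code; the helper names `translate` and `is_nonsynonymous` are illustrative and are not part of SAMtools or any other tool named here.

```python
# Standard genetic code, laid out in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(codon):
    """Return the one-letter amino acid for a DNA codon ('*' means stop)."""
    return CODON_TABLE[codon.upper()]


def is_nonsynonymous(reference_codon, variant_codon):
    """True when the variant codon encodes a different amino acid."""
    return translate(reference_codon) != translate(variant_codon)


print(translate("CCG"), "->", translate("CTG"))   # P -> L (proline to leucine)
print(is_nonsynonymous("CCG", "CTG"))             # True
```

A change in the third position such as CCG to CCA would still encode proline and would count as synonymous.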
RNA-Seq vs. Whole Genome
Extract and purify mRNA by
Convert spliced
Run standard sequencing on the
Result: Expression level dependent sequence coverage
RNA-Seq Reconstructs Transcripts
From the Cufflinks paper: Trapnell et al., Nature Biotechnology 28, 511–515 (2010)
IGV – aligned reads viewer
Simple arbitrary feature read depth counting.
• Count by gene, exon, or any other feature
BEDTOOLS : a flexible suite of utilities for comparing genomic features.
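The idea of counting read depth over arbitrary features can be sketched in a few lines. This is a toy illustration only, not the BEDTOOLS implementation or its interface: it assumes BED-style half-open intervals, and the function name, read coordinates, and feature names are made up for the example.

```python
def count_reads_per_feature(reads, features):
    """Count the reads overlapping each feature.

    reads:    list of (chrom, start, end) tuples, half-open like BED.
    features: dict mapping a feature name to (chrom, start, end).
    A read is counted for every feature it overlaps by at least one base.
    """
    counts = {name: 0 for name in features}
    for chrom, start, end in reads:
        for name, (f_chrom, f_start, f_end) in features.items():
            if chrom == f_chrom and start < f_end and end > f_start:
                counts[name] += 1
    return counts


# Hypothetical aligned reads and two hypothetical exons.
reads = [
    ("chr4", 100, 150),
    ("chr4", 140, 190),
    ("chr4", 500, 550),
    ("chr5", 100, 150),
]
features = {"exon1": ("chr4", 120, 160), "exon2": ("chr4", 540, 600)}
print(count_reads_per_feature(reads, features))
```

The nested loop is quadratic, which is fine for a demonstration; real tools sort or index the intervals to handle millions of reads.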
Example of bad alignment
Variance and mean linked by local regression - for robust parameter estimation.
• Negative Binomial
• Models count as ‘binomial successes until a set number of failures’, which better fits the RNA-Seq fragment generation (limited reagent)
• Allows/captures the ‘overdispersion’ seen in RNA-Seq experiments.
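The ‘successes until a set number of failures’ description can be simulated directly to see the overdispersion. This is a minimal sketch with arbitrary parameters (r = 5 failures, success probability p = 0.8) chosen for illustration; they are not estimates from this dataset.

```python
import random
import statistics


def successes_before_failures(r, p, rng):
    """Draw one negative binomial count: successes seen before the r-th failure."""
    successes = failures = 0
    while failures < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return successes


def simulate_counts(n, r, p, seed=0):
    """Draw n independent counts with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    return [successes_before_failures(r, p, rng) for _ in range(n)]


r, p = 5, 0.8
counts = simulate_counts(5000, r, p)
mean = statistics.mean(counts)
variance = statistics.variance(counts)

# Theory: mean = r*p/(1-p), variance = r*p/(1-p)**2 = mean + mean**2 / r.
print(f"sample mean {mean:.1f}, theoretical {r * p / (1 - p):.1f}")
print(f"sample variance {variance:.1f}, theoretical {r * p / (1 - p) ** 2:.1f}")
```

A Poisson model would force the variance to equal the mean; here the variance exceeds the mean by mean squared over r, and that extra term is the overdispersion.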
Scale the totals for compatible
Mean-Variance Connection
Detection in Low Values
Per-gene mean by difference ratio
• Starting from 18,000 Rsids minus 1200 NA
• 1000 entries p<0.05
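A quick sanity check on a list of this size: with roughly 16,800 tests at p<0.05, several hundred hits are expected by chance alone. The sketch below assumes independent tests with uniform p-values under the null, and shows the Benjamini-Hochberg adjustment as one common way to rank such lists; it is not stated in the slides which correction, if any, was applied.

```python
def expected_false_positives(num_tests, alpha):
    """Expected number of null tests passing the threshold by chance."""
    return num_tests * alpha


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


tests = 18000 - 1200
print(f"{tests} tests at alpha 0.05: about "
      f"{expected_false_positives(tests, 0.05):.0f} chance hits expected")
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Against roughly 840 expected chance hits, a list of 1000 entries contains only a modest excess, which is one reason the raw list is hard to interpret.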
Big lists
Noisy data
Complex correlation
Heterogeneous background
Precious Tissue Samples
• Collecting tissue during surgery is an extra burden placed on overloaded surgical teams.
• Samples must be processed carefully to avoid degradation of sensitive molecules.
• Many steps and costs prior to gene sequencing.
• Collaborators have provided 35 patients’ atrial septal tissues.
• Still no ethical source of healthy
• Hope to see separation between red/notred or solid/notsolid points
• Lack of discrimination in major variation dimensions
• Implying uncontrolled heterogeneity dominates
Therefore, there is more difference from person to person than between subtypes
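For readers unfamiliar with the projection behind these plots, here is a minimal sketch of finding the first principal component by power iteration on the covariance matrix. It uses pure Python and a made-up toy matrix, and is not the code used for the figures; real expression matrices would normally go through a numerical library.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the first principal component.

    rows: list of samples, each a list of equal length (e.g. gene values).
    Assumes the data has nonzero variance and that the start vector is not
    orthogonal to the leading component.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    v = [1.0 / (j + 1) for j in range(d)]  # arbitrary non-symmetric start
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Toy data: four samples lying on the line y = x.
direction, scores = first_principal_component([[1, 1], [2, 2], [3, 3], [4, 4]])
print(direction, scores)
```

Plotting the scores for the first two or three components, colored by subtype, gives the kind of scatter described above; if the colors do not separate, the dominant variation is not the subtype.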
Top25 Consistent Genes
• Anyone know what it means when Adducin2 and HomeoboxA4 are overexpressed? Is it significant that a dehydrogenase is under-expressed?
Group Profiles at Selected Dimensions
Self-Organizing Map
Kohonen 1990
Halfway between neural networks and k-means (horrible oversimplification)
Enforced grid layout and local neighborhood similarity
Data points (here 25-dimensional vectors) lay out in natural organization
Stochastic - Iteration
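The grid layout, local neighborhood, and stochastic iteration can be illustrated with a very small self-organizing map. This is a sketch under simplifying assumptions (linear decay of learning rate and radius, Gaussian neighborhood, random initial weights in [0, 1]); the parameter values and the two-dimensional toy data are invented, and the actual analysis used 25-dimensional vectors.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid node whose weight vector is closest to sample x."""
    return min(
        weights,
        key=lambda node: sum((w - xi) ** 2 for w, xi in zip(weights[node], x)),
    )


def train_som(data, rows, cols, steps=300, seed=0):
    """Train a rows x cols map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for t in range(steps):
        frac = 1 - t / steps                     # decays from 1 towards 0
        learning_rate = 0.5 * frac
        radius = 0.5 + (max(rows, cols) / 2) * frac
        x = rng.choice(data)                     # stochastic: one random sample
        bmu = best_matching_unit(weights, x)
        for node, w in weights.items():
            grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            influence = math.exp(-grid_d2 / (2 * radius ** 2))
            for j in range(dim):
                w[j] += learning_rate * influence * (x[j] - w[j])
    return weights


data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
som = train_som(data, rows=2, cols=2)
for sample in data:
    print(sample, "->", best_matching_unit(som, sample))
```

Because samples are drawn at random, different seeds give different maps, which is what makes repeated runs useful for measuring how stable a grouping is.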
Pairwise Similarity
• Co-clustering frequency determines sample similarity
• Sub-clusters are identified organically
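Co-clustering frequency can be computed from the cluster labels of repeated stochastic runs. The sketch below assumes each run yields one label per sample; the three runs shown are invented for illustration and the function name is hypothetical.

```python
def co_clustering_frequency(runs):
    """Fraction of runs in which each pair of samples shares a cluster.

    runs: list of label lists, one per run, each with one label per sample.
    Labels only need to be consistent within a run, not across runs.
    Returns a symmetric matrix (list of lists) with 1.0 on the diagonal.
    """
    n = len(runs[0])
    matrix = [[0.0] * n for _ in range(n)]
    for labels in runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += 1
    return [[count / len(runs) for count in row] for row in matrix]


# Three hypothetical runs over three samples.
runs = [[0, 0, 1], [1, 1, 0], [0, 1, 1]]
for row in co_clustering_frequency(runs):
    print([round(value, 2) for value in row])
```

Samples 0 and 1 land together in two of the three runs, so their similarity is 2/3; the resulting matrix can then be fed to any standard clustering method to pull out sub-clusters.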
• Lists of genes differential across conditions
• Many conditions, uncertain homogeneity
• List cutoff subjective
• No healthy control group
• We can mine these lists for pathways or biological processes
• Resulting in more lists of more complex results
Transcriptome Project
Future Work
• A few more samples are coming… Can we build a classifier?
• Predict non-measured variables? Signatures of immune response point towards treatment targets.
• Predict compensatory effects? Samples are taken just days after birth, but 8 months after the heart started beating.
• How else could we look at this rich, unique dataset?
Thanks for listening