PPT presentation - LabKey Software

Managing Next Generation Sequencing and Multiplexed Genotyping Data Using Open Source LabKey Server
Adam Rauch
adam@labkey.com
© 2010 LabKey Software
www.labkey.com
LabKey Software Company Overview
• LabKey Software is a consulting company
– Spun off from the McIntosh Lab (part owned by FHCRC)
– Professional software engineers from Amazon, Microsoft, BEA etc
• Work in partnership with scientists
– For-profit fee-for-service contracts
– Non-profit grant sub-awards
  – Co-investigators with a shared research agenda
– All development approved by and relevant to FHCRC
• Development & support around LabKey Server
– Extending the base LabKey Server platform
– Creating customized lab-specific solutions
– Hosting LabKey server
– Support
LabKey Software 2010
What Is LabKey Server?
An open-source, web-based platform for
organizing, analyzing & sharing scientific data
Data integration analysis for assays
Proteomics, flow cytometry, plate-based assays, etc.
Study Data Management
Combines demographic, clinical, assay & specimen data
LabKey Server powers many deployments…
CPAS: FHCRC proteomics repository
Atlas Science Portal: SCHARP’s HIV vaccine studies
AdaptiveTCR: Customer analytics for ImmunoSEQ NGS
UW (Katze, Heinecke, et al), USC, Markey, Harvard,
IDRI, TGen, Wisconsin Primate EHR, UC Denver, etc.
Dave O’Connor Lab, University of Wisconsin
• Academic research lab
– Focus: understanding SIV using nonhuman primate models & applying NHP methods to human HIV disease research
O’Connor Lab SIV/HIV Research
[Figure: Host Immune Genetics. Source: modified from Yewdell et al., Nature Reviews Immunology 2003]
[Figure: Virus Genetics. Source: Korber et al., British Medical Bulletin 2001]
Importance of MHC Class I
[Figure: Host Immune Genetics. Source: modified from Yewdell et al., Nature Reviews Immunology 2003]
• MHC class I molecules dictate immunity to disease
• High degree of polymorphism within the MHC class I peptide-binding domain
• Specific MHC alleles associated with superior control of HIV infection
Importance of Viral Variability
• HIV has fast replication cycle, high mutation rate
• Evolution of the virus causes escape from immune responses
• Specific mutations are associated with resistance to antiretroviral drug therapy
[Figure: Virus Genetics. Source: Korber et al., British Medical Bulletin 2001]
Sequencing in the O’Connor Lab
• 2005 – 2009 Sanger sequencing
– “Prohibitively expensive” for most experiments
• 2009 Roche/454 GS FLX at UIUC
• 2010 Roche/454 GS Junior in lab
• Roche/454 GS Junior
– Long-read instrument, critical for genotyping
– Identical to GS FLX, but 1/8 throughput & lower cost
– ~100,000 reads per run (~1¢ per read), average ~560bp read length
– 115 runs this year
• MID tagging
– Allows pooling multiple samples (30-100) into a single run
• Galaxy server
– Open-source sequence analysis tool (Giardine et al., Genome Res 2005)
– Lab has built custom workflow to match sequences to known MHC alleles
– Uses BLAT, transitioning to AGILE (Northwestern alignment tool)
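MID tagging relies on a short known barcode at the start of each read, so that reads from a pooled run can be assigned back to their samples. The following is a minimal sketch of that assignment step, assuming exact prefix matching. The barcode values, sample names, and the function name `demultiplex` are hypothetical illustrations and do not describe the lab's actual Galaxy workflow.

```python
from collections import defaultdict

# Hypothetical MID barcodes mapped to sample names (illustrative values only).
MID_TO_SAMPLE = {
    "ACGAGTGCGT": "sample_A",
    "ACGCTCGACA": "sample_B",
}


def demultiplex(reads, mid_to_sample):
    """Group reads by the MID found at the start of each read.

    The MID is trimmed from every assigned read. Reads whose prefix
    matches no known MID are collected under the key None.
    """
    by_sample = defaultdict(list)
    for read in reads:
        for mid, sample in mid_to_sample.items():
            if read.startswith(mid):
                by_sample[sample].append(read[len(mid):])
                break
        else:
            # No barcode matched this read.
            by_sample[None].append(read)
    return dict(by_sample)


# Three pooled reads: two carry a known MID, one does not.
pooled = [
    "ACGAGTGCGTTTGACCA",
    "ACGCTCGACAGGATCCA",
    "TTTTTTTTTTGGGG",
]
result = demultiplex(pooled, MID_TO_SAMPLE)
```

A real pipeline would usually tolerate sequencing errors in the barcode; exact matching is used here only to keep the idea visible.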
Roche/454 MHC Workflow
• Total RNA isolation and cDNA synthesis
– RNA isolation ~4 hrs; cDNA synthesis ~2 hrs
• Primary PCR amplification
– plus SPRI purification, quantification, pooling ~3 hrs
• emPCR
– set-up ~1 hr, run ~5.5 hrs
• Breaking and enrichment
– ~3 hrs
• Roche/454 GS Junior run
– set-up ~1.5 hrs; run time ~10 hrs
• Data processing and analysis
– run processing ~2 hrs; analysis time varies
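Adding up the approximate step times above gives a rough sense of how long one sample-to-data cycle takes. The sketch below does that arithmetic, assuming the steps run back to back with no overlap (an assumption, not something the slide states). The step labels are paraphrased, and the analysis step is left out because its time is listed only as "varies".

```python
# Approximate step durations in hours, taken from the workflow list.
# "Analysis time varies" is excluded because no figure is given.
STEP_HOURS = [
    ("RNA isolation", 4.0),
    ("cDNA synthesis", 2.0),
    ("Primary PCR, purification, quantification, pooling", 3.0),
    ("emPCR set-up", 1.0),
    ("emPCR run", 5.5),
    ("Breaking and enrichment", 3.0),
    ("GS Junior set-up", 1.5),
    ("GS Junior run", 10.0),
    ("Run processing", 2.0),
]

# Sum of all listed steps, assuming they are performed sequentially.
total_hours = sum(hours for _, hours in STEP_HOURS)

# Time spent in the two unattended instrument runs (emPCR and sequencing).
instrument_hours = sum(
    hours for name, hours in STEP_HOURS if name in ("emPCR run", "GS Junior run")
)

print(f"Approximate total: {total_hours} hours")
print(f"Of which instrument run time: {instrument_hours} hours")
```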
www.454.com
There is a real disconnect between the ability to collect next-generation sequence data (easy) and the ability to analyze it meaningfully (hard)
Dave O’Connor
PROBLEM: DATA MANAGEMENT!
Problem: Data Management
• As volume has increased, the lab has found it difficult to manage all its sequencing data & metadata:
– Run metadata
– Run metrics
– Sequencing reads and quality scores
– Sample information and multiplex identifiers (MIDs)
– Reference sequences for genotyping experiments
– Genotyping matches
• O’Connor asked LabKey to build a system that can:
– Store sequencing and genotyping data in a single database that links all the tables, allowing arbitrary queries and reports
– Provide tools for analysis, querying, visualization and export
– Automate data workflows for efficiency & consistency
– Eventually, link sequencing results to the lab's primate EHR system
LabKey Sequencing System
[Diagram labels: Reads, Quality Scores, Metrics, Sample Information, Reference Sequences; Galaxy Genotyping Workflow; Sequencing and Genotyping Database; Reporting, Analysis, Visualization, Export, External Tools]
Database Schema
Analyses (genotyping): RowId, Run, CreatedBy, Created, Description, Path, FileName, Status, SequenceDictionary, SequencesView
Dictionaries (genotyping): RowId, Container, CreatedBy, Created
Sequences (genotyping): RowId, Dictionary, Uid, AlleleName, Initials, GenbankId, ExptNumber, Comments, Locus, Species, Origin, Sequence, PreviousName, LastEdit, Version, ModifiedBy, Translation, Type, IpdAccession, Reference, RegIon, Id, Variant, UploadId, FullLength, AlleleFamily
Runs (genotyping): RowId, MetaDataId, Container, CreatedBy, Created, Path, FileName, Status
MetaData (genotyping): Run, [...]
Metrics (genotyping): Run, [...]
Samples (genotyping): SampleId, [...]
Reads (genotyping): RowId, Run, Name, Mid, Sequence, Quality
Matches (genotyping): RowId, Analysis, SampleId, Reads, [Percent], AverageLength, PosReads, NegReads, PosExtReads, NegExtReads
AnalysisSamples (genotyping): Analysis, SampleId
AllelesJunction (genotyping): MatchId, SequenceId
ReadsJunction (genotyping): MatchId, ReadId
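The two junction tables in the schema (AllelesJunction and ReadsJunction) are what let a single match link to many alleles and many reads. The sketch below shows how such a query might look, using Python's built-in `sqlite3` module and a heavily simplified subset of the tables. The column types, the inserted rows, and the allele names are assumptions made for illustration; LabKey Server's actual table definitions and query layer are not shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified subset of the genotyping tables; column types are assumed.
conn.executescript("""
CREATE TABLE Reads (RowId INTEGER PRIMARY KEY, Run INTEGER, Name TEXT,
                    Mid INTEGER, Sequence TEXT, Quality TEXT);
CREATE TABLE Sequences (RowId INTEGER PRIMARY KEY, AlleleName TEXT);
CREATE TABLE Matches (RowId INTEGER PRIMARY KEY, Analysis INTEGER,
                      SampleId INTEGER);
CREATE TABLE AllelesJunction (MatchId INTEGER, SequenceId INTEGER);
CREATE TABLE ReadsJunction (MatchId INTEGER, ReadId INTEGER);
""")

# Hypothetical example rows: three reads, two alleles, two matches.
conn.executemany(
    "INSERT INTO Reads (RowId, Run, Name, Mid, Sequence, Quality) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 1, "read1", 1, "ACGT", "IIII"),
     (2, 1, "read2", 1, "ACGT", "IIII"),
     (3, 1, "read3", 2, "TTGA", "IIII")],
)
conn.executemany("INSERT INTO Sequences VALUES (?, ?)",
                 [(1, "Allele-1"), (2, "Allele-2")])
conn.executemany("INSERT INTO Matches VALUES (?, ?, ?)",
                 [(1, 1, 10), (2, 1, 11)])
conn.executemany("INSERT INTO AllelesJunction VALUES (?, ?)",
                 [(1, 1), (2, 2)])
conn.executemany("INSERT INTO ReadsJunction VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 3)])

# For each sample, list the matched allele and the number of supporting reads.
rows = conn.execute("""
    SELECT m.SampleId, s.AlleleName, COUNT(rj.ReadId) AS ReadCount
    FROM Matches m
    JOIN AllelesJunction aj ON aj.MatchId = m.RowId
    JOIN Sequences s ON s.RowId = aj.SequenceId
    JOIN ReadsJunction rj ON rj.MatchId = m.RowId
    GROUP BY m.SampleId, s.AlleleName
    ORDER BY m.SampleId
""").fetchall()

for sample_id, allele, read_count in rows:
    print(sample_id, allele, read_count)
```

Keeping reads, matches, and alleles in separate tables joined through junction rows is what makes this kind of ad hoc report possible without duplicating sequence data.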
Demo
Possible Future Directions
Respond to O’Connor lab’s near-term needs
Genomics-specific analytics
Additional export formats
Tighter integration with Galaxy
Support for amplicon-designated reads
Match combining
Simplify configuration and operation
Integrate with Wisconsin primate EHR
Better integration with R / Bioconductor
Visualization
Other sequencing platforms: Illumina, PacBio…
Acknowledgements
O’Connor Laboratory
David O’Connor
Simon Lank
Julie Karl
Benjamin Bimber
LabKey Software
Mark Igra
Brian Connolly
Elizabeth Nelson
Josh Eckels
Matthew Bellew
Et al
Questions?