Tengcha + Trellis * generic middleware for retrieving data from Chado

Tengcha – generic middleware for retrieving data from Chado
Justin Reese
GMOD Meeting
April 5, 2012
Summary
 Tengcha is a plug-in to the Trellis framework that allows data
to be read from Chado
 Written in Java
 Tengcha is used by WebApollo to read data from our Chado
databases to support manual annotation
 Tengcha can be used as generic middleware to:
 read data from Chado databases
 output it as DAS or JBrowse-style JSON
 Source code lives here on Google code:
https://genomancer.googlecode.com/svn/trunk
Reading data into WebApollo
[Diagram: data sources feeding WebApollo: UCSC MySQL database (Poka plugin), Ensembl (Ivy plugin), DAS data model, JBrowse-flavored JSON (NCLists), BAM alignment files]
Problem – lots of data in Chado databases
 Much of our (and others’) data lives in Chado
databases:
 protein alignments
 gene calls
 RNAseq data/expression data
 etc.
 We could convert the data to JSON and have JBrowse handle it,
but it is easier to pull it directly from the Chado database
Reading data into WebApollo
[Diagram: the same data flow with Chado added as a source: Chado (Tengcha plugin, GBOL), UCSC MySQL database (Poka plugin), Ensembl (Ivy plugin), DAS data model, JBrowse-flavored JSON (NCLists), BAM alignment files]
Tengcha
 Tengcha is a Java-based plug-in to the Trellis framework
 Trellis can read data from many places:
 UCSC (via Poka plug-in)
 DAS servers (via Ivy plug-in)
 previously no plug-in to read data from Chado
 Trellis can output data in a few formats:
 DAS2
 JSON (JBrowse-flavored JSON)
 Possibly DAS1 in the future?
 Design goals:
 should read data from any standard Chado database (not just ours)
loaded with the GMOD bulk loader, with minimal configuration
 should be easily configurable to read data from non-standard Chado
databases
 should be reasonably fast (Chado is normalized, can be slow…)
 should be thoroughly unit-tested
Configuring Tengcha
 Configurable items:
 how to connect to Chado – database host, id/pw, port:
genomancer/tengcha/src/hibernate_cfg.xml
 cv and cvterm of reference sequence features (default:
scaffold):
genomancer/tengcha/Config.java
 cvterms for parent/child relationships in the feature_relationship
table (default: part_of, derived_from):
genomancer/tengcha/Config.java
 Configuration for non-standard Chado:
 edit hibernate XML mappings for Chado tables,
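The configurable items above can be pictured with a short sketch. This is not the actual content of Config.java or hibernate_cfg.xml; the class name `ChadoSettingsSketch`, its method, and the choice of PostgreSQL as the database are assumptions made for illustration. Only the defaults (scaffold, part_of, derived_from) come from the slide. The `hibernate.connection.*` keys are standard Hibernate property names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

/** Hypothetical settings holder; NOT the actual Tengcha Config.java. */
final class ChadoSettingsSketch {
    // Defaults named on the slide.
    static final String REFERENCE_SEQUENCE_TYPE = "scaffold";
    static final List<String> PARENT_CHILD_TERMS =
            Arrays.asList("part_of", "derived_from");

    /** Builds Hibernate connection properties; assumes Chado runs on PostgreSQL. */
    static Properties connectionProperties(String host, int port, String database,
                                           String user, String password) {
        Properties p = new Properties();
        p.setProperty("hibernate.connection.driver_class", "org.postgresql.Driver");
        p.setProperty("hibernate.connection.url",
                "jdbc:postgresql://" + host + ":" + port + "/" + database);
        p.setProperty("hibernate.connection.username", user);
        p.setProperty("hibernate.connection.password", password);
        return p;
    }

    public static void main(String[] args) {
        // Placeholder connection values, for illustration only.
        Properties p = connectionProperties("localhost", 5432, "chado", "reader", "secret");
        System.out.println(p.getProperty("hibernate.connection.url"));
        System.out.println("reference type: " + REFERENCE_SEQUENCE_TYPE);
        System.out.println("relationship terms: " + PARENT_CHILD_TERMS);
    }
}
```

In a real deployment these values would live in the two files named above rather than in code.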
Tengcha as a generic tool for reading from Chado
 Easy interoperability between Chado and anything that
speaks DAS
 Output Chado features as:
 DAS (XML)
 Nested-containment lists (JSON)
 Caching of painful reads (highly configurable caching
through hibernate)
 Java-based, if you like that sort of thing
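For readers unfamiliar with nested-containment lists, here is a minimal sketch of the idea: features are sorted by start (ties broken by longer feature first), and each feature is filed under the nearest preceding feature that fully contains it. The class `NCListSketch` and its simplified `[start, end, name, [children]]` JSON layout are assumptions for illustration; this is neither Tengcha's code nor the exact JBrowse array format.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class NCListSketch {
    /** A feature interval plus the features fully contained in it. */
    static final class Node {
        final int start, end;
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(int start, int end, String name) {
            this.start = start;
            this.end = end;
            this.name = name;
        }

        /** Simplified JSON: [start,end,"name"] or [start,end,"name",[children]]. */
        String toJson() {
            StringBuilder sb = new StringBuilder();
            sb.append('[').append(start).append(',').append(end)
              .append(",\"").append(name).append('"');
            if (!children.isEmpty()) {
                sb.append(",[");
                for (int i = 0; i < children.size(); i++) {
                    if (i > 0) sb.append(',');
                    sb.append(children.get(i).toJson());
                }
                sb.append(']');
            }
            return sb.append(']').toString();
        }
    }

    /** Nests the given nodes (mutating their child lists) and returns the top level. */
    static List<Node> build(List<Node> features) {
        List<Node> sorted = new ArrayList<>(features);
        // Start ascending; on ties, larger end first so containers precede contents.
        sorted.sort((a, b) -> a.start != b.start
                ? Integer.compare(a.start, b.start)
                : Integer.compare(b.end, a.end));
        List<Node> top = new ArrayList<>();
        Deque<Node> open = new ArrayDeque<>(); // chain of currently enclosing features
        for (Node n : sorted) {
            // Drop enclosing candidates that end before n does.
            while (!open.isEmpty() && n.end > open.peek().end) open.pop();
            if (open.isEmpty()) top.add(n); else open.peek().children.add(n);
            open.push(n);
        }
        return top;
    }

    public static void main(String[] args) {
        List<Node> in = new ArrayList<>();
        in.add(new Node(300, 500, "exon2"));
        in.add(new Node(100, 500, "gene"));
        in.add(new Node(100, 200, "exon1"));
        for (Node n : build(in)) System.out.println(n.toJson());
    }
}
```

The nesting is what makes range queries cheap for a browser client: once a feature falls outside the requested window, everything nested inside it can be skipped as well.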
For the Chado mavens
 Relevant tables:
feature
featureloc
feature_relationship
analysis
analysisfeature
cv
cvterm
 If you haven’t altered these, your non-standard Chado
database should work out of the box…
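To show how these tables fit together, the sketch below uses plain JDBC to fetch the features that overlap a region of a reference sequence. It is an illustration only: Tengcha itself reads Chado through Hibernate mappings, and the class `ChadoRangeQuerySketch` and its method are hypothetical. The table and column names follow the standard Chado schema, where featureloc positions are interbase (fmin inclusive, fmax exclusive).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of Tengcha. */
final class ChadoRangeQuerySketch {
    // feature -> featureloc gives the position; srcfeature_id points at the
    // reference sequence (itself a feature row); cvterm gives the type name.
    static final String OVERLAP_SQL =
          "SELECT f.uniquename, t.name AS type_name, fl.fmin, fl.fmax, fl.strand "
        + "FROM feature f "
        + "JOIN featureloc fl ON fl.feature_id = f.feature_id "
        + "JOIN feature src ON src.feature_id = fl.srcfeature_id "
        + "JOIN cvterm t ON t.cvterm_id = f.type_id "
        + "WHERE src.uniquename = ? AND fl.fmin < ? AND fl.fmax > ? "
        + "ORDER BY fl.fmin";

    /** Returns one tab-separated line per feature overlapping [start, end). */
    static List<String> featuresInRange(Connection conn, String refSeq,
                                        int start, int end) throws SQLException {
        List<String> rows = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(OVERLAP_SQL)) {
            ps.setString(1, refSeq);
            ps.setInt(2, end);   // feature must begin before the window ends
            ps.setInt(3, start); // and end after the window begins
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rows.add(rs.getString("uniquename") + "\t"
                            + rs.getString("type_name") + "\t"
                            + rs.getInt("fmin") + "\t"
                            + rs.getInt("fmax") + "\t"
                            + rs.getInt("strand")); // getInt yields 0 for NULL strand
                }
            }
        }
        return rows;
    }
}
```

Running it requires a JDBC connection to a Chado database. Parent/child structure (for example part_of) would come from an additional join on feature_relationship, omitted here for brevity.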
 Live demo
 Source code lives here on Google code:
https://genomancer.googlecode.com/svn/trunk
 We’d be glad to help you hook it up to your Chado
ขอบคุณ (Thank you)
 LBNL
 Ed Lee
 Gregg Helt
 Nomi Harris
 Suzanna Lewis
 UC Berkeley
 Mitch Skinner
 Rob Buels
 Ian Holmes
 Georgetown University
 Chris Childers
 Justin Reese
 Mónica Muñoz-Torres
 Jay Sundaram
 Christine Elsik