IWPLS’09 | m.a.swertz@rug.nl | 1

MOLGENIS: rapid prototyping of data portals for life science projects
Morris A. Swertz, K. Joeri van der Velde, Joris Lops, Martijn Dijkstra, Peter Horvatovich, Marco Roos, Helen Parkinson, Ritsert C. Jansen
IWPLS 2009, September 14, Edinburgh
EBI

MOLGENIS = family of bespoke data portals
Example MOLGENIS projects:
› Locus Specific database
› Biobanking/phenotypes
› NextGen sequencing
› Proteo/Metabolomics
› Animal Observations
Why? How? What?

Why: Biological challenges
› Many materials
› High-throughput (HTP) processes
› Complex, distributed workflows
› Large data
› Trace dependencies
› Adapt to new protocols
[Figure: experimental workflows of collaborating biologists. Nodes include panel, individuals, genome markers, genotypes, phenotypes, microarrays, probes, expressions, LC/MS spectra, true peaks, QTL profiles and network; steps include inbreed, genotype, map, hybridize, preprocess, normalize and correlate; item counts range from tens to millions.]

Why: Informatics challenges
[Figure: microarray expression QTL workflow (panel, individuals, genome markers, genotypes, microarrays, probes, expressions, normalized expressions, QTL profiles, network).]
› Complex engineering
› Time intensive
› Hitting moving targets
› Reinventing wheels
› Hard to integrate
[Figure: for each workflow (microarray expression; LC/MS spectra and phenotypes), the UI, data, tools and workflows have to be designed and assembled.]

How: generative methods
Step 1. Model variation points (biology)
Model of a variant:
    <!-- entity organization -->
    <entity name="Experiment" label="Experiment">
      <field name="ExperimentID" key="1" readonly="true" label="ExperimentID(autonum)"/>
      <field name="Medium" type="xref" xref_field="Medium.name"/>
      <field name="Protocol" label="Experiment Protocol"/>
      <field name="Temperature" type="int"
    [excerpt truncated on slide]
+ Reusable framework and generators
Step 2. Automate common patterns (informatics)
[Figure: microarray expression QTL workflow (strains, individuals, genome markers, genotypes, microarrays, probes, expressions, normalized expressions, QTL profiles, network).]
Step 3.
Reuse in family of projects

How: generative methods
Model new protocols
Your model (model of a variant):
    <!-- entity organization -->
    <entity name="Experiment" label="Experiment">
      <field name="ExperimentID" key="1" readonly="true" label="ExperimentID(autonum)"/>
      <field name="Medium" type="xref" xref_field="Medium.name"/>
      <field name="Protocol" label="Experiment Protocol"/>
      <field name="Temperature" type="int"
    [excerpt truncated on slide]
+ Reusable framework and generators
Use new protocols
[Figure: LC/MS and phenotype workflow (panel, individuals, genome markers, genotypes, phenotypes, LC/MS spectra, true peaks, QTL profiles, network).]

How: generative methods
Add features once
Your model: model of a variant (the same Experiment model as on the previous slide)
+ Reusable framework and generators
Added automatically
[Figure: microarray expression QTL workflow (strains, individuals, genome markers, genotypes, microarrays, probes, expressions, normalized expressions, QTL profiles, network).]

Implementation
[Figure: architecture diagram. A model file (XML), customizable with MyScript and plugins, is the input to "Generate". Layers shown: user interaction infrastructure, communication infrastructure, data infrastructure. APIs in Java, R, Web services and HTTP.]
Generators shown: FormGen, TreeGen, MenuGen, PluginGen, MatrixGen, JDBCMapGen, JTypeGen, JReadCsvGen, JListGen, RListGen, JDatabaseGen, RMatrixGen, HSQLGen, WSGen, MySQLGen

Family of bespoke data portals
› Locus Specific database
› Biobanking/phenotypes
› NextGen sequencing
› Proteo/Metabolomics
› Animal Observations

Date 24.06.2009 | 10
What? A practical example:
XGAP - eXtensible portal for Genotype And Phenotype data

Download and install
molgenis.org or molgenis.sourceforge.net: Apache Tomcat

Modeling*
[Figure: UML class diagram. Experiment, Trait, Assay and Subject each have ID : autoid and Name : varchar; Data has ID : autoid and Value : object. Three associations are labelled Experiment (multiplicity 1); Data carries associations labelled Row, Assay and Column (multiplicity 1 each). Example labels on the diagram: probes, expressions, individuals.]
*Can also extract automatically from an existing database

Iterative modeling/re-generation
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE molgenis PUBLIC "MOLGENIS 1.0" "http://molgenis.sourceforge.net/dtd/molgenis_v_1_0.dtd" >
    <molgenis name="xgap" label="XGAP - eXtensible Genotype and Phenotype
    database">
      <!-- INVESTIGATION -->
      <module name="xgap.core">
        <description>Core entities.</description>
        <entity name="Investigation" extends="FugeInvestigation">
          <unique fields="name" description="Name is unique" />
        </entity>
        <entity name="ProtocolApplication" extends="FugeProtocolApplication">
          <field name="Status" type="enum" enum_options="[inprocess, final]" default="inprocess"
                 description="The status of this protocolapplication (inprocess = still working on it,
          [line truncated on slide]
          <field name="Investigation" type="xref" xref_entity="Investigation" xref_field="id" xref_label="name"
                 description="Reference to the Investigation this protocolapplication belongs to."/>
          <unique fields="name,Investigation" description="Name is unique within an Investigation" />
        </entity>
        <!-- DATA -->
        <entity name="Data" extends="FugeData">
          <description>
            Generic structure for describing data matrices such as genotype result,
            gene expression measurement, QTL calculation, etc.
          </description>
          <field name="Investigation" type="xref" xref_entity="Investigation" xref_field="id" xref_label="name"
                 description="Reference to the Investigation this data is measured as part of."/>
          <!--field name="DataType" type="xref" xref_entity="DataType" xref_field="id" xref_label="name"
                 description="Added to distinguish between qtl and raw data etc."
          /-->
          <field name="RowType" type="enum" enum_options="[Marker,Probe,ProbeSet,Individual,Sample,PairedSample,MassPeak,Gene,Trai
          [excerpt truncated on slide]

Result: data portal user interfaces
Swertz et al, CASIMIR consortium, GEN2PHEN consortium (submitted)

Result: CSV exchange format
[Figure: researchers exchange annotations and data (raw and processed) with the database as CSV files. Labels on the diagram: "my GaP", "researcher", "database".]

Data (raw and processed): a matrix with probes as rows (1415670_at, 1415671_at, 1415672_at, 1415673_at, 1415674_a_at, 1415675_at, 1415676_a_at, 1415677_at, 1415678_at, 1415679_at, 1415680_at, 1415681_at, 1415682_at, 1415683_at), strains as columns (BxD1, BxD2, BxD3, BxD4, BxD5, BxD6, BxD7) and decimal values such as 0,293493 in the cells. The assignment of individual values to cells is not recoverable from the slide.

Annotations:
    name          gene      chr  bplocal    bpglobal
    1415670_at    Copg      6    87875328   924571670
    1415671_at    Atp6v0d1  8    108413750  1234908613
    1415672_at    Golga7    8    24706942   1151201805
    1415673_at    Psph      5    130080298  820157654
    1415674_a_at  Trappc4   9    44155401   1298908898
    1415675_at    Dpm2      2    32395013   227443878
    1415676_a_at  Psmb5     14   53568499   1912894020
    1415677_at    Dhrs1     14   54693657   1914019178
    1415678_at    Ppm1a     12   73712802   1701465838
    1415679_at    Psenen    7    30270655   1017168695
    1415680_at    Anapc1    2    128304204  323353069

Swertz et al, CASIMIR consortium, GEN2PHEN
consortium (submitted)

Result: Java API/plugin templates
› Data import wizard
› Cluster Job Creation and Monitoring

Result: R interfaces
    source("http://localhost:8080/xgap/api/R")
    MOLGENIS is connected
    #download data
    traits <- get.metabolitedata(name="mytraits")
    25 metabolite downloaded in 30ms
    genotypes <- get.markerdata(name="mygenotypes")
    744 marker downloaded in 30ms
    #calculate ...
    #upload results for others to use
    add.data(qtls, name="myqtls")
    18.600 data items added in 2sec
Swertz et al, CASIMIR consortium, GEN2PHEN consortium (submitted)

Result: SOAP interface
Smedley, Swertz, Wolstencroft et al (2008) Brief. in Bioinf.

Current work
› Semantics/ontology integration
    Ontology browsers OLS/BioPortal
    RDF interfaces, Semantic query engine
› Add tool and workflow models
    Marco’s talk, Peter’s talk
› New generator targets
    Other portals, like presented@IWPLS
    Emerging frameworks
› myExperiment plugin to share models

Acknowledgements
Joeri van der Velde, Joris Lops, George Byelas, Tomasz Adamusiak, Danny Arends, Martijn Dijkstra, Matthijs Kattenberg, Ishtiaq Ahmad, Ate Boerema, Henrikki Almusa, Rudi Alberts, Bruno M. Tesson, Richard A. Scheltema, Gonzalo Vera Rodriguez
the EU-GEN2PHEN consortium, the EU-CASIMIR consortium, the NBIC/BioAssist/NPC consortium, EBI
Helen E. Parkinson, Andrew R. Jones, Peter Horvatovich, Juha Muilu, Marco Roos, M. Scott Marshall, Carole Goble, Paul Schofield, John M. Hancock, Anthony Brookes, Klaus Schughart, Hans Hillege, Engbert O. de Brock, Ritsert C. Jansen, Cisca Wijmenga, Morris Swertz

Thank you! Questions?
Conclusion
MOLGENIS: OSS toolbox to auto-generate data portals
http://www.xgap.org
http://www.molgenis.org

We have funding and are hiring:
- PhD student
- Postdoc
- 3 software engineers
m.a.swertz@rug.nl

› MOLGENIS for data integration: Smedley et al 2008, Brief. in Bioinformatics 9(6):532
› Review of MOLGENIS type of systems: Swertz & Jansen 2007, Nature Reviews Genetics 8(3):235
› First MOLGENIS, at that time in PHP: Swertz et al 2004, Bioinformatics 20(4):2075

Generator

Plugin/Java part

Plugin / Layout
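
The generative idea from the "How: generative methods" and "Implementation" slides (an XML model file feeding generators such as MySQLGen) can be illustrated with a minimal sketch. This is not the MOLGENIS generator code: the sample model is only loosely based on the Experiment entity shown on the slides, and the names generate_create_table, generate_schema and SQL_TYPES, the type="autoid" attribute on ExperimentID, and the type-to-SQL mapping are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative model, loosely based on the Experiment entity from the slides.
# The type="autoid" attribute and the closed Temperature field are assumptions.
MODEL_XML = """
<molgenis name="demo">
  <entity name="Experiment" label="Experiment">
    <field name="ExperimentID" type="autoid" key="1" readonly="true"/>
    <field name="Protocol" label="Experiment Protocol"/>
    <field name="Temperature" type="int"/>
  </entity>
</molgenis>
"""

# Hypothetical mapping from model field types to SQL column types.
SQL_TYPES = {
    "autoid": "INTEGER",
    "int": "INTEGER",
    "string": "VARCHAR(255)",
}


def generate_create_table(entity):
    """Return a CREATE TABLE statement for one <entity> element."""
    columns = []
    keys = []
    for field in entity.findall("field"):
        name = field.get("name")
        # Fields without an explicit type are treated as strings (assumption).
        sql_type = SQL_TYPES[field.get("type", "string")]
        columns.append(f"  {name} {sql_type}")
        if field.get("key") == "1":
            keys.append(name)
    if keys:
        columns.append(f"  PRIMARY KEY ({', '.join(keys)})")
    body = ",\n".join(columns)
    return f"CREATE TABLE {entity.get('name')} (\n{body}\n);"


def generate_schema(model_xml):
    """Parse the model and return one CREATE TABLE statement per entity."""
    root = ET.fromstring(model_xml)
    return [generate_create_table(entity) for entity in root.iter("entity")]


if __name__ == "__main__":
    for statement in generate_schema(MODEL_XML):
        print(statement)
```

Adding a field to the model and re-running the script regenerates the table definition, which is the "iterative modeling/re-generation" loop in miniature. A fuller generator family would emit forms, API classes and R functions from the same model in the same way.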