Motivation: modernizing (life) science
Homo biologicus in his natural habitat
A mailing list for the Homo biologicus
Sharing a calendar for the Homo biologicus
/*
 * determines ridges in htm expression table
 */
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>
#include "ridge.h"

/* Selects all rows of one chromosome, sorted on genstart.
 * The result is handed back through htmtable, so the prototype in
 * ridge.h must use PGresult ** as well. chromname is inserted verbatim
 * and must therefore already be a valid SQL literal. */
int selecthtm(PGconn *conn, char *htmtablename, char *chromname, PGresult **htmtable)
{
    char querystring[256];

    snprintf(querystring, sizeof querystring,
             "SELECT * FROM %s WHERE chrom = %s ORDER BY genstart",
             htmtablename, chromname);
    *htmtable = PQexec(conn, querystring);
    return(validquery(*htmtable, querystring));
}

int is_ridge(PGresult *htmtable, int row, double exprthreshold, int mincount)
/* determines if mincount genes in a row are (part of) a ridge */
/* pre: htmtable is valid and sorted on genStart (ascending) */
/* post: */
{
    if (mincount<=0) return TRUE;
    if (row>=PQntuples(htmtable)) return FALSE;
    /* PQgetvalue returns text, so convert before comparing;
     * read the current row, not row 0 */
    if (atof(PQgetvalue(htmtable, row, PQfnumber(htmtable, "movmed39expr"))) < exprthreshold)
    {
        return FALSE;
    }
    return(is_ridge(htmtable, row + 1, exprthreshold, mincount - 1));
}

int main()
{
    PGconn *conn;          /* holds database connection */
    char querystring[256]; /* query string */
    PGresult *result;

    conn = PQconnectdb("dbname=htm port=6400 user=mroos password=geheim");
    if (PQstatus(conn)==CONNECTION_BAD)
    {
        fprintf(stderr, "connection to database failed.\n");
        fprintf(stderr, "%s", PQerrorMessage(conn));
        PQfinish(conn);
        exit(1);
    }
    else printf("Connection ok\n");

    snprintf(querystring, sizeof querystring, "SELECT * FROM chromosomes");
    printf("%s\n", querystring);
    result = PQexec(conn, querystring);
    if (validquery(result, querystring))
    {
        printresults(result);
    }
    else
    {
        PQclear(result);
        PQfinish(conn);
        return EXIT_FAILURE;
    }
    PQclear(result);
    PQfinish(conn);
    return EXIT_SUCCESS;
}

/* prints the first column of at most the first 10 rows */
int printresults(PGresult *tuples)
{
    int i;

    for (i=0; i< PQntuples(tuples) && i < 10; i++)
    {
        printf("%d, ", i);
        printf("%s\n", PQgetvalue(tuples,i,0));
    }
    return TRUE;
}

/* returns TRUE if the query produced a tuple set, FALSE otherwise */
int validquery(PGresult *result, char *querystring)
{
    if (PQresultStatus(result) != PGRES_TUPLES_OK)
    {
        fprintf(stderr, "Query %s failed.\n", querystring);
        return FALSE;
    }
    return TRUE;
}
14/09/2009
BioAID
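The recursive ridge test from the listing above can be tried without a database. A minimal sketch, assuming the movmed39expr column has already been read into a plain array sorted on genstart; is_ridge_at and first_ridge are hypothetical names, not part of the original ridge.h:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* expr[i] stands in for the moving-median expression of gene i,
 * with genes sorted on genstart (assumption: values already loaded). */

/* Returns true if mincount consecutive genes, starting at row,
 * all have an expression of at least threshold. */
bool is_ridge_at(const double *expr, size_t n, size_t row,
                 double threshold, size_t mincount)
{
    assert(expr != NULL);
    if (mincount == 0) return true;          /* enough genes found */
    if (row >= n) return false;              /* ran off the table */
    if (expr[row] < threshold) return false; /* run is broken */
    return is_ridge_at(expr, n, row + 1, threshold, mincount - 1);
}

/* Returns the index of the first gene that starts a ridge,
 * or n if no ridge exists. */
size_t first_ridge(const double *expr, size_t n,
                   double threshold, size_t mincount)
{
    for (size_t row = 0; row < n; row++)
        if (is_ridge_at(expr, n, row, threshold, mincount))
            return row;
    return n;
}
```

For the expression values {1, 5, 6, 7, 2, 8}, a threshold of 4 and a mincount of 3, first_ridge returns index 1: the run 5, 6, 7 is the only stretch of three consecutive genes at or above the threshold.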
1070 databases (Nucleic Acids Research, Jan 2008; 96 in Jan 2001)
Proteomics, Genomics, Transcriptomics, Protein sequence prediction, Phenotypic studies, Phylogeny, Sequence analysis, Protein Structure prediction, Protein-protein interaction, Metabolomics, Model organism collections, Systems Biology, Epidemiology, etcetera …
All with a splendid interface … all different, of course
Homo biologicus’ bioinformatics
Local
Database
Local
Database
Homo digitalis in his natural habitat
Workflows
Semantic Web (Linked Open Data)
Homo biologicus: a needy biologist
• Lots of data to deal with
• Lots of knowledge to deal with
• Lots of methods and algorithms to try and combine
• Single tiny brain
• No computational superpowers
Homo biologicus enhancis: an enhanced biologist
• Lots of accessible data
• Knowledge bases to query
• Web Services, Workflows, and their creators available
• Community brain power
• Other people’s computational superpowers
e-Laboratories and e-Laboratory factories
Context: BioAssist Bioinformatics Support
An existing & acknowledged ‘e-Laboratory’
In the e-Laboratory Factory
• Galaxy as front end
• Workflows & Web Services
• Grid enabled Taverna
• MOLGENIS
• Semantic/Concept Web
• myExperiment/BioCatalogue
• Scientific Research Objects
Scientific Research Object 1.0
Anatomy of a Research Object
Research questions on SROs
• How to record & represent scientific collections?
– OAI-ORE serialised in RDF
– Life cycle modeled on scientific publication: Draft -> Review -> Publication -> Deprecation
• How do we describe the resources within our Research Object?
– Dublin Core, SIOC, Research Object Upper Model (ROUM)
• How to capture/represent Trustworthiness?
• How much are scientists willing to share?
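The publication-style life cycle listed above can be read as a small state machine. A minimal sketch, assuming the four stages are strictly sequential (the slides do not say whether a Research Object can move back, for example from Review to Draft); all sro_ names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Life-cycle stages as named on the slide, in order. */
typedef enum {
    SRO_DRAFT,
    SRO_REVIEW,
    SRO_PUBLICATION,
    SRO_DEPRECATION
} sro_state;

/* Human-readable name of a stage. */
const char *sro_state_name(sro_state s)
{
    switch (s) {
    case SRO_DRAFT:       return "Draft";
    case SRO_REVIEW:      return "Review";
    case SRO_PUBLICATION: return "Publication";
    case SRO_DEPRECATION: return "Deprecation";
    }
    return "Unknown";
}

/* Moves to the next stage. Returns false, leaving the state unchanged,
 * if the object is already deprecated (assumed to be the final stage). */
bool sro_advance(sro_state *s)
{
    assert(s != NULL);
    if (*s == SRO_DEPRECATION) return false;
    *s = (sro_state)(*s + 1);
    return true;
}
```

Starting from SRO_DRAFT, three calls to sro_advance reach SRO_DEPRECATION and a fourth call is refused.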
A pilot
• Create an executable Scientific Research Object that holds
– Galaxy tool models + resources
– Taverna workflow(s)
– MOLGENIS data models and UI models
– Metadata on the experiment
  • Tab delimited, linked to concepts
  • Samples, subjects, observations, etc.
– Other datasets (or references to them)
• Execution through Galaxy
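Tab-delimited experiment metadata, as proposed for the pilot, has to be split into fields before values can be linked to concepts. A sketch only: split_tabs is a hypothetical helper, and the slides do not define a column layout, so any example row is invented for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Splits one tab-delimited line in place. Tabs and the trailing
 * newline are overwritten with '\0', and fields[] receives pointers
 * to the start of each field. Empty fields are kept.
 * Returns the number of fields stored (at most max_fields). */
size_t split_tabs(char *line, char **fields, size_t max_fields)
{
    assert(line != NULL && fields != NULL);
    size_t count = 0;
    char *start = line;

    for (char *p = line; ; p++) {
        if (*p == '\t' || *p == '\n' || *p == '\0') {
            int at_end = (*p != '\t');   /* newline or end of string */
            *p = '\0';
            if (count < max_fields)
                fields[count++] = start;
            if (at_end)
                break;
            start = p + 1;
        }
    }
    return count;
}
```

A row such as "sample1<TAB>Mus musculus<TAB><TAB>liver" (invented values) yields four fields, the third of which is empty. Unlike strtok, this keeps empty columns in place, so field positions still line up with the header row.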
SRO = a pack of models
- Tool models
- Data/ui models
- Flow models
+ Attached data
SRO enactment = a running e-laboratory
[Diagram labels: Model; SROs; Tools; Flows; Data; my protocols; my data; mashup data; mashup tools; 2.0; e-bioinformatician; e-biologist; programmatic interaction; user interfacing]
e-Galaxy mock-up
[Screenshot. Panels: Your Scientific Research Object; Running workflow; MOLGENIS; Related research and documents (placeholder text). Menu: Convert, Import/Export Research Objects, Store, Configure, Run]
e-Galaxy mock-up
[Screenshot. Panels: Running workflow; MOLGENIS; Related research and documents (placeholder text). Menu: Convert, Import/Export Research Objects, Store, Configure, Run]
e-Galaxy mock-up
[Screenshot. Panels: Suggestions by semantic components; Your Scientific Research Object; Underlying workflow; MOLGENIS; Related research and documents (placeholder text). Menu: Convert, Import/Export Research Objects, Store, Configure, Run]
e-Science requirement: Reuse
http://www.epigenius.org/ (mock-up)
E-Lab Vacancies in the Netherlands (blatant advertisement)
http://snipurl.com/elabjobs [OMII-UK(myGrid)/NBIC collaboration]
• Software engineer e-Laboratories
– Taverna components for Galaxy/Liferay
– Semantic/Concept Web meta-analysis
• e-biologist (PhD student)
– Workflow & semantic web applied to epigenetics => epiGenius portal
• Grid engineers & post-doc medical applications
– http://www.vl-e.nl/vlemed/vacancies.html
BioAssist Requirements
• Help bioinformaticians help biologists
• Serve bioinformatics community
– Analysis pipelines parse large datasets
– Local/external, small/large databases
– Data for humans and machines
– Knowledge for humans and machines
More specifically: requirements and tools
• Analysis pipelines that parse large datasets
– Taverna with plugin for Grid access*
– Taverna platform for e-Labs
• Biological databases (small and large)
– MOLGENIS
– REST/SOAP/CSV
– Galaxy/Taverna
• Data to be exchanged by humans and machines
– Scientific Research Objects/myExperiment PACKs
• Biological meaning disclosed and linked
– Concept Web and Semantic Web
– Semantically enabled Taverna
* Plugin developed by Richard Holland (Eagle Genomics) for SARA and NBIC
MOLGENIS research portal generator
Input: model of my research
Output: auto-generates software files
• Rich user interfaces for biologists
• Plug in your handwritten scripts (tools, workflows)
• Programming interfaces for bioinformaticians
• Connect to R statistics
m <- find.markers()
544 markers downloaded.
…
library(qtl)
#qtl analysis here
add.data(qtl, name = "QTLs")
2,448,000 data elements added.
• Workflow-ready webservices
• Rich documentation and UML diagrams
• CSV exchange format: strain.txt, species.txt, protocol.txt, probe.txt, marker.txt, investigation.txt, individual.txt, gene.txt, data.txt, constant.properties
• Strongly typed framework
• Data storage optimized for HTP genomics (db, files)
http://www.molgenis.org
Swertz & Jansen (2007) Nature Reviews Genetics 8, 235-243
A putative scenario
With Galaxy
• Select genome annotation track from UCSC genome browser, load into Galaxy
• Combine with other data resources and local data
• Perform a region selection algorithm
• Collect regions of interest
• Save successful steps
With ‘e-Galaxy’, in addition
• Data disclosed with MOLGENIS
• References to datasets stored in SRO
• Run region selection workflow (show process)
• Run meta-analysis
– Parse metadata for concepts
– Run meta-analysis
– Present additional information