Portals and myGrid
Stefan Rennick Egglestone
Mixed Reality Laboratory
University of Nottingham

Introduction to myGrid
• A computer science pilot project working in the field of bioinformatics
• A consortium of the European Bioinformatics Institute, IT Innovations, 5 universities and some industrial partners
• Ends June 2005; other projects will develop the infrastructure further

Presentation aims
• Introduce myGrid
• Introduce bioinformatics
• Introduce portal work in myGrid
• Show some screenshots of portlets

Introduction to bioinformatics
• How to store, process and publish large volumes of biological data
• Large databases, access and analysis services
• Composite processes involve multiple databases and services
• Automation through workflows

Data in bioinformatics
• Commonly genetic sequences
  – DNA: GCGCATAGCGATGA
  – Protein: MAHPLGPHGVANA
• Meta information
  – Species, chromosome
  – Interesting features
  – Equipment used
  – First published paper referring to sequence

Data storage
• 3 international databases aim to store all DNA sequences (EMBL, GenBank, DDBJ)
• Protein sequences in SwissProt
• Journals require submission before publication
• Smaller databases hold specialist information

Using bioinformatics data
• Database access services
  – Fetch sequence for given ID
  – Fetch similar sequences
• Sequence analysis
  – Look for interesting regions of sequence
• Sequence prediction
  – Predict proteins generated by DNA sequence

Service interface types
• Web page
• Command-line tool set
• Programming language library client
• SOAP web service with WSDL interface

Using services
• Often need to combine services with different interface types
• Cut-and-paste from web page to file, then run command-line tool
• Repetitive and time-consuming
• Can be automated using scripts

Workflows

myGrid workflow technology
• Freefluo workflow enactor
• Taverna: graphical workbench allowing users to
  – Author workflows
  – Enact and browse results
• myGrid Information Repository (MIR)

Authoring a workflow

Enacting a workflow

Browsing results

Including services in workflows
• Service invocation is done by a processor
• Generic processor for SOAP/WSDL web services
• Custom processor can wrap a custom client
• SOAPlab exposes command-line tools as web services

Portal in myGrid
• Taverna/Freefluo is the production workflow system, so its interface can't be changed experimentally
• Some interface limitations
  – Difficult to start a new workflow running using the results of an enactment
  – Complex interface, so takes time to master

Text services work
• If enactment of a workflow produces a SwissProt protein sequence record, the PubMed ID of the first paper referring to this protein can be extracted from it
• Add extra workflow stages which look up related papers
• Might like to re-run these stages as a separate workflow on any new papers found

Input form

Monitoring progress

Results

MIR portal work
• Taverna/Freefluo/MIR interface caters for the expert user
• Large numbers of users won't write workflows but might enact them
• Provide a simpler workflow enactment interface
• Portal is useful: all biologists have a browser on their desk

Collections of workflows

View workflow

View workflow results

View individual output param

Further details
• www.mygrid.org.uk
• Twiki.mygrid.org.uk
• Stefan Rennick Egglestone (sre@cs.nott.ac.uk)
• Ian Roberts (i.roberts@dcs.sheff.ac.uk)
• Presentation and notes will be at www.mrl.nott.ac.uk/~sre
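
Sketch: extracting a PubMed ID from a SwissProt record

The "Text services work" slide describes pulling the PubMed ID of the first referring paper out of a SwissProt record. The Python sketch below is one possible way to do that and is not the myGrid implementation. It assumes the record is in the SwissProt flat-file format, where literature cross-references sit on RX lines as "PubMed=<digits>;", and that the first reference listed is the paper of interest. The function names and the sample record are hypothetical; the identifiers in the sample are placeholders, not real entries.

```python
import re
from typing import List, Optional

# Matches a PubMed cross-reference on an RX line, e.g. "PubMed=1111111;".
_PUBMED = re.compile(r"PubMed=(\d+);")


def pubmed_ids(record: str) -> List[str]:
    """Return PubMed IDs in the order they appear in a flat-file record."""
    ids: List[str] = []
    for line in record.splitlines():
        # Only RX lines carry literature cross-references.
        if line.startswith("RX"):
            ids.extend(_PUBMED.findall(line))
    return ids


def first_pubmed_id(record: str) -> Optional[str]:
    """Return the first PubMed ID in the record, or None if there is none."""
    ids = pubmed_ids(record)
    return ids[0] if ids else None


# Hypothetical, heavily abbreviated record; all identifiers are placeholders.
SAMPLE_RECORD = """\
ID   EXAMPLE_HUMAN   STANDARD;   PRT;   13 AA.
RN   [1]
RX   MEDLINE=00000001; PubMed=1111111;
RN   [2]
RX   PubMed=2222222;
SQ   SEQUENCE   13 AA;
     MAHPLGPHGV ANA
//
"""

if __name__ == "__main__":
    print(first_pubmed_id(SAMPLE_RECORD))
```

In a workflow, a step like this would sit between the stage that fetches the SwissProt record and the stages that look up related papers.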