Galaxy (of bioinformatics): an advertisement
Martin Senger <martin.senger@kaust.edu.sa>
[also using a few slides from presentations at the
Galaxy Community Conference 2011]
1. What Galaxy can do... (or could do)
2. Show me where I can try it for myself
3. What we can do to make our Galaxy better
...and what this is not
• a detailed tutorial on how to use Galaxy
• a way to convince you that I understand everything about Galaxy

A web-based interface to command-line tools (of any kind) and their combinations (“workflows”)
• Galaxy performs analyses interactively through the web, on arbitrarily large datasets
• Galaxy remembers what it did: the history

Flexibility to include anybody’s command-line tools
• by writing wrappers (templates are available)

An environment for sharing tools (or their wrappers)
• the “Tool Shed” repository
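A Galaxy tool wrapper is an XML file that tells Galaxy which form fields to show and how to turn the user's choices into a command line. As a minimal sketch, the Python below embeds a wrapper for a hypothetical `count_lines` tool (the tool id, labels and the `wc -l` command are illustrative assumptions; real wrappers usually also carry `<help>` and `<tests>` sections) and reads it with the standard `xml.etree.ElementTree` module to show the parts Galaxy relies on.

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical Galaxy tool wrapper. The element names (tool, command,
# inputs/param, outputs/data) follow the Galaxy tool XML format; the tool itself
# ("count_lines") is made up for illustration.
WRAPPER_XML = """\
<tool id="count_lines" name="Count lines" version="0.1">
  <description>in a text dataset</description>
  <command>wc -l &lt; $input &gt; $output</command>
  <inputs>
    <param name="input" type="data" format="txt" label="Dataset to count"/>
  </inputs>
  <outputs>
    <data name="output" format="txt"/>
  </outputs>
</tool>
"""


def summarize_wrapper(xml_text):
    """Return the parts of a tool wrapper that Galaxy uses to build its web form."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "name": root.get("name"),
        # Command-line template; Galaxy substitutes $input and $output.
        "command": root.findtext("command").strip(),
        # Each <param> becomes a form field; type="data" means "pick a dataset".
        "inputs": [(p.get("name"), p.get("type"), p.get("format"))
                   for p in root.iter("param")],
        # Each <data> element becomes a new dataset in the user's history.
        "outputs": [(d.get("name"), d.get("format"))
                    for d in root.find("outputs").findall("data")],
    }


if __name__ == "__main__":
    for key, value in summarize_wrapper(WRAPPER_XML).items():
        print(f"{key}: {value}")
```

Running the script prints the tool id, the command template and the declared inputs and outputs. In Galaxy itself the `$input` and `$output` placeholders are filled in (via Cheetah templating) with the paths of the selected and newly created datasets.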

Locally stored data
• user-specific
• shared between users, e.g. genome builds

Origin of data
• uploaded from your computer
  • using a web interface
  • using an FTP server
• fetched from external databases (“datasources”)
  • only those that are “aware” of Galaxy
  • internally, there are two ways to fetch the data (asynchronous vs. synchronous)
  • you need to be familiar with these databases and their user interfaces
• Data have metadata
  • so data can be used only with the tools that recognize their data type
• Data have attributes; you can
  • annotate data
  • convert data to a new format
  • change the data type
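The datatype rule above can be illustrated with a small model: each dataset carries its datatype as metadata, and only tools that declare that datatype as an accepted input are offered for it. This is a simplified sketch, not Galaxy's implementation: the tool names are hypothetical, the format names merely resemble Galaxy datatypes such as `fasta`, `fastqsanger` and `bam`, and it ignores that real Galaxy datatypes form a class hierarchy.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Tool:
    name: str
    accepts: frozenset  # datatypes this (hypothetical) tool can read


@dataclass(frozen=True)
class Dataset:
    name: str
    datatype: str         # metadata attached to every dataset
    annotation: str = ""  # free-text attribute the user can edit


# Hypothetical tool catalogue; only the datatype names resemble Galaxy's.
TOOLS = [
    Tool("quality_report", frozenset({"fastqsanger"})),
    Tool("read_mapper", frozenset({"fastqsanger", "fasta"})),
    Tool("coverage_plot", frozenset({"bam"})),
]


def usable_tools(dataset, tools=TOOLS):
    """Names of the tools that recognize the dataset's datatype."""
    return [tool.name for tool in tools if dataset.datatype in tool.accepts]


def change_datatype(dataset, new_datatype):
    """Relabel the datatype only; the content itself is not converted."""
    return replace(dataset, datatype=new_datatype)


if __name__ == "__main__":
    reads = Dataset("reads.fq", "fastqsanger", annotation="run 1")
    print(usable_tools(reads))    # ['quality_report', 'read_mapper']
    contigs = change_datatype(Dataset("contigs.txt", "txt"), "fasta")
    print(usable_tools(contigs))  # ['read_mapper']
```

In this sketch changing the datatype is a pure relabel; converting to a new format would instead produce a new dataset with different content.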

Automated set of steps, perhaps run each time with different input data (of the same type)
• reproducibility (usable in publications)
• reusability (sharing workflows with others)
• created from scratch (using a workflow editor) or from your history
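A workflow is essentially an ordered list of steps in which each step consumes the previous step's output, so the same list can be rerun on new input of the same type. The sketch below models only that idea in plain Python; it is not Galaxy's workflow engine or file format, and the three toy DNA-string steps are hypothetical. Recording each intermediate result mimics the way Galaxy keeps every step's output in the history, which is what makes a run reproducible.

```python
from typing import Callable, List, Tuple

# A step is a (name, function) pair; each function takes the previous result.
Step = Tuple[str, Callable[[str], str]]

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hypothetical toy "tools" working on a DNA string, chained into one workflow.
WORKFLOW: List[Step] = [
    ("uppercase", str.upper),
    ("reverse_complement", lambda seq: seq[::-1].translate(COMPLEMENT)),
    ("gc_content", lambda seq: f"{(seq.count('G') + seq.count('C')) / len(seq):.2f}"),
]


def run_workflow(steps: List[Step], data: str, history: List[Tuple[str, str]]) -> str:
    """Apply the steps in order, recording every intermediate result."""
    for name, func in steps:
        data = func(data)
        history.append((name, data))  # like a dataset added to a Galaxy history
    return data


if __name__ == "__main__":
    # The same workflow reused on different inputs of the same type.
    for sequence in ["acgtt", "ggcc"]:
        history: List[Tuple[str, str]] = []
        result = run_workflow(WORKFLOW, sequence, history)
        print(sequence, "->", result, history)
```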

An example: a workflow editor
Thanks to:
The user would not have done this from the command line on our cluster
• http://main.g2.bx.psu.edu/screencast
• Screencast, if we have time (6 min): Creating a workflow from your history

Where are all these galaxies?

Public servers
• available immediately, free of charge
• http://main.g2.bx.psu.edu/
• and a few others, such as http://galaxy.nbic.nl/
• usually with limited resources
• you cannot customize them to your special needs

KAUST/CBRC Galaxy
• http://galaxy.cbrc.kaust.edu.sa/
• running on an internal cluster with limited resources
• but we can do whatever we need with it

Galaxy in the Amazon cloud (CloudMan)
• when you do not have infrastructure in house
• when you have particular resource needs (cores, memory...)
• when you need customization
• if you have a credit card
• details in this presentation:
  http://wiki.g2.bx.psu.edu/GCC2011?action=AttachFile&do=get&target=CloudManGalaxyOnTheCloud.pdf

Galaxy also has a RESTful API for programmatic access (beta)
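For programmatic access, a client sends HTTP requests to URLs under `/api/` on the Galaxy server and gets JSON back. The sketch below assumes the API is enabled on the server and that the user has generated an API key in the Galaxy web interface, passed here as the `key` query parameter; the server URL and key are placeholders, and because the API is marked beta, the available endpoints may differ between Galaxy versions. It uses only the Python standard library.

```python
import json
import urllib.parse
import urllib.request

# Placeholders (assumptions): replace with a real server URL and your own API key.
GALAXY_URL = "http://galaxy.example.org"
API_KEY = "your-api-key"


def api_url(base, path, key, **params):
    """Build a Galaxy API URL such as <base>/api/histories?key=..."""
    query = urllib.parse.urlencode({"key": key, **params})
    return f"{base.rstrip('/')}/api/{path.strip('/')}?{query}"


def api_get(base, path, key):
    """GET an API resource and decode the JSON response."""
    with urllib.request.urlopen(api_url(base, path, key), timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Lists the user's histories (needs a reachable server and a valid key).
    for history in api_get(GALAXY_URL, "histories", API_KEY):
        print(history.get("id"), history.get("name"))
```

Only `api_url` runs without a server; `api_get` needs a reachable Galaxy instance. The same helper can be pointed at other resources, for example `workflows`.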
Image courtesy of http://mychinaconnection.com/english-proverb/there-is-no-free-lunc
...we need to:

Data issues
• add the genome-related data we (CBRC) need
• add data usable by others (Core, students...)

Tools
• make a subset of the tools we really need and test them fully
• consider wrapping other tools (not yet available by default)

Logistics
• provide user-oriented courses
• create a user group to share experience and promote knowledge
• monitor its stability and usage

Hardware/sysadmin issues
• install it on better hardware (in due time)
• change the current queue priority (a chicken-and-egg problem)
• add an FTP server


Galaxy home page:
http://galaxy.psu.edu/
An overview presentation:
http://wiki.g2.bx.psu.edu/GCC2011?action=AttachFile&do=get&target=IntroductionSession.pdf