Slide 1 - PL-Grid Infrastructure

Domain-oriented services and resources
of Polish Infrastructure for Supporting
Computational Science in the European
Research Space – PLGrid Plus
Genomic Data Analysis Services
Available for PL-Grid Users
Tomasz Waller, Tomasz Gubała, Kazimierz Murzyn
Academic Computer Centre Cyfronet AGH, cyfro.net
Klaster LifeScience Kraków, lifescience.pl
Recent Advances in Omics Research, Kraków, October 2014
INNOVATIVE ECONOMY
NATIONAL COHESION STRATEGY
EUROPEAN UNION
EUROPEAN REGIONAL
DEVELOPMENT FUND
ACC Cyfronet AGH and
PL-Grid Infrastructure
Academic Computer Centre Cyfronet AGH
• Established in 1973 (40 years of experience)
• Provides network, computational power and data
storage capabilities for Polish science
• ~374 TFlops (Zeus, no. 175 on the TOP500 list), 2.5 PB (disks)
and 3.5 PB (tapes)
• 1.7 PFlops (Prometheus) with 10 PB of disks,
expected in the first half of 2015
• Regular and bigmem nodes, vSMP, GPGPU, FPGA,
MPI over Infiniband
• Details: http://kdm.cyfronet.pl/
PL-Grid Infrastructure for Polish science
• Five computing centers with Cyfronet as
the consortium leader
• Total: ~588 TFlops and ~5.6 PB (disks) but
soon to grow considerably (see above)
• Available free of charge to all Polish scientists
and their foreign collaborators
• Details: http://www.plgrid.pl
Using PL-Grid Infrastructure
Register at https://portal.plgrid.pl
User verification process based on Polish OPI number
Assistants and foreigners are confirmed by Polish PIs
Variety of basic and higher level services available after login
Local SSH access, cloud computing, middleware
Considerable library of installed applications
GATK, MACS, SAMTools, Picard, TopHat, Bowtie, (p)BWA,
R/Bioconductor, AutoDock/AutoGrid, BLAST, Clustal, CPMD, Gromacs,
NAMD, Matlab, Mathematica …
Users are free to compile and install their own applications via shell login
Users may bring their own commercial licenses to the HPC resources
Specific services dedicated to the Life Science domain
DNA Microarray Integromics Analysis
Platform (1/2)
https://lifescience.plgrid.pl/
For people who perform biological investigations using DNA
microarrays
Goal: help to analyze gene expression information and correlate it
with other clinical data
Analyses available now: normalization, clustering, SAM, T-test, GO-based enrichment, ANNs, PCA, panel filtering
’Integromics’ analyses in ’beta’ (testing) stage
CCA, PLS (gene expression and lipidomics)
Roleswitch, TargetScore (gene expression and miRNA)
Still in continuous development (Pathways, EBI export etc.)
Supported models: some Affymetrix, Agilent SurePrint (support
for others can be added on demand)
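As an illustration of the T-test analysis listed above, the following is a minimal sketch that ranks genes by Welch's t statistic (the unequal-variance variant of the t-test) between two sample groups. It is not the platform's implementation; the gene names, expression values and the helper names `welch_t` and `rank_genes` are all hypothetical.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    mean_a = statistics.fmean(group_a)
    mean_b = statistics.fmean(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1)
    var_b = statistics.variance(group_b)
    std_err = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / std_err


# Hypothetical log2 expression values: gene -> (control samples, treated samples)
expression = {
    "GENE_A": ([7.1, 7.3, 6.9, 7.2], [9.0, 9.4, 8.8, 9.1]),
    "GENE_B": ([5.0, 5.2, 4.9, 5.1], [5.1, 4.9, 5.0, 5.2]),
}


def rank_genes(data):
    """Return (gene, t) pairs sorted by decreasing absolute t statistic."""
    scores = {
        gene: welch_t(treated, control)
        for gene, (control, treated) in data.items()
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


ranked = rank_genes(expression)
for gene, t in ranked:
    print(f"{gene}: t = {t:.2f}")
```

A positive t here means higher expression in the treated group. A real analysis would also convert t to a p-value and correct for multiple testing across genes.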
DNA Microarray Integromics Analysis
Platform (2/2)
Notable features
Integration with EBI ArrayExpress (import, MIAME)
Sharing experiments with others
Importing own data for further analysis
Supported languages: PL, EN
Manual: https://docs.cyfronet.pl/x/JpaZ
Cooperation
Jagiellonian University Medical College, Kraków
Medical University of Silesia, Katowice
Institute of Oncology, Gliwice
Agilent GeneSpring GX
RDP: genespring.plgrid.pl
Used with Windows Remote Desktop
Integrated with the DNA Integromics
Platform for uniform microarray file
management
5-year, single-seat license for all registered
Polish scientists
Manual: https://docs.cyfronet.pl/x/JIq1
Galaxy NGS Server (1/4)
Galaxy NGS Server (2/4)
https://galaxy.plgrid.pl/
”Galaxy is an open, web-based platform for data intensive
biomedical research.”
Goal: deploy high-performance, high-throughput NGS data
analysis solution on top of HPC resources for PL-Grid users
Needs a lot of adjustments and in-house add-on development
Work started in December 2013 and is still at a beta stage, but it is
accessible to anyone willing to test and to help
Planned integrated tools (list not closed): GATK, SAMtools,
Bowtie, TopHat, BWA, bedtools, Cufflinks, Picard,
SnpEff/SnpSift, Flexbar, FastQC, MACS
Targeted platforms: Illumina *Seq, Ion Proton, Roche 454
Galaxy NGS Server (3/4)
Notable features
Full integration with Zeus cluster and disk arrays
PBS and MQ system for effective job queuing
Secured environment (open for all PL-Grid users, not ”public”)
All major Galaxy features (history, sharing, viewers)
Well documented workflows designed by NGS experts
Basics (alignment and quality control, trimming, filtering)
DNA-Seq, RNA-Seq, variant calling, SNP calling, methylation, exome analysis
with annotations
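To make the "trimming, filtering" step of the basic workflows concrete, here is a minimal sketch of 3'-end quality trimming followed by a length filter. It assumes Phred+33 quality encoding and is not how the Galaxy tools on this server (for example Flexbar) implement it; the function names, thresholds and reads are hypothetical.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def trim_3prime(sequence, quality_line, min_quality=20):
    """Cut bases from the 3' end while their quality is below min_quality."""
    scores = phred_scores(quality_line)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_line[:end]


def trim_and_filter(records, min_quality=20, min_length=5):
    """Trim every (name, sequence, quality) record and drop reads that end up too short."""
    kept = []
    for name, seq, qual in records:
        seq, qual = trim_3prime(seq, qual, min_quality)
        if len(seq) >= min_length:
            kept.append((name, seq, qual))
    return kept


# Hypothetical reads: 'I' encodes Phred 40, '#' encodes Phred 2
reads = [
    ("read1", "ACGTACGTAC", "IIIIIIII##"),
    ("read2", "ACGTAC", "II####"),
]
result = trim_and_filter(reads)
print(result)
```

In this sketch read1 loses its two low-quality trailing bases, while read2 is discarded because only two bases survive trimming. Production trimmers typically use sliding windows or adapter-aware strategies instead of this simple end scan.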
Manual: https://docs.cyfronet.pl/x/voas
Cooperation
Institute of Pharmacology, Polish Academy of Sciences,
Kraków
OMICRON, Jagiellonian University Medical College, Kraków
National Research Institute of Animal Production, Kraków-Balice
Galaxy NGS Server (4/4)
Current challenges
Some security issues in the Galaxy code prevent production
deployment
Cluster integration is in place, yet rather unstable and prone to failure (quite
an intricate contraption, it is)
Broad variety of integrated tools and wrappers does not help
Call to action – who is needed
Users: the bigger the community, the easier to make us visible
Early adopters: tell us what you need, help us test and integrate the
tools and workflows you use
Programmers: if you’d like to help us build a dedicated HPC-powered
Galaxy for Polish scientists, any assistance is greatly appreciated
Contact: t.gubala@cyfronet.pl
Links, Contact, Partners
These resources, services and tools (and much more) are available
after registering with PL-Grid
https://portal.plgrid.pl/
PL-Grid User Manual
https://docs.plgrid.pl/podrecznik_uzytkownika (PL)
https://docs.plgrid.pl/display/PLGDoc/User+manual (EN)
Questions, problems, requests about PL-Grid
https://helpdesk.plgrid.pl or helpdesk@plgrid.pl
Contact for LifeScience domain services
plgrid@lifescience.pl