Overview of Research Computing

ITS Research Computing
Mark Reed
Overview – Research Computing
• Resources
• Services
• Projects
ReCo Resources
• Computational Resources
– compute clusters: Killdevil, Kure
– special purpose servers: galaxy, bioapps, sapientia, ICISS, eruditio
• Software
– licensed
– open source
• Data Storage
• Virtual Computing Lab (VCL)
• Access to National Resources
ReCo Services
• Technical Support
• Training and Development
• Engagement and Collaboration
• Research Database Support
• Secure Data Exchange
• Data Grids – iRODS
• Desktop Support – THL
ReCo Projects
• EFRC
• HTS and Seqware
• Digital Humanities
Resources
Compute Cluster Advantages
• fast interconnect, tightly coupled
• aggregated resources
– compute cores
– memory
• installed software base
• high availability
• large (scratch) file spaces
• scheduling and job management
• data backup
Multi-Purpose Killdevil Cluster
• High Performance Computing (HPC)
– large parallel jobs, high speed interconnect
• High Throughput Computing (HTC)
– high volume serial jobs
• Large memory jobs
– special nodes for extreme memory
• GPGPU computing
– computing on Nvidia processors
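To make the HTC idea concrete, here is a minimal sketch of a "high volume serial jobs" workload: many small tasks that never communicate with each other. The function names (`serial_task`, `run_batch`) and the toy computation are hypothetical, and a local thread pool stands in for the cluster scheduler, which would normally run each task as its own job.

```python
from concurrent.futures import ThreadPoolExecutor


def serial_task(seed: int) -> int:
    """Stand-in for one independent serial job (hypothetical workload)."""
    # A small deterministic computation; it needs nothing from other tasks.
    return sum((seed * k) % 7 for k in range(1000))


def run_batch(seeds, workers: int = 4) -> list:
    """Run every task independently and collect results in input order."""
    # Assumption: a thread pool only imitates the scheduler here; on a
    # cluster each task would be submitted as a separate serial job.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(serial_task, seeds))


if __name__ == "__main__":
    results = run_batch(range(8))
    print(len(results), "independent tasks finished")
```

Because no task depends on another, throughput grows with the number of free cores rather than with interconnect speed, which is what separates HTC from the tightly coupled parallel jobs listed under HPC.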
Killdevil Nodes
• Three types of nodes:
– compute nodes
– large memory nodes
– GPGPU nodes
Killdevil Compute Cluster
• Heterogeneous Research Cluster
• Dell Blades
• 700+ Compute Nodes, mostly
– Xeon 5670 2.93 GHz
– 9600 cores
– Nehalem Microarchitecture
– Dual socket, hex core and oct core
– 48 GB memory
– some higher memory nodes
• Infiniband 4x QDR Interconnect
• GPGPU Nodes
– 64 Nvidia Tesla M2070
• Extreme Memory Nodes
– two 1 TB nodes, 32 cores
• priority usage for patrons
– Buy in is cheap
• Storage
– large Lustre scratch file system, IB connected
– /netscr
Kure
• An HPC/HTC research compute cluster in Research Computing
• Named after the beach in North Carolina
• It’s pronounced like the Nobel Prize-winning physicist and chemist, Madame Curie
Kure Compute Cluster
• Heterogeneous Research Cluster
• Hewlett Packard Blades
• 200+ Compute Nodes, mostly
– Xeon 5560 2.8 GHz
– Nehalem Microarchitecture
– Dual socket, quad core
– 48 GB memory
– over 1800 cores
– some higher memory nodes
• Infiniband 4x QDR
• priority usage for patrons
– Buy in is cheap
• Storage
– /netscr, /proj
Getting an account:
For Kure, KillDevil and Mass Storage
• http://onyen.unc.edu
• Subscribe to Services
Resources: Available Software
Licensed Software
• over 20 licensed software applications (some are site or volume licensed, others restricted)
– SAS, Matlab, Maple, Mathematica, Gaussian, Accelrys Materials Studio and Discovery Studio modules, Sybyl, Schrodinger, Stata, ArcGIS, NAG, IMSL, Totalview, Envi/IDL, JMP, and JMP Genomics
• compilers (licensed and otherwise)
– Intel, PGI, GNU, CUDA compiler
Large Installed Software Base
• Numerous other packages provided for research and technical computing
– including BLAST, PyMol, SOAP, PLINK, NWChem, R, Cambridge Structural Database, Amber, Gromacs, Petsc, Scalapack, Netcdf, Babel, Qt, Ferret, Gnuplot, Grace, iRODS, XCrySDen, and many more.
Mass Storage
• long term archival storage
• easy to access and use
• “limitless” capacity
– 2 TB free
• looks like an ordinary disk file system; data is actually stored on tape
• data is backed up
“To infinity … and beyond”
- Buzz Lightyear
Virtual Computing Lab (VCL)


• Collaboration with NC State to establish VCL infrastructure for UNC.
• VCL provides on-demand access to high-end computing resources, via highly customized, virtual Windows and Linux machines.
Virtual Computing Lab (VCL)
• Users can log on from anywhere at any time to make a reservation to use a machine
• Lots of software available!
– ArcGIS
– SAS
– MATLAB
– Adobe
– MS Office
– LaTeX
– SigmaPlot
– much more!
• Go to http://vcl.unc.edu to sign on
• For help, see the “Getting Started on VCL” webpage: http://help.unc.edu/CCM3_007680
Access to National Resources
• XSEDE – NSF-funded, leadership-class infrastructure at 11 partner sites.
• Open Science Grid – national shared computing and storage resources in a common grid infrastructure
Services
Services: Training
• Courses are offered in the following areas:
– Introductions to HPC resources
– Research Applications
– Linux
– General Computing
– Parallel Programming
• Courses are taught throughout the year by Research Computing. For listings and details, go to:
– http://learnit.unc.edu/workshops
– http://help.unc.edu/CCM3_008194
Services: Technical Support
• Technical support in using RC resources is available
– Support in compiling, porting, using tools, submitting jobs, using software packages, storage and data management, …
• online web forms
• email research@unc.edu
• 962-HELP (962-4357)
• personal consultation
Engagement, Support and Collaboration
• Research scientists with experience in
computational chemistry, physics, grid
computing, environmental modeling,
mathematics, parallel computing and
the life sciences are available for
consultation and collaboration.
• Digital Humanities Specialist
• Extensive technical support for utilizing
research computing resources.
Services: Secure Data Exchange
• Capability to share secure and sensitive data
using a secure “drop box” mechanism for
anonymous or non-Onyen users or full FTP access
for trusted Onyen accounts
• Computing - challenges of flexibility needed for
research and realities of cyber attacks
• Networking – maximizing bandwidth for research
endeavors vs. IPS/IDS inspection
• Data – compliance requirements,
data sharing, privacy, etc.
Services: Data Grids – iRODS
• Distributed data storage using the Integrated Rule-Oriented Data System (iRODS). iRODS provides scientists with a secure, scalable system that can support many aspects of research data management
• Enables data grids/repositories whose policies are
implemented and enforced through rules
Research Computing is
experimenting with hosting iRODS
collections as a service.
Collaborating with UNC Libraries,
Institute for the Environment, and
RENCI.
www.irods.org
Desktop Computing – TarHeel Linux
[Figure: desktop/laptop campus machines pull the Linux image from the Kickstart server for Linux distribution in the ITS Manning machine room]
• Build desktop machines tailored for the RC environment with additional customization by user.
• Based on CentOS
• Security Approved Build
• nightly updates
• Onyen
• OpenAFS
• Customized Applications
• Firewall
• http://tarheellinux.unc.edu
Services: Research Database Support
• Full time DB admin to support UNC research databases
• over 20 UNC Research Databases for research production, training and development
• clients include School of Pharmacy, Lineberger Comprehensive Cancer Center (LCCC), Computer Science, SILS, RENCI, Bioinformatics, Institute for the Environment, …
Projects
Energy Frontier Research Centers
http://www.er.doe.gov/bes/EFRC/index.html
Chemical Approaches to Artificial Photosynthesis. Modular Approach
1. Light absorption, sensitization
2. Electron transfer quenching
3. Vectorial electron/proton transfer, redox splitting
4. Catalysis of water oxidation and reduction
[Figure: Photosystem II. Labels: 4 hν, light harvesting antenna, D, C, A, ET interface, proton transfer, 4 e-; CatOx side: PCET, multi e- catalysis, O-O bond formation, 2 H2O, O2, 4 H+; CatRed side: PCET, multi e- catalysis, 4 H+, 2 H2.]
Meyer, Acc. Chem. Res. 1989, 22, 163.
Meyer et al., Inorg. Chem. 2005, 6802; Acc. Chem. Res. 1989, 163.
High Throughput Sequencing
• The High Throughput Sequencing Facility (HTSF) provides core services primarily for
– Lineberger Comprehensive Cancer Center (LCCC) and the TCGA (The Cancer Genome Atlas) project
– RENCI – NIDA project (National Institute on Drug Abuse)
– UNC life sciences
High Throughput Deep Sequencing
Infrastructure
• ~20 NextGen sequencers
– Illumina HiSeq, Ion Torrent, …
• RNAseq pipeline
• DNAseq pipeline
• Whole Genome pipeline
• ChIP/FAIREseq pipeline
• De novo assembly
• Specialized Workflow Engine, Condor, LSF scheduling
High Throughput Deep Sequencing
Infrastructure
[Figure: Data collection infrastructure. Labels: aggregation server; Isilon, 1.7 PB; compute nodes; MaPSeq meta scheduler running multiple pipelines; pipeline manager.]
Processing Pipeline
• TCGA is a project to catalog genetic mutations
responsible for cancer. UNC is one of twelve
national centers
• Processed over 4500 samples in support of TCGA
to date
• Have processed over 700 samples in a week
• Goal is to process 10,000 unique samples total
over five years
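The figures above imply a simple throughput budget, sketched below as a rough check rather than an official projection. The function names are hypothetical. The sketch assumes plain 52-week years and treats "over 4500" and "over 700" as exactly 4500 and 700.

```python
import math

WEEKS_PER_YEAR = 52  # assumption: plain 52-week years, no downtime


def required_weekly_rate(goal_samples: int, years: int) -> float:
    """Average samples per week needed to reach the goal on schedule."""
    return goal_samples / (years * WEEKS_PER_YEAR)


def weeks_to_finish(goal_samples: int, done: int, weekly_rate: int) -> int:
    """Whole weeks needed to clear the remaining samples at a fixed rate."""
    remaining = max(goal_samples - done, 0)
    return math.ceil(remaining / weekly_rate)


if __name__ == "__main__":
    # Numbers taken from the bullets above; "over N" is rounded down to N.
    rate = required_weekly_rate(10_000, 5)
    print(f"average rate needed: {rate:.1f} samples/week")
    print("weeks left at peak rate:", weeks_to_finish(10_000, 4_500, 700))
```

Under these assumptions the five-year goal needs an average of about 38 samples per week. The remaining 5500 samples would take about 8 weeks at the peak weekly rate.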
Lumbee Familial Political
Factions
Malinda Maynor Lowery, History
Brooklyn Renaissance Social
Graph
Melissa Bullard, History
Ancient World Mapping
Application
Questions and Comments?
• For assistance with any of our services, please contact Research Computing
– Email: research@unc.edu
– Phone: 919-962-HELP
– Submit help ticket at http://help.unc.edu