Overview of Research Computing
ITS Research Computing
Mark Reed

Overview – Research Computing
• Resources
• Services
• Projects

ReCo Resources
• Computational Resources
  – compute clusters: Killdevil, Kure
  – special purpose servers: galaxy, bioapps, sapientia, ICISS, eruditio
• Software
  – licensed
  – open source
• Data Storage
• Virtual Computing Lab (VCL)
• Access to National Resources

ReCo Services
• Technical Support
• Training and Development
• Engagement and Collaboration
• Research Database Support
• Secure Data Exchange
• Data Grids – iRODS
• Desktop Support – THL

ReCo Projects
• EFRC
• HTS and Seqware
• Digital Humanities
[Figure: artificial photosynthesis schematic showing a light harvesting antenna, electron transfer interface, proton transfer, and PCET catalysts for water oxidation (CatOx) and reduction (CatRed)]

Resources

Compute Cluster Advantages
• fast interconnect, tightly coupled
• aggregated resources
  – compute cores
  – memory
• installed software base
• high availability
• large (scratch) file spaces
• scheduling and job management
• data backup

Multi-Purpose Killdevil Cluster
• High Performance Computing
  – large parallel jobs, high speed interconnect
• High Throughput Computing (HTC)
  – high volume serial jobs
• Large memory jobs
  – special nodes for extreme memory
• GPGPU computing
  – computing on Nvidia processors

Killdevil Nodes
• Three types of nodes:
  – compute nodes
  – large memory nodes
  – GPGPU nodes

Killdevil Compute Cluster
• Heterogeneous Research Cluster
• Dell Blades
• 700+ Compute Nodes (9600 cores), mostly
  – Xeon 5670 2.93 GHz
  – Nehalem Microarchitecture
  – Dual socket, hex core and oct core
  – 48 GB memory
  – some higher memory nodes
• Infiniband 4x QDR Interconnect
• GPGPU Nodes
  – 64 Nvidia Tesla M2070
• Extreme Memory Nodes
  – two 1 TB nodes, 32 cores
• priority usage for patrons
  – Buy in is cheap
• Storage
  – large Lustre scratch file system, IB connected
  – /netscr

Kure
• An HPC/HTC research compute cluster in RC
• Named after the beach in North Carolina
• It’s pronounced like the Nobel
Prize-winning physicist and chemist, Madame Curie

Kure Compute Cluster
• Heterogeneous Research Cluster
• Hewlett Packard Blades
• 200+ Compute Nodes (over 1800 cores), mostly
  – Xeon 5560 2.8 GHz
  – Nehalem Microarchitecture
  – Dual socket, quad core
  – 48 GB memory
  – some higher memory nodes
• priority usage for patrons
  – Buy in is cheap
• Infiniband 4x QDR
• Storage
  – /netscr, /proj

Getting an account: For Kure, KillDevil and Mass Storage
• http://onyen.unc.edu
• Subscribe to Services

Resources: Available Software

Licensed Software
• over 20 licensed software applications (some are site or volume licensed, others restricted)
  – SAS, Matlab, Maple, Mathematica, Gaussian, Accelrys Materials Studio and Discovery Studio modules, Sybyl, Schrodinger, Stata, ArcGIS, NAG, IMSL, Totalview, Envi/IDL, JMP, and JMP Genomics
• compilers (licensed and otherwise)
  – Intel, PGI, GNU, CUDA compiler

Large Installed Software Base
• Numerous other packages provided for research and technical computing, including BLAST, PyMol, SOAP, PLINK, NWChem, R, Cambridge Structural Database, Amber, Gromacs, Petsc, Scalapack, Netcdf, Babel, Qt, Ferret, Gnuplot, Grace, iRODS, XCrySDen, and many more.

Mass Storage
• long term archival storage
• easy to access and use
• “limitless” capacity
  – 2 TB free
• looks like ordinary disk file system
  – data is actually stored on tape
• data is backed up
“To infinity … and beyond” - Buzz Lightyear

Virtual Computing Lab (VCL)
Collaboration with NC State to establish VCL infrastructure for UNC. VCL provides on-demand access to high-end computing resources via highly customized virtual Windows and Linux machines.

Virtual Computing Lab (VCL)
• Users can log on from anywhere at any time to make a reservation to use a machine
• Lots of software available!
  – ArcGIS
  – SAS
  – MATLAB
  – Adobe
  – MS Office
  – LaTeX
  – SigmaPlot
  – MUCH MORE!
Go to http://vcl.unc.edu to sign on
For help, see the “Getting Started on VCL” webpage: http://help.unc.edu/CCM3_007680

Access to National Resources
• XSEDE – NSF funded leadership class infrastructure at 11 partner sites.
• Open Science Grid – national shared computing and storage resources in a common grid infrastructure

Services

Services: Training
• Courses are offered in the following areas:
  – Introductions to HPC resources
  – Research Applications
  – Linux
  – General Computing
  – Parallel Programming
• Courses are taught throughout the year by Research Computing. For listings and details, go to:
  – http://learnit.unc.edu/workshops
  – http://help.unc.edu/CCM3_008194

Services: Technical Support
• Technical support in using RC resources is available
  – Support in compiling, porting, using tools, submitting jobs, using software packages, storage and data management, …
• online web forms
• email research@unc.edu
• 962-HELP (962-4357)
• personal consultation

Engagement, Support and Collaboration
• Research scientists with experience in computational chemistry, physics, grid computing, environmental modeling, mathematics, parallel computing and the life sciences are available for consultation and collaboration.
• Digital Humanities Specialist
• Extensive technical support for utilizing research computing resources.

Services: Secure Data Exchange
• Capability to share secure and sensitive data using a secure “drop box” mechanism for anonymous or non-Onyen users, or full FTP access for trusted Onyen accounts
• Computing – challenges of flexibility needed for research and realities of cyber attacks
• Networking – maximizing bandwidth for research endeavors vs. IPS/IDS inspection
• Data – compliance requirements, data sharing, privacy, etc.

Services: Data Grids – iRODS
• Distributed data storage using the integrated Rule-Oriented Data System (iRODS).
iRODS provides scientists with a secure, scalable system that can support many aspects of research data management
• Enables data grids/repositories whose policies are implemented and enforced through rules
Research Computing is experimenting with hosting iRODS collections as a service, collaborating with UNC Libraries, Institute for the Environment, and RENCI.
www.irods.org

Desktop Computing – TarHeel Linux
[Diagram: desktop/laptop campus machines pull a Linux image from the Kickstart server for Linux distribution in the ITS Manning machine room]
• Build desktop machines tailored for the RC environment with additional customization by user.
• Based on CentOS
• Security Approved Build
• nightly updates
• Onyen
• OpenAFS
• Customized Applications
• Firewall
• http://tarheellinux.unc.edu

Services: Research Database Support
• Full time DB admin to support UNC research databases
• over 20 UNC Research Databases for research production, training and development
  – clients include School of Pharmacy, Lineberger Comprehensive Cancer Center (LCCC), Computer Science, SILS, Renci, Bioinformatics, Institute for the Environment, …

Projects

Energy Frontier Research Centers
http://www.er.doe.gov/bes/EFRC/index.html
Chemical Approaches to Artificial Photosynthesis.
Modular Approach
1. Light absorption, sensitization
2. Electron transfer quenching
3. Vectorial electron/proton transfer, redox splitting
4. Catalysis of water oxidation and reduction
[Figure: Photosystem II scheme. Light harvesting antenna (4 hν); D-C-A electron transfer interface (4 e-); proton transfer; CatOx: 2 H2O to O2 + 4 H+ (O-O bond formation, PCET, multi-electron catalysis); CatRed: 4 H+ to 2 H2 (PCET, multi-electron catalysis)]
Meyer, Accounts of Chemical Research 1989, 22, 163.
Meyer, et al. Inorg. Chem. 2005, 6802; Acc. Chem. Res. 1989, 163.

High Throughput Sequencing
• The High Throughput Sequencing Facility (HTSF) provides core services primarily for
• Lineberger Comprehensive Cancer Center (LCCC) and the TCGA (The Cancer Genome Atlas) project
• Renci – NIDA project (National Institute on
Drug Abuse)
• UNC life sciences

High Throughput Deep Sequencing Infrastructure
• ~20 NextGen sequencers – Illumina HiSeq, Ion Torrent, …
• RNAseq pipeline
• DNAseq pipeline
• Whole Genome pipeline
• ChIP/FAIREseq pipeline
• De novo assembly
• Specialized Workflow Engine, Condor, LSF scheduling

High Throughput Deep Sequencing Infrastructure
[Diagram: Data Collection Infrastructure; Aggregation Server; Isilon 1.7 PB; Compute Nodes; MaPSeq meta scheduler running multiple pipelines; Pipeline Manager; Processing Pipeline]
• TCGA is a project to catalog genetic mutations responsible for cancer. UNC is one of twelve national centers
• Processed over 4500 samples in support of TCGA to date
• Have processed over 700 samples in a week
• Goal is to process 10,000 unique samples total over five years

• Lumbee Familial Political Factions (Malinda Maynor Lowery, History)
• Brooklyn Renaissance Social Graph (Melissa Bullard, History)
• Ancient World Mapping Application

Questions and Comments?
• For assistance with any of our services, please contact Research Computing
  – Email: research@unc.edu
  – Phone: 919-962-HELP
  – Submit help ticket at http://help.unc.edu