LONI/LSU RP Update
Honggao Liu, Ph.D.
Director, HPC @ LSU
NSF HPCOPS PI, LONI
November 5, 2009

Queen Bee Update
• QB was fairly reliable: it was down four times, for a total of 116 unavailable hours, in the past four months
• The network connection between QB and the rest of the TeraGrid uses a dedicated 1 Gbps link to LONI and a 10 Gbps link from LONI to Chicago. The planned 10 Gbps connection to QB has been delayed by physical construction at a carrier hotel building in downtown Baton Rouge, which is preventing the installation of new fiber. In September, LONI ordered a 10GE Metro-E circuit between LSU and QB as a local wave service from AT&T. We expect the local wave service to be operational at the beginning of January 2010.
• Four security incidents occurred in the past four months, each involving a user gaining root privilege on the QB head nodes. In each case, the affected users were notified and forced to change their passwords, the head nodes were reinstalled, and kernel patches were applied.

Queen Bee Usage
• Queen Bee had over 85% total usage in the past four months
[Figure: LONI Queen Bee Usage. Monthly SU hours from Feb-08 through Oct-09, split into TG Usage and LONI Usage]

Queen Bee Usage
• Users, jobs and SUs for Queen Bee are shown relative to the peak of each data type
• That lets one plot all 3 data types on the same graph to see how they relate
[Figure: QueenBee Monthly Usage, in percent of max. Legend: Users (166.0), Jobs (14854.0), SUs (3564332.5), TG SU Share]

LONI's New TeraGrid Projects
• Project: SAGA (http://saga.cct.lsu.edu/) Deployment on TeraGrid
• Project Aim I: Deploy SAGA on major TeraGrid resources (Kraken, Ranger, Abe/QB)
  – Get a stable release (Ole Weidner)
    • Scheduled date: 31 Oct 2009
    • Estimated date: 15 Nov 2009
  – Make it available via CTSS (Lukasz)
    • Working with RP and GIG (progressing well)
    • Test deployment available on QB
• Project Aims II & III: SAGA-based Shell and Developing/Deploying FAUST (Framework for Adaptive Ubiquitous Scalable Tasks)
  – Planned for the second half of the project (Jan '10 to May '10)
  – Depends upon a stable and reliable deployment on TG

SAGA Deployment on TeraGrid
• Project Aim IV: Documentation (Andre)
  – Programming Manual and Exercises (Andre, Bety) in progress
    • http://faust.cct.lsu.edu/trac/saga/wiki/Tutorials/NeSC2009
  – Tutorials and training
    • Held several training events in Fall 2009
      – International Summer School on Grid Computing
      – Advanced Distributed Summer School
      – NeSC-Edinburgh Training
      – Planned LONI training (January 2010)?
      – Is there interest in a TG-wide tutorial/training?
• We currently provide source releases only; they are available at http://saga.cct.lsu.edu/download/
• We're following a 6- to 8-week release cycle
  – 1.4 release due 15 Nov 2009 (TeraGrid version)
  – 1.5 release due 15 Jan 2010
• File a bug or feature request here:
  – http://faust.cct.lsu.edu/trac/saga/

LONI's New TeraGrid Projects
• Project: TeraGrid-LONI-DEISA Interoperability
• Background: Demonstrate the advantages of scale-out and interoperability (across TG and DEISA) for appropriate scientific problems
• Aim: Enhance the understanding of HIV-1 enzymes using replica-based methods across federated TG-DEISA-LONI resources
  – Do so using a general-purpose, extensible, scalable approach
  – Test the limits of distributed scale-out, both algorithmic and infrastructure limits
  – As part of the VPH project, ultimately help build the CI for quick, efficient (patient-specific) decision tools using predictive MD of drugs and enzymatic targets (HIV-1 protease)
• Application models of HIV-1 and drugs created
• Integration of LAMMPS with SAGA
• Initial replica-exchange runs performed
• Integration of LAMMPS with SAGA-based BigJob
• Initial isolated runs on TeraGrid: Ranger and Abe
• Working on launching on DEISA
• SAGA-UNICORE (via GridSAM) testing in place

TeraGrid-LONI-DEISA Interoperability
Next Steps:
• Integration of SAGA into the Binding Affinity Calculator (BAC) tools to facilitate distributed scale-out
• Protonation study of Ritonavir bound to wild-type HIV-1 protease (on QB/Ranger)
• Study of the binding affinity between 6 HIV-1 protease mutants and the drug Ritonavir using SAGA-BAC tools
• Develop tools for post-processing on UK NGS and DEISA
• Investigation of reverse transcriptase with replica-exchange (if time permits)

LONI's New TeraGrid Projects
• Project: Extension of PetaShare to TeraGrid
• PetaShare is an NSF-funded project that is deploying additional disk and tape storage at LONI sites and developing user-friendly data-aware storage systems, data-aware schedulers, and cross-domain metadata schemes.
• PetaShare currently provides distributed data storage and management capabilities to nine LONI institutions connected via the high-speed LONI network.
• This project extends PetaShare to TeraGrid so that TeraGrid users can access their datasets more conveniently through the transparent PetaShare interfaces.
• TeraGrid and LONI users will be able to easily share and exchange data with each other.
• PetaShare data access and retrieval services are currently optimized for the LONI network and will need to be enhanced and optimized for the wide-area TeraGrid networks.
• PetaShare services currently run only on Linux-based systems and will need to be ported to the other architectures and operating systems on TeraGrid.
• Ahmet Topcu was hired from IU for the TG PetaShare project and started here on June 15.

LSU HPC/CCT Update
• New Linux cluster: Philip
  – 38 nodes total, each with 8 Intel "Nehalem" Xeon cores @ 2.93 GHz, a 160 GB HD, and 1 Gb Ethernet
  – 32 nodes with 24 GB of 1333 MHz RAM, 3 nodes with 96 GB of 1066 MHz RAM, and 3 nodes with 48 GB of 1066 MHz RAM
  – Opened to users in September. Not a TeraGrid resource, but has potential for OSG jobs
• New educational cluster dedicated to students: Arete
  – 72 nodes total: 56 nodes with 8 AMD Opteron cores @ 2.3 GHz and 16 nodes with cores @ 2.7 GHz; 8 GB RAM, 4 x 146 GB HDD, InfiniBand and 1 Gb Ethernet
  – Available for campus-wide use beginning in Spring 2010
• New Lustre storage
  – 240 TB of DDN storage, purchased through Dell, was received and deployed as long-term storage and will be allocated to LSU HPC users
  – The current 55 TB Panasas storage will be upgraded to 80 TB in December

LONI/LSU Training
• Five workshops have been held at LONI/LSU since June:

  Title                   | Location | Date     | # of Participants | Method
  Beowulf Boot Camp       | LSU      | 6/15-18  | 22                | In classroom
  SC09 Parallel Computing | LSU      | 7/05-11  | 32                | In classroom
  Scaling to Petascale    | LSU      | 8/03-07  | 18                | In classroom
  LONI HPC Workshop       | LaTech   | 10/6-7   | 12                | In classroom
  LONI HPC Workshop       | ULL      | 10/26-27 | 30                | In classroom

• 13 tutorials have been provided since September at LSU and over Access Grid