LONI RP Update, November 5, 2009

LONI
LONI/LSU RP Update
Honggao Liu, Ph.D.
Director, HPC @ LSU
NSF HPCOPS PI, LONI
November 5, 2009
Queen Bee Update
• Was fairly reliable, going down four times for a total of 116 unavailable hours
in the past four months
• The network connection between QB and the rest of the TeraGrid uses a
dedicated 1 Gbps connection to LONI and a 10 Gbps connection from
LONI to Chicago. The planned 10 Gbps link to QB has been delayed by physical
construction at a carrier hotel building in downtown Baton Rouge, which is
preventing the installation of new fiber. In September, LONI ordered a 10GE Metro-E
circuit between LSU and QB as a local wave service from AT&T. We expect
the local wave service to be operational at the beginning of January 2010.
• Four security incidents occurred in the past four months. Each involved a user
gaining root privileges on the QB head nodes. In each case, the affected users
were notified and required to change their passwords, the head nodes were
reinstalled, and kernel patches were applied.
Queen Bee Usage
• Queen Bee had over 85% total usage in the past four months
[Figure: Queen Bee Usage. Monthly SU hours, TG Usage vs. LONI Usage, Feb-08 through Oct-09.]
Queen Bee Usage
Users, jobs, and SUs for Queen Bee, each shown relative to its own peak value.
Normalizing this way lets all three data types be plotted on the same graph to show how they relate.
[Figure: QueenBee Monthly Usage, in percent of max. Series: Users (166.0), Jobs (14854.0), SUs (3564332.5), TG SU Share.]
LONI’s New TeraGrid Projects
• Project: SAGA Deployment on TeraGrid (http://saga.cct.lsu.edu/)
• Project Aim I: Deploy SAGA on major TeraGrid resources
(Kraken, Ranger, Abe/QB)
– Get stable release (Ole Weidner)
• Scheduled Date: 31st Oct, 2009
• Estimated Date: 15th Nov, 2009
– Make available via CTSS (Lukasz)
• Work with RP and GIG (progressing well)
• Test deployment available on QB
• Project Aim II & III: SAGA-based Shell & Developing/Deploying
FAUST (Framework for Adaptive Ubiquitous Scalable Tasks)
– Planned for the second half of the project (Jan’10 to May’10)
– Depends on a stable and reliable deployment on TG
SAGA Deployment on TeraGrid
• Project Aim IV: Documentation (Andre)
– Programming Manual and Exercise (Andre, Bety) in progress
• http://faust.cct.lsu.edu/trac/saga/wiki/Tutorials/NeSC2009
– Tutorial and Training
• Held several training events in Fall 2009
– International Summer School on Grid Computing
– Advanced Distributed Summer School
– NeSC-Edinburgh Training
– Planned LONI training (January 2010)?
– Is there interest in a TG-wide tutorial/training?
• We currently provide source releases only – they’re available at
http://saga.cct.lsu.edu/download/
• We’re following a 6- to 8-week release cycle.
– 1.4 release due date 15 Nov (TeraGrid version)
– 1.5 release due date 15 Jan 2010
• File a bug or feature request here:
– http://faust.cct.lsu.edu/trac/saga/
LONI’s New TeraGrid Projects
• Project: TeraGrid-LONI-DEISA Interoperability
• Background: Demonstrate the advantages of Scale-Out and Interoperability
(across TG and DEISA) for appropriate scientific problems
• Aim: To enhance the understanding of HIV-1 enzymes using replica-based
methods across federated TG, DEISA, and LONI resources
– Do so using a general-purpose, extensible, scalable approach
– Test the limits of distributed Scale-Out, both algorithmic and infrastructure limits
– As part of the VPH project, ultimately help build the CI for quick, efficient (patient-specific) decision tools using predictive MD of drugs and enzymatic targets (HIV-1 protease)
• Application models of HIV-1 and drugs created
• Integration of LAMMPS with SAGA
• Initial Replica-Exchange performed
• Integration of LAMMPS with SAGA-based BigJob
• Initial isolated runs on TeraGrid: Ranger and Abe
• Working on launching on DEISA
• SAGA-UNICORE (via GridSAM) testing in place
TeraGrid-LONI-DEISA Interoperability
Next Steps:
• Integration of SAGA into Binding Affinity Calculator (BAC) tools to facilitate distributed Scale-Out
• Protonation study of Ritonavir bound to wild-type HIV-1 Protease (on QB/Ranger)
• Study of binding affinity between 6 HIV-1 Protease mutants and the drug Ritonavir using SAGA-BAC tools
• Develop tools for post-processing on UK NGS and DEISA
• Investigation of Reverse Transcriptase with Replica-Exchange (if time permits)
LONI’s New TeraGrid Projects
• Project: Extension of PetaShare to TeraGrid
• PetaShare is an NSF-funded project that is deploying additional disk and tape
storage at LONI sites and developing user-friendly data-aware storage systems,
data-aware schedulers, and cross-domain metadata schemes.
• PetaShare currently provides distributed data storage and management
capabilities to nine LONI institutions connected via the high-speed LONI network.
• This project will extend PetaShare to TeraGrid so that TeraGrid users can
access their datasets more conveniently through the transparent PetaShare
interfaces.
• TeraGrid and LONI users will be able to easily share and exchange data with each other.
• PetaShare data access and retrieval services are currently optimized for the LONI
network and will need to be enhanced and optimized for the wide-area TeraGrid
networks.
• PetaShare services currently run only on Linux-based systems and will need to be
ported to the different architectures and operating systems on TeraGrid.
• Ahmet Topcu was hired from IU for the TG PetaShare project and started here on
June 15.
LSU HPC/CCT Update
• New Linux Cluster: Philip
– 38 nodes total, each with 8 Intel “Nehalem” Xeon cores @ 2.93 GHz, a 160 GB hard disk,
and 1 Gb Ethernet
– 32 nodes with 24 GB 1333 MHz RAM, 3 nodes with 96 GB 1066 MHz RAM, and 3
nodes with 48 GB 1066 MHz RAM
– Opened to users in September. Not a TeraGrid resource, but has potential for OSG jobs
• New Educational Cluster Dedicated to Students: Arete
– 72 nodes total: 56 nodes with 8 AMD Opteron cores @ 2.3 GHz and 16 nodes
with cores @ 2.7 GHz; 8 GB RAM, 4x146 GB HDD, InfiniBand, and 1 Gb Ethernet
– Available for campus-wide use beginning in Spring 2010
• New Lustre storage
– 240 TB of DDN storage (through Dell) was received and deployed as long-term
storage and will be allocated to LSU HPC users
– The current 55 TB of Panasas storage will be upgraded to 80 TB in December
LONI/LSU Training
• Five workshops have been held at LONI/LSU sites since June
Title                    Location  Date      # of Participants  Method
Beowulf Boot camp        LSU       6/15-18   22                 In classroom
SC09 Parallel Computing  LSU       7/05-11   32                 In classroom
Scaling to Petascale     LSU       8/03-07   18                 In classroom
LONI HPC workshop        LaTech    10/6-7    12                 In classroom
LONI HPC workshop        ULL       10/26-27  30                 In classroom
• 13 tutorials have been provided since September, at LSU and over Access Grid