Open Science Grid: High Throughput Computing On A National Scale
Alain Roy

OSG: HTC at National Scale
• OSG provides high-throughput computing
across the United States.
– 70 or so sites
– For 28-Nov-2008:
• 131,261 jobs for 393,312 hours
• Used 54 sites
• Jobs by 30 different virtual organizations
• 86% of jobs succeeded
• Underestimate: 64% of sites reported statistics
31 May 2016
HTC Week 2007: Open Science Grid (Alain Roy)
Who Uses OSG?
• About 30 virtual organizations
– High-energy physics uses a large chunk of OSG
– But several other sciences are actively using
OSG.
• nanoHUB: nanotechnology simulations
• LIGO: detecting gravitational waves
• CHARMM: molecular dynamics
• Football pool: mathematical coding theory
More at:
http://www.opensciencegrid.org/Science_on_the_OSG/
Focus on DZero
• High-energy physics experiment
• Based at Fermilab, near Chicago, US
• Searching for new particles by smashing
together protons and antiprotons at nearly
the speed of light.
• I’m not a physicist: this is as deep as my
understanding goes
DZero & HTC
• DZero exemplifies HTC
• They do local HTC
• They do HTC on OSG
Colliders are Big
[Figure labels: Ten-Story Building, Collider Ring]
Physics Collaborations Are Big
DZero’s Computing is Big
Detector (three stories tall)
Process/Analyze (~1000 CPUs)
DZero’s Problem
• As data comes off the detector, it is:
– Processed once on everyone’s behalf
– Analyzed many times by many scientists
• Recently, they wanted to re-process all the data
from the detector in time for scientists to analyze
for summer conferences
• They needed ~500 1 GHz computers for one year
• They only had 1000 CPUs for a few months
– And they were also doing new processing, not just
reprocessing
DZero’s Core Problem
• DZero needed a peak capacity that was
beyond their local capacity
DZero’s Solution
• Expand HTC onto OSG, and other grids
– Requested 1500 CPUs from OSG for four
months
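A rough check of the capacity arithmetic, as a sketch only. It assumes CPU-months are a fair unit of comparison and that every CPU is roughly equivalent to the 1 GHz reference machine, which the slides do not state. The helper names are illustrative.

```python
# Back-of-the-envelope capacity check (illustrative only).
# Assumption: all CPUs are treated as equivalent to the 1 GHz reference machine.

MONTHS_PER_YEAR = 12


def cpu_months(cpus: int, months: float) -> float:
    """Total capacity expressed in CPU-months."""
    return cpus * months


def months_to_finish(work_cpu_months: float, cpus: int) -> float:
    """Wall-clock months needed if `cpus` CPUs are fully dedicated to the work."""
    return work_cpu_months / cpus


needed = cpu_months(500, MONTHS_PER_YEAR)  # ~500 computers for one year
requested = cpu_months(1500, 4)            # 1500 CPUs for four months

print(f"needed:    {needed:.0f} CPU-months")
print(f"requested: {requested:.0f} CPU-months")
print(f"months on 1000 fully dedicated CPUs: {months_to_finish(needed, 1000):.1f}")
```

Under these assumptions the request matches the estimated need. The 1000 local CPUs could not be fully dedicated, since they were also doing new processing.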
How Did it Go? (1/2)
• Used about 12 OSG sites
– Number fluctuated over time
– Ramped up: Added sites one at a time
– Certified that each site produced correct
answers
– Only three of these were “DZero” sites
– Kept roughly 1500 CPUs busy, after ramp up
• Reprocessed 445 million events
– 286 million on OSG
How Did it Go? (2/2)
• 90 TB of input data
• 250 TB of application transfers
– The application is 1 GB
– It was transferred many times
– Easier than pre-installing on all nodes
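To give a feel for "transferred many times", here is a small sketch of the implied transfer count. It assumes decimal units (1 TB = 1000 GB) and that the 250 TB consists entirely of copies of the 1 GB application. Neither is stated on the slide, so the result is only an order-of-magnitude estimate.

```python
# Rough estimate of how often the application was shipped to worker nodes.
# Assumptions (not from the slides): decimal units, and all 250 TB is
# made up of whole copies of the 1 GB application.

GB_PER_TB = 1000

app_size_gb = 1        # size of the application
app_traffic_tb = 250   # total application data moved
input_data_tb = 90     # total input data moved

approx_transfers = app_traffic_tb * GB_PER_TB // app_size_gb
traffic_ratio = app_traffic_tb / input_data_tb

print(f"approximate application transfers: {approx_transfers:,}")
print(f"application traffic / input traffic: {traffic_ratio:.1f}x")
```

Under these assumptions the application was moved on the order of a quarter of a million times, and it accounted for more traffic than the input data.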
Beyond DZero
OSG’s goal is to provide for many
scientists what we provided for DZero.
The OSG Vision
Transform
processing and data intensive science
through
a cross-domain,
self-managed,
national,
distributed cyber-infrastructure
that brings together
campus and
community infrastructure
and facilitates
the needs of Virtual Organizations (VO)
at all scales
The OSG Vision
Implies:
• Autonomy
• Heterogeneity
• Large-Scale
Autonomy & Heterogeneity
• Autonomy & heterogeneity are a pain
• But they are also a fact of life
• If we accept them, we have access to more
resources
• This is opportunistic computing
The Three Cornerstones
• National
• Campus
• Community
These need to be harmonized into a well-integrated whole.
OSG Needs
• OSG needs many things to be successful:
– Good people
– Good software
– Good security
– Good policies
– Good communication
– Good testing
–…
My Focus: Good software
VDT: OSG Software Stack
• Virtual Data Toolkit (VDT)
– A software distribution for Grid computing
– A packaging of other software
(Like a Linux distribution, but different)
– No software development
• We get Condor, Globus, and other software from
other groups
• We “glue” it together
Why Have The VDT?
• Everyone could download the software from the
providers…
• But the VDT:
– Figures out dependencies between software
– Works with providers for bug fixes
– Provides automatic configuration
– Builds it (we provide binaries)
– Packages it
– Tests everything on a dozen or so platforms (and growing)
Example: VOMS
• VOMS can authorize people in a VO
• VOMS has a web interface
• We:
– Install Tomcat
– Install Apache
• Built with Globus SSL
• Patched so GSI pass-through to Apache works
– Install VOMS
– Install VOMS Admin
– Install Perl modules needed by VOMS Admin
– Install MySQL and set up database (with command-line tool)
– Configure all software
– Configure rotation of log files
We pre-build binaries for each of these.
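"Figuring out dependencies" for a stack like this amounts to finding an install order in which every prerequisite comes first. The sketch below shows the idea with a topological sort. The dependency map is hypothetical, made up for illustration, and is not the VDT's actual dependency data or tooling.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map (an assumption for illustration):
# component -> set of components that must be installed before it.
deps = {
    "Tomcat": set(),
    "Apache": set(),
    "MySQL": set(),
    "Perl modules": set(),
    "VOMS": {"MySQL"},
    "VOMS Admin": {"Tomcat", "Apache", "Perl modules", "MySQL", "VOMS"},
}

# static_order() yields each component only after all of its prerequisites.
install_order = list(TopologicalSorter(deps).static_order())

for step, component in enumerate(install_order, start=1):
    print(f"{step}. install {component}")
```

Several valid orders exist. The only guarantee is that a component never appears before something it depends on.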
Example: Security Update
• Last year, a security update to Globus software:
– We decided to patch four versions of the VDT
– We built updated binaries three times on about six
platforms
– We coordinated creation of a patch for an unsupported version of Globus
– We patched the Globus updates with our patches
– We took a subset of the Globus updates
– We packaged an update that was reversible, in case there were problems.
What’s in the VDT?
• Job management
– Globus GRAM
– Condor
• Data management
– Globus GridFTP
– dCache
– Bestman
• Security
– VOMS
– GUMS
– PRIMA
– MyProxy
• Information/Monitoring
– CEMon
– Generic Info Provider
– Site Validation
• Infrastructure
– Apache
– Tomcat
– Python
– Perl modules
– …
• Miscellaneous
– Squid
– Wget
Supported Platforms
• RedHat Enterprise Linux
– 3, 4, 5
– x86, x86-64, ia64
• Scientific Linux
– 3, 4, 5
– x86, x86-64, ia64
• Fedora Core 4
• Debian 3 (soon 4)
• SLES 9
• Mac OS X
Keeping up with Linux distributions is like sprinting a marathon.
But to support autonomous, heterogeneous sites, it’s a necessity.
HTC Week 2007: Open Science Grid (Alain Roy)
26
Open Science Grid
VDT Growth
The VDT’s Challenge
• Keep software up to date
• Add new software
• Support latest OS versions (and old ones!)
• Keep it secure
• Make it easy to update
• Make it easier to install
• Create better documentation
Questions?
Alain Roy
roy@cs.wisc.edu
vdt-support@opensciencegrid.org