Streamlining Research Computing Infrastructure
A small school's experience
Gowtham
HPC Research Scientist, ITS
Adj. Asst. Professor, Physics/ECE
g@mtu.edu
(906) 487-3593
http://www.mtu.edu
Houghton, MI
[Map] Distances from Houghton, MI:
- Isle Royale National Park, MI: 56 miles
- Duluth, MN: 215 miles
- Green Bay, WI: 215 miles
- Sault Ste. Marie, MI/Canada: 265 miles
- Twin Cities, MN: 375 miles
- Detroit, MI: 550 miles
Michigan Tech
Fall 2013
- Population
  - Houghton/Hancock: 15,000 (22,000)
- Faculty: 500
- Students: 7,000 (5,600 + 1,400)
- Staff: 1,000
- General budget: $170 million
- Endowment value: $83 million
- Sponsored programs awards: $48 million
[Timeline: 1885, 1897, 1927, 1964]
An as-is snapshot
January 2011
- 8 mini- to medium-sized clusters
- Spread around campus
- Varying versions of Rocks
- Different software configurations
- Single power supply for most components
- Manual systems administration and maintenance
- Minimal end user training and documentation
These 8 clusters, purchased mostly with start-up funds, had 1,000 CPU cores spanning several hardware generations and a few low-end GPUs. Only one of them had InfiniBand (40 Gb/s).
Initial consolidation
January 2011 — March 2011
- Move all clusters to one of two data centers
- Merge clusters when possible
- Consistent racking, cabling and labeling scheme
- Upgrade to Rocks 5.4.2
- Identical software configuration
- End user training
- Complete documentation
Labeling scheme examples:
- R107B36 OB1: Rack 107, Back side, 36th slot, On Board NIC 1 (of a node)
- R107B41 P01: Rack 107, Back side, 41st slot, Port 01 (of the switch)
Compute nodes deemed not up to the mark were set aside to build a test cluster: wigner.research.mtu.edu
Capture usage pattern
April 2011 — December 2011
- hpcmonitor.it.mtu.edu
- Ganglia monitoring system
Monitoring multiple clusters with Ganglia:
http://central6.rocksclusters.org/roll-documentation/ganglia/6.1/x111.html
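On the monitoring host, aggregating several clusters under one Ganglia web front end mostly amounts to adding one data_source line per cluster to gmetad.conf, roughly as sketched below (the grid name, cluster labels, host names and 60-second polling interval are illustrative assumptions; the file location varies by Ganglia/Rocks version):

    gridname "Michigan Tech Research Computing"
    data_source "cluster-a" 60 cluster-a.research.mtu.edu:8649
    data_source "cluster-b" 60 cluster-b.research.mtu.edu:8649

Each listed host must allow the monitoring host to poll its gmond on port 8649.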
Analysis of usage pattern
January 2012
- Low usage
  - 20% on most days
  - 45-50% on the luckiest of days
- Inability and/or unwillingness to share resources
- Lack of resources for researchers in need
- More systems administrative work
- Space, power and cooling costs
- Less time for research, teaching and collaborations
The meeting
January 2012
- VPR, Provost, CIO, CTO, Chair of HPC Committee and yours truly
- Strongly encourage sharing of under-utilized clusters
- End of life for existing individual clusters
- Stop funding new individual clusters
- Acquire one big centrally managed cluster
- Central administration will fully support the new policies
- One-person committees
- No exceptions for anyone
The philosophy
January 2012
Greatest good for the greatest number
- Warren Perger and Gifford Pinchot
Much is said of the questions of this kind, about greatest good for the greatest number. But the greatest number too
often is found to be one. It is never the greatest number in the common meaning of the term that makes the
greatest noise and stir on questions mixed with money …
- John Muir
The philosophy
January 2012
It’s not just a keel and a hull and a deck and sails. That’s what a ship needs but not what a ship is. But what a ship is … what the Black Pearl Superior really is … is freedom.
- Captain Jack Sparrow, Pirates of the Caribbean
Adopted shamelessly from Henry Neeman’s SC11 presentation: Supercomputing in Plain English
Bidding/Acquiring process
February 2012 — May 2013
- $750k for everything
- $675k for hardware + 10% for unexpected expenses
- 5 rounds with 4 vendors (2 local; 2 brand names)
- Local vendor won the bid
February 2013
- Staggered delivery of components
April — May 2013
- Fly-wheel installation
- Load test with building and campus generators
wigner.research
January 2011 — December 2013
- Built with retired nodes from other clusters
- 1 front end
- 2 login nodes
- 1 NAS node (2 TB RAID1 storage)
- 32 compute nodes
- 50+ software suites
- 150+ users
The first version of wigner had just two nodes, 1 front end and 1 compute node, built with retired lab PCs and no switch.
As of Spring 2014, wigner has been retired. The nodes are being used as a testing platform for the upcoming Data Science program at Michigan Tech and to teach building and managing a research computing cluster as part of PH4395: Computer Simulations.
wigner.research
March 2011 — December 2013
- HPC Proving Grounds
- OS installation and customization
- Software compilation and integration with queueing system
- Extensive testing of policies, procedures and user experience
- PH4390, PH4395 and MA5903 students
- Small to medium sized research groups
- Automating systems administration
- Integrating configuration files, logs, etc. with a revision control system (see the sketch below)
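A minimal sketch of that idea, assuming git is available on the front end (keeping /etc itself under version control is one common pattern; tools such as etckeeper automate it):

    cd /etc
    git init
    git add .
    git commit -m "Initial snapshot of /etc"
    # After each administrative change:
    git add -A
    git commit -m "Describe what changed and why"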
rocks.it.mtu.edu
April 2012 — present
- Central Rocks server (x86_64)
- Serves 6.1, 6.0, 5.5, 5.4.3 and 5.4.2
- Saves time during installation
- Facilitates inclusion of cluster-specific rolls
Scripts and procedures were provided by Philip Papadopoulos
Superior
June 2013
- 1 front end
- 2 login nodes
- 1 NAS node: 33 TB usable RAID60 storage space
- 72 CPU compute nodes
- 5 GPU compute nodes
  - 4 NVIDIA Tesla M2090 GPUs (448 CUDA cores)
Compute nodes (CPU and GPU): Intel Sandy Bridge E5-2670, 2.60 GHz, 16 CPU cores and 64 GB RAM
Housed in the newly built Great Lakes Research Center: http://www.mtu.edu/greatlakes/
Superior
June 2013
- 56 Gbps InfiniBand
- Primary research network
- Copper cables
- Gigabit ethernet
- Administrative and secondary research network
- Redundant power supply for every component
With 81 total nodes, there was 33% room for growth before needing to re-design the InfiniBand switch system.
The final cost was $680k; the remaining $70k was used to build a test cluster: portage.research.mtu.edu
Superior
June 2013
- Physical assembly (7 days)
  - Racking, cabling and labeling
- Rocks Cluster Distribution (5 days)
  - OS installation, customization, compliance
  - Software compilation, user accounts
- 3 pilot research groups (14 days)
  - Reward for being good and productive users
  - Help fix bugs, etc.
Superior
June 2013
[Photo/diagram: front end, login nodes, CPU compute nodes, GPU compute nodes, storage node, Ethernet switch system, InfiniBand switch system]
Superior
June 2013
- short.q (compute-0-N; N: 0-7)
  - 24 hour limit on run time
- long.q (compute-0-N; N: 8-81)
  - No limit on run time
- gpu.q (compute-0-N; N: 82-86)
  - No limit on run time
http://superior.research.mtu.edu/available-resources
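A job is directed to one of these queues with Grid Engine's -q option at submission time (myjob.sh is a placeholder script name):

    qsub -q short.q myjob.sh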
Benchmarks: HPL
#   Type          Performance (TFLOPS)   Notes
1   Theoretical   23.96                  --
2   Practical     21.57                  ~90% of #1
3   Measured      21.38                  89.23% of #1
http://netlib.org/benchmark/hpl
Theoretical performance = # of nodes x # of cores per node x Clock frequency (cycles/second) x # of floating point operations per cycle
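As a worked check, assuming 8 double-precision floating point operations per cycle per core (Sandy Bridge with AVX): 72 CPU compute nodes x 16 cores per node x 2.60e9 cycles/second x 8 ≈ 23.96 TFLOPS, matching the theoretical figure in the table.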
Benchmarks: LAMMPS
June 2013
[Chart: total run time (hours) vs. # of nodes (CPU cores) at 2 (32), 4 (64), 6 (96) and 10 (160) nodes, comparing Michigan Tech's Superior with NASA's Pleiades]
Benjamin Jensen (advisor: Dr. Gregory Odegard)
Computational Mechanics and Materials Research Laboratory, Mechanical Engineering-Engineering Mechanics
Results from a simulation involving 1,440 atoms and 500,000 time steps.
Account request
- Résumé
- Proposal
  - Title and abstract
  - User population
  - Preliminary results
  - Nature of data sets
  - Required resources
    - List of software/compilers
    - Scalability
    - Source of funding
Submit completed proposal to:
Dr. Warren Perger
Chair, HPC Committee
wfp@mtu.edu
LaTeX/MS Word template available at http://superior.research.mtu.edu/account-request
Why a proposal?
- A metric for merit
- An easily accessible list of projects
- Know what the facility is being used for
- Intellectual scholarship and computational requirements
- For VPR, CIO, deans, dept. chairs and institutional directors
- A fail-safe opportunity to practice writing proposals seeking allocations in NSF's XSEDE, etc.
http://nsf.gov
http://xsede.org
http://superior.research.mtu.edu/list-of-projects
User population
- Tier A
- New faculty
- Established faculty with funding
- Tier B
- Established faculty with no (immediate) funding
Group members and external collaborators inherit their PI’s tier.
New faculty status is valid for 2 years from the first day of work.
Job submission: qgenscript
One-stop shop for
- Array jobs
- Exclusive node access
- Wait on pending jobs
- Email/SMS notifications
- Wait time statistics
- Command to submit the script
- Job information file
http://superior.research.mtu.edu/job-submission/#batch-submission-scripts
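The generated script is an ordinary Grid Engine batch script. A hand-written equivalent might look like the sketch below (the parallel environment name, program name and email address are placeholders, not Superior's actual settings):

    #!/bin/bash
    #$ -N example_job        # job name
    #$ -q long.q             # queue: short.q, long.q or gpu.q
    #$ -cwd                  # run from the submission directory
    #$ -pe mpich 32          # parallel environment and slot count
    #$ -m abe                # email when the job begins, aborts or ends
    #$ -M username@mtu.edu
    mpirun -np $NSLOTS ./my_program

Such a script is submitted with qsub; qgenscript automates writing these directives and also reports the exact submission command.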
Job scheduling policy
- Users’ priorities are computed periodically
- A weighted function of CPU time and production (illustrated below)
- In effect only when Superior is running at near 100% capacity
- Pre-emption and advance reservation are disabled
- Any job that starts will run to completion
http://superior.research.mtu.edu/job-submission/#scheduling-policy
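One illustrative form such a weighted priority could take (the weights, signs and normalization here are assumptions for illustration, not the actual in-house formula):

    priority(user) = w_production x production(user) - w_usage x cpu_time(user)

where production(user) counts reported publications, cpu_time(user) is the CPU time consumed over the accounting period, and both terms are suitably normalized.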
Email/SMS notifications
http://superior.research.mtu.edu/job-submission/#sms-notifications
Job information file
Running programs on login nodes
- Reduces performance for all users
- First offense
- Terminates the program
- An email notification [cc: user’s advisor]
- Subsequent offenses
- Same as first offense
- Logs the user out and locks down the account
http://superior.research.mtu.edu/job-submission/#running-programs-on-login-nodes
A continued trend will be grounds for removal of the user's account.
Disk usage
- Data is not backed up
- Limits per user
  - /home/john - 25 MB
  - /research/john - decided on a per-proposal basis
- When a user exceeds the limit (see the sketch below)
  - 12 reminders at 6-hour intervals [cc: user's advisor]
  - 13th reminder logs out the user and locks down the account
http://superior.research.mtu.edu/job-submission/#disk-usage
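A minimal sketch of how such a check could be automated from cron (the 25 MB limit mirrors the /home quota above; the paths, schedule and mail address format are assumptions):

    #!/bin/bash
    LIMIT_MB=25
    for dir in /home/*; do
        user=$(basename "$dir")
        used_mb=$(du -sm "$dir" | cut -f1)
        if [ "$used_mb" -gt "$LIMIT_MB" ]; then
            echo "$dir uses ${used_mb} MB (limit: ${LIMIT_MB} MB)" \
                | mail -s "Superior disk usage warning" "${user}@mtu.edu"
        fi
    done

In practice the production version also copies the advisor and, after the 13th reminder, locks the account.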
Useful commands
- qgenscript
- qresources
- qlist
- qnodes-map
- qnodes-active | qnodes-idle
- qwaittime
- qstatus | quser | qgroup
- qnodes-in-job
- qjobs-in-node
- qjobs-in-active-nodes
- qjobinfo | qjobcount
- qusage
Developed at Michigan Tech
http://superior.research.mtu.edu/job-submission/#useful-commands
Usage reports
All PIs and Chair of HPC Committee receive a weekly report.
VPR, CIO, deans, department chairs and institutional directors receive quarterly and annual reports (or when necessary).
Usage reports
July 2013 — December 2013
- 21 projects (10 Tier A + 11 Tier B)
- 100 users
- 9 publications
- 75+% busy on most days
- $325k worth of usage (~50% of the initial investment)
Cost recovery model: $0.10 per CPU-core per hour
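Under that model, $325k corresponds to roughly 3.25 million CPU-core hours delivered over the six-month period.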
Metrics
Cannot manage what cannot be measured
Not everything that’s (easily) measurable is (really) meaningful
Not everything that’s (really) meaningful is (easily) measurable
Metrics
- Move towards a merit-based system
- Easily measurable quantities
- Who users are
- # of CPUs and total CPU time
- Really meaningful entities
- Publications
- Type (poster, conference proceeding, journal) and impact factor
- Citations
Metrics: job priority
The system already knows who the users are.
An in-house algorithm computes users' priorities.
Publications are reported to:
Dr. Warren Perger
Chair, HPC Committee
wfp@mtu.edu
Metrics
http://superior.research.mtu.edu/usage-reports
Interactive visualizations are built using the Highcharts framework
Metrics: global impact
[Figure legend: Michigan Tech original; Journal Article; Book Chapter; Conference Proceeding; MS Thesis; PhD Dissertation]
http://superior.research.mtu.edu/list-of-publications
Further consolidation
August 2013 — December 2013
- Move all clusters to Great Lakes Research Center
- Upgrade to Rocks 6.1 and add a login node
- Retire individual clusters when possible
- 16 compute nodes and 1 NAS node added to Superior
- portage.research.mtu.edu
- Segue to Superior
- 1 front end, 1 login node, 1 NAS node and 6 compute nodes
- Testing, course work projects and beginner research groups
An as-is snapshot
January 2014
- 1 big, 1 mini (central) and 3 individual clusters
- 1 data center with .research.mtu.edu network
- Rocks 6.1
- Identical software configurations
- Automated systems administration and maintenance
- Extensive end user training
- Complete documentation
Immersive Visualization Studio (IVS) is powered by a Rocks 5.4.2 cluster and has 24 HD screens (46” 240 Hz LED) working in unison to create a 160 sq. feet display wall.
Status updates: @MTUHPCStatus (Twitter)
Immediate future
February 2014 and beyond
- More tools to enhance user experience
- Videos for self-paced learning of command-line Linux
- Encourage GPU computing
- Expand storage
- Provide backup
- Re-design InfiniBand switch system (216 nodes)
- Plan for expanded (or new) Superior
Thanks be to
Philip Papadopoulos and Luca Clementi (UCSD and SDSC)
Timothy Carlson (PNL)
Reuti (Philipps-Universität Marburg)
Alexander Chekholko (Stanford University)
Rocks, Grid Engine and Ganglia mailing lists
Henry Neeman (University of Oklahoma)
Steven Gordon (The Ohio State University)
Gergana Slavova, Walter Shands and Michael Tucker (Intel)
Gaurav Sharma and Scott Benway (MathWorks)
Adam DeConinck (NVIDIA)