Outsourcing University Services
Future of National Computer Grid Services in the UK
Dr Rhys Newman
University of Oxford
NeSC 22nd Feb 2007
My Background…
 Academic researching Computer Grid Technology (more on this later)
 I work in the physics department although I am a software engineer by trade
 Spent 6 months project-leading a small-scale computer room build project
for Oxford Physics
 Spent 2 further years on the committee overseeing a computer room build
for the expansion of the Oxford Supercomputer
 Spent most of last year campaigning for outsourcing CPU provision in
light of these experiences
 Am a director of a University spin-out company which aims to bring grid
computing technology to market
 This experience has led me to examine the economic argument for grid
computing (at least in connection with CPU usage) in detail, comparing it in
particular with outsourcing CPU time and with building your own computing facility
 I therefore feel well qualified to comment on the
issue of “Outsourcing your CPU”
Current Status in the UK
[Chart: UK Supercomputing Performance, relative power as percentage in UK, 2000 to 2006]
 The UK maintains about 7% of the top 500 supercomputer power (6% in 2006), even though the total CPU power has increased 35x since 2000
 To get into the top 500 you’ll need about 1000 processor cores at 2.4GHz or better
 A cluster similar to Cambridge’s recent supercomputer, but equivalent to the #1 supercomputer in GFlops:
 18,000 Dual Core Xeon machines
 Cost over £30 million to buy (computers only, at £2500 each retail)
 Consume over 15MW and cost £8 million in electricity to run per year
 Would provide approximately 1 billion GHzHrs per year
 Would need a machine room the size of a football pitch
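The figures for the #1-equivalent cluster can be sanity-checked with a few multiplications. The sketch below uses the slide's machine count, £2500 unit price, 15MW draw and £8 million electricity bill; the 3.0GHz clock speed per core and round-the-clock use are assumptions, since the slide does not state them.

```python
# Back-of-envelope check of the "#1-equivalent" cluster figures.
# Assumptions (not from the slide): every dual-core Xeon runs at 3.0 GHz
# and the cluster is busy every hour of the year.

HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years

machines = 18_000
cores_per_machine = 2
clock_ghz = 3.0               # assumed clock speed per core
price_per_machine = 2_500     # GBP retail, from the slide
power_mw = 15                 # from the slide
electricity_bill = 8_000_000  # GBP per year, from the slide

purchase_cost = machines * price_per_machine
ghz_hours_per_year = machines * cores_per_machine * clock_ghz * HOURS_PER_YEAR
energy_kwh = power_mw * 1_000 * HOURS_PER_YEAR
implied_tariff_pence = electricity_bill * 100 / energy_kwh

print(f"Purchase cost:       £{purchase_cost / 1e6:.0f} million")
print(f"GHzHrs per year:     {ghz_hours_per_year / 1e9:.2f} billion")
print(f"Implied electricity: {implied_tariff_pence:.1f}p per kWh")
```

With these inputs the purchase cost comes to £45 million (so "over £30 million" holds), the cluster delivers about 0.95 billion GHzHrs a year, and the electricity bill implies a tariff of roughly 6.1p per kWh.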
What do Academic Users want?
 More computing!!!
 Typically x86 Linux based
 Power processors appearing artificially in 2006 due to the Blue Gene upgrade
[Chart: Percentage x86 Architecture, 2000 to 2006]
 Clusters with raw CPU “grunt” rather than special hardware or interconnect
 Important for cost
 InfiniBand appearing before 10Gbit Ethernet?
[Chart: Interconnect Family, percentage by year including Ethernet and Myrinet, 2000 to 2006]
Correlation of GFlops and GHzHrs
 Clock speed (GHzHrs) correlates to performance within machine families:
[Charts: SpecInt2000 vs GHz in 2004, 2005 and 2006, with series 1x1, 1x2, 2x1 and 2x2]
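GHzHrs, as used throughout these slides, appears to mean total core count × clock speed in GHz × hours of use; that reading reproduces the later estimate of 3.5 billion GHzHrs for 200000 machines at 2GHz over a year. The sketch below assumes this definition (which counts every core equally, something the SpecInt2000 charts support only within a machine family) and computes GHzHrs per hour for a few configurations from the Dell price list. The `Machine` class is illustrative, not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    sockets: int
    cores_per_socket: int
    clock_ghz: float

    @property
    def cores(self) -> int:
        return self.sockets * self.cores_per_socket

    def ghz_hours(self, hours: float) -> float:
        """GHzHrs delivered in `hours` of use: cores x clock (GHz) x hours."""
        return self.cores * self.clock_ghz * hours


machines = [
    Machine("Dual Quad Core Xeon 5355", sockets=2, cores_per_socket=4, clock_ghz=2.66),
    Machine("Dual Dual Core Xeon 5130", sockets=2, cores_per_socket=2, clock_ghz=2.0),
    Machine("Dual Core AMD Opteron 2.4GHz", sockets=1, cores_per_socket=2, clock_ghz=2.4),
]

for m in machines:
    print(f"{m.name}: {m.cores} cores, {m.ghz_hours(1):.2f} GHzHrs per hour")
```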
To GHzHr or not to GHzHr?
 Despite the flaws in this measurement of power,
observe the following prices on Dell.co.uk:
[Chart: Cost (£ per GHzHr), scale 0 to 0.02, for these configurations]
Dual Quad Core Xeon 5355 2.66GHz (8 cores)
Dual Dual Core Xeon 5130 2GHz (4 cores)
Dual Dual Core Xeon 5050 3GHz (4 cores)
Quad Core Xeon 5355 2.66GHz (4 cores)
Dual Core Xeon 5050 3GHz (2 cores)
Dual Dual Core AMD Opteron 2.8GHz (4 cores)
Dual Core AMD Opteron 2.8GHz (2 cores)
Dual Core AMD Opteron 2.4GHz (2 cores)
Dual Core AMD Opteron 2GHz (2 cores)
 The average is 1.36p per GHzHr (2.53p if you use Hire Purchase).
 These values were 1.15p per GHzHr 6 months ago (1.39p on HP)
 This suggests de facto acceptance of GHzHrs as a basis for pricing
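Quoting a purchase as a price per GHzHr requires spreading the price over the machine's working life. The slide does not say what lifetime was assumed, so the sketch below takes 3 years of round-the-clock use as a labelled assumption; the £2500 price reuses the retail figure from the Current Status slide for a hypothetical 4-core 3GHz server.

```python
HOURS_PER_YEAR = 24 * 365


def pence_per_ghz_hour(price_gbp, cores, clock_ghz, years=3.0, utilisation=1.0):
    """Purchase price spread over the GHzHrs the machine delivers in its life.

    `years` and `utilisation` are assumptions, not figures from the slides.
    """
    delivered = cores * clock_ghz * HOURS_PER_YEAR * years * utilisation
    return price_gbp * 100 / delivered


# Hypothetical quote: a £2500 server with two dual-core 3GHz CPUs.
print(f"{pence_per_ghz_hour(2_500, cores=4, clock_ghz=3.0):.2f}p per GHzHr")
```

Under these assumptions the example gives about 0.79p per GHzHr; a shorter assumed lifetime or lower utilisation pushes the figure up proportionally.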
How to get the most GHzHrs per £
1. Buy your own computers, build your
room and run a computing facility
2. Rent computers and hosting from
external provider
3. Use grid computing to extract value from
existing machines
Option 1: Build your own Facility
 Advantages
 You get good PR when it opens
 You can get exactly the equipment you
want… well, almost
 Disadvantages
 The project risks of building and
commissioning such facilities are
surprisingly large
 No flexibility – run at 100% all the time or
waste the investment
 All the responsibility: uptime, hardware
failures, hardware refresh, being
everything to everyone!
 Real cost
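The "no flexibility" point can be made concrete: a facility's costs are largely fixed, so when it runs below capacity the same spend is spread over fewer used GHzHrs. The sketch below treats the whole rate as fixed (which overstates the effect somewhat, since idle machines draw less electricity) and uses 1.3p, within the "bare bones" range of 1.27p to 1.43p quoted for a computer room, as a hypothetical full-load rate.

```python
def effective_rate(full_load_rate_pence: float, utilisation: float) -> float:
    """Cost per GHzHr actually used when a fixed-cost facility is partly idle.

    Fixed costs are paid regardless of load, so they are spread over
    fewer used hours as utilisation falls.
    """
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    return full_load_rate_pence / utilisation


for u in (1.0, 0.75, 0.5, 0.25):
    print(f"{u:>4.0%} busy -> {effective_rate(1.3, u):.2f}p per used GHzHr")
```

Halving utilisation doubles the effective cost (1.3p becomes 2.6p per used GHzHr), which is why an owned facility must run flat out to stay competitive.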
The costs of a computer room
 Cost of computer hardware is 1.0p per GHzHr
 However, a “bare bones” facility calculation for a 1000-node dual-CPU
cluster shows:
£1.3 million running costs per year (£500k in electricity alone)
GHzHr rate of 1.27p to 1.43p
£4.6 to £6.3 million startup costs
This build has no UPS or other “high availability” features
 This is the cost universities should be able to pass on to
internal users
 However, inherent inefficiencies in the internal process mean this
rapidly rises above 5p per GHzHr, if the university can find
the initial capital in the first place!
 Anecdotally, one institution believes it can charge 10p per
GHzHr and expect:
 Their academics to pay it
 The research councils to accept the charge on the FEC project sheet
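The "bare bones" GHzHr rate combines annual running costs with the startup outlay spread over the facility's life. The slide gives the costs but not the amortisation period or the node specification, so the sketch below assumes straight-line amortisation over 5 years with no financing costs, and a hypothetical 1000-node cluster of two dual-core 3.0GHz CPUs per node running flat out. Because those inputs are guesses, its output should not be read as a check on the slide's 1.27p to 1.43p; the point is the shape of the calculation.

```python
HOURS_PER_YEAR = 24 * 365


def facility_rate_pence(running_per_year, startup, years, ghz_hours_per_year):
    """Annualised facility cost per GHzHr.

    Running costs plus the startup outlay spread evenly (straight-line,
    no interest) over `years` of operation.
    """
    annual_cost = running_per_year + startup / years
    return annual_cost * 100 / ghz_hours_per_year


# Hypothetical capacity: 1000 nodes x 2 sockets x 2 cores x 3.0 GHz, always busy.
capacity = 1000 * 2 * 2 * 3.0 * HOURS_PER_YEAR

for startup in (4.6e6, 6.3e6):
    rate = facility_rate_pence(1.3e6, startup, years=5, ghz_hours_per_year=capacity)
    print(f"startup £{startup / 1e6:.1f}m -> {rate:.2f}p per GHzHr")
```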
The chain of events…
 More and more research areas need substantial computing resources – more than any
department can contemplate
 The University steps in to provide a central computing facility – more efficient on the
surface
 You now need a computer room built to modern spec (as you’ll need 1000 CPUs)
 Almost always requires a substantial building project
 Building projects are notoriously late and over budget (a typical fully costed build ends up at
double the initial quote)
 Every department is now at the mercy of this central project’s progress, and the project
becomes politically riskier to control as costs overrun and deadlines are missed
 The computing facility comes online and has to recover much larger costs than anticipated
from departments (and their research projects), which is almost inevitable now we have FEC
 A late project has had an academic opportunity cost which is difficult to quantify (but
almost certainly has cost research grants), and the overpricing needed has made it
unattractive to use
 University steps in to force use (User charge bumped up with general fund support)
 Nobody is happy, costs are high and research has suffered
 Even worse: certain “high spec” computing users may have persuaded the University to
spend extra money on hardware that only they need, so central funds end up
sponsoring a particular group’s work at the expense of other uses
Option 2: Outsource CPU Resources
 Advantages
 You can get the best value from a load
of suppliers (competition is fierce)
 You can often get your resources online
in less than a week
 You carry no building-project risk,
and the hardware maintenance,
infrastructure resilience and operational
hassle are no longer yours
 You are flexible – grow and shrink as
necessary
 Disadvantages
 You don’t get as large a choice of
hardware
Price Comparison for GHzHrs
 Some prices from the web for dedicated hosting
 5p was the norm 6 months ago
 Worthy mention: www.VCompute.com
 5p per GHzHr, but in a conventional cluster arrangement with 8GB
RAM on each node; happy to supply over 10000 nodes
 Reasons for Variation
 Different RAM
 Pentium/XEON/AMD
 Bandwidth restrictions
 HDD size
 Additional services
[Chart: Pence per GHzHr (scale 0 to 6p) for 16 hosting offers]
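Hosting offers are usually quoted as a flat monthly fee per server, so putting them on a pence-per-GHzHr scale means dividing by the GHzHrs the server can deliver in a month. The sketch assumes an average month of 730 hours and continuous use; the £150/month dual-core 2.4GHz offer is invented for illustration. As the slide notes, this metric ignores RAM, bandwidth, disk and additional services, which account for much of the spread between providers.

```python
HOURS_PER_MONTH = 24 * 365 / 12  # average month, 730 hours


def hosting_pence_per_ghz_hour(monthly_fee_gbp, cores, clock_ghz):
    """Convert a flat monthly dedicated-server fee to pence per GHzHr,
    assuming the server is used around the clock."""
    ghz_hours_per_month = cores * clock_ghz * HOURS_PER_MONTH
    return monthly_fee_gbp * 100 / ghz_hours_per_month


# Hypothetical offer: £150/month for a dual-core 2.4GHz server.
print(f"{hosting_pence_per_ghz_hour(150, cores=2, clock_ghz=2.4):.2f}p per GHzHr")
```

For this hypothetical offer the result is about 4.28p per GHzHr, inside the range shown on the chart.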
Option 3: Use Existing Resources Better
 Grid Technology can enable the thousands of
machines in an institution to be utilised much
more effectively
This is a real resource which is going to waste:
estimated £100 billion globally per annum.
 Office machines can be used to soak up the
more conventional computing tasks leaving only
specialist tasks for special machines
Specialist machines can be smaller and come back
into the departmental remit – where they belong!
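The division of labour described above amounts to a simple routing rule. The sketch below is hypothetical (it is not how Nereus or any particular middleware schedules work): jobs that need a low-latency interconnect, or more memory than an office PC can spare, go to specialist machines, and everything else goes to the desktop pool. The `Job` fields and the 1GB threshold are illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical threshold: spare RAM an office PC can lend to a grid job.
DESKTOP_MEMORY_LIMIT_GB = 1.0


@dataclass
class Job:
    name: str
    tightly_coupled: bool  # needs a low-latency interconnect between nodes
    memory_gb: float


def choose_pool(job: Job) -> str:
    """Route conventional, independent work to idle office machines and
    keep specialist work for dedicated (departmental) hardware."""
    if job.tightly_coupled or job.memory_gb > DESKTOP_MEMORY_LIMIT_GB:
        return "specialist"
    return "desktop-grid"


jobs = [
    Job("parameter sweep", tightly_coupled=False, memory_gb=0.5),
    Job("coupled fluid simulation", tightly_coupled=True, memory_gb=4.0),
    Job("large in-memory analysis", tightly_coupled=False, memory_gb=8.0),
]

for job in jobs:
    print(f"{job.name}: {choose_pool(job)}")
```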
Potential locked away…
 How many computers in Oxford?
 Oxford University has 5000 staff
 50000 registered IP addresses
 Suggests 10000 modern machines available
 How many in the UK academic sector?
 168 institutions employing 160000 staff
 Assume 200000 “decent” machines (2GHz or better) available and
connected to the LAN
 3.5 billion GHzHrs total, 2.6 billion outside office hours
 Total incremental cost of this is dominated by the extra
electricity: £50000 per year per institution
 Equivalent to 0.3p per GHzHr
 For a UK-wide cost of about £8m/year, we could have the equivalent of 2 Darwin
machines
 Equivalent to #13 in the top 500
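The desktop-grid figures follow from a few multiplications. The sketch below assumes each “decent” machine contributes one 2GHz core and that office hours are a 40-hour week for 52 weeks; both are assumptions, not stated on the slide. The 200000 machines, 168 institutions and £50000 per institution come from the slide.

```python
HOURS_PER_YEAR = 24 * 365        # 8760
OFFICE_HOURS_PER_YEAR = 40 * 52  # assumption: 40-hour week, no holidays

machines = 200_000          # "decent" UK academic machines (from the slide)
clock_ghz = 2.0             # the slide's 2GHz floor, assuming one core each
institutions = 168
electricity_per_institution = 50_000  # GBP per year (from the slide)

total = machines * clock_ghz * HOURS_PER_YEAR
after_hours = machines * clock_ghz * (HOURS_PER_YEAR - OFFICE_HOURS_PER_YEAR)
uk_cost = institutions * electricity_per_institution
pence_per_ghz_hour = uk_cost * 100 / after_hours

print(f"Total:         {total / 1e9:.2f} billion GHzHrs")
print(f"Outside hours: {after_hours / 1e9:.2f} billion GHzHrs")
print(f"UK-wide cost:  £{uk_cost / 1e6:.1f}m per year")
print(f"Rate:          {pence_per_ghz_hour:.2f}p per GHzHr")
```

These inputs give 3.5 billion GHzHrs in total, about 2.67 billion outside a 40-hour week (close to the slide's 2.6 billion), a UK-wide cost of £8.4m per year and about 0.31p per GHzHr, in line with the slide's 0.3p.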
Grid Technology: Nereus
 Any proposed technology which attempts to exploit these
idle machines must
 Support Windows primarily (90% of all computers run Windows,
not Linux)
 Not require admin privileges to run
 Be bulletproof to protect users and owners from each other (and
limit support calls)
 Be simple and easy to install
 My particular interest and project: Nereus
 In development for 2 years, currently in beta
 Testing phase set to begin within weeks on many thousands of
machines (ironically not in academia and not in the UK!)
 Solves the above issues in a way not addressed by any current
grid middleware
Recommendations
 Do not…
 Build any more computer rooms at an institution level
 Waste money on large special hardware
 Wait any longer to catch up with the rest of the world in computing
resources
 Do…
 Outsource computing resources to specialist providers
 Soak up the existing resources in institutions using grid technology
 Let special projects buy their special hardware for their own use as
before
 Finally, a request for a “composite” National Grid Service:
 We can build an academic grid using Nereus which pools the idle
time of all UK institutions – a resource of global capability
 Can the NGS supply conventional clusters and also manage a
desktop grid deployment, to ensure the right users get the best
resources per £?
 Can anyone suggest a means to fund the desktop grid part in the
UK? A small outlay will have massive benefits