The UCL Condor Pool Experience: John Brodholt, Paul Wilson, Wolfgang Emmerich

Environment from the Molecular Level
A NERC eScience testbed project
The UCL Condor Pool Experience
John Brodholt (1), Paul Wilson (3), Wolfgang Emmerich (2) and
Clovis Chapman (2).
1. Department of Earth Sciences, University College London, Gower Street,
London WC1E 6BT, UK
2. Department of Computer Science, University College London, Gower Street,
London WC1E 6BT, UK
3. Anvil Software, London, UK.
The UCL Condor Pool
Approximately 946 Windows machines (yesterday)
1 to 2.4 GHz Intel processors
256 to 512 MB of memory (a few have more)
They are in “open access” student cluster rooms
PCs are all thin client “WTS” machines with network bootable operating
systems. (Citrix/Bpbatch - hit spacebar to upload a new operating system
image)
The pool is very simple – one manager, one submit machine (via ssh).
Q. Why was it someone from an Earth Science Dept. who got
it going?
1. Because three years ago, the eScience grants call made
me look up “the Grid” on the web and by chance I came
across the Condor web site.
2. I also happened to know how Information Systems at UCL
managed their student PCs.
3. I persuaded the Director of UCL’s Education and Information
Systems Division that I could put it in our eMinerals grant (I
think he assumed it wouldn’t get funded).
Key Political Issues
Even though the Director of EISD had agreed for us to put it in the grant,
we had to convince Information Systems themselves.
Numerous meetings ….
IS produced a five page document outlining what they thought their
policy on a large Condor cluster would be – i.e. the primary purpose of
the student cluster rooms must not be compromised. Nor should IS staff
use their time on the project … etc.
Needed testing (one cluster, then one image type).
Perhaps the key moment was when UCL presented its eScience
projects to Tony Hey and the UCL Provost.
Timescale
Desktop - June 2002 (2 nodes)
Earth Science Student Cluster Room - Oct 2002 (18 nodes)
Physics Department (one WTS image) – Jan 2003 (150 nodes)
Campus – October 16th 2003 (930 nodes)
1 millionth hour of CPU – April 2nd, 2004
This matched exactly the timescale we outlined in the eMinerals grant
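A rough sense of scale for the millionth CPU hour can be had from the dates above. The sketch below assumes, purely for illustration, that all 930 campus nodes were available around the clock from the campus launch to the millionth-hour date; the slides do not state this, and the one million hours presumably also include time accumulated on the smaller earlier pools, so the resulting fraction is only a ceiling on the campus-phase share.

```python
from datetime import date

# Dates and node count taken from the timescale above.
campus_launch = date(2003, 10, 16)
millionth_hour = date(2004, 4, 2)
campus_nodes = 930

# Assumption (not from the slides): every node is available 24 hours a day.
days = (millionth_hour - campus_launch).days
available_machine_hours = days * 24 * campus_nodes

# One million delivered CPU hours as a fraction of that idealised capacity.
fraction = 1_000_000 / available_machine_hours

print(f"{days} days, {available_machine_hours} machine-hours, fraction {fraction:.3f}")
```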
Other Issues
Difficult to persuade the scientists to get involved for just a few
machines.
Some needed to compile their codes for Windows machines – “It’s
simple, just convert them to Java ..” Wolfgang Emmerich, 2002!
Our central manager died a few times when a user submitted a few
thousand jobs all at the same time (took 24 hours to repair disk with
fsck). Now have a manager and a submit machine.
Students will do anything to reserve a machine – steal the mouse, put
out of order signs on them, and UNPLUG them. Also, IS themselves
briefly turn off machines in some clusters in order to clear the room.
This restricts the length of jobs.
UCL Condor job time fluctuations.
Dashed line shows 5 hr recommended maximum job time.
[Figure: bar chart of average job times in hours (y-axis 0 to 18) by month, Oct 2003 to Apr 2004. Bar labels as extracted: 15.26, 13.28, 9.73, 6.73, 4.93, 2.23, 2.34; the assignment of values to months is not recoverable.]
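One way to stay under the 5 hr recommended maximum is to group many short tasks into jobs by estimated run time. The following is a minimal sketch, not the scheme used at UCL: `split_into_jobs` and the per-task hour estimates are hypothetical, and tasks are simply packed in order until the next one would exceed the limit.

```python
from typing import List


def split_into_jobs(task_hours: List[float], max_job_hours: float = 5.0) -> List[List[int]]:
    """Group task indices, in order, into jobs whose estimated time fits the limit."""
    jobs: List[List[int]] = []
    current: List[int] = []
    current_hours = 0.0
    for index, hours in enumerate(task_hours):
        if hours > max_job_hours:
            # A single task longer than the limit cannot be packed at all.
            raise ValueError(f"task {index} alone exceeds {max_job_hours} h")
        if current and current_hours + hours > max_job_hours:
            # Close the current job and start a new one with this task.
            jobs.append(current)
            current, current_hours = [], 0.0
        current.append(index)
        current_hours += hours
    if current:
        jobs.append(current)
    return jobs


# Hypothetical run-time estimates, in hours, for six tasks.
batches = split_into_jobs([2.0, 2.5, 1.0, 4.0, 0.5, 0.5])
print(batches)
```

Shorter jobs lose less work when a student unplugs a machine or a cluster room is cleared, at the cost of more submissions to manage.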
Spikes in user demand:
a) Not many users
b) Most are using simple schemes to produce lots of initial input files and send them off to the pool. They get results back and spend a long time processing them, extracting data and planning the next set of inputs.
Drip feeding and interactive steering of a Condor pool using relational databases
Dewi Lewis, Rosie Coates and Sam French
UCL Chemistry / RI
[Diagram. Box labels as extracted: Existing e-science technology (Portal; Distributed Computing; Distributed resources, Condor pools etc.); User Input: Structural model (Si/Al, cation types, [H2O] etc.); User Input: Diffraction data, chemical analysis, building units; Model/Configuration Generator; Jobs Database; Steering Database; Improve generation / model strategy; Analysis (geometry, energy, fit); Analysis Database.]
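The drip-feeding idea can be illustrated with a small throttle: hold the full set of inputs locally and release jobs only while the queue is below a cap, rather than submitting thousands at once. This is a sketch under stated assumptions, not the authors' database-driven system. `drip_feed`, `count_queued`, `submit` and the cap of 200 are all hypothetical, and the pool queue is replaced by an in-memory list so the example runs on its own.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List


def drip_feed(
    pending: Iterable[str],
    count_queued: Callable[[], int],
    submit: Callable[[str], None],
    max_queued: int = 200,
) -> List[str]:
    """Submit jobs only while fewer than `max_queued` are queued.

    Returns the jobs held back for a later round.
    """
    backlog: Deque[str] = deque(pending)
    free_slots = max(0, max_queued - count_queued())
    for _ in range(min(free_slots, len(backlog))):
        submit(backlog.popleft())
    return list(backlog)


# In-memory stand-in for the pool queue (hypothetical, for illustration only).
queue: List[str] = []
jobs = [f"config_{i:04d}" for i in range(3000)]

# First round: only 200 of the 3000 jobs reach the queue.
held_back = drip_feed(jobs, lambda: len(queue), queue.append, max_queued=200)
print(len(queue), len(held_back))
```

In practice the function would be called again each time jobs finish, passing `held_back` as the new `pending` list, so the submit machine never sees more than the cap at once.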
THE Science.
1. Simulation of pollutants in the environment
Binding of heavy metals and organic molecules in soils.
2. Studies of materials for long-term nuclear waste encapsulation
Radioactive waste leaching through ceramic storage media.
3. Studies of weathering and scaling
Mineral/water interface simulations, e.g. oil well scaling.
also
4. The Earth’s core and mantle
Many codes:
DL-POLY, GULP, METADISE, CRYSTAL, CASTEP, SIESTA, …
Now what?
Expand the pool to include staff WTS machines: ~1500 machines (received a 3-page email from IS: who owns them?).
UCL staff machines at hospitals: ~???? machines.
Federate with other pools: hopefully this will make it more
flexible and smooth spikes in demand.