A Multidisciplinary Computer Centre
… is it possible?
John Gordon
CCLRC eSC
CHEP March 2003
The Problem?
A UK colleague, quoted a few years ago when
Linux for physics was just becoming common:
“We have four Linux systems: one for users to
login, one for CERN Linux, one for DESY Linux,
one for Fermilab linux. And I think we will need
one for BaBar Linux soon”
• Things have changed, but by how much?
• Many of the talks in this session describe
implementing a solution for one experiment,
but the staff requirements of such a solution
scale with the number of experiments supported,
and the fragmentation of resources is
inefficient.
• Can we run a single centre for everyone?
John Gordon
eScience Centre
LHC Hierarchical Model
[Figure: the LHC hierarchical (tiered) computing model, from lcg.web.cern.ch/lcg]
• Online System: ~PBytes/sec from the detector. There is a “bunch crossing” every 25 nsecs; there are 100 “triggers” per second, and each triggered event is ~1 MByte in size.
• Offline Processor Farm: ~20 TIPS, fed at ~100 MBytes/sec. 1 TIPS is approximately 25,000 SpecInt95 equivalents.
• Tier 0: CERN Computer Centre, fed at ~100 MBytes/sec.
• Tier 1: Regional Centres (US, Germany, UK, Italy), linked at ~622 Mbits/sec or Air Freight (deprecated).
• Tier 2: Tier2 Centres of ~1 TIPS each (ScotGrid, NorthGrid, London), linked at ~622 Mbits/sec.
• Institutes: ~0.25 TIPS each with a physics data cache, linked at ~622 Mbits/sec.
• Tier 4: Physicist workstations, at ~1 MBytes/sec.
Physicists work on analysis “channels”. Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server.
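The numbers on the diagram hang together arithmetically. As a rough check, the short Python sketch below re-derives them from the values quoted on the slide (25 ns crossings, 100 triggers/s, ~1 MByte events, 1 TIPS ≈ 25,000 SpecInt95); the variable names are illustrative, and the 622 Mbit/s conversion assumes 8 bits per byte with no protocol overhead.

```python
# Back-of-envelope check of the rates quoted on the LHC hierarchical model slide.
BUNCH_SPACING_NS = 25          # a bunch crossing every 25 nsecs
TRIGGERS_PER_S = 100           # ~100 triggered events kept per second
EVENT_SIZE_MBYTES = 1          # each triggered event is ~1 MByte
SPECINT95_PER_TIPS = 25_000    # 1 TIPS is approximately 25,000 SpecInt95
WAN_LINK_MBITS = 622           # ~622 Mbits/sec links between tiers
FARM_TIPS = 20                 # ~20 TIPS offline processor farm

crossing_rate_hz = 1_000_000_000 // BUNCH_SPACING_NS      # crossings per second
rejection_factor = crossing_rate_hz // TRIGGERS_PER_S     # crossings per kept event
online_output_mbytes_s = TRIGGERS_PER_S * EVENT_SIZE_MBYTES
wan_link_mbytes_s = WAN_LINK_MBITS / 8                    # 8 bits per byte, no overhead
farm_specint95 = FARM_TIPS * SPECINT95_PER_TIPS

print(f"bunch crossings per second: {crossing_rate_hz:,}")
print(f"crossings per triggered event: {rejection_factor:,}")
print(f"data out of the online system: ~{online_output_mbytes_s} MBytes/sec")
print(f"a 622 Mbit/s link carries ~{wan_link_mbytes_s} MBytes/sec")
print(f"offline farm: ~{farm_specint95:,} SpecInt95")
```

The ~100 MBytes/sec leaving the online system is simply 100 triggers/s × 1 MByte, and the 25 ns spacing means only one crossing in 400,000 is kept.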
Sitting in the Centre
[Figure: the site in the centre, surrounded by LCG, the running US experiments and the future LHC experiments]
A site like ours sits between many experiments and grids.
The multi-experiment centre
• So what does a big centre look like these
days?
• A big linux cluster and lots of disk?
• Many types of hardware
• All flavours of Unix (and still VMS!!)
• All uses from desktop to supercomputer
• Different disks (SCSI, IDE, RAID, SAN)
• Different tapes
• Different user communities
The multi-experiment centre
• Unlikely to be able to run a centre for
all disciplines if we cannot even run
one for all HEP experiments
• This talk focuses on the problems of
supporting many different HEP
experiments
Not a problem
• Lots of hardware problems, but the
same ones that big and small centres have
• Lots of anecdotes about hardware
problems, but sharing between
experiments hasn’t been an issue
recently.
– Apart from Suns for BaBar
– and we backed away from AMD once
because an experiment wouldn’t accept
them.
The problems
• Software levels
• ‘experts’
• Local rules
• Security
• Firewalls
• The accelerator centres
Software Levels
• Experiment A must upgrade the OS (or
compiler, etc.); Experiment B cannot.
• Linux brings more hardware dependencies
– Experiment A needs one kernel, but the Fibre Channel
driver is only available for another
• Now we have middleware too!!
– Experiments can disagree over middleware and
OS.
– And the middleware might not match the OS
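The clash can be stated as a set intersection. This is a minimal Python sketch, assuming each experiment can list the OS releases it will accept; the experiment and release names are hypothetical labels. An empty intersection means no single installation satisfies everyone, and the cluster has to be split.

```python
from __future__ import annotations


def common_releases(requirements: dict[str, set[str]]) -> set[str]:
    """Return the OS releases that every experiment on a shared cluster accepts.

    An empty result means no single installation can satisfy everyone.
    """
    if not requirements:
        return set()
    return set.intersection(*requirements.values())


before = {
    "ExperimentA": {"release-7", "release-8"},
    "ExperimentB": {"release-6", "release-7"},
}
after = {**before, "ExperimentA": {"release-8"}}  # A is forced to upgrade

print(common_releases(before))  # {'release-7'}
print(common_releases(after))   # set()
```

The same check applies to compilers, kernels and middleware versions; every extra dimension makes an empty intersection more likely.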
‘experts’
• A 200GB disk costs $100 at Best Buy
• Therefore 100TB should cost $50K
• If you pay more, you are profligate
and are wasting HEP funds!!!
• … and you should probably be able to
negotiate a further discount for bulk
purchase!
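The arithmetic behind the claim does check out; what it leaves out is everything except the bare drives. A small Python sketch of the ‘expert’ estimate, using only the figures on the slide (decimal units assumed, as disk vendors quote them):

```python
# The 'expert' estimate: price the bare drives and nothing else.
DISK_CAPACITY_GB = 200
DISK_PRICE_USD = 100
TARGET_CAPACITY_TB = 100

disks_needed = TARGET_CAPACITY_TB * 1000 // DISK_CAPACITY_GB  # decimal TB and GB
naive_cost_usd = disks_needed * DISK_PRICE_USD

print(f"{disks_needed} disks x ${DISK_PRICE_USD} = ${naive_cost_usd:,}")
# Not counted: servers, controllers, redundancy, spares,
# or the staff needed to run it all 24x365.
```

This prints `500 disks x $100 = $50,000`: 500 consumer drives is a hardware count, not a service.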
Local Rules
• A responsible site probably has a policy for
who can use its resources, with forms,
acceptable use conditions and other
safeguards.
• Most countries have legal obligations to
trace users in case of law-breaking.
• Do we really want them to throw these
away for the grid?
• Even if we wanted to, only a purely HEP lab
could overrule these rules itself
– and even such labs usually have masters (DoE……)
Security – Why Do We Care?
• Illegal use of resources (stolen software,
child pornography ..)
• Base for high bandwidth attack on other
targets (commercial, government ..)
• Unauthorised access to local data (data
protection, financial info …)
• Health and safety: eg beam-line control
• Destruction of local data, disruption of
local service
• Gain passwords, keys to attack peer sites
Security
• Most security issues are common to all
sites
• Issues especially relevant here are:
– Accelerator Centres (see earlier)
– Distributed computing crosses security
boundaries
• Authentication models, trust
– Remote users are less invested in your site’s integrity
• Shared usernames – how can you trace?
– Software often under active development
• Smaller user community and far fewer developers
than (eg) Apache
Why Do We Need a Firewall?
You do not need a firewall if:
• Either: you have perfect (bug free)
operating systems and you have
infallible system administrators AND
users
• Or: you don’t care if you have
security incidents (unauthorised
access to resources)
How Do Hackers Break In?
• Coding errors in server software:
– Buffer overflows: give more than expected
(poor bounds checking)
– Provide unexpected control info (eg append
unexpected commands)
• Trojans and viruses – backdoors
• Inadequate access control. Eg:
– NFS exporting the root filesystem R/W to the world
– https server allows googlebot access to control
menus …file … delete …really delete … !!!
• Scanning rate: hundreds per minute
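The first two break-in routes come down to trusting input. The Python sketch below illustrates both defences under simple assumptions: `copy_into` mimics the explicit bounds check that server code written in C must make (Python itself is already bounds-checked), and the `ls -l` command builders are hypothetical examples of appending unexpected commands and of quoting them away with the standard library's `shlex.quote`.

```python
import shlex


def copy_into(buffer: bytearray, data: bytes) -> None:
    """Copy data into a fixed-size buffer, rejecting input that would overflow it.

    Python is bounds-checked anyway; this mirrors the explicit check that
    server code written in C has to make before copying untrusted input.
    """
    if len(data) > len(buffer):
        raise ValueError(f"{len(data)} bytes will not fit in a {len(buffer)}-byte buffer")
    buffer[: len(data)] = data


def unsafe_command(filename: str) -> str:
    # Vulnerable: untrusted input is pasted into a shell command line, so
    # "x; rm -rf ~" smuggles in a second command.
    return "ls -l " + filename


def safe_command(filename: str) -> str:
    # shlex.quote turns the whole input into a single shell word.
    return "ls -l " + shlex.quote(filename)


hostile = "x; rm -rf ~"
print(unsafe_command(hostile))  # ls -l x; rm -rf ~
print(safe_command(hostile))    # ls -l 'x; rm -rf ~'
```

Passing an argument list to `subprocess.run` without `shell=True` avoids the shell altogether and is usually preferable to quoting.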
Common Firewall Policies
• Don’t bother! Very unlikely…disasters!
• Simple exclusion of some protocols. Eg
prevent SNMP off site.
• Only allow some protocols
– eg only allow kerberised or encrypted protocols.
• Protected host ranges
– eg keep some hosts/networks safe
• Protect large ranges of ports
– eg privileged port range.
• Access control by host/port
• Different sites – probably different policy!
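Most of these policies can be expressed as an ordered list of host/port rules. This is a minimal Python sketch, not any particular firewall's syntax: it assumes first-match-wins evaluation, a default of deny, and inbound traffic only, and the address ranges are hypothetical (taken from the IPv4 documentation block 192.0.2.0/24).

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    action: str                      # "allow" or "deny"
    network: ipaddress.IPv4Network   # destination hosts the rule covers
    ports: range                     # destination ports the rule covers


def decide(rules: list[Rule], dest_ip: str, port: int, default: str = "deny") -> str:
    """First matching rule wins; unmatched traffic gets the default action."""
    addr = ipaddress.ip_address(dest_ip)
    for rule in rules:
        if addr in rule.network and port in rule.ports:
            return rule.action
    return default


ANY = ipaddress.ip_network("0.0.0.0/0")
SITE = ipaddress.ip_network("192.0.2.0/24")        # hypothetical site range
PROTECTED = ipaddress.ip_network("192.0.2.0/28")   # hypothetical protected hosts

RULES = [
    Rule("deny", ANY, range(161, 163)),        # simple exclusion: no SNMP across the boundary
    Rule("deny", PROTECTED, range(0, 65536)),  # protected host range: nothing gets in
    Rule("allow", SITE, range(22, 23)),        # only allow some protocols: ssh
    Rule("deny", ANY, range(0, 1024)),         # protect the privileged port range
    Rule("allow", SITE, range(1024, 65536)),   # access control by host/port
]

for dest, port in [("192.0.2.50", 22), ("192.0.2.5", 22), ("192.0.2.50", 80), ("192.0.2.50", 8443)]:
    print(dest, port, decide(RULES, dest, port))
```

Rule order matters: moving the privileged-port rule above the ssh rule would block ssh, which is one reason two sites with the "same" policy can behave differently.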
The accelerator centres
You will run
• Our Linux
• Our software
• Our middleware
• Our applications
• Our security model
• Don’t bother us with your local restrictions
or firewalls
Oh, and by the way, you’ll give us root access
to your machines to install it and sort out
any problems
The Answers
• …… so far
• I hope I can learn more this week
Software levels
• Will never get hardware vendors to
remove dependence on OS
• Lobby middleware developers to be
OS independent
– and to keep up reasonably quickly with
latest releases
• Experiment developers should code to
support multiple versions of
everything
– Don’t rush to use new features
‘experts’
• Ignore
• Politely tell them to ‘go away’
• Explain the realities of 24x365 use
• Ask them to demonstrate their
solutions
– and be prepared to accept them if they are
correct
• Evaluate the most likely of their
suggestions
Local Rules (BaBar/RAL example)
• RAL is a TierA centre for BaBar
• BaBar users have already signed up to
conditions for SLAC, BaBar, & Objectivity
• They get an X.509 certificate
• Sign the EDG acceptable use conditions
• Users are made aware of RAL-specific
issues
– network traffic might be monitored
• RAL is satisfied that it knows who the users
are and can trace them.
• They are allowed to run as grid users
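The RAL procedure amounts to a checklist that must be complete before someone runs as a grid user. A minimal Python sketch, assuming each step is recorded as a simple yes/no; the field names and the `admission_check` function are hypothetical, not part of any RAL or EDG system.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Applicant:
    name: str
    signed_experiment_conditions: bool  # SLAC, BaBar & Objectivity conditions
    has_x509_certificate: bool
    signed_edg_aup: bool                # EDG acceptable use conditions
    told_of_site_issues: bool           # e.g. network traffic might be monitored


def admission_check(a: Applicant) -> tuple[bool, list[str]]:
    """Return (allowed, missing steps) before someone may run as a grid user."""
    steps = {
        "sign experiment conditions": a.signed_experiment_conditions,
        "obtain an X.509 certificate": a.has_x509_certificate,
        "sign EDG acceptable use conditions": a.signed_edg_aup,
        "acknowledge site-specific issues": a.told_of_site_issues,
    }
    missing = [step for step, done in steps.items() if not done]
    return (not missing, missing)


alice = Applicant("alice", True, True, False, True)
print(admission_check(alice))  # (False, ['sign EDG acceptable use conditions'])
```

Because most steps are shared with the experiment and the grid project, the site only has to add its own local item rather than repeat the whole registration.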
Local Rules
• Use other sites as examples
• Common acceptable use policies
– The more sites involved in writing them,
the more likely they are to be ‘acceptable’
• Get the accelerator centres (ACs) to act as the legal entity for a VO
– Need to trust the integrity of the VO
– Local admins feel better if they can sue
someone
• Don’t tell them they have no chance of suing
CERN
Security
• Educate users through their
sysadmins. Make them aware of the
risks and responsibilities
• PKI and the Grid offer ‘roles’ and
‘groups’, so someone can act as the
production simulation manager but
still be identifiable.
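Roles answer the shared-username question raised earlier: many people can use one production account, yet the site can still say who did what. A minimal Python sketch, assuming the site logs the certificate subject (DN) every time it maps an identity to a local account; the class names, the DNs and the per-VO pool account naming are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GridIdentity:
    dn: str                      # certificate subject: identifies the person
    vo: str                      # virtual organisation, e.g. "babar"
    role: Optional[str] = None   # role asserted for this session, if any


class AccountMapper:
    """Map grid identities to shared local accounts, keeping an audit trail."""

    def __init__(self, role_accounts: dict[str, str]) -> None:
        self.role_accounts = role_accounts            # role -> shared account
        self.audit_log: list[tuple[str, str]] = []    # (account, dn) per mapping

    def map(self, ident: GridIdentity) -> str:
        if ident.role in self.role_accounts:
            account = self.role_accounts[ident.role]
        else:
            account = f"{ident.vo}user"               # hypothetical per-VO pool account
        self.audit_log.append((account, ident.dn))
        return account

    def who_used(self, account: str) -> list[str]:
        """The certificate subjects that have been mapped to a shared account."""
        return [dn for acct, dn in self.audit_log if acct == account]


mapper = AccountMapper({"production": "prodsim"})
mapper.map(GridIdentity("/C=UK/O=eScience/CN=Alice", "babar", "production"))
mapper.map(GridIdentity("/C=UK/O=eScience/CN=Bob", "babar", "production"))
print(mapper.who_used("prodsim"))
```

The account is shared, but the log is per person, which is what the tracing obligations under Local Rules require.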
Firewalls
• One can often persuade local network
admin to make an exception once.
– But not many times
• Establish trust of your network admin
– Convince them that you take security seriously.
– Less likely to achieve this if your machines are
regularly broken into.
• Experiment and middleware developers
need to address firewall issues in their
design
– Security Group of LCG might help here.
The accelerator centres
• They are not used to being
questioned.
– Put them face to face to resolve clashes
• HEPiX is a good forum for this.
Successes so far:
– AFS, profiles
– Large Cluster Workshop
– Surveys on firewall support…
• But the grid has been a step back
– Different centres, different grids.
The accelerator centres
• This problem works against the
experiments’ interests.
• Experiments should take more control
over their software environments and
take their own compilers and libraries
with them.
• Lobby for standard distributions
– and use them
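Taking your own compilers and libraries with you mostly means controlling the search paths your jobs see. A minimal Python sketch, assuming a Linux host and an experiment toolchain installed under a hypothetical prefix such as `/opt/myexpt` (with `bin` and `lib` subdirectories); `experiment_env` is an illustrative helper, while `PATH` and `LD_LIBRARY_PATH` are the standard search-path variables.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def experiment_env(base: Mapping[str, str], prefix: str) -> dict[str, str]:
    """Copy an environment, putting an experiment's own toolchain first.

    `prefix` is a hypothetical directory into which the experiment has
    installed its own compilers (prefix/bin) and libraries (prefix/lib).
    The site's own paths are kept, but searched afterwards.
    """
    env = dict(base)  # never modify the caller's environment in place
    for var, subdir in (("PATH", "bin"), ("LD_LIBRARY_PATH", "lib")):
        own = os.path.join(prefix, subdir)
        existing = env.get(var)
        env[var] = own + os.pathsep + existing if existing else own
    return env


env = experiment_env({"PATH": "/usr/bin"}, "/opt/myexpt")
print(env["PATH"])             # /opt/myexpt/bin:/usr/bin (on Linux)
print(env["LD_LIBRARY_PATH"])  # /opt/myexpt/lib
```

A job would then be launched with something like `subprocess.run(cmd, env=experiment_env(os.environ, prefix))`, leaving the site's standard distribution untouched for everyone else.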
Summary
• It is possible to take the first steps
towards a truly multidisciplinary computer
centre
– Starting with HEP
• Labs and experiments need to talk and
adopt new/common practices
– Need a culture of collaboration in many
dimensions
– Lab-lab, experiment-experiment, and
experiment-labs
• Don’t forget that your experiment/
software/middleware is not the only one,
and some poor ****** is having to cope
with them all.