A Multidisciplinary Computer Centre … is it possible?
John Gordon
CCLRC eSC
CHEP March 2003

The Problem?
A UK colleague, quoted a few years ago when Linux for physics was just becoming common:
“We have four Linux systems: one for users to login, one for CERN Linux, one for DESY Linux, one for Fermilab linux. And I think we will need one for BaBar Linux soon”
• Things have changed, but by how much?
• Many of the talks in this session describe implementing a solution for one experiment, but the staff needed for such a solution scales with the number of experiments supported, and fragmenting resources this way is inefficient.
• Can we run a single centre for everyone?
John Gordon eScience Centre

LHC Hierarchical Model
[Figure: LHC tiered computing model (source: lcg.web.cern.ch/lcg). Online System (~PBytes/sec in, ~100 MBytes/sec out) → Offline Processor Farm (~20 TIPS) and CERN Computer Centre, Tier 0 (~100 MBytes/sec) → Tier 1 regional centres: US, Germany, UK, Italy (~622 Mbits/sec or Air Freight, deprecated) → Tier 2 centres, e.g. ScotGrid, NorthGrid, London (~1 TIPS each, ~622 Mbits/sec) → Institutes (~0.25 TIPS, physics data cache) → Tier 4 physicist workstations (~1 MBytes/sec).]
• There is a “bunch crossing” every 25 nsecs.
• There are 100 “triggers” per second; each triggered event is ~1 MByte in size.
• 1 TIPS is approximately 25,000 SpecInt95 equivalents.
• Physicists work on analysis “channels”. Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server.

Sitting in the Centre
[Figure: our site sits between LCG, running US experiments and future LHC experiments.]
A site like ours sits between many experiments and grids.

The multi-experiment centre
• So what does a big centre look like these days?
• A big Linux cluster and lots of disk?
• Many types of hardware
• All flavours of Unix (still VMS!!)
• All uses, from desktop to supercomputer
• Different disks (SCSI, IDE, RAID, SAN)
• Different tapes
• Different user communities

The multi-experiment centre
• We are unlikely to be able to run a centre for all disciplines if we cannot even run one for all HEP experiments
• This talk focuses on the problems of supporting many different HEP experiments

Not a problem
• Lots of hardware problems, but the same ones that big and small centres everywhere have
• Lots of anecdotes about hardware problems, but sharing between experiments hasn’t been an issue recently
  – Apart from Suns for BaBar
  – And we backed away from AMD once because an experiment wouldn’t accept them

The problems
• Software levels
• ‘experts’
• Local rules
• Security
• Firewalls
• The accelerator centres

Software Levels
• Experiment A must upgrade the OS (or compiler, etc.); Experiment B cannot.
• Linux brings more hardware dependencies
  – Experiment A needs one kernel, but the Fibre Channel driver is only available in another
• Now we have middleware too!!
  – Experiments can disagree over middleware and OS
  – And the middleware might not match the OS

‘experts’
• A 200GB disk costs $100 at Best Buy
• Therefore 100TB should cost $50K
• If you pay more, you are profligate and are wasting HEP funds!!!
• … and you should probably be able to negotiate a further discount for bulk purchase!

Local Rules
• A responsible site probably has a policy for who can use its resources, with forms, acceptable use conditions and other safeguards.
• Most countries have legal obligations to trace users in case of law-breaking.
• Do we really want sites to throw these away for the grid?
• Even if we wanted to, only a purely HEP lab could override its own rules
  – Even they usually have masters (DoE……)

Security – Why Do We Care?
• Illegal use of resources (stolen software, child pornography…)
• A base for high-bandwidth attacks on other targets (commercial, government…)
• Unauthorised access to local data (data protection, financial info…)
• Health and safety: e.g. beam-line control
• Destruction of local data, disruption of local service
• Gaining passwords and keys to attack peer sites

Security
• Most security issues are common to all sites
• Issues especially relevant here are:
  – Accelerator centres (see earlier)
  – Distributed computing crosses security boundaries
    • Authentication models, trust
  – Remote users feel less attached to your site’s integrity
    • Shared usernames: how can you trace who did what?
  – Software often under active development
    • Smaller user community and far fewer developers than (e.g.) Apache

Why Do We Need a Firewall?
You do not need a firewall if:
• Either: you have perfect (bug-free) operating systems and infallible system administrators AND users
• Or: you don’t care if you have security incidents (unauthorised access to resources)

How Do Hackers Break In?
• Coding errors in server software:
  – Buffer overflows: sending more input than expected (poor bounds checking)
  – Providing unexpected control information (e.g. appending unexpected commands)
• Trojans and viruses: backdoors
• Inadequate access control, e.g.:
  – NFS exports the root filesystem read/write to the world
  – An https server allows googlebot access to control menus …file … delete …really delete … !!!
• Scanning rate: hundreds per minute

Common Firewall Policies
• Don’t bother! Very unlikely… disasters!
• Simple exclusion of some protocols, e.g. prevent SNMP off site
• Only allow some protocols, e.g. only kerberised or encrypted protocols
• Protected host ranges, e.g. keep some hosts/networks safe
• Protect large ranges of ports, e.g. the privileged port range
• Access control by host/port
• Different sites: probably different policies!
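Several of the policies above (protocol exclusion, allowing only some protocols, protected host ranges, port ranges, access control by host/port) can be combined into one ordered rule table. The following is a minimal Python sketch, not any real firewall's configuration syntax. It assumes the table filters traffic arriving from off site, that protocols can be reduced to destination port numbers, and that the first matching rule wins with a default of deny; the rules, the `decide` helper and the networks (RFC 5737 documentation ranges) are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical sketch of first-match host/port filtering for traffic
# assumed to arrive from off site. Networks are RFC 5737 documentation
# ranges; protocols are reduced to destination port numbers for brevity.

ANY = ipaddress.ip_network("0.0.0.0/0")


@dataclass(frozen=True)
class Rule:
    action: str                 # "allow" or "deny"
    src: ipaddress.IPv4Network  # source network the rule matches
    dst: ipaddress.IPv4Network  # destination network the rule matches
    ports: range                # destination ports the rule matches


RULES = [
    # Protected host range: no off-site traffic reaches these hosts.
    Rule("deny", ANY, ipaddress.ip_network("198.51.100.0/28"), range(0, 65536)),
    # Simple protocol exclusion: SNMP (ports 161-162) blocked from off site.
    Rule("deny", ANY, ANY, range(161, 163)),
    # Only allow some protocols: SSH (encrypted) is open to everyone.
    Rule("allow", ANY, ANY, range(22, 23)),
    # Access control by host/port: a trusted partner network may use
    # unprivileged ports; the privileged range (<1024) stays closed.
    Rule("allow", ipaddress.ip_network("192.0.2.0/24"), ANY, range(1024, 65536)),
]


def decide(src_ip: str, dst_ip: str, port: int,
           rules=RULES, default: str = "deny") -> str:
    """Return the action of the first matching rule, else the default."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    for rule in rules:
        if src in rule.src and dst in rule.dst and port in rule.ports:
            return rule.action
    return default


print(decide("203.0.113.5", "198.51.100.20", 22))    # allow: SSH
print(decide("203.0.113.5", "198.51.100.3", 22))     # deny: protected host
print(decide("192.0.2.7", "198.51.100.20", 161))     # deny: SNMP
print(decide("192.0.2.7", "198.51.100.20", 8080))    # allow: partner
print(decide("203.0.113.5", "198.51.100.20", 8080))  # deny: default
```

Rule order matters here: because the first match wins, the protected-host and SNMP denials sit above the broader allows. Real firewalls differ in syntax and matching semantics, and each site will choose its own rules, so this is only an illustration of how the policy types fit together.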
The accelerator centres
You will run
• Our Linux
• Our software
• Our middleware
• Our applications
• Our security model
• Don’t bother us with your local restrictions or firewalls
Oh, and by the way, you’ll give us root access to your machines to install it and sort out any problems

The Answers
• …… so far
• I hope I can learn more this week

Software levels
• We will never get hardware vendors to remove their dependence on the OS
• Lobby middleware developers to be OS-independent
  – and to keep up reasonably quickly with the latest releases
• Experiment developers should code to support multiple versions of everything
  – Don’t rush to use new features

‘experts’
• Ignore them
• Politely tell them to ‘go away’
• Explain the realities of 24x365 use
• Ask them to demonstrate their solutions
  – And be prepared to accept it if they are correct
• Evaluate the most likely of their suggestions

Local Rules (BaBar/RAL example)
• RAL is a Tier A centre for BaBar
• BaBar users have already signed up to conditions for SLAC, BaBar & Objectivity
• They get an X.509 certificate
• They sign the EDG acceptable use conditions
• Users are made aware of RAL-specific issues
  – e.g. network traffic might be monitored
• RAL is happy that it knows who the users are and can trace them
• They are allowed to run as grid users

Local Rules
• Use other sites as examples
• Common acceptable use policies
  – The more sites involved in writing them, the more likely they are to be ‘acceptable’
• Get ACs to act as the legal entity for a VO
  – Need to trust the integrity of the VO
  – Local admins feel better if they can sue someone
• Don’t tell them they have no chance of suing CERN

Security
• Educate users through their sysadmins.
  Make them aware of the risks and responsibilities.
• PKI and the grid offer ‘roles’ and ‘groups’, so someone can act as production simulation manager but still be identifiable.

Firewalls
• One can often persuade the local network admin to make an exception once
  – But not many times
• Establish the trust of your network admin
  – Convince them that you take security seriously
  – You are less likely to achieve this if your machines are regularly broken into
• Experiment and middleware developers need to address firewall issues in their design
  – The LCG Security Group might help here

The accelerator centres
• They are not used to being questioned
  – Put them face to face to resolve clashes
• HEPiX is a good forum for this. Successes so far…..
  – AFS, profiles
  – Large Cluster Workshop
  – Surveys of firewall support….
• But the grid has been a step back
  – Different centres, different grids

The accelerator centres
• This problem works against experiments’ interests
• Experiments should take more control over their software environments and take their own compilers and libraries with them
• Lobby for standard distributions
  – and use them

Summary
• It is possible to take the first steps towards a truly multidisciplinary computer centre
  – Starting with HEP
• Labs and experiments need to talk and adopt new/common practices
  – Need a culture of collaboration in many dimensions
  – Lab-lab, experiment-experiment, and experiment-lab
• Don’t forget that your experiment/software/middleware is not the only one, and some poor ****** is having to cope with them all.
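The Security slides raise two linked points: shared usernames make it hard to trace who did what, and PKI ‘roles’ let someone act as production simulation manager while staying identifiable. A minimal Python sketch of that idea, under stated assumptions: the `RoleMapper` class, the certificate subject names and the `prdsim` account are hypothetical and do not correspond to any real grid middleware API. Several certificate holders may share one local account for a role, but every mapping is logged against the individual certificate subject.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical sketch: map (certificate subject DN, role) to a shared local
# account while keeping an audit trail of which individual used it.
# Not modelled on any real middleware's API.


@dataclass
class RoleMapper:
    role_accounts: dict[str, str]       # role -> shared local account
    role_members: dict[str, set[str]]   # role -> DNs allowed to assume it
    audit_log: list[tuple[str, str, str, str]] = field(default_factory=list)

    def map_user(self, dn: str, role: str) -> str:
        """Return the local account for this role, logging who asked."""
        if dn not in self.role_members.get(role, set()):
            raise PermissionError(f"{dn} may not act as {role}")
        account = self.role_accounts[role]
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((stamp, dn, role, account))
        return account

    def who_used(self, account: str) -> list[str]:
        """Trace a shared account back to the individual DNs, in order."""
        return [dn for (_, dn, _, acct) in self.audit_log if acct == account]


alice = "/C=UK/O=eScience/CN=Alice Example"
bob = "/C=UK/O=eScience/CN=Bob Example"
mapper = RoleMapper(
    role_accounts={"production": "prdsim"},
    role_members={"production": {alice, bob}},
)
mapper.map_user(alice, "production")
mapper.map_user(bob, "production")
print(mapper.who_used("prdsim"))  # both DNs, in order of use
```

In this sketch the shared account is only a label; accountability stays with the certificate subject, which is what a site would need in order to meet the tracing obligations raised under Local Rules.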