1. Linux at CERN
1.1.Introduction & Summary
For physics computing, Linux is the operating system of choice at CERN. It provides
a variety of services on machines ranging from desktops to servers. CERN has its own
Linux distribution and support infrastructure. In the future, Linux use will increase
due to the rising number of farm machines, and investment in automated tools will be
needed to administer such a large number of machines at a reasonable cost in
manpower.
1.2.History
Linux is a UNIX-like, POSIX-compliant operating system that grew out of a 1991 study
project by Finnish student Linus Torvalds. Together with the utilities and
compiler from the GNU project and other open source projects, it has since
evolved with the help of volunteers from all over the world to become a
production-quality multi-architecture operating system (History) and user environment,
subsumed here under the name of the kernel ("Linux").
CERN has been using Linux since at least 1995, when individual physics groups
started using it. Since 1997, CERN has had a centrally supported Linux release,
initially based on Red Hat 4, with all the CERN tools and environments (AFS, ASIS,
SUE) available as on the other vendor UNIXes. Regular releases have followed the
evolution of Linux in the outside world, and CERN has actively contributed to its
development (Gigabit drivers, porting GNU libc to IA64) as its needs required.
Illustration 1: Different Linux environments (usage)
Batch & interactive farms: ~1200 (in the Computer Center)
'Special' servers: ~200 disk servers, ~50 tape servers
Desktops / desk-side: ~1400
Embedded systems: few now, lots later
1.3.Linux use at CERN
a) Farms
A large part of the Linux systems installed at CERN is used for computing through
large batch farms running LSF, like LXBATCH. Today, the typical physics
computing job fits comfortably within a single dual-processor IA32 machine, so the
farm machines process jobs independently, without the need for Single-System-Image
abstractions or MPI. This independence between (a large number of) jobs
running over extended periods of time has given rise to the term High-throughput
computing (HTC). Dual-processor machines with commodity hardware and off-the-shelf
networking offer a sweet spot in terms of price / CPU performance for this kind
of workload. Linux supports this type of machine very well.
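The independence of jobs described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a model of LSF or LXBATCH: the function name `schedule_independent_jobs`, the greedy placement policy and the job durations are all hypothetical. It places each job on whichever CPU slot becomes free first, with two slots per node to mimic dual-processor machines, and shows that total run time falls as nodes are added because the jobs never need to communicate.

```python
import heapq


def schedule_independent_jobs(job_hours, nodes, slots_per_node=2):
    """Greedily place independent jobs on the earliest-free CPU slot.

    Returns (makespan_hours, assignment), where assignment maps each job
    index to the (node, slot) pair it ran on. Hypothetical helper for
    illustration only; real batch systems use far richer policies.
    """
    # One heap entry per CPU slot: (time the slot becomes free, node, slot).
    free_at = [(0.0, n, s) for n in range(nodes) for s in range(slots_per_node)]
    heapq.heapify(free_at)

    assignment = {}
    makespan = 0.0
    for job, hours in enumerate(job_hours):
        # Jobs are independent, so any free slot will do.
        start, node, slot = heapq.heappop(free_at)
        end = start + hours
        assignment[job] = (node, slot)
        makespan = max(makespan, end)
        heapq.heappush(free_at, (end, node, slot))
    return makespan, assignment


# Eight hypothetical jobs of four hours each.
jobs = [4.0] * 8
one_node, _ = schedule_independent_jobs(jobs, nodes=1)
two_nodes, _ = schedule_independent_jobs(jobs, nodes=2)
print(one_node, two_nodes)
```

With these assumed inputs, doubling the number of nodes halves the time needed to finish the whole set of jobs, which is the property that makes adding commodity machines an effective way to raise throughput.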
An "interactive" cluster (LXPLUS) allows users without a desktop Linux machine to
develop applications and submit them to the batch system. It is also used for
reading mail or occasionally for web access, e.g. from X-Terminals or by remote
users.
A smaller interactive cluster running SUN Solaris allows code to be validated against
a different compiler.
b) Special Servers
As with the batch farms, commodity hardware with Linux as the operating system
can deliver significant cost benefits over proprietary solutions for storage or
special-purpose server applications. At CERN, a combination of dual-processor IA32
machines with (cheap) hardware RAID cards and IDE disk drives has offered reliable
disk storage in the form of the "Disk Server". Similarly, tape drives are directly
attached via SCSI or FC to Linux "Tape Servers".
Access to the data from the application is handled via the SHIFT architecture or
through CERN's mass storage application CASTOR.
Other special-purpose servers also run Linux, for example ORACLE database
servers, AFS or NFS file servers or CERN's DNS service. Linux is used here in
parallel with SUN Solaris, and is typically preferred as soon as the number of systems
grows. Solaris typically runs at CERN on more reliable hardware for services where
high availability of individual machines is required.
c) Desktops
A number of physicists prefer to use Linux on their desktop or laptop computer for
their day-to-day work like reading mail or web browsing. CERN Linux comes with
the required graphical user environment (both GNOME and KDE are available) and
utilities. Compatibility with CERN-IT central mail and web services is regularly
checked.
This preference has caused a number of interoperability problems in the past.
In particular, documents in proprietary formats (like Microsoft Word / PowerPoint
or Adobe FrameMaker) have forced users to keep a second machine with Microsoft
Windows (or to run VMware or dual-boot). Lately, open source programs like
OpenOffice.org have been getting better at understanding such formats, but they are
not yet a full substitute.
Illustration 2: Dependencies inside CERN Linux 7.3
Running Linux on the desktop is also a preferred solution for software developers,
since it allows them to be in complete control of the runtime environment (unlike on
the shared farm machines). To facilitate this approach, one of the paradigms for CERN
Linux is to have the same operating system (including libraries and compilers) on
desktops and farm machines (and embedded devices, if possible).
d) Embedded appliances
These special-purpose devices used to run proprietary real-time operating systems like
LynxOS or VxWorks. They are used mainly by the accelerator controls and
experiments' "online" groups, to handle data under special conditions (radiation, heat,
vibration) or constraints (hard or soft real-time processing). Due to growing
familiarity with UNIX/POSIX, we see a trend to run Linux on such devices as long as
no hard real-time guarantees are required. This allows for more comfortable
development and debugging (instead of the proprietary cross-compiling development
environments). From a system administration perspective, such machines are very
close to the ordinary off-the-shelf PCs used in the batch farms, but they may need
special supporting services (e.g. for diskless booting) or drivers (for VMEbus
devices). They also put special demands on the Linux kernel internals, for example on
context switching time or inter-task fairness.
1.4.CERN Linux distribution and certification
As mentioned earlier, the goal is to use a single Linux release to cover all aspects of
Linux use at CERN, both to meet user expectations and to keep the support effort
down. No commercial distribution fulfills all of CERN's needs and runs on all the
hardware found at CERN, so CERN has been providing a modified version of Red
Hat. The modifications include additional kernel patches, new software like OpenAFS,
and CERN-specific physics software and management applications.
Whenever the goal of having a uniform system cannot be met (e.g. due to new bugs
being discovered or incompatible hardware, both of which can trigger a kernel
update), such deviations are noted and are folded into the next release.
In order to ensure that no requirements on a new release are overlooked, a formal
process has been established for moving to a new release.
A "certification coordination" group (LXCERT) with appointed members from the
large Linux user groups and service providers inside CERN oversees the process and
is responsible for bringing up and arbitrating user requirements and dependencies
between different applications. The certification process is tracked, and the final
decision to adopt or reject a release lies with this coordination group. In the future,
the influence of GRID computing will bring in more requirements from other sites as well.
1.5.CERN Linux support
At CERN, the support for Linux is handled at multiple levels and in different groups:
• Users with several machines (like the CERN-IT farms) typically have dedicated
  local support for the day-to-day running of their applications.
• Large user groups (experiments, divisions) also have local support to handle direct
  user questions.
• CERN-IT offers centralized support:
  ◦ The CERN Helpdesk takes calls and re-routes them appropriately. This level
    is handled by an external company.
  ◦ A second level deals with recurring user problems and gives individual
    assistance to users, such as desktop installations. This level is handled by an
    external company.
  ◦ An in-house third-level support handles everything else, including preparation
    of new releases, kernel bug fixes, workarounds for common problems, assistance
    to the farm operations and documentation.
  ◦ Finally, support calls may be opened with a vendor. Given that CERN has
    no support contract for the majority of Linux machines, these calls are typically
    used to inform the Linux user community and may not be resolved for a
    considerable time.
1.6.Outlook
The current assumptions about physicists' jobs still seem to hold, so the current
computing model is likely to be useful in the future. Therefore, the number of CPU
nodes in batch farms will increase to cope with the massive amount of computational
power required for LHC. Similarly, the online event filter farms will grow massively.
This growth requires new farm management tools to prevent operational costs from
exploding; such tools are currently under development as part of the "fabrics" efforts
of EDG, LCG and EGEE.
At the same time, a number of (relatively) low-cost storage solutions have appeared
that could offer advantages over the current "Disk-Server" model used by CERN.
Typically they provide for direct data access by the clients, and for better scalability
by keeping data and metadata on separate services. Such solutions are being evaluated
at the moment (e.g. as part of CERN's OpenLAB industry collaboration); they could
be integrated with or ultimately replace CERN's own storage solutions.
Similarly, new developments in the interconnect area such as InfiniBand, 10Gb
Ethernet, RDMA, PCI-X and PCI-Express could be "enabling" technologies by
providing cheap high-bandwidth and low-latency connections among the CPU nodes
and between the nodes and storage. This could lead to new approaches for experiment
data analysis or the storage subsystem.
The various GRID projects bring in new challenges, both technical (new services to
be defined and implemented) and political (interoperability between sites perhaps
leading to a HEP-wide Linux distribution, access to remote resources).
Lastly, the Linux world itself is changing: vendors like Red Hat or SuSE are now
concentrating more on the profitable parts of their business, making life harder for
copycat distributions like CERN Linux, which in the past have profited from "free"
software updates. Third-party hardware and software vendors like ORACLE, IBM
and SUN have all embraced Linux and are offering commercial support. The large
(noncommercial) user and developer community is meanwhile proceeding with
adding features (the 2.6 kernel is expected soon), often in an uncoordinated
fashion, and creating new software as often as abandoning older products.