Fabric Management
Massimo Biasotto, Enrico Ferro – INFN LNL
M. Biasotto, CERN, 5 November 2001
Legnaro CMS Farm Layout

[Farm layout diagram: computational nodes N1-N24 grouped on Fast Ethernet switches (2001-2003), uplinked via Gigabit Ethernet (1000BT) to the disk servers S1-S16 and to the WAN]
– 2001: 40 computational nodes (4000 SI95), 9 TB of disk; 11 disk servers, 3.3 TB
– Scalable up to 190 nodes
– WAN link: 34 Mbps in 2001, 155 Mbps in 2002
– Nx – Computational Node: dual PIII 1 GHz, 512 MB RAM, 3x75 GB EIDE disks + 1x20 GB disk for the O.S.
– Sx – Disk Server Node: dual PIII 1 GHz, dual PCI (33 MHz/32 bit and 66 MHz/64 bit), 512 MB RAM, 4x75 GB EIDE RAID disks (expandable up to 10), 1x20 GB disk for the O.S.
Datagrid
 Project structured in many “Work Packages”:
– WP1: Workload Management
– WP2: Data Management
– WP3: Monitoring Services
– WP4: Fabric Management
– WP5: Mass Storage Management
– WP6: Testbed
– WP7: Network
– WP8-10: Applications
 3-year project (2001-2003).
 Milestones: month 9 (Sept 2001), month 21 (Sept 2002),
month 33 (Sept 2003)
Overview
 Datagrid WP4 (Fabric Management) overview
 WP4 software architecture
 WP4 subsystems and components
 Installation and software management
 Current prototype: LCFG
 LCFG architecture
 LCFG configuration and examples
WP4 overview
 Partners: CERN, INFN (Italy), KIP (Germany),
NIKHEF (Holland), PPARC (UK), ZIB (Germany)
 WP4 website:
http://hep-proj-grid-fabric.web.cern.ch/hep-proj-grid-fabric/
 Aims to deliver a computing fabric comprised of
all the necessary tools to manage a centre
providing Grid services on clusters of thousands
of nodes
WP4 structure
 WP activity divided in 6 main ‘tasks’
– Configuration management (CERN + PPARC)
– Resource management (ZIB)
– Installation & node management (CERN + INFN +
PPARC)
– Monitoring (CERN + INFN)
– Fault tolerance (KIP)
– Gridification (NIKHEF)
 Overall WP4 functionality structured into units called
‘subsystems’, corresponding to the above tasks
Architecture overview
 WP4 architectural design document (draft):
– http://hep-proj-grid-fabric.web.cern.ch/hep-proj-grid-fabric/architecture/eu/default.htm
 Still work in progress: open issues that need further
investigation
 Functionalities classified into two main categories:
– User job control and management
 handled by Gridification and Resource Management
subsystems
– Automated system administration
 handled by Configuration Mgmt, Installation Mgmt,
Fabric Monitoring and Fault Tolerance subsystems
Architecture overview

[WP4 subsystems diagram, showing the interfaces to the other work packages and to the fabric: Resource Broker (WP1), Data Mgmt (WP2), Grid Info Services (WP3), Data Storage (WP5: mass storage, disk pools), other WPs, the local users, and the farms (Farm A running LSF, Farm B running PBS)]

WP4 subsystems:
– Fabric Gridification: interface between Grid-wide services and the local fabric; provides local authentication, Grid user authorization and mapping of Grid credentials.
– Resource Management: provides transparent access to different cluster batch systems; enhanced capabilities (extended scheduling policies, advanced reservation, local accounting).
– Monitoring & Fault Tolerance: provides the tools for gathering and storing performance, functional and environmental changes for all fabric elements; the central measurement repository provides a health and status view of services and resources; fault tolerance correlation engines detect failures and trigger recovery actions.
– Installation & Node Management: provides the tools to install and manage all software running on the fabric nodes; bootstrap services; software repositories; a Node Management Agent to install, upgrade, remove and configure software packages on the nodes.
– Configuration Management: provides central storage and management of all fabric configuration information; a central DB and a set of protocols and APIs to store and retrieve information.
Resource Management diagram

[Resource Management subsystem diagram]
– Accepts job requests, verifies credentials and schedules the jobs.
– Stores static and dynamic information describing the state of the RMS and its managed resources.
– Assigns resources to incoming job requests, enhancing the capabilities of the fabric batch systems (better load balancing, adaptation to resource failures, consideration of maintenance tasks).
– Proxies provide a uniform interface to the underlying batch systems (LSF, Condor, PBS).
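To illustrate the role of the proxies, the sketch below hides LSF and PBS behind a single submission command. This is not the WP4 proxy implementation (which was not specified at this stage), only an illustration of the idea of a uniform interface to different batch systems:

#!/bin/sh
# Illustrative batch-system proxy sketch: one submission interface,
# translated to whatever batch system is installed on the fabric.
JOB_SCRIPT="$1"
if command -v bsub >/dev/null 2>&1; then
    bsub < "$JOB_SCRIPT"          # LSF
elif command -v qsub >/dev/null 2>&1; then
    qsub "$JOB_SCRIPT"            # PBS
else
    echo "no supported batch system found" >&2
    exit 1
fi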
Monitoring & Fault Tolerance diagram

[Monitoring & Fault Tolerance subsystem diagram: Monitoring Sensors, an MSA, a local MR cache and a local FTCE on each node; the central MR server and database and the Actuator Dispatcher on a service master node; the Monitoring User Interface on the operator's host. Data flow runs from the sensors to the repository, control flow from the correlation engine to the actuators]

– Monitoring Sensor (MS): performs the measurement of one or several metrics.
– Monitoring Sensor Agent (MSA): collects data from the Monitoring Sensors and forwards them to the Measurement Repository.
– Measurement Repository (MR): stores timestamped information; it consists of local caches on the nodes and a central repository server.
– Monitoring User Interface (MUI): graphical interface to the Measurement Repository for the human operator.
– Fault Tolerance Correlation Engine (FTCE): processes the measurements of metrics stored in the MR to detect failures and possibly decide recovery actions.
– Fault Tolerance Actuator (FTA): executes automatic recovery actions.
– Actuator Dispatcher (AD): used by the FTCE to dispatch Fault Tolerance Actuators; it consists of an agent controlling all actuators on a local node.
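As an illustration of the sensor side, a trivial Monitoring Sensor could be a script like the following; the cache path and record format here are assumptions made for this sketch, not the WP4 interfaces:

#!/bin/sh
# Hypothetical monitoring sensor: measure one metric (1-minute load average)
# and append a timestamped record to a local cache file, from which a
# Monitoring Sensor Agent would forward it to the Measurement Repository.
# The path and record format are assumptions, not the WP4 specification.
METRIC=loadavg1
VALUE=$(cut -d' ' -f1 /proc/loadavg)
echo "$(date +%s) $(hostname) $METRIC $VALUE" >> /var/cache/monitoring/measurements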
Configuration Management diagram

[Configuration Management subsystem diagram: a central Configuration Database holding the High Level and Low Level Descriptions; on each client node a Configuration Cache Manager exposing an API to local processes]

– Configuration Database (CDB): stores the configuration information and manages modification and retrieval access.
– Configuration Cache Manager: downloads the node profiles from the CDB and stores them locally on the client node, where local processes access them through an API.
Configuration DataBase

Example of the translation from High Level to Low Level Description:
– High Level Description: "All computing nodes of CMS Farm #3 use cmsserver1 as Application Server".
– Low Level Description: on cmsserver1, /etc/exports exports /app to cmsnode1, cmsnode2, ...; on each computing node (cmsnode1, cmsnode2, cmsnode3, ...), /etc/fstab mounts cmsserver1:/app on /app via NFS.
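As an illustration, the node-specific files generated from that single high-level statement might look as follows; the export and mount options shown here are assumptions, not part of the example in the slide:

# /etc/exports on cmsserver1
/app  cmsnode1(rw) cmsnode2(rw) cmsnode3(rw)

# /etc/fstab on each cmsnodeN
cmsserver1:/app  /app  nfs  defaults  0 0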
Installation Management diagram

[Installation & Node Management subsystem diagram: on the fabric node, the Node Management Agent (NMA) with a local copy of the Software Packages (SP's); centrally, the Software Repository (SR), the Bootstrap Service (BS) with the system images, and the Administrative Scripting Layer applications; interfaces to the Configuration Management subsystem and, via the Actuator Dispatcher, to Monitoring & Fault Tolerance. Control flow: function calls; data flow: configuration, SP's, system images, monitoring]

– Software Repository (SR): central fabric store for Software Packages.
– Bootstrap Service (BS): service for the initial installation of nodes.
– Node Management Agent (NMA): manages installation, upgrade, removal and configuration of software packages on the node.
Distributed design
 Distributed design in the architecture, in order to ensure
scalability:
– individual nodes made as autonomous as possible
– local instances of almost every subsystem: operations
performed locally where possible
– central steering for control and collective operations
[Diagram: central Monitoring, Config DB and Software Repository services, each with local instances on the nodes (local Monitoring, local Config DB cache, local Repository cache)]
Scripting layer
 All subsystems are tied together using a high level
‘scripting layer’:
– allows administrators to code and automate complex
fabric-wide management operations
– coordination in the execution of user jobs and administrative tasks on the nodes
– scripts can be executed by Fault Tolerance subsystem
to automate corrective actions
 All subsystems provide APIs to control their components
 Subsystems keep their independence and internal
coherence: the scripting layer only aims at connecting
them for building high-level operations
Maintenance tasks
 Control function calls to NMA are known as ‘maintenance
tasks’
– non intrusive: can be executed without interfering with
user jobs (e.g. cleanup of log files)
– intrusive: for example kernel upgrades or node reboots
 Two basic node states from the administration point of view
– production: node is running user jobs or user services
(e.g. NFS server). Only non intrusive tasks can be
executed
– maintenance: no user jobs or services. Both intrusive
and non intrusive tasks can be executed
 Usually a node is put into maintenance status only when it
is idle, after draining the job queues or switching the
services to another node, but in exceptional cases the status
change can be forced immediately.
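To make the interplay between the scripting layer and maintenance tasks concrete, the sketch below walks one farm through an intrusive operation (a kernel upgrade). The commands fabric-nodes and nma-ctl are invented placeholders: the WP4 APIs were still being defined at this point, so this only illustrates the intended flow.

#!/bin/sh
# Hypothetical scripting-layer operation: kernel upgrade on a whole farm.
# 'fabric-nodes' and 'nma-ctl' are invented placeholders for the WP4 APIs.
for node in $(fabric-nodes --farm cms); do
    nma-ctl "$node" drain                  # let running user jobs finish
    nma-ctl "$node" set-state maintenance  # intrusive tasks now allowed
    nma-ctl "$node" run-task kernel-upgrade --reboot
    nma-ctl "$node" set-state production   # back to running user jobs
done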
Installation & Software Mgmt Prototype
 The current prototype is based on a software tool originally
developed by the Computer Science Department of
Edinburgh University: LCFG (Large Scale Linux
Configuration)
http://www.dcs.ed.ac.uk/home/paul/publications/ALS2000/
 Handles automated installation, configuration and
management of machines
 Basic features:
– automatic installation of O.S.
– installation/upgrade/removal of all (rpm-based) software
packages
– centralized configuration and management of machines
– extendible to configure and manage custom application
software
LCFG diagram

[LCFG architecture diagram: on the server, the LCFG config files are compiled ("Make XML Profile", mkxprof) into per-node XML profiles published by a web server; each client reads its profile ("Read Profile", rdxprof) via HTTP into a local cache, and the LCFG objects configure the node from it]

Abstract configuration parameters for all nodes are stored in a central repository. The server-side config files contain resources such as:

+inet.services          telnet login ftp sshd
+inet.allow             telnet login ftp
+inet.allow_telnet      ALLOWED_NETWORKS
+inet.allow_login       ALLOWED_NETWORKS
+inet.allow_ftp         ALLOWED_NETWORKS
+inet.allow_sshd        ALL
+inet.daemon_sshd       yes
+auth.users             mickey
+auth.userhome_mickey   /home/mickey
+auth.usershell_mickey  /bin/tcsh

These are compiled into the node's XML profile:

<inet>
  <allow cfg:template="allow_$ tag_$ daemon_$">
    <allow_RECORD cfg:name="telnet">
      <allow>192.168., 192.135.30.</allow>
    </allow_RECORD>
    .....
</inet>
<auth>
  <user_RECORD cfg:name="mickey">
    <userhome>/home/MickeyMouseHome</userhome>
    <usershell>/bin/tcsh</usershell>
  </user_RECORD>
  .....
</auth>

On the client node, a collection of agents (the LCFG objects) read the configuration parameters and either generate traditional config files (/etc/services, /etc/inetd.conf, /etc/hosts.allow, /etc/passwd, /etc/shadow, /etc/group, ...) or directly manipulate the various services. For example:

/etc/hosts.allow:
in.telnetd : 192.168., 192.135.30.
in.rlogind : 192.168., 192.135.30.
in.ftpd : 192.168., 192.135.30.
sshd : ALL

/etc/passwd:
mickey:x:999:20::/home/Mickey:/bin/tcsh
LCFG: future development

[Diagram comparing the current prototype with the future evolution. Current prototype: LCFG config files on the server, compiled into XML profiles and published by a web server; on the client, rdxprof loads the profile into the local cache and the LCFG objects configure the node. Future evolution: the LCFG config files are replaced by the WP4 Configuration Database, and the client-side cache handling by the Configuration Cache Manager with its API; the XML profiles over HTTP and the LCFG objects remain.]
LCFG configuration (I)
 Most of the configuration data are common for a category
of nodes (e.g. diskservers, computing nodes) and only a
few are node-specific (e.g. hostname, IP-address)
 Using the cpp preprocessor it is possible to build a
hierarchical structure of config files containing directives
like #define, #include, #ifdef, comments with /* */, etc...
 The configuration of a typical LCFG node looks like this:
#define HOSTNAME pc239   /* Host specific definitions */
#include "site.h"        /* Site specific definitions */
#include "linuxdef.h"    /* Common linux resources */
#include "client.h"      /* LCFG client specific resources */
LCFG configuration (II)
From "site.h"
#define LCFGSRV
#define URL_SERVER_CONFIG
#define LOCALDOMAIN
#define DEFAULT_NAMESERVERS
[...]
From "linuxdef.h"
update.interfaces
update.hostname_eth0
update.netmask_eth0
[...]
From "client.h"
update.disks
update.partitions_hda
update.pdetails_hda1
update.pdetails_hda2
auth.users
auth.usercomment_mickey
auth.userhome_mickey
[...]
M.Biasotto, CERN, 5 november 2001
grid01
http://grid01/lcfg
.lnl.infn.it
192.135.30.245
eth0
HOSTNAME
NETMASK
hda
hda1 hda2
free /
128 swap
mickey
Mickey Mouse
/home/Mickey
LCFG: configuration changes
 Server-side: when the config files are modified, a tool
(mkxprof) recreates the new XML profile for all the nodes
affected by the changes
– this can be done manually or with a daemon periodically
checking for config changes and calling mkxprof
– mkxprof can notify via UDP the nodes affected by the
changes
 Client-side: another tool (rdxprof) downloads the new
profile from the server
– usually activated by an LCFG object at boot
– can be configured to work as
 daemon periodically polling the server
 daemon waiting for notifications
 started by cron at predefined times (see the sketch below)
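As a concrete illustration of the cron-driven mode, the client crontab could contain an entry like the one below; the rdxprof path and argument-less invocation are assumptions made for this sketch, not the documented LCFG syntax:

# hypothetical crontab entry: refresh the node profile from the LCFG server every hour
0 * * * * /usr/sbin/rdxprof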
LCFG: what’s an object?
 It's a simple shell script (but in the future it will probably be a Perl script)
 Each object provides a number of “methods” (start, stop,
reconfig, query, ...) which are invoked at appropriate times
 A simple and typical object behaviour:
– Started by profile object when notified of a configuration
change
– Loads its configuration from the cache
– Configures the appropriate services, either translating
config parameters into a traditional config file or directly
controlling the service (e.g. starting a daemon with
command-line parameters derived from configuration).
LCFG: custom objects
 LCFG provides the objects to manage all the standard
services of a machine: inet, syslog, auth, nfs, cron, ...
 Admins can build new custom objects to configure and
manage their own applications:
– define your custom “resources” (configuration
parameters) to be added to the node profile
– include in your script the object "generic", which contains the definition of common functions used by all objects (config loading, log, output, ...)
– override the standard methods (start, stop, reconfig, ...) with your custom code
– for simple objects this usually takes just a few lines of code, as in the sketch below
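Below is a schematic sketch of what such a custom object might look like, written as a plain shell script. It is not actual LCFG code: the way real objects include the "generic" object and load their resources is only hinted at, and the resource names (myapp.port, myapp.datadir) are invented for the example.

#!/bin/sh
# Hypothetical custom LCFG object for an application "myapp".
# A real object would include the "generic" object and use its helper
# functions to load resources from the local profile cache; here the
# resources are simply shell variables with default values.
MYAPP_PORT=${myapp_port:-8080}
MYAPP_DATADIR=${myapp_datadir:-/var/lib/myapp}

start() {
    # translate the resources into a traditional config file...
    cat > /etc/myapp.conf <<EOF
port=$MYAPP_PORT
datadir=$MYAPP_DATADIR
EOF
    # ...and control the service directly
    /etc/rc.d/init.d/myapp start
}

stop() {
    /etc/rc.d/init.d/myapp stop
}

reconfig() {
    # invoked when the node is notified of a configuration change
    stop
    start
}

case "$1" in
    start|stop|reconfig) "$1" ;;
    *) echo "usage: $0 start|stop|reconfig" >&2; exit 1 ;;
esac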
LCFG: Software Packages Management
 Currently it is RedHat-specific: heavily dependent on the
RPM tool
 The software to install is defined in a file on the server
containing a list of RPM packages (currently not yet
merged in the XML profile)
 Whenever the list is modified, the required RPM packages
are automatically installed/upgraded/removed by a specific
LCFG object (updaterpms), which is started at boot or when
the node is notified of the change
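For illustration, such a list is simply an enumeration of RPM packages (name-version-release), one per line; the entries below are hypothetical examples, not the actual list used on the testbed:

kernel-2.2.19-6.2.1
openssh-server-2.9p2-7
lcfg-inet-1.0-1
[...]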
LCFG: node installation procedure

[Node installation diagram: a DHCP server, the LCFG server (LCFG config files, XML profiles, root image with the LCFG environment), a web server, an NFS server acting as Software Repository (software packages), and the client node being installed]

– First boot via floppy or via the network; DHCP provides the IP address, gateway and LCFG config URL.
– The root image with the LCFG installation environment is mounted via NFS.
– The initialization script loads a minimal configuration via HTTP and starts the "install" object: disk partitioning, network setup, ...
– The complete node configuration is loaded and the required packages are installed from the Software Repository.
– After reboot, the LCFG objects complete the node configuration.
LCFG: summary
 Pros:
– In Edinburgh it has been used for years in a complex
environment, managing hundreds of nodes
– Supports the complete installation and management of
all the software (both O.S. and applications)
– Extremely flexible and easy to customize
 Cons:
– Complex: steep learning curve
– Prototype: the evolution of this tool is not clear yet
– Lack of user-friendly tools for the creation and
management of configuration files: errors can be very
dangerous!
Future plans
 Future evolution not clearly defined: it will depend also on
results of forthcoming tests (1st Datagrid milestone)
 Integration of current prototype with Configuration
Management components
– Config Cache Manager and API released as prototypes
but not yet integrated with LCFG
 Configuration DataBase
– complete definition of node profiles
– user-friendly tools to access and modify config
information
 Development of still missing objects
– system services (AFS, PAM, ...)
– fabric software (grid sw, globus, batch systems, ...)
– application software (CMS, Atlas, ...) in collaboration
with people from experiments