Deploying Data Centers with Puppet – A User's Perspective
May 27, 2010
© NYSE Euronext. All Rights Reserved.

Objective of this Presentation

Share with the Puppet community the solution arrived at by NYSE Euronext's System Architecture and Engineering (SAE) team to address the technical challenges imposed by the company's business strategy:

• Consolidate various data centers across the globe into two brand-new, state-of-the-art facilities, one in the US and one in Europe;
• Adopt a global standard process to build and configure servers and applications across multiple countries and teams;
• Quickly deploy changes at large scale in an automated way.

Main Tools Used in the Solution

• RHEL Kickstart
  – A single, very frugal kickstart profile for each RHEL release for the entire company.
• RH Network Satellite
  – Manages packages for the OS, third parties and home-grown RPMs.
  – Manages patches and upgrades.
• Puppet
  – The workhorse of this model.
  – A common framework to apply the global standard server configuration as well as the particularities of each environment and application.

The Three Layer Approach

We have organized our configuration needs into three levels:

Base: All NYX global customizations on top of the RHEL default settings. These must be propagated across the enterprise. Example: the global standard size of the /usr file system, or the kernel parameter kernel.core_pattern = /var/crash/core.%e.%p.%h.

Zone (or environment, or network): Configuration items common to a specific network or environment. The Zone layer inherits the Base layer, with the ability to override anything if necessary. Example: the /etc/resolv.conf file across one given production network.

Application: Any specific configuration required by a given application. This layer inherits the Zone layer and can change anything set by the previous two layers. Example: the UTP application requires a 50GB /appl file system and a "qt" package installed.

The Puppet Modules behind This Approach

[Diagram of the module layout.]

An Example of the modules-base Manifests

• The NYX Global Build customizes 250+ items on top of the RHEL default OS.
• All servers include the entire modules-base via the module name 'base'.
• Example:

class base::setup {
    include ssh::setup
    include banners::setup
    include resolver::setup
    include ntp::setup
    include sysctl::setup
    include profile::setup
    include services::setup
    include locale::setup
    include email::setup
    include yum::setup
    include crashdump::setup
    include bootloader::setup
    include rclocal::setup
    include sudo::setup
    include hardware::setup
    include tcpwrappers::setup
    include firewall::setup
    include etc_services::setup
}

An Example of the modules-base Manifests (cont.)

class sysctl::setup {
    # Configure kernel core dumps
    sysctl { "kernel.core_pattern":
        val    => "/var/crash/core.%e.%p.%h",
        ensure => present,
    }
}

class profile::setup {
    # Deploy the custom /etc/profile
    file { "/etc/profile":
        owner  => root,
        group  => root,
        mode   => 0644,
        source => "puppet:///profile/etc_profile",
    }
}
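Several of the base classes carry no values of their own: they render configuration files from variables that the including zone class is expected to set, such as $domain, $searchpath and $nameservers in the zone example below. A minimal sketch of what resolver::setup could look like, assuming a hypothetical resolver/resolv.conf.erb template and relying on the dynamic variable scoping of the Puppet releases of that era:

class resolver::setup {
    # Render /etc/resolv.conf from zone-provided variables via a
    # hypothetical ERB template that expands $domain, $searchpath
    # and $nameservers picked up from the including zone's scope.
    file { "/etc/resolv.conf":
        owner   => root,
        group   => root,
        mode    => 0644,
        content => template("resolver/resolv.conf.erb"),
    }
}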
An Example of the modules-zones Manifests

The organization has 20+ zones (and growing) across the globe, covering production, QA and development networks, each with its own settings.

Example of a zone setting:

class trading::setup {
    $country             = 'us'
    $searchpath          = 'nyx.com'
    $domain              = 'nyx.com'
    $nameservers         = [ "10.0.X.X", "10.0.X.X" ]
    $timeservers         = [ "10.0.X.X version 3 prefer", "10.0.X.X" ]
    $snmp_ro_communities = [ "dfasdfasdfasdf" ]
    $snmp_rw_communities = [ "sdfasdfasdfasd 127.0.0.1" ]
    $snmp_dlmods         = [ "cmaX /usr/lib64/libcmaX64.so" ]
    $trapsink            = [ "sdfasdfasd" ]
    $syslogserver        = "10.0.X.X"
    $postfix_mailhost    = "mailrelay.nyx.com"
    $ilo_licensed        = "true"
    $ilo_password        = "my_password"

    # Include the base classes
    include base::setup
}

An Example of the modules-app Manifests

• 60+ applications require their own settings.
• We try to write each application's puppet manifest so that it is pluggable into any zone.

Example of one application manifest:

class sg::setup {
    package { ["libxslt", "libmng"]:
        ensure => present,
    }

    lvm::config_lvm { "tcpdump":
        mount_point => "/tcpdump",
        lvname      => "ltcpdump",
        vgname      => "vg00",
        lvsize      => "5G",
        fstype      => "ext3",
        owner       => "myuser",
        group       => "users",
        mode        => "0755",
    }
}

How the Three Layers Work Together

• The zone class usually sets variables that the modules-base will use for configuration.
• The zone class necessarily includes the modules-base.
• All servers must belong to one zone.
• The application module is coded to be pluggable into any zone (e.g. the match engine application module can live in a QA environment or a production environment).
• This gives us the ability to build anything, anywhere, and to mimic a production config in a QA environment and vice versa.
• The modules-app can override settings inherited from modules-zones, and the modules-zones can override settings inherited from modules-base (see the sketch below).
• Using puppet, we keep the build server process synchronized with the latest requirements of the applications.
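The overriding between layers leans on Puppet's class inheritance: a subclass may replace attributes of resources declared by the class it inherits. A minimal sketch of the mechanism; the zone subclass name and the overriding value are hypothetical, while sysctl::setup and kernel.core_pattern come from the base manifest shown earlier:

# Hypothetical zone-level class overriding a kernel parameter
# that the base layer (sysctl::setup) already manages.
class sysctl::trading inherits sysctl::setup {
    Sysctl["kernel.core_pattern"] {
        val => "/local/crash/core.%e.%p.%h",
    }
}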
The Last Piece: the Node Definition File

The node definition is the entity that matches the hostname to the zone and application modules. In addition, it holds the networking configuration of the server.

node "buildtest.nyx.com" {
    # Zone class
    include corplan::setup

    # Application class
    include sg::setup

    # Networking
    network::interface::setup { "bond0":
        ensure          => present,
        config_type     => bonded,
        ip              => "10.0.0.100",
        netmask         => "255.255.255.0",
        slave0          => "eth0",
        slave1          => "eth1",
        gateway         => "10.0.0.254",
        arp_target      => "10.0.0.254",
        arp_interval    => "3000",
        restart_network => false,
    }
}

NYX Build Server Components

• DHCP/DNS server (optional)
• TFTP/PXE server
• Satellite server or proxy
• Apache server
• Puppet server
• MySQL server
• The build server resides on a separate network, called the build network, since we do not build any server on production networks.

Automated Installation Process – Overview

PreBoot Phase (PXE boot for the 1st time)
The server comes out of the box. The default kickstart label executes a perl script via "wget" to register the serial number and the MAC address, then reboots the server. On the build server, the register script looks up the serial number in the MySQL database and generates a customized PXE file. The custom PXE file carries the hostname and the OS version (RHEL4 or RHEL5).

Phase 0 (PXE boot for the 2nd time)
Starts the RHEL installation with the minimum set of packages and the global standard file system layout and sizes. The kickstart post-install script registers the new server on the satellite server and assigns it to the appropriate channels. The puppet client is installed just before the server is rebooted. The server comes back from its first reboot and the process continues.

Automated Installation Process – Overview (cont.)

Phase 1 (lean OS loaded from disk + init script)
A node definition is mandatory at this point; the puppet client configures the server to the NYX global base (default) configuration and networking. An init.d script runs puppet for the first time, and puppet downloads its own configuration from the puppet server. The server then reboots for the third time.

Phase 2 (Phase 2 init script)
The server might reboot or execute the Phase 2 init.d script directly. Phase 2 is where puppet is executed for the last time, configuring the server to its final zone (environment) and application settings. The Phase 2 run can be triggered directly from the Phase 1 init script (in which case it executes on the build network), or it can be executed once the server is on the production network.

PreBoot Phase

• A very simple kickstart profile launched by the default PXE label.
• It does nothing but execute a script via wget.
• It registers the server with two parameters: the serial number and the MAC address.
• The script creates a custom PXE boot file for the given MAC that contains the RHEL release and the hostname of the server.
• Example of the kickstart pre-script:

%pre
SERVER=`grep 'url address' /tmp/anaconda.log | awk '{print $6}'`
cd /tmp && wget -q http://${SERVER}/pub/dmidecode
chmod +x /tmp/dmidecode
SN=`/tmp/dmidecode | grep "Serial Number" | head -1 | awk '{print $3}'`
MAC=`ifconfig | grep HWaddr | awk '{print $5}'`
wget -q http://${SERVER}/cgi-bin/register.pl?mac=$MAC\&sn=$SN
reboot

Phase 0

• A single kickstart profile for each RHEL release for the entire organization.
• A very small number of packages installed.
• Larger OS file system sizes than the RHEL defaults.
• The server is registered to the Satellite.
• The puppet client is installed (puppetd).
• nyx-firstboot is set to run on the next reboot via init.d.

Phase 0 – Fragment of the Kickstart Profile

%post --interpreter /bin/sh
echo "Installing puppet..."
yum -y install puppet facter ruby-rdoc ruby-shadow

# Puppet is installed via the activation key, but by default it would run as a daemon
echo "Chkconfig'ing puppet off..."
chkconfig puppet off

# Grab the nyx-firstboot script from the satellite or proxy
RHN_SERVER=`grep '^serverURL=' /etc/sysconfig/rhn/up2date | cut -d'/' -f3`
echo "Detected Puppetmaster/RHN as: $RHN_SERVER"
echo "Downloading nyx-firstboot script..."
wget -O /etc/init.d/nyx-firstboot http://$RHN_SERVER/pub/glb/nyx-firstboot

# Make it executable and make it start on first boot
echo "Installing nyx-firstboot..."
chmod 755 /etc/init.d/nyx-firstboot
chkconfig --add nyx-firstboot

Phase 1

• Run by the nyx-firstboot init script.
• nyx-firstboot executes puppetd twice: once for puppet to configure itself, and once to bring the server to the base config.
• The nyx-firstboot command lines that execute puppetd:

# Get the puppet server address, the same host as the RHN Proxy/RHN Satellite
PUPPET_SERVER=`grep '^serverURL=' /etc/sysconfig/rhn/up2date | cut -d'/' -f3`

# Run puppet once to configure itself and get its SSL cert auto-signed
puppetd -tov --server=$PUPPET_SERVER --tags utilities::puppetclientbootstrap --no-noop

# Run puppet again to apply all base build manifests
FACTER_sys_building=true puppetd -tov --server=$PUPPET_SERVER --tags sys_building --no-noop
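Both runs above are scoped: --tags makes puppetd apply only the resources carrying the given tag, while the FACTER_sys_building=true prefix surfaces a sys_building fact that the manifests can test. A minimal sketch of how the base manifests might opt in to the tagged build run (hypothetical; the real base::setup was listed earlier):

class base::setup {
    # Tag the class: "puppetd --tags sys_building" then applies this
    # class and everything declared inside it, skipping the rest of
    # the catalog on that run.
    tag("sys_building")

    include ssh::setup
    include sysctl::setup
    # ...the remaining base classes listed earlier...
}

Day-to-day puppetd runs without the --tags option remain unaffected, which is what lets the same manifests serve both the build process and routine operation.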
Phase 2 (Last Phase)

The nyx-phase2 script runs puppetd once more; this time puppetd fully executes the puppet manifests, configuring the server entirely for its zone and application. nyx-phase2 is set to execute automatically from init.d, or is triggered directly by nyx-firstboot.

Fragment of the script:

# Run puppet to apply the application-specific manifests
puppetd -tov --no-noop | /usr/bin/tee /tmp/puppet2b

if egrep -q "err:|warning:" /tmp/puppet*; then
    logger -t nyx-phase2 "Found excessive errors on puppet logs!"
else
    # Shut down the server - it is ready to be shipped to the rack
    logger -t nyx-phase2 "Shutting down the system! No errors found on the puppet logs!"
    /usr/bin/poweroff
fi

Data Center Deployment: Numbers and Lessons

• One server gets completely installed in 20 minutes, unattended.
• We usually install up to 15 servers in parallel, but first we had to move the puppet master to run under the Apache web server; the default built-in web server proved not to be very powerful.
• Usually, one individual can install up to 50 servers by himself in one day.
• The most difficult part was gathering the requirements of each application.

Ancillary Tools for This Project

• Subversion (extremely important):
  – We had to disseminate the culture of version control among many individuals and teams.
  – Each layer of the modules, and the node definitions, has its own subversion repository.
• Web Subversion
• OpenGrok (to search the manifests)
• Redmine (ticket and change control)
• Tidal Scheduler – to execute the puppet client once the servers are on the production network.

Next Steps

• Run puppetd to deploy changes across the board on a day-by-day basis. This will require more of a cultural change and coordination across more departments.
• Run puppetd daily in report mode (non-intrusive) to detect discrepancies between reality and the puppet manifests.
• Use Puppet Dashboard as a central database of node definitions (an External Node Classifier).
• Create more custom facts and types in ruby (to reduce the number of execs inside the manifests).
• Use external resources to automate other configuration, such as Veritas Cluster.

Thank you!
Q&A