Using OpenStack and Puppet to deliver IaaS at CERN.

Ben Jones
ben.dylan.jones@cern.ch
12/9/2013
NEC'2013
Agile Infrastructure
• Why change the operating model?
  • Twice the compute, same staff levels
  • New DC at Wigner, Budapest
• “We’re not special”
  • Existence of open source tool chain: OpenStack, Puppet, Foreman, Kibana
  • “Coffee time” provisioning of cloud servers
New Data Centre
• Data centre in Geneva at the limit of electrical capacity at 3.5 MW
• New centre chosen in Budapest, Hungary
• Additional 2.7 MW of usable power
• Local on-site support for hardware maintenance and installations
What is Cloud?
• Technology model
  • virtualization of compute, network, storage
• Operational model
  • run your services in a certain way
• Consumption model
  • “don’t make me talk to IT”
  • delivered instantly* over the wire, variable price
What is IaaS?
Private Cloud Software
• We use OpenStack, an open source cloud project http://openstack.org
• ATLAS and CMS High Level Trigger clouds
• HEP clouds at BNL, IN2P3, NeCTAR, FutureGrid, …
• Clouds at HP, IBM, Rackspace, eBay, PayPal, Yahoo!, Comcast, Bloomberg, Fidelity, NSA, CloudWatt, Numergy, Intel, Cisco …
OpenStack
Open Source
• Apache 2.0 licensed
• No “enterprise” version
Open Design
• Open design summit
• Anyone is able to define core architecture
Open Development
• GitHub
• Launchpad
Open Community
• OpenStack Foundation formed in 2012
• Now 190+ companies, 3000+ developers, 11000+ members
[Architecture diagram: OpenStack Nova (Compute, Network, Scheduler), Keystone, Glance, Horizon and Cinder, connected to CERN services: CERN Network Database, account management system, Microsoft Active Directory, block storage provider, CERN DB on Demand]
Nova
• Cloud computing fabric controller
• Network manager modified for CERN
  • integration with network database
  • specific to our use case, not pushed upstream
• Nova Compute aware of CERN DNS & AD
• Multiple availability zones
  • special zone for Hyper-V
  • scheduler has filter based on image distribution metadata
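The slide does not show CERN's scheduler filter. As a rough, hypothetical sketch of the idea, the standalone Python below keeps only the hosts in the availability zone that matches an image's OS distribution metadata. The property name `os_distro`, the zone names and the `Host` class are assumptions for illustration; real Nova filters plug into the scheduler's own filter framework instead.

```python
from dataclasses import dataclass

# Hypothetical mapping from an image's OS distribution metadata to the
# availability zone that should run it; names are illustrative only.
ZONE_FOR_DISTRO = {"windows": "hyperv-zone"}
DEFAULT_ZONE = "kvm-zone"


@dataclass
class Host:
    name: str
    availability_zone: str


def host_passes(host: Host, image_properties: dict) -> bool:
    """True if the host sits in the zone chosen for this image's distribution."""
    distro = str(image_properties.get("os_distro", "")).lower()
    wanted_zone = ZONE_FOR_DISTRO.get(distro, DEFAULT_ZONE)
    return host.availability_zone == wanted_zone


def filter_hosts(hosts, image_properties):
    """Keep the hosts that pass the filter, preserving their order."""
    return [h for h in hosts if host_passes(h, image_properties)]


if __name__ == "__main__":
    hosts = [Host("hv001", "kvm-zone"), Host("hv002", "hyperv-zone")]
    print([h.name for h in filter_hosts(hosts, {"os_distro": "Windows"})])
    # prints ['hv002']
```

Images without the property fall back to the default zone, so a missing tag never makes an image unschedulable in this sketch.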
Glance
• Services for discovering, registering and retrieving VM images
• Aim for automated image creation / update
  • Images for all CERN-supported OSes
    • common process for Linux & Windows images
    • common tools: Aeolus Oz
  • CERN tools to hook up Oz & Glance API
  • user-defined images supported
• Initial contextualization via cloud-init
  • Cloudbase contributed cloud-init for Windows
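A minimal sketch of the user data that cloud-init consumes for contextualization, assuming the image runs cloud-init with its standard `#cloud-config` format. It writes JSON after the header, relying on JSON being valid YAML so that no YAML library is needed; the hostname, package and command are hypothetical examples, not CERN's actual contextualization.

```python
import json


def build_cloud_config(hostname: str, packages: list, commands: list) -> str:
    """Return cloud-init user data as a #cloud-config document.

    JSON is emitted as YAML flow syntax, so only the standard library is needed.
    """
    config = {
        "hostname": hostname,  # cloud-init sets the instance hostname
        "packages": packages,  # packages installed on first boot
        "runcmd": commands,    # commands run once per instance, late in first boot
    }
    # cloud-init recognises cloud-config user data by this first line.
    return "#cloud-config\n" + json.dumps(config, indent=2) + "\n"


if __name__ == "__main__":
    print(build_cloud_config("vm0042", ["puppet"], ["puppet agent --test"]))
```

The resulting string would be passed as the instance's user data at boot time.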
Keystone
• Identity service: authentication, authorization and service catalog
• Full integration with Active Directory via LDAP
  • CERN’s AD: 44K users & 29K groups
  • Minimal changes to AD
  • CERN submitting changes upstream
• Account management system integration for project creation / deletion
• SSL for everything
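The slide does not describe how the account management system drives project creation and deletion. A hypothetical sketch of the reconciliation step such an integration needs: compare the projects the account system says should exist with those in Keystone, and plan creations and deletions, never deleting a protected set. All names here are assumptions.

```python
# Hypothetical sketch: how CERN's account management system talks to
# Keystone is not described on the slide.
PROTECTED = frozenset({"admin", "service"})  # projects the sync must never delete


def plan_project_sync(requested, existing, protected=PROTECTED):
    """Compare the account system's projects with Keystone's.

    Returns (to_create, to_delete) as sorted lists of project names.
    """
    requested, existing = set(requested), set(existing)
    to_create = sorted(requested - existing)
    to_delete = sorted((existing - requested) - set(protected))
    return to_create, to_delete


if __name__ == "__main__":
    print(plan_project_sync(["atlas", "cms", "lhcb"], ["cms", "alice", "admin"]))
    # prints (['atlas', 'lhcb'], ['alice'])
```

Separating planning from execution makes it easy to review a sync before any project is actually removed.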
Operational practices evolving
• Security incidents
  • old: reinstall; new: replace with a new VM
• Misconfiguration requiring reboot
• Resize a service
  • lxplus.cern.ch: add VMs to serve demand
  • resize VMs (or rather, replace them with bigger ones)
  • in future, resize services automatically
Service Models
• Pets are given names like pussinboots.cern.ch
• They are unique, lovingly hand raised and cared for
• When they get ill, you nurse them back to health
• Cattle are given numbers like vm0042.cern.ch
• They are almost identical to other cattle
• When they get ill, you get another one
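A toy model of the cattle approach, written as a hypothetical sketch: VMs get sequential names in the style of vm0042.cern.ch, and an ill VM is swapped for a freshly numbered one rather than repaired. The class and method names are illustrative, not part of any CERN tool.

```python
import itertools


class CattleHerd:
    """Toy model of 'cattle' VMs: numbered, interchangeable, replaced when ill."""

    def __init__(self, size: int, domain: str = "cern.ch"):
        self._ids = itertools.count(1)  # never reuse a number
        self.domain = domain
        self.members = [self._new_name() for _ in range(size)]

    def _new_name(self) -> str:
        return f"vm{next(self._ids):04d}.{self.domain}"

    def replace_unhealthy(self, unhealthy: set) -> list:
        """Swap every unhealthy VM for a newly created one; return the new names."""
        replacements = []
        for i, name in enumerate(self.members):
            if name in unhealthy:
                new = self._new_name()  # no nursing: just get another one
                self.members[i] = new
                replacements.append(new)
        return replacements


if __name__ == "__main__":
    herd = CattleHerd(3)
    print(herd.replace_unhealthy({"vm0002.cern.ch"}))  # prints ['vm0004.cern.ch']
```

The herd keeps its size; only the identities change, which is exactly what makes cattle cheap to operate.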
Some other use cases…
• Hippos are cattle with block storage. Useful where there is redundancy, e.g. MongoDB, Cassandra.
• Canaries are cattle at high risk, giving early warning of failures. Fail fast and fix.
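One way to use canaries, sketched under assumptions: a change is applied to the canary VMs first, and the rest of the herd is only touched if every canary succeeds. The `apply_change` callback and host names are hypothetical stand-ins for whatever actually deploys the change.

```python
from typing import Callable, Iterable, List, Tuple


def staged_rollout(canaries: Iterable[str], herd: Iterable[str],
                   apply_change: Callable[[str], bool]) -> Tuple[bool, List[str]]:
    """Apply a change to canaries first; continue to the herd only if all succeed.

    `apply_change` returns True on success. Returns (ok, hosts_attempted).
    """
    attempted = []
    for host in list(canaries) + list(herd):
        attempted.append(host)
        if not apply_change(host):
            return False, attempted  # fail fast: later hosts are left untouched
    return True, attempted


if __name__ == "__main__":
    print(staged_rollout(["canary1"], ["vm0001", "vm0002"], lambda h: h != "canary1"))
    # prints (False, ['canary1'])
```

Because canaries come first in the order, a bad change stops before it reaches the ordinary cattle.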
Heat
• Heat orchestrates composite cloud apps (stacks)
• HA (restarts resources) & “auto-scaling”
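Heat's own auto-scaling is driven by alarms and scaling policies declared in templates; the sketch below is not Heat's algorithm but a generic, hypothetical proportional rule showing the kind of decision an auto-scaling group makes: scale the VM count by observed load over target load, clamped to the group's limits.

```python
import math


def desired_capacity(current: int, avg_load_pct: int, target_load_pct: int,
                     min_size: int, max_size: int) -> int:
    """Proportional scaling rule clamped to the group's size limits.

    Example: 4 VMs at 75% average load with a 50% target -> ceil(4 * 75 / 50) = 6.
    """
    if current <= 0 or target_load_pct <= 0:
        raise ValueError("current and target_load_pct must be positive")
    wanted = math.ceil(current * avg_load_pct / target_load_pct)
    return max(min_size, min(max_size, wanted))


if __name__ == "__main__":
    print(desired_capacity(4, 75, 50, min_size=2, max_size=10))  # prints 6
```

Rounding up favours capacity over cost; the clamp keeps a noisy metric from shrinking a service to nothing or growing it without bound.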
Configuration Management
• Adopted Puppet
  • widely used, large community, scales
• Needed to make reproducible services in the CERN CC
• Simplify the configuration of OpenStack itself
  • community modules from Red Hat, Puppet Labs, users
Accounting
• CERN computing is funded from CERN central budgets: no billing, but quotas
• What to do when quota is exceeded?
• Unused capacity?
  • low-SLA usage to plug the gaps?
• Fair share across the cloud?
  • Experiments don’t have credit cards
  • Worked for supercomputers but heavy for clouds at scale
• Bursting to public clouds?
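Nova enforces per-project quotas on resources such as instances, cores and RAM. The hypothetical sketch below shows only the admission check that decides whether a request fits; what to do when it does not is the open policy question on the slide. The `Usage` record and its field names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    instances: int
    cores: int
    ram_mb: int


def exceeded(quota: Usage, in_use: Usage, request: Usage) -> list:
    """Return the names of the quota limits that the request would exceed."""
    over = []
    for field in ("instances", "cores", "ram_mb"):
        if getattr(in_use, field) + getattr(request, field) > getattr(quota, field):
            over.append(field)
    return over


if __name__ == "__main__":
    quota, in_use = Usage(10, 20, 40960), Usage(8, 16, 32768)
    print(exceeded(quota, in_use, Usage(1, 8, 16384)))  # prints ['cores', 'ram_mb']
```

Reporting every exceeded limit, rather than stopping at the first, tells a project exactly which quota it would need raised.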
Ceilometer
• Accounting for OpenStack by project
• Collects statistics from each compute node
  • common OpenStack message bus
• Sharded MongoDB store
  • 2 GB / day
• Hyper-V support in Havana
• Cinder statistics upcoming
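Per-project accounting boils down to aggregating samples by project. The sketch below is a hypothetical illustration, not Ceilometer code: it approximates vCPU-hours per project from periodic `vcpus` gauge samples, assuming each sample covers one polling interval. The sample keys are loosely modelled on Ceilometer samples.

```python
from collections import defaultdict


def vcpu_hours_by_project(samples, interval_hours=1.0):
    """Approximate vCPU-hours per project from periodic gauge samples.

    Assumes each sample reports the vCPUs of one instance for one polling
    interval of `interval_hours`; other meters are ignored.
    """
    totals = defaultdict(float)
    for s in samples:
        if s["counter_name"] == "vcpus":
            totals[s["project_id"]] += s["counter_volume"] * interval_hours
    return dict(totals)


if __name__ == "__main__":
    samples = [
        {"project_id": "atlas", "counter_name": "vcpus", "counter_volume": 4},
        {"project_id": "cms", "counter_name": "vcpus", "counter_volume": 2},
    ]
    print(vcpu_hours_by_project(samples))  # prints {'atlas': 4.0, 'cms': 2.0}
```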
CERN Status
• CERN IT OpenStack Cloud
  • Folsom-based service: ~500 hypervisors on KVM and Hyper-V
  • New Grizzly production service opened in late July
    • High availability components using load balancing, i.e. 3 Nova controllers per cell
    • OpenStack configuration entirely managed by Puppet
    • 280 hypervisors, 600 VMs, 50 projects and growing rapidly
• LHC experiment farms
  • CMS currently running 1,300 hypervisors with 50,000 cores
  • ATLAS starting to ramp up to a similar size
• Other science grid sites moving to private cloud on OpenStack
  • Brookhaven, IN2P3, FutureGrid, NeCTAR, IHEP, …
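The slide does not say which load balancer fronts the redundant Nova controllers. As a hypothetical sketch of the technique, round-robin with health marking spreads requests over the three controllers of a cell and skips any that are marked down; the controller names are illustrative.

```python
import itertools


class RoundRobinBalancer:
    """Spread requests over redundant controllers, skipping unhealthy ones."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def next_backend(self):
        # One full turn of the cycle is enough to find a healthy backend.
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy controller available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["ctrl1", "ctrl2", "ctrl3"])
    lb.mark_down("ctrl2")
    print([lb.next_backend() for _ in range(3)])  # prints ['ctrl1', 'ctrl3', 'ctrl1']
```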
Outlook
• Track stable Grizzly releases in Red Hat RDO
  • up to date but not too close to the leading edge
• Scaling
  • Expect 15,000 hypervisors, 150,000 VMs by 2015
• Manageability
  • Metering, Orchestration with Heat, Bare Metal
• Functionality
  • Load Balancing, High Availability Storage and Pets
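The scaling target implies an average density worth making explicit. A small calculation using only figures quoted on these slides: 150,000 VMs on 15,000 hypervisors is ten VMs per hypervisor on average, while the CMS farm's 50,000 cores on 1,300 hypervisors is roughly 38 to 39 cores per hypervisor. The helper name is hypothetical.

```python
def per_host(total: int, hosts: int) -> float:
    """Average quantity per hypervisor."""
    if hosts <= 0:
        raise ValueError("hosts must be positive")
    return total / hosts


if __name__ == "__main__":
    # Figures quoted on the slides.
    print(per_host(150_000, 15_000))          # 2015 target, VMs per hypervisor: 10.0
    print(round(per_host(50_000, 1_300), 1))  # CMS farm, cores per hypervisor: 38.5
```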
What have we learnt?
• Automate everything from the beginning
• Constant rate of change requires a different approach
  • Focus on core technologies and keep up to date
  • Track new projects but don’t adopt too early unless strategic
• Many of our users are cloud-aware
• Puppet and Stackforge are a great help
• Distributions and appliances make getting started much easier
• Culture changes for legacy application coding and IT services
• Communities are major motivators
  • But administrators need to engage and adapt rather than reinvent
Conclusions
• CERN IT is re-engineering to deliver additional capacity to 11,000 physicists within fixed resources
• Cloud models can simplify current large-scale computing infrastructure
• OpenStack and its ecosystem allow us to meet this challenge and help others through open source
Questions?
Preproduction Service
[Tool chain diagram: mcollective, yum; Bamboo; Puppet; AIMS/PXE; Foreman; JIRA; OpenStack Nova; git; Koji, Mock; Yum repo (Pulp); Active Directory / LDAP; Hardware database; Lemon / Hadoop / LogStash / Kibana; PuppetDB]
Training for Newcomers
Buy the book rather than guru mentoring
Job Opportunities