Intel® Virtual Storage Manager (VSM)
Current Product Overview and Community 1.0 Roadmap
Dan Ferber, Yaguang Wang
01 November 2014

Table of Contents
• VSM Architecture Overview
• VSM Current Features
• VSM Roadmap and Planned Features

Intel® Virtual Storage Manager Overview
[Architecture diagram showing: the Internet and data center firewall; operator access over the web and SSH; OpenStack client servers and Nova controller server(s); the VSM controller (controller node server) with its HTTP API; and a VSM agent (socket connection) on each Ceph cluster server (OSDs/monitor/MDS).]
• Creates, manages, and monitors Ceph clusters
• Web-based UI
  • Administrator-friendly interface for management and monitoring
• Configuration management
  • Storage Groups aggregate drives with similar performance
  • Zones aggregate drives within failure domains
• Capacity management
  • Pools segregate allocation by application
  • Capacity utilization by storage group and pool
• Cluster management
  • Manage capacity growth
  • Manage server & disk maintenance
• Cluster monitoring
  • Capacity and performance
  • OSD, PG, and monitor state
• VSM RESTful API
  • Software interface supports automation
Management framework = consistent configuration and an operator-friendly interface for management & monitoring

Typical VSM-Managed Cluster
[Network diagram showing: four client nodes running RADOS clients; five server nodes, each with a VSM agent, monitor and/or OSD daemons, and SSDs; the VSM controller; an OpenStack admin node; and the Ceph public, Ceph cluster, administration, and OpenStack-administered networks.]
• VSM controller – dedicated server or server instance
• Server nodes
  • Are members of the VSM-managed Ceph cluster
  • May host storage, a monitor, or both
  • A VSM agent runs on every server in the VSM-managed cluster
  • Servers may contain SSDs for journals, storage, or both
• Network configuration
  • Ceph public subnet (10GbE or InfiniBand) – carries data traffic between clients and Ceph cluster servers
  • Administration subnet (GbE) – carries administrative communications between the VSM controller and agents, as well as administrative traffic between Ceph daemons
  • Ceph cluster subnet (10GbE or InfiniBand) – carries data traffic between Ceph storage nodes for replication and rebalancing
• OpenStack admin (optional)
  • One or more OpenStack servers managing OpenStack assets (clients, client networking, etc.)
  • Independent OpenStack-administered network – not managed by or connected to VSM
  • Optionally connected to VSM via an SSH connection
  • Allows VSM to “tell” OpenStack about Ceph storage pools

Managing Servers and Disks
• Servers can host more than one type of drive
• Drives with similar performance characteristics are identified by Storage Class. Examples:
  • 7200_RPM_HDD
  • 10K_RPM_HDD
  • 15K_RPM_HDD
• Drives with the same Storage Class are grouped together in Storage Groups
• Storage Groups are paired with specific Storage Classes. Examples:
  • Capacity = 7200_RPM_HDD
  • Performance = 10K_RPM_HDD
  • High Performance = 15K_RPM_HDD
• VSM monitors Storage Group capacity utilization and warns on “near full” and “full”
• Storage Classes and Storage Groups are defined in the cluster manifest file
• Drives are identified by Storage Class in the server manifest file
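To make the manifest relationship concrete, the sketch below shows how storage classes, storage groups, and per-drive class assignments could be declared in the cluster and server manifest files. The section names, "friendly name" column, and device paths are illustrative assumptions only, not the exact VSM manifest syntax; consult the VSM documentation for the authoritative format.

    # cluster.manifest (illustrative sketch; exact VSM syntax may differ)
    [storage_class]
    7200_rpm_hdd
    10k_rpm_hdd
    15k_rpm_hdd

    [storage_group]
    # group name          friendly name            storage class
    capacity              Capacity_HDD             7200_rpm_hdd
    performance           Performance_HDD          10k_rpm_hdd
    high_performance      High_Performance_HDD     15k_rpm_hdd

    # server.manifest (illustrative sketch; exact VSM syntax may differ)
    [role]
    storage
    monitor

    [7200_rpm_hdd]
    # data device                              journal device
    /dev/disk/by-path/pci-0000:00:1f.2-ata-1   /dev/disk/by-path/pci-0000:00:1f.2-ata-5-part1
    /dev/disk/by-path/pci-0000:00:1f.2-ata-2   /dev/disk/by-path/pci-0000:00:1f.2-ata-5-part2

The key idea carried by both files is the same as in the bullets above: the cluster manifest names the storage classes and maps each storage group to one class, while each server manifest lists that server's drives under the storage class they belong to.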
Managing Failure Domains
• Servers can be grouped into failure domains. In VSM, failure domains are represented by zones.
• Zones are placed under each Storage Group
• Drives in each zone are placed in their respective storage group
• In the example, six servers are placed in three different zones. VSM creates three zones under each storage group and places the drives in their respective storage groups and zones.
• Zones are defined in the cluster manifest file
• Zone membership is defined in the server manifest file

VSM 0.5.9 Key Features, prior to open source
• Based on Firefly 0.80.1 and CentOS 6.5
• VSM must create the Ceph cluster it manages
Managing Capacity
• Creating storage pools
• Increasing the capacity of a storage pool
• Monitoring pool utilization and performance
• Creating the initial Ceph cluster
Managing the Ceph Cluster
• Adding and removing monitors
• Adding a Ceph server to the cluster
• Bringing a Ceph server down for maintenance
• Bringing a Ceph server up after maintenance
• Replacing failed disks
• Replacing failed servers
• Monitoring cluster health
• Removing a Ceph server from the cluster
• Troubleshooting the Ceph cluster
• Managing failure domains
Managing Orchestration
• Presenting storage pools to block storage (OpenStack Cinder)
Managing VSM
• Controlling access to VSM
Monitoring the Cluster
• Dashboard
• Capacity & performance: Storage Groups and Pools
• Cluster health: OSDs, PGs, and Monitors

VSM 0.6 engineering build, prior to open source
Features
• Based on Firefly 0.80.5 and CentOS 6.5 basic server
• Enhanced format for uptime on the main console display
• Zone management
• Add and populate storage groups after Ceph cluster creation
• Intel OEM partners can define their own storage class names
  • Storage class names are no longer pre-defined
  • Flexible storage class naming
• Backup of the VSM controller and restore in the event of VSM controller failure
  • CLI-based process
• Bug fixes

VSM 0.7 engineering build, beginning of VSM open source
Features
• Based on Firefly 0.80.5 and CentOS 6.5 basic server
• Open sourced and available from github.com under the Apache 2 license
• Bug fixes

VSM 0.7 Open Source Information
• The main project page will be at 01.org
• The source code will be licensed as Apache 2
• A Developer’s Certificate of Origin will be used for code contributions
• Code available via Git from github.com, plus a technical wiki
• Jira will be used for bug submissions and feature requests
• There will be an Intel VSM gatekeeper
• This is an engineering build; we will publish a roadmap describing how we’ll deliver feature and maintenance builds

VSM 1.0, planned open source community availability January 2015
• Create erasure-coded pools (illustrated, along with cache tiers, in the Ceph CLI sketch at the end of this document)
• Create replicated pools using separate (heterogeneous) primary and replica storage groups
• VSM will use the ceph-deploy infrastructure
• Bug fixes
• VSM update
  • Can update from the previous VSM release to the current VSM release (e.g., 0.7 to 0.8) without rebuilding the Ceph cluster
  • OEM field support can update from the previous production VSM release to the current production VSM release (e.g., 0.6 to 1.0) without rebuilding the Ceph cluster
• Add and remove cache tiers (tiered pools)
• User quotas
• Communications between the VSM operators’ browsers and the VSM web server are encrypted via self-signed SSL

VSM After 1.0
• The schedule for new features depends on
  • What Intel VSM OEM partners need
  • What the community adds to the VSM open source tree
  • Any features Intel adds
• Features, bugs, and priorities will be discussed openly in the community
• Support and bug fixes
  • By the community, with an Intel gatekeeper and code reviewer
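For context on the VSM 1.0 erasure-coded pool and cache-tier roadmap items, the sketch below shows the kind of Firefly-era Ceph CLI operations these features build on. The pool names, k/m values, PG counts, and target_max_bytes figure are illustrative assumptions; VSM drives equivalent steps through its own UI and API rather than requiring operators to run these commands by hand.

    # Illustrative Firefly-era Ceph commands for the VSM 1.0 roadmap items above.
    # Pool names, k/m values, PG counts, and cache size are example values.

    # Erasure-coded pool: define a profile, then create a pool that uses it.
    ceph osd erasure-code-profile set vsm_ec_profile k=4 m=2 ruleset-failure-domain=host
    ceph osd pool create ecpool 256 256 erasure vsm_ec_profile

    # Cache tier: create a replicated pool and layer it in front of the EC pool.
    ceph osd pool create cachepool 128 128
    ceph osd tier add ecpool cachepool
    ceph osd tier cache-mode cachepool writeback
    ceph osd tier set-overlay ecpool cachepool
    ceph osd pool set cachepool target_max_bytes 1099511627776   # ~1 TiB cache

Removing a cache tier reverses these steps (remove the overlay, then detach the tier), which is the pattern the "add and remove cache tiers" roadmap item refers to.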