The SimpliVity Data Virtualization Engine

An Overview of the Data Architecture Powering SimpliVity’s OmniCube
Table of Contents
1. Synopsis
2. The SimpliVity Data Virtualization Engine: Addressing the Complexity Crisis at the Core
   1. The Broken System
   2. The 21st Century Data Efficiency Panacea: Deduplication, Compression and Optimization (DCO™)
3. State of the Deduplication Market Today
4. SimpliVity’s Data Virtualization Engine: A 21st Century Efficiency and Mobility-centric Data Architecture
   1. The Starting Point: Real-time Deduplication, Compression and Optimization without Impact to Performance
   2. OmniCube Accelerator™
   3. Enhancing the Value through Optimization
   4. Tying It All Together: The OmniCube Global Federation
5. OmniCube: Globally Federated Hyperconverged IT Infrastructure
6. Summary
1. Synopsis
SimpliVity’s OmniCube™ is the cure for two of today’s most vexing problems in IT – the extreme
cost and complexity of the IT environment, and cross-site IT (the limitations of managing IT
across sites including the Public Cloud). OmniCube is the industry’s first globally federated and
hyperconverged IT infrastructure platform. Designed and optimized for the virtualized
environment, OmniCube provides a complete scalable and flexible IT infrastructure that meets
the data center requirements for high availability, performance, and serviceability. In addition to
these core server, storage and networking services, OmniCube delivers a complete set of
advanced features, providing global protection and management of all the virtual machines
within and across data centers, including the Cloud.
Scalability is achieved when two or more OmniCube systems are deployed together, creating
the OmniCube Global Federation, a massively scalable pool of shared resources that enables
efficient data movement, extensive scalability, protection, disaster recovery, and enterprise-class system availability.
Figure 1: OmniCube Overview
At the root of today’s complexity crisis is an antiquated, inherently coarse-grained data architecture that was optimized for a world predating server virtualization and the need to span multiple sites and the Cloud, and that therefore provides far less data mobility than the modern IT environment demands. SimpliVity has comprehensively solved the problem by delivering a fundamentally new data architecture that powers the functionality of OmniCube, known as the SimpliVity Data Virtualization Engine™, or DVE™.
DVE deduplicates, compresses and optimizes all data (at inception, in real-time, once and
forever), and provides a global framework for storing and managing the resulting fine-grained
data elements across all tiers within a single system (DRAM, Flash, HDD), across all the data-lifecycle phases (primary, backup, WAN, archive), across geographies, data centers and the
Cloud. In doing so, DVE enables OmniCube to deliver the functionality that can only be
achieved today through the management of more than a dozen disparate products—and does it
at a fraction of the cost and at an extreme reduction in complexity as compared to today’s
traditional infrastructure stack. The result is a first-of-its-kind product and global architecture that sets new standards for both TCO (acquisition and operating cost) and functional scope. This paper provides an in-depth look at the Data Virtualization Engine and demonstrates how the DVE powers all of the differentiated functionality of OmniCube.
2. The SimpliVity Data Virtualization Engine: Addressing the Complexity
Crisis at the Core
1. The Broken System
The architecture upon which the average data center is built dates back nearly 30 years. It is
inflexible, inefficient, and incapable of supporting the modern business with its modern data
mobility needs (see figure 2). Virtualization has brought tremendous value in consolidating
resources and bringing some level of simplicity to the infrastructure, but its true potential
remains blocked by the underlying coarse-grained data architecture, which acts as an anchor on the virtual machines (VMs) that are designed for mobility.
The fact that these architectures inhibit virtualization and data mobility should come as no
surprise, as they were optimized for a different world – a world that predated server
virtualization, the Cloud, and solid state drives (SSD). It was a world in which IT was tightly
centralized within a single data center, and remote IT sites—if they existed—were run
independently from the primary data center. The size of disk drives was small (3 GB), the cost
of storage was very high, replication was a luxury very few could afford, and almost 100% of
disaster recovery operations were optimized for using tape media.
But while the underlying data architectures have changed very little, IT’s role in business has
changed dramatically. IT—and the data that the IT team is chartered to protect—is at the core
of almost all businesses today, and as such, the demands on IT are ever-greater. To address
the demands, IT teams have been forced to deploy technologies in disparate products as they
could not be delivered within their existing infrastructure products. Thus, the last 10 years have
seen a wave of specialty devices—such as WAN optimization, disk-based backup, SSD storage
devices, and Cloud gateways—each delivering value, but collectively adding to the complexity
crisis.
Figure 2: Legacy Infrastructure Topology
2. The 21st Century Data Efficiency Panacea: Deduplication, Compression
and Optimization (DCO™)
The need for a lighter data architecture—one that fosters mobility rather than inhibits it—has
been clear for some time. Many have seen great promise in data deduplication and
compression—and have recognized that if done well, these technologies can facilitate lighter-weight, mobile data structures. Optimization further holds promise as a means of intelligently managing data based on its anticipated usage by the applications it serves. Following are
brief definitions of these technologies:
A. Deduplication—the process of finding and eliminating redundant data within a given data
set in reference to the whole available repository—holds great promise in delivering a
light-weight, mobile data structure and therefore is seen as a key to solving the
complexity crisis by addressing the root cause.
B. Compression—the process of finding and eliminating redundant data within a given data
set, in relation to other data within the same dataset, is a simpler problem, but provides
complementary value.
C. Optimization—the intelligent treatment of data based on its anticipated use by an
application. Systems that can identify file types and make real-time decisions about
whether and where to store that data can achieve overall improved storage efficiency,
performance, and bandwidth usage.
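To make the distinction between these three ideas concrete, the following sketch contrasts deduplication and compression. It is illustrative only: the 4KB block size, SHA-256 fingerprints, and in-memory index are generic assumptions, not a description of any particular product. Deduplication asks whether a block already exists anywhere in the repository; compression shrinks the data that turns out to be new.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # illustrative 4KB granularity

class DedupStore:
    """Toy content-addressed store: deduplicate against the whole repository,
    then compress whatever is genuinely new."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> compressed block (the repository)
        self.refs = {}    # fingerprint -> reference count

    def write(self, data: bytes) -> list:
        fingerprints = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()
            if fp not in self.blocks:                    # deduplication: store only unseen blocks
                self.blocks[fp] = zlib.compress(block)   # compression: shrink within the block
            self.refs[fp] = self.refs.get(fp, 0) + 1
            fingerprints.append(fp)
        return fingerprints  # the logical object is now just a list of fingerprints

store = DedupStore()
vm1 = b"A" * 8192 + b"unique-to-vm1".ljust(4096, b"\x00")  # two common blocks + one unique
vm2 = b"A" * 8192 + b"unique-to-vm2".ljust(4096, b"\x00")
store.write(vm1)
store.write(vm2)
print(len(store.blocks))  # 3 physical blocks stored for 6 logical blocks written
```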
Specifically, deduplication, compression and optimization have several key benefits that address the core requirements of today’s data center:

• More efficient use of the SSD storage cache. A deduplication process that operates at the right point in the data stream can reduce the footprint on the cache, improving the overall system-wide performance.
• Dramatic bandwidth reduction on replication between sites. Twenty years ago, the IT organization was dedicated to a single primary data center, but today, almost all IT teams manage multiple sites. A fundamental requirement of the infrastructure, then, is fostering efficient data transfer among sites. Deduplicating data before it goes to a remote site makes the transfer itself more efficient and saves significant bandwidth resources.
• Enhanced data mobility. A fundamental principle of server virtualization is the mobility of the VMs, but coarse-grained data structures significantly block mobility in a traditional infrastructure environment. When the data is deduplicated, it is easier to move VMs from one server to another, and it is easier to move data in and out of the Cloud for the same reason.
• Efficient storage utilization. Required capacity can be reduced 2-3X in standard primary use cases based on the effective use of deduplication, compression, and optimization (DCO).
• Enhanced performance, given that less actual data needs to be written to or read from disk. This is amplified in application environments such as VDI, where a “boot storm” can generate multiple GB of random reads from disk. With DCO, that can be reduced to tens of MB.
• Enhanced “time-to-data”. Achieve faster access to data (by virtue of there being less physical data to move) when migrating data, or when recovering data from a remote site or from the Cloud.
The above list enumerates the great potential value of deduplication across a number of areas.
This may be counter-intuitive given that deduplication technologies have historically been
designed to optimize for HDD capacity. When introduced to the market in the mid-2000s,
dedupe was targeted entirely at backup. In this use case, optimizing for capacity is crucial, given the massive redundancy of data and the ever-increasing volume of data to be backed up and retained. In primary storage systems, by contrast, optimizing for disk capacity is a relatively lower priority, because HDD IOPS are a much more expensive system resource than HDD capacity.
All of this points in one direction: 21st century data has to be deduped, compressed and
optimized at the primary storage level, and no later. When data is deduplicated across all tiers
right from the point of inception, it has significant resource-saving ramifications all the way downstream, and opens up the advanced functionality required for today’s world.
3. State of the Deduplication Market Today
Deduplication emerged as a hot technology for the data center in the mid-2000s, and it remains widely used. Thus far, however, vendors have designed deduplication as an isolated, resource-intensive operation specific to a single data-lifecycle phase, implemented in different products by different vendors, each addressing a single specific problem: dedupe of backup data, or dedupe of WAN data, or dedupe of archive data.
Despite the maturity of deduplication, and the great benefits of deduping primary data, no
vendor has thus far comprehensively solved the dedupe challenge in primary data. Some
products apply deduplication only within the SSD tier, and therefore only offer limited benefits in
terms of overall efficiency. Others apply compression technology and incorrectly use the term
“dedupe”. Because of the latency that dedupe may impose, many have deployed it as a “post-process”, which severely limits other operations such as replication and backup. Most of these
sub-optimal implementations are a result of adding deduplication to an existing legacy
architecture, rather than developing it as the foundation for the overall 21st Century architecture.
The various fragmented work-arounds that vendors have delivered have varying levels of value,
but fall short of solving the underlying problem. These approaches provide some value, but
ultimately do not deliver a truly fine-grained and mobile data infrastructure. IT teams can be left
with higher acquisition costs and even more complexity as they manage partial dedupe amidst
their other infrastructure burdens.
Given these noted challenges, no vendor has thus far been able to offer a dedupe apparatus
that addresses all the dedupe opportunities and needs across the full life-cycle of data: from
primary storage, to backup, to the WAN, archive and the Public Cloud. As a result, IT teams
seeking the noted benefits have been forced to deploy multiple, disparate products, from
different vendors, each necessitating separate training and separate on-going management.
Yet despite the challenges of recent years, it is clear that the future of IT infrastructure
operations will depend heavily on an efficient and effective combination of deduplication,
compression and optimization for primary production workloads.
4. SimpliVity’s Data Virtualization Engine: A 21st Century Efficiency and
Mobility-centric Data Architecture
Rather than taking an existing data architecture and trying to build in deduplication, compression and optimization, SimpliVity took the inverse approach. As a first step, it designed the core technology that performs deduplication and compression on primary data in real-time, without impact to performance or latency (see below, the OmniCube Accelerator™), and then built an entire globally federated data architecture around that foundation, one that manages the resulting fine-grained data elements across a Global Federation of systems.
In doing so, it addressed all of the core requirements for truly effective deduplication, compression and optimization for the primary production infrastructure system and beyond:

• Real-time
• Once and forever (no need for a second pass, or hydration/dehydration inefficiencies)
• Across all tiers of data within a system
• Across all datasets
• Across all locations
• Including on the Public Cloud
• Without impacting performance
In delivering DVE, SimpliVity is realizing the potential of well-implemented deduplication,
compression, and optimization of primary data. In addition to disk capacity, DVE optimizes HDD IOPS, flash capacity, DRAM capacity, and WAN capacity. In so doing, DVE goes far beyond capacity efficiency. While it may not be intuitive, by performing real-time deduplication, compression, and optimization, DVE improves system performance. With DVE, deduplication, compression, and optimization occur before data is written to the HDD, thus preserving precious HDD IOPS. The “boot storm” is a great example. In a traditional storage platform, 100
Windows VMs booting at the same time will cause roughly 10,000MB of random disk reads. In
the SimpliVity OmniCube, this same workload will cause roughly 100MB of reads as all of the
data Windows reads to boot is common between the 100 VMs. This is a 100x savings in disk
operations.
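As a back-of-the-envelope restatement of the figures quoted above (a sketch only: the per-VM boot read volume and the assumption that essentially all boot data is common across the identical guest images come from the example in the text), the arithmetic looks like this:

```python
# Illustrative boot-storm arithmetic using the figures quoted above:
# 100 Windows VMs, each reading roughly 100 MB of largely identical data at boot.
VMS = 100
BOOT_MB_PER_VM = 100

# Traditional storage: every VM pulls its own copy of the boot data from disk.
traditional_mb = VMS * BOOT_MB_PER_VM          # ~10,000 MB of random reads

# Deduplicated storage: the common boot image is stored (and read) once,
# so the disks see roughly a single copy regardless of VM count.
dedup_mb = BOOT_MB_PER_VM                      # ~100 MB

print(f"traditional reads: {traditional_mb:,} MB")
print(f"deduplicated reads: ~{dedup_mb} MB "
      f"({traditional_mb // dedup_mb}x fewer disk operations)")
```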
Figure 3: DVE – Deduplication, Compression, Optimization
1. The Starting Point: Real-time Deduplication, Compression and
Optimization without Impact to Performance
The DVE performs deduplication, compression and optimization in real-time, as the data is first
written into the OmniCube datastore. This contrasts with a more prevalent approach called post-process deduplication, which allows data to be written first without deduplication and then performs the deduplication process at some later stage. The big problem with post-process
deduplication is that it introduces a lag where there was none before. Businesses are presented with a choice: replicate data before deduplicating it, or wait to replicate until the dedupe process is complete. Neither option is sufficient: replicating before deduplicating defeats the
purpose of deduplicating at all, and waiting to replicate can create RPO (Recovery Point
Objective) issues. It may take so long to dedupe the data that by the time it is deduped and
replicated, it no longer meets the RPO.
Given the clear superiority (and elegance) of performing deduplication in real-time, why is it
unusual? In a word, performance. Deduplication is a resource intensive process. As data
enters the system, it must be scanned, analyzed, compared to an index or table that has
cataloged all existing blocks in the data set, and then acted upon (either deleted if redundant, or
written if new). Pointers and indexes need to be updated in real-time such that the system can
keep track of all data elements in all their locations, while maintaining an understanding of the
full data sets (pre-deduplication) that have been stored in the system. The challenge is compounded if we wish to maximize data efficiency by focusing the architecture on granular 4KB or 8KB data elements (the original size at which data is written by the application). A system
managing 4KB blocks and ingesting data at 400MB/s needs to perform 100,000 such operations
per second.
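The following sketch illustrates the amount of work that statement implies. It is a simplification: the SHA-256 fingerprint, dictionary index, and reference counts are generic assumptions rather than a description of the OmniCube Accelerator’s internals, but they show what must happen inline for every block and why the operation count scales with ingest rate.

```python
import hashlib

BLOCK = 4 * 1024  # 4KB granularity

index = {}      # fingerprint -> physical location of the stored block
refcounts = {}  # fingerprint -> number of logical references

def ingest_block(block: bytes, write_to_disk) -> str:
    """Inline path executed for every incoming block, before the write is acknowledged."""
    fp = hashlib.sha256(block).hexdigest()  # 1. scan/fingerprint the block
    if fp in index:                         # 2. compare against the index of existing blocks
        refcounts[fp] += 1                  # 3a. redundant: add a reference, write nothing
    else:
        index[fp] = write_to_disk(block)    # 3b. new: persist once, record its location
        refcounts[fp] = 1
    return fp                               # pointer/index metadata stays current in real time

# Why this is demanding: at 400 MB/s of ingest and 4KB blocks, the
# pipeline above must complete roughly 100,000 times per second.
ingest_rate_mb_per_s = 400
ops_per_second = ingest_rate_mb_per_s * 1024 * 1024 // BLOCK
print(ops_per_second)  # 102400
```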
Given the challenge, it is understandable that many vendors have opted to conduct this
operation out-of-band, so as not to impact performance. This is a challenge that SimpliVity
addressed head-on and resolved.
2. OmniCube Accelerator™
SimpliVity’s real-time deduplication breakthrough is the OmniCube Accelerator, a specially
architected SimpliVity PCIe module that processes all writes and manages the compute-intensive tasks of deduplication and compression. All data that is written to the OmniCube datastore first passes through the OmniCube Accelerator at inception, as it is created. The practical effect of real-time deduplication is that the DVE ends up processing data elements that are between 4KB and 8KB in size, compared to the 10-20MB objects of traditional architectures, roughly 2,000 times finer granularity. The data is thus “born” to be mobile from the beginning, and remains
so throughout its lifecycle within the OmniCube Global Federation.
Within a given OmniCube system, deduplication makes each storage media tier more efficient—DRAM, SSD/Flash, and HDD—thereby dramatically lowering the cost of the system compared to traditional offerings.
While deduplication within a single OmniCube system provides great efficiencies and cost
savings, the additional, groundbreaking value of OmniCube lies in the Global Federation—the
network of connected OmniCube systems that provide High Availability (HA), resource sharing,
simplified scale-out, and replication for VM movement and Disaster Recovery (DR).
Additionally, with deduplication at the core, the DVE architecture has been designed and
optimized for managing a very large set of fine-grained data elements, across a Federation of
systems that are both local (within the same data center) and remote (dispersed data centers),
including the Public Cloud. For example, a modest Federation of a few systems contains tens
of billions of deduped, compressed and optimized 4KB elements. The DVE enables this
Federation to efficiently track and manage all elements, and make real-time decisions about
which data elements to send via replication and which need not be sent due to the existence of
an exact match at the destination site.
Designing the overall data architecture around the deduplication, compression and optimization
engine has ensured that the value of dedupe pervades all media, all tiers (primary, backup, and
archive), and all locations.
3. Enhancing the Value through Optimization
While dedupe is the fundamental core, the DVE further enhances the CAPEX and OPEX
savings enabled with OmniCube by delivering remarkable efficiencies through “operating-system and virtualization aware” optimizations. The optimizations within OmniCube deliver
similar effects to dedupe in a different way—they identify data that need not be copied, or
replicated, and take data-specific actions to improve the overall efficiency of the system.
Given that OmniCube today is optimized for the VMware environment, most such optimizations
stem from awareness of VMware-specific content or commands. For example, .vswp files [1],
though important to the functionality of each individual VM, do not need to be backed up or
replicated across sites. Thus, when preparing to back up or replicate a given VM from one site to another, the DVE recognizes the .vswp file associated with the VM and eliminates that data from the transfer, saving time, bandwidth and capacity. Other optimizations are similar in nature—
leveraging DVE’s ability to find and make real-time decisions on common data types within a
VMware environment.
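As a rough illustration of this kind of virtualization-aware optimization (the rule of skipping .vswp files comes from the text; the directory layout and helper function here are hypothetical), a backup or replication pass can simply exclude those files when assembling the transfer set:

```python
from pathlib import Path

def replication_set(vm_dir: str) -> list:
    """Collect a VM's files for backup/replication, skipping guest-memory
    swap files (.vswp), which are rebuilt on boot and need not be copied."""
    included = []
    for path in Path(vm_dir).rglob("*"):
        if path.is_file() and path.suffix.lower() != ".vswp":
            included.append(path)
    return included

# Example (hypothetical VM folder):
# /vms/finance-01/finance-01.vmdk   -> included
# /vms/finance-01/finance-01.vmx    -> included
# /vms/finance-01/finance-01.vswp   -> skipped, saving time, bandwidth and capacity
```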
4. Tying It All Together: The OmniCube Global Federation
With the data permanently in an efficient fine-grained state, and an overall architecture designed to manage and track billions of elements across a global network of systems, all of the core functionality of OmniCube is enabled:

• efficient data mobility within and across data centers,
• intelligent data movement within the Federation,
• data sharing for high availability,
• cache accelerated performance,
• cloud integration,
• and a single point of global management that’s automated and VM-centric.
[1] .vswp files are swap files for guest VM memory; they are used by VMware if system memory is over-allocated. .vswp files are overwritten when a VM boots, so there is no need to capture them in backups. OmniCube is built to recognize .vswp files and therefore skip the backup and replication process for those files.
Figure 4: OmniCube Global Federation
The value of deduplication, compression and optimization is amplified in the Global Federation
as the required bandwidth is reduced dramatically compared to non-deduplicating systems.
Importantly, this also speeds up the delivery of data to and from remote sites and the Cloud.
Deduplication has traditionally been successful at comparing incoming data to what lies within the local array, but it struggles in a true global setting. In contrast, the DVE contains an advanced inter-node messaging system that allows OmniCube systems to communicate about the contents of their local data stores. In essence, when replicating from one system to another, this allows each OmniCube system to know enough about what is on the remote systems to ensure that only unique data (free of operating-system data, VMware commands and other overhead) is sent across the wire. This inter-node communication can have dramatic effects. For example, in many cases within an OmniCube Federation, very little data needs to traverse the wire even on the first replication of a given VM. This is a radical departure from any other replicating system, which must always replicate a full copy of a given data set during the first replication. In reality, any two VMs running a common operating system like Windows 2008 will have a large set of data elements in common. The DVE will recognize any such redundant data that already exists at the remote site, and ensure only truly unique data elements are sent.
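A minimal sketch of the idea follows. The two-round fingerprint exchange shown here is an assumption for illustration, not SimpliVity’s actual wire protocol, but it captures why only unique blocks need to cross the WAN:

```python
import hashlib

def fingerprints(blocks):
    return [hashlib.sha256(b).hexdigest() for b in blocks]

def replicate(source_blocks, destination_store):
    """Replicate a set of blocks, shipping only data the destination lacks."""
    fps = fingerprints(source_blocks)

    # Round 1: the source advertises fingerprints; the destination reports which it is missing.
    missing = [fp for fp in fps if fp not in destination_store]

    # Round 2: only the blocks behind the missing fingerprints traverse the WAN.
    by_fp = dict(zip(fps, source_blocks))
    for fp in missing:
        destination_store[fp] = by_fp[fp]

    return len(set(missing)), len(fps)

# Example: the destination already holds the common OS blocks, so the first
# replication of a new VM moves very little data.
os_blocks = [bytes([i]) * 4096 for i in range(5)]    # blocks common to both sites
new_vm = os_blocks + [b"app-data" * 512]             # plus one block unique to this VM
remote = {fp: blk for fp, blk in zip(fingerprints(os_blocks), os_blocks)}

sent, total = replicate(new_vm, remote)
print(f"sent {sent} of {total} blocks")  # sent 1 of 6 blocks
```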
5. OmniCube: Globally Federated Hyperconverged IT Infrastructure
The new DVE architecture supports a globally comprehensive solution that dramatically consolidates and optimizes these technologies into a single, efficient platform that
scales elegantly within the data center, and extends scalability to data centers across the world
including the Cloud. In short, the OmniCube is a hyperconverged, globally-federated IT
infrastructure that is designed and optimized for the virtualized environment:

• Hyperconverged: OmniCube consolidates all core infrastructure functionality, in addition to the functionality delivered today by over a dozen specialized devices and products, into a single, scalable building block.
• Globally-federated: OmniCube systems are networked together to create a dynamic pool of shared resources within the data center, and an optimal inter-data center replication and DR solution.
• Optimized for virtualized environments: OmniCube has been designed and optimized to simplify the IT operations of the virtual environment. The OmniCube Global Federation provides a VM-centric and globally unified management framework to the IT team. From a single pane of glass, an administrator can view and manage all of the VMs—including all backup copies—in all of their locations across the globe.
6. Summary
By focusing its new architecture (from the ground up) on solving the legacy data-architecture problem, SimpliVity cured the disease rather than just treating the symptom. SimpliVity created a novel global solution tuned for a virtualized, globally distributed, and Cloud-enabled IT operation. By delivering DVE at the heart of OmniCube, SimpliVity succeeds in delivering a Globally Federated Hyperconverged IT Infrastructure. In doing so, DVE overcomes the various barriers that have limited the usage and deployment of deduplication (in primary storage, in real-time, across all tiers within a system, across data centers and the Cloud, and at fine granularity). The result is a radically simplified and dramatically lower-cost infrastructure platform that delivers the scalability, flexibility, performance elasticity, data mobility, global management, and Cloud integration that today’s IT infrastructures require.