BIG DATA ANALYTICS PLATFORMS
Top Big Data Analytics Platforms
• Revolutionary. That pretty much describes the data analysis time in which
we live. Businesses grapple with huge quantities and varieties of data on
one hand, and ever-faster expectations for analysis on the other.
• Apache Hadoop, a nine-year-old open-source data-processing platform first
used by Internet giants including Yahoo and Facebook, leads the big-data
revolution.
• Cloudera introduced commercial support for enterprises in 2008, and
MapR and Hortonworks piled on in 2009 and 2011, respectively. Other vendors,
such as Pivotal with Pivotal HD, have introduced their own Hadoop distributions.
• Microsoft and Teradata offer complementary software and first-line
support for Hortonworks' platform.
• Oracle resells and supports Cloudera, while HP, SAP, and others act more
like Switzerland, working with multiple Hadoop software providers.
1010data
New York-based 1010data launched its
analytical, private-cloud service way back
in 2000, building a base of customers on
Wall Street. Marquee customers include
NYSE Euronext and a number of big banks,
but the company has also branched out
into retail, CPG, gaming, healthcare,
government, and telecommunications.
1010data's columnar database supports massively parallel processing for scalability, but it's a
proprietary design with its own query language that supports a subset of SQL functions plus broader
query types including graph and time-series analyses. It also handles semi-structured data such as social
network and company machine data.
Actian
Ingres Corp. took the name Actian in 2011, and
the company has been fleshing out a big-data
portfolio ever since. Building on the 10,000+
customer base of Ingres, the open-source
transactional database, the company expanded
with Vectorwise, a fast analytical database
management system (DBMS) now called Actian
Vector. It also acquired Versant, the vendor
behind the eponymous object database; and
Pervasive, maker of DataRush analytics-on-Hadoop
and data-integration software now
called Actian DataFlow. The April 2013
acquisition of ParAccel marked an even bigger
push into big-data analytics with a massively
parallel processing DBMS now called Actian
Matrix.
Amazon
• Amazon Web Services hosts a who's who list of
data-management services from third-party
players -- Cloudera, Microsoft, Oracle, SAP, and
many others -- but the cloud giant has its own
long-term ambitions where big-data analysis is
concerned. Building on its Elastic Compute
Cloud (EC2) and Simple Storage Service (S3)
storage infrastructure, Amazon launched its
Hadoop-based Elastic MapReduce service way
back in 2009.
In 2013, AWS added the Redshift Data Warehousing service (based on the ParAccel DBMS), which is
supported by another who's who list of independent data-integration, business intelligence, and
analytics vendors. Rounding out AWS's big-data capabilities are the DynamoDB NoSQL database
management service and Kinesis Stream Processing service.
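The MapReduce model behind Hadoop-based services such as Elastic MapReduce can be illustrated with a minimal, single-process sketch. This is not Hadoop or AWS code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real cluster would run the map and reduce steps on many machines in parallel.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of text records."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big insight", "data at scale"]))
```

Because each map call sees only one record and each reduce call sees only one key, both phases can be spread across nodes without coordination, which is what lets the model scale.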
Cloudera
• The market-leading distributor of Hadoop
software, Cloudera is pushing hard to extend
the data-processing framework into a
comprehensive "enterprise data hub" that
can serve as a first destination and central
point of management for all data within
enterprises.
• Cloudera vows support for open-source
Hadoop, but to ensure enterprise-grade
performance, reliability, data-access control,
and security, Cloudera offers proprietary
software including Cloudera Manager,
Cloudera Navigator, and certain
vendor-exclusive components for backup and
recovery.
HP HAVEn
HP calls its big-data-platform architecture
HAVEn, an acronym for Hadoop,
Autonomy, Vertica, Enterprise Security, and
"n" applications. HP doesn't have its own
Hadoop distribution, but it provides
reference hardware configurations for
leading Hadoop software distributors.
Autonomy's IDOL software addresses
search and exploration of unstructured data.
Vertica is HP's massively parallel processing
columnar analytical DBMS designed for
speedy analysis of massive, structured data
sets. Competing with the likes of IBM
PureData for Analytics (Netezza) and
Pivotal Greenplum, Vertica is intended to
complement rather than replace legacy
enterprise data warehouse environments
such as Teradata.
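Massively parallel processing, as used by analytical databases of this kind, can be sketched as a scatter-gather aggregate. The example below is an illustration only and does not reflect any vendor's internals: threads stand in for database nodes, and the names `partition`, `local_aggregate`, and `mpp_average` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_nodes):
    """Distribute rows round-robin across n_nodes shards."""
    return [rows[i::n_nodes] for i in range(n_nodes)]


def local_aggregate(rows):
    """Each simulated node computes a partial (sum, count) over its own shard."""
    return sum(row["amount"] for row in rows), len(rows)


def mpp_average(rows, n_nodes=4):
    """Scatter rows to nodes, aggregate locally, then combine the partials."""
    shards = partition(rows, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(local_aggregate, shards))
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total / count if count else 0.0


if __name__ == "__main__":
    sales = [{"amount": a} for a in (10, 20, 30, 40, 50)]
    print(mpp_average(sales, n_nodes=2))
```

Only the small partial results travel back to the coordinator, so adding nodes increases scan capacity without a matching increase in data movement.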
Hortonworks
IBM
• IBM's cloud platform combines platform as a service (PaaS) with
infrastructure as a service (IaaS) to provide an integrated experience. The
platform scales and supports both small development teams and organizations,
and large enterprise businesses. Globally deployed across data centers around
the world, the solution you build on IBM Cloud™ spins up fast and performs
reliably in a tested and supported environment you can trust.
InfiniDB
• InfiniDB is a column-store DBMS optimized for OLAP workloads. It has a
distributed architecture to support massively parallel processing (MPP). It uses
MySQL as its front end, so users familiar with MySQL can quickly migrate to
InfiniDB and can connect to it using any MySQL connector.
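Why a column store suits OLAP workloads can be shown with a small sketch. It is not InfiniDB code: the `SALES` data and the function names are made up for illustration, and real column stores add compression and on-disk layouts that are omitted here.

```python
# Hypothetical sample data in row-oriented form (one dict per record).
SALES = [
    {"region": "east", "product": "a", "revenue": 120.0},
    {"region": "west", "product": "b", "revenue": 80.0},
    {"region": "east", "product": "c", "revenue": 200.0},
]


def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_revenue_row_store(rows):
    """Row layout: every full record is visited although one field is needed."""
    return sum(row["revenue"] for row in rows)


def total_revenue_column_store(columns):
    """Column layout: the aggregate scans only the 'revenue' column."""
    return sum(columns["revenue"])


if __name__ == "__main__":
    columns = to_columns(SALES)
    print(total_revenue_row_store(SALES), total_revenue_column_store(columns))
```

Both functions return the same total, but the columnar version reads only the values it aggregates, which is the property analytical queries over wide tables benefit from.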
Infobright DB
• With the volumes of machine data exploding, Ignite’s Infobright DB is
specifically designed to achieve high performance for large volumes of
machine-generated data used in complex ad hoc analytic environments, without
the database administration required by other products.
Kognitio
• Kognitio has enabled users to achieve very high-performance analytics for over
twenty years. Until recently this has only been affordable for a niche market of
major industry players who understood that it was worth paying a premium for
high-performance infrastructure to generate rapid and detailed insight from their
data.
MapR
With the MapR Data Platform, users can store, manage, process, and analyze all data including files, tables, and streams from operational, historical, and real-time data
sources - with mission-critical reliability to meet production SLAs. MapR offers a core set
of data services to ensure exabyte scale and high performance while providing
unmatched data protection, disaster recovery, security, and management services.
Microsoft Azure
The Azure Internet of Things (IoT) is a collection of Microsoft-managed cloud
services that connect, monitor, and control billions of IoT assets. In simpler
terms, an IoT solution is made up of one or more IoT devices and one or more
back-end services running in the cloud that communicate with each other.
Oracle DB
• Oracle’s revolutionary cloud database is self-driving, self-securing,
self-repairing, and designed to eliminate error-prone manual data
management. Easily deploy new OLTP and data warehouse workloads, or move
existing ones, to the cloud. The secure, intelligent, highly available
database in the cloud enables you to get more value from your data
to grow your business.
Pivotal BigData and IoT
There's no shortage of ambition at Pivotal, an EMC spinoff that offers big-data
infrastructure as well as an abstraction layer for cloud computing (based on Cloud
Foundry) and an agile application development environment (based on SpringSource).
Pivotal's big-data and analytics capabilities blend the Pivotal HD Hadoop
distribution with GemFire SQLFire in-memory technology, the Greenplum database,
and HAWQ (Hadoop With Query) SQL querying capabilities. It also has close ties
and in-database integrations with SAS analytics.
SAP HANA
Teradata Vantage