Apache Cassandra @ Business Field 2 Author-Miss Purvaja A. Sable

advertisement
International Journal of Engineering Trends and Technology (IJETT) – Volume 26 Number 5- August 2015
Apache Cassandra @ Business Field
1st Author-Miss Monika D. Khade,
2nd Author-Miss Purvaja A. Sable
Department of Computer Science & Engineering
College of Engineering & Technology, Akola
Abstract— In the world of cloud computing, one
essential ingredient is a database that can
accommodate a very large number of users on an ondemand basis. Distributed storage mechanisms are
becoming the de-facto method of data storage for the
new generation of web applications used by
companies like Google, Amazon, Facebook, Twitter,
Salesforce.com, Linkedin.com and Yahoo! etc., which
are processing large amount of data at a petabyte scale.
This paper gives detailed overview of Apache
Cassandra – A distributed database management
system in business field. We are going step by step
from its introduction to the implementation and then
to the paradigm where it is used by major internet
companies. While doing so, we will also discuss its
features, tools available and ongoing research to
improve its performance.
Keywords –No SQL, Distributed Databases, Apache
Cassandra, Multi Data Center, Tunable Consistency.
I.
INTRODUCTION
A. What is Apache Cassendra?
Designed to handle large amounts of data across
many commodity servers, providing high
availability with no single point of failure.
Cassandra offers robust support for clusters
spanning multiple data centers, with asynchronous
master less replication allowing low latency
operations for all clients. Cassandra can be regarded
as distributed database system with combination of
technologies from Amazon Dynamo and Google
BigTable .The roots of Apache Cassandra lie in the
NoSQL database requirement for Facebook
Corporation. Cassandra was developed by Avinash
Lakshman and Prashant Malik at Facebook
Corporation to boost its inbox search feature. It was
designed to be a mean of database storage for
distributed architecture. Being deployable on
distributed platform it was highly expected to be
able to handle data spread across geographically
diverse servers, capable of providing seamless
service with built in fault tolerance and no single
point of failure .One of the most surprising features
of Cassandra is that, though it is said to be sharing a
lot of design and internal architecture details with
traditional relational database management system,
it is by no mean a relational database system.
Cassandra is responsible for providing a data model
to its users which can then be customized according
to data storage and access requirements.
ISSN: 2231-5381
In short, Cassandra can be described as an enormously
scalable, decentralized and fault tolerant database
management system which stores attribute values in
structured and indexed fashion for efficient querying
using Cassandra query language (CQL). Implementing
inbox search feature was one of the difficult tasks
given the existing relational database management
system. It required very high write throughput
capability. Though possible, it was infeasible and
inefficient to implements this feature with very high
number of geographically diverse users. Since its
introduction in 2008 number of Facebook there has
been more than double increase in number of users
and still Cassandra is giving satisfying performance.
Not only by Facebook, but due to its rich set of
features Cassandra has been deployed by various
major E-commerce businesses such as Netflix, digg
and Twitter to name a few.
II. HISTORY
A. Literature Review
Apache Cassandra was initially developed at
Facebook to power their Inbox Search feature by
Avinash Lakshman (one of the authors of Amazon
Dynamo) and Prashant Malik. It was released as an
open source project on Google code in July 2008. In
March 2009, it became an Apache Incubator project.
On February 17, 2010 it graduated to a top-level
project.
It was named after the Greek mythological prophet
Cassandra.
Releases after graduation include
0.6, released Apr 12 2010, added support for
integrated caching, and Apache Hadoop
MapReduce
0.7, released Jan 08 2011, added secondary
indexes and online schema changes
0.8, released Jun 2 2011, added the
Cassandra Query Language (CQL), selftuning memtables, and support for zerodowntime upgrades.
1.0, released Oct 17 2011, added integrated
compression, leveled compaction, and
improved read performance
http://www.ijettjournal.org
Page 263
International Journal of Engineering Trends and Technology (IJETT) – Volume 26 Number 5- August 2015
1.1, released Apr 23 2012, added self-tuning
caches, row-level isolation, and support for
mixed ssd/spinning disk deployments
1.2, released Jan 2 2013, added clustering
across
virtual
nodes,
inter-node
communication, atomic batches, and request
tracing
2.0, released Sep 4 2013, added lightweight
transactions (based on the Paxos consensus
protocol), triggers, improved compactions
2.0.4, released Dec 30 2013, added allowing
specifying datacenters to participate in a
repair, client encryption support to ss table
loader, allow removing snapshots of nolonger-existing CFs
2.1.0 released Sep 10 2014
2.1.6 released June 08, 2015
2.1.7 released June 22, 2015
2.2.0 released July 20, 2015
B. Evolution of Cassandra
Cassandra database system was born at Facebook in
2007 as a resource to handle inbox search feature
which consisted of high scalability and real time usage.
It was published as an open source project on Google
code in 2008 and became top-level project of apache.
Since its release it has undergone numerous changes
and additions. In the version released in 2010, it added
support for integrated caching. For year 2011 version
it included support for Cassandra query language
(CQL) and support for upgrades with no server
downtime is required by many real time websites such
as Facebook, Google and Amazon. As a part of recent
revision it exhibits more advanced features such as
support for SSD, self- tuning caches and row-level
isolation.
Apache Cassandra is a type of NoSQL database.
NoSQL is the class of database management systems
(DBMS). The NoSQL stands for the "Not only SQL".
It does not use SQL as querying language. NoSQL
has distributed, fault-tolerant architecture. There is no
fixed schema (formally described structure) and no
joins(typical in databases operated with SQL). In this
xpensive operation for combining records from two or
more tables into one set. Here joins require strong
consistency and fixed schemas, lack of these makes
NoSQL databases more flexible. It's not a replacement
for a RDBMS but compliments. A NoSQL database
(sometimes called as Not Only SQL) is a database that
provides a mechanism to store and retrieve data other
than the tabular relations used in relational databases.
These databases are schema-free, support easy
replication, have simple API, eventually consistent,
and can handle huge amounts of data. The primary
objective of a NoSQL database is to have simplicity
of design, horizontal scaling, and finer control over
availability. NoSql databases use different data
structures compared to relational databases. It makes
some operations faster in NoSQL. The suitability of a
given NoSQL database depends on the problem it
must solve.
From a business standpoint, considering a NoSQL
or ‗Big Data‘ environment has been shown to provide
a clear competitive advantage in numerous industries.
In the ‗age of data‘, this is compelling information as a
great saying about the importance of data is summed
up with the following ―if your data isn‘t growing then
neither is your business‖.
Fig. 2 NoSQL Database
B. Architecture Of NoSQL Web
Fig.1 Evolution Of Cassandra
III.
CASSANDRA- A NOSQL
A. NoSQL-A Database
ISSN: 2231-5381
A NoSQL database environment is, simply put, a
non-relational and largely distributed database system
that enables rapid, ad-hoc organization and analysis of
extremely high-volume, disparate data types. NoSQL
databases are sometimes referred to as cloud
databases, non-relational databases, Big Data
databases and a myriad of other terms and were
developed in response to the sheer volume of data
being generated, stored and analyzed by modern users
http://www.ijettjournal.org
Page 264
International Journal of Engineering Trends and Technology (IJETT) – Volume 26 Number 5- August 2015
(user-generated data) and their applications (machinegenerated data).
failure and therefore is capable of offering true
continuous availability.
In general, NoSQL databases have become the first
alternative to relational databases, with scalability,
availability, and fault tolerance being key deciding
factors. They go well beyond the more widely
understood legacy, relational databases (such as
Oracle, SQL Server and DB2 databases) in satisfying
the needs of today‘s modern business applications. A
very flexible and schema-less data model, horizontal
scalability, distributed architectures, and the use of
languages and interfaces that are ―not only‖ SQL
typically characterize this technology.
Fig.4 Architecture of Apache Cassandra
B.Dataflow model
Fig. 3 NOSQL Family Tree
IV.
CASSANDRAARCHITECTURE
A.Cassandra Database Architecture
The architecture of Cassandra greatly contributes to
its being able to scale, perform, and offer continuous
availability. Cassandra was built from the ground up
with the understanding that hardware and system
failures can and do occur. This translates into
Cassandra sporting a different way of managing and
protecting data than a traditional RDBMS.
Rather than using a legacy master-slave or a
manual and difficult-to-maintain sharded design,
Cassandra has a peer-to-peer distributed architecture
that is much more elegant, and easy to set up and
maintain. In Cassandra, all nodes are the same; there
is no concept of a master node, with all nodes
communicating with each other via a gossip protocol.
Cassandra‘s built-for-scale architecture means that
it is capable of handling petabytes of information and
thousands of concurrent users/operations per second
(across multiple data centers) as easily as it can
manage much smaller amounts of data and user traffic.
It also means that, unlike other master-slave or
sharded systems, Cassandra has no single point of
ISSN: 2231-5381
Apache Cassandra is not another relational database
in market. Instead of using relational model, it uses
key-value map to store its data. The structure is more
or less can be explained as in the following picture:
Cassanda is essentially a hybrid between a key-value
and a column-oriented (or tabular) database. A column
family (called "table" since CQL 3) resembles a table
in an RDBMS. Column families contain rows and
columns. Each row is uniquely identified by a row key.
Each row has multiple columns, each of which has a
name, value, and a timestamp. Unlike a table in an
RDBMS, different rows in the same column family do
not have to share the same set of columns, and a
column may be added to one or multiple rows at any
time. Each key in Cassandra corresponds to a value
which is an object. Each key has values as columns,
and columns are grouped together into sets called
column families. Thus, each key identifies a row of a
variable number of elements. These column families
could be considered then as tables. A table in
Cassandra is a distributed multi dimensional map
indexed by a key. Furthermore, applications can
specify the sort order of columns within a Super
Column or Simple Column family.
Cluster:
Cassandra is designed to be distributed over several
nodes/machines. A cluster consists of several nodes.
I've only ever used Cassandra in a single node which
is my computer.
Keyspace:
http://www.ijettjournal.org
Page 265
International Journal of Engineering Trends and Technology (IJETT) – Volume 26 Number 5- August 2015
A cluster consists of several keyspaces. Keyspace is
Cassandra defines a column family to be a logical
the place where our data reside. A keyspace could
division that associates similar data. Basic Cassandra
have several Column Family or Super Column Family. data structures: the column, which is a name/value
pair and a client-supplied timestamp of when it was
last updated, and a column family, which is a
container for rows that have similar, but not identical,
column sets. There is no need to store a value for
Column Family and Super Column Family:
every column every time a new entity is stored. A
cluster is a container for keyspaces—typically a single
Both Column Family and Super Column Family is a
keyspace. A keyspace is the outermost container for
collection of rows, just like a table is a collection of
data in Cassandra, but it‘s perfectly fine to create as
rows in relational database.
many keyspaces as the application needs. A column
family is a container for an ordered collection of rows,
each of which is itself an ordered collection of
Row
columns.
A row consists of columns; key-value columns for a
row in Column Family, or Super Columns for a row in
V.
INSTALLATION
Super Column Family.
A.Environment
Cassandra requires the following environment
variables
to be set:
Super Column:
• JAVA_HOME - The path location of your Java
Virtual Machine (JVM) installation
It is sort of container of sub-columns (which are of
• CLASSPATH - A path containing all of the required
type Key-value Column).
Java class files (.jar)
• CASSANDRA_CONF - Directory containing the
Key-value Column:
Cassandra configuration files for convenience,
Cassandra uses an include file, cassandra.in.sh, to
The most basic data structure in Cassandra where the
source these environment variables. It will check the
actual data is saved as byte. The behavior is a lot like
following locations for this file:
Java Hash datatype.
• Environment setting for CASSANDRA_INCLUDE
if set
• $CASSANDRA_HOME/bin
• /us/share/cassandra/cassandra.in.sh
• /us/local/share/cassandra/cassandra.in.sh
• /opt/cassandra/cassandra.in.sh
• $HOME/.cassandra.in.sh Cassandra also uses the
Java options set in $CASSANDRA_CONF/cassandraenv.sh.
If you want to pass additional options to the Java
virtual machine, such as maximum and minimum heap
size, edit the options in that file rather than setting
B.Installing Cassandra Locally
Fig. 5 Cassandra Data Model
Steps to Implement Cassandra:
Apache Cassandra in a nutshell is an open source,
peer to peer distributed database architecture,
decentralized, easily scalable, fault tolerant, highly
available, eventually consistent, schema free, column
oriented database. Generally in a master/slave setup,
the master node can have far- reaching effects if it
goes offline. By contrast, Cassandra has a peer-to-peer
distribution model, such that any given node is
structurally identical to any other node—that is, there
is no ―master‖ node that acts differently than a ―slave‖
node. The aim of Cassandra‘s design is overall system
availability and ease of scaling. Cassandra data model
comprises of Keyspace (something like a database in
relational databases) and column families (tables).
ISSN: 2231-5381
This document aims to provide a few easy to follow
steps to take the first-time user from installation, to
running single node Cassandra, and overview to
configure multimode cluster. Cassandra is meant to
run on a cluster of nodes, but will run equally well on
a single machine. This is a handy way of getting
familiar with the software while avoiding the
complexities of a larger system.
1)Step 1: Prerequisites and Connecting to the
Community
http://www.ijettjournal.org
Page 266
International Journal of Engineering Trends and Technology (IJETT) – Volume 26 Number 5- August 2015
Cassandra requires the most stable version of Java 7
or 8 you can deploy, preferably the Oracle/Sun JVM.
Cassandra also runs on opened and the IBM JVM. (It
will NOT run on Rocket, which is only compatible
with Java 6.) The best way to ensure you always have
up to date information on the project, releases,
stability, bugs, and features is to subscribe to the users
mailing list (subscription required) and participate in
the #cassandra channel on IRC.
2) Step 2: Download Cassandra
Download links for the latest stable release
can always be found on the website.
Users of Debi an or Debi an-based
derivatives can install the latest stable release
in package form, see DebianPackaging for
details.
Users of RPM-based distributions can get
packages from Datastax.
If you are interested in building Cassandra
from source, please refer to How to build
page.
For more details about misc builds, please refer to
Cassandra versions and builds page.
3) Step 3: Basic Configuration
The Cassandra configuration files can be found in
the conf directory of binary and source distributions. If
you have installed Cassandra from a dab or rpm
package, the configuration files will be located in
/etc/Cassandra.
4) Step 4: Start Cassandra
And now for the moment of truth, start up
Cassandra by invoking 'bin/cassandra -f' from the
command line1. The service should start in the
foreground and log gratuitously to the console.
Assuming you don't see messages with scary words
like "error", or "fatal", or anything that looks like a
Java stack trace, then everything should be working.
Press "Control-C" to stop Cassandra.
If you start up Cassandra without the "-f" option, it
will run in the background. You can stop the process
by killing it, using 'pkill -f Cassandra.
VI.
FEATURES
The main features of Apache Cassandra @ Business
are given below:
A.Decentralized
ISSN: 2231-5381
Every node in the cluster has the same role. There
is no single point of failure. Data is distributed across
the cluster (so each node contains different data), but
there is no master as every node can service any
request.
B.Supports replication and multi data centre
replication
Replication strategies are configurable. Cassandra is
designed as a distributed system, for deployment of
large numbers of nodes across multiple data centers.
Key features of Cassandra‘s distributed architecture
are specifically tailored for multiple-data centre
deployment, for redundancy, for failover and disaster
recovery.
C.Scalability
Read and write throughput both increase linearly as
new machines are added, with no downtime or
interruption to applications.
D.Fault-tolerant
Data is automatically replicated to multiple nodes
for fault-tolerance. Replication across multiple data
centers is supported. Failed nodes can be replaced
with no downtime.
E.Tunable consistency
Writes and reads offer a tenable level of consistency,
all the way from "writes never fail" to "block for all
replicas to be readable", with the quorum level in the
middle.
F.Map Reduce support
Cassandra has Hadoop integration, with MapReduce
support. There is support also for Apache Pig and
Apache Hive.
G.Query language
Cassandra introduces CQL (Cassandra Query
Language), a SQL-like alternative to the traditional
RPC interface. Language drivers are available for Java
(JDBC), Python (DBAPI2), Node.JS (Helenus), Go
(gocql) and C++.
VII.
SECURITY
Cassandra provides these security features to the open
source community.
A.Client-to-node encryption
Cassandra includes an optional, secure form of
communication from a client machine to a database
cluster. Client to server SSL ensures data in flight is
not compromised and is securely transferred back/
forth from client machines.
B.Authentication based on internally controlled
login accounts/passwords
Administrators can create users who can be
authenticated to Cassandra database clusters using the
CREATE USER command. Internally, Cassandra
manages user accounts and access to the database
http://www.ijettjournal.org
Page 267
International Journal of Engineering Trends and Technology (IJETT) – Volume 26 Number 5- August 2015
cluster using passwords. User accounts may be altered
and dropped using the Cassandra Query Language
(CQL).
C.Object permission management
Once authenticated into a database cluster using
either internal authentication, the next security issue to
be tackled is permission management. What can the
user do inside the database? Authorization capabilities
for Cassandra use the familiar GRANT/REVOKE
security paradigm to manage object permissions.
D.Enabling JMX authentication
The default settings for Cassandra make JMX
accessible only from local host. If you want to enable
remote JMX connections, change the LOCAL_JMX
setting in cassandra-env.sh and enable authentication
and/or ssl. After you enable JMX authentication,
ensure that tools that use JMX, such as nodetool and
DataStax OpsCenter, are configured to use
authentication.
VIII.
APPLICATIONS
Developing Cassandra Applications the primary
difference developers will find when developing
applications against Cassandra vs. RDBMSs is the
data model. Cassandra uses a Google Bigtable model,
which provides more flexibility than a relational
design and can more easily store structured, semistructured, and unstructured data.
C.Digg
The recent V4 relaunch is 100% Cassandra. We are
running on multiple clusters internally, our largest one
is 40 nodes spanning multiple datacenters. We also
have 1 core commiter on staff.
D.Twitter
We're using Cassandra in production for a bunch of
things at Twitter.
E. SoftwareProjects
Software Projects uses 20 Cassandra nodes across 3
datacenters to power the eCommerce platform for
3,000 businesses. We use Cassandra to store real-time
purchase data and provide various stats for our
customers. Cassandra is our primary data store.
F.eBay
EBay has Cassandra supporting multiple
applications with rings spanning several data centers.
G.Rackspace
Rackspace uses Cassandra for a variety of internal
needs. Several Rackspace employees are also core
committers and contributors on the project.
H.Ooyala
Ooyala uses Cassandra as the backing store for a
near-realtime video analytics platform, which allows
publishers to analyze and optimize the performance of
their online video content.
I.Despeger
Despegar uses a Cassandra cluster to storage user
sessions of its hotel booking site, and also as a
persistent cache of flight itineraries.
J.SimpleGeo
Fig.6 Various Applications used in Cassandra
A.Facebook:
Facebook moved off its pre-Apache Cassandra
deployment in late 2010 when they replaced Inbox
Search with the Facebook Messaging platform. In
2012, Facebook began using Apache Cassandra in its
Instagram unit. Facebook uses their internally
developed Cassandra and has the largest known
cluster in operation of around 150 nodes
B.Netflix:
Netflix – An Example of Succeeding in the Cloud
with Cassandra With more than 25 million members
worldwide, Netflix, Inc. (Nasdaq: NFLX) is the
world's leading Internet subscription service for
enjoying movies and TV shows. Netflix allows its
members to instantly watch unlimited movies and TV
episodes streaming over the Internet to computers and
TVs.
ISSN: 2231-5381
We use Cassandra as our core datastore for
providing location-based services and products. We
run Cassandra in multiple availability zones within
Amazon EC2.
Because NoSQL databases like Cassandra do not
support operations like SQL joins, data tends to be
highly denormalized. While such a thing (wide rows)
is normally a problem for an RDBMS, Cassandra
provides exceptional performance for objects with
many thousands of columns. The primary container of
data is a keyspace, which is like a database in an
RDBMS. Inside a keyspace are one or more column
families, which are like relational tables, but they are
more fluid and dynamic in structure. Column families
have one too many thousands of columns, with both
primary and secondary indexes on columns being
supported.
http://www.ijettjournal.org
Page 268
International Journal of Engineering Trends and Technology (IJETT) – Volume 26 Number 5- August 2015
In Cassandra, objects are created, data is inserted
and manipulated, and information queried via CQL –
the Cassandra Query Language, which looks nearly
identical to SQL. Developers coming from the
relational world will be right at home with CQL and
will use standard commands (e.g., INSERT, SELECT)
to interact with objects and data stored in Cassandra.
Companies running their applications on Apache
Cassandra have realized benefits which have directly
improved their business. Cassandra is capable of
handling all of the big data challenges that might arise:
massive scalability, an always on architecture, high
performance, strong security, and ease of management,
to name a few. Learn about how businesses have
successfully deployed Apache Cassandra in their
environments based on various types of applications
and use cases.
Many companies have successfully deployed and
benefited from Apache Cassandra including some
large
companies
such
as: Apple, Comcast,
Instagram, Spotify, eBay, Rackspace, Netflix,
and
many more. The larger production environments have
PB‘s of data in clusters of over 75,000 nodes.
Cassandra is available under the Apache 2.0 license.
Some of the application use cases that Cassandra
excels in include:
• Real-time, big data workloads
• Time series data management
• High-velocity device data consumption and analysis
• Media streaming management (e.g., music, movies)
• Social media (i.e., unstructured data) input and
analysis
• Online web retail (e.g., shopping carts, user
transactions)
• Real-time data analytics
• Online gaming (e.g., real-time messaging)
• Software as a Service (SaaS) applications that utilize
web services
• Online portals (e.g., healthcare provider/patient
interactions)
• Most write-intensive systems.
IX.
CONCLUSION
In this paper, the detailed study is made to
understand their features and working of cassendra.
We also explain the cassendra with the help of its
architecture. Cassandra is a popular among nosql
database and Cassandra can be used for applications
requiring faster writes and high availability. Nosql
databases are not "One size fits all". Each nosql
classification addresses a specific data storage and
processing requirements. Thus we presented the
features
of
Cassandra
distributed
database
management system and benefits of it using in real
world enterprise applications. Cassandra is excellent
choice for businesses which are looking for features
such as high availability, consistency, and low
downtime, and fault tolerance, high scalability in
terms of both users and data. These features are
ISSN: 2231-5381
empirically tested by the creators of Cassandra on
production data spread across geographically diverse
areas with volumes in terms of several terabytes. With
continuous development going on under Apache
software foundation, Cassandra is expected to evolve
as most prominent and powerful distributed database
management system.
ACKNOWLEDGMENT
We would like to thank Prof. Sonali Dhule
Assistant Professor of Computer Science and
Engeering at college of Engineering and Technology
at Sant Gadge Baba Amravati University for providing
us with an idea to write a survey paper on Apache
Cassandra @ Business Field. We also thanks to IJETT
for providing this template to modify. The heading of
the Acknowledgment section and the References
section must not be numbered.
REFERENCES
1.
Robin Hecht Stefan Jablonski, University of Bayreuth "
NoSQL Evaluation A Use Case Oriented Survey" 2011
International Conference on Cloud and Service Computing
2.
Dietrich
Featherston
"cassandra:
Principles
and
Application" Department of Computer Science University
of Illinois at Urbana-Champaign
3. Ameya Nayak, Anil Poriya Dept. of Computer Engineering
Thakur College of Engineering and Technology University
of Mumbai " Type of NOSQL Databases and its
Comparison with Relational Databases" International
Journal of Applied Information Systems (IJAIS) – ISSN :
2249- 0868 Foundation of Computer Science FCS, New
York, USA Volume 5– No.4, March 2013
4. D. Agrawal, P. Bernstein, E. Bertino, S. Davidson, U.
Dayal, M. Franklin, J. Gehrke, L. Haas, A. Halevy, J. Han et
al., ―Challenges and opportunities with big data - a
community white paper developed by leading researchers
across the United States,‖ 2011.
5. F. Bugiotti, L. Cabibbo, P. Atzeni, and R. Torlone,
―Database design for NoSQL systems,‖ in Proceedings of
the 33rd International Conference on Conceptual Modeling,
2014, pp. 223–231.
6. Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung - The
Google File system, ACM SIGOPS Operating Systems
Review - SOSP '03 Homepage Volume 37 Issue 5,
December 2003
7. Avinash Lakshman, Prashant Malik Cassandra-A
Decentralized Structured Storage System, ACM SIGOPS
Operating Systems Review archive, Volume 44 Issue 2,
April 2010
8. F. Bugiotti, L. Cabibbo, P. Atzeni, and R. Torlone,
―Database design for NoSQL systems,‖ in Proceedings of
the 33rd International Conference on Conceptual Modeling,
2014, pp. 223–231.
9. D. Agrawal, P. Bernstein, E. Bertino, S. Davidson, U.
Dayal, M. Franklin, J. Gehrke, L. Haas, A. Halevy, J. Han et
al., ―Challenges and opportunities with big data - a
community white paper developed by leading researchers
across the United States,‖ 2011.
10. M. Lawley and R. W. Topor, ―A query language for EER
schemas,‖ inProceedings of the 5th Australasian Database
Conference, 1994, pp. 292–304.
11. Melnik S, Gubarev A, Long JJ, et al. Dremel: interactive
analysis of web-scale datasets. Proceedings of the 36th
International Conference on Very Large Data Bases 2010,
pp. 330–339.
http://www.ijettjournal.org
Page 269
Download