Advances in Database Technology: F1-Fault Tolerant RDBMS, C-Block and Q system

International Journal of Engineering Trends and Technology (IJETT) - Volume 4 Issue 4 - April 2013
Y. Sailaja (1), M. Nalini Sri (2)
(1) Y. Sailaja, undergraduate student, Department of Electronics and Computer Engineering, K L University, Guntur, India
(2) M. Nalini Sri, Assistant Professor, Department of Electronics and Computer Engineering, K L University, Guntur, India
Abstract— In this paper, we discuss recent database
technologies that address the critical challenges
organizations now face, as managing data effectively has
become a major need. In particular, we discuss F1, a
fault-tolerant distributed RDBMS: a hybrid database that
combines the scalability of Bigtable with the functionality
of SQL. We then discuss the C-Block system, which
addresses the challenge of identifying duplicates in large
datasets efficiently, and the Q system, which performs
automatic data integration for incoming datasets. Finally,
we examine the integration of all these technologies in a
system that would address the issues pertaining to data
management.
Keywords—distributed RDBMS, database, Bigtable, SQL,
data integration, data management
I. INTRODUCTION
This paper is about the latest technology in database
systems. Oracle and other SQL databases have been the
databases in use up to now, but in future database systems
the fault-tolerant RDBMS, C-Block and Q system will play
a major role, since the amount of data to be stored is also
large compared to previous databases.
II. OVERVIEW OF FAULT TOLERANT DISTRIBUTED RDBMS
Many of the services that are critical to Google’s ad
business have historically been backed by MySQL. We
have recently migrated several of these services to F1, a
new RDBMS developed at Google. F1 implements rich
relational database features, including a strictly enforced
schema, a powerful parallel SQL query engine, general
transactions, change tracking and notification, and
indexing, and is built on top of a highly distributed
storage system that scales on standard hardware in
Google data centres. The store is dynamically sharded,
supports transactionally consistent replication across
data centres, and is able to handle data centre outages
without data loss.
ISSN: 2231-5381
The strong consistency properties of F1 and its storage
system come at the cost of higher write latencies compared
to MySQL. Having successfully migrated a rich
customer-facing application suite at the heart of
Google’s ad business to F1, with no downtime, we will
describe how we restructured schema and applications to
largely hide this increased latency from external users.
The distributed nature of F1 also allows it to scale easily
and to support significantly higher throughput for batch
workloads than a traditional RDBMS.
With F1, we have built a novel hybrid system that
combines the scalability, fault tolerance, transparent
sharding, and cost benefits so far available only in
“NoSQL” systems with the usability, familiarity, and
transactional guarantees expected from an RDBMS.
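The trade-off described above, higher latency per write but higher throughput for batch workloads, can be illustrated with simple arithmetic. The sketch below is not taken from F1: the function name, the latency figures and the writer counts are all hypothetical, and it assumes each writer issues its commits one after another.

```python
def writes_per_second(commit_latency_ms: float, parallel_writers: int) -> float:
    """Throughput when each writer commits sequentially.

    Hypothetical model, not an F1 API: every writer completes one commit
    per commit_latency_ms, and writers do not slow each other down.
    """
    if commit_latency_ms <= 0 or parallel_writers < 1:
        raise ValueError("latency must be positive and at least one writer is needed")
    return parallel_writers * 1000 / commit_latency_ms


# Hypothetical numbers chosen only to show the shape of the trade-off.
single_node = writes_per_second(commit_latency_ms=5.0, parallel_writers=1)
distributed = writes_per_second(commit_latency_ms=50.0, parallel_writers=100)
print(single_node, distributed)  # 200.0 2000.0
```

Under these assumed figures each individual write is ten times slower in the distributed case, yet the batch as a whole finishes ten times faster because the work is spread over many writers.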
III. C-BLOCK FOR DATA DEDUPLICATION
Data deduplication is used to improve storage utilization and
can also be applied to network data transfers to reduce
the number of bytes that must be sent. In the deduplication
process, unique chunks of data, or byte patterns, are
identified and stored during a process of analysis. As the
analysis continues, other chunks are compared to the
stored copy and whenever a match occurs, the redundant
chunk is replaced with a small reference that points to the
stored chunk. Given that the same byte pattern may occur
dozens, hundreds, or even thousands of times (the match
frequency is dependent on the chunk size), the amount of
data that must be stored or transferred can be greatly
reduced.
http://www.ijettjournal.org
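The chunk-and-reference process described above can be sketched in a few lines. This is a minimal illustration and not a description of any particular product. It assumes fixed-size chunks and identifies matching chunks by their SHA-256 digest rather than by comparing bytes directly. The names `deduplicate` and `restore` are invented for the example.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 4):
    """Split data into fixed-size chunks and store each distinct chunk once.

    Returns (store, refs): store maps a digest to the chunk bytes, and refs
    holds one digest per chunk in order, i.e. the small references that
    replace redundant chunks.
    """
    store = {}
    refs = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # keep only the first copy
        refs.append(digest)
    return store, refs


def restore(store, refs) -> bytes:
    """Rebuild the original bytes by following the references."""
    return b"".join(store[digest] for digest in refs)


data = b"ABCDABCDABCDWXYZ"
store, refs = deduplicate(data, chunk_size=4)
print(len(refs), "chunks,", len(store), "stored")  # 4 chunks, 2 stored
```

As the text notes, the saving depends on the chunk size: with `chunk_size=4` the repeated `ABCD` pattern lines up with chunk boundaries, while a different size might not expose the repetition.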
IV. Q SYSTEM OF SEARCH-BASED INTEGRATION
The data structure manipulated by a Q-system is a
Q-graph, which is a directed acyclic graph with one
entry node and one exit node, where each arc bears
a labelled ordered tree. An input sentence is usually
represented by a linear Q-graph where each arc bears
a word (a tree reduced to one node labelled by this word).
After analysis, the Q-graph is usually a bundle of
1-arc paths, each arc bearing a possible analysis tree.
After generation, the goal is usually to produce as
many paths as desired outputs, with again one word
per arc.
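The Q-graph described above can be modelled as a small data structure. The sketch below is illustrative only. The class and function names (`Tree`, `Arc`, `QGraph`, `linear_qgraph`) are invented here, nodes are plain integers, and the graph is assumed to be acyclic, as the text states.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tree:
    """A labelled ordered tree; a single word is a tree with no children."""
    label: str
    children: List["Tree"] = field(default_factory=list)


@dataclass
class Arc:
    source: int
    target: int
    tree: Tree


@dataclass
class QGraph:
    entry_node: int
    exit_node: int
    arcs: List[Arc]

    def paths(self) -> List[List[Arc]]:
        """Enumerate every path from the entry node to the exit node."""
        def walk(node: int) -> List[List[Arc]]:
            if node == self.exit_node:
                return [[]]
            found = []
            for arc in self.arcs:
                if arc.source == node:
                    for rest in walk(arc.target):
                        found.append([arc] + rest)
            return found
        return walk(self.entry_node)


def linear_qgraph(words: List[str]) -> QGraph:
    """Build the linear Q-graph of a sentence: one arc per word."""
    arcs = [Arc(i, i + 1, Tree(word)) for i, word in enumerate(words)]
    return QGraph(entry_node=0, exit_node=len(words), arcs=arcs)


sentence = linear_qgraph(["the", "cat", "sleeps"])
# After analysis: a bundle of 1-arc paths, each bearing one candidate tree.
analysed = QGraph(0, 1, [
    Arc(0, 1, Tree("S", [Tree("NP"), Tree("VP")])),
    Arc(0, 1, Tree("S", [Tree("VP")])),
])
```

Here `sentence.paths()` yields a single three-arc path, while `analysed.paths()` yields two one-arc paths, one per candidate analysis.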
V. INTEGRATING THE C-BLOCK, Q SYSTEM AND FAULT TOLERANT RDBMS
By combining the C-Block, the Q system and the fault-tolerant
RDBMS, we can obtain a database that stores data well and
efficiently.
[Figure: block diagram with the labels C-BLOCK, Q-SYSTEM, INTEGRATION and FAULT TOLERANT RDBMS]
Data management strategies aside, we can comfortably say
that data needs to be of good quality. It doesn't make sense
to distinguish between levels of quality if the integrity of
your system is not perfect. There is perfect information,
and then there is information you can't use because it is
less than one hundred percent internally consistent. Data
integrity is one feature that technology for data management
needs to stay on top of.
VI. DATABASE TECHNOLOGY
The first goal of data technology is to keep data entry as simple as
possible. Let's face it, data entry is a low profile job in
most cases and for most companies. The people doing it
are not always well versed in computer science,
especially not in the perks of relational database models.
If you handed them a paper, which explains how it works
and what's going on behind the software, they'd likely
blink once, and then carry on with something else. If
they had a full understanding of the principles of
database management technologies, they'd be called
'database specialists' and ask for three times as much
money.
Without any intention to sound condescending or offensive
towards those who enter data into the system, it's still
important to say that they need all the help they can get. All
fields in a form need to be as clear as possible, without any
overlap, duplication, or instructions [...] user supplied
details. That's also the reason why they suggest you use
capital letters.
The second most significant feature of data technology
is that it needs to be simple and must provide ways to
eliminate the possibility of incorrect data.
Think data consistency.
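The points above (clear fields, no duplication, capital letters, rejecting incorrect data) can be sketched as a small entry routine. The field names `name` and `city` and both function names are hypothetical, chosen only for illustration. A real form would have its own fields and rules.

```python
REQUIRED_FIELDS = ("name", "city")  # hypothetical form fields


def normalize_entry(raw: dict) -> dict:
    """Trim, collapse whitespace and upper-case every required field.

    Raises ValueError when a required field is missing or blank, so an
    incomplete record never reaches the table.
    """
    cleaned = {}
    for field_name in REQUIRED_FIELDS:
        value = str(raw.get(field_name, "")).strip()
        if not value:
            raise ValueError(f"missing required field: {field_name}")
        cleaned[field_name] = " ".join(value.split()).upper()
    return cleaned


def add_entry(table: set, raw: dict) -> bool:
    """Insert the normalized entry; return False if it is already present."""
    entry = normalize_entry(raw)
    key = tuple(entry[field_name] for field_name in REQUIRED_FIELDS)
    if key in table:
        return False
    table.add(key)
    return True


table = set()
first = add_entry(table, {"name": " alice  smith", "city": "guntur"})
second = add_entry(table, {"name": "Alice Smith", "city": "GUNTUR "})
print(first, second, len(table))  # True False 1
```

Normalizing to capital letters before comparing is what lets the second, differently typed entry be recognised as a duplicate of the first.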
VII. SECURITY
Data management technology needs to be as secure as
Fort Knox. Most of the time even the system
administrator, who is responsible for ensuring
uninterrupted operation, is given only limited access to
the software itself. Why would a tech geek want to
tamper with corporate intellectual property anyway?
Data centres where the servers reside need to comply
with several guidelines and laws. The value of these
digital assets has skyrocketed in the last five years, so it
is no wonder that a tier 4 data centre is as well guarded as
a federal prison, with a fence nine feet tall topped with
barbed wire and biometric authorization on all doors.
Some believe that data storage virtualization is the future
of data technology. The truth is it is not only the future
but the present. More and more companies are starting to
use virtualization to save space and process information
faster.
VIII. ADVANTAGES
The strong consistency properties of F1 and its storage
system come at the cost of higher write latencies compared
to MySQL. Having successfully migrated a rich
customer-facing application suite at the heart of
Google’s ad business to F1, with no downtime, we will
describe how we restructured schema and applications to
largely hide this increased latency from external users.
The distributed nature of F1 also allows it to scale easily
and to support significantly higher throughput for batch
workloads than a traditional RDBMS.
Data management technology is gaining more and more
traction with successful companies. The time for investing
in a software solution produced to deal with this segment of
IT has become much more urgent due to the global
recession. The economic downturn has stimulated providers
of such software to try and add even better features to their
products. Is it time your company invested in technology for
data management?
As a post-processing step to automatically generated blocks,
CBLOCK rolls up smaller blocks to increase recall.
We present experimental results on two large-scale
de-duplication datasets at Yahoo!, consisting of over 140K
movies and 40K restaurants respectively, and demonstrate
the utility of CBLOCK.
X. CONCLUSION
As these are advanced technologies, no conclusion is required;
we can use them in future to gain more storage capacity and
easier access to database technologies.
XI. DATA DEDUPLICATION
De-duplication, the identification of distinct records referring
to the same real-world entity, is a well-known challenge in
data integration. Since very large datasets prohibit the
comparison of every pair of records, blocking has
been identified as a technique of dividing the dataset for
pairwise comparisons, thereby trading off recall of
identified duplicates for efficiency. Traditional
de-duplication tasks, while challenging, typically involved a
fixed schema such as Census data or medical records.
However, with the presence of large, diverse sets of
structured data on the web and the need to organize it
effectively on content portals, de-duplication systems need
to scale in a new dimension to handle a large number of
schemas, tasks and data sets, while handling ever larger
problem sizes. In addition, when working in a map-reduce
framework it is important that canopy formation be
implemented as a hash function, making the
canopy design problem more challenging. We present
CBLOCK, a system that addresses these challenges. CBLOCK
learns hash functions automatically from attribute domains
and a labelled dataset consisting of duplicates. Subsequently,
CBLOCK expresses blocking functions using a hierarchical
tree structure composed of atomic hash functions. The
application may guide the automated blocking process
based on architectural constraints, such as by specifying a
maximum size of each block (based on memory
requirements), imposing disjointness of blocks (in a grid
environment), or specifying a particular objective function
trading off recall for efficiency.
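The idea of blocking with a hierarchy of hash functions and a maximum block size can be sketched as follows. This is a simplified illustration and not CBLOCK itself. The two hash functions are hand-picked here rather than learned from labelled duplicates, the record fields `name` and `city` are hypothetical, and a block may still exceed the limit once the list of hash functions is exhausted.

```python
from collections import defaultdict


def first_letter(record: dict) -> str:
    """Atomic hash function: first letter of the name, case-insensitive."""
    return record["name"][:1].lower()


def city(record: dict) -> str:
    """Atomic hash function: the city, case-insensitive."""
    return record["city"].lower()


def build_blocks(records, hash_functions, max_block_size):
    """Split records into disjoint blocks, level by level.

    A block that is small enough is kept as is; a larger one is split
    with the next hash function in the list, forming a tree of splits.
    """
    if not hash_functions or len(records) <= max_block_size:
        return [records]
    head, rest = hash_functions[0], hash_functions[1:]
    groups = defaultdict(list)
    for record in records:
        groups[head(record)].append(record)
    blocks = []
    for group in groups.values():
        blocks.extend(build_blocks(group, rest, max_block_size))
    return blocks


def candidate_pairs(blocks) -> int:
    """Number of pairwise comparisons needed inside the blocks."""
    return sum(len(b) * (len(b) - 1) // 2 for b in blocks)


records = [
    {"name": "Alpha Diner", "city": "Guntur"},
    {"name": "alpha diner", "city": "guntur"},
    {"name": "Amber Cafe", "city": "Delhi"},
    {"name": "Blue Grill", "city": "Guntur"},
    {"name": "Blue Grill Restaurant", "city": "Guntur"},
    {"name": "Casa Roma", "city": "Delhi"},
]
blocks = build_blocks(records, [first_letter, city], max_block_size=2)
print(candidate_pairs([records]), candidate_pairs(blocks))  # 15 2
```

On this toy data, blocking cuts the comparisons from 15 to 2 while keeping both likely duplicate pairs in the same block. With a poorly chosen hash function some true duplicates would land in different blocks, which is the recall-for-efficiency trade-off the text describes.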