International Journal of Engineering Trends and Technology (IJETT) - Volume 4 Issue 4 - April 2013

Advances in Database Technology: F1 Fault-Tolerant RDBMS, C-Block and Q-System

Y. Sailaja1, M. Nalini Sri2
Y. Sailaja, undergraduate, Department of Electronics and Computer Engineering, K L University, Guntur, India
M. Nalini Sri, Assistant Professor, Department of Electronics and Computer Engineering, K L University, Guntur, India

Abstract— In this paper, we discuss the latest database technologies that address the critical challenges organizations face today, as managing data effectively has become a major need. In particular, we discuss F1, a fault-tolerant distributed RDBMS: a hybrid database that combines the scalability of Bigtable with the functionality of SQL. We then present the C-Block system, which addresses the challenge of identifying duplicates in large datasets for better efficiency, and the Q-system for efficient data integration, which performs automatic integration of incoming datasets. Finally, we examine the integration of all these technologies into a system that would address the issues pertaining to data management.

Keywords— distributed RDBMS, database, Bigtable, SQL, data integration, data management

I. INTRODUCTION
This paper surveys the latest database system technologies. Databases such as Oracle and other SQL systems have dominated until now, but in future database systems the fault-tolerant RDBMS, the C-Block and the Q-system will play a major role, since the volume of data to be managed is also far larger than in previous databases.

II. OVERVIEW OF FAULT-TOLERANT DISTRIBUTED RDBMS
Many of the services that are critical to Google's ad business have historically been backed by MySQL. Google has recently migrated several of these services to F1, a new RDBMS developed at Google.
F1 implements rich relational database features, including a strictly enforced schema, a powerful parallel SQL query engine, general transactions, change tracking and notification, and indexing, and is built on top of a highly distributed storage system that scales on standard hardware in Google data centres. The store is dynamically sharded, supports transactionally consistent replication across data centres, and is able to handle data-centre outages without data loss. The strong consistency properties of F1 and its storage system come at the cost of higher write latencies compared to MySQL. Google successfully migrated a rich customer-facing application suite at the heart of its ad business to F1 with no downtime, restructuring schema and applications to largely hide this increased latency from external users. The distributed nature of F1 also allows it to scale easily and to support significantly higher throughput for batch workloads than a traditional RDBMS. With F1, Google has built a novel hybrid system that combines the scalability, fault tolerance, transparent sharding, and cost benefits so far available only in "NoSQL" systems with the usability, familiarity, and transactional guarantees expected from an RDBMS.

III. C-BLOCK FOR DATA DEDUPLICATION
Deduplication is a technique used to improve storage utilization; it can also be applied to network data transfers to reduce the number of bytes that must be sent. In the deduplication process, unique chunks of data, or byte patterns, are identified and stored during a process of analysis. As the analysis continues, other chunks are compared to the stored copies and, whenever a match occurs, the redundant chunk is replaced with a small reference that points to the stored chunk.
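The chunk-and-reference process described above can be sketched in a few lines of Python. This is a minimal illustration, not the C-Block implementation: fixed-size chunks and SHA-256 fingerprints are illustrative assumptions, and real deduplication systems often use content-defined chunking instead.

```python
import hashlib

def deduplicate(data: bytes, chunk_size: int = 8):
    """Split data into fixed-size chunks, store each unique chunk once,
    and represent the stream as a sequence of small references."""
    store = {}   # fingerprint -> chunk bytes, each unique pattern stored once
    refs = []    # small references replacing the raw chunks
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        key = hashlib.sha256(chunk).hexdigest()
        if key not in store:       # new byte pattern: store the chunk itself
            store[key] = chunk
        refs.append(key)           # duplicate or not, keep only the reference
    return store, refs

def reconstruct(store, refs):
    """Rebuild the original byte stream by following the references."""
    return b"".join(store[k] for k in refs)

data = b"ABCDEFGH" * 100 + b"12345678"   # a highly repetitive stream
store, refs = deduplicate(data)
# only 2 unique chunks are stored for 101 chunk references
```

Because the repeated byte pattern is stored only once, the amount of data kept (or transferred) shrinks roughly in proportion to how often each chunk recurs.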
Given that the same byte pattern may occur dozens, hundreds, or even thousands of times (the match frequency depends on the chunk size), the amount of data that must be stored or transferred can be greatly reduced.

IV. Q-SYSTEM FOR SEARCH-BASED INTEGRATION
The data structure manipulated by a Q-system is a Q-graph: a directed acyclic graph with one entry node and one exit node, where each arc bears a labelled ordered tree. An input sentence is usually represented by a linear Q-graph where each arc bears a word (a tree reduced to one node labelled by this word). After analysis, the Q-graph is usually a bundle of one-arc paths, each arc bearing a possible analysis tree. After generation, the goal is usually to produce as many paths as desired outputs, with again one word per arc.

V. INTEGRATING THE C-BLOCK, Q-SYSTEM AND FAULT-TOLERANT RDBMS
By combining the C-Block, the Q-System and the fault-tolerant RDBMS, we can obtain an efficient database with a good storage facility.

[Figure 1: integration of the C-Block, Q-System and fault-tolerant RDBMS]

Data management strategies aside, we can comfortably say that data needs to be of good quality. There is little point in distinguishing between levels of quality if the integrity of the system is not guaranteed: there is usable information, and then there is information you cannot use due to its less-than-hundred-percent internal consistency. That is one feature that data management technology needs to stay on top of.
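The Q-graph representation described in Section IV can be sketched as a small data structure. This is an illustrative sketch only; the class and function names below are invented for the example and are not part of any actual Q-system implementation.

```python
class Tree:
    """A labelled ordered tree, as carried on each arc of a Q-graph."""
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []   # ordered list of subtrees

class QGraph:
    """A directed acyclic graph; each arc bears a labelled ordered tree."""
    def __init__(self):
        self.arcs = []                   # list of (from_node, to_node, Tree)

    def add_arc(self, src, dst, tree):
        self.arcs.append((src, dst, tree))

def linear_qgraph(sentence):
    """Represent an input sentence as a linear Q-graph: one arc per word,
    each arc bearing a tree reduced to a single labelled node."""
    g = QGraph()
    words = sentence.split()
    for i, w in enumerate(words):
        # node 0 is the entry node, node len(words) is the exit node
        g.add_arc(i, i + 1, Tree(w))
    return g

g = linear_qgraph("the cat sleeps")
# three arcs: 0->1 "the", 1->2 "cat", 2->3 "sleeps"
```

After analysis, each arc of such a graph would carry a full analysis tree rather than a single-node tree, but the underlying entry-to-exit path structure is the same.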
If you handed them a paper explaining how it works and what goes on behind the software, they would likely blink once and then carry on with something else. If they had a full understanding of the principles of database management technologies, they would be called 'database specialists' and ask for three times as much money. Without any intention of sounding condescending or offensive towards those who enter data into the system, it is still important to say that they need all the help they can get. All fields in a form need to be as clear as possible, without any overlap, duplication, or instructions suggesting the use of capital letters. The second most significant feature of data management technology is that it needs to be simple and must provide ways to eliminate the possibility of entering incorrect data. Think data consistency.

VII. SECURITY
Data management technology needs to be as secure as Fort Knox. Most of the time even the system administrator, who is responsible for ensuring uninterrupted operation, is given only limited access to the software itself; why would a tech geek want to tamper with corporate intellectual property anyway? Data centres where the servers reside need to comply with several guidelines and laws. The value of these digital assets has skyrocketed in the last five years; no wonder a Tier 4 data centre is as well guarded as a federal prison, with a nine-foot fence topped with barbed wire and biometric authorization on all doors. Some believe that data-storage virtualization is the future of data technology. The truth is that it is not only the future but the present: more and more companies are starting to use virtualization to save space and process information faster.

VIII. ADVANTAGES
The strong consistency properties of F1 and its storage system come at the cost of higher write latencies compared to MySQL.
Google successfully migrated a rich customer-facing application suite at the heart of its ad business to F1 with no downtime, restructuring schema and applications to largely hide this increased latency from external users. The distributed nature of F1 also allows it to scale easily and to support significantly higher throughput for batch workloads than a traditional RDBMS. Data management technology is gaining more and more traction with successful companies. Investing in a software solution for this segment of IT has become much more urgent due to the global recession, and the economic downturn has stimulated providers of such software to add even better features to their products. Is it time your company invested in technology for data management?

IX. DATA DEDUPLICATION
De-duplication, the identification of distinct records referring to the same real-world entity, is a well-known challenge in data integration. Since very large datasets prohibit the comparison of every pair of records, blocking has been identified as a technique of dividing the dataset for pairwise comparisons, thereby trading off the recall of identified duplicates for efficiency. Traditional de-duplication tasks, while challenging, typically involved a fixed schema such as census data or medical records. However, with the presence of large, diverse sets of structured data on the web and the need to organize it effectively on content portals, de-duplication systems need to scale in a new dimension to handle a large number of schemas, tasks and datasets, while handling ever larger problem sizes. In addition, when working in a map-reduce framework it is important that canopy formation be implemented as a hash function, which makes the canopy-design problem more challenging. CBLOCK is a system that addresses these challenges. CBLOCK learns hash functions automatically from attribute domains and a labelled dataset consisting of duplicates. Subsequently, CBLOCK expresses blocking functions using a hierarchical tree structure composed of atomic hash functions. The application may guide the automated blocking process based on architectural constraints, such as by specifying a maximum size for each block (based on memory requirements), imposing disjointness of blocks (in a grid environment), or specifying a particular objective function trading off recall for efficiency. As a post-processing step applied to the automatically generated blocks, CBLOCK rolls up smaller blocks to increase recall. Experimental results on two large-scale de-duplication datasets at Yahoo!, consisting of over 140K movies and 40K restaurants respectively, demonstrate the utility of CBLOCK.

X. CONCLUSION
The technologies discussed here, the F1 fault-tolerant distributed RDBMS, the C-Block deduplication system and the Q-system for data integration, can be used in future database systems to gain greater storage efficiency and easier access to data.
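As an illustration of the blocking idea at the heart of the de-duplication discussion, here is a minimal Python sketch. The hand-written block key below is a stand-in assumption for the atomic hash functions that CBLOCK would learn and compose automatically; the record fields are invented for the example.

```python
from collections import defaultdict
from itertools import combinations

def block_key(record):
    # Hand-picked atomic hash: first letter of the title (lowercased)
    # plus release year. CBLOCK would learn such functions from data.
    return (record["title"][0].lower(), record["year"])

def blocked_pairs(records):
    """Divide records into blocks by key, then compare pairs only within
    each block, trading recall for efficiency over all-pairs comparison."""
    blocks = defaultdict(list)
    for r in records:
        blocks[block_key(r)].append(r)
    for members in blocks.values():
        for a, b in combinations(members, 2):
            yield a, b

movies = [
    {"title": "Heat", "year": 1995},
    {"title": "heat", "year": 1995},   # duplicate: lands in the same block
    {"title": "Ran", "year": 1985},
]
pairs = list(blocked_pairs(movies))
# 1 candidate pair instead of the 3 exhaustive comparisons
```

Any true duplicate whose records hash to different blocks is missed, which is exactly the recall-for-efficiency trade-off described above and the reason CBLOCK rolls smaller blocks up as a post-processing step.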