International Journal of Engineering Trends and Technology (IJETT) – Volume 6 Number 8 - Dec 2013

A Fast and Simple Method for Maintaining Staleness of Database

Menda Sravani*, Chanti Suragala#
* Final M.Tech Student, # Assistant Professor
*# Dept. of CSE, SISTAM College, Srikakulam, Andhra Pradesh

Abstract: In data warehousing, very large amounts of data are stored in the database, and keeping that data up to date takes considerable time; as the system scales, each refresh consumes even more processing time. We introduce a new framework to address this problem. It combines grouping and partitioning to schedule a large number of update jobs in less time, measured with respect to both execution time and utilization time.

I. INTRODUCTION

When a data warehouse deals with a large amount of data, updating the database raises several problems. Any transaction on the database takes a long time to apply, and if another transaction runs while an update is in progress, the two can collide. The result is duplicated data or violation of the integrity constraints of the data tables. A query executed against a database table may affect only the root tables, or both the root tables and the derived tables; in the latter case the query results must be propagated to both, which further lengthens the update.

Many researchers have studied this problem in different ways. Two common families of solutions are deadlock-based and processing-time-based. A deadlock is the situation in which two or more actions are each waiting for the other in order to execute. In these databases, the deadlock discipline amounts to sequential ordering: only when one transaction ends does the next transaction start. Processing time is the time taken to complete the execution of a query and the refresh of the database; after this processing time elapses, the next transaction starts.

The goal of a streaming warehouse is to propagate new data across all the relevant tables and views as quickly as possible. Once new data are loaded, the applications and triggers defined on the warehouse can take immediate action. This allows businesses to make decisions in nearly real time, which may lead to increased profits, improved customer satisfaction, and prevention of serious problems that could develop if no action were taken.

II. RELATED WORK

Scheduling algorithms generally treat each transaction as a job. A job has three properties: utilization time, processing time, and execution time.

Utilization time: the time a piece of equipment is occupied in completing a particular job.
Processing time: the time taken to execute a particular query in the database.
Execution time: the time taken by the CPU to complete a particular job, including the run time of the job.

Using these three properties, a schedule can be framed to complete the jobs. Scheduling is mainly used to execute more tasks in less time. In our context, this means completing jobs in less time in the data warehouse, which keeps staleness low when the data tables are updated; it also reduces CPU utilization time. If a query's run time occupies the CPU for too long, the next job is effectively serialized behind it: only after the first job completes can the next job start.

The idea is to partition the update jobs by their expected processing times, and correspondingly to partition the available computing resources into tracks. A track logically represents a fraction of the computing resources required by our complex jobs, including CPU, memory, and disk I/O. When an update job is released, it is placed in the queue corresponding to its assigned partition, where scheduling decisions are made by a local scheduler running a basic algorithm. We assume that each job is executed on exactly one track, so tracks become a mechanism for limiting concurrency and for separating long jobs from short jobs (the number of tracks is the limit on the number of concurrent jobs). For simplicity, we assume that the same type of basic scheduling algorithm is used for each track.
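The paper gives no code for track assignment or the per-track scheduler, so the following is a minimal Python sketch, assuming two tracks split by a processing-time cutoff and EDF (earliest deadline first) as the basic per-track algorithm. The Job fields, the 0.3 cutoff, and the sample values are illustrative assumptions, not taken from the paper.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Job:
    deadline: float                          # EDF priority key
    job_id: str = field(compare=False)
    proc_time: float = field(compare=False)  # expected processing time

def assign_track(job: Job, cutoff: float = 0.3) -> str:
    """Partition update jobs by expected processing time.
    The two-track layout and the 0.3 cutoff are illustrative assumptions."""
    return "short" if job.proc_time <= cutoff else "long"

# One priority queue per track; each track runs at most one job at a time,
# so the number of tracks bounds the number of concurrent jobs.
tracks = {"short": [], "long": []}

def release(job: Job) -> None:
    heapq.heappush(tracks[assign_track(job)], job)

def next_job(track: str):
    """Local EDF scheduler: pop the pending job with the earliest deadline."""
    return heapq.heappop(tracks[track]) if tracks[track] else None

# Example: three update jobs given as (deadline, id, processing time).
for j in (Job(0.3, "j1", 0.2), Job(0.9, "j2", 0.5), Job(0.8, "j3", 0.2)):
    release(j)

print(next_job("short").job_id)  # j1 -- earliest deadline on the short track
print(next_job("long").job_id)   # j2 -- the only long job
```

In this sketch a long job can never delay a short one, because the two wait in different queues; this is the concurrency-limiting role of tracks described above.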
III. PROPOSED MODEL

In our work we divide the overall process into three stages: partitioning, grouping, and classification. Partitioning orders the jobs according to a particular criterion. Grouping combines similar data items into one cluster. Classification decides which group (class) an item belongs to.

Partitioning: For partitioning we adapt the EDF (Earliest Deadline First) algorithm. It first reads all jobs, each described by a start time, an end time (deadline), and an execution time, and then orders the jobs by their end times, so that the job ending first is scheduled first.

Grouping: Grouping is also called clustering. For clustering we use BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies), an unsupervised data-mining algorithm for hierarchical clustering over particularly large data sets. An advantage of BIRCH is its ability to incrementally and dynamically cluster incoming multi-dimensional metric data points, in an attempt to produce the best-quality clustering for a given set of resources (memory and time constraints). Clustering features (CFs) are organized in a CF tree, a height-balanced tree characterized by its branching factor and its height.

Classification: For classification we use the naive Bayes classifier. A naive Bayes classifier assumes that the presence or absence of a particular feature is unrelated to the presence or absence of any other feature, given the class variable. For example, it considers each feature of a fruit to contribute independently to the probability that the fruit is an apple, regardless of the presence or absence of the other features.

The algorithm is as follows:

1. Input the jobs as J = {j1(0, 0.2, 0.3), j2(0.2, 0.5, 0.9), j3(0.1, 0.2, 0.8), ..., jn(0, 0.1, 0.3)}.

2. Order the jobs based on their execution times and deadlines, earliest first; for the example above this gives the order (jn, j3, j1, j2).

3. After the jobs are ordered, the clustering process starts. It computes each cluster's centroid as $\vec{x}_0 = \frac{1}{N}\sum_{i=1}^{N}\vec{x}_i$ and measures the distance between clusters by the Euclidean distance $d(\vec{x}, \vec{y}) = \sqrt{\sum_j (x_j - y_j)^2}$. BIRCH then proceeds in four phases:

Phase 1: Scan the data set once and build a CF tree in memory.
Phase 2 (optional): Condense the CF tree into a smaller CF tree.
Phase 3: Global clustering.
Phase 4 (optional): Cluster refinement (requires one more scan of the data set).

The clustering feature of a set of points is the triple CF = (N, LS, SS): the number of points, their linear sum, and their sum of squares. For example, the CF of the single data point (3, 4) is (1, (3, 4), 25), since 3^2 + 4^2 = 25.

Phase 1 (inserting a point into the tree): Starting from the root, find the path by choosing at each non-leaf node the child whose CF is closest under one of the distance metrics D0-D4, then find the closest leaf-node entry (again based on D0-D4 of the CFs in the leaf node). Check whether that entry can "absorb" the new data point without exceeding the threshold T; if so, update its CF and modify the CFs along the path to the leaf. Otherwise a splitting operation starts: if the leaf node is full, split it into two leaf nodes and add one more entry in the parent.

Phase 2: Choose a larger threshold T, consider the entries in the leaf nodes, and reinsert the CF entries into a new tree. If an entry's new path comes "before" its original path, move it to the new path; if the new path is the same as the original path, leave it unchanged.

Phase 3: Consider the CF entries in the leaf nodes only, using each entry's centroid as the representative of a cluster, and perform a traditional clustering (e.g., agglomerative hierarchical clustering with complete link, which corresponds to D2, or k-means) over the CFs instead of the raw data points.

Phase 4: Scan the data set one more time, using the clusters found in Phase 3 as seeds; redistribute the data points to their closest seeds to form new clusters, and remove outliers. (A minimal code sketch of this clustering stage is given below, before the classification step.)
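The paper does not show how the jobs are fed to BIRCH, so here is a minimal sketch, assuming scikit-learn's Birch implementation and one-dimensional job data (utilization times). The threshold, branching factor, cluster count, and sample values are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import Birch

# One row per job; the single feature is the job's utilization time.
# Values are illustrative, patterned on the job triples in step 1 above.
job_ids = ["j1", "j2", "j3", "j4", "j5"]
utilization = np.array([[0.2], [0.5], [0.2], [0.1], [0.6]])

# threshold is BIRCH's absorption threshold T; branching_factor bounds the
# CF-tree fan-out; n_clusters drives the Phase 3 global clustering step.
birch = Birch(threshold=0.1, branching_factor=50, n_clusters=2)
labels = birch.fit_predict(utilization)

for job_id, label in zip(job_ids, labels):
    print(f"{job_id} -> cluster {label}")  # similar utilization times group together
```

Each resulting cluster plays the role of one track: jobs with similar utilization times end up in the same queue.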
4. Classification: Naive Bayes classifiers can be trained very efficiently in a supervised learning setting. In many practical applications, parameter estimation for naive Bayes models uses the method of maximum likelihood; in other words, one can work with the naive Bayes model without accepting Bayesian probability or using any Bayesian methods.

Bayes' theorem plays a critical role in probabilistic learning and classification. The prior probability of each category is the probability assigned given no information about an item; classification then produces a posterior probability distribution over the possible categories given a description of the item. The underlying identities are the product rule

$P(A \wedge B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$,

the sum rule

$P(A \vee B) = P(A) + P(B) - P(A \wedge B)$,

and Bayes' theorem itself,

$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)}$.

Estimating the class-conditional distribution as a product of per-feature conditionals $P(x_i \mid C)$, rather than as a full joint distribution, greatly reduces the number of parameters (and the data sparseness). The learning step in naive Bayes consists of estimating $P(C)$ and $P(x_i \mid C)$ from frequencies in the training data. An unseen instance is then classified by computing the class that maximizes the posterior, $c^* = \arg\max_c P(c) \prod_i P(x_i \mid c)$. When the conditional-independence assumption is satisfied, naive Bayes corresponds to MAP classification.
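As a minimal sketch of this classification step, the following assumes Gaussian naive Bayes from scikit-learn over (execution time, utilization time) features, with track labels from the clustering stage as the classes. The feature layout and sample values are illustrative assumptions, not the paper's data.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Training data: one row per already-scheduled job,
# features = (execution time, utilization time); labels = assigned track.
X_train = np.array([[0.3, 0.2], [0.9, 0.5], [0.8, 0.2], [0.3, 0.1]])
y_train = np.array([0, 1, 1, 0])   # 0 = short track, 1 = long track

model = GaussianNB()
model.fit(X_train, y_train)        # estimates P(C) and P(x_i | C) from frequencies

# Classify a new job by the class that maximizes the posterior (MAP).
new_job = np.array([[0.4, 0.2]])
print(model.predict(new_job))        # predicted track for the new job
print(model.predict_proba(new_job))  # posterior distribution over tracks
```

GaussianNB estimates a per-class prior and per-feature Gaussian likelihoods, so its prediction is exactly the MAP rule described above.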
Experimental Results: The first experiment shows the results of EDF partitioning based on the execution times: the jobs are sorted according to the earliest-deadline-first rule. In the next step, the clustering is performed with the BIRCH algorithm, taking the partitioning results as input; the inputs to these calculations are the job ids and the utilization times. After clustering, jobs with similar utilization times are grouped together. The jobs are then scheduled so that those with lower utilization times run first: each cluster of jobs is treated as a track, and within a particular track the job with the lowest utilization time executes first.

IV. CONCLUSION

We introduced a scheduling method for jobs that run over the network. We first partition the jobs based on their execution times and utilization times, and then group the jobs. This keeps the staleness of the database low and reduces the time needed to refresh the data. We tested the method in simulation, where it works efficiently in a complex environment; its computational complexity is low, and the calculations were also verified manually.