www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242

Volume 4 Issue 1 January 2015, Page No. 10028-10042

Constructing Horizontal layout and Clustering Horizontal layout by applying Fuzzy Concepts for Data mining Reasoning

Kalluri N V Satya Naresh, Divya Vani .Y
divyasudha99@gmail.com
Shri Vishnu Engineering College for Women, Bhimavaram, Andhra Pradesh, India

Abstract: Clustering is one of the significant tasks in data mining; it benefits many users by supporting analysis and decision making. This paper introduces a fast and efficient way to create a horizontal layout and to use that layout directly in data mining algorithms such as clustering. Preparing a data set for analysis in a data mining project is a time-consuming and laborious task, so horizontal layouts are created and stored in the database, which removes the burden of performing data preprocessing in data mining projects. The vertical layouts created by vertical aggregations in SQL are unsuitable for data mining algorithms, so horizontal aggregations are used to create horizontal layouts. A horizontal layout is preferable to a vertical layout because a vertical layout, produced by normal SQL (Structured Query Language) aggregations, has only one aggregated column per group, whereas a horizontal layout returns many values per aggregated group or row and is therefore useful for data mining algorithms. Horizontal aggregations are evaluated with the CASE and SPJ methods to create horizontal layouts efficiently. This paper shows that a horizontal layout can be created more easily with the CASE method than with the SPJ method. Preparing a data set for clustering takes considerable time and effort, so the created horizontal layout is used directly for clustering, without that extra time and effort.
As uncertainty is a key feature of data, the horizontal layout is clustered using soft computing concepts such as fuzzy sets, and the clustered data is useful to users for analysis and decision making. The whole process is explained with examples and experimental results. Horizontal layouts are very helpful in data mining algorithms, so this paper covers the easy creation of horizontal layouts and their clustering while handling imprecise data.

Keywords: Horizontal Aggregation, Horizontal layout, Vertical layout, Vertical Aggregation, Data mining algorithms, Clustering, Fuzzy Concepts.

Kalluri N V Satya Naresh, IJECS Volume 4 Issue 1 January, 2015 Page No.10028-10042 Page 10028

1. Introduction:

Generally, preparing a data set for data mining projects is a very time-consuming process. The vertical layouts produced by normal SQL aggregation functions (vertical aggregations) are unsuitable for data mining tasks or projects: a vertical layout consists of a large number of rows, which is not I/O (input/output) efficient. To solve the problem of preparing data sets, horizontal aggregations are adopted to create horizontal layouts easily.

The mechanism by which information is gathered, expressed in summary form and used for statistical analysis is known as data aggregation. Aggregation intended to obtain more information about particular groups, based on specific variables such as gender, name, age, address, profession, phone number or income, is called general aggregation. Most data mining algorithms require a horizontal layout data set as input, because a horizontal layout returns many values per aggregated row instead of one value per aggregated row. A new class of aggregate functions is proposed that returns a table or data set in horizontal layout by aggregating numeric expressions and transposing the results. Functions that belong to this class are called horizontal aggregations. Horizontal aggregations are an extended form of traditional SQL aggregations: they return a group of values or columns in a horizontal layout per aggregated row or group, instead of a single column or value per aggregated row.

Horizontal layouts are more I/O and time efficient than vertical layouts for data mining algorithms such as classification, regression analysis, PDA and clustering. A horizontal layout avoids the burden of creating data sets through a data preprocessing phase and a data set creation phase with complex SQL queries. Vertical layouts, which are created with normal SQL functions, have limitations for data mining algorithms because they return only one column per aggregated group or row; the horizontal layout is therefore created with horizontal aggregations, which produce many columns or values per aggregated group or row instead of one value per row. Horizontal aggregations have many advantages, such as allowing SQL code to be generated automatically, and in this paper they are evaluated with the SPJ and CASE methods. The paper shows with an example that it is easier and more time efficient to create a horizontal layout with the CASE method than with the SPJ method. The horizontal layout created in advance is then used directly for clustering, without performing any further data mining preprocessing tasks, which saves time and effort.

Many operators and functions are needed to compute aggregations in SQL. Sum is the most commonly used aggregation of a column; other aggregation operators return the row count, maximum, average and minimum over groups of rows. All the existing aggregation operators have limitations when used to create large data sets for data mining purposes. For OLTP (online transaction processing), database schemas need to be highly normalized, but data mining, machine learning and statistical algorithms generally require aggregated data in summarized form. Data mining algorithms take their input in cross-tabular (horizontal) form, so significant effort is required to compute the aggregations. Clustering of the horizontal layout is performed using fuzzy concepts, which handle impreciseness and vagueness of data.

As the horizontal layout can be used directly by data mining algorithms or projects, we use it for clustering, because clustering is one of the most important tasks in data mining. Clustering of the horizontal layout can be performed with the Fuzzy C-Means algorithm.

2. Literature Review:

A database organizes data to model relevant aspects of reality in a way that supports processes requiring information. Database Management Systems (DBMS) are specially developed software applications that interact with applications, users and the database to capture and analyze data. A DBMS is software designed to define, create, update, query and administer databases. Some well-known DBMSs are MySQL, PostgreSQL, MariaDB, SQLite, Oracle, Microsoft SQL Server, dBase, SAP HANA, FoxPro, LibreOffice Base, IBM DB2 and FileMaker Pro.

Horizontal aggregation is an advanced class of functions that returns aggregated attributes or columns in a horizontal layout. Most algorithms require data sets with a horizontal layout as input. It is difficult to manage data sets without the support of a DBMS. Inside a relational database, trying different subsets of dimensions and data points is easier, faster and more flexible than working outside the DBMS with an alternative tool. Much like select, project and join, horizontal aggregation can be performed by an operator, and it is best implemented inside the query processor.

The SELECT statement is used to select data from a database. Projection is the selection of the table columns that one wishes to appear in the answer, table or data set. An SQL join builds a data set or table by combining rows from two or more tables based on a common field. A left outer join returns all rows from the left table together with the matching rows from the right table.

An aggregation function groups the values of multiple rows to form a single value based on a certain condition. The most commonly used aggregation functions are average(), maximum(), mode(), median(), count(), minimum() and sum(). These normal SQL aggregation functions are also called vertical aggregation functions and are used to create vertical layouts. The GROUP BY clause gathers all the rows that contain data in the selected columns and allows aggregation functions to operate on one or more columns.

Data mining is the process of extracting knowledge from data. Data present in various data sources is collected and stored in data warehouses; data mining functionalities are then performed on the preprocessed data, giving results in a form the user can understand. Data cleaning, transformation, reduction, regression analysis, association rule generation, classification, clustering and outlier analysis all come under data mining tasks. Among these functionalities, this paper deals with clustering.

Soft computing and its tools are increasingly applied in both everyday and advanced applications. In real applications data uncertainty is a prominent feature, and since hard computing cannot handle vague and uncertain data, soft computing is used. Zadeh introduced the notion of graded membership through the concept of the fuzzy set in 1965, in order to capture impreciseness in data, generalizing the characteristic function of sets. A membership function is associated with each fuzzy set and is used to deal with imprecise data.

A fuzzy set $A \subseteq S$, where S is a set in a universe, is defined by its membership function $\mu_A : S \to [0, 1]$; that is, every $y \in S$ is associated with a real number $\mu_A(y)$, called the membership value of y, which satisfies $0 \le \mu_A(y) \le 1$.

Clustering is an unsupervised learning problem that deals with discovering structure in a collection of unlabeled data. Data clustering is the technique of partitioning a data set into distinct clusters according to the similarity of its elements: elements with similar features are kept in a single cluster, whereas dissimilar elements are kept in different clusters. Fuzzy clustering algorithms are used to cluster inexact and imprecise data. In both the K-Means and Fuzzy C-Means algorithms the number of clusters is fixed in advance; K-Means assigns each element to exactly one cluster, whereas Fuzzy C-Means assigns each element a graded membership in every cluster. To cluster data while handling impreciseness through the fuzzy set concept, clustering is performed on the created horizontal layouts, and the clustered data is useful to users for analysis and decision making.

2.2 Need For Creating Horizontal Layout:

The horizontal layout mainly relieves the burden of data mining projects, in which preparing data sets during the data preparation phase takes a lot of time and effort. Horizontal layouts can be used directly as input data sets by data mining algorithms such as classification, regression analysis, clustering and PDA, without preparing data sets again from the data tables. Common SQL aggregation functions such as min, avg, sum and max can be used to create a vertical layout. Vertical layouts produced with these standard SQL aggregation functions are not I/O efficient for data mining algorithms, because they generate only one column per aggregated group and a large number of rows. A horizontal layout is therefore required, having many columns per aggregated group, i.e. returning many values per row. Horizontal layouts can be produced with functions called horizontal aggregations. Data mining tools can generate the SQL code automatically. Horizontal aggregations can be evaluated with methods such as CASE and SPJ.

2.3 Advantages of creating horizontal layouts using horizontal aggregations and clustering them:

(.) In data mining tools SQL code can be generated automatically, as a horizontal aggregation provides a template that automates writing, optimizing and testing SQL queries for correctness.
(.) SQL queries generated automatically are more efficient than queries written by an end user.
(.) The data set produced by horizontal aggregations can be created directly in the database.
(.) The horizontal layouts created can be given directly as input to data mining algorithms such as classification, regression analysis, clustering and PDA.
(.) The clustered data obtained by clustering the horizontal layout is more useful to users for analysis and decision making.

2.4 Definitions:

T is a database table with primary key P, discrete columns C1, C2, …, Ci and one numeric column N; it is written T(P, C1, C2, …, Ci, N). In OLAP terms, T is the fact table with P as primary key, i dimensions and N as the measure column, where M is the size of the table and C1, C2, …, Ci are foreign keys in the fact table and primary keys in the lookup tables. T is the input table; by executing SQL queries, tables TV and TH are created, where TV is the vertical layout table and TH is the horizontal layout table. Conversion of the vertical layout to the horizontal layout is the goal of horizontal aggregations.

Let us consider the following table T as an example, having P as primary key, C1 and C2 as discrete columns and N as the numeric column.

Table 2.1 A Database Table

Consider the query Select C1, C2, Sum (N) from T group by C1, C2 order by C1, C2.

Table 2.2 Vertical layout

Running the above SQL query with an SQL aggregation function such as sum gives Table 2.2 as output, which is called a vertical layout. As this vertical layout has only one aggregated column, and C1 and C2 together act as the primary key, it is not useful as input to data mining algorithms, so a horizontal tabular layout is required. Table 2.3 is a horizontal layout having two aggregated columns and one primary key column, which is suitable as input to data mining tasks or algorithms.

Table 2.3 Horizontal layout

3. Methodology
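The vertical layout defined in Section 2.4, and the horizontal layout that the methodology aims to produce, can be illustrated with a minimal sketch. It assumes Python's standard-library sqlite3 module; the rows of T are hypothetical values chosen for illustration (they are not the contents of Table 2.1), and the helper names vertical_layout and transpose are illustrative only.

```python
import sqlite3

# Hypothetical rows for T(P, C1, C2, N); not the paper's Table 2.1.
ROWS = [
    (1, 1, "A", 10), (2, 1, "B", 20), (3, 1, "A", 5),
    (4, 2, "A", 7), (5, 2, "B", 3), (6, 3, "B", 9),
]

def vertical_layout(rows):
    """Return TV: one row per (C1, C2) group with a single aggregated column."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE T (P INTEGER PRIMARY KEY, C1 INTEGER, C2 TEXT, N INTEGER)"
    )
    con.executemany("INSERT INTO T VALUES (?, ?, ?, ?)", rows)
    tv = con.execute(
        "SELECT C1, C2, SUM(N) FROM T GROUP BY C1, C2 ORDER BY C1, C2"
    ).fetchall()
    con.close()
    return tv

def transpose(tv):
    """Pivot TV in Python: one entry per C1, one value per C2 (None if missing)."""
    c2_values = sorted({c2 for _, c2, _ in tv})
    th = {}
    for c1, c2, total in tv:
        # Every C1 gets a slot for every C2 value, pre-filled with None.
        th.setdefault(c1, dict.fromkeys(c2_values))[c2] = total
    return th

tv = vertical_layout(ROWS)   # several rows per C1, one aggregated column
th = transpose(tv)           # one entry per C1, one aggregated value per C2
for row in tv:
    print(row)
print(th)
```

In TV the pair (C1, C2) identifies a row, whereas in TH the key is C1 alone and each distinct value of C2 becomes its own aggregated column; a missing combination is kept as None rather than zero.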
3.1 Horizontal Aggregations:

Horizontal aggregations help when the user wants output in horizontal form, or wants to combine a vertical layout with aggregations that depend on the grouping columns. As vertical layouts are not well suited to data mining algorithms, horizontal layouts are created by using horizontal aggregations. Horizontal aggregations convert the vertical layout to a horizontal layout by transposing the aggregation column N over a list of transposing columns Y1,…,Yk. Consider an SQL query that takes X1,…,Xm as a subset of C1,…,Cp1.

The syntax for creating a vertical layout is as follows:

Select X1,…,Xm, sum (N) from T group by X1,…,Xm.

This query produces a vertical layout data set with m+1 columns, where the m columns X1,…,Xm act as the primary key and sum (N) is the only aggregated column. To convert the vertical layout to a horizontal layout, horizontal aggregation functions are used. The essential requirement of horizontal aggregations is to transpose the aggregated column N by a list of columns Y1,…,Yk, where Y1,…,Yk are a subset of the columns X1,…,Xm and k<m.

To generate SQL code for a horizontal aggregation there are therefore four input parameters, T, X1,…,Xm, N and Y1,…,Yk, where T is the input table, X1,…,Xm are the grouping columns, N is the aggregated column and Y1,…,Yk are the transposing columns. The frame of reference for horizontal aggregation is similar to that for vertical aggregation. The horizontal aggregation function is denoted by Ha(N BY Y1,…,Yk), where Ha is a standard SQL aggregation function, N is the aggregation column and Y1,…,Yk are the transposing columns. The standard SQL (vertical) aggregation function is extended with a "BY" clause, which transposes the aggregation column N over the list of transposing columns Y1,…,Yk and so produces a horizontal layout instead of a vertical layout.

The syntax for creating a horizontal layout is as follows:

SELECT X1,…,Xj, Ha(N BY Y1,…,Yk) FROM T GROUP BY X1,…,Xj.

Consider a concrete example of a stores database that holds store information in the table transaction. The table transaction has strid, deptid, date, month, year, day, rate, qty, totalsales, itemqty and costAmt as columns. Suppose we want to find the total sales for each store by each day of the week. The normal SQL statement for this request is:

Select strid, day, sum (totalsales) from transaction group by strid, day order by strid, day.

This gives a vertical layout like the one below.

Fig 3.1.1 Vertical layout created by using vertical aggregations

This vertical layout is not useful for data mining tasks, as it has only one aggregated column and both strid and day act as the primary key, returning many records. So by using horizontal aggregations a horizontal layout is created that has many aggregated columns and only strid as primary key. The SQL syntax with horizontal aggregations is as follows:

Select strid, sum (totalsales BY day) from transaction group by strid.

Fig 3.1.2 Horizontal layout created by using Horizontal aggregations

3.2 Creation and Clustering Horizontal Layout

This paper covers the creation of horizontal layouts with the CASE and SPJ methods and clusters the resulting horizontal layout by using the Fuzzy C-Means algorithm. An example with results is also given for understanding. Horizontal layouts can be created by using the CASE, SPJ and PIVOT methods. The PIVOT and CASE methods give the same result with almost the same time complexity, and the CASE method has better time complexity than the SPJ method. So we use only the CASE and SPJ methods in our process; both give the same result with different time complexities. Creation and clustering of horizontal layouts is done in three modules. This is the proposed system architecture:

Fig 3.2 System Architecture

3.2.1 Module 1(Selection Process)

We need to select the table from the database and select the columns that we want to group by, aggregate and transpose in order to create the horizontal layout.

Select the group by columns X1,…,Xj
Select the aggregate column N
Select the transposing columns Y1,…,Yk.

3.2.2 Module 2(Creation of Horizontal Layout)

In this module horizontal layouts are created by using the SPJ and CASE methods.

3.2.2.2 SPJ Method:

In this method we aggregate the column horizontally with the help of the SPJ (Select, Project, Join) method. The basic idea is to create one table with a vertical aggregation for each result column, and then join all those tables to produce TH. We aggregate from T into d projected tables with d Select-Project-Join-Aggregation queries (selection, projection, join, aggregation). Each table TI corresponds to one subgrouping combination and has {X1,…,Xj} as primary key and an aggregation on N as the only non-key column. It is necessary to introduce an additional table T0 that will be outer joined with the projected tables to get a complete result set.

Three main steps in the SPJ method to create a horizontal layout:

(.) First, table T0 is created, holding the distinct combinations of the group by columns X1,…,Xj.
(.) For each unique combination of the transposing columns Y1,…,Yk, tables T1,…,Td are created.
(.) Lastly, table T0 is left outer joined with each of the tables T1 to Td.

How these tables are created is explained below.

Table T0 defines the number of result rows and builds the primary key. T0 is populated so that it contains every existing combination of X1,…,Xj. Table T0 has X1,…,Xj as primary key and does not have any non-key column.

INSERT INTO T0 SELECT DISTINCT X1, . . . , Xj FROM T.

We should then create tables T1 to Td. Tables T1,…,Td contain the individual aggregations for each combination of Y1,…,Yk. The primary key of tables T1,…,Td is X1,…,Xj and N is the aggregated column.

INSERT INTO TI SELECT X1,…,Xj, V(N) FROM T/TV WHERE Y1 = v1I AND … AND Yk = vkI GROUP BY X1,…,Xj.

Each table TI then aggregates only those rows that correspond to the Ith unique combination of Y1,…,Yk, given by the WHERE clause. A possible optimization is synchronizing table scans to compute the d tables in one pass.

Finally, to get TH we need d left outer joins between T0 and the d tables, so that all individual aggregations are properly assembled as a set of d dimensions for each group. Outer joins set result columns to null for missing combinations for the given group. In general, nulls should be the default value for groups with missing combinations. We believe it would be incorrect to set the result to zero or some other number by default if there are no qualifying rows. Such an approach should be considered on a per-case basis.

INSERT INTO TH SELECT T0.X1, T0.X2, . . . , T0.Xj, T1.N, T2.N, . . . , Td.N FROM T0 LEFT OUTER JOIN T1 ON T0.X1 = T1.X1 and . . . and T0.Xj = T1.Xj LEFT OUTER JOIN T2 ON T0.X1 = T2.X1 and . . . and T0.Xj = T2.Xj … LEFT OUTER JOIN Td ON T0.X1 = Td.X1 and . . . and T0.Xj = Td.Xj.

Real Time Example for SPJ method:

Consider a database having stores information, in which Transaction is a table having StoreId, DepId, Date, Month, Year, Day, ItemId, Rate, Qty and Amt as columns. Suppose we want to find the total sales amount for each store by each day of the week. The following queries should be computed to construct the horizontal layout by using the SPJ method.

Query1: INSERT INTO F0 SELECT DISTINCT storeid FROM Transaction;
Query2: INSERT INTO F1 SELECT storeid, sum (amt) AS totalsalesamt FROM Transaction WHERE Day='Mon' GROUP BY storeid;
Query3: INSERT INTO F2 SELECT storeid, sum (amt) AS totalsalesamt FROM Transaction WHERE Day='Tue' GROUP BY storeid;
Query4: INSERT INTO F3 SELECT storeid, sum (amt) AS totalsalesamt FROM Transaction WHERE Day='Wed' GROUP BY storeid;
Query5: INSERT INTO F4 SELECT storeid, sum (amt) AS totalsalesamt FROM Transaction WHERE Day='Thu' GROUP BY storeid;
Query6: INSERT INTO F5 SELECT storeid, sum (amt) AS totalsalesamt FROM Transaction WHERE Day='Fri' GROUP BY storeid;
Query7: INSERT INTO F6 SELECT storeid, sum (amt) AS totalsalesamt FROM Transaction WHERE Day='Sat' GROUP BY storeid;
Query8: INSERT INTO F7 SELECT storeid, sum (amt) AS totalsalesamt FROM Transaction WHERE Day='Sun' GROUP BY storeid;
Query9: INSERT INTO FH SELECT F0.storeid, F1.totalsalesamt AS Mon_amt, F2.totalsalesamt AS Tue_amt, F3.totalsalesamt AS Wed_amt, F4.totalsalesamt AS Thu_amt, F5.totalsalesamt AS Fri_amt, F6.totalsalesamt AS Sat_amt, F7.totalsalesamt AS Sun_amt FROM F0 LEFT OUTER JOIN F1 ON F0.storeid=F1.storeid LEFT OUTER JOIN F2 ON F0.storeid=F2.storeid LEFT OUTER JOIN F3 ON F0.storeid=F3.storeid LEFT OUTER JOIN F4 ON F0.storeid=F4.storeid LEFT OUTER JOIN F5 ON F0.storeid=F5.storeid LEFT OUTER JOIN F6 ON F0.storeid=F6.storeid LEFT OUTER JOIN F7 ON F0.storeid=F7.storeid;

By evaluating the above queries we get the horizontal layout that we want, but it takes a lot of effort, as more subqueries must be written and more join operations must be performed. For the same request, just one query is enough to create the vertical layout, i.e. select storeid, day, sum (amt) from Transaction group by storeid, day. To create the horizontal layout, however, we are writing 9 queries, so to reduce the effort and time complexity the CASE method can be used to create the horizontal layout easily with less effort.

3.2.2.3 CASE Method:

In this module we aggregate the column horizontally through the CASE method. The CASE statement returns a value selected from a set of values based on Boolean expressions. From a relational database theory point of view this is equivalent to doing a simple projection/aggregation query where each non-key value is given by a function that returns a number based on some conjunction of conditions. In a similar manner to SPJ, the method aggregates directly from T.

Horizontal aggregation queries can be evaluated by directly aggregating from T and transposing rows at the same time to produce TH. First, we need to get the unique combinations of Y1,…,Yk that define the matching Boolean expression for the result columns. In the SQL code that computes horizontal aggregations directly from T, V() is a standard (vertical) SQL aggregation that has a "CASE" statement as argument. Horizontal aggregations need to set the result to null when there are no qualifying rows for the specific horizontal group, to be consistent with the SPJ method and also with the extended relational model.

The SQL syntax for the CASE method is given below; in the syntax T is the original table and TH is the horizontal layout:

SELECT DISTINCT Y1,…,Yk FROM T.

INSERT INTO TH SELECT X1, . . . , Xj, V(CASE WHEN Y1 = v11 and . . . and Yk = vk1 THEN N ELSE null END), . . . , V(CASE WHEN Y1 = v1d and . . . and Yk = vkd THEN N ELSE null END) FROM T GROUP BY X1, X2, . . . , Xj.
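The two evaluation methods can be compared with a minimal sketch, assuming Python's standard-library sqlite3 module. The table name Trans is a stand-in for Transaction (chosen to avoid quoting the SQL keyword TRANSACTION in SQLite), its rows are invented for illustration, and only three days are transposed to keep the example short. The function names spj_layout and case_layout are illustrative. Both functions build their SQL text from the list of transposing values, in the spirit of the automatic code generation described above.

```python
import sqlite3

DAYS = ["Mon", "Tue", "Wed"]  # transposing values (d = 3), shortened for illustration
ROWS = [  # hypothetical (storeid, day, amt) rows
    (1, "Mon", 100.0), (1, "Mon", 50.0), (1, "Tue", 30.0),
    (2, "Mon", 20.0), (2, "Wed", 70.0),
    (3, "Tue", 10.0),
]

def make_db():
    """Create an in-memory database holding the hypothetical Trans table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Trans (storeid INTEGER, day TEXT, amt REAL)")
    con.executemany("INSERT INTO Trans VALUES (?, ?, ?)", ROWS)
    return con

def spj_layout(con):
    """SPJ method: F0 plus one aggregated table per day, then d left outer joins."""
    con.execute("CREATE TABLE F0 (storeid INTEGER PRIMARY KEY)")
    con.execute("INSERT INTO F0 SELECT DISTINCT storeid FROM Trans")
    columns, joins = ["F0.storeid"], []
    for i, day in enumerate(DAYS, start=1):
        con.execute(f"CREATE TABLE F{i} (storeid INTEGER PRIMARY KEY, total REAL)")
        con.execute(
            f"INSERT INTO F{i} SELECT storeid, SUM(amt) FROM Trans "
            "WHERE day = ? GROUP BY storeid",
            (day,),
        )
        columns.append(f"F{i}.total")
        joins.append(f"LEFT OUTER JOIN F{i} ON F0.storeid = F{i}.storeid")
    sql = f"SELECT {', '.join(columns)} FROM F0 {' '.join(joins)} ORDER BY F0.storeid"
    return con.execute(sql).fetchall()

def case_layout(con):
    """CASE method: a single query with one SUM(CASE ...) column per day."""
    # DAYS holds trusted constants, so they are written into the SQL text directly.
    cols = ", ".join(
        f"SUM(CASE WHEN day = '{d}' THEN amt ELSE NULL END)" for d in DAYS
    )
    sql = f"SELECT storeid, {cols} FROM Trans GROUP BY storeid ORDER BY storeid"
    return con.execute(sql).fetchall()

con = make_db()
th_spj = spj_layout(con)
th_case = case_layout(con)
con.close()
for row in th_case:
    print(row)
```

In this sketch the SPJ route issues 2(d + 1) + 1 statements (table creation, population and the final join), while the CASE route issues one. Both leave a missing store/day combination as NULL (None in Python) rather than zero, matching the null-handling rule stated for the two methods.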
3.2.3.1 FUZZY C-MEANS ALGORITHM: As experienced in real life situations, the clustering of datasets by hard c-means leads to a partition of the dataset. But, this is unwanted in many cases and so the applicability of hard c-means has been limited. However, the concept of fuzzy sets, so that an element Example: Suppose in a store database if we want find out total items sold in each department of each store by can belong to any number of clusters with different membership values. each day of week. The following query is evaluated to create horizontal layout by using CASE method. select The objective function is StoreId, DepId, sum( CASE when Day='Fri' then Qty n c J m (U , v)   ( ik ) m ' (dik ) 2 else null end),sum( CASE when Day='Mon' then Qty k 1 i 1 else null end),sum( CASE when Day='Sat' then Qty 1  m'   and is else null end),sum( CASE when Day='Thr' then Qty m’ being a real number such that else null end),sum( CASE when Day='Tue' then Qty called the fuzzifier. ik  [0, 1] is the membership of else null end),sum( CASE when Day='Wed' then Qty else null end) from Trans1 Group By StoreId, DepId. the kth pattern to vi . Algorithm: 3.2.3 Module 3(Clustering) STEP 1: Fix c ( 2  c  n ) and select a value m’ The main objective in this paper is to create a data set Initialize the partition matrix easily so that it can be useful directly in data mining For r = 0, 1, 2,…. Do tasks or projects avoiding data preprocessing phase. The horizontal layout can be useful for any data STEP 2: Calculate the ‘c’ centers mining algorithm so we are using it directly for clustering. The previously created horizontal layout is taken as input for clustering. 
$v_i^{(r)}$, $i = 1, 2, \ldots, c$, using the formula

$$v_{ij} = \frac{\sum_{k=1}^{n} \mu_{ik}^{m'} \, x_{kj}}{\sum_{k=1}^{n} \mu_{ik}^{m'}}$$

STEP 3: Update the partition matrix for the r-th step, $U^{(r)}$, to $U^{(r+1)} = (\mu_{ik}^{(r+1)})$, where

$$\mu_{ik}^{(r+1)} = \begin{cases} \left[ \sum_{j=1}^{c} \left( \dfrac{d_{ik}^{(r)}}{d_{jk}^{(r)}} \right)^{2/(m'-1)} \right]^{-1}, & \text{if } I_k = \emptyset, \\ 0, & \text{otherwise, for } i \in I'_k, \end{cases}$$

with $I_k = \{ i \mid 2 \le c < n;\ d_{ik}^{(r)} = 0 \}$ and $I'_k = \{1, 2, \ldots, c\} - I_k$.

STEP 4: If $\| U^{(r+1)} - U^{(r)} \| \le \varepsilon_L$, STOP; else go to STEP 2.

Here c denotes the number of clusters, v the cluster centers, x a data point, d the distance between a cluster centre and a data point, and U the partition matrix, in which each element is the membership value of a data point in a cluster.

4. Results:

A real example is used to show the construction of a horizontal layout with both the SPJ method and the CASE method. Once the horizontal layout has been created, it is taken as the input data set for clustering, and clustering is done using the fuzzy C-means algorithm.

Example:

Consider a database holding stores information. Transaction is a table in this database with the columns StoreId, DepId, Date, Month, Year, Day, ItemId, Rate, Qty and Amt. Suppose we want to find the total sales amount for each storeid on each day of the week.

SPJ Method:

The following queries must be computed to construct the horizontal layout using the SPJ method.

Query1: INSERT INTO F0 SELECT DISTINCT storeid FROM Transaction;
Query2: INSERT INTO F1 SELECT storeid, sum(amt) AS totalsalesamt FROM Transaction WHERE Day='Mon' GROUP BY storeid;
Query3: INSERT INTO F2 SELECT storeid, sum(amt) AS totalsalesamt FROM Transaction WHERE Day='Tue' GROUP BY storeid;
Query4: INSERT INTO F3 SELECT storeid, sum(amt) AS totalsalesamt FROM Transaction WHERE Day='Wed' GROUP BY storeid;
Query5: INSERT INTO F4 SELECT storeid, sum(amt) AS totalsalesamt FROM Transaction WHERE Day='Thu' GROUP BY storeid;
Query6: INSERT INTO F5 SELECT storeid, sum(amt) AS totalsalesamt FROM Transaction WHERE Day='Fri' GROUP BY storeid;
Query7: INSERT INTO F6 SELECT storeid, sum(amt) AS totalsalesamt FROM Transaction WHERE Day='Sat' GROUP BY storeid;
Query8: INSERT INTO F7 SELECT storeid, sum(amt) AS totalsalesamt FROM Transaction WHERE Day='Sun' GROUP BY storeid;
Query9: INSERT INTO FH SELECT F0.storeid, F1.totalsalesamt AS Mon_amt, F2.totalsalesamt AS Tue_amt, F3.totalsalesamt AS Wed_amt, F4.totalsalesamt AS Thu_amt, F5.totalsalesamt AS Fri_amt, F6.totalsalesamt AS Sat_amt, F7.totalsalesamt AS Sun_amt FROM F0 LEFT OUTER JOIN F1 ON F0.storeid=F1.storeid LEFT OUTER JOIN F2 ON F0.storeid=F2.storeid LEFT OUTER JOIN F3 ON F0.storeid=F3.storeid LEFT OUTER JOIN F4 ON F0.storeid=F4.storeid LEFT OUTER JOIN F5 ON F0.storeid=F5.storeid LEFT OUTER JOIN F6 ON F0.storeid=F6.storeid LEFT OUTER JOIN F7 ON F0.storeid=F7.storeid;

Evaluating the above queries gives the desired horizontal layout, but it takes a lot of effort: many subqueries must be written and many join operations must be performed. For comparison, the vertical layout of the same data needs just one query, i.e. select storeid, day, sum(amt) from Transaction group by storeid, day. To create the horizontal layout we are writing 9 queries, so the CASE method can be used instead to create the horizontal layout with less effort.

CASE Method:

Suppose we want to find the total number of items sold in each department of each store on each day of the week. The following query is evaluated to create the horizontal layout using the CASE method.

select StoreId, DepId, sum(CASE when Day='Fri' then Qty else null end), sum(CASE when Day='Mon' then Qty else null end), sum(CASE when Day='Sat' then Qty else null end), sum(CASE when Day='Thr' then Qty else null end), sum(CASE when Day='Tue' then Qty else null end), sum(CASE when Day='Wed' then Qty else null end) from Trans1 Group By StoreId, DepId

The results are as follows:

First we select, from the database containing the stores information, the Transaction table for which we want to create the horizontal layout. The input frame is as follows.

[Screenshot: input frame]

Pressing the select table button selects the Transaction table, and pressing the display button displays the selected table.

[Screenshot: selected Transaction table]

After this, pressing the generate button displays the SQL CODE GENERATION frame.

[Screenshot: SQL CODE GENERATION frame]

In this frame, pressing the view Columns button displays all the columns of the selected table. After selecting the group-by columns, the column to aggregate, the column to transpose, the aggregation function and the method name, clicking the Generate button produces the horizontal layout as output.

[Screenshot: generated horizontal layout]

The above horizontal layout output is taken as input for clustering, and clustering is performed using the fuzzy C-means algorithm. Suppose that, from the stores database, we want to find the stores that have the same total sales amount for each day of the week, or we want to cluster the stores based on total sales for each day of the week. The previously created data set can then be taken directly as input for clustering instead of creating the data set again.

The clustering results are as follows:

In the input frame we select the data set that we want to cluster by using the browse button. Here we select the previously created horizontal layout as the input for clustering.

[Screenshot: clustering input frame]

The storeids in the data are clustered using the Fuzzy C-Means algorithm.

[Screenshot: clustering output]
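The comparison between the SPJ and CASE methods can be tried on a small scale. The following is a minimal sketch, not the authors' implementation: it uses Python's built-in sqlite3 module, a hypothetical table named Trans1 with only the StoreId, Day and Amt columns, and a handful of made-up rows. It builds the horizontal layout once with a single CASE query and once with the F0 to F7 tables joined by left outer joins, so the two results can be compared.

```python
import sqlite3

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def build_demo_db():
    """Create an in-memory table with a few made-up sales rows (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Trans1 (StoreId INTEGER, Day TEXT, Amt REAL)")
    rows = [(1, "Mon", 10.0), (1, "Mon", 5.0), (1, "Tue", 7.0),
            (2, "Mon", 3.0), (2, "Sun", 8.0)]
    conn.executemany("INSERT INTO Trans1 VALUES (?, ?, ?)", rows)
    return conn


def case_layout(conn):
    """CASE method: one query, one SUM(CASE ...) column per day."""
    cols = ", ".join(
        f"SUM(CASE WHEN Day = '{d}' THEN Amt ELSE NULL END) AS {d}_amt"
        for d in DAYS
    )
    sql = f"SELECT StoreId, {cols} FROM Trans1 GROUP BY StoreId ORDER BY StoreId"
    return conn.execute(sql).fetchall()


def spj_layout(conn):
    """SPJ method: one aggregate table per day, then left outer joins on F0."""
    conn.execute("CREATE TEMP TABLE F0 (StoreId INTEGER)")
    conn.execute("INSERT INTO F0 SELECT DISTINCT StoreId FROM Trans1")
    select_cols = ["F0.StoreId"]
    joins = []
    for i, day in enumerate(DAYS, start=1):
        conn.execute(f"CREATE TEMP TABLE F{i} (StoreId INTEGER, total REAL)")
        conn.execute(
            f"INSERT INTO F{i} SELECT StoreId, SUM(Amt) FROM Trans1 "
            "WHERE Day = ? GROUP BY StoreId",
            (day,),
        )
        select_cols.append(f"F{i}.total AS {day}_amt")
        joins.append(f"LEFT OUTER JOIN F{i} ON F0.StoreId = F{i}.StoreId")
    sql = (f"SELECT {', '.join(select_cols)} FROM F0 "
           f"{' '.join(joins)} ORDER BY F0.StoreId")
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for row in case_layout(conn):
        print(row)
    print(case_layout(conn) == spj_layout(conn))
```

In this sketch a store with no sales on a given day gets NULL (None in Python) in that column under both methods: the CASE expression sums only NULLs, and the left outer join finds no matching row. The SPJ version needs nine statements plus seven joins, which is the extra effort the text describes.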
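The fuzzy C-means steps (STEP 1 to STEP 4) used for the clustering stage can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions rather than the code behind the reported results: the function name fuzzy_c_means and its parameters are hypothetical, the distance is Euclidean, the norm in STEP 4 is taken as the largest absolute change in any membership value, and when a data point coincides with one or more centers its membership is shared equally among those centers.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Return (centers, U) where U[i][k] is the membership of point k in cluster i."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # STEP 1: random initial partition matrix U (c x n); each column sums to 1.
    u = [[0.0] * n for _ in range(c)]
    for k in range(n):
        weights = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(weights)
        for i in range(c):
            u[i][k] = weights[i] / total

    centers = []
    for _ in range(max_iter):
        # STEP 2: centers as membership-weighted means, weights mu_ik ** m.
        centers = []
        for i in range(c):
            w = [u[i][k] ** m for k in range(n)]
            denom = sum(w)
            centers.append(
                [sum(w[k] * points[k][j] for k in range(n)) / denom
                 for j in range(dim)]
            )

        # STEP 3: update memberships from distances to the centers.
        new_u = [[0.0] * n for _ in range(c)]
        for k in range(n):
            d = [math.dist(centers[i], points[k]) for i in range(c)]
            zero = [i for i in range(c) if d[i] == 0.0]  # the set I_k
            if zero:
                # Assumption: split membership equally over coinciding centers.
                for i in zero:
                    new_u[i][k] = 1.0 / len(zero)
            else:
                for i in range(c):
                    s = sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                    new_u[i][k] = 1.0 / s

        # STEP 4: stop when the partition matrix changes by at most eps.
        change = max(abs(new_u[i][k] - u[i][k])
                     for i in range(c) for k in range(n))
        u = new_u
        if change <= eps:
            break
    return centers, u


if __name__ == "__main__":
    # Made-up rows standing in for (Mon_amt, Tue_amt) of six stores.
    stores = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1),
              (8.0, 8.0), (8.1, 7.9), (7.9, 8.2)]
    centers, u = fuzzy_c_means(stores, c=2)
    for k in range(len(stores)):
        print(k, [round(u[i][k], 3) for i in range(2)])
```

Each row of a horizontal layout (one store, one column per day) can be passed as a point. Missing values would have to be filled beforehand, since this sketch does not handle them.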
5. Conclusion:

• Preparing a data set for data mining projects takes considerable effort and time, but a horizontal layout data set can be created easily using horizontal aggregation functions.
• It is easier to create a horizontal layout with the CASE method than with the SPJ method, because the SPJ method requires computing many subqueries whereas the CASE method needs only a single query.
• The time complexity of the CASE method, O(N log N + dkn log n + dN), is better than the time complexity of the SPJ method, O(N log(N)) + dkn log n + dN, where N is the size of the input table F, n is the size of the output horizontal layout table, d is the number of distinct combinations of the transposing columns and k is the number of transposing columns.
• The Fuzzy C-Means algorithm can give better clustering results than the K-Means and Hard C-Means algorithms because it handles vagueness in the data.

6. Future Work:

• Other data mining tasks such as classification, regression analysis and decision making can also be implemented by taking the horizontal layout as input.
• The horizontal layout can be clustered using other soft computing clustering algorithms to handle imprecision in the data.
• Missing values in the data are not handled, so rough set concepts can be used to handle missing data.
• To reduce the execution time of the clustering algorithm, it can be parallelized using OpenMP.
