A Multidimensional Modeling Approach for OLAP within the Framework of the Relational Model based on Quotient Relations

O. Mangisengi, A M. Tjoa
Institute of Software Technology
Technical University of Vienna, Austria
{oscar,tjoa}@ifs.tuwien.ac.at

Abstract

The paper introduces the concept of quotient relations to model and query OLAP data. The use of quotient relations inherits the advantages of the original relational model as introduced by Codd. Drill-down and roll-up operations can be performed with the partitioning and de-partitioning operators on quotient relations. The proposed approach fulfils the requirements for a formal data model, namely the existence of an implementation-independent formalism, the separation of structure and content, and the existence of a declarative query language.

1. Introduction

On-line Analytical Processing (OLAP) is emerging as the most important approach in Data Warehousing. OLAP allows data to be modeled in a multidimensional way as a cube, and to be queried and analyzed from many different perspectives. Independent of the implementation, OLAP data are presented to the user in a multidimensional data model [CCS93]. There are several ways to formally define multidimensional models and their query languages. However, no commonly accepted formal multidimensional data model exists so far. Such a model is necessary as the basis for an accepted, standardized logical data model for OLAP data. It would allow practitioners and researchers to specify their data warehouses in a unified way.

The aim of this paper is to propose an approach for the conceptual modeling of multidimensional data which is entirely based on a relational data model. It seems to the authors that such a model corresponds closely to the original intuition of Codd when he introduced the concept of OLAP in his pioneering white paper [CCS93].
Among the different ways to define a data cube, the star schema approach can be regarded as the most dominant one. A data cube is defined as a collection of at least one fact table and a set of dimension tables. In this paper we introduce a meta model with the capability to describe these tables and the OLAP queries in an elegant manner.

The general requirements for a formal data model that can serve as a foundation for multidimensional database systems are similar to those that made the relational model successful, namely the existence of an implementation-independent formalism, the separation of structure and contents, and the existence of a declarative query language [BSHD98].

Recently a number of approaches for the formal foundation of multidimensional modeling have been proposed in the literature. [BSHD98] describes and compares the most important modeling approaches in this area, namely the approaches [AGS97] of Agrawal, Gupta, and Sarawagi, [CT97a,CT97b] of Cabbibo and Torlone, [LW96] of Li and Wang, and [GL97] of Gyssens and Lakshmanan.

The aim of this paper is to apply the well-known concept of quotient relations, introduced in [FK77], to the OLAP modeling of multidimensional data within the framework of the relational model.

The remainder of this paper is structured as follows. In section 2 we briefly present the quotient relation approach. The partitioning operator is very well suited for the construction of n-dimensional cubes. With the partitioning and de-partitioning operators, drill-downs and roll-ups can be performed easily. The modeling of OLAP data by means of quotient relations is given in section 3. Section 4 presents an illustrative example of the use of quotient relations for multidimensional queries. Section 5 presents our conclusion.

2. Quotient Relations

2.1. Basic Concepts

In this section we briefly introduce the basic concepts of quotient relations as introduced in [FK77].
Let $D_1, D_2, \ldots, D_n$ be (not necessarily distinct) sets. A relation $R$ on $D_1, D_2, \ldots, D_n$ is a subset of the Cartesian product $D_1 \times D_2 \times \cdots \times D_n$, i.e. $R \subseteq D_1 \times D_2 \times \cdots \times D_n$. Corresponding to each $D_j$, $j = 1, 2, \ldots, n$, is an $A_j$ called an attribute of $R$. The attributes of $R$ are distinct. Let $I$ denote the collection of attributes of the n-ary relation (the relation $R$ is often represented as $R(I)$).

Let $R$ be an n-ary relation and $A$ a subset of $I$. The relation $\sim_A \subseteq R \times R$ is said to be an equivalence on $R$ if for tuples $t, t' \in R$, $t \sim_A t'$ iff $t[A] = t'[A]$. Thus if tuples $t$ and $t'$ have the same attribute values with respect to the attributes of $A$, they are equivalent under $\sim_A$. $\sim_A$ is reflexive, symmetric, and transitive, hence an equivalence relation. As such, we say $R$ is partitioned by $\sim_A$ into disjoint blocks of equivalent tuples, each of which is denoted by

$[t] = \{ t' \mid t' \in R \wedge t \sim_A t' \}$

The collection of all blocks, written as

$R/A = \{ [t] \mid t \in R \}$

is the quotient relation of $R$ with respect to $A$.

Example: For the given relation R(A,B,C,D) in Table 2.1 the relation R/A is presented in Table 2.2.

Table 2.1 Relation R(A,B,C,D)

A  B  C  D
1  2  X  X
1  2  X  Y
1  3  Z  W
2  1  X  Y
2  2  X  Z
3  3  X  Y

Table 2.2 Relation R/A (blocks separated by a rule)

A  B  C  D
1  2  X  X
1  2  X  Y
1  3  Z  W
----------
2  1  X  Y
2  2  X  Z
----------
3  3  X  Y

Let $R(I)$ be an n-ary relation. For different choices of $A$, different quotient relations are obtained. Together they constitute the family of quotient relations $Q_R$:

$Q_R = \{ R/A \mid A \subseteq I \}$

The partitioning attribute set of every $\bar{R} \in Q_R$ (i.e. $A \subseteq I$) is denoted as $\mathrm{quot}(\bar{R})$. It can easily be shown that for a given n-ary relation $R$, the family $Q_R$ is a finite lattice with universal bounds over $R$: the quotient relation $R/\emptyset$ consisting of a single block (i.e. $R/\emptyset = \{R\}$), and the quotient relation $R/I$ in which each tuple is itself a block. The partial order in the lattice is given by the degree of refinement. Then for any $\bar{R} \in Q_R$ we have $R/I \preceq \bar{R} \preceq R/\emptyset$, reading $\preceq$ as "is at least as fine as".

Let $R(I)$ be an n-ary relation with $A$ and $B$ denoting any two sets of attributes of $R$, and $\bar{R}$ a member of $Q_R$. The partitioning operator (/) induces a partition on a quotient relation, while the de-partitioning operator (*) eliminates a particular partition.

2.2. The Quotient Algebra

Let $R$ and $S$ be n-ary and m-ary relations, respectively. Consider the quotient relations $\bar{R} \in Q_R$ and $\bar{S} \in Q_S$ partitioned by the sets $A$ and $B$ respectively (i.e. $\mathrm{quot}(\bar{R}) = A$ and $\mathrm{quot}(\bar{S}) = B$). Further let $C$ and $D$ be other attribute sets of $R$.

Projection
The projection $T := \bar{R}[C]$ is defined only if $\mathrm{quot}(\bar{R}) \subseteq C$, in which case $T$ is partitioned by $A$. Blocks are treated as units by the operators; in the case of projection each block is projected on $C$ and possible duplicate tuples are eliminated.

Restriction
The restriction $T := \bar{R}[C \,\theta\, D]$ is defined so that a block $[t]$ of $\bar{R}$ participates in $T$ only if $t[C] \,\theta\, t[D]$ (where $\theta \in \{=, \neq, <, \leq, >, \geq\}$), provided that the underlying domains of $C$ and $D$ are compatible. The resultant quotient relation $T$ is still partitioned by $A$.

Cartesian Product
The Cartesian product $T := \bar{R} \times \bar{S}$ is defined such that for $i = 1, 2, \ldots, k$ and $j = 1, 2, \ldots, p$, a block $[t_{ij}]$ of $T$ is defined as $[t_{ij}] = [r_i] \times [s_j]$, where $\times$ on the right-hand side denotes the Cartesian product of the conventional algebra; $[r_1], [r_2], \ldots, [r_k]$ denote the blocks of $\bar{R}$; and $[s_1], [s_2], \ldots, [s_p]$ denote the blocks of $\bar{S}$. Note that $T$ is partitioned by the union $A \cup B$.

Set operators like union, set-difference, and intersection are also defined in the quotient algebra. It can easily be shown that the quotient algebra is relationally complete. A detailed description of the quotient algebra approach will be given in [KT98a].

3. The modeling of OLAP-data by means of quotient relation

3.1. Quotient Algebra in a Multidimensional Model

Based on the theory of the quotient algebra, a relation can be partitioned by attributes according to their attribute values. The relation partitioned by these attributes can be viewed as a multidimensional representation of this relation. Furthermore, we will show how the operations of the quotient algebra can be used for handling multidimensional cubes.

As a first example, we introduce a relation R(Year,Item,City,Sales). Let relation R contain the following data:

Table 3.1 Relation R(Year,Item,City,Sales)

Year (Y)  Item (I)  City (Ci)       Sales (Sa)
1994      1         New York City   40
1994      2         New York City   30
1994      1         Los Angeles     30
1994      2         Los Angeles     20
1994      1         San Francisco   10
1994      2         San Francisco   20
1995      3         New York City   20
1995      4         New York City   60
1995      2         Los Angeles     10
1995      4         Los Angeles     50
1995      2         San Francisco   30
1995      4         San Francisco   40

Example 3.1: Partitioning of relation R by Year:

$X_1 := R / \mathit{Year}$

This operation denotes that the quotient relation R is partitioned by the attribute Year. The result of this partitioning is shown in Table 3.2, where the partitioning attribute Year is placed in the first column and the blocks are separated by a rule.

Table 3.2 Quotient relation X1 derived by partitioning of Year

Year (Y)  City (Ci)             Item (I)  Sales (Sa)
1994      New York City (NYC)   1         40
1994      New York City         2         30
1994      Los Angeles (LA)      1         30
1994      Los Angeles           2         20
1994      San Francisco (SF)    1         10
1994      San Francisco         2         20
-------------------------------------------------
1995      New York City         3         20
1995      New York City         1         60
1995      Los Angeles           2         10
1995      Los Angeles           4         50
1995      San Francisco         2         30
1995      San Francisco         4         40

For simplicity we will represent a quotient relation without explicitly repeating the values of the partitioning attributes in its tuples. The next table shows relation X1 in this manner.

Table 3.3 Simplified representation of Table 3.2

City (Ci)  Item (I)  Sales (Sa)  Year (Y)
NYC        1         40          1994
NYC        2         30
LA         1         30
LA         2         20
SF         1         10
SF         2         20
-----------------------------------------
NYC        3         20          1995
NYC        1         60
LA         2         10
LA         4         50
SF         2         30
SF         4         40

Example 3.2: Partitioning of relation X1 of the previous example by City:

$X_2 := X_1 / \mathit{City}$

Table 3.4 Quotient relation X2 derived by partitioning of Year and City

City (Ci)  Item (I)  Sales (Sa)  Year (Y)
NYC        1         40          1994
           2         30
LA         1         30
           2         20
SF         1         10
           2         20
-----------------------------------------
NYC        3         20          1995
           1         60
LA         2         10
           4         50
SF         2         30
           4         40

3.2. Fact & Dimension

Multidimensional models are mostly designed as a fact relation together with a number of dimension relations.
Therefore, we introduce the following definition of a join operator in the quotient algebra between a fact relation and its dimension relations.

Let $A$ be the attribute set $\{A_1, A_2, \ldots, A_n, X_1, X_2, \ldots, X_k\}$ and $B$ be the attribute set $\{B_1, B_2, \ldots, B_n, X'_1, X'_2, \ldots, X'_k\}$, where $X_1, X_2, \ldots, X_k$ and $X'_1, X'_2, \ldots, X'_k$ are the attributes that $A$ and $B$ have in common. The natural join between two quotient relations Fact(A) and Dim(B), denoted $\mathit{Fact}(A) \bowtie \mathit{Dim}(B)$, is defined as follows:

$\mathit{Fact}(A) \bowtie \mathit{Dim}(B) := \big( (\mathit{Fact}(A) \times \mathit{Dim}(B))[X = X'] \big)[A \cup B] \,/\, (\mathrm{quot}(\mathit{Fact}) \cup \mathrm{quot}(\mathit{Dim}))$

That is, the Cartesian product of the quotient algebra is restricted to the tuples that agree on the common attributes, projected on $A \cup B$, and partitioned by the union of the two partitioning sets.

Example 3.3: Let Fact(A) with A={Year,Item,City,Sales} be the flat fact relation given in Table 3.1. Let Dim(B) with B={City,State} be the following dimension quotient relation (partitioned by City) given in Table 3.5.

Table 3.5 Quotient relation Dim(B) with B={City,State}, partitioned by City

State (ST)  City (Ci)
CA          SF
CA          LA
TX          Dallas

The join of Fact(A) with Dim(B), denoted $\mathit{Fact}(A) \bowtie \mathit{Dim}(B)$, has as its result the quotient relation of Table 3.6. Since Fact(A) is flat and Dim(B) is partitioned by City, the result is partitioned by City.

Table 3.6 Result of Fact(A) ⋈ Dim(B)

Year (Y)  Item (I)  Sales (Sa)  State (ST)  City (Ci)
1994      1         10          CA          SF
1994      2         20          CA
1995      2         30          CA
1995      4         40          CA
-----------------------------------------------------
1994      1         30          CA          LA
1994      2         20          CA
1995      2         10          CA
1995      4         50          CA

Example 3.4: Let Fact2(A) be a fact quotient relation partitioned by the attribute Year (i.e. quot(Fact2(A))={Year}) as given in Table 3.3. Let Dim(B) be a dimension quotient relation with B={City,State} which is already partitioned by the attribute State (i.e. quot(Dim(B))={State}), given in Table 3.7.

Table 3.7 Relation Dim(B) with B={City,State}, partitioned by State

City (Ci)  State (ST)
SF         CA
LA
-----------------------
Dallas     TX
-----------------------
NYC        NY

The join $\mathit{Fact2}(A) \bowtie \mathit{Dim}(B)$ has the following result (Table 3.8).

Table 3.8 Result of Fact2(A) ⋈ Dim(B), partitioned by Year and State

City (Ci)  Item (I)  Sales (Sa)  Year (Y)  State (ST)
LA         1         30          1994      CA
LA         2         20
LA         2         10          1995
LA         4         50
SF         1         10          1994
SF         2         20
SF         2         30          1995
SF         4         40
-----------------------------------------------------
NYC        1         40          1994      NY
NYC        2         30
NYC        3         20          1995
NYC        1         60

3.3. Rolling-up & Drilling-down

In this section, we apply the partitioning and de-partitioning operators to the drilling-down and rolling-up operations of the multidimensional model. The quotient algebra supports the partitioning operator (/), which induces a partition on a quotient relation, and the de-partitioning operator (*), which eliminates a particular partition. As stated above, partitioning is written as

$X := R / A / B$

The operation X denotes that a quotient relation R is partitioned by the attributes A and B. To eliminate the partitioning by B, the de-partitioning operator * can be used as follows:

$Y := X * B = (R / A / B) * B = R / A$

In the multidimensional model, the two operations rolling-up and drilling-down can be expressed with the partitioning and de-partitioning operators.

3.3.1. Rolling-Up

Rolling-up is an operation used to aggregate data to a higher level in the dimension hierarchy. In the quotient algebra, the partitioning operator can be used to denote rolling-up operations. Let R(A) be a quotient relation with quot(R(A)) as its partitioning attributes. The rolling-up operation on the quotient relation R(A) by aggregating the attribute X (with $X \notin \mathrm{quot}(R(A))$) is denoted by:

$R\_UP(R(A), X) := R(A) / X$ (i.e. $R / (\mathrm{quot}(R(A)) \cup \{X\})$), with $\mathrm{quot}(R\_UP(R(A), X)) := \mathrm{quot}(R(A)) \cup \{X\}$.

To illustrate the rolling-up operation in the quotient algebra, we introduce the quotient relation Q(ST,Ci,STO,Sa) with the aggregation hierarchy Store → City → State. We use a sample quotient relation Store1, obtained by partitioning Q by STO, given in Table 3.9.

Table 3.9 Relation Store1 := Q / STO

State (ST)  City (Ci)  Store (STO)  Sales (Sa)
CA          SF         4            100
CA          SF         5            300
CA          SF         6            500
CA          LA         2            200
CA          LA         3            400
CA          LA         7            600
NY          NY         1            500

Let us assume, as above, that the lowest level of aggregation in the quotient relation Store1 is the Store attribute.
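The partitioning and de-partitioning operators can be sketched in a few lines of Python. This is only an illustration under assumptions of ours: the class `QuotientRelation`, its methods `partition`, `departition`, and `blocks`, and the choice to store a quotient relation as flat tuples plus a partitioning set are hypothetical and are not taken from [FK77]. The data are those of Table 3.9.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class QuotientRelation:
    """Hypothetical representation: flat tuples plus the partitioning set quot."""
    rows: list                                   # one dict per tuple
    quot: frozenset = field(default_factory=frozenset)

    def partition(self, *attrs):
        """The '/' operator: add attrs to the partitioning set."""
        return QuotientRelation(self.rows, self.quot | frozenset(attrs))

    def departition(self, *attrs):
        """The '*' operator: remove attrs from the partitioning set."""
        return QuotientRelation(self.rows, self.quot - frozenset(attrs))

    def blocks(self):
        """Group the tuples that agree on every attribute in quot."""
        key_attrs = sorted(self.quot)
        grouped = defaultdict(list)
        for row in self.rows:
            grouped[tuple(row[a] for a in key_attrs)].append(row)
        return dict(grouped)


# Relation Q(ST, Ci, STO, Sa) with the data of Table 3.9.
Q = QuotientRelation([
    {"ST": "CA", "Ci": "SF", "STO": 4, "Sa": 100},
    {"ST": "CA", "Ci": "SF", "STO": 5, "Sa": 300},
    {"ST": "CA", "Ci": "SF", "STO": 6, "Sa": 500},
    {"ST": "CA", "Ci": "LA", "STO": 2, "Sa": 200},
    {"ST": "CA", "Ci": "LA", "STO": 3, "Sa": 400},
    {"ST": "CA", "Ci": "LA", "STO": 7, "Sa": 600},
    {"ST": "NY", "Ci": "NY", "STO": 1, "Sa": 500},
])

store1 = Q.partition("STO")       # Store1 := Q / STO
x = Q.partition("ST", "Ci")       # X := Q / ST / Ci
y = x.departition("Ci")           # Y := X * Ci = Q / ST

print(len(store1.blocks()))       # 7 blocks, one per store
print(sorted(y.blocks()))         # [('CA',), ('NY',)]
```

With an empty partitioning set, `blocks()` returns a single block holding all tuples, which corresponds to the coarsest quotient relation; partitioning by every attribute would put each tuple in its own block.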
To roll up the sales data from Store to City, the quotient algebra operation is given as follows:

$\mathit{City1} := \mathit{Store1} / \mathit{City}$

The result of rolling up the sales data from Store to City is given in Table 3.10.

Remark: It can be useful to introduce derived attributes in quotient relations (e.g. sum, average, maximum, minimum) which are functionally dependent on the partitioning attribute. Table 3.10 shows the introduction of sums for the partitioning attribute City:

$\mathit{City1} := \mathit{City1}[\mathit{State}, \mathit{City}, \mathit{Store}, \mathit{Sales}, \mathit{Sum}(\mathit{Sales}, \mathit{City})]$

Table 3.10 Rolling-up Store to City and the introduction of the derived attribute Sum(Sales,City)

State (ST)  Store (STO)  Sales (Sa)  Sum(Sales,City)  City (Ci)
CA          4            100         900              SF
            5            300
            6            500
----------------------------------------------------------------
CA          2            200         1200             LA
            3            400
            7            600
----------------------------------------------------------------
NY          1            500         500              NY

If a higher summarization of the data is required, the quotient relation has to be partitioned by the corresponding higher-level attribute. In our case, rolling up to State requires a further partitioning by ST, and the quotient algebra operation can be written as follows:

$\mathit{State1} := \mathit{City1} / \mathit{State}$

The result of this roll-up operation is given in Table 3.11.

Table 3.11 Rolling-up from City to State and the introduction of the derived attribute Sum(Sales,State)

Store (STO)  Sales (Sa)  Sum(Sales,City)  City (Ci)  Sum(Sales,State)  State (ST)
4            100         900              SF         2100              CA
5            300
6            500
2            200         1200             LA
3            400
7            600
----------------------------------------------------------------------------------
1            500         500              NY         500               NY

3.3.2. Drilling-down

Drilling-down in a multidimensional model is used to obtain data of a finer granularity level within the aggregation hierarchy of a given dimension. In the quotient algebra, the de-partitioning operator can be used for this purpose. The de-partitioning operator can be regarded as the inverse of the partitioning operator.

Let R(S) be a quotient relation partitioned by the attribute set quot(R(S)). The de-partitioning operator (*) reduces the degree of partitioning of a quotient relation. R(S) can be de-partitioned by an attribute set $X \subseteq S$:

$R(S) * X := R / (S \setminus X)$ with $\mathrm{quot}(R(S) * X) := \mathrm{quot}(R(S)) \setminus X$.
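The roll-up with derived sums and the de-partitioning step just defined can be traced on the data of Table 3.9. The following is a minimal sketch under our own assumptions: the helper names `roll_up`, `drill_down`, and `derived_sum` are hypothetical, and a quotient relation is reduced here to its partitioning set, since the tuples themselves do not change.

```python
from collections import defaultdict

# Tuples of relation Q(ST, Ci, STO, Sa) from Table 3.9.
ROWS = [
    ("CA", "SF", 4, 100), ("CA", "SF", 5, 300), ("CA", "SF", 6, 500),
    ("CA", "LA", 2, 200), ("CA", "LA", 3, 400), ("CA", "LA", 7, 600),
    ("NY", "NY", 1, 500),
]
COLUMN = {"ST": 0, "Ci": 1, "STO": 2, "Sa": 3}


def roll_up(quot, attr):
    """Roll-up: partition further by attr, i.e. add attr to quot."""
    return quot | {attr}


def drill_down(quot, attrs):
    """De-partitioning: remove attrs from quot."""
    return quot - set(attrs)


def derived_sum(rows, attr, measure="Sa"):
    """Sum(measure, attr): one total per value of the partitioning attribute."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[COLUMN[attr]]] += row[COLUMN[measure]]
    return dict(totals)


store1 = frozenset({"STO"})           # Store1 := Q / STO
city1 = roll_up(store1, "Ci")         # City1  := Store1 / City
state1 = roll_up(city1, "ST")         # State1 := City1 / State

print(derived_sum(ROWS, "Ci"))        # {'SF': 900, 'LA': 1200, 'NY': 500}
print(derived_sum(ROWS, "ST"))        # {'CA': 2100, 'NY': 500}
print(drill_down(state1, {"ST"}) == city1)   # True: back at the City level
```

The totals agree with the derived attributes shown in Tables 3.10 and 3.11, and removing ST from the partitioning set of State1 gives back the partitioning set of City1.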
It is obvious that $R(S) * S = R/\emptyset = \{R\}$ and $(R(S) / X) * X = R(S)$.

The drilling-down operation on a quotient relation $R/A$ into a relation where the details of $B \subseteq A$ are of interest can be performed by the following operation:

$D\_DOWN := (R/A) * B$, with $B \subseteq A$,

where $R/A$ is a quotient relation, $A$ denotes the attribute set used for the original partition, and $B$ is the partitioning attribute to be eliminated. In this operation, data is disaggregated from a higher level to a lower level.

We obtain a drill-down of the sales data from State to City in the following way:

$\mathit{City1} := \mathit{State1} * \mathit{ST}$ (i.e. $\mathit{City1} := (\mathit{City1} / \mathit{ST}) * \mathit{ST}$).

In this example, the partitioning by the attribute ST is eliminated, and the table obtained by drilling down Table 3.11 is Table 3.10 again.

If we want a more detailed view of the data, e.g. at store level, the required drill-down from City to Store can be realized by:

$\mathit{Store1} := \mathit{City1} * \mathit{Ci} = (\mathit{Store1} / \mathit{Ci}) * \mathit{Ci}$

or

$\mathit{Store1} := (((Q / \mathit{STO}) / \mathit{Ci}) / \mathit{ST}) * \mathit{ST} * \mathit{Ci} = Q / \mathit{STO}$

The result of this drilling-down operation is again Table 3.9.

4. An extended example for the use of the quotient algebra in a Data Warehouse

To show the use of the quotient algebra in a Data Warehouse, we use a small data warehouse schema as an example. In this example, the Data Warehouse has a sales fact relation (Sa) and three dimension relations. The dimension relations are Store (S), Product (P), and Day (D).

Table 4.1 Relations Sa(T,Pr,STO,SL), S(STO,Ci,ST), P(Pr,I,Ct), and D(T,M,Y)

Relation Sa(T,Pr,STO,SL)

Time (T)  Product (Pr)  Store (STO)  Sales (SL)
1         3             1            30
2         2             1            20
13        3             4            10
14        2             4            50
397       3             1            40
398       3             1            30
401       3             4            20
402       3             4            60
1         5             5            25
14        3             5            45

Relation S(STO,Ci,ST)

Store (STO)  City (Ci)       State (ST)
1            New York City   NY
4            Los Angeles     CA
5            San Francisco   CA
7            Austin          TX

Relation P(Pr,I,Ct)

Product (Pr)  Item (I)        Category (Ct)
2             Lots of Nuts    Food
3             Sweet tooth     Food
4             Fizzy Light     Soft drinks
5             Fizzy Classic   Soft drinks

Relation D(T,M,Y)

Time (T)  Month (M)  Year (Y)
1         10         1994
2         10         1994
13        11         1994
14        11         1994
397       10         1995
398       10         1995
401       11         1995
402       11         1995
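As a rough illustration of such a schema, the fact relation Sa and the dimension relation S of Table 4.1 can be written as plain Python data and combined by a natural join in the sense of Section 3.2. The helper `natural_join` and the dictionary layout are assumptions made for this sketch only; the partitioning of the join result by State is emulated by grouping the totals per state.

```python
from collections import defaultdict

# Fact relation Sa(T, Pr, STO, SL) of Table 4.1.
SA = [
    {"T": 1, "Pr": 3, "STO": 1, "SL": 30},
    {"T": 2, "Pr": 2, "STO": 1, "SL": 20},
    {"T": 13, "Pr": 3, "STO": 4, "SL": 10},
    {"T": 14, "Pr": 2, "STO": 4, "SL": 50},
    {"T": 397, "Pr": 3, "STO": 1, "SL": 40},
    {"T": 398, "Pr": 3, "STO": 1, "SL": 30},
    {"T": 401, "Pr": 3, "STO": 4, "SL": 20},
    {"T": 402, "Pr": 3, "STO": 4, "SL": 60},
    {"T": 1, "Pr": 5, "STO": 5, "SL": 25},
    {"T": 14, "Pr": 3, "STO": 5, "SL": 45},
]

# Dimension relation S(STO, Ci, ST) of Table 4.1.
S = [
    {"STO": 1, "Ci": "New York City", "ST": "NY"},
    {"STO": 4, "Ci": "Los Angeles", "ST": "CA"},
    {"STO": 5, "Ci": "San Francisco", "ST": "CA"},
    {"STO": 7, "Ci": "Austin", "ST": "TX"},
]


def natural_join(fact, dim):
    """Pair each fact tuple with the dimension tuples agreeing on all common attributes."""
    joined = []
    for f in fact:
        for d in dim:
            common = f.keys() & d.keys()
            if all(f[a] == d[a] for a in common):
                joined.append({**f, **d})
    return joined


# Join the fact relation with the Store dimension and total the sales per State block.
sales_per_state = defaultdict(int)
for row in natural_join(SA, S):
    sales_per_state[row["ST"]] += row["SL"]

print(dict(sales_per_state))   # {'NY': 120, 'CA': 210}
```

Store 7 (Austin, TX) has no tuples in Sa, so no TX block appears in the result.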
A sample illustration of the relations Sa, S, P, and D is given in Table 4.1. To demonstrate the relevance of the quotient algebra in a data warehouse environment, a representative multidimensional query is described by means of quotient algebra operations.

Example: Comparison of the sales of the product 'Sweet tooth' for stores in the state CA for the years 1994 and 1995.

The steps used to process the multidimensional query using quotient relations are:

1. Restriction to stores which are located in California and partitioning by the attribute ST.

$R_1 := (S / \mathit{ST})[\mathit{ST} = \text{'CA'}][\mathit{STO}, \mathit{ST}]$

The result of the quotient algebra operation R1 is given in Table 4.2.

Table 4.2 Result of the quotient algebra operation R1

Store (STO)  City (Ci)  State (ST)
4            LA         CA
5            SF

2. Restriction to the product "Sweet tooth" and projection on {Product,Item}.

$R_2 := P[I = \text{'Sweet tooth'}][\mathit{Pr}, I]$

The following table shows the result of the quotient algebra operation R2.

Table 4.3 Result of the quotient algebra operation R2

Product (Pr)  Item (I)
3             Sweet tooth

3. Restriction to the relevant Time elements in the years 1994 and 1995. The quotient algebra operation is given as follows:

$R_3 := (D / Y)[Y = \text{'1994'} \vee Y = \text{'1995'}][T, Y]$

Table 4.4 shows the relation D partitioned by Year for 1994 and 1995.

Table 4.4 Result of the quotient algebra operation R3

Time (T)  Year (Y)
1         1994
2
13
14
------------------
397       1995
398
401
402

4. Join between the fact relation Sa and R1, R2, and R3.

a. Join with R1.

$R_4 := \mathit{Sa} \bowtie R_1$

Table 4.5 shows the result of the quotient algebra operation R4.

Table 4.5 Result of the quotient algebra operation R4

Time (T)  Product (Pr)  Store (STO)  Sales (SL)  State (ST)
13        3             4            10          CA
14        2             4            50
401       3             4            20
402       3             4            60
1         5             5            25
14        3             5            45

As a next step the (now) irrelevant attribute Store can be omitted by the following projection:

$R_4 := R_4[T, \mathit{Pr}, \mathit{SL}, \mathit{ST}]$

The result of this projection is given in Table 4.6.

Table 4.6 Result of the quotient algebra operation R4 after projection

Time (T)  Product (Pr)  Sales (SL)  State (ST)
13        3             10          CA
14        2             50
401       3             20
402       3             60
1         5             25
14        3             45

b. Join with R2 and projection on the relevant attributes Time, Item, Sales, and State.

$R_5 := (R_4 \bowtie R_2)[\mathit{Time}, \mathit{Item}, \mathit{Sales}, \mathit{State}]$

Table 4.7 shows the result of the quotient algebra operation R5.

Table 4.7 Result of the quotient algebra operation R5

Time (T)  Sales (SL)  Item (I)     State (ST)
13        10          Sweet tooth  CA
401       20
402       60
14        45

c. Join with R3.

$R_6 := R_5 \bowtie R_3$

Table 4.8 Result of the quotient algebra operation R6

Time (T)  Sales (SL)  Item (I)     State (ST)  Year (Y)
13        10          Sweet tooth  CA          1994
14        45
--------------------------------------------------------
401       20          Sweet tooth  CA          1995
402       60

The (now) irrelevant attribute Time can be omitted by the following projection:

$R_6 := R_6[\mathit{ST}, Y, I, \mathit{SL}]$

Table 4.9 Result of the quotient algebra operation R6 after projection

State (ST)  Year (Y)  Item (I)     Sales (SL)
CA          1994      Sweet tooth  10
                                   45
----------------------------------------------
CA          1995      Sweet tooth  20
                                   60

5. Introduction of the derived attribute Sum(Sales,Year) in R6.

$R_6 := R_6[\mathit{State}, \mathit{Year}, \mathit{Item}, \mathit{Sales}, \mathit{Sum}(\mathit{Sales}, \mathit{Year})]$

Table 4.10 Introduction of the aggregation function Sum(Sales,Year)

State (ST)  Year (Y)  Item (I)     Sales (SL)  Sum(Sales,Year)
CA          1994      Sweet tooth  10          55
                                   45
---------------------------------------------------------------
CA          1995      Sweet tooth  20          80
                                   60

6. Roll-up of R6 by ST and introduction of the derived attribute Sum(Sales,State). The result is given in Table 4.11.

$R_7 := (R_6 / \mathit{ST})[\mathit{State}, \mathit{Year}, \mathit{Item}, \mathit{Sales}, \mathit{Sum}(\mathit{Sales}, \mathit{Year}), \mathit{Sum}(\mathit{Sales}, \mathit{State})]$

Table 4.11 Roll-up and introduction of the aggregation function Sum(Sales,State)

State (ST)  Year (Y)  Item (I)     Sales (SL)  Sum(Sales,Year)  Sum(Sales,State)
CA          1994      Sweet tooth  10          55               135
                                   45
            1995      Sweet tooth  20          80
                                   60

5. Conclusion and Future Work

Our work studies the use of quotient relations as an instrument for modeling and querying OLAP data. In this approach OLAP cubes are represented by quotient relations for facts and dimensions.
It has been shown that the partitioning and de-partitioning operators can easily be used for drilling-down and rolling-up. The advantage of a modeling approach using quotient relations lies in the implementation-independent formalism of the quotient algebra. This advantage is comparable with the implementation-independent formalism of Codd’s original relational model. Furthermore, the approach fulfils the criterion of separation of structure and contents. The intensional description of quotient relations (i.e. their schemas, consisting of the set of attributes of the relations together with the partitioning sets) is clearly separated from their extensions (i.e. the block-tuples of the quotient relation). The criterion of the existence of a declarative query language is met by the quotient algebra, as shown in the illustrative examples of this paper.

Further investigation will address the object-oriented implementation of a quotient-relation-based OLAP representation and the use of quotient relations for a mandatory access control security concept for OLAP, as presented for the traditional relational model in [KKST97]. An object-oriented implementation is in progress.

Acknowledgement

The authors are very indebted to Andreas Kurz for many creative remarks. This work is supported by the Austrian Federal Bank, Project No. 6681. For the object-oriented implementation we thank Informix for their “Innovative Software Grant”.

References

[AGS97] R. Agrawal, A. Gupta, and S. Sarawagi. Modelling Multidimensional Databases. Proc. of the Int’l Conference on Data Engineering, 1997.

[BSHD98] M. Blaschka, C. Sapia, G. Höfling, and D. Dinter. An overview of multidimensional data models for OLAP. Proc. of Database and Expert Systems Applications, IEEE Press, 1998.

[CCS93] E.F. Codd, S.B. Codd, and C.T. Salley. Providing OLAP (On-line Analytical Processing) to User-Analysts: An IT Mandate. E.F. Codd & Associates, White paper, 1993.

[CT97a] L. Cabbibo and R. Torlone. A Systematic Approach to Multidimensional Databases. Proc. of SEBD, 1997.

[CT97b] L. Cabbibo and R. Torlone. Querying Multidimensional Databases. Proc. of the 6th DBPL, 1997.

[FK77] A.L. Furtado and L. Kerschberg. An Algebra of Quotient Relations. Proc. of ACM SIGMOD, 1977.

[GL97] M. Gyssens and L.V.S. Lakshmanan. A Foundation for Multi-Dimensional Databases. Proc. of Int’l Conference on Very Large Databases, Athens, Greece, 1997.

[KKST97] R. Kirkgöze, M. Katic, M. Stolba, and A.M. Tjoa. A Security Concept for OLAP. Proc. of the 8th Int’l Workshop on Database and Expert Systems Applications, IEEE Computer Society, 1997.

[KT98a] A. Kurz and A.M. Tjoa. Web Warehousing. Thomson Publishing, 1998 (to appear).

[LW96] C. Li and X.S. Wang. A data model for supporting on-line analytical processing. Proc. Conf. on Information and Knowledge Management, November 1996.