ERWIN DATA MODELLING

A data model is a blueprint for the design of a database. The ERD (Entity Relationship Diagram) is the most popular type of data model.

Some data modeling tools are:
» Microsoft ‘Visio’
» Sybase ‘Power designer’
» Sybase ‘data architect’
» Oracle designer
» Computer Associates ‘Erwin’
» IBM ‘Rational Data Architect’

Erwin

Erwin is probably the leading tool for data modeling. It is intuitive and easy to use, and has most of the features data modelers look for to make their jobs easier. The reverse engineering functionality is also useful for retrospectively producing system documentation (provided database objects were commented when created!). Erwin is currently owned by Computer Associates.

DATABASE DESIGNING:

Conceptual design (inputs/outputs)
        |
        |
Logical design (relations between entities and flow)
        |
        |   (DBMS independent)
........................................................
        |   (DBMS specific)
        |
Physical design (stored procedures, triggers…)

Conceptual design: the information used in an enterprise, independent of all physical considerations.
Logical design: the logical design of the database, including the entities and the relationships between them.

NORMALIZATION

Normalization in logical database design involves using formal methods to separate the data into multiple, related tables. Some of the benefits of normalization include:
» Reducing NULL values
» Eliminating redundant data
» Avoiding unnecessary coding
» Maximizing the use of clustered indexes
» Minimizing the number of indexes on a single table
» Keeping tables thin (fewer columns per table)

First Normal Form: Eliminate Repeating Groups - Make a separate table for each set of related attributes, and give each table a primary key. A relational table is, by definition, in first normal form. It satisfies three properties:
» Every value is atomic
» Rows are unique
» Every row has the same number of values

Second Normal Form: Eliminate Redundant Data - If an attribute depends on only part of a composite (multi-column) key, remove it to a separate table.
Reduce first normal form entities to second normal form (2NF) by removing attributes that are not dependent on the whole primary key.

Third Normal Form: Eliminate Columns Not Dependent On Key - All non-key attributes should be mutually independent. If attributes do not contribute to a description of the key, remove them to a separate table.

Boyce-Codd Normal Form: Every determinant (a set of attributes on which some other attribute depends) should be a candidate key. If there are non-trivial dependencies between candidate key attributes, separate them out into distinct tables. Reduce third normal form entities to Boyce/Codd normal form (BCNF) by ensuring that they are in third normal form for any feasible choice of candidate key as primary key.

Physical Design: stored procedures, triggers…
Physical Implementation: coding

DENORMALIZATION

As the name indicates, denormalization is the reverse process of normalization. It is the controlled introduction of redundancy into the database design. It helps to improve query performance because the number of joins can be reduced.

Erwin makes database creation simple by generating DDL (SQL) scripts from a data model using its Forward Engineering technique. Conversely, Erwin can create data models from an existing database using its Reverse Engineering technique.

The Erwin workplace consists of the following main areas:
» Logical: In this view, the data model represents business requirements like entities, attributes etc.
» Physical: In this view, the data model represents physical structures like tables, columns, datatypes.
» ModelMart: Many users can work with the same data model concurrently.

What can be done with Erwin?
» Logical, physical and dimensional data models can be created.
» Data models can be created from existing systems (RDBMS, DBMS, files etc.).
» Different versions of a data model can be compared.
» A data model and a database can be compared.
» SQL scripts can be generated to create databases from a data model.
» Reports can be generated in different file formats like .html, .rtf, and .txt.
» Data models can be opened and saved in several different file types like .er1, .ert, .bpx, .xml, .ers, .sql, .cmt, .df, .dbf, and .mdb files.
» By using ModelMart, concurrent users can work on the same data model.

In order to create data models in Erwin, you need to have All Fusion Erwin Data Modeler installed on your system. If you have installed ModelMart, then more than one user can work on the same model.

Data Modeling Development Cycle
1. Gathering Business Requirements - First Phase
2. Conceptual Data Modeling (CDM)
3. Logical Data Modeling
4. Physical Data Modeling
5. Database design

Steps to create a Data Model:
1» Get business requirements.
2» Create a high level conceptual data model.
3» Create the logical data model.
4» Select the target DBMS for which the data modeling tool will create the physical schema.
5» Create a standard abbreviation document according to business standards.
6» Create domains.
7» Create entities and add definitions.
8» Create attributes and add definitions.
9» Based on the analysis, try to create surrogate keys, super types and sub types.
10» Assign a datatype to each attribute. If a domain is already present, the attribute should be attached to the domain.
11» Create primary or unique keys on attributes.
12» Create check constraints or defaults on attributes.
13» Create unique indexes or bitmap indexes on attributes.
14» Create foreign key relationships between entities.
15» Create the physical data model and add database properties to it.
16» Create SQL scripts from the physical data model and forward them to the DBA.
17» Maintain the logical and physical data models.
18» For each release (version of the data model), try to compare the present version with the previous version of the data model. Similarly, try to compare the data model with the database to find out the differences.
19» Create a change log document for differences between the current version and previous version of the data model.

Data Modeler Role

Business Requirement Analysis:
» Interact with Business Analysts to get the functional requirements.
» Interact with end users and find out the reporting needs.
» Conduct interviews and brainstorming discussions with the project team to get additional requirements.
» Gather accurate data by data analysis and functional analysis.

Development of data model:
» Create standard abbreviation document for logical, physical and dimensional data models.
» Create logical, physical and dimensional data models (data warehouse data modeling).
» Document logical, physical and dimensional data models (data warehouse data modeling).

Reports:
» Generate reports from data model.

Review:
» Review the data model with functional and technical team.

Creation of database:
» Create SQL code from the data model and co-ordinate with DBAs to create the database.
» Check that data models and databases are in sync.

Support & Maintenance:
» Assist developers, ETL, BI team and end users to understand the data model.
» Maintain a change log for each data model.

Logical vs Physical Data Modeling

Logical Data Model                        Physical Data Model
Represents business information and       Represents the physical implementation
defines business rules                    of the model in a database
Entity                                    Table
Attribute                                 Column
Primary Key                               Primary Key Constraint
Alternate Key                             Unique Constraint or Unique Index
Inversion Key Entry                       Non Unique Index
Rule                                      Check Constraint, Default Value
Relationship                              Foreign Key
Definition                                Comment

Relational (OLTP) Data Modeling

A relational data model is a data model that views the real world as entities and relationships. Entities are concepts, real or abstract, about which information is collected. Entities are associated with each other by relationships, and attributes are properties of entities. Business rules determine the relationships between the entities in a data model.
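As a rough illustration of entities, attributes and relationships, and of how the logical constructs listed under Logical vs Physical Data Modeling might map to physical ones, here is a minimal sketch using Python's built-in sqlite3 module. The customer and orders tables, their columns and the sample rows are hypothetical, and SQLite only stands in for whatever target DBMS a real model would use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Hypothetical schema: each SQL comment names the logical construct being mapped.
conn.executescript("""
CREATE TABLE customer (                                  -- entity -> table
    customer_id INTEGER PRIMARY KEY,                     -- primary key -> primary key constraint
    email       TEXT NOT NULL UNIQUE,                    -- alternate key -> unique constraint
    last_name   TEXT NOT NULL,                           -- attribute -> column
    status      TEXT NOT NULL DEFAULT 'ACTIVE'
                CHECK (status IN ('ACTIVE', 'CLOSED'))   -- rule -> check constraint, default value
);
CREATE INDEX ix_customer_last_name
    ON customer (last_name);                             -- inversion entry -> non-unique index

CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
                REFERENCES customer (customer_id),       -- relationship -> foreign key
    amount      REAL NOT NULL
);
""")

conn.execute(
    "INSERT INTO customer (customer_id, email, last_name) "
    "VALUES (1, 'a@example.com', 'Smith')"
)
conn.execute("INSERT INTO orders (order_id, customer_id, amount) VALUES (10, 1, 25.0)")


def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# The default value is applied because status was not supplied.
default_status = conn.execute(
    "SELECT status FROM customer WHERE customer_id = 1"
).fetchone()[0]

# Each of these breaks one of the constraints declared above.
duplicate_email_rejected = rejected(
    "INSERT INTO customer (customer_id, email, last_name) "
    "VALUES (2, 'a@example.com', 'Jones')"
)
bad_status_rejected = rejected(
    "INSERT INTO customer (customer_id, email, last_name, status) "
    "VALUES (3, 'c@example.com', 'Lee', 'UNKNOWN')"
)
orphan_order_rejected = rejected(
    "INSERT INTO orders (order_id, customer_id, amount) VALUES (11, 99, 5.0)"
)
```

The business rules live in the schema here rather than in application code, which is roughly what a forward-engineered physical model is meant to provide.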
The goal of a relational data model is to normalize the data (avoid redundancy) and to present it in a good normal form. While working with relational data modeling, a data modeler has to understand 1st normal form through 5th normal form to design a good data model.

Following are some of the questions that arise during the development of an entity relationship data model. A complete business and data analysis leads to a good data model design.
1» What will be the future scope of the data model?
2» How to group attributes in entities?
3» How to name entities, attributes, key groups, relationships?
4» How to connect one entity to another? What sort of relationship is that?
5» How to validate the data?
6» How to normalize the data?
7» How to present reports?

The completed relational data model is shown in Figure 1.5 and the corresponding data are shown in separate tables on the next page.

Example of Relational Data Model: Figure 1.5

Dimensional Data Modeling

A dimensional data model comprises one or more dimension tables and fact tables. Good examples of dimensions are location, product, time, promotion, organization etc. Dimension tables store records related to that particular dimension, and no facts (measures) are stored in these tables.

For example, a product dimension table will store information about products (Product Category, Product Sub Category, Product and Product Features) and a location dimension table will store information about location (country, state, county, city, zip). A fact (measure) table contains measures (sales gross value, total units sold) and dimension columns. These dimension columns are actually foreign keys from the respective dimension tables.

Example of Dimensional Data Model: Figure 1.6

In the example in Figure 1.6, the sales fact table is connected to the dimensions location, product, time and organization.
It shows that data can be sliced across all dimensions, and that the data can also be aggregated across multiple dimensions. "Sales Dollar" in the sales fact table can be calculated across all dimensions independently or in a combined manner, as explained below.
» Sales Dollar value for a particular product
» Sales Dollar value for a product in a location
» Sales Dollar value for a product in a year within a location
» Sales Dollar value for a product in a year within a location sold or serviced by an employee

In dimensional data modeling, hierarchies for the dimensions are stored in the dimension table itself. For example, the location dimension will have all of its hierarchies from country, state, county to city. There is no need for individual hierarchical lookups like country lookup, state lookup, county lookup and city lookup to be shown in the model.

Uses of Dimensional Data Modeling

Dimensional data modeling is used for calculating summarized data. For example, sales data could be collected on a daily basis and then be aggregated to the week level, the week data could be aggregated to the month level, and so on. The data can then be referred to as aggregate data. Aggregation is synonymous with summarization, and aggregate data is synonymous with summary data.

Query performance against a dimensional data model can be significantly improved when materialized views are used. A materialized view is a pre-computed table comprising aggregated or joined data from fact and possibly dimension tables, which is also known as a summary or aggregate table.

Dimension Table

A dimension table is one that describes the business entities of an enterprise, represented as hierarchical, categorical information such as time, departments, locations, and products. Dimension tables are sometimes called lookup or reference tables.
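The fact-and-dimension layout and the roll-ups described above can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module; all table names, column names and sample figures are hypothetical. SQLite has no materialized views, so a plain summary table built with CREATE TABLE ... AS SELECT stands in for one here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical star schema: one fact table whose dimension columns are
# foreign keys to the dimension tables.
conn.executescript("""
CREATE TABLE location_dim (
    location_id INTEGER PRIMARY KEY,
    country TEXT, state TEXT, city TEXT     -- hierarchy stored in the dimension itself
);
CREATE TABLE product_dim (
    product_id INTEGER PRIMARY KEY,
    category TEXT, product TEXT
);
CREATE TABLE time_dim (
    time_id INTEGER PRIMARY KEY,
    year INTEGER, month INTEGER, day INTEGER
);
CREATE TABLE sales_fact (
    location_id  INTEGER REFERENCES location_dim (location_id),
    product_id   INTEGER REFERENCES product_dim (product_id),
    time_id      INTEGER REFERENCES time_dim (time_id),
    sales_dollar REAL,                      -- measure
    units_sold   INTEGER                    -- measure
);
""")

conn.executemany("INSERT INTO location_dim VALUES (?, ?, ?, ?)", [
    (1, "USA", "Texas", "Austin"),
    (2, "USA", "Ohio", "Columbus"),
])
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)", [
    (1, "Beverage", "Tea"),
    (2, "Beverage", "Coffee"),
])
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?)", [
    (1, 2023, 1, 5),
    (2, 2023, 1, 6),
    (3, 2023, 2, 1),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, 100.0, 10),   # Tea, Austin, 2023-01-05
    (1, 1, 2, 50.0, 5),     # Tea, Austin, 2023-01-06
    (2, 1, 3, 70.0, 7),     # Tea, Columbus, 2023-02-01
    (1, 2, 3, 30.0, 3),     # Coffee, Austin, 2023-02-01
])

# Sales Dollar value for a particular product (one dimension).
by_product = dict(conn.execute("""
    SELECT p.product, SUM(f.sales_dollar)
    FROM sales_fact f
    JOIN product_dim p ON p.product_id = f.product_id
    GROUP BY p.product
""").fetchall())

# Sales Dollar value for a product in a location (two dimensions combined).
by_product_state = {
    (product, state): total
    for product, state, total in conn.execute("""
        SELECT p.product, l.state, SUM(f.sales_dollar)
        FROM sales_fact f
        JOIN product_dim p ON p.product_id = f.product_id
        JOIN location_dim l ON l.location_id = f.location_id
        GROUP BY p.product, l.state
    """)
}

# Daily facts rolled up to the month level and stored as a summary table.
conn.execute("""
    CREATE TABLE monthly_sales_summary AS
    SELECT t.year AS year, t.month AS month, SUM(f.sales_dollar) AS sales_dollar
    FROM sales_fact f
    JOIN time_dim t ON t.time_id = f.time_id
    GROUP BY t.year, t.month
""")
monthly = {
    (year, month): total
    for year, month, total in conn.execute(
        "SELECT year, month, sales_dollar FROM monthly_sales_summary"
    )
}
```

Reports that only need monthly totals can then read the small summary table instead of re-aggregating the fact table, which is the idea behind using materialized views with a dimensional model.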
Location Dimension

In relational data modeling, for normalization purposes, country lookup, state lookup, county lookup, and city lookup are not merged into a single table. In dimensional data modeling (star schema), these tables are merged into a single table called LOCATION DIMENSION for performance and for slicing the data. This location dimension helps to compare the sales in one region with another region. We may see good sales profit in one region and a loss in another region. If it is a loss, the reasons may be a new competitor in that area, a failure of our marketing strategy, etc.

Example of Location Dimension: Figure 1.8

In the above example, the location part of the dimensional data model diagram is shown for easy understanding. It shows that all the lookups (country, state, county and city) are connected to the single location dimension. Below are the data stored in each table found in the above location part. Dimension tables have been explained in detail under the section Dimensions.

Relational data modeling is used in OLTP systems, which are transaction oriented, and dimensional data modeling is used in OLAP systems, which are analytical. In a data warehouse environment, the staging area is designed on OLTP concepts, since data has to be normalized, cleansed and profiled before being loaded into a data warehouse or data mart. In an OLTP environment, lookups are stored as independent, detailed tables, whereas in an OLAP environment like a data warehouse these independent tables are merged into a single dimension.

Time Dimension

In a relational data model, for normalization purposes, year lookup, quarter lookup, month lookup, and week lookup are not merged into a single table. In dimensional data modeling (star schema), these tables are merged into a single table called TIME DIMENSION for performance and for slicing the data. This dimension helps to find the sales done on a daily, weekly, monthly and yearly basis.
We can do a trend analysis by comparing this year's sales with the previous year's, or this week's sales with the previous week's.

Example of Time Dimension: Figure 1.11

Slowly Changing Dimensions

Dimensions that change over time are called Slowly Changing Dimensions. For instance, a product's price changes over time; people change their names for some reason; country and state names may change over time. These are a few examples of Slowly Changing Dimensions, since changes happen to them over a period of time.

clustering defined resources like SQL Server move to one of the other servers. The entire process usually takes under a minute in a properly configured cluster. It is not unheard of, in some cases, for a SQL Server to take 5 minutes to transfer its resources from one server to another.

There are two types of failover clustering in Windows: Active/Passive and Active/Active. Active/Passive means that your cluster has an active node and a passive node. If your active node fails, its defined resources shift to the passive node, which then becomes active. The passive node is not accessible unless a failure occurs and the resources shift to it.

Active/Active clustering takes the previous example and twists it slightly. In Active/Active clustering, both nodes are accessible and active. If a node fails, its resources shift to the other active node. The node that survives then carries the load for both nodes. Keep this point in mind when you are purchasing your equipment: in an Active/Active environment, each node must be able to sustain, by itself, the traffic generated for both nodes.

I prefer to use Active/Active clustering because in Active/Passive you have hardware that essentially goes unused until a problem arises. In Active/Active, you ensure that all of your expensive hardware is fully utilized. Each server that participates in clustering is referred to as a node.
Windows 2000 Advanced Server supports two-node clusters, and Windows 2000 Datacenter Server supports up to four nodes in a cluster. A tool called Cluster Administrator manages the Windows clustered resources, including SQL Server. Inside Cluster Administrator, you can specify which server is the preferred owner of a resource (like SQL Server), and you can define which servers are the possible owners. This means that in a four-node cluster, you can specify that a given service fails over to a particular node. You can also set which service is dependent on another service. By setting a dependent service, you can make sure that SQL Server does not start until the drives are ready.
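The preferred-owner and possible-owner idea, together with the Active/Active sizing advice, can be illustrated with a small simulation. This is only a sketch in plain Python: the Node and Resource classes, the capacity and load units, and both functions are hypothetical, and none of it corresponds to the Cluster Administrator tool or any Windows API.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity: int        # hypothetical load units this node can sustain
    online: bool = True


@dataclass
class Resource:
    name: str
    load: int            # hypothetical load units this resource generates
    preferred_owner: str
    possible_owners: list  # failover candidates, in order of preference


def place(resource, nodes):
    """Return the name of the online node that should own the resource, or None."""
    candidates = [resource.preferred_owner] + [
        n for n in resource.possible_owners if n != resource.preferred_owner
    ]
    for name in candidates:
        if nodes[name].online:
            return name
    return None


def can_survive_single_failure(nodes, resources):
    """Check that after any one node fails, every resource still has an owner
    and no surviving node is loaded beyond its capacity."""
    for failed in nodes:
        # Copy the cluster with exactly one node marked offline.
        trial = {
            name: Node(n.name, n.capacity, online=(name != failed))
            for name, n in nodes.items()
        }
        load = {name: 0 for name in trial}
        for r in resources:
            owner = place(r, trial)
            if owner is None:
                return False
            load[owner] += r.load
        if any(load[name] > trial[name].capacity for name in trial):
            return False
    return True


# Active/Active example: each node normally runs one SQL Server instance.
nodes = {"A": Node("A", capacity=100), "B": Node("B", capacity=100)}
resources = [
    Resource("sql1", load=60, preferred_owner="A", possible_owners=["A", "B"]),
    Resource("sql2", load=30, preferred_owner="B", possible_owners=["B", "A"]),
]

normal_owner = place(resources[0], nodes)        # sql1 stays on its preferred node
sized_ok = can_survive_single_failure(nodes, resources)

# If sql2 grows to 50 units, a single survivor would need 110 units.
resources[1].load = 50
oversubscribed_ok = can_survive_single_failure(nodes, resources)

# Failing node A moves sql1 to the next possible owner.
nodes["A"].online = False
failover_owner = place(resources[0], nodes)
```

The sizing check is the point made in the text about purchasing equipment: in Active/Active, the combined load has to fit on whichever node survives.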