ERWIN

DATA MODELLING
A data model is a blueprint for the design of a database.
The Entity Relationship Diagram (ERD) is the most popular type of data model.
Some data modeling tools are
Microsoft ‘Visio’
Sybase ‘PowerDesigner’
Sybase ‘DataArchitect’
Oracle ‘Designer’
Computer Associates ‘Erwin’
IBM ‘Rational Data Architect’
Erwin
Erwin is probably the leading tool for data modeling. It is intuitive and easy to use, and has most of the features
data modelers look for to make their jobs easier. The reverse engineering functionality is also useful for retrospectively
producing system documentation (provided the database objects were commented when they were created). Erwin is
currently owned by Computer Associates.
DATABASE DESIGNING:
Conceptual design (inputs/outputs)
|
|
Logical design (relation between entities and flow)
|
(DBMS independent)
........................................................
(DBMS specific)
|
Physical design (stored procedures, triggers…)
Conceptual design: a model of the information used in an enterprise, independent of all physical considerations.
Logical design: the logical design of the database, including the entities and the relationships between them.
NORMALIZATION: Normalizing a logical database design involves using formal methods to separate the data into
multiple, related tables.
Some of the benefits of normalization include:
• Reducing NULL values
• Eliminating redundant data
• Avoiding unnecessary coding
• Maximizing the number of clustered indexes
• Minimizing the number of indexes on a single table
• Keeping each table thin (fewer columns per table)
First Normal Form:
Eliminate Repeating Groups: Make a separate table for each set of related attributes, and give each table a
primary key.
A relational table is, by definition, in first normal form.
It satisfies three properties:
• Every value is atomic
• Every row is unique
• Every row has the same number of values
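As a minimal sketch of eliminating a repeating group, the Python below splits records that carry a list of phone numbers into two flat tables, each with its own key. The customer data and the function name are made up for illustration.

```python
# Hypothetical un-normalized records: each customer row carries a
# repeating group (a variable-length list of phone numbers).
customers_raw = [
    {"customer_id": 1, "name": "Ann", "phones": ["555-0101", "555-0102"]},
    {"customer_id": 2, "name": "Bob", "phones": ["555-0199"]},
]

def to_first_normal_form(rows):
    """Split the repeating group into its own table with atomic values."""
    customers = []  # primary key: customer_id
    phones = []     # primary key: (customer_id, phone)
    for row in rows:
        customers.append({"customer_id": row["customer_id"], "name": row["name"]})
        for phone in row["phones"]:
            phones.append({"customer_id": row["customer_id"], "phone": phone})
    return customers, phones

customers, customer_phones = to_first_normal_form(customers_raw)
```

After the split, every row in each table has the same columns and every value is atomic.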
Second Normal Form:
Eliminate Redundant Data: If an attribute depends on only part of a composite (multi-column) key, remove it to a
separate table.
Reduce first normal form entities to second normal form (2NF) by removing attributes that are not dependent on
the whole primary key.
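The partial-dependency rule can be sketched as follows, assuming a hypothetical order-items table whose key is (order_id, product_id). The product name depends only on product_id, so it moves to its own table. All names and data are illustrative.

```python
# Hypothetical 1NF table with composite key (order_id, product_id).
# product_name depends only on product_id, i.e. on part of the key.
order_items_1nf = [
    {"order_id": 10, "product_id": "P1", "product_name": "Pen", "quantity": 3},
    {"order_id": 10, "product_id": "P2", "product_name": "Ink", "quantity": 1},
    {"order_id": 11, "product_id": "P1", "product_name": "Pen", "quantity": 5},
]

def to_second_normal_form(rows):
    """Move the partially dependent attribute into its own table."""
    products = {}     # key: product_id -> product_name
    order_items = []  # key: (order_id, product_id)
    for row in rows:
        products[row["product_id"]] = row["product_name"]
        order_items.append(
            {"order_id": row["order_id"],
             "product_id": row["product_id"],
             "quantity": row["quantity"]}
        )
    return products, order_items

products, order_items = to_second_normal_form(order_items_1nf)
```

The product name "Pen" is now stored once instead of once per order line.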
Third Normal Form:
All non-key attributes should be mutually independent.
Eliminate Columns Not Dependent On Key: If attributes do not contribute to a description of the key, remove
them to a separate table.
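A small sketch of this step using Python's built-in sqlite3 module: in the hypothetical employee table below, dept_name describes dept_id rather than the key emp_id, so it is moved to a department table. Table and column names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical 2NF table: dept_name depends on dept_id, not on emp_id.
    CREATE TABLE employee_2nf (
        emp_id INTEGER PRIMARY KEY,
        emp_name TEXT,
        dept_id INTEGER,
        dept_name TEXT
    );
    INSERT INTO employee_2nf VALUES
        (1, 'Ann', 10, 'Sales'),
        (2, 'Bob', 10, 'Sales'),
        (3, 'Cy',  20, 'Finance');

    -- 3NF: the dependent column moves to a table keyed by dept_id.
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        dept_name TEXT
    );
    INSERT INTO department
        SELECT DISTINCT dept_id, dept_name FROM employee_2nf;

    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        emp_name TEXT,
        dept_id INTEGER REFERENCES department(dept_id)
    );
    INSERT INTO employee
        SELECT emp_id, emp_name, dept_id FROM employee_2nf;
""")

departments = conn.execute(
    "SELECT dept_id, dept_name FROM department ORDER BY dept_id"
).fetchall()
```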
Boyce-Codd Normal Form:
Every non-key attribute should be fully dependent on a key, and every determinant should be a candidate key.
If there are non-trivial dependencies between candidate key attributes, separate them out into distinct tables.
Reduce third normal form entities to Boyce-Codd normal form (BCNF) by ensuring that they are in third
normal form for any feasible choice of candidate key as primary key.
Physical Design: stored procedures, triggers…
Physical Implementation: coding
DENORMALIZATION
As the name indicates, denormalization is the reverse process of normalization. It is the controlled introduction of
redundancy into the database design. It can improve query performance because the number of joins is
reduced.
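To illustrate the trade-off, the sketch below (Python's sqlite3, with made-up tables and data) answers the same report twice: once from a normalized design that needs a join, and once from a denormalized copy that stores the city redundantly on every order row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customer(customer_id),
                         amount REAL);
    INSERT INTO customer VALUES (1, 'Pune'), (2, 'Oslo');
    INSERT INTO orders VALUES (100, 1, 25.0), (101, 2, 40.0), (102, 1, 10.0);

    -- Denormalized copy: city is stored redundantly on every order row.
    CREATE TABLE orders_denorm AS
        SELECT o.order_id, o.customer_id, o.amount, c.city
        FROM orders o JOIN customer c ON c.customer_id = o.customer_id;
""")

# Normalized design: the report needs a join.
with_join = conn.execute("""
    SELECT c.city, SUM(o.amount)
    FROM orders o JOIN customer c ON c.customer_id = o.customer_id
    GROUP BY c.city ORDER BY c.city
""").fetchall()

# Denormalized design: same answer from a single table.
without_join = conn.execute("""
    SELECT city, SUM(amount) FROM orders_denorm
    GROUP BY city ORDER BY city
""").fetchall()
```

The cost of the second design is that a change to a customer's city must be applied to every one of that customer's order rows.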
Erwin makes database creation simple by generating DDL (SQL) scripts from a data model, using its
Forward Engineering technique. Erwin can also create data models from an existing database, using its
Reverse Engineering technique.
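The idea behind forward engineering can be sketched in a few lines of Python. This is not Erwin's implementation or file format: the model dictionary and the generate_ddl function are invented purely to show how a script of CREATE TABLE statements can be derived from a model.

```python
# A toy physical model: table name -> list of (column, datatype, is_primary_key).
# This structure is invented for illustration; it is not Erwin's file format.
model = {
    "department": [("dept_id", "INTEGER", True), ("dept_name", "VARCHAR(50)", False)],
    "employee": [("emp_id", "INTEGER", True), ("emp_name", "VARCHAR(50)", False)],
}

def generate_ddl(model):
    """Return one CREATE TABLE statement per table in the model."""
    statements = []
    for table, columns in model.items():
        lines = [f"    {name} {datatype}" for name, datatype, _ in columns]
        keys = [name for name, _, is_pk in columns if is_pk]
        if keys:
            lines.append(f"    PRIMARY KEY ({', '.join(keys)})")
        body = ",\n".join(lines)
        statements.append(f"CREATE TABLE {table} (\n{body}\n);")
    return statements

ddl = generate_ddl(model)
print("\n\n".join(ddl))
```

Reverse engineering goes the other way: it reads the database catalog and rebuilds a model structure like the one above.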
The Erwin workplace consists of the following main areas:
• Logical: In this view, the data model represents business requirements such as entities and attributes.
• Physical: In this view, the data model represents physical structures such as tables, columns and datatypes.
• ModelMart: Many users can work with the same data model concurrently.
What can be done with Erwin?
• Logical, physical and dimensional data models can be created.
• Data models can be created from existing systems (RDBMS, DBMS, files etc.).
• Different versions of a data model can be compared.
• A data model and a database can be compared.
• SQL scripts can be generated to create databases from a data model.
• Reports can be generated in different file formats such as .html, .rtf, and .txt.
• Data models can be opened and saved in several different file types such as .er1, .ert, .bpx, .xml, .ers, .sql, .cmt, .df,
.dbf, and .mdb files.
• By using ModelMart, concurrent users can work on the same data model.
In order to create data models in Erwin, you need to have AllFusion Erwin Data Modeler installed on your
system. If you have also installed ModelMart, more than one user can work on the same model.
Data Modeling Development Cycle
1» Gathering Business Requirements (first phase)
2» Conceptual Data Modeling (CDM)
3» Logical Data Modeling
4» Physical Data Modeling
5» Database design
Steps to create a Data Model:
1» Get Business requirements.
2» Create High Level Conceptual Data Model.
3» Create Logical Data Model.
4» Select target DBMS where data modeling tool creates the physical schema.
5» Create standard abbreviation document according to business standard.
6» Create domain.
7» Create Entity and add definitions.
8» Create attribute and add definitions.
9» Based on the analysis, try to create surrogate keys, super types and sub types.
10» Assign datatype to attribute. If a domain is already present then the attribute should be attached to the domain.
11» Assign primary or unique keys to attributes.
12» Create check constraints or defaults for attributes.
13» Create unique or bitmap indexes on attributes.
14» Create foreign key relationships between entities.
15» Create Physical Data Model.
16» Add database properties to physical data model.
17» Create SQL scripts from the Physical Data Model and forward them to the DBA.
18» Maintain Logical & Physical Data Model.
19» For each release (version of the data model), compare the present version with the previous version of the
data model. Similarly, compare the data model with the database to find the differences.
20» Create a change log document for differences between the current version and previous version of the data
model.
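The version comparison and change log in the last two steps can be sketched as a simple diff. The model representation (table name mapped to columns and datatypes) and the compare_models function are assumptions for illustration, not how a modeling tool stores or compares models.

```python
# Two hypothetical versions of a model: table name -> {column: datatype}.
version_1 = {
    "customer": {"customer_id": "INTEGER", "name": "VARCHAR(50)"},
    "orders": {"order_id": "INTEGER", "amount": "DECIMAL(10,2)"},
}
version_2 = {
    "customer": {"customer_id": "INTEGER", "name": "VARCHAR(100)",
                 "email": "VARCHAR(100)"},
    "invoice": {"invoice_id": "INTEGER"},
}

def compare_models(old, new):
    """Return change-log lines for old -> new, ordered by table name."""
    log = []
    for table in sorted(set(old) | set(new)):
        if table not in new:
            log.append(f"dropped table {table}")
        elif table not in old:
            log.append(f"added table {table}")
        else:
            for column in sorted(set(old[table]) | set(new[table])):
                before = old[table].get(column)
                after = new[table].get(column)
                if before is None:
                    log.append(f"added column {table}.{column} ({after})")
                elif after is None:
                    log.append(f"dropped column {table}.{column}")
                elif before != after:
                    log.append(f"changed {table}.{column}: {before} -> {after}")
    return log

change_log = compare_models(version_1, version_2)
```

The same function could compare a model against a structure read back from the database, which corresponds to the model-to-database comparison in step 19.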
Data Modeler Role
Business Requirement Analysis:
» Interact with Business Analysts to get the functional requirements.
» Interact with end users and find out the reporting needs.
» Conduct interviews and brainstorming discussions with the project team to get additional requirements.
» Gather accurate data by data analysis and functional analysis.
Development of data model:
» Create standard abbreviation document for logical, physical and dimensional data models.
» Create logical, physical and dimensional data models(data warehouse data modeling).
» Document logical, physical and dimensional data models (data warehouse data modeling).
Reports:
» Generate reports from data model.
Review:
» Review the data model with functional and technical team.
Creation of database:
» Create SQL code from the data model and coordinate with DBAs to create the database.
» Check that data models and databases are in sync.
Support & Maintenance:
» Assist developers, ETL, BI team and end users to understand the data model.
» Maintain change log for each data model.
Logical vs Physical Data Modeling
Logical Data Model                      Physical Data Model
Represents business information and     Represents the physical implementation of the
defines business rules                  model in a database
Entity                                  Table
Attribute                               Column
Primary Key                             Primary Key Constraint
Alternate Key                           Unique Constraint or Unique Index
Inversion Key Entry                     Non Unique Index
Rule                                    Check Constraint, Default Value
Relationship                            Foreign Key
Definition                              Comment
Relational (OLTP) Data Modeling
A relational data model views the real world as entities and relationships. Entities are concepts,
real or abstract, about which information is collected. Entities are associated with each other by relationships, and
attributes are properties of entities. Business rules determine the relationships between the entities in a
data model.
The goal of a relational data model is to normalize the data (avoid redundancy) and to present it in a good normal form.
When working with relational data modeling, a data modeler has to understand first normal form through fifth normal
form to design a good data model.
The following are some of the questions that arise during the development of an entity relationship data model. A
complete business and data analysis leads to a good data model design.
1» What will be the future scope of the data model?
2» How to group attributes in entities?
3» How to name entities, attributes, keys groups, relationships?
4» How to connect one entity to other? What sort of relationship is that?
5» How to validate the data?
6» How to normalize the data?
7» How to present reports?
The completed relational data model is shown in Figure 1.5, and the corresponding data are shown in separate tables
on the next page.
Example of Relational Data Model: Figure 1.5
Dimensional Data Modeling
Dimensional data modeling comprises one or more dimension tables and fact tables. Good examples of
dimensions are location, product, time, promotion, organization etc. Dimension tables store records related to that
particular dimension, and no facts (measures) are stored in these tables.
For example, a product dimension table stores information about products (Product Category, Product Sub
Category, Product and Product Features) and a location dimension table stores information about location
(country, state, county, city, zip). A fact (measure) table contains measures (sales gross value, total units sold) and
dimension columns. These dimension columns are actually foreign keys from the respective dimension tables.
Example of Dimensional Data Model: Figure 1.6
In the example (Figure 1.6), the sales fact table is connected to the dimensions location, product, time and organization.
This shows that data can be sliced across all dimensions and can also be aggregated across
multiple dimensions. "Sales Dollar" in the sales fact table can be calculated across all dimensions independently or in
combination, for example:
• Sales Dollar value for a particular product
• Sales Dollar value for a product in a location
• Sales Dollar value for a product in a year within a location
• Sales Dollar value for a product in a year within a location sold or serviced by an employee
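A minimal star-schema sketch in Python's sqlite3 shows how such slices are computed. The table names, column names and figures are made up for illustration and do not reproduce Figure 1.6. The fact table holds the measure and one foreign key per dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes only, no measures.
    CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE location_dim (location_id INTEGER PRIMARY KEY, country TEXT, city TEXT);
    CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, year INTEGER);

    -- Fact table: measures plus foreign keys to each dimension.
    CREATE TABLE sales_fact (
        product_id INTEGER REFERENCES product_dim(product_id),
        location_id INTEGER REFERENCES location_dim(location_id),
        time_id INTEGER REFERENCES time_dim(time_id),
        sales_dollar REAL
    );

    INSERT INTO product_dim VALUES (1, 'Pen'), (2, 'Ink');
    INSERT INTO location_dim VALUES (1, 'USA', 'Boston'), (2, 'USA', 'Austin');
    INSERT INTO time_dim VALUES (1, 2023), (2, 2024);
    INSERT INTO sales_fact VALUES
        (1, 1, 1, 100.0), (1, 2, 1, 50.0), (1, 1, 2, 70.0), (2, 1, 2, 30.0);
""")

# Sales Dollar value for a particular product.
by_product = conn.execute("""
    SELECT p.product_name, SUM(f.sales_dollar)
    FROM sales_fact f JOIN product_dim p ON p.product_id = f.product_id
    GROUP BY p.product_name ORDER BY p.product_name
""").fetchall()

# Sales Dollar value for a product in a year within a location.
by_product_year_city = conn.execute("""
    SELECT p.product_name, t.year, l.city, SUM(f.sales_dollar)
    FROM sales_fact f
    JOIN product_dim p ON p.product_id = f.product_id
    JOIN time_dim t ON t.time_id = f.time_id
    JOIN location_dim l ON l.location_id = f.location_id
    GROUP BY p.product_name, t.year, l.city
    ORDER BY p.product_name, t.year, l.city
""").fetchall()
```

Each extra dimension in the slice adds one join and one GROUP BY column; the fact table itself does not change.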
In dimensional data modeling, hierarchies for the dimensions are stored in the dimension table itself. For example,
the location dimension will have all of its hierarchies from country, state and county to city. There is no need for
individual hierarchical lookups such as country lookup, state lookup, county lookup and city lookup to be shown in the
model.
Uses of Dimensional Data Modeling
Dimensional data modeling is used for calculating summarized data. For example, sales data could be collected on
a daily basis and then be aggregated to the week level, the week data could be aggregated to the month level, and so
on. The data can then be referred to as aggregate data. Aggregation is synonymous with summarization, and
aggregate data is synonymous with summary data. The performance of a dimensional data model can be
significantly increased when materialized views are used. A materialized view is a pre-computed table comprising
aggregated or joined data from fact and possibly dimension tables; it is also known as a summary or aggregate
table.
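The roll-up described above can be sketched in plain Python. The daily figures are invented, and the dictionaries returned by aggregate stand in for summary tables. This sketch computes both levels directly from the daily rows, since calendar weeks do not always fall inside a single month.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales facts: (sale date, amount).
daily_sales = [
    (date(2024, 1, 1), 100.0),
    (date(2024, 1, 2), 50.0),
    (date(2024, 1, 8), 25.0),
    (date(2024, 2, 5), 40.0),
]

def aggregate(rows, key_func):
    """Sum amounts per key; the result plays the role of a summary table."""
    totals = defaultdict(float)
    for day, amount in rows:
        totals[key_func(day)] += amount
    return dict(totals)

# Week level: key is (ISO year, ISO week number).
weekly = aggregate(daily_sales, lambda d: tuple(d.isocalendar())[:2])
# Month level: key is (year, month).
monthly = aggregate(daily_sales, lambda d: (d.year, d.month))
```

A materialized view plays the same role inside the database: the totals are computed once and stored, so reports read the small summary instead of rescanning the daily rows.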
Dimension Table
A dimension table describes the business entities of an enterprise, represented as hierarchical, categorical
information such as time, departments, locations, and products. Dimension tables are sometimes called lookup or
reference tables.
Location Dimension
In relational data modeling, for normalization purposes, the country lookup, state lookup, county lookup, and city
lookup are not merged into a single table. In dimensional data modeling (star schema), these tables are
merged into a single table called LOCATION DIMENSION for performance and data slicing requirements. This
location dimension helps to compare the sales in one region with those in another. We may see a good sales profit in
one region and a loss in another. If there is a loss, the reason may be a new competitor in that area, the
failure of our marketing strategy, and so on.
Example of Location Dimension: Figure 1.8
In the above example, the location part of the dimensional data model diagram is shown for easy understanding. It
shows that all the lookups (country, state, county and city) are connected to the single location dimension. Below are the
data stored in each table found in the above location part. Dimension tables have been explained in detail under the
section Dimensions.
Relational data modeling is used in OLTP systems, which are transaction oriented, and dimensional data modeling is
used in OLAP systems, which are analytical. In a data warehouse environment, the staging area is designed on OLTP
concepts, since data has to be normalized, cleansed and profiled before being loaded into a data warehouse or data mart.
In an OLTP environment, lookups are stored as independent tables in detail, whereas these independent tables are merged
into a single dimension in an OLAP environment such as a data warehouse.
Time Dimension
In a relational data model, for normalization purposes, the year lookup, quarter lookup, month lookup, and week
lookup are not merged into a single table. In dimensional data modeling (star schema), these tables are
merged into a single table called TIME DIMENSION for performance and data slicing.
This dimension helps to find the sales made on a daily, weekly, monthly and yearly basis. We can perform a trend
analysis by comparing this year's sales with the previous year's, or this week's sales with the previous week's.
Example of Time Dimension: Figure 1.11
Slowly Changing Dimensions
Dimensions that change over time are called Slowly Changing Dimensions. For instance, a product price changes
over time; People change their names for some reason; Country and State names may change over time. These are a
few examples of Slowly Changing Dimensions since some changes are happening to them over a period of time.
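One common way to handle such changes is to keep history rows with effective dates, often called a Type 2 slowly changing dimension. The source does not say which approach it has in mind, so the sketch below is an assumption; the column names valid_from and valid_to and both functions are hypothetical.

```python
from datetime import date

# Hypothetical product dimension with history columns (assumed design).
product_dim = [
    {"product_id": 1, "price": 9.99,
     "valid_from": date(2023, 1, 1), "valid_to": None},
]

def apply_price_change(dim, product_id, new_price, change_date):
    """Close the current row for the product and append a new current row."""
    for row in dim:
        if row["product_id"] == product_id and row["valid_to"] is None:
            if row["price"] == new_price:
                return  # nothing changed
            row["valid_to"] = change_date
    dim.append({"product_id": product_id, "price": new_price,
                "valid_from": change_date, "valid_to": None})

def price_on(dim, product_id, day):
    """Look up the price that was in effect on a given day."""
    for row in dim:
        if (row["product_id"] == product_id and row["valid_from"] <= day
                and (row["valid_to"] is None or day < row["valid_to"])):
            return row["price"]
    return None

apply_price_change(product_dim, 1, 11.49, date(2024, 6, 1))
```

Because old rows are closed rather than overwritten, facts recorded before the change can still be reported against the price that applied at the time.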
Clustering
defined resources like SQL Server move to one of the other servers. The entire process usually takes under a minute in
a properly configured cluster. It’s not unheard of in some cases for a SQL Server to take 5 minutes to transfer its
resources from one server to another.
There are two types of failover clustering in Windows: Active/Passive and Active/Active. Active/Passive means that
your cluster has an active node and a passive node. If your active node fails, its defined resources shift to
the passive node, which then becomes active. The passive node is not accessible unless a failure occurs and the
resources shift. Active/Active clustering takes the previous example and twists it slightly. In Active/Active
clustering, both nodes are accessible and active. If a node fails, its resources shift to the other active node.
The node that survives then carries the load for both nodes. Keep this point in mind when you’re purchasing your
equipment. In an Active/Active environment, you need to ensure that each node could sustain, by itself, the traffic
generated for both nodes.
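The sizing rule above reduces to a simple check: each node must be able to carry the combined load on its own. The sketch below uses made-up capacity and load figures in an arbitrary unit; the function name is hypothetical.

```python
def can_survive_failover(node_capacity, node_loads):
    """Check that any single surviving node can carry the combined load.

    node_capacity and node_loads use the same arbitrary unit
    (for example, requests per second); the figures are assumptions.
    """
    total_load = sum(node_loads.values())
    return all(node_capacity[node] >= total_load for node in node_loads)

capacity = {"node_a": 1000, "node_b": 1000}
ok = can_survive_failover(capacity, {"node_a": 400, "node_b": 500})
overloaded = can_survive_failover(capacity, {"node_a": 700, "node_b": 600})
```

In the second case each node is comfortable on a normal day, yet neither could absorb the other's traffic after a failure.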
I prefer to use Active/Active clustering because in Active/Passive you have hardware that essentially goes unused until
a problem arises. In Active/Active, you ensure that all of your expensive hardware is at full utilization.
Each server that participates in clustering is referred to as a node. Windows 2000 Advanced Server supports two-node
clusters, and Windows 2000 Datacenter Server supports up to four nodes in a cluster. A tool called Cluster Administrator
manages the Windows clustered resources, including SQL Server. Inside Cluster Administrator, you can specify which
server is the preferred owner of a resource (like SQL Server), and you can define which servers are possible owners. This
means that when you have a four-node cluster, you can specify that a service fails over to a
particular node. You can also set which service is dependent on another service. By setting a dependent service, you can
make sure that SQL Server does not start until the drives are ready.
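The ideas of preferred owner, possible owners and dependencies can be modeled in a few lines. This is only a conceptual sketch: the dictionary keys, node names and functions are hypothetical and do not correspond to Cluster Administrator settings or any Windows API.

```python
# Hypothetical resource definition; names are illustrative and are not
# Cluster Administrator settings or API calls.
resource = {
    "name": "SQL Server",
    "preferred_owner": "node1",
    "possible_owners": ["node1", "node3"],
    "depends_on": ["Disk S:"],
}

def choose_owner(resource, healthy_nodes):
    """Pick the preferred owner if healthy, else the first healthy possible owner."""
    if resource["preferred_owner"] in healthy_nodes:
        return resource["preferred_owner"]
    for node in resource["possible_owners"]:
        if node in healthy_nodes:
            return node
    return None  # no eligible node is available

def can_start(resource, online_resources):
    """A resource may start only after everything it depends on is online."""
    return all(dep in online_resources for dep in resource["depends_on"])

owner_after_failure = choose_owner(resource, {"node2", "node3", "node4"})
```

Here node2 and node4 are healthy but are not listed as possible owners, so the resource moves to node3.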