
T08: DATA ANALYSIS AND DATA MODELING

TABLE OF CONTENTS

T08: Data Analysis and Data Modeling
Learning Objectives
What is data modeling?
Part 1: Gathering the Facts: Business Rules
Business Decisions
Data Collection Tactics
Developing and Identifying Business Rules
Characteristics of Business Rules
Part 2: The Components of Conceptual Data Modeling
Just the Facts, Ma’am.
Entities
Attributes
Relationships
Part 3: Conceptual Data Modeling with Crow’s Foot ERDs
Conceptual Modeling Diagrams
Attributes in the Crow’s Foot ERD
Relationships in the Crow’s Foot ERD
Weak Entities
Associative and Composite Entities
Part 4: A Simple ERD Design Methodology

LEARNING OBJECTIVES

Now that you’re fairly well versed in database and DBMS implementations, as well as client/server and distributed databases, it’s time to go back to the beginning and close the loop. Over the next few weeks we’ll learn how to define the data requirements of a business problem, create a conceptual model that formalizes those requirements, and then transform that conceptual model into a logical model.

This week we’ll be focusing on establishing data requirements and formulating a conceptual data model. Our learning objectives are:

Understand the concept of data modeling

Develop business rules

Develop and apply good data naming conventions

Construct simple data models using Entity Relationship Diagrams (ERDs)

Develop entity relationships and define various types of attributes

If you look at our methodology below, you’ll see that we will be focusing on activities that take place in its planning and analysis phases:

WHAT IS DATA MODELING?

Data modeling is based on the old adage that “a picture is worth 1000 words.” It is the process of creating a visual representation of the data requirements for a problem domain. IT professionals who engage in data modeling typically hold job titles such as DBA, business analyst, systems analyst, and software engineer. It takes lots of experience to be a good data modeler, which is why these people make “the big bucks.”

In the first week of class, we learned there are five representations of a data model: conceptual, logical, internal, external, and physical. We have already seen the internal model in the form of SQL tables, keys, and constraints, and the external model in the form of views, stored procedures, and functions. What we have yet to explore, and will in the coming weeks, are the conceptual and logical models. Whenever people refer to the art of “data modeling,” they are generally talking about building conceptual and logical models.

Recall (from your Systems Analysis and Design course, and from the beginning of this course):

Conceptual Data Model
What does it show? A graphical depiction of the data requirements and business rules of the problem domain.
Who is it for? A communications tool between the systems analyst and the end-user. It also serves as a formal definition of the data requirements for the project.

Logical Data Model
What does it show? A transformation of the conceptual model into a form which can be implemented on a relational DBMS. It still has the same formal requirements as a conceptual model, only the requirements are organized in a manner conducive to an SQL implementation.
Who is it for? A communications tool between the systems analyst and the DBA. The logical model serves as a formal definition (a.k.a. data dictionary) for the database implementation, independent of a particular DBMS.

Internal Data Model
What does it show? The actual implementation of a logical data model on a specific DBMS. The same logical model implemented on two different DBMSs (SQL Server and Oracle) will look somewhat different.
Who is it for? The DBA and, to some degree, the application programmers. End-users interact with the internal data model via the user interface that supports the external data model.

So how does one get from “idea to implementation,” or in other words, from business requirements to the internal data model? The following figure explains the process. The arrows represent processes, and the shapes represent data models. Over the next few weeks we will explore each of the arrows, learning the techniques and approaches for transforming one data model into another.

PART 1: GATHERING THE FACTS: BUSINESS RULES

BUSINESS DECISIONS

Before we can build a database solution, we need to understand the business or organizational context in which the database will live. Think of building a house: before we start construction, i.e., before we break ground for the foundation, shouldn’t we have some idea of the occupants’ needs and wants? We use the same logic for building a database. Before we design a database, shouldn’t we determine the needs and wants of the people using the data? If you think the approach should be to open up your favorite DBMS and start rattling off CREATE TABLE statements, well then, you’re way off the mark!

A good place to start is with determining the kinds of decisions the users need to make. There are three fundamental levels of decision-making: operational, tactical, and strategic. Generally speaking, operational decisions are made at the lowest levels of the organization by individual contributors and front-line supervisors.

The data for this level has to be very granular, come from internal sources, and be appropriate for making day-to-day decisions. Tactical decisions are made at the middle-management level. The data needs to be summarized so that managers can make decisions based on needs next week, next month, or next year. At the strategic level, senior executives need a more forward-looking view of the organization. They need access to data trends and future projections. Much of the strategic data normally comes from outside the organization, whereas the data for operational and tactical decisions generally comes from internal sources.

So in your analysis you need to determine whether the decisions are specific to the operational needs of the business, or whether they are more tactical or strategic in nature, where the decision-makers require less granular, more aggregate-level data. Once you’ve settled on the types of decisions, you need to establish which decisions the decision-makers can’t make because they don’t have the data. It is a good idea to make detailed lists of these decisions, organized by type, and then verify them with the decision-makers at all three levels.

Once you have an idea of how the data is to be used for making decisions, you have to start collecting the actual data. The data collection process is often a “chicken vs. egg” scenario.

Do we ask what the entities are that we need to collect data about (the chicken)? Or do we determine what the rules are that affect those entities (the egg)? This becomes an interesting dichotomy. Well, maybe not all that interesting. Database designers have most often started with the data and then applied the business rules to arrive at the data model. Since this is a database course, we will also start with the data and then apply the business rules to arrive at our data models.

DATA COLLECTION TACTICS

Determining the data you need to store in your database is a challenging process, since your sources may not provide you with accurate data. Do not discount the possibility that some of your sources are incompetent and others lie. The process usually starts with a variety of data gathering techniques: interviews, surveys, questionnaires, and “job-shadowing,” or actually working in the business unit if possible.

I recall my own efforts at designing sales quotation software for my employer, a construction equipment reseller. I didn’t know a backhoe from a bucket, let alone an ADT from a TTT, but to be successful I had to learn it all and learn it quickly. No data collection technique I employed was faster or more accurate than teaming up with a few salespeople in the different business lines and hitting the road for a week. When you walk in the end-user’s shoes, there are no external filters on your powers of observation. It also allows you to witness the human side of the system you’re designing, and gives you some perspective on how that system will change the way people work, both positively and negatively.

The data you collect are usually associated with real-world things. In my case I collected data about customers, employees, products (equipment and accessories), prices, inventory, configurations, shipments, etc. These categories of data became my entities, i.e., the things I wanted to collect data about. The actual data became attributes that described the details of these entities. This was a good starting point, but to model the data effectively we need more. We need the context in which the data will be used, and we need the relationships of the data to other data. In other words, we need the rules: the business rules. These business rules define how the pieces of data are connected to each other.

DEVELOPING AND IDENTIFYING BUSINESS RULES

A business rule is “a statement that defines or constrains some aspect of the business. It is intended to assert business structure or to control or influence the behavior of the business … prevent, cause, or suggest things to happen” (Guide Business Rules Project, 1997). In my sales quotation system we had business rules like:

A salesman is assigned to one or more customers

A customer can purchase one or more products

A product is either heavy equipment (a backhoe, for example) or an accessory (such as a bucket or fork)

Equipment is described by multiple Configuration Line Items

A customer places one or more orders. The salesman is responsible for each order. The order contains one or more products.

Here you can see that these simple rules start to put context around the data and that there are relationships developing among the entities. Let’s explore business rules a little further to see how we can develop more complex rules by examining their characteristics and their types.

CHARACTERISTICS OF BUSINESS RULES

Declarative - what, not how, statement of policy, describes what the process validates

Precise - clear, agreed-upon meaning, one interpretation

Atomic - one statement, no part of the rule can stand on its own

Consistent - internally and externally, no conflicts

Expressible - structured, natural language

Distinct - non-redundant, may refer to other rules

Business-oriented - understood by business people

It is important that your business rules have these characteristics as they are critical to defining accurate relationships among data.

PART 2: THE COMPONENTS OF CONCEPTUAL DATA MODELING

The Conceptual Model represents the data requirements from the end-user’s point of view. It is a representation of what rather than how. Right now you’re probably thinking, “Okay, what does that mean?” Let me try to explain further.

In the conceptual model, your focus is on discovering, defining, and establishing the data requirements and then formalizing those requirements. Your emphasis is placed on answering the following questions and then creating a picture which represents the answer to them:

What are the major things or entities which will eventually need to be stored in a database?

What other things do we need to store for each entity?

What are the business rules that connect the entities to each other?

One thing you don’t need to be concerned with at this stage is how you’re going to represent your findings with tables, keys and such. In the conceptual model, that’s not a concern. (See what I mean by focusing on the what rather than the how? :-)

As we explained earlier, the Conceptual Model serves as a communications vehicle between the data modeler and her customer or end user. The data modeler asks the end user these questions, draws up a diagram representing the answers, and then communicates the diagram back to the end user to verify its accuracy. If there’s any disconnect between what the user said and what the analyst heard, the diagram is revised.

Yes, that is a gross over-simplification of the process, but that’s how it works in a nutshell.

JUST THE FACTS, MA’AM.

So there are two basic steps in the conceptual modeling process: establishing the end-user requirements, and then formalizing them with a diagram. There are many techniques for gathering the data, many of which we touched upon in the previous section. Once you have the facts you need, it’s time to establish the end-user requirements of the data model. One method of doing this is the Facts technique. In a nutshell, with the Facts technique you establish:

The Entities. What are the major things you need to store data about? Example: customers and orders.

The Attributes. For each entity, what characteristics of it do we need to store? Example: customer name, address, phone, credit cards for the customer

The Relationships. What are the required business rules which connect the entities to each other? Example: customer places order

Once you identify the facts, drawing a diagram from those facts becomes almost trivial.
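To make the three kinds of facts concrete, here is a minimal Python sketch that records them as plain data before any diagram is drawn. The `Facts` class and its method names are hypothetical, chosen only for illustration; the entities, attributes, and rule come from the examples above. Note that nothing here mentions tables or keys, which keeps it at the conceptual level.

```python
from dataclasses import dataclass, field


@dataclass
class Facts:
    """Hypothetical container for the facts gathered with the Facts technique."""
    entities: set[str] = field(default_factory=set)
    attributes: dict[str, list[str]] = field(default_factory=dict)
    relationships: list[tuple[str, str, str]] = field(default_factory=list)

    def add_entity(self, name: str, *attrs: str) -> None:
        """Record an entity (a thing we store data about) and its attributes."""
        self.entities.add(name)
        self.attributes.setdefault(name, []).extend(attrs)

    def add_relationship(self, subject: str, verb: str, obj: str) -> None:
        """Record a business rule such as ("Customer", "places", "Order")."""
        missing = {subject, obj} - self.entities
        if missing:
            # A rule may only connect entities that have already been identified.
            raise ValueError(f"unknown entities: {sorted(missing)}")
        self.relationships.append((subject, verb, obj))


facts = Facts()
facts.add_entity("Customer", "Name", "Address", "Phone", "Credit Card")
facts.add_entity("Order")
facts.add_relationship("Customer", "places", "Order")
```

Each relationship tuple reads as a sentence (“Customer places Order”), which is the form you would read back to the end user for verification.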

There are a variety of entity-relationship modeling techniques, each with its own set of symbols for drawing diagrams. They include:

Chen - Developed by Peter Chen in 1976

Crow’s Foot - Developed by Charles Bachman (FYI, this is the one we will use)

Rein85 - Developed by D. Reiner in 1985

In this course we will be using the Crow’s Foot modeling symbols.

ENTITIES

Entities are persons, places, things, and events for which you want to store data. An entity represents a single instance of something, such as a customer, an order, or a vehicle. As such, it is important to name your entity in the singular form (e.g., Customer) rather than the plural form (e.g., Customers).

How does an entity relate to a database table? In general, an entity is one row in the table, so if the entity is an Order, then the table would be called Orders. Remember that entities are not tables, but specific instances of data. Keep that distinction straight, and you’re on your way to becoming a good data modeler.
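The entity/table distinction can be sketched in Python, assuming a hypothetical `Order` class with two illustrative attributes: the class describes one Order (singular), while the collection of all orders plays the role of the Orders table.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Order:
    """One Order entity: a single instance, so the name is singular."""
    order_number: int  # hypothetical attribute
    order_date: date   # hypothetical attribute


# The table holds every Order instance, so it gets the plural name.
orders: list[Order] = [
    Order(order_number=1001, order_date=date(2024, 1, 15)),
    Order(order_number=1002, order_date=date(2024, 1, 16)),
]
```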

ATTRIBUTES

Attributes represent the characteristics of each entity that need to be stored in the database. Typically, the attributes end up corresponding to the columns in the database table. Besides enumerating each of the required attributes, it is also important to denote the type of each attribute, where appropriate. Here is a list of attribute types and their purpose:

Simple. Simple attributes are the default type of attribute. They are already in their most atomic form.

E.g. Last Name, Date of Birth, Order Date.

Unique [U]. Unique attributes are those with distinct values for each entity. These are sometimes primary key candidates, and should at least have a unique constraint placed on them in the database implementation. E.g. Social Security Number, SUID, email (depending on circumstances)

Required [R]. Required attributes must have a value. At the database implementation level this indicates whether or not to allow null.

Composite [C]. Composite attributes are those attributes which should be divided into simple attributes in the table design, but can be left in their composite form to simplify the conceptual model. The breakdown of composite attributes into simple attributes should be a trivial task. E.g. Customer Name would break down into Last Name and First Name.

Derived [D]. A derived attribute can be calculated from other attributes. These might be represented in the table implementation as calculated columns or user-defined functions. E.g. Employee Years of Service (calculated from Employee Date of Hire)

Multi-Valued [M]. As the name implies, a multi-valued attribute contains more than one value per instance of the entity. This has no easy representation at the table level and must be addressed in the logical model. E.g. Employee Siblings’ SSNs, Employee Certifications. The important thing to remember is that you’re trying to establish what is required at the conceptual level, not how it will be represented in an SQL table.
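Several attribute types can be illustrated with a small Python sketch. It assumes a hypothetical `Employee` class: the composite name is kept as simple parts, certifications are multi-valued, and years of service is derived from the date of hire on demand rather than stored. The field names and the anniversary-based year count are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Employee:
    first_name: str      # composite "Employee Name" broken into simple parts
    last_name: str
    date_of_hire: date   # simple attribute
    certifications: list[str] = field(default_factory=list)  # multi-valued

    @property
    def name(self) -> str:
        """Composite attribute reassembled from its simple parts."""
        return f"{self.first_name} {self.last_name}"

    def years_of_service(self, as_of: date) -> int:
        """Derived attribute: computed from date_of_hire, never stored."""
        years = as_of.year - self.date_of_hire.year
        # Subtract one if this year's hire anniversary has not been reached yet.
        if (as_of.month, as_of.day) < (self.date_of_hire.month, self.date_of_hire.day):
            years -= 1
        return years


emp = Employee("Ada", "Lovelace", date(2015, 6, 1), ["PMP", "CCNA"])
```

For example, `emp.years_of_service(date(2024, 5, 31))` is 8, and one day later it becomes 9.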

Note: Attribute types are not mutually exclusive, so an attribute can be required and unique, or composite and multi-valued, for example. Of course, some combinations do not make sense, such as unique and multi-valued.
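Because the types combine, one lightweight way to record them is as a set of flags. This Python sketch assumes a hypothetical `AttrType` flag enum and lists only the one nonsensical combination named in the note (unique and multi-valued); any further rules would be your own additions.

```python
from enum import Flag, auto


class AttrType(Flag):
    """Attribute types as combinable flags; no flags set means a simple attribute."""
    UNIQUE = auto()
    REQUIRED = auto()
    COMPOSITE = auto()
    DERIVED = auto()
    MULTI_VALUED = auto()


# Combinations that do not make sense (only the one named in the notes).
INVALID = [AttrType.UNIQUE | AttrType.MULTI_VALUED]


def is_sensible(types: AttrType) -> bool:
    """True unless every flag of some invalid combination is present in `types`."""
    return not any(bad in types for bad in INVALID)


email = AttrType.REQUIRED | AttrType.UNIQUE  # required and unique
```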

RELATIONSHIPS

Relationships are the associations between the entities in your data model. As we’ve seen, they should represent the business rules of the specific requirements of the data model, and should not exist just to tie things together. After all, you’re attempting to model the specific problem domain, not the entire universe! The entities connected via a relationship are called the participants, and a relationship has one of three classifications: one-to-one (1:1), one-to-many (1:M), or many-to-many (M:N).

It is important to understand that relationships are read in both directions. Starting on one side, with the singular instance of the entity, the cardinality of the relationship to the other participant is expressed. Next, the process is repeated in reverse. When expressing cardinality, make sure to count the number of instances at minimum and at maximum. At minimum there will always be 0 or 1 instances, and at maximum 1 or M. Whew! That is a mouthful.

It is time for an example.

For example, take the following business rule: Customer places Order.

A Customer places 0 or M Orders. (Some customers may never have placed an order, while others have placed several.)

An Order is placed by 1 and only 1 Customer. (To be considered an order, it must be associated with at minimum and at maximum 1 customer.)

Overall, this relationship would be classified as 1:M (one-to-many). Got it?
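The two-direction reading can be expressed as data. In this Python sketch (the `Reading` class and `classify` function are hypothetical), each direction records a minimum of 0 or 1 and a maximum of 1 or M, and the overall classification depends only on how many of the two maximums are M.

```python
from dataclasses import dataclass

MANY = float("inf")  # stands in for "M"


@dataclass(frozen=True)
class Reading:
    """One direction: one `source` relates to min_count..max_count `target`s."""
    source: str
    target: str
    min_count: int    # 0 or 1
    max_count: float  # 1 or MANY


def classify(forward: Reading, backward: Reading) -> str:
    """Combine both readings of a binary relationship into 1:1, 1:M, or M:N.

    Only the maximums decide the classification; the minimums record participation.
    """
    if (forward.source, forward.target) != (backward.target, backward.source):
        raise ValueError("the two readings must describe the same pair of entities")
    many = (forward.max_count == MANY) + (backward.max_count == MANY)
    return {0: "1:1", 1: "1:M", 2: "M:N"}[many]


# Customer places Order, read in both directions.
forward = Reading("Customer", "Order", min_count=0, max_count=MANY)  # 0 or M Orders
backward = Reading("Order", "Customer", min_count=1, max_count=1)    # 1 and only 1 Customer
```

Here `classify(forward, backward)` gives `"1:M"`, matching the classification above.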

PART 3: CONCEPTUAL DATA MODELING WITH CROW’S FOOT ERDS

CONCEPTUAL MODELING DIAGRAMS

The traditional diagram of choice for conceptual models is the Entity-Relationship Diagram, or ERD. There are two popular notations for ERDs: the Chen model and the Crow’s Foot model. Both represent the same thing; they differ only in notation. Here’s an example of an entity with five attributes in both models:

IMPORTANT! PLEASE REMEMBER TO KEEP THE RELATIONAL DATABASE MODEL STUFF (TABLES, KEYS, AND WHAT-NOT) OUT OF YOUR CONCEPTUAL MODEL. THE CONCEPTUAL MODEL IS AN END-USER COMMUNICATIONS TOOL, AND SHOULD NOT BE CLUTTERED UP WITH NOTATION THAT WILL CONFUSE THE END-USER, SUCH AS AUTO-GENERATED PRIMARY KEYS AND FOREIGN KEYS. THE BOTTOM LINE IS YOU SHOULD ONLY REPRESENT THOSE THINGS IN THE CONCEPTUAL MODEL THAT ACTUALLY EXIST AS PART OF THE DATA REQUIREMENTS!

We will use the Crow’s foot model in class.

ATTRIBUTES IN THE CROW’S FOOT ERD

When expressing attributes with the Crow’s foot model, use the following notation for different attribute types:

Unique [U]. Underline the attribute, or denote with a [U]

Required [R]. Boldface the attribute, or denote with an [R]

Composite [C]. Use a dashed underline or denote with a [C]

Derived [D]. Use italics or denote with a [D]

Multi-Valued [M]. Circle the attribute, or denote with an [M]
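As a text stand-in for the bracket notation, here is a Python sketch that renders an entity with its attributes and type markers. The `label` and `render_entity` helpers are hypothetical; the marker codes are the ones listed above, printed in the order the list gives them.

```python
ORDER = "URCDM"  # Unique, Required, Composite, Derived, Multi-valued


def label(name: str, types: str = "") -> str:
    """Render one attribute, e.g. label("SSN", "RU") -> "SSN [U][R]"."""
    unknown = set(types) - set(ORDER)
    if unknown:
        raise ValueError(f"unknown attribute type code(s): {sorted(unknown)}")
    markers = "".join(f"[{code}]" for code in ORDER if code in types)
    return f"{name} {markers}" if markers else name


def render_entity(entity: str, attributes: list[tuple[str, str]]) -> str:
    """Entity name on the first line, then one indented attribute per line."""
    return "\n".join([entity] + ["  " + label(n, t) for n, t in attributes])
```

For instance, `render_entity("Employee", [("SSN", "RU"), ("Years of Service", "D")])` produces the entity name followed by `SSN [U][R]` and `Years of Service [D]` on separate lines.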

Here is an example of an entity with various types of attributes:

RELATIONSHIPS IN THE CROW’S FOOT ERD

The following ends are used for relationship cardinality:

The symbols zero, one, and many refer to an entity’s cardinality. Cardinality expresses the numerical relationship of an occurrence of one entity to an occurrence or occurrences of the related entity. These cardinality symbols only have meaning when they are used with a relationship between two entities. Let’s examine the following business rules and look at the associated cardinality.

A department can have one or many employees working in it

An employee can work in only one department

Note: All relationships read in both directions!!!!!

Here is what the data might look like. Notice that one Department entity occurrence is associated with many Employee entity occurrences.
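Both readings of the Department/Employee rules can also be checked mechanically against sample rows. This Python sketch assumes a hypothetical layout in which each employee row names one department; the function name and the data are made up for illustration.

```python
from collections import Counter


def cardinality_violations(departments, employees):
    """Check (employee, department) rows against both readings of the rules."""
    problems = []
    # Reading 1: an employee works in only one department, and it must exist.
    per_employee = Counter(name for name, _ in employees)
    for name, count in per_employee.items():
        if count > 1:
            problems.append(f"{name} works in {count} departments")
    for name, dept in employees:
        if dept not in departments:
            problems.append(f"{name} works in unknown department {dept}")
    # Reading 2: a department has one or many employees (minimum of one).
    headcount = Counter(dept for _, dept in employees)
    for dept in departments:
        if headcount[dept] < 1:
            problems.append(f"{dept} has no employees")
    return problems


departments = ["Accounting", "Marketing"]
employees = [("Ann", "Accounting"), ("Bob", "Accounting"), ("Cho", "Marketing")]
```

With this sample data the function returns an empty list; adding a department nobody works in, or listing one employee under two departments, produces a violation.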

WEAK ENTITIES

An entity is said to be weak when it cannot exist without some other entity that it is related to. As the data modeler, you choose which entities are and aren’t weak. The design implication of designating an entity as weak is that its primary key will be composite, with the foreign key values from the related entity forming part of it. This forms a stronger relationship between the entity and its weak counterpart. This is also known as an identifying relationship, because one entity’s existence depends on another. Observe the following example:

This example shows two different ways you can implement a relationship between entities.

In the top ERD from the example, you cannot be a Basketball Player without being associated with a team. At the database implementation level, this means you can’t enter a row in the basketball player table without knowing which team the player is on. Conceptually, the Basketball Player entity is considered weak because it cannot exist without a basketball team. Thus we call Basketball Player the weak entity, and the relationship between team and player is an identifying relationship.

In the bottom ERD, it is possible to have a player entity without a corresponding team, since TeamID is not required, nor is it part of what makes a basketball player unique. In this case Basketball Player is not weak, and the relationship between team and player is non-identifying.

Whether or not you choose to implement a weak entity depends entirely on the situation. For example, the top ERD might be suitable for a recreational basketball league where you don’t want players who are not associated with teams, whereas the bottom ERD might be more suitable for a professional league, where players may not currently be associated with a team.
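The difference between the two basketball designs can be sketched in Python. The class names, `team_id`, and the use of a jersey number as the player’s partial key are assumptions for illustration: in the identifying version the player’s key includes the team, and a player cannot be added without an existing team; in the non-identifying version the team is optional and outside the key.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BasketballTeam:
    team_id: int
    name: str


@dataclass(frozen=True)
class RecLeaguePlayer:
    """Top ERD: weak entity, identifying relationship."""
    team_id: int        # required, and part of what identifies the player
    jersey_number: int  # hypothetical partial key, unique only within a team
    name: str

    @property
    def key(self) -> tuple[int, int]:
        return (self.team_id, self.jersey_number)


@dataclass(frozen=True)
class ProPlayer:
    """Bottom ERD: not weak, non-identifying relationship."""
    player_id: int
    name: str
    team_id: Optional[int] = None  # a free agent has no team


def add_rec_player(roster: dict, teams: dict, player: RecLeaguePlayer) -> None:
    """Refuse a rec-league player whose team does not exist (existence dependency)."""
    if player.team_id not in teams:
        raise ValueError(f"team {player.team_id} does not exist")
    if player.key in roster:
        raise ValueError(f"duplicate key {player.key}")
    roster[player.key] = player


teams = {1: BasketballTeam(1, "Hawks"), 2: BasketballTeam(2, "Owls")}
roster: dict = {}
add_rec_player(roster, teams, RecLeaguePlayer(1, 23, "Ann"))
add_rec_player(roster, teams, RecLeaguePlayer(2, 23, "Bea"))  # same jersey, other team
free_agent = ProPlayer(7, "Cy")
```

Two rec-league players can share jersey 23 because the team is part of each key, while the professional free agent exists with no team at all.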

ASSOCIATIVE AND COMPOSITE ENTITIES

Weak entities with more than one identifying relationship are called composite or associative entities. These entities cannot exist without two or more other entities. A many-to-many relationship that carries data is also an associative entity.
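As a sketch of the many-to-many case, assume a hypothetical Student/Course relationship that carries a grade (these entities and course codes are invented for illustration). The associative entity `Enrollment` is identified by both participants and holds the relationship’s data, which lets the M:N relationship be read in both directions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Enrollment:
    """Hypothetical associative entity between Student and Course."""
    student_id: int   # identifies the Student participant
    course_code: str  # identifies the Course participant
    grade: str        # the data carried by the relationship

    @property
    def key(self) -> tuple[int, str]:
        return (self.student_id, self.course_code)


enrollments = [
    Enrollment(1, "DB-101", "A"),
    Enrollment(1, "DB-201", "B+"),
    Enrollment(2, "DB-101", "B"),
]


def courses_for(student_id: int) -> list[str]:
    """One student relates to many courses..."""
    return sorted(e.course_code for e in enrollments if e.student_id == student_id)


def students_in(course_code: str) -> list[int]:
    """...and one course relates to many students."""
    return sorted(e.student_id for e in enrollments if e.course_code == course_code)
```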

An example of an associative entity (a many-to-many relationship carrying data):

Example of a ternary (three way) associative entity:

PART 4: A SIMPLE ERD DESIGN METHODOLOGY

The following is a simplified methodology for identifying the requirements and drawing a preliminary ERD from them.

1. Identify the business entities: those things you need to store data about.

2. For each entity, enumerate the attributes: the additional data we care about for each entity.

2.a. For each attribute, be sure to identify the attribute type: Unique, Multi-valued, Composite, Derived, or Required. An attribute can fall into multiple types, such as RU (required and unique).

3. Identify the relationships or business rules required to connect the entities as described by the problem domain. For each relationship / business rule:

3.a. Establish the preliminary cardinality: one-to-one, one-to-many, or many-to-many.

3.b. Establish the participation of each end. For example, on the “one” side, is it zero-or-one or one-and-only-one? On the “many” side, is it zero-or-many or one-or-many?

4. Some final things to work out:

4.a. If your business rule is defined as an “is-a” or “has-a” relationship, then use the sub-type/super-type notation.

4.b. For each one-to-many relationship: Determine whether the relationship is identifying, i.e., whether the entity on the “many” side is existence-dependent on the entity on the “one” side. If so, then the entity on the many side of the relationship is a weak entity, and cannot exist without the entity on the “one” side.

4.c. For each many-to-many relationship: Determine whether the relationship carries data. If so, you need to create an associative entity: a new weak entity whose attributes are the data on the relationship.
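Steps 4.b and 4.c are mechanical enough to sketch in Python once the earlier steps are done. The `Rule` structure, the `finish` function, and the naming of the new associative entity are hypothetical; step 4.a (sub-types) is left out, and the Student/Course rule is an invented example.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """A business rule after steps 3.a and 3.b (hypothetical structure).

    For a 1:M rule, `left` is the "one" side and `right` is the "many" side.
    """
    left: str
    right: str
    kind: str                  # "1:1", "1:M", or "M:N"
    identifying: bool = False  # is the "many" side existence-dependent?
    data: list[str] = field(default_factory=list)  # attributes on the relationship


def finish(rules: list[Rule]) -> tuple[list[str], list[tuple[str, list[str]]]]:
    """Apply steps 4.b and 4.c, returning (weak entities, associative entities)."""
    weak, associative = [], []
    for rule in rules:
        if rule.kind == "1:M" and rule.identifying:
            weak.append(rule.right)  # 4.b: the "many" side becomes a weak entity
        elif rule.kind == "M:N" and rule.data:
            # 4.c: a new associative entity whose attributes are the relationship's data.
            associative.append((f"{rule.left}_{rule.right}", rule.data))
    return weak, associative


rules = [
    Rule("Basketball Team", "Basketball Player", "1:M", identifying=True),
    Rule("Customer", "Order", "1:M"),
    Rule("Student", "Course", "M:N", data=["Grade"]),
]
weak, associative = finish(rules)
```

Here Basketball Player comes out as a weak entity, Customer/Order stays an ordinary one-to-many relationship, and Student/Course gains an associative entity carrying Grade.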
