D@TA NERD
November 2008
Normalisation
ERD Modelling
Data Analysis
And more!
PLUS: Crosswords, Puzzles, Spot The Difference And more!
Got a funny picture? Send us a copy! The funniest one each month will win a prize!
WINNER! Cartoon of the month
CONTENTS
COVER 1,2,3,4 By the Artist
Eimear Duffy
Introduction to Normalisation
Kevin Mallon - here we are
introduced to the normal – A
FIRST FOR MANY OF US AND NORMAL IS SUBJECTIVE
Page 22
IS NORMAL BETTER? - SAM Senior
- Mr SQL Visits Dr Database to
Find Out if He's Normal and looks
at the Yin and Yang of life as a
database
Page 25
Against all odds A Puzzle …Padraic
Lavin – Wow now I am confused.
Page 33
ERD and Distributed Databases
Tanya Polianinova what’s the point
in having a good ERD if you don’t
spread it around Page 28
Speed Test Patrick Crowe sends
the theory around the lap a
couple of times Page 36
Contents & Desk Top Publishing By Patrick Crowe
DATA-NERD Issue 1 November 2008
Mrs. Peacock In The Library ... By Gene Kelly. In
this study Detective Kelly TERD will lay out the Cluedo
plot PAGE 20
PAGE 3...DATA-NERD A Cradle for
our Creativity....... Ian Reston
In this article Ian provides a
practical solution to help the
publishing staff understand what
the hell is going on with the creative
department!
HOW DATA ANALYSIS CAN HELP
YOU TO IMPROVE YOUR SEXUAL
LIFE – Alfredo del Campo our in
house Latin Lover solves all your
problems ...Read it if you need
IT! Page 7
Database Design: Only for
fellows with Mercs? Gary
Gallagher – from the building
site to the gas guzzler or Carroll
to Chen. Gary stresses that
design leads to function. P10
Page 15..Let’s get Physical
....Denis Farrell examines the
logical and the physical, the
Brain or the Brawn!
INTRODUCTION TO ERD
MODELLING ..Fatih Degirmenci –
no better place to start than with
the model man Page 18.
A Cradle for our Creativity
By Ian Retson
Article
In this first Issue of our Data-Attack Magazine we thought what better way to introduce our
readers to the subject than to describe in outline our very own in-house bespoke Cradle
database. This is the key part of our information system; it allows us to spend more time bringing
you interesting Creative articles like this one and less time worrying about the mechanics
required to produce them.
“Genius is one percent inspiration and ninety-nine percent perspiration” [1]
The Cradle is at the core of our steady state organization, driving our business in the creation,
collection and communication of information aimed at you, the Database NERD, and the
wannabe NERD community.
There are separate specialist Publishing and Distribution systems that were purchased as off
the shelf packages. This allowed us to concentrate on our key information system.
The “hands that rock the cradle” [2] or stakeholders were identified initially within the
Inception Phase; this provides us with a Top Down external view of the system and helps us
establish boundaries:
• Our NERD customer (YOU) demands informative, varied format and fun articles
that also communicate the latest trends within the world of databases.
• A free cut-down on-line version of each Magazine Issue is also made available and is
used as a vehicle for registration of extra keen SUPER-NERDS.
• The ACCOUNTANTS (NON-NERD) require that we are cost effective.
• The EDITORIAL (UBER-NERD) staff requires that articles are available for review,
to meet editorial and final production deadlines.
• The JOURNALIST (NERD-SYMPATHISER) requires a repository where they can
lodge their articles and have access to a library of previous contributions from
internal and external sources.
• The NERD in turn is encouraged to provide feedback including contributions
(NERD-SYMPATHISER-NERD).
During the Elaboration Phase the following details were established, providing a bottom-up
view of the system; note the nouns and verbs:
A Magazine is issued on a regular basis, made up of Articles approved by the Editor. Sub-Editors are responsible for individual departments e.g. News, Puzzles, Feedback, etc. An
issue may be categorized as regular, special re-issue or on-line version. An Article is created
from one or more Items contributed by our in-house and external Agency Journalists.
An Item is designated a media type which currently distinguishes between photograph,
illustration and text, but there may be more in the future.
At the moment only one magazine is produced but market conditions permitting we hope to
expand into the OO Modeling world and on to infinity. Our Subscribers are both individuals
and retail shops. Subscribers are encouraged to contribute articles.
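For readers who like to see nouns turn into tables, here is a minimal sketch of part of the description above, using Python's built-in sqlite3 module. It is not the actual Cradle ERD: every table and column name (issue, article, item, article_item, issue_type and so on) is an assumption made for illustration, and the link table is simply one way of expressing "an Article is created from one or more Items".

```python
import sqlite3

# Hypothetical names inferred from the nouns in the business description;
# the real Cradle schema may differ.
SCHEMA = """
CREATE TABLE issue (
    issue_id   INTEGER PRIMARY KEY,
    issue_type TEXT NOT NULL CHECK (issue_type IN ('regular', 'special', 'online'))
);
CREATE TABLE article (
    article_id INTEGER PRIMARY KEY,
    issue_id   INTEGER REFERENCES issue(issue_id),
    title      TEXT NOT NULL,
    approved   INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE item (
    item_id    INTEGER PRIMARY KEY,
    media_type TEXT NOT NULL,   -- photograph, illustration, text, ...
    journalist TEXT NOT NULL
);
-- Link table: an Article is created from one or more Items.
CREATE TABLE article_item (
    article_id INTEGER NOT NULL REFERENCES article(article_id),
    item_id    INTEGER NOT NULL REFERENCES item(item_id),
    PRIMARY KEY (article_id, item_id)
);
"""


def build_cradle_sketch():
    """Create an in-memory database holding the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def items_for_article(conn, article_id):
    """Return (item_id, media_type) pairs for the Items that make up one Article."""
    return conn.execute(
        "SELECT i.item_id, i.media_type FROM item i "
        "JOIN article_item ai ON ai.item_id = i.item_id "
        "WHERE ai.article_id = ? ORDER BY i.item_id",
        (article_id,),
    ).fetchall()
```

Journalists, Agencies, Subscribers and Sub-Editors are left out to keep the sketch short; they would become further tables in the same way.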
Note that we didn’t leave our data experts perspiring in the basement but we embraced them
as an integral part of the ongoing analysis & design and so we avoided the mistake where
“The database team often works on its own without open doors of communication.”[3]
“The foundation of modern database technology is without question the relational model; it is
that foundation that makes the field a science”. [4]
“Design Engineering should always begin with a consideration of data; the foundation for all
other elements of the design”. [5]
Some interesting nerdy points from the Cradle ERD:
• The description of the stakeholders provides us with insight into the boundaries and
scope of the system. The Publishing, Distribution and Accountancy packages are
outside the scope of the Cradle System; however the entities Article, Subscriber and
Staff respectively indicate the genesis of data interfaces between the systems.
• Note the correlation between the nouns in the business description and the entity
names in the ERD. The verbs would normally provide us with the associations or
relationships between the entities, but here they can only be spotted as Foreign Key attributes.
Can you add the association roles to the ERD?
• The main high volume transactional tables are Item followed by Article, which act as
the main system repository; both have numeric primary key constituents for
efficient processing. The remaining tables are Master control tables concerned
with categorizing & grouping the transactions.
• An Article may consist of one or more Items. This promotes parallel activity, allowing
items to be contributed outside of Issue and Article deadlines; the concept also
supports the efficient re-use of items across multiple articles over time.
• The cancelled attribute in Article provides us with the capability of stopping an article
being added to an Issue after it has been approved by the Editor. This allows us to
resurrect the article for future issues and avoids a messy deletion option. How would
a deletion option work? What would be its consequences?
• Editor and Journalist are shown as separate entities since their roles are quite distinct
within the system, i.e. an Editor controls Articles and Issues whereas Journalists
contribute items, but both are subtypes of the Staff Entity. Note that a Journalist
may be external and therefore is not a {complete} subtype [This is a discussion for our
sister Magazine Object-Attack!!!!].
• Further normalization can be achieved, as you may have spotted: Address information
is present in the Agency, Staff and Subscriber entities. How would you rationalize
this into the Diagram?
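The cancelled attribute mentioned above can be sketched in a few lines. This is only an illustration under assumptions: the table layout, the approved column and the helper names are hypothetical and are not taken from the Cradle ERD. The point is simply that cancelling flips a flag rather than removing a row, so the article can be brought back later.

```python
import sqlite3


def make_articles():
    """Create a tiny, hypothetical article table with a cancelled flag."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE article ("
        " article_id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " approved INTEGER NOT NULL DEFAULT 0,"
        " cancelled INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO article (article_id, title, approved) VALUES (?, ?, ?)",
        [(1, "Speed Test", 1), (2, "Is Normal Better?", 1)],
    )
    return conn


def cancel_article(conn, article_id):
    """Hold an approved article back without deleting its row."""
    conn.execute("UPDATE article SET cancelled = 1 WHERE article_id = ?", (article_id,))


def resurrect_article(conn, article_id):
    """Make a previously cancelled article available again."""
    conn.execute("UPDATE article SET cancelled = 0 WHERE article_id = ?", (article_id,))


def publishable_titles(conn):
    """Titles that are approved and not cancelled, in article_id order."""
    rows = conn.execute(
        "SELECT title FROM article WHERE approved = 1 AND cancelled = 0 ORDER BY article_id"
    )
    return [title for (title,) in rows]
```

Because nothing is deleted, any rows in other tables that point at the article stay valid while it is on hold.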
“He who asks question is a fool for five minutes; he who does not ask a question is a fool
forever.” [6]
Answers in next Issue when again more
HOW DATA ANALYSIS CAN HELP YOU TO
IMPROVE YOUR SEXUAL LIFE
Greetings, my dear reader! Now that I’ve got
your attention we can move on to the
fascinating world of Data Analysis. Right
now, you must be wondering: “And what on
earth does Data Analysis have to do with my
sexual life?” Fair enough; keep reading this
article and you will find out for yourself.
First of all, let’s give a definition of Data Analysis: it “is a process of gathering,
modelling, and transforming data with the goal of highlighting useful information,
suggesting conclusions, and supporting decision making. Data analysis has multiple
facets and approaches, encompassing diverse techniques under a variety of names, in
different business, science, and social science domains.” – Wikipedia.
In this article, we will cover what role Data Analysis plays in the design of a project;
the next step will be to talk about how we can collect data and the various techniques
for doing so. Following that point, if you are still with me, we will have an overview of
both quantitative and qualitative data and, most important of all, we will discover the
links between your sexual life and Data Analysis.
Project phases design
Data Analysis is one of multiple steps, and no less important than the others, that belong
to the complex process of Engineering Methodology. These are the different phases that
comprise the design of a project/product: Analysis, Design, Standards and Support.
Data Analysis is in the first phase; its input will be the results of Data Gathering and its
output will be the input for the Conceptual Model and Usability Requirements.
Having said that, everybody agrees that Data Analysis is a useful activity, but in the
real world it is surprisingly common to find cases where collected data is stored but
never analysed.
Data Gathering/Data Collection Techniques
This is the very initial phase of the design. The following are the most common techniques
adopted to gather data: User Interviews, Contextual Enquiry, Personas / Scenarios,
Direct Interviews, Indirect Interviews.
Quantitative data analysis
Here I recommend some useful software programs for analysing quantitative data:
• Epi-info: Covers most of the statistical analyses.
• Minitab: Covers all the basic statistical analyses.
• SPSS: Statistical package.
A brief definition of quantitative research could be: a measure of how many actors (these
can be humans, or anything that interacts with the system under study) act in a particular way.
The collection of data tends to include a large amount of information, i.e. the minimum number
of interviews should be 50. Questionnaires are the most common tool used for this purpose,
normally with closed questions.
Quantitative data analysis strategy:
For describing the participants, we can use the typical descriptive statistics, such as
Frequency counts, Proportions, Measures of central tendency (mean, median, mode), and
Measures of dispersion (standard deviation, inter-quartile range, etc.).
Talking of relationship or association, we can count on Association and Correlation.
If what we are dealing with is comparative studies, we have several techniques to work with,
e.g. Student's t-test, the Mann-Whitney U test, the paired t-test, Analysis of Variance, …
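If you have no statistics package to hand, the descriptive statistics listed above can be tried out with Python's standard library. The following is a minimal sketch, not a replacement for the packages recommended earlier; the function names describe, frequency_counts and proportions are made up for this example, and the inter-quartile range relies on the default quartile method of statistics.quantiles.

```python
import statistics
from collections import Counter


def describe(values):
    """Central tendency and dispersion for a list of numeric answers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "iqr": q3 - q1,                     # inter-quartile range
    }


def frequency_counts(answers):
    """How many times each answer to a closed question was given."""
    return dict(Counter(answers))


def proportions(answers):
    """Share of respondents giving each answer."""
    total = len(answers)
    return {answer: count / total for answer, count in Counter(answers).items()}
```

A typical use would be to call describe on the numeric answers to one question and frequency_counts on the answers to a yes/no question.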
Analysing qualitative research
Some useful software that can help:
• NVIVO: Accumulates data, assigns codes to data and analyses this encoded data numerically.
• Ethnograph: A similar program to NVIVO.
Note for the reader: honestly, did you really
believe that the Analysis of Data could
improve the sexual life of anybody?.... Got ya!
So, what is qualitative research? In market research, it is used to help the observer
understand the motives of people, how they feel and why. For this purpose, the researcher
asks questions such as “why do you...?” to collect detailed information. Compared to quantitative
methods, where the accumulated data is much larger, samples tend to be smaller. Here we
have the most common methods of Data Analysis in Qualitative Research, compiled by
Donald Ratcliff:
Typology, Taxonomy, Constant Comparison/Grounded Theory, Analytic Induction,
Logical Analysis/Matrix Analysis, Quasi-statistics, Event Analysis/Microanalysis,
Metaphorical Analysis, Domain Analysis, Hermeneutical Analysis, Discourse Analysis,
Semiotics, Content Analysis, Phenomenology/Heuristic Analysis, Narrative Analysis.
DATA PRESENTATION
Information processed from a sample can be presented in many ways. Rather than just
giving plain numbers about the central tendency and dispersion, we should look for
friendlier ways of presenting data, such as graphs or charts (e.g. frequency polygon,
histogram, bar/pie), so that people can better see the results of the research.
Database Design:
Only for fellows with Mercs?
GARY GALLAGHER
Liam Carroll is currently one of Ireland’s leading property developers. His company, Zoe Developers,
have built more apartments in Dublin’s inner city than all other builders combined. Carroll is heavily
involved in the high profile Dublin docklands re-development project and is responsible for the
Cherrywood Innovation and Technology Business Park site in Loughlinstown, which houses corporate
giants such as Dell and Friends First.
Carroll’s current standing, however, is a far cry from his initial forays into property development.
Examples of early efforts in 1989 include Fisherman’s Wharf - “a humdrum scheme of townhouses
and apartment blocks” and Portobello Harbour, described as having “no design or functional integrity”
(narrow, lego-like constructions with one room on each floor). All of these early developments share
one startling characteristic – Carroll did not employ architects for their design. Architects, Carroll
claimed, were “only interested in designing penthouses for fellows with Mercs”. It was only with the
introduction of Government apartment design guidelines in 1995, coupled with the prospect of more
complex development schemes, that Carroll finally decided to engage architects to plan and design
properties correctly. This move paid off, catapulting Carroll from his status as ‘the shoebox apartment
king’ to a respected and successful developer responsible for some of the country’s largest residential
and commercial developments.
As you worriedly re-examine the title of this magazine, possibly thinking that you may have picked
up the wrong one in the shop, let me re-assure you that the above anecdote does hold some connection
to Database Design. It is based on a widely accepted saying among database workers that building a
database without a design is akin to building a house without an architect’s blueprint. Before
elaborating, let us first examine what we are talking about when discussing Database Design.
Database design is also referred to as database modelling; however, it has nothing to do with women,
catwalks or lingerie (sorry lads). Fear not though, as some similarities do exist for the more imaginative
of us. Data modelling is essentially a method of organising data so that it can be used effectively by
databases. It is concerned with structuring data in a way that is presentable and placing it in nice neat
packages for processing by the database. It is the first, and some would argue the most important, step in
creating a database.
Webopedia defines data modelling as “the analysis of data entities and their relationships to other data
entities”. An entity, in this case, is any object about which we wish to store information in the database.
Entities are items in the real world that are capable of existing independently. To illustrate this,
think about a computer vendor’s database. Here, in simple terms, you would need to store
information about the vendor’s clients (cust. ID, name, address, tel. #) and about the products that it
sells (model number, spec, price, availability). The entities here are therefore ‘client’ and ‘product’.
Now that you have some idea what it is, you may ask why it’s important enough for us to waste our
time and your money publishing a whole magazine about Database Design. Fair question; let’s try to
demonstrate why it is so useful (obviously the sight of the Portobello Harbour shanties isn’t enough for
you) by looking at another example, this time loosely based around the current scramble to become
GWB’s successor at the White House. In the aftermath of such an election, in-depth analysis would be
carried out on various aspects of the election. For example, knowing the number of people that
voted for each of the political parties would be invaluable. This could be achieved by
including a column in the database from the very beginning recording which party each person voted for. If,
however, this column was omitted at the beginning, it would be very time consuming to collate the
relevant data to get the same result. It is at the database design stage that the decision to include such
a column would be made.
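To see how little work the analysis becomes once the column exists, here is a minimal sketch using Python's built-in sqlite3 module. The vote table, its voter_id and party columns and the party_totals helper are all hypothetical names chosen for this illustration.

```python
import sqlite3


def party_totals(votes):
    """Count votes per party.

    votes is an iterable of (voter_id, party) pairs. Because the party
    column was designed in from the start, one GROUP BY query is enough.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vote (voter_id INTEGER PRIMARY KEY, party TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO vote VALUES (?, ?)", votes)
    rows = conn.execute("SELECT party, COUNT(*) FROM vote GROUP BY party")
    return dict(rows.fetchall())
```

Without the party column there is nothing for the query to group on, which is exactly the time-consuming situation described above.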
The importance of the design stage is equally apparent with even the most basic of databases. You
could say that if a Formula 1 racing car doesn’t have smooth aerodynamics, it will drag and go slower.
Equally, if a database doesn’t adhere to best practices, it won’t perform as efficiently as possible. There
are several methodologies used for creating the ‘perfect’ database. In this edition we focus on what are
widely regarded as the two most effective techniques – the usage of Entity Relationship Diagrams
(ERDs) to assist in matching the business needs of the database to the physical design; and a process of
safeguarding the database from structural problems known as Database Normalisation.
An ERD is essentially a graphic representation of the entities, and the relationships between the entities,
within a database. Although initially introduced in the 1960s by a General Electric engineer, the
development of ERDs is credited to the American scientist Professor Peter Chen. Chen’s original ERD
paper was selected as one of the 38 most influential papers in Computer Science, resulting in his ERD
approach being ranked as one of the top methodologies in systems development by several surveys of
FORTUNE 500 companies. Yes folks, it works. While an ERD is mainly concerned with the
relationships between the entities of a database, the goal of database normalisation is to reduce the
amount of space a database consumes by eliminating unnecessary duplication of data, thus increasing
overall performance. Although often previously overlooked as a complicated process for academic
geniuses, it is now accepted that a grasp of the principles of normalisation can drastically improve
database performance.
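As a small taste of what eliminating unnecessary duplication means, the sketch below splits a flat list of sales into a client table and a sales table, so that each client's name and address is stored once. It reuses the computer vendor example from earlier; the row layout and the function name normalise are assumptions made for this illustration, and real normalisation involves more than this one step.

```python
def normalise(flat_rows):
    """Split flat (cust_id, name, address, model) rows into two tables.

    Returns (clients, sales): clients maps cust_id to (name, address),
    and sales is a list of (cust_id, model) pairs.
    """
    clients = {}
    sales = []
    for cust_id, name, address, model in flat_rows:
        clients[cust_id] = (name, address)  # kept once per client
        sales.append((cust_id, model))      # only the key is repeated
    return clients, sales
```

If a client moves house, the address now needs changing in one place rather than on every sale.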
These methodologies will be explained in more detail as you read on, where their importance will
hopefully become even more apparent. Should their relevance escape you however, you may want to
consider again the following. From the shoebox king to one of the world’s most influential computer
scientists, the basic principle used in creating a database remains: effective planning and design are
essential parts of any project. Without them, the roof might fall in.
Keeping IT Real:
How to use Logical ERD Modelling in Effective Database Design
by Aine Daly
The logical data model is primarily focused on the representation of
REALITY…tangible objects, actual characteristics, bona fide
relationships…these are the fundamentals of logical modelling. It is analytically
structured to reflect the core requirements of a business. The model is independent of
technology and not created with a physical data store in mind. This will come into play in the
next phase of design – the Physical Model.
Systems have both Technological Components
• Program
• Database Management System
• Screen Components
& Technology Independent Components
• Logical Data model
• Business Rules
The logical model concentrates on the needs of the business; there are no
details included about the physical hardware and database technology.
It reveals the business processes and data that exist and reflects the
relationships between the two. The goals of the Logical ERD model are:
a) ESTABLISH INFORMATION/BUSINESS REQUIREMENTS...
..DATA-ENTITIES, RELATIONSHIPS, ATTRIBUTES, CARDINALITY...
b) GRAPHICALLY REPRESENT THESE REQUIREMENTS
….SO THAT THEY MAY BE UNDERSTOOD
COMMUNICATION between the Business/organization and the Database
designer is critical in order to achieve the above objectives. Both may have
different ideas about what the requirements and structure of the database
should be and collaboration ensures that the system developed will fit the
business needs. The Logical ERD can be used as a tool of communication as it
can be easily explained to non-technical clients.
Logical Entity Relationship Diagram Models convey a great deal of
information using a very apt and succinct notation. The components used in
Logical ERD development are: Entities, Relationships and Attributes. Using
these components the logical model identifies entities and the correct
relationships among them. The term unique identifier is used to describe the data
element that differentiates one entity from another. It replaces the
term Primary Key because, once again, it is technology independent,
whereas a Primary Key represents a unique identification of a row in a table
that can be used as a foreign key in a related table.
Normalization is used to remove redundant data and optimize the overall
data structure by grouping the data elements correctly, ensuring that entities
are properly formed and each attribute is assigned to the correct entity. This
systematic process produces a solid database structure which will allow for
data to be stored and retrieved in the most efficient manner. If the correct
data is not captured, problems are sure to follow. If the relevant entities or
relationships are not represented correctly in a data model, then end-user
queries about these entities and relationships cannot be answered.
Regardless of the application that is used in implementation, if you take the
time to carefully build a logical model the result will be a solid foundation for
your database. It is this framework which will dictate the relevance, speed
and efficiency of the final database and an organization’s success when using
it to conduct business. It should also have a positive impact on the cost of
system development, as it resolves problems at an early stage and does not
incorporate redundant data. Figuring out these issues at the design and
database development phase is significantly cheaper than trying to fix a
problem in an implemented system. The next step is the Physical model,
summarised below:
From Logical to Physical:
• The implementation of the logical model in the chosen database structure
• The physical diagram is a platform-specific and more detailed mapping of the
logical model to the physical hardware and database technology
Let’s get Physical
By Denis Farrell
To understand Physical ERD Modelling fully, we have to look at the complete ERD Modelling
picture.
In the design phase of databases, data is represented using a certain data model. These data
models are collections of concepts or notations for describing data, data relationships and
data constraints. Data models are one of:
1. Conceptual models
• Collection of entities.
• Flexible data structuring capabilities.
• Examples of this model are the object-oriented model, the semantic data model and
the entity-relationship model.
2. Record based logical models
• Data is considered as a collection of fixed-size records.
• These models are closer to the physical level or file structure so they are
easier to implement.
• The three most well known models of this kind are the relational data model,
the network data model and the hierarchical data model.
3. Physical models
• Provide concepts that describe the details of how data is stored in the
computer’s memory
It is important to understand how logical and physical models relate to each other and the
differences between them.
Logical
The first stage is to gather all the business requirements for the planned
database and convert these requirements into a model. The logical model
does not look directly at the needs of the database; instead, the business requirements are
used to determine the needs of the database.
After all the business requirements and information are collected, reports and
diagrams are produced together with entity relationship diagrams, business
process diagrams, and eventually process flow diagrams. The diagrams
created should demonstrate the processes and data that exist. They should also
demonstrate the relationship between the data and the business processes.
Logical modelling should provide a clear visual illustration of the activities and
data relevant to a particular business. Logical modelling has implications for
the direction of the design of the database; it also indirectly affects
the performance and administration of an implemented database. If time is
taken to perform logical modelling, more opportunities arise for planning the
design of the physical database.
Logical modelling produces diagrams and documentation which help determine
whether or not the business requirements have been completely gathered.
This information is then reviewed by developers, management and end
users to decide if more research and work is required before the
commencement of the physical modelling.
From Logical Modelling we expect to get the following deliverables.
• Entity relationship diagrams
These give the development team the initial picture of what the database needs to deliver.
They show the different categories of data for the business and how they relate to
each other.
• Business process diagrams
The process model illustrates all the parent and child processes that are performed
by individuals within a company. This shows the development team how data moves
within the business.
• User feedback documentation
Physical Modelling
Physical modelling relates to the actual design of a database. It is a cost-effective
and practical tool for problem solving and design optimisation. The
requirements that were recognised in the logical model set out the basis for
the design of the database. The physical model deals with converting the
requirements gathered in the logical model into a relational database model.
Throughout physical modelling, objects such as tables and columns are
created, based on the entities and attributes defined in the logical
model. Constraints are also defined at this stage, including the primary keys,
foreign keys and other unique keys. Views can be created from database tables
to summarise data. All the pieces are brought together in the physical
model and this defines the database for the business.
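The objects named above (tables, columns, primary keys, foreign keys, unique keys and a summary view) can be seen together in a minimal sketch using Python's built-in sqlite3 module. Every name here (client, product, purchase, client_spend and their columns) is hypothetical and chosen only for illustration. SQLite needs the foreign_keys pragma switched on before it enforces foreign keys, which is itself a small example of physical models being software specific.

```python
import sqlite3

# Hypothetical physical model: two entity tables, one linking table and a view.
PHYSICAL_MODEL = """
CREATE TABLE client (
    cust_id INTEGER PRIMARY KEY,   -- primary key
    name    TEXT NOT NULL,
    tel     TEXT UNIQUE            -- another unique key
);
CREATE TABLE product (
    model_number TEXT PRIMARY KEY,
    price        REAL NOT NULL
);
CREATE TABLE purchase (
    purchase_id  INTEGER PRIMARY KEY,
    cust_id      INTEGER NOT NULL REFERENCES client(cust_id),        -- foreign key
    model_number TEXT NOT NULL REFERENCES product(model_number)      -- foreign key
);
-- A view that summarises spend per client.
CREATE VIEW client_spend AS
    SELECT c.name AS name, SUM(p.price) AS total
    FROM purchase pu
    JOIN client c ON c.cust_id = pu.cust_id
    JOIN product p ON p.model_number = pu.model_number
    GROUP BY c.cust_id;
"""


def build_physical_model():
    """Create the sketch schema in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK checks
    conn.executescript(PHYSICAL_MODEL)
    return conn
```

With the constraints in place, the database itself rejects a purchase that refers to a client who does not exist, rather than leaving that check to application code.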
One restriction of physical modelling is that it is software specific. This means
that the objects defined in the physical model can vary depending on the relational
database software being used. Variations exist in the way the data types are
represented and stored. Conceptually, the basic types of data are the same, with
different implementations. Database systems also differ in that objects
available in one may not be available in another, and as a result
physical models are hardware and software dependent. Oracle is an example of
software that will work with many operating systems, such as Windows NT and
UNIX. Java-based products can be used on virtually all operating platforms,
hence their popularity. So when choosing database software, hardware and
operating system platforms, these need to be looked at in conjunction with
one another.
From physical modelling we expect to get the following deliverables.
• Server model diagrams
This diagram demonstrates relationships within a database, and shows tables and
columns.
• User feedback documentation
• Database design documentation
INTRODUCTION TO ERD MODELLING
By Fatih Degirmenci
One of the most painful problems of database design is the differing views of designers,
programmers, and users, which lead to useless databases or databases that do not
reflect their actual purpose. Data Modelling is the first step of the Database Design Process
and sits between real world objects and the database model. To keep everyone involved and
aware of the design, it is necessary to use a method that simplifies the design process. Entity
Relationship Diagram Modelling is a method that removes potential roadblocks and simplifies
the database design process.
DATABASE DESIGN AND ERD MODELLING
Database design is a software engineering activity that falls within the design activity of the
generic software engineering process.
The database design process consists of a number of steps including identifying the data to be stored,
determining relationships between stored data, and structuring data. [1]
Modelling is an intermediary step that falls between requirements gathering and
construction, and ERD Modelling is a widely used modelling schema for this purpose. It allows us
to produce an abstract, notional representation of structured data, using a conceptual schema to
design the database. It is a general data modelling type for relational databases, which helps to
simplify the design process. [2]
Some of the key terms of ERD Modelling are described by Peter Chen as below:
“An entity is a “thing” which can be distinctly identified. A specific person, company, or event
is an example of an entity. A relationship is an association among entities.” [3]
There are several types of ERD Modelling; a widely used type was developed
by Peter Chen. In Chen’s ERD Modelling, entities are represented by rectangles and the entity name,
expressed in singular form, is placed inside the rectangle. [4]
[Figure: the entity student drawn as a rectangle]
Entity attributes are not shown on the ERD itself in the original Chen model, but the model has been
extended to include attributes. An attribute preceded by an asterisk is the identifier of the entity. [4]
[Figure: entity student with attributes *sId, name, address, telephone]
Relationships show how two or more entities are related to each other, expressed as verbs, for
example student submits assignment. In this example, student and assignment are entities and
submits is the relationship.
[Figure: Chen notation, student - submits - assignment]
There are several other notations which can be used to draw ERDs, and one of the most widely used
is Crow’s foot notation. [1] If we redraw the above example using this notation, we
have the diagram below.
[Figure: Crow’s foot notation, student - submits - assignment]
Relationships can take several forms: one-to-one, one-to-many, and many-to-many. In a one-to-one
relationship, one entity is related to only one other entity. In the previous example, a student is related
to one assignment, which shows a one-to-one relationship. In the real world, a student may submit more
than one assignment, and this is a good opportunity to show a one-to-many relationship. In this
case, the relationship can be redrawn as below.
[Figure: Crow’s foot notation, student - submits - assignment, drawn as one-to-many]
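The one-to-many version of student submits assignment can also be written down as tables. This is a minimal sketch using Python's built-in sqlite3 module: the student attributes (sId, name, address, telephone) come from the example above, while the assignment columns aId and title and the helper function names are assumptions added for illustration. The "many" side carries the identifier of the "one" side as a foreign key.

```python
import sqlite3


def build_student_db():
    """Create student and assignment tables linked one-to-many."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE student (
            sId       INTEGER PRIMARY KEY,   -- the *sId identifier
            name      TEXT NOT NULL,
            address   TEXT,
            telephone TEXT
        );
        CREATE TABLE assignment (
            aId   INTEGER PRIMARY KEY,       -- hypothetical identifier
            title TEXT NOT NULL,             -- hypothetical attribute
            sId   INTEGER NOT NULL REFERENCES student(sId)  -- "submits"
        );
    """)
    return conn


def assignments_submitted_by(conn, s_id):
    """Titles of all assignments submitted by one student."""
    rows = conn.execute(
        "SELECT title FROM assignment WHERE sId = ? ORDER BY aId", (s_id,)
    )
    return [title for (title,) in rows]
```

Each assignment row points at exactly one student, while any number of assignment rows may point at the same student.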
The completed ERD shows the overall plan of the database and is called the logical ERD. Database
designers need to be aware of the logical ERD. In DBMS terms, realization is done in the physical ERD
schema.
In database design, communication with end users is an important step in gathering the requirements of
the database and reaching a common view of real world entities. When data modelling starts,
differences between end users’ views and developers’ views become the main problem. This problem
lies in the developer’s hands and can be solved if the developer creates a data model that can be
understood by the end user. ERD Modelling is useful when users need to know more about the design and
developers need to explain design aspects to users. This type of schema gives users
and developers the chance to share a common view of the data and knowledge of how database design
issues can be handled.
REFERENCES
[1] “Entity-relationship model - Wikipedia, the free encyclopedia”; http://en.wikipedia.org/wiki/Entity-relationship_model.
[2] S. Bagui and R. Earp, Database Design Using Entity-relationship Diagrams, Auerbach Publications, 2003.
[3] P.P.S. Chen, “The entity-relationship model—toward a unified view of data,” ACM Transactions on Database Systems (TODS), vol. 1, 1976, pp. 9-36.
[4] J.L. Harrington, Relational Database Design Clearly Explained, Morgan Kaufmann Publishers, 2002.
Mrs. Peacock in the Library
By: Gene Kelly
Mrs. Peacock In The Library With The Candle Stick?
Dr. Black Murdered!
Dr. John Black (48), self-made millionaire, hosted a weekend celebration at his country
mansion to mark the 30th anniversary of his company, DBD Inc. Suspicions first arose when Dr. Black was
nowhere to be seen in the drawing room for pre-dinner drinks on Saturday night. By the time desserts were
being served there was still no sign of Dr. Black, and Mrs. White, his maid of 25 years, now feeling a little worried,
went to Dr. Black’s room to look for him. Just as she was about to knock on his door, she heard a scream echo
from what appeared to be the kitchen. This was abruptly followed by another scream coming from the
entrance hall. Mrs. White went to investigate…
Black’s Tudor Mansion, built in 1586
When Mrs. White reached the bottom of the stairs she
was met by Mrs. Peacock, who was being comforted by
Reverend Green. They were both standing beside a pool
of blood which had been smeared across the carpet. Mrs.
White felt a strange feeling in her stomach; she wasn’t
sure if it was worry or hope. She continued to the kitchen
to find the source of the first scream. In the kitchen she
was met by Miss Scarlet, who was standing by the cold
room with her hand on the door. Mrs. White was closely
followed by Professor Plum, who had also come to find the
source of the scream. They both looked into the open cold
room to find the body of Dr. Black.
Mrs. White ran to the nearest telephone, which was in the Lounge. She called
the local police station and informed them of the news; they would send
someone over right away… As Mrs. White made her way back to the others,
she passed through the Billiards Room, where she met Colonel Mustard
sitting in a leather armchair, swirling his snifter of cognac with one hand
while holding his wooden pipe with the other, apparently oblivious to the
happenings in the rest of the house. Mrs. White told Colonel Mustard about
the body and led him through the conservatory into the ballroom, where the
rest of the guests had gathered. When Mrs. White arrived in the ballroom,
she noticed that one of the bronze candle sticks that stood by the fireplace
was missing. Just as she was about to point this out, a knock!
Mrs. White went to answer the front door, where she was met by Mr. Parker, the
local police officer, and another man whom she did not recognise. Mr. Parker
introduced the other man as Dr. Peter Chen, who was visiting from Louisiana State
University to help update the methods used to collect police data. With that, Dr. Chen
proclaimed, “Don't worry, Mrs. White, I'm on the CASE!”
Introduction to Normalisation
By Kevin Mallon
Normalisation is the process of organising data in a database. The goal of data
normalisation is to reduce and, if possible, eliminate data redundancy. This is an
important consideration for application developers because it is incredibly difficult to
store objects in a relational database that maintains the same information in several
places. Redundant data also wastes disk space and creates maintenance problems.
The main reason for normalising is to prevent the corruption of databases through three
kinds of anomaly: insertion anomalies, deletion anomalies and update anomalies.
[Figure: Why Normalise? Insertion Anomalies, Deletion Anomalies, Update Anomalies]
Normalisation can also be referred to as canonical synthesis, as this is the process of
designing a database model without redundant data items. Well-normalised data
makes the task of programming a lot easier and works very well in multi-platform,
enterprise-wide environments. Data Normalisation is sometimes known as the cure
for Spreadsheet Syndrome, the lumping of every possible piece of information into as
few tables as possible, sometimes into a single table.
[Figure: Normalisation Concepts and Spreadsheet Syndrome]
The original concept of database normalisation was introduced by Edgar Frank Codd
in 1970 in his paper “A Relational Model of Data for Large Shared Data Banks”. In
this paper, Codd states that “there is, in fact, a very simple elimination procedure which
we shall call normalization.” Through decomposition, non-simple domains are replaced
by “domains whose elements are atomic (non-decomposable) values.”
There are a few rules for database normalisation. Each rule is called a "normal form."
If the first rule is observed, the database is said to be in "first normal form." 1NF is
often referred to as the atomic rule. In a database, this means that each column should
be designed to hold one and only one piece of information. If the first three
rules are observed, the database is considered to be in "third normal form." Although
other levels of normalisation are possible, third normal form is considered the highest
level necessary for most applications. The concept of functional dependencies is the
basis for the first three normal forms. A functional dependency occurs when one
attribute in a relation uniquely determines another attribute. This can be written A ->
B, which is the same as stating "B is functionally dependent upon A." The table
below shows the three most common forms of normalisation.
Level                        Rule
First Normal Form (1NF)      An entity type is in 1NF when it contains no repeating
                             groups of data.
Second Normal Form (2NF)     An entity type is in 2NF when it is in 1NF and when all of
                             its non-key attributes are fully dependent on its primary key.
Third Normal Form (3NF)      An entity type is in 3NF when it is in 2NF and when all of
                             its attributes are directly dependent on the primary key.
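The functional dependency A -> B described above can be checked against sample data. The sketch below is only an illustration: the function name holds and the student rows are assumptions made for the example. A check on sample rows can show that a dependency is violated, but it cannot prove that the dependency holds for all possible data.

```python
def holds(rows, a, b):
    """Return True if, in these rows, each value of attribute a maps to one value of b."""
    seen = {}
    for row in rows:
        key, value = row[a], row[b]
        # setdefault stores the first value seen for this key and returns it afterwards
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data: one student enrolled on two courses.
students = [
    {"sId": 1, "name": "Ann", "course": "Databases"},
    {"sId": 1, "name": "Ann", "course": "Networks"},
    {"sId": 2, "name": "Bob", "course": "Databases"},
]
sid_determines_name = holds(students, "sId", "name")      # sId -> name
sid_determines_course = holds(students, "sId", "course")  # sId -> course
print(sid_determines_name, sid_determines_course)
```

Here sId determines name but not course, which suggests that course belongs in a separate table keyed on both sId and course.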
SOLUTION TO: Puzzle Page 34
[Figure: completed crossword grid]
Is Normal Better?
By: Sam Senior
Mr SQL Visits Dr Database to Find Out if He's Normal...
Mr SQL: Dr Database, I am not sure if I am Normal or not. Can you help me?
Dr Database: Well, Mr SQL, do you feel atomic?
Mr SQL: Not sure what you mean?
Dr Database: Well, a Normalised database has atomic data. Think of an atom. In other words, the data
can't be broken down any more. For example, first name can't be broken down any more.
Mr SQL: I'm just a raw, Unnormalised database.
Dr Database: Do you feel any anomalies?
Mr SQL: Oh, yes, plenty Doc. I have inconsistent data and my CPU's very hot and overloaded.
Also, I feel so bloated and large...must be all the redundant data I have.
Dr Database: Sounds like you have an acute case of Spreadsheet Syndrome. Well, I guess you need to be
Normalised. I will outline the basic plan...
Three Normal Forms later...
Mr SQL: Wow! I followed the plan of decomposing tables into more tables and can feel the redundant data
just slipping away.
Dr Database: As I predicted, you now have no duplicated data due to decreased redundancy.
Mr SQL: My CPU is a lot cooler but when people query me it takes me longer to respond because of the table
JOINs.
Dr Database: Well, we could Denormalise you a bit.
Mr SQL: Denormalise? But I spent ages trying to Normalise! Why would I want to do that?
Dr Database: Well, it's not all black and white. Hear me out...
What are the advantages of Normalisation?
Since there is no duplication in a Normalised database there
will be few or no anomalies. This means little to no
administration to ensure that the redundant data is
accurate and up-to-date. In addition, little or no
redundant data means lower storage requirements. A
simpler, more efficient structure also means the database
is more scalable. Also, write actions such as INSERT,
UPDATE and APPEND, i.e. writing to the database, will
run better.
However, it's not all good...
As the table count increases during the Normalisation
process so too does the JOIN count. If the database is large
then JOIN jungles can be created which can eventually
affect response times.
What can be done to improve performance?
Improve the Normalisation design so that it reflects the
data usage; create indexes for frequently queried
attributes; use clustering; or just accept poor performance.
However, if the users still complain…Denormalise!
Denormalisation is part of the physical design phase and
can only be done after the data has been Normalised.
ANOMALY WARNING: DO NOT DENORMALISE
UNNORMALISED/RAW DATABASES!
Question: don't read any further. What do you think
Denormalisation means and why would a SQL
administrator do it?
“Denormalisation is the design process of taking
normalised data and producing a physical design in
which normalised data is rearranged so that optimal
access and manipulation of data can be achieved.”
[Inmon]
Normalised Database Example
CUSTOMER
CustomerNum, CustomerName...
CUST_PHONE
CustomerNum, Phone
Denormalised Database Example
CUSTOMER
CustomerNum, CustomerName, Phone1, Phone2, Phone3...
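The two CUSTOMER layouts can be compared side by side in a small sketch using Python's built-in sqlite3 module. The table name CUSTOMER_DENORM, the customer name and the phone numbers are assumptions made for the example; only the column names come from the article. The normalised layout needs a JOIN to list phone numbers, while the denormalised layout reads one row but is limited to three phones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalised: one row per phone number in its own table
    CREATE TABLE CUSTOMER (CustomerNum INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE CUST_PHONE (CustomerNum INTEGER, Phone TEXT);
    -- Denormalised: phone numbers folded into fixed columns (hypothetical table name)
    CREATE TABLE CUSTOMER_DENORM (
        CustomerNum INTEGER PRIMARY KEY, CustomerName TEXT,
        Phone1 TEXT, Phone2 TEXT, Phone3 TEXT);
    INSERT INTO CUSTOMER VALUES (1, 'Acme');
    INSERT INTO CUST_PHONE VALUES (1, '555-0101');
    INSERT INTO CUST_PHONE VALUES (1, '555-0102');
    INSERT INTO CUSTOMER_DENORM VALUES (1, 'Acme', '555-0101', '555-0102', NULL);
""")

# The normalised read needs a JOIN across two tables.
normalised = conn.execute(
    "SELECT c.CustomerName, p.Phone FROM CUSTOMER c"
    " JOIN CUST_PHONE p ON p.CustomerNum = c.CustomerNum ORDER BY p.Phone"
).fetchall()

# The denormalised read touches a single table and needs no JOIN.
denormalised = conn.execute(
    "SELECT CustomerName, Phone1, Phone2, Phone3 FROM CUSTOMER_DENORM"
).fetchall()
print(normalised)
print(denormalised)
```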
Here are some reasons why a database administrator
would contemplate using Denormalisation.
• To store calculated values. A Normalised
design allows no calculated values. For example, an
online shopping cart may have a field
called total_price, price * quantity,
which is forbidden by the Third
Normal Form. Information
Warehouses use large numbers of pre-calculated
summary tables known as
Materialised Views. This improves
response times for summary data, i.e. no
complex calculations are required because a
pre-calculated result on a summary table
is queried.
• The key reason: performance, to
avoid JOIN jungles. A Normalised
database must locate the relevant tables
and then JOIN the data to either get the
information or process the data. Thus a
Normalised database uses a higher
amount of I/O and CPU. In addition,
Relational DBMSs are optimised to
perform three-way joins, therefore the
database loses efficiency when more
complex joins are required. The outcome
of Denormalisation is better response
times, i.e. reduced I/O and CPU. For
systems that depend on real-time
information Denormalisation may be
required.
• To maintain historical data. For example,
a salesperson's surname may change, and
if the surname is stored only once in a
Normalised database any invoice report
will list only the current surname.
However, if the surname is also stored in the
invoices table as redundant
data then both the old and new surnames will appear in
the report.
• For specific application requirements.
Application coding could be simpler
because the data is spread across fewer
tables and easier to locate.
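The total_price example from the list above can be sketched with Python's built-in sqlite3 module. The table name cart_line and the sample values are assumptions made for the example. The stored total saves a calculation at read time, but it is redundant data, so an update that forgets it leaves the stored and derived values in disagreement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- total_price duplicates price * quantity (hypothetical table and values)
    CREATE TABLE cart_line (item TEXT, price REAL, quantity INTEGER, total_price REAL);
    INSERT INTO cart_line VALUES ('monitor', 100.0, 2, 200.0);
""")

# An update that changes quantity but forgets the redundant column.
conn.execute("UPDATE cart_line SET quantity = 3 WHERE item = 'monitor'")

stored, derived = conn.execute(
    "SELECT total_price, price * quantity FROM cart_line"
).fetchone()
print(stored, derived)
```

The stored value is still 200.0 while the derived value is 300.0, which is the kind of update anomaly that Denormalisation reintroduces and that has to be managed by extra administration.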
What tools can be used to Denormalise?
To reduce the number of tables/joins it is
important to analyse which entities are accessed
by applications and how they relate to each
other. This can be achieved by using Entity
Relationship Diagrams, Data Flow Diagrams
and Cross-Reference Matrices to identify
database usage.
Disadvantages...
The key risk of Denormalisation is anomalies
caused by redundant data. Tracking the
redundant data will require extra
administrative effort.
Like everything in life, there's a balance, Yin and
Yang, et cetera...
There’s a happy medium between Normalisation and
Denormalisation but both require a complete
understanding of the data and the specific business
requirements.
ERD and Distributed Databases
By: Tanya Polianinova
Distributed databases are widely used by many companies for data storage and
manipulation. The next few paragraphs will explain the concepts of
Distributed databases and will describe the principles behind the Entity Relationship
Diagram. The advantages and disadvantages of both will be discussed in detail,
along with a description of each.
History
Databases have been in use since the early days of electronic computing.
Around the 1970s the Distributed Database concept was introduced, and since then a
variety of organisations worldwide have used distributed databases for data storage. Around the
same time, the Entity Relationship model was introduced by Peter Chen (1976), following
Charles Bachman’s earlier data structure diagrams.
ERDs are used in the design of different databases and can serve as a ‘foundation’ for
database development and planning.
Distributed Databases
A database is a collection of different data that is
stored on a computerised system. Data is stored,
created, organised and sorted, manipulated and
retrieved by using different software programs or a
Database Management System (DBMS) and a variety of
query languages, such as SQL.
A Distributed Database is a database that stores data in different locations on the
network, which can be in different geographical locations. It is controlled by a
DBMS and allows multiple users to access and manipulate data without interfering
with each other. In other words, although the data is spread across locations, the user sees
the database as a centralised system with data stored in one place.
Data is spread across sites in fragments, and the same data may be re-created at more than one
site. Different forms of data distribution can be used to spread data across.
Data can be replicated, where copies of the
same data are kept in many different locations.
Data can also be Horizontally or Vertically
Fragmented. With Horizontal fragmentation, the
rows of a table are distributed across different sites, whereas
with Vertical fragmentation the data is split by
columns across multiple systems. Sometimes data
can be reorganised; in other words, the data is
manipulated in some way, for example
summarised, and then stored. The last method
of data distribution is known as Separate Schema, in which the data is kept in
different databases so that different systems can access and use the data with the
help of different programs and interfaces.
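Horizontal and vertical fragmentation can be illustrated with plain Python lists. This is a minimal sketch: the customer rows, the site names and the choice of city as the splitting rule are all assumptions made for the example, and a real DBMS would handle placement and reassembly itself.

```python
# Hypothetical customer table held as a list of rows.
customers = [
    {"id": 1, "name": "Ann", "city": "Dublin", "balance": 120},
    {"id": 2, "name": "Bob", "city": "Cork", "balance": 75},
    {"id": 3, "name": "Cat", "city": "Dublin", "balance": 0},
]

# Horizontal fragmentation: each site keeps whole rows, chosen by a rule.
dublin_site = [row for row in customers if row["city"] == "Dublin"]
cork_site = [row for row in customers if row["city"] == "Cork"]

# Vertical fragmentation: each site keeps some columns, plus the key so
# that the original rows can be rebuilt by joining on "id".
contact_site = [{"id": r["id"], "name": r["name"], "city": r["city"]} for r in customers]
account_site = [{"id": r["id"], "balance": r["balance"]} for r in customers]

# Rebuild the full rows from the two vertical fragments.
balances = {r["id"]: r["balance"] for r in account_site}
rebuilt = [dict(r, balance=balances[r["id"]]) for r in contact_site]
print(rebuilt == customers)
```

Keeping the key in every vertical fragment is what makes the rebuild possible; without it the two halves of a row could not be matched up again.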
Data in a Distributed Database is regularly
synchronised to ensure that all of the data
is up-to-date. Data synchronisation is done by
using timestamps. Every time data in the
database is created or updated, a timestamp is
recorded with the date and time of that update. The system then compares timestamps to
see whether the data has been modified since the previous synchronisation, and
updates the data if required.
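The timestamp comparison described above can be sketched as a simple "newest version wins" rule. This is only one possible policy and an assumption of the example, as are the function name synchronise, the integer timestamps and the sample records; real systems also have to deal with clock differences and conflicting edits.

```python
def synchronise(site_a, site_b):
    """Copy the newer version of each record to both sites (newest timestamp wins)."""
    for key in set(site_a) | set(site_b):
        # Collect whichever versions of this record exist at the two sites.
        candidates = [site[key] for site in (site_a, site_b) if key in site]
        newest = max(candidates, key=lambda record: record["updated"])
        site_a[key] = newest
        site_b[key] = newest


# Hypothetical records keyed by customer id; "updated" is a simple counter.
dublin = {
    "c1": {"phone": "555-0101", "updated": 1},
    "c2": {"phone": "555-0200", "updated": 5},
}
cork = {"c1": {"phone": "555-0199", "updated": 3}}

synchronise(dublin, cork)
print(dublin == cork, dublin["c1"]["phone"])
```

After the call both sites hold the same data: the newer phone number for c1 has replaced the older one in Dublin, and c2 has been copied to Cork.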
A Distributed Database is designed in such a way that the user sees the database as a
centralised system, rather than a system with data spread across multiple
locations. Although a Distributed Database has a very complex design, can be costly to
create and has very high security requirements, it has many benefits. Those
benefits include reduced network traffic, as the server or network is not used for most of
the database activities, improved data manipulation time, reliability and availability.
ERDs for Distributed Databases
An Entity Relationship Diagram or ERD is used to graphically represent the entities (tables
or objects) of a database and the
relationships between these entities. An ERD
shows data flows and interactions between
different objects, which are linked together
by using unique identifiers or primary
keys.
Each entity in an ERD represents an object of
some kind, e.g. a student or person, which is
accompanied by its attributes, for
example ID, Name, Date of Birth,
Address, etc. The entities interact with
each other through relationships, e.g. a student is assigned to a group. Sometimes the
relationship defines the number of entities with which the object interacts, e.g. many
students can be assigned to one group.
ERDs are easy to use and create, and are good as a communication tool. An ERD can be used
as the foundation for the database design and structure. It is important, as it represents
the structure and behaviour of the system or the user requirements. It can be used as
an element of the planning and development processes. Although an ERD can be a weak tool
for representing specifications and data descriptions and can even cause a loss of
information, it has an advantage over other methods of representing database
structure, as it comes in a graphical form. This allows people without any
specific technical skills to understand how the database works. This is a very useful
characteristic, as database design can be very complex and difficult to understand.
Giammarco Schisani
19th of October 2008
ERD Puzzle
Fill in the blanks
By: Giammarco Schisani
Instructions
Given the following description of an Entity Relationship Diagram, fill in the blanks in the Puzzle below.
Entity Relationship Diagrams
A relational __(10)__ can be modelled using __(7)__ Relationship Diagrams (or ER Diagrams). Such diagrams are capable of
describing the main components of an Entity Relationship __(6)__: entities and __(2)__.
An entity describes something that can be uniquely identified, such as:
• An __(12)__ in an e-__(13)__ website;
• A customer in an e-commerce __(9)__;
• A product in an e-commerce website;
• A __(11)__ of products in an e-commerce website (e.g. “Monitors”, “Printers”, etc.).
Entities can often be described by a __(4)__ (e.g. “order”, “customer”, etc.).
In an ER __(5 down)__, an entity is described with a box:
[Figure: a box labelled Order]
A relationship describes how two or more entities relate to each other. Relationships can often be described by a __(8)__. For
example:
• “Places”: A customer places an order;
In an ER diagram, a relationship is described by a __(5 across)__:
[Figure: Customer, Places, Order]
Both entities and relations can have attributes. An attribute represents information about the entity or relationship. For
example:
• An “order” entity might have an “ID” __(1)__, that uniquely identifies the order;
• A “Customer” entity might have “Name” and “Surname” attributes;
• A “Places” relationship between a “Customer” and an “Order” entity might have a “Date” attribute indicating
when the order has been placed.
In an ER Diagram, an attribute is represented by an __(3)__:
[Figure: Customer with attributes Firstname and Surname, Order with attribute ID, and Places with attribute Date]
See Page 26 for Solution
[Figure: blank crossword grid with entries numbered 1 to 13]
Puzzle 1: Against all odds By Paraic Lavin
You work in a small company as a database administrator earning lots of money. The tables below
(A, B & C) have been designed by three different colleagues who work in another division. Their boss
has asked you to check them in order to prevent future problems with efficiency, etc. Can you spot the odd
table out?
Table A
[Figure 1: Table A]
Did you know? #1
Data should be presented in table format.
Table B
[Figure 2: Table B]
Did you know? #2
Data should be accessible without ambiguity.
Table C
[Figure 3: Table C]
Did you know? #3
INSERT, DELETE, UPDATE commands must be supported by use of a single command.
Puzzle 2: Deleting for good not for evil
Puzzle 2A – “The Adventures of Dataman”
You are “Dataman”, a superhero with a penchant for whiskey who recognises bad design as evil
in database tables. Can you remove one column from the following table in Figure 4 so that removing
the column converts the table into first normal form (1NF) and save the world from evil yet again?
Table D
[Figure 5: Table D]
Did you know? #5
Physical changes to the data store should not affect the logical database structure.
Puzzle 2B - “Dataman Returns”
Al-primary-key-da have attacked western financial markets by introducing bad design into one critical
database table. Governments across the world have said they will guarantee all affected tables but the
public fears that it is not enough. Can you delete one column and save the world yet again from
financial ruin?
Table E
[Figure 6: Table E]
Did you know? #6
Constraints must exist to preserve data integrity.
Table F
[Figure 7: Table F]
Did you know? #7
Codd's 12 rules are really 13 rules because they are numbered 0 to 12.
Answers:
Puzzle 1 Against all odds:
The answer is Table A. Although none of the tables is fully normalised, Table A is clearly not
normalised at all as it has repeating information, i.e. Class_1, Class_2, Class_3. Should two of these
columns be deleted in favour of one “Class” column, the table would be in 1NF (First Normal Form).
Puzzle 2: Deleting for good:
Puzzle 2A – Delete column FavColour or FavColour2. Either answer is correct.
Puzzle 2B – Delete column CustomerName from Table E as this information is duplicated in Table F.
The Need for Speed - War of The fields
By Patrick Crowe
In this edition of DATA-Nerd we take the chance to get out of the classroom and
take a couple of laps under the clock. In this practical I examine whether the theory
regarding the correct definition of database fields really matters for
performance and, if it does, whether it makes a real difference out in the real
world.
Objective
To examine the difference in performance between two databases identical in all
respects except the field type for one column, which was declared as INT in one database
and NCHAR in the second. The column in question was used to contain numbers only.
The Test
All operations were executed using queries in MS SQL Server Management Express. The results were
obtained using the Client Statistics functionality in the same application.
The DATABASES
DATABASE: Speed_Test
Column Name     Data Type     Allow Nulls
NUMBER_INT      int           Checked
Letter          nchar(10)     Checked
WOTW            text          Checked
DATABASE: Speed_Test2
Column Name     Data Type     Allow Nulls
NUMBER_nchar    nchar(100)    Checked
Letter          nchar(10)     Checked
WOTW            text          Checked
Contents of Database
Column Name                  Content
NUMBER_INT/NUMBER_nchar      Number from 1 to 535294
Letter                       A
WOTW                         The first paragraph from War of the Worlds by H.G. Wells, 1898 (source:
                             http://www.bartleby.com/1002/101.html), 230 words, 1331 characters.
The databases contained 535294 rows after population.
TEST 1 – BULK INSERT
To test the bulk import speed, the data was imported from a comma-separated (CSV) text file using the following:
BULK INSERT Test_Table FROM 'c:\test2.csv' WITH (FIELDTERMINATOR = ',')
RESULTS
Database               Total Execution Time (ms)
Speed_Test (INT)       436437
Speed_Test2 (nchar)    242875
Difference             193562
TEST 2 – Simple select
The following select was used to return all rows of the
database.
For Database: Speed_Test
Select * from [Test_Table] Where NUMBER_INT > 0
For Database: Speed_Test2
Select * from [Test_Table] Where NUMBER_nchar > 0
RESULTS
The test was run 4 times for each database and the results are in milliseconds.
Database       Test 1    Test 2    Test 3    Test 4    Average
Speed Test     15734     15062     14156     14750     14925.5
Speed Test2    194406    213265    209062    244390    215280.8
Difference                                             200355.3
[Figure: bar chart of execution time in ms (0 to 300000) for SpeedTest, SpeedTest2 and Difference across tests 1 to 4]
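The averages in the results table can be re-derived from the four runs. The short sketch below uses only the figures reported above; the variable names are chosen for the example. It also works out how many times slower the nchar(100) select was on average, a ratio the table does not state directly.

```python
# Execution times in milliseconds, copied from the results table.
speed_test = [15734, 15062, 14156, 14750]        # INT column
speed_test2 = [194406, 213265, 209062, 244390]   # nchar(100) column

avg_int = sum(speed_test) / len(speed_test)
avg_nchar = sum(speed_test2) / len(speed_test2)
difference = avg_nchar - avg_int
ratio = avg_nchar / avg_int  # how many times slower the nchar select was

print(avg_int, avg_nchar, difference, round(ratio, 1))
```

On these figures the select against the nchar(100) column took roughly 14 times as long as the same select against the INT column.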
Conclusions
It is clear from the test results that, in this particular environment, the declaration of a numeric
field has a significant effect on performance. As part of the overall design of a database, care should be
taken when choosing between numeric and character types to help optimise performance.
The Environment
Hardware                               Lenovo ThinkPad R61
CPU                                    Core 2 Duo T8100 @ 2.10 GHz
Memory                                 RAM 4GB
Disk Space (at start of Speed Test)    142 GB, 84MB free
Software
Operating System                       Windows XP Professional 2002 Service Pack 2
Database                               Microsoft SQL Server 2005 Standard Edition, Version 9.00.1399.06
Database Management                    MS SQL Server Management Express, Version 9.00.2047.00
Other Software                         MS EXCEL, Google Chrome
(open but not in use during test)
HOROSCOPE
Psychic Meg
is on hand to
analyse the
cosmos!
What the stars have in store for you!
ARIES
The stars have aligned just
for you. Now is the time to
sell your collection on eBay.
The recession hasn’t hit
your star sign just yet! Sell
sell sell!
TAURUS
This will be a deeply
depressing week when you
realise your database has
way more friends than you
do. Maybe now is a good
time to step into the real
world.
GEMINI
Be careful what you wish
for; it just might happen.
Think BIG and BIG is what
you will get. Hopefully this
won’t apply to your
waistline but could be very
advantageous in your
career!
CANCER
Fail to plan and you could
be planning to fail! Make
sure your recovery and
failover plans do work. This
month could be tricky… Be
prepared!
LEO
This is your future self!
Don’t give up on your time-travel
research. Take the
time to include people
around you in formulating
a plan. Others will
appreciate it and recognise
you as a team player.
VIRGO
'My Precious' - Finishing
your Germanic translation
of the Lord of the Rings
book will finally culminate 6
years worth of Friday and
Saturday nights. Time to
party!
LIBRA
You are destined to meet
the person of your dreams
this week. Keep your
distance however. Time to
kick on-line dating into
cyberspace. Things are not
always as they seem!
SCORPIO
“There is no spoon!” Keep
this phrase in mind this
month as nothing is clear
or set in stone just yet.
Clarity will come next
month. Swirling your cup
will help mix the coffee,
milk and sugar.
CAPRICORN
You will arrive in a strange
universe where you still live
in your parent’s house,
Battle Star Galactica is no
longer cool, and your mum
still licks her thumb and
uses it to wash dirt off your
face. Do your best to
survive until the next worm
hole opens up then jump as
if your life depended on it!
AQUARIUS
Front page news - Your
dreams of making “Wonder
Woman vs. Cat Woman”
into a movie will finally be
realised. Keep the
spandex-wearing stories to
yourself though – your plan
of world domination must
remain a secret. The world
is not ready – just yet!
SAGITTARIUS
Feeling paranoid that your
car might be an Autobot?
Don’t fret; you aren’t losing
your mind. It will need a
service, so book it in soon.
PISCES
Abandon ship. Your robots
have become self aware.
All mayhem is about to
break loose. You and your
kind are the first to be
integrated and soldered
into the motherboard.
Abort while you can!
Advertisement
Want to Learn more?
Check out www.comp.dit.ie for the full
range of innovative, exciting and
flexible industry focused full-time and
part-time undergraduate and post
graduate courses.