T01-notes-the_database_environment

IST459: NOTES : THE DATABASE ENVIRONMENT
Topic: The Database Environment
Table of Contents
Topic: The Database Environment
Learning Objectives
Part 1: Databases: It’s all about the data or *is* it?
Data
Information
Metadata
Data Management
Part 2: Databases vs. DBMS
Database
So, what features does the DBMS bring to the database party?
What are the benefits and drawbacks of the DBMS?
Wow the DBMS does so much. It slices. It dices. It does Windows!
Data Models: Degrees or “Layers” of Data Abstraction
Learning Objectives
In this learning unit we will learn the fundamental concepts which will lay the foundation for the rest of the course.
Some of these objectives will be covered in this document, others in the class lecture, assigned readings, and labs.
• Concretize the concepts of data, information, data management and metadata
• Explain what a database is and why databases are important
• Describe a database management system
• Differentiate between the DBMS and a database
• Describe the different data models and abstraction layers
• Explain the similarities and differences among DBMS products
• Explain DBMS history and modern uses
• Describe how data is physically stored in primary and secondary storage
Part 1: Databases: It’s all about the data or *is* it?
Are databases really all about the data? Well, not really. As you will see, data are just one piece of the puzzle. And
to truly differentiate between what a database is and what it’s not, you must first have a clear understanding of these
four fundamentals: Data, Information, Metadata, and Data Management.
Data
What is data? Data is a generic label for the attributes, facts, figures, measurements or characteristics that
describe real world or super-natural objects or entities. These entities are typically people, places, things, events
or ideas that we care to store data about for a specific application or purpose. Data can be very useful, or it can cause
challenges which lead to bad decision-making and high data management costs. There are four characteristics of
data that we need to consider. For data to be useful it needs to be ARTC, pronounced “artsy”:
• Accurate - correctly represents an actual entity attribute
• Relevant - germane or pertinent to the entity being described
• Timely - available within the timeframe when it is most usable
• Contextual - able to be associated with other data
Computer systems and software help us keep our data ARTC. For example, before the era of those great
technological advances known as mobile phones and caller ID, people actually had to write names and phone
numbers down on paper in an address book. (I know it’s hard to believe, but true!) Storing, organizing and
retrieving information from these archaic address books was quite a challenge.
Notice I said retrieving information and not retrieving data? It is quite common for people to use the terms data
and information interchangeably despite there being a fundamental difference between the two concepts. BTW, it is
our civic duty as informationists to politely correct our mothers, fathers, neighbors, and postal carriers
whenever the terms data and information are bastardized.
Data are raw, unprocessed facts. By itself data has no meaning and no structure. For example, a series of digits
such as 4439686 is just data. When data takes on meaning, because of some form of context, we call it:
Information
Information is interpreted or processed data. It is the result of someone or something (like a computer) finding use
for data. Whenever someone or something gains knowledge from data, that data is information. If I told you that
the data from the previous paragraph is my office phone number 443-9686, for example, then the data now has
meaning in context, so it is information.
Try to think of information as data that has been processed via context and/or manipulated in such a way that the
result is more usable for making better decisions. Remember, data by itself is useless. It’s the context that gives it
meaning, and hence makes it information. If I handed you an unlabeled CD-R disc, what do you know about it? Not
much. You know there are bits and bytes on it, but that’s about it. The contents of that CD-R disc are data. If you
pop that CD-R into a player and it starts playing Barry Manilow’s greatest hits, well, now you’ve got some
sweet-sounding information!
Here’s another, more systematic way to think of information. I’m sure somewhere along the line in your academic
careers you learned about the Information Processing Cycle (IPC). The IPC is the world’s most generic data-flow
diagram (DFD):
Figure 1: The information processing cycle, IPC
The input into a process is always data and the output of that process is always information in the context of the
process. Since the output of one process can be the input of another, information can be data; it truly is about the
context! Take this DFD for checking out a shopping cart for an e-commerce website, such as Amazon.com:
Figure 2: A DFD for checking out an E-shopping cart.
The middle arrow in this diagram is information from the first process and data to the second process!
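To see the "information from one process is data to the next" idea in miniature, here is a small Python sketch. The two processes and their names (total_cart, add_shipping), the prices, and the shipping rule are all made up for illustration; they are not taken from Figure 2. Amounts are in cents to keep the arithmetic exact.

```python
def total_cart(prices_in_cents):
    """Process 1: raw item prices (data) go in, an order total (information) comes out."""
    return sum(prices_in_cents)


def add_shipping(order_total, free_over=5000, fee=500):
    """Process 2: the order total is now just input data; the amount due is the information."""
    if order_total >= free_over:
        return order_total
    return order_total + fee


cart_prices = [1999, 501, 1000]          # data for process 1
order_total = total_cart(cart_prices)    # information from process 1, data for process 2
amount_due = add_shipping(order_total)   # information from process 2

print(order_total, amount_due)
```

The variable order_total plays the role of the middle arrow: whether you call it data or information depends on which process you are standing in.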
The human brain is a powerful and efficient information processor, constantly placing information in context for
us, almost unconsciously. Once we learn the context behind the data, it is really difficult to think about it in any
other way. For example, consider this data: $5,000 | mafudge@syr.edu | 911. It’s kind of hard to look at these and
not process them as 5 thousand dollars, your instructor’s email address, and the phone number for emergencies.
You interpret them incorrectly as information, even though they are actually data. Why? Because your mind has
already learned the context!
Metadata
As I said earlier, data itself has no meaning or structure, but on the other hand, I’m sure you’ve seen structured
data before. When I last represented my office phone number, I placed a hyphen between the 3rd and 4th digits,
like this: 443-9686. What does that hyphen tell us about the data? What if I represented the data this way:
$4,439,686? Does our knowledge of the $ symbol change our intrinsic interpretation of the data? Local US phone
numbers are always 7 digits long. The $ symbol means currency. These are all data descriptors or “data about
data” - they’re metadata!
Here are some things that metadata describes:
• Data name - What name or label do we put on the data? What do we call the data? E.g. that’s a phone
number.
• Data definition - How do we describe what the data is used for? What are some of its exceptions or
issues? E.g. phone numbers are used to call people.
• Data type - What are the allowable characters that can be used? E.g. Integers? Dates? Currency? Text?
• Length - How many characters are allowed? E.g. 7? 10? Between 7 and 10?
• Location - Where is the data allowed to live? What is its source? E.g. phone numbers are local to my
mobile phone.
• Constraints - Which specific characters or strings of characters are allowed? Does the data have to exist in
one location in order to be used in another? E.g. an employee’s hourly wage must be greater
than or equal to the minimum wage.
• Ownership - Who or what applications are allowed access to the data? E.g. only accessible by me.
Metadata is an important concept since all databases use structure to organize and categorize data, and that
structure is metadata. Going back to the cell phone address book example from earlier, you can enter the
contact name, phone number, email, select an icon for the number, etc. The contacts themselves represent data,
but they are structured into the categories of name, phone, and email. The categories are the metadata, and the
actual names, phone numbers, and emails themselves are the data.
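If you would like to see the data/metadata split in something you can run, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, the contact, and the "7 or 10 digits" rule are my own assumptions for illustration; nothing here is prescribed by the course.

```python
import sqlite3

# An in-memory database; the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE contacts (
        name  TEXT NOT NULL,                          -- data name and data type
        phone TEXT CHECK (length(phone) IN (7, 10)),  -- length constraint (assumed rule)
        email TEXT
    )
    """
)

# The data: one actual (made-up) contact.
conn.execute(
    "INSERT INTO contacts (name, phone, email) VALUES (?, ?, ?)",
    ("Pat Example", "4439686", "pat@example.com"),
)

# The metadata: ask SQLite to describe the structure of the table.
# Each row of PRAGMA table_info is (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(contacts)")]
data = conn.execute("SELECT name, phone, email FROM contacts").fetchall()

print("metadata:", columns)
print("data:", data)
```

The column names and types come back from the database itself, which is the point: the structure is stored and managed right alongside the data it describes.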
Data Management
You’ve got data and information. You can structure it with metadata. But what good is data if you cannot read or
manipulate it? Data management is the process of storing, maintaining, and retrieving data. Yes, it is a process,
and the details of that process depend on the data and its structure (a.k.a. the metadata). How do you enter a new
contact into your mobile phone, for example? Is it the same procedure for every mobile phone, or is it easier on
some phones than on others? Does every mobile phone ask for the same data (i.e. is it structured with the same
metadata)?
There are 4 data management activities, cutely known as the “CRUD” operations:
• Create - adding new data
• Read - retrieving information
• Update - modifying existing data
• Delete - removing data
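Here is what the four CRUD operations might look like against a tiny contact list, again as a sketch using Python's built-in sqlite3 module. The contacts table and the sample contact are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, phone TEXT)")

# Create - add new data
conn.execute("INSERT INTO contacts VALUES (?, ?)", ("Pat Example", "4439686"))

# Read - retrieve information
before = conn.execute(
    "SELECT phone FROM contacts WHERE name = ?", ("Pat Example",)
).fetchone()

# Update - modify existing data
conn.execute(
    "UPDATE contacts SET phone = ? WHERE name = ?", ("4431234", "Pat Example")
)
after = conn.execute(
    "SELECT phone FROM contacts WHERE name = ?", ("Pat Example",)
).fetchone()

# Delete - remove data
conn.execute("DELETE FROM contacts WHERE name = ?", ("Pat Example",))
remaining = conn.execute("SELECT COUNT(*) FROM contacts").fetchone()[0]

print(before, after, remaining)
```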
If we go back to the old address book example, people were responsible for their own data management under
this scenario. If someone’s phone number changed, you simply crossed it out with a pen and wrote in a new one. If
you ran out of room on one page, you flipped it over and used the next page. And forget keeping things in alphabetical
order in a PNP (pen-and-paper) address book. Over time, the data in your address book got messy, making the “R” in CRUD
quite difficult!
Figure 3: Paper makes for ineffective data management.
Today, computers assist greatly with the data management activities. We enter the data, and then technology will
capture, organize, sort and filter the data into useful information. For example, most popular mobile
phones of today have a Facebook phonebook feature. This feature reads your http://www.facebook.com friend
list, and for any of your friends with phone numbers listed in their profile, their name, profile picture and phone
numbers are added to your phonebook. Neato!
Figure 4: Technology trivializes data management.
Part 2: Databases vs. DBMS
At this point you might be wondering: Are you going to define database or what? I already did. I just took my own
sweet time. I’ll also discuss the differences between a database and a DBMS, as well as give you the current lay
of the DBMS land.
Database
A database is an organized collection of data and metadata, managed over a period of time. The data are what
we’re mainly interested in, so that we may retrieve information, typically via query (where we ask a question of
the data, i.e. perform the read in the CRUD operations). However, the metadata is also important, as it helps
describe and structure the data, making it convenient to query in the first place. For example, you might search
your mobile phone contact list for last names beginning with “F”. If your database is not structured by last name
(using metadata) it would be very difficult to query the data in this manner. Metadata helps us determine what
data is there to query in the first place.
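The last-names-beginning-with-“F” search is easy to try out. In this plain Python sketch (all names and numbers are invented), the same three contacts are stored twice: once as free-form text, and once structured with field names acting as metadata.

```python
# Unstructured: each contact is one free-form string, so "last name" is only a guess.
unstructured = [
    "Ann Fisher 4431111",
    "Fred Jones 4432222",
    "Lee Ford 4433333",
]

# Structured: the keys are metadata that name each piece of data.
structured = [
    {"first_name": "Ann", "last_name": "Fisher", "phone": "4431111"},
    {"first_name": "Fred", "last_name": "Jones", "phone": "4432222"},
    {"first_name": "Lee", "last_name": "Ford", "phone": "4433333"},
]

# Without structure, the obvious search matches on the wrong thing (first names).
naive = [line for line in unstructured if line.startswith("F")]

# With structure, the question "last names beginning with F" can be asked directly.
by_last_name = [
    c["first_name"] + " " + c["last_name"]
    for c in structured
    if c["last_name"].startswith("F")
]

print(naive)
print(by_last_name)
```

You could of course parse the free-form strings, but then you are writing the metadata into your program by hand instead of letting the database carry it.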
Databases are not one-time deals, and over time the data management activities (CRUD) are used to manipulate the
data within the database. Data within databases are persistent; they stick around in the database for as long as
they’re relevant and hence as long as we want or need them to.
So, to put it all together, every database has:
• Data: raw, unprocessed facts and
• Metadata for structuring, constraining, and describing the data
• Data management activities for performing the CRUD operations, which in turn...
• Help keep the data ARTC and allow us to retrieve information from it.
Figure 5: Putting it all together - a picture’s worth 1,000 words. Well, at least 8 in this case :-)
When most of you think of the term database you’re more than likely envisioning a computerized database
implemented using software designed for that specific purpose - some sort of application with fancy entry screens
and pretty reports cobbled together in Microsoft Access or FileMaker, for instance. Software of this ilk is known as
a database management system (DBMS). However, it is important to realize that databases existed long
before the computer was ever conceptualized. Of the databases that exist today, some are computerized, some
are not. Some use a DBMS; some don’t. What do you think file cabinets were used for back in the day?
IMPORTANT: A database does not have to be computerized or digital. A database management system is
computer software which facilitates the use of databases.
So, what features does the DBMS bring to the database party?
Again, I’d like to reiterate that anyone can make a computerized database using only Notepad, or better yet, a
spreadsheet. Of course by the same logic you can also dig a 3ft deep hole with a spoon. The DBMS is software
specifically suited to the task of database management, including the storage and retrieval of data, rules for
defining metadata, and of course the simplification of the data management (CRUD) tasks. Yes, the DBMS is to
databases what Photoshop, the GIMP, or Flickr is to digital images, or better yet what plumbing is to civilization!
When you design a database using a DBMS, you get a whole lot more, such as these features of the modern
DBMS:
• Robust metadata implementation. Metadata can be defined to mimic actual business rules, perform
calculations, control how data is entered, and automatically change or delete data to maintain data
integrity. For example, if a contact is removed from the database, that contact would also be removed
from any of their contact groups as well. Metadata management is one of the most significant advantages
the DBMS brings to the table.
• Efficient and effective data management. This is another significant advantage of the DBMS. Metadata
structures can be built without having to focus on how data will be stored. For example, to add a contact
to a database stored in a DBMS you only need to tell it what to do (add a contact with this data, please),
and not how to do it, as you might have to with a Notepad database (put the contact data at the end
of the file, and write the first name first, then the last name, then the phone number).
• A built-in query language. The built-in query language allows the database user to write ad-hoc queries to
get quick answers to questions, without the need to learn to program the specifics of how to read the
data. A question such as “which contacts’ phone numbers are in area code 315?” can be easily answered with a
DBMS.
• Concurrent access to data. Multiple users can access and edit the same data at the same time courtesy of
the DBMS’s built-in transaction management and concurrency control mechanisms. This management is
completely transparent to the end user; if one person is in the middle of editing a contact and another person
needs the same data, the second person will wait a moment until the update is complete before
seeing the data. Try that with Notepad!
• Logical abstraction of data. The metadata describing how the data is internalized can be completely
independent of how the data is created, read, updated or deleted. This allows the database designer to
compartmentalize and package the data for groups of people in different ways, yet store it only once, in
the most efficient manner. Whoa… English please! Think of it this way - the presentation of the data is
completely independent from its storage. As such, a database administrator can create different means of
updating data for different users, but store it only once in the same data source.
• Data security. The DBMS is capable of controlling authentication and authorization to both the metadata
and data. Robust DBMS security implementations permit the administrator to control access at a very
fine-grained level, for example allowing a group of users to view only contacts with area code 706.
• Backup and recovery management. Databases can be backed up while in use and configured so that the
state of the data can be restored to any given point in time. Both are essential features to have if multiple
users are accessing and updating the data.
• Communication interfaces. The DBMS allows remote systems and users to interact with it over a
computer network, both programmatically and interactively. While this makes the DBMS increasingly
complex, it provides for a greater degree of flexibility in its use. Users can place orders online via web
pages, while sales managers view real-time sales reports in a spreadsheet, for example.
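Two of the features in this list, metadata that maintains integrity on its own and the built-in query language, are easy to demo. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for "a DBMS"; the tables, names, and phone numbers are all assumptions for illustration, and other DBMS products will differ in the details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript(
    """
    CREATE TABLE contacts (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        phone TEXT NOT NULL
    );
    -- The metadata rule: when a contact goes away, so do its group memberships.
    CREATE TABLE group_members (
        group_name TEXT NOT NULL,
        contact_id INTEGER NOT NULL REFERENCES contacts(id) ON DELETE CASCADE
    );
    INSERT INTO contacts VALUES (1, 'Ann Fisher', '3154431111');
    INSERT INTO contacts VALUES (2, 'Fred Jones', '7065552222');
    INSERT INTO contacts VALUES (3, 'Lee Ford',   '3154433333');
    INSERT INTO group_members VALUES ('Family', 1);
    INSERT INTO group_members VALUES ('Work', 1);
    INSERT INTO group_members VALUES ('Work', 2);
    """
)

# Ad-hoc query: say WHAT you want (contacts in area code 315), not HOW to find it.
in_315 = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM contacts WHERE phone LIKE '315%' ORDER BY name"
    )
]

# Remove one contact; the DBMS cleans up that contact's group memberships for us.
conn.execute("DELETE FROM contacts WHERE id = 1")
memberships_left = conn.execute("SELECT COUNT(*) FROM group_members").fetchone()[0]

print(in_315, memberships_left)
```

Notice that no code in the program loops over group_members to tidy up. The rule lives in the table definition, so it applies no matter who or what deletes the contact.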
What are the benefits and drawbacks of the DBMS?
With a feature list like this, there’s got to be some benefits, right? Sure, check these out:
• Reduced data redundancy - even though the database management system doesn’t directly impose this
as a constraint, it certainly makes it easier to reduce the amount of duplicate data.
• Improved data consistency - the reduction in data redundancy greatly adds to data consistency. When
data lives in only one place it is far easier to maintain its integrity.
• Improved data sharing - the DBMS makes it easier to put data logically in one place. The users and IT
specialists don’t have to perform extensive searches to locate the data. It is also easier to access the
data since the user doesn’t have to be concerned with technical details on where and how the data is
stored.
• Improved IT specialist productivity - the cost and time for producing database applications decrease
because the technical details for accessing the data are solved once and reused over and over. The IT folks
can concentrate on the specific business problem or opportunity. Many DBMSs come with an integrated
development environment (IDE) that provides the IT specialist with a rich set of tools for designing and
building database applications.
• Enforcement of standards - even though the DBMS doesn’t enforce the standards directly, it does
provide a single location for monitoring and enforcement by application development personnel.
• Improved data quality - the DBMS has a set of rule and constraint capabilities that database designers can
use to enforce integrity of the data. The beauty is that these rules/constraints are applied once, at the
data-level, not at the program-level. So any access to the data automatically gets the rules applied,
independently of the person or program adding, updating or requesting the data.
• Improved data accessibility - end users without programming experience can access data. The technical
complexities have been reduced substantially and the use of a declarative query language makes it easier
and more practical for the non-IT specialist to access the data.
• Reduced program maintenance - with the metadata and the data rule and integrity constraints designed
into the database there is less program code to maintain. Programs are generally smaller and less complicated.
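The “improved data quality” point, rules applied once at the data-level rather than in every program, can be sketched with the minimum wage rule from the metadata section. This uses SQLite via Python's sqlite3 module; the employees table, the add_employee helper, and the 7.25 figure are assumptions made up for the example.

```python
import sqlite3

MINIMUM_WAGE = 7.25  # assumed figure, for illustration only

conn = sqlite3.connect(":memory:")
conn.execute(
    f"""
    CREATE TABLE employees (
        name        TEXT NOT NULL,
        hourly_wage REAL NOT NULL CHECK (hourly_wage >= {MINIMUM_WAGE})
    )
    """
)


def add_employee(name, hourly_wage):
    """Try to add an employee; return True if the database accepted the row."""
    try:
        conn.execute("INSERT INTO employees VALUES (?, ?)", (name, hourly_wage))
        return True
    except sqlite3.IntegrityError:
        # The rule lives in the table definition, so the DBMS rejects the row itself.
        return False


accepted = add_employee("Ann", 15.00)
rejected = add_employee("Bob", 5.00)
row_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]

print(accepted, rejected, row_count)
```

Note that add_employee contains no wage check of its own. Any other program inserting into this table would hit the same rule, which is also why there is less program code to maintain.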
But there are also some drawbacks, mainly in the area of managerial complexity, such as:
• Specialized personnel - depending on the size and complexity of the organization you may need to add a
number of specialists to your staff: data analysts, database designers, database developers and database
administrators to name a few. We’ll talk more about the specific skill sets required for successful
database implementations later in the semester.
• Increased installation and management costs - DBMS software and its associated personnel don’t
come cheap. For large, complex multi-user organizations plan on substantial hardware, software,
personnel and training costs. Check out: http://www.intranetjournal.com/articles/200704/ij_04_25_07a.html
• Organizational conflict - on the social engineering side, plan on significant pushback on agreement to data
definitions, data formats, data ownership, data values and data access. Keep in mind that putting data in
one logical location coupled with easier access will cause some management issues that will need to be
dealt with. Unifying data sources seems like a no-brainer idea, but it will add some managerial overhead
as there are multiple hands in the pot, so to speak.
Wow the DBMS does so much. It slices. It dices. It does Windows!
Sure the benefits seem to outweigh the drawbacks, but like those infomercials, there’s a catch. There’s always a
catch. A great DBMS does not make a great database. Only a solid database design makes a great database, and
the DBMS only helps facilitate that database design. So the question becomes:
HOW DO I DESIGN A GOOD DATABASE?
The answer to that question is a major goal of this course. To teach you to become a good database designer is
one of the key objectives of any course on databases. A good database design begins with assessing the needs of
the database, including the data requirements and intended uses, and from these requirements a data
model is drafted. For example, is it a transactional database with frequent updates, or will it compile historic and statistical
information? After the needs are assessed, they are formalized into a conceptual design model. This conceptual
model represents a formalization of the database requirements. The next step is to transform the conceptual
design into a structural blueprint for the database. This logical design model is independent of a particular DBMS
product, so it still maintains a degree of flexibility. And lastly we implement the logical model of our database in
the DBMS of our choice.
Data Models: Degrees or “Layers” of Data Abstraction
Design models represent the various degrees of abstraction for the same database. They provide different
perspectives on what is essentially the same thing. For example, the way you use, understand, and interact with
your car is much different from how a mechanic or automotive engineer does. This same principle holds true for
databases. How you view a database depends on your skill set and needs.
• The conceptual design model is the highest level of abstraction for a database. It represents the global
view of the database requirements from the user’s perspective. Items in the conceptual model represent
what needs to be done, and conceptual designs serve as a communications tool between the database
designer and her customers. The conceptual layer is abstract and is not implemented by any DBMS.
• The logical design model is the application of a database model (such as relational, hierarchical,
object-oriented, object-relational, etc.) to the conceptual model. In this class we will take the conceptual design
and “relationalize” it by applying relational theory, yielding a logical model. Logical designs are
independent of any DBMS implementation and therefore not tied to a particular DBMS. Logical design is
a technical endeavor, not suited for communication with the customer, but rather with the technician who will
implement the database.
• The internal implementation model represents how the database looks when implemented in
software. With respect to what we will do in the class, the internal implementation model is the SQL table
designs used to model the original requirements of the database. The internal data model is dependent on the
DBMS software used to implement the database. While the logical design of a database is the same, it
would look somewhat different when implemented on one DBMS such as Oracle over another, say,
Microsoft Access.
• The external implementation model represents the end-user’s view of the internal data model. It is a
technical representation of the data which abstracts the complexity of the internal data model, and often
corresponds with the original conceptual model. For example, a conceptual database design might have a
business rule “customer places order.” While the representation of this business rule in the internal
implementation model might be very complex, the DBMS allows us to abstract this complexity back to the
original simplicity found in the conceptual model.
• The physical data model represents how the database is implemented through software (for example, in
MS Access the entire DB is in an MDB file, and the LDB file controls locking on the database). In general,
computer scientists, database administrators, software engineers and system administrators are
concerned with issues in the physical data model.
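To make the internal vs. external layers a little less abstract, here is a hedged sketch of the “customer places order” rule using SQLite through Python's sqlite3 module. The two tables stand in for an internal implementation model and the view stands in for an external one; all table, column, and view names are my own assumptions, and a real design would have many more tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Internal implementation model: the data is split across related tables.
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total       REAL NOT NULL
    );

    -- External implementation model: a view that reads like "customer places order".
    CREATE VIEW customer_orders AS
        SELECT c.name AS customer, o.id AS order_id, o.total
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id;

    INSERT INTO customers VALUES (1, 'Ann Fisher');
    INSERT INTO orders VALUES (100, 1, 25.50);
    """
)

# The end user queries the simple view and never sees the join behind it.
rows = conn.execute(
    "SELECT customer, order_id, total FROM customer_orders"
).fetchall()

print(rows)
```

The data is stored only once, in the two tables; the view is just another way of looking at it.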
Figure 6: Data models and their abstractions