T01-notes-the_database_environment

IST459: NOTES: THE DATABASE ENVIRONMENT

TOPIC: THE DATABASE ENVIRONMENT

TABLE OF CONTENTS

Topic: The Database Environment
    Learning Objectives
    Part 1: Databases: It’s all about the data or *is* it?
        Data
        Information
        Metadata
        Data Management
    Part 2: Databases vs. DBMS
        Database
        So, what features does the DBMS bring to the database party?
        What are the benefits and drawbacks of the DBMS?
        Wow the DBMS does so much. It slices. It dices. It does Windows!
        Data Models: Degrees or “Layers” of Data Abstraction

LEARNING OBJECTIVES

In this learning unit we will learn the fundamental concepts which lay the foundation for the rest of the course. Some of these objectives will be covered in this document, others in the class lecture, assigned readings, and labs.

- Concretize the concepts of data, information, data management and metadata
- Explain what a database is and why databases are important
- Describe a database management system
- Differentiate between the DBMS and a database
- Describe the different data models and abstraction layers
- Explain the similarities and differences among DBMS products
- Explain DBMS history and modern uses
- Describe how data is physically stored in primary and secondary storage

PART 1: DATABASES: IT’S ALL ABOUT THE DATA OR *IS* IT?

Are databases really all about the data? Well, not really. As you will see, data are just one piece of the puzzle. And to truly differentiate between what a database is and what it’s not, you must first have a clear understanding of these four fundamentals: Data, Information, Metadata, and Data Management.

DATA

What is data? Data is a generic label for the attributes, facts, figures, measurements or characteristics that describe real-world or supernatural objects or entities. These entities are typically people, places, things, events or ideas that we care to store data about for a specific application or purpose. Data can be very useful, or it can cause challenges which lead to bad decision-making and high data management costs. There are four characteristics of data that we need to consider.
For data to be useful it needs to be ARTC, pronounced “artsy”:

- Accurate - correctly represents an actual entity attribute
- Relevant - germane or pertinent to the entity being described
- Timely - within the timeframe for when it is most usable
- Contextual - able to be associated with other data

Computer systems and software help us keep our data ARTC. For example, before the era of those great technological advances known as mobile phones and caller ID, people actually had to write names and phone numbers down on paper in an address book. (I know it’s hard to believe, but true!) Storing, organizing and retrieving information from these archaic address books was quite a challenge.

Notice I said retrieving information and not retrieving data? It is quite common for people to use the terms data and information interchangeably despite there being a fundamental difference between the two concepts. BTW, it is our civic duty as informationists to politely correct our mothers, fathers, neighbors, and postal carriers whenever the terms data and information are bastardized.

Data are raw, unprocessed facts. By itself data has no meaning and no structure. For example, a series of digits such as 4439686 is just data. When data takes on meaning, because of some form of context, we call it:

INFORMATION

Information is interpreted or processed data. It is the result of someone or something (like a computer) finding a use for data. Whenever someone or something gains knowledge from data, that data is information. If I told you that the data from the previous paragraph is my office phone number, 443-9686, for example, then the data now has meaning in context, so it is information. Try to think of information as data that has been processed via context and/or manipulated so that the result is more usable for making better decisions. Remember, data by itself is useless. It’s the context that gives it meaning, and hence makes it information.
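To make the data-versus-information distinction concrete, here is a minimal Python sketch. It takes the same raw digits and interprets them under two different contexts. The function names are made up for illustration, and the dollar reading is just a second hypothetical context, not something the digits "really" mean.

```python
raw = "4439686"  # the digits from the notes: data, with no context attached


def as_local_phone(digits: str) -> str:
    """Context 1 (hypothetical helper): read seven digits as a local US phone number."""
    if len(digits) != 7 or not digits.isdigit():
        raise ValueError("a local phone number needs exactly 7 digits")
    return f"{digits[:3]}-{digits[3:]}"


def as_dollars(digits: str) -> str:
    """Context 2 (hypothetical helper): read the same digits as a whole-dollar amount."""
    return f"${int(digits):,}"


print(as_local_phone(raw))  # 443-9686
print(as_dollars(raw))      # $4,439,686
```

The stored value never changes. Only the context applied to it does, and that context is what turns the data into information.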
If I handed you an unlabeled CD-R disc, what would you know about it? Not much. You know there are bits and bytes on it, but that’s about it. The contents of that CD-R disc are data. If you pop that CD-R into a player and it starts playing Barry Manilow’s greatest hits, well, now you’ve got some sweet-sounding information!

Here’s another, more systematic way to think of information. I’m sure somewhere along the line in your academic careers you learned about the Information Processing Cycle (IPC). The IPC is the world’s most generic data-flow diagram (DFD):

Figure 1: The information processing cycle (IPC)

The input into a process is always data and the output of that process is always information in the context of the process. Since the output of one process can be the input of another, information can be data; it truly is about the context! Take this DFD for checking out a shopping cart on an e-commerce website, such as Amazon.com:

Figure 2: A DFD for checking out an e-shopping cart.

The middle arrow in this diagram is information from the first process and data to the second process!

The human brain is a powerful and efficient information processor, constantly placing information in context for us, almost unconsciously. Once we learn the context behind the data, it is really difficult to think about it in any other way. For example, consider this data: $5,000 | mafudge@syr.edu | 911. It’s kind of hard to look at these and not process them as 5 thousand dollars, your instructor’s email address, and the phone number for emergencies. You automatically interpret them as information, even though they are actually data. Why? Because your mind has already learned the context!

METADATA

As I said earlier, data itself has no meaning or structure, but on the other hand, I’m sure you’ve seen structured data before. When I last represented my office phone number, I placed a hyphen between the 3rd and 4th digits, like this: 443-9686. What does that hyphen tell us about the data?
What if I represented the data this way: $4,439,686? Does our knowledge of the $ symbol change our intrinsic interpretation of the data? Local US phone numbers are always 7 digits long. The $ symbol means currency. These are all data descriptors or “data about data” - they’re metadata! Here are some things that metadata describes:

- Data name - What name or label do we put on the data? What do we call the data? E.g. that’s a phone number.
- Data definition - How do we describe what the data is used for? What are some of its exceptions or issues? E.g. phone numbers are used to call people.
- Data type - What are the allowable characters that can be used? E.g. Integers? Dates? Currency? Text?
- Length - How many characters are allowed? E.g. 7? 10? Between 7 and 10?
- Location - Where is the data allowed to live? What is its source? E.g. phone numbers are local to my mobile phone.
- Constraints - Which specific characters or strings of characters are allowed? Does the data have to exist in one location in order to be used in another? E.g. an employee’s hourly wage must be greater than or equal to the minimum wage.
- Ownership - Who or what applications are allowed access to the data? E.g. only accessible by me.

Metadata is an important concept since all databases use structure to organize and categorize data, and that structure is metadata. Going back to the mobile phone address book example from earlier, you can enter the contact name, phone number, and email, select an icon for the number, etc. The contacts themselves represent data, but they are structured into the categories of name, phone, and email. The categories are the metadata, and the actual names, phone numbers, and emails themselves are the data.

DATA MANAGEMENT

You’ve got data and information. You can structure it with metadata. But what good is data if you cannot read or manipulate it? Data management is the process of storing, maintaining, and retrieving data.
Yes, it is a process, and the details of that process depend on the data and its structure (a.k.a. the metadata). How do you enter a new contact into your mobile phone, for example? Is it the same procedure for every mobile phone, or is it easier on some phones than on others? Does every mobile phone ask for the same data (i.e. is it structured with the same metadata)?

There are 4 data management activities, cutely known as the “CRUD” operations:

- Create - adding new data
- Read - retrieving information
- Update - modifying existing data
- Delete - removing data

If we go back to the old address book example, people were responsible for their own data management. If someone’s phone number changed, you simply crossed it out with a pen and wrote in a new one. If you ran out of room on one page, you flipped it over and used the next page. And forget keeping things in alphabetical order in a PNP (pen-and-paper) address book. Over time, the data in your address book got messy, making the “R” in CRUD quite difficult!

Figure 3: Paper makes for ineffective data management.

Today, computers assist greatly with the data management activities. We enter the data, and then technology will capture, organize, sort and filter the data into useful information. For example, most popular mobile phones of today have a Facebook phonebook feature. This feature reads your http://www.facebook.com friend list, and for any of your friends with phone numbers listed in their profile, their name, profile picture and phone numbers are added to your phonebook. Neato!

Figure 4: Technology trivializes data management.

PART 2: DATABASES VS. DBMS

At this point you might be wondering: Are you going to define database or what? I already did. I just took my own sweet time. I’ll also discuss the differences between a database and a DBMS, as well as give you the current lay of the DBMS land.
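Before moving on, the Part 1 ideas of metadata and the CRUD operations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, length limits, helper functions and sample contacts are all made up, and a real phone or DBMS would do far more.

```python
# Metadata: describes each field of a contact (type and maximum length are assumed values).
CONTACT_METADATA = {
    "name": {"type": str, "max_length": 50},
    "phone": {"type": str, "max_length": 10},
    "email": {"type": str, "max_length": 100},
}

contacts: dict[int, dict] = {}  # the data itself, keyed by a made-up contact id
_next_id = 1


def validate(record: dict) -> None:
    """Check every field in a record against the metadata."""
    for field, value in record.items():
        rule = CONTACT_METADATA.get(field)
        if rule is None:
            raise ValueError(f"unknown field: {field}")
        if not isinstance(value, rule["type"]) or len(value) > rule["max_length"]:
            raise ValueError(f"bad value for {field}: {value!r}")


def create(record: dict) -> int:
    """Create: add new data and return its id."""
    global _next_id
    validate(record)
    contact_id = _next_id
    contacts[contact_id] = dict(record)
    _next_id += 1
    return contact_id


def read(prefix: str) -> list[str]:
    """Read: retrieve the names that start with a prefix, in sorted order."""
    return sorted(
        c["name"] for c in contacts.values() if c.get("name", "").startswith(prefix)
    )


def update(contact_id: int, changes: dict) -> None:
    """Update: modify existing data."""
    validate(changes)
    contacts[contact_id].update(changes)


def delete(contact_id: int) -> None:
    """Delete: remove data."""
    del contacts[contact_id]


# Sample contacts are invented for the example.
first = create({"name": "Farley", "phone": "5550101"})
second = create({"name": "Garcia", "phone": "5550102"})
update(second, {"email": "garcia@example.com"})
names_f = read("F")
delete(second)
print(names_f, len(contacts))
```

Notice that the read works only because the metadata gives every contact a "name" field to search on. With a pen-and-paper address book, the same question means flipping through every page.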
DATABASE

A database is an organized collection of data and metadata, managed over a period of time. The data are what we’re mainly interested in, so that we may retrieve information, typically via a query (where we ask a question of the data, i.e. perform the read in the CRUD operations). However, the metadata is just as important, since it describes and structures the data, making it convenient to query in the first place. For example, you might search your mobile phone contact list for last names beginning with “F”. If your database is not structured by last name (using metadata), it would be very difficult to query the data in this manner. Metadata helps us determine what data is there to query in the first place.

Databases are not one-time deals, and over time the data management activities (CRUD) are used to manipulate the data within the database. Data within databases are persistent; they stick around in the database for as long as they’re relevant, and hence as long as we want or need them to.

So, to put it all together, every database has:

- Data: raw, unprocessed facts, and
- Metadata for structuring, constraining, and describing the data
- Data management activities for performing the CRUD operations, which in turn...
- Help keep the data ARTC and allow us to retrieve information from it.

Figure 5: Putting it all together - a picture’s worth 1,000 words. Well, at least 8 in this case :-)

When most of you think of the term database, you’re more than likely envisioning a computerized database implemented using software designed for that specific purpose - some sort of application with fancy entry screens and pretty reports cobbled together in Microsoft Access or FileMaker, for instance. Software of this ilk is known as a database management system (DBMS). However, it is important to realize that databases existed long before the computer was ever conceived. Of the databases that exist today, some are computerized, some are not.
Some use a DBMS; some don’t. What do you think file cabinets were used for back in the day?

IMPORTANT: A database does not have to be computerized or digital. A database management system is computer software which facilitates the use of databases.

SO, WHAT FEATURES DOES THE DBMS BRING TO THE DATABASE PARTY?

Again, I’d like to reiterate that anyone can make a computerized database using only Notepad, or better yet, a spreadsheet. Of course, by the same logic you can also dig a 3ft deep hole with a spoon. The DBMS is software specifically suited to the task of database management, including the storage and retrieval of data, rules for defining metadata, and of course the simplification of the data management (CRUD) tasks. Yes, the DBMS is to databases what Photoshop, the GIMP, or Flickr is to digital images, or better yet what plumbing is to civilization! When you design a database using a DBMS, you get a whole lot more, such as these features of the modern DBMS:

- Robust metadata implementation. Metadata can be defined to mimic actual business rules, perform calculations, control how data is entered, and automatically change or delete data to maintain data integrity. For example, if a contact is removed from the database, that contact would also be removed from any of their contact groups as well. Metadata management is one of the most significant advantages the DBMS brings to the table.
- Efficient and effective data management. This is another significant advantage of the DBMS. Metadata structures can be built without having to focus on how data will be stored. For example, to add a contact to a database stored in a DBMS you only need to tell it what to do (add a contact with this data, please), and not how to do it, as you might have to with a Notepad database (put the contact data at the end of the file, and write the first name first, then the last name, then the phone number).
- A built-in query language.
  The built-in query language allows the database user to write ad-hoc queries to get quick answers to questions, without the need to learn to program the specifics of how to read the data. A question such as “which contacts’ phone numbers are in area code 315?” can be easily answered with a DBMS.
- Concurrent access to data. Multiple users can access and edit the same data at the same time courtesy of the DBMS’s built-in transaction management and concurrency control mechanisms. This management is completely transparent to the end user; if one person is in the middle of editing a contact and another person needs the same data, the 2nd person will wait a few microseconds until the update is complete before seeing the data. Try that with Notepad!
- Logical abstraction of data. The metadata describing how the data is stored internally can be completely independent of how the data is created, read, updated or deleted. This allows the database designer to compartmentalize and package the data for groups of people in different ways, yet store it only once, in the most efficient manner. Whoa… English please! Think of it this way - the presentation of the data is completely independent from its storage. As such, a database administrator can create different means of updating data for different users, but store it only once in the same data source.
- Data security. The DBMS is capable of controlling authentication and authorization to both the metadata and data. Robust DBMS security implementations permit the administrator to control access at the level of individual data items, for example allowing a group of users to view only contacts with area code 706.
- Backup and recovery management. Databases can be backed up while in use and configured so that the state of the data can be restored to any given point in time. Both are essential features to have if multiple users are accessing and updating the data.
- Communication interfaces.
  The DBMS allows remote systems and users to interact with it over a computer network, both programmatically and interactively. While this makes the DBMS increasingly complex, it provides for a greater degree of flexibility in its use. Users can place orders online via web pages while sales managers view real-time sales reports in a spreadsheet, for example.

WHAT ARE THE BENEFITS AND DRAWBACKS OF THE DBMS?

With a feature list like this, there’s got to be some benefits, right? Sure, check these out:

- Reduced data redundancy - even though the database management system doesn’t directly impose this as a constraint, it certainly makes it easier to reduce the amount of duplicate data.
- Improved data consistency - the reduction in data redundancy greatly adds to data consistency. When data lives in only one place it is far easier to maintain its integrity.
- Improved data sharing - the DBMS makes it easier to put data logically in one place. Users and IT specialists don’t have to perform extensive searches to locate the data. It is also easier to access the data since the user doesn’t have to be concerned with technical details of where and how the data is stored.
- Improved IT specialist productivity - the cost and time for producing database applications decrease because the technical details for accessing the data are solved once and reused over and over. The IT folks can concentrate on the specific business problem or opportunity. Many DBMSs come with an integrated development environment (IDE) that provides the IT specialist with a rich set of tools for designing and building database applications.
- Enforcement of standards - even though the DBMS doesn’t enforce the standards directly, it does provide a single location for monitoring and enforcement by application development personnel.
- Improved data quality - the DBMS has a set of rule and constraint capabilities that database designers can use to enforce integrity of the data.
  The beauty is that these rules/constraints are applied once, at the data level, not at the program level. So any access to the data automatically gets the rules applied, independently of the person or program adding, updating or requesting the data.
- Improved data accessibility - end users without programming experience can access data. The technical complexities have been reduced substantially, and the use of a declarative query language makes it easier and more practical for the non-IT specialist to access the data.
- Reduced program maintenance - with the metadata and the data rule and integrity constraints designed into the database, there is less program code to maintain. Programs are generally smaller and less complicated.

But there are also some drawbacks, mainly in the area of managerial complexity, such as:

- Specialized personnel - depending on the size and complexity of the organization, you may need to add a number of specialists to your staff: data analysts, database designers, database developers and database administrators, to name a few. We’ll talk more about the specific skill sets required for successful database implementations later in the semester.
- Increased installation and management costs - DBMS software and the associated personnel don’t come cheap. For large, complex multi-user organizations, plan on substantial hardware, software, personnel and training costs. Check out: http://www.intranetjournal.com/articles/200704/ij_04_25_07a.html
- Organizational conflict - on the social engineering side, plan on significant pushback on agreeing to data definitions, data formats, data ownership, data values and data access. Keep in mind that putting data in one logical location coupled with easier access will cause some management issues that will need to be dealt with. Unifying data sources seems like a no-brainer idea, but it will add some managerial overhead as there are multiple hands in the pot, so to speak.

WOW THE DBMS DOES SO MUCH. IT SLICES. IT DICES. IT DOES WINDOWS!

Sure, the benefits seem to outweigh the drawbacks, but like those infomercials, there’s a catch. There’s always a catch. A great DBMS does not make a great database. Only a solid database design makes a great database, and the DBMS only helps facilitate that database design. So the question becomes: HOW DO I DESIGN A GOOD DATABASE? The answer to that question is a major goal of this course. Teaching you to become a good database designer is one of the key objectives of any course on databases.

A good database design begins with assessing the needs of the database, including the data requirements and intended uses. For example, is it a transactional database with frequent updates, or will it compile historic and statistical information? After the needs are assessed, they are formalized into a conceptual design model. This conceptual model represents a formalization of the database requirements. The next step is to transform the conceptual design into a structural blueprint for the database. This logical design model is independent of a particular DBMS product, so it still maintains a degree of flexibility. And lastly we implement the logical model of our database in the DBMS of our choice.

DATA MODELS: DEGREES OR “LAYERS” OF DATA ABSTRACTION

Design models represent the various degrees of abstraction for the same database. They provide different perspectives on what is essentially the same thing. For example, the way you use, understand, and interact with your car is much different from how a mechanic or automotive engineer does. This same principle holds true for databases. How you view a database depends on your skill set and needs.

- The conceptual design model is the highest level of abstraction for a database. It represents the global view of the database requirements from the user’s perspective.
  Items in the conceptual model represent what needs to be done, and conceptual designs serve as a communications tool between the database designer and her customers. The conceptual layer is abstract and is not implemented by any DBMS.
- The logical design model is the application of a database model (relational, hierarchical, object-oriented, object-relational, etc.) to the conceptual model. In this class we will take the conceptual design and “relationalize” it by applying relational theory, yielding a logical model. Logical designs are independent of any DBMS implementation and therefore not tied to a particular DBMS. Logical design is a technical endeavor, not suited for communication with the customer, but rather with the technician who will implement the database.
- The internal implementation model represents how the database looks when implemented in software. With respect to what we will do in the class, the internal implementation model is the SQL table designs used to model the original requirements of the database. The internal data model is dependent on the DBMS software used to implement the database. While the logical design of a database stays the same, it would look somewhat different when implemented on one DBMS, such as Oracle, than on another, say, Microsoft Access.
- The external implementation model represents the end user’s view of the internal data model. It is a technical representation of the data which abstracts the complexity of the internal data model, and often corresponds with the original conceptual model. For example, a conceptual database design might have a business rule “customer places order.” While the representation of this business rule in the internal implementation model might be very complex, the DBMS allows us to abstract this complexity back to the original simplicity found in the conceptual model.
- The physical data model represents how the database is implemented through software (for example, in MS Access the entire database is in an MDB file, and the LDB file controls locking on the database). In general, computer scientists, database administrators, software engineers and system administrators are the ones concerned with issues of the physical data model.

Figure 6: Data models and their abstractions
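To tie the Part 2 ideas together, here is a minimal sketch using SQLite through Python’s standard sqlite3 module. It is an illustration under assumptions, not the course’s official example: the table names, column names, view name and sample contacts are all invented, and SQLite stands in for whatever DBMS you end up using. It shows metadata with constraints (an internal model), a declarative query, a rule enforced at the data level, automatic cleanup of related data, and a view acting as a simple external model.

```python
import sqlite3

# In-memory database; every table, column and view name here is made up.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys off by default

# Metadata: names, types and constraints live in the table definitions.
conn.executescript("""
    CREATE TABLE contacts (
        id        INTEGER PRIMARY KEY,
        last_name TEXT NOT NULL,
        area_code TEXT NOT NULL CHECK (length(area_code) = 3),
        phone     TEXT NOT NULL CHECK (length(phone) = 7)
    );
    CREATE TABLE group_members (
        group_name TEXT NOT NULL,
        contact_id INTEGER NOT NULL REFERENCES contacts(id) ON DELETE CASCADE
    );
    -- A simple external view: hides ids and presents a formatted number.
    CREATE VIEW phone_book AS
        SELECT last_name,
               '(' || area_code || ') ' || substr(phone, 1, 3) || '-' || substr(phone, 4) AS number
        FROM contacts;
""")

# Create: sample rows are invented for the example.
conn.executemany(
    "INSERT INTO contacts (id, last_name, area_code, phone) VALUES (?, ?, ?, ?)",
    [(1, "Farley", "315", "5550101"), (2, "Garcia", "706", "5550102")],
)
conn.execute("INSERT INTO group_members VALUES ('work', 1)")

# Read with a declarative query: say what you want, not how to find it.
in_315 = conn.execute(
    "SELECT last_name FROM contacts WHERE area_code = '315'"
).fetchall()

# The CHECK constraint is enforced at the data level, whichever program inserts.
try:
    conn.execute("INSERT INTO contacts VALUES (3, 'Short', '315', '123')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Delete: removing a contact also removes its group memberships (ON DELETE CASCADE).
conn.execute("DELETE FROM contacts WHERE id = 1")
memberships_left = conn.execute("SELECT COUNT(*) FROM group_members").fetchone()[0]

book = conn.execute("SELECT last_name, number FROM phone_book").fetchall()
print(in_315, rejected, memberships_left, book)
```

The program never says how the rows are laid out on disk or how to scan them. That is the physical model, and SQLite handles it. The same logical design could be implemented on a different DBMS with somewhat different SQL.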
