Introduction to Databases


黃獻華

B89901164

Group: 5

Network Multimedia report


Databases today are important to every organization. They are used to maintain internal records of the organization's information, such as the records of all its members. Imagine an organization that still stores this information the traditional way, perhaps in a paper record book. That wastes a lot of paper, because the same data may be written down repeatedly, and searching for a single member's information can also take a long time. Fortunately, a database makes it far more convenient both to store data and to query information from the large volumes of data we have. In this report, I introduce what a database is and how it works.

First, it helps to recognize the difference between data and information. The terms data and information are often used interchangeably, but they are distinctly different. Data are raw, unsummarized, and unanalyzed facts. Information is data that have been processed into a meaningful form. A list of a supermarket's daily receipts is data, but it is not information, because it is too detailed to be very useful. Summarizing the data into daily departmental totals produces information, because the store manager can use the report to monitor store performance. The same report would be data for a regional manager, because again it is too detailed for meaningful decision making at the regional level.

Information for a regional manager may be a weekly report of sales by department for each supermarket. Data are always data, but one person's information can be another person's data.
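The step from data to information in the supermarket example is essentially aggregation. Below is a small sketch in SQLite via Python's built-in sqlite3 module; the receipt table, store names, dates, and amounts are made up for illustration. The individual receipt lines are the data, a per-department daily total for one store is information for that store's manager, and a per-store, per-department total over the week is the coarser summary a regional manager would want:

```python
import sqlite3

# Hypothetical receipt data for two stores over part of one week.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE receipt (store TEXT, department TEXT, sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO receipt VALUES (?, ?, ?, ?)",
    [
        ("Store A", "Produce", "2024-01-01", 3.50),
        ("Store A", "Produce", "2024-01-01", 2.25),
        ("Store A", "Bakery", "2024-01-01", 4.00),
        ("Store A", "Bakery", "2024-01-02", 6.00),
        ("Store B", "Produce", "2024-01-01", 1.25),
    ],
)

# Information for Store A's manager: daily totals per department.
daily_totals = conn.execute(
    """
    SELECT sale_date, department, SUM(amount)
    FROM receipt
    WHERE store = 'Store A'
    GROUP BY sale_date, department
    ORDER BY sale_date, department
    """
).fetchall()

# Information for a regional manager: totals per department for each store
# over the whole period (here, the sample week).
weekly_totals = conn.execute(
    """
    SELECT store, department, SUM(amount)
    FROM receipt
    GROUP BY store, department
    ORDER BY store, department
    """
).fetchall()

print(daily_totals)
print(weekly_totals)
```

The same rows serve both managers; only the grouping changes, which is why a system that lets each user choose the level of summary is so useful.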

Information that is meaningful to one person can be too detailed for another person. A manager's notion of information can change quickly, however. If a problem is identified, a manager might request finer levels of detail to diagnose the problem's cause. Thus what was previously data suddenly becomes information, because it helps solve the problem. When the problem is solved, the information reverts to data. Managers therefore need information systems that let them customize the processing of data so that they always get information; as their needs change, they must be able to adjust the level of detail in the reports they receive. Here is another example for a clearer understanding:

When we are assembling a jigsaw puzzle, we can think of each small piece as data. After we arrange the pieces into the complete picture, we can think of that picture as information, since it is what we want. Put simply, to obtain information, the collected data must be processed into a useful form.

A database is a collection of information that is organized so that it can be accessed, managed, and updated. One way to classify databases is by the type of content they hold, such as full text, numbers, or images. We can think of a database as an electronic shelf on which computer files are stored. In common parlance, the term database refers to a collection of data that is managed by a DBMS (database management system). To describe a database system, let us look at this figure.

A database system consists of users, hardware, software, and data. For example, a client who wants to access the database is a user. The client must use an application program on his or her computer to access the database. The DBMS serves as the interface between the client's application software and the database itself, and it performs the required actions on the database.

The DBMS is expected to:

Allow users to create new databases and specify their schema (logical structure of the data), using a specialized language called a data-definition language.

Give users the ability to query the data (a query is database lingo for a question about the data) and modify the data, using an appropriate language, often called a query language or data-manipulation language.

Support the storage of very large amounts of data, many gigabytes or more, over a long period of time, keeping it secure from accident or unauthorized use and allowing efficient access to the data for queries and database modifications.

Control access to data from many users at once, without allowing the actions of one user to affect other users and without allowing simultaneous accesses to corrupt the data accidentally.
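The first two expectations can be seen in a few lines of SQL. The sketch below uses SQLite through Python's built-in sqlite3 module, chosen here only because it needs no server; the member table, its columns, and the sample rows are hypothetical. It creates a schema with data-definition statements, then modifies and queries the data with data-manipulation statements:

```python
import sqlite3

# Hypothetical example: an in-memory database holding a table of members.
conn = sqlite3.connect(":memory:")

# Data-definition language (DDL): create a table and declare its schema.
conn.execute(
    """
    CREATE TABLE member (
        member_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        phone     TEXT
    )
    """
)

# Data-manipulation language (DML): add and modify data.
conn.executemany(
    "INSERT INTO member (member_id, name, phone) VALUES (?, ?, ?)",
    [(1, "Alice", "555-0100"), (2, "Bob", "555-0101")],
)
conn.execute("UPDATE member SET phone = ? WHERE member_id = ?", ("555-0199", 2))
conn.commit()

# A query, i.e. a question about the data: "What is Bob's phone number?"
bob_phone = conn.execute(
    "SELECT phone FROM member WHERE name = ?", ("Bob",)
).fetchone()[0]
print(bob_phone)  # 555-0199
```

The application never reads or writes a file directly; it sends statements to the DBMS, which decides how the data are stored and retrieved.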

The capabilities that a DBMS provides the user are:

Persistent storage. Like a file system, a DBMS supports the storage of very large amounts of data that exist independently of any processes that are using the data. However, the DBMS goes far beyond the file system in providing flexibility, such as data structures that support efficient access to very large amounts of data.

Programming interface. A DBMS allows the user or an application program to access and modify data through a powerful query language. Again, the advantage of a DBMS over a file system is the flexibility to manipulate stored data in much more complex ways than the reading and writing of files.

Transaction management. A DBMS supports concurrent access to data, especially simultaneous access by many distinct processes (called transactions) at once. To avoid some of the undesirable consequences of simultaneous access, the DBMS supports isolation, the appearance that transactions execute one-at-a-time, and atomicity, the requirement that transactions execute either completely or not at all. A DBMS also supports durability, the ability to recover from failures or errors of many types.
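Atomicity can be demonstrated with the same SQLite setup. In this sketch the account table, the transfer helper, and the amounts are hypothetical. A transfer is written as one transaction, so the credit and the debit either both take effect or neither does. The second transfer violates a CHECK constraint, so SQLite rejects the debit, and the sqlite3 connection, used as a context manager, rolls back the credit that had already been applied:

```python
import sqlite3

# Hypothetical accounts; the CHECK constraint forbids negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Hypothetical helper: move `amount` from src to dst as one transaction.

    Used as a context manager, a sqlite3 connection commits when the block
    ends normally and rolls back (then re-raises) if an exception escapes.
    """
    with conn:
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        # If this debit violates the CHECK constraint, the credit above is undone too.
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
        )


transfer(conn, "alice", "bob", 30)  # succeeds: alice 70, bob 80

try:
    transfer(conn, "bob", "alice", 500)  # would drive bob's balance below zero
except sqlite3.IntegrityError:
    print("transfer rejected and rolled back")

balances = dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))
print(balances)  # {'alice': 70, 'bob': 80}
```

Without the transaction, the failed transfer would have left alice with 570 while bob stayed at 80, creating money out of nothing.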

The reasons we need databases:

Compactness

Speed

Less drudgery

Currency

Reducing repetitive data

Consistency

Independent data

Shared data

Security

As an example, consider the popular search engine http://www.google.com, which also uses a large database to store the contents of websites. From a user's point of view, not every stored website is important. But when a user searches for a certain topic with that search engine, the results returned can be regarded as information, since they are useful to the user.

Next, we consider the simple relational database model of tables with tuples (rows) and attributes (columns). A popular way to design a database is with entity-relationship (ER) diagrams, and we look at a sample diagram. There are three typical implementation models of databases: hierarchical, network, and relational. Each is based on the notion of data stored as a set of records (imagine a set of file cards, for example). Hierarchical (e.g., IMS) and network (e.g., IDMS) models are based on traversing data links to process a database; they are typically used for large mainframe systems and are not considered further here. We focus on relational database management systems. They have become popular, perhaps largely due to their simple data model:

Data is presented as a collection of relations

Each relation is depicted as a table

Columns are attributes

Rows ("tuples") represent entities

Every table has a set of attributes that, taken together as a "key" (technically, a "superkey"), uniquely identify each entity

For example, a company might have an Employee table with a row for each employee. What attributes might be interesting? This, of course, depends on the application and use the data will be put to, and is determined at database design time. In our example, we might have a payroll application and need salary and mailing address information.
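Here is a minimal sketch of such an Employee table, again in SQLite via Python's sqlite3 module. The column names and sample rows are hypothetical. The emp_id column is declared as the key, so the DBMS refuses a second row with the same key value, and that refusal is what lets the key uniquely identify each employee:

```python
import sqlite3

# Hypothetical Employee table for a payroll application.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        emp_id          INTEGER PRIMARY KEY,
        name            TEXT NOT NULL,
        salary          INTEGER NOT NULL,
        mailing_address TEXT NOT NULL
    )
    """
)
conn.execute(
    "INSERT INTO employee VALUES (?, ?, ?, ?)", (1, "Elaine", 52000, "12 Oak St")
)

# emp_id is the key: a second row with the same emp_id is rejected,
# so each key value identifies at most one employee.
try:
    conn.execute(
        "INSERT INTO employee VALUES (?, ?, ?, ?)", (1, "Bill", 48000, "9 Elm Ave")
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(duplicate_rejected, row_count)  # True 1
```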

As a side note, the notion of a view can be useful. Imagine that a company maintains a database of its employees, with many attributes such as age, salary, emergency contacts, and appraisal.

Different applications serving different users may need to look at the database in different ways. For example, the company may need to make demographic data available to a government agency. Only some of the attributes need to be supplied, and others ought not to be, in order to protect privacy. Different views can be provided into the same data; in a relational database management system, a view can be seen as yet another table.
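In SQL, a view is defined by a query. The sketch below uses SQLite via Python's sqlite3 module, and the table, column, and view names are hypothetical. It stores full employee records but exposes only the age column through a view, which can then be queried like any other table:

```python
import sqlite3

# Hypothetical employee records with some sensitive attributes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY, name TEXT, age INTEGER,"
    " salary INTEGER, appraisal TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
    [(1, "Elaine", 34, 52000, "excellent"), (2, "Bill", 45, 48000, "good")],
)

# The view exposes only the demographic attribute (age);
# names, salaries, and appraisals are not part of it.
conn.execute("CREATE VIEW employee_demographics AS SELECT age FROM employee")

# The view is queried exactly like a table.
cur = conn.execute("SELECT * FROM employee_demographics ORDER BY age")
columns = [d[0] for d in cur.description]
ages = [row[0] for row in cur.fetchall()]
print(columns, ages)  # ['age'] [34, 45]
```

SQLite itself has no user accounts. In a multi-user DBMS, an outside party would be granted access to the view rather than to the underlying table.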

An entity is some object with a real or conceptual existence in the world, for example "tofu", "Advanced Java Class", "Folger Museum", "Elaine", or "company". An attribute is a property of an entity, for example "address", "size", "mother", or "age". As mentioned above, a relational column is an attribute.

A relationship defines roles in which entities work together, for example "Bill WORKS-FOR Motorola" or "jbs TEACHES advanced-java". Relational database management systems represent relationships as tables.
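Here is a sketch of representing a relationship as a table, with hypothetical table and column names. The WORKS-FOR relationship gets its own table holding pairs of person and company keys, and a join through that table recovers the fact "Bill WORKS-FOR Motorola":

```python
import sqlite3

# Hypothetical entity tables plus a table for the WORKS-FOR relationship,
# whose rows pair the keys of the participating entities.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person  (person_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE company (company_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE works_for (
        person_id  INTEGER REFERENCES person(person_id),
        company_id INTEGER REFERENCES company(company_id),
        PRIMARY KEY (person_id, company_id)
    );
    INSERT INTO person    VALUES (1, 'Bill');
    INSERT INTO company   VALUES (10, 'Motorola');
    INSERT INTO works_for VALUES (1, 10);
    """
)

# Joining through the relationship table recovers "Bill WORKS-FOR Motorola".
pairs = conn.execute(
    """
    SELECT p.name, c.name
    FROM works_for w
    JOIN person  p ON p.person_id  = w.person_id
    JOIN company c ON c.company_id = w.company_id
    """
).fetchall()
print(pairs)  # [('Bill', 'Motorola')]
```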

A side note for those already familiar with normalizing databases: ER design has been shown (Eugene Wong) to give relations in third normal form. Also, ER diagrams can be mapped not just to relational database management systems, but also to the network and hierarchical models.

A relational database management system should be able to query data, update data, and control data. To perform these functions, there must be a language for describing what the client wants, and that language is SQL. SQL is both a data definition language (DDL) and a data manipulation language (DML).

As a DDL, it allows a database administrator or database designer to define tables, create views, and so on. As a DML, it allows an end user to retrieve information from tables. It came from an IBM Research project entitled "SEQUEL", whose intent was to create a structured, English-like query language to interface to the early System R database system. Along with QUEL, SQL was one of the first high-level declarative database languages. There are two ways to use SQL: interactively, by entering SQL statements directly, or embedded in a program written in a high-level language such as C++ or FORTRAN.

Commonly used database management systems include Oracle, Informix, Sybase, Microsoft Access, PostgreSQL, and MySQL.

Finally, let us see what data mining is. Organizations are accumulating vast volumes of data because technology has made it easier and cheaper to collect data. The world's data are estimated to be doubling every 20 months, and many large companies now routinely manage terabytes (10^12 bytes) of data. The growth of data has thus far outpaced the human ability to analyze the information inside these massive data collections. We are drowning in data, but starving for knowledge! That is why we need data mining.
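To make that growth rate concrete, here is a worked calculation. It assumes, as a simplification, that the quoted 20-month doubling period holds steadily. The notation V_0 (today's data volume) and V(t) (the volume after t months) is introduced here only for illustration:

```latex
\begin{aligned}
V(t)  &= V_0 \cdot 2^{t/20} \qquad (t \text{ in months}) \\
V(60) &= V_0 \cdot 2^{60/20} = V_0 \cdot 2^{3} = 8\,V_0
\end{aligned}
```

Under that assumption, five years (60 months) multiplies the stored volume by eight, while the number of analysts does not grow nearly as fast.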

Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.
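The report does not name particular mining algorithms. As one classic illustration, chosen here and not taken from the sources, the sketch below finds pairs of items that are frequently bought together, which is the first step of association-rule mining. The baskets and the support threshold are hypothetical:

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction data: each basket is the set of items in one purchase.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

MIN_SUPPORT = 3  # hypothetical threshold: a pair must appear in at least 3 baskets

# Count how many baskets contain each unordered pair of items
# (sorting makes ("beer", "bread") and ("bread", "beer") the same key).
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

# Keep only the pairs that occur often enough to count as a pattern.
frequent_pairs = {pair: n for pair, n in pair_counts.items() if n >= MIN_SUPPORT}
print(frequent_pairs)
```

On real data the same counting is run over millions of transactions, and candidate pairs are pruned level by level rather than enumerated exhaustively.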

Data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently, generated technologies that allow users to navigate through their data in real time. Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery. Data mining is ready for application in the business community because it is supported by three technologies that are now sufficiently mature:

Massive data collection

Powerful multiprocessor computers

Data mining algorithms

