What is a database?
shared file containing integrated data with controlled redundancy
Often implemented as a group of related tables (examples on pages 4-5)
Relationships between tables often implemented as other tables.
Database:
A collection of tables as in the previous slide
Database management system (DBMS):
Database plus tools to process requests, enforce integrity constraints, provide security, analyze usage, optimize access, etc.
Database application.
Software that accesses data from a database.
Relation:
2-dimensional table having the following properties
Table entries are single valued or atomic.
This means that an entry does not consist of a structure more complex than strings, dates, etc.
For example, a column type cannot be another table or relation.
entries in any one column are of the same type
Each column, also called attribute, has a name order of columns & rows is unimportant no two rows are identical
NOTE: The purist may disagree with this definition as he or she will see a table as an implementation of a relation.
A relation is more of an abstract concept but is usually implemented using a table.
Some do use the terms interchangeably.
reduce redundancy and inconsistency. e.g. most student information from the tables on page 5 is not replicated.
One fact is in one place.
Data is shared
Security is applied centrally
language independent (COBOL, VB,
C++, java, C#, etc)
Multiple applications
Easier to maintain integrity data independence allow data access without knowledge of its internal organization and structure
This is an important decision, and based on a set of rules known as normalization, which we will cover later.
However, the figure on page 18 illustrates a simple example related to an important issue.
1-8
Used to extract information from one or more tables.
Can specify what you want without specifying how to get it.
Example on p. 9
Can be standalone or embedded in application software.
General format
Select stuff
From one or more tables
Where conditions
Ranges from nearly trivial to fairly complex logic
metadata (Sysfiles) - also data dictionary.
Description of all tables in the database.
Example on p. 12.
More of a concern for a Database
Administrator (DBA).
Know what it is - but we will not focus on it.
Database is stored on a server
Application software often runs on a client using languages such as C# and others.
Typically written by application programmers.
Stored Procedure.
Procedures stored on a server.
Typically written by DB people.
Can be invoked by client applications.
Can be used for common activities used by multiple users.
Some DBAs may limit database access except through certain stored procedures.
It gives them more control.
Trigger.
A special type of procedure that is invoked automatically when a certain action occurs.
Can be used to make sure needed data elements are updated due to user actions.
Example: A student adds a class and a trigger is activated to update tuition & fees.
The app that adds the student does not know about the trigger but the DBMS knows.
Some terms:
Entity-Relationship (E-R) Diagrams.
Entities are somewhat like the classes you’ve designed in previous courses.
Relationships define how entities are related to one another.
Together they must reflect the reality as it is understood.
Design phase:
Design entities, relationships, and constraints consistent with perceived reality.
Test phase:
Create tables, stored procedures, triggers, forms, reports, etc. consistent with the E-R diagram and test.
Implementation phase:
Put into production
Hierarchical model
IMS (Information Management System)
Developed largely by IBM
Required all data be organized as a hierarchy
Tended to be awkward since not all realities are hierarchical in nature
network model (CODASYL-
Conference on Data Systems
Languages)
Data organized into complex graph
(network) data structures.
Application programs reflected the actual data structures.
Changes in design potentially affect ALL applications – costly!!
Relational Model (dominant form today)
Object Model (not a commercial success).
See table on p. 21 for a general history – also the prose on p. 23.
We will NOT cover web-based databases/services since we already have a course for that.
Edgar Codd’s landmark paper in CACM
(Communications of the Association of
Computing Machinery) A relational Model of data for large shared databases in 1970.
Codd, a mathematician working for IBM in
San Jose CA, applied concepts of relational algebra to the problem of a
“stored data bank”.
Paved the way for the development of the relational database.
Mapping objects to relational databases.
E.g. how does an object oriented program access non-object-oriented data in a relational database?
http://www.agiledata.org/essays/mapp ingObjects.html
We will see how this works when we discuss ADO.NET
We will not focus on complex data structures but there are a few things you must be aware of.
This appendix is online in a zipped file at http://wps.prenhall.com/bp_kroenke_datab ase_11/127/32761/8386898.cw/index.html
Contains concentric magnetic tracks on each surface.
Each track is divided into sectors .
Disk head moves radially inward and outward while the disk rotates .
Disks are SLOW and a potential bottleneck
Need to minimize disk head movement
(seek time) for optimal performance.
Rotational delays (time for sector to rotate past the head) also a factor
Linked List:
Database records, disk sectors, or clusters of sectors are maintained in a linear linked list.
Simple but can be very slow, especially for finding a record based on a key or index value.
i.e. find an employee record given the employee’s ID.
See pages D-3 and D-4 of the appendix.
Indexes: list of field values that identify records along with the location of that record.
List can be linear or some other structure.
e.g. a textbook often has a linear index at the end
Searching the index is a lot faster than searching through all of the content.
However, for many millions of records a linear index can still be inefficient.
B-tree hierarchical arrangement of index values.
Provides quick access & order to data.
See pages D-5 and D-6 of the appendix.
Typically each level would correspond to a sector or cluster.
Might have millions or records accessible via only a few index layers.
[ http://technet.microsoft.com/enus/library/cc917672.aspx
Hash function
Index value (sometimes called a key) fed into a hash function which specifies where to store the entire record.
To locate a record, given its key, apply the hash function to the key value and the location is calculated.
R is a record
R.k
is the value of R’s key field
H is a hash function that calculates a location
H (R.k)
Hash table (database records)
R is stored here
Time to find a record can be independent of the number of records.
Assumes a good hash function and sufficient space.
Each are nontrivial and the subject of a course in data structures or algorithms.
Dense
1 entry for each record
Useful if records are stored in random order
Non-dense
1 index for a group of records (say 1 on a page)
Useful if records are maintained in order
Internal physical storage conceptual, sometimes the DBA view a collection of Base Tables
Base table is a table with a direct underlying storage structure.
Created from the E-R diagram described by a data dictionary or metadata .
External, sometimes user view
Collection of tables defined for a particular user using SQL.
They are called logical tables , virtual tables, or derived tables , or just view
The data in a view is presented to the user as a single table though it is actually derived from one or more base tables specified in its definition.
These logical tables do not exist in the same sense as a base table – there is
NO direct underlying storage structure.
A view can simplify the user’s view of the database and provide security by hiding certain parts of a base table.
Examples:
Table consisting of student with a given major or GPA value) the single table in Fig 1-20 (page 18) could be a view derived by joining the two other base tables in the same figure.
DBA user user user
DBA (Database Administrator) defines conceptual schema, internal schema, user liaison, security and integrity, backup/recovery, performance
Accessing SQL Server:
Start
All Programs
Microsoft SQL
Server 2008 R2
SQL Server
Management Studio .
You may see a window indicating the
MS SQL Server Management Studio is configuring for first time use.
Just wait and be patient.
In the Connect to Server window, select
Database Engine for the Server Type
ICSD for the server name
Windows Authentication for Authentication
(These should all be defaults).
Then press the connect button.
In the Object Explorer pane (left side of screen), expand the Databases folder.
If you don’t see an Object Explorer Pane, select it from the View menu.
There are four databases that start with
“CS451” you should have read-only access to each one. I will use these during the semester.
To see the tables in one of them, expand the database folder and the subsequent
Tables folder that appear. Right click on one of the table names and select Select
Top 1000 Rows .
Test this and let me know of any access problems.