Data & Databases Basic Data Fundamentals Data vs Information Data: facts Computer systems store data. Information: facts organized for a specific application Information is the output of a computer system Data Atomic Data Data in units that cannot be subdivided • First Name, Price, etc. Binary Large Objects (BLOB) Images and figures • Pictures, Cartoons Objects Data and associated processes • Spreadsheet tables, graphics components Atomic Data Most current systems store atomic data. Atomic data is typically made up of characters (letters and numbers) stored in coded groups of 8 bits called bytes. ASCII American Standard Code for Information Interchange EBCDIC Extended Binary Coded Decimal Interchange Code Atomic Data Because atomic variables are made up of a relatively small number of codes, computers can interpret and manipulate them Binary Large Objects These are typically bit mapped pictures or images (such as FAXs) that are stored. Parts of these pictures have meaning (unlike atomic data). Digital pictures, video clips, stored images and signatures are BLOBs. PIXEL blobs are usually stored as pixel codes that represent a point on a video screen Bit Maps 1028 800 1028 x 640 = 657,920 pixels Black & White = 1 bit per pixel 256 tone Color = 8 bits per pixel High Resolution Image Processing Images cannot be processed easily (other than displayed and scaled) by normal technology and usually require atttached atomic indexes. Text images can be converted although the software is not fully reliable. Objects Objects combine data and the processes that manipulate it. An example of an object is a spreadsheet table imported with the manipulation rules. Encapsulation Formal Interface for Messages Inheritance Object Processing Object databases are new technologies with immature design methodologies. They promise extensive code reuse and easy system development if the technology proves itself. Data, Metadata and Business Rules Metadata is data about data, the structure descriptions of the stored files. Metadata includes location, format, field name, data description, etc. Business rules include editing and processing rules as well as automatic procedures that must be processed for accurate data. Data Access Technology Flat File structures. Data in one location; metadata and business rules in programs Database Management Systems Data and metadata together; business rules in programs Object Oriented systems Data, metadata and business rules together Database Approach Program 1 Meta-Data Program 2 Meta-Data Program 3 Meta-Data Program 1 Program 2 Program 3 FILE SYSTEM Data DATABASE MetaData Data Distribution Strategies for Databases Centralized Data and Processing: Dumb terminal with "screen scraping". Intelligent Terminal: Data and processing centralized; data preparation and display on remote devices. Distributed Logic: Data storage distributed; processed at the optimal location. A version of parallel processing. Client Server: Data (usually departmental) maintained on a server. Subsetting occurs on the server, processing on client machines. Distributed Database: Data distributed among different locations; processing access data wherever it is located. Data may be replicated or partitioned. Data Trends Centralized and Client/Server systems Multi Layered architectures Objects and encapsulation Data driven organizations Functions of a DBMS C.J. Date Indexing Views Security Integrity Concurrency Backup/Recovery Design Documentation Update/Query Views Custom representations of a database that correspond to the needs of a class of users. Stored SELECT statements. Views Views Permit Maintaining a constant user interface Restricting access to specified attributes Specifying user rights Views 3 Schema Architecture LOGICAL CONCEPTUAL PHYSICAL User Views (Views or Queries) Database administrators model for the data (E-R diagrams) Actual data placement and structure (SQL statements) Security Components that limit access or actions to limit potential damage to data. Security Approaches Views limit access and actions Authorization Rules identify users and restrict actions User Defined Procedures in addition to database security functions Encryption encode stored data Authentication positively identify users Authorization Rules Some DBMS products authorize actions based on specific records and functional descriptions. However, most DBMS’s limit actions on tables to one of: • Read: view but not change • Insert: read and add records • Update: read, insert and change records • Alter/Delete: read, insert, update and delete records, change table structure Authorization Rules Subject Object Action Constraint Sales Dept Cust Insert Credit < $5000 Program Ar4 Order Modify None Terminal 12 Cust Modify Balance Due Order Trans Cust Read None Integrity Components that preserve the relationship among different related records in the database Integrity The relationship among records in the database Referential Integrity Non Key Integrity Derived Conditions Concurrency Preventing two users from interfering with each other when they use the same information Concurrency Lockout Restricting access to users who could be misled by partial transactions Versioning Making trial updates on versions of the database and denying one if there is a data conflict. Locks 00 01 02 03 Master Student Fred Anthony Steve Ivan Grade Program 1 locks record <exclusive>. No other program can read the record. No program can have an active lock. Program 2 locks record <shared> Other programs can read, but not change record. No program can have an exclusive lock. Versioning Version 1 Time 1 Version 2 Version 3 Time 2 Time 3 Commits version 3 only after changes to versions 1 and 2 have been rolled back. Backup and Recovery Processes to confirm and repeat transactions so that database can be restored to a valid state after a problem. Backup and Recovery Backup Copies • Master • Transaction Log Journalization • Forward Log • Backward Log Checkpoints DBMS Logs 00 01 02 03 Master Student Fred Anthony Steve Ivan Grade Transaction Insert Li with grade A Change Fred’s grade to A Recover from Backup Backup + Transaction = Recovered Database Slow May give different answers from original DBMS Logs Transaction Ins Li with grade A Chg Fred’s grade to A 00 01 02 03 Master Student Fred Anthony Steve Li Grade A A Forward Log Student Grade 03 Li A 00 Fred A Backward Log Student Grade 03 n/p 00 Fred DBMS Logs Transaction Ins Li with grade A Chg Fred’s grade to A 10:00 Checkpoint 00 01 02 03 Master Student Fred Anthony Steve Li Grade A A Forward Log Student Grade 03 Li A 00 Fred A Chkpt Backward Log Student Grade 03 n/p 00 Fred Chkpt DBMS Logs Transaction Ins Li with grade A Chg Fred’s grade to A 10:00 Checkpoint Chg Steve grade to B 00 01 02 03 Master Student Fred Anthony Steve Li Grade A B A 03 00 02 03 00 02 Forward Log Student Grade Li A Fred A Chkpt Steve B Backward Log Student Grade n/p Fred Chkpt Steve Recover to Checkpoint Using Logs Contaminated Database + - Backward Log Recent Transactions = = Correct at Checkpoint Recovered Database Transaction Processing A set of computer operations required to process a single unit of work. A transaction must conclude with the database in a valid state whether the transaction terminates correctly or abnormally Transaction Boundaries Set Boundary • Obtain Locks • Execute Code Modules • Evaluate Correctness Commit or Rollback • Release Locks Transaction Boundaries Premiere Products Example SALESREP CUSTOMER ORDER PRODUCT ORDER-PRODUCT Place an order for a new customer with a 1500 credit limit Transaction Boundaries Premiere Products Example SALESREP CUSTOMER ORDER PRODUCT ORDER-PRODUCT •Insert CUSTOMER Record •Update CUSTOMER with SALESREP Foreign Key •Insert ORDER Record •Insert ORDER-PRODUCT with Foreign Keys •Update ProductOnHand in PRODUCT •Check Credit Limit D A T A