CpSc 3220 File and Database Processing
Lecture 1
Course Overview
File Storage Basics
Course Introduction
• Syllabus and tentative schedule are on Blackboard
• The course covers two main areas:
File Processing
Database Processing
Course Outcomes
1. Implement file reading and writing programs using PHP.
2. Identify file access schemes, including:
sequential file access
direct file access
indexed sequential file access.
3. Describe file-sorting and file-searching techniques.
4. Describe data compression and encryption techniques.
5. Design a relational database using E-R modeling techniques.
6. Build a relational database.
7. Write database queries using SQL.
8. Implement a web-based relational database using MySQL.
File Processing Concepts
• A study of how data is stored and maintained on secondary storage
• Different ways of categorizing files
Data files - numeric and character data
Text
Binary
Graphics / Audio / Video
Unstructured / Structured
• We will focus on structured data files
File Structures




File Structures are persistent data structures
Files composed of records
Records composed of fields
Files can be viewed as tables
File -> Table
Record -> Row
Field -> Column
Instructor File
ID     Name        DeptName   Salary
10101  Srinivasan  Comp.Sci.  65000
12121  Wu          Finance    90000
15151  Mozart      Music      40000
22222  Einstein    Physics    95000
32343  El Said     History    60000
33456  Gold        Physics    87000
45565  Katz        Comp.Sci.  75000
58583  Califieri   History    62000
76543  Singh       Finance    80000
76766  Crick       Biology    72000
83821  Brandt      Comp.Sci.  92000
98345  Kim         Elec.Eng.  80000
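The File -> Table, Record -> Row, Field -> Column mapping can be sketched in a few lines of Python. This is a minimal illustration only: the comma-separated layout and the names `INSTRUCTOR_DATA` and `parse_records` are assumptions made for the example, not a format the course prescribes.

```python
import csv
import io

# Hypothetical comma-separated layout of the first three Instructor File records.
INSTRUCTOR_DATA = """ID,Name,DeptName,Salary
10101,Srinivasan,Comp.Sci.,65000
12121,Wu,Finance,90000
15151,Mozart,Music,40000
"""

def parse_records(text):
    """Return a list of records; each record maps a field name to its value."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]

records = parse_records(INSTRUCTOR_DATA)
for record in records:
    # One record corresponds to one table row; each key is a column.
    print(record["ID"], record["Name"], record["DeptName"], record["Salary"])
```

Each line of the file becomes one record, and the header line supplies the field names. Note that every field is read as text; converting Salary to a number is left to the program.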
Department File
DeptName   Building  Budget
Biology    Watson    90000
Comp.Sci.  Taylor    100000
Elec.Eng.  Taylor    85000
Finance    Painter   120000
History    Painter   50000
Music      Packard   80000
Physics    Watson    70000
The CRUD paradigm
• Open the current version of a file
• Process it using the CRUD operations
Create records
Retrieve records
Update records
Delete records
• Output and close the new version of the file
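The open / process / output cycle can be sketched as follows. This is one possible arrangement under stated assumptions: records are stored as CSV, the file names `instructor_v1.csv` and `instructor_v2.csv` and all helper names are hypothetical, and the salary change is invented purely to show an update.

```python
import csv
import os
import tempfile

FIELDS = ["ID", "Name", "DeptName", "Salary"]

def load(path):
    """Open the current version of the file and read all records."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def save(path, records):
    """Output the new version of the file and close it."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(records)

def create(records, record):
    records.append(record)

def retrieve(records, rec_id):
    """Return the record with the given ID, or None if it is absent."""
    for record in records:
        if record["ID"] == rec_id:
            return record
    return None

def update(records, rec_id, **changes):
    record = retrieve(records, rec_id)
    if record is None:
        return False
    record.update(changes)
    return True

def delete(records, rec_id):
    record = retrieve(records, rec_id)
    if record is None:
        return False
    records.remove(record)
    return True

# Set up a small "current version" in a temporary directory (illustrative data).
workdir = tempfile.mkdtemp()
old_path = os.path.join(workdir, "instructor_v1.csv")
new_path = os.path.join(workdir, "instructor_v2.csv")
save(old_path, [
    {"ID": "10101", "Name": "Srinivasan", "DeptName": "Comp.Sci.", "Salary": "65000"},
    {"ID": "12121", "Name": "Wu", "DeptName": "Finance", "Salary": "90000"},
])

records = load(old_path)                      # open the current version
create(records, {"ID": "15151", "Name": "Mozart",
                 "DeptName": "Music", "Salary": "40000"})
update(records, "12121", Salary="91000")      # hypothetical new salary
delete(records, "10101")
save(new_path, records)                       # output the new version
result = load(new_path)
```

The old file is left untouched and the changes appear only in the new version, which mirrors the "current version in, new version out" flow of the slide.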
Aside: A Process for Generating Acronyms
• Step 1. Choose a group of words or phrases that identify your process and let their first letters become the acronym
Create records
Retrieve records
Update records
Delete records
• If that doesn’t give an acceptable acronym, go to step 2
Generating Acronyms: Step 2
• Re-order the words so that their first letters make a better acronym
Create records
Update records
Retrieve records
Delete records
• If that doesn’t work, go to step 3
Generating Acronyms: Step 3
• Find synonyms for one or more of the words so that their first letters will make a good acronym
• For example:
Update becomes Change records
Delete becomes Remove records
Retrieve becomes Access records
Create becomes Produce new records
• If that doesn’t work, go to step 4
Generating Acronyms: Step 4
• Give up and get back to serious work
Physical Storage Media




Speed
Cost
Reliability
Type
volatile storage
non-volatile storage
Physical Storage Media
• Cache – fastest and most costly form of storage; volatile
• Main memory – fast access (10s to 100s of nanoseconds); expensive; volatile
• Flash memory – moderately fast; cheap; non-volatile
• Magnetic disk – slow; cheap; non-volatile
• Optical storage – slower; cheaper; non-volatile
• Tape storage – slow access/fast transfer; cheap; non-volatile
Storage Hierarchy
• Primary storage: fastest media but volatile (cache, main memory).
• Secondary storage: non-volatile, moderately fast access time; also called on-line storage (flash memory, magnetic disks)
• Tertiary storage: non-volatile, slow access time; also called off-line storage (magnetic tape, optical storage)
Magnetic Hard Disk
Mechanism
Performance Measures
• Access time – the time from when a read or write request is issued to when data transfer begins.
Seek time – time to reposition the arm over the correct track; 4 to 10 milliseconds on typical disks
Rotational latency – time for the addressed sector to appear under the head; 4 to 11 milliseconds for a full rotation on typical disks (5400 to 15000 rpm), and half that on average
• Data-transfer rate – the rate at which data can be retrieved from or stored to the disk; 25 to 100 MB per second max rate
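The relationship between these measures can be worked through numerically. The sketch below assumes that average rotational latency is half of one full rotation, that 1 MB means 10^6 bytes, and that the time to read a block is seek time plus rotational latency plus transfer time; the function names are illustrative. At 5400 and 15000 rpm a full rotation takes about 11.1 ms and 4 ms, which is where the 4 to 11 millisecond range comes from.

```python
def full_rotation_ms(rpm):
    """Time for one complete revolution of the platter, in milliseconds."""
    return 60_000.0 / rpm

def average_rotational_latency_ms(rpm):
    """Assumes the target sector is, on average, half a rotation away."""
    return full_rotation_ms(rpm) / 2

def transfer_time_ms(num_bytes, rate_mb_per_s):
    """Time to move num_bytes at the given rate (1 MB taken as 10**6 bytes)."""
    return num_bytes / (rate_mb_per_s * 1_000_000) * 1000

def estimated_read_time_ms(seek_ms, rpm, num_bytes, rate_mb_per_s):
    """Seek time + average rotational latency + transfer time."""
    return (seek_ms
            + average_rotational_latency_ms(rpm)
            + transfer_time_ms(num_bytes, rate_mb_per_s))

for rpm in (5400, 15000):
    print(rpm, "rpm:", round(full_rotation_ms(rpm), 1), "ms per rotation")

# A 4096-byte read with a 4 ms seek on a 15000 rpm disk at 100 MB/s.
print(round(estimated_read_time_ms(4.0, 15000, 4096, 100.0), 3), "ms")
```

In this example the seek and rotational delay dominate: transferring the data itself takes only a small fraction of a millisecond.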
Disk-Block Access
• Block
a contiguous sequence of sectors from a single track
the smallest amount of data that can be accessed
sizes range from 512 bytes to several kilobytes
(Figure labels: inner track, outer track)
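Because a block is the smallest amount that can be accessed, reading even a few bytes means reading every block those bytes fall in. The following sketch shows the arithmetic; the 4096-byte block size and the function names are assumptions chosen for illustration.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes; real sizes vary

def block_number(byte_offset, block_size=BLOCK_SIZE):
    """Index of the block that contains the given byte offset."""
    return byte_offset // block_size

def blocks_touched(byte_offset, length, block_size=BLOCK_SIZE):
    """Range of block indexes that must be read to fetch `length` bytes."""
    if length <= 0:
        return range(0)
    first = byte_offset // block_size
    last = (byte_offset + length - 1) // block_size
    return range(first, last + 1)

# A 10-byte read that straddles a block boundary needs two whole blocks.
print(list(blocks_touched(4090, 10)))
```

A record that straddles a block boundary costs two block accesses, which is one reason record layout within blocks matters.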
Optimization of Disk Block Access
• Optimize block access time by organizing the
blocks to correspond to how data will be
accessed
• Store related information on the same or nearby
cylinders.
• Files may get fragmented over time
• Systems have utilities to defragment the file
system, in order to speed up file access
The architecture of a web application
(Figure: Clients, The Internet, Web Server, Database Server, E-mail Server)
© 2010, Mike Murach & Associates, Inc. Murach's PHP and MySQL, C1, Slide 20
The architecture of the Internet
(Figure: a network of LANs, WANs, and IXPs)
How static web pages are processed
(Figure: the Web Browser sends an HTTP request to the Web Server; the Web Server returns the HTML file in an HTTP response)
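The request/response cycle for a static page can be imitated with a small function. This is a simplified sketch, not a real web server: the document root is an in-memory dictionary, and the names `DOC_ROOT` and `handle_request` are hypothetical.

```python
# Hypothetical document root: maps request paths to HTML file contents.
DOC_ROOT = {
    "/index.html": "<html><body>Hello</body></html>",
}

def handle_request(request_line, doc_root=DOC_ROOT):
    """Turn an HTTP request line into a complete HTTP response string."""
    method, path, _version = request_line.split()
    if method != "GET":
        status = "405 Method Not Allowed"
        body = "<html><body>Method Not Allowed</body></html>"
    elif path not in doc_root:
        status = "404 Not Found"
        body = "<html><body>Not Found</body></html>"
    else:
        status = "200 OK"
        body = doc_root[path]  # the stored HTML file is returned unchanged
    headers = (
        f"HTTP/1.1 {status}\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body.encode('utf-8'))}\r\n"
        "\r\n"
    )
    return headers + body

response = handle_request("GET /index.html HTTP/1.1")
print(response.split("\r\n")[0])
```

The server does no processing of the page; it only locates the file and sends it back, which is what makes the page static.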
Summary





File processing allows persistent data structures
Most languages include libraries for file handling
File processing is a large and complicated subject
File storage devices can be grouped in three classes
Magnetic disks are the most common storage device
for file processing
For Next Time
• Read Chapter 1 of the PHP and MySQL book