CpSc 3220 File and Database Processing
Lecture 1
Course Overview
File Storage Basics

Course Introduction
• Syllabus and tentative schedule are on Blackboard
• The course covers two main areas:
    • File Processing
    • Database Processing

Course Outcomes
1. Implement file reading and writing programs using PHP.
2. Identify file access schemes, including:
    • sequential file access
    • direct file access
    • indexed sequential file access
3. Describe file-sorting and file-searching techniques.
4. Describe data compression and encryption techniques.
5. Design a relational database using E-R modeling techniques.
6. Build a relational database.
7. Write database queries using SQL.
8. Implement a web-based relational database using MySQL.

File Processing Concepts
• A study of how data is stored and maintained on secondary storage
• Different ways of categorizing files
    • Data files - numeric and character data
    • Text
    • Binary
    • Graphics / Audio / Video
    • Unstructured / Structured
• We will focus on structured data files

File Structures
• File structures are persistent data structures
• Files are composed of records
• Records are composed of fields
• Files can be viewed as tables
    • File -> Table
    • Record -> Row
    • Field -> Column

Instructor File
ID      Name         DeptName    Salary
10101   Srinivasan   Comp.Sci.   65000
12121   Wu           Finance     90000
15151   Mozart       Music       40000
22222   Einstein     Physics     95000
32343   El Said      History     60000
33456   Gold         Physics     87000
45565   Katz         Comp.Sci.   75000
58583   Califieri    History     62000
76543   Singh        Finance     80000
76766   Crick        Biology     72000
83821   Brandt       Comp.Sci.   92000
98345   Kim          Elec.Eng.   80000

Department File
DeptName    Building   Budget
Biology     Watson     90000
Comp.Sci.   Taylor     100000
Elec.Eng.   Taylor     85000
Finance     Painter    120000
History     Painter    50000
Music       Packard    80000
Physics     Watson     70000

The CRUD paradigm
• Open the current version of a file
• Process it using the CRUD operations
    • Create records
    • Retrieve records
    • Update records
    • Delete records
• Output and close the new version of the file

Aside: A Process for Generating Acronyms
Step 1.
Choose a group of words or phrases that identify your process and let their first letters become the acronym
• Create records
• Retrieve records
• Update records
• Delete records
If that doesn’t give an acceptable acronym, go to step 2

Generating Acronyms Step 2
Re-order the words so that their first letters make a better acronym
• Create records
• Update records
• Retrieve records
• Delete records
If that doesn’t work, go to step 3

Generating Acronyms Step 3
Find synonyms for one or more of the words so that their first letters will make a good acronym
For example:
• Update becomes Change records
• Delete becomes Remove records
• Retrieve becomes Access records
• Create becomes Produce new records
If that doesn’t work, go to step 4

Generating Acronyms Step 4
Give up and get back to serious work

Physical Storage Media
• Speed
• Cost
• Reliability
• Type
    • volatile storage
    • non-volatile storage

Physical Storage Media
• Cache – fastest and most costly form of storage; volatile
• Main memory – fast access (10s to 100s of nanoseconds); expensive; volatile
• Flash memory – moderately fast; cheap; non-volatile
• Magnetic disk – slow; cheap; non-volatile
• Optical storage – slower; cheaper; non-volatile
• Tape storage – slow access/fast transfer; cheap; non-volatile

Storage Hierarchy
• Primary storage: fastest media but volatile (cache, main memory)
• Secondary storage: non-volatile, moderately fast access time; also called on-line storage (flash memory, magnetic disks)
• Tertiary storage: non-volatile, slow access time; also called off-line storage (magnetic tape, optical storage)

Magnetic Hard Disk Mechanism

Performance Measures
• Access time – the time it takes from when a read or write request is issued to when data transfer begins.
• Seek time – time to reposition the arm over the correct track; 4 to 10 milliseconds on typical disks
• Rotational latency – time for the addressed sector to appear under the head; 4 to 11 milliseconds per full rotation on typical disks (5400 to 15000 rpm), so the average latency is half of the rotation time
• Data-transfer rate – the rate at which data can be retrieved from or stored to the disk; 25 to 100 MB per second max rate

Disk-Block Access
• Block – a contiguous sequence of sectors from a single track
    • the smallest unit of data that can be read or written
    • sizes range from 512 bytes to several kilobytes
[Figure labels: Inner track, Outer track]

Optimization of Disk Block Access
• Optimize block access time by organizing the blocks to correspond to how data will be accessed
• Store related information on the same or nearby cylinders
• Files may get fragmented over time
• Systems have utilities to defragment the file system, in order to speed up file access

The architecture of a web application
[Figure: diagram with components labeled Client, The Internet, Web Server, Database Server, E-mail Server]
© 2010, Mike Murach & Associates, Inc. Murach's PHP and MySQL, C1 Slide 20

The architecture of the Internet
[Figure: diagram with networks labeled LAN, WAN, IXP]

How static web pages are processed
[Figure: the Web Browser sends an HTTP request to the Web Server; the Web Server returns an HTTP response with the HTML file]

Summary
• File processing allows persistent data structures
• Most languages include libraries for file handling
• File processing is a large and complicated subject
• File storage devices can be grouped in three classes
• Magnetic disks are the most common storage device for file processing

For Next Time
• Read Chapter 1 of PHP and MySQL book
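
Worked example: estimating disk block read time
The disk performance measures from this lecture (seek time, rotational latency, data-transfer rate) can be combined into a rough estimate of how long it takes to read one block. The sketch below is illustrative only. The function names, the 4096-byte block size, and the convention that 1 MB = 1,000,000 bytes are assumptions made for this example, not part of the lecture. Average rotational latency is taken as half of one full rotation.

```python
def average_access_ms(seek_ms: float, rpm: int) -> float:
    """Seek time plus average rotational latency, in milliseconds."""
    full_rotation_ms = 60_000 / rpm          # one full rotation, in ms
    return seek_ms + full_rotation_ms / 2    # average wait is half a rotation


def transfer_ms(block_bytes: int, rate_mb_per_s: float) -> float:
    """Time to transfer one block at the given rate (assumes 1 MB = 10**6 bytes)."""
    return block_bytes / (rate_mb_per_s * 1_000_000) * 1000


def read_block_ms(seek_ms: float, rpm: int,
                  block_bytes: int, rate_mb_per_s: float) -> float:
    """Rough total time to read one block: access time plus transfer time."""
    return average_access_ms(seek_ms, rpm) + transfer_ms(block_bytes, rate_mb_per_s)


if __name__ == "__main__":
    # Hypothetical fast disk: 4 ms seek, 15000 rpm, 100 MB/s
    print(round(read_block_ms(4, 15000, 4096, 100), 2), "ms")
    # Hypothetical slow disk: 10 ms seek, 5400 rpm, 25 MB/s
    print(round(read_block_ms(10, 5400, 4096, 25), 2), "ms")
```

With these assumed figures the estimate is roughly 6 ms for the fast disk and roughly 16 ms for the slow one. In both cases the transfer itself is a small fraction of a millisecond, so seek time and rotational latency dominate. This is why storing related blocks on the same or nearby cylinders matters.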