File Structures by Folk, Zoellick, and Ricarrdi Chap3. Secondary Storage and System Software File Structure SNU-OOPSLA Lab 1 Chapter Objectives Describe the organization of typical disk drives, including basic units of organization and their relationships Identity and describes the factors affecting disk access time, and describe methods for estimating access times and space requirements Describe magnetic tapes, identify some tape applications, and investigate the implications of block size on space requirements and transmission speeds Identify fundamental differences between media and criteria that can be used to match the right medium to an application Describe in general terms the events that occur when data is transmitted between a program and a secondary storage device Introduce concepts and techniques of buffer management Illustrate many of the concepts introduced in the chapter, especially system software concepts, in the context of UNIX File Structure SNU-OOPSLA Lab 2 Contents 3.1 Disks 3.2 Magnetic Tape 3.3 Disk versus Tape 3.4 Introduction to CD-ROM 3.5 Physical Organization of CD-ROM 3.6 CD-ROM Strengths and Weaknesses 3.7 Storage as a Hierarchy 3.8 A Journey of a Byte 3.9 Buffer Management 3.10 I/O in UNIX File Structure SNU-OOPSLA Lab 3 3.1 Disks Disks Random Access Device( the disk shape provide this feature while other geometric shapes not provide it) Disk device - DASD(Direct Access Storage Devices) Magnetic disk( store the binary data by using the two magnetize nodes N and S). hard disk - most common floppy disk - inexpensive, little capacity magnetic tape - serial access Iomega Zip (100 M byte) (The Zip drive is a medium-capacity removable floppy), Jaz (1 G byte) (The Jaz drives a removable hard disk storage system introduced by the Iomega company in 1995.) Nonmagnetic disk optical discs - increasingly important for secondary storage e.g. CD, DVD 4 Compact disc (CD) is a digital optical disc data storage format. DVD (sometimes explained as "digital video disc" or "digital versatile disc" is a digital optical disc storage format. Blu-ray Disc (BD) is a digital optical disc data storage format designed to supersede the DVD format, in that it is capable of storing high-definition video resolution) Blu-ray Discs contain 25 GB per layer. File Structure SNU-OOPSLA Lab 5 Electronic Disks Electronic disk is a data storage device functionally similar to a hard disk drive but using flash memory. Flash memory is an electronic nonvolatile computer storage medium that can be electrically erased and reprogrammed. File Structure SNU-OOPSLA Lab 6 3.1 Disks Organization of Disks(1) Components of Disk Tracks (Number of adjacent sectors at the same diameter) Sectors Disk pack the smallest addressable portion of a disk(smallest block that can be read or write) a collection of a lot of platters Cylinder (vertical collection of tracks) Arms are moving together so we can read or write vertical sector in parallel. Seeking (Finding the requested sectors that represent our file) Read/write (r/w )arm movement set the position of requested sectors. the slowest part of reading data from disk because it is 7 mechanical motion. 3.1 Disks Organization of Disks(2) Track Sector Gaps Surface of disk showing tracks and sectors File Structure SNU-OOPSLA Lab 8 3.1 Disks Organization of Disks(3) Spindle Platters Read/write heads Boom Schematic illustration of disk drive File Structure SNU-OOPSLA Lab 9 3.1 Disks Estimating Capacities(1) Cylinder --> Track --> Sector --> Byte Track capacity = # of sectors per track * bytes per sector Cylinder capacity = # of tracks per cylinder * track capacity Drive capacity = # of cylinders * cylinder capacity ** Inner track, Outer track? Same capacity? File Structure SNU-OOPSLA Lab 10 Inner track: the track more near to the disk center. Outer track: the track more far from the disk center. Is the size of the inner and outer tracks equal (same capacity)? The size of the inner outer tracks logically same (have the same number of sectors) but physically no they have different sizes so that we have gaps. File Structure SNU-OOPSLA Lab 11 Example: Hard disk with 5 platters each surface contains 20 tracks and each track contains 50 sectors of size 512 bytes; find the disk size? File Structure SNU-OOPSLA Lab 12 3.1 Disks Estimating Capacities(2) Ex) file with 20,000 fixed-length data records on 300MB disk # of bytes per sector = 512 # of sectors per track = 40 # of tracks per cylinder = 11 # of cylinders = 1,331 How many cylinders does the file require if each data record requires 256 bytes? File Structure SNU-OOPSLA Lab 13 3.1 Disks Organizing Tracks by Sector(1) Two ways to organize data on a disk by sector & by user-defined block Physical placement of sectors logical view and physical view adequate way to view a file logically, but always not good way to store sectors physically interleaving — — File Structure problems occur due to the gap between disk revolution speed and disk controller speed high performance disks (with high speed disk controller) now offer 1:1 interleaving SNU-OOPSLA Lab 14 3.1 Disks Organizing Tracks by Sector(2) • Interleaving If interleaving factor is 3 . . Disk . 4 File Structure . . 34 23 3 33 22 2 SNU-OOPSLA Lab 32 21 1 15 3.1 Disks Organizing Tracks by Sector(3) Clusters (for improving file access speed) One file consists of one or more clusters Cluster is the smallest unit of space One cluster is a fixed number of contiguous sectors. Contiguous sectors per cluster and Mapping table No additional seek (arm movement) for one cluster Very useful when access granularity is medium large cluster vs. small cluster FAT (File Allocation Table) contains a list of all the clusters in file each cluster entry --> physical location of the cluster File Structure SNU-OOPSLA Lab 16 3.1 Disks Organizing Tracks by Sector(3) • FAT(File Allocation Table) - A table containing mappings to the physical locations of all the clusters in all files on disk storage File allocation table(FAT) Cluster number 1 2 3 . . . File Structure Cluster location The part of the FAT pertaining to our file 2 . . . 1 SNU-OOPSLA Lab 17 3.1 Disks Organizing Tracks by Sector(4) Extents (further attempt for file access speed) Contiguous clusters per extent and Mapping table File may have several extents Very useful when accessing the whole file sequentially Many extents of a file increase amount of seeking file extents extent extent extent extent extent extent File Structure SNU-OOPSLA Lab 18 3.1 Disks Organizing Tracks by Sector(5) Fragmentation the space that goes unused within a cluster, block, track, or other unit of physical storage internal fragmentation: loss of space within a sector or cluster trade-offs in use of large cluster sizes ex) size of sector is 512 bytes and size of all records in file is 300 bytes we may enforce one record / sector we may allow span sectors File Structure SNU-OOPSLA Lab 19 3.1 Disks Organizing Tracks by Block(1) User-defined size of blocks can be fixed or variable No sector spanning ==> No internal fragmentation Blocking factor the number of records that are to be stored in each block In block-addressing, each block of data is accompanied by one or more subblocks containing extra information about the data block Note: the “block” has a different meaning in the context of the UNIX I/O system File Structure SNU-OOPSLA Lab 20 3.1 Disks Sector organization vs. Block organization Sector 1 Sector 2 Sector 3 Sector 4 Sector 5 111 111 111 111 111 222 223 334 444 555 (a) sector organization 111111 111111 111 222 22 333 4 444 555 (b) block organization File Structure SNU-OOPSLA Lab 21 3.1 Disks Organizing Tracks by Block(2) Subblocks count subblocks contain the num of bytes in the accompanying data block key subblocks contain key for last record in data block Count subblock Data subblock Count Data subblock subblock Count subblock Count Key Data subblock subblock subblock Count Key Data subblock subblock subblock File Structure SNU-OOPSLA Lab Key subblock 22 3.1 Disks Nondata Overhead(1) Stored during preformatting Sector-addressable disks (physical) sector/track address, condition, gaps, and synchronization marks of no concern to programmer Block-organized disks (user-defined) some of nondata for programmer more nondata provided with blocks than with sectors File Structure SNU-OOPSLA Lab 23 3.1 Disks Nondata Overhead(2) Ex) block-addressable disks 20,000 bytes/track subblock, interblock gap equivalent to 300 bytes/block If we want to store a file containing 100-byte records, how many records can be stored per track if blocking factor is 10 or 60 ? if blocking factor is 10, then 20000 / ( (10*100)+300) = 15 blocks ==> 150 records if blocking factor is 60, 180 records(=3 blocks) File Structure SNU-OOPSLA Lab 24 3.1 Disks Nondata Overhead(3) Block-organized scheme larger blocking factor (many records in a block) less amount of nondata block more efficient use of storage pros. flexibility -> savings in time & efficiency cons. internal fragmentation problem necessary for programmer and OS to do extra data organization sector interleaving cannot be used to improve performance File Structure SNU-OOPSLA Lab 25 3.1 Disks Cost of Disk Access Disk Access Time Seek time time to move arm to locate cylinder more costly in multi-user environment average seek time for a particular file operation Rotational delay Seek time + Rotational delay + Transfer time time taken for disk to rotate correct sector under read/write arm Transfer time (for actual data) number of bytes transferred number of bytes on a track File Structure SNU-OOPSLA Lab rotation time 26 Timing Computation Seagate Cheetah 9 Gbyte drive Minimum seek (track-to-track): 0.78msec Maximum seek: 19 msec Average seek: 8 msec Rotation delay: 3 msec Read one track: 6 msec Read one cluster: 0.28 msec Read 8704K size file sequentially access the whole file: 1.7 sec randomly access the whole file: 9.25 sec File Structure SNU-OOPSLA Lab 27 3.1 Disks Effect of Block Size on Performance (In UNIX) UC Berkeley CSRG team Many experiments in BSD UNIX System Trade-off between block size, fragmentation, access time Doubling block size to 1,024 bytes improves performance by more than a factor of 2 In 512-byte block size: the slow throughput & low internal fragmentation (6.9%) In 4,096-byte block: the fastest throughput & internal fragmentation (45.6%) ==> Invent the cluster concept! File Structure SNU-OOPSLA Lab 28 3.1 Disks Disk as Bottleneck Disk performance is increasing, but still slow! Even high-performance network is faster than disk (5 M/ sec HD is slower than 100M bps LAN) Disk bound jobs CPU and network must wait Solution techniques multiprogramming striping - parallel I/Os avoid accessing disk RAM disk, disk caches - locality File Structure SNU-OOPSLA Lab 29 3.2 Magnetic Tape Magnetic Tape A set of parallel tracks 9 tracks - parity bit Frame one-bit-wide slice of tape Interblock gaps permit File Structure stopping and starting SNU-OOPSLA Lab 30 3.2 Magnetic Tape Nine-track tape Track Frame 0 1 1 0 1 0 0 1 0 Gap File Structure Data block SNU-OOPSLA Lab Gap 31 3.2 Magnetic Tape Estimating Tape Length(1) There is an interblock gap for each data block Space requirement s s = n * ( b + g ) b is the physical length of a data block g is the length of an interblock gap n is the number of data blocks Tape density Tape speed Size of interblock gap File Structure SNU-OOPSLA Lab 32 3.2 Magnetic Tape Estimating Tape Length(2) Ex) one million 100-byte records 6,250 BPI tape 0.3 inches of interblock gap How much tape is needed? when blocking factor is between 1 and 50 Nominal recording density Effective recording density: num File Structure of byte per block / num of inches for block SNU-OOPSLA Lab 33 3.2 Magnetic Tape Estimating Data Transmission Times Factors of data transmission rate interblock gaps effective recording density nominal recording density speed of r/w head time to start/stop the tape File Structure SNU-OOPSLA Lab 34 3.3 Disk versus Tape Disk VS. Tape Disk Random access Immediate access Expensive seek in sequential processing Tape Sequential access Long-term storage No seek in sequential processing Decrease in cost of disk and RAM More RAM space is available in I/O buffers, so disk I/O decreases Tertiary storage for backup: CD-ROM, tape ... File Structure SNU-OOPSLA Lab 35 3.4 Introduction to CD-ROM Introduction to CD-ROM CD-ROM: Compact Disc Read-Only Memory Can hold over 600MB(200,000 pages) Easy to replicate Useful for publishing or distributing medium But, not storing and retrieving data CD-ROM is a child of CD audio CD audio provides High storage capacity Moderate data transfer rate But, against high seek performance Poor seek performance File Structure SNU-OOPSLA Lab 36 3.4 Introduction to CD-ROM History of CD-ROM: Videodisc Videodisc technology developed in late 1960's and early 1970's The goal was to store movie A number of methods for storing video signals 1. Use a needle to respond mechanically to grooves in a disc like a vinyl LP record 2. Use optical storage Many companies were fighting which approach should become a standard VideodiskLaserVisionCD audioCD-ROM File Structure SNU-OOPSLA Lab 37 3.4 Introduction to CD-ROM History of CD-ROM: LaserVision LaserVision Emerged as the winner of standard battles about standard Support CLV(Constant Linear Velocity) and CAV(Constant Angular Velocity) format Fast seek performance by using CAV format Data are stored in analog form Why did they fail? Earlier disputes over the physical format of the video disc, many incompatible encoding scheme and error correction techniques were used by many firms No standard format the market never grew File Structure SNU-OOPSLA Lab 38 3.4 Introduction to CD-ROM History of CD-ROM: CD-ROM CD-ROM Philips and Sony developed CD-ROM in 1984 in order to store music on a disc Use a digital data format The development of CD-ROM as a licensing system results in widely acceptance in the industry Promised to provide a standard physical format Any CD-ROM drive can read any sector which they want Computer applications store data in a file not in terms of sector, thus, file system standard should be needed In late 1985, videodisc/digital data industry moved into CD-ROM industry In early summer of 1986, an official standard for organizing files was worked out File Structure SNU-OOPSLA Lab 39 3.5 Physical Organization of CD-ROM Physical Organization of Master Disk Master Disc Formed by using the digital data, 0 or 1 Made of glass and coated that is changed by the laser beam Two part of CD-ROM Pit The areas that is hit by the laser beam Scatter the light Land Smooth, unchanged areas between pits Reflect the light light laser beam land File Structure SNU-OOPSLA Lab pit 40 3.5 Physical Organization of CD-ROM Encoding Scheme of CD-ROM Encoding scheme Constraint The alternating pattern of high- and low-density reflected light is the signal 1 : transition from pit to land and back again 0s : the amount of time between transitions The limits of the resolution of the optical pickup (generic hardware limit), there must be at least two 0’s between any pair of 1’s (no two adjacent 1s) We cannot represent all bit patterns, thus, we need translation scheme We need at least 14 bits to represent 8 bits under this constraint EFM(eight to fourteen modulation) coding is used 110 File Structure 000000012 SNU-OOPSLA Lab 10000100000000EFM 41 3.5 Physical Organization of CD-ROM Format of CD-ROM CD audio chose CLV format instead of CAV format because CD audio requires large storage space CD audio is played from the beginning to the end sequentially Format of CD-ROM CLV(Constant Linear Velocity) A single spiral pattern Same amount of space for each sector Capability for writing all of sectors at the maximum density Rotational speed is slower in reading outer edge than in inner edge Finding the correct speed though trial and error Characteristics Poor seek performance No straightforward way to jump to a specified location File Structure SNU-OOPSLA Lab 42 3.5 Physical Organization of CD-ROM Constant Angular Velocity Disk Magnetic disk usually uses CAV(Constant Angular Velocity) Concentric tracks and pie-shaped sectors Data density is higher in inner edge than in outer edge Storage waste: total storage is less than a half of CLV Spin the disc at the same speed for all positions Easy to find a specific location on a disk good seek performance CLV File Structure CAV SNU-OOPSLA Lab 43 3.5 Physical Organization of CD-ROM Addressing of CD-ROM Addressing Track density varies thus, each second of playing time on a CD is divided into 75 sectors Magnetic disk: cylinder/track/sector approach CD-ROM: a sector-addressing scheme 75 sectors/sec, 2 Kbytes/sector At least one-hour of playing time Maximum capacity can be calculated: 600 Mbytes 60 min * 60 sec/min * 75 sectors/sec = 270,000 sectors We address a given sector by referring minutes, second, and sector of play 16:22:34 means 34th sector in the 22nd second in the 16th minutes of play File Structure SNU-OOPSLA Lab 44 3.5 Physical Organization of CD-ROM Fundamental Design of the CD disc Initially designed for delivering digital audio information Store audio data in digital form Wave patterns should be converted into digital form Measure of the height of the sound: 65,536 different gradation(16 bits) Sampling rate: 44.1 kHz, because of 2 times of 20,000 Hz upto which people can listen 16 bits sample, 44,100 times per second, and two channel for stereo sound, we should store 176,400 bytes per seconds Storage capacity of CD is 75 sectors per seconds, we have 2,352 bytes per sector CD-ROM divides this raw sector as shown in the following figure 12 bytes synch File Structure 4 bytes sector ID 2,048 bytes user data 4 bytes error detection SNU-OOPSLA Lab 8 bytes null 276 bytes error correction 45 3.6 CD-ROM Strengths and Weaknesses File Structure Problem of CD-ROM The most important problem in using CD-ROM as a storage media Strongness and weakness of CD-ROM Strong aspects of CD-ROM Data transfer rate: 75 sectors/sec Storage capacity : over 600 Mbytes Inexpensive to duplicate and durable Weak aspects of CD-ROM Poor seek performance (weak random access) Magnetic disk: 30 msec, CD-ROM : 500 msec Comparison of access time of a large file from several media RAM: 20 sec, Disk: 58 days, CD-ROM: 2.5 years We should have a good file structure avoiding seeks to an even greater extent that on magnetic disk File Structure SNU-OOPSLA Lab 46 3.7 Storage as a Hierarchy Storage as a Hierarchy Primary storage Secondary storage 1/1,000,000,000 sec semi-conductors: registers, RAM, RAM disk, and disk cache 1/1,000 sec magnetic disks(DASD), tape and mass storage(Serial) Off-line storage (archival & back-up) 10 sec removable magnetic disks, optical discs, tapes File Structure SNU-OOPSLA Lab 47 3.8 A Journey of a Byte A Journey of a Byte(1) What happens when a program writes a byte to a file on a disk? WRITE(TEXT, c, 1); User Program ---> OS File manager --> I/O buffer ---> Disk controller File Structure I/O processor ---> ---> Disk SNU-OOPSLA Lab 48 3.8 A Journey of a Byte A Journey of a Byte(2) Logical layer 1 The program asks the OS to write the contents of the variable c to the next available position in TEXT. 2 The OS passes the job on to the file manager 3 The file manager looks up TEXT in a table containing information about it, such as whether the file is open and available for use, what types of access are allowed, if any, and what physical file the logical name TEXT corresponds to. 4 The file manager searches a FAT for the physical location of the sector that is to contain the byte. 5 The file manager makes sure that the last sector in the file has been stored in a system I/O buffer in RAM, then deposits the ‘P’ into its proper position in the buffer. File Structure SNU-OOPSLA Lab 49 3.8 A Journey of a Byte A Journey of a Byte(3) 6 The file manager gives instructions to the I/O processor about where the byte is stored in RAM and where it needs to be sent on the disk 7 The I/O processor finds a time when the drive is available to receive the data and puts the data in proper format for the disk. It may also buffer the data to send it out in chunks of the proper size for the disk 8 The I/O processor sends the data to the disk controller. 9 The controller instructs the drive to move the r/w head to the proper track, waits for the desired sector to come under the r/w head, then sends the byte to the drive to be deposited, bit-by-bit, on the surface of the disk. File Structure SNU-OOPSLA Lab 50 Physical layer 3.8 A Journey of a Byte What is buffer? Definition - the part of main memory available for storage of copies of disk blocks Program buffers vs. System I/O buffers Buffer manager subsystem responsible for the allocation for blocks goal: minimize the number of disk access utilize the memory space effectively File Structure SNU-OOPSLA Lab 51 3.9 Buffer Management Buffer Management Buffer bottlenecks What if program performs both input and output on one character at a time, and only one I/O buffer is available? At least two buffer - for input and output I/O bound jobs: wait for I/O completion Buffering Strategies for Performance use more than one buffer I/O system processes next block when CPU is processing current one File Structure SNU-OOPSLA Lab 52 3.9 Buffer Management Buffering Strategies(1): Multiple buffering Double buffering swapping roles of two buffers after I/O finished allows O/S operating on one buffer while the other buffer is being loaded or emptied any number of buffers can be used Buffer pooling takes from a pool of available buffers decides which buffer to take from a buffer pool take buffer by LRU File Structure SNU-OOPSLA Lab 53 3.9 Buffer Management Double Buffering I/O buffer 1 (a) To disk Program data area I/O buffer 2 I/O buffer 1 (b) Program data area I/O buffer 2 File Structure SNU-OOPSLA Lab To disk 54 3.9 Buffer Management Buffering Strategies(2): Move Move mode (using both system buffer & program buffer) & Locate mode moving data from one place in RAM to another before they can be accessed sometimes, unnecessary data moves Locate mode (using system buffer only or program buffer only) perform I/O directly between secondary storage and program buffer (program’s data area) system buffers handle all I/Os, but program uses locations through pointer variable File Structure SNU-OOPSLA Lab 55 3.9 Buffer Management Move mode & Location mode Move mode Locate mode File Structure program’s data area user’s program location (pointer) system buffer disk system buffer disk SNU-OOPSLA Lab 56 3.9 Buffer Management Buffering Strategies(3): Scatter/Gather IO A file with many blocks and each block with header and data Need to put the headers in one buffer and the data in a different buffer ==> may occur complication Scatter-input mode a single READ can scatter data into a collection of buffers Gather-output mode a single WRITE can gather several buffers and output File Structure SNU-OOPSLA Lab 57 3.9 Buffer Management Scatter Input & Gather Output scatter input buffer 1 buffer 2 gather output buffer 1 disk buffer 3 buffer 2 File Structure read() disk SNU-OOPSLA Lab write() 58 3.10 I/O in UNIX I/O in UNIX PROCESS user programs KERNEL character I/O system (terminals, printers, etc.) block device drivers File Structure libraries shell commands I/O system block I/O system (normal files) disk <Kernel I/O structure> character device drivers disk... consoles printers... SNU-OOPSLA Lab system call interface network I/O system (sockets) network interface drivers ...networks... 59 3.10 I/O in UNIX UNIX Kernel(1) The central part of UNIX operating system View all I/O as operating on a sequence of bytes Make all operations below top layer independent of application’s logical view of file Data structures related to unix files file-descriptor table: owned by user process open-file table: owned by kernel index-node: a kind of FAT (one inode for each file in use) index-node table: owned by kernel File Structure SNU-OOPSLA Lab 60 3.10 I/O in UNIX UNIX Kernel(2) File descriptor table associates each file descriptor to open file table every process has its own file descriptor table Open file table entries for every open file file structures: r/w mode, offset, reference count File Structure SNU-OOPSLA Lab 61 3.10 I/O in UNIX UNIX Kernel(3) descriptor table file descriptor file table entry o(keyboard) 1(screen) 2(error) 3(normal) 4(normal) 5(normal) . . . File Structure to open file table SNU-OOPSLA Lab 62 3.10 I/O in UNIX UNIX Kernel(4) open file table R/W mode write # of processes using it 1 Offset of next access ptr to write routine ...... ...... 100 inode table entry to inode table write() routine for this type of file File Structure SNU-OOPSLA Lab 63 3.10 I/O in UNIX UNIX Kernel(5) Inode(Index node) data structure used to describe a file when a file is opened, a copy of inode is loaded into RAM for rapid access has a list of disk blocks of the file this list is UNIX counterpart to FAT Device driver I/O processor program invoked by kernel performing I/O for devices File Structure SNU-OOPSLA Lab 64 3.10 I/O in UNIX Inode Structure device permissions owner’s userid file size . . . block count An inode file allocation . . table . File Structure SNU-OOPSLA Lab 65 3.10 I/O in UNIX Hard/Soft Link & File types Linking hard file names to files link: direct reference pointer from a directory to the inode of a file ln src-file target-file soft(symbolic) link: pathnames link a file name to another file name ln -s src-file target-file File types normal file (governed by block IO system) special file: stream, signal control of device (governed by character IO system) sockets: endpoints of IPC (governed by network IO system) File Structure SNU-OOPSLA Lab 66 Let’s Review !!! 3.1 Disks 3.2 Magnetic Tape 3.3 Disk versus Tape 3.4 Introduction to CD-ROM 3.5 Physical Organization of CD-ROM 3.6 CD-ROM Strengths and Weaknesses 3.7 Storage as a Hierarchy 3.8 A Journey of a Byte 3.9 Buffer Management 3.10 I/O in UNIX File Structure SNU-OOPSLA Lab 67