File Structures

Connecting with Computer
Science, 2e
Chapter 10
File Structures
Objectives
• In this chapter you will:
– Learn what a file system does
– Understand the FAT file system and its advantages
and disadvantages
– Understand the NTFS file system and its advantages
and disadvantages
– Compare common file systems
– Learn how sequential and random file access work
– See how hashing is used
– Understand how hashing algorithms are created
Why You Need to Know About...File
Structures
• Knowledge of how an operating system stores and
maintains data in a computer
– Allows better comprehension of how a computer
handles and manipulates files
– Allows the computer to run as efficiently as possible
What Does a File System Do?
• Responsibilities of a file system
– Creating, manipulating, renaming, copying, and
removing files to and from a storage device
– Organizing files into common storage units
• Called directories / folders
– Keeping track of file and directory locations
– Assisting users in relating files and folders to the
physical structure of the storage medium
What Does a File System Do?
• Files used by operating systems and applications
include:
– Word-processing documents
– Source code for programs you have written
– Music files
– Movie files
– Spreadsheets
– Photos
• Operating systems use a file folder icon to
represent a directory
What Does a File System Do?
Figure 10-1, Files and directories in a file system are
similar to documents and folders in a filing cabinet
What Does a File System Do?
Figure 10-2, Folders and files in Windows
What Does a File System Do?
• Hard disk
– Most common storage medium for a file system
– Physically organized into tracks and sectors
– Read/write heads move over specified areas of the
hard disk to store ( write ) or retrieve ( read ) data
– Random access device
• Reads or writes data directly on the disk
• Faster than sequential access, which reads and
writes from beginning to end
• Makes use of the file system to organize files
File Systems and Operating Systems
• File management system: Dependent on the
operating system
• FAT ( File Allocation Table ): Used from MS-DOS
to Windows ME
• NTFS ( New Technology File System ): Default for
the Windows NT family ( includes XP, Vista, 7, and 8 )
• Unix and Linux support several file systems: XFS,
JFS, ReiserFS, ext3, others
• Mac OS X file system: HFS and HFS+
FAT
• Groups hard drive sectors into clusters
– Increases performance by organizing blocks of
sectors contiguously
• Maintains a relationship between files and clusters
– Clusters have two entries in the FAT
• Current cluster information
• Link to next cluster or special code indicating the last
cluster
• Keeps track of writable clusters and bad clusters
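The link from one cluster to the next can be sketched as a simple table lookup. This is a minimal illustration, not the on-disk FAT layout: the marker values END_OF_CHAIN, BAD_CLUSTER, and FREE and the function cluster_chain are hypothetical names chosen for the example.

```python
# Hypothetical, simplified table: index = cluster number, value = next cluster.
END_OF_CHAIN = -1   # stand-in for the special "last cluster" code
BAD_CLUSTER = -2    # stand-in for the "bad cluster" marker
FREE = 0            # stand-in for a writable (unused) cluster


def cluster_chain(fat, first_cluster):
    """Follow the links in the table and return one file's clusters in order."""
    chain = []
    cluster = first_cluster
    while cluster != END_OF_CHAIN:
        chain.append(cluster)
        cluster = fat[cluster]
    return chain


# One file occupies clusters 2 -> 3 -> 6; cluster 4 is marked bad;
# every other entry in this toy table is free.
fat = [FREE, FREE, 3, 6, BAD_CLUSTER, FREE, END_OF_CHAIN, FREE]
print(cluster_chain(fat, 2))
```

Because each entry only names the next cluster, a file's clusters do not need to sit next to each other on the disk.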
FAT
Figure 10-3, Sectors are grouped into clusters on a hard disk
FAT
• Hard drive organization
– Partition boot sector
• Contains information on how to access volumes
– Main and backup FAT
• If an error occurs while reading the main FAT, the
backup is copied to the main FAT to ensure stability
– Root directory
• Contains entries for every file and folder in the
directory
– Data area
• Measured in clusters
FAT
Figure 10-4, Typical FAT file system
Disk Fragmentation
• File clusters scattered in different locations on the
storage medium
• Windows provides the Disk Defragmenter utility
– Reorganizes clusters contiguously
– Improves performance
– Minimizes movement of the read/write heads
– Use regularly to ensure system runs at peak
performance
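One rough way to picture fragmentation is to count how many separate runs of consecutive clusters a file occupies. The sketch below assumes a file is described by a plain list of cluster numbers; count_fragments is a hypothetical helper, not part of any Windows utility.

```python
def count_fragments(chain):
    """Count runs of consecutive cluster numbers in a file's cluster list."""
    if not chain:
        return 0
    fragments = 1
    for previous, current in zip(chain, chain[1:]):
        if current != previous + 1:   # a jump means the heads must move
            fragments += 1
    return fragments


fragmented = [2, 3, 9, 10, 5]     # three separate runs on disk
defragmented = [2, 3, 4, 5, 6]    # one contiguous run
print(count_fragments(fragmented), count_fragments(defragmented))
```

A defragmenting utility aims to bring every file down to a single run.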
Disk Fragmentation
Figure 10-5, Files become fragmented as they’re stored in
noncontiguous clusters; a defragmenting utility moves files to
contiguous clusters and improves disk performance
Advantages of FAT
• Efficient use of disk space
– Does not have to use contiguous space for large
files
• File names up to 255 characters ( FAT32 )
• Easy to recover files right after deletion
– System places E5h in the first position of filename
• File remains on drive
• Replace E5h with original first letter of the filename
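The E5h marker can be sketched on a single file name field. This is an illustration under assumptions: the 11-byte name layout and the helpers mark_deleted and undelete are made up for the example, and real recovery also depends on the file's clusters not having been reused.

```python
DELETED_MARK = 0xE5   # value placed in the first position of a deleted file's name


def mark_deleted(name: bytes) -> bytes:
    """Overwrite the first byte of the name; the rest of the entry is untouched."""
    return bytes([DELETED_MARK]) + name[1:]


def undelete(name: bytes, first_letter: bytes) -> bytes:
    """Put the original first letter back in place of the E5h marker."""
    if name[0] != DELETED_MARK:
        raise ValueError("entry is not marked as deleted")
    return first_letter + name[1:]


entry = b"REPORT  TXT"            # assumed name field for the example
deleted = mark_deleted(entry)
restored = undelete(deleted, b"R")
print(deleted, restored)
```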
Disadvantages of FAT
• Performance slows down as more files are stored
on the partition
• Hard drive fragments easily
• Lack of security
• File integrity problems
– Lost clusters
– Invalid files and directories
– Allocation errors
• As disk size grows, so does the cluster size –
results in wasted space
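Space is allocated in whole clusters, so the unused tail of a file's last cluster is wasted. The arithmetic can be sketched as follows; the cluster sizes and the 1,000-byte file are example values, and wasted_bytes is a hypothetical helper.

```python
def wasted_bytes(file_size, cluster_size):
    """Bytes left unused in the last cluster allocated to a file."""
    clusters = -(-file_size // cluster_size)   # ceiling division
    return clusters * cluster_size - file_size


# The same 1,000-byte file on a small-cluster and a large-cluster volume.
for cluster_size in (4096, 32768):
    print(cluster_size, wasted_bytes(1000, cluster_size))
```

With many small files, the larger allocation unit multiplies this waste.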
NTFS
• Overcomes FAT system limitations
• “Journaling” file system
– Keeps track of transactions performed
– “Rolls back” transactions if errors found
• Uses a Master File Table ( MFT )
– Stores data about all files and directories
– Similar to database table with records
• Uses clusters
• Reserves blocks of space to allow the MFT to grow
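The idea of tracking transactions and rolling them back can be pictured with a toy undo log. This is only a sketch of the general journaling idea under simplifying assumptions; it is not how NTFS stores its log, and JournaledStore is a hypothetical class.

```python
_MISSING = object()   # marks "key did not exist before this write"


class JournaledStore:
    """Toy key/value store that records old values so changes can be undone."""

    def __init__(self):
        self.data = {}
        self.journal = []   # (key, old_value) pairs for the open transaction

    def write(self, key, value):
        self.journal.append((key, self.data.get(key, _MISSING)))
        self.data[key] = value

    def commit(self):
        self.journal.clear()   # changes are final; nothing left to undo

    def rollback(self):
        while self.journal:    # undo in reverse order
            key, old = self.journal.pop()
            if old is _MISSING:
                del self.data[key]
            else:
                self.data[key] = old


store = JournaledStore()
store.write("a.txt", "v1")
store.commit()
store.write("a.txt", "v2")   # an error is found after these two writes...
store.write("b.txt", "new")
store.rollback()             # ...so the transaction is rolled back
print(store.data)
```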
Advantages of NTFS
• File access is very fast and reliable
• MFT allows system recovery from problems without
losing significant amounts of data
• Security is greatly increased over FAT
• File encryption with EFS ( Encrypting File System )
• File compression reduces file size
– Saves disk space
Disadvantages of NTFS
• Large overhead
– Not recommended for volumes less than 4 GB
• Cannot access NTFS volumes from:
– MS-DOS
– Windows 95
– Windows 98
– Linux
– Mac
Comparing File Systems
• Choosing correct file system
– Operating system dependent
– Rarely depends on hardware
• NTFS: Windows 2000, XP, Vista, 7, and 8
– Supports drive sizes up to 16 TB (16,384 GB)
• FAT: Windows 9x
– Older small hard drives, small removable devices
• UNIX/Linux
– Many file system choices
Comparing File Systems
Table 10-1, FAT16, FAT32, and NTFS compared
Comparing File Systems
Table 10-2, Some UNIX/Linux file systems
File Organization
• Topics covered:
– File characteristics
– How files are stored on disks and other media
Binary or Text
• Text files
– Consist of ASCII or Unicode characters ( Text )
– Typically read with word-processing programs or text
editors
• Easy to view and modify
• Binary files
– Computer readable ( not human readable )
– Coded and numeric information
– More compact than text files
• Examples: executable programs, applications, sound
and image files
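The difference in compactness can be seen by storing the same number both ways. A minimal sketch using Python's standard struct module; the 4-byte little-endian unsigned format "<I" is just one possible binary encoding chosen for the example.

```python
import struct

value = 1234567890

as_text = str(value).encode("ascii")    # one byte per digit character
as_binary = struct.pack("<I", value)    # fixed 4-byte unsigned integer

print(len(as_text), len(as_binary))
print(struct.unpack("<I", as_binary)[0])   # decoding recovers the number
```

The text form can be opened in any editor; the binary form needs a program that knows the encoding.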
Sequential or Random Access
• Sequential storage
– Data accessed one chunk after the other in order
• Random storage
– Data accessed in any order
– Also called direct or relative access
Sequential or Random Access
Figure 10-6, Sequential versus random access
Sequential Access
• Starts at the beginning and processes to the end of
the file
– Writing process is very fast
• New data added to the end of a file
– Retrieving, inserting, deleting, modifying data
• Very slow
– Stores data in rows like a database record
• Field delimiters or specific fixed sizes for each field
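Sequential access with comma-delimited fields can be sketched with an in-memory file. The sample rows and the helper find_record are made up for illustration; the point is that finding a record means reading from the beginning, while adding one is a simple append.

```python
import csv
import io

# In-memory stand-in for a sequential file with comma-delimited fields.
data = io.StringIO("1001,Ann,Lee\n1002,Bob,Ray\n1003,Cy,Poe\n")


def find_record(stream, wanted_id):
    """Read from the start until the wanted ID appears; return (position, fields)."""
    stream.seek(0)
    for position, fields in enumerate(csv.reader(stream)):
        if fields[0] == wanted_id:
            return position, fields
    return None


print(find_record(data, "1003"))   # two earlier rows are read first

# Writing is fast: new data simply goes at the end of the file.
data.seek(0, io.SEEK_END)
data.write("1004,Di,Fox\n")
print(find_record(data, "1004"))
```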
Sequential Access
Figure 10-7, A comma can be used as a field delimiter
Sequential Access
Figure 10-8, Data can also be in fixed-length format
Random Access
• Provides faster access to large amounts of data
• Stores fixed-length records ( relative records )
– Ability to mathematically calculate the record’s
position on disk surface and go right to it
– Ability to update records in place
• May waste disk space
– Partial record or no data
• Works well when sequential record number can
easily identify records
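With fixed-length records, a record's position is simply its record number times the record size. The sketch below assumes a hypothetical 24-byte layout (a 4-byte ID plus a 20-byte name) and uses an in-memory file in place of a disk file.

```python
import io
import struct

# Hypothetical record layout: 4-byte unsigned ID + 20-byte name = 24 bytes.
RECORD = struct.Struct("<I20s")


def write_record(f, number, rec_id, name):
    f.seek(number * RECORD.size)   # calculate the position and go right to it
    f.write(RECORD.pack(rec_id, name.encode("ascii")))


def read_record(f, number):
    f.seek(number * RECORD.size)
    rec_id, raw = RECORD.unpack(f.read(RECORD.size))
    return rec_id, raw.rstrip(b"\x00").decode("ascii")


f = io.BytesIO()
for n, name in enumerate(["Ann", "Bob", "Cy"]):
    write_record(f, n, 1001 + n, name)

write_record(f, 1, 1002, "Robert")   # update record 1 in place
print(read_record(f, 1), read_record(f, 2))
```

Short names are padded to 20 bytes, which is the wasted space the slide mentions.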
Random Access
Figure 10-9, Record organization and file access
Hashing
• Used for accessing relative record files
– Uses unique value called a hash key
• Widely used in database management systems
• Involves a hashing algorithm to generate hash keys
for each record
– Combining hash keys establishes an index to rows
or records of information
Why Hash?
• Allows a key field number not suited for relative file
access to be converted into a relative record
number
– Example: phone numbers as keys in a customer
information table
• Divide highest possible phone number by the
expected number of customers to get the hash key
• 9999999999 / 2000 (estimated number of customers)
= approximately 5,000,000
• Phone number 7025551234 / 5,000,000 gives the
record number 1405 (fraction dropped)
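The division in this example can be checked with a few lines of code. A minimal sketch: the constant and function names are hypothetical, and dropping the fractional part with integer division is an assumption about how the record number is derived.

```python
MAX_PHONE = 9_999_999_999
EXPECTED_CUSTOMERS = 2000

# Highest possible key divided by the expected number of records.
HASH_KEY = round(MAX_PHONE / EXPECTED_CUSTOMERS)


def record_number(phone):
    """Convert a phone number into a relative record number."""
    return phone // HASH_KEY   # integer division drops the fraction


print(HASH_KEY, record_number(7025551234))
```

Every possible phone number lands between record 0 and record 1999, so the file needs only about as many slots as there are expected customers.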
Why Hash?
• Hashing may result in collisions
– Same relative key is generated for more than one
original key value
• One solution:
– Expand algorithm to add the sum of the digits of the
phone number to the relative key
• Sum of the digits in phone number 7025551234 is 34
• Original key 1405 (7025551234 / 5,000,000) + 34 = 1439
• Lessens collisions but does not eliminate them
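The effect of adding the digit sum can be tried on a few phone numbers. This is a sketch under the same assumptions as the example (a divisor of 5,000,000 and integer division); the function names and the extra phone numbers are made up for illustration.

```python
HASH_KEY = 5_000_000


def simple_key(phone):
    return phone // HASH_KEY


def digit_sum(phone):
    return sum(int(digit) for digit in str(phone))


def expanded_key(phone):
    """Relative key plus the sum of the phone number's digits."""
    return simple_key(phone) + digit_sum(phone)


a, b, c = 7025551234, 7025551235, 7025551243

# a and b collide under the simple key but not under the expanded one.
print(simple_key(a), simple_key(b), expanded_key(a), expanded_key(b))

# a and c contain the same digits in a different order, so they still collide.
print(expanded_key(a), expanded_key(c))
```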
Dealing with Collisions
• Even the best hashing algorithms have collisions
• One solution: create overflow area
– Records with duplicate record numbers are placed in
the overflow area at the end of the file
– Record retrieval
• Hash key is calculated, and record at calculation
position is retrieved
• If the record at that location isn’t the correct one, the
overflow area is searched sequentially
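The overflow approach can be sketched with a list of slots plus an overflow list. This is a toy in-memory model, not a real file format: HashedFile and its methods are hypothetical, and the hash is the simple division used in the phone number example.

```python
class HashedFile:
    """Toy relative file: one record per slot, plus an overflow area at the end."""

    def __init__(self, slot_count, hash_key):
        self.slots = [None] * slot_count   # main area, indexed by record number
        self.overflow = []                 # overflow area, searched sequentially
        self.hash_key = hash_key

    def position(self, key):
        return key // self.hash_key

    def store(self, key, record):
        pos = self.position(key)
        if self.slots[pos] is None or self.slots[pos][0] == key:
            self.slots[pos] = (key, record)
            return
        for i, (existing_key, _) in enumerate(self.overflow):
            if existing_key == key:        # key already in overflow: update it
                self.overflow[i] = (key, record)
                return
        self.overflow.append((key, record))

    def fetch(self, key):
        entry = self.slots[self.position(key)]
        if entry is not None and entry[0] == key:
            return entry[1]
        for existing_key, record in self.overflow:   # not there: scan overflow
            if existing_key == key:
                return record
        return None


customers = HashedFile(slot_count=2000, hash_key=5_000_000)
customers.store(7025551234, "Ann")
customers.store(7025559999, "Bob")   # same position (1405), so it goes to overflow
print(customers.fetch(7025551234), customers.fetch(7025559999))
```

Lookups stay fast as long as the overflow area remains small; a long overflow area turns retrieval back into a sequential search.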
Dealing with Collisions
Figure 10-10, An overflow area helps resolve collisions
Hashing and Computing
• Efficient hashing algorithm
– Important to companies producing database
management systems
• Many different hashing algorithms are used in
computing
– Encryption and decryption
– Indexing
– Many programming languages have specialized
libraries of built-in hashing routines
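As one example of built-in hashing routines, Python ships a hashlib module and uses hashing internally for its dictionaries. The sketch below only illustrates those two library features; the sample strings and the index contents are made up.

```python
import hashlib

# A fixed-size digest computed from arbitrary input bytes.
digest = hashlib.sha256(b"File Structures").hexdigest()
print(len(digest))   # 64 hex characters represent 256 bits

# The same input always gives the same digest; a changed input gives another.
same = hashlib.sha256(b"File Structures").hexdigest()
other = hashlib.sha256(b"File structures").hexdigest()
print(digest == same, digest == other)

# Python dictionaries are hash tables, so lookups by key act as an index.
index = {"7025551234": "customer record for Ann"}
print(index["7025551234"])
```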
One Last Thought
• Determining a computer system’s worth
– Often measured in terms of data stored on hard
drives
– Data can be difficult to replace
• Data storage dependent on file systems
• Strong understanding of file systems allows more
data availability and protection
Summary
• Hard drive
– Random access device
– Stores information in tracks and sectors
– Accesses data through read/write heads
• File system
– Responsible for creating, manipulating, renaming,
copying, and removing files from a storage device
• Windows uses either FAT or NTFS
Summary
• FAT keeps track of which files are using specific
clusters
– Vulnerable to disk fragmentation
• NTFS uses MFT to keep track of files and
directories
– Used with Windows NT Series
• NTFS advantages over FAT
– Better reliability and security, journaling, file
encryption, and file compression
Summary
• Linux can be used with many file systems
• Files contain binary or text ( ASCII or Unicode ) data
• Data is usually stored and accessed either
sequentially or randomly ( relative access )
• Hashing
– Common method for accessing a relative file
– Collisions occur when the hash key is duplicated for
more than one relative record location
The End