Project 3: An Introduction to File Systems CS3430 Operating Systems University of Northern Iowa 1 Introduction The goal of project 3 is to understand basic file system design and implementation file system testing data serialization/de-serialization At the end of the project, you will feel like a file system expert! 2 Outline Project 3 Specification Downloading and testing file system image General FAT32 data structures Endian-ness 3 Project 3 More than you wanted to know about FAT32.. 4 Project 3 You will create a user-space utility to manipulate a FAT32 file system image No more kernel programming! Utility must understand a few basic commands to allow simple file system manipulation Utility must not corrupt the file system and should be robust 5 FAT32 Manipulation Utility Utility only recognizes the following built-in readonly commands: info stat volume quit cd ls read 6 File System Image Manipulation utility will work on a preconfigured FAT32 file system image Actually a file File system image will have raw FAT32 data structures inside Just like looking at the raw bytes inside of a disk partition 7 File System Image Your FAT32 manipulation utility will have to Open the FAT32 file system image Read parts of the FAT32 file system image and interpret the raw bytes inside to service your utility’s file system commands… …just like a file system! 8 General FAT32 Data Structures 9 Terminology Byte – 8 bits of data, the smallest addressable unit in modern processors Sector – Smallest addressable unit on a storage device. Usually this is 512 bytes Cluster – FAT32-specific term. A group of sectors representing a chunk of data FAT – Stands for file allocation table and is a map of files to data 10 FAT32 Disk Layout 3 main regions… Reserved Region Track Sector FAT Region Data Region Disk arm Reserved Region Reserved Region – Includes the boot sector, the extended boot sector, the file system information sector, and a few other reserved sectors Reserved Region Boot Sector FS Information Sector FAT Region Additional Reserved Sectors (Optional) Data Region FAT Region FAT Region – A map used to traverse the data region. Contains mappings from cluster locations to cluster locations Reserved Region FAT Region File Allocation Table #1 Data Region Copy of File Allocation Table #1 Data Region Data Region – Using the addresses from the FAT region, contains actual file/directory data Reserved Region FAT Region Data Region Data until end of partition Endian Big or little? 15 Machine Endianness The endianness of a given machine determines in what order a group of bytes are handled (ints, shorts, long longs) Big-endian – most significant byte first Little-endian – least significant byte first This is important to understand for this project, since FAT32 is always formatted as little-endian FAT32 Endianness The following are a few cases where endianness matters in your project: Reading in integral values from the FAT32 image Reading in shorts from a FAT32 image Combining multiple shorts to form a single integer from the FAT32 image Interpreting directory entry attributes Endian Example (English Version) Imagine you can only communicate three letters at a time, and your word is “RAPID” Big-endian 1. RAP 2. ID Word = RAPID Little-endian 1. PID 2. RA Word = PIDRA (come again?) Endian Example (data version) short value = 15; /* 0x000F */ char bytes[2]; memcpy(bytes, &value, sizeof(short)); In little-endian: bytes[0] = 0x0F bytes[1] = 0x00 In big-endian: bytes[0] = 0x00 bytes[1] = 0x0F Endian Example (data version 2) int value = 13371337; /* 0x00CC07C9 */ char bytes[4]; memcpy(bytes, &value, sizeof(int)); In little-endian: In big-endian: bytes[0] = 0xC9 bytes[1] = 0x07 bytes[2] = 0xCC bytes[3] = 0x00 bytes[0] = 0x00 bytes[1] = 0xCC bytes[2] = 0x07 bytes[3] = 0x09 Visualizing Example 2 Value = 13371337 (0x00CC07C9) index 0 1 2 3 little endian 0xC9 0x07 0xCC 0x00 big endian 0x00 0xCC 0x07 0xC9 Helper Resources and Programs Look at the resources page to see how to do file I/O in C Look at how to compile python and java programs on the Linux virtual machines Will have to read and seek around in the image file. If your project does not compile or run on the Linux virtual machines, the most it can earn is 50% Parameter passing program In C, but other languages are very similar 22 Additional Project 3 Information Like other projects, may work in teams of 2 or alone 23 Next Steps Take a look at the fat32 specification file from Microsoft Somewhat confusing – just like the real world (haha) Take a look at the image file 24