(quiz starts at 6:30)
Monday, November 25, 2013 5:51 PM

Some cosmic considerations
Thursday, December 03, 2009 3:42 PM

  When a file is deleted, its blocks are not erased.
    All that happens is that its directory entry is X'd out.
    It is possible to reconstruct files from what is left on disk.
  A colleague of mine based his Ph.D. thesis on recovering personal information from discarded hard drives for sale on eBay!
    Then he called each person and told them he had, e.g., their credit card numbers!
  My favorite disk tool: Zero Assumption Recovery.
    Doesn't mount the disk.
    Figures out what it can based upon no assumptions.
    Advises you of what it can reconstruct.
  My favorite free utility: Eraser.
    Ignores the filesystem.
    Writes random hex codes into the free blocks.

Special needs
Tuesday, December 07, 2010 11:49 AM

  So far, we've considered filesystems on devices that can be read and written with little concern for wear.
  Not all devices are like that.
  Two problems:
    How to construct a filesystem on a device in which writes degrade the device (flash). -- today
    How to construct a filesystem on a device in which one can only write once (CDROM, DVD), or where writes must be orchestrated with erases.

Dealing with the physical
Thursday, December 03, 2009 3:32 PM

  Dealing with physical limits
  The ext2 format is optimized for physical disk:
    no limit on rewriting
    superblock re-written for every block allocation
  Flash is a very different medium than physical disk:
    limited number of rewrites before failure (1000/block)
    how does one deal with the superblock in this case?
  A very strange answer: add a logical-to-physical layer between the filesystem and the flash!
    Flash drive maintains a mapping between logical flash blocks and physical flash blocks.
    Flash drive attempts to write new data into a new physical block, then remaps it!
    Thus we optimize the life of the flash!
    This is called "load leveling" or "wear leveling".
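
The remapping idea above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any real flash controller works: the class name WearLevelingDevice, the per-block write counter, and the "pick the least-written free block" policy are all hypothetical choices made for illustration, and erasing is reduced to returning a block to the free set.

```python
class WearLevelingDevice:
    """Toy logical-to-physical remapping layer (hypothetical, for illustration)."""

    def __init__(self, num_physical_blocks):
        self.data = [None] * num_physical_blocks      # contents of each physical block
        self.write_count = [0] * num_physical_blocks  # writes seen by each physical block
        self.free = set(range(num_physical_blocks))   # physical blocks not currently mapped
        self.mapping = {}                             # logical block -> physical block

    def write(self, logical, payload):
        if not self.free:
            raise RuntimeError("no free physical blocks")
        # Assumed policy: use the least-written free block (ties go to the lowest number).
        target = min(self.free, key=lambda b: (self.write_count[b], b))
        self.free.remove(target)
        self.data[target] = payload
        self.write_count[target] += 1
        old = self.mapping.get(logical)
        self.mapping[logical] = target  # remap the logical block to its new home
        if old is not None:
            self.data[old] = None       # stand-in for erasing the stale copy
            self.free.add(old)

    def read(self, logical):
        return self.data[self.mapping[logical]]


dev = WearLevelingDevice(4)
for version in range(8):
    dev.write(0, f"superblock v{version}")  # the same logical block, rewritten 8 times
print(dev.read(0), dev.write_count)
```

In this sketch, eight rewrites of one logical block are spread across all four physical blocks instead of landing on the same one, which is the point of placing the mapping layer underneath the filesystem.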
  Q: Where is the logical-to-physical mapping kept?
  A: A very complex scheme ensues, partly on disk, partly in memory.

Dealing with Flash Memory
Thursday, November 29, 2012 12:50 PM

  The only truly portable filesystem is FAT.
    Most flash drives use it.
    But it was built for disks.
  Q: How do we cope with the flash rewrite problem?
  A: Block virtualization.
  Basic strategy is as follows:
    Virtualize the raw device.
    Do this underneath the filesystem.
    Result is a raw filesystem that is robust against multiple writes to the same block.
    How: we keep moving the (physical) block!
  http://www.cs.tau.ac.il/~stoledo/Pubs/flash-survey.pdf

Properties of flash memory
Thursday, November 29, 2012 5:58 PM

  All flash memory:
    Blocks start out as all 1's.
    Can program a bit by changing it to a 0.
    Bits are written in sectors.
    Can erase a block by changing it back to all 1's.
  Typical parameters:
    1 (writable) sector = 4 KB.
    1 (erasable) block = 128 KB.

Two kinds of flash
Thursday, November 29, 2012 5:57 PM

  NOR flash (legacy, older drives):
    Each bit can be individually cleared once per erase cycle.
    Each block can be erased (set to 1's) in its entirety.
  NAND flash (modern):
    Must write a sector at a time.
    Sector writes (4 KB) can only make 1->0 transitions (0->1 transitions are ignored).
    Thus one can write via a masking strategy: write 1 to preserve existing sector contents.
    Limited number of sector writes per erase cycle. (We then have to copy the block and erase the original.)
    Block erases (128 KB) can make the whole block 1's.

Basic flash writing strategies
Thursday, November 29, 2012 5:56 PM

  Descriptor arrays: if something has to change, record the changes in a pre-allocated set of contiguous sectors.
  Whiteout: set a descriptor to all 0's; this tells the controller to use the next descriptor in the array.
  Wear-leveling: virtualize all addresses of sectors on the flash; when rewriting, move the sector and change the mapping.
  Backup stores: keep two copies of every descriptor block, and keep shifting which one is written to.

Page descriptors
Thursday, November 29, 2012 5:55 PM

  The page descriptor:
    Contiguous to the physical page.
    Holds the virtual page number that this physical page represents (inverse mapping).
  Three status bits in the page descriptor:
    free/used: 1 if free, 0 if used.
    pre-valid/valid: 1 if pre-valid, 0 if valid.
    valid/obsolete: 1 if valid, 0 if obsolete.
  States of a page, as encoded by the three status bits:

    free/used  pre-valid/valid  valid/obsolete  state
    1          1                1               unused
    0          1                1               allocated
    0          0                1               valid
    0          0                0               obsolete

Writing a page
Thursday, November 29, 2012 5:55 PM

  Q: Why this much detail?
  A: Atomicity of block writes.
  Writing a page:
    Find a blank page (virtual page number is all 1's).
    Mark it allocated.
    Write contents into it.
    Write logical page number into it.
    Mark it valid.
    Enter it into the page table (logical to physical): a memory object.
    Mark old version (if any) obsolete.

The mapping table
Thursday, November 29, 2012 5:54 PM

  A memory object.
  Initialized when flash drive is inserted.
  Contains virtual-to-physical page mapping.
  Updated as pages change.
  Older flash drives were small => initialize all virtual blocks during insertion.
  Newer flash drives are large => need some way to speed up that access.

FTL
Thursday, November 29, 2012 5:53 PM

  Problem: virtualization takes time to initialize.
  Solution: an on-flash image of block mapping.
    Infrequently modified.
    Using a white-out strategy to invalidate old maps.
  One solution: Flash Translation Layer (FTL)
  FTL: two-layer mapping
    Virtual block # in filesystem
      (via first mapping, to)
    Logical erase unit #, block #
      (via second mapping, to)
    Physical erase unit #, block #

Backup maps
Thursday, November 29, 2012 5:53 PM

  Always keep two copies of each mapping table.
  When changing a map:
    First see if the backup map location is all 1's.
    If it is:
      Clear the main map entry to all 0's.
      Put the new location into the backup map entry.
    Else rewrite the main map.
  Q: Why such trouble?
  A: Consider what happens if power fails.
  Moral of story: the structure of manipulation for a filesystem is very much dependent on the limitations of the media.

Flash and security
Monday, November 25, 2013 4:59 PM

  One cannot erase a flash reliably.
    The only strategy that works is to overwrite all blocks.
    By changing the interface, old blocks can be read.
  Thus do not put anything on a flash that you wouldn't like people to recover.

Rethinking filesystems
Tuesday, December 07, 2010 11:54 AM

  The cloud has forced us to think of files in a radically different way.
  Inside Google:
    There are no "files".
    There are distributed objects with multiple instances.
    To store an instance of an object, one throws it into the cloud.
    To reconstruct something like a file, one makes a query into the cloud to return matching instances.
  In other words, files are a convenient illusion, maintained for backward compatibility.

Example: our mail versus Google mail
Tuesday, December 07, 2010 11:58 AM

  Our mail:
    INBOX is a file.
    Consisting of a concatenation of mail messages.
    Where deleting one requires rewriting the whole file.
  Google mail:
    Mail is a distributed object.
    Messages are distributed object instances.
    Your inbox is a query.
    "All mail" is the universal query.
    There is no "file" where all of your messages are stored.
    They're spread over more than 10,000 machines!
  More about this in "Cloud Computing"!
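
The "inbox is a query" idea can be illustrated with a small Python sketch. Everything here is an assumption made for illustration and says nothing about how Google's systems are actually built: the Message fields, the labels set, the three-entry machines list standing in for the cloud, and the query, inbox, and all_mail helpers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    """One message instance (hypothetical fields)."""
    msg_id: int
    owner: str
    subject: str
    labels: frozenset = field(default_factory=frozenset)


# Hypothetical stand-in for the cloud: instances scattered over several machines.
machines = [
    [Message(1, "alice", "hello", frozenset({"inbox"}))],
    [Message(2, "bob", "lunch?", frozenset({"inbox"})),
     Message(3, "alice", "old receipt")],
    [Message(4, "alice", "quiz at 6:30", frozenset({"inbox"}))],
]


def query(predicate):
    """Return every stored instance, on any machine, that matches the predicate."""
    matches = (m for shard in machines for m in shard if predicate(m))
    return sorted(matches, key=lambda m: m.msg_id)


def inbox(owner):
    # The inbox is not a file: it is just the instances matching a query.
    return query(lambda m: m.owner == owner and "inbox" in m.labels)


def all_mail(owner):
    # The least restrictive query over one user's messages.
    return query(lambda m: m.owner == owner)


print([m.subject for m in inbox("alice")])
print([m.subject for m in all_mail("alice")])
```

In this sketch no single structure holds a user's mail; what looks like a mailbox is assembled on demand from whichever machines happen to hold matching instances.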