Information Management
DIG 3563 – Lecture 17
File Structures and Cloud Computing
J. Michael Moshell
University of Central Florida

Original image* by Moshell et al. Imagery is from Wikimedia except where marked with *.

File System Organization
* Disks have sectors; each sector has an address (integer)
* A file is a collection of sectors. They can be contiguous or fragmented.
* To find the sectors comprising a file, we need a directory.
* The directory system records which sectors belong to each file.
* The Operating System has software to manage directories & files.
[Images: recovermyfiles.com, planetoftunes.com]

Formatting a Disk
* factory (low level) format:
  - timing tracks, etc. "marks in the parking lot"
  - usually not re-doable
* local reformatting:
  * Check for read/write errors
  * Mark good sectors and bad ones
  * Create a list of available sectors
  * Set up file structure:
    - directory
    - boot sector (for bootable drives)
[Image: stripespls.com]

File System Organization
* A simple (conceptual) architecture:

Directory:
Dirnum  Filename       Filesize  Headsector
1       addresses.doc  144300    22010
2       employees.doc  99800     33500
3       payroll.xls    17100     33482
4       etc

* at sectors 22010, 22021 we have records:

block  dirnum  nextblock  data
22010  1       22021      Adams, John \t 222 West ...
22021  1       22040      Wilson, Steve \t 333 East ...
...

So the file is a linked list (like a treasure hunt) through the disk's sectors.
(Not all disks are organized this way.)
[Images: recovermyfiles.com, planetoftunes.com]

File System Errors
* Disk drive hardware checks parity when reading sectors
* If a parity error occurs, data may have been lost
* Usually this just reports a failure to the OS and you're stuck.

However – the actual disk drive hardware can probably still read the data; it just doesn't LIKE it.
So, specialized software can sometimes get this "bad checksum" data and display it ... we discuss this shortly.

File System Organization
* Deleting a file: The OS keeps an available sector list of sectors that can be reused.
[Image: recovermyfiles.com]
To DELETE a file, the system just changes its first and last links, moving the file's chain of sectors onto the available list. (Think of out-of-service boxcars.)
The data is not gone; it's just unlinked. It will be overwritten when (and if) the OS needs more space.
[Image: tdc.ca]

Losing and Recovering Data
Now what if the directory or a sector gets screwed up?
a) software error: the pointer or link to a file gets erased, or
b) hardware error: part of the directory or a sector gets corrupted
The data is still out there, but the OS can't find it.
If you can directly READ THE SECTORS, you will find broken strands of spaghetti ... with clues in 'em.
[Images: recovermyfiles.com, restaurantwidow.com]

Recovering Data
What clues exist?
* Links (obviously) if it's a linked system
  - Try to reconstruct the files, or fragments of them
* Directory item numbers, if these exist
  - Try to "work backwards" and reconstruct the directory
* The data itself (e.g. search for "Adams")
  - Use syntactic knowledge to match up partial sentences in blocks.

Which block might match that one?
  nguins live in Antarc...
  ... and we respect the opinions of...
  492.7 \t 333.9e14 ...

Recovering Data
If you have 'bad sectors' (i.e. bad checksums):
* Read the data and override the parity error messages
* Humans are normally required to look at the data and piece it back together. Success is not guaranteed.

A full format writes 0 in all the sectors (a quick format usually just resets the directory and leaves the data in place).
SOME claim they can recover what was there before the zeros were written (maybe NSA can?), but it is not a high-percentage bet.

Forensics: Finding Hidden Stuff
* simplest cases: just "erased" your files?
  - straightforward disk recovery may work.
* the famous photocopier story.
  - copiers have hard drives and remember what was copied.
  http://www.cbsnews.com/stories/2010/04/19/eveningnews/main6412439.shtml
* RAM sticks are just like hard drives; "delete" does not empty them.
  (Nonvolatile RAM versus volatile RAM. Why isn't it ALL nonvolatile?)
[Image: macforensiclabs.com]
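
The linked-sector layout, delete-by-unlinking, and clue-based recovery described in the file system slides can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the names `ToyDisk`, `Sector`, `END`, and `recover_by_dirnum` are hypothetical, the sector addresses and names are borrowed from the directory example, and no real file system is claimed to work exactly this way.

```python
from dataclasses import dataclass

END = -1  # assumed marker meaning "no next sector"


@dataclass
class Sector:
    dirnum: int       # directory entry that owns this sector
    next_sector: int  # address of the next sector in the chain, or END
    data: str


class ToyDisk:
    """Hypothetical linked-sector disk; not a real file system format."""

    def __init__(self):
        self.sectors = {}     # address -> Sector
        self.directory = {}   # dirnum -> (filename, head sector address)
        self.free_head = END  # first sector of the available-sector list

    def create(self, dirnum, filename, blocks):
        """Store a file given as a list of (address, data) pairs."""
        addresses = [addr for addr, _ in blocks]
        for i, (addr, data) in enumerate(blocks):
            nxt = addresses[i + 1] if i + 1 < len(addresses) else END
            self.sectors[addr] = Sector(dirnum, nxt, data)
        self.directory[dirnum] = (filename, addresses[0])

    def chain(self, head):
        """Follow the links from a head sector, like a treasure hunt."""
        found = []
        addr = head
        while addr != END:
            found.append(addr)
            addr = self.sectors[addr].next_sector
        return found

    def read(self, dirnum):
        _, head = self.directory[dirnum]
        return [self.sectors[a].data for a in self.chain(head)]

    def delete(self, dirnum):
        """Unlink the file: only the directory entry and two links change."""
        _, head = self.directory.pop(dirnum)
        last = self.chain(head)[-1]
        self.sectors[last].next_sector = self.free_head  # last link
        self.free_head = head                            # first link

    def recover_by_dirnum(self, dirnum):
        """Scan every sector for a dirnum tag, ignoring the directory."""
        return sorted(a for a, s in self.sectors.items() if s.dirnum == dirnum)


disk = ToyDisk()
disk.create(1, "addresses.doc", [
    (22010, "Adams, John"),
    (22021, "Wilson, Steve"),
])
print(disk.read(1))               # ['Adams, John', 'Wilson, Steve']
disk.delete(1)
print(1 in disk.directory)        # False
print(disk.sectors[22010].data)   # Adams, John
print(disk.recover_by_dirnum(1))  # [22010, 22021]
```

After `delete`, the directory no longer knows the file, yet every sector still holds its data and its `dirnum` tag. That leftover tag is exactly the kind of clue a recovery tool can use until the OS reuses the sectors.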

Forensics: Finding Hidden Stuff
* virtual memory: the OS copies part of your RAM onto the computer's hard drive.
* those images may include print queues and other information that can be recovered.
* backup systems may not have been reformatted even if the main hard drive was reformatted.
* offsite backup probably was NOT reformatted; old sectors may have copies of data you wanted to make disappear.
[Image: macforensiclabs.com]

File Structures: Summary
* vocabulary terms throughout lecture
* backup/archive/redundant storage
* criteria for choice of offsite backup
* understand and explain disk organization
* understand how disk errors occur
* analyze what data could be recovered from a particular accident
* discuss forensic issues concerning disk data erasure and recovery
[Image: motifake.com]

Cloud Computing and Digital Asset Management
• First let's look at the Cloud
  - Where did it come from?
  - What is it?
  - How can it help me?
  - What new skills will I need to use it?
  - What effect does Cloud have on DAM?

As of the Year 2000 ...
• Most Internet Service Providers sold ( ... rented ...)
• dedicated hosting
  One website: delivered by 1 computer
  (mystore.com)
• shared virtual hosting
  N websites each got 1/Nth of a computer
  (yourstore.com, histore.com, herstore.com)

And a few giants (Yahoo, Google, Amazon)
built giant 'ad hoc' systems with thousands of CPUs and petabytes of storage.
Amazon noticed ... less than 10% of their capacity was being used most of the time.
[Image: phaseoneenterprises.com]

... and in 2006 launched Amazon Web Services
The 'utility model': power plants have capacity to meet AVERAGE demand and so can deliver UNLIMITED* power to some customers when needed.
(*"Unlimited" as long as the customer's demand is << total capacity)
[Image: en.wikimedia.org]

The Shared Telescope Model
Astronomers worldwide now schedule time on big telescopes through the Internet and don't have to go to a cold mountaintop and stay up all night to capture imagery.
[Image: as.utexas.edu]

The Shared Computing Model
NASA released NEBULA in 2008, to share research computers instead of building additional data centers.
NEBULA is an open source cloud management system.

... resembles the old Mainframe Timeshare model
Before PCs, we programmed on punch-cards and thought it was a great INNOVATION when time-sharing became possible.
[Image: as.utexas.edu]

But with one fundamental difference:
In 1965 this (the computer) was SCARCE and we were NUMEROUS (relatively)
(Skilled specialists who wanted to use computers)
[Images: as.utexas.edu, redlinecs.com.au]
In 2012 this is ABUNDANT and we are EVERYONE
[Images: allthingsdistributed.com, reuters.com]

... relies on fast, reliable networks
... may reduce your company's IT costs
* software is expensive – so RENT it
* hardware is expensive to update – so RENT it
* buildings are expensive – so share them
* land is expensive – build in rural areas

Key Cloud Concepts:
1. Agility through dynamic provisioning
   - Order up "supercomputer for an hour"
2. API Accessibility
   - Your program can specify the needed QOS*
3. Virtualization
   - You "THINK" you have your own machine
   - Protection models don't need to be reinvented
   http://www.vmware.com/virtualization/

*QOS: Quality of Service:
   - Maximum guaranteed latency (e.g. <1ms)
   - Minimum guaranteed CPU (e.g. >1 petaflop)

What's a flop?
"Floating Point Operations" (like x=239.44*456.3733) per second
Math models (physics, stock market, statistics) may need teraflops (tera = thousand*billion) or more.
giga = 10^9
tera = 10^12
peta = 10^15
exa = 10^18

One Key Cloud Concern: SECURITY.
(I know this guy) http://www.acsac.org/2012/workshops/ccw/
One solution (for larger firms): Build your own Cloud.
http://www.enterprisenetworkingplanet.com/ebooks/50950510/95900/4190310/

Quickly, web-hosts realized that they could virtualize their service
(bigbird.com, cookie.com, elmo.com, kermit.com, piggie.com)

Software as a Service (SaaS)
The 800 pound anthropoid: Salesforce.com
http://www.salesforce.com
* sales cloud (CRM systems)
* force.com – build your own
[Image: pin.primate.wisc.edu]

Digital Asset Management in the Cloud
1. Simple: Dropbox
2. Specialized for software: GitHub
3. Rich metadata -> DAM (e.g. AlienBrain)
   - Media Valet: http://www.mediavalet.co/home.aspx ("CMIS compliant?")
   - Widen: http://www.widen.com/
   - Fordela: http://www.fordela.com/ - VIDEO focus (started by LucasArts veterans)
[Image: pin.primate.wisc.edu]

Content Management Interoperability Services (CMIS)
http://en.wikipedia.org/wiki/Content_Management_Interoperability_Services
CMIS is an open standard that defines how DAM systems can manage metadata ("generic properties") for files and folders.
Adobe, HP, IBM, Microsoft, Oracle + + +

Choosing a DAM System
Here's a logically organized Buyer's Guide
http://www.datamation.com/storage/digital-assetmanagement-buying-guide-1.html
[Image: pin.primate.wisc.edu]

End of lecture ... End of lectureS.
When we return ... Project Show-and-tell!