The Cloud - University of Central Florida

Information Management
DIG 3563 – Lecture 17
File Structures
and
Cloud Computing
J. Michael Moshell
University of Central Florida
Original image* by Moshell et al.
Imagery is from Wikimedia except where marked with *.
1
File System Organization
* Disks have sectors; each sector
has an address (integer)
* A file is a collection of sectors. They can
be contiguous or fragmented.
* To find the sectors comprising a file,
we need a directory.
* The directory system records which
sectors belong to each file.
recovermyfiles.com
* The Operating System has software
to manage directories & files.
planetoftunes.com
-2 -
Formatting a Disk
* factory (low level) format:
- timing tracks, etc.
"marks in the parking lot"
- usually not re-doable
* local reformatting:
- Check for read/write errors
- Mark good sectors and bad ones
- Create a list of available sectors
- Set up file structure:
  - directory
  - boot sector (for bootable drives)
stripespls.com
-3 -
File System Organization
* A simple (conceptual) architecture:
Directory:
Dirnum  Filename       Filesize  Headsector
1       addresses.doc  144300    22010
2       employees.doc  99800     33500
3       payroll.xls    17100     33482
4       etc
recovermyfiles.com
* at sectors 22010, 22021 we have records:
block  dirnum  nextblock  data ...
22010  1       22021      Adams, John \t 222 West ...
22021  1       22040      Wilson, Steve \t 333 East ...
So the file is a linked list (like a treasure
hunt) through the disk's sectors.
(Not all disks are organized this way.)
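The treasure hunt can be sketched in a few lines of Python. This is a toy model, not the layout of any real file system: the `disk` and `directory` dictionaries, the `read_file` function, and the choice of block 22040 as the end of the chain are assumptions made for illustration. The sector numbers and names come from the slide.

```python
# Toy model of a linked-list file system (illustrative only).
# Each sector holds the directory number of its file, the address of
# the next sector (None marks the end of the chain), and some data.
disk = {
    22010: {"dirnum": 1, "next": 22021, "data": "Adams, John\t222 West ..."},
    22021: {"dirnum": 1, "next": 22040, "data": "Wilson, Steve\t333 East ..."},
    22040: {"dirnum": 1, "next": None, "data": "(assumed last block)"},
}

# Directory: file name -> head sector (file size omitted for brevity).
directory = {"addresses.doc": 22010}


def read_file(name):
    """Follow the chain of sectors starting at the file's head sector."""
    sector = directory[name]
    visited = []
    chunks = []
    while sector is not None:
        if sector in visited:
            # A damaged link could point backwards and loop forever.
            raise ValueError("loop in sector chain at %d" % sector)
        visited.append(sector)
        block = disk[sector]
        chunks.append(block["data"])
        sector = block["next"]
    return visited, chunks


sectors, chunks = read_file("addresses.doc")
print(sectors)   # [22010, 22021, 22040]
```

If one `next` link is lost, everything after it becomes unreachable from the directory, even though the data is still on the disk.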
planetoftunes.com
-4 -
File System Errors
* Disk drive hardware checks parity when reading sectors
* If a parity error occurs, data may have been lost
* Usually this just reports a failure to the OS and you're stuck.
However – the actual disk drive hardware can probably still
read the data; it just doesn't LIKE it.
So, specialized software can sometimes get this "bad checksum"
data and display it ... we discuss this shortly.
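A minimal sketch of the idea, assuming a single even-parity bit per sector (real drives use stronger error-correcting codes). The function names and the `ignore_errors` switch are hypothetical; they stand in for what recovery software does when it asks for the data anyway.

```python
def parity_bit(data: bytes) -> int:
    """Return 0 if the number of 1-bits in data is even, else 1."""
    ones = sum(bin(byte).count("1") for byte in data)
    return ones % 2


def write_sector(data: bytes):
    """Store the data together with its parity bit."""
    return {"data": data, "parity": parity_bit(data)}


def read_sector(sector, ignore_errors=False):
    """Return the data; report a parity mismatch unless told not to."""
    if parity_bit(sector["data"]) != sector["parity"] and not ignore_errors:
        raise IOError("parity error")
    return sector["data"]


sector = write_sector(b"Adams, John")

# Simulate damage: flip one bit of the first byte, keep the old parity bit.
bad_data = bytes([sector["data"][0] ^ 0x01]) + sector["data"][1:]
damaged = {"data": bad_data, "parity": sector["parity"]}

try:
    read_sector(damaged)
    status = "ok"
except IOError:
    status = "parity error"

# Recovery software reads the sector anyway and lets a human judge it.
salvaged = read_sector(damaged, ignore_errors=True)
print(status, salvaged)   # parity error b'@dams, John'
```

The salvaged text is wrong by one character, but a human can still recognize it.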
-5 -
File System Organization
* Deleting a file:
The OS keeps an available sector list
of sectors that can be reused.
recovermyfiles.com
To DELETE a file, the system just
changes its first and last links. (Think of out-of-service boxcars).
The data is not gone, it's just unlinked.
It will be overwritten, when (and if)
the OS needs more space.
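The boxcar picture can be sketched as follows. This is a toy model under stated assumptions: the sector addresses, the third record, and the names `disk`, `state`, `delete_file`, and `chain` are made up for illustration, and the available sector list is assumed to be a linked chain like the files themselves.

```python
# Toy disk: sector address -> [next_sector, data]. None ends a chain.
disk = {
    10: [11, "Adams, John"],
    11: [12, "Wilson, Steve"],
    12: [None, "(made-up third record)"],
    20: [21, ""],
    21: [None, ""],
}
directory = {"addresses.doc": 10}   # file name -> head sector
state = {"free_head": 20}           # head of the available-sector chain


def last_sector(head):
    """Walk a chain and return the address of its final sector."""
    while disk[head][0] is not None:
        head = disk[head][0]
    return head


def chain(head):
    """Return the list of sector addresses in a chain."""
    out = []
    while head is not None:
        out.append(head)
        head = disk[head][0]
    return out


def delete_file(name):
    """Unlink a file: drop its directory entry and splice its chain onto
    the front of the free chain. No data is overwritten."""
    head = directory.pop(name)
    disk[last_sector(head)][0] = state["free_head"]   # change the last link
    state["free_head"] = head                         # change the first link


delete_file("addresses.doc")
print(chain(state["free_head"]))   # [10, 11, 12, 20, 21]
print(disk[10][1])                 # Adams, John
```

Only two links changed. The records stay readable until the OS hands those sectors to another file.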
tdc.ca
-6 -
Losing and Recovering Data
recovermyfiles.com
Now what if the directory or a sector gets
screwed up?
a) software error: erase the pointer or link to
a file.
or
b) hardware error: part of directory or sector gets corrupted
The data is still out there, but OS can't find it.
If you can directly READ THE SECTORS, you will find
broken strands of spaghetti ... with clues in 'em.
restaurantwidow.com
-7 -
Recovering Data
What clues exist?
* Links (obviously) if it's a linked system
- Try to reconstruct the files, or fragments of them
* Directory item numbers, if these exist
- Try to "work backwards" and reconstruct the directory
* The data itself (e.g. search for "Adams")
- Use syntactic knowledge to match up partial sentences
in blocks. Which block might match that one?
nguins live in Antarc...
.. and we re
spect the opinions of...
492.7 \t 333.9e14 ...
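Two of these clues can be sketched in Python: searching raw sectors for a known string, and testing whether the end of one block and the start of another join into a real word. The sector addresses, the tiny word list, and the function names are assumptions for illustration; real recovery tools use much richer knowledge and still need a human to check the result.

```python
# Raw sector contents read directly from the disk (made-up addresses).
sectors = {
    501: "nguins live in Antarc",
    502: ".. and we re",
    503: "spect the opinions of",
    504: "492.7\t333.9e14",
    505: "Adams, John\t222 West",
}

# A tiny stand-in for a real dictionary of known words.
known_words = {"respect", "penguins", "antarctica"}


def find_sectors(term):
    """Return addresses of sectors whose data contains the search term."""
    return sorted(addr for addr, text in sectors.items() if term in text)


def joins_into_word(first, second):
    """True if the last piece of `first` plus the first piece of `second`
    forms a known word."""
    first_parts = first.split()
    second_parts = second.split()
    if not first_parts or not second_parts:
        return False
    return (first_parts[-1] + second_parts[0]).lower() in known_words


def candidate_pairs():
    """List (a, b) where sector b plausibly continues sector a."""
    return [
        (a, b)
        for a in sorted(sectors)
        for b in sorted(sectors)
        if a != b and joins_into_word(sectors[a], sectors[b])
    ]


print(find_sectors("Adams"))   # [505]
print(candidate_pairs())       # [(502, 503)]
```

Here "re" plus "spect" makes "respect", so block 503 is a good guess for what follows block 502. The penguin fragment finds no partner because its neighbors are missing.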
-8 -
Recovering Data
If you have 'bad sectors' (i. e. bad checksums)
Read the data and override the parity error messages
Humans are normally required to look at the data and piece it
back together.
Success is not guaranteed.
A full (zero-fill) format writes 0 in all the sectors; a quick format only rebuilds the directory. SOME claim they
can recover what was there before (maybe NSA can?)
But it is not a high-percentage bet.
-9 -
Forensics: Finding Hidden Stuff
* simplest cases: just "erased" your files?
- straightforward disk recovery may work.
macforensiclabs.com
* the famous photocopier story.
- copiers have hard drives and remember what was copied.
http://www.cbsnews.com/stories/2010/04/19/eveningnews/main6412439.shtml
* RAM sticks (USB flash drives) are just like hard drives; "delete" does not erase the data.
(Nonvolatile RAM versus volatile RAM.
Why isn't it ALL nonvolatile?)
-10 -
Forensics: Finding Hidden Stuff
* virtual memory: copies part of your RAM
into hard drive on computer.
macforensiclabs.com
* those images may include print queues and other information
that can be recovered.
* backup systems may not have been reformatted even if the main
hard drive was reformatted.
* offsite backup probably was NOT reformatted; old sectors may
have copies of data you wanted to make disappear.
-11 -
File Structures: Summary
* vocabulary terms throughout lecture
* backup/archive/redundant storage
* criteria for choice of offsite backup
motifake.com
* understand and explain disk organization
* understand how disk errors occur
* analyze what data could be recovered from a particular accident
* discuss forensic issues concerning disk data erasure and
recovery
-12 -
Cloud Computing and
Digital Asset Management
• First let's look at the Cloud
- Where did it come from?
- What is it?
- How can it help me?
- What new skills will I need to use it?
- What effect does Cloud have on DAM?
-13 -
As of the Year 2000 ...
• Most Internet Service Providers sold ( ... rented ...)
• dedicated hosting
One website: delivered by 1 computer
(mystore.com)
• shared virtual hosting
N websites each got 1/Nth of a computer
(yourstore.com, histore.com, herstore.com)
-14 -
And a few giants (Yahoo, Google,
Amazon)
Built giant 'ad hoc'
systems with
thousands of CPUs
and petabytes of
storage.
Amazon noticed ...
less than 10% of their capacity was being used
most of the time.
phaseoneenterprises.com
-16 -
... and in 2006 launched
Amazon Web Services
The 'utility model': power plants
have capacity to meet
AVERAGE demand
and so can
deliver UNLIMITED*
power to some customers
when needed.
en.wikimedia.org
(*"Unlimited" as long as << total capacity)
-17 -
The Shared Telescope Model
Astronomers worldwide
now schedule time on
big telescopes
through the Internet
and don't have to go to a cold mountaintop
and stay up all night
to capture imagery.
as.utexas.edu
-18 -
The Shared Computing Model
NASA released NEBULA in 2008,
to share research computers
instead of building additional
data centers.
NEBULA is an open source cloud management
system.
-19 -
... resembles the old Mainframe
Timeshare model
Before PCs, we
programmed on punch-cards
and thought it was a
great INNOVATION
when time-sharing
became possible.
as.utexas.edu
-21 -
But with one fundamental difference:
In 1965 this was SCARCE
and we were NUMEROUS
(relatively)
as.utexas.edu
redlinecs.com.au
(Skilled specialists who wanted to use computers)
-22 -
But with one fundamental difference:
In 2012 this is ABUNDANT
and we are
EVERYONE
allthingsdistributed.com
reuters.com
-23 -
... relies on fast, reliable networks
... may reduce your company's IT costs
* software is expensive – so RENT it
* hardware is expensive to update – so RENT it
* buildings are expensive – so share them
* land is expensive – build in rural areas
-24 -
Key Cloud Concepts:
1. Agility through dynamic provisioning
- Order up "supercomputer for an hour"
2. API Accessibility
- Your program can specify the needed QOS*
QOS: Quality of Service:
- Maximum guaranteed latency (e. g. <1ms)
- Minimum guaranteed CPU (e. g. >1 petaflop)
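What "your program can specify the needed QOS" might look like is sketched below. Everything here is hypothetical: `QosRequest`, `Offer`, `satisfies`, and the two example offers are invented for illustration and do not correspond to any real cloud provider's API.

```python
from dataclasses import dataclass


@dataclass
class QosRequest:
    """What the program asks for (hypothetical fields)."""
    max_latency_ms: float   # maximum guaranteed latency
    min_petaflops: float    # minimum guaranteed compute
    hours: float            # how long the resources are needed


@dataclass
class Offer:
    """What a provider could supply (hypothetical fields)."""
    name: str
    latency_ms: float
    petaflops: float


def satisfies(offer: Offer, request: QosRequest) -> bool:
    """An offer qualifies if it is responsive enough and fast enough."""
    return (offer.latency_ms < request.max_latency_ms
            and offer.petaflops > request.min_petaflops)


offers = [
    Offer("small-cluster", latency_ms=5.0, petaflops=0.1),
    Offer("supercomputer", latency_ms=0.5, petaflops=2.0),
]

# "Supercomputer for an hour": latency under 1 ms, more than 1 petaflop.
request = QosRequest(max_latency_ms=1.0, min_petaflops=1.0, hours=1.0)

chosen = [offer.name for offer in offers if satisfies(offer, request)]
print(chosen)   # ['supercomputer']
```

The point is that the request is data a program can build and send, rather than a phone call to a sales office.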
-25 -
What's a flop?
"Floating Point Operations" like x=239.44*456.3733
per second
Math models (physics, stock market, statistics)
may need teraflops (tera = thousand*billion) or more
giga = 10^9
tera = 10^12
peta = 10^15
exa = 10^18
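To get a feel for these prefixes, here is a small worked calculation in Python. The workload of 10^15 floating point operations is an assumed figure chosen for illustration, not a measurement of any real model.

```python
PREFIXES = {"giga": 10**9, "tera": 10**12, "peta": 10**15, "exa": 10**18}


def seconds_needed(operations, rate_flops):
    """Time to finish a job at a given rate in operations per second."""
    return operations / rate_flops


workload = 10**15   # assumed: a model that needs 10^15 operations

for name, rate in PREFIXES.items():
    print(f"1 {name}flops machine: {seconds_needed(workload, rate):,.3f} s")
```

The same job takes 1,000,000 seconds (about 11.6 days) at 1 gigaflops, 1,000 seconds at 1 teraflops, 1 second at 1 petaflops, and a thousandth of a second at 1 exaflops.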
-26 -
Key Cloud Concepts (continued):
3. Virtualization
- You "THINK" you have your own machine
- Protection models don't need to be reinvented
http://www.vmware.com/virtualization/
-27 -
One Key Cloud Concern:
SECURITY.
(I know this guy)
http://www.acsac.org/2012/workshops/ccw/
One solution (for larger firms): Build your own Cloud.
http://www.enterprisenetworkingplanet.com/ebooks/50950510/95900/4190310/
-28 -
Quickly, web-hosts realized that they
could virtualize their service
bigbird.com
cookie.com
elmo.com
kermit.com
piggie.com
-29 -
Software as a Service (SaaS)
pin.primate.wisc.edu
The 800 pound anthropoid:
Salesforce.com
http://www.salesforce.com
sales cloud (CRM systems)
force.com – build your own
-30 -
Digital Asset Management
in the Cloud
pin.primate.wisc.edu
1. Simple: Dropbox
2. Specialized for software: Github
3. Rich metadata -> DAM (e. g. AlienBrain)
Media Valet http://www.mediavalet.co/home.aspx
Widen
Fordela
-31 -
Media Valet http://www.mediavalet.co/home.aspx
"CMIS compliant?"
-32 -
Content Management
Interoperability Services (CMIS)
http://en.wikipedia.org/wiki/Content_Management_Interoperability_Services
CMIS is an open standard that defines how DAM
systems can manage metadata ("generic properties")
for files and folders.
Adobe, HP, IBM, Microsoft, Oracle + + +
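A rough sketch of what "generic properties" for files and folders can look like, written as plain Python dictionaries. The property names follow the CMIS `cmis:` naming convention; all values, the `GENERIC_PROPERTIES` selection, and the `missing_properties` helper are made up for illustration, and no real CMIS client library is used.

```python
# A few properties that both folders and documents carry.
GENERIC_PROPERTIES = (
    "cmis:objectId",
    "cmis:name",
    "cmis:objectTypeId",
    "cmis:createdBy",
    "cmis:creationDate",
)

# Values below are invented examples.
folder = {
    "cmis:objectId": "f-001",
    "cmis:name": "payroll",
    "cmis:objectTypeId": "cmis:folder",
    "cmis:createdBy": "example-user",
    "cmis:creationDate": "2012-04-01T09:00:00Z",
}

document = {
    "cmis:objectId": "d-042",
    "cmis:name": "payroll.xls",
    "cmis:objectTypeId": "cmis:document",
    "cmis:createdBy": "example-user",
    "cmis:creationDate": "2012-04-02T10:30:00Z",
    "cmis:contentStreamLength": 17100,   # documents also describe their content
}


def missing_properties(obj):
    """Return the generic properties an object fails to provide."""
    return [name for name in GENERIC_PROPERTIES if name not in obj]


print(missing_properties(folder))     # []
print(missing_properties(document))   # []
```

Because every compliant system exposes the same property names, a tool written against one DAM system can read the metadata of another.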
-33 -
Widen - http://www.widen.com/
-34 -
Fordela http://www.fordela.com/ - VIDEO focus
(started by LucasArts veterans)
-35 -
Choosing a DAM System
pin.primate.wisc.edu
Here's a logically organized Buyer's Guide
http://www.datamation.com/storage/digital-asset-management-buying-guide-1.html
-36 -
End of lecture ... End of lectureS.
When we return ... Project Show-and-tell!
-37 -