Retrospecting VM Images Wolfgang Richter Glenn Ammons

advertisement
Retrospecting VM Images
Wolfgang Richter
Glenn Ammons*, Jan Harkes, Adam Goode, Vasanth Bala*,
Nilton Bila+, Eyal de Lara+, Mahadev Satyanarayan
*
IBM Research
+ University of Toronto
The Importance of Content
http://www.pdl.cmu.edu/
2
Wolfgang Richter © July 17, 2016
Retrospection
• VM collections growing
• 300% Year over year, IBM Research RC2
• System, application, and user content
• Searchable history
•
•
•
•
Debugging opportunities
Legal data or code origin
Malware tracking
License violations
http://www.pdl.cmu.edu/
3
Wolfgang Richter © July 17, 2016
Roadmap
• What is the retrospection problem?
• What are the main challenges?
• How can we solve them?
http://www.pdl.cmu.edu/
4
Wolfgang Richter © July 17, 2016
The retrospection mechanism should place as few
constraints as possible on the code used for search
computations.
PRINCIPLE 1
http://www.pdl.cmu.edu/
5
Wolfgang Richter © July 17, 2016
A search computation should only be performed
on demand for a specific query, and its scope should be
restricted to the smallest relevant subset of VM images.
PRINCIPLE 2
http://www.pdl.cmu.edu/
6
Wolfgang Richter © July 17, 2016
Control of policy for retrospection should reside with the
owners of VM images and their delegates.
PRINCIPLE 3 - WIP
http://www.pdl.cmu.edu/
7
Wolfgang Richter © July 17, 2016
Find The Picture
• Rich content-based and application-specific queries
• 10 Graphics OK
• 100 Graphics OK
• 1000+ Graphics ?
http://www.pdl.cmu.edu/
8
Wolfgang Richter © July 17, 2016
OpenDiamond Platform
• Distributed, interactive, unindexed search
• Focuses on the principle of early discard
• Enables arbitrary search queries
• Arbitrary x86 binary code
The retrospection mechanism should place as
few constraints as possible on the code used for
search computations.
http://www.pdl.cmu.edu/
9
Wolfgang Richter © July 17, 2016
Available Structured Data
• VM’s have attributes and metadata
• Owners
• Files
• File Systems
• Files have attributes and metadata
•
•
•
•
Owners
File Type
Permissions
Modification Timestamp
http://www.pdl.cmu.edu/
10
Wolfgang Richter © July 17, 2016
Scoping Solution
• Metadata MySQL database
• Scope Server
• Manage access to data
A
search
computation should only
be performed on demand
• Scope Cookie
for a specific query, and
• X.509 signed cookie
its scope should be
• Determines accessible data restricted to the smallest
relevant subset of VM
images.
http://www.pdl.cmu.edu/
11
Wolfgang Richter © July 17, 2016
Problem: VM Sprawl in Files
Data from 78 NCSU VCL
VM Images based on Windows XP
http://www.pdl.cmu.edu/
12
Wolfgang Richter © July 17, 2016
Problem: VM Sprawl in Bytes
Data from 78 NCSU VCL
VM Images based on Windows XP
http://www.pdl.cmu.edu/
13
Wolfgang Richter © July 17, 2016
Solution: Deduplication in Files
Reduce
Search
Time
Data from 78 NCSU VCL
VM Images based on Windows XP
http://www.pdl.cmu.edu/
14
Wolfgang Richter © July 17, 2016
Solution: Deduplication in Bytes
Reduce
Storage
Space
Data from 78 NCSU VCL
VM Images based on Windows XP
http://www.pdl.cmu.edu/
15
Wolfgang Richter © July 17, 2016
IBM Research Mirage
• File-level deduplication
• Files are referenced by SHA-1 tag
• Reads VM image partitions and file systems
• On-disk deduplicated format
• Centralized VM store – a potential bottleneck
http://www.pdl.cmu.edu/
16
Wolfgang Richter © July 17, 2016
Network Bottlenecks Megabytes
Takeaway:
Centralized store limited
by network bandwidth,
limiting parallelism.
http://www.pdl.cmu.edu/
17
Wolfgang Richter © July 17, 2016
Network Bottlenecks Objects
Takeaway:
The number of objects
pushed determines the
possible
number
of
search processes.
http://www.pdl.cmu.edu/
18
Wolfgang Richter © July 17, 2016
Dataretriever
• Abstract data source
• getObject() interface
• Search process oblivious to where data comes from
• Access deduplicated data
• Unmodified client and search server
• Solve network bottleneck: Data partitioning
• Compute on local data rather than central store
• Layer of indirection enables this without modification
http://www.pdl.cmu.edu/
19
Wolfgang Richter © July 17, 2016
Architecture
Scope Cookie
Scope Definition
Scope
Mirage
Server
Client
Request
Objects
Raw Data
Dataretriever
Dataretriever
Dataretriever
Server
Server
Metadata Query
Query+Cookie
MySQL
http://www.pdl.cmu.edu/
20
Wolfgang Richter © July 17, 2016
Revisiting Network Bottlenecks
How bad is the
bottleneck with
content-based
queries?
http://www.pdl.cmu.edu/
21
Wolfgang Richter © July 17, 2016
CPU-Bound Search Process
Takeaway:
Content search
is limited by
computation,
although
embarrassingly
parallel.
http://www.pdl.cmu.edu/
22
Wolfgang Richter © July 17, 2016
Achievable Efficient Retrospection
Takeaway:
Search scales with servers,
and the Mirage case closely
matches local.
http://www.pdl.cmu.edu/
23
Wolfgang Richter © July 17, 2016
Current Research: Principle 2
• Control of policy to owners via encryption
• Proof of concept: convergent encrypt /home
• Encrypt files using file hash as key
• Fine-grained, per file
• Future direction: key escrow?
• Support investigations and warrants
• Support multiple encryption methods?
• Per VM Image? Groups?
http://www.pdl.cmu.edu/
24
Wolfgang Richter © July 17, 2016
Recap
• Retrospection – search VM image content
• Main challenges
1. Get data efficiently
– Solution: Dataretriever
2. Handle big and growing data
– Solution: Scoping + Deduplication
3. Privacy and encryption
http://www.pdl.cmu.edu/
25
Wolfgang Richter © July 17, 2016
OpenDiamond - http://diamond.cs.cmu.edu
IBM Mirage - http://doi.acm.org/10.1145/1346256.1346272
Convergent Encryption - http://doi.acm.org/10.1145/339331.339345
LEARN MORE
http://www.pdl.cmu.edu/
26
Wolfgang Richter © 7/17/2016
Download