Chipping Away at Censorship
with User-Generated Content
Sam Burnett, Nick Feamster and
Santosh Vempala
Internet Censorship is a Problem
• 12 censors
• 11 monitors
• 25% of population
• More on the way
See http://rsf.org for more
It’s Not Only China…at Home, Too
Intro to Internet Censorship
Censor
Firewall
Alice (user)
Block traffic
Punish
Censored net
Uncensored net
Bob
Solution: Use a Helper
Helper
Firewall
Alice
Censored net
Uncensored net
Bob
The helper sends messages to and from
blocked hosts on your behalf
Design Goals for the Helper
• Be robust against blocking
• Be deniable against user identification
• Require no dedicated infrastructure
What about Proxies and Mixnets?
(e.g., Tor)
Proxy
Proxy
Firewall
Alice
Censored net
Uncensored net
Bob
• Censors can block proxies if the proxy list is public
• Not deniable if encryption is incriminating
• Requires dedicated infrastructure (network of proxies)
What About Covert Channels?
(e.g., Infranet)
usenix.org
Unblocked host
Firewall
Alice
Censored net
Uncensored net
Bob
• Not entirely robust against blocking
• More deniable because messages are hidden
• Requires dedicated infrastructure (Web servers)
Collage: Let User-Generated Content
Help Defeat Censorship
User-generated content hosts
Alice
• Robust by using redundancy
• Users generate innocuous-looking traffic
• No dedicated infrastructure required
Bob, a Flickr user
Why Might Collage Work?
• Lots of User-Generated Content (UGC)
– More than 4 billion Flickr images
– A day of video uploaded to YouTube every minute
• Many sites host UGC
• We have tools to store censored data in UGC
– Steganography, watermarking
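As a rough illustration of how data can be stored inside UGC, the sketch below hides a payload in the least significant bit of each cover sample (for example, raw pixel values). Least-significant-bit embedding is only one simple information hiding technique, chosen here for brevity; the slides do not say which tool Collage uses, and the names `embed` and `extract` are hypothetical.

```python
def embed(cover: bytes, payload: bytes) -> bytes:
    """Hide payload bits in the lowest bit of each cover sample (assumed scheme)."""
    # Expand the payload into bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for this payload")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        # Clear the lowest bit of the sample, then set it to the payload bit.
        stego[i] = (stego[i] & 0xFE) | bit
    return bytes(stego)


def extract(stego: bytes, length: int) -> bytes:
    """Read back `length` payload bytes from the lowest bits of the samples."""
    out = bytearray()
    for i in range(length):
        byte = 0
        for sample in stego[8 * i:8 * i + 8]:
            byte = (byte << 1) | (sample & 1)
        out.append(byte)
    return bytes(out)
```

Each cover sample changes by at most 1, which is why the result can look innocuous; a robust watermark would trade some of that invisibility for resistance to removal.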
Outline
• Background and Design Goals
• Collage Design
• Performance and Demo
Collage, Step-by-Step
Message
Vector
Embedded vector
Alice
Collage steps:
1. Obtain message
2. Pick message identifier
3. Obtain cover media
4. Embed message in cover
5. Upload UGC to content host
6. Find and download UGC
7. Decode message from UGC
Bob
Content host
• Message identifier (step 2): application specific; only the intended recipient should know it
• Cover media (step 3): your personal photos, or vectors from generous users
• Embedding (step 4): see next slide
Embedding Messages in Vectors
• Encrypt the message using the identifier
• Generate chunks using erasure coding
– Generate many chunks, recover from any k-subset
– Allows splitting among many vectors, robustness
• Embed chunks into vectors
Steganography: hard to detect
Watermarking: hard to remove
Do the reverse to decode
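The "recover from any k-subset" property can be sketched with a small polynomial erasure code. This is a minimal illustration under stated assumptions, not the coding scheme Collage itself uses: it works over the prime field GF(257), the function names are hypothetical, and the encryption and embedding steps are left out.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial through `points`, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(message: bytes, k: int, n: int):
    """Return n chunks (x, symbols); any k of them rebuild the message."""
    if not 1 <= k <= n < P:
        raise ValueError("need 1 <= k <= n < 257")
    padded = message + bytes(-len(message) % k)  # zero-pad to a multiple of k
    chunks = [(x, []) for x in range(1, n + 1)]
    for start in range(0, len(padded), k):
        # The k data bytes are the polynomial's values at x = 1..k.
        points = list(zip(range(1, k + 1), padded[start:start + k]))
        for x, symbols in chunks:
            symbols.append(_interpolate(points, x))
    return chunks


def decode(chunks, k: int, length: int) -> bytes:
    """Rebuild the first `length` message bytes from any k distinct chunks."""
    if len(chunks) < k:
        raise ValueError("not enough chunks")
    chosen = chunks[:k]
    out = bytearray()
    for col in range(len(chosen[0][1])):
        points = [(x, symbols[col]) for x, symbols in chosen]
        out.extend(_interpolate(points, x) for x in range(1, k + 1))
    return bytes(out[:length])
```

In this sketch each chunk would go into a different vector, so a censor who removes some vectors still leaves the receiver with enough chunks as long as k survive.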
Where Are Bob’s Vectors?
• Crawling all of Flickr is not an option
• Must agree on a subset of a content host
without any immediate communication
Solution: A predictable way of
mapping message identifiers to
subsets of content hosts
Solution: Task Mapping
1. Hash the identifier
2. Hash the tasks
3. Map identifier to closest tasks
• Receivers perform these tasks to get vectors
• Senders publish vectors so that
when receivers perform tasks,
they get the sender’s vectors
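A minimal sketch of this mapping follows, assuming SHA-256 as the hash and "closest" meaning the nearest successors on a hash ring. Both choices, and the names `position` and `map_tasks`, are assumptions for illustration; the slides do not fix them.

```python
import hashlib

SPACE = 2 ** 256  # size of the SHA-256 output space


def position(text: str) -> int:
    """Place an identifier or a task description on the hash ring."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")


def map_tasks(identifier: str, tasks, count: int):
    """Return the `count` tasks that follow the identifier most closely on the ring."""
    origin = position(identifier)
    return sorted(tasks, key=lambda task: (position(task) - origin) % SPACE)[:count]


# Example task list; the last entry is a made-up placeholder.
TASKS = [
    "Search for blue flowers on Flickr",
    "Look at JohnDoe's videos on YouTube",
    "Browse an example photo group",
]
```

Because the mapping depends only on the identifier and the task list, a sender and a receiver who call `map_tasks("http://nytimes.com", TASKS, 2)` independently arrive at the same tasks without communicating.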
Message identifier: http://nytimes.com
Tasks:
• Search for blue flowers on Flickr
• Look at JohnDoe’s videos on YouTube
(Diagram labels: 1, 3, 6, 9, 11)
How Does Collage Meet the
Design Goals?
• Robust against blocking
– Erasure coding
– Many content hosts
• Deniable against user identification
– Traffic only to/from content hosts
– Depends upon task construction
• Require no dedicated infrastructure
– Messages stored on content hosts
How Do You Start Using Collage?
Send & Receive Messages
1. Distribute software
– CDROM
– Spam everyone
– A secure network
2. Refresh task list
– Receive using Collage
– Online resource
3. Message identifier
– Application specific
Help Censored Users
1. Donate your UGC vectors
– Photos on Flickr
– Tweets on Twitter
– Etc.
2. Write Collage applications
Outline
• Background and Design Goals
• Collage Design
• Performance and Demo
Performance Metrics
• Sender and receiver traffic overhead
• Sender and receiver transfer time
• Storage required on content hosts
But these metrics can vary a lot:
• Different content hosts
• Different tasks
• Different applications
Case Study
                     News Articles    Covert Tweets
Content host         Flickr           Twitter
Message size         30 KB            140 Bytes
Vectors needed       5                30
Storage needed       600 KB           4 KB
Sending traffic      1,200 KB         1,100 KB
Sending time         5 minutes        60 minutes
Receiving traffic    6,000 KB         600 KB
Receiving time       2 minutes        ½ minute
Experiments performed on a 768/128 Kbps DSL connection
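To put the news-article numbers in perspective, the short calculation below derives overhead ratios and the raw link-limited transfer times. It assumes that "768/128 Kbps" means 768 Kbps downstream and 128 Kbps upstream, that 1 KB is 1,000 bytes, and that 1 Kbps is 1,000 bits per second; the function names are hypothetical.

```python
UPLINK_KBPS = 128    # assumed upstream rate of the DSL line
DOWNLINK_KBPS = 768  # assumed downstream rate of the DSL line


def overhead(traffic_kb: float, message_kb: float) -> float:
    """Ratio of bytes moved (or stored) to bytes of actual message."""
    return traffic_kb / message_kb


def min_transfer_seconds(traffic_kb: float, link_kbps: float) -> float:
    """Lower bound on transfer time if the link were the only bottleneck."""
    return traffic_kb * 8 / link_kbps


# News-article case: 30 KB message, figures taken from the case study.
storage_ratio = overhead(600, 30)                             # 20x
sending_ratio = overhead(1200, 30)                            # 40x
receiving_ratio = overhead(6000, 30)                          # 200x
sending_floor = min_transfer_seconds(1200, UPLINK_KBPS)       # 75.0 seconds
receiving_floor = min_transfer_seconds(6000, DOWNLINK_KBPS)   # 62.5 seconds
```

Under these assumptions the reported 5 minutes to send and 2 minutes to receive exceed the link-limited floors, which suggests that time is also spent on something other than raw transfer.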
Demo of a Collage Application
What Should You Do Now?
• Try out the demo application
• Donate your photos
– For now, only Flickr Pro users
– Embed news articles when you upload photos
Visit http://gtnoise.net/collage
Conclusion
• Collage evades Internet censorship by
tunneling messages inside user-generated content
– Robust against blocking
– Deniable against user identification
– Requires no dedicated infrastructure
• More work needed
– Statistical deniability against traffic analysis
– Learn timing behavior from users
– Tor bridge discovery
http://gtnoise.net/collage
Yes, Steganography is Broken
• Doesn’t mean it isn’t still practical
• Collage can use new information hiding
techniques as they come along
– Not dependent upon a particular technology
• Use many hiding techniques in parallel
Can Bob Be Behind the Firewall?
Comparison to Other Systems
Technique            System     Robustness                        Deniability                                Infrastructure
Onion routing        Tor        Layered encryption; bridges       Encryption                                 Network of relay servers
Anonymous remailing  Mixminion  Many hops; mixing                 Encryption                                 Network of remailers
Covert channels      Infranet   Could use many proxy forwarders   Browsing regular Web sites; Steganography  Custom Web servers
Covert channels      CovertFS   Multiple content hosts            Browsing regular Web sites; Steganography  None
Covert channels      Collage    Erasure coding; Many content hosts  Browsing content hosts; Info hiding      None
“A Web based Covert File System”,
HotOS ’07
• Stores files inside of Flickr photos
• Intended for personal file storage
• Collage is more robust
– Erasure coding
– Automatically spread data among content hosts
• Could be implemented using Collage
• Implementation not publicly available
Where Could We Store Content?
• Photo sharing sites (Wikipedia lists 71)
• Video sharing sites (Wikipedia lists 68)
• Web forums, blogs
– Plugins for popular forum and blogging software
What If Senders and Receivers
Disagree on Task Lists?
• Consistent hashing ensures some agreement
• Depends on number of tasks per identifier:
– If identifier mapped to 2 tasks, can still
communicate with 25% disagreement
– If identifier mapped to 4 tasks, can still
communicate with 50% disagreement
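The effect of disagreement can be explored with a small simulation. The sketch below is an assumption-laden model, not a reproduction of the figures above: it treats disagreement as the receiver missing a random fraction of the sender's task list, maps identifiers to ring successors under SHA-256, and counts how often the two sides still share at least one task. All names are hypothetical.

```python
import hashlib
import random

SPACE = 2 ** 256


def position(text: str) -> int:
    """Place an identifier or a task on the hash ring."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")


def map_tasks(identifier: str, tasks, count: int):
    """Pick the `count` tasks that follow the identifier on the ring."""
    origin = position(identifier)
    return sorted(tasks, key=lambda task: (position(task) - origin) % SPACE)[:count]


def shared_task_rate(count, disagreement, trials=200, pool=40, seed=0):
    """Fraction of trials in which sender and receiver map to a common task."""
    rng = random.Random(seed)
    sender_tasks = [f"task-{i}" for i in range(pool)]
    missing = int(pool * disagreement)
    hits = 0
    for trial in range(trials):
        identifier = f"identifier-{trial}"
        # The receiver's list lacks a random subset of the sender's tasks.
        receiver_tasks = rng.sample(sender_tasks, pool - missing)
        sender = set(map_tasks(identifier, sender_tasks, count))
        receiver = set(map_tasks(identifier, receiver_tasks, count))
        hits += bool(sender & receiver)
    return hits / trials
```

Comparing, say, `shared_task_rate(2, 0.25)` with `shared_task_rate(4, 0.5)` gives a feel for how mapping each identifier to more tasks buys tolerance to larger disagreement.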
Agreeing on Message Identifiers
• Goal: Agree without communicating
• For news reader application:
– Public knowledge, hopefully censor doesn’t block
• Another option:
– Exchange keys with communicators beforehand
– Identifier is, e.g., ciphertext of current date
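One way to sketch the pre-shared-key option uses only the Python standard library. Since the standard library has no block cipher, this example substitutes an HMAC of the current date for the "ciphertext"; that swap, the key, and the name `daily_identifier` are assumptions made for illustration.

```python
import datetime
import hashlib
import hmac


def daily_identifier(shared_key: bytes, day: datetime.date) -> str:
    """Derive the day's message identifier from a key exchanged beforehand."""
    # HMAC-SHA256 stands in for encrypting the date; without the key,
    # the identifier cannot be predicted from the date alone.
    return hmac.new(shared_key, day.isoformat().encode("ascii"), hashlib.sha256).hexdigest()
```

Both communicators compute `daily_identifier(key, datetime.date.today())` on their own, so they agree on a fresh identifier each day without exchanging any further messages.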
What Affects Deniability?
• Task construction:
– The content host
– Popularity
– The sequence of Web requests made
– Injection of think times
– Injection of garbage requests
– Alternative routes to the same vectors
• Frequency of task execution
• Are you uploading suspicious vectors?
– E.g., tagging pictures with bogus terms
Censorship Is an Arms Race
• Popular systems will eventually be blocked
• We must continually create systems that are
harder to compromise