Chipping Away at Censorship with User-Generated Content
Sam Burnett, Nick Feamster and Santosh Vempala

Internet Censorship is a Problem
• 12 censors
• 11 monitors
• 25% of population
• More on the way
See http://rsf.org for more

It’s Not Only China… at Home, Too

Intro to Internet Censorship
[Diagram: Alice, a user in the censored net, and Bob in the uncensored net, separated by the censor's firewall. The censor can block traffic and punish users.]

Solution: Use a Helper
[Diagram: a helper sits between Alice in the censored net and Bob in the uncensored net, across the firewall.]
The helper sends messages to and from blocked hosts on your behalf

Design Goals for the Helper
• Be robust against blocking
• Be deniable against user identification
• Require no dedicated infrastructure

What About Proxies and Mixnets? (e.g., Tor)
[Diagram: Alice reaches Bob through proxies on the far side of the firewall.]
• Censors can block proxies if the proxy list is public
• Not deniable if encryption is incriminating
• Requires dedicated infrastructure (network of proxies)

What About Covert Channels? (e.g., Infranet)
[Diagram: Alice reaches Bob through an unblocked host, usenix.org, on the far side of the firewall.]
• Not entirely robust against blocking
• More deniable because messages are hidden
• Requires dedicated infrastructure (Web servers)

Collage: Let User-Generated Content Help Defeat Censorship
[Diagram: Alice and Bob, a Flickr user, communicate through user-generated content hosts.]
• Robust by using redundancy
• Users generate innocuous-looking traffic
• No dedicated infrastructure required

Why Might Collage Work?
• Lots of User-Generated Content (UGC)
  – More than 4 billion Flickr images
  – A day of video uploaded to YouTube every minute
• Many sites host UGC
• We have tools to store censored data in UGC
  – Steganography, watermarking

Outline
• Background and Design Goals
• Collage Design
• Performance and Demo

Collage, Step-by-Step
[Diagram: Alice embeds a message in a vector and uploads the embedded vector to a content host; Bob downloads it.]
Collage steps:
1. Obtain message
2. Pick message identifier
  – Application specific
  – Only the intended recipient should know it
3. Obtain cover media
  – Your personal photos
  – Generous users
4. Embed message in cover (next slide)
5. Upload UGC to content host
6. Find and download UGC
7. Decode message from UGC

Embedding Messages in Vectors
• Encrypt the message using the identifier
• Generate chunks using erasure coding
  – Generate many chunks, recover from any k-subset
  – Allows splitting among many vectors, robustness
• Embed chunks into vectors
  – Steganography: hard to detect
  – Watermarking: hard to remove
Do the reverse to decode

Where Are Bob’s Vectors?
• Crawling all of Flickr is not an option
• Must agree on a subset of a content host without any immediate communication
Solution: A predictable way of mapping message identifiers to subsets of content hosts

Solution: Task Mapping
• Tasks: receivers perform these tasks to get vectors
• Senders publish vectors so that when receivers perform tasks, they get the sender’s vectors
1. Hash the identifier
2. Hash the tasks
3. Map the identifier to the closest tasks
[Diagram: the message identifier http://nytimes.com and tasks such as "Search for blue flowers on Flickr" and "Look at JohnDoe’s videos on YouTube" are hashed to numbered positions; the identifier maps to the nearest tasks.]

How Does Collage Meet the Design Goals?
• Robust against blocking
  – Erasure coding
  – Many content hosts
• Deniable against user identification
  – Traffic only to/from content hosts
  – Depends upon task construction
• Require no dedicated infrastructure
  – Messages stored on content hosts

How Do You Start Using Collage?
Send & Receive Messages
1. Distribute software
  – CDROM
  – Spam everyone
  – A secure network
2. Refresh task list
  – Receive using Collage
  – Online resource
3. Message identifier
  – Application specific
Help Censored Users
1. Donate your UGC vectors
  – Photos on Flickr
  – Tweets on Twitter
  – Etc.
2. Write Collage applications

Outline
• Background and Design Goals
• Collage Design
• Performance and Demo

Performance Metrics
• Sender and receiver traffic overhead
• Sender and receiver transfer time
• Storage required on content hosts
But these metrics can vary a lot:
• Different content hosts
• Different tasks
• Different applications

Case Study
| Metric            | News Articles | Covert Tweets |
|-------------------|---------------|---------------|
| Content host      | Flickr        | Twitter       |
| Message size      | 30 KB         | 140 bytes     |
| Vectors needed    | 5             | 30            |
| Storage needed    | 600 KB        | 4 KB          |
| Sending traffic   | 1,200 KB      | 1,100 KB      |
| Sending time      | 5 minutes     | 60 minutes    |
| Receiving traffic | 6,000 KB      | 600 KB        |
| Receiving time    | 2 minutes     | ½ minute      |
Experiments performed on a 768/128 Kbps DSL connection

Demo of a Collage Application

What Should You Do Now?
• Try out the demo application
• Donate your photos
  – For now, only Flickr Pro users
  – Embed news articles when you upload photos
Visit http://gtnoise.net/collage

Conclusion
• Collage evades Internet censorship by tunneling messages inside user-generated content
  – Robust against blocking
  – Deniable against user identification
  – Requires no dedicated infrastructure
• More work needed
  – Statistical deniability against traffic analysis
  – Learn timing behavior from users
  – Tor bridge discovery
http://gtnoise.net/collage

Yes, Steganography is Broken
• Doesn’t mean it isn’t still practical
• Collage can use new information hiding techniques as they come along
  – Not dependent upon a particular technology
• Use many hiding techniques in parallel

Can Bob Be Behind the Firewall?

Comparison to Other Systems
| Technique           | System    | Robustness                         | Deniability                               | Infrastructure           |
|---------------------|-----------|------------------------------------|-------------------------------------------|--------------------------|
| Onion routing       | Tor       | Layered encryption; bridges        | Encryption                                | Network of relay servers |
| Anonymous remailing | Mixminion | Many hops; mixing                  | Encryption                                | Network of remailers     |
| Covert channels     | Infranet  | Could use many proxy forwarders    | Browsing regular Web sites; steganography | Custom Web servers       |
| Covert channels     | CovertFS  | Multiple content hosts             | Browsing regular Web sites; steganography | None                     |
| Covert channels     | Collage   | Erasure coding; many content hosts | Browsing content hosts; info hiding       | None                     |

“A Web based Covert File System”, HotOS ’07
• Stores files inside of Flickr photos
• Intended for personal file storage
• Collage is more robust
  – Erasure coding
  – Automatically spread data among content hosts
• Could be implemented using Collage
• Implementation not publicly available

Where Could We Store Content?
• Photo sharing sites (Wikipedia lists 71)
• Video sharing sites (Wikipedia lists 68)
• Web forums, blogs
  – Plugins for popular forum and blogging software

What If Senders and Receivers Disagree on Task Lists?
• Consistent hashing ensures some agreement
• Depends on number of tasks per identifier:
  – If an identifier is mapped to 2 tasks, can still communicate with 25% disagreement
  – If an identifier is mapped to 4 tasks, can still communicate with 50% disagreement

Agreeing on Message Identifiers
• Goal: Agree without communicating
• For news reader application:
  – Public knowledge, hopefully censor doesn’t block
• Another option:
  – Exchange keys with communicators beforehand
  – Identifier is, e.g., ciphertext of current date

What Affects Deniability?
• Task construction:
  – The content host
  – Popularity
  – The sequence of Web requests made
  – Injection of think times
  – Injection of garbage requests
  – Alternative routes to the same vectors
• Frequency of task execution
• Are you uploading suspicious vectors?
  – E.g., tagging pictures with bogus terms

Censorship Is an Arms Race
• Popular systems will eventually be blocked
• We must continually create systems that are harder to compromise
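
Sketch: Task Mapping in Code
The three task-mapping steps (hash the identifier, hash the tasks, map the identifier to the closest tasks) can be illustrated with a small consistent-hashing sketch. This is not the Collage implementation. It assumes SHA-1 as the hash, reads "closest" as the next tasks clockwise on a hash ring, and uses hypothetical names (ring_position, map_identifier_to_tasks). Apart from the two tasks quoted from the slides, the task strings are made up for illustration.

```python
import hashlib
from bisect import bisect_left
from typing import List, Sequence


def ring_position(text: str) -> int:
    """Hash a string to a point on a 160-bit ring (SHA-1 is an assumed choice)."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def map_identifier_to_tasks(identifier: str, tasks: Sequence[str], count: int) -> List[str]:
    """Return the `count` tasks that follow the identifier's hash on the ring.

    "Closest" is interpreted here as the clockwise successors, which is
    an assumption; the slides do not define the distance measure.
    """
    ring = sorted((ring_position(task), task) for task in set(tasks))
    if not ring or count <= 0:
        return []
    # First ring entry at or after the identifier's position.
    start = bisect_left(ring, (ring_position(identifier), ""))
    count = min(count, len(ring))
    # Walk clockwise, wrapping around the end of the ring.
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


if __name__ == "__main__":
    sender_tasks = [
        "Search for blue flowers on Flickr",        # from the slides
        "Look at JohnDoe's videos on YouTube",      # from the slides
        "Search for red cars on Flickr",            # hypothetical
        "Search for sunsets on Flickr",             # hypothetical
        "Look at JaneRoe's videos on YouTube",      # hypothetical
    ]
    # The receiver's task list is slightly out of date: one extra task.
    receiver_tasks = sender_tasks + ["Search for mountains on Flickr"]

    identifier = "http://nytimes.com"
    sent_to = map_identifier_to_tasks(identifier, sender_tasks, 2)
    looked_at = map_identifier_to_tasks(identifier, receiver_tasks, 2)
    print("Sender publishes for:", sent_to)
    print("Receiver performs:  ", looked_at)
    print("Tasks in common:    ", sorted(set(sent_to) & set(looked_at)))
```

Because only ring neighbors change when a task is added or removed, a single differing task can displace at most one of the selected tasks under these assumptions. That is the sense in which consistent hashing preserves some agreement when task lists differ.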