PRIVACY USC CSCI499 Dr. Genevieve Bartlett USC/ISI Privacy The state or condition of being free from observation. Privacy The state or condition of being free from observation. Not really possible today…at least not on the internet. Privacy The right of people to choose freely under what circumstances and to what extent they will reveal themselves, their attitude, and their behavior to others. Privacy is not black and white Lots of grey areas and points for discussion What seems private to you may not seem private to me Three examples to start us off: HTTP Cookies Google Street View Facebook HTTP cookies: What are they? Cookies = small text file Received from a server, stored on your machine Usually web Purpose: HTTP is stateless, so cookies maintain state for the HTTP protocol Eg keeping the contents of your “shopping cart” while you browse a site HTTP cookies: rd 3 party cookies You visited your favorite site unicornsareawesome.com unicornsareawesome.com pulls ads from lameads.com You get a cookie from lameads.com, even though you never visited lameads.com lameads.com can track your browsing habits every time you visit any page with ads from lameads.com… those might be a lot of pages HTTP cookies: Grey Area? 3rd party cookies allow ad servers to personalize your ads = more useful to you. Good! But You choose to go to unicornsareawesome.com = ok with unicornsareawesome.com knowing about how you use their site Nowhere did you choose to let lameads.com monitor your browsing habits Short Discussion: Collusion: tool to track these 3rd party cookies TED talk on “Tracking the Trackers” http://www.ted.com/talks/gary_kovacs_tracking_the_t rackers.html Google Street View: What is it? Google cars drive around and take 360° panoramic pictures. Images are stitched together and can be browsed through on the Internet Google Street View: Me Google Street View: Lots to See Google Street View: Grey Area Expectation of privacy? I’m in public, I can expect people will see me Expectations? Picture linked to location Searchable Widely available Available for a long time to come Facebook: What is it? Social networking site Connect with friends Share pictures, interests (“likes”) Facebook: Grey Area Who uses Facebook data and how is data used? 4.7 million liked a page about health conditions or treatments. Insurance agents? 4.8 million shared information about dates of vacations. Burglars? 2.6 million discussed recreational use of alcohol. Employers? Facebook: More Grey Security issues with Facebook Confusion over privacy settings Sudden changes in default privacy settings Facebook tracks browsing habits, even if a user isn’t logged in (third-party cookies) Facebook sells user information to ad agencies and behavioral trackers Why start with these examples? 3 examples: HTTP cookies, Google Street View, Facebook Lots more “every day” examples Users gain benefits by sharing data Tons of data generated, widely shared and accessible and stored (for how long?) Are users really aware of how and who? Today’s Agenda Privacy and Privacy & Security How do we “safely” share private data? Privacy and Inferred Information Privacy and Social Networks How do we design a system with privacy in mind? Privacy and Privacy & Security How do we “safely” share private data? Privacy and Inferred Information Privacy and Social Networks How do we design a system with privacy in mind? Examples private information Tons of information can be gained from Internet use: Behavior Preferences Eg. Person X and Person Y are friends. PPI (private, personal/protected information) Eg. Person Y likes high heel shoes and uses Apple products. Associations Eg. Person X reads reddit.com at work. credit card #s, SSN, nick names, addresses PII (personally identifying information) Eg. Your age + your address = I know who you are, even if I’m not given your name. How do we achieve privacy? policy + security mechanisms + law + ethics + trust Anonymity & Anonymization mechanisms Make each user indistinguishable from the next Remove PPI & PII Aggregate information Who wants private info? Governments – surveillance Businesses – targeted advertising, following trends Attackers – monetize information or cause havoc Researchers – medical, behavioral, social, computer Who has private info? You and me End-users Customers Patients Businesses Protect mergers, product plans, investigations Government & law enforcement National security Criminal investigations Privacy and Security Security enables privacy Data is only as safe as the system its on Sometimes security at odds with privacy Eg. Security requires authentication, but privacy is achieved through anonymity Eg. TSA pat down at the airport Privacy and Privacy & Security How do we “safely” share private data? Privacy and Inferred Information Privacy and Social Networks How do we design a system with privacy in mind? Why do we want to share? Share existing data sets: Research Companies Buy data from each other Check out each other’s assets before merges/buyouts Start a new dataset: Mutually Share beneficial relationships data with me and you can use this service Sharing everything? Easy, but what are the ramifications? Legal/policy may limit what can be shared/collected IRBs: Institutional Review Board HITECH & HIPAA: Health Insurance Portability and Accountability Act Future use and protection of data? Mechanisms for limited sharing Remove really sensitive stuff (sanitization) PPI & PII (private, personal & private identifying) Without a crystal ball, this is hard Anonymization Replace information to limit ability to tie entities to meaningful identities Aggregation Remove PII by only collecting/releasing statistics Anonymization Example Network trace: PAYLOAD Anonymization Example Network trace: PAYLOAD All sorts of PII and PPI in there! Anonymization Example Network trace: PAYLOAD Routing information: IP addresses, TCP flags/options, OS fingerprinting Anonymization Example Network trace: PAYLOAD Remove IPs? Anonymize IPs? Anonymization Example Network trace: PAYLOAD Removing IPs severely limits what you can do with the data. Replace with something identifying, but not the same data. IP1 = A IP2 = B Etc. Aggregation Example “Fewer U.S. Households Have Debt, But Those Who Do Have More, Census Bureau Reports” Methods can be bad or good Just because someone uses aggregation or anonymization, doesn’t mean the data is safe Example: Release aggregate stats of people’s favorite color? Privacy and Privacy & Security How do we “safely” share private data? Privacy and Inferred Information Privacy and Social Networks How do we design a system with privacy in mind? What is Inferred? Take 2 sources of information, correlate data X + Y = …. Example: Google Street View + what my car looks like + where I live = you know where I was back in November Another example Paula Broadwell who had an affair with CIA director David Petraeus, similarly took extensive precautions to hide her identity. She never logged in to her anonymous e-mail service from her home network. Instead, she used hotel and other public networks when she e-mailed him. The FBI correlated hotel registration data from several different hotels -- and hers was the common name. Another example: Netflix & IMDB Netflix prize: released an anonymized dataset Correlated with IMDB: undid anonymization (University of Texas) Privacy and Privacy & Security How do we “safely” share private data? Privacy and Inferred Information Privacy and Social Networks How do we design a system with privacy in mind? What is social networking data? Associations Not what you say, but who you talk to OMG NEW BOYFRIEND Why is social data interesting? From a privacy point of view: Guilt by association Eg. Government very interested Phone records (US) Facebook activity (Iran) Computer Communication Computer communication = social network What sites/servers you visit/use = information on your relationship with those sites/servers You Unicornsareawesome.com Never mind the content…How often you visit and who you visit may reveal a lot! How do we provide privacy? Of course encrypt content (payload)! But: Network/transport layer = no encryption (for now) Anyone along the path can see source and destination… so now what? Onion Routing General idea: bounce connection through a bunch of machines Don’t we bounce around already? Not actually what happens…… Don’t we bounce around already? Closer to what actually happens. Don’t we bounce around already? Yes, we route packets through a series of routers BUT this doesn’t protect the privacy of who’s talking to whom… Why? PAYLOAD Don’t we bounce around already? Yes, we route packets through a series of routers BUT this doesn’t protect the privacy of who’s talking to who… Why? ENCRYPTED Contains routing information. Yes, we bounce… but: Everyone along the way can see src & dst Routes are easy to figure out ENCRYPTED Contains routing information = Can’t encrypt Everyone along the path (routers and observers) can see who is talking to whom Onion routing saves us Each router only knows about the last/next hop Routes are hard to figure out Change frequently Chosen by the source The Onion part of Onion Routing Layers of encryption PAYLOAD Last hop’s key Second hop’s key First hop’s key Onion Routing Example: Tor You Unicornsareawesome.com Onion Routing Example: Tor You Tor Router IPs + public key for each router Tor directory Get a list of Tor Routers from the publically known Tor directory Onion Routing Example: Tor Tor Routers You Unicornsareawesome.com Onion Routing Example: Tor You 1st 2nd 3rd Choose a set of Tor routers to use Unicornsareawesome.com Onion Routing Example: Tor You 1st 2nd 3rd Packets are now encrypted with 3 keys Unicornsareawesome.com Onion Routing Example: Tor Source: YOU, Dest: 1st Tor router You 1st 2nd 3rd Unicornsareawesome.com Onion Routing Example: Tor You 1st Decrypts 1st layer 2nd 3rd Unicornsareawesome.com Onion Routing Example: Tor Source: 1st Tor router, Dest: 2nd Tor router You 1st 2nd 3rd Unicornsareawesome.com Onion Routing Example: Tor You 1st 2nd Decrypts 2nd layer 3rd Unicornsareawesome.com Onion Routing Example: Tor You 1st 2nd Source: 2nd Tor router, Dest: 3rd Tor router 3rd Unicornsareawesome.com Onion Routing Example: Tor You 1st 2nd Decrypts last layer 3rd Unicornsareawesome.com Onion Routing Example: Tor You 1st 2nd Source: 3rd Tor router, Dest: Unicornsareawesome.com 3rd Original (unencrypted) packet sent to server. Unicornsareawesome.com What does our attacker see? You Encrypted traffic from You, to 1st Tor router What does our attacker see? You Other view points? Not easily traceable to you. What does our attacker see? Global view points? Very unlikely... But if so… trouble! What does our attacker see? Also unlikely… can perform correlation between end-to-end. Reliance on multiple users You What would happen here if You were the only one using Tor? Side note: Tor is an overlay Tor routers are often just someone’s regular machine. Traffic is still routed over regular routers too. Onion Routing: Things to Note Not perfect, but pretty nifty End host (unicornsareawesome.com) does not need to know about the Tor protocol (good for wide usage and acceptance) Data is encrypted all the way to the last Tor router If end-to-end application (like HTTPS) is using encryption, the payload is doubly encrypted along the Tor route. Privacy and Privacy & Security How do we “safely” share private data? Privacy and Inferred Information Privacy and Social Networks How do we design a system with privacy in mind? Designing privacy preserving systems Aim for the minimum amount of information needed to achieve goals Think through how info can be gained and inferred Inferred is often a gotcha! x + y = something private, but x and y by themselves don’t seem all that special Think through how information be gained On the wire? Stored in logs? At a router? At an ISP? Privacy and Stored Information Data is only as safe as the system How long is the data stored affects privacy Longer term = bigger privacy risk (in general) Longer time frame, more data to correlate & infer Longer opportunity for data theft Increased chances of mistakes, lapsed security etc. An example of keeping privacy in mind My work: P2P file sharing detection