Privacy-Aware Computing
Introduction

Outline
- Brief introduction
- Motivating applications
- Major research issues
- Tentative schedule
- Reading assignments
- Project
- Grading

Parties concerned with privacy
- Individual privacy
  - Customer data
  - Public data: census data, voting records
  - Health records
  - Locations
  - Online activities
  - …
- Organization privacy
  - Owning collections of personal data
  - Business secrets
  - Legal issues prevent data sharing
  - …

Cases of privacy-aware computing
- Public use of private data
  - Data mining enables knowledge discovery on large populations, but people are reluctant to release personal information because of privacy concerns.
  - The Centers for Disease Control want to identify disease outbreaks by pooling multiple datasets that contain patient information.
  - Insurance companies have data on disease incidents, patient background, etc. Personal medical records help them maximize profits, but customers will not be happy with that.

More examples
- Industry collaborations / trade groups
  - An industry trade group may want to identify best practices to help members, but some practices are trade secrets.
  - How do we provide "commodity" results to all (manufacturing using chemical supplies from supplier X has high failure rates) while still preserving secrets (manufacturing process Y gives low failure rates)?
- Multinational corporations
  - Multinational corporations may want to pool data from different countries for analysis, but national laws may prevent transborder data sharing.

More examples
- Web search
  - Search engine companies keep cookies and search histories, which can be used to derive personal information (e.g., the AOL dataset).
- Social networking
  - When you use social networks, you leave a trace of personal data and interactions.
  - Companies can use the data for ad targeting; there is a risk of privacy breach and personal data abuse.

More examples
- Mobile computing
  - When you allow Google Latitude to trace your locations, you lose location privacy.
  - Locations can reveal lifestyle, clinic visits, political tendency, domestic violence.
- Cloud computing
  - Users have to outsource data to the cloud.
  - Data can be sensitive (personal information, customer records, patient info…).

Major research areas
- Micro data publishing
  - Anonymize data for statistical analysis and modeling
  - Privacy-preserving data mining
- Data outsourcing
  - Cloud computing
  - Outsource data to untrusted parties in order to use data-intensive services
- Databases
  - Statistical databases
  - Private information retrieval

Major areas
- Social networks
  - Personal bio data, preferences, friends, interactions
  - How to design mechanisms that let users conveniently control private data
- Mobile computing
  - Location privacy
- Collaborative computing
  - Collaborative data mining: share the model but not individual records

Major technical challenges
- Techniques
  - Data perturbation
    - Change data values while preserving global information
  - Data anonymization
    - Make sure at least k records have the same "virtual identifiers" (quasi-identifiers), while preserving information
  - Cryptographic techniques
    - Secure multiparty computation
    - Private information retrieval
    - Crypto-protocols for privacy-preserving data mining
- Privacy evaluation
  - Tradeoff between privacy and data utility

Differences between security and privacy
- Privacy: decisions on what personal information is released and who can access it.
- Security makes sure these decisions are respected.
- Security is often a necessary method to implement privacy.

National security and privacy
- They are conflicting…
- Enhancing national security
  - Surveillance devices are everywhere
  - USA PATRIOT Act (2001): … the Act dramatically reduced restrictions on law enforcement agencies' ability to search telephone, e-mail communications, medical, financial, and other records …
- Big Brother is watching you: individuals have to sacrifice privacy

Tentative schedule
- Data perturbation
- Data anonymization
- Privacy metrics and differential privacy
- Privacy-preserving data mining
- Private information retrieval
- Secure data outsourcing
- Privacy in online social networks
- Other privacy issues

Reading assignments
- One selected paper from the reading list for most weeks (~10)
- Submit a reading summary before Monday noon

How to write a reading summary?
- Five parts:
  - Title
  - Research problems
  - Major contributions
  - Strengths
  - Weaknesses or missing points
- Length: a few paragraphs to one page

Paper presentation
- Choose one paper from the reading list or from recent major conferences
- Finish in 15 minutes
- Maximum two students per class
- Signup sheet
  - When: office hours, 3-4:30pm MW, first two weeks
  - Make sure you pick a slot ASAP

Course project
- 1-2 persons per team
- Types
  - Experimental study of existing techniques (from the paper list)
  - Propose new algorithms
  - Apply the learned techniques to some applications
  - Your research
- Note
  - You are encouraged to propose your own project
  - The goal is to help you better understand problems and techniques and get some hands-on experience

Project schedule
- Proposal
  - About 2 pages
  - Problem description and what you plan to do
  - By the end of January
- Final deliverables
  - Report
  - Code

Class discussion
- You are encouraged to ask questions or present different opinions in class
- Many of the topics are active research topics
- You have chances to generate publishable ideas

Grading
- Reading summaries: 35%
- Paper presentation: 10%
- Project proposal: 10%
- Project final report: 15%
- Code: 10%
- Final exam: 20%

Communication
- Announcements by email
- Other issues: keke.chen@wright.edu
- Office: Joshi 385
- Office hours: 3-4:30pm MW or by appointment
- Slides will be posted on www.cs.wright.edu/keke.chen/privacy/
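
Illustration: perturbation and anonymization in a few lines

The two techniques listed under "Major technical challenges" (data perturbation and data anonymization) can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the method any particular paper on the reading list uses: the function names `perturb` and `is_k_anonymous`, the choice of additive Gaussian noise, and the toy records are all hypothetical. The first part shows that noisy values can still yield nearly the same mean ("global information"); the second checks whether every combination of quasi-identifier values is shared by at least k records.

```python
import random
from collections import Counter


def perturb(values, sigma, seed=0):
    """Additive-noise perturbation (one common variant, assumed here):
    add zero-mean Gaussian noise with standard deviation sigma to each value."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in values]


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


# Hypothetical toy data: 10,000 ages cycling through 20..69.
ages = [20 + (i % 50) for i in range(10_000)]
noisy_ages = perturb(ages, sigma=5.0)

# Individual values change, but the mean is roughly preserved.
true_mean = sum(ages) / len(ages)
noisy_mean = sum(noisy_ages) / len(noisy_ages)
mean_shift = abs(true_mean - noisy_mean)

# Hypothetical generalized records; "zip" and "age" act as quasi-identifiers.
records = [
    {"zip": "454**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "454**", "age": "20-29", "diagnosis": "asthma"},
    {"zip": "453**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "453**", "age": "30-39", "diagnosis": "diabetes"},
]
two_anonymous = is_k_anonymous(records, ["zip", "age"], k=2)
three_anonymous = is_k_anonymous(records, ["zip", "age"], k=3)

print("mean shift after perturbation:", mean_shift)
print("2-anonymous:", two_anonymous, "| 3-anonymous:", three_anonymous)
```

A larger `sigma` hides individual values better but makes statistics computed from the noisy data less accurate, which is the privacy/utility tradeoff named under "Privacy evaluation".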