Privacy, Security and Data Analysis (26:711:685)
Dr. Jaideep Vaidya (jsvaidya@business.rutgers.edu, x1441)

New technology has increasingly enabled corporations and governments to collect and use huge amounts of data about individuals. At the same time, legitimate uses in healthcare, crime prevention and counterterrorism demand that collected information be shared by more people than most of us ever know. Today, the challenge is to enable the legitimate use of the collected data without violating privacy. From the organizational perspective, enabling safe and secure use of owned data can add significant value and improve return on investment.

In this course, we will analyze the legal and social aspects of privacy and explore tools, techniques and technologies that can enhance privacy. You will learn the basic issues underlying privacy in computing today. We will consider the core issues surrounding privacy, security, data storage and analysis, and the technologies that have been developed to address those issues. The plan is to understand the theoretical concept of secure computation, using data mining to give an application-oriented view. We will look at the important regulations in force today, including HIPAA, Sarbanes-Oxley and EU Directive 95/46/EC, and consider what constitutes compliance. We will see the benefits of information sharing, including managerial impacts, and how to enable it in a secure manner. At the end of the course, a student should know the state of the art in privacy-preserving computation and be able to apply his/her knowledge to new research.

Prerequisites: Undergraduate-level knowledge of basic statistics and databases is needed.

Grading:
Paper Reviews: 10%
Paper Presentations: 20%
Class Project: 30%
Final Exam: 30%
Class Participation: 10%

Paper Reviews: You need to write a review for papers covered in the class.
Your review should consist of at least three paragraphs: one summarizing the paper, one describing at least three good ideas in the paper, and a final one critiquing the paper or describing three things that could be improved.

Paper Presentations: You need to prepare a 50-minute PowerPoint presentation on the paper assigned to you. You are welcome to discuss the details of the paper and the presentation with me.

Class Project: Your project should identify a new secure computation problem and either propose a novel solution or implement a solution for it. I fully expect to see publishable work result from this. Projects can be done individually or in groups of two students. Your project should proceed in two phases:
1. Around the middle of the term, you should schedule an appointment to discuss the project topic with me.
2. At the end of the class, you should give a presentation on the outcome of your project.

Class Participation: You should exchange ideas and engage in class discussion of the papers that you are not presenting.

Final Exam: The final exam will cover all of the material taught in the semester. It will be an open-book exam and will test your understanding of the concepts learned throughout the semester.

Course Topics (tentative):
Part I: Understanding Privacy
• Social Aspects of Privacy: End of Privacy?
• Legal Aspects of Privacy
• Privacy Regulations
• Effect of Database and Data Mining Technologies on Privacy
• Privacy Challenges Raised by Emerging Technologies such as RFID and Biometrics
Part II: Using Technology for Preserving Privacy
• Statistical Database Security
• Inference Control
• Secure Multi-party Computation and Cryptography
• Privacy-preserving Data Mining
• Hippocratic Databases

Course Outline:

Week 1
We will go over the course topics and discuss the grading issues. We will also discuss background material for the course. You should read two articles from The Economist:
• End of Privacy?
(The Economist, April 29th 1999)
• The surveillance society (The Economist, April 29th 1999)
We will also discuss Chapter 1 of the book “Database Nation” by Simson Garfinkel (available as an e-book through the library).

Week 2
We will discuss the legal aspects of privacy. You should write a one-page review of the HIPAA laws. We will discuss Sarbanes-Oxley in the context of security and methods of compliance.
• Summary of HIPAA Laws
• Privacy and Human Rights 2003 survey, Overview part
We will also discuss Chapters 2 and 3 of the book “Database Nation” by Simson Garfinkel (available as an e-book through the library).

Week 3
Basic Cryptographic Background: Symmetric and Asymmetric Encryption, Oblivious Transfer. We will discuss the privacy issues related to DNA databases.
• Medicine's New Central Bankers (The Economist, Dec 8th 2005)
• Paper 1

Week 4
We will discuss privacy issues due to surveillance devices.
• Privacy International's Statement on Surveillance Cameras
• Paper 2

Week 5
We will discuss privacy issues related to RFID and biometrics.
• Paper 3
• Paper 4

Week 6
We will discuss a technique called k-anonymity.
• Paper 5
• Paper 6

Week 7
Overview of statistical methods for privacy and the inference problem.
• Paper 7
• Paper 8
• Paper 9

Week 8
Overview of Secure Multi-party Computation Techniques. We will look at the survey “Varieties of secure distributed computing” (Paper 10).
• Paper 11
• Paper 12

Week 9
Privacy-preserving Data Mining (Cryptographic Approach)
• Paper 13
• Paper 14
• Paper 15

Week 10
Privacy-preserving Data Mining (Randomization Approach)
• Paper 16
• Paper 17

Week 11
Additional Secure Computation Approaches and Hippocratic Databases
• Paper 18
• Paper 19

Week 12
Privacy-Preserving Transportation Logistics and Supply Chain Management
• Paper 20
• Paper 21

Week 13
Financial Cryptography
• Paper 22
• Paper 23

Week 14
Project Presentations
Exam

Reading List:
1. How (not) to protect genomic data privacy in a distributed network: using trail reidentification to evaluate and design anonymity protection systems, Bradley Malin and Latanya Sweeney, Journal of Biomedical Informatics, 37(3), 2004, pp. 179-192. [Earlier version available as Technical Report CMU-ISRI-04-115]
2. Preserving privacy by de-identifying facial images, Elaine Newton, Latanya Sweeney, and Bradley Malin, IEEE Transactions on Knowledge and Data Engineering, 17(2), 2005, pp. 232-243. [Extended earlier version available as Technical Report CMU-CS-03-119]
3. Radio-Frequency Identification: Security Risks and Challenges, Sanjay E. Sarma, Stephen A. Weis, and Daniel W. Engels, RSA Laboratories CryptoBytes, 6(1), Spring 2003, pp. 2-9.
4. Security and Privacy Issues in E-passports, Ari Juels, David Molnar, and David Wagner, SecureComm 2005.
5. k-anonymity: a model for protecting privacy, Latanya Sweeney, International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10(5), 2002, pp. 557-570.
6. Data Privacy through Optimal k-Anonymization, Roberto J. Bayardo Jr. and Rakesh Agrawal, ICDE 2005, pp. 217-228.
7. Security-Control Methods for Statistical Databases: A Comparative Study, Nabil R. Adam and John C. Wortmann, ACM Computing Surveys, 21(4), 1989, pp. 515-556.
8. Data mining, national security, privacy and civil liberties, Bhavani Thuraisingham, SIGKDD Explorations, 4(2), 2002, pp. 1-5.
9. The inference problem: a survey, Csilla Farkas and Sushil Jajodia, SIGKDD Explorations, 4(2), 2002, pp. 6-11.
10. Varieties of secure distributed computing, Matt Franklin and Moti Yung, Proc. Sequences II, Methods in Communications, Security and Computer Science, 1991, pp. 392-417.
11. Randomization in privacy preserving data mining, Alexandre Evfimievski, SIGKDD Explorations, 4(2), 2002, pp. 43-48.
12. Cryptographic techniques for privacy-preserving data mining, Benny Pinkas, SIGKDD Explorations, 4(2), 2002, pp. 12-19.
13. Secure set intersection cardinality with application to association rule mining, Jaideep Vaidya and Chris Clifton, Journal of Computer Security, 13(4), 2005, pp. 593-622.
14. Privacy-preserving Data Mining, Yehuda Lindell and Benny Pinkas, Crypto 2000.
15. Privacy-Preserving Set Operations, Lea Kissner and Dawn Song, Advances in Cryptology, 2005.
16. Privacy-preserving Data Mining, Rakesh Agrawal and Ramakrishnan Srikant, ACM SIGMOD International Conference on Management of Data, 2000, pp. 439-450.
17. On the Design and Quantification of Privacy Preserving Data Mining Algorithms, Dakshi Agrawal and Charu C. Aggarwal, 20th Symposium on Principles of Database Systems, 2001 (PODS ’01).
18. Secure Regression on Distributed Databases, Alan Karr, Xiaodong Lin, Ashish Sanil, and Jerome Reiter, Journal of Computational and Graphical Statistics, 14, pp. 263-279.
19. Hippocratic databases, Rakesh Agrawal, Jerry Kiernan, Ramakrishnan Srikant, and Yirong Xu, 28th Int'l Conference on Very Large Databases, 2002 (VLDB ’02).
20. An Approach to Identifying Beneficial Collaboration Securely in Decentralized Logistics Systems, Chris Clifton, Ananth Iyer, Richard Cho, Wei Jiang, Murat Kantarcioglu, and Jaideep Vaidya, Manufacturing & Service Operations Management, 2007.
21. Secure Supply-Chain Protocols, M. J. Atallah, H. G. Elmongui, V. Deshpande, and L. Schwarz, IEEE International Conference on Electronic Commerce, 2003, pp. 293-302.
22. Risk Assurance for Hedge Funds using Zero-Knowledge Proofs, Michael Szydlo, Financial Cryptography, 2005, pp. 156-171.
23. Probabilistic Escrow of Financial Transactions with Cumulative Threshold Disclosure, Stanislaw Jarecki and Vitaly Shmatikov, Financial Cryptography, 2005, pp. 172-187.