Privacy, Security and Data Analysis (26:711:685)

Dr. Jaideep Vaidya (jsvaidya@business.rutgers.edu, x1441)
New technology has increasingly enabled corporations and governments to collect and
use huge amounts of data related to individuals. At the same time, legitimate uses in
healthcare, crime prevention and counter-terrorism demand that collected information be
shared by more people than most of us ever know. Today, the challenge is enabling the
legitimate use of the collected data without violating privacy. From the organizational
perspective, enabling safe and secure use of owned data can add great value and improve
return on investment. In this course, we will analyze the legal and social
aspects of privacy and explore potential tools, techniques and technologies that can
enhance privacy.
You will learn the basic issues underlying privacy in computing today. We will
consider the core issues surrounding privacy, security, data storage and analysis and the
technologies that have been developed to address those issues. The plan is to understand
the theoretical concept of secure computation, using data mining to give an
application-oriented view. We will look at the important regulations in force today,
including HIPAA, Sarbanes-Oxley and EU Directive 95/46/EC, and consider what constitutes
compliance. We will see the benefits of information sharing, including managerial impacts,
and how to enable it in a secure manner. At the end of the course, a student should know
the state of the art in privacy-preserving computation and be able to apply that knowledge
to new research.
Prerequisites:
Undergraduate level knowledge in basic statistics and databases is needed.
Grading:
• Paper reviews: 10%
• Paper Presentations: 20%
• Class Project: 30%
• Final Exam: 30%
• Class Participation: 10%
Paper reviews: You need to write a review of the papers covered in class. Your review
should consist of at least three paragraphs: one paragraph summarizing the paper, one
paragraph describing (at least) three good ideas mentioned in the paper, and a final
paragraph critiquing the paper or describing three things that could be improved in it.
Paper Presentations: You need to prepare a 50-minute PowerPoint presentation for the
paper that is assigned to you. You are welcome to discuss the details of the paper and
the presentation with me.
Class Project: Your project should try to identify a new secure computation problem and
propose a novel solution or implement a solution for it. I fully expect to see publishable
work proceeding from this. Projects can be done individually or as a group of two
students. Your project should proceed in two phases:
1. Around the middle of the term, you should schedule an appointment to discuss
the project topic with me.
2. At the end of the class, you should give a presentation related to the outcome of
your project.
Class Participation: You should exchange ideas and engage in class discussion of the
papers that you are not presenting.
Final Exam: The final exam will cover all of the material taught in the semester. It will be
an open book exam and will test your understanding of the concepts learned throughout
the semester.
Course Topics: (tentative)
Part I: Understanding Privacy
Social Aspects of Privacy: End of Privacy?
Legal Aspects of Privacy
Privacy Regulations
Effect of Database and Data Mining technologies on privacy
Privacy challenges raised by emerging technologies such as RFID and biometrics
Part II: Using technology for preserving privacy.
Statistical Database security
Inference Control
Secure Multi-party computation and Cryptography
Privacy-preserving Data mining
Hippocratic databases
Course Outline:
Week 1
We will go over the course topics and discuss the grading issues. We will also discuss
background material for the course. You should read two articles from the Economist:
• End of Privacy? (The Economist, April 29th 1999)
• The surveillance society (The Economist, April 29th 1999)
We will also discuss Chapter 1 of the book “Database Nation” by Simson
Garfinkel (available as an e-book through the library).
Week 2
We will discuss the legal aspects of privacy. You should write a one-page review of the
HIPAA laws. We will discuss Sarbanes-Oxley in the context of security and methods of
compliance.
• Summary of HIPAA Laws
• Privacy and Human Rights 2003 Survey, Overview part
We will also discuss Chapters 2 and 3 of the book “Database Nation” by Simson
Garfinkel (available as an e-book through the library).
Week 3
Basic Cryptographic Background – Symmetric and Asymmetric Encryption, Oblivious
Transfer. We will discuss the privacy issues related to DNA databases.
• Medicine's New Central Bankers (The Economist, Dec 8th 2005)
• Paper 1
Week 4
We will discuss privacy issues due to surveillance devices.
• Privacy International's Statement on Surveillance Cameras
• Paper 2
Week 5
We will discuss privacy issues related to RFID and biometrics.
• Paper 3
• Paper 4
Week 6
We will discuss a technique called k-anonymity.
• Paper 5
• Paper 6
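As orientation for this week's topic, here is a minimal sketch of a k-anonymity check. It is not taken from Papers 5 or 6. It assumes the usual informal definition: a table is k-anonymous with respect to a set of quasi-identifier columns if every combination of quasi-identifier values that appears in the table is shared by at least k records. The function name `is_k_anonymous`, the column names and the toy records are all hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times.

    records: list of dicts (hypothetical toy representation of a table)
    quasi_identifiers: column names treated as quasi-identifiers (an assumption)
    """
    # Count how many records share each combination of quasi-identifier values.
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


# Hypothetical, already-generalized toy data (ZIP prefixes and age ranges).
records = [
    {"zip": "071**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "071**", "age": "20-29", "diagnosis": "asthma"},
    {"zip": "088**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "088**", "age": "30-39", "diagnosis": "diabetes"},
]

print(is_k_anonymous(records, ["zip", "age"], 2))
print(is_k_anonymous(records, ["zip", "age"], 3))
```

The check only verifies the property. Finding a good generalization that achieves it is the harder problem and is what the assigned papers address.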
Week 7
Overview of statistical methods for privacy and the inference problem.
• Paper 7
• Paper 8
• Paper 9
Week 8
Overview of Secure Multi-party Computation Techniques: we will look at the survey
“Varieties of Secure Distributed Computing” (Paper 10).
• Paper 11
• Paper 12
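To give a flavor of secure multi-party computation before the readings, the following is a minimal sketch of a secure sum based on additive secret sharing. It is an illustration only, not a protocol taken from the assigned papers. It simulates all parties inside one process and assumes honest-but-curious participants. The names `make_shares`, `secure_sum` and `MODULUS` are hypothetical.

```python
import secrets

# Hypothetical public modulus; assumed to be larger than any possible true sum.
MODULUS = 2**61 - 1


def make_shares(value, n_parties, modulus=MODULUS):
    """Split value into n_parties random shares that add up to value mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares sum to the value.
    shares.append((value - sum(shares)) % modulus)
    return shares


def secure_sum(private_inputs, modulus=MODULUS):
    """Simulate a secure sum: no single share reveals a party's input."""
    n = len(private_inputs)
    # Party i splits its input and would send share j to party j.
    all_shares = [make_shares(x, n, modulus) for x in private_inputs]
    # Party j adds the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    # Combining the published partial sums yields the total.
    return sum(partial_sums) % modulus


print(secure_sum([12, 30, 7]))
```

Each individual share is uniformly random, so a party sees nothing about another party's input beyond what the final total implies.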
Week 9
Privacy-preserving Data Mining (Cryptographic Approach)
• Paper 13
• Paper 14
• Paper 15
Week 10
Privacy-preserving Data Mining (Randomization Approach)
• Paper 16
• Paper 17
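As a simple illustration of the randomization idea, the sketch below uses classical randomized response: each respondent reports the true yes/no answer with probability p and the opposite answer otherwise, and the analyst corrects for the known noise to estimate the overall proportion. This is a swapped-in textbook technique, not the specific algorithms of Papers 16 and 17. The function names, the value p = 0.75 and the simulated population are assumptions made for the example.

```python
import random


def randomize(truth, p, rng):
    """Report the true bit with probability p, otherwise report its negation."""
    return truth if rng.random() < p else not truth


def estimate_proportion(reports, p):
    """Estimate the true fraction of 'yes' answers from randomized reports.

    Uses E[observed] = p*pi + (1 - p)*(1 - pi), solved for pi; requires p != 0.5.
    """
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)


# Hypothetical population: 30% true 'yes' answers.
rng = random.Random(0)
true_answers = [True] * 3000 + [False] * 7000
reports = [randomize(t, 0.75, rng) for t in true_answers]

# The estimate should be close to 0.3 even though individual reports are noisy.
print(estimate_proportion(reports, 0.75))
```

No individual report can be trusted to reflect that person's true answer, yet the aggregate statistic remains recoverable, which is the trade-off the randomization approach relies on.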
Week 11
Additional secure computation approaches & Hippocratic databases
• Paper 18
• Paper 19
Week 12
Privacy-Preserving Transportation Logistics and Supply Chain Management
• Paper 20
• Paper 21
Week 13
Financial Cryptography
• Paper 22
• Paper 23
Week 14
Project Presentations
Exam
Reading List:
1. How (not) to protect genomic data privacy in a distributed network: using trail
re-identification to evaluate and design anonymity protection systems, Bradley
Malin and Latanya Sweeney, Journal of Biomedical Informatics, 37(3), 2004,
pp. 179-192 [Earlier version available as Technical Report CMU-ISRI-04-115]
2. Preserving privacy by de-identifying facial images, Elaine Newton, Latanya
Sweeney, and Bradley Malin, IEEE Transactions on Knowledge and Data
Engineering. 2005; 17(2): 232-243 [Extended earlier version available as
Technical Report CMU-CS-03-119]
3. Radio-Frequency Identification: Security Risks and Challenges, by Sanjay E.
Sarma, Stephen A. Weis, and Daniel W. Engels, in RSA Laboratories
CryptoBytes, Volume 6, No. 1, Spring 2003, pages 2-9.
4. Security and Privacy Issues in E-passports, Ari Juels, David Molnar, and David
Wagner, SecureComm 2005.
5. k-anonymity: a model for protecting privacy, Latanya Sweeney, International
Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10 (5), 2002,
pp. 557-570.
6. Data Privacy through Optimal k-Anonymization, Roberto J. Bayardo Jr., Rakesh
Agrawal, ICDE 2005, pp. 217-228.
7. Security-Control Methods for Statistical Databases: A Comparative Study, Nabil
R. Adam, John C. Wortmann, ACM Comput. Surv. 21(4), 1989, pp. 515-556.
8. Data mining, national security, privacy and civil liberties, Bhavani
Thuraisingham, ACM SIGKDD Explorations 4(2), 2002, pp. 1-5.
9. The inference problem: a survey, Csilla Farkas and Sushil Jajodia, SIGKDD
Explorations, 4(2), 2002, pp. 6-11.
10. Varieties of secure distributed computing, Matt Franklin and Moti Yung, Proc.
Sequences II, Methods in Communications, Security and Computer Science,
1991, pp. 392-417.
11. Randomization in privacy preserving data mining, Alexandre Evfimievski,
SIGKDD Explorations 4(2), 2002, pp. 43-48.
12. Cryptographic techniques for privacy-preserving data mining, Benny Pinkas,
SIGKDD Explorations 4(2), 2002, pp. 12-19.
13. Secure set intersection cardinality with application to association rule mining,
Jaideep Vaidya and Chris Clifton, Journal of Computer Security, 13(4), 2005 pp.
593-622.
14. Privacy-preserving Data Mining, Yehuda Lindell and Benny Pinkas, Crypto 2000.
15. Privacy-Preserving Set Operations, Lea Kissner and Dawn Song, Advances in
Cryptology, 2005.
16. Privacy-preserving Data Mining, Rakesh Agrawal and Ramakrishnan Srikant,
ACM SIGMOD International Conference on Management of Data, 2000, pp. 439-450.
17. On the Design and Quantification of Privacy Preserving Data Mining Algorithms,
Dakshi Agrawal and Charu C. Aggarwal, 20th Symposium on Principles of
Database Systems, 2001 (PODS ’01).
18. Secure Regression on Distributed Databases, Alan Karr, Xiaodong Lin, Ashish
Sanil, Jerome Reiter, Journal of Computational and Graphical Statistics, 14,
pp. 263-279.
19. Hippocratic databases, Rakesh Agrawal, Jerry Kiernan, Ramakrishnan Srikant
and Yirong Xu, 28th Int'l Conference on Very Large Databases, 2002 (VLDB
’02).
20. An Approach to Identifying Beneficial Collaboration Securely in Decentralized
Logistics Systems, Chris Clifton, Ananth Iyer, Richard Cho, Wei Jiang, Murat
Kantarcioglu, and Jaideep Vaidya, Manufacturing & Service Operations
Management, 2007.
21. Secure Supply-Chain Protocols, M. J. Atallah, H. G. Elmongui, V. Deshpande,
and L. Schwarz, IEEE International Conference on Electronic Commerce,
2003, pp. 293-302.
22. Risk Assurance for Hedge Funds using Zero-Knowledge Proofs, Michael Szydlo,
Financial Cryptography, 2005, pp. 156-171.
23. Probabilistic Escrow of Financial Transactions with Cumulative Threshold
Disclosure, Stanislaw Jarecki, Vitaly Shmatikov, Financial Cryptography 2005,
pp. 172-187.