B12 6005

Disclaimer: This paper partially fulfills a writing requirement for first year (freshman) engineering students at the University of Pittsburgh Swanson School of Engineering. This paper is a student, not a professional, paper. This paper is based on publicly available information and may not provide complete analyses of all relevant data. If this paper is used for any purpose other than these authors’ partial fulfillment of a writing requirement for first year (freshman) engineering students at the University of Pittsburgh Swanson School of Engineering, the user does so at his or her own risk.

DATA MINING: A SECURITY RISK OR A SECURITY ADVANTAGE?

Ben Birkett, beb96@pitt.edu, Sanchez 10:00, Laura Friedland, lnf14@pitt.edu, 6:00

Revised Proposal: Within the past five years, data mining has drawn increasing attention in the United States due to recent scandals and exposés on the topic. As Professor Jason Frand of UCLA puts it, data mining “is the process of analyzing data from different perspectives and summarizing it into useful information” [1]. Because of major advances in technology and the average person’s growing dependence on it, data mining now affects every citizen in the United States, whether he or she is aware of it or not. Within the past ten years, data mining has become an essential tool that government organizations, such as the National Security Agency (NSA) and the Central Intelligence Agency (CIA), use to protect the country and gain intelligence on potential threats.

Data mining encompasses several techniques, both old and new, referred to as neighborhoods, clustering, trees, networks, and rules. Each of these techniques uses algorithms to find connections and trends in people’s everyday computer activity [2]. This allows the government to identify potential threats both within and outside the country. However, this also raises the fear that if the U.S.
government has access to every citizen’s computer activity, it could also have access to personal files and documents, which many could argue violates the rights of citizens [3].

Data mining is a continuous innovation [1]. It is constantly growing and changing as the technology the world uses grows and changes. New techniques and uses are frequently discovered; however, its most useful application is security, despite the controversy it generates.

This paper will discuss the technology and methods behind data mining, how data mining works, and how it helps to improve national security. The ethics and the drawbacks regarding privacy will also be discussed in depth in reference to the NSA scandal that came to light when Edward Snowden revealed to the public the extent of the NSA’s collection of citizens’ private data. Both technical and ethical articles will be used to highlight and discuss the potential, good and bad, and the controversy of data mining.

Data mining methods are expanding rapidly, allowing for the mass collection of information. This mass of information is then used by many government agencies to identify threats, gain intelligence, and obtain a better understanding of enemy networks. However, the ability to collect this information from any computer calls into question whether data mining violates the average citizen’s privacy, and it has created a debate as to whether data mining is ethically justifiable.

University of Pittsburgh Swanson School of Engineering, 01/28/16

REFERENCES

[1] J. Frand. (2010). “Data Mining: What is Data Mining?.” (online article). http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm

[2] A. Berson. (2000). Building Data Mining Applications for CRM (Enterprise). McGraw-Hill Education. (print book).

[3] J. Pappalardo. (2013, Oct.). “NSA Data Mining: How It Works.” Popular Mechanics. (online article). DOI: 00324558

ANNOTATED BIBLIOGRAPHY

A. Berson. (2000).
Building Data Mining Applications for CRM (Enterprise). McGraw-Hill Education. (print book).
From an educational book about data mining, we are using an excerpt that gives an overview of data mining techniques. Within this overview, the book explains various specific methods for data mining and discusses how to use and apply them. This technical source will help us explain how data mining works and will let us delve into specific techniques for data mining.

E. Svoboda. (2009). “Digital Exposure.” Discover. (print article). Vol. 30, Issue 10
This article touches on concerns about data mining and how it can be used against us. It also discusses the way businesses use data mining. The article relates the normal, unsuspecting person to the world of data mining and shows how his or her data can be, and probably is being, obtained and used. In our paper, this article will help relate data mining to people who do not yet know much about it and provide reasons why they should.

G. Tsiafoulis, C. Zorkadis. (2010). “A neural-network clustering-based algorithm for privacy preserving data mining.” Computational Intelligence and Security (CIS). (online article). ISBN: 978-1-4244-9114-8. pp. 401-405
This article, from a conference on computational intelligence, proposes methods for preserving privacy in the use of data mining. Specifically, it proposes providing various levels of anonymity for data, depending on what the data is. This article applies technical aspects of data mining to the ethics surrounding it. It will help us provide a connection between the technical methods of data mining and the issue of privacy related to it.

J. Bamford. (2015). “The Black-and-White Security Question.” Foreign Policy. (print article). pp. 70-75
This article, from a magazine that focuses on American foreign policy, puts forth the idea of using government intelligence, such as that obtained through data mining, to help people by making it public. In presenting this idea, the article discusses the ethical issues relating to the intelligence that the U.S. government collects through data mining. In our paper, this article can provide discussion relating to the ethics of data mining.

S. Nath. (2006, Dec.). “Crime Pattern Detection Using Data Mining.” Web Intelligence and Intelligent Agent Technology. (online article). DOI: 10.1109/WI-IATW.2006.55 http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4053200&tag=1
This article, written for a computer security and intelligence conference, discusses a data mining clustering model that can be used to assist in finding evidence, solving crimes, and flagging potential future crimes. It provides statistics and efficiency reports as well as case studies in which the model was applied. This article will be useful when we discuss the broad applications and effects of data mining across security issues, both domestic and foreign.

J. Frand. (2010). “Data Mining: What is Data Mining?.” (online article). http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm
This article, published by the University of California, Los Angeles and written by professor of mathematics Jason Frand, details what data mining is, how it works, and the possibilities it presents. This article defines the basic structures of data mining, such as classes and clusters, and discusses in depth how decision trees work and how they can be applied to security. Information from this article will help us clarify what the technology is and define in simple terms how it works.

J. Pappalardo. (2013, Oct.). “NSA Data Mining: How It Works.” Popular Mechanics. (online article). DOI: 00324558 http://web.a.ebscohost.com/ehost/detail/detail?sid=7dcfa7aeb20b-4aad-98eebb7853923ef6%40sessionmgr4001&vid=5&hid=4104&bdata=JnNpdGU9ZWhvc3QtbGl2ZQ%3d%3d#AN=90650431&db=aph
This article, published in the respected magazine Popular Mechanics, discusses the ethics of data mining in reference to personal internet security and the NSA Snowden scandal. It describes information landscapes, exabytes, metadata tracking, and worldwide data, along with the possible security threats these concepts pose. This article also goes into depth about data leaks and security issues. This article will be useful when we discuss the ethics of data mining and the threat to privacy it could pose.

M. Shree, J. Visumathi, P. Jayarin. (2016). “Identification of attacks using proficient data interested decision tree algorithm in data mining.” Advances in Intelligent Systems and Computing. (online article). DOI: 10.1007/978-81-322-26741_60 https://www.engineeringvillage.com/search/doc/abstract.url?pageType=expertSearch&searchtype=Expert&SEARCHID=bd4c0c72Md34bM4967Maf04M824cacb492d0&DOCINDEX=3&database=3&format=expertSearchAbstractFormat&dedupResultCount=&SEARCHID=bd4c0c72Md34bM4967Maf04M824cacb492d0
This article, published in a respected professional journal specializing in computer technology, discusses how data mining can use decision tree algorithms to predict and identify possible attacks. It discusses in depth how decision trees can search private networks for keywords and identify possible threats. It also discusses the effectiveness of the program and how it could be improved. This article will assist us in showing real-life applications of data mining and how it works to identify security issues.