Jesus Juarez
Juan Soltero
Paul Romo

Data Mining: Implement it Ethically!
A Report from the Engineering Group Delta
October 28, 2011

Table of contents:

Introduction
Research Approach
    Phase 1: Defining the project scope
    Phase 2: Gather information
    Phase 3: Get familiarized with data mining and its uses
        What is data mining?
        History
    Phase 4: Ethical and unethical uses
        Corporate Ethical Use
        Corporate Unethical Use
Results of Study
    Ethical Example:
    Unethical Example:
Recommendations: Our Final Stand
Final Stand
Works Cited

Introduction

With the rise of technology comes a greater risk that more personal information is open to data mining. Businesses can use different data mining techniques without making a distinction between ethical and unethical practices. Some businesses not only hold our information but may also sell it to other businesses for a profit, disregarding their customers’ privacy. Businesses can also detect future trends, using information mined from their current customers’ database, to keep up with the demand of their loyal customers while attracting new ones. In this report we discuss what data mining is, its history, its ethical and unethical uses, and how corporations may benefit or harm customers and themselves.

Research Approach

We set out a few steps to conduct our research, keep it in focus, and avoid scope creep.

Phase 1: Defining the project scope.
Phase 2: Gather information.
Phase 3: Get familiarized with data mining and its uses.
Phase 4: Ethical and unethical uses.

Phase 1: Defining the project scope

Data mining is a very broad subject and it can be approached from many different angles. With that in mind, we wanted to focus primarily on the main ethical and unethical uses without getting too deep into the technicalities involved. We discuss how companies in general can use data mining to increase their profit ethically as well as unethically.

Phase 2: Gather information

We were already somewhat familiar with what data mining is. To better inform the reader, we researched the World Wide Web using search engines such as Google and Bing, with search terms like ‘data mining’, ‘ethical data mining’, ‘unethical data mining’, and ‘data mining regulations.’ We also looked at some government articles to make our research more reliable.

Phase 3: Get familiarized with data mining and its uses

During our research we found that the two primary uses of data mining are to improve customers’ experiences and to increase companies’ revenue.

What is data mining?

Data mining is a process in which businesses and corporations extract hidden information from their customer data that helps them see future trends in the products customers are likely to buy. It can be defined in a variety of ways, such as the process of finding hidden information, knowledge discovery, and many others. Data mining techniques include classical statistics, artificial intelligence (AI), Oracle computer software, and genetic algorithms, all of which help corporations increase future profit. For example, a Midwest chain of grocery stores used Oracle software to analyze the local buying patterns of its customers and discovered that men who bought diapers on Thursdays and Saturdays also tended to buy beer. Looking deeper into the buying patterns, the chain noticed that these customers typically did their weekly shopping on Saturdays.
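
A pattern like the diapers-and-beer finding is usually expressed as an association rule and scored with three simple measures: support, confidence, and lift. The following is a minimal sketch of that idea, not the grocery chain's actual method or anything specific to Oracle software. The baskets and the function names (support, confidence, lift) are hypothetical and invented purely for illustration.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy baskets, invented for illustration only.
# They are NOT data from the grocery chain described in the report.
BASKETS: List[FrozenSet[str]] = [
    frozenset({"diapers", "beer", "chips"}),
    frozenset({"diapers", "beer"}),
    frozenset({"diapers", "milk"}),
    frozenset({"beer", "chips"}),
    frozenset({"milk", "bread"}),
]


def support(itemset: Iterable[str], baskets: List[FrozenSet[str]]) -> float:
    """Fraction of baskets that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for basket in baskets if items <= basket) / len(baskets)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               baskets: List[FrozenSet[str]]) -> float:
    """Among baskets containing `antecedent`, the fraction that also
    contain `consequent`."""
    antecedent = frozenset(antecedent)
    base = support(antecedent, baskets)
    if base == 0:
        raise ValueError("antecedent never occurs in the baskets")
    return support(antecedent | frozenset(consequent), baskets) / base


def lift(antecedent: Iterable[str], consequent: Iterable[str],
         baskets: List[FrozenSet[str]]) -> float:
    """Confidence divided by the consequent's overall support.
    A value above 1 means the items co-occur more often than they
    would if they were bought independently."""
    base = support(consequent, baskets)
    if base == 0:
        raise ValueError("consequent never occurs in the baskets")
    return confidence(antecedent, consequent, baskets) / base


if __name__ == "__main__":
    rule = ({"diapers"}, {"beer"})
    print("support   :", round(support(rule[0] | rule[1], BASKETS), 2))
    print("confidence:", round(confidence(*rule, BASKETS), 2))
    print("lift      :", round(lift(*rule, BASKETS), 2))
```

With these toy baskets, the rule "diapers, then beer" has a support of 0.4, a confidence of about 0.67, and a lift of about 1.11. A lift slightly above 1 is the kind of signal that would prompt a store to look more closely at how the two products are placed and priced.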

With this information, the grocery chain could increase its profit by simply moving the beer closer to the diapers and selling both products at full price on Thursdays and Saturdays (Palace, 1996). By simply analyzing buying trends around the weekend, the chain was able to increase its profits; this is what data mining can do. We will go deeper into this and explore how corporations should use it ethically. Before we go into depth, let’s touch on the history of how data mining began and evolved.

History

Just a few short years ago, few people had even heard of the term data mining. In its earliest form, data mining was really a collection of handwritten databases that were sorted and kept in index drawers. Due to high prices and a lack of processing power, it wasn’t until the 1980s that customer data could be mined, though not in the way it is done today. Although data mining is the evolution of a field with a long history, the term itself was only introduced relatively recently, in the 1990s.

Data mining’s roots are traced back along three family lines. The longest of these three lines is classical statistics. Without statistics, there would be no data mining. Classical statistics covers concepts such as regression analysis, standard distribution, standard deviation, standard variance, discriminant analysis, cluster analysis, and confidence intervals, all of which are used to study data and data relationships. The second longest family line is artificial intelligence, or AI. This discipline, which is built upon heuristics as opposed to statistics, attempts to apply human-thought-like processing to statistical problems. The third family line is machine learning, which is more accurately described as the union of statistics and AI. It is also an evolution of AI. An article on Data-Mining-Software.com states, “Data mining is finding increasing acceptance in science and business areas which need to analyze large amounts of data to discover trends” (Data Mining Software, 2011).

Phase 4: Ethical and unethical uses

Data mining can be used or implemented by different parties such as businesses, corporations, and the government itself. We will refer to these entities as parties from now on. To begin, some people may not know that their shopping habits, names, addresses, and other information are being stored in a database by organizations. Others might think that their information is saved in only one location, but in reality it can fall into the hands of anyone. If for any reason a copy gets onto the internet, more copies can very easily be created.

Customers should have the option to decide whether or not their information can be collected and stored in a database. If a party gave its customers the option to opt out of having their information collected and stored, many customers would most likely do so. Ambitious corporations would not want to give their customers this option because it would mean losing information. Losing this kind of information could potentially give competitors the upper hand. These parties should ask themselves whether their customers know that their data is being collected and stored. If not, how would the customers feel if they were to find out? The ethical approach would be to let customers have a say regarding their information and to respect that decision. Not asking the customer for permission to collect and store his/her information, and selling this information to third parties, would be unethical.

Corporate Ethical Use

Businesses are discovering new trends and patterns of behavior that previously went unnoticed. One example that Palace (1996) gives is Brian James, assistant coach of the Toronto Raptors, who uses data mining techniques to rack and stack his team against the rest of the NBA.
It doesn’t stop there: the NBA also uses software called Advanced Scout, which can analyze players from videos and help coaches come up with plays and strategies based on the shooting percentage that the other team is achieving.

Data mining is happening everywhere around us, whether you know it or not. It’s not a bad thing, because even places like Wal-Mart have been doing it to keep their prices down. The sales data that is collected shows which items sell the most in a particular month, week, or season. From there they can decide which items to display, see whether other businesses are selling a similar product, and compete by adjusting their prices.

Consumer corporations are not the only ones putting these algorithms to work. Pharmaceutical companies are also using data mining to the best of its ability. An article in InformationWeek provides one example of the benefit: “For instance, sophisticated computer algorithms and software analytic tools are helping drug companies figure out which patients with specific medical characteristics are most likely to benefit from compounds in new drugs, and they're weeding out patients more likely to find a drug toxic” (McGee, 2006). What makes this ethical is that, instead of testing large groups of people, companies can now test smaller groups, combine the acquired results, and use their algorithms with dose amounts and different patient profiles to see whether a particular medicine will benefit patients or be toxic to them. This also means that trial periods are shortened from fifteen years before the use of data mining algorithms to at most two years. Medication reaches the patients who really need it sooner, and data is collected and used for what it is meant for. Apart from a few bumps in the road, all parties benefit when programs and information are used in the appropriate way.
This is not always the case when corporations and businesses use this type of technology. In fact, there have been instances where businesses take the unethical route.

Corporate Unethical Use

Data mining does have unethical uses for all the data that organizations such as corporations, some insurance companies, and government agencies collect and store in data warehouses (Thearling, 1998). With all the information available on people’s social networking sites, these sites are among the best places to build a data collection from. With today’s advances in technology, even more of our information is open for them to harvest. According to President Clinton, “We can’t let breakthroughs in technology break down our privacy” (Parten, 2000). That statement foreshadowed the rise of the technology we now carry around, such as smartphones and Apple devices, which can access everything from personal banking accounts to social networking.

One example of unethical data mining might be surprising to many Facebook users. According to a Wall Street Journal report, one of the many applications that run on Facebook transferred the information of fifty-nine million unwitting members to twenty-five different data mining and advertising firms last year (Kranzl, 2010). One farm simulator app in particular did this and more: it also accessed “personal information about users friends” (Kranzl, 2010). When confronted, defenders argued that only the “Facebook ID” was shared, not any personal information; in fact, the “Facebook ID” alone gives access to a user’s pictures and age, for starters (Kranzl, 2010). A Facebook spokesman stated that the company was committed to addressing the issue, and it started to shut down similar applications (Kranzl, 2010).
Facebook is just one of many social networking sites, and this is not the only way a disreputable company can mine data: it can also get into your email and start collecting anything you can think of that is in text. It is now even more dangerous to be a target. All such companies need is a simple email address, and once they have access they can sell all the information they get from it to other companies for a pretty penny.

Results of Study

We discovered that some companies are very well structured and follow ethical guidelines very strictly, while others concentrate their efforts on increasing revenue regardless of privacy rights.

Ethical Example:

A good example that helps us understand the ethical use of data mining is the government’s data mining activities. According to The Constitution Project (2010), “used properly, data mining can provide a valuable tool for the government to uncover fraud or criminal activity…and it is important to ensure that the government’s collection, acquisition, and use of data does not infringe upon individual privacy rights and respects the constitutional rights of freedom of expression, due process, and equal protection.” Data mining can be a great help for the government as long as no privacy or constitutional right is violated. Some people would be inclined to assume that, since it’s the government performing these activities, nothing stops it from violating individuals’ privacy.
In the 2007 Data Mining Report (p. 7), Hugo Teufel III cited Section 549 of the Senate bill, which defines “data mining” as follows:

“[A] query or search or other analysis of 1 or more electronic databases, whereas –
(A) at least 1 of the databases was obtained from or remains under the control of a non-Federal entity, or the information was acquired initially by another department or agency of the Federal Government for purposes other than intelligence or law enforcement;
(B) a department or agency of the Federal Government or a non-Federal entity acting on behalf of the Federal Government is conducting the query or search or other analysis to find a predictive pattern indicating terrorist or criminal activity; and
(C) the search does not use a specific individual’s personal identifiers to acquire information concerning that individual.”

From this definition we can see that even the government is bound by regulations when it comes to data mining: at least one of the databases has to come from a non-Federal entity or from another agency that collected it for a different purpose, the query has to be conducted by the government itself or by a non-Federal entity acting on its behalf, and the query cannot be based on a specific individual’s personal identifiers. The government has to follow these regulations in order for its data to be used in an ethical way.

Unethical Example:

Some parties can also use data mining unethically. A clear example is a third party processing product registration cards. “While the consumer who purchased a new dishwasher may think she's submitting her appliance information directly to Kenmore, in fact, a sizeable data management agency provides the service to Kenmore for free so that it might receive consumer data and sell it elsewhere” (Shermach, 2006). Remember, the customer should have the option to opt out of having their data collected and stored.
Some parties take advantage of customers’ need and desire to register their products by offering free computing services to bigger parties, with the intention of making a profit from the data being processed. Customers think that they are submitting their information directly to Kenmore for product registration only. In reality, a third party is collecting the customer’s information not only for Kenmore but also for itself, to sell to other parties. Collecting and selling this information without the customer’s knowledge is unethical.

Another example of a party using data mining unethically is Echometrix. In an article from the Office of the Attorney General of the State of New York, Attorney General Cuomo is quoted as saying, “Echometrix sells software that protects children by gathering information for parents about what their kids are doing online, but at the same time it was marketing its data to outside companies without its customer's knowledge” (Schneiderman, 2010). This party was selling a program to parents and guardians for internet monitoring. It was covertly collecting information from private messages and selling it to other parties without the knowledge or consent of its customers. The collection and selling of information is not in itself an unethical practice. It is the way you go about it that could be unethical. Data mining without the knowledge or consent of the information’s owner would constitute an unethical practice.

Recommendations: Our Final Stand

In conclusion, to improve the ethical use of data mining we recommend the following:

1. When it comes to repurposing or improving data mining techniques/activities, make the customer’s privacy a priority.
2. Awareness of ethical data mining: employees from the appropriate department should be informed, perhaps through a course or video created by the company itself, about using data mining ethically to increase profit.
3. Delegate regulations and enforcement to one particular person/department.
4. Don’t repurpose data mining.

After reviewing all the information and research, we conclude that data mining has come to be what it is now and will continue to evolve along with technology. There will always be codes, regulations, and guidelines for its use. But when it comes to implementing them in a corporate environment, it will be the company’s ethics that help draw a fine distinction between protecting the customer’s privacy and raising revenue unethically.

The Engineering Group Delta thanks you for your valuable time and for the opportunity to bring this important matter to the table. If you have any questions or comments, or would like to discuss this matter in more depth or in person, please contact the Engineering Group Delta at jsolte@nmsu.edu.

Works Cited

Data Mining Software. (2011, October 24). Data Mining History. Website: http://www.data-mining-software.com/data_mining_history.htm

Kranzl, J. (2010, October 18). Farmville Accused of Data Mining. Retrieved from VG 24/7 website: http://www.vg247.com/2010/10/18/farmville-accused-of-data-mining/

McGee, M. K. (2006, February 13). A Pill, A Scalpel, A Database. Retrieved from InformationWeek: The Business Value of Technology website: http://www.informationweek.com/news/1791034370

Palace, B. (1996, Spring). Data Mining: What is Data Mining? Retrieved from University of California at Los Angeles, Anderson Graduate School of Management website: http://www.anderson.ucla.edu/faculty_pages/jason.frand/teacher/technologies/palace/datamining.htm

Parten, C. (2000, May 15). Data Mining vs. Privacy. Retrieved from Insurance Journal website: http://www.insurancejournal.com/magazines/southcentral/coverstory/2000/05/15/21117.htm

Schneiderman, E. (2010, September 15). Cuomo Announces Agreement Stopping Software Company "Echometrix" From Selling Children's Private Online Conversations to Marketers. Retrieved from Office of the Attorney General, State of New York website: http://www.ag.ny.gov/media_center/2010/sep/sep15a_10.html

Shermach, K. (2006, August 25). Data Mining: Where Legality and Ethics Rarely Meet. Retrieved from E-Commerce Times website: http://www.ecommercetimes.com/story/52616.htm

Teufel III, H., Chief Privacy Officer. (2007, July 6). 2007 Report to Congress on the Impact of Data Mining Technologies on Privacy and Civil Liberties. U.S. Department of Homeland Security website: http://www.dhs.gov/xlibrary/assets/privacy/privacy_rpt_datamining_2007.pdf

The Constitution Project. (2010). Principles for Government Data Mining: Preserving Civil Liberties in the Information Age. Website: http://www.constitutionproject.org/pdf/DataMiningPublication.pdf

Thearling, K. (1998, March 17). Data Mining and Privacy: A conflict in the making? Website: http://www.thearling.com/text/dsstar/privacy.htm