Jesus Juarez
Juan Soltero
Paul Romo
Data Mining
Implement it Ethically!
A Report from the
Engineering Group Delta
October 28, 2011
Table of contents:
Introduction
Research Approach
  Phase 1: Defining the project scope
  Phase 2: Gather information
  Phase 3: Get familiarized with data mining and its uses
    What is data mining?
    History
  Phase 4: Ethical and unethical uses
    Corporate Ethical Use
    Corporate Unethical Use
Results of Study
  Ethical Example:
  Unethical Example:
Recommendations: Our Final Stand
Final Stand
Works Cited
Data Mining: Implement it Ethically!
Introduction
With the rise of technology comes a greater risk that more personal information is open to data
mining. Businesses can use different data mining techniques without distinguishing between
ethical and unethical practices. Some businesses not only hold our information but may sell it
to other businesses for a profit, disregarding their customers’ privacy. Businesses can also
detect future trends, which lets them keep up with the demands of their loyal customers while
attracting new customers, using information mined from their current customers’ database. In
this report we discuss what data mining is, its history, its ethical and unethical uses, and how
corporations may benefit or harm customers and themselves.
Research Approach
We set out four phases to keep our research focused and to avoid scope creep.
Phase 1: Defining the project scope.
Phase 2: Gather information.
Phase 3: Get familiarized with data mining and its uses.
Phase 4: Ethical and unethical uses.
Phase 1: Defining the project scope
Data mining is a very broad subject, and it can be approached from many different angles. With
that in mind, we focus primarily on the main ethical and unethical uses without going too deep
into the technicalities involved. We discuss how companies in general can use data mining to
increase their profit, ethically as well as unethically.
Phase 2: Gather information
We were already somewhat familiar with what data mining is. To better inform the reader, we
searched the web with search engines such as Google and Bing, using terms like ‘data mining’,
‘ethical data mining’, ‘unethical data mining’, and ‘data mining regulations.’ We also looked at
some government publications to make our research more reliable.
Phase 3: Get familiarized with data mining and its uses
During our research we found that the two primary uses of data mining are to improve
customers’ experiences and to increase companies’ revenue.
What is data mining?
Data mining is a process in which businesses and corporations extract hidden information from
data about their customers, which helps them anticipate the products those customers are likely
to buy. It can be defined in a variety of ways, such as the process of finding hidden
information, knowledge discovery, and many others. Data mining techniques draw on classical
statistics, artificial intelligence (AI), and genetic algorithms, often running on commercial
software such as Oracle, to help corporations increase future profit. For example, a Midwest
chain of grocery stores used Oracle software to analyze the local buying patterns of its
customers and discovered that men who bought diapers on Thursdays and Saturdays also tended to
buy beer. Looking deeper into the buying patterns, the chain noticed that customers did most of
their weekly shopping on Saturdays. With this information the chain could increase its profit
by simply moving the beer closer to the diapers and selling both products at full price on
Thursdays and Saturdays (Palace, 1996). By analyzing trends that were already present in its
own sales data, the chain was able to increase its profits; this is what data mining can do. We
will go deeper into this and explore how corporations should use it ethically. Before we go
into depth, let’s touch on the history of how data mining began and evolved.
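The diapers-and-beer pattern is an instance of what is usually called market basket analysis. The
following is a minimal sketch of the idea in Python, not the method the grocery chain or Oracle
actually used. The baskets, item names, and function names are all made up for illustration.
It counts how often two items appear in the same basket and reports the support (share of all
baskets containing both items) and the confidence (share of diaper baskets that also contain beer).

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; items and counts are invented for illustration.
transactions = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def pair_counts(baskets):
    """Count how many baskets contain each item and each unordered item pair."""
    item_count = Counter()
    pair_count = Counter()
    for basket in baskets:
        item_count.update(basket)
        pair_count.update(frozenset(p) for p in combinations(sorted(basket), 2))
    return item_count, pair_count


def rule_stats(baskets, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    item_count, pair_count = pair_counts(baskets)
    both = pair_count[frozenset((antecedent, consequent))]
    support = both / len(baskets)
    # Confidence is undefined when the antecedent never occurs; report 0.0 then.
    confidence = both / item_count[antecedent] if item_count[antecedent] else 0.0
    return support, confidence


support, confidence = rule_stats(transactions, "diapers", "beer")
print(f"support={support:.2f} confidence={confidence:.2f}")
# prints: support=0.40 confidence=0.67
```

In this toy data, two of the five baskets contain both items, and two of the three baskets with
diapers also contain beer. A real analysis would run over far more transactions and would also
check that the pattern is not just a coincidence.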
History
Just a few years ago, few people had even heard of the term data mining. In its earliest form,
data mining amounted to collections of handwritten records that were sorted and kept in index
drawers. Because of high prices and limited processing power, customer data could not be mined
until the 1980s, and even then not in the way it is done today. Though data mining is the
evolution of a field with a long history, the term itself was only introduced relatively
recently, in the 1990s. Data mining’s roots are traced back along three family lines. The
longest of these three lines is classical statistics. Without statistics, there would be no
data mining. Classical statistics covers concepts such as regression analysis, standard
distribution, standard deviation, standard variance, discriminant analysis, cluster analysis,
and confidence intervals, all of which are used to study data and data relationships. The
second longest family line is artificial intelligence, or AI. This discipline, which is built
upon heuristics as opposed to statistics, attempts to apply human-thought-like processing to
statistical problems. The third family line is machine learning, which is more accurately
described as the union of statistics and AI and is also an evolution of AI. An article on
Data-Mining-Software.com states, “Data mining is finding increasing acceptance in science and
business areas which need to analyze large amounts of data to discover trends” (Data Mining
Software, 2011).
Phase 4: Ethical and unethical uses
Data mining can be used or implemented by different parties, such as businesses, corporations,
and the government itself. We will refer to these entities as parties from now on. Let’s start
by saying that some people may not know that their shopping habits, names, addresses, and other
information are being stored in a database by organizations. Others might think that their
information is saved in only one location, but the reality is that it can fall into the hands of
anyone. If for any reason a copy gets onto the internet, more copies can very easily be
created. Customers should have the option of deciding whether or not their information may be
collected and stored in a database. If a party gave its customers the option to opt out of
having their information collected and stored, many customers would most likely do so.
Ambitious corporations would not want to give their customers this option because it would mean
losing information, and losing this kind of information could give competitors the upper hand.
These parties should ask themselves whether their customers know that their data is being
collected and stored. If not, how would the customers feel if they were to find out? The
ethical approach is to let customers have a say regarding their information and to respect that
decision. Not asking the customer for permission to collect and store his/her information and
selling this information to third parties would be unethical.
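The opt-out principle above can be sketched in code. This is only an illustration under stated
assumptions: the record fields, the two consent flags, and the function names are hypothetical,
and no real company’s system is being described. The point is that a record is used for mining
only if its owner agreed to collection, and is passed to a third party only if the owner also
agreed to sharing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRecord:
    """Hypothetical customer record with explicit consent flags."""
    customer_id: str
    purchases: tuple
    consent_to_store: bool   # customer agreed to collection and storage
    consent_to_share: bool   # customer agreed to sharing with third parties


def allowed_for_mining(records):
    """Keep only records whose owners agreed to collection and storage."""
    return [r for r in records if r.consent_to_store]


def allowed_for_third_parties(records):
    """Keep only records whose owners agreed to both storage and sharing."""
    return [r for r in records if r.consent_to_store and r.consent_to_share]


records = [
    CustomerRecord("c1", ("diapers", "beer"), True, False),
    CustomerRecord("c2", ("milk",), False, False),   # opted out entirely
    CustomerRecord("c3", ("bread", "chips"), True, True),
]

print([r.customer_id for r in allowed_for_mining(records)])
# prints: ['c1', 'c3']
print([r.customer_id for r in allowed_for_third_parties(records)])
# prints: ['c3']
```

Defaulting both flags to "no" unless the customer says otherwise would match the recommendation
that the customer’s decision, not the company’s convenience, determines how the data is used.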
Corporate Ethical Use
Businesses are discovering new trends and patterns of behavior that previously went unnoticed.
One example that Palace (1996) gives is Brian James, assistant coach of the Toronto Raptors, who
uses data mining techniques to rank his team against the rest of the NBA. It doesn’t stop there,
because the NBA itself uses software called Advanced Scout, which can analyze players from
video and help coaches come up with plays and strategies based on the opposing team’s shooting
percentage. Data mining is happening everywhere around us, whether you know it or not. It’s not
a bad thing, because even places like Wal-Mart have been doing it to keep their prices down. The
sales data that is collected shows which items sell the most in a particular month, week, or
season. From there the company can decide which items to display, see whether other businesses
are selling a similar product, and compete by adjusting its prices. Consumer corporations are
not the only ones putting these algorithms to work. Pharmaceutical companies are using data
mining as well. An article in InformationWeek describes one benefit: “For instance,
sophisticated computer algorithms and software analytic tools are helping drug companies figure
out which patients with specific medical characteristics are most likely to benefit from
compounds in new drugs, and they're weeding out patients more likely to find a drug toxic”
(McGee, 2006). What makes this ethical is that instead of testing large groups of people,
companies can now test smaller groups, combine the results, and use their algorithms, together
with dose amounts and differences in patients’ makeup, to see whether a particular medicine will
benefit patients or be toxic to them. This also means that trial periods are shortened from
fifteen years before the use of data mining algorithms to at most two years. Medication reaches
the patients who really need it sooner, and data is collected and used for its intended purpose.
Apart from a few bumps in the road, all parties benefit when programs and information are used
in the appropriate way. This is not always the
case when corporations and businesses use this type of technology. In fact, there have been
instances where businesses have taken the unethical route.
Corporate Unethical Use
Data mining also has its unethical uses. Corporations, including some insurers, and government
agencies collect data and store it in data warehouses (Thearling, 1998). The information
available on people’s social networking sites is one of the easiest sources from which to build
such a database, and with today’s technology boom more of our information is open for these
parties to harness. According to President Clinton, “We can’t let breakthroughs in technology
break down our privacy” (Parten, 2000). That remark foreshadowed the rise of the technology we
now carry around, such as smartphones and Apple devices, which can access everything from
personal bank accounts to social networking profiles. One example of unethical data mining
might surprise many Facebook users. According to a report from the Wall Street Journal, one of
the many applications that run on Facebook exposed fifty-nine million unwitting members by
transferring user information to twenty-five different data mining and advertising firms last
year (Kranzl, 2010). One app in particular, a farm simulator, did that and more: it also
accessed “personal information about users friends” (Kranzl, 2010). When confronted, defenders
argued that only the “Facebook ID” was transferred, not any personal information, but in fact
the “Facebook ID” alone gives access to a user’s pictures and age, for starters (Kranzl, 2010).
A Facebook spokesman stated that the company was committed to addressing the issue, and it
started to shut down similar applications (Kranzl, 2010). This is barely one of the many social
networking sites, and social networks are not the only avenue: a disreputable company can also
mine your email and extract anything you can think of that is in text. Being a target is now
more dangerous than ever, because a company needs only a simple email address; once it has
access, it can sell all the information it gets to other companies for a pretty penny.
Results of Study
We discovered that some companies are very well structured and follow ethical guidelines very
strictly, while others concentrate their efforts on increasing revenue regardless of privacy
rights.
Ethical Example:
A perfect example that will help us understand the ethical use of data mining is the government’s
data mining activities. According to The Constitution Project organization (2010), “used
properly, data mining can provide a valuable tool for the government to uncover fraud or
criminal activity…and it is important to ensure that the government’s collection, acquisition, and
use of data does not infringe upon individual privacy rights and respects the constitutional rights
of freedom of expression, due process, and equal protection.” Data mining can be a great help for
the government as long as no privacy or constitutional right is violated. Some people would
be inclined to assume that since it’s the government performing these activities, there is
nothing stopping it from violating individuals’ privacy. In the 2007 Data Mining Report (p. 7),
Hugo Teufel III cites Section 549 of the Senate bill, which defines “data mining” as the
following:
“[A] query or search or other analysis of 1 or more electronic databases, whereas –
(A) at least 1 of the databases was obtained from or remains under the
control of a non-Federal entity, or the information was acquired
initially by another department or agency of the Federal Government
for purposes other than intelligence or law enforcement;
(B) a department or agency of the Federal Government or a non-Federal
entity acting on behalf of the Federal Government is conducting the
query or search or other analysis to find a predictive pattern indicating
terrorist or criminal activity; and
(C) the search does not use a specific individual’s personal identifiers to
acquire information concerning that individual.”
From this definition we can see that even the government is bound by regulations when it comes
to data mining: its information has to be collected from a non-Federal entity, it has to be in
charge of the query, either itself or through a non-Federal entity acting on its behalf, and the
query cannot be based on specific information about an individual. The government has to
follow these regulations in order for its data to be useful in an ethical way.
Unethical Example:
Some parties can also use data mining unethically. A perfect example is a third party processing
product registration cards. “While the consumer who purchased a new dishwasher may think
she's submitting her appliance information directly to Kenmore, in fact, a sizeable data
management agency provides the service to Kenmore for free so that it might receive consumer
data and sell it elsewhere” (Shermach, 2006). Remember, the customer should have the option
to opt out of having their data collected and stored. Some parties take advantage of customers’
need and desire to register their products by offering free processing services to bigger
parties, with the intention of profiting from the data being processed. Customers think that
they are submitting their information directly to Kenmore for product registration only. In
reality, a third party is collecting the customers’ information not only for Kenmore but also
for itself, to sell to other parties. Collecting and selling this information without the
customers’ knowledge is unethical. Another example of a party using data mining unethically is
Echometrix. In a press release from the Office of the Attorney General of New York, Attorney
General Cuomo is quoted as saying, “Echometrix sells software that protects
children by gathering information for parents about what their kids are doing online, but at the
same time it was marketing its data to outside companies without its customer's knowledge”
(Schneiderman, 2010). This party was selling a program to parents and guardians for internet
monitoring. It was covertly collecting information from private messages and selling it to
other parties without the knowledge or consent of its customers. The collection and selling
of information is not in itself an unethical practice; it is the way you go about it that can
be unethical.
Data mining without the knowledge or consent of the information’s owner would constitute an
unethical practice.
Recommendations: Our Final Stand
In conclusion, to improve the ethical use of data mining we recommend the following:
1. When it comes to repurposing or improving data mining techniques/activities, make the
customer’s privacy a priority.
2. Raise awareness of ethical data mining: employees from the appropriate department should be
informed, perhaps through a course or video created by the company itself, about using data
mining ethically to increase profit.
3. Delegate regulations and enforcement to one particular person/department.
4. Don’t repurpose data mining.
Having reviewed all the information and research, we see that data mining has become what it
is today and will continue to evolve along with technology. There will always be codes,
regulations, and guidelines for its use. But when it comes to implementing them in a corporate
environment, it will be the company’s ethics that draw the fine distinction between protecting
the customer’s privacy and raising revenue in an unethical way.
The Engineering Group Delta thanks you for your valuable time and for the opportunity to
bring this important matter to the table. If you have any questions or comments, or would
like to discuss this matter in more depth or in person, please contact the Engineering Group
Delta at jsolte@nmsu.edu.
Works Cited
Data Mining Software. (2011, October 24). Data Mining History.
Website:
http://www.data-mining-software.com/data_mining_history.htm
Kranzl, J. (2010, October 18). Farmville Accused of Data Mining. Retrieved from VG 24/7
Website:
http://www.vg247.com/2010/10/18/farmville-accused-of-data-mining/
McGee, M. K. (2006, February 13). A Pill, A Scalpel, A Database. Retrieved from
InformationWeek: The Business Value of Technology
Website:
http://www.informationweek.com/news/1791034370
Palace, B. (1996, Spring). Data Mining: What is Data Mining? Retrieved from
University of California at Los Angeles, Anderson Graduate School of Management
Website:
http://www.anderson.ucla.edu/faculty_pages/jason.frand/teacher/technologies/palace/datamining.htm
Parten, C. (2000, May 15). Data Mining vs. Privacy. Retrieved from Insurance Journal
Website:
http://www.insurancejournal.com/magazines/southcentral/coverstory/2000/05/15/21117.htm
Shermach, K. (2006, August 25). Data Mining: Where Legality and Ethics Rarely Meet.
Retrieved from E-Commerce Times
Website:
http://www.ecommercetimes.com/story/52616.htm
Schneiderman, E. (2010, September 15). Cuomo Announces Agreement Stopping Software
Company "Echometrix" From Selling Children's Private Online Conversations to
Marketers. Retrieved from Office of the Attorney General, State of New York
Website:
http://www.ag.ny.gov/media_center/2010/sep/sep15a_10.html
Teufel III, H., Chief Privacy Officer. (2007, July 6). 2007 Report to Congress on the Impact of
Data Mining Technologies on Privacy and Civil Liberties. U.S. Department of Homeland
Security
Website:
http://www.dhs.gov/xlibrary/assets/privacy/privacy_rpt_datamining_2007.pdf
The Constitution Project. (2010). Principles for Government Data Mining: Preserving Civil
Liberties in the Information Age.
Website:
http://www.constitutionproject.org/pdf/DataMiningPublication.pdf
Thearling, K. (1998, March 17). Data Mining and Privacy: A conflict in the making?
Website:
http://www.thearling.com/text/dsstar/privacy.htm