CS199r: Privacy and Technology
Group: Kevin Bombino, Nicholas Cirella, Jonathan Hyman, Benjamin Grubin, Haoqi Zhang
Spring 2007
Briefing Document: Data Aggregation
Prepared for the American Civil Liberties Union
The American Civil Liberties Union (ACLU) was founded in 1920 to protect individual
rights and freedoms (“civil liberties”) in a nation where only a simple majority governs.
Although the Bill of Rights and Constitution guarantee certain rights, history has shown
that majority groups within the federal and state governments have routinely attacked the
rights of individuals and minority groups. The ACLU maintains that it is its job to look
out for every individual and guard his or her interests against those of the government.
How does the protection of civil liberties intersect with the current practice of data
aggregation? The ACLU website¹ lists four main areas of focus with regard to civil
liberties: First Amendment rights (speech, association, assembly, press, religion), the
right to equal protection under the law, the right to due process, and the right to privacy.
We find that it is through its commitment to protecting the right to privacy that the ACLU
has come down against data mining and aggregation. The ACLU believes that
privacy is a fundamental human right, saying that “individuals cannot live their lives
freely, and democracy cannot work effectively, unless individuals have a reasonable
measure of knowledge and control over how they present themselves to the world.”²
Furthermore, the ACLU posits that as privacy erodes, a situation could develop in which
citizens are forced to conform to every rule, law, and societal norm, for if they do not,
someone will learn of their “transgressions” and use them to destroy their
reputations.
How does the ACLU believe that data aggregation leads to a breach of privacy?
The group recognizes that each individual act of data collection by a company or
government entity is already a privacy violation, but citizens have
always been protected by the fact that this information was scattered across
various courthouses and databases. Aggregation enormously simplifies the task of
obtaining this information about any given person, thus making privacy breaches much
simpler, faster, and more efficient. Simply put, the data aggregators have taken it upon
themselves to compile detailed information about individual citizens without their
knowledge or permission, with the intention of distributing this knowledge to others.
Although many of these data aggregators claim that they only conduct “background
checks” for “legitimate users”, at least one group has at one point sold its background
check software in CD-ROM form for around $40.³ The ACLU maintains that as it
becomes easier for anyone to “watch” a citizen, that citizen’s right to privacy diminishes,
and the scenario described above becomes more likely.
One argument typically employed to counter this position is that data aggregation
companies such as ChoicePoint do not actually create any new data about the consumer,
and that the data they collect is already public. Therefore, any data released by
ChoicePoint cannot be a breach of privacy. In fact, this argument would continue,
any privacy issues would lie on the doorstep of those who were willing to part with the
original information: the government, courts, etc.
1. http://www.aclu.org/about/index.html
2. http://www.aclu.org/privacy/consumer/15301leg20050310.html
3. http://seattlepi.nwsource.com/business/aptech_story.asp?category=1700&slug=Background%20Check
The current position of the ACLU regarding data aggregation companies such as
ChoicePoint is that these companies do fundamentally undermine the privacy of
American citizens. The ACLU feels that even though data aggregators don’t create new
data, they decrease privacy by bringing data together into one central database (the idea
being that data scattered across multiple databases is harder to gather). The ACLU also
feels that the government is not doing enough to keep criminals bent on fraud from
obtaining citizens’ personal information, and notes that the government itself
relies heavily on data aggregators to build dossiers on innocent citizens not
suspected of any wrongdoing.
Beyond the government’s access to aggregated data, the ACLU is strongly concerned
that anyone with enough capital can purchase personal records about a person and
use them to commit identity fraud. Furthermore, the ACLU believes that there is a huge
problem inherent in the data aggregation model in that citizens are not notified of
breaches of their personal data unless they live in California, under a 2003 bill requiring
companies to notify individuals whose information has been compromised by
unauthorized third parties. The ACLU would like to expand the California bill into
national law and also require that data aggregators notify individuals about the
information that is being gathered about them and let citizens determine
what information the data aggregating company is permitted to store.
Our analysis, however, is that the ACLU needs to focus on
more realistic illustrations of the problems with data aggregation and to make its
wording less colloquial and more professional. For example, the ACLU FAQ on
ChoicePoint claims that “ChoicePoint and its competitors have succeeded in laughing all
the way to the bank as they collect information on consumers…” and that data aggregation
is a serious threat in “an era when individuals are being held without charge by the U.S.
military, confined to detention camps without trial, and spirited away by the CIA to
prisons in foreign countries that practice torture.”⁴ The first example uses highly
colloquial language that makes the ACLU’s argument sound like one spoken among
friends. The second is laced with political sentiment and judgment, which is
likely to upset readers with different political views and turn them away from the site,
causing them to ignore the good warnings that the ACLU is trying to provide.
Additionally, the ACLU’s animated illustration of the problems with data aggregation⁵ is
highly unrealistic. It depicts a pizza delivery company that looks into a customer’s health
records to charge him more for pizza and then, when he complains about the increased
price, criticizes him for being cheap because it can see that he has plans to go to
Hawaii and paid a lot for the tickets. Situations such as these
are highly improbable and are likely to upset only those already highly anxious about
privacy. It is our position that the ACLU needs to recast the problems of data aggregation
in terms of more realistic dangers that Americans can sympathize with: being
discriminated against by a health care provider because of family history, being unable
to get a job because of inaccurate information provided by the aggregators, and so on.
4. ACLU. FAQ on ChoicePoint. 3/10/2005. <http://www.aclu.org/privacy/consumer/15301leg20050310.html>
5. ACLU. Surveillance Campaign. <http://www.aclu.org/pizza>
The expression of the ACLU’s position in the “Bigger Monster, Weaker Chains”
report is another example of weak argumentation against data aggregation, this time
aimed at some of the underlying technology. For example, neural networks have been
researched for a long time, have been shown to be highly effective at classification on
large datasets, and have helped to identify credit card fraud. In the report, the ACLU
refers to neural networks as “highly experimental” as an argument against their use,
despite the large body of evidence demonstrating their effectiveness. The ACLU also uses
this argument to explain why neural networks cannot help the government draw accurate
inferences. While that conclusion is defensible, the argument for it is weak. The real
obstacle is not the networks’ experimental nature but the requirement that they be
trained on an extensive corpus of data. A neural network is unlikely to learn to pick out
terrorists from a huge dataset of personal information when there are so few known
terrorists to use as training examples. Furthermore, a neural network does not provide
a justification for its output, which supports a further argument: unreliable
inference based on such output can lead to people being victimized
without any clear reason.
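The scarcity described above also affects how a classifier's output should be read, and a short base-rate calculation makes this concrete. The sketch below is ours, not the ACLU's, and every number in it is a hypothetical assumption chosen for illustration: a population of 300 million, 3,000 real targets, and a classifier that catches 99% of targets while wrongly flagging 1% of everyone else.

```python
def flagged_precision(population, true_targets, sensitivity, false_positive_rate):
    """Return the fraction of flagged people who are real targets."""
    true_positives = sensitivity * true_targets
    false_positives = false_positive_rate * (population - true_targets)
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs, chosen only for illustration (not taken from any report).
population = 300_000_000
true_targets = 3_000

precision = flagged_precision(
    population, true_targets, sensitivity=0.99, false_positive_rate=0.01
)
print(f"{precision:.4%} of flagged people are real targets")
```

Under these assumed numbers, only about one flagged person in a thousand would be a real target, even though the classifier is "99% accurate" in both directions. The remaining flags fall on innocent people, who receive no explanation of why they were selected.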
Further attacks on the technological problems of governmental projects like Total
Information Awareness can strengthen the ACLU's position. In addition to questioning
whether accurate inferences can be drawn, the ACLU should also question whether
databases can be accurately joined together to form one view. Merging databases is no
easy task, and doing so (especially on top of potentially unreliable data) is likely to create
hard-to-detect errors in the dataset. By bringing technology into focus, the ACLU can
argue that the government is spending Americans’ tax money to build an impossible
technology to monitor the American people. This allows the ACLU to counter the argument
that such systems must be built before they can be shown to be ineffective.
Doing so would strengthen the ACLU’s position on why the government shouldn’t pursue
these projects at all, especially without input from the American people.
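To illustrate how merging can introduce hard-to-detect errors, the sketch below joins two tiny, entirely hypothetical datasets on a crude name key. The records, field names, and the matching rule are all assumptions made for illustration. Real record-linkage systems are more sophisticated, but they face the same two failure modes shown here.

```python
from collections import defaultdict

# Hypothetical records; no real people or data sources are represented.
court_records = [
    {"name": "John A. Smith", "birth_year": 1970, "record": "civil judgment"},
    {"name": "Jon Smith", "birth_year": 1982, "record": "none"},
]
credit_records = [
    {"name": "John Smith", "birth_year": 1970, "score": 710},
    {"name": "John Smith", "birth_year": 1982, "score": 640},
]


def name_key(name):
    """Crude matching key: lowercased first and last name, middle names ignored."""
    parts = name.lower().replace(".", "").split()
    return (parts[0], parts[-1])


def naive_merge(left, right):
    """Join two record lists on name_key; return (merged rows, unmatched left rows)."""
    by_key = defaultdict(list)
    for row in right:
        by_key[name_key(row["name"])].append(row)
    merged, unmatched = [], []
    for row in left:
        matches = by_key.get(name_key(row["name"]), [])
        if not matches:
            unmatched.append(row)
        for match in matches:
            merged.append(
                {**row, "credit_birth_year": match["birth_year"], "score": match["score"]}
            )
    return merged, unmatched


merged, unmatched = naive_merge(court_records, credit_records)
# Rows where the two sources disagree on birth year are false matches.
false_matches = [m for m in merged if m["birth_year"] != m["credit_birth_year"]]
print(len(merged), "merged rows,", len(false_matches), "false match,", len(unmatched), "missed")
```

In this toy case the court record for the 1970 John A. Smith is attached to the credit file of a different John Smith, while the "Jon Smith" record is missed entirely because of a one-letter spelling difference. Neither error is visible in the merged output unless someone checks a second field.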
While the ACLU’s position on data aggregation has several strong and consistent points, the
message can often get lost in sensationalism and vague lambasting of the government and
corporations. While there is a place for assigning blame and speculating on the future,
neither of these should be the emphasis of one’s position. Sensationalism can grab
people’s attention and get people interested in one’s cause, but it cannot form the basis of
a realistic movement for change. The pizza delivery example is not only far from possible
with current technology, but also unlikely, and it provides no useful ideas on how to fix
current problems. The same is true of lumping ChoicePoint together with CIA torture.
These scare tactics add nothing to the argument and in truth make little sense.
On the other hand, the foundations on which to build solutions are
present in the same articles that contain such sensationalism. The ChoicePoint FAQ proposes
that new privacy laws be created to protect individuals from harmful uses of data
aggregation and from identity theft. Furthermore, it includes a list of principles
describing a possible basis for new laws.⁶ This is the basis for a far more substantial
position. New questions spring to mind as one reads this section, such as: how can such
laws be enacted, and what can I do to help? In addition, there are good things that come
of data aggregation that must be preserved. For instance, background checks are an
important safeguard, preventing sex offenders from becoming grade school teachers and
allowing companies a measure of protection from those who might harm them. This must be
acknowledged in any purposeful argument. Less sensationalism, meaningful solutions, and
acknowledgment of opposing positions would vastly improve the force and quality of the
ACLU’s positions.
6. http://www.aclu.org/privacy/consumer/15301leg20050310.html