Privacy Implications of Online Data Collection Lorrie Faith Cranor

Privacy Implications of Online Data Collection
DIMACS Workshop
Lorrie Faith Cranor
AT&T Labs-Research
http://www.research.att.com/~lorrie/
Recent headlines
Activists charge DoubleClick double cross
Websites Pull Back From Doubleclick
Doubleclick shelves plan to tag Web surfers
Senators Raise Privacy Issue In AOL-Time Warner Hearing
Clinton Issues Privacy Warning To Technology Leaders
2
Online profiling in the comics!
Cathy
March 1, 2000
3
How do they get my data?
• Browsers advertise
  - IP address, domain name, organization, referring page
  - platform: O/S, browser
  - which information is requested
• Information available to
  - end servers
  - local system administrators
  - other third parties (e.g., doubleclick.com)
• Cookies, Web bugs, advertising networks
4
Browsers like to chatter
A typical HTTP request
GET http://www.amazon.com/ HTTP/1.0
User-Agent: Mozilla/3.01 (X11; I; SunOS 4.1.4 sun4m)
Host: www.amazon.com
Referer: http://www.alcoholics-anonymous.org/
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
Cookie: session-id-time=868867200; session-id=6828-2461327649945; group_discount_cookie=F
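As a rough illustration of how much this one request gives away, the sketch below splits the request shown above into its parts and lists the fields a server could record. The function names (`parse_request`, `parse_cookies`) are illustrative, not part of any library.

```python
# Sketch: list what the request from the slide discloses to the server.
RAW_REQUEST = (
    "GET http://www.amazon.com/ HTTP/1.0\r\n"
    "User-Agent: Mozilla/3.01 (X11; I; SunOS 4.1.4 sun4m)\r\n"
    "Host: www.amazon.com\r\n"
    "Referer: http://www.alcoholics-anonymous.org/\r\n"
    "Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*\r\n"
    "Cookie: session-id-time=868867200; session-id=6828-2461327649945; "
    "group_discount_cookie=F\r\n"
    "\r\n"
)


def parse_request(raw):
    """Split a raw HTTP request into its request line and a header dict."""
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "target": target, "version": version,
            "headers": headers}


def parse_cookies(cookie_header):
    """Turn 'a=1; b=2' into {'a': '1', 'b': '2'}."""
    cookies = {}
    for pair in cookie_header.split(";"):
        name, _, value = pair.strip().partition("=")
        if name:
            cookies[name] = value
    return cookies


request = parse_request(RAW_REQUEST)
disclosed = {
    "platform": request["headers"]["user-agent"],
    "previous_page": request["headers"]["referer"],
    "cookies": parse_cookies(request["headers"]["cookie"]),
}
for key, value in disclosed.items():
    print(f"{key}: {value}")
```

Note that the Referer value alone tells the server which site the user was reading immediately before.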
5
Servers record what they hear
Server logs store host, time, date, requested URL, referrer
ppp.bu.edu - - [09/Dec/1996:20:33:22 -0500] "GET /cgibin/wwwais?hemoglobin+gene HTTP/1.0" 200 527
What this entry suggests: affiliation with Boston University, probably working from home, probably a student or faculty member in biology
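A minimal sketch of how such a log entry can be mined, assuming the entry follows the Common Log Format with the time zone written as -0500 and the method as GET. The pattern and the `parse_log_line` helper are illustrative.

```python
import re

# Common Log Format: host ident authuser [date] "request" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

LOG_LINE = (
    'ppp.bu.edu - - [09/Dec/1996:20:33:22 -0500] '
    '"GET /cgibin/wwwais?hemoglobin+gene HTTP/1.0" 200 527'
)


def parse_log_line(line):
    """Return the fields of one log entry, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # The request field is 'METHOD URL VERSION'; the query string holds
    # the search terms the user typed.
    _method, url, _version = entry["request"].split(" ", 2)
    path, _, query = url.partition("?")
    entry["path"] = path
    entry["search_terms"] = query.split("+") if query else []
    return entry


entry = parse_log_line(LOG_LINE)
print(entry["host"], entry["time"], entry["search_terms"])
```

The host name points to the organization and the dial-up connection, and the search terms point to the user's interests, which is how the inferences above are drawn.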
6
What about cookies?
• Cookies can be useful
  - used like a staple to attach multiple parts of a form together
  - used to identify you when you return to a web site so you don’t have to remember a password
  - used to help web sites understand how people use them
• Cookies can be harmful
  - used to profile users and track their activities, especially across web sites
7
[Diagram: YOU search for medical information at a search engine, where an ad sets a cookie; YOU then buy a book at a book store, where an ad reads the cookie.]
Ad company can get your name and address from book order and link them to your search
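The linking step in this scenario can be sketched as a toy simulation. Everything here is hypothetical: the `AdNetwork` class, the `.example` host names, and the URLs are stand-ins chosen for illustration, not a description of any real ad server.

```python
import itertools


class AdNetwork:
    """Hypothetical third-party ad server whose ads appear on many sites."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.profiles = {}  # cookie id -> list of (site, referring URL)

    def serve_ad(self, cookie, site, referring_url):
        """Serve an ad; set a cookie on first contact, read it afterwards."""
        if cookie is None:
            cookie = f"id{next(self._ids)}"  # "Set cookie"
        # "Read cookie": file this page view under the same identifier.
        self.profiles.setdefault(cookie, []).append((site, referring_url))
        return cookie


network = AdNetwork()
browser_cookie = None  # the browser's cookie for the ad network's domain

# 1. The search results page carries an ad from the network.
browser_cookie = network.serve_ad(
    browser_cookie, "search-engine.example",
    "http://search-engine.example/?q=medical+condition")

# 2. The book store's order page carries an ad from the same network.
browser_cookie = network.serve_ad(
    browser_cookie, "book-store.example",
    "http://book-store.example/order?name=Tom+Jones")

profile = network.profiles[browser_cookie]
for site, url in profile:
    print(site, url)
```

Because both page views arrive with the same cookie, the ad company ends up with one profile that holds the search and the name from the order.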
8
Referer log problems
• GET methods result in values in URL
• These URLs are sent in the referer header to next host
• Example:
  http://www.merchant.com/cgi_bin/order?name=Tom+Jones&address=here+there&credit+card=234876923234&PIN=1234& -> index.html
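To make the leak concrete, here is a short sketch of what the next host can recover from the referer value in the example, using only the standard library's URL parser. The `leaked_fields` helper is an illustrative name.

```python
from urllib.parse import parse_qs, urlsplit

# The referring URL from the example, as the next host would receive it.
REFERER = (
    "http://www.merchant.com/cgi_bin/order?name=Tom+Jones"
    "&address=here+there&credit+card=234876923234&PIN=1234&"
)


def leaked_fields(referer):
    """Return the form fields embedded in a referring URL's query string."""
    query = urlsplit(referer).query
    # parse_qs decodes '+' as a space and skips the empty trailing pair.
    return {name: values[0] for name, values in parse_qs(query).items()}


fields = leaked_fields(REFERER)
for name, value in fields.items():
    print(f"{name}: {value}")
```

Submitting the form with POST instead of GET keeps these values out of the URL, and so out of the referer header.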
9
What DoubleClick knows…
… about Richard M. Smith
• Personal data:
  - My Email address
  - My full name
  - My mailing address (street, city, state, and Zip code)
  - My phone number
• Transactional data:
  - Names of VHS movies I am interested in buying
  - Details of a plane trip
  - Search phrases used at search engines
  - Health conditions
10
No clicks required
“It was not necessary for me to click on the banner ads for information to be sent to DoubleClick servers.”
– Richard M. Smith
http://www.tiac.net/users/smiths/privacy/banads.htm
11
DoubleClick examples
AltaVista Yellow Pages – Complete home address (Fixed January 2000)
Banner ad URL: http://live.av.com/scripts/search.dll?ep=7&gca=address&orderby=distance&sstreet=172+mason+terr&scity=brookline&sstate=MA&szip=02446&scountry=USA&query=sinsa&qname=&sic=&ck=&userid=130782922&userpw=.&uh=130782922,0,&ccity=brookline&cstate=MA&ver=hb1.2.2
Travelocity – Email address
Referring URL: http://dps1.travelocity.com/promoptout.ctl?email=smiths@TIAC.NET
12
Merging online and offline data
• In mid-February DoubleClick announced plans to merge “anonymous” online data with personal information obtained from offline databases
• By the first week in March the plans were put on hold
13
Public concern
• April 1997 Louis Harris Poll of Internet users
  - 5% say they have been the victim of an invasion of privacy while on the Internet
  - 53% say they are concerned that information about which sites they visit will be linked to their email address and disclosed without their knowledge
• See also “Beyond Concern” study: http://www.research.att.com/projects/privacystudy/
14
International issues
• European Union Data Directive prohibits secondary uses of data without informed consent
  - Creating personally-identifiable online profiles will have to be opt-in in most cases
  - Upfront notice must be given when data is collected – no web bugs
  - No transfer of data to non-EU countries unless there is adequate privacy protection
15
Children's issues
• Children’s Online Privacy Protection Act (COPPA) requires parental consent before collecting personally-identifiable data from children online
16
Subpoenas
• Data on online activities is increasingly of interest in civil and criminal cases
• The only way to avoid subpoenas is to not have data
• Your files on your computer in your home have much greater legal protection than your files stored on a server on the network
17
Privacy concerns
• Data is often collected silently
  - Web allows lots of data to be collected easily, cheaply, unobtrusively and automatically
  - Individuals not given meaningful choice
• Data from many sources may be merged
  - Even non-identifiable data can become identifiable when merged
• Data collected for business purposes may be used in civil and criminal proceedings
18
Some solutions
• Privacy policies
• Voluntary guidelines and codes of conduct
• Seal programs
• Infomediaries
• Technologies for facilitating notice and choice
  - P3P
19
P3P1.0 – A First Step
• Offers an easy way for web sites to communicate about their privacy policies in a standard machine-readable format
  - Can be deployed using existing web servers
• This will enable users to use tools that:
  - Display symbols, play sounds, or provide snapshots of sites’ policies
  - Display symbols or prompts after comparing policies with user preferences
20
P3P is a Partial Solution
• P3P1.0 helps users understand privacy policies but is not a complete solution
• Seal programs and regulations help ensure that sites comply with their policies
• Anonymity tools reduce the amount of information revealed while browsing
• Encryption tools secure data in transit and storage
• Laws and codes of practice provide a baseline level for acceptable policies
21
Implementing a P3P 1.0 Server
• Formulate privacy policy
• Translate privacy policy into P3P format
• Place P3P policy on web site
  - One policy for entire site or multiple policies for different parts of the site
• Associate policy with web resources:
  - Configure server to insert P3P header with link to P3P policy; or
  - Insert link to P3P policy in HTML content
22
A simple HTTP transaction
Client -> Web Server (request web page):
GET /x.html HTTP/1.1
Host: foo.com
. . .
Web Server -> Client (send web page):
HTTP/1.1 200 OK
Content-Type: text/html
. . .
23
A simple HTTP transaction with P3P 1.0 added
Client -> Web Server (request web page):
GET /x.html HTTP/1.1
Host: foo.com
. . .
Web Server -> Client (send web page):
HTTP/1.1 200 OK
Opt: http://www.w3.org/2000/P3Pv1/; ns=11
11-Policy: http://foo.com/p3p.xml
Content-Type: text/html
. . .
Client -> Web Server (request P3P policy):
GET /p3p.xml HTTP/1.1
Host: foo.com
. . .
Web Server -> Client (send P3P policy):
HTTP/1.1 200 OK
. . .
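On the server side, the change amounts to two extra response headers. The sketch below builds them using the header syntax shown in this transaction (an Opt declaration with an ns= prefix, and a prefixed Policy header). Other versions of P3P may use a different header, and the helper names `add_p3p_headers` and `format_response` are illustrative.

```python
# Extension identifier as it appears in the Opt header of the example.
P3P_EXTENSION = "http://www.w3.org/2000/P3Pv1/"


def add_p3p_headers(headers, policy_url, prefix="11"):
    """Return the response headers with the two P3P headers prepended.

    `prefix` is the ns= value declared in Opt; the policy link is then
    sent in the '<prefix>-Policy' header, as in the example exchange.
    """
    p3p_headers = [
        ("Opt", f"{P3P_EXTENSION}; ns={prefix}"),
        (f"{prefix}-Policy", policy_url),
    ]
    return p3p_headers + list(headers)


def format_response(status_line, headers):
    """Serialize a status line and headers as an HTTP response head."""
    lines = [status_line] + [f"{name}: {value}" for name, value in headers]
    return "\r\n".join(lines) + "\r\n\r\n"


headers = add_p3p_headers([("Content-Type", "text/html")],
                          "http://foo.com/p3p.xml")
response_head = format_response("HTTP/1.1 200 OK", headers)
print(response_head)
```

A real deployment would set these headers in the web server's configuration rather than in application code; the functions only show what ends up on the wire.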
24
Implementing a P3P1.0 Client
• Client can be implemented as browser, proxy, plug-in, part of an electronic wallet, Java applet, JavaScript, etc.
  - Can be entirely server side
• Look for link to P3P policy and fetch policy with HTTP GET request
• Parse policy and take appropriate action
  - Display symbol, play sound, prompt user, etc.
  - Action can optionally be based on user preferences
  - Action can optionally allow data to be automatically filled into form or transferred from electronic wallet
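The first client step, finding the policy link, might look like the sketch below. It assumes the header syntax from the example transaction (an Opt header naming the P3P extension with an ns= prefix, and a matching prefixed Policy header). The names `find_policy_url` and `fetch_policy` are illustrative, and `fetch_policy` is shown but not called because it needs network access.

```python
import urllib.request


def find_policy_url(headers):
    """Return the P3P policy URL advertised in response headers, or None."""
    lowered = {name.lower(): value for name, value in headers.items()}
    opt = lowered.get("opt", "")
    if "P3Pv1" not in opt:
        return None  # the server did not declare the P3P extension
    prefix = None
    # Parameters follow the extension URI, separated by ';' (e.g. ns=11).
    for part in opt.split(";")[1:]:
        key, _, value = part.strip().partition("=")
        if key == "ns":
            prefix = value.strip()
    if prefix is None:
        return None
    return lowered.get(f"{prefix}-policy")


def fetch_policy(policy_url):
    """Fetch the policy document with an HTTP GET request."""
    with urllib.request.urlopen(policy_url) as response:
        return response.read()


response_headers = {
    "Opt": "http://www.w3.org/2000/P3Pv1/; ns=11",
    "11-Policy": "http://foo.com/p3p.xml",
    "Content-Type": "text/html",
}
policy_url = find_policy_url(response_headers)
print(policy_url)
```

Once the policy is fetched and parsed, the client decides what to display or whether to prompt the user.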
25
Some P3P Client Ideas
• Symbols for how data is used
  - complete transaction
  - R&D
  - customization
  - marketing
• Symbols to indicate whether data is shared
• Symbols to indicate site has privacy seal
• Symbols to indicate compliance with laws and regulations
  - complies with German law
  - complies with German law if user gives informed consent
  - does not comply with German law
• Symbols to indicate match/mismatch with user preferences
  - information about cause of mismatch on mouse-over
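The match/mismatch idea can be sketched as a comparison between a site's declared practices and a user's preferences. The preference format, the function names, and the symbol strings below are all assumptions made for illustration; P3P itself does not prescribe them.

```python
def check_policy(purposes, shares_data, preferences):
    """Return a list of reasons the policy conflicts with the preferences."""
    mismatches = []
    for purpose in purposes:
        if purpose not in preferences["allowed_purposes"]:
            mismatches.append(f"purpose not allowed: {purpose}")
    if shares_data and not preferences["allow_sharing"]:
        mismatches.append("data is shared with others")
    return mismatches


def symbol_for(mismatches):
    """Pick the symbol to display; the reasons could go in a mouse-over."""
    return "MATCH" if not mismatches else "MISMATCH"


# Hypothetical user preferences, using the purposes named on the slide.
preferences = {
    "allowed_purposes": {"complete transaction", "customization"},
    "allow_sharing": False,
}

reasons = check_policy(["customization", "marketing"], False, preferences)
print(symbol_for(reasons), reasons)
```

Keeping the reasons separate from the symbol lets the client show a simple indicator by default and the cause of a mismatch on demand.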
26
P3P Policies
• Machine-readable (XML) version of web site privacy policies
• Use P3P Vocabulary to express data practices
• Use P3P Base Data Set to express type of data collected
• Capture common elements of privacy policies but may not express everything (sites may provide further explanation in human-readable policies)
27
The P3P Vocabulary
• Who is collecting data?
• What data is collected?
• For what purpose will data be used?
• Can users change their preferences about (opt in to or opt out of) some data uses?
• Who are the data recipients (anyone beyond the data collector)?
• To what information does the data collector provide access?
• What is the data retention policy?
• How will disputes about the policy be resolved?
• Where is the human-readable privacy policy?
28
Example Privacy Policy
TheCoolCatalog of 123 Main Street, Bethesda, MD 20814, USA, makes
the following statement for the Web page at
http://www.TheCoolCatalog.com/catalog/. We have a privacy seal
from PrivacySeal.org. Our privacy policy is posted at
http://www.TheCoolCatalog.com/PrivacyPractice.html. We do not
provide access capabilities to information we have about you.
We use cookies and collect your gender, information about your clothing
preferences, and (optionally) your home address to customize our
entry catalog pages and for our own research and product
development. We retain this information indefinitely.
We also maintain server logs that include information about visits to the
http://www.TheCoolCatalog.com/catalog/ page, and the types of
browsers our visitors use. We use this information in order to
maintain and improve our web site. We retain this information
indefinitely.
29
P3P/XML Encoding
<POLICY xmlns="http://www.w3.org/2000/P3Pv1"
entity="TheCoolCatalog, 123 Main Street, Bethesda, MD 20814, USA">
<DISPUTES-GROUP><DISPUTES resolution-type="independent"
service="http://www.PrivacySeal.org" description="PrivacySeal.org"
image="http://www.PrivacySeal.org/Logo.gif"/></DISPUTES-GROUP>
<DISCLOSURE discuri="http://www.TheCoolCatalog.com/PrivacyPractice.html"
access="none"/>
<STATEMENT>
<CONSEQUENCE-GROUP><CONSEQUENCE>a site with clothes you would
appreciate</CONSEQUENCE></CONSEQUENCE-GROUP>
<RECIPIENT><ours/></RECIPIENT> <PURPOSE><custom/><develop/></PURPOSE>
<RETENTION><indefinitely/></RETENTION>
<DATA-GROUP>
<DATA name="dynamic.cookies" category="state"/>
<DATA name="dynamic.miscdata" category="preference"/>
<DATA name="user.gender"/>
<DATA name="user.home." optional="yes"/>
</DATA-GROUP>
</STATEMENT>
<STATEMENT>
<RECIPIENT><ours/></RECIPIENT> <PURPOSE><admin/><develop/></PURPOSE>
<RETENTION><indefinitely/></RETENTION>
<DATA-GROUP>
<DATA name="dynamic.clickstream.server"/>
<DATA name="dynamic.http.useragent"/>
</DATA-GROUP>
</STATEMENT>
</POLICY>
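A client's "parse policy" step could be sketched as follows, reading the encoding above with Python's standard XML parser. The policy text is copied from the slide with straight quotes; `summarize` and `child_tags` are illustrative names, and the summary layout is an assumption.

```python
import xml.etree.ElementTree as ET

NS = {"p3p": "http://www.w3.org/2000/P3Pv1"}

POLICY_XML = """\
<POLICY xmlns="http://www.w3.org/2000/P3Pv1"
 entity="TheCoolCatalog, 123 Main Street, Bethesda, MD 20814, USA">
 <DISPUTES-GROUP><DISPUTES resolution-type="independent"
  service="http://www.PrivacySeal.org" description="PrivacySeal.org"
  image="http://www.PrivacySeal.org/Logo.gif"/></DISPUTES-GROUP>
 <DISCLOSURE discuri="http://www.TheCoolCatalog.com/PrivacyPractice.html"
  access="none"/>
 <STATEMENT>
  <CONSEQUENCE-GROUP><CONSEQUENCE>a site with clothes you would
   appreciate</CONSEQUENCE></CONSEQUENCE-GROUP>
  <RECIPIENT><ours/></RECIPIENT> <PURPOSE><custom/><develop/></PURPOSE>
  <RETENTION><indefinitely/></RETENTION>
  <DATA-GROUP>
   <DATA name="dynamic.cookies" category="state"/>
   <DATA name="dynamic.miscdata" category="preference"/>
   <DATA name="user.gender"/>
   <DATA name="user.home." optional="yes"/>
  </DATA-GROUP>
 </STATEMENT>
 <STATEMENT>
  <RECIPIENT><ours/></RECIPIENT> <PURPOSE><admin/><develop/></PURPOSE>
  <RETENTION><indefinitely/></RETENTION>
  <DATA-GROUP>
   <DATA name="dynamic.clickstream.server"/>
   <DATA name="dynamic.http.useragent"/>
  </DATA-GROUP>
 </STATEMENT>
</POLICY>
"""


def child_tags(element):
    """Local names of an element's children, e.g. ['custom', 'develop']."""
    if element is None:
        return []
    return [child.tag.split("}", 1)[-1] for child in element]


def summarize(policy_xml):
    """Return one dict per STATEMENT: recipients, purposes, retention, data."""
    root = ET.fromstring(policy_xml)
    summary = []
    for statement in root.findall("p3p:STATEMENT", NS):
        data = statement.findall("p3p:DATA-GROUP/p3p:DATA", NS)
        summary.append({
            "recipients": child_tags(statement.find("p3p:RECIPIENT", NS)),
            "purposes": child_tags(statement.find("p3p:PURPOSE", NS)),
            "retention": child_tags(statement.find("p3p:RETENTION", NS)),
            "data": [item.get("name") for item in data],
        })
    return summary


statements = summarize(POLICY_XML)
for statement in statements:
    print(statement)
```

Each dictionary corresponds to one paragraph of the human-readable policy: the first covers customization data, the second the server logs.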
[Screenshot: PrivacyBank.Com, showing the PrivacyBank bookmark]
31
Infomediary example: PrivacyBank
[Screenshot: PrivacyBank bookmark]
32
Challenge
• Data is useful for research, targeting potential customers, building relationships with customers, etc.
• Privacy laws make data collection more difficult
• Data collectors have personal privacy concerns too
• How can we collect data in ways that reduce privacy concerns while remaining useful for research and business?
33