>> Ben Lefschetz: Okay. So let's get started. I'm Ben Lefschetz (phonetic). Thank you very much for coming here this morning, coming out so early in the morning for some of you. So today I'm delighted to present to you Alfred Kobsa from UC Irvine. Alfred will talk about a selection of topics, as far as I understand; an overview talk covering topics from privacy-enhanced personalization, a topic that many of us here have had a growing interest in over the past, I don't know, maybe a year or two, but actually Alfred's group has been working on these topics for quite a bit longer than that. Thank you very much.
>> Alfred Kobsa: Thank you. My talk is about user-tailored privacy for
personalized websites. And here's a kind of roadmap. Just to be on the safe
side, I will first briefly talk about personalization, then briefly about privacy as related to personalization, then about privacy-enhanced
personalization. And then I would like to present the project that is nearly
completed where we worked on a system that gives privacy-enhanced
personalization for internationally operating websites.
And depending on the time remaining I will also discuss two new projects of mine
and I understand I must present at least one of those.
This diagram is from the past millennium. It shows personalization in 1998, 1999. Forrester Research did a survey among the 44 pioneering companies at that time that had some personalization implemented at their websites, and you see tailored e-mail alerts were a very popular personalization method at the time. Customized content [indiscernible] started; wish lists and recommendations already existed. And Amazon even experimented with custom pricing but quickly gave it up.
So this is kind of old personalization. Now we have far more advanced forms of personalization. We have recommenders for everything: music, movies, books, news articles. We have personalized search; some search engines even have [indiscernible] default. We have personalized ads on the web pages that we visit. We have quite a lot of personalization in the educational area, so hundreds of thousands of U.S. high school students already work with tutoring software, primarily in math, that builds models of what these students know and do not know and tailors its teaching strategies accordingly.
Then we have increasingly personalization on mobile devices, where location is taken into account, and to some extent even habits.
We see more and more the content of information being adapted to visitors' presumed knowledge and expertise, and also, on the horizon, to media preferences, so whether people would rather see pictures, or text, or movies.
So this is kind of the forefront of personalization today.
How do personalized systems do that? Just a one-minute summary. There are two main data sources. One is user input that has been requested from users. And the other main source is user interaction logs: the system watches what the user is doing and keeps a log of that. Those are the primary data sources.
Based on that, all sorts of inferences are being drawn, and one can loosely group these into three different areas. One is assignment of users to groups based on marketing research, so these are customer segments that are being identified, or groups from human-computer interaction research.
Then there are still rule-based inferences around, where websites apply business rules; these are typically also based on marketing research.
The majority of inference methods nowadays, however, is based on machine learning, so algorithms learn about users and can then make predictions about users.
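Just as a rough illustration of these two primary data sources and the three loose groups of inference methods, here is a minimal sketch in Python; the class, field, and rule names are illustrative assumptions, not taken from any particular system described in the talk.

```python
from dataclasses import dataclass, field

@dataclass
class UserProfile:
    """Persistent user profile: primary data plus inferred facts, kept across sessions."""
    explicit_input: dict = field(default_factory=dict)   # data the user was asked for
    interaction_log: list = field(default_factory=list)  # what the system observed the user doing
    inferences: dict = field(default_factory=dict)       # results of the inference methods below

def assign_to_segment(profile):
    """Group assignment, e.g. a customer segment identified by marketing research."""
    income = profile.explicit_input.get("income", 0)
    return "premium" if income > 100_000 else "standard"

def apply_business_rules(profile):
    """Rule-based inference: hand-written rules, typically also from marketing research."""
    if "cooking" in profile.explicit_input.get("interests", []):
        return {"recommend_category": "cookbooks"}
    return {}

def learn_interests(profile):
    """Machine-learning style inference: learn from the interaction log and predict.
    Here a simple frequency count stands in for a real learning algorithm."""
    counts = {}
    for event in profile.interaction_log:
        counts[event["item"]] = counts.get(event["item"], 0) + 1
    return {"predicted_interest": max(counts, key=counts.get)} if counts else {}
```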
The primary input data and the results of those inferences are stored in a persistent user profile. So user-model-based personalization is typically not a one-shot process anymore, but rather a process that involves many user sessions, and the profile is updated all the time. Yeah?
>>: Question about the example you have on this page over here. It says
recommend this product, a dish, because you like [indiscernible] cooking. Do
you feel that that explanation is productive, counterproductive, or does it depend?
>> Alfred Kobsa: I'll answer your question at the end. We don't know yet, but we
will know in two weeks. I'm kidding.
All right. There were quite a few surveys and user studies over the past 12, 13 years which quite consistently showed that personalization, specifically on the web, delivers benefits for both users and vendors. So already back in the 1990s, market research firms could show that personalized websites are more sticky than nonpersonalized websites and convert more visitors into buyers. And there were surveys among users who stated that they liked personalization and were even willing to spend some very limited amount of time to improve personalization.
And one could also see that personalization leads to more downloads and more purchases. Amazon makes quite an impressive amount of additional sales through the recommendations that it gives. And also, there's quite some evidence that personalized ads are more often clicked through than nonpersonalized ads.
This would look like a win-win situation, both for users and for companies that provide personalization, if there were not this privacy thingy. So personalized websites need to collect a substantial amount of personal data about their users in order to be able to personalize. And they also do this in a fairly inconspicuous manner, so people don't see that they're being tracked. And users don't like that.
There is tons of evidence over the past ten years, even longer, that people are concerned about divulging information about themselves and about being tracked online, and that they're not only concerned, but that they also take actions like leaving websites, faking registration information (you're not alone if you do that), or not buying from sites because of privacy concerns.
So about a hundred surveys have been conducted over the past ten years that consistently show that. The numbers are slightly different. The organizations that administer these surveys are also different, ranging from marketing research firms to the American Association of Retired Persons to universities.
But the result from these surveys is fairly consistent: People are concerned and don't like it.
Now, one caveat. There has already been some criticism that these surveys may be a little bit biased. And Harper and Singleton put this very harshly: surveys generally and privacy surveys in particular suffer from the "talk is cheap" problem. It costs the consumer nothing to express a desire for a law to protect privacy. After all, who would not state that he or she is concerned in some sense about privacy?
So this is a little bit cynical, maybe. But experiments have shown that there is
some truth in that. So there were a couple of experiments conducted where
people were first asked about their privacy attitudes, their concerns, and then
they were put into a concrete situation where they had to make privacy-relevant
decisions, like buying something where they were asked to give out data about
themselves.
And it turned out that people quite often did other things than what they had
originally stated. So that there's a gap between people's stated privacy attitudes
and what they do then in a concrete situation.
I will come back to that in a second.
Okay. So we have on the one hand that personalization requires personal data, but users are reluctant to give out personal data.
And this looks like a tradeoff: the more privacy I want, the less personalization quality I can get, and vice versa. And therefore, on the web, in trade magazines, you often find discussions about finding the right balance between personalization and privacy, privacy versus personalization. But it does not have to be an either/or.
Current research shows that the relationship between privacy and personalization is quite indirect. It's very much situation-dependent. It's influenced, mitigated, by quite a number of factors.
And so you see here a couple of example models that have been proposed that kind of try to show the relationship between privacy and personalization, depending on other factors. And you see there are quite a lot of influencing factors.
So the idea nowadays is that people use some sort of privacy calculus in making privacy decisions and thereby take quite a few factors into account.
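Purely as a sketch of what such a calculus might look like, and not one of the published structural models, one can picture it as a weighted trade-off of situational factors; the factor names and weights below are illustrative assumptions only.

```python
def privacy_calculus(factors, weights):
    """Crude weighted trade-off: positive factors (perceived benefit, trust, ...)
    against negative ones (data sensitivity, perceived risk, ...).
    Returns True if, in this situation, disclosure 'wins'."""
    score = sum(weights[name] * value for name, value in factors.items())
    return score > 0

# Illustrative values only; real models involve many more factors and indirect paths.
will_disclose = privacy_calculus(
    factors={"perceived_benefit": 0.8, "trust_in_site": 0.6,
             "data_sensitivity": 0.9, "perceived_risk": 0.4},
    weights={"perceived_benefit": 1.0, "trust_in_site": 0.5,
             "data_sensitivity": -0.8, "perceived_risk": -0.7},
)
```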
And people are not very good at verbalizing or predicting in advance what their privacy calculus would yield as a result, if you ask them. Yeah?
>>: I don't see the studies. Do you have any? Can you point me to studies
where they've examined users and showed that this calculus was taking place?
>> Alfred Kobsa: Yeah. I can point you to studies, yeah. So this is Jaloppa (phonetic), or the structural model that he developed. This is unpublished work by myself. But we can talk afterwards and I can point you to some literature. It's mostly in the social science literature. Okay.
Now, if there is no such direct stringent relationship between privacy and
personalization, then it makes sense to search for personalization with more
privacy. Some of you do work on personalization methods that are more
privacy-friendly than the standard personalization methods that are currently
around. And sometimes even without any loss in quality of personalization,
sometimes possibly with some loss.
So we are not bound to this line here in our search for personalization algorithms, but rather we can look a little bit upwards: look for more privacy with only slightly reduced personalization quality, or ideally the same personalization quality. Yeah?
>>: So I understand that [indiscernible] purposes, but at the same time, I wonder if there is a [indiscernible] there on trying to measure privacy on some sort of a continuum, because in my experience, it tends to be one of these very psychologically driven sorts of notions, with huge gaps right [indiscernible] I mean, if you push me, I'm willing to sort of make this jump to the next, you know, drop of privacy or something like that. So it's very difficult to define some sort of a continuum that is, in fact, measurable.
>> Alfred Kobsa: Yeah. So we and other people do ask people for their privacy impression of a new system, and you get answers, but those answers are typically highly correlated with other attitudes about the system. So for instance, trust in a system is a very important factor. And those are typically very much correlated.
So our lesson was not to ask the privacy question in isolation but to also take other factors into account, including, for instance, recommendation quality, which is also highly correlated with trust and interacts with privacy.
So it's not one-dimensional. You can ask one-dimensionally, but it's not the full
answer.
This is the goal of privacy-enhanced personalization. Question, research
question: Can we have good personalization and good privacy at the same
time?
In the research that I'm going to present now, we operationalized this question a little bit. We take privacy constraints as a given, so we don't negotiate them, and then try to optimize personalization within these constraints.
In the research that I'm going to show at the end, we will kind of not take this stance anymore.
So we want to give optimal personalization within the given privacy constraints, not outside of those constraints.
We have two major kinds of constraints. One is people's individual privacy
preferences. So what people like and dislike in terms of privacy. And the other
constraint is all sorts of privacy norms. So there's law, there's industry
self-regulation, there's abstract privacy principles.
And we take these into account. These set the boundaries so to speak. And
within these boundaries, we try to reconcile personalization and privacy. And
again, there are two major thrusts in this area. One is the development of privacy-enhancing or privacy-enhanced technology. And some of you are
working in this area.
And then there's privacy-minded user interaction research. And typically, these
two thrusts go hand in hand. So it's not very often that you just do one of those.
Let's talk about those individually. So, the first kind of privacy constraints: users' individual privacy preferences. What do users like and dislike in terms of privacy?
We conducted very many interviews with people about privacy concerns in
different areas. And we found that people have very different individual privacy
preferences. There are commonalities, of course, but variability is very high.
One can make some generalizations. So it's clear that people's privacy concern depends on the type of information that is being sought. People have fewer problems giving out basic demographic information about themselves, like where they live, what city; lifestyle and taste do not raise many privacy concerns. Whereas financial information, contact information, exact address, phone number (specifically for women), credit card and Social Security numbers, of course, raise more privacy concerns.
So this would be one generalization that can be found.
Interestingly enough, the value of the data also plays a role. So there was at least one experiment that showed that people have fewer concerns giving out certain data that are listed below here if their personal data doesn't deviate too much from the perceived average. Once people deviate quite a bit, they become more hesitant to give out this information.
Now, this has been confirmed in a person-to-person privacy situation, but it may also apply to a person-to-organization privacy situation.
Privacy norms would be the second kind of privacy constraint to which we cater. There are quite a few privacy norms around. So more than 50 countries and nearly a hundred states worldwide have privacy laws enacted. The U.S. doesn't have an umbrella privacy law, but as you probably know, there are privacy laws specifically geared towards children's privacy, privacy of online data, and media rentals.
Industries came up with self-regulation. So many companies have internal privacy regulations. Some industry sectors have them, like the U.S. Network Advertising Initiative: members must respect certain privacy regulations.
And there also exist more abstract privacy principles, like those of the OECD or the Asia-Pacific Economic Cooperation. They came up with privacy principles mostly as a basis for the development of national privacy laws.
Membership organizations like the ACM also came up with privacy principles and ask their members to respect those.
The interesting thing about these privacy norms, and specifically about these privacy laws up here, is that they affect quite a bit what a personalized system is allowed to do with and without the consent of the user.
We analyzed about 30 international privacy laws, and here you find provisions from the German privacy law, which is quite strict, or from the Europe-wide general privacy framework that the European countries have to implement, that in some cases severely restrict what you're allowed to do in personalization. So usage logs must be deleted after each session, says the German multimedia law, unless the user agrees to this.
Now, this is a problem, because if you do machine learning based on usage logs, you typically combine the logs from several sessions, because a single session is typically very short and you don't have enough data to be able to learn.
Or usage logs of different services may not be combined. Or European law says that if an important decision is being made, a human must always be in the loop.
So these personalized tutoring systems that high school students use in the U.S. could easily give you an F or flunk you from school or whatever; under this regulation, there must always be a human in the loop.
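Just to make concrete how provisions like these could be handled in software, here is a minimal sketch that encodes a few of them as machine-checkable constraints on whole personalization methods; the jurisdiction keys, property names, and exception handling are simplified assumptions, not the encoding actually used in the project.

```python
# Each rule forbids personalization methods that have a given privacy-relevant
# property, unless a named exception (e.g. user consent) has been granted.
LAW_CONSTRAINTS = {
    "DE": [  # e.g. the German multimedia law: no cross-session usage logs without consent
        {"forbids": "keeps_usage_logs_across_sessions", "unless": "user_consent"},
    ],
    "EU": [  # e.g. the Europe-wide framework
        {"forbids": "combines_logs_of_different_services", "unless": None},
        {"forbids": "automated_important_decision", "unless": "human_in_loop"},
    ],
}

def method_permitted(method_properties, jurisdictions, granted_exceptions):
    """Check one personalization method against all applicable jurisdictions."""
    for jurisdiction in jurisdictions:
        for rule in LAW_CONSTRAINTS.get(jurisdiction, []):
            if rule["forbids"] in method_properties:
                if rule["unless"] is None or rule["unless"] not in granted_exceptions:
                    return False
    return True

# A method that keeps cross-session logs is not permitted for a German user without consent:
print(method_permitted({"keeps_usage_logs_across_sessions"}, ["DE", "EU"], set()))  # False
```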
So privacy laws to some extent severely affect personalization methods and where they may or may not be employed. And as my example, I will now present a method for reconciling privacy and personalization that takes this into account, in the context of an internationally operating website.
We use internationally operating websites because they're subject to many privacy laws no matter where they are located. So European privacy laws impose severe restrictions, so-called export restrictions, on data that is being transmitted outside of the European Union. So even if your servers are in the U.S., you are subject to these export restrictions.
>>: [Indiscernible]? Are you kind of inherently international --
>> Alfred Kobsa: In some way, yes. Yeah. But some companies take this into account and smaller ones don't.
>>: [Indiscernible].
>> Alfred Kobsa: And the large ones do take this into account. So Google does. Yahoo does. Amazon does. Especially once you have a location in the respective country, you are under pressure to really take this into account. But even if major websites do not have a [indiscernible] representation yet, they tend to adjust to national legislation: A, as goodwill; B, they might get a local representation [indiscernible].
>>: [Indiscernible] I think Netflix is a very good example of that for other reasons
[indiscernible].
>> Alfred Kobsa: Yeah. They will now go into Europe. They announced it. But
this is definitely a valid consideration. For instance, what Netflix does in terms of
profiling would not be permitted in Germany without the consent of the user.
>>: A lot depends on whether you're protecting your butt in terms of legal stuff,
right? Because if you're a company in California and you realize you're operating
or selling in California [indiscernible], which is why Amazon has cut a lot of their
affiliate programs because of the various tax-collecting laws in the various states.
They've cut programs in Illinois and California and a whole bunch of states
because they didn't [indiscernible] collect sales tax.
>>: Sure. So you're just talking about people who are targeting international things, because I can put up a website and [indiscernible] log data, personalize stuff.
>>: Right.
>>: And I wouldn't be internationally operating.
>>: Right. You'd be internationally operating, but since you're not targeting it specifically at European people, you can get away with it, like [indiscernible] or if you were to [indiscernible], be like, oh, I'm based in the U.S., it wasn't ever meant, blah, blah, blah.
>>: Yeah.
>> Alfred Kobsa: On top of that internationally operating websites should also
take individual privacy preferences into account. I will show an example later.
But in the future, they may even have to take individual privacy preferences into
account.
Just last week, the Spanish Data Protection Commissioner filed an official request with Google Spain; I think it was 89 or 99 people who wanted certain pieces of information not to be retrievable anymore by Google, for various reasons.
For instance, in Europe, if you commit a crime and you serve your time, then after a certain period of time, any public mention of this fact is prohibited, and for some of those people, it was still retrievable online. So they had a kind of legal right.
In other cases, it was just embarrassing information. And so the Spanish Data
Protection Commission is now negotiating with Google. We'll see how this turns
out.
Okay. Now, actually, this kind of summarizes what we just discussed. Companies react differently. Disney told me their website conforms both to European privacy legislation and to the U.S. COPPA act, which is about children's privacy. So they implement a kind of minimum subset in what they do. Others do country-tailoring. Google does country-tailoring. IBM does country-tailoring. And they try to group different countries, like the German-speaking countries, since they have similar privacy legislation.
The downside with all of this is that it's inflexible and individual privacy preferences are not at all taken into account.
We propose a more flexible way of approaching this problem that is based on the fact that personalization currently is not done very much anymore as part of an application system; rather, there is some separate personalization server, a user-modeling server, that carries out all or most personalization.
So here you see such a server that contains profiles of many, many users. And here you see personalization methods that take data from here, create inferences, and put those inferences back.
And here are the personalized applications that fetch data from and give data to the user-modeling server. So this is the backdrop of our research.
And we now look at the privacy implications of these user-modeling methods that we just saw. Specifically, what kind of data are being requested: Is it just demographic data? Is it user-supplied data? Is it more tracking of users?
These personalization methods have different data requirements and therefore also different privacy implications. And our basic idea is to give each user only those personalization methods whose privacy implications the user consents to and which are permitted by the privacy legislation that may possibly apply to the user. So every user gets their user-tailored privacy, so to speak.
As a tool to facilitate implementation, we use product line architectures that were
originally developed as a common architecture for a set of related products like
operating systems on cell phones. So in a product line architecture, you try to
capture commonalities of all your products, optional parts, and then variants for
your different cell phone models.
And product line architectures were developed with the intent to instantiate a certain architecture for a specific cell phone model at development time or, at the latest, at production time.
Increasingly, though, product line architectures are also being used dynamically
at runtime. And one of my colleagues, André van der Hoek, has been working
on that for many years. And so we try to create a runtime instance of the
personalized system for each user which has the added advantage that people
can change their minds and change their privacy preferences, and we can also cater to that. Yeah?
>>: So there seems to be a bit of a [indiscernible] unless you have some [indiscernible] properties like the [indiscernible] right to forget, I think they call it. [Indiscernible] which is quite difficult to --
>> Alfred Kobsa: Absolutely, yeah. Absolutely.
Let's look at an example so this is not so abstract. Let's assume we have an
internationally operating website, some sort of recommender system that adapts to privacy constraints. Let's look at it more closely. Here are three users:
Bob, Alice, Cheng. And here you have the privacy constraints that apply to these
users. Bob is in the U.S. Alice is in Germany. Cheng is in China.
Since Alice sits in Germany, this multimedia law applies to her, so that profiling is
not allowed except with the consent of the user. The personal data must be
deleted after each user session, except for certain purposes.
Cheng is not covered by a national privacy law in China. But assume he dislikes
being tracked.
And Bob in the U.S. also does not have an umbrella national privacy law, but assume that this recommender system is part of the NAI, which has self-regulation with regard to privacy about, for instance, not being allowed to combine personally identifiable and non-personally identifiable information.
Now, we look at these user-modeling methods that we saw before and check whether they can be used under the privacy constraints of these three users or not.
So Cheng we say doesn't like tracking. And incidentally, the last three methods
all involve tracking. So we are not allowed to use those for Cheng.
And now, we construct a runtime system instance only with the permitted
personalization methods for each user. So in this case, we get three different
instances. If the privacy constraints of two users are the same -- and this
happens quite often -- then these people would share the same architecture.
And those architectures now generate the inferences that get into the user
profiles of those users.
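A minimal sketch of this selection step, using Bob, Alice, and Cheng from the example, might look as follows; the method names, their privacy-relevant properties, and the per-user constraint sets are placeholder assumptions rather than the actual product-line components used in the project.

```python
def build_instance(all_methods, forbidden_properties):
    """Return the personalization methods whose privacy-relevant properties do not
    clash with this user's constraints (legal norms plus individual preferences)."""
    return frozenset(
        name for name, props in all_methods.items()
        if not props & forbidden_properties
    )

# Simplified catalogue: each method is annotated with privacy-relevant properties.
ALL_METHODS = {
    "demographic_group_assignment": frozenset(),
    "business_rules": frozenset(),
    "collaborative_filtering": frozenset({"profiling", "keeps_logs_across_sessions"}),
    "clickstream_mining": frozenset({"tracking", "keeps_logs_across_sessions"}),
}

# Constraint sets derived from the example: Alice falls under German/EU law
# (no profiling, no cross-session logs without consent), Cheng dislikes tracking,
# and Bob's NAI self-regulation is left out here for brevity.
USER_CONSTRAINTS = {
    "Bob": frozenset(),
    "Alice": frozenset({"profiling", "keeps_logs_across_sessions"}),
    "Cheng": frozenset({"tracking"}),
}

instances = {user: build_instance(ALL_METHODS, c) for user, c in USER_CONSTRAINTS.items()}
# Users whose constraints yield the same frozenset could share one runtime instance.
```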
Now, this is a nice idea, and we implemented this idea. Actually we did four
different implementations. And the first thing that we looked at was does this
idea scale? Product line architectures are very complex and you can imagine
dynamic product line architectures at runtime are a little bit problematic. So we
did some simulation with a number of simulation parameters that were all very
conservatively chosen. So we said that our system has ten personalization
methods.
So far I've never even seen a personalized system that uses five or six different personalization methods at the same time.
We restricted the number of privacy constraints to 100. In all these privacy laws,
we hardly found more than 25, 35 restrictions.
And we assumed that our architecture is distributed on a cloud, but we only simulated a single node of this cloud, simply because we did not have a cloud available.
And we came up with these other simulation parameters through pilot experiments. And we simulated this on a relatively cheap platform, so nothing out of the ordinary.
And this is the testbed. I'll skip that. Yeah?
>>: So you said that most of the legislations that you looked at have no more than 50 or so constraints. How does that fit with something like HIPAA, which is also privacy legislation --
>> Alfred Kobsa: Yeah.
>>: -- running into hundreds of pages?
>> Alfred Kobsa: Yeah. Sorry. I shouldn't have said that. We only looked at
national privacy laws. HIPAA is extremely complex, very, very detailed. And I'm quite frankly not sure whether it fits into this framework.
We are dealing here with constraints that apply more on a higher level and where the objects that are permitted or not permitted are whole personalization methods.
HIPAA has tons of very, very specific, formalistic requirements, and we didn't look into it very much. And I kind of fear that we would have trouble pulling it into a shape that allows us to handle it in this framework. The national privacy laws are much different from HIPAA.
There was another question? Yeah?
>>: How do [indiscernible] themselves choose what their preferences are?
>> Alfred Kobsa: I'll show that in a minute.
Here you see performance times. And those are the four implementations that I mentioned. Here you see that we get these spikes here. Also, the mean is half a second for assigning users to an architecture, so this is intolerable.
Here you see a more customized architecture that uses caching. As I mentioned, users who have the same privacy constraints, or even similar privacy constraints, can share an architecture. We use caching for that. And you see that the performance times are much more acceptable in this implementation.
>>: So should we think of this as the latency induced or [indiscernible]?
>> Alfred Kobsa: This is -- what added time -- thanks for asking. What added time can users expect the first time that they're visiting a site, because of our privacy thing that we do here?
>>: And then they would have a [indiscernible] that would set --
>> Alfred Kobsa: No. Their process, their session would be assigned to an architecture.
>>: But it seems like caching goes against some of these other regulations
about, you know, remembering the user --
>> Alfred Kobsa: Yeah. It's not data caching, but caching of architectures. Architecture caching. Imagine every user had their own personalized system that is geared towards their privacy constraints; but if you have a million users, you do not want to have a million instances. So let's combine users that would get the same instance of our general personalized system and give them the same instance. That's the only thing behind it.
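Caching of architectures in this sense could be as simple as keying instantiated systems by the set of permitted methods; a toy sketch, where instantiate_architecture stands in for the real, expensive product-line instantiation step.

```python
_architecture_cache = {}

def instantiate_architecture(permitted_methods):
    """Placeholder for the expensive dynamic product-line instantiation."""
    return {"methods": set(permitted_methods)}

def get_instance(permitted_methods):
    """Users whose privacy constraints permit the same methods share one instance.
    Note that only architectures are cached here, never user data."""
    key = frozenset(permitted_methods)
    if key not in _architecture_cache:
        _architecture_cache[key] = instantiate_architecture(key)
    return _architecture_cache[key]
```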
This is more a relative comparison between the four different implementations.
And here you can see that caching brings quite a bit of efficiency. Those are the two caching versions.
We did a back-of-the-envelope calculation of how much in additional resources would be needed by the largest internationally operating website, which happens to be Google currently. So they have 3.24 billion visits per month, which translates into 1,250 visits per second. And we can handle two visitors per second per node. So we would need about 2,500 nodes, just as a rough calculation. Which is extremely cheap.
Two caveats. One positive caveat: we assumed that all these privacy parameters would be randomly distributed, which is however not the case. People kind of gravitate towards specific, typical privacy preferences, and also some countries have many visitors, so the privacy constraints from those countries would be overrepresented, which increases the benefits of caching. So under these considerations, this estimate might be very conservative. However, we also didn't take into account the effort to keep all these many instances, but we feel that the hosting effort will be roughly the same.
>>: It seems like the list of caveats [indiscernible] because if you look at the systems deployed on a large scale, which you have done, you see all sorts of interesting patterns that emerge, such as precomputation; they do a lot of -- a great deal of caching on various levels, which has benefits [indiscernible] interesting downsides from the standpoint of privacy. Other tricks, like for instance, Google will often throw up a captcha at you or will force you [indiscernible]. So [indiscernible], they could be doing computation in the background, you know, speculatively calculating a page for you that they'll show to you in a second.
>> Alfred Kobsa: Yeah.
>>: So there's tons and tons of stuff --
>> Alfred Kobsa: I would agree with that and will respond directly to this. We feel that major instances, so ones that are being used very often, will have a permanent life of their own and do what you just mentioned by themselves. And this is quite interesting, because then you have personalization geared towards certain user groups. And this becomes quite interesting; I will come back to it at the end of my talk. It does the same as what is currently being done, but within the privacy constraints of a specific user group: for a specific country, for instance, or for people who share certain privacy constraints.
>>: So you have a hundred bits of user privacy preferences.
>> Alfred Kobsa: Yeah.
>>: And you say that many preferences are common, but I bet there's a long tail
of privacy preferences [indiscernible]. You have a hundred bits to identify a user
and track them based on the particular configuration that you use, which may
then, you know, give you information about [indiscernible].
>> Alfred Kobsa: Unfortunately, I have to say we have no data about that. We did a study where we randomly generated such privacy constraints. To my knowledge, there is no real study yet that looks specifically at the long tail of users' individual privacy preferences.
They are very diverse, by all means, and in this case, our architecture kind of
becomes more efficient if there is not so much variability because then caching
has [indiscernible], but in principle we can cater to any combination of privacy
preferences.
So, you asked how we let users specify their privacy preferences. We implemented a book retailer's website that was deliberately made to look similar to some neighborhood bookstore that you have around here. This website asks a couple of nosy questions. And here on the right side, you find a privacy control panel, as we call it, where people can specify their privacy preferences. And then they see here what kind of personalization methods will or will not operate under these preferences. So people can click here and immediately see the effect on some personalization methods. And they can also click on the eye icons here and get further information both about preferences and personalization methods.
And we now wanted to find out whether people appreciate this possibility. Now, how can you find out whether users like it? One way is simply to ask users: show them the original site and the site with the privacy preference panel and ask them which they like better. This is an inquiry-based user study. And it's good because you get information about users' rationale that you would not get if you would simply look at what users are doing.
But on the other hand, as we know, users often tell you what they would do or so, but their behavior eventually does not correspond to the stated intentions. And in the area of privacy, this seems to have a major effect. And therefore, we decided to do both: to run a behavioral experiment but also to ask people about their attitudes.
Now, our experiment is based in some part on deception. And so that you're not possibly shocked by that, I show this slide that essentially says we're in good company with that. Even though it's not frequently used, in the area of human-computer interaction it is being used, mostly in cases where you cannot fully implement the system because it's too time-consuming or impossible given the state of the art, but you nevertheless tell the users they are working with the real system.
And there are different degrees of deception involved, and I listed them here inversely. So in Wizard of Oz experiments, you have a human who secretly does what the system is supposed to do.
You have pretend studies where you implement shortcuts. You tell users they
are working with the real system, but in reality, you implement shortcuts so the
real system does not exist.
And the third one is that you channel users past missing system functionality through clever user interface design and task allocation. That's quite commonly done in human-computer interaction user studies. So our experiment uses pretend methods and also number three, channeling people past missing parts.
We told users that they would have to do a usability test with a new version of this well-known online bookstore. And this new version would first ask users tons of questions and then recommend books to them.
We told them that they would not have any obligation to us to answer any question, but that giving correct answers would improve recommendation quality. We also warned them that the data would be given out to this company. And we told them that at the end they would have the opportunity to buy one of the recommended books for a very cheap price.
And this gave them some incentive to answer those questions truthfully, because it would guarantee that at the end they would get better recommendations and could possibly buy one interesting book at a high discount.
And we also warned them from the beginning that if they decided to buy a book, then they would have to show us their ID and also show us their credit card data for verification.
Now, at first, users answered those questions. Those were mostly book-related. Some of them were a little bit sensitive, like about people's book preferences with regard to their sexual preferences, or people's interest in certain medical areas, including more sensitive medical areas like venereal diseases.
People answered nine pages of questions and were encouraged to review their answers so their recommendations would get better. They then submitted the answers. And I forgot to mention, during this time, this fake book recommendation counter visibly decreased. So at the beginning, one million books were being considered for them, and by answering questions, this counter visibly decreased. But again, this was all fake.
Then they got 50 books recommended at the end. But everyone got the same books recommended, no matter how many questions they had answered, no matter what answers they had given.
This we did in order not to introduce personalization quality as an additional
factor in our experiment. Yeah?
>>: [Indiscernible] assumption that the more questions they answered, that was
how they obtained the 70 percent discount?
>> Alfred Kobsa: No.
>>: [Indiscernible] no strings attached to the 70 percent discount?
>> Alfred Kobsa: No. There are no strings attached. No.
So the motivation for people was if you answer many questions, then the
recommendation set will be better, then you can possibly find the book that is
more interesting for you. So you can make a good bargain if you answer many
questions and answer them truthfully.
>>: [Indiscernible] measure of satisfaction with the recommendation provided
then?
>> Alfred Kobsa: We did. I will come to that.
So we recommended those books to them, and they could pick one of those. The books were chosen to be interesting for our subject audience, which was mostly students: so sex and crime, health advisories, trail guides.
And then they could decide whether or not they wanted a book. And this was an important step, because everything they had done so far, they had done under a pseudonym. They could choose their own pseudonym. But if they decided to buy a book, then they had to identify themselves.
And at the end we checked their data in order to prevent them from sending the book to their parents or girlfriend, boyfriend, whatever. So they had to give their real data, and the threat was that their real data would be associated with all the answers that they had given pseudonymously.
Okay. So this was the experiment. Again, there was no recommender system behind it; it was all pretend.
We administered this experiment to students from my university. We had to discard seven of them, six of them because they looked familiar to the experimenter and we felt that they might behave more privacy-consciously in this situation.
They were randomly assigned to two system versions. One was the version that
you saw with the privacy control panel, and the other was simply without this. So
there was only one factor that varied between those conditions.
And we assumed, based on the literature, that users would be more willing to give out personal data in the version with the privacy control panel and would also judge its privacy-friendliness higher.
>>: Do you take into account the research showing that if a privacy disclaimer is placed at the beginning, people become more privacy conscious and --
>> Alfred Kobsa: Yeah. We took this into account and [indiscernible] mentioning of privacy at the beginning of an experiment. So we sometimes administer standard privacy instruments, and we always do it at the end; we have also found that people behave more privacy-consciously if you bring up the topic of privacy at the beginning already.
All right. Here are the results. This is the control condition without the privacy control panel. Here's the one with the privacy control panel. So people answered more questions without the privacy control panel.
Many questions were multiple-choice questions. They also gave more answers.
And interestingly enough, more people decided to buy a book in this enhanced
condition with the privacy control panel. Even though everyone got the same
recommendations.
Why so? Well, the only explanation that we have is that in order to buy a book, people had to identify themselves and give out their name, address, and confidential
financial data. And they were more willing to do that in a situation that looked
more privacy friendly. We have no other reasonable explanation.
So this means that -- yeah?
>>: Oh, maybe in the next slide you're going to talk about this. So those people who were presented with this panel, did they exercise any options? Did people with --
>> Alfred Kobsa: Yeah. Yeah. So we asked people. We did not -- we did video record the experiment, but did not [indiscernible] that. We asked people. And
here are the answers to questions like "I paid attention to the privacy control panel." Note again, we showed people the privacy control panel, but the main thrust of the experiment and everything that we told users was: this is a new version, you do a usability test, you have to answer questions, your data is being given to the company. We did not really point them to the privacy control panel. But people nevertheless said that they paid attention to it, that they clicked [indiscernible] to get more information, that they set options. 60 percent said so, and 40 percent said they played around with it.
These are not mutually exclusive.
So people at least noticed it, paid attention, played around with it. Some of them deliberately set preferences. And even though we did not, again, specifically point people to this privacy control panel, people found it useful, four out of five on the scale, and said they would use it, so stated usage was quite high.
So it seems, based on this, that this would be quite interesting as a control mechanism for users on websites, to be able to have some control over their privacy.
Some caveats again. Amazon has a high reputation. We tested this before. And reputation plays a role in privacy issues. So we would need to redo this experiment with a site that has a lower reputation.
Also, this privacy control panel was permanently visible all the time. And website designers are not going to allow this; I mean, it takes up quite a lot of screen real estate. But it's unclear, if we just present an icon to users (here's a link to a privacy control panel), whether we will see the same effect, or whether it's necessary that it is always in front of the users to have the effects that I mentioned. Yeah?
>>: To pick on the first part a little bit more, seems like there's a huge
[indiscernible] effect happening here. So if you were, you know, I mean, if this
was done by, imagine, I don't know, [indiscernible]. You wouldn't engage
otherwise. Or your phone company knows where you are most of the time
because you carry a cell phone. Then, you know, then the concern, the level of
concern goes down tremendously because of the concerns you would have
otherwise if you didn't trust [indiscernible]. If you had the fly-by-night books.com, then, you know, it would seem like a different story altogether, and [indiscernible] little you could do to alleviate the distrust you would have with that [indiscernible] organization [indiscernible].
>> Alfred Kobsa: Yeah. I would agree with that. And if you look here, we may have experienced some ceiling effect already. This is a hundred percent of questions answered. So people were not shy about giving at least one answer to every question to Amazon.
So possibly, with a less trustworthy site, the difference between with and without the privacy control panel might get larger, since we would not have a ceiling effect there.
>>: [Indiscernible] also efficient [indiscernible] for somebody to present Amazon
or a Wells Fargo [indiscernible].
>> Alfred Kobsa: Absolutely, yes. Mm-hmm. Yeah.
>>: [Indiscernible].
>> Alfred Kobsa: Yeah. There are various experiments showing that if you simply choose a conservative layout for your site, people's disclosure will significantly increase.
And the last one is maybe most interesting. We are not completely sure whether
people fully understood our privacy control panel. It's very complicated to explain
personalization methods to ordinary users.
The question, though, is: does it really matter? And this brings me back to some other experiment with [indiscernible]. I'm not sure -- I gave a talk on privacy here at MSR five years ago already. And I do not recall whether any of you attended this talk, but if you did, you may possibly think in the back of your minds: didn't we hear about this experiment already five years ago? And yes, you did. But we administered the same experiment twice: five years ago with German students, now again at UCI.
At that time, the manipulations were very different. It was about comprehensible and less comprehensible presentation of a company's privacy policies to users. Here it's about the privacy control panel. Nevertheless, we got fairly comparable results.
Also in the old experiment, disclosure increased. People bought more books.
And of course it's a little bit dangerous to make generalizations based on two
experiments only, but the question certainly arises: Does it really matter what I
do in terms of privacy? Or will I get similar effects whatever reasonable thing I do
in terms of privacy?
Now, so maybe it's not really necessary that people fully understand, but rather it's important to give people control and give people comprehensible disclosure, so that most reasonable privacy measures will all lead to similar increases in disclosure and purchases. But this is mere speculation.
And, to answer your question completely, we also administered this old experiment to sites that have a lesser reputation. We found the same effect: roughly the same increase in purchases and roughly the same increase in disclosure. So it was not the case, as I speculated just before, that for sites with lesser reputation the difference would go up. It was roughly the same.
>>: Are you planning on [indiscernible] as you mentioned before, there is still a difference between knowing that concern --
>> Alfred Kobsa: I would love to do a big test with some of the things that I presented, by all means. Okay.
I'll quickly present -- I'll keep this short and just present this to you. Would you answer this question? This is a recommender system for applications. And it asks: may we know your household income? We can recommend apps that are popular among people with the same income. So it gives an explanation of what the system is going to do with this information.
85 percent of users told us their household income. We varied the numbers a
little bit. Does this kind of -- is this a reason for you to give it out?
Past benefit for others. 94 percent of people in the past who gave us their
household income got better recommendations.
Or finally, the recommendations for you will be 80 percent better. So the
previous one just said something about past experiences for others, but here it's
kind of a prediction for you.
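To show the manipulation side by side, here is a toy sketch of how such disclosure justifications could be attached to a request; the wording just repeats the examples from the talk, and the function and condition names are made up.

```python
import random

JUSTIFICATIONS = {
    "none": "",
    "social_cue": "85 percent of our users told us their household income.",
    "past_benefit_others": ("94 percent of people who gave us their household "
                            "income in the past got better recommendations."),
    "predicted_benefit_you": "The recommendations for you will be 80 percent better.",
}

def disclosure_prompt(field_label, condition=None):
    """Build the request shown to a participant, with a randomly assigned
    (or explicitly chosen) justification condition."""
    if condition is None:
        condition = random.choice(list(JUSTIFICATIONS))
    prompt = f"May we know your {field_label}?"
    if JUSTIFICATIONS[condition]:
        prompt += " " + JUSTIFICATIONS[condition]
    return condition, prompt

print(disclosure_prompt("household income"))
```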
We have the aim to empower users to make better privacy decisions. And in this experiment, we tried to find out what kind of added information is more convincing than others for users to reveal the requested piece of information.
This has not yet been done, at least as far as we are informed. So it would be interesting to see what kind of added information will convince users to give out information about themselves.
And it's not only about this; rather, the whole thing is to test and augment a model for predicting -- evaluating -- the user experience of a recommender system, one that takes satisfaction into account, that takes privacy into account. Also, this is a small experiment in a [indiscernible].
If you still want to participate in this experiment, it's still up. So just go to this web page and you can even win an Amazon gift card. Thank you very much.
[Applause].
>>: [Indiscernible] any more questions [indiscernible]?
>>: Let me ask a question. So going back just a couple of slides, you have this thing about predicted benefit; there seems to be a [indiscernible] but in order to evaluate [indiscernible] benefit, you kind of need to give out that information to maybe sort of [indiscernible]. So what do you -- I mean, you can give rough estimates based on prior performance, but then again, prior performance is not really a predictor of future gains. So what are your thoughts on that? Is there such a thing as showing the user the advantages of selectively sharing information [indiscernible]?
>> Alfred Kobsa: Our thought is to make this recommendation -- so here it says it will be 80 percent better, but of course, it very much depends, as you said, on the value. In general, the more extreme the value of a piece of data is, the better it can be used for improving personalization quality, but the more reluctant people are to give it out.
And our idea is -- but we still have to test it -- to give people examples: if your household income is in this bracket, we typically can improve by 20 percent; in that bracket, by so and so much. But we still have to test this. But this looks like a reasonable approach to us, a first approach to solving this [indiscernible] problem.
>>: Secondly, what is the [indiscernible] from lying about all of this, essentially? The [indiscernible] cues could be completely made up. That's entirely possible.
>> Alfred Kobsa: This is entirely possible. We are aware of that. Yeah.
We are -- well, in this case, in other experiments, we ask people afterward: did you lie about it? And this typically gives a control measure. There are various experiments, however, that showed that if this button is provided -- no, I don't want to give it -- or if it's optional, then people who would otherwise lie chose to just say no in most cases.
So we feel that the data that we would get here will be pretty accurate. We don't look at the data anyway. But we also believe that people will give correct data since they have the option to opt out. Yeah?
>>: I have a question. Do you mind going back to the social piece?
>> Alfred Kobsa: Mm-hmm.
>>: So I'm curious if you're going to look at the difference between cohorts. "85 percent of our users" -- or really that's not even cohorts, that's all users -- versus "people like you," or "your friends," and whether, based on your research or others, you see a distinction in providing those levels of social cues: not just the users of this app, but people like you, or even more so, people you actually know made these decisions.
>> Alfred Kobsa: We did such an experiment in the context of instant messaging.
As you know, there's all sorts of information given out by instant messaging systems that is privacy-related, like when someone is online or offline.
We did, again, a pretend experiment where we told people that they were going to evaluate an installer for an instant messaging system. And in the installation process, they were asked to log in to the IM system that they really used, and we would import their name, and the names and IDs of their friends, from their IM system.
And then we told them, for each privacy option -- we gave them a couple of privacy options -- that so and so many of their friends had chosen this option or that option. We varied the numbers. We also allowed them to exclude people, so that certain people cannot see their online status.
We found that there was some effect, but it was much smaller than the effect of
each individual privacy option.
So there was a noticeable effect. Some privacy options simply were more privacy-intrusive than others, and this had a far higher effect on people's privacy choices than what we claimed their friends had done.
So these social navigation cues seemingly only have small effects. And there was some other experiment in the area of security that had similar findings: essentially, only if you really put it under people's noses were they able to find a statistically significant effect.
>>: If you used people's Facebook friends and showed the faces of the friends that had made the decision, you could measure how much impact that might have.
>> Alfred Kobsa: Yeah.
>>: I guess it would be substantial.
>> Alfred Kobsa: Yeah. It could be that if you use some other context, like Facebook, the results are different. But in our IM context, there was an effect, even statistically approaching significance, but it was not very strong.
>>: The other question, if I may, I wanted to ask was about the mental model of
users and about the data sharing. Most of the work that you showed, it was
really a notion of you give the data away and then it's gone. Now it goes to some
third party and they basically can do what they want.
Have you looked in your work at all at the model where the user retains some level of persistent control over the information, and the request is per use or per given context, and the ultimate sort of -- let's say the record lives with the user; that they really become the source of truth for the information rather than the information [indiscernible].
>> Alfred Kobsa: Together with German Ph.D. students, I was involved [indiscernible] and we developed a system that allows people, A, to keep track of what data they had given to what website, which is an important first step because this is quickly forgotten; and also to request some data to be deleted, which you can do under European privacy law.
And we did some user evaluation and people liked it, yeah, but we did not expect anything else. We did not run a behavioral study. It's just about people's liking, and yes, people would use that, but this was a kind of quite obvious result.
>>: I guess the question I'm asking is something that a number of folks now are
trying to understand is whether the mental model of the data ultimately goes to
the third party from you and it's gone, versus it's really an asset that you yourself have access to, and perpetual, persistent rights over the course of a lifetime
even, and that the equity that you create in your digital activity grows over time
and it's something you want to curate and that would have value. And if the
relationships in each of these individual transactions aren't the end of the story,
they're part of a larger story about your digital life.
>> Alfred Kobsa: Yeah.
>>: I mean, what are --
>> Alfred Kobsa: Yeah, I mean, you touch on the area of identity management, where similar ideas are being proposed. Again, there are big projects funded by the European Commission where industry was involved, quite a lot of industry even. And the idea is that you are the owner of your data; websites are just stewards of your data who use your data for certain business purposes. And that's it.
It's also essentially the philosophy behind much of European privacy legislation that, as a company, you can only keep data as long as it's needed for the business purposes for which the data was originally collected.
>>: [Indiscernible] architecture of the Internet with very few exceptions and some research projects --
>> Alfred Kobsa: [Indiscernible], exactly.
>>: Right. But there isn't a place to see all of the data that you're actually creating. It's dispersed by definition. And so I'm wondering, in your work, whether you think that will create a different response from users if they have that --
>>: If they can preempt --
>>: Oh, sorry.
>>: -- just a second. [Indiscernible] but our [indiscernible]. So you are most
welcome to join us and continue this conversation at the next period.
>>: Okay. [Applause.]