>> Ben Lefschetz: Okay. So let's get started. I'm Ben Lefschetz (phonetic). Thank you very much for coming here this morning, coming out so early in the morning for some of you. So today I'm delighted to present to you Alfred Kobsa from UC Irvine. Alfred will talk about a selection of topics, as far as I understand, an overview talk covering topics from privacy-enhanced personalization, a topic that many of us here have been growing increasingly interested in over the past, I don't know, maybe a year or two, but actually Alfred's group has been working on these topics for quite a bit longer than that. Thank you very much. >> Alfred Kobsa: Thank you. My talk is about user-tailored privacy for personalized websites. And here's a kind of roadmap. Just to be on the safe side, I will first briefly talk about personalization, then briefly about privacy as related to personalization, then about privacy-enhanced personalization. And then I would like to present a project that is nearly completed, where we worked on a system that gives privacy-enhanced personalization for internationally operating websites. And depending on the time remaining, I will also discuss two new projects of mine, and I understand I must present at least one of those. This diagram is from the past millennium. It shows personalization in 1998, 1999. Forrester Research did a survey among the 44 pioneering companies at that time that had some personalization implemented at their websites, and you see tailored e-mail alerts were very popular as a personalization method at the time. Customized content [indiscernible], wish lists, and recommendations already existed. And Amazon even experimented with custom pricing but quickly gave it up. So this is kind of old personalization. Now we have far more advanced forms of personalization. We have recommenders for everything: music, movies, books, news articles. We have personalized search. Some search engines even have [indiscernible] by default. We have personalized ads on the web pages that we visit. We have quite a lot of personalization in the educational area, so hundreds of thousands of U.S. high school students already work with tutoring software, primarily in math, that builds models of what these students know and do not know and tailors its teaching strategies accordingly. Then we increasingly have personalization on mobile devices, where location is taken into account, and to some extent even habits. We see more and more the content of information being adapted to visitors' presumed knowledge and expertise, and, also on the horizon, to media preferences, so whether people see pictures, or rather text, or rather movies. So this is kind of the forefront of personalization today. How do personalized systems do that? Just a one-minute summary. There are two main data sources. One is user input that has been requested from users. And the other main source is user interaction logs, so the system watches what the user is doing and keeps a log of that. Those are the primary data sources. Based on that, all sorts of inferences are being drawn, and one can loosely group these into three different areas. One is assignment of users to groups, based on marketing research -- so these are customer segments that have been identified -- or on human-computer interaction research. Then there are still rule-based inferences around, where websites apply business rules; these are typically also based on marketing research. 
The majority of inference methods nowadays, however, is based on machine learning, so algorithms learn about users and can then make predictions about them. The primary input data and the results of those inferences are stored in a persistent user profile. So user-model-based personalization is typically not a one-shot process anymore, but rather a process that involves many user sessions and is updated all the time. Yeah? >>: Question about the example you have on this page over here. It says: recommend this product, a dish, because you like [indiscernible] cooking. Do you feel that that explanation is productive, counterproductive, or does it depend? >> Alfred Kobsa: I'll answer your question at the end. We don't know yet, but we will know in two weeks. I'm kidding. All right. There were quite a few surveys and user studies over the past 12, 13 years which quite consistently showed that personalization, specifically on the web, delivers benefits for both users and vendors. So already back in the 1990s, market research firms could show that personalized websites are more sticky than nonpersonalized websites and convert more visitors into buyers. And there were surveys among users who stated that they liked personalization and were even willing to spend some very limited amount of time to improve personalization. And also one could see that personalization leads to more -- more downloads, more purchases. Amazon generates quite an impressive amount of additional sales through the recommendations that it gives. And also, there's quite some evidence that personalized ads are more often clicked through than nonpersonalized ads. This would look like a win-win situation, both for users and for companies that provide personalization, if there were not this privacy thingy. So personalized websites need to collect a substantial amount of personal data about their users in order to be able to personalize. And they also do this in a fairly inconspicuous manner, so people don't see that they're being tracked. And users don't like that. There is tons of evidence over the past ten years, even longer, that people are concerned about divulging information about themselves, about being tracked online; that they're not only concerned, but that they also take actions like leaving websites, faking registration information -- you're not alone if you do that -- and not buying from sites because of privacy concerns. So about a hundred surveys have been made over the past ten years that kind of consistently show that. The numbers are slightly different. The organizations that administer these surveys are also different, ranging from marketing research firms to the American Association of Retired Persons to universities. But the result from these surveys is fairly consistent: People are concerned and don't like it. Now, one caveat. There has already been some criticism that these surveys may be a little bit biased. And Harper and Singleton put this very harshly: surveys generally, and privacy surveys in particular, suffer from the talk-is-cheap problem. It costs the consumer nothing to express a desire for a law to protect privacy. After all, who would not state that he or she is concerned in some sense about privacy? So this is a little bit cynical, maybe. But experiments have shown that there is some truth in that. 
So there were a couple of experiments conducted where people were first asked about their privacy attitudes, their concerns, and then they were put into a concrete situation where they had to make privacy-relevant decisions, like buying something where they were asked to give out data about themselves. And it turned out that people quite often did other things than what they had originally stated. So there's a gap between people's stated privacy attitudes and what they then do in a concrete situation. I will come back to that in a second. Okay. So we have, on the one hand, that personalization requires personal data, but users are reluctant to give out personal data. And this looks like a tradeoff: the more privacy I want, the less personalization quality I can get, and vice versa. And therefore, on the web, in trade magazines, you often find discussions about finding the right balance between personalization and privacy, privacy versus personalization. But it does not have to be an either/or. Current research shows that the relationship between privacy and personalization is quite indirect. It is very much situation-dependent. It's influenced, mitigated by quite a number of factors. And so you see here a couple of example models that have been proposed that kind of try to show the relationship between privacy and personalization, depending on other factors. And you see there are quite a lot of influencing factors. So the idea nowadays is that people use some sort of privacy calculus in making privacy decisions and thereby take quite a few factors into account. And people are not very good at verbalizing or predicting in advance what their privacy calculus would yield as a result if you ask them. Yeah? >>: I don't see the studies. Do you have any? Can you point me to studies where they've examined users and showed that this calculus was taking place? >> Alfred Kobsa: Yeah. I can point you to studies, yeah. So this is Jaloppa (phonetic), or the structural model that he developed. This one is unpublished work by myself. But we can talk afterwards and I can point you to some literature. It's mostly in the social science literature. Okay. Now, if there is no such direct, stringent relationship between privacy and personalization, then it makes sense to search for personalization with more privacy. Some of you do work on personalization methods that are more privacy-friendly than the standard personalization methods that are currently around. And sometimes even without any loss in quality of personalization, sometimes possibly with some loss. So we are not bound to this line here in our search for personalization algorithms, but rather we can look a little bit upwards, look for more privacy with slightly reduced personalization quality or ideally the same personalization quality. Yeah? >>: So I understand that [indiscernible] purposes, but at the same time, I wonder if there is a [indiscernible] there in trying to measure privacy on some sort of a continuum, because in my experience, it tends to be one of these very psychologically driven sorts of notions, with huge gaps, right, [indiscernible] I mean, if you push me, I'm willing to sort of make this jump to the next, you know, drop of privacy or something like that. So it's very difficult to define some sort of a continuum that is in fact measurable. >> Alfred Kobsa: Yeah. So we and other people do ask users about their privacy impression of a new system, and you get answers, but those answers are typically highly correlated with other attitudes about the system. 
So for instance, trust in a system is a very important factor. And those are typically very much correlated. So our lesson was not to ask the privacy question in isolation but to always take other factors into account, including, for instance, recommendation quality, which is also highly correlated with trust and interacts with privacy. So it's not one-dimensional. You can ask one-dimensionally, but it's not the full answer. This is the goal of privacy-enhanced personalization. The research question: Can we have good personalization and good privacy at the same time? In the research that I'm going to present now, we operationalized this question a little bit. We take privacy constraints as a given, so we don't negotiate them, and then try to optimize personalization within these constraints. In the research that I'm going to show at the end, we will not quite take this stance anymore. So here we want to give optimal personalization within the given privacy constraints, not beyond those constraints. We have two major kinds of constraints. One is people's individual privacy preferences, so what people like and dislike in terms of privacy. And the other constraint is all sorts of privacy norms: there's law, there's industry self-regulation, there are abstract privacy principles. And we take these into account. These set the boundaries, so to speak. And within these boundaries, we try to reconcile personalization and privacy. And again, there are two major thrusts in this area. One is the development of privacy-enhancing or privacy-enhanced technology. And some of you are working in this area. And then there's privacy-minded user interaction research. And typically, these two thrusts go hand in hand, so it's not very often that you just do one of those. Let's talk about those individually. So the first kind of privacy constraint first: users' individual privacy preferences. What do users like and dislike in terms of privacy? We conducted very many interviews with people about privacy concerns in different areas. And we found that people have very different individual privacy preferences. There are commonalities, of course, but variability is very high. One can make some generalizations. So it's clear that people's privacy concern depends on the type of information that is being sought. People have fewer problems giving out basic demographic information about themselves, like where they live, what city; lifestyle and taste do not raise many privacy concerns. Whereas financial information, contact information -- exact address, phone number, specifically for women -- credit card and Social Security numbers, of course, raise more privacy concerns. So this would be one generalization that can be found. Interestingly enough, the value of the data plays a role. So there was at least one experiment that showed that people have fewer concerns giving out certain data that are listed below here if their personal data doesn't deviate too much from the perceived average. Once people deviate quite a bit, they become more hesitant to give out this information. Now, this has been confirmed in a person-to-person privacy situation, but it may also apply to a person-to-organization privacy situation. Privacy norms would be the second kind of privacy constraint to which we cater. There are quite a few privacy norms around. So more than 50 countries and nearly a hundred states worldwide have privacy laws enacted. The U.S. 
doesn't have an umbrella privacy law, but as you probably know, there are privacy laws specifically geared towards children's privacy, privacy of online data, and media rentals. Industries came up with self-regulation. So many companies have internal privacy regulations. Some industry sectors have them, like the U.S. Network Advertising Initiative: members must respect certain privacy regulations. And there also exist more abstract privacy principles, like those of the OECD or the Asia-Pacific Economic Cooperation. They came up with privacy principles mostly as a basis for the development of national privacy laws. But professional organizations like the ACM also came up with privacy principles and ask their members to respect those. The interesting thing about these privacy norms, and specifically these privacy laws, is that they affect quite a bit what a personalized system is allowed to do with and without the consent of the user. We analyzed about 30 international privacy laws, and here you find provisions from the German privacy law, which is quite strict, or from the Europe-wide general privacy framework that the European countries have to implement, which in some cases severely restrict what you're allowed to do in personalization. So usage logs must be deleted after each session, says the German multimedia law, unless the user agrees otherwise. Now, this is a problem, because if you do machine learning based on user logs, you typically combine the logs from several sessions, because a single session is typically very short and you don't have enough data to be able to learn. Or usage logs of different services may not be combined. Or European law says that if an important decision is being made, a human must always be in the loop. So these personalized tutoring systems that high school students use in the U.S., which could easily give you an F or flunk you from school or whatever -- under this regulation, there must always be a human in the loop. So privacy laws to some extent severely affect personalization methods and whether they may be employed or not. And in my example, I will now present a method for reconciling privacy and personalization that takes this into account in the context of an internationally operating website. We use internationally operating websites because they're subject to many privacy laws no matter where they are located. So European privacy laws impose severe restrictions -- they're called export restrictions -- on data that is being transmitted outside of the European Union. So even if your service is in the U.S., you are subject to these export restrictions. >>: [Indiscernible]? Are you kind of inherently international ->> Alfred Kobsa: In some way, yes. Yeah. But some companies take this into account and smaller ones don't. >>: [Indiscernible]. >> Alfred Kobsa: And the large ones do take this into account. So Google does. Yahoo does. Amazon does. Especially once you have a location in the respective country, you are under pressure to really take this into account. But even if major websites do not have a [indiscernible] representation yet, they tend to adjust to national legislation. A, as goodwill. B, they might get a local representation [indiscernible]. >>: [Indiscernible] I think Netflix is a very good example of that for other reasons [indiscernible]. >> Alfred Kobsa: Yeah. They will now go into Europe. They announced it. But this is definitely a valid consideration. 
For instance, what Netflix does in terms of profiling would not be permitted in Germany without the consent of the user. >>: A lot depends on whether you're protecting your butt in terms of legal stuff, right? Because if you're a company in California and you realize you're operating or selling in California [indiscernible], which is why Amazon has cut a lot of their affiliate programs because of the various tax-collecting laws in the various states. They've cut programs in Illinois and California and a whole bunch of states because they didn't [indiscernible] collect sales tax. >>: Sure. So you're just talking about people who are taking international things into account, because I can put up a website and [indiscernible] log data, personalized stuff. >>: Right. >>: And I wouldn't be internationally operating. >>: Right. You'd be internationally operating, but since you're not targeting it specifically at European people, you can get away with it, like [indiscernible] or if you were to [indiscernible], be like, oh, I'm based in the U.S., it wasn't ever meant, blah, blah, blah. >>: Yeah. >> Alfred Kobsa: On top of that, internationally operating websites should also take individual privacy preferences into account. I will show an example later. But in the future, they may even have to take individual privacy preferences into account. Just last week, the Spanish data protection commissioner filed an official request with Google Spain: I think it was 89 or 99 people who wanted certain pieces of information not to be retrievable anymore by Google, for various reasons. For instance, in Europe, if you commit a crime and you serve your time, then after a certain period of time, any public mention of this fact is prohibited, and for some of those people, it was still retrievable online. So they had a kind of legal right. In other cases, it was just embarrassing information. And so the Spanish data protection commission is now negotiating with Google. We'll see how this turns out. Okay. Now, actually, this kind of summarizes what we just discussed. Companies react differently. Disney told me their website conforms both to European privacy legislation and to the U.S. COPPA act, which is about children's privacy. So they implement a kind of minimum subset of what they do. Others do country-tailoring. Google does country-tailoring. IBM does country-tailoring. And they try to group different countries, like the German-speaking countries, since they have similar privacy legislation. The downside with all of this is that it's inflexible and individual privacy preferences are not at all taken into account. We propose a more flexible way of approaching this problem that is based on the fact that personalization currently is not done very much anymore as part of the application system, but rather that there is some separate personalization server, a user-modeling server, that carries out all or most of the personalization. So here you see such a server that contains profiles of many, many users. And here you see personalization methods that take data from here, create inferences, and put those inferences back. And here are the personalized applications that fetch data from and give data to the user-modeling server. So this is the backdrop of our research. And we now look at the privacy implications of the user-modeling methods that we just saw. Specifically, what kind of data are being requested: Is it just demographic data? Is it user-supplied data? Is it more tracking of users? 
And these personalization methods have different data requirements and therefore also different privacy implications. And our basic idea is to give each user only those personalization methods whose privacy implications the user consents to and which are permitted under the privacy legislation that may apply to the user. So every user gets their user-tailored privacy, so to speak. As a tool to facilitate implementation, we use product line architectures, which were originally developed as a common architecture for a set of related products, like operating systems on cell phones. So in a product line architecture, you try to capture the commonalities of all your products, optional parts, and then variants for your different cell phone models. And product line architectures were developed with the intent to instantiate a certain architecture for a specific cell phone model at development time or, at the latest, at production time. Increasingly, though, product line architectures are also being used dynamically at runtime. And one of my colleagues, André van der Hoek, has been working on that for many years. And so we try to create a runtime instance of the personalized system for each user, which has the added advantage that people can change their minds and change their privacy preferences, and we can also cater to that. Yeah? >>: So there seems to be a bit of a [indiscernible] unless you have some [indiscernible] properties like the [indiscernible] right to forget, I think they call it. [Indiscernible] which is quite difficult to ->> Alfred Kobsa: Absolutely, yeah. Absolutely. Let's look at an example so this is not so abstract. Let's assume we have an internationally operating website, some sort of recommender system that adapts to privacy constraints. Let's look at it more closely. Here are three users: Bob, Alice, Cheng. And here you have the privacy constraints that apply to these users. Bob is in the U.S. Alice is in Germany. Cheng is in China. Since Alice sits in Germany, this multimedia law applies to her, so that profiling is not allowed except with the consent of the user, and personal data must be deleted after each user session, except for certain purposes. Cheng does not have a national privacy law in China, but assume he dislikes being tracked. And Bob in the U.S. also does not have an umbrella national privacy law, but assume that this recommender system is part of the NAI, which has self-regulation with regard to privacy -- for instance, not being allowed to combine personally identifiable and non-personally identifiable information. Now, we look at these user-modeling methods that we saw before and check whether they can be used under the privacy constraints of these three users or not. So Cheng, we said, doesn't like tracking. And incidentally, the last three methods all involve tracking, so we are not allowed to use those for Cheng. And now we construct a runtime system instance with only the permitted personalization methods for each user. So in this case, we get three different instances. If the privacy constraints of two users are the same -- and this happens quite often -- then these people would share the same architecture. And those architectures now generate the inferences that go into the user profiles of those users. Now, this is a nice idea, and we implemented it. Actually, we did four different implementations. And the first thing that we looked at was: does this idea scale? 
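A minimal sketch of this constraint-filtering idea, in Python, may help make the example concrete. All method names, implication labels, and constraint sets below are invented for illustration and are not the system's actual vocabulary or rule encoding: each personalization method declares its privacy implications, each user brings the set of implications that law, self-regulation, or personal preference forbids, and only the methods that avoid every forbidden implication end up in that user's runtime instance.

    # Illustrative sketch only; names are hypothetical, not from the actual system.
    METHODS = {
        "demographic_segmentation": {"uses_demographics"},
        "rule_based_offers":        {"uses_demographics", "combines_pii_and_non_pii"},
        "clickstream_learning":     {"tracks_user", "keeps_logs_across_sessions"},
        "cross_service_profiling":  {"tracks_user", "combines_service_logs"},
    }

    # Each user's constraints are modeled as a set of forbidden implications.
    USERS = {
        "Bob":   {"combines_pii_and_non_pii"},                             # NAI self-regulation
        "Alice": {"keeps_logs_across_sessions", "combines_service_logs"},  # German multimedia law
        "Cheng": {"tracks_user"},                                          # personal dislike of tracking
    }

    def permitted_methods(forbidden: set) -> frozenset:
        """Keep only methods whose privacy implications avoid everything forbidden."""
        return frozenset(
            name for name, implications in METHODS.items()
            if not (implications & forbidden)
        )

    for user, forbidden in USERS.items():
        print(user, sorted(permitted_methods(forbidden)))

The permitted set computed here stands in for what the product-line machinery would turn into a runtime instance; users who end up with the same set can share one instance, which is what the caching discussed next exploits.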
Product line architectures are very complex, and you can imagine that dynamic product line architectures at runtime are a little bit problematic. So we did some simulations with a number of simulation parameters that were all very conservatively chosen. So we said that our system has ten personalization methods. So far I've never seen a personalized system that uses more than five or six different personalization methods at the same time. We restricted the number of privacy constraints to 100. In all these privacy laws, we hardly found more than 25, 35 restrictions. And we assumed that our architecture is distributed on a cloud, but we only simulated a single node of this cloud, simply because we did not have a cloud available. And we came up with the other simulation parameters through pilot experiments. And we simulated this on a relatively cheap platform, so nothing out of the ordinary. And this is the testbed. I'll skip that. Yeah? >>: So you said that most of the legislation that you looked at has no more than 50 or so constraints. How does that fit with something like HIPAA, which is also privacy legislation ->> Alfred Kobsa: Yeah. >>: -- running into hundreds of pages? >> Alfred Kobsa: Yeah. Sorry. I shouldn't have said that. We only looked at national privacy laws. HIPAA is extremely complex, very, very detailed. And I'm quite frankly not sure whether it fits into this framework. We are dealing here with constraints that apply more at a higher level, where the objects that are permitted or not permitted are whole personalization methods. HIPAA has tons of very, very specific formalistic requirements, and we didn't look into it very much. And I kind of fear that we would have trouble pulling it into a shape that allows us to handle it this way. Those national privacy laws are much different from HIPAA. There was another question? Yeah? >>: How do [indiscernible] themselves choose what their preferences are? >> Alfred Kobsa: I'll show that in a minute. Here you see performance times. And those are the four implementations that I mentioned. Here you see that we get these spikes here. Also, the mean is half a second for assigning users to an architecture, so this is intolerable. Here you see a more customized architecture that uses caching. So I mentioned that users who have the same privacy constraints, or even similar privacy constraints, can share an architecture. We use caching for that. And you see that the performance times are much more acceptable in this implementation. >>: So should we think about this as the latency induced, or [indiscernible]? >> Alfred Kobsa: This is the -- what added time -- thanks for asking. What added time can users expect the first time that they're visiting a site, because of our privacy thing that we do here? >>: And then they would have a [indiscernible] that would set ->> Alfred Kobsa: No. Their process, their session, would be assigned to an architecture. >>: But it seems like caching goes against some of these other regulations about, you know, remembering the user ->> Alfred Kobsa: Yeah. It's not data caching, but caching of architectures. Architecture caching. Imagine every user had their own personalized system geared towards their privacy constraints; if you have a million users, you do not want to have a million instances. So let's group users that would get the same instance of our general personalized system and give them the same instance. That's the only thing behind it. This is more a relative comparison between the four different implementations. 
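A small follow-on sketch of the architecture-caching idea, again with hypothetical names: the set of permitted personalization methods acts as the cache key, so any two sessions whose privacy constraints allow exactly the same methods are served by one shared runtime instance rather than two.

    from functools import lru_cache

    # Sketch only: a string tag stands in for the product-line instance that
    # the real system would assemble at runtime.
    @lru_cache(maxsize=None)
    def architecture_for(permitted: frozenset) -> str:
        return "instance[" + ", ".join(sorted(permitted)) + "]"

    # Two sessions with identical permitted-method sets get the very same
    # cached object back, i.e. they share one architecture instance.
    a = architecture_for(frozenset({"demographic_segmentation", "rule_based_offers"}))
    b = architecture_for(frozenset({"demographic_segmentation", "rule_based_offers"}))
    assert a is b

The skew mentioned below works in this cache's favor: the more users cluster around a few common constraint profiles, the higher the hit rate.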
And here you can see that caching brings quite a bit of efficiency. Those are the two caching versions. We did a back-of-the-envelope calculation of how many additional resources would be needed by the largest internationally operating website, which happens to be Google currently. So they have 3.24 billion visits per month, which translates into 1,250 visits per second. And we can handle two visitors per second on a node. So we would need about 2,500 nodes, just as a rough calculation, which is extremely cheap. Two caveats. One positive caveat: we assumed that all these privacy parameters would be randomly distributed, which is however not the case. People kind of gravitate towards specific, typical privacy preferences, and also some countries have many visitors, so the privacy constraints from those countries would be overrepresented, which increases the benefits of caching. So under these considerations, this estimate might be very conservative. However, we also didn't take into account the effort to keep all these many instances, but we feel that the hosting effort will be roughly the same. >>: It seems like the list of caveats [indiscernible], because if you look at the systems deployed on a large scale, which you have done, you see all sorts of interesting patterns that emerge, such as precomputation -- they do a lot of -- a great deal of caching on various levels, which has benefits [indiscernible] interesting downsides from the standpoint of privacy. Other tricks like, for instance, Google will often throw up a captcha at you or will force you [indiscernible]. So [indiscernible], they could be doing computation in the background, you know, speculatively calculating a page for you that they'll show to you in a second. >> Alfred Kobsa: Yeah. >>: So there's tons and tons of stuff ->> Alfred Kobsa: I would agree with that and will respond directly to this. We feel that major instances, so ones that are being used very often, will have a permanent life of their own and do what you just mentioned by themselves. And this is quite interesting, because then you have personalization geared towards certain user groups. And this becomes quite interesting. I will come back to it at the end of my talk. It does the same as what is currently being done, but within the privacy constraints of a specific user group, for a specific country, for instance, or for people who share certain privacy constraints. >>: So you have a hundred bits of user privacy preferences. >> Alfred Kobsa: Yeah. >>: And you say that many preferences are common, but I bet there's a long tail of privacy preferences [indiscernible]. You have a hundred bits to identify users and track them based on the particular configuration that they use, which may then, you know, give you information about [indiscernible]. >> Alfred Kobsa: Unfortunately, I have to say we have no data about that. We did a study where we randomly generated such privacy constraints. To my knowledge, there is no real study yet that specifically looks at the long tail of users' individual privacy preferences. They are very diverse, by all means, and in this case, our architecture kind of becomes more efficient if there is not so much variability, because then caching has [indiscernible], but in principle we can cater to any combination of privacy preferences. 
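As a side note, here is a minimal sketch of the arithmetic behind the capacity estimate mentioned above. The monthly-visit figure is the one quoted in the talk; the per-node throughput is the single-node simulation figure and should be read as an assumption, since a real deployment would provision for peaks rather than averages.

    # Rough re-derivation of the back-of-the-envelope capacity estimate.
    SECONDS_PER_MONTH = 30 * 24 * 3600                # about 2.6 million seconds

    monthly_visits = 3.24e9                           # visits per month, as quoted
    visits_per_second = monthly_visits / SECONDS_PER_MONTH
    print(round(visits_per_second))                   # roughly 1,250 visits per second

    per_node_rate = 2                                 # visitors per second per node (assumption)
    print(round(visits_per_second / per_node_rate))   # a few hundred nodes for the average load;
                                                      # the talk quotes a rougher ~2,500 nodes,
                                                      # and either way the cluster is cheap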
So you asked how users can specify their privacy preferences. We implemented a book retailer website that was deliberately made to look similar to a certain neighborhood bookstore that you have around here. This website asks a couple of nosy questions. And here on the right side, you find the privacy control panel, as we call it, where people can specify privacy preferences. And then they see here what kinds of personalization methods will operate or will not operate under these preferences. So people can click here and immediately see the effect on some personalization methods. And they can also click on the 'i' icons here and get further information, both about the preferences and about the personalization methods. And we now wanted to find out whether people appreciate this possibility. Now, how can you find out whether users like it? One way is simply to ask users: show them the original site and the site with the privacy preference panel and ask them which they like better. This is an inquiry-based user study. And it's good because you get information about users' rationale that you would not get if you simply looked at what users are doing. But on the other hand, as we know, users often tell you what they would do, but their eventual behavior does not correspond to the stated intentions. And in the area of privacy, this seems to be a major effect. And therefore, we decided to do both: to run a behavioral experiment but also to ask people about their attitudes. Now, our experiment is based in some part on deception. And so that you're not shocked by that, I show this slide that essentially says we're in good company with that. Even though it's not frequently used, in the area of human-computer interaction it is being used, mostly in cases where you cannot fully implement the system because it's too time-consuming or impossible given the state of the art, but you nevertheless tell the users they are working with the real system. And there are different degrees of deception involved, and I listed them here in inverse order. So Wizard of Oz experiments: there you have a human who secretly does what the system is supposed to do. You have pretend studies where you implement shortcuts: you tell users they are working with the real system, but in reality you implement shortcuts, so the real system does not exist. And the third one is that you channel users past missing system functionality through clever user interface design and task allocation. That's quite commonly done in human-computer interaction user studies. So our experiment uses pretend methods and also number three, channeling people past missing parts. We told users that they would have to do a usability test with a new version of this well-known online bookstore. And this new version would first ask users tons of questions and then recommend books to them. We told them that they would not have any obligation to us to answer any question, but giving correct answers would improve recommendation quality. We also warned them that their data would be given out to this company. And we told them that at the end they would have the opportunity to buy one of the recommended books for a very cheap price. And this gave them some incentive to answer those questions truthfully, because this would guarantee that at the end, they would get better recommendations and could possibly buy one interesting book at a high discount. 
And we also warned them from the beginning that if they decided to buy a book, then they would have to show us their ID and also their credit card data for verification. Now, at first, users answered those questions. Those were mostly book-related. Some of them were a little bit sensitive, like about people's book preferences with regard to their sexual persuasion, or people's interest in certain medical areas, including more sensitive medical areas like venereal diseases. People answered nine pages of questions and were encouraged to review their answers so their recommendations would get better. They then submitted the answers. And I forgot to mention, during this time, a fake book recommendation counter visibly decreased. So at the beginning, one million books were being considered for them, and by answering questions, this counter visibly decreased. But again, this was all fake. Then they got 50 books recommended at the end. But everyone got the same books recommended, no matter how many questions they had answered, no matter what answers they had given. This we did in order not to introduce personalization quality as an additional factor in our experiment. Yeah? >>: [Indiscernible] assumption that the more questions they answered, that was how they obtained the 70 percent discount? >> Alfred Kobsa: No. >>: [Indiscernible] no strings tied to the 70 percent discount? >> Alfred Kobsa: No. There were no strings attached. No. So the motivation for people was: if you answer many questions, then the recommendation set will be better, and then you can possibly find a book that is more interesting for you. So you can make a good bargain if you answer many questions and answer them truthfully. >>: [Indiscernible] measure of satisfaction with the recommendations provided then? >> Alfred Kobsa: We did. I will come back to that. So we recommended those books to them, and they could pick one of those. The books were chosen to be interesting for our subject audience, which was mostly students, so sex and crime, health advisories, trail guides. And then they could decide whether or not they wanted a book. And this was an important step, because everything done so far they had done under a pseudonym. They could choose their own pseudonym. But if they decided to buy a book, then they had to identify themselves. And at the end we checked their data in order to prevent them from, say, sending the book to their parents or girlfriend, boyfriend, whatever. So they had to give their real data, and the threat was that their real data would be associated with all the answers that they had given pseudonymously. Okay. So this was the experiment. Again, there was no recommender behind it; it was all pretend. We administered this experiment to students from my university. We had to discard seven of them, six of them because they were familiar to the experimenter and we felt that they might behave more privacy-consciously in this situation. They were randomly assigned to two system versions. One was the version that you saw with the privacy control panel, and the other was simply without it. So there was only one factor that varied between those conditions. And we assumed, based on the literature, that users would be more willing to give out personal data in the version with the privacy control panel and would also judge its privacy-friendliness higher. 
>>: Do you take into account the research showing that if a privacy disclaimer is placed at the beginning, people become more privacy conscious and ->> Alfred Kobsa: Yeah. We took this into account and [indiscernible] mentioning of privacy at the beginning of the experiment. So when we sometimes administer standard privacy instruments, we always do it at the end; we have also made the experience that people behave more privacy-consciously if you bring up the topic of privacy at the beginning already. All right. Here are the results. This is the control condition without the privacy control panel. Here's with the privacy control panel. So with the privacy control panel, people answered more questions than without it. Many questions were multiple-choice questions. They also gave more answers. And interestingly enough, more people decided to buy a book in this enhanced condition with the privacy control panel, even though everyone got the same recommendations. Why so? Well, the only explanation that we have is that in order to buy a book, people had to identify themselves, give out their name, address, and confidential financial data. And they were more willing to do that in a situation that looked more privacy friendly. We have no other reasonable explanation. So this means that -- yeah? >>: Oh, maybe the next slide you're going to talk about this. So those people who were presented with this panel, did they exercise any options? Did people with ->> Alfred Kobsa: Yeah. Yeah. So we asked people. We did video-record the experiment, but did not [indiscernible] that. We asked people. And here are the answers to questions like: I paid attention to the privacy control panel. Note again, we showed people the privacy control panel, but the main thrust of the experiment and everything that we told users was: this is a new version, you do a usability test, you have to answer questions, your data is being given to the company. We did not really point them to the privacy control panel. But people nevertheless said that they paid attention to it, that they clicked [indiscernible] to get more information, that they set options. 60 percent said so, and 40 percent said they played around with it. This is not mutually exclusive. So people at least noticed it, paid attention, played around with it. Some of them deliberately set preferences. And even though we did not, again, specifically point people to this privacy control panel, people found it useful, with a four on a five-point scale, and said they would use it, so stated usage was quite high. So it seems, based on this, that this would be quite interesting as a control mechanism for users on websites, to be able to have some control over their privacy. Some caveats again. Amazon has a high reputation. We tested this before. And reputation plays a role in privacy issues. So we would need to redo this experiment with a site that has a lower reputation. Also, this privacy control panel was permanently visible all the time. And website designers are not going to allow this. I mean, it takes up quite a lot of screen real estate. But it's unclear, if we just present an icon to users -- here's a link to a privacy control panel -- whether we will see the same effect, or whether it's necessary that this is always in front of the users to have the effects that I mentioned. Yeah? >>: To pick on the first part a little bit more, it seems like there's a huge [indiscernible] effect happening here. 
So if you were, you know, I mean, if this was done by, imagine, I don't know, [indiscernible], you wouldn't engage otherwise. Or your phone company knows where you are most of the time because you carry a cell phone. Then, you know, the level of concern goes down tremendously compared to the concerns you would have otherwise if you didn't trust [indiscernible]. If you had a fly-by-night books.com, then, you know, it would seem like a different story altogether, and [indiscernible] little you could do to alleviate the distrust you would have with that [indiscernible] organization [indiscernible]. >> Alfred Kobsa: Yeah. I would agree with that. And if you look here, we may have experienced some ceiling effect already. This is a hundred percent of questions answered. So people were not shy about giving at least one answer to every question to Amazon. So possibly with a less trustworthy site, the difference between with and without the privacy control panel might get larger, since we would not have a ceiling effect there. >>: [Indiscernible] also efficient [indiscernible] for somebody to present Amazon or a Wells Fargo [indiscernible]. >> Alfred Kobsa: Absolutely, yes. Mm-hmm. Yeah. >>: [Indiscernible]. >> Alfred Kobsa: Yeah. There are various experiments showing that simply choosing a conservative layout for your site will significantly increase people's disclosure. And the last one is maybe the most interesting. We are not completely sure whether people fully understood our privacy control panel. It's very complicated to explain personalization methods to ordinary users. The question, though, is: does it really matter? And this brings me back to some other experiment with [indiscernible]. I'm not sure -- I gave a talk on privacy here at MSR five years ago already. And I do not recall whether any of you attended this talk, but if you did, you may possibly think in the back of your minds: Didn't we hear about this experiment already five years ago? And yes, you did. But we administered the same experiment twice. Five years ago it was with German students. Now again at UCI. At that time, the manipulations were very much different. It was about comprehensible and less comprehensible presentation of a company's privacy policies to users. Here it's about the privacy control panel. Nevertheless, we got fairly comparable results. Also in the old experiment, disclosure increased. People bought more books. And of course it's a little bit dangerous to make generalizations based on two experiments only, but the question certainly arises: Does it really matter what I do in terms of privacy? Or will I get similar effects whatever reasonable thing I do in terms of privacy? So maybe it's not really necessary that people fully understand, but rather it's important to give people control, to give people comprehensible disclosure, so that most important privacy measures will all lead to similar increases in disclosure and purchases. But this is mere speculation. And this old experiment, to answer your question completely, we also administered to sites that have a lesser reputation. We found the same effect, so roughly the same increase in purchases and roughly the same increase in disclosure. So it was not the case, as I speculated just before, that for sites with lesser reputation the difference would go up. It was roughly the same. 
>>: Are you planning on [indiscernible] as you mentioned before, there is still a difference between knowing that concern ->> Alfred Kobsa: I would love to do a big test with some of the things that I presented, by all means. Okay. I quickly present -- I'll skip this and just present this to you. Would you answer this question? This is a recommender system for applications. And it asks: May we know your household income? We can recommend apps that are popular among people with the same income. So it gives an explanation of what the system is going to do with this information. Or: 85 percent of our users told us their household income. We varied the numbers a little bit. Is this kind of thing a reason for you to give it out? Or past benefit for others: 94 percent of people in the past who gave us their household income got better recommendations. Or finally: the recommendations for you will be 80 percent better. So the previous one just said something about past experiences of others, but here it's kind of a prediction for you. Our aim is to empower users to make better privacy decisions. And in this experiment, we tried to find out what piece of added information is more convincing than others for users to reveal the requested piece of information. This has not yet been done, at least as far as we are informed. So it would be interesting to see what kind of added information will convince users to give out information about themselves. And this is not only about this, but rather the whole thing is to test and augment a model for predicting users' evaluation of a recommender system, a model that takes satisfaction into account, that takes privacy into account. Also, this is a small experiment in a [indiscernible]. If you want to participate in this experiment, it's still up. So just go to this web page and you can even win an Amazon gift card. Thank you very much. [Applause]. >>: [Indiscernible] any more questions [indiscernible]? >>: Let me ask a question. So going back just a couple of slides, this thing about predicted benefit: there seems to be a [indiscernible], but in order to evaluate the [indiscernible] benefit, you kind of need to give out that information to maybe sort of [indiscernible]. So what do you -- I mean, you can give rough estimates based on prior performance, but then again, prior performance is not really a predictor of future gains. So what are your thoughts on that? Is there such a thing as showing the advantages of selectively sharing information to the user [indiscernible]? >> Alfred Kobsa: Our thought is to make this recommendation -- so here it says it will be 80 percent better, but of course, it very much depends, as you said, on the value. In general, the more extreme the value of a piece of data is, the better it can be used for improving personalization quality, but the more reluctant people are to give it out. And our idea is -- but we still have to test it -- to give people examples: if your household income is in this bracket, we typically can improve by 20 percent; in this bracket, by so-and-so much. But we still have to test this. But this looks like a reasonable approach to us, a first approach to solve this [indiscernible] problem. >>: Secondly, what is [indiscernible] from lying about all of this, essentially? The [indiscernible] cues could be completely made up. That's entirely possible. >> Alfred Kobsa: This is entirely possible. We are aware of that. Yeah. 
We are -- well, in this case, in other experiments, we ask people afterward: Did you lie about it? And this typically gives us a control measure. There are various experiments, however, that showed that if a button is provided -- no, I don't want to give this -- or if it's optional, then people who would otherwise lie choose just to say no in most cases. So we feel that the data that we would get here will be pretty accurate. We don't look at the data anyway. But we also believe that people will give correct data since they have the option to opt out. Yeah? >>: I have a question. Do you mind going back to the social piece? >> Alfred Kobsa: Mm-hmm. >>: So I'm curious if you're going to look at the difference between cohorts: 85 percent of our users -- or really that's not even cohorts, that's all users -- people like you, your friends, and whether, based on your research or others', you see a distinction in providing those levels of social cues. Not just the users of this app, but people like you, or even more so, people you actually know making these decisions. >> Alfred Kobsa: We did such an experiment in the context of instant messaging. As you know, there's all sorts of information given out by instant messaging systems that is privacy-related, like when someone is online or offline. We did, again, a pretend experiment where we told people that they were going to evaluate an installer for an instant messaging system. And in the installation process, they were asked to log in to the IM system that they really used, and we would import the names and IDs of their friends from their IM system. And then we told them, for the privacy options -- we gave them a couple of privacy options -- that so-and-so many of their friends had chosen this option and that option. We varied the numbers. We also allowed them to exclude people, so that certain people cannot see their online time. We found that there was some effect, but it was much smaller than the effect of each individual privacy option. So there was a noticeable effect: some privacy options simply were more privacy-intrusive than others, and this had a far higher effect on people's privacy choices than what we claimed their friends had done. So these social navigation cues seemingly have only small effects. And there was some other experiment in the area of security that had similar findings: essentially, only if you really put it under people's noses were they able to find a statistically significant effect. >>: If you used people's Facebook friends, and showed the faces of the friends that had made the decision, you could measure how much impact that might have. >> Alfred Kobsa: Yeah. >>: I guess it would be substantial. >> Alfred Kobsa: Yeah. It could be that if you use some other context, like Facebook, the results are different. But in our IM context, there was an effect, even approaching statistical significance, but it was not very strong. >>: The other question, if I may, I wanted to ask was about the mental model of users and about data sharing. Most of the work that you showed, it was really a notion of: you give the data away and then it's gone. Now it goes to some third party and they basically can do what they want. Have you looked in your work at all at the model where the user retains some level of persistent control over the information, and the request is per use or per given context, and the ultimate sort of -- let's say the record lives with the user. 
That they really become the source of truth for the information rather than the information [indiscernible]. >> Alfred Kobsa: Together with German Ph.D. students, I was involved [indiscernible] and we developed a system that allows people, A, to keep track of what data they had given to what website, which is an important first step because this is quickly forgotten. And also, they could request some data to be deleted, which you can do under European privacy law. And we did some user evaluation and people liked it, yeah, but we did not expect anything else. We did not run a behavioral study. It's just about people's liking, and yes, people would use that, but this was a kind of quite obvious result. >>: I guess the question I'm asking -- something that a number of folks now are trying to understand -- is whether the mental model is that the data ultimately goes to the third party from you and it's gone, versus it's really an asset that you yourself have access to, and perpetual, persistent rights over the course of a lifetime even, and that the equity that you create in your digital activity grows over time and it's something you want to curate and that would have value. And the relationships in each of these individual transactions aren't the end of the story; they're part of a larger story about your digital life. >> Alfred Kobsa: Yeah. >>: I mean, what are ->> Alfred Kobsa: Yeah, I mean, you're probing the area of identity management, where similar ideas are being proposed. Again, there are big projects funded by the European Commission where industry was involved, quite a lot of industry even. And the idea is that you are the owner of your data. Websites are just stewards of your data who use your data for certain business purposes. And that's it. It's also essentially the philosophy behind much of European privacy legislation: you can only keep data as a company as long as it's needed for the business purposes for which the data was originally collected. >>: [Indiscernible] architecture of the Internet with very few exceptions and some research projects ->> Alfred Kobsa: [Indiscernible], exactly. >>: Right. But there isn't a place to see all of the data that you're actually creating. It's dispersed by definition. And so I'm wondering in your work whether you think that will create a different response from users if they have that ->>: If they can preempt ->>: Oh, sorry. >>: -- just a second. [Indiscernible] but our [indiscernible]. So you are most welcome to join us and continue this conversation in the next period. >>: Okay. [Applause.]