User Modelling (UM) Dr. Alexandra I. Cristea

http://www.dcs.warwick.ac.uk/~acristea/
Reading Reminder
• Recommended Reading: User Modelling for the
Social Semantic Web (Plumbaum et al, 2011);
Ontological technologies for user modelling
(Sosnovsky, Dicheva, 2010); Generic User
Modelling (Kobsa, 2007); User Profiles (Gauch
et al, 2007).
• Recommended Papers for reference:
Challenges of SW for UM (Dolog, Nejdl, 2003).
Overview: UM
1. Introduction
2. History
3. Information collection
4. User Model Representation
5. User Model Construction
6. Issues
Overview: UM
1. Introduction
A. Definition
B. Motivation
C. Application fields
D. Classifications
Definition:
What is a user model / user modelling?
• “If a program can change its behaviour
based on something related to the user,
then the program does (implicit or explicit)
user modelling.”
• “A user model is a representation of the user
in an information system, in the form of
information which the system collects and
maintains in order to improve the quality of
information access” (Brusilovsky)
• Compare with the Wikipedia definition
Core vs. Extended User Profile
• Core profile
– info related to user’s search goals and interests
• Extended profile
– info related to user as a person: demographic info,
e.g., name, age, country
– education level
– abilities
– profession...
• Determined by the application needs
E.g.: http://webprotege.stanford.edu/#Edit:projectId=aec7af2a-e644-48b3-916b-f2dc7ef76bd8
Motivation: Why user modelling?
• pertinent information
– What is pertinent to me may not be pertinent to you
– information should flow within and between users
– users should control the level of information-push
• large amounts of information
– too much information, too little time
– people often become aware of information when it is not
immediately relevant to their needs
– Difficult to handle
– Etc.
Application fields: What for?
• Semantic Web
• Web 2.0
• Recommender Systems (e.g., commercial
systems; content-based recommenders)
• Information retrieval
• Information filtering
• User Simulation (e.g. of user roles)
• Intelligent Tutoring Systems
• Expert Systems
• Adaptive Hypermedia
• TO ADAPT TO THE USER
Classification
• According to the way User Information is collected:
– explicit, through user intervention
– implicit, through agents that passively monitor user activities
• According to the life-period of the profile / model
– Static profiles that maintain the same information over time.
– Dynamic profiles that can be modified or augmented.
• Short-term profiles represent the user’s current interests
• Long-term profiles indicate interests that are not subject to frequent changes over time
• Model structure
– Keyword profiles
– Semantic net profiles
– Concept profiles
The Big Picture
[Figure: User -> Data Collection -> User Info -> Model Constructor -> Model Structure -> Technology or Application -> Personalised Services]
A (classical) thought on UM:
IF Man cannot understand Man,
HOW can a machine built by Man understand Man?
Overview: UM
1. Introduction
2. History
A. Early days
B. Early systems
C. User Modelling Shells
D. User Modelling Servers
Early days
• Start: 1978, 79:
– Allen, Cohen & Perrault:
Speech research for dialogue coherence
– Elaine Rich: Building & Exploiting User Models
(PhD thesis)
• 10 year period of developments
– UM performed by application system
– No clear distinction between UM components &
other system tasks
• mid 80’s: Kobsa, Allgayer, etc. (see UMUAI)
– Distinction appears
– No reusability consideration
Early systems
• GUMS (Finin & Drager, 1989; Kass
1991)
– General User Modelling System
– Stereotype hierarchies
– Stereotype members + rules about them
– Consistency verification
=> set the framework for general UM systems
• Called UM shell systems (Kobsa)
UM shell services (Kobsa ‘95)
UM shells Requirements
• Generality
– As many services as possible
– “Concessions”: student-adaptive tutoring systems
• Expressiveness
– Able to express as many types of assumptions as
possible (about U)
• Strong Inferential Capabilities
– AI, formal logic (predicate logic, modal reasoning, reasoning with uncertainty, conflict resolution)
Characteristics of UM Server
• Client-server architecture for the WEB !!!
• Advantages:
– Central repository with user info for one or more applications
• Info sharing between applications
• Complementary info from client DBs integrated easily
– Info stored non-redundantly
• Consistency & coherence checks possible
• Info on user groups maintained with low redundancy (stereotypes, a priori or computed)
– Security, identification, authentication, access control and encryption can be applied to protect the UM
UM server Services
• Comparison of users' selective actions
– Amazon: “Customers who bought this book also bought: […]”
• Import of external user info
– ODBC (Open Database Connectivity) interfaces, or support for a variety of DBs
• Privacy support
– Company privacy policies, industry norms, law
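As a rough illustration of the "customers who bought this also bought" service, the sketch below counts which items co-occur in users' purchase histories. The data, the user ids and the function name `also_bought` are hypothetical; a real UM server would use more robust statistics than raw co-occurrence counts.

```python
from collections import Counter

def also_bought(histories, item, top_n=3):
    """Return up to top_n items most often bought together with `item`.

    histories: dict mapping a (hypothetical) user id to a set of item names.
    """
    counts = Counter()
    for items in histories.values():
        if item in items:
            # Count every other item in the same user's history.
            counts.update(i for i in items if i != item)
    # Sort by descending count, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in ranked[:top_n]]

# Hypothetical purchase histories, for illustration only.
histories = {
    "u1": {"book A", "book B", "book C"},
    "u2": {"book A", "book B"},
    "u3": {"book B", "book D"},
}
print(also_bought(histories, "book A"))
```

Ties are broken alphabetically here only to keep the output deterministic; that choice is not part of any particular system.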
UM server Requirements
• Quick adaptation
– Preferably at the first interaction, to attract customers => levels of adaptation, depending on the amount of data
• Extensibility
– To add own methods and other tools; API for user info exchange
• Load balancing
– Reaction to increased load: e.g., CORBA based
components, distributed on the Web
• Failover strategies (in case of breakdown)
• Transaction Consistency
– Avoidance of inconsistencies (e.g., after abnormal termination)
UM server trends
• More recommender systems than real UM
• Based on network environments
• Less sophisticated UM, other issues (such as
response time, privacy) are more important
• Separation of tasks is essential, to give flexibility:
– Not only system functions separated from UM functions, but also
– UM functions separated from each other:
• domain modelling, knowledge, cognitive modelling, goals and
plans modelling, moods and emotion modelling, preferences
modelling, and finally, interface related modelling
– In this way, the different levels of modelling can be added
at different times, and by different people
Overview: UM
1. Introduction
2. History
3. Information collection
A. Information type
B. Information sources
C. User Identification Method
D. User Information Collection Method
Information type:
What can we adapt to?
• U knowledge
• U cognitive properties
– (learning style, personality, etc.)
• U tasks, goals and plans
• U mood and emotions
• U preferences
Adaptation to User Knowledge
• Conceptual knowledge
– that can be explained by the system
• User option knowledge
– about possible actions via an interface
• Problem solving knowledge
– how knowledge can be applied to solve
particular problems
• Misconceptions
– erroneous knowledge
What can we adapt to?
• User knowledge
• Cognitive properties
– (learning style, personality, etc.)
• User goals and plans
• User mood and emotions
• User preferences
Kolb scale
[Figure: quadrant diagram with an active/reflective axis; the four learner types are:]
• Diverger (concrete, reflective): "Why?" E.g., child, Buddha, philosopher
• Accommodator (concrete, active): "What if?" E.g., business person
• Assimilator (abstract, reflective): "What?" E.g., teacher, reviewer
• Converger (abstract, active): "How?" E.g., programmer
• Last time: UM definition, history, started
information collection: knowledge,
cognitive properties: cognitive styles
• Next: continue cognitive styles, other
user info collection types, user info
sources, user info collection methods
Van der Veer et al.
Other cognitive styles
• Visual-verbal (verbalizer-imager; image-text)
• Sequential-global (analytic-wholistic)
• Active-reflective
• Sensing-intuitive
• See (and try out) the ILS questionnaire:
• http://www.engr.ncsu.edu/learningstyles/ilsweb.html
• Etc.
What can we adapt to?
• User knowledge
• Cognitive properties
– (learning style, personality, etc.)
• User goals and plans
• User mood and emotions
• User preferences
User Goals and Plans
• What is meant by this?
– A user goal is a situation that a user wants to
achieve.
– A plan is a sequence of actions or events that the user expects will lead to the goal.
• System can:
– Infer the user’s goal and suggest a plan
– Evaluate the user’s plan and suggest a better one
– Infer the user’s goal and automatically fulfil it
(partially)
– Select information or options relevant to the user's goal(s) (shortcut menus)
What can we adapt to?
• User knowledge
• Cognitive properties
– (learning style, personality, etc.)
• User goals and plans
• User mood and emotions
• User preferences
Moods and emotions?
• New, relatively unexplored area!
• Unconscious level: difficult to recognise, but it is possible to look at typing speed, error rates, facial expressions, sweat, heartbeat rate...
• Conscious level: can be guessed from task fulfilment (e.g. failures)
• Emotions affect the user’s cognitive capabilities => it can be important to affect the user’s emotions (e.g. reduce stress)
Emotional Modelling
We address how emotions arise from an evaluation
of the relationship between environmental events &
an agent’s plans and goals, as well as the impact
of emotions on behaviour, in particular the impact
on the physical expressions of emotional state
through suitable choice of gestures &
body language.
Gratch, 5th Int. Conf. on Autonomous Agents, Montreal, Canada, 2001
Sample model of emotion assessment
Conati, AAAI, North Falmouth, Massachusetts 2001
What can we adapt to?
• User knowledge
• Cognitive properties
– (learning style, personality, etc.)
• User goals and plans
• User mood and emotions
• User preferences
Adaptation to user preferences/interests
• So far, the most successful type of adaptation. Preferences can in turn be related to knowledge / goals / cognitive traits, but one need not care about that.
• Examples:
– Firefly
– www.amazon.com
– Mail filters
– Grundy (Rich: personalized book recommendation expert system)
Other UM parameters to adapt to
• User context
• User environment
• User group
• User social interaction
• Etc.
Overview: UM
1. Introduction
2. History
3. Information collection
A. Information type
B. Information sources
C. User Identification Method
D. User Information Collection Method
‘Deep’ or ‘shallow’ modelling?
• Deep models give more inferential power!
• Same knowledge can affect several parts of
the functionality, or even several applications
• Better knowledge about how long an
inference stays valid
• But deep models are more difficult to acquire
• How do we get information about the user?
Overview: UM
1. Introduction
2. History
3. Information collection
A. Information type
B. Information sources
C. User Identification Method
D. User Information Collection Method
User Identification
• Five basic approaches to user identification:
– software agents
– logins
– enhanced proxy servers
– cookies
– session ids
• The first 3 techniques are more accurate, but
require active participation of the user. The
last 2 are less invasive
Overview: UM
1. Introduction
2. History
3. Information collection
A. Information type
B. Information sources
C. User Identification Method
D. User Information Collection Method
User Information Collection
• Explicit Feedback Systems
– direct user intervention, e.g. via HTML forms
– (potentially) accurate, extra burden on users
– e.g. demographic info (birthday, marriage status, job,
personal interests)
– Users may not accurately report interests
– Static profile versus changing user interests.
• Implicit Feedback Systems
– Collect information while user is performing regular tasks
– Open Web personalization needs extra software capturing
user activity.
What Kind of Implicit Feedback?
• Tracking of regular activities
– Clicks
– Time spent
– Scrolling and mouse movement
– Eye tracking, etc.
• Enabling and tracking additional interest-bearing activities
– Bookmarking (e.g., del.icio.us)
– Downloading (Scienstein paper recommender)
– Annotating (Knowledge Sea, Brusilovsky)
– Q&A
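A minimal sketch of how implicit feedback of this kind might be turned into interest estimates: each observed action adds a weight to the page it concerns, plus a small bonus for time spent. The action names, the weights in `ACTION_WEIGHTS` and the `seconds_weight` factor are assumptions made up for this example, not values from any of the systems mentioned.

```python
# Hypothetical weights: how strongly each observed action signals interest.
ACTION_WEIGHTS = {"click": 1.0, "bookmark": 3.0, "download": 2.0, "annotate": 4.0}

def interest_scores(events, seconds_weight=0.01):
    """Aggregate implicit-feedback events into a per-page interest score.

    events: list of (page, action, seconds_spent) tuples.
    Unknown actions contribute only through the time spent.
    """
    scores = {}
    for page, action, seconds in events:
        gain = ACTION_WEIGHTS.get(action, 0.0) + seconds_weight * seconds
        scores[page] = scores.get(page, 0.0) + gain
    return scores

# Hypothetical event log, for illustration only.
events = [
    ("/news/sport", "click", 30),
    ("/news/sport", "bookmark", 0),
    ("/news/art", "click", 100),
]
scores = interest_scores(events)
print(scores)
```

Since user modelling here is still guessing, such scores are best read as evidence of interest rather than as measurements of it.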
Info Collection: Browser Cache and Proxy Server
• Browsing histories collection:
– users share browsing caches on a periodic basis
– users install a proxy server as gateway to Internet,
capturing all Internet traffic generated
• Disadvantages:
– Sharing histories: too much work from the user
– shared with one particular Website only
– Typically from a single computer. Otherwise:
• Share browsing cache from multiple computers
• Install same proxy server on each computer
• Use a login system with same user profile
Info Collection: Browser Agent
• either a standalone application with browsing capabilities or a plug-in to an existing browser (e.g., HeyStaks)
• Advantage:
– richer information about the user (browsing history, actions performed: bookmarking, downloading, scrolling and mousing).
• Disadvantages:
– users need to install a new application or plug-in
– large software development / maintenance investment
– typically only available on that particular computer
• Or install on multiple computers and ensure synchronization
Info Collection: Desktop Agents
• searches not limited to the Web: include databases and users’ personal documents (Google Desktop Search, discontinued; Windows Search 4.0)
– enhanced user profile
• Client-side approaches: burden on users to collect/share activity logs (unless integrated with the OS)
– Microsoft, Apple, and Google have been actively working on it
Info Collection: Web and Search Logs
• Web logs capture browsing histories of users at a website
– to adapt website based on user behaviour
• Search logs contain info about queries + date/time/result
– For user profiling
• Advantage: user does not need to install a desktop
application and/or upload their information to the service.
• Disadvantage: only activities at search site tracked, less info
available.
• Heavily used by IBM, Google and Microsoft
• Last time: (finished) user information
type, user information sources, user
identification, user data collection
• Next time: UM representation, UM
construction, UM issues
Concluding: What information is available?
User modelling is always about guessing …
Overview: UM
1. Introduction
2. History
3. Information collection
4. User Model Representation
A. Keyword Profiles
B. Semantic Network Profiles
C. Concept Profiles
Generic User Modelling Techniques
• Rule-based frameworks
• Frame-based frameworks
– (incl. Keyword Profiles)
• Network-based frameworks
– (incl. Semantic Networks and Concept Profiles)
• Probability-based frameworks
• Sub-symbolic techniques
• Example-based frameworks
Keyword Profiles
• from web pages visited, bookmarked, saved, explicitly provided
• Bag-of-words
– most popular words
– may be associated with numerical weight representing importance
• Profile vectors
– overlay of a keyword vector
– 0-1 vector
– Weighted vector
• Benefits
– Simplicity
• Shortcomings:
– Words may have multiple meanings (polysemy); the same idea may be expressed by different words (synonymy) => ambiguous, inaccurate profiles
0-1 Keyword Profiles
• Rows represent document terms
• Columns represent users
• User 1 liked document “the cat is on the mat”
• User 2 liked document “the mat is on the floor”

         User 1   User 2
cat        1        0
floor      0        1
mat        1        1
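The 0-1 matrix above can be reproduced with a short sketch. The stop-word list `STOP_WORDS` and the function name `binary_profiles` are assumptions for this example; the slide does not say how "the", "is" and "on" were filtered out.

```python
STOP_WORDS = {"the", "is", "on"}  # assumed stop-word list for this example

def binary_profiles(liked_docs):
    """Build a 0-1 term-by-user matrix from the documents each user liked.

    liked_docs: dict mapping user name -> list of liked document strings.
    Returns (terms, users, matrix) with matrix[i][j] = 1 if user j liked
    a document containing term i, else 0.
    """
    users = list(liked_docs)
    vocab = {u: {w for doc in docs for w in doc.lower().split()
                 if w not in STOP_WORDS}
             for u, docs in liked_docs.items()}
    terms = sorted(set().union(*vocab.values()))
    matrix = [[1 if t in vocab[u] else 0 for u in users] for t in terms]
    return terms, users, matrix

terms, users, matrix = binary_profiles({
    "User 1": ["the cat is on the mat"],
    "User 2": ["the mat is on the floor"],
})
for term, row in zip(terms, matrix):
    print(term, row)
```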
Weighted Keyword Profiles

         Term 1   Term 2   …   Term n
User 1   W11      W12      …   W1n
User 2   W21      W22      …   W2n
…
User m   Wm1      Wm2      …   Wmn
Advanced Keyword Profiles
• Deal with shortcomings: synonymy, polysemy,
interest drift
• Examples:
– Representing user as a set of keyword vectors (e.g.,
one per bookmark (interest))
– representing each interest with three keyword vectors, i.e., a long-term descriptor and two short-term descriptors, one positive and one negative
– separate profiles for each topic, distinguishing short-term and long-term profiles
Domain (topic)-based UM
[Figure: one weighted keyword vector per domain]
Art:     Portrait 0.60, Sculpture 0.72, …, Watercolor 0.45, Painting 0.33
Sports:  Soccer 0.88, Bat 0.27, …, Touchdown 0.79, Score 0.33
Music:   Rock 0.15, Symphony 0.87, …, Score 0.31, Orchestra 0.63
Semantic Network Profile
• polysemy => weighted semantic network:
– each node contains a particular word found in the corpus; arcs represent co-occurrences of two words
• To increase accuracy, words can be grouped in “synsets”:
– nodes are “synsets”,
– arcs are co-occurrences of the “synsets” members within a document of interest to the user,
– node and arc weights represent the user’s level of interest.
Semantic Network Profile
• Advanced relevance network for query expansion
Java -> Java and programming -> Java and (programming or development)
Concept Profile
• Similar to semantic network: with nodes and arcs.
• nodes represent abstract topics considered
interesting to the user, not specific words (sets)
• hierarchical concepts enable generalizations.
• simplest: constructed from reference taxonomy
(WordNet) or thesaurus
• complex: from reference ontology
• levels in hierarchy: fixed, or changing dynamically according to the user’s interests.
Concept profile over news taxonomy
• For each domain concept (or taxon) an overlay
model stores estimated level of user interests
Additional types of UM representation and classification
• Overlay: user’s characteristics are a subset
of expert system’s characteristics (e.g.
knowledge of a domain)
• Predictive: run models and see which best
fits user’s input
• Analytic: analyse user’s input and see which
features exist and try and account for
structure (data mining)
Overview: UM
1. Introduction
2. History
3. Information collection
4. User Model Representation
5. User Model Construction
– Building Keyword Profiles
– Building Semantic Network Profiles
– Building Concept Profiles
Building Keyword Profiles
1. Extract keywords from the Web pages collected.
2. Weight the keywords:
– popular weighting scheme: Tf*Idf (information retrieval theory)
– Latent Semantic Indexing (LSI)
– Linear Least Squares Fit (LLSF) for creating the keyword-based feature vectors.
3. The number of words is frequently capped: only the most highly weighted terms from any page contribute to the profile.
TF*IDF
• term frequency $tf_{t,d}$ of term t in document d = number of times that t occurs in d.
• document frequency $df_{t,D}$ of t = number of documents in the whole corpus D that contain t.
• N = number of documents in D.
• inverse document frequency $idf_{t,D}$ of t in D:
$$idf_{t,D} = \log_{10}\left(N / df_{t,D}\right)$$
• The tf-idf weight of a term is the product of its tf weight and its idf weight:
$$w_{t,d,D} = tf_{t,d} \times idf_{t,D}$$
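A small worked sketch of the weighting, following the definitions above (raw term counts for tf, base-10 logarithm for idf). The whitespace tokenisation and the function name `tf_idf` are simplifying assumptions; real systems usually also remove stop words and stem terms.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Return one {term: weight} dict per document, with
    w = tf * log10(N / df)."""
    docs = [doc.lower().split() for doc in corpus]
    n_docs = len(docs)
    # df[t] = number of documents that contain term t at least once.
    df = Counter(t for doc in docs for t in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts in this document
        weights.append({t: tf[t] * math.log10(n_docs / df[t]) for t in tf})
    return weights

corpus = ["the cat is on the mat", "the mat is on the floor"]
weights = tf_idf(corpus)
print(weights[0])
```

In this two-document corpus, terms that occur in both documents ("the", "mat", ...) get weight 0 because log10(2/2) = 0, while "cat" and "floor" get log10(2), roughly 0.301.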
Latent Semantic Indexing
• Principle:
– words that are used in the same contexts tend to have similar
meanings.
– Uses SVD (singular value decomposition).
– Mitigates synonymy and, to a lesser extent, polysemy.
• Algorithm:
– Constructs a term-document matrix of occurrences (term: row; doc: column), which is large and sparse; applies weighting with a local term weight (tf in the doc) and a global weight (df in the corpus), using various functions (binary, log, etc.)
– Rank-reduced SVD (finds relationships between terms): decomposes the matrix into 3 matrices (term-concept, singular values, concept-document) and truncates them to the k largest singular values (k typically 100-300)
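The rank-reduction step can be written compactly. The symbols are assumptions introduced here, not taken from the slides: A is the weighted term-document matrix with m terms (rows) and n documents (columns), and k is the number of singular values kept.

```latex
A \approx A_k = U_k \, \Sigma_k \, V_k^{\top}, \qquad
U_k \in \mathbb{R}^{m \times k}, \quad
\Sigma_k = \mathrm{diag}(\sigma_1, \dots, \sigma_k), \quad
V_k \in \mathbb{R}^{n \times k}, \quad
\sigma_1 \ge \dots \ge \sigma_k > 0
```

Here U_k is the term-concept matrix, Σ_k holds the k largest singular values, and the transpose of V_k is the concept-document matrix.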
Linear Least Squares Fit
• finding the best-fitting curve to a given set of points by
minimizing the sum of the squares of the offsets ("the
residuals") of the points from the curve
• Used for obtaining keyword-based feature vectors
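To make the least-squares idea concrete, here is a minimal sketch for the simplest case only: fitting a straight line y = slope * x + intercept to points by minimising the sum of squared residuals. The function names and sample points are hypothetical; LLSF as used for keyword-based feature vectors is a multivariate version of the same principle, which this sketch does not implement.

```python
def fit_line(points):
    """Least-squares fit of y = slope * x + intercept to (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_squared_residuals(points, slope, intercept):
    """Sum of squared vertical offsets of the points from the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)

# Hypothetical points lying exactly on y = 2x + 1.
points = [(0, 1), (1, 3), (2, 5), (3, 7)]
slope, intercept = fit_line(points)
print(slope, intercept, sum_squared_residuals(points, slope, intercept))
```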
Building Semantic Network Profile
• keywords added to a semantic net
– if keyword already in, node’s score increased by user’s
feedback (or decreased, for negative feedback).
– otherwise, a new node is created.
• update the weights on the keyword co-occurrence arcs.
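The update procedure can be sketched as a small class. The class name `SemanticNetProfile`, the use of a single `feedback` number for both node and arc updates, and the sample keywords are assumptions for illustration; actual systems differ in how they weight arcs.

```python
from itertools import combinations

class SemanticNetProfile:
    """Weighted keyword network: node weights and co-occurrence arc weights."""

    def __init__(self):
        self.nodes = {}  # keyword -> interest score
        self.arcs = {}   # frozenset({kw1, kw2}) -> co-occurrence weight

    def update(self, keywords, feedback=1.0):
        """Add the keywords of one document; feedback < 0 means dislike."""
        unique = sorted(set(keywords))
        for kw in unique:
            # Existing node: adjust its score; otherwise create the node.
            self.nodes[kw] = self.nodes.get(kw, 0.0) + feedback
        for a, b in combinations(unique, 2):
            arc = frozenset((a, b))
            self.arcs[arc] = self.arcs.get(arc, 0.0) + feedback

profile = SemanticNetProfile()
profile.update(["java", "programming"], feedback=1.0)
profile.update(["java", "coffee"], feedback=-0.5)
print(profile.nodes)
```

Using unordered pairs (`frozenset`) for arcs reflects that co-occurrence is symmetric.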
Building Concept Profiles
• Similar to semantic network; however:
• Usually, a reference ontology (often,
hierarchy) is used as a base
• User profiles can be a weighted
mapping over the ontology
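One possible reading of "a weighted mapping over the ontology" is sketched below: interest observed for a concept is added to that concept and, with decay, to its ancestors in the reference hierarchy, which gives the generalisation that hierarchical concepts allow. The hierarchy in `PARENT`, the `decay` factor and the function name are hypothetical.

```python
# Hypothetical reference hierarchy: child concept -> parent concept.
PARENT = {
    "Painting": "Art",
    "Sculpture": "Art",
    "Soccer": "Sports",
    "Art": "Root",
    "Sports": "Root",
}

def add_interest(profile, concept, amount, decay=0.5):
    """Add `amount` to a concept and a decayed share to each ancestor."""
    while concept is not None:
        profile[concept] = profile.get(concept, 0.0) + amount
        concept = PARENT.get(concept)  # None once the top is passed
        amount *= decay
    return profile

profile = {}
add_interest(profile, "Painting", 1.0)
add_interest(profile, "Sculpture", 0.5)
print(profile)
```

With these inputs, "Art" accumulates interest from both of its children even though the user never interacted with the "Art" concept directly.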
Overview: UM
1. Introduction
2. History
3. Information collection
4. User Model Representation
5. User Model Construction
6. Issues
A. Privacy
B. Profile exchange
C. Profile editing
Privacy Issues
• Personal user information: critical data
– Where is it stored?
• Local machine
• Not at all
– How is it protected?
– Can users alter it?
– Real identity? Country laws?
• Anonymity: session ids, cookies, login with pseudonyms
Profile Exchange
• multiple systems collect user info
– Integrating, exchanging profiles => better
personalization
– Ubiquitous User Modelling
– Ontologies for profile exchange
• GUMO (follow-up)
Who Maintains the Model/Profile?
• user/administrator
– Sometimes the only choice
• system (automatic personalization)
• collaborative - user and system
– User creates, system maintains
– User can influence and edit
– Does it help or not?
Conclusions
• accurate representation of user is crucial to performance of
personalized information access systems
• surveyed user modelling definitions, history, popular techniques for collecting user information, representing and building user models, issues (privacy)
• On-going research topics...
– How to improve profile accuracy?
– How to quickly achieve profile stability? How to identify major/minor,
long-term/short-term interest of users?
– How to determine appropriate level of depth in the interest hierarchy
in user profile...?