Fuzzy Set Theoretic Framework for Handling of Subjectivity and
Associated Uncertainty in Knowledge Representation
Dr. Azene Zenebe and Dr. David Anyiwo
Department of Management Information Systems
Bowie State University
Bowie, Maryland 20715
Abstract
In cybernetics, reality is viewed as an interactive conception between the observer and the
observed. Consequently, subjectivity is omnipresent in any attempt to construct
‘meaning and understanding’. This paper introduces the relationships among cybernetics,
fuzzy sets and uncertainty. It then presents a fuzzy set theoretic framework for representing
subjectivity and the associated uncertainty in knowledge modeling and representation.
The framework is applied to the representation of movie objects and to reasoning in movie
recommender systems. Related recent research and development efforts are also
discussed.
1. Introduction: Cybernetics and Fuzzy Set Theory
Originally, cybernetics was defined in 1947 by Wiener as ‘the science of communication
and control, and grew out of Shannon's information theory, which was designed to
optimize the transfer of information through communication channels, and the feedback
concept used in engineering control systems.’ With the advancement of computer
technology, various other fields have become related to cybernetics, including knowledge
elicitation and representation. Knowledge representation tools for intelligence and
modeling use computer programs that reorganize models in order to make knowledge
representation more adequate, correct, and semantically rich.
The cybernetic approach is centrally concerned with an unavoidable limitation of what
we can know: our own subjectivity. That is, the epistemological stand of cybernetics is that all
human knowing is constrained by our perceptions and our beliefs, and hence is
subjective. As a result, cybernetics has directly affected software for intelligent training,
knowledge representation, cognitive modeling, computer-supported cooperative work,
and neural modeling.
Support for the engagement of cybernetics with subjectivity comes from fuzzy set theory.
In fuzzy set theory, a set is a collection of objects such that each object has an associated
membership degree, usually between zero and one, as a subjective evaluation or a degree
of truth. As a natural extension of fuzzy set theory, fuzzy logic provides a means to model partial
truth by extending the Aristotelian two-valued logic to infinite-valued logic. Moreover,
fuzzy set theory also supports the acceptance of uncertainty associated with subjectivity,
imprecision and vagueness in the observer's view of reality. Negoita (2002)
stated:
“Cybernetics is defined by control. If physics is the science of
understanding the physical environment, then control should be viewed
as the science of modifying the environment, in physical, biological, or
even social sense. Bringing cybernetics to the streets, the fuzzy sets
wittingly helped to accelerate the shift from the Enlightenment ideal of
the two-valued logic, to postmodern preoccupation with many degrees
of truth.” (Negoita 2002).
Pascal and Fermat tackled uncertainty through probability theory from the 17th century.
However, this theory neither allows subjective belief to be dealt with nor allows the
problem of imprecise and uncertain knowledge to be solved. In the 1980s, it was realized
that uncertainty is a multidimensional concept. The dimensions are related to the different
categories of uncertainty that exist in a system model, including uncertainty due to
randomness, uncertainty due to ambiguity, and uncertainty due to vagueness and
imprecision. Hence, uncertainty is obscured when it is conceived solely in terms of
probability theory, which covers only one of its dimensions.
The different dimensions of uncertainty call for appropriate mathematical formalisms
including evidence theory, fuzzy set theory (Zadeh 1965) and possibility theory (Zadeh
1978) that complement the probability theory. One of the dimensions of uncertainty is
uncertainty due to imprecision and vagueness that is identified with lack of sharp or
precise distinctions in the world. It is also referred to as fuzziness.
Uncertainty has a significant role in improving the usefulness of system models. Ignoring
uncertainty in modeling and during inference by a computational system has its own
consequences, since decisions are made in situations where uncertainty exists. Klir and
Wierman (1999) explain the significant role of uncertainty as follows:
“ … Uncertainty becomes very valuable when considered in connection
to the other characteristics of systems models: a slight increase in
uncertainty may often significantly reduce complexity and, at the same
time, increase credibility of the model. Uncertainty is thus an
important commodity in the modeling business, a commodity which can
be traded for gains in the other essential characteristics of models.” (Klir
and Wierman 1999, p. 4).
This paper is organized into five sections. Section 1 introduces the relationships among
cybernetics, fuzzy set and uncertainty. Section 2 presents the fuzzy set formal definitions
and the framework developed for handling subjectivity and associated uncertainty in
knowledge representation. Section 3 presents the application of the framework to
Ontology and the Semantic Web; and Section 4 presents the utility of the knowledge
model for personalized recommendation for movies. Finally, Section 5 presents the
conclusion and future research directions.
2. Fuzzy Set Theoretic Framework for Representing Subjectivity and
Associated Uncertainty
2.1. Fuzzy Set Theory
Fuzzy set theory consists of mathematical approaches that are flexible and well suited for
handling incomplete information, the un-sharpness of classes of objects or
situations, or the gradualness of preference profiles (Dubois and Prade 2000). Its building
blocks are fuzzy sets (membership functions), aggregation operators, techniques for the
measurement of membership, similarity and fuzzy orderings, fuzzy relation equations,
fuzzy numbers and intervals, fuzzy interval-valued analysis, and approximate reasoning
and fuzzy rules (Zimmermann 1996; Pedrycz and Gomide 1998; Dubois and Prade
2000).
A fuzzy set B in X is characterized by its membership function, denoted by $\mu_B$, and
defined as (Zadeh 1965):
$\mu_B : X \to [0,1]$, where X is a domain space or universe of discourse. Also,
B can be characterized by the set of pairs $B = \{(x, \mu_B(x)) : x \in X\}$.
$\mu_B(x)$ is the grade of membership of x in B, and has different interpretations
depending on the context in which X is used and the concept to be represented. Dubois
et al. (Dubois, Ostasiewicz et al. 2000) and Bilgiç and Turksen (Bilgiç and Turksen
2000) present a review of various interpretations of the fuzzy membership function
together with techniques for elicitation of a membership function. The two relevant
interpretations are:
• degree of similarity - $\mu_B(x)$ represents the proximity between pieces of
information. For example, the membership grade of a user's movie interest in the
fuzzy set of "Drama movies lover" can be estimated by degree of similarity.
• degree of uncertainty or truth - $\mu_B(x)$ can be viewed as the degree of plausibility
that a variable V has value x, given that all that is known about it is that "V is B", where B
is a fuzzy set (Zadeh 1978).
Furthermore, the type of membership function that is suitable can only be
determined in the application context; however, in certain cases the meaning captured by
fuzzy sets is not too sensitive to variations in the shape (Pedrycz and Gomide 1998).
In practice, triangular, trapezoid, Gaussian function, S-function,   function and
exponential-like function are the most commonly used membership functions.
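As an illustration of three of the shapes named above, the following is a minimal Python sketch of triangular, trapezoidal and Gaussian membership functions. The function names and parameter names (a, b, c, d, center, sigma) are assumptions made for this example and do not come from the paper.

```python
import math


def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising to 1 at the peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def trapezoidal(x, a, b, c, d):
    """Trapezoidal membership: 0 outside (a, d), plateau of 1 on [b, c] (a < b <= c < d)."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def gaussian(x, center, sigma):
    """Gaussian membership: 1 at the center, decaying smoothly with distance (sigma > 0)."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))
```

Each function maps a value x of the universe of discourse to a grade in [0, 1]; which shape is appropriate remains an application-specific choice, as noted above.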
There are various fuzzy set operators that substitute for or extend the crisp set
operators. For fuzzy sets A and B in X, the triangular norm (t-norm) and the triangular
conorm (t-conorm or s-norm) are the general classes of intersection and union operators
(Pedrycz and Gomide 1998). The max operator, the standard union, is defined as
$\mu_{A \cup B}(x) = \max\{\mu_A(x), \mu_B(x)\}$, and the min operator, the standard
intersection, as $\mu_{A \cap B}(x) = \min\{\mu_A(x), \mu_B(x)\}$, for x in X.
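The max (union) and min (intersection) operators can be sketched as follows, assuming fuzzy sets over a finite universe are stored as Python dictionaries that map each element to its membership grade, with absent elements treated as grade 0. The dictionary representation and the function names are assumptions made for illustration only.

```python
def fuzzy_union(a, b):
    """Standard union: pointwise maximum of the two membership grades."""
    return {x: max(a.get(x, 0.0), b.get(x, 0.0)) for x in set(a) | set(b)}


def fuzzy_intersection(a, b):
    """Standard intersection: pointwise minimum of the two membership grades."""
    return {x: min(a.get(x, 0.0), b.get(x, 0.0)) for x in set(a) | set(b)}


# Hypothetical fuzzy sets of movies (values are made-up membership grades).
comedy = {"m1": 1.0, "m2": 0.3}
drama = {"m1": 0.7, "m3": 0.9}

either = fuzzy_union(comedy, drama)        # grade of "comedy or drama"
both = fuzzy_intersection(comedy, drama)   # grade of "comedy and drama"
```

Max and min are only one t-conorm/t-norm pair; other pairs could be substituted by replacing the two built-in functions.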
Fuzzy sets and fuzzy logic provide a way to quantify the uncertainty due to vagueness
and imprecision; allow computers to process abstract or subjective concepts that are
represented with linguistic variables like very rich/expensive; and can universally model
a complex system without the need to know the underlying governing mathematical
equations (Zadeh 1994). For a symbolic variable (a variable with symbolic values),
uncertainty can be represented in terms of qualitative expressions or by using fuzzy sets
with a corresponding membership function. Examples of symbolic variables are
preference, the genre content of a movie, and the degree of role of an actress in a movie.
Therefore, fuzzy systems are capable of processing incomplete and imprecise data, and
provide approximate, but acceptable, solutions to problems that are difficult for other
traditional methods to solve (O’Brien 2002).
2.2. Knowledge Representation using Fuzzy Set for Movies
Movie genres describe the content of movies, and movies are multi-genre
(Altman 1999). It is inappropriate to treat all genres equally, as some genres may be more
significant than others. An analysis of the descriptions of the main film genres shows that
movies of genre g1 (e.g. action) and movies of genre g2 (e.g. adventure) share common
subject matter and other movie attributes (Staiger 1997). Hence, it is sometimes
difficult to judge whether a movie belongs completely to a genre or not. This
induces uncertainty in the determination of the genre distribution of a movie. Fuzzy sets
allow us to represent this uncertainty.
With the definition of a movie in the space of genres (Table 1), a movie has one
major genre denoted by g1 and other minor genres g2, g3, etc. in decreasing order of
degree of genre presence in the movie. For a given vector $G = \{g_k, k = 1, \dots, N\}$, where N
is total number of genres in a movie, the corresponding degree of membership of a movie
$m_j$ to a genre $g_k$ in G is denoted by $g_{jk} = \mu_{g_k}(m_j)$. Hence, for $m_j$,
$G_j = \{(g_k, g_{jk}), k = 1, \dots, N\}$, where $g_{jk}$ can be obtained either heuristically
from domain experts or empirically from the data.
Table 1: A user’s ratings of m movies, and genre distribution of a movie

Attribute (Genre)                   g1      g2      ...     gj      ...     gN
mi (ith movie in space of genre)    g_i1    g_i2    ...     g_ij    ...     g_im
...
To determine the degree of genres presence in movies, the following two steps are
followed:
Step 1: Arrange gk in descending order of significance to the movie under consideration. In
IMDB, a movie’s genres are presented in their order of significance [IMDB sites/
documentation]. For example, the movie ‘BOOTMEN’ has Comedy as its major and
Drama as a minor genre, which is stated as: Comedy / Drama.
Step 2: Assign higher degrees of membership or compatibility value to more important
genres of a movie. For instance,
If mj has only one genre, then gj1 = 1 and gjk = 0 for all k = 2 to N.
If mj has two genres, then gj1 = 0.8, gj2 = 0.2 and gjk = 0 for all k = 3 to N.
If mj has three genres, then gj1 = 0.70, gj2 = 0.30, gj3 = 0.10 and gjk = 0 for all
k = 4 to N.
and so on.
We propose to generalize heuristic rules of this type for gjk using
a fuzzy set membership function. In particular, gjk is represented as a function of the
number of genres (|Lj|) in a movie mj and the rank position (p) of a genre, using a decreasing
and smoothing exponential function (Figure 1), defined as:
$g_{jk} = \mu_{g_k}(m_j) = \dfrac{p}{2^{\sqrt{\alpha \, |L_j| \, (p-1)}}}$    (2)
for p between 1 and |Lj|, where $\alpha > 1$ is the threshold to differentiate/optimize the difference
between consecutive genres in a movie. After a number of trials, $\alpha$ is assigned a value of
1.2.
Two examples that use the exponential membership function are presented next.
(i) For the movie ‘BOOTMEN’: Comedy / Drama: |Lj| = 2 and G = {(Comedy, 1), (Drama,
0.68)}.
(ii) For the movie ‘Muppet Treasure Island (1996)’: Family / Action / Adventure / Comedy /
Musical / Thriller: |Lj| = 6 and G = {(Family, 1), (Action, 0.31), (Adventure, 0.22),
(Comedy, 0.16), (Musical, 0.12), (Thriller, 0.09)}.
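The two examples can be checked with a short Python sketch. It assumes that equation (2) reads p / 2^sqrt(α·|Lj|·(p−1)), with the threshold written here as α = 1.2; this reading reproduces every membership value quoted in examples (i) and (ii). The function names, the name ALPHA, and the rounding to two decimals are choices made for this illustration.

```python
import math

ALPHA = 1.2  # threshold value the paper reports choosing after a number of trials


def genre_membership(p, n_genres, alpha=ALPHA):
    """Membership of the genre at rank position p (1-based) in a movie with n_genres genres."""
    if not 1 <= p <= n_genres:
        raise ValueError("rank position p must lie between 1 and n_genres")
    return p / 2 ** math.sqrt(alpha * n_genres * (p - 1))


def genre_fuzzy_set(ranked_genres, alpha=ALPHA):
    """Fuzzy set of a movie in genre space, given its genres in order of significance."""
    n = len(ranked_genres)
    return [
        (genre, round(genre_membership(p, n, alpha), 2))
        for p, genre in enumerate(ranked_genres, start=1)
    ]


bootmen = genre_fuzzy_set(["Comedy", "Drama"])
muppet = genre_fuzzy_set(
    ["Family", "Action", "Adventure", "Comedy", "Musical", "Thriller"]
)
```

Here `bootmen` evaluates to Comedy 1.0 and Drama 0.68, and `muppet` to 1.0, 0.31, 0.22, 0.16, 0.12 and 0.09, matching the values listed above.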
The membership function in equation (2) considers the total number of distinct
genres in a movie, which leads to different membership values for the same
genre at the same rank position among movies with different numbers of genres. It also
results in a normalized fuzzy set representation of a movie in the genre space, where the
maximum membership value is 1. This means that $\mu_{g_k}(m_j)$ represents the degree of
similarity of a movie mj to a hypothetical or prototype pure gk type movie.
Figure 1: Possibility distribution of genres in a movie (membership degree, from 0 to 1,
plotted against the rank position of a genre, from 1 to 7)
Similarly, the actors in a movie can be represented in a vector $A = \{a_1, a_2, \dots, a_K\}$
for K actors. The degree of role or importance of an actor $a_k$ in a movie $m_j$ can be
represented by the degree of membership associated with the fuzzy set degree of role or
importance. That is, $A_j = \{(a_k, \mu_{a_k}(m_j)), k = 1, \dots, K\}$. Similar to the membership
function defined for genres, $\mu_{a_k}(m_j)$ can be defined as
$a_{kj} = \mu_{a_k}(m_j) = \dfrac{p}{2^{\sqrt{\alpha \, |A_j| \, (p-1)}}}$    (3)
where it is represented as a function of the number of actors (|Aj|) in a movie
mj and the rank position/role (p) of an actor between 1 and K = |Aj|, and $\alpha > 1$ is the threshold
to differentiate/optimize the degree of role among actors in a movie.
The representation scheme can be generalized and applied to an item (I) with a
multi-valued feature X whose possible values overlap or are not mutually exclusive, like a
movie having one or more genres, actors/actresses, etc. That is, for $X = \{x_1, x_2, x_3, \dots, x_k\}$,
$\mu_{x_k}(I_j)$ represents the membership degree of an item $I_j$ to the hypothetical pure item with
value type $x_k$ of feature X. For example, for a book the features can be topic and author; and
for music the features can be music genre and band members.
3. Application in Ontology and the Semantic Web
Ontology is ‘a controlled vocabulary that describes objects and the relations
between them in a formal way, and has a grammar for using the vocabulary terms to
express something meaningful within a specified domain of interest.’ It uses classes to
represent concepts, and supports taxonomy and non-taxonomy relations between classes.
Current ontologies for movies do not support the representation of subjectivity and
associated uncertainty information in movie classes and attributes. For instance, it is
inappropriate to treat all genres, actors/actresses, etc. equally, as some genres and
actors/actresses may be more significant than others. Also, it is sometimes difficult to
judge whether a movie completely or to some extent belongs to a genre or not.
There are various taxonomy and non-taxonomy relations for the movie application
domain. For instance, Figure 2 shows the generic labels employed by film reviewers in
the British television listings magazine What’s On TV over several months in
1993.
Figure 2: The generic labels employed by film reviewers in the television listings
magazines
Source: http://www.aber.ac.uk/media/Documents/intgenre/intgenre6.html
The fuzzy theoretic framework described in Section 2 can be used to represent the
uncertainty information in classifying a movie object to one or more classes in Figure 2.
For example, if a movie is Romantic and Comedy with degrees of membership of 0.8 and
0.5 respectively, then the movie can be classified as Romantic comedy with a degree of
membership of SQRT(0.8*0.5) = 0.63.
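The Romantic comedy example combines two membership degrees with a geometric mean. A minimal Python sketch follows; the function name `combined_membership` and the extension from two degrees to any number of degrees are assumptions made for this illustration, since the paper shows only the two-class case.

```python
import math


def combined_membership(*degrees):
    """Geometric mean of membership degrees, e.g. SQRT(0.8 * 0.5) for two classes."""
    if not degrees:
        raise ValueError("at least one membership degree is required")
    return math.prod(degrees) ** (1.0 / len(degrees))


# Membership of a movie in the compound class "Romantic comedy".
romantic_comedy = round(combined_membership(0.8, 0.5), 2)
```

With the degrees 0.8 and 0.5 from the example, `romantic_comedy` is 0.63. A degree of 0 in either parent class yields 0 for the compound class.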
Related literature includes the Fuzzy Ontology Generation frAmework (FOGA)
(Quan, Hui et al. 2004), which is used to automatically generate a scholarly ontology for
the Semantic Web from citation databases. Membership values are used to evaluate similarities
between concepts on a concept hierarchy (Quan, Hui et al. 2004). It is based on the observation
that not all keywords are equally important in describing a document, and that it is difficult to judge
whether a document belongs completely to a research area or not. In order to support the
need for covering uncertainty in the Semantic Web context, Fuzzy RuleML is an ongoing
initiative by the Fuzzy RuleML Technical Group in W3C. Similarly, Fuzzy OWL is an
extension of OWL to handle subjectivity and associated uncertainty in the Semantic Web
(Giorgos, Giorgos et al. 2005).
4. Utility for User Preference Modeling and Personalized
Recommender Systems
There are various preference elicitation methods. The two popular traditional
methods based on decision and utility theory are the utility function and analytical hierarchy
process elicitation methods (Geisler, Ha et al. 2001; Ha and Haddawy 2003). These
methods mainly query users about the behavior of the value function, or the utility of every
outcome in terms of each decision criterion. They are time-consuming, error-prone and
require a lot of effort from users. To overcome these limitations of explicit elicitation of
preference, computer-based implicit elicitation methods have been developed.
The representation of movies in the genre and actor spaces, as introduced in previous
sections, creates opportunities to study the pattern in user preferences for movie genres.
The underlying assumption is that a user prefers items with some features over others.
Using domain analysis on an item I of interest based on its feature X, the soundness of
preference modeling based on feature X needs to be verified. Then, user preferences for X
can be inferred from users’ feedback on items. First, the items are segmented into
three groups: disliked items (DI), liked items (LI), and neutral or indifferent items (NI).
Secondly, the X composition of the items in each segment, represented as fuzzy sets, is
analyzed to determine the membership degree of each value or category of X in
DI, LI, and NI. Using an aggregation function, say the average, the aggregated memberships
of the X values of items in DI, LI, and NI are assigned to the non-preferred (NP),
preferred (P) and indifferent (I) preference classes. Those X values not in NP, P or I are
assigned to the Unknown (U) preference class. Hence, a user’s preference for a value of
X can belong to one or more of these classes with varying degrees of membership.
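The segmentation and aggregation steps described above can be sketched in Python as follows. This is an illustrative sketch only, not the algorithm under development: the rating thresholds (4 and above liked, 2 and below disliked), the choice to average over all items in a segment with absent genres counted as 0, the grade of 1.0 given to the Unknown class, and all function and variable names are assumptions.

```python
from collections import defaultdict

# Each item segment feeds exactly one preference class.
SEGMENT_TO_CLASS = {"LI": "P", "DI": "NP", "NI": "I"}


def segment_items(ratings, like_at=4, dislike_at=2):
    """Split rated items into liked (LI), disliked (DI) and neutral (NI) segments."""
    segments = {"LI": [], "DI": [], "NI": []}
    for item, rating in ratings.items():
        if rating >= like_at:
            segments["LI"].append(item)
        elif rating <= dislike_at:
            segments["DI"].append(item)
        else:
            segments["NI"].append(item)
    return segments


def preference_model(ratings, item_genres, all_genres):
    """Map each genre to its membership degrees in the classes P, NP, I and U."""
    model = {genre: {} for genre in all_genres}
    for segment, items in segment_items(ratings).items():
        if not items:
            continue
        totals = defaultdict(float)
        for item in items:
            for genre, degree in item_genres[item].items():
                totals[genre] += degree
        for genre, total in totals.items():
            # Average over every item in the segment (assumed aggregation).
            model.setdefault(genre, {})[SEGMENT_TO_CLASS[segment]] = total / len(items)
    for classes in model.values():
        if not classes:
            classes["U"] = 1.0  # genre never seen in any rated item (assumed grade)
    return model


# Hypothetical ratings on a 1-5 scale and genre fuzzy sets for three movies.
ratings = {"m1": 5, "m2": 1, "m3": 3}
item_genres = {
    "m1": {"Comedy": 1.0, "Drama": 0.68},
    "m2": {"Drama": 1.0},
    "m3": {"Comedy": 1.0},
}
model = preference_model(ratings, item_genres, ["Comedy", "Drama", "Horror"])
```

In this made-up example Drama belongs to both the preferred and the non-preferred class with different degrees, which illustrates how a genre can belong to more than one preference class.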
An algorithm for preference modeling is under development by considering
movie as I and genre G as X. Furthermore, a recommendation algorithm based on the
preference models of a user is under development.
5. Conclusion and Future Research
This paper first introduces the relationships among cybernetics, fuzzy set and
uncertainty. Secondly, it presents the fuzzy set framework that provides a means to
represent subjectivity that inherently exists in information and knowledge-based
information systems. It provides better ways to organize and represent information, thus
enhancing the modeling of complex systems. Thirdly, it presents the application of the
proposed framework for Ontology and the Semantic Web. Finally, it presents the utility
of the knowledge model for personalized recommendation for movies.
Future research will focus on extending the framework in various complex
information systems as well as evaluating the effectiveness of the framework in
personalized recommender systems.
REFERENCES
Altman, R. (1999). Film/Genre. London, British Film Institute.
Bilgiç, T. and I. B. Turksen (2000). Measurement of Membership Functions: Theoretical
and Empirical Work. Fundamentals of Fuzzy Sets. D. Dubois and H. Prade. Boston,
Kluwer Academic Publishers: 195-232.
Dubois, D., W. Ostasiewicz, et al. (2000). Fuzzy Sets: History and Basic Notions.
Fundamentals of Fuzzy Sets. D. Dubois and H. Prade. Boston, Kluwer Academic
Publishers: 21-124.
Dubois, D. and H. Prade (2000). General Introduction. Fundamentals of Fuzzy Sets. D.
Dubois and H. Prade. Boston, Kluwer Academic Publishers: 21-124.
Geisler, B., V. Ha, et al. (2001). Similarity of Personal Preferences: Theoretical
Foundations and Empirical Analysis. IUI'01, Santa Fe, New Mexico, ACM.
Giorgos, S., S. Giorgos, et al. (2005). Fuzzy OWL: Uncertainty and the Semantic Web.
OWL: Experiences and Directions, Galway, Ireland.
Ha, V. and P. Haddawy (2003). "Similarity of Personal Preferences: Theoretical
Foundations and Empirical Analysis." Artificial Intelligence 146(2): 149-173.
Klir, G. J. and M. J. Wierman (1999). Uncertainty-Based Information. Heidelberg,
Germany, Physica-Verlag.
Negoita, C. (2002). "Postmodernism, Cybernetics and Fuzzy Set Theory." Kybernetes:
The International Journal of Systems & Cybernetics 31(7-8): 1043 - 1049.
Pedrycz, W. and F. Gomide (1998). An Introduction to Fuzzy Sets. Cambridge,
Massachusetts, The MIT Press.
Quan, T. T., S. C. Hui, et al. (2004). FOGA: A Fuzzy Ontology Generation Framework
for Scholarly Semantic Web. In Proceedings of the Knowledge Discovery and
Ontologies Workshop, Pisa, Italy.
Staiger, J. (1997). "Hybrid or Inbred: The Purity Hypothesis and Hollywood Genre
History." Film Criticism. 22(1): 5-21.
Zadeh, L. A. (1978). "Fuzzy Sets as a Basis for a Theory of Possibility." Fuzzy Sets and
Systems 1(1): 3-28.
Zadeh, L. A. (1965). "Fuzzy Sets." Information and Control 8: 338-353.
Zadeh, L. A. (1994). "Fuzzy Logic, Neural Networks, and Soft Computing." Comm.
ACM 37(3): 77-84.
Zimmermann, H.-J. (1996). Fuzzy Set Theory and Its Applications. Boston, MA, Kluwer
Academic Publishers.