
Success Measures of
Accelerated Learning Agents for e-Commerce
By
Kimberly J. Ryan
BA, Mathematics 1992
Boston College
Submitted to the System Design and Management Program
In Partial Fulfillment of the Requirements for the Degree of
Master of Science in Engineering and Management
at the
Massachusetts Institute of Technology
September 1999
Signature of Author: Kimberly J. Ryan, August 23, 1999
Certified by: Dan Ariely, August 23, 1999
Accepted by: Tom Kochan, August 23, 1999
© Kimberly J. Ryan. All rights reserved.
The author hereby grants MIT the right to reproduce and distribute publicly both paper and electronic copies of this thesis document in whole or in part.
Abstract
One way to address the information explosion on the World Wide Web is to use an "information
filtering agent" or "smart agent" which can select information according to the interest and/or need
of an end-user. Two main types of smart agents are Feature Based Filtering (FBF), which builds a profile of the individual, and Collaborative Based Filtering (CBF), which builds a profile of the market segment.

With Feature Based Filtering (FBF), content is separated into classifications or attributes and presented to the user based upon a predetermined ranking of the content within that classification. To the extent that CBF is a shortcut for FBF, we suggest that FBF is the more promising approach; the results, however, should also apply to CBF.

The methods by which we attempted to separate the learning process for utility compared subjective and objective tests, single vs. multiple stimuli, and multiple correlated vs. multiple orthogonal stimuli. This work examines the success measures of various utility learning methods in an attempt to determine which method most successfully predicts user preference. The utility profiling methods examined were: attribute ranking, regression of orthogonal data, regression of correlated data, regression of orthogonal data sets, and regression of correlated data sets.

In conclusion, this work presents the conditions under which the successful methods may be used.
Table of Contents
Abstract
Table of Contents
Table of Figures and Tables
Acknowledgements
Motivation
    Feature Based Filtering
        Advantages
        Disadvantages
    Mathematical Description of the Classification Problem
    Collaborative Based Filtering
        Advantages
        Disadvantages
    Mathematical Description of the Clustering Problem
    Issues Examined
Hypotheses
    Hypothesis 1
    Hypothesis 2
    Hypothesis 3
    Hypothesis 4
Methods
    Subjects
    Procedure
        Attribute Filtering
        Correlated Stimuli
        Orthogonal Stimuli
        Orthogonal Set Stimuli
        Correlated Set Stimuli
        Final Test
    Technology Design
Results
    Description of Data Analysis
    Hypothesis 1 - Objective data will be better than subjective data
    Hypothesis 2 - Orthogonal stimuli will provide a better opportunity to learn than correlated stimuli
    Hypothesis 3 - Sets of stimuli will provide a better opportunity to learn than individual stimuli
    Hypothesis 4 - Sets of stimuli given correlated stimuli provide a better opportunity to learn than sets of stimuli given orthogonal stimuli
Discussion
References
Appendix I
Appendix II
Appendix III
Table of Figures and Tables
Figure 1 - The Classification Problem
Figure 2 - Approximate Separating Plane
Figure 3 - The Clustering Problem
Figure 3 - Image Examples with Attributes Specified
Table 1 - Description of Attribute Values for All Levels of Each Individual Attribute
Example 1 - Distance from Ideal, Subjective Preference Polling
Figure 4 - Mean Difference for Subjective and Objective Tests
Figure 5 - Mean Difference for Individual Attributes of Subjective and Objective Tests
Figure 6 - Mean Difference for Correlated and Orthogonal Tests
Figure 7 - Mean Difference for Individual Attributes of Correlated and Orthogonal Tests
Figure 8 - Mean Difference for Single Stimuli vs. Sets of Stimuli Tests
Figure 9 - Mean Difference for Individual Attributes of Single Stimuli vs. Sets of Stimuli Tests
Figure A 1 - Attribute Test, Phase 1
Figure A 2 - Correlated Test, Phase 1
Figure A 3 - Orthogonal Test, Phase 1
Figure A 4 - Orthogonal Set Test, Phase 1
Figure A 5 - Correlated Set Test, Phase 1
Figure A 6 - Final Screen
Figure A 7 - Final Test, Phase 2 (Same for All Phase 1 Tests)
Table A 1 - Correlation Matrix
Table A 2 - Orthogonal Matrix
Table A 3 - Correlated Set
Table A 4 - Orthogonal Set
Table A 5 - Final Test
Acknowledgements
There were quite a few people instrumental in my success in completing this degree. I would like to
thank Dan Ariely for his tremendous efforts in assisting me with my thesis, John Williams for giving
me the opportunity to attend MIT, the SDM program office (Anna Barkley, Margee Best, Matt
Flynn, Dan Frey, Jonathan Griffin, Leen Int Veld, Mats Nordlund) for their organizational efforts, Ely
Dahan for his assistance with setting up the design of my experiment, and William Aboujaoude and
Steve Martin for initially encouraging me to pursue this degree.
Two other people deserve special recognition for their heroic efforts. I would like to thank John
Martin for his technical and emotional support in the last phase of this degree. And I would also like
to give a special thank you to my son, Bradlee Ryan, for whom this degree was initially started and
who suffered the most in my pursuit of it.
Always have two strings to your bow.
- W. Martin
Motivation
The most important element in marketing is an intimate understanding of customer needs; yet the
markets for most products and services are extremely fragmented in today's economy. Customer
intelligence is the key to business success in an increasingly competitive environment. The only
way to keep customers happy is to meet their evolving needs, today, and, in the future. But
understanding and responding to the needs of thousands of customers is extremely difficult. A
company must be able to answer questions such as: who are my core shoppers? What are their
shopping habits? What are their preferences? Armed with the answers to these questions, every
person in the organization, from the CEO down, can make intelligent customer-centric business
decisions. The result is higher customer acquisition, retention, and a healthier bottom line.
Today's most successful online businesses are leveraging technology to do what human
employees no longer can - get to know the individual wants, needs and preferences of every
customer and dynamically personalize their offerings to each customer based on that knowledge.
The most promising way to address the information explosion is to use an "information filtering
agent" or "smart agent" which can select information according to the interest and/or need of an
end-user.
The general idea behind smart agents is that documents or products (which I will refer to as
content in this paper) are recommended based on properties or attributes and their match with the
user profile. In this work, I am trying to understand which of several methods is most successful in
learning and predicting customer preferences.
Two main types of smart agents are Feature Based Filtering (FBF), which builds a profile of the individual, and Collaborative Based Filtering (CBF), which builds a profile of the market segment.
Both FBF and CBF employ data mining algorithms in order to work properly. In each case, the
ultimate goal is to successfully predict content that satisfies the end user.
Feature Based Filtering
Filtering works by learning about each individual's preferences through observing behavior (such as click-throughs), recalling past behavior (such as purchase histories), and then asking the individual to rate a number of relevant items. With Feature Based Filtering (FBF), content is separated into classifications or attributes and presented to the user based upon a ranking of the content within that classification. An example of this would be a web page that contains meta-tags for keywords. A user polling any standard search engine for one or more of the keywords would then find that particular web page. In this example, the keywords serve as attributes, and the page's ranking in the list of returned pages would be based upon the number of times the keyword was displayed, relative to the other pages. Another example, from the commerce perspective, would be a shopper searching for a particular item: by polling on the attributes of the desired item, the shopper would be returned a list of available products that closely match the desired attributes.
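To make the matching concrete, here is a minimal sketch of attribute-based scoring, assuming items are tagged with attribute values and the profile stores a weight per value; the catalog, tags, and weights are hypothetical and not part of the thesis system.

```python
def fbf_score(item_attributes, profile_weights):
    """Sum the user's stored weight for each attribute value the item
    carries; attribute values absent from the profile contribute nothing."""
    return sum(profile_weights.get(attr, 0.0) for attr in item_attributes)

# Hypothetical catalog and learned profile, for illustration only.
catalog = {
    "sweater_cream": {"color:cream", "material:wool"},
    "sweater_navy":  {"color:navy", "material:cotton"},
}
profile = {"color:cream": 0.8, "material:wool": 0.5}

# Present items in descending order of match with the profile.
ranked = sorted(catalog, key=lambda name: fbf_score(catalog[name], profile),
                reverse=True)
print(ranked)  # ['sweater_cream', 'sweater_navy']
```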
Advantages
- New content is as likely as old content to be presented to the user, given an equal profile rating for a specific attribute.
- The ability to learn and adapt according to a specific individual's profile.
- No requirement that other users participate in the system.
- Searches are generally optimized in terms of performance for queries on the indexed attributes, and may therefore return results quicker.
- Improves the targeting of advertisements and announcements.
Disadvantages
- Classification of content may be misleading (e.g., irrelevant documents that happen to contain the keyword: a search on the keyword "Lotus" returning information about the software company rather than the car; an online catalog that labels the color of a sweater as "cream" when the user is searching for an "off-white" sweater).
- Little opportunity for serendipitous discovery (it is quite common to browse a brick-and-mortar shop for a particular item and then find a related but equally satisfying item while there, e.g., searching the shelves for a particular book title, then finding other interesting books on the same shelf and buying both).
- Requires storage of personal information, which may result in concerns over privacy.
- Storage of preferences may be economically prohibitive for large data sets.
- When there are many attributes and not much data, it may be too difficult to make an accurate recommendation.
Mathematical Description of the Classification Problem
"In classification the basic goal is to predict the most likely state of a categorical variable (the
class)." [3] The task of a classification algorithm is to estimate a function gwhich maps points from
an input space x to an output space y given a finite sampling of the mapping given only a finite set
of the mappings as shown in Figure 1.
Figure 1 - The classification problem
To formulate as a linear program, the problem is restated as estimating a classification function
which assigns a given vector x into one of the disjoint sets A or B in n-dimensional feature space.
The classification function g (x) has the following form:
xeA
9(X)= 0 if
1 if
xeB
All the m elements of the finite point set A c R" (n-dimensional real space) are represented as
m
the matrix Ae R'x.
Where each element of A is represented by a row in A. Similarly, the k
elements of finite point set B as B e R' xn
The goal is to distinguish between the points of A and B by constructing a separating plane P such
that
P={xIxe Rn,xT o=y},
9
With the normal o e R and distance
to the origin. The requirement is to determine o and
y so that the separating plane P defines two open half spaces { x | x E Rn, xj( > y} containing
nT
mostly points of A, and {x Ix e R",
x o < y} containing mostly points of B.
Where
|ll
12
i
=
)
12)1/2
j=1
Hence we wish to satisfy
Ao > ey, Bo < ey
[I
as far as possible. This can be done only if convex hulls of A and B are disjoint. Under this
assumption 1 can be satisfied in some best sense by minimizing some norm of the average
violations of 1 such as
Figure 2 - Approximate separating plane
7
| 1wo||2
0
0
0
0
0
f(w, r) =min
mjn
CO,,m
w,,
10
(-Aa)+ ey + e)I1+ k-||1(Ba) -ey +e)|1
[2]
The linear programming formulation 2 obtains an approximate separating plane that minimizes a
weighted sum of the distances of misclassified points to the approximate separating plane. For the
non-convex problem of minimizing such misclassified points.
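Formulation (2) becomes an ordinary linear program once nonnegative slack variables are introduced for the violations of (1). The sketch below solves it with SciPy's linprog; this is an illustrative implementation choice, not software used in the thesis.

```python
import numpy as np
from scipy.optimize import linprog

def separating_plane(A, B):
    """Solve formulation (2): find (w, gamma) such that A w > gamma and
    B w < gamma are violated as little as possible on average."""
    m, n = A.shape
    k = B.shape[0]
    # Decision vector x = [w (n), gamma (1), y (m), z (k)], where y and z
    # are the nonnegative violations of the two inequalities in (1).
    c = np.concatenate([np.zeros(n + 1),
                        np.full(m, 1.0 / m), np.full(k, 1.0 / k)])
    # A w - gamma e + y >= e   rewritten as  -A w + gamma e - y <= -e
    top = np.hstack([-A, np.ones((m, 1)), -np.eye(m), np.zeros((m, k))])
    # -B w + gamma e + z >= e  rewritten as   B w - gamma e - z <= -e
    bot = np.hstack([B, -np.ones((k, 1)), np.zeros((k, m)), -np.eye(k)])
    res = linprog(c, A_ub=np.vstack([top, bot]), b_ub=-np.ones(m + k),
                  bounds=[(None, None)] * (n + 1) + [(0, None)] * (m + k),
                  method="highs")
    return res.x[:n], res.x[n]  # (w, gamma)

# Smoke test on two separable point clouds.
rng = np.random.default_rng(0)
w, gamma = separating_plane(rng.normal(2.0, 1.0, (30, 2)),
                            rng.normal(-2.0, 1.0, (40, 2)))
```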
Collaborative Based Filtering
Collaborative Filtering works by learning about each individual's preferences through observing behavior (such as click-throughs), recalling past behavior (such as purchase histories), and then asking the individual to rate a number of relevant items. The technology then pools this information with knowledge gained from a community of other individuals who share similar tastes and interests into a cluster. Individuals are then grouped within their cluster so that the agent can make predictions. Because the collective preferences of a community are a predictor of how an individual in that community might like items he/she has not yet tried, the technology draws upon this knowledge to make recommendations with high predictive accuracy, although how high the accuracy actually is remains unknown.
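A minimal sketch of the pooling idea follows: predict one user's rating of an unseen item as a similarity-weighted average over users who have rated it. The Pearson-correlation similarity and the weighting scheme are common textbook choices, assumptions for illustration rather than the specific technology described here.

```python
import numpy as np

def cbf_predict(R, user, item):
    """R: (n_users, n_items) rating matrix with np.nan for unrated cells.
    Returns a prediction of R[user, item] from similar users' ratings."""
    weights, values = [], []
    for other in range(len(R)):
        if other == user or np.isnan(R[other, item]):
            continue
        both = ~np.isnan(R[user]) & ~np.isnan(R[other])  # co-rated items
        if both.sum() < 2:
            continue
        a, b = R[user, both], R[other, both]
        if a.std() == 0 or b.std() == 0:
            continue                       # correlation undefined
        sim = np.corrcoef(a, b)[0, 1]
        if sim > 0:                        # pool only like-minded users
            weights.append(sim)
            values.append(R[other, item])
    return np.average(values, weights=weights) if weights else np.nan
```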
The Massachusetts Institute of Technology's World Wide Web Consortium has developed an
interesting example of collaborative filtering. They have developed a set of technical standards
called PICS (Platform for Internet Content Selection) so that people can electronically distribute
descriptions of digital works in a simple, computer-readable form. Computers can process these
labels in the background, automatically shielding users from undesirable material or directing their
attention to sites of particular interest. The original impetus for PICS was to allow parents and
teachers to screen materials they felt were inappropriate for children using the Internet. Rather than
censoring what is distributed, as the Communications Decency Act and other legislative initiatives
have tried to do, PICS enables users to control what they receive.
Advantages
- Enhances knowledge distribution amongst communities of like-minded people.
- Facilitates the creation of interest groups.
- Recommendations are based upon the quality of content (as determined by the group) rather than the objective properties of the content itself.
- Content need not be amenable to parsing by the computer.
- Learns quickly.
- Not limited by the number of attributes and missing data, i.e., the demands on the data are lower.

The mathematical program is described in detail in [3].
Disadvantages
"
Recommendations require a large community of users in order to be accurate
"
Recommendations are limited by the group norms and will not be tailored for a
specific individual as they rely on the fact that tastes are not randomly distributed, but
rather form general trends and patterns.
-
Sensitive to the individual's distance from the mean of the duster
=
Slow to recommend new products
=
Little is learned about individual consumers, which limits targeting of advertising to the
interest group
-
Requires storage of personal information, which may result in concerns over privacy
"
Requires the sharing of information, which is an even larger concern for privacy
-
Storage of preference may be economically prohibitive for large data sets
Mathematical Description of the Clustering Problem
Clustering deals with the problem of assigning elements of a given set to groups or clusters of like points. The problem is formulated as a non-hierarchical clustering approach where the number of clusters or groups is fixed a priori.

Given a set $A$ of $m$ points in $R^n$, represented by the matrix $A \in R^{m \times n}$, and a number $k$ of desired clusters, we need to determine centers $C_l$, $l = 1, \ldots, k$, in $R^n$ such that the sum of the minima over $l \in \{1, \ldots, k\}$ of the 1-norm distance between each point $A_i$, $i = 1, \ldots, m$, and the cluster centers $C_l$ is minimized.

Figure 3 - The clustering problem [figure not reproduced; it depicts points grouped around cluster centers by distance]

Given $m$ points $\{x^1, x^2, \ldots, x^m\}$ in $n$-dimensional real space $R^n$ and a fixed integer $k$ of clusters, determine $k$ centers $\{c^1, c^2, \ldots, c^k\}$ in $R^n$ such that the sum of the "distances" of each point to a nearest center is minimized. The clustering problem is

$$\min_{c^1, \ldots, c^k} \; \sum_{i=1}^{m} \; \min_{l = 1, \ldots, k} \big\| x^i - c^l \big\|,$$

where $\|\cdot\|$ is some arbitrary norm on $R^n$. Using the 1-norm, the clustering problem is formulated as the bilinear program

$$\begin{aligned} \underset{c,\, d,\, t}{\text{minimize}} \quad & \sum_{i=1}^{m} \sum_{l=1}^{k} t_{il} \, (e^T d_{il}) \\ \text{subject to} \quad & -d_{il} \le x^i - c^l \le d_{il}, \quad i = 1, \ldots, m, \; l = 1, \ldots, k, \\ & \sum_{l=1}^{k} t_{il} = 1, \quad t_{il} \ge 0, \quad i = 1, \ldots, m. \end{aligned}$$
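The bilinear program is attacked in [3] by alternating between the assignment variables t and the centers c. A minimal sketch of that alternating scheme is given below; it exploits the fact that the 1-norm minimizer of a cluster is its coordinatewise median.

```python
import numpy as np

def k_median(X, k, iters=100, seed=0):
    """X: (m, n) float array of points; returns k centers and labels."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assign each point to its nearest center in the 1-norm.
        dist = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each center to the coordinatewise median of its cluster.
        new_centers = centers.copy()
        for l in range(k):
            members = X[labels == l]
            if len(members):
                new_centers[l] = np.median(members, axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```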
Issues Examined
Smart agents have two distinct life stages: that of learning and that of performance. In the case of
real-time adaptive agents, learning may continue into the performance stage so that the agent
becomes "smarter" with each use. The learning process is important because itis the limiting factor
by which we are able to understand the customer. Unless one listens to the voice of our customer,
one can never expect customer pull for products and will always be working on the product push
model of marketing. This work examines the leaming process.
Learning may take place on many dimensions. In this work, I have studied the following aspects of the data in order to determine which dimensions allow for the greatest predictive accuracy: orthogonality vs. correlation of attributes for single stimuli, orthogonality vs. correlation within a set of stimuli, and subjective vs. objective preference polling.
I first look along the dimension of objective vs. subjective polling to determine which is more successful in predicting preference. It may be the case that when polled about attributes and their relative importance, subjects are not as truthful or as knowledgeable about their expected preferences as they could be. I want to determine whether the methods that infer a preference achieve greater predictive accuracy than the methods that explicitly ask for a preference. This tests the dimension of subjective vs. objective learning.
Within the methods that infer a response, the tests are separated into those that show correlated stimuli and those that show orthogonal stimuli. This test will determine whether orthogonal stimuli provide more information about the subject than correlated stimuli, giving a measure of success for single stimuli separated along the orthogonal vs. correlated dimension.
Another test of this experiment includes sets of data where the stimuli are either correlated or
orthogonal within each set. I will attempt to determine if the set of correlated images provides more
information than the set of orthogonal images, testing the dimension of sets of data.
This paper will attempt to compare the success measures of preference prediction across these dimensions of utility profiling. The utility profiling methods examined were: attribute ranking, regression of orthogonal data, regression of correlated data, regression of orthogonal data sets, and regression of correlated data sets. The regressions of orthogonal and correlated data were single-stimulus displays, while the regressions of orthogonal and correlated data sets were multiple-stimulus displays; the attribute ranking method was a subjective preference poll.
To expand on the above, for both the correlated and orthogonal single stimuli, an individual-level regression model (main effects only) was fitted. For the correlated and orthogonal set stimuli, two models were developed: one was exactly the same as that for the single stimuli, using the rating as the dependent variable regressed on the 7 attributes; the other looked at the choice the subject made amongst the alternatives. For the subjective preference polling, several models were developed: one looked at the weightings of the attributes, another at the ideal stimuli chosen, and a third combined the weightings and the ideal stimuli. A detailed explanation of the models developed can be found in the Results section of this paper.
These methods may be used within the context of either an FBF or a CBF to further enhance its accuracy. Here I will only look at the FBF case, although the results should also hold for the CBF case, as CBF is a special case of FBF.
Hypotheses
I expect that objective preference polling, where the individual's preferences are inferred from his/her responses rather than explicitly stated along with personal utility weightings on features, will have the greatest success in returning preferred images. Further, I expect that the orthogonal data will provide better predictive accuracy than the correlated data. It is also anticipated that the sets of images will allow a greater amount of learning than the individual images. Finally, it may be the case that the set given correlated images provides more information than the set given the orthogonal images. Explicitly stated:
Hypothesis 1
Objective data will be better than subjective data. This will be tested by comparing attribute rating
vs. correlated/orthogonal for both single stimuli and sets of stimuli.
Hypothesis 2
Orthogonal stimuli will provide a better opportunity to learn than correlated stimuli. Comparing single orthogonal stimuli and sets of orthogonal stimuli vs. single correlated stimuli and sets of correlated stimuli will test this.
Hypothesis 3
Sets of stimuli will provide a better opportunity to learn than individual stimuli. Comparing orthogonal and correlated single stimuli vs. orthogonal and correlated sets of stimuli will test this.
Hypothesis 4
Sets of stimuli given correlated stimuli provide a better opportunity to learn than sets of stimuli given orthogonal stimuli. Comparing orthogonal sets of stimuli vs. correlated sets of stimuli will test this.
Methods
Subjects
An e-mail was sent to the Massachusetts Institute of Technology's System Design and
Management (SDM) classes of 1997, 1998, and 1999, requesting participation in the data gathering of this experiment (See Appendix I). The general demographics of the subject set included roughly
70% males who were in the age range of 30-40. Respondents were mainly from the US and
currently reside within the US. Respondents typically had 7-15 years of professional experience as
engineers and at least one post-graduate degree beyond the bachelor's degree. There were
approximately 65 respondents who began the experiment; 93% of them completed both tasks.
Procedure
Respondents pointed their web browser to a web site where they were given the same set of
instructions that were included in the e-mail requesting their participation (See Appendix I). In this introduction stage, I explained to the test subjects that they would be asked to view a series of images and to state their preference for those images. All participants were shown the examples in Figure 3 in order to familiarize themselves with the types of images that they would be shown.
Figure 3 - Image Examples with Attributes Specified [images not reproduced]

Example image 1: High Density; Blue/Green; High Pointalization; High Saturation; Dark; No Blur; Black Background.
Example image 2: Medium Density; Purples; Medium Pointalization; Medium Saturation; Light; Blur; Grey Background.
Once they began the experiment, they were given one of five tasks selected at random based
upon the modulus of the time at which they first began the experiment. Each of the five tasks is
explained in greater detail below.
Attribute Filtering
Participants in the Attribute Filtering experiment were shown a table (See Figure A 1) and asked to
select their preference amongst the categories. They were also asked to divide 100 points
amongst the categories in order to place weightings on highly valued characteristics vs. less valued
characteristics. Once this task was completed, the participants began Phase 2 of the experiment,
which is described below in the Final Test section.
Correlated Stimuli
Images displayed to the test subjects for this experiment were chosen such that they would be
correlated to each other. A test matrix was built, checking the correlation of each attribute to the
other and 20 images were selected such that the correlation between the attributes was as close to
one as possible (for specific numbers on the correlation between attributes, see Appendix III). The respondents were asked to rate their preference for each of twenty images in Phase 1 of the
experiment. They were told to select a number between -100 and +100; where -100 meant that
they really disliked the image, 0 meant that they were indifferent and +100 meant that they really
liked the image (See Figure A 2). Once this task was completed, the participants began Phase 2
of the experiment, which is described below in the Final Test section.
Orthogonal Stimuli
Images displayed to the test subjects for this experiment were chosen such that they would be
orthogonal to each other. A test matrix was built, checking the orthogonality of each attribute to the
other and 20 images were selected such that the correlation between the attributes was as close to
zero as possible (for specific numbers on the orthogonality between attributes, see Appendix III). The respondents were asked to rate their preference for each of twenty images in Phase 1 of the
experiment. They were told to select a number between -100 and +100; where -100 meant that
they really disliked the image, 0 meant that they were indifferent and +100 meant that they really
liked the image (See Figure A 3). Once this task was completed, the participants began Phase 2 of
the experiment, which is described below in the Final Test section.
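The thesis does not detail the tool used to build these test matrices, but the underlying check is a correlation computation over the attribute columns of a candidate set. A sketch, assuming attribute levels are coded numerically as in the image filenames of Appendix III:

```python
import numpy as np

def attribute_correlations(stimuli):
    """stimuli: one row per image, one column per attribute level.
    Returns the pairwise attribute correlations and their mean |r|;
    the correlated condition sought values near one, the orthogonal
    condition values near zero."""
    X = np.asarray(stimuli, dtype=float)
    C = np.corrcoef(X, rowvar=False)          # 7 x 7 for seven attributes
    pairs = C[np.triu_indices_from(C, k=1)]   # the 21 distinct pairs
    return pairs, np.abs(pairs).mean()
```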
Orthogonal Set Stimuli
Images were displayed in sets of four to the test subjects for this experiment. Twenty sets, for a
total of eighty images, were displayed, where each image was chosen such that it would be orthogonal to the other images in the set. Image 1 from each set came from the orthogonal matrix. The respondents were asked to select the individual image within each set that they preferred the most and then to rate their preference for that image in Phase 1 of the experiment. They were told to select a number between -100 and +100; where -100 meant that they really disliked the image, 0 meant that they were indifferent and +100 meant that they really liked the image (See Figure A 4).
Once this task was completed, the participants began Phase 2 of the experiment, which is
described below in the Final Test section.
Correlated Set Stimuli
Images were displayed in sets of four to the test subjects for this experiment. Twenty sets, for a total of eighty images, were displayed, where each image was chosen such that it would be correlated to the other images in the set. Image 1 from each set came from the orthogonal matrix. The respondents were asked to select the individual image within each set that they preferred the most and then to rate their preference for that image in Phase 1 of the experiment. They were told to select a number between -100 and +100; where -100 meant that they really disliked the image, 0 meant that they were indifferent and +100 meant that they really liked the image (See Figure A 5). Once
this task was completed, the participants began Phase 2 of the experiment, which is described
below in the Final Test section.
Final Test
Regardless of the utility examination given, each participant was asked to rate their preference for
forty images in Phase 2 of the experiment. Each image was shown one at a time, similar to the
display of information for the orthogonal and correlated stimuli. They were told to select a
number between -100 and +100; where -100 meant that they really disliked the image, 0 meant
that they were indifferent and +100 meant that they really liked the image (See Figure A 7).
Technology Design
A database was created using Lotus Domino R5 that was enabled for the Web. The database
application was built to support both Internet Explorer 3.0 and above and Netscape 4.0 and above.
It employed JavaScript and Java so that image display would be rapid after initially loading the
images. One design trade-off was that the initial loading of the images would take longer using this
method, but the overall performance would be better. This was done to reduce participant
frustration as the experiment wore on, and to hopefully produce a higher completion rate.
Images were built containing one classification from each of seven attributes.
The attributes were:
1) Sizes of Circles
2) Pointalization
3) Light/Dark
4) Saturation
5) Color
6) Density
7) Sharpness
Each attribute had between two and four levels, as displayed in Table 1, resulting in 1944 possible variations of the images. These images served as the stimuli by which we elicited preference ratings from the test subjects. It is presumed that the attributes of the images all represent independent variables.
Table 1 - Description of Attribute Values for All Levels of Each Individual Attribute.

Attribute       Level 1         Level 2     Level 3         Level 4
Density         X3              X2          X1              --
Color           Blues, Greens   Purples     Reds, Orange    --
Pointalized     5               15          50              --
Saturation      50              0           -50             --
Light/Dark      50              25          0               -25
Motion Blur     0               10          --              --
Background      Black           Gray        White           --
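As a check on the arithmetic, the 1944 figure follows directly from enumerating the levels in Table 1; the dictionary keys below are shorthand for the table's rows.

```python
from itertools import product

# Levels per attribute, transcribed from Table 1.
levels = {
    "density":     ["X3", "X2", "X1"],
    "color":       ["blues/greens", "purples", "reds/orange"],
    "pointalized": [5, 15, 50],
    "saturation":  [50, 0, -50],
    "light_dark":  [50, 25, 0, -25],
    "motion_blur": [0, 10],
    "background":  ["black", "gray", "white"],
}

variants = list(product(*levels.values()))
print(len(variants))  # 3*3*3*3*4*2*3 = 1944 possible images
```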
Results
Description of Data Analysis
In stage 1, for both the correlated and orthogonal single stimuli, an individual-level regression model (main effects only) was fitted by regressing ratings (DV) on the values of the stimuli on the 7 attributes (IV). I then used the seven regression coefficients (one for each attribute) as a measure of the weight of that attribute on the overall evaluation (I also used the intercept).
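A sketch of this per-subject fit using ordinary least squares (the thesis does not give the estimation code, so this is illustrative):

```python
import numpy as np

def attribute_weights(X, ratings):
    """X: (n_stimuli, 7) attribute levels shown to one subject;
    ratings: (n_stimuli,) that subject's responses in [-100, 100].
    Returns (intercept, seven attribute coefficients)."""
    Xd = np.column_stack([np.ones(len(X)), X])   # prepend intercept column
    beta, *_ = np.linalg.lstsq(Xd, ratings, rcond=None)
    return beta[0], beta[1:]
```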
For the correlated and orthogonal set stimuli, two models were developed. One model was exactly
the same as that for the single stimuli, using the rating as the dependent variable and regressed on
values of stimuli on the 7 attributes as the independent variables. The other model took all 80
stimuli and noted whether the test subject chose it or not and regressed these choice/no-choice
measures on the values of stimuli on the 7 attributes. Because the dependent measure in this
case was binary, I used logistic rather than linear regression.
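A corresponding sketch of the choice model; scikit-learn's LogisticRegression is an implementation choice for illustration only.

```python
from sklearn.linear_model import LogisticRegression

def choice_weights(X, chosen):
    """X: (80, 7) attribute levels of all set stimuli shown to one subject;
    chosen: (80,) binary indicator of whether each stimulus was picked.
    Returns (intercept, seven attribute coefficients)."""
    model = LogisticRegression(max_iter=1000).fit(X, chosen)
    return model.intercept_[0], model.coef_[0]
```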
For the subjective preference polling, several models were developed. Model 1 was simply the weights given by the test subject; Model 2 took the absolute difference between the ideal stimulus as chosen by the test subject and the presented stimulus. Specifically, the values for each of the attributes were compared between the ideal stimulus (as chosen by each subject) and the given stimulus, and the absolute value of the difference was taken for each individual attribute to give a measure of how far from the ideal each attribute of the given stimulus was (see Example 1). Model 3 combined the two previous models by multiplying the attribute weighting by the distance from the ideal each attribute had for the given stimulus.
Example 1 - Distance from Ideal, Subjective Preference Polling

Ideal Stimulus    Given Stimulus    Distance from Ideal
3111221           1111111           2000110
2222222           1222213           1000011
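A sketch of Models 2 and 3 using the attribute coding of Example 1:

```python
import numpy as np

def distance_from_ideal(ideal, given):
    # Model 2: per-attribute absolute distance from the stated ideal.
    return np.abs(np.asarray(ideal) - np.asarray(given))

def weighted_distance(weights, ideal, given):
    # Model 3: stated attribute weights times the distance from ideal.
    return np.asarray(weights) * distance_from_ideal(ideal, given)

print(distance_from_ideal([3, 1, 1, 1, 2, 2, 1], [1, 1, 1, 1, 1, 1, 1]))
# -> [2 0 0 0 1 1 0], matching the first row of Example 1
```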
In stage 2, an individual-level regression model (main effects only) was fitted: the ratings were taken as the dependent variable and regressed on the 7 attributes. I then used the regression coefficients (one for each of the 7 attributes) as a measure of the weight of that attribute on the overall evaluation (I also used the intercept). It is important that the reader understand that in all cases the models were estimated for each individual, and that the parameters of the models were later used as the basis for the analysis.
All of this created the dataset. Next, I began to look into the data set resulting from the statistical analysis.

I constructed a measure of the difference between the intercepts of task 1 and task 2. This measure captures the overall bias of the experiment. Overall, the mean difference between task 1 and task 2 was negative for every test. This is statistically significant and may be interpreted to mean that people discriminated less as the exam wore on. It may also mean that there are limits on people's patience. Despite this, I was able to support several of my hypotheses through the analysis of the data.
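A sketch of this between-task comparison; the array layout (one row of parameter estimates per subject) is an assumption for illustration.

```python
import numpy as np

def parameter_shift(task1_params, task2_params):
    """Each argument: (n_subjects, 8) array holding the intercept plus the
    seven attribute coefficients estimated per subject in that task.
    Returns the mean per-parameter difference between the tasks; in the
    thesis data this difference was negative for every test."""
    diff = np.asarray(task1_params) - np.asarray(task2_params)
    return diff.mean(axis=0)
```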
Hypothesis 1 - Assertion that objective data will be better than subjective data.
To perform this analysis, I separated the data along the lines of the objective and subjective tests.
Specifically, I looked at the means of the three types of subjective tests and the mean of the three
models of objective tests. The mean came from the difference between task 1 and 2 based upon
the regression coefficients developed.
Looking at the means derived from the regressions of the two tasks and taking the difference between the means of task 1 and task 2, I was able to determine that the subjective preference polling fared worse than the objective tests. This supports my assertion in Hypothesis 1. That is, those subjects who explicitly stated their preference for specific attributes ultimately did not choose stimuli along those same weighted attributes in the final test. In other words, the test subjects initially placed a high value on certain attributes, only to ignore images with those attributes in light of viewing the stimuli as a whole.
Figure 4 - Mean Difference for Subjective and Objective Tests [chart not reproduced]

Figure 5 - Mean Difference for Individual Attributes of Subjective and Objective Tests [chart not reproduced; it plots the Objective and Subjective series across the attributes Density, Color, Pointalized, Saturation, Light/Dark, Motion Blur, and Background]
Inferring preference through observation of past behavior overall did much better than explicit statements by the test subjects; furthermore, the information inferred from single orthogonal stimuli is far superior to that from any other method. The results of plotting the differences in the means can be viewed in Figure 4. The results of plotting the values for the specific attributes can be found in Figure 5.
Hypothesis 2 - Orthogonal stimuli will provide a better opportunity to learn than correlated stimuli.

Next, when evaluating the statement of Hypothesis 2, I separated the data along the lines of orthogonality and correlation. As shown in Figure 6, the mean of the correlated data was always further from zero than that of the orthogonal data, except in the case of the multiple stimuli. That is to say, the data do support Hypothesis 2, which asserts that orthogonal data will provide a better opportunity to learn utility than correlated data.
Figure 6 - Mean Difference for Correlated and Orthogonal Tests [chart not reproduced; series: Correlated Choice, Correlated Multiple, Correlated Single, Orthogonal Choice, Orthogonal Multiple, Orthogonal Single]
The results of plotting the values for the specific attributes can be found in Figure 7. Clearly the
orthogonal single test has the least variance on any attribute.
Figure 7 - Mean Difference for Individual Attributes of Correlated and Orthogonal Tests [chart not reproduced; the legible data labels, in the attribute order Density, Color, Pointalized, Saturation, Light/Dark, Motion Blur, Background, are Correlated Single: -2.691, 39.802, -29.69, -16.312, 3.796, -15.634, 2.984, and Orthogonal Single: -2.982, 2.556, 4.951, -1.116, 2.314, -0.306, 0.863]
Hypothesis 3 - Sets of stimuli will provide a better opportunity to learn than individual stimuli.

The data from my experiment do not support this hypothesis; in fact, they support the opposite result, that single stimuli provide more opportunity to learn than sets of stimuli. An interesting observation is that there is great variation in the single-stimulus tests, whereas there is little variation in the multiple-stimulus tests. This may be interpreted to mean that the subjects tended to give a relative rating when they saw the multiple stimuli and an absolute rating when they saw the single stimuli. From this one can draw the conclusion that how a response is elicited determines the result. As shown in Figure 8, the orthogonal single-stimulus test provides so much information that it overwhelms all other data. The results of plotting the values for the specific attributes can be found in Figure 9.
Figure 8 - Mean Difference for Single Stimuli vs. Sets of Stimuli Tests [chart not reproduced; series: Correlated Single, Correlated Set, Orthogonal Single, Orthogonal Set]
Figure 9 - Mean Difference for Individual Attributes of Single Stimuli vs. Sets of Stimuli Tests [chart not reproduced; the legible data labels, in the attribute order Density, Color, Pointalized, Saturation, Light/Dark, Motion Blur, Background, are Single: -2.8365, 21.179, -12.3695, -8.714, 3.055, -7.97, 1.9235, and Set: 0.905, -3.7875, -7.254, -9.939, 5.8405, -32.556, -2.514]
Hypothesis 4 - Sets of stimuli given correlated stimuli provide a better opportunity to learn than sets of stimuli given orthogonal stimuli.

Referring back to Figures 6 and 7, one can clearly see that the multiple correlated stimuli produced a mean closer to zero than the multiple orthogonal stimuli, although the difference was not statistically significant. Unfortunately, this means that I cannot support the claim made in Hypothesis 4.
Discussion
The fact that subjective preference polling fared worse than the objective tests can be explained in
several ways. Test subjects may have been unwilling to share truthful information about their actual
preference. This is a known and widespread problem for electronic data gathering, where
individuals are concerned over issues of privacy and security. It is interesting that subjects who
participated in the objective conditions were willing to share veridical information, which perhaps
can be explained by the fact that they unwittingly gave their information rather than being asked
outright.
The implications of this difference in the sharing of truthful information are especially critical at the beginning of the sales or new-product-requirements cycle, when the merchant is attempting to learn about the customer's preferences. Through personal observation in my work as a software product manager, I have found that people are more willing to talk about themselves and their work than they are to answer specific questions. By developing relationships with customers and learning about how they do their daily work, I have been much more successful in developing product requirements than through survey methods. The results of this experiment confirm this observation.
Another reason that subjective preference polling may have been less successful is that customers, who are used to making holistic decisions, were simply unable to differentiate on the individual attributes. That is, test subjects may have placed a high value on an individual attribute only to find that it was not as important in light of all the attributes together. How often has one determined that price was the most important attribute, only to find oneself willing to pay more for a product that was slightly more expensive but far superior in other attributes than the least expensive model? While it is not statistically significant, several test subjects remarked that Attribute 6 (Motion Blur) was a determining factor for them; in addition, the difference data from stage 1 to stage 2 on this attribute were very high for the subjects in the subjective condition (See Figure 5). However, the results showed that overall this attribute alone did not determine preference.
Based upon these results, I would recommend that any site attempting to gain the intimate understanding of customer needs previously discussed as critical for making customer-centric business decisions follow a method which infers customer preference rather than asking outright for information.
Correlated data has an inherent problem: it does not provide a large variation of attributes on which to make distinctions. To the extent that similar sets of data may have large amounts of data on one attribute and very little on another, one can easily see that correlated data does not provide a complete picture of the attribute universe. Due to the greater variety of stimuli within orthogonal data, more information can be inferred with less training data.
One observation is that smart agents that attempt to infer utility will become "smarter" more quickly if they look at orthogonal rather than correlated data. Another important observation is that not all data about a user needs to be saved. Clearly, it would be sufficient to save disparate data and discard old data which is similar to the new. This would address one disadvantage previously mentioned: that large data sets are prohibitive to maintain.
An example would be an agent that records my musical preferences. As I order music from different genres and artists, the agent is able to learn about my tastes along many attributes. However, as I order music by the same artist, the agent is only able to determine that I like that particular artist more than others; that is, it only learns about me on a small number of attributes. Given that there are more attributes in the set that contains all of the artists and genres that I have previously enjoyed, it will be easier for the agent to determine my relative preference for a new album if the agent has information which spans the entire orthogonal set of my past purchases. It is not necessary to maintain the information that I have ordered multiple albums from the same artist, as this provides little new information.
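A sketch of such a pruning rule; the cosine-similarity measure and the 0.9 threshold are illustrative assumptions, not part of the thesis experiments.

```python
import numpy as np

def keep_if_novel(stored, candidate, threshold=0.9):
    """Append `candidate` (an attribute vector) to `stored` only if it is
    not too similar to anything already kept; otherwise discard it."""
    v = np.asarray(candidate, dtype=float)
    for s in stored:
        s = np.asarray(s, dtype=float)
        sim = (v @ s) / (np.linalg.norm(v) * np.linalg.norm(s) + 1e-12)
        if sim > threshold:
            return stored           # redundant: little new information
    return stored + [candidate]    # disparate: worth keeping
```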
When looking across the dimension of sets vs. single stimuli, my data showed that the single stimuli provided a better result. This may mean that single stimuli are inherently better, especially in the case of the orthogonal single stimuli, or it may mean something more. It may mean that there is something inherently different between choosing amongst several alternatives (already having made the purchase decision) and choosing a product in itself.
There was little variation between the orthogonal and correlated sets of data, which means that given multiple stimuli it may not matter whether the choice is between similar or completely different products, only that the purchase decision has been made. Clearly, an understanding of the customer is required to determine how best to elicit a response.
In the case of decision processes that are concerned with choosing a product in and of itself, the current work would recommend inferring the values of customer preference for various attributes from orthogonal data sets. Examples of products in this category would be electronic equipment, luxury items, and recreational purchases, as each of these types of products is sold based upon the decision to have it or not.
In the case of decision processes where the choice is made from amongst alternatives, that is, where the purchase decision has been made and the consumer is now choosing between brands or models, the current work does not have a recommendation, since the standard used (task 2) was the rating of single stimuli. Based on the analysis of the decision process, I would recommend doing something comparative, as information derived from orthogonal stimuli provides the largest amount of information with the lowest amount of data.
My motivation for beginning this work was to determine which method of utility profiling would be most successful in determining customer preference. While I have demonstrated that there are differences in the measures of success amongst the methods, I have also come to the conclusion that understanding who the customer is and where they are in their purchase decision is required prior to selecting a method. I am able to conclude that if your customer is choosing a product in and of itself, then single orthogonal stimuli from which a preference is inferred have the highest measure of success.
Next Steps
One of the conclusions that I have drawn is that the single stimuli had the highest measure of success. This may have been because task 2 itself used single stimuli. To prove that the central issue is the match between the elicitation mode and the criterion, it is recommended to repeat the entire experiment with task 2 as a set. If in the next experiment the results are the same, the conclusion will be that the single orthogonal method is superior overall; however, if in the next experiment the multiple method is superior, the conclusion will be that it is the match between stage 1 and stage 2 that is crucial.
Another future step might be to develop a smart agent that utilizes these methods and, in task 2, suggests stimuli based upon task 1. One could then test in real time other issues, including performance, magnitude of error, number of errors, distance from the ideal, and user satisfaction with the algorithm.
References
1. Special Issue on Information Filtering, Communications of the ACM, Vol. 35, No. 12, December 1992.
2. Barrett, R., Maglio, P. P., & Kellem, D. C. (1997). How to personalize the web. Proceedings of the Conference on Human Factors in Computing Systems (CHI '97). New York: ACM Press.
3. Bradley, P. S., Fayyad, U. M., & Mangasarian, O. L. Data Mining: Overview and Optimization Opportunities.
4. Clarke, R. "Extra-Organisational Systems: A Challenge to the Software Engineering Paradigm," Proceedings, IFIP World Congress, Madrid, September 1992.
5. Deerwester, S., Dumais, S., Furnas, G., Landauer, T., & Harshman, R. "Indexing by Latent Semantic Analysis," Journal of the American Society for Information Science, Vol. 41, No. 1, 1990, pp. 391-407.
6. Fox, A., & Brewer, E. A. (1996). Reducing WWW latency and bandwidth requirements by real-time distillation. Proceedings of the Fifth International World-Wide Web Conference.
7. Goldberg, D., Nichols, D., Oki, B., & Terry, D. "Using Collaborative Filtering to Weave an Information Tapestry," Communications of the ACM, Vol. 35, No. 12, December 1992, pp. 61-70.
8. Grosof, B. N., Levine, D. W., Chan, H. Y., Parris, C. J., & Auerbach, J. S. "Reusable Architecture for Embedding Rule-Based Intelligence in Information Agents" (Dec. 1, 1995). Proceedings of the Workshop on Intelligent Information Agents, at the ACM Conference on Information and Knowledge Management (CIKM-95), edited by Tim Finin and James Mayfield. Baltimore, MD, USA, Dec. 1-2, 1995.
9. Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., & Riedl, J. "GroupLens: An Open Architecture for Collaborative Filtering of Netnews," Proceedings of the CSCW 1994 Conference, October 1994.
10. Resnick, P. "Filtering Information on the Internet," Scientific American, March 1997.
11. Rich, E. "User Modeling via Stereotypes," Cognitive Science, Vol. 3, pp. 335-366, 1979.
12. Rosenfeld, L. B., & Holland, M. P. "Automated Filtering," Online, May 1994.
13. Rosenthal, R., & Rosnow, R. Essentials of Behavioral Research: Methods and Data Analysis, McGraw-Hill, second edition, 1991.
14. Ryan, K., Klosterman, S., & Patil, S. "Optimization in Data Mining." Prepared for Prof. Tom Magnanti, Spring 1998.
15. Shardanand, U. "Social Information Filtering for Music Recommendation," MIT EECS M.Eng. Thesis; also TR-94-04, Learning and Common Sense Group, MIT Media Laboratory, 1994.
Appendix I
The following instructions were sent to the subjects requesting their assistance with data collection.
Instructions:
In this study, you will be shown a series of images and asked to state your preference for each
image.
The images will vary on seven attributes (dimensions).
- Density - Describes the number of circles in an image.
- Color Family - Describes the hue of the circles.
- Pointalization - Describes the size of the points that make up the individual circles.
- Saturation - Describes the strength of the color within the circles.
- Brightness - Describes the amount of light in the circles themselves.
- Blur - Describes the crispness of the circles.
- Background - Describes the background color of the image.
An illustration of these attributes can be seen in the two images below.
There will be two steps in this study. Please evaluate all the images in both stages. If you cannot
complete the entire study, please do not continue as it may skew my results. The entire time to
complete the study should be approximately 10 minutes.
In the first phase of the study, you will be given either a table of questions about your preferences
or a series of images. Please follow the instructions on the screen.
In the second phase, everyone will be shown a series of 40 images and asked to rank them from -100 to +100. -100 means you really hate the image and +100 means you really love it. You may also choose any number in between to rank the images.
"
e
e
Please wait until all the images have loaded before you begin to rate them.
There are no right or wrong answers so please indicate your first reaction to the images.
Please feel free to print and refer to these instructions as you complete the exams.
Thank you for taking the time to assist me with my data collection for my thesis. All submissions will
remain anonymous and will in no way be used for any other purposes other than this research.
To begin please point your web browser to http://iesi.mit.edu/kir/thesis.nsf
Appendix II
The following pages are screen captures of each of the possible screens that the subjects may
have encountered.
t
r
Kimberly
s J Ryan
L
T hesis
Netscape
rut ~
tC
~Lk
~-.xV~
-~
PT eference Test : Phase 1
-The images in this database have seven attributes. (see examples)
* First, please indicate yourpreference for each attribute in the "Your Preference" column. Choose so that each menu selection is such
that it reflects your preference for this attribute.
" Second, next to each attribute indicate how important it is foryour overall esthetic evaluation. The numbers of all seven weighting must
add to 100.
* Select Cetixue when done
-lDensity-
Saturation -High
Brightness- Dark
Blur-uone
Background- Black
*
Br
Density- Medium
FanlyColor- Purples
Pomntalnzaton- Medium.
Saturation- Medium
Bghtness- Light
Blr- Yes
Background- Grey
Fsele
in
I-Select
slect
Backpan
High
FamilyColor- Blue/Green
Massachusetts Institute of Technology Copynght @ 1999 KimberlyJ. Ryan
H-AtiobtTsnt
*90aai
j10$ ta 0:
D0ne
F
wotaa-swWglKibry
n-*1
J.-
|3
~aVlrei~
Figure A 1 - Attribute Test, Phase 1
34
[Screen capture: "Preference Test: Phase 1", the correlated single-stimulus test screen.]
Figure A 2 - Correlated Test, Phase 1
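The correlated test draws stimuli whose attribute levels tend to rise and fall together; the near-constant rows of Table A 1 below (e.g., 2222322.jpg, 3333223.jpg) show the pattern. The following is a minimal sketch of one way such stimuli could be generated (illustrative only; numpy is assumed, and the thesis does not state its actual generation procedure):

    # Sketch: generate correlated stimuli by starting every attribute at a
    # shared base level and perturbing a few entries, so attribute columns
    # move together across images. Illustrative only; not the thesis's method.
    import numpy as np

    rng = np.random.default_rng(1)

    def correlated_image(n_attrs=7, n_levels=3):
        base = rng.integers(1, n_levels + 1)   # shared base level
        levels = np.full(n_attrs, base)
        flip = rng.random(n_attrs) < 0.2       # perturb ~20% of attributes
        levels[flip] = rng.integers(1, n_levels + 1, size=int(flip.sum()))
        return levels

    design = np.vstack([correlated_image() for _ in range(20)])
    print(design[:3])  # rows resemble the near-constant patterns of Table A 1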
[Screen capture: "Preference Test: Phase 1", the orthogonal single-stimulus test screen.]
Figure A 3 - Orthogonal Test, Phase 1
[Screen capture: "Preference Test: Phase 1", the orthogonal set test screen. Images are rated on a scale from "DISLIKE COMPLETELY" (-100) through "NEUTRAL" to "LIKE COMPLETELY" (+100).]
Figure A 4 - Orthogonal Set Test, Phase 1
[Screen capture: "Preference Test: Phase 1", the correlated set test screen, with the same -100 to +100 rating scale.]
Figure A 5 - Correlated Set Test, Phase 1
[Screen capture: the closing screen. It reads: "Thank you. If you are interested in the results of this experiment, please send an email to kjr@mit.edu. I will be happy to share my findings with you."]
Figure A 6 - Final Screen
[Screen capture: a Netscape browser window titled "Kimberly J Ryan Thesis" showing "Preference Test: Phase 2". The screen asks the subject to give a number expressing their preference for the displayed image, using any number between -100 and +100, on a scale from "DISLIKE COMPLETELY" (-100) through "NEUTRAL" to "LIKE COMPLETELY" (+100); a counter shows progress through the 40 images (here, 16 of 40).]
Figure A 7 - Final Test, Phase 2 (Same for all Phase 1 tests)
Appendix III
Data Set Matrices
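Throughout these tables, each stimulus filename appears to encode the image's seven attribute levels digit by digit (1111421.jpg sits beside the levels 1, 1, 1, 1, 4, 2, 1). The following is a minimal sketch of a decoder under that assumption; the attribute names and the function are illustrative, not part of the study software:

    # Decode a stimulus filename into its seven attribute levels.
    # The digit-per-attribute encoding is inferred from the tables below;
    # attribute order follows the list in Appendix I.
    ATTRIBUTES = ["density", "color_family", "pointalization",
                  "saturation", "brightness", "blur", "background"]

    def decode(filename: str) -> dict:
        stem = filename.split(".")[0]
        if len(stem) != 7 or not stem.isdigit():
            raise ValueError(f"unexpected stimulus name: {filename}")
        return dict(zip(ATTRIBUTES, (int(d) for d in stem)))

    print(decode("1111421.jpg"))
    # {'density': 1, 'color_family': 1, 'pointalization': 1, 'saturation': 1,
    #  'brightness': 4, 'blur': 2, 'background': 1}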
Table A 1 - Correlation Matrix
[Data table: the correlated single-stimulus set. Each row gives the seven attribute levels (a-g) of one image, with the filename encoding the levels digit by digit (e.g., 1111421.jpg has levels 1, 1, 1, 1, 4, 2, 1). Legible rows include 1111421.jpg, 2222322.jpg, 3333223.jpg, 1111121.jpg, 3333423.jpg, 1121421.jpg, 3222323.jpg, 3333323.jpg, 3333321.jpg, 3333313.jpg, 3333311.jpg, 2222213.jpg, 2222222.jpg, 1222212.jpg, 1112211.jpg, 3111121.jpg and 2223422.jpg. The footer lists the correlation between each pair of attribute columns (adjacent pairs run a<->b = 0.627, b<->c = 0.963, c<->d = 0.882, d<->e = 0.455, e<->f = 0.358, f<->g = 0.038) and reports an average correlation of 0.474191 across all 21 pairs.]
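The average correlation at the foot of the table can be reproduced from the design matrix itself: compute the correlation between every pair of attribute columns and average the absolute values. A minimal sketch follows (numpy is assumed; only four rows of Table A 1 are transcribed, so the printed value is a fragment, not the reported 0.474191):

    # Reproduce the "avg correlation" summary from a design matrix whose rows
    # are images and whose columns are the seven attribute levels.
    import itertools
    import numpy as np

    design = np.array([
        [1, 1, 1, 1, 4, 2, 1],   # 1111421.jpg
        [2, 2, 2, 2, 3, 2, 2],   # 2222322.jpg
        [3, 3, 3, 3, 2, 2, 3],   # 3333223.jpg
        [3, 3, 3, 3, 3, 1, 3],   # 3333313.jpg
    ])

    def avg_pairwise_correlation(X):
        corr = np.corrcoef(X, rowvar=False)   # 7x7 attribute correlation matrix
        pairs = itertools.combinations(range(X.shape[1]), 2)
        return float(np.mean([abs(corr[i, j]) for i, j in pairs]))

    print(avg_pairwise_correlation(design))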
Table A 2 - Orthogonal Matrix
[Data table: the orthogonal single-stimulus set, in the same format as Table A 1. Legible rows include 2332112.jpg, 3223123.jpg, 2223211.jpg, 3111212.jpg, 1332223.jpg, 3332311.jpg, 1223312.jpg, 2133213.jpg, 3321211.jpg, 2321422.jpg, 3212413.jpg, 1133411.jpg, 3133422.jpg, 1321413.jpg, 2212411.jpg, 1313313.jpg, 2111323.jpg and 2231321.jpg. The footer lists the pairwise attribute correlations, which are near zero (e.g., a<->b = -0.073), together with their average.]
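An orthogonal stimulus set is one whose attribute columns are nearly uncorrelated. One simple way to produce such a set is to draw random level assignments and keep the candidate with the smallest average absolute inter-attribute correlation; the sketch below is again illustrative rather than the thesis's procedure (numpy assumed):

    # Sketch: build a near-orthogonal stimulus set by random search, keeping
    # the candidate design with the smallest average absolute correlation.
    import itertools
    import numpy as np

    rng = np.random.default_rng(0)
    N_IMAGES, N_ATTRS, N_LEVELS = 20, 7, 3

    def avg_abs_corr(X):
        corr = np.corrcoef(X, rowvar=False)
        return np.mean([abs(corr[i, j])
                        for i, j in itertools.combinations(range(N_ATTRS), 2)])

    best, best_score = None, np.inf
    for _ in range(2000):
        candidate = rng.integers(1, N_LEVELS + 1, size=(N_IMAGES, N_ATTRS))
        score = avg_abs_corr(candidate)
        if score < best_score:
            best, best_score = candidate, score

    print(f"average |correlation| of best design: {best_score:.3f}")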
Table A 3 - Correlated Set
[Data table: the correlated set design. Rows give the seven attribute levels of each stimulus (filenames encode the levels as in Table A 1), grouped into sets, with the inter-attribute correlations of each set reported alongside the levels; values such as 0.577, 0.816 and 0.852 recur, reflecting the deliberately correlated construction.]
Table A 4 - Orthogonal Set
[Data table: the orthogonal set design, in the same format as Table A 3. The within-set correlations are small and mixed in sign (values such as -0.426, 0.302, 0.522 and -0.707 appear), as expected for an orthogonal construction.]
Table A 5 - Final Test
[Data table: the 40 images of the Phase 2 final test, one row of seven attribute levels per image (filenames encode the levels). The footer reports the pairwise attribute correlations, all close to zero (adjacent pairs run a<->b = 0.097, b<->c = 0.062, c<->d = 0.037, d<->e = 0.111, e<->f = -0.088), with an average correlation of approximately 0.076; the final test set was therefore close to orthogonal.]