Success Measures of Accelerated Learning Agents for e-Commerce

By Kimberly J. Ryan
BA, Mathematics, 1992, Boston College

Submitted to the System Design and Management Program in Partial Fulfillment of the Requirements for the Degree of Master of Science in Engineering and Management at the Massachusetts Institute of Technology, September 1999

Signature of Author: Kimberly J. Ryan, August 23, 1999
Certified by: Dan Ariely, August 23, 1999
Accepted by: Tom Kochan, August 23, 1999

© Kimberly J. Ryan. All rights reserved. The author hereby grants MIT the right to reproduce and distribute publicly both paper and electronic copies of this thesis document in whole or in part.

Abstract

One way to address the information explosion on the World Wide Web is to use an "information filtering agent" or "smart agent," which can select information according to the interest and/or need of an end user. The two main types of smart agents are Feature Based Filtering (FBF), which builds a profile of the individual, and Collaborative Based Filtering (CBF), which builds a profile of the market segment. With Feature Based Filtering, content is separated into classifications or attributes and presented to the user based upon a predetermined ranking of the content within each classification. To the extent that CBF is a shortcut for FBF, we suggest that FBF is the more promising approach; the results, however, should also apply to CBF. The methods by which we attempted to separate the learning process for utility compared subjective and objective tests, single vs. multiple stimuli, and multiple correlated vs. multiple orthogonal stimuli. This work examines the success measures of various utility learning methods in an attempt to determine which method most successfully predicts user preference. The utility profiling methods examined were: attribute ranking, regression of orthogonal data, regression of correlated data, regression of orthogonal data sets, and regression of correlated data sets. In conclusion, this work presents the conditions under which the successful methods may be used.

Table of Contents

ABSTRACT
TABLE OF CONTENTS
TABLE OF FIGURES AND TABLES
ACKNOWLEDGEMENTS
MOTIVATION
  Feature Based Filtering
    Advantages
    Disadvantages
  Mathematical Description of the Classification Problem
  Collaborative Based Filtering
    Advantages
    Disadvantages
  Mathematical Description of the Clustering Problem
  Issues Examined
HYPOTHESIS
  Hypothesis 1
  Hypothesis 2
  Hypothesis 3
  Hypothesis 4
METHODS
  Subjects
  Procedure
    Attribute Filtering
    Correlated Stimuli
    Orthogonal Stimuli
    Orthogonal Set Stimuli
    Correlated Set Stimuli
    Final Test
  Technology Design
RESULTS
  Description of Data Analysis
  Hypothesis 1 - Assertion that objective data will be better than subjective data
  Hypothesis 2 - Orthogonal stimuli will provide better opportunity to learn than correlated stimuli
  Hypothesis 3 - Sets of stimuli will provide better opportunity to learn than individual stimuli
  Hypothesis 4 - Sets of stimuli given correlated stimuli provide better opportunity to learn than sets of stimuli given orthogonal stimuli
DISCUSSION
REFERENCES
APPENDIX I
APPENDIX II
APPENDIX III

Table of Figures and Tables

Figure 1 - The Classification Problem
Figure 2 - Approximate Separating Plane
Figure 3 - The Clustering Problem
Figure 3 - Image Examples with Attributes Specified
Table 1 - Description of Attribute Values for All Levels of Each Individual Attribute
Example 1 - Distance from Ideal, Subjective Preference Polling
Figure 4 - Mean Difference for Subjective and Objective Tests
Figure 5 - Mean Difference for Individual Attributes of Subjective and Objective Tests
Figure 6 - Mean Difference for Correlated and Orthogonal Tests
Figure 7 - Mean Difference for Individual Attributes of Correlated and Orthogonal Tests
Figure 8 - Mean Difference for Single Stimuli vs. Sets of Stimuli Tests
Figure 9 - Mean Difference for Individual Attributes of Single Stimuli vs. Sets of Stimuli Tests
Figure A 1 - Attribute Test, Phase 1
Figure A 2 - Correlated Test, Phase 1
Figure A 3 - Orthogonal Test, Phase 1
Figure A 4 - Orthogonal Set Test, Phase 1
Figure A 5 - Correlated Set Test, Phase 1
Figure A 6 - Final Screen
Figure A 7 - Final Test, Phase 2 (Same for All Phase 1 Tests)
Table A 1 - Correlation Matrix
Table A 2 - Orthogonal Matrix
Table A 3 - Correlated Set
Table A 4 - Orthogonal Set
Table A 5 - Final Test

Acknowledgements

There were quite a few people instrumental in my success in completing this degree. I would like to thank Dan Ariely for his tremendous efforts in assisting me with my thesis; John Williams for giving me the opportunity to attend MIT; the SDM program office (Anna Barkley, Margee Best, Matt Flynn, Dan Frey, Jonathan Griffin, Leen Int Veld, Mats Nordlund) for their organizational efforts; Ely Dahan for his assistance with setting up the design of my experiment; and William Aboujaoude and Steve Martin for initially encouraging me to pursue this degree.

Two other people deserve special recognition for their heroic efforts. I would like to thank John Martin for his technical and emotional support in the last phase of this degree. And I would also like to give a special thank you to my son, Bradlee Ryan, for whom this degree was initially started and who suffered the most in my pursuit of it.

Always have two strings to your bow. - W. Martin

Motivation

The most important element in marketing is an intimate understanding of customer needs; yet the markets for most products and services are extremely fragmented in today's economy. Customer intelligence is the key to business success in an increasingly competitive environment. The only way to keep customers happy is to meet their evolving needs, today and in the future. But understanding and responding to the needs of thousands of customers is extremely difficult. A company must be able to answer questions such as: Who are my core shoppers? What are their shopping habits? What are their preferences? Armed with the answers to these questions, every person in the organization, from the CEO down, can make intelligent customer-centric business decisions. The result is higher customer acquisition, retention, and a healthier bottom line. Today's most successful online businesses are leveraging technology to do what human employees no longer can: get to know the individual wants, needs, and preferences of every customer and dynamically personalize their offerings to each customer based on that knowledge.

The most promising way to address the information explosion is to use an "information filtering agent" or "smart agent," which can select information according to the interest and/or need of an end user. The general idea behind smart agents is that documents or products (which I will refer to as content in this paper) are recommended based on properties or attributes and their match with the user profile. In this work, I am trying to understand which of several methods is most successful in learning and predicting customer preferences.
Two main types of smart agents are Feature Based Filtering (FBF), which builds a profile of the individual, and Collaborative Based Filtering (CBF), which builds a profile of the market segment. Both FBF and CBF employ data mining algorithms in order to work properly. In each case, the ultimate goal is to successfully predict content that satisfies the end user.

Feature Based Filtering

Filtering works by learning about each individual's preferences through observing behavior, such as click-thrus, or recalling past behavior, such as purchase histories, and then asking the individual to rate a number of relevant items. With Feature Based Filtering (FBF), content is separated into classifications or attributes and presented to the user based upon a ranking of the content within that classification. An example of this would be a web page that contains meta-tags for keywords. A user polling any standard search engine for one or more of the keywords would then find that particular web page. In this example, the keywords serve as attributes, and the page's position in the list of returned pages would be based upon the number of times the keyword appears, relative to the other pages. Another example, from the commerce perspective, would be a shopper searching for a particular item: by polling on the attributes of the desired item, the shopper is returned a list of available products that closely match the desired attributes (a minimal sketch of this kind of attribute-based ranking follows the lists below).

Advantages
- New content is as likely as old content to be presented to the user, given an equal profile rating for a specific attribute.
- The ability to learn and adapt according to a specific individual's profile.
- No requirement that other users participate in the system.
- Searches are generally optimized in terms of performance for queries on the indexed attributes, and may therefore return results more quickly.
- Improves the targeting of advertisements and announcements.

Disadvantages
- Classification of content may be misleading (e.g., irrelevant documents that happen to contain the keyword: a search on the keyword "Lotus" returning information about the software company rather than the car; an online catalog that labels the color of a sweater as "cream" when the user is searching for an "off-white" sweater).
- Little opportunity for serendipitous discovery. (It is quite common to browse a brick-and-mortar shop for a particular item and then find a related but equally satisfying item while there, e.g., searching the shelves for a particular book title, finding other interesting books on the same shelf, and buying both.)
- Requires storage of personal information, which may result in concerns over privacy.
- Storage of preferences may be economically prohibitive for large data sets.
- When there are many attributes and not much data, it may be too difficult to make an accurate recommendation.
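To make the attribute-matching idea concrete, the sketch below scores and ranks catalog items by a weighted match between their attribute levels and a user's stated profile. This is a minimal illustration in Python; the item names, attribute values, and scoring rule are my own illustrative assumptions, not the thesis's implementation or the experiment's design.

```python
# Minimal sketch of feature-based filtering: rank catalog items by how well
# their attribute values match a user's stated preferences and weights.
from dataclasses import dataclass

@dataclass
class Item:
    name: str
    attributes: dict  # attribute name -> level, e.g. {"color": "purples"}

def fbf_score(item, preferred, weights):
    """Sum of the weights of the attributes on which the item matches the profile."""
    return sum(w for attr, w in weights.items()
               if item.attributes.get(attr) == preferred.get(attr))

catalog = [
    Item("img_a", {"color": "purples", "density": "high", "blur": "none"}),
    Item("img_b", {"color": "blues",   "density": "high", "blur": "yes"}),
]
preferred = {"color": "purples", "density": "high", "blur": "none"}
weights   = {"color": 50, "density": 30, "blur": 20}  # 100 points, as in the attribute test

# Present items in descending order of match score.
for item in sorted(catalog, key=lambda i: fbf_score(i, preferred, weights), reverse=True):
    print(item.name, fbf_score(item, preferred, weights))
```

In a deployed FBF agent the weights would come from the learned profile rather than an explicit poll; the ranking step is the same either way.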
Mathematical Description of the Classification Problem

"In classification the basic goal is to predict the most likely state of a categorical variable (the class)." [3] The task of a classification algorithm is to estimate a function g which maps points from an input space x to an output space y, given only a finite set of the mappings, as shown in Figure 1.

[Figure 1 - The classification problem]

To formulate this as a linear program, the problem is restated as estimating a classification function which assigns a given vector x to one of the disjoint sets A or B in n-dimensional feature space. The classification function g(x) has the following form:

    g(x) = 0 if x ∈ A,  1 if x ∈ B.

All m elements of the finite point set A ⊂ R^n (n-dimensional real space) are represented as the matrix A ∈ R^{m×n}, where each element of A is represented by a row of A. Similarly, the k elements of the finite point set B are represented as B ∈ R^{k×n}. The goal is to distinguish between the points of A and B by constructing a separating plane

    P = { x | x ∈ R^n, x^T ω = γ },

with normal ω ∈ R^n and distance γ / ||ω||_2 to the origin, where ||ω||_2 = (Σ_{j=1}^n ω_j^2)^{1/2}. The requirement is to determine ω and γ so that the separating plane P defines two open half spaces: { x | x ∈ R^n, x^T ω > γ }, containing mostly points of A, and { x | x ∈ R^n, x^T ω < γ }, containing mostly points of B. Hence we wish to satisfy

    Aω > eγ,  Bω < eγ,    [1]

as far as possible, where e denotes a vector of ones. This can be done exactly only if the convex hulls of A and B are disjoint. Otherwise, [1] can be satisfied in some best sense by minimizing some norm of the average violations of [1], such as

    f(ω, γ) = min_{ω, γ}  (1/m) ||(−Aω + eγ + e)_+||_1 + (1/k) ||(Bω − eγ + e)_+||_1.    [2]

The linear programming formulation [2] obtains an approximate separating plane that minimizes a weighted sum of the distances of misclassified points to the approximate separating plane, in place of the non-convex problem of minimizing the number of misclassified points. The mathematical program is described in detail in [3].

[Figure 2 - Approximate separating plane]
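For readers who want to see formulation [2] in executable form, the sketch below encodes it as a standard linear program, with slack variables standing in for the violations, and solves it with SciPy. It is a hedged sketch of the published robust-LP formulation in [3], not code from the thesis; all names are mine.

```python
# Sketch of LP [2]: find a plane x'w = gamma minimizing the averaged
# violations of A w > e*gamma and B w < e*gamma.
import numpy as np
from scipy.optimize import linprog

def separating_plane(A, B):
    m, n = A.shape
    k = B.shape[0]
    # Decision vector z = [w (n), gamma (1), u (m), v (k)]; u, v are violation slacks.
    c = np.concatenate([np.zeros(n + 1), np.full(m, 1.0 / m), np.full(k, 1.0 / k)])
    # u >= -A w + gamma e + e   ->   -A w + gamma e - u <= -e
    top = np.hstack([-A, np.ones((m, 1)), -np.eye(m), np.zeros((m, k))])
    # v >=  B w - gamma e + e   ->    B w - gamma e - v <= -e
    bot = np.hstack([B, -np.ones((k, 1)), np.zeros((k, m)), -np.eye(k)])
    A_ub = np.vstack([top, bot])
    b_ub = -np.ones(m + k)
    bounds = [(None, None)] * (n + 1) + [(0, None)] * (m + k)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:n], res.x[n]  # plane normal w and offset gamma

# Toy usage: two linearly separable 2-D point clouds.
rng = np.random.default_rng(0)
A = rng.normal(loc=[2, 2], size=(30, 2))
B = rng.normal(loc=[-2, -2], size=(30, 2))
w, gamma = separating_plane(A, B)
print("plane normal:", w, "offset:", gamma)
```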
Collaborative Based Filtering

Collaborative filtering works by learning about each individual's preferences through observing behavior, such as click-thrus, or recalling past behavior, such as purchase histories, and then asking the individual to rate a number of relevant items. The technology then pools this information into a cluster, together with knowledge gained from a community of other individuals who share similar tastes and interests. Individuals are then grouped within their cluster so that the agent can make predictions. Because the collective preferences of a community are a predictor of how an individual in that community might like items he/she has not yet tried, the technology draws upon this knowledge to make recommendations with high predictive accuracy, although it is unknown how high the accuracy actually is.

The Massachusetts Institute of Technology's World Wide Web Consortium has developed an interesting example of collaborative filtering. It has developed a set of technical standards called PICS (Platform for Internet Content Selection) so that people can electronically distribute descriptions of digital works in a simple, computer-readable form. Computers can process these labels in the background, automatically shielding users from undesirable material or directing their attention to sites of particular interest. The original impetus for PICS was to allow parents and teachers to screen materials they felt were inappropriate for children using the Internet. Rather than censoring what is distributed, as the Communications Decency Act and other legislative initiatives have tried to do, PICS enables users to control what they receive.

Advantages
- Enhances knowledge distribution amongst communities of like-minded people.
- Facilitates the creation of interest groups.
- Recommendations are based upon the quality of content (as determined by the group) rather than the objective properties of the content itself.
- Content need not be amenable to parsing by the computer.
- Learns quickly.
- Not limited by the number of attributes or by missing data, i.e., the demands on the data are lower.

Disadvantages
- Recommendations require a large community of users in order to be accurate.
- Recommendations are limited by the group norms and will not be tailored to a specific individual, as they rely on the fact that tastes are not randomly distributed but rather form general trends and patterns.
- Sensitive to the individual's distance from the mean of the cluster.
- Slow to recommend new products.
- Little is learned about individual consumers, which limits targeting of advertising to the interest group.
- Requires storage of personal information, which may result in concerns over privacy.
- Requires the sharing of information, which is an even larger concern for privacy.
- Storage of preferences may be economically prohibitive for large data sets.

Mathematical Description of the Clustering Problem

Clustering deals with the problem of assigning the elements of a given set to groups or clusters of like points. The problem is formulated as a non-hierarchical clustering approach in which the number of clusters or groups is fixed a priori. Given a set A of m points in R^n, represented by the matrix A ∈ R^{m×n}, and a number k of desired clusters, we need to determine centers C_l, l = 1, ..., k, in R^n such that the sum over all points of the minimum over l ∈ {1, ..., k} of the 1-norm distance between each point A_i, i = 1, ..., m, and the cluster centers C_l is minimized.

[Figure 3 - The clustering problem: points grouped into clusters, with the distance from each point to its nearest cluster center indicated]

That is, given m points {x^1, x^2, ..., x^m} in n-dimensional real space R^n and a fixed integer k of clusters, determine k centers {c^1, c^2, ..., c^k} in R^n such that the sum of the "distances" of each point to a nearest center is minimized. The clustering problem is:

    min_{c^1, ..., c^k}  Σ_{i=1}^m  min_{l=1,...,k} || x^i − c^l ||,

where ||·|| is some arbitrary norm on R^n. With the 1-norm, the clustering problem is formulated as the bilinear program

    min_{c, d, t}  Σ_{i=1}^m Σ_{l=1}^k t_{il} (e^T d_{il})
    subject to  −d_{il} ≤ x^i − c^l ≤ d_{il},  i = 1, ..., m,  l = 1, ..., k,
                Σ_{l=1}^k t_{il} = 1,  t_{il} ≥ 0,  i = 1, ..., m.
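In practice the bilinear program above is usually attacked by alternating between the assignment variables t and the centers c, which for the 1-norm reduces to the k-median iteration sketched below: nearest-center assignment in the 1-norm, then a coordinatewise-median center update. This is a hedged sketch of that standard heuristic (see [3]), assuming numpy; it is not the thesis's code.

```python
# k-median clustering: alternate 1-norm assignment with coordinatewise medians.
import numpy as np

def k_median(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center in the 1-norm.
        d = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)  # shape (m, k)
        labels = d.argmin(axis=1)
        # Update each center as the coordinatewise median of its cluster.
        new_centers = np.array([
            np.median(X[labels == l], axis=0) if np.any(labels == l) else centers[l]
            for l in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Toy usage: three well-separated 2-D clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.3, size=(40, 2)) for c in ([0, 0], [3, 0], [0, 3])])
centers, labels = k_median(X, k=3)
print(centers)
```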
Issues Examined

Smart agents have two distinct life stages: learning and performance. In the case of real-time adaptive agents, learning may continue into the performance stage so that the agent becomes "smarter" with each use. The learning process is important because it is the limiting factor by which we are able to understand the customer. Unless one listens to the voice of the customer, one can never expect customer pull for products and will always be working on the product-push model of marketing. This work examines the learning process.

Learning may take place on many dimensions. In this work, I have studied the following aspects of the data in order to determine which dimensions allow for the greatest predictive accuracy: orthogonality vs. correlation of attributes for single stimuli, orthogonality vs. correlation within a set of stimuli, and subjective vs. objective preference polling.

I first look along the dimension of objective vs. subjective polling to determine which is more successful in predicting preference. It may be the case that when polled about attributes and their relative importance, subjects are not as truthful or as knowledgeable about their expected preferences as they could be. I want to determine whether there is greater predictive accuracy from the methods that infer a preference than from the methods that explicitly ask for a preference. This tests the dimension of subjective vs. objective learning.

Within the methods that infer a response, the tests are separated into those that show correlated stimuli and those that show orthogonal stimuli. This test will determine whether orthogonal stimuli provide more information about the subject than correlated stimuli, giving a measure of success for single stimuli separated along the orthogonal vs. correlated dimension. Another test of this experiment uses sets of stimuli that are either correlated or orthogonal within each set. I will attempt to determine whether the set of correlated images provides more information than the set of orthogonal images, testing the dimension of sets of data.

This paper will attempt to compare the success measures of preference prediction across these dimensions of utility profiling. The utility profiling methods examined were: attribute ranking, regression of orthogonal data, regression of correlated data, regression of orthogonal data sets, and regression of correlated data sets. The regressions of orthogonal and correlated data were single-stimulus displays, while the regressions of orthogonal and correlated data sets were multiple-stimulus displays; the attribute ranking method was a subjective preference poll.

To expand on the above, for both the correlated and orthogonal single stimuli, an individual-level regression model (main effects only) was estimated. For the correlated and orthogonal set stimuli, two models were developed. One model was exactly the same as that for the single stimuli, using the rating as the dependent variable regressed on the 7 attributes. The other model looked at the choice the subject made amongst the alternatives. For the subjective preference polling, several models were developed: one looked at the weightings of the attributes, another looked at the ideal stimulus chosen, and the third combined the weightings and the ideal stimulus. A detailed explanation of the models developed can be found in the Results section of this paper. These methods may be used within the context of either FBF or CBF to further enhance their accuracy. Here I will only look at the FBF case, although the results should also hold for the CBF case, as CBF is a special case of FBF.

Hypothesis

I expect that objective preference polling will have the greatest success in returning preferred images, as compared with subjective polling, in which the individual explicitly states his/her preferences along with personal utility weightings on features. Further, I expect that the orthogonal data will provide better predictive accuracy than the correlated data. It is also anticipated that the sets of images will allow a greater amount of learning than the individual images. Finally, it may be the case that the set given correlated images provides more information than the set given orthogonal images. Explicitly stated:

Hypothesis 1

Objective data will be better than subjective data. This will be tested by comparing attribute rating vs. correlated/orthogonal for both single stimuli and sets of stimuli.

Hypothesis 2

Orthogonal stimuli will provide better opportunity to learn than correlated stimuli. Comparing single orthogonal stimuli and sets of orthogonal stimuli vs. single correlated stimuli and sets of correlated stimuli will test this.

Hypothesis 3

Sets of stimuli will provide better opportunity to learn than individual stimuli. Comparing orthogonal and correlated single stimuli vs. orthogonal and correlated sets of stimuli will test this.

Hypothesis 4

Sets of stimuli given correlated stimuli will provide better opportunity to learn than sets of stimuli given orthogonal stimuli. Comparing orthogonal sets of stimuli vs. correlated sets of stimuli will test this.
Methods

Subjects

An e-mail was sent to the Massachusetts Institute of Technology's System Design and Management (SDM) classes of 1997, 1998, and 1999 requesting participation in the data gathering for this experiment (see Appendix I). The general demographics of the subject set included roughly 70% males in the age range of 30-40. Respondents were mainly from the US and currently reside within the US. Respondents typically had 7-15 years of professional experience as engineers and at least one post-graduate degree beyond the bachelor's degree. Approximately 65 respondents began the experiment; 93% of them completed both tasks.

Procedure

Respondents pointed their web browsers to a web site where they were given the same set of instructions that were included in the e-mail requesting their participation (see Appendix I). In this introduction stage, I explained to the test subjects that they would be asked to view a series of images and to state their preference for these images. All participants were shown the examples in Figure 3 in order to familiarize themselves with the types of images that they would be shown.

[Figure 3 - Image examples with attributes specified. First example: high density, blue/green color family, high pointalization, high saturation, dark, no blur, black background. Second example: medium density, purple color family, medium pointalization, medium saturation, light, blurred, grey background.]

Once they began the experiment, they were given one of five tasks, selected at random based upon the modulus of the time at which they first began the experiment. Each of the five tasks is explained in greater detail below.

Attribute Filtering

Participants in the Attribute Filtering experiment were shown a table (see Figure A 1) and asked to select their preference amongst the categories. They were also asked to divide 100 points amongst the categories in order to place weightings on highly valued vs. less valued characteristics. Once this task was completed, the participants began Phase 2 of the experiment, which is described below in the Final Test section.

Correlated Stimuli

Images displayed to the test subjects for this experiment were chosen such that they would be correlated with each other. A test matrix was built, checking the correlation of each attribute with the others, and 20 images were selected such that the correlation between the attributes was as close to one as possible (for specific numbers on the correlation between attributes, see Appendix III). The respondents were asked to rate their preference for twenty images in Phase 1 of the experiment. They were told to select a number between -100 and +100, where -100 meant that they really disliked the image, 0 meant that they were indifferent, and +100 meant that they really liked the image (see Figure A 2). Once this task was completed, the participants began Phase 2 of the experiment, which is described below in the Final Test section.

Orthogonal Stimuli

Images displayed to the test subjects for this experiment were chosen such that they would be orthogonal to each other. A test matrix was built, checking the orthogonality of each attribute to the others, and 20 images were selected such that the correlation between the attributes was as close to zero as possible (for specific numbers on the orthogonality between attributes, see Appendix III). The respondents were asked to rate their preference for twenty images in Phase 1 of the experiment.
They were told to select a number between -100 and +100, where -100 meant that they really disliked the image, 0 meant that they were indifferent, and +100 meant that they really liked the image (see Figure A 3). Once this task was completed, the participants began Phase 2 of the experiment, which is described below in the Final Test section.

Orthogonal Set Stimuli

Images were displayed in sets of four to the test subjects for this experiment. Twenty sets, for a total of eighty images, were displayed, where each image was chosen such that it would be orthogonal to the other images in the set. Image 1 from each set came from the orthogonal matrix. The respondents were asked to select the individual image within each set that they preferred the most and then to rate their preference for that image in Phase 1 of the experiment. They were told to select a number between -100 and +100, where -100 meant that they really disliked the image, 0 meant that they were indifferent, and +100 meant that they really liked the image (see Figure A 4). Once this task was completed, the participants began Phase 2 of the experiment, which is described below in the Final Test section.

Correlated Set Stimuli

Images were displayed in sets of four to the test subjects for this experiment. Twenty sets, for a total of eighty images, were displayed, where each image was chosen such that it would be correlated with the other images in the set. Image 1 from each set came from the orthogonal matrix. The respondents were asked to select the individual image within each set that they preferred the most and then to rate their preference for that image in Phase 1 of the experiment. They were told to select a number between -100 and +100, where -100 meant that they really disliked the image, 0 meant that they were indifferent, and +100 meant that they really liked the image (see Figure A 5). Once this task was completed, the participants began Phase 2 of the experiment, which is described below in the Final Test section.

Final Test

Regardless of the utility examination given, each participant was asked to rate their preference for forty images in Phase 2 of the experiment. Each image was shown one at a time, similar to the display of information for the orthogonal and correlated stimuli tasks. They were told to select a number between -100 and +100, where -100 meant that they really disliked the image, 0 meant that they were indifferent, and +100 meant that they really liked the image (see Figure A 7).

Technology Design

A database was created using Lotus Domino R5 that was enabled for the WWW. The database application was built to support both Internet Explorer 3.0 and above and Netscape 4.0 and above. It employed JavaScript and Java so that image display would be rapid after the images were initially loaded. One design trade-off was that the initial loading of the images would take longer using this method, but the overall performance would be better. This was done to reduce participant frustration as the experiment wore on, and hopefully to produce a higher completion rate.

Images were built containing one classification from each of seven attributes. The attributes were: 1) Sizes of Circles, 2) Pointalization, 3) Light/Dark, 4) Saturation, 5) Color, 6) Density, 7) Sharpness. Each attribute had between two and four levels, as displayed in Table 1, resulting in 1944 possible variations on the images. These images served as the stimuli by which we elicited a preference rating from the test subjects.
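For readers checking the arithmetic, the 1944 figure is the product of the level counts in Table 1 below: 3 (density) × 3 (color) × 3 (pointalization) × 3 (saturation) × 4 (light/dark) × 2 (motion blur) × 3 (background) = 1944 distinct images.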
It is presumed that the attributes of the images all represent independent variables.

Table 1 - Description of Attribute Values for All Levels of Each Individual Attribute

Attribute   | Level 1       | Level 2 | Level 3      | Level 4
Density     | X3            | X2      | X1           | --
Color       | Blues, Greens | Purples | Reds, Orange | --
Pointalized | 5             | 15      | 50           | --
Saturation  | 50            | 0       | -50          | --
Light/Dark  | 50            | 25      | 0            | -25
Motion Blur | 0             | 10      | --           | --
Background  | Black         | Gray    | White        | --

Results

Description of Data Analysis

In stage 1, for both the correlated and orthogonal single stimuli, an individual-level regression model (main effects only) was estimated by regressing ratings (DV) on the values of the stimuli on the 7 attributes (IV). I then used the seven regression coefficients (one for each attribute) as a measure of the weight of that attribute on the overall evaluation (I also used the intercept).

For the correlated and orthogonal set stimuli, two models were developed. One model was exactly the same as that for the single stimuli, using the rating as the dependent variable regressed on the values of the stimuli on the 7 attributes as the independent variables. The other model took all 80 stimuli, noted whether the test subject chose each one or not, and regressed these choice/no-choice measures on the values of the stimuli on the 7 attributes. Because the dependent measure in this case was binary, I used logistic rather than linear regression.

For the subjective preference polling, several models were developed. Model 1 was simply the weights given by the test subject. Model 2 took the absolute difference between the optimal stimulus as chosen by the test subject and the presented stimulus. Specifically, the values for each of the attributes were compared between the ideal stimulus (as chosen by each subject) and the given stimulus, and the absolute value of the difference was taken for each individual attribute to give a measure of how far from the ideal each attribute was for the given stimulus (see Example 1). Model 3 combined the two previous models by multiplying the attribute weighting by the distance from the ideal each attribute had for the given stimulus.

Example 1 - Distance from Ideal, Subjective Preference Polling

Ideal Stimulus | Given Stimulus | Distance from Ideal
3111221        | 1111111        | 2000110
2222222        | 1222213        | 1000011

In stage 2, an individual-level regression model (main effects only) was estimated. The ratings were taken as the dependent variable and regressed on the 7 attributes. I then used the seven regression coefficients as a measure of the weight of each attribute on the overall evaluation (I also used the intercept). It is important that the reader understand that in all cases the models were estimated for each individual, and that the parameters of the models were later used as the basis for the analysis.

All this created the dataset. Next, I began to look into the data set resulting from the statistical analysis. I constructed a measure of the difference between the intercepts of task 1 and task 2. This measure captures the overall bias of the experiment. Overall, the mean difference between task 1 and task 2 was negative for every test. This is statistically significant, and may be interpreted to mean that people discriminated less as the exam wore on. It may also mean that there are limits on people's patience. Despite this, I was able to support several of my hypotheses through the analysis of the data.
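A minimal sketch of the per-subject estimation described above, assuming numpy: a main-effects regression of ratings on the seven attribute levels (the same model serves stage 1 and stage 2), plus the distance-from-ideal computation of Models 2 and 3. The function names and random toy data are mine; the logistic model fit to the choice/no-choice data is omitted here.

```python
import numpy as np

def fit_utility(stimuli, ratings):
    """Main-effects regression of ratings on the 7 attribute levels.
    Returns (intercept, 7 attribute weights)."""
    X = np.column_stack([np.ones(len(stimuli)), stimuli])  # prepend intercept column
    coefs, *_ = np.linalg.lstsq(X, ratings, rcond=None)
    return coefs[0], coefs[1:]

def distance_from_ideal(ideal, stimulus, weights=None):
    """Model 2: per-attribute |ideal - stimulus|; Model 3: the same distances
    multiplied by the subject's stated attribute weights."""
    d = np.abs(np.asarray(ideal) - np.asarray(stimulus))
    return d if weights is None else d * np.asarray(weights)

# Toy usage with random data: fit both tasks per subject and compare intercepts,
# mirroring the bias measure described above.
rng = np.random.default_rng(0)
stim1, rate1 = rng.integers(1, 5, (20, 7)), rng.uniform(-100, 100, 20)
stim2, rate2 = rng.integers(1, 5, (40, 7)), rng.uniform(-100, 100, 40)
b0_task1, w_task1 = fit_utility(stim1, rate1)
b0_task2, w_task2 = fit_utility(stim2, rate2)
print("intercept difference:", b0_task1 - b0_task2)
# Reproduces the first row of Example 1: prints [2 0 0 0 1 1 0].
print(distance_from_ideal([3, 1, 1, 1, 2, 2, 1], [1, 1, 1, 1, 1, 1, 1]))
```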
Hypothesis 1 - Assertion that objective data will be better than subjective data

To perform this analysis, I separated the data along the lines of the objective and subjective tests. Specifically, I looked at the means of the three types of subjective tests and the means of the three models of objective tests. The mean came from the difference between task 1 and task 2 based upon the regression coefficients developed. Looking at the means derived from the regressions for the two tasks and then taking the difference between the means of task 1 and task 2, I was able to determine that the subjective preference polling fared worse than the objective tests. This supports my assertion in hypothesis 1. That is, those subjects who explicitly stated their preference for specific attributes ultimately did not choose stimuli along those same weighted attributes in the final test. In other words, the test subjects initially placed a high value on certain attributes, only to ignore images with those attributes in light of viewing the stimuli as a whole.

[Figure 4 - Mean Difference for Subjective and Objective Tests]

[Figure 5 - Mean Difference for Individual Attributes of Subjective and Objective Tests; series: Objective and Subjective, plotted across the seven attributes (Density, Color, Pointalized, Saturation, Light/Dark, Motion Blur, Background)]

Inferring preference through observation of past behavior did much better overall than explicit statements by the test subjects; furthermore, the information inferred from single orthogonal stimuli is far superior to any other method. The results of plotting the differences in the means can be viewed in Figure 4. The results of plotting the values for the specific attributes can be found in Figure 5.

Hypothesis 2 - Orthogonal stimuli will provide better opportunity to learn than correlated stimuli

Next, when evaluating the statement of hypothesis 2, I separated the data along the lines of orthogonality and correlation. As shown in Figure 6, the mean of the correlated data was always further from zero than that of the orthogonal data, except in the case of the multiple stimuli. That is to say, the data do support hypothesis 2, which asserts that orthogonal data will provide a better opportunity to learn utility than correlated data.

[Figure 6 - Mean Difference for Correlated and Orthogonal Tests; series: Correlated Choice, Correlated Multiple, Correlated Single, Orthogonal Choice, Orthogonal Multiple, Orthogonal Single]

The results of plotting the values for the specific attributes can be found in Figure 7. Clearly the orthogonal single test has the least variance on any attribute.

[Figure 7 - Mean Difference for Individual Attributes of Correlated and Orthogonal Tests; series: Correlated Multiple, Orthogonal Multiple, Correlated Single, Orthogonal Single, plotted across the seven attributes]

Hypothesis 3 - Sets of stimuli will provide better opportunity to learn than individual stimuli

The data from my experiment do not support this hypothesis; in fact, they support the opposite result: single stimuli provide more opportunity to learn than sets of stimuli.
An interesting observation is that there is great variation in the single stimuli tests, whereas there is little variation in the multiple stimuli tests. This may be interpreted to mean that the subjects tended to give a relative rating when they saw multiple stimuli and an absolute rating when they saw single stimuli. From this one can draw the conclusion that how a response is elicited determines the result. As shown in Figure 8, the orthogonal single stimuli test provides so much information that it overwhelms all other data. The results of plotting the values for the specific attributes can be found in Figure 9.

[Figure 8 - Mean Difference for Single Stimuli vs. Sets of Stimuli Tests; series: Correlated Single, Correlated Set, Orthogonal Single, Orthogonal Set]

[Figure 9 - Mean Difference for Individual Attributes of Single Stimuli vs. Sets of Stimuli Tests; series: Single and Set, plotted across the seven attributes]

Hypothesis 4 - Sets of stimuli given correlated stimuli will provide better opportunity to learn than sets of stimuli given orthogonal stimuli

Referring back to Figures 5 and 6, one can clearly see that the multiple stimuli for correlated data produced a mean closer to zero than the multiple orthogonal stimuli, although the difference was not statistically significant. Unfortunately, this means that I cannot support the claim made in hypothesis 4.

Discussion

The fact that subjective preference polling fared worse than the objective tests can be explained in several ways. Test subjects may have been unwilling to share truthful information about their actual preferences. This is a known and widespread problem for electronic data gathering, where individuals are concerned over issues of privacy and security. It is interesting that subjects who participated in the objective conditions were willing to share veridical information, which perhaps can be explained by the fact that they gave their information unwittingly rather than being asked outright. The implications of this difference in the sharing of truthful information are especially critical at the beginning of the sales or new-product-requirements cycle, when the merchant is attempting to learn about the customer's preferences.

Through personal observation in my work as a software product manager, people are more willing to talk about themselves and their work than they are to answer specific questions. By developing relationships with customers and learning about how they do their daily work, I have been much more successful in developing product requirements than through survey methods. The results of this experiment confirm this observation.

Another reason that subjective preference polling may have been less successful is that customers, who are used to making holistic decisions, were simply unable to differentiate on the individual attributes. That is, test subjects may have placed a high value on an individual attribute only to find that it was not as important in light of all the attributes together. How often has one determined that price was the most important attribute, only to find that one would be willing to pay more for a product that was slightly more expensive but far superior in other attributes than the least expensive model?
While it is not statistically significant, several test subjects remarked that Attribute 6 (Motion Blur) was a determining factor for them; in addition, the difference data from stage 1 to stage 2 on this attribute was very high for the subjects in the subjective condition (see Figure 5). However, the results showed that overall this attribute alone did not determine preference.

Based upon these results, I would recommend that any site attempting to gain the intimate understanding of customer needs previously discussed as critical for making customer-centric business decisions follow a method that infers customer preference rather than outright asking for information.

Correlated data has the inherent problem that it does not provide a large variation of attributes on which to make distinctions. To the extent that similar sets of data may have large amounts of data on one attribute and very little on another, one can easily ascertain that correlated data does not provide a complete picture of the attribute universe. Due to the greater variety of stimuli within orthogonal data, more information can be inferred with less training data. One observation is that a smart agent that attempts to infer utility will become "smarter" more quickly if it looks at orthogonal data rather than correlated data.

Another important observation is that not all data about a user needs to be saved. Clearly, it would be sufficient to save disparate data and toss old data that is similar to the new. This would address one disadvantage previously mentioned: that large data sets are prohibitive to maintain. An example would be an agent that records my musical preferences. As I order music from different genres and artists, the agent is able to learn about my tastes along many attributes. However, as I order music by the same artist, the agent is only able to determine that I like that particular artist more than others; that is, it only learns about me on a small number of attributes. Given that there are more attributes in the set that contains all of the artists and genres that I have previously enjoyed, it will be easier for the agent to determine my relative preference for a new album if the agent has information that spans the entire orthogonal set of my past purchases. It is not necessary to maintain the information that I have ordered multiple albums from the same artist, as this provides little new information.
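A hedged sketch of that pruning idea follows: store a new observation only if its attribute vector is not too strongly correlated with what the profile already holds, so the stored data stays small while still spanning the attribute space. The 0.8 threshold, the names, and the example vectors are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def worth_keeping(stored, candidate, max_corr=0.8):
    """Keep the candidate only if its correlation with every stored
    attribute vector stays below max_corr (i.e., it adds information)."""
    for row in stored:
        if np.corrcoef(row, candidate)[0, 1] > max_corr:
            return False
    return True

profile = [np.array([3, 1, 1, 1, 2, 2, 1])]   # first purchase, kept by default
near_dup = np.array([3, 1, 1, 1, 2, 2, 2])    # nearly the same attributes: toss
novel    = np.array([1, 3, 3, 3, 1, 1, 3])    # spans new attribute values: keep
for x in (near_dup, novel):
    if worth_keeping(profile, x):
        profile.append(x)
print(len(profile))  # 2: the near-duplicate was discarded, the novel vector kept
```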
When looking across the dimensions of sets vs. single stimuli, my data showed that the single stimuli provided a better result. This may mean that single stimuli are inherently better, especially in the case of the orthogonal single stimuli, or it may mean something more. It may mean that there is something inherently different between choosing amongst several alternatives (already having made the purchase decision) and choosing a product in itself. There was little variation between the orthogonal and correlated sets of data, which means that given multiple stimuli it may not matter whether the choice is between similar or completely different products, only that the purchase decision has been made.

Clearly an understanding of the customer is required to determine how best to elicit a response. In the case of decision processes that are concerned with choosing a product in and of itself, the current work would recommend inferring the values of customer preference for the various attributes from orthogonal data. Examples of products in this category would be electronic equipment, luxury items, and recreational purchases, as each of these types of products is sold based upon the decision to have it or not.

In the case of decision processes that choose from amongst alternatives, that is, where the purchase decision has been made and the consumer is now choosing between brands or models, the current work does not have a recommendation, since the standard used (task 2) was ratings of single stimuli. Based on the analysis of the decision process, I would recommend doing something comparative, as information derived from orthogonal stimuli provides the largest amount of information with the lowest amount of data.

My motivation for beginning this work was to determine which method of utility profiling would be most successful in determining customer preference. While I have demonstrated that there are differences in the measures of success amongst the methods, I have also come to the conclusion that understanding who the customer is and where they are in their purchase decision is required prior to selecting a method. I am able to conclude that if your customer is choosing a product in and of itself, then single orthogonal stimuli from which a preference is inferred has the highest measure of success.

Next Steps

One of the conclusions that I have drawn was that the single stimuli had the highest measure of success. This may have been due to the fact that task 2 used single stimuli. To prove that the central issue is the match between the elicitation mode and the criteria, it is recommended to repeat the entire experiment with task 2 as a set. If in the next experiment the results are the same, the conclusion will be that the single orthogonal method is overall superior; however, if the multiple method proves superior, the conclusion will be that it is the match between stage 1 and stage 2 that is crucial.

Another future step might be to develop a smart agent that utilizes these methods and suggests stimuli in task 2 based upon task 1. One could then test in real time other issues, including performance, magnitude of error, number of errors, distance from the ideal, and user satisfaction with the algorithm.

References

1. Special Issue on Information Filtering, Communications of the ACM, Vol. 35, No. 12, December 1992.

2. Barrett, R., Maglio, P. P., & Kelem, D. C. (1997). How to personalize the web. Proceedings of the Conference on Human Factors in Computing Systems (CHI '97). New York: ACM Press.

3. Bradley, P. S., Fayyad, Usama M., & Mangasarian, O. L. Data Mining: Overview and Optimization Opportunities.

4. Clarke, R. "Extra-Organisational Systems: A Challenge to the Software Engineering Paradigm", Proceedings IFIP World Congress, Madrid, September 1992.

5. Deerwester, S., Dumais, S., Furnas, G., Landauer, T., Harshman, R., "Indexing by Latent Semantic Analysis", Journal of the American Society for Information Science, Vol. 41, No. 1, 1990, pp. 391-407.

6. Fox, A., & Brewer, E. A. (1996). Reducing WWW latency and bandwidth requirements by real-time distillation. Proceedings of the Fifth International World Wide Web Conference.

7. Goldberg, D., Nichols, D., Oki, B., and Terry, D., "Using Collaborative Filtering to Weave an Information Tapestry", Communications of the ACM, Vol. 35, No. 12, December 1992, pp. 61-70.

8. Grosof, Benjamin N., Levine, David W., Chan, Hoi Y., Parris, Colin J., and Auerbach, Joshua S. "Reusable Architecture for Embedding Rule-based Intelligence in Information Agents" (Dec. 1, 1995). Proceedings of the Workshop on Intelligent Information Agents, at the ACM Conference on Information and Knowledge Management (CIKM-95), edited by Tim Finin and James Mayfield. Held Baltimore, MD, USA, Dec. 1-2, 1995.
9. Resnick, Paul; Iacovou, Neophytos; Sushak, Mitesh; Bergstrom, Peter; Riedl, John; "GroupLens: An Open Architecture for Collaborative Filtering of Netnews", Proceedings of the CSCW 1994 Conference, October 1994.

10. Resnick, Paul; "Filtering Information on the Internet", Scientific American, March 1997.

11. Rich, Elaine; "User Modeling via Stereotypes", Cognitive Science, Vol. 3, pp. 335-366, 1979.

12. Rosenfeld, L. B., and Holland, M. P.; "Automated Filtering", Online, May 1994.

13. Rosenthal, Robert, and Rosnow, Ralph; "Essentials of Behavioral Research: Methods and Data Analysis", McGraw-Hill, second edition, 1991.

14. Ryan, K.; Klosterman, S.; Patil, S.; "Optimization in Data Mining", prepared for Prof. Tom Magnanti, Spring 1998.

15. Shardanand, Upendra; "Social Information Filtering for Music Recommendation", MIT EECS M.Eng. Thesis, also TR-94-04, Learning and Common Sense Group, MIT Media Laboratory, 1994.

Appendix I

The following instructions were sent to the subjects requesting their assistance with data collection.

Instructions: In this study, you will be shown a series of images and asked to state your preference for each image. The images will vary on seven attributes (dimensions):

- Density - describes the number of circles in an image.
- Color Family - describes the hue of the circles.
- Pointalization - describes the size of the points that make up the individual circles.
- Saturation - describes the strength of the color within the circles.
- Brightness - describes the amount of light in the circles themselves.
- Blur - describes the crispness of the circles.
- Background - describes the background color of the image.

An illustration of these attributes can be seen in the two images below. There will be two steps in this study. Please evaluate all the images in both stages. If you cannot complete the entire study, please do not continue, as it may skew my results. The entire time to complete the study should be approximately 10 minutes.

In the first phase of the study, you will be given either a table of questions about your preferences or a series of images. Please follow the instructions on the screen. In the second phase, everyone will be shown a series of 40 images and asked to rank them from -100 to +100. -100 means you really hate the image and +100 means you really love it. You may also choose any number in between to rank the images.

- Please wait until all the images have loaded before you begin to rate them.
- There are no right or wrong answers, so please indicate your first reaction to the images.
- Please feel free to print and refer to these instructions as you complete the exams.

Thank you for taking the time to assist me with my data collection for my thesis. All submissions will remain anonymous and will in no way be used for any purposes other than this research. To begin, please point your web browser to http://iesi.mit.edu/kir/thesis.nsf

Appendix II

The following pages are screen captures of each of the possible screens that the subjects may have encountered.
Choose so that each menu selection is such that it reflects your preference for this attribute. " Second, next to each attribute indicate how important it is foryour overall esthetic evaluation. The numbers of all seven weighting must add to 100. * Select Cetixue when done -lDensity- Saturation -High Brightness- Dark Blur-uone Background- Black * Br Density- Medium FanlyColor- Purples Pomntalnzaton- Medium. Saturation- Medium Bghtness- Light Blr- Yes Background- Grey Fsele in I-Select slect Backpan High FamilyColor- Blue/Green Massachusetts Institute of Technology Copynght @ 1999 KimberlyJ. Ryan H-AtiobtTsnt *90aai j10$ ta 0: D0ne F wotaa-swWglKibry n-*1 J.- |3 ~aVlrei~ Figure A 1 - Attribute Test, Phase 1 34 Peference Test : Phase I Massachusetts Institute of TechnologyCopyright@ 1999 KimberlyJ. Ryan Figure A 2 - Correlated Test, Phase I 35 r ce Test: Phase 1 Massachusetts Institute ofTechnology Copyright@ 1999 KmberlyJ. Ryan Figure A 3 - Orthogonal Test, Phase I 36 reference Test : Phase I : DISIM COMPIETELY -100..... .. 10aedag wokrioa"eS NEUTTRAL ....... i IMGE COMPIETELY ..... 100 Massachusetts Institute of Technology Copyright @ 1999 Kimberly J. Ryan -- O4-a Se T Figure A 4 - Orthogonal Set Test, Phase 1 37 'reference Test : Phase 1 DISLIKE NEUTRAL COMPI1ELY ....... .... LIKE COMPLEIELY ... 100 j Massachusetts Institute of Technology Copyright@ 1999 KimberyJ. Ryan Figure A 5 - Correlated Set Test, Phase I 38 Tliak Ye Ifysu are fieresSed in the results of tis expernmwt, plese seman emml kjr@mitedu. Iwlibe hgy to sharn my flnuqs vit yu. Massachusetts Institute ofTechnology Copyright@ 1999 KimberlyJ. Ryan Figure A 6 - Final Screen 39 Kimiberly J Ryan T 1hesis Netscape RNA F, reference Test :Phase 2 Pleae give a emr imea yeurprefrence hr the preferred image: Use ayxumber bewee-100and+100. DISLTEF COMPLETELY -100 ........ inage NEUTRAL LIKE COMPLEIEY .....100 16@f40 Massachusetts Institute of Technology Copyright FigureSC 7sia 1999 Kimbedy J.Ryan l Tes Phase 2(amfo at Figure A 7 - Final Test, Phase 2 (Samne for all Phase 1 tests) 40 Appendix III Data Set Matrices Table A I - Correlation Matrix 1i 1 1 1 4 2 1 1111421 jpg 2 2 2 2 3 2 2 2222322.jpg 3 3 3 3 2 2 3 3333223.jpg 1 1 1 1 1 2 1 1111121.jpg 2 2 2 3 4 2 3 3 3 3 4 2 3 3333423.jpg 1 1 2 1 4 2 1 1121421.jpg 3 2 2 2 3 2 3 3222323.jpg 3 3 3 3 3 2 3 3333323.jpg 3 3 3 3 3 2 1 3333321.jpg 3 3 3 3 3 1 3 3333313.jpg 3 3 3 3 1 1 3333311.jpg 2 2 2 2 2 2 2 2222jpg 2 2 2 2 2 1 3 2222213.jpg 2 2 2 2 2 2 2 2222222.jpg 22 1 2 1i 1 22 1 1 2 1 1222212 jpg 1112211I.jpg 3 1 1 1 1 2 1 3111121.jpg 3 1 2 2223422.jpg a<->b b<->c c<->d d<->e e<->f 0.62718 0.962622 0.882403 0.454796 0.358057 a<->c b<->d c<->e d<-f e->g 0.556187 0.930502 0.503953 0.0456 0.17622 a<->d b<->e C<-> f dc->g 0.394157 0.135333 0.53110 a<->e d<->f e->g 0.503953 0.045596 0.543961 a<->f b<->g 0.051614 0.59095 0.5201 f<,>g avg correlation 0.0378 0.474191 41 Table A 2 - Orthogonal Matrix 111 2 3 3 2 2 3 1 1 1 11111.jp9 1 1 2 2332112.jpg 1 2 3 3223123.jpg 1 2223211.jpg 1 2 21 2 22 3 1 1 1 2 1 3 3 2 2 3 3 3 2 3 1 2 3 3 1 1 3 2 2 2 1 3111212jpg 1 - 3 1332223.jpg 1 3332311.jpg 1223312.jpg 3 1 1 1 1 2 2 1 3 3 2 1 3 2133213.jpg 3 2 1 2 1 1 3321211.jpg 3 2 1 4 2 2 2321422.jpg 1 2 4 1 2 3 1212.jpg 3212413.jp 1133411 j 1 1 3 3 4 1 1 3 1 3 3 4 2 2 3133422.jpg 1 4 1 3 1321413.jpg 1 2212411.jpg 1 1 2 2 1 2 4 1 1 3 1 3 3 1 2 2 3 1 3 2 a<>b b<->c -007335 01 c<->d d e 7- 0285714 1313313.jpg - 1 0<->f f<-'g 0.05 01257 c<->f d<->g 0-074.12-0.1-0.1420. 
[Table A 2 - Orthogonal Matrix: the seven attribute levels and filename for each orthogonal-test stimulus image, with pairwise correlations between the attribute columns; the pairwise values lie near zero, as intended for an orthogonal design.]

[Table A 3 - Correlated Set: attribute levels and filenames for the correlated-set stimuli, grouped by set, with within-set pairwise attribute correlations; the values are predominantly high and positive.]
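The thesis does not state how the near-orthogonal matrices (Tables A 2 and A 4) were constructed. One simple construction, sketched here purely as an assumption, is random search: sample candidate level matrices and keep the one with the smallest mean absolute pairwise correlation (statistics.correlation requires Python 3.10 or later).

# Assumed construction (not stated in the thesis): random search for a
# near-orthogonal design.
import random
from itertools import combinations
from statistics import correlation   # Python 3.10+

def mean_abs_correlation(design):
    """Mean absolute Pearson correlation over all attribute-column pairs.
    (A constant column would raise StatisticsError; with 18 images and
    three levels per attribute this is vanishingly unlikely.)"""
    cols = [list(c) for c in zip(*design)]
    pairs = list(combinations(cols, 2))
    return sum(abs(correlation(x, y)) for x, y in pairs) / len(pairs)

def search_orthogonal(n_images=18, n_attrs=7, levels=(1, 2, 3),
                      trials=2000, seed=0):
    """Keep the sampled level matrix whose columns are closest to orthogonal."""
    rng = random.Random(seed)
    best, best_score = None, float("inf")
    for _ in range(trials):
        candidate = [[rng.choice(levels) for _ in range(n_attrs)]
                     for _ in range(n_images)]
        score = mean_abs_correlation(candidate)
        if score < best_score:
            best, best_score = candidate, score
    return best, best_score

design, score = search_orthogonal()
print(round(score, 3))   # shrinks toward 0 as trials grow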
[Table A 4 - Orthogonal Set: attribute levels and filenames for the orthogonal-set stimuli, grouped by set, with within-set pairwise attribute correlations; the values are mixed in sign and centered near zero.]

[Table A 5 - Final Test: the seven attribute levels and filename for each of the 40 final-test stimulus images, with pairwise correlations between the attribute columns; average pairwise correlation 0.076428394.]
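Given these matrices, one success measure consistent with the regression methods named in the thesis can be made concrete: fit attribute weights to a subject's Phase 1 ratings by least squares, predict the 40 Phase 2 ratings, and score the method by the correlation between predicted and observed ratings. What follows is a minimal sketch under those assumptions; the function names and array shapes are illustrative, not the thesis code.

# Sketch of the prediction-success measure: least-squares attribute
# weights from Phase 1 ratings, scored by the correlation between
# predicted and observed Phase 2 ratings.
import numpy as np

def fit_utility(phase1_levels, phase1_ratings):
    """phase1_levels: (n_images, 7) attribute levels; phase1_ratings:
    (n_images,) ratings on the -100..+100 scale."""
    X = np.column_stack([np.ones(len(phase1_levels)), phase1_levels])
    coef, *_ = np.linalg.lstsq(X, phase1_ratings, rcond=None)
    return coef   # intercept followed by seven attribute weights

def prediction_success(coef, phase2_levels, phase2_ratings):
    """Pearson correlation between predicted and observed final-test ratings."""
    X = np.column_stack([np.ones(len(phase2_levels)), phase2_levels])
    predicted = X @ coef
    return np.corrcoef(predicted, phase2_ratings)[0, 1]

Because the final-test columns are themselves nearly orthogonal (average pairwise correlation 0.076), a profile learned from orthogonal Phase 1 stimuli can attribute preference to the correct dimensions, which is the mechanism behind Hypothesis 2.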