Finn Tschudi (1972)
The latent, the manifest and the reconstructed in multivariate data reduction methods.
PREFACE
For a number of years I have worked with teaching about, implementing and writing computer
programs representing fairly sophisticated mathematical models. Some problems have then emerged
which I have not seen any systematic attempts to answer. A basic observation is that a program for,
say, factor analysis, multidimensional scaling, or whatever will always grind out an answer. With
increasing sophistication in software development, advanced programs are to a larger and larger
extent available for users with perhaps less than complete insight into the mathematics of the
underlying algorithms. There does not seem to be any necessary reason for deploring this state of
affairs since any computer program is a tool which can be used without detailed knowledge of how it
is built. It is, however, often a very difficult task to figure out the meaning of the output from a program
applied to a specific set of data.
There exists what perhaps may be regarded as a paradoxical state of affairs in that there is a
discrepancy between on the one hand the amount of sophisticated mathematics used in many
programs and on the other hand an absence of developed rationale for answering many concrete
questions which the user will want to ask when he has applied an advanced program. The novice or
uninitiated may well be awestruck by what appears as highly developed tools. But when he asks
questions as for instance: is there any structure in the data, is it worthwhile to try to interpret the
output, what is really the dimensionality of these data, there does not exist any explicit rationale for
answering these questions. The user is typically left with more or less intuitively based rules of thumb.
Of course an expert will be able to give reasonable answers based on his long experience. But the
use of advanced computer programs would be much more convenient if the knowledge of the expert
could be replaced with explicit rules.
The aim of the present work is to outline an approach which I hope will lead to simple, explicit rules for
answering such questions as those exemplified above which currently are answered on a more or less
intuitive basis.
Unfortunately this does not imply that the present work can be regarded as a textbook. For one thing
the general approach is only applied to one set of methods, namely multidimensional scaling.
Furthermore the methodology is not worked out in detail for many types of applications of
multidimensional scaling and finally there is as yet no report of how the methodology works for
empirical data. At present this work will mainly be of interest for the expert in methodology and the
user of multidimensional scaling, though it is hoped that the reader with general interest in
methodology will find the general approach of interest.
Some comments on the separate chapters may be of help to the general reader in that they will indicate
which of the sections can be skipped. The basic idea in the present work is really quite simple.
Empirical data are regarded as infested with noise and the aim of the model (program) is regarded as
the "removing" of noise, "purifying" the data, or more literally: if noise obscures the underlying
structure or moves the data away from the latent structure, applying an optimal model will move the
result back towards the true structure. The general background of this idea is sketched in Chapter 1
(which the expert can just skim) - discussion of critical views on this seemingly Platonic view is
reserved for the final Concluding remarks.
Chapter 2 is the basic chapter in the present work, spelling out the main idea in some detail. Since the
idea represents a conceptual framework that can be applied to a large variety of different models it
may be regarded as a model for models or a metamodel. There are three basic relations, the relation
between the latent structure and the data, the relation between the data and the output and finally the
relation between the output and the latent structure. Since these three relations have not previously been systematically discussed in the literature, the first three sections of Chapter 3 review and classify the
scattered comments of relevance. These sections are fairly technical and may be skipped by the
reader with mainly general interests. Section 3.4, however, represents what is perhaps a novel
approach to the justification of a new index and may be of general interest.
In Chapter 4 the first two sections are again classificatory, Sections 4.3 and 4.4 are fairly technical, and Section 4.5 then directly picks up the threads from Chapter 2 and leads to the results of practical
interest which are presented in Sections 4.6 and 4.7. An attempt has been made to capture the basic
procedure and the main results in terms of graphical diagrams.
These four chapters in Part I represent an attempt to flesh out an approach which has not been made explicit previously, though it has been implicit in some current literature. In the terminology of G.A. Kelly this
part tightens and clarifies an approach which hopefully in the future may make it easier to evaluate
results from complex programs.
Part 2 is different in that it raises rather than answers questions. Chapter 5 deals with tree structures, then in Ch. 6 there are some frankly speculative attempts to sketch a more general model than at present exists. In Chapter 6 there is a variety of illustrations from cognitive (and clinical) psychology which are used to justify the search for more complex models. It is also hoped that this chapter may
inspire closer collaboration between experts in the development of technical tools for data analysis
and those whose technical expertise is manifested in the alleviation of human distress, the clinical
psychologists.
---------------
Looking back on the years spent on the present work there is the deep realization that it would not
have been possible to finish this work without the help of many colleagues and friends. First of all I
wish to express my gratitude to Gudrun Eckblad who patiently worked through many drafts and did
invaluable service in removing or clarifying several obscure passages. In discussions she also pointed
out to me several implications of the basic framework which were unclear to me. It is now difficult for
me to sort out and rank order the help provided by several other colleagues who critically commented
on separate sections of this work and provided invaluable encouragement. The list below is not complete,
but my gratitude to each of the persons listed below is deeply felt:
Rolv Blakar, Carl Erik Grenness, Paul Heggelund, Steinar Kvale, Thorleif Lund, Erik Paus, Ragnar
Rommetveit, Jon Martin Sundet, Astrid Heen Wold and Joseph Zinnes.
A very special gratitude is due to the staff at the Computer Centre of the University of Oslo.
Understaffed and overworked, yet there always seemed to be someone available to help me to utilize
our CDC machine and to debug errors in the program system I had to develop.
And, finally, this research has in part been supported by grants from the Norwegian Research Council
for Science and Humanities.
Finn Tschudi
October 1972.
CONTENTS

PREFACE

Part I
A METAMODEL AND APPLICATIONS TO DIMENSIONAL MODELS

1. Introduction
1.1 Multivariate models and the research process. Statement of problems
1.2 Comments on formal and content oriented approaches
1.3 General properties of data reduction models
1.4 Type of model and psychological theory
1.41 Spatial (dimensional) models
1.42 Tree structure model (hierarchical)

2. A metamodel for data reduction models
2.1 A metamodel
2.2 The extended form of the metamodel. Empirical and theoretical purification

3. Nonmetric multidimensional scaling and the metamodel
3.1 Nonmetric algorithms and criteria for apparent fit (AF)
3.2 Methods of introducing error and indices of noise level (NL)
3.3 On indices of true fit (TF)
3.4 Direct judgments of true fit. An empirical approach

4. Beyond stress - how to assess results from multidimensional scaling by means of simulation studies
4.1 Two approaches in simulation studies
4.2 Classification of variables in simulation studies
4.3 Comparison of algorithms
4.31 Choice of initial configuration and local minima
4.32 Comparing MDSCAL, TORSCA and SSA-1
4.33 Metric versus nonmetric methods
4.4 Previous simulation studies and some methodological problems. Analytical versus graphical methods. Unrepeated versus repeated designs
4.5 Implications from the metamodel
4.6 Evaluation of precision. Construction of TF-contours from AF-stress
4.7 Evaluation of dimensionality and applicability. Application of the extended form of the metamodel

Part II
ANALYSIS OF A TREE STRUCTURE MODEL AND SOME STEPS TOWARDS A GENERAL MODEL

5. Johnson's hierarchical clustering schemes, HCS
5.1 A presentation of hierarchical clustering schemes
5.2 HCS and the Guttmann scale
5.3 A dimensional representation of the objects in HCS (for binary trees) - a tree grid matrix

6. Filled, unfilled and partially filled spaces
6.1 A discussion of HCS and spatial models
6.2 The inadequacy of tree structure models. Comments on tree grid matrices, G.A. Miller's semantic matrices and G.A. Kelly's Rep Grid
6.3 Outline of a general model
6.4 Comments on the general model
6.41 Some technical problems. The metamodel and the general model
6.42 The general model as a conceptual model, new directions for psychological research

CONCLUDING REMARKS

Appendix. Main features of the program system

References
List of figures

Chapter 1.
Fig. 1. Schematic diagram of the research process, based on Coombs (1964, fig. 1.1).
Fig. 2. Illustration of relation between 1 and 2 sets of objects.
Fig. 3. A classification (typology) represented as a tree.
Fig. 4. Tree structure resulting from a hierarchical cluster analysis of latency data for visual discrimination of pairs of letters by adult subjects. Based on Gibson (1970, p. 139).

Chapter 2.
Fig. 1. A metamodel representing the relations between latent structure (L), manifest (M) and reconstructed data (G).
Fig. 2. Extended form of metamodel (for repeated measurements). The figure illustrates empirical purification.
Fig. 3. Illustrations of possible lack of equivalence between empirical purification and theoretical purification.
Fig. 4. Construct network for extended form of metamodel.

Chapter 3.
Fig. 1. Illustration of error process used by Young (1970).
Fig. 2. Illustrations of different categories of true fit for 1 dimensional configurations, n = 20.
Fig. 3. Illustrations of different levels of true fit for 2 dimensional configurations, n = 20.

Chapter 4.
Fig. 1. Schematic illustration of the relation between (NL) and (TF, AF).
Fig. 2. Sample of results from simulation studies showing the relation between TF-categories and AF-stress for selected values of n and t.
Fig. 3. TF contours from AF-stress for 1 dimensional configurations. Each curve shows a TF category boundary (contour) as a function of AF and n. Also included is a curve showing the 5% significance level.
Fig. 4. TF contours from AF-stress for 2 dimensional configurations. Each curve shows a TF category boundary (contour) as a function of AF and n. Also included is a curve showing the 5% significance level.
Fig. 5. TF contours from AF-stress for 3 dimensional configurations. Each curve shows a TF category boundary (contour) as a function of AF and n. Also included is a curve showing the 5% significance level.
Fig. 6. Relation between TF (expressed as categories and as correlations) and n for configurations in 1, 2 and 3 dimensions when stress = 0.
Fig. 7. Relations between AF-stress and TF-categories for 7, 12 and 25 points and crossvalidation results for 1 dimensional configurations.
Fig. 8. Relations between AF-stress and TF-categories for 9, 12 and 25 points and crossvalidation results for 2 dimensional configurations.
Fig. 9. Relations between AF-stress and TF-categories for 9, 12 and 25 points and crossvalidation results for 3 dimensional configurations.
Fig. 10. TF contours from NL for 1 dimensional configurations. Each curve shows a TF category boundary (contour) as a function of NL and n.
Fig. 11. TF contours from NL for 2 dimensional configurations. Each curve shows a TF category boundary (contour) as a function of NL and n.
Fig. 12. TF contours from NL for 3 dimensional configurations. Each curve shows a TF category boundary (contour) as a function of NL and n.
Fig. 13. Some comparisons between the relation of AF and NL based on Figs. 3-5 and Figs. 10-12 and the relation of AF and NL in the original results.
Fig. 14. Schematic representation of the design used for the extended form of the metamodel.
Fig. 15. Application of the extended form of the metamodel when the analysis is done in varying dimensionalities for a given true dimensionality, t. A schematic illustration of expected relative size of Theoretical Purification, TP, based on theoretical correlations, and Empirical Purification, EP, based on empirical correlations.
Fig. 16. Curves showing how the amount of purification, Est(TP), depends upon n and t.

Chapter 5.
Fig. 1. An example of a HCS and the corresponding tree representation.
Fig. 2. Illustration of a nested sequence of sets.
Fig. 3. Different presentations of the same tree structure.
Fig. 4. Illustration to the proof of having to represent a HCS in n-1 dimensions or the l∞ metric.

Chapter 6.
Fig. 1. A tree with 4 objects and the corresponding 3 dimensional spatial representation.
Fig. 2. Representation of a mixed class and dimensional structure, 3 classes and one continuous dimension.
A METAMODEL AND APPLICATIONS TO DIMENSIONAL MODELS.
Chapter 1.
INTRODUCTION
1.1 Multivariate models and the research process. Statement of problems
Multivariate models are finding increasing application in all branches of psychology, as for instance
testified by the handbook edited by Cattell (1966). The classical example of multivariate models, factor
analysis, is still probably the most popular of multivariate models.
The most important methodological contributions in recent years, however, are nonmetric models. The
modern period of development of nonmetric models may be dated from the time when Shepard
(1962a, 1962b) first described an algorithm in the form of a computer program for nonmetric
multidimensional scaling. Shepard's work must partly be seen against the background of metric multidimensional scaling, which again may be regarded as an outgrowth of factor analysis, cfr. Torgerson (1958). An equally important background for Shepard's work was earlier work in nonmetric models,
notably by Coombs and Guttmann. In a most interesting published letter Guttmann (1967) describes
some of this work and also the unfortunate lack of nonmetric approaches in the 1950’s.
Recently there has, however, been an extremely rapid development of nonmetric models, to a large
extent inspired by Shepard’s contributions. The most exciting recent development is conjoint
measurement, which for instance provides an approach to testing theories stated as metric functions
(e.g. Hull’s vs. Spence's theories on the relation between drive, habit strength and incentive) without
assuming more than ordinal properties of the response measures, cfr. Krantz and Tversky (1971).
Another consequence of applying conjoint measurement is that this in many cases makes it
unnecessary to use more or less arbitrary transformations in analysis of variance.
One basic contrast between the newer nonmetric approach and the older metric approach is whether
transformations are left open to discovery or whether they are more or less arbitrarily imposed on the
data.
The recent developments in nonmetric models have occurred jointly with perhaps equally important
reformulations of the basis for psychological measurement. Today one can only glimpse the far-reaching implications these new developments may have both for experimentation and theory building
in psychology (cfr. Krantz, 1972).
The present work focuses on some methodological problems, which are common for both metric and
nonmetric models. More specifically a point of departure for the present work is the fact that research making use of multivariate models involves a series of decisions which often may be partly arbitrary.
The aim of the present work is to outline a conceptual framework which provides methods to aid the
process of making inferences from the output of analyses and to guide some of the decisions to be
made.
This conceptual framework will here mainly be applied to nonmetric multidimensional scaling;
hopefully it may in the future be applied to other multivariate models, nonmetric as well as metric
models.
Coombs (1964, p. 4) has a “flow diagram from the real world to inferences”, which will serve as a point
of departure for specifying the problems to be studied here. For our purposes it will be convenient
further to subdivide Coombs' "phase 3" - since this is the focus for the present work - into three steps
- labelled 3a, 3b and 3c in Fig. 1.
Fig.1. Schematic diagram of the research process, based on Coombs (1964, fig.1.1.). 3a, b and c
correspond to Coombs’ phase 3 (“inferential classification”).
We have nothing to add to the brief treatment of phase 1 by Coombs (1964, p. 4):
The universe of potential observations contains all of the things the behavioural scientist
might choose to record. If an individual is asked whether he would vote for candidate A, the
observer usually records his answer, yes or no: but we might ask why the time it took him
to answer is not of interest, or whether there was a change in respiration, or in his galvanic
skin responses or what he did with his hands, and so on. From this richness the scientist
must select some few things to record, and this is called phase 1 in the diagram.
While acknowledging the importance of this phase in the research process, Coombs has nothing
further to say about it: "Phase l, perhaps the most important of all, the decision as to what to observe,
is beyond the scope of this theory." (op.cit. p. 5.)
Phase 2 concerns one of the important contributions of Coombs’ Theory of data, the distinction
between "recorded observations" and "data". Coombs reserves the term “data” for that which is
analyzed and points out that "the same observations may frequently be interpreted as one of two or
more different kinds of data. The choice is an optional decision by the scientist and represents a
creative step on his part...." (op. cit. p. 4). On the other hand, as we shall presently see, a variety of
different kinds of observations may be mapped into the same kind of data.
Coombs interprets "data" in terms of two dichotomies1, whether the objects in the study may be
conceived as consisting of one or two sets of points, and whether the data may be interpreted as an
order or proximity relation on the points. This gives four kinds of data, the main focus in this work is on
the type of data which can be conceived as proximity relations on one set of points, "similarities data”
in Coombs' terminology.
This is motivated by the fact that the structure of such data is both simpler and more general than that
used in factor analysis, which is the most common multivariate model used. Similarities data are
simpler because only one set of points is involved in contrast to factor analysis, where there are two
sets of points (usually identified as "persons" and "tests”). The greater generality of similarities data is
apparent from an argument made by Coombs (1964, Ch. 24), where he points out that the case of two
sets of points may be regarded as an off diagonal submatrix from a more complete (intact) matrix with
just one set of points. We might for instance subsume persons and tests under a more general
concept, “objects”. It is readily apparent that the two diagonal submatrices - persons x persons and tests x tests - will contain no observations. It is only the offdiagonal submatrix, persons x tests, where
we have observations on relations between objects. This is illustrated in Fig. 2.
Fig. 2. Illustration of relation between 1 and 2 sets of objects.
a) An intact similarities matrix (1 set of objects).
b) An offdiagonal submatrix (2 sets of points) from a hypothetical matrix with 1 set of “objects”.
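As a small illustrative sketch of this embedding (my own illustration, not from the source; NumPy and the block sizes are assumptions), a persons x tests matrix can be placed as the offdiagonal block of a larger objects x objects matrix whose diagonal blocks hold no observations:

```python
import numpy as np

n_persons, n_tests = 3, 4
# Hypothetical persons x tests observations (the values are arbitrary).
observed = np.arange(n_persons * n_tests, dtype=float).reshape(n_persons, n_tests)

# Embed the persons x tests matrix as the offdiagonal block of an
# "intact" objects x objects matrix; the persons x persons and
# tests x tests blocks contain no observations (NaN).
n_objects = n_persons + n_tests
intact = np.full((n_objects, n_objects), np.nan)
intact[:n_persons, n_persons:] = observed
intact[n_persons:, :n_persons] = observed.T
print(intact)
```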
One consequence of the greater simplicity is that the solution is unique up to a similarity
transformation for similarities data. The problem of "oblique" vs. "orthogonal" transformation,
prominent in discussions of factor analysis, is not relevant for similarities data since only orthogonal
transformations are permissible (cfr. Cliff, 1966, p. 41). It should, however, be pointed out that we will
not have any special discussion of rotation in the present work.
The simplicity of similarities data will make it easier to concentrate on the basic relations which are
dealt with in the conceptual framework presented in Ch. 2. The consequence of this conceptual
framework can then be explored in detail for similarities data.
A study of similarities data is also of substantial interest as is evidenced by the wide variety of
experimental observations which may be mapped into similarities data. Direct judgements of
similarity/dissimilarity of each of the $\binom{n}{2}$ pairs of objects are obvious examples. Some examples are the study of hue by Ekman (1954), shape of U.S. states (Shepard and Chipman, 1970) and facial expressions (Abelson and Sermat, 1962). Latency of response in discriminating pairs of stimuli is an interesting alternative response measure, cfr. for instance Gibson (1970).

Footnote 1. In the 1964 version of Coombs’ data theory there were actually three dichotomies, but the more recent presentation (Coombs et al., 1970) is somewhat simpler in that only two dichotomies are required.
Other types of examples include: overlap indices for pairs of words in classical free association
studies, cfr. Deese (1962), sorting words into any number of piles and for each pair recording the
number of subjects sorting them together, cfr. Miller (1969), and similarity of profiles, e.g. for colour
names applied to each of a set of spectral colours, cfr. Shepard and Carroll (1966).
Further examples include relations between journals which may be indexed by the amount of mutual
references, cfr. Coombs et al. (1970, Ch. 3), see also Xhignesse and Osgood (1967) and substitution
errors in learning Morse signals, see Shepard (1963)2.
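As a hedged sketch of one such mapping (my own illustration, not from the source; the words and pile assignments are invented), sorting observations can be turned into similarities data by counting, for each pair of words, how many subjects placed the two words in the same pile:

```python
from itertools import combinations
from collections import Counter

# Hypothetical sorting data: each subject partitions the same five words
# into piles (the pile labels themselves are arbitrary).
sortings = [
    {"dog": 0, "cat": 0, "rose": 1, "tulip": 1, "oak": 1},
    {"dog": 0, "cat": 0, "rose": 1, "tulip": 1, "oak": 2},
    {"dog": 0, "cat": 1, "rose": 2, "tulip": 2, "oak": 2},
]

# Similarity of a pair = number of subjects sorting the two words together.
together = Counter()
for piles in sortings:
    for a, b in combinations(sorted(piles), 2):
        if piles[a] == piles[b]:
            together[(a, b)] += 1

for pair, count in sorted(together.items()):
    print(pair, count)
```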
A systematic study of phase 2, in the present context the various procedures which may give
similarities data, is outside the scope of the present study; we now turn to phase 3.
In Coombs’ flow diagram phase 3 is labelled “inferential classification of individuals and stimuli” and
further described: "phase 3 involves the detection of relations, order and structure which follows as a
logical consequence of the data and the model used for analysis.” (op.cit. p. 5.) A general perspective
on the research process is provided by a further quotation:
the scientist enters each of these three phases in a creative way in the sense that
alternatives are open to him and his decisions will determine in a significant way the
results that will be obtained from the analysis. Each successive phase puts more limiting
boundaries on what the results might be. (op. cit. p. 5.)
Coombs' description of phase 3 is highly condensed and by breaking it up we can study separately
different choice points within this phase 3. While we do not wish to reduce the importance of the
scientist’s "creativity” the major aim of this work is to provide an approach which may help the scientist
to make the basis of his choices as explicit as possible.
Each of the three subdivisions which we have made of phase 3 corresponds to a major problem to be
illuminated. The first step (phase 3a) in the process of arriving at "inferential classification” is of course
to select a model. Usually this will be a spatial (dimensional) model. There is then the problem of the
applicability of such a model. Will it give a faithful representation of the data? Or is the model
inappropriate for the given data? Coombs does not consider alternatives to a spatial model, but a
more recently proposed tree structure model (Johnson, 1967) does provide an alternative. While the
present work mostly treats a spatial model, some discussion of Johnson's model is included. By
presenting two different types of models for the same kind of data we wish to emphasize that a choice
point exists which is often overlooked.
Spatial models represent a type of model which may take on different forms, and thus we have a
further choice to make in phase 3b (Fig. 1). For one thing we may select from a variety of different
distance functions (though mainly we concentrate on the usual Euclidean space). Another choice of
form is the proper dimensionality. It is always possible to have solutions in several different
dimensionalities. We will try to outline a foundation for choosing the proper dimensionality.
As will be further discussed later, the decision as to applicability and dimensionality will in practice not
be made independently of each other. In order to conclude that a spatial model is inapplicable for a
given set of data, it will in most cases be necessary to show that the model is inapplicable for all
relevant dimensionalities.
Footnote 2. The last two examples give asymmetric data matrices, conditional proximity matrices in Coombs’ terminology. The number of times signal j will be given as an answer when signal i is presented will generally be different from the number of times signal i will be given as an answer to signal j, to give one example of asymmetry. The models to be discussed for similarities data do, however, require symmetry. If these models are to be applied to asymmetric matrices, the latter must then be converted to symmetric matrices by some averaging procedure (this is usually done).
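The averaging procedure mentioned in the footnote can be sketched as follows (a minimal illustration, not from the source; the confusion counts are invented):

```python
import numpy as np

# Hypothetical asymmetric confusion counts: entry [i, j] is the number of
# times object j was given as an answer when object i was presented.
confusions = np.array([[0, 12,  3],
                       [7,  0,  9],
                       [2, 14,  0]], dtype=float)

# Convert to a symmetric similarities matrix by averaging the two
# directions, as is usually done before applying a symmetric model.
similarities = (confusions + confusions.T) / 2.0
print(similarities)
```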
Having settled on a specific form of a data reduction model, what Coombs (1964) called "detection of relations, order and structure" is not an automatic process; there still remains the final phase 3c: to
make inferences from the output of the analysis. At this stage the output is usually represented as a
configuration of points in a spatial diagram. One example is "semantic structures” in the well-known
study of a case of multiple personality by Osgood and Luria (1954). Fifteen concepts for each of the
three personalities were presented in three-dimensional semantic space. Another example from a
quite different area of psychology is the study of nine Munsell colours, by Torgerson (1952, 1958). The
results were here presented in a two-dimensional diagram (the dimensions being value and chroma).
Regardless of the subject matter an important consideration in the final phase is what we will call the
precision of the output. The concept of precision implies statistical considerations. The naive point of
departure for this concept is questions of the type: How "good" is this configuration, how much
confidence can I have in it? Acknowledging that there will always be random error from various
sources infesting the results, the question can be restated: If the study was repeated under similar
conditions to what extent would the resulting configuration be identical? The concept of precision has
some similarity to the concept of “behaviour domain validity” in the reformulation of classical reliability
theory by Tryon (1957). By precision we will mean the extent to which the location of points in an
observed configuration is the same as their “true” positions. In terms of Tryon's conceptualization the
observed configuration corresponds to an actually obtained set of test scores, while the true positions
correspond to "behaviour domain scores".
It is well known that the square root of the reliability coefficient is an index for behaviour domain
validity (discrepancy between domain and observed scores). There is, however, no index for the
precision of a configuration. One of the major problems in this work is to construct an index of
precision (discrepancy between true and observed configuration) and show how this index can be
estimated in practice. Extensive simulation studies (Monte Carlo methods) are necessary for this
purpose.
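The general logic of such a simulation study can be sketched as follows. This is only a schematic illustration of the idea, not the procedure developed in this work: classical (Torgerson) scaling stands in for the nonmetric programs treated in later chapters, and the correlation between reconstructed and true interpoint distances is used as a crude stand-in for the true fit indices discussed in Chapter 3.

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 12, 2

# Latent structure L: a "true" configuration of n points in t dimensions.
L = rng.normal(size=(n, t))
true_d = np.linalg.norm(L[:, None, :] - L[None, :, :], axis=-1)

# Manifest data M: the true interpoint distances perturbed by random noise.
M = np.abs(true_d + rng.normal(scale=0.2, size=true_d.shape))
M = (M + M.T) / 2.0
np.fill_diagonal(M, 0.0)

# Reconstruction G: classical (Torgerson) scaling of the noisy distances.
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (M ** 2) @ J
vals, vecs = np.linalg.eigh(B)
top = np.argsort(vals)[::-1][:t]
G = vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

# A crude index of "true fit": correlation between the reconstructed and the
# true interpoint distances over all pairs of points.
rec_d = np.linalg.norm(G[:, None, :] - G[None, :, :], axis=-1)
iu = np.triu_indices(n, k=1)
print(np.corrcoef(rec_d[iu], true_d[iu])[0, 1])
```

Repeating such runs for many noise levels, values of n and dimensionalities is what makes it possible to relate an observed index of fit to the unknown agreement with the true configuration.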
How will knowledge of precision be related to the process of making inferences? Generally speaking
the better the precision the more inferences can be drawn from the “fine structure” of the configuration.
Conversely, if precision is only marginally satisfactory, only the coarse structure of the configuration
can be used for making inferences. In the latter case only major clusters can be tentatively identified.
With close to perfect precision one can also use structural relations within clusters as a basis for
inferences.
1.2 Comments on formal and content oriented approaches.
Two major approaches to the problems of applicability, dimensionality and precision may be
described. One approach relies on general indices which are independent of the more or less specific
hypotheses which the scientist may entertain. This will be called a formal approach and will be
discussed first.
The most important such index is "goodness of fit". This is an overall measure of how well the output
matches the data. For similarities data nonmetric multidimensional scaling is today the most common
method used for analyzing the data, and goodness of fit is usually assessed by stress, as described by
Kruskal (1964a, 1964b) in his important improvement of Shepard’s original program. Though
nonmetric multidimensional scaling and stress are extensively discussed in Section 3.1, some
discussion of stress at this point will serve to highlight weaknesses of the current formal approach. For
the evaluation of stress in practice Kruskal first states that stress is a normed "residual sum of
squares", and since it is dimensionless it can be thought of as a percentage. Different ranges of stress
are described as follows:
20% and above:    Unlikely to be of interest
15% and above:    We must still be cautious
10% - 15%:        We wish it were better

Alternatively, stress in the range
10% - 20%:        Poor
From 5% - 10%:    Satisfactory or fair
Below 5%:         Impressive
From 2.5% - 5%:   Good
Less than 2.5%:   Excellent
0%:               Perfect
These descriptions are extracted from Kruskal (1964a, p. 3) and (1964b, p. 32). The underlined
phrases represent the most frequently quoted part of Kruskal’s description. The background for his
description is described as follows: "Our experience with experimental and synthetic data suggests the
following verbal evaluation".
As regards the problem of applicability of the model, the answer based on stress would be to draw a
cutting point somewhere in the neighbourhood of a stress of 20% and regard all values above this as
evidence against the applicability of the model.
Similarly the only guide available to judge the precision of the output configuration is the phrases
"poor", "fair”, "good", "excellent".
Kruskal (1964a, p. 16) is more explicit on the problem of dimensionality, where he suggests an "elbow criterion". First one plots a graph showing how stress decreases as dimensionality, t, increases (this curve will further be referred to as a stress curve). If adding an extra dimension gives a relatively small
improvement in stress, this may be noticeable as an elbow in the stress curve and this elbow may then
point to the appropriate value of t.
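A minimal sketch of reading off an elbow from a stress curve (the stress values below are invented purely for illustration; in practice they would come from runs of a scaling program in each dimensionality):

```python
# The elbow criterion: inspect how stress falls as the dimensionality t
# increases and look for the point where further dimensions give only a
# small improvement.
stress_by_t = {1: 0.28, 2: 0.11, 3: 0.09, 4: 0.08, 5: 0.075}

ts = sorted(stress_by_t)
drops = {t: stress_by_t[t - 1] - stress_by_t[t] for t in ts[1:]}
# Here the fall from t = 1 to t = 2 dwarfs the later drops, so the elbow
# would be read off at t = 2.
for t in ts[1:]:
    print(f"t = {t}: stress = {stress_by_t[t]:.3f}, improvement = {drops[t]:.3f}")
```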
The problem here is well known in factor analysis ("when to stop factoring") and similar to the problem
of looking for an elbow in the stress curve is the process of inspecting a curve showing how
eigenvalues (characteristic roots) decrease for successive factors extracted. It is a well-known strategy
to look for “breaks” or elbows in such curves. This is recognized as a process which may require fairly
subtle skills, and Green (1966) even suggests detailed studies of how accomplished "root starers" go
about their task in order to write simulation programs to "mechanize” this task.
Here we note that the elbow criterion very often fails. Whether this is due to limited applicability of the
criterion itself or lack of skill in applying it, will be only incidentally treated, since the conceptual
framework outlined in Ch. 2 will suggest an alternative way of using a stress curve to determine
dimensionality.
Notice that from a more general point of view there is an implicit conflict between two goals in
determining dimensionality. On the one hand a frequently stated goal of parsimony requires a low
dimensionality. Generally, however, low dimensionality will give high stress. This conflicts with the goal
of not doing too much violence to the data. This latter goal, that the final solution remains close
(faithful) to the data, is easier to satisfy, the higher the dimensionality. Higher dimensionality implies
more parameters, and with a sufficient number of parameters of course any body of data can be
faithfully accounted for. We will later show that the apparent conflict between parsimony and
faithfulness stems from placing too much emphasis on goodness of fit. In the conceptual framework to
be proposed the conflict will be resolved.
A more important shortcoming of basing conclusions primarily on stress is that a variety of simulation
studies have shown that a given stress value has very different implications depending upon the
number of objects, n, and the dimensionality, t. The conceptual framework to be developed will serve
to interrelate and extend these studies. It will then be possible to offer concrete alternative guidelines
for answering the problems of applicability, dimensionality and precision, based on all three
parameters, stress, n and t.
Finally it is evident that the description of how to use the stress index by Kruskal, quoted previously,
clearly is of a provisional nature. The description has undoubtedly been of help in many cases, but the
criticism can be raised that the description is scarcely an impressive improvement over purely nonquantitative criteria.
The present approach will still be a formal approach, since the meaning of a particular configuration of
points will not be considered. In practice the scientist will in most cases be most concerned with
precisely such meaning or what we here choose to call the content. Some comments on content
orientation and its relation to the formal approach in multivariate research are therefore appropriate.
Recognizing the limitation of the current formal approach, for instance to the problem of
dimensionality, an alternative criterion of "interpretability" is often proposed, for instance by Kruskal
(1964a, p. 16).
“Interpretability” is not elucidated by Kruskal, but the concept appears to be equivalent to looking at a
configuration and seeing to what extent it makes sense. Faced with configurations in differing
dimensionalities one picks the one which makes most sense.
This approach is probably also the most common in connection with the problem of applicability. If
none of the configurations in the relevant dimensionalities makes sense, the scientist will probably
conclude that a spatial model is not applicable for his data.
Before commenting on the relations between a formal approach and a content orientation a somewhat
more detailed account of the process of "making sense of' the results is necessary.
The extent to which a scientist makes sense out of results from an analysis clearly depends upon how
the results fit in with his preconceptions as to what the results might have been. Henryson (1957)
gives a useful description of various ways of using factor analysis which is equally relevant for
multidimensional scaling. First of all he distinguishes between descriptive and explanatory use of
factor analysis. Descriptive use is not relevant for our purpose, but the further subdivision of
explanatory use in hypothesis testing and hypothesis generating is highly relevant. Emphasizing
hypothesis testing appears to be equivalent to placing multivariate models within a strict hypothetical-deductive framework. In such cases the content of the configuration is completely specified a priori.
The study of Munsell colours by Torgerson (1958) may be used as an example, here the Munsell
system provides a framework for specifying precisely how the colours are expected to be related. On
the other hand, hypothesis generation is equivalent to explorative research. Thurstone (1947, p. 56)
stressed the explorative nature of factor analysis: "The new methods have a humble role. They enable
us to make only the crudest first map of a new domain". The study by Osgood and Luria (1954) may
serve as an example of exploratory use.
It is, however, important to emphasize that the above distinction does not delineate two different kinds
of research. As Henryson (1957, p. 91) points out:
It is not possible to maintain any clear distinction between testing and generating of
hypotheses. By hypothesis testing some of the hypotheses can be disproved, and at
the same time a basis can be found for establishing of new ones. These, in their turn, require
new tests, and so on. Nor is the generating of hypotheses a completely isolated
procedure. It is usually based on some form of more or less implicit hypotheses, which
become evident in the selection of variables…
This points the way to a dialectic view of the process of interpretation. The scientist has more or less
clearly preformed images (expectations) as to how the results will turn out, and these expectations
guide his preliminary interpretations. On the other hand, the results will stimulate more or less
pronounced changes in the images. In predominantly explorative research the images will be of low
clarity or sometimes we might even say that the results are not matched with images but stimulate
generative processes. If such processes do not provide a meaningful and subjectively acceptable
structure, the scientist will conclude that the results are worthless or that the model was not applicable.
In strict hypothesis testing the scientist will have a completely preformed image. In Piagetian terms we
might say that in this case the question is whether the results can be assimilated within present
structure or not. For explorative research the emphasis is mainly on how the cognitive structure of the
scientist accommodates to the empirical results. In practice the process of interpretation will be a very
complicated interplay of assimilative and accommodative processes.
Should interpretation - which is here regarded as a content oriented approach - guide decisions on
applicability, dimensionality and precision? One could regard this as a “subjective” (vague, difficult to
communicate) criterion, which ought to be replaced by an "objective" (precise quantitative,
communicable) criterion (a formal approach). One can indeed regard much of the history of
multivariate methods as attempts to replace "subjective" with “objective” criteria for decisions.
An argument can be made that our three main problems are best settled by a purely formal approach,
which will then provide an optimal framework for interpretation. This approach provides a different
perspective on the relation between “subjective” and “objective” approaches. To illustrate this we first
point out some dangers connected with unfettered interpretation. There is first what we might call
"overinterpretation”. Imagination may run wild. In discussing the Rorschach test, Smedslund (1967)
suggests that the capacity of some specialists to make sense out of this type of verbal material may be
compared to how some people manage to “see” goldfish swim in conformity with music being played.
More to our point there is a disturbing report of how accomplished experts can make sense out of
randomly constructed configurations (Armstrong and Soelberg, 1968).
This danger may be particularly salient when the points represent verbal material, since for any
scientist extensive networks inter-relating concepts exist. Stimulated by for instance psychoanalytic
thinking one may then easily make sense out of practically any configuration. A quotation from Osgood
and Luria (1954, p. 588) may illustrate this: “To rhapsodise, Eve Black finds PEACE OF MIND
through close identification with a God-like therapist (MY DOCTOR, probably a father figure for her),
accepting her HATRED and FRAUD as perfectly legitimate aspects of the God-like role.”
A knowledge of the precision of the configuration may indicate whether the clusters forming the basis
for such an interpretation can be reliably identified. This will be a necessary condition for having
confidence in interpretations (though of course not sufficient). If precision is fairly low, this will serve to
temper speculation; on the other hand a high precision may encourage more detailed interpretation. At
a different pole from overinterpretation there is the danger of what we may call "false neglect”. If one
is tempted to throw away the results because the output does not seem to make sense, a high
precision should encourage continued attempts at interpretation.
While overinterpretation and false neglect are easiest to illustrate within mainly explorative designs,
similar processes may occur when the research is closer to conventional hypothesis testing. First, a
knowledge of precision is necessary to decide to what extent only the crude aspects of the
hypothesised structure can be verified. If the precision is very low the results may be simply irrelevant to the hypothesis; highly incongruent results cannot refute the hypothesis in this case. Recalling that
research rarely conforms to a strict hypothetical-deductive model, we can outline a general type of
choice facing the scientist when he has some commitment to a detailed hypothesis. In most cases the
results will show both features supporting the hypothesis and features deviating from the hypothesis.
Correspondingly the scientist can choose whether he will emphasize the support for the hypothesis, or
the deviations. A knowledge of the precision may give valuable information guiding this decision. The
previously quoted example from Torgerson (1958, Ch. 11) may be used as an illustration. Comparing
Fig. 4 and Fig. 8 one is first of all impressed by the pronounced similarity between the Munsell
placement (representing the hypothesis) and the empirical outcome (Fig. 8). On the other hand, there
are some deviations, in Fig. 8 stimulus 2 is somewhat displaced and to a lesser extent stimulus 5. If
now the precision is extremely high, these discrepancies might suggest taking a closer look at whether
research should be undertaken which might result in revision of the Munsell system. On the other
hand, with a more moderate precision there might be no basis for expecting closer fit than actually
observed, and in such a case there would be no reason for ascribing any significance to the
deviations.
Concerning dimensionality “interpretability” does not seem to be a desirable criterion. If a detailed
hypothesis exists a formal approach might provide a valuable independent check on the hypothesized
dimensionality. If interpretability in this case was used as the criterion of dimensionality, one might
miss the possibility that a formal approach might have provided a different answer. This danger seems
even more pronounced when the research is more exploratory. In this case no clear image of the
outcome exists and the hazard of ending up with a wrong dimensionality is added to the hazards of
overinterpretation and false neglect. The general position taken here is quite similar to the case made
by Armstrong and Soelberg (1968).
The distinction between a formal and a content oriented approach must be regarded as tentative. A
more satisfactory description of the relation between them would require a psychological description
of' how research actually occurs. Unfortunately this is a neglected field (studies of "root staring"
previously referred to would cover one small and perhaps insignificant area in this field). It is to be
hoped that studies will be undertaken to give a detailed description of how experienced scientists
actually go about completing “phase 3” (cfr. Fig. 1). This might in turn give rise to even better guidelines for improving future research.
The motivation for the present work is, however, that improving the present formal approach will be of
help for research even though a satisfactory formulation of the relation between this approach and
“substantive”, content oriented considerations is lacking.
1.3 General properties of data reduction models.
The conceptual framework to be developed in this work has grown out of a more general interest in
exploring a specific view on multivariate techniques. This view regards nonmetric scaling as one
example of a broad class of methods which not only serve to give a maximally comprehensible representation of a complex set of data, but also have substantial theoretical implications. As
expressed by Coombs (1964, p. 6):
The entire process of measurement and scaling may be said to be concerned with data
reduction. We have perhaps an overwhelming number of observations and their
comprehensibility is dependent upon their reduction to measurement and scales. This is
a mechanical process but only after buying a (miniature) theory of behaviour.
“Data reduction” and “model” (miniature theory) in the term "data reduction models" are highly
interrelated. The most general starting point for any model for data analysis is the belief that there is
some structure (constraints) in the elements in the data matrix. This point of view is clearly brought
forth by Shepard (1963a, p. 33-34) when he discusses substitution errors in learning Morse signals:
now presumably there is some structure or pattern underlying [these $\binom{36}{2}$ = 630
offdiagonal entries]. Otherwise there would be no purpose in presenting these numbers
separately, their means and variance would alone suffice. But of course we do not
believe that these numbers are independent and random.
Inspection is not a sufficient method to grasp this structure:
Unfortunately, though, man’s information processing system is notoriously unable to
discern any pattern in an array of [630] numbers by inspection alone. Therefore in order
to find out what these numbers can tell us about the way this system processes dot and
dash symbols we must first supplement this natural processing system with artificial
machinery.
It is such "artificial machinery" that we call data reduction models. Successful applications of such a
model then rests upon assumptions about the structure (constraints):
the entire set of numbers is in some way constrained or redundant. This in turn implies
that this set can in principle be recovered from a smaller set of numbers. And if the
recovery is sufficiently complete, then the smaller set of numbers contains all the
information in the original larger set. In fact we might [then] be said to have captured the
latent structure behind the manifest data. (underlined here), (op.cit. p. 34).
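To make the redundancy argument concrete with a worked count (my own arithmetic, assuming, say, a two-dimensional configuration; not a computation from the source):

```latex
% Interpoint proximities among 36 signals versus the coordinates needed to
% place the 36 points in an assumed t = 2 dimensions:
\binom{36}{2} = 630 \quad\text{proximities}
\qquad\text{vs.}\qquad
n \cdot t = 36 \cdot 2 = 72 \quad\text{coordinates.}
```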
A data reduction model can then be said to imply a specific theoretical view of the kind of
constraints in the data.
The theoretical content is embodied in the type of latent structure and also in how the relation between
the latent structure and the manifest data3 is conceptualized. The symbol L will denote the latent
structure or model, M will be used to denote the data, the input to a data reduction model.
A further basic concept is that in practice there can never be a simple, direct relation between L and
M, there is always measurement error or noise to be coped with. A very interesting notion is that a
data reduction model may somehow strip away the noise which infests the data so that the output may
give a truer picture. This point of view is clearly expressed by Shepard (1966, p. 308):
The analysis can be regarded in part as a technique for purifying noisy data. A spatial
representation [output from the analysis] of the type considered here can be both more
reliable and more valid than the fallible data from which it was derived. (underlined here).
A similar point of view is expressed by Coombs et al. (1970, p. 32):
to construct a scale from noisy data, one needs a scaling method that "removes" the
error and provides means of estimating the “true” scale values.
I have not seen any systematic attempt to develop procedures to investigate whether it really is the
case that data reduction models may "purify data” in the sense that the output "is more reliable and
valid". If this possibility exists this logically implies that there is also the possibility that a data reduction
model may distort the data in the sense of giving a less valid representation than M.
A major aim of the present investigation is to attempt to clarify conditions under which data reduction
models may distort or purify data.
The discussion so far may tentatively be summarised by listing the following possible properties of the
output from data reduction models:
a) The output is simpler than M - it may be regarded as a reduced description of M since it contains
fewer numbers.
b) From the output one may - more or less completely - reconstruct (recover, reproduce) M.
c) The output from a data reduction model may give a “truer picture” than M in the sense that noisy
data are purified.
d) The output may reveal underlying – latent - structure. The symbol L will denote the latent structure
or model.
e) The output may give information about psychological processes.
Of these properties we especially wish to emphasize c). Generally it seems reasonable to assume that
if a data reduction model may give a false or distorted picture, it can in this case not reveal underlying
structure and thus not give valid information on psychological processes. Conversely purification may
tentatively be listed as a necessary (but perhaps not sufficient) condition for a data reduction model to
reveal underlying structure. Provided our data reduction model contains a valid conception of the
processes underlying the data, we expect purification to occur. On the other hand, if the model implies
an inappropriate theory we might expect distortion to occur.
c) is thus related to the larger problem of evaluating theories. Clearly procedures for evaluating such
theories should have priority over the often stressed need for “comprehensibility” or a simpler, more
manageable description than M. (What use can a “simple” description have if it distorts the data?) In
other words we wish to emphasize that c) is also more basic than a).
Footnote 3. The term “manifest data” might be regarded as a redundant misnomer since data always are "manifest". “Manifest” is, however, a useful contrast to "latent", and the terminology is seen to have some currency in the literature.
So far the quotations supporting the listed properties have been drawn from multidimensional scaling.
The general description given is, however, also applicable to other types of multivariate models.
Concerning for instance factor analysis, Thurstone refers to this type of model as a method for
"revealing underlying order". He asks the reader to imagine a correlation matrix where various
performance measures are intercorrelated and:
our question now is to determine whether these relations can be comprehended in terms
of some underlying order which is simpler than the whole table of several hundred
experimentally determined coefficients of correlations (Thurstone, 1947, p. 57).
There is also the aim to “discover whether the variables can be made to exhibit some underlying order
that may throw light on the processes that produce the individual differences”, (op. cit. p. 57) - cfr. e)
above.
The basic term "latent structure" is probably best known from Lazarsfeld's work. In his major
exposition of "latent structure analysis" (LSA) - (Lazarsfeld, 1959) he repeatedly refers to "underlying
constructs". '"Locating of objects cannot be done directly. We are dealing with latent characteristics in
the sense that their parameters must somehow be derived from manifest characteristics. Our problem
is to infer this latent space from manifest data.” (op.cit. p. 490.) It is interesting to note that LSA may
be regarded as a data reduction model with the same general properties as the model discussed more
explicitly in the article by Shepard quoted above. Lazarsfeld is concerned with “the restrictions put on
the interrelations within the data by the assumptions of the model” (op. cit. p. 507) - in Shepard's
terminology this corresponds to "redundancy" in the data. When discussing an application of LSA to
academic freedom Lazarsfeld is clearly concerned with how the latent structure may throw light on
psychological processes (op. cit. p. 528-532). Finally, an important step of LSA is, from a set of latent parameters, to compute “fitted manifest parameters” (op. cit. p. 507). This corresponds to b) above.
Since we have especially stressed c) the following quotations are of special interest: "We know that
many individual idiosyncrasies will creep into the answers. -- In the manifest property space
[corresponding to our M] we are at the mercy of these vagaries. But in the latent space --- we can take
them into account and thus achieve a more "purified" description". (op.cit. p. 490, underlined here.)
Since LSA deals with quite different data structures than similarities data, this shows the general
character of the concept of data reduction models as elaborated in a) - e).
The central concepts of this section, purification and distortion, are relevant to the presently unsolved
problem of evaluating applicability of multidimensional scaling models. (An axiomatic formulation by
Beals et al. (1968) has met with limited success, cfr. Zinnes, 1969). Logical clarification of the terms
“purification” and “distortion” will be the main task for the next chapter. This chapter will be concluded
with a section giving an introduction to the two types of models we are going to discuss in detail.
1.4 Type of model and psychological theory.
The most important aspect of any data reduction model is how the nature of L is conceived within the
model. Two broad classes of models may be distinguished which both are geometrical. The best
known is some kind of spatial (or dimensional) model, the other type of model will be called a tree
structure model.
1.41 Spatial (dimensional) models.
In a spatial model the objects are represented as a configuration of points in a space of specific
dimensionality. The basic relation between points in space is the mathematical relation distance.
There are three axioms a function must satisfy in order to be a distance function:
$d_{ij} = 0$ if and only if $i = j$
$d_{ij} = d_{ji}$ (distance is a symmetric function)
$d_{ij} + d_{jk} \geq d_{ik}$
The third is the most important axiom and is usually referred to as the triangular inequality. The sum of two distances in a triangle cannot be less than the third; stated otherwise, the indirect distance between two points (from i to k via j) cannot be less than the direct distance, dik.
The most well-known distance function is the Euclidean distance - well known from plane geometry in
high school.
(1)  dij = [ Σ_{l=1}^{t} (ail − ajl)² ]^{1/2}
where t is the number of dimensions and ail is the coordinate of point i on dimension l. This is,
however, only a special case of a more general class of distance functions, usually referred to as
Minkowski metrics:
(2)  dij = [ Σ_{l=1}^{t} |ail − ajl|^p ]^{1/p},    p ≥ 1
Beals et al. (1968) prefer the term power metrics, which seems better since it is a more descriptive term. In the mathematical literature the term lp metrics is widely used. For the power p equal to 2, equation (2) is readily seen to give the usual Euclidean function.
Two other special cases of equation (2) are of interest. For p = 1 the metric becomes the so-called
city-block distance:
(3)  dij = Σ_{l=1}^{t} |ail − ajl|
For this model "we must think of the shortest distance between two points (stimuli) as passing along
lines parallel to the axes: metaphorically speaking, we must always go around the corner to get from
one stimulus to the other" (Attneave, 1950, p.521).
For the other limiting case, p = ∞, dij reduces to:
(4)  dij = max_l |ail − ajl|
which is called the l∞ metric or dominance metric. In this space only the largest component difference
counts, all the others are neglected.
A coordinate matrix A of dimensionality (n, t), with elements ail, is the primary output from a spatial data reduction model.
For p ≥ 1 the lp metrics are known to satisfy the triangular inequality. From equation (2) it is readily apparent that the first two distance axioms are satisfied.
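As a small illustration (not part of the original text, and with names invented here), the power metrics of equation (2) can be written as one Python function: p = 1 gives the city-block metric (3), p = 2 the Euclidean metric (1), and p = ∞ the dominance metric (4).

import math

def power_metric(a_i, a_j, p):
    """Distance between two points given as coordinate lists, for a power p >= 1."""
    diffs = [abs(x - y) for x, y in zip(a_i, a_j)]
    if math.isinf(p):
        return max(diffs)                        # dominance metric: only the largest difference counts
    return sum(diff ** p for diff in diffs) ** (1.0 / p)

a_i, a_j = [1.0, 4.0, 2.0], [3.0, 1.0, 2.0]
print(power_metric(a_i, a_j, 1))                 # city block: 2 + 3 + 0 = 5
print(power_metric(a_i, a_j, 2))                 # Euclidean: (4 + 9) ** 0.5, about 3.61
print(power_metric(a_i, a_j, math.inf))          # dominance: 3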
Why should spatial models be relevant to similarities data? And does the power - (p) - have any
possible psychological significance (since spatial models are mostly based on power metrics)? The
first question is answered by Shepard who points out:
There is a rough isomorphism between the constraints that seem to govern all of these
measures of similarity on the one hand, and the metric axioms on the other. In particular,
to the metric requirement that distance be symmetric, there is the corresponding intuition
that if A is near B then B is also near A. To the metric requirement that the length of one
side of a triangle can not exceed the sum of the other two, there is the corresponding
intuition that, if A is close to B and B to C, then A must be at least moderately close to C.
-- this in turn invites an attempt to carry the powerful quantitative machinery that has
developed around the concept of distance to the intuitively defined notion of proximity.
(Shepard, 1962a, p.126.)
As for the second question raised above, Shepard has again made important contributions. He has
drawn a distinction between analyzable and non-analyzable stimuli. For non-analyzable stimuli the
different dimensions are not phenomenologically given for the person judging similarity or difference.
The standard example is here pure colours where the dimensions hue, brightness and saturation are
not immediately given. For such stimuli Shepard suggests that judgements may be mediated by an
Euclidean model. For analyzable stimuli, however, he suggests that the component differences will not be combined according to an Euclidean model, but that a city block model may be more appropriate. For further discussion on the implications of this for verbal learning and decision making see Shepard (1963b, 1964). Torgerson drew a similar distinction in his 1958 book and later (Torgerson, 1965) suggested that an Euclidean model may be appropriate for purely perceptual processes while the city block model will be more appropriate for cognitive processes. For the other special case of the power metrics Coombs et al. (1970, p. 64) state:
The p = ∞ model corresponds to Lashley's principle of dominant organization. He (1942)
proposed that the mechanism of nervous integration may be such that when any complex
of stimuli arouses nervous activity, that activity is immediately organized and certain
stimulus properties become dominant for reaction while others become ineffective. This
model, called the dominance model, is suggested by experiments in which some one
stimulus property appears to be dominant in exerting stimulus control of behaviour. The
distance between two points in such a metric is the greatest of their differences on the
component dimensions.
A general perspective is provided by the following quotation from Coombs (1964, p. 248-249):
Any model which presumes to make a multidimensional analysis of a data matrix is by its
very nature a theory about how these components [the coordinate differences in equation
(2)] are put together to generate the behaviour. Any theory about a composition function
[cfr. p in equation (2)] is a theory about behaviour. This, it seems to me, is what makes
the subject interesting and important. The components in and of themselves are static,
inert and just descriptive, until a composition model imbues them with life. Perhaps most
of psychological theory can be expressed in the context of a search for composition
models.
1.42
Tree structure model (hierarchical).
While in spatial models objects are regarded as points in a space, a basic notion in a tree structure is
the concept of an object as belonging to one of a set of non-overlapping classes. In the present work
"class", "type" and "cluster" will be treated as synonymous concepts. In the field of clinical psychology
one may, albeit very roughly, distinguish between type theories versus dimensional (or trait) theories.
Horney, Jung, Kretschmer, Freud may for instance be regarded as theorists describing persons as
falling within separate types, while Allport, Cattell and Guilford may be taken to exemplify theorists
preferring to describe persons as varying along a number of dimensions. Considering psychiatric
classifications within the framework of data reduction models Torgerson writes: "It is possible to
categorize the investigators in two classes: the typologists and the dimensionalists". (Torgerson, 1967,
p. 179)
In Fig. 3 the notion of representing a classification as a tree is illustrated. "Tree" is used in the strict graph-theoretical sense of the term in this work, that is, a tree is a connected graph with no loops. Whenever a tree structure is used as a model, the objects will be
represented as terminal nodes or leaves in the graph, cfr. level 0 above. The branches from the leaves
to the nodes at level 1 indicate which of the objects are included in each of the three classes. The final
node, level 2 above, is the root node which serves to delineate the domain of interest in a specific
study.
From the point of view of Stevens' well known classification of scales, a typology is equivalent to a nominal scale. This is often regarded as a rather primitive way of describing structure. Another way of stating this point of view is that a tree with just two levels is in most cases not very interesting psychologically. The tree concept becomes much more powerful when one considers not only a simple classification (as we have done so far), but also subclasses within classes and further sub-subclasses within subclasses, etc. This way of thinking is perhaps best known from taxonomic schemes in biology, where we at a fairly "high" level have phyla, then classes within a phylum, orders within a class, genera within an order, species within a genus and finally, as leaves, specific creatures.
In psychology tree structures are used rather infrequently compared with spatial models. Miller (1969)
has used a tree structure model in semantics - partly inspired by taxonomic schemes - to account for
sorting data. He has also reanalyzed some of the free association data of Deese (1962) with such a model. Since tree structures are not generally well known in psychological research, a detailed example is given below. The example (from Gibson, 1970) also illustrates that a tree model may give information on psychological processes (cfr. e), p. 10).
Fig. 4. Tree structure resulting from a hierarchical cluster analysis of latency data for visual discrimination of pairs of letters by adult subjects. Based on Gibson (1970, p. 139).
The branches in this example may be considered as graphemic features. The branches we have labelled are commented on by Gibson: "The first split separates the letters with diagonality [a2, which subsumes the cluster (MNW)] from all the others [a1, which subsumes (CGEFPR)]. On the left branch [b1] the "round" letters C and G next split off from the others [b2 - (EFPR)]. At the third branch, the square right-angular letters, E and F [c1], split off from letters differentiated from them by curvature [c2]". The tree structure for seven-year-old children was similar, but not quite the same.
Gibson's summarizing comments are especially interesting from the point of view of throwing light on
psychological processes: The result “suggests to me that children at this stage may be doing
straightforward sequential processing of features, while adults have progressed to a more Gestalt-like
processing picking up higher orders of structure given by redundancy and tied relations" (op.cit.p.
139).
How can a tree structure be extracted from a symmetric data matrix? The basic notion underlying all
classification is that objects within a cluster are close together while objects belonging to different
clusters are less close together. Extending this notion to subclasses implies that the objects
represented in clusters at the lowest levels (closest to the leaves) are closest together. Conversely, a specific pair of objects is further apart the closer to the root one must move in order to find a
cluster which contains the pair. While this concept of closeness applied to trees is not recent, it seems
to have been first formalized in distance concepts by the highly important work of Johnson (1967). His
formalization also resulted in simple algorithms which can be applied to noisy data. Johnson's concept
of hierarchical clustering schemes, HCS, which is isomorphic to tree structure, will be extensively
treated in Ch. 5.
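To make the extraction of a tree from a symmetric data matrix concrete, the following is a minimal Python sketch of a hierarchical clustering scheme of the kind formalized by Johnson (1967). It is illustrative only: the "maximum" (diameter) rule for between-cluster dissimilarities is assumed here (Johnson also gives a "minimum" rule), the toy matrix is invented, and Johnson's own algorithms are treated in Ch. 5.

def hierarchical_clustering(d):
    """d: symmetric dissimilarity matrix (list of lists). Returns the successive merges."""
    clusters = [frozenset([i]) for i in range(len(d))]
    steps = []
    while len(clusters) > 1:
        # find the pair of clusters whose largest between-cluster dissimilarity is smallest
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                diam = max(d[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or diam < best[0]:
                    best = (diam, a, b)
        diam, a, b = best
        merged = clusters[a] | clusters[b]
        steps.append((diam, sorted(merged)))     # the level at which this cluster is formed
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return steps

d = [[0, 1, 4, 5],                               # toy dissimilarities for four objects
     [1, 0, 3, 6],
     [4, 3, 0, 2],
     [5, 6, 2, 0]]
for level, members in hierarchical_clustering(d):
    print(level, members)                        # objects 0,1 join first, then 2,3, then all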
Chapter 2.
A METAMODEL FOR DATA REDUCTION MODELS
2.1
A metamodel.
A basic thesis in this work is that data reduction models may be regarded as (miniature) psychological
theories. While this work mainly deals with similarities data, in Section 1.3 a set of properties of data
reduction models is outlined, which transcends similarities data. In Section 1.4 two alternative
geometric models are briefly sketched, spatial models and tree structure models. The introduction to
these models specifically illustrates the possible substantive relevance of the models.
For convenience in the following discussion the properties of data reduction models outlined in Section
1.3 are briefly restated here.
a) Parsimony. The output is simpler than M (data input to the model).
b) Reconstruction. From the output more or less complete recovery of M is possible.
c) Purification. The output may give a truer, more purified description and thus be said to:
d) reveal latent structure (L), and
e) give information on psychological processes.
While e) above most directly ties in with the basic thesis on data reduction models (that they are substantive theories), we have chosen to emphasize c) since it may be regarded as a necessary
condition for d) and e). An investigation of c) is then relevant to the general problem of evaluating
theories (applicability). As discussed in Ch. 1 the usual approach to evaluating data reduction models
is to compute indices of goodness of fit and evaluate such indices either more or less intuitively or in
the light of statistical sampling distributions. Goodness of fit will be seen to be directly related to b) and
a) above.
The task of this chapter is to outline a conceptual framework which will interrelate the usual goodness
of fit approach and purification (and distortion). This conceptual framework is called a "metamodel".
The term metamodel should be taken to imply a conceptual framework relevant to a wide variety of
types of model. The metamodel will serve as a useful heuristic guide to general insights on data
reduction models and also for suggesting new methods for answering problems of applicability,
dimensionality and precision. The basic and simplest form of the metamodel is presented in Fig. 1.
Fig. 1. A metamodel representing the relations between latent structure (L), manifest (M) and reconstructed data (G).
NL - Noise Level.
AF - Apparent Fit, goodness of fit, stress, degree of recovery of M from G.
TF - True Fit, degree of recovery of L by output G.
Before discussing the interrelated concepts “apparent fit”, "noise level" and “true fit”, some preliminary
comments concerning L, M and G are necessary.
Concerning the data M, we have so far considered this as a square, symmetric matrix for similarities data. It will, however, be useful to consider M as being in vector mode. Any square, symmetric matrix may be strung out as a vector, for instance in the sequence (2,1), (3,1), (3,2), (4,1), (4,2), etc. This sequence excludes the diagonal, and the vector consists of n(n-1)/2 elements. Since we exclude asymmetric matrices the symmetric values (1,2), (1,3), etc. are safely ignored. Other types of data matrices than similarities data may be strung out as vectors in different ways. Unless otherwise specified, M will from now on always refer to a vector with n(n-1)/2 elements. An element in this vector will be referred to as mij.
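As a small illustration (names invented here), stringing out a square, symmetric matrix in the sequence just described can be written as:

def string_out(m):
    """m: square symmetric matrix (list of lists); returns the vector of n(n-1)/2 elements."""
    n = len(m)
    return [m[i][j] for i in range(1, n) for j in range(i)]

m = [[0, 2, 5, 7],
     [2, 0, 3, 8],
     [5, 3, 0, 4],
     [7, 8, 4, 0]]
print(string_out(m))                             # [2, 5, 3, 7, 8, 4], i.e. m21, m31, m32, m41, m42, m43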
While only one mode needs to be considered for M, two modes must be distinguished for both G and L. For spatial models the output G is usually considered in the mode of configuration, an (n, t) matrix of coordinates where n is the number of points and t the number of dimensions. If this mode is specifically intended, the symbol G(C) will be used. gil will refer to a specific element (coordinate) in G(C). The coordinates gil may be inserted in equation (2), Ch. 1. This will give reconstructed distances which we here consider as another mode of G. These reconstructed distances will be in strict one to one correspondence with the data elements. As for M we consider these reconstructed distances as a vector (with n(n-1)/2 elements for similarities data). If this mode is specifically intended, the symbol G(V) will be used. G will serve as a generic symbol. G will be used if the context makes the specific mode intended apparent, or if it is not essential to specify the mode.
There are the same two modes for L as for G. The configuration mode will be denoted L(C), L(V) will denote the distance mode, and L will be used as a generic symbol in the same way as G.4
Though the concept latent structure will be discussed in more detail later, we may here note that L can
be specified in more or less detail. At the most general level L may refer to just any type of latent
structure. The barest amount of specification is to state the type of L e.g. whether L is intended to
denote a spatial model or a tree structure model. On the next level of specificity, we may indicate a
specific form of L, e.g. dimensionality and type of metric space for spatial models. Finally, what may
be called the content of L may also be specified, that is, concrete values of the elements in L(C). In
simulation studies it is always necessary to have complete specification of the content of L(C).
Turning now to the connecting lines in Fig. 1, a central feature is that these lines can be regarded as
“sizes of discrepancy”. Consider first the relation between M and G. The extent to which one can
reconstruct M from G is a question of the discrepancy between G(V) and M. The closer G(V) is to M,
the more satisfactory is the "goodness of fit". Algorithms for finding G are more or less directly aimed
at optimizing fit (closeness), or what amounts to the same: minimizing the discrepancy between M and
G (V). Sometimes the discrepancy may even be interpreted literally as distance; this was for instance the approach taken in defining the main problem in factor analysis by Eckart and Young (1936, p.
212), who formulated the problem in terms of a least squares solution to minimizing the distance
between the data matrix and a matrix of reduced rank.
Kruskal’s nonmetric algorithm is directed at minimizing stress, his index of discrepancy. Since Kruskal,
however, treats M strictly as an ordinal scale, stress cannot directly be regarded as the distance
between G (V) and M. The term discrepancy is somewhat less precise than the term distance, but
longer lines in the metamodel will always be taken to imply larger discrepancies.
4
In Section 5.1 we discuss how a tree is algebraically represented and how one from L(C) can compute the corresponding distances, L(V). There is then the same relation between a tree output, G(C), and the reconstructed distances, G(V).
Concerning references to single elements, double subscripts will always be used. For elements in M - and associated vectors to be considered in Section 3.1 - there can be no confusion since there is only one mode to consider. For G(V) convention dictates using dij to denote elements. Finally, the context will make clear whether subscripted l's refer to elements in L(C) or L(V).
Since we may also from a tree output compute reconstructed values, G(V), it is possible to compute
goodness of fit also for a tree structure model. It may, however, be noted that the algorithms for finding
the tree output are not explicitly directed at minimizing the discrepancy between G(V) and M. The
important point in this context is that a tree structure model may be discussed within the same general
framework as spatial models, which has not previously been done.
According to Coombs (1966, personal communication) any data reduction model consists of "a theory
of behaviour and an algorithm for applying the theory".
We are thus faced with the double problem of evaluating the theory (in this context how L is specified),
and since for similarities data there are a variety of competing algorithms, there is also the problem of
evaluating algorithms.
A central thesis in the present work is that goodness of fit cannot provide sufficient answers to these
problems, cfr. the critical comments on stress in Section 1.2.
Consider first a basic claim for nonmetric methods, that a configuration with metric properties can be
recovered just from ordinal data. The only way of demonstrating this is by first "knowing the answer"
and then to show that from ordinal information in M it is possible to get back essentially what we
started from. In terms of the present terminology this implies starting with a complete specification of
L(C), then to compute L(V) and then set M equal to L(V). In the analysis all but the ordinal information
in M is further ignored. The question of “recovery” is then answered by comparing G(C) with L(C) and
to the extent that these are similar, the answer is satisfactory. Indeed, this basic scheme was followed
in the first example given both by Shepard (1962b) and Kruskal (1964a) as will be discussed in more
detail later. At present we note that the discrepancy between G and M is not relevant in answering the
present question of recovery. Even if this discrepancy is zero, it might still be the case that the
underlying metric configuration is very incompletely matched by G. The main point of the metamodel is
to draw attention to the fact that it is more basic to know how far G is from L than to know how far G is
from M. To highlight this difference we have borrowed a pair of terms from Sherman (1970). He used
the term "apparent fit" (AF) for stress (goodness of fit) and “true fit” (TF) for the discrepancy between
L and G.
True fit has variously been called "accuracy of solution", "metric determinacy", "recoverability of metric structure", "metricity". We think "true fit" is to be preferred since it highlights the major importance of
assessing this discrepancy, furthermore it contrasts conveniently with “apparent fit”.
The case we have discussed so far, setting M equal to L(V), is not realistic as a model for empirical
data, since this case makes no allowance for noise or measurement errors. In terms of the metamodel
this case corresponds to zero discrepancy between L(V) and M, or noise level (NL) = 0, and we can
then conceive of L and M as coinciding.
Conversely we can conceive of various amounts of noise as equivalent to various sizes of
discrepancies (length of NL in Fig. 1). Methods of simulating error processes which result in various
noise levels will be discussed in Section 3.2, at present we simply note that a set of data which
contains much measurement error can be pictured as far removed from L, the line NL will be long.
We can now give a literal interpretation of the highly abstract concept "purification". Any error process removes M from L; purification implies that the result G moves back closer to L.
In order to give precise meaning to this concept it is necessary to use the same index for NL and TF.
Since we have stressed that all three basic terms in the metamodel may be presented in the same
mode (vector mode), any index used to express NL may in principle be applied also to TF. Provided
the same index is used for NL and TF, we then have the following simple definitions.5
5
Unless otherwise specified a smaller value of any index for NL, TF and AF will always imply a "better" value
or closer congruence between the corresponding pair of L, M and G.
Purification: TF < NL
Distortion:   TF > NL
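Purely as an illustration of these definitions (not a result from this work), the following Python sketch runs the metamodel in miniature. Classical (Torgerson) metric scaling is substituted as a stand-in model simply because it can be written in a few lines; the configuration, the noise level and the choice of a root mean square index for NL and TF are all arbitrary assumptions made for the example.

import numpy as np

rng = np.random.default_rng(0)
n, t = 12, 2
L_C = rng.normal(size=(n, t))                    # latent configuration L(C)

def dist_vector(coords):
    """Euclidean distances strung out as a vector with n(n-1)/2 elements."""
    return np.array([np.linalg.norm(coords[i] - coords[j])
                     for i in range(1, len(coords)) for j in range(i)])

def classical_scaling(dvec, n, t):
    """Recover a t-dimensional configuration from a distance vector (Torgerson's method)."""
    D = np.zeros((n, n))
    pairs = [(i, j) for i in range(1, n) for j in range(i)]
    for (i, j), d in zip(pairs, dvec):
        D[i, j] = D[j, i] = d
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    keep = np.argsort(vals)[::-1][:t]
    return vecs[:, keep] * np.sqrt(np.clip(vals[keep], 0, None))

L_V = dist_vector(L_C)                           # latent distances L(V)
M = L_V + rng.normal(scale=0.15, size=L_V.shape) # manifest data: L(V) plus noise
G_V = dist_vector(classical_scaling(M, n, t))    # reconstructed distances G(V)

NL = np.sqrt(np.mean((M - L_V) ** 2))            # noise level: discrepancy between L(V) and M
TF = np.sqrt(np.mean((G_V - L_V) ** 2))          # true fit: discrepancy between L(V) and G(V)
print("NL =", round(float(NL), 3), " TF =", round(float(TF), 3),
      "(purification)" if TF < NL else "(distortion)")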
Detailed results on purification will be presented later. We shall then see that for a given noise level purification increases with n, a result highly consonant with the general expectation that it pays to have a lot of data. From the metamodel we should, however, then expect that stress should increase with n, since the more noise is stripped away, the further the result G must be removed from M (of course in the direction of L). This result strongly points out the advantage of using the metamodel as a conceptual framework. If the emphasis is solely on stress it would be very disconcerting to have stress increasing with n when the results really get better.
We have just implied that without the guidance provided by the metamodel one may fall prey to a
pseudoconflict between the goal of having high n and the goal of a good apparent fit. A similar type of
pseudoconflict has already been discussed in Section 1.2. It will be recalled that low dimensionality
best satisfies parsimony (gives a high degree of data reduction) but generally gives high stress,
whereas the reverse is the case for high dimensionality. To see this as a “conflict” between the goal of
parsimony and the goal of remaining close to the data is nothing but a recognition of the limitation of
the latter goal. We will not place special emphasis on either of these two goals but argue that the
superordinate goal of searching for the best true fit will provide a new approach to the vexing problem
of dimensionality.
The answer to the central question on purification - does it really occur? - has already been anticipated. The concept has some currency in the literature, but has not previously been specified concretely. This will be done here after discussing various indices for TF and NL in Ch. 3.
Concerning the main finding that purification generally will be found to occur, it may be objected that
this really is a trivial finding, that it is just a consequence of a quite general “averaging” process.
Purification may be seen as analogous to the general fact that for a sample of measurements
generated by a certain stochastic process, the mean may be a better representation than the separate
observations. To take a different example, a regression line may give a better representation of a
relation than the whole scatter diagram.
For nonmetric models the case with no noise will of course always involve some distortion. When L
and M coincide, G should also coincide if there was no distortion. The fact that G will not exactly
coincide with L in this case implies some distortion (the amount of distortion in this case will be
discussed later). When the limiting case of no noise involves distortion, it is not trivial that the usual
case with more or less noise will imply purification.
There is, however, a more important way to answer the objection of "triviality". The fact is that no objection really has been stated. A data reduction model implies a conception of the type of constraints in the data. If this conception is valid we may generally expect purification. If, however, this
conception is not valid distortion may be expected. Concerning the simple statistical examples just
mentioned it may first be pointed out that it is not trivial that a regression line will give a "better
representation". If there actually is a non-monotone relation, a regression line may be said to distort
the relation. Quite often summary statistics such as the mean and variance are regarded as not having
theoretical implications, just being convenient "descriptive statistics". This point of view is strongly
contested by Mandelbrot (1965). Discussing “a class of long tailed probability distributions” he points
out that the Pareto distribution is useful in describing a variety of phenomena and that for this
distribution the second moment is not finite and thus the variance in any sample will not be of any use
as a descriptive statistic. He even points out that forms of the Pareto distribution, where the first
moment does not exist, may also be useful, - in this case not even a mean of a sample would be of
any use but would just obscure underlying processes generating the distribution.
The position taken here is that purification is a useful concept because the contrasting phenomenon,
distortion, may also occur. This may generally be expected to occur when an inappropriate model is
applied to a specific set of data. This will be illustrated by showing that when a tree structure model is
applied and L is of a spatial type, G will be further removed from L than M is, as will be briefly
commented upon in Concluding remarks. A different type of application of the present conceptual
framework is the previously mentioned problem of evaluating algorithms. Which of the competing
algorithms for similarities data is best?
If for a wide variety of different true configurations, L, method A consistently gives better values of TF
than method B, method A is clearly to be preferred. Notice that for this type of problem a comparison
of the different goodness of fit indices is not relevant. Partly they may not be comparable (concerning for instance the Shepard-Kruskal approach versus the Guttmann-Lingoes approach, it will be shown in Section 3.1 that the different indices of goodness of fit are not strictly monotonically related). But even if they were
strictly comparable the important issue in comparing the methods is not which one gives output closest
to M, but rather which method gives output closest to the true configuration L. In other words, degree
of purification is proposed as a criterion for comparing methods, the method with the highest degree of
purification should be considered best.
We have argued that TF is more basic than the usual goodness of fit, AF. Since in empirical
applications L is usually not known, TF must be estimated. If it were possible it would be desirable to
work out mathematically joint sampling distributions of TF and AF indices, but the mathematical
problems seem to be insurmountable. Consequently a simulation approach will be used. Estimates of
TF from AF and n and t are then proposed as the answer to the problem of precision, see Section 4.6.
In simulation studies L must be completely specified. From L (and usually some error process)
synthetic or "artificial" data are then generated and the solution G is evaluated from the point of view of
L, which may then be called an external true criterion. While L is not known in empirical applications,
cases may exist where the scientist is able to completely specify his image of L before analysis of his
data. This is what we in Section 1.2 referred to as "strict hypothesis testing". Torgerson's study of
Munsell colours was used as an example. In this case the Munsell classification may be regarded as
an external empirical criterion. Let us label such a criterion L' and insert it in the metamodel (cfr. Fig. 1); one may then compute the discrepancies TF' and NL'. If it turns out that G is closer to L' than M is
to L’, we may say that the hypothesis L' implies a purification of the data and our confidence in L' will
increase. On the other hand it might be the case that G moved away from L' or that distortion
occurred. In this case general confidence in L’ (or the theories generating L’) would not seem
warranted. Theories should account for data, not lead to distorted representations.
This appears to be a novel approach to the problem of evaluating completely specified hypotheses in
multivariate models. In Section 1.2 we did, however, argue that research which seeks a yes or no
answer to the validity of a completely specified hypothesis is rather atypical. By also considering the
discrepancy between M and G, more precise questions can profitably be asked. From the observed
AF, TF can be estimated; this estimate can then be compared with TF'. Even though purification might
have occurred, it might still be the case that TF' is substantially worse than TF. One might then
conclude that though L’ points in the right direction, some revision is called for. This may stimulate a
revised conception of L', say L’’, which may then be used as a basis for new empirical data, and the
same process may continue with L’’ substituted for L'.
2.2
The extended form of the metamodel. Empirical and theoretical purification.
The simple form of the metamodel in Fig. 1 has been found useful for clarifying the weaknesses of relying too heavily on goodness of fit; estimation of true fit is proposed as an alternative. While the discussion
referred to both the problem of dimensionality (form of L) and applicability (type of L), no methods
were suggested to answer these problems. We will now discuss extensions of the metamodel which
have implications for both of these problems.
In order to estimate true dimensionality solutions in several dimensions are required. These solutions
might be generated from one set of data (giving a stress curve). In order to check applicability it is,
however, necessary to have several sets of data generated by a given L. We first consider this case, a
simple interpretation is repeated measurements on individual data. In Fig. 2 we visualize the structure when we have two sets of data, M1 and M2, and furthermore two corresponding outputs, G1 and G2.
Fig. 2. Extended form of metamodel (for repeated measurements). The figure illustrates empirical purification, that is: Rel (G1, G2) < Rel (M1, M2).
In Fig. 2 we have diagrammed the case where G1 and G2 are closer than M1 and M2. The discrepancy
between various G’s (or various M's) will generally be denoted by Rel (for relation), it being understood
that the closer the pair of terms is being related, the smaller is the value of Rel. Concrete indices for
assessing these relations will be discussed in Ch. 3.
In Fig. 2 we then have:
Rel (G1, G2) < Rel (M1, M2).
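One concrete - and here arbitrarily chosen - way of computing such an index is the root mean square discrepancy between two vectors in the same mode; the indices actually used are discussed in Ch. 3, and the numbers below are dummy values for illustration only.

import math

def rel(x, y):
    """Smaller values mean the two vectors are closer to each other."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

M1 = [1.2, 0.8, 2.1, 1.7, 0.9, 2.4]
M2 = [1.0, 1.1, 1.8, 1.9, 0.7, 2.6]
G1 = [1.1, 0.9, 2.0, 1.8, 0.8, 2.5]
G2 = [1.0, 1.0, 1.9, 1.8, 0.8, 2.5]

# empirical purification in the sense of Fig. 2: Rel(G1, G2) < Rel(M1, M2)
print(rel(G1, G2) < rel(M1, M2))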
A major problem is now whether it is generally justified to conclude from the inequality above to
purification and thus that a valid model has been applied.
We will argue that not only is it possible to conclude from the inequality in Fig. 2 to purification, but that
also the reverse implication holds good, that is: purification generally implies that the inequality in Fig.
2 will be satisfied.
Notice first that purification as defined on p. 18 can never be directly observed in empirical work, since
with real data L is always unknown. This is not the case for the inequality in Fig. 2. This inequality can
always be empirically investigated whenever we have repeated measurements. Consequently this
inequality will be referred to as empirical purification. When both empirical purification and purification as defined on p. 18 are under discussion, the latter will be referred to as theoretical purification.
An equivalence thesis may now be stated:
Empirical purification and theoretical purification are logically equivalent, that is the occurrence of one
implies the occurrence of the other.
We will state a preliminary case for this thesis by arguing against the conditions which would invalidate
it. The two invalidating conditions are diagrammed in Fig. 3.
Fig 3:
Illustrations of possible lack of equivalence between empirical purification and theoretical
purification
a)
Empirical distortion and theoretical purification (cfr. Fig. 3a)
Consider first M1 and M2 as composed of L plus "large" error components. Similarly G1 and G2 may be considered as composed of the same L plus "smaller" error components. This is a direct consequence of theoretical purification. If the error terms are not too highly correlated, simple psychometric theory tells us that Rel (G1, G2) < Rel (M1, M2). In this case theoretical purification will imply empirical
purification, and the possibility diagrammed in Fig. 3a) will not occur.
On the other hand M1 and M2 could "guide" G1 and G2 in separate directions, yet both G1 and G2 could
be closer to L than M1 and M2. This could come about if G1 and G2 each capitalized on specific noise
components. For this case to occur, however, substantial correlations between error components in M
and in G would probably be necessary.
b)
Empirical purification and theoretical distortion (cfr. Fig. 3b)
This does not seem at all likely to occur. If L is inappropriate as a model for M1 and M2, there is no
basis for "guiding" G1 and G2 closer to each other. To the extent that it is safe to rule out this
possibility, it is legitimate to conclude that empirical purification will imply theoretical purification.
Results substantiating the equivalence between empirical and theoretical purification will be presented later. We shall then see that the 5 vectors L (V), M1, M2, G1 (V), G2 (V) have a structure of the Spearman type, where L corresponds to the general factor. To the extent that this holds true, the 5 vectors might be diagrammed as points on a straight line, with L (V) considered as a unit vector and G1 (V) and G2 (V) closest to L (V), as these are more saturated with the general factor than M1 and M2. A more practically useful approach to the problem of applicability is sketched on p. 23.
Consider now analyses performed in several different dimensionalities. For each dimensionality we get a pair of results (G1t, G2t) where the superscript t is an index for dimensionality. A simple way of estimating dimensionality is to pick the value of t where G1t and G2t are closest. Different dimensionalities correspond to different forms of L. More generally we then offer the following tentative
selection rule:
The true form of L is found by selecting the form of L which corresponds to the highest degree of
empirical purification.
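A minimal sketch of this selection rule is given below (illustrative only: in practice the pairs of solutions would come from running the scaling program on M1 and M2 in each dimensionality, and the Rel index is again an arbitrary root mean square choice; the values here are dummies).

import math

def rel(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

solutions = {                                    # t -> (G1^t distances, G2^t distances)
    1: ([1.4, 0.6, 2.3], [1.0, 1.1, 1.9]),
    2: ([1.2, 0.9, 2.1], [1.1, 1.0, 2.0]),
    3: ([1.3, 0.8, 2.2], [1.0, 1.1, 1.9]),
}
best_t = min(solutions, key=lambda t: rel(*solutions[t]))
print("estimated dimensionality:", best_t)       # the form of L with the closest pair of outputs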
If, regardless of the form of L, there is no empirical purification this may indicate that the wrong type of L has been selected, that the type of model chosen is not applicable to the specific set of data.
We can now see why the simple form of the metamodel in Fig. 1 is insufficient for throwing light on the problem of applicability. The reason is that no independent estimate of NL is possible. We can then
not know whether for instance a high stress indicates that the model is inappropriate, that is, we have
used a wrong theory or whether simply the bad results are due to highly unreliable data. By increasing
NL, stress will generally increase. If one is tempted to conclude that a very high stress indicates that
the model is inappropriate, high NL will then always be an alternative explanation.
The situation is quite different when we know Rel (M1, M2), because from this NL can be estimated as
will be shown in Section 4.7. Furthermore TF can be estimated from Rel (G1, G2) by the same
procedure. All the concepts we have introduced when discussing the extended form of the metamodel
may be pictured as a redundant network of constructs as in Fig. 4.
Fig. 4
Construct network for extended form of metamodel
When the method of estimation of NL from Rel (M1, M2) is the same as the method of estimation of TF
from Rel (G1, G2), it follows that there must be the same ordering of TF and NL as of Rel (M1, M2) and
Rel (G1,G2). This is merely another way of stating a conclusion previously indicated by a different
argument, that empirical purification and theoretical purification are logically equivalent.
The redundancy in Fig. 4 points at general ways of testing the appropriateness of the model. Suppose
that in the apparently appropriate dimensionality (t0) there is a fairly high stress (AF|t0). From this
stress one may estimate NL. If on the other hand Rel (M1, M2) implies much less noise in the data than
AF|t0 this may indicate that the model does not apply to the data at hand. The point is that the
conversion from Rel (M1, M2) to NL will not be theoretically neutral, but implies a specific conception as
to the nature of L. If then Rel (M1, M2) turns out to be much too low (little error), this indicates that there
is more structure in the data than is captured by the specific type of L being hypothesized. In such
cases we should also expect empirical distortion.
A general way of evaluating applicability becomes possible through a comparison of the two values for
degree of purification in a given case, the empirically given purification and the estimated theoretical
one. A necessary condition for such a comparison is that empirical and theoretical purification are
expressed in the same units. In Section 4.7 this will be accomplished by converting the relations in
empirical purification to the same units as those used for theoretical purification.
When only one set of data is available one may as previously discussed plot a stress curve and look
for an elbow in this curve. An alternative strategy, however, is for each separate stress value to
estimate true fit in the corresponding dimensionality. This implies trying out several hypotheses as to
the form of L and to estimate true fit for each form.
A simple solution to the problem of dimensionality is to select the dimensionality which corresponds to
the lowest (best) estimated value of true fit. Formulated as a general rule:
When several estimates of TF are available - each estimate assuming a specific form of L - the true
form of L is that which corresponds to the lowest value of true fit.
This will be seen to be a generalization of the selection rule on p. 22 since Rel (G1, G2) and thus
empirical purification is assumed to be monotone with true fit.
The general rule given above and the more specific selection rule correspond to supplementary
strategies for finding the proper dimensionality. Simulation studies to be discussed in Section 4.7 will
form the basis for evaluating the success of these strategies, considered both separately and jointly.
Summary
In this chapter a metamodel has been presented as a conceptual framework for data reduction models
generally and specifically for simulation studies. True fit - discrepancy between results from the
analysis and true, latent structure - replaces apparent fit (conventional goodness of fit indices) as the
central concept. Noise is an important concept in the metamodel:
Noisy data are assumed to be purified when the analysis is guided by an appropriate theoretical conception of the type of redundancies in the data.
An optimal goal for simulation studies is to provide decision rules which can be applied in practical
work. Deciding on the applicability of the model and the problem of dimensionality has been
discussed. A further problem is to evaluate competing algorithms with the same general purpose. This
includes both comparison between various nonmetric algorithms and also comparison between metric and nonmetric algorithms.
In the next chapter the three relations: noise level, apparent fit and true fit - which here have been
discussed in a general way - will be treated in detail. The complexity of simulation studies will be
apparent in Ch. 4. In this chapter steps are taken to realize the goals outlined for simulation studies.
Chapter 3
NONMETRIC MULTIDIMENSIONAL MODELS AND THE METAMODEL
3.1
Nonmetric algorithms and criteria for apparent fit (AF)
The most popular nonmetric multidimensional scaling method today is probably the one described by
Kruskal (1964a, 1964b) who set out to improve Shepard's original program. Since Shepard has
abandoned his original program in favour of Kruskal’s MDSCAL program (cfr. Shepard 1966, Shepard
and Chipman, 1970) we will not give any attention to the details of Shepard's first program. Later
Guttmann and Lingoes with their "smallest space analysis" (SSA-1) and further Torgerson and Young (TORSCA) have offered programs with the same purpose as Kruskal's, see Lingoes (1965, 1966), Guttmann (1968), Young and Torgerson (1967), Young (1968a). Whether these methods have any advantages compared with Kruskal's method will be discussed in Section 4.32.
Common to all the nonmetric methods is that no specific functional relationship between data and
distances is assumed. The only assumption is that there is a monotone relation between distances
and data. For the nonmetric methods the distances of the obtained configuration are computed so that
the order of the data is optimally reproduced (stress is an index telling how well this is accomplished).6
Reproducing order is in contrast to the older metric methods where the aim was to reproduce not only
order but the actual values of the data. Since a configuration obtained from nonmetric methods has
essentially the same properties as a configuration from a metric method, we can say that the
nonmetric enterprise replaces the strong interval or ratio assumptions of metric methods by the much
weaker ordinal assumptions.
The strongest claim which can be made for the nonmetric methods is that the weaker assumptions of this approach do not in general imply any loss of information. Simulation studies relevant to this
claim will be reported in Section 4.33.
The present survey of the main features of the currently most popular nonmetric algorithms borrows
heavily from the recent work by Lingoes and Roskam (1971). This is by far the most detailed
mathematical exposition of the main methods available and also gives fascinating glimpses of the
sometimes acrimonious debates among the persons chiefly involved in developing the methods.
As implied by the introductory remarks in this section the essence of the nonmetric methods is
captured by the concept of monotonicity. Corresponding to a vector of dissimilarities M, with elements
mij is a monotonically related vector ∆ with elements δij. ∆ "replaces" M in all algebraic formulations in
the algorithms.
The general principle of monotonicity may be stated in two slightly different forms:
6
The term "similarities" which describes the kind of data the present work focuses on is used in two different
senses. Partly it is used in a generic sense, as a general term describing symmetric data matrices. In this sense it
subsumes "similarities" and "dissimilarities" as properties of the data in specific examples.
In the specific sense high data values of dissimilarities correspond to large distances while the reverse is the case
for similarities. For metric methods similarities are usually treated as scalar products, while dissimilarities are
treated as distances which are then converted to scalar products.
The presentation of nonmetric algorithms is simplified by considering dissimilarities as the basic case. Data
where high values correspond to low distances are then assumed first to be converted to dissimilarities by sorting
them in descending order.
Strong monotonicity: whenever mij < mkl then δij < δkl
Weak monotonicity:   whenever mij < mkl then δij ≤ δkl
When there are ties in the data a further distinction is whether tied data shall imply tied δ values or not.
The weaker case where it is allowed to have tied data values and untied δ values, is following
Kruskal's preference for this approach, called the “primary approach to ties”. The stricter requirement
that tied data values shall imply tied δ values, is then called the “secondary approach to ties.”
Primary approach to ties:   if mij = mkl then δij = δkl or δij ≠ δkl
Secondary approach to ties: if mij = mkl then δij = δkl
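As a small illustration (function names invented here), the two monotonicity requirements can be checked directly for a data vector m and a candidate vector of δ values:

def weakly_monotone(m, delta):
    """Whenever m_ij < m_kl, require delta_ij <= delta_kl."""
    pairs = list(zip(m, delta))
    return all(db <= da for (ma, da) in pairs for (mb, db) in pairs if mb < ma)

def strongly_monotone(m, delta):
    """Whenever m_ij < m_kl, require delta_ij < delta_kl (strict)."""
    pairs = list(zip(m, delta))
    return all(db < da for (ma, da) in pairs for (mb, db) in pairs if mb < ma)

m     = [1.0, 2.0, 3.0, 4.0]
delta = [0.4, 0.4, 0.9, 1.2]                     # ties in delta: weakly but not strongly monotone
print(weakly_monotone(m, delta), strongly_monotone(m, delta))    # True False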
∆ (and thus M, though only indirectly, through the principle of monotonicity) is one of the terms in the second basic concept in nonmetric methods, the loss function. This function is a general expression for the
discrepancy between the distances - a vector D with elements dij computed from a (trial) configuration
- and the values in ∆.
dij = ( Σ_{k=1}^{t} |gik − gjk|^p )^{1/p}
where G7 is a (trial) configuration in t-dimensional space. The loss function is then defined:
Loss = [ Σ (dij − δij)² / Σ dij² ]^{1/2}
Loss may be regarded as synonymous with the earlier discussed “goodness of fit” concept. (Badness
of fit would logically be a preferable term since the smaller the value the better the fit.)
The aim of nonmetric algorithms may now be formulated as finding the D which minimizes loss. Note
that in Loss there are two sets of unknowns, both D and ∆. An iterative process to be discussed later
is necessary to find the optimal D.
The basic distinction between the Kruskal and the Guttmann-Lingoes approach to nonmetric scaling
lies in the way ∆ is defined. Kruskal uses the symbol D̂ for his version of ∆ and defines the d̂ ij as the
numbers which (for a given D) minimize Loss while maintaining weak monotonicity with the data. In his
widely adopted terminology the quantity to be minimized is “stress”, and his celebrated stress formula
is:
S = [ Σ (dij − d̂ij)² / Σ dij² ]^{1/2}
The Lingoes-Roskam Loss function is immediately seen to be a generalization of Kruskal’s stress.
7
G (and gik) are used both to denote a trial configuration and, as in Ch. 2, the final output configuration from the analysis.
In Kruskal’s approach finding d̂ ij requires a separate minimization process. First the dissimilarities in
M are arranged in increasing order. The set of distances, D, is partitioned into “the smallest possible
blocks such that each block consists of a subset of distances whose subscripts are associated with
consecutive values in M. d̂ij is set equal to the average of the distances in the block to which it
belongs.” (Lingoes and Roskam, 1971, p. 26). Starting with the distances corresponding to the lower
ordered dissimilarities, distances are merged into blocks until each block average is larger than the
preceding block average and smaller than the succeeding block average. When the process starts,
each distance is regarded as a separate block. Whenever a block average is smaller than the
preceding block average (not "down-satisfied" in Kruskal's terminology) or larger than the succeeding
block average (not "up-satisfied”) - a merging of the corresponding blocks takes place. This process
will minimize stress for a given set of distances, and the set D̂ is weakly monotonic with the data.
Briefly, this may be described as a “block partition” definition of ∆.
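The block partition can be sketched in a few lines of Python. This is written in the standard "pool adjacent violators" form; the order in which blocks are merged in a particular program may differ in detail from the worked illustration in Table 1 below, and the stress function shown simply restates the formula given above.

def block_partition(distances):
    """distances: listed in the order of increasing dissimilarity; returns the d-hat values."""
    blocks = [[x] for x in distances]            # initially each distance is a separate block
    i = 0
    while i < len(blocks) - 1:
        mean_here = sum(blocks[i]) / len(blocks[i])
        mean_next = sum(blocks[i + 1]) / len(blocks[i + 1])
        if mean_next < mean_here:                # not "up-satisfied": merge the two blocks
            blocks[i:i + 2] = [blocks[i] + blocks[i + 1]]
            i = max(i - 1, 0)                    # re-check against the preceding block
        else:
            i += 1
    return [sum(b) / len(b) for b in blocks for _ in b]

def stress(d, d_hat):
    """Kruskal's stress: S = [ sum (d - d_hat)^2 / sum d^2 ]^(1/2)."""
    return (sum((x - y) ** 2 for x, y in zip(d, d_hat)) / sum(x ** 2 for x in d)) ** 0.5

d = [.523, .689, .400, 1.089, 1.612, 1.212]      # the distances used in Table 1 below
d_hat = block_partition(d)
print(d_hat)
print(round(stress(d, d_hat), 4))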
In the Guttmann-Lingoes approach (embodied in the program SSA - 1) ∆ is denoted by the symbol D*
and is known as Guttmann’s rank images. D* is defined as a permutation of distances such that D*
shall maintain the rank order of the dissimilarities. When the dissimilarities are sorted from low to high,
“the rank images are simply obtained by sorting the distances from low to high and placing them in the
cells corresponding to the ranked cells of M". (op. cit. p. 47). The way D* is constructed automatically
implies that the rank images must satisfy strong monotonicity, unlike Kruskal's d̂ values.
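A corresponding sketch of the rank images (again illustrative only; the data and distances are those of Table 1 below):

def rank_images(m, d):
    """m: dissimilarities, d: distances (same order). Returns the rank image vector D*."""
    order = sorted(range(len(m)), key=lambda k: m[k])   # cells ranked by dissimilarity
    d_sorted = sorted(d)
    d_star = [0.0] * len(d)
    for rank, cell in enumerate(order):
        d_star[cell] = d_sorted[rank]            # place the rank-th smallest distance there
    return d_star

m = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
d = [.523, .689, .400, 1.089, 1.612, 1.212]
print(rank_images(m, d))                         # a permutation of d, strongly monotone with m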
The computation of rank images and block partition values is illustrated in Table 1. The example is
taken from an analysis of an order 4 matrix, the analysis of such matrices is discussed in detail in
Section 4.32.
Table 1. Computation of rank images, D*, and block partition values, D̂

M - data               1.0      2.0      3.0      4.0      5.0      6.0
D - distances          .523     .689     .400     1.089    1.612    1.212
1. blocking*           (.523)   (.689)   (.400)   (1.089)  (1.612)  (1.212)
2. blocking            (.523)   (.544    .544)    (1.089)  (1.612)  (1.212)
3. blocking            (.537    .537     .537)    (1.089)  (1.612)  (1.212)
4. blocking            (.537    .537     .537)    (1.089)  (1.412   1.412)
D̂ - final blocking     (.537    .537     .537)    (1.089)  (1.412   1.412)
D* - rank images       .400     .523     .689     1.089    1.212    1.612

*Parentheses indicate blocks.
It is evident that the distribution of D* is identical to the distribution of D (since D* is a permutation of
D) and thus all moments of the distribution are equal. The way D̂ is constructed implies that the d̂
values have the same first moment as the d values, that is ∑ d̂ ij = ∑dij, whereas the higher moments
will generally be different. Only in the perfect case when there is perfect monotonicity, and thus d̂ij = dij (and S = 0), will the distributions be identical in Kruskal's algorithm. In this case each distinct dij will have a corresponding distinct d̂ij which will be a separate block. For large stress values there will
be large blocks and thus a high degree of ties will exist in the d̂ values. It is exactly this property which
is described by the concept “weak monotonicity”, which contrasts with the strong monotonicity for rank
images.
In the Guttmann-Lingoes program two related formulas are used to evaluate goodness of fit:
Φ = Σ (dij − d*ij)² / (2 Σ dij²)
K* = [ 1 − (1 − Φ)² ]^{1/2}
From the apparent similarity between the formulas for S and Φ it has been an unfortunate practice to think that Φ and S bear a simple relation to each other. Young and Appelbaum (1968, p. 9) write: "It can be seen immediately that 2Φ = S²", and Lingoes (1966, p. 13) in discussing an application of SSA-1: "since Kruskal's normalized stress S = (2Φ)^{1/2}". The first textbook treatment of nonmetric scaling to appear states: "Φ is closely related to Kruskal's S, the relation being Φ = ½ S² except that d̂ij is estimated somewhat differently" (Coombs et al 1970, p. 71).
The implied caveat in the last quotation should not be neglected. Indeed, the different definitions of d̂ and d* preclude any simple relation between S and Φ.
In order to illustrate this one could study the difference between S* = (2Φ)^{1/2} and Kruskal's S. Instead we choose to compare K* with S since "in practice S* is almost identical to K* in numerical value" (Roskam, 1969, p. 15). If for instance S* = .2191 then K* = .2178. From the definition of K* and S* it is readily seen that K*/S* = (1 − Φ/2)^{1/2}. Lingoes (1967) has proposed K* (and Φ) as a general index of
pattern similarity, applicable also when methods other than SSA have been used. Conversely
Kruskal’s S could also be applied to evaluate the outcome of SSA-1. K* and S may be seen as
alternative ways of evaluating the outcome of a specific algorithm independent of the algorithm
employed.
In the example in Table 1, K* = .2667, whereas for the same solution (same distances) S = .1406.
This reflects a general tendency, for the same solution K* will be substantially larger than S (except of
course when there is perfect fit). The smaller value of S is due to the fact that a separate minimization
process is involved when the d̂ values (which enter the stress formula) are computed, in contrast no
minimization is involved when computing the rank images which enter the formula for K*.
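As a small numerical illustration (not part of the original text), the quantities discussed above can be computed from the distances and rank images of Table 1; the resulting values should lie close to those cited in the text, and the last line checks the relation between K* and S* stated earlier.

d      = [.523, .689, .400, 1.089, 1.612, 1.212] # distances D from Table 1
d_star = [.400, .523, .689, 1.089, 1.212, 1.612] # rank images D*

sum_d2 = sum(x ** 2 for x in d)
phi = sum((x - y) ** 2 for x, y in zip(d, d_star)) / (2 * sum_d2)
k_star = (1 - (1 - phi) ** 2) ** 0.5
s_star = (2 * phi) ** 0.5

print(round(k_star, 4))                          # close to the K* value cited in the text
print(round(k_star / s_star, 4), round((1 - phi / 2) ** 0.5, 4))   # these two should agree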
Having now discussed how ∆ may be computed and the corresponding evaluation of Loss, we turn to
the iterative process for finding D. A special problem is how this process is to be started. This is known
as the "initial configuration" problem.
The basic distinction to be made concerning this problem is whether the initial configuration is arbitrary
or non-arbitrary. In the former case an initial configuration is defined which has no specific relation to
the dissimilarities whatever. One widely used such configuration is described by Kruskal (1964b, p.
33). An arbitrary initial configuration can also be generated by some random device. Non-arbitrary
initial configurations in some way utilize the information in the dissimilarities. Ways of constructing
non-arbitrary initial configurations will be discussed in Section 4.32. The problem of the initial
configuration is usually discussed in terms of "local minima".
The latter concept is clarified by an overall abstract view of the iterative approach.
Consider first ∆ as fixed, Loss is then a function of nt variables (the coordinates in G). This space of nt
dimensions is called "configuration space" by Kruskal (1964b, p. 30) in contrast to the more usual
"model space" of' t dimensions. In configuration space each point represents an entire configuration.
This space will generally have several minima, the over-all minimum being the global minimum. The
iterative procedure searches for a minimum, but has no way of knowing whether a given minimum is a
local minimum or the desired global minimum. Consequently the procedure may be trapped at an
undesirable local minimum. There is a widespread feeling that with arbitrary initial configurations the
process will generally start "far away" from the desired global minimum, and that this will lead to an
uncomfortably high probability of being trapped at suboptimal local minima. Whether this really is the
case and whether it is to be recommended to start with non-arbitrary initial configurations, will be
discussed in Section 4.31.
Once the iterative process - in one way or another - is started, there finally remains the problem as to
how the configuration - and consequently the distances - is to be changed from one iteration to the
next. Young and Appelbaum (1968) distinguish between two approaches. One is to start with a definition of an iterative formula and then define "best fit", without specifying any particular relation between the two. This is said to characterize the Guttmann-Lingoes approach (SSA-1) and also the Young-Torgerson approach (TORSCA). "On the other hand, Kruskal defines his notion of best fit, and then, on the basis of the definition derived the best possible iterative formulation" (op.cit. p. 9). We might add that Kruskal used
the standard “negative gradient method" also called the "method of steepest descent". This involves
finding the derivative of the stress function and moving "downwards" in the direction of the gradient.
Young and Appelbaum distinguish between what may be called an extrinsic relation of fit and iteration
(SSA-1, TORSCA) vs. an intrinsic one (MDSCAL). This distinction should not, however, be taken to
imply too much since Young and Appelbaum show that the iterative formula for SSA-1, MDSCAL and
TORSCA, are of the same class and can all be stated:
g_{ik}^{v+1} = g_{ik}^{v} + \alpha \sum_{j=1}^{n} \frac{d_{ij} - \delta_{ij}}{d_{ij}} \left( g_{jk}^{v} - g_{ik}^{v} \right)
where v is the iteration number (cfr. op.cit. equations 5, 18 and 19).
Notice that when dij is larger than δij the points i and j are too far apart and tend to be pulled together,
and vice versa, so as to make dij and δij more similar. The point i is of course subject to similar
influences from all the other points in the configuration, and all these separate influences are
summed.
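To make the shared form of this update concrete, the sketch below applies one such correction step to a configuration. It is only a minimal illustration of the formula above for the Euclidean case, with a fixed step size; the function name, the numpy representation and the default value of alpha are illustrative assumptions and are not taken from any of the three programs.

import numpy as np

def update_configuration(g, delta, alpha=0.2):
    """One application of the common iterative formula.

    g     : (n, t) array of coordinates of the current configuration.
    delta : (n, n) symmetric array of target values (rank images or block
            partition values) monotone with the dissimilarities.
    alpha : step-size coefficient (each program has its own strategy for this).
    """
    diff = g[:, None, :] - g[None, :, :]            # (n, n, t) coordinate differences
    d = np.sqrt((diff ** 2).sum(axis=-1))           # current Euclidean distances d_ij
    np.fill_diagonal(d, 1.0)                        # avoid division by zero on the diagonal
    w = (d - delta) / d                             # positive when i and j are too far apart
    np.fill_diagonal(w, 0.0)
    # each point i is pulled toward (or pushed away from) every other point j,
    # and the separate influences are summed
    correction = (w[:, :, None] * (-diff)).sum(axis=1)
    return g + alpha * correction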
Each pair of points is thus subjected to some amount of "stress"; in fact, a physical analogue of how all
the separate stress components jointly act to change a given configuration is found in Kruskal and
Hart (1966).
The iterative formula above applies to all three methods only in the case of Euclidean space and
consequently the distinction between extrinsic and intrinsic relation between fit and iteration is of no
practical importance in this case.
For non-Euclidean space, however, the above iterative formula does not apply to Kruskal's MDSCAL.
MDSCAL may then well turn out to be superior to the other methods for non-Euclidean spaces if an
intrinsic relation is the best approach.
The coefficient α is called a step-size coefficient. All the programs employ different strategies for
computing this. This is extensively discussed by Lingoes and Roskam (1971), and here it is sufficient
to note that comparison between algorithms in the non-Euclidean case is made more difficult by the
fact that the step-size coefficient may be a confounding factor.
The major features discussed here are summarized in Table 2. In this table the most salient
similarities and differences between the three major current nonmetric programs are stated.
Table 2. A survey of major features of nonmetric algorithms

                                         SSA-1            MDSCAL                   TORSCA
Monotonicity                             Strong           Weak                     Weak
Approach to tied dissimilarities         Primary          Optional (primary) 1)    Secondary
Definition of ∆ (values monotone
  with the dissimilarities)              Rank images      Block partition          Block partition
Initial configuration                    Non-arbitrary    Optional (arbitrary) 1)  Non-arbitrary
Relation iterative formula
  and fit function 2)                    Extrinsic        Intrinsic                Extrinsic

1) The most frequently used options are stated in parentheses.
2) If important, then only for non-Euclidean distances.
This analysis of various programs in terms of distinctive features encourages a more modular
approach. Several combinations of features not represented by the current programs may then
suggest themselves. This is extensively discussed by Lingoes and Roskam; indeed, the whole aim of
their study may be said to be to break current programs up into separate components and then to recombine
the components in an optimal way. They discuss a variety of features not even hinted at in the present
simplified presentation. A variety of mixed strategies is also discussed, that is, shifting between
various combinations of features. Their strategy when analyzing various matrices may be said to treat
features as "independent variables". These are then evaluated in terms of the "dependent variables" S
(or K*).
Our primary concern is different. We wish to replace stress by true fit. Stress is then treated as a
predictor variable, true fit being the variable to be predicted.
From the point of view of the metamodel stress is only one of several possible indices for the concept
apparent fit. An obvious alternative index for AF is K* (even if SSA-1 is not used). Yet another
alternative would be to use a rank correlation between M and G(V) as an index of AF. This index was
provided as collateral information on goodness of fit in earlier versions of SSA-1, cfr. Guttmann (1968,
p. 478). Since, however, rank correlation neglects the metric properties of G(V) a better alternative
might be to compute the linear correlation between G(V) and D*. Guttmann (1968, p. 481) points out that
the iterations tend to make the relation between G(V) and D* linear and to minimize the alienation from
the regression line through the origin in the diagram plotting G(V) against D*.
Preliminary explorations have revealed that all these indices are highly interrelated and thus serve
equally well to predict true fit. This does not rule out the possibility that special cases might exist where
the specific index chosen for AF might make a difference, but at present it seems a very good first
approximation to regard AF as a unitary concept and use stress as the basic index for this concept.
3.2
Methods of introducing error and indices of noise level (NL)
Noise or stochastic variations from various sources are ubiquitous in psychological research. How,
then, are noise processes specified in simulation studies? Do these specifications correspond to
psychological processes? In the present context a more important question is to what extent it really
makes a difference for the applicability of results from simulation studies, whether there is such a
correspondence or not.
Not only the type of noise must be specified, it is also necessary to specify different amounts of noise,
and finally there is the question as to what kind of index is to be used to describe a given amount of
noise.
In terms of the metamodel this section discusses the relation between L and M. In what way is M
specified as a distortion of L and how shall the discrepancy between L and M be indexed?
We shall see that there are many ways of introducing error and for each of the different ways there are
several possibilities of numerically expressing the resulting noise level. First different types of noise
are discussed, then amounts and indices.
A discussion of some different ways of introducing errors is found in Young (1970). First we describe
the method preferred by Young, this method will be extensively applied in Ch. 4. The procedure is
illustrated in Fig. 1.
Fig. 1. Illustration of the error process used by Young (1970).
The pair i, j with coordinates (li1, li2); (lj1, lj2) represents two points in the true configuration, L(C). The
line labelled lij represents the true distance between these points. Normal random deviates are added
to each coordinate of each point. This gives the error-perturbed positions i' and j'. The distance
between these positions, mij, may then be regarded as an element in the data vector M. It is important
to note that different random errors are added to a point i for each distance in which i is involved, cfr. the
subscripts to ε in Fig. 1.
In Young's approach the variance of the random normal deviates, εijk, depends upon the whole
configuration, L(C), and is independent of points and dimensions. The more general case, where the
variance of ε may be different for different points and/or dimensions, will be briefly discussed in the
Concluding remarks.
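A minimal sketch of this process (in Python, assuming numpy) is given below; the function name, the seed argument and the choice of returning the pairs in row order are illustrative only. The noise standard deviation sigma is here taken as given; how it is chosen is the topic of the noise-level discussion later in this section.

import numpy as np

def ramsay_young(L, sigma, seed=0):
    """Return the vector M of error-perturbed distances for a true
    configuration L (an n x t array of coordinates).

    For every pair (i, j) fresh normal deviates with mean 0 and standard
    deviation sigma are added to every coordinate of both points, so the
    error added to point i differs from pair to pair (cfr. the subscripts
    to epsilon in Fig. 1)."""
    rng = np.random.default_rng(seed)
    n, t = L.shape
    m = []
    for i in range(n):
        for j in range(i + 1, n):
            li = L[i] + rng.normal(0.0, sigma, size=t)   # perturbed position i'
            lj = L[j] + rng.normal(0.0, sigma, size=t)   # perturbed position j'
            m.append(np.sqrt(((li - lj) ** 2).sum()))    # m_ij, an element of M
    return np.array(m)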
Following Ramsay (1969), Young points out that the error process used by him leads to a non-central
chi-square distribution of the dissimilarities, where the parameter of non-centrality is related to the true
distance between two points, cfr. Ramsay (1969, equation (1), p. 171). Using Ramsay's terminology,
this distribution will simply be referred to as the "distance distribution". The error process corresponds
to a multidimensional extension of Thurstone's Case V and will further be referred to as the Ramsay-Young process.
This process implies that all distances will more likely be over-evaluated than under-evaluated: "In the
limiting case of zero distance it is certain that the estimate will not be an under-estimate. On the other
hand, as the true distance becomes larger the probability of an over-estimate approaches the
probability of an under-estimate" (Young, 1970, p. 461).
If we translate this into classical test theory we see that a basic assumption of this theory is violated.
Dissimilarity corresponds to observed score and distance to true score, the difference to error score.
That small distances will generally give (relatively) large positive errors (over-evaluation) and larger
distances smaller positive errors implies that true and error scores are negatively correlated, contrary to
all psychometric test theory. Results to illustrate this will be presented in Table 3 and further discussed
in Comment 1), p. 40.
On the basis of Ramsay's results Young points out that the error process is equivalent to first
computing the true distances and then "add[ing] error to the distances, where the error has a non-central
chi-square distribution" (op.cit. p. 461). He then discusses why one should not use the normal distribution
to add error to the true distances (as Kruskal (1964a) did in an example to be discussed in the next
section). Young states (op.cit. p. 461): "It seems hardly true that a 'real subject' would be as likely to
underestimate a small distance as a large distance", and he further argues: "The non-central chi-square
distribution seems to reflect the psychological aspects of the judgment processes more accurately than
the normal distribution" (op.cit. p. 462).
The reason given for this is that the distance distribution has a parameter corresponding to the number
of dimensions of the stimuli, while there is no corresponding parameter for the normal distribution.
Ramsay (1969, p. 170) goes one step further than Young and quotes data supporting this error
process: "The distance distribution predicts the sort of nonlinearity which has been observed in
varying degrees in some scaling studies. This appears in a positively accelerated relationship between
dissimilarities and corresponding interpoint distance... This phenomenon was especially evident in the
study by Indow and Kanazawa (1960)”. We may comment that Indow and Kanazawa used a metric
model, but essentially the same relation appeared when Kruskal (1964a, p. 19) reanalyzed these data
with MDSCAL.
Ramsay (1969, p. 170) concludes that "this is one of the most important discrepancies between the
distance and normal distributions. The normal distribution predicts a linear relationship between
estimator and interpoint distance”. While this non-linearity may be of theoretical interest we shall later
in this section see that for some purposes the consequences of the non-linearity are negligible.
In a more recent investigation Wagenaar and Padmos (1971) do use the normal distribution, albeit in a
multiplicative rather than additive way. Each distance was multiplied by a separate random element
with mean 1 and standard deviation x. For this procedure "one should note that the standard deviation
of the actual error is proportional to the distance” (op.cit. p. 102). On the other hand, in the study by
Indow and Kanazawa (1960) the relation between “scaled sense distance” (dissimilarity) and "SD of
scaled distance" is strongly non-monotone, an inverted U curve (op.cit. p. 331). Since furthermore the
relation between dissimilarity and distance is practically linear in the region where SD decreases as a
function of “scaled sense distance”, these data seem to run strongly counter to the Wagenaar-Padmos
model.
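Whatever its empirical standing, the procedure itself is simple to state. The sketch below is a minimal rendering of it, parallel to the Ramsay-Young sketch above; the function name and arguments are again only illustrative assumptions.

import numpy as np

def wagenaar_padmos(true_distances, x, seed=0):
    """Multiply each true distance by an independent normal random element
    with mean 1 and standard deviation x (the "fractional random error"),
    so the standard deviation of the actual error is proportional to the
    distance."""
    rng = np.random.default_rng(seed)
    true_distances = np.asarray(true_distances, dtype=float)
    return true_distances * rng.normal(1.0, x, size=true_distances.shape)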
Unfortunately the results from Indow and Kanazawa do not quite rule out the validity of the Wagenaar-Padmos procedure. The judgment process might have been of a "two stage" nature: first there might
have been an error process analogous to the Wagenaar-Padmos type, then a monotone
transformation could have produced the non-linearities found by Indow and Kanazawa.
It will be useful to attempt to give an exhaustive list of the specifications to be made for a complete
description of an error process. This will serve to clarify the differences between the Ramsay-Young
process and the Wagenaar-Padmos procedure, and also to point out other possibilities to explore.
Nine types of decisions may be distinguished. It will, however, turn out that some of the decisions are
only relevant for specific choices on other decisions.
1. Where error is introduced.
a) Error introduced to coordinates (Ramsay-Young).
b) Error introduced to distances (Wagenaar-Padmos).
This may perhaps be thought to be a spurious distinction since, as we discussed, adding normal
deviates to coordinates in the Ramsay-Young process is mathematically equivalent to adding a value
from a non-central chi-square distribution to each distance. This does not, however, imply general
equivalence between 1a) and 1b), since there are cases where an error process is specified in
complete detail for e.g. 1a) and there is no defined equivalence in 1b) (and vice versa). Generally 1a)
vs. 1b) may correspond to different psychological processes. "The variation in perceived difference or
similarity either must arise from variation in perception of the stimuli in psychological space
[corresponding to 1a)] or it must arise directly from difference perception [corresponding to 1b)]"
(Ramsay, 1969, p. 181).
2. Constancy of coordinate errors (only relevant for 1a)).
a) Error coordinates for point i differ from point j to point k.
b) Error coordinates for point i the same for all other points.
2b) will simply give a new configuration and is dismissed by Young (1970, p. 460): "This was not done
because there exists a perfect solution for the set of error-perturbed distances, the error-perturbed
configuration”. In the concluding remarks we will, however, find a place for this option when
considering further extensions of the metamodel. 2a) is illustrated in Fig. 1.
3. Addition of "extra dimensions” (only relevant for 1a)).
a) No additional dimensions added.
b) Add one or more additional "error" dimensions.
3b) is also dismissed by Young since "it would have produced distances which could be precisely
recovered in a space whose dimensionality was equal to the number of true plus error dimensions”.
(op.cit. p. 460). Lingoes and Roskam (1971), however, use precisely this method in the study of
“metricity" (true fit in the present terminology) included in their work. As will be discussed in the
Concluding remarks their enthusiasm for such studies is highly limited and it is then perhaps not
surprising that they have no discussion to justify their choice of method. Though the studies to be
discussed in Ch. 4 similar to Young's are based on 3a), we will find a place for 3b) in the more general
forms of the metamodel in the Concluding remarks.
4. Arithmetic operation to introduce noise.
a) Additive.
b) Multiplicative.
Young does not consider the multiplicative case; it is, however, used by Wagenaar and Padmos. As
already pointed out, their method will give increased error variance for larger distances. This property
makes it doubtful whether the multiplicative case can be generally meaningful if introduced to
coordinates. A necessary condition would then be that the absolute size of the coordinates was
meaningful, and thus that there was a natural zero point for the configuration. In multidimensional
scaling it is practically always assumed that the zero point is arbitrary; usually it is conventionally
placed at the centroid of the points.
5. Type of distribution of error.
a) Normal distribution.
b) Other types.
No alternative to using the normal distribution as the basic one seems to have appeared in the literature on
multidimensional scaling. This, however, is not the case for one-dimensional scaling, where for
instance the Bradley-Terry-Luce model (discussed by Coombs, 1964) is a prominent alternative to the
Thurstone model.
In the future it may perhaps be profitable to explore alternatives to the normal distribution. As is the
case for one-dimensional models, theorizing may be given a different direction by exploring alternatives
to the often poorly specified assumptions underlying use of the normal distribution.
The next three decisions all concern homogeneity (or heterogeneity) of error variance.
6. Error variance for dimensions.
a) Error variance assumed the same for all dimensions.
b) Error variance assumed to differ according to dimensions.
The only argument for 6a) is of course its greater simplicity. As Ramsay (1969, p. 181) points out, it “is
rather like assuming that absolute sensitivity is the same for all relevant properties of the stimuli ...this
assumption is likely to be false in some situations".
7. Error variance for points.
a) Error variance assumed the same for all points.
b) Error variance assumed to be different for different points.
As for 6., there are likely to be situations where the simplest case is false. Especially when
the objects are very complex, as for instance persons or words, some objects are likely to be
intrinsically more difficult to judge, and this will probably be reflected in larger error variance for these
objects.
8. Error variance for distances (only relevant for 1b))
a) Error variance assumed independent of size of distance.
b) Error variance assumed to depend upon size of distance.
In most cases the outcome of 8. cannot be decided independently of the other decisions but will simply
be a consequence of other decisions made, for instance 1b), 4b) and 5a) as used by Wagenaar and
Padmos imply 8b). If one, however, should choose to disregard Young and Ramsay’s misgivings
concerning adding normal deviates directly to the distances then 4a) and 8b) might be an alternative
to the Wagenaar-Padmos procedure.
9. Monotone transformation added.
a) No specific monotone transformation for producing dissimilarities.
b) Monotone transformation added before arriving at the dissimilarities to be analyzed.
Young (1970) chose 9b), that is, he used a two-stage process in producing the dissimilarities to be
analyzed. After the Ramsay-Young process he chose an arbitrary monotone transformation (squaring
and adding a constant). It may at first seem surprising that this step was included. Previously we have
stressed that only the ordinal information in the dissimilarities is relevant, which should imply that the
output should be independent of any final monotone transformation. Consequently 9b) should be a
completely redundant step in investigating true fit. There is, however, a peculiarity in the TORSCA
algorithm so that: "it is possible that various monotonic transformations may result in different final
configurations with differing values of stress" (op.cit. p. 464). To circumvent the problem of local
minima the TORSCA algorithm starts by constructing an initial configuration from the “raw”
dissimilarities and this is the reason why 9b) might make a difference in the output configuration. In
Section 4.32 we shall see that the problem of local minima in most cases can be satisfactorily solved
without using more than ordinal properties of the dissimilarities. This will make it unnecessary to worry
about possible effects of "various monotonic transformations”.
We now turn to a brief discussion of the problems involved in testing whether a specific set of noise
specifications corresponds to psychological processes. Two kinds of implications of noise
specifications have already been mentioned, there is first the relation between distances and
dissimilarities (a plot of this relation is usually called a Shepard diagram). A second kind of implication
is how the variance of dissimilarities depends upon the size of the underlying distances. If now
implications of a given set of noise specifications are confirmed, our confidence in this set may
increase (as Ramsay made use of the Indow-Kanazawa study). If, however, the implications are not
confirmed, the situation is quite problematic. To see this it is necessary to have in mind the distinction
between two types of consequences of noise processes for the relation between L(V) and M. First the
dissimilarities in M may be shuffled relative to the true distances in L(V). This will for instance be
reflected in decreased rank correlation between L (V) and M. Such consequences we here call ordinal
rearrangement. Second, metric relations may be different in M than in L(V), metric rearrangement.
A given ordinal rearrangement will correspond to a large set of different metric rearrangements. Any
given ordinal rearrangement is not changed by adding a monotone transformation, as in 9b), whereas
the metric rearrangement is then changed. So far only metric consequences of noise processes seem
to have been derived, and these consequences have then been tested by metric assumptions of the
dissimilarities. If these metric consequences are not satisfied, this does not, however, rule out the
possibility that the ordinal consequences of the noise processes may be satisfied.
It may be the case that the psychological error process corresponds to the specifications made, but
that there are additional monotone transformations in the human information processing. If it is merely
the metric consequences of a given noise process which are violated, this is no argument against
using simulation studies based on the given noise process, since only ordinal properties are used in
nonmetric analysis.
There does not seem to have been any attempts to specify exclusively ordinal properties of specific
noise processes. It is then possible that different noise processes may give similar ordinal
consequences.
On the other hand it seems likely that for instance the Wagenaar-Padmos procedure and the Young-Ramsay process may have different ordinal properties. The former procedure will be likely to produce
high degrees of shuffling of larger distances and relatively low degrees of shuffling of small distances,
whereas the Young-Ramsay process is more likely to produce equal amounts of shuffling for small
and large distances. A complicating consequence in deriving ordinal consequences of noise
processes is that such consequences generally will depend upon the configuration and the distribution
of distances which the configuration implies. In regions where distances are well spaced there will
generally be less shuffling than in regions where distances are more tightly clustered.
Deriving consequences of various noise processes - especially ordinal consequences - and devising
appropriate experimental tests is a seriously neglected field.
The question of whether a given set of noise specifications corresponds to psychological processes
has two different aspects. First, relevant empirical studies obviously have consequences for
psychological theory. There has for instance been some interest in specifically isolating a monotone
transformation by for instance studying to what extent such a transformation can be recovered.
Several examples are given by Shepard, (1962b, Figs.4, 5 and 6). In direct judgments of dissimilarity
where the subjects are asked for numerical estimates such a monotone transformation would
correspond to a kind of "psychophysical function". Nonmetric multidimensional scaling may provide a
new approach to the moot question of the form of such functions. It should, however, be emphasized
that merely asking about the form of monotone transformations in the information processing is
equivalent to focussing on 9b) and neglecting all other components of noise specifications. Indeed, in
the present framework, we will not regard this as inducing noise at all since there will be a perfect
monotone relation between distances and dissimilarities. In this case we regard L and M as coinciding.
In the typical case of discrepancy between L and M it may be quite difficult to disentangle
consequences of noise processes as specified by 1-8 above from any consequence of a specific
monotone (non-linear) transformation in the information processing. Recall for instance that a non-linear
relation may arise, not because of any specific monotone transformation, but simply as a
byproduct of the Ramsay-Young process.
However interesting empirical testing of noise processes may be in itself, such studies do not
necessarily have any relevance for the main concern in the present work. What is of importance in the
present context is whether the relation between stress and true fit depends upon how error processes
are specified. Hopefully there will be no pronounced interactions. If this is the case, the user of the
results to be presented in Ch. 4 does not need to worry about the kinds of error processes producing
his data. If, however, there are serious interactions, empirical testing of noise processes will be highly
relevant. In this case procedures for identifying the type of error process must be worked out and for
each type of error process a separate procedure for estimating true fit from apparent fit must then be
worked out. Alternatively, the more pronounced the interactions may turn out to be, the weaker must
be the form of any general conclusions.
Obviously anything even resembling a comprehensive exploration of various types of error processes
in simulation studies will be prohibitive. Furthermore at present purely theoretical work seems more
important than simulation studies concerning effects of various error processes. At present we will
mainly use the Ramsay-Young process, in Section 4.7 a comparison with the Wagenaar-Padmos
procedure is included. In the final chapter implications of more general error processes are briefly
discussed.
We now turn to the problem of how various amounts of noise, and thus sizes of NL, are produced.
For both the Ramsay-Young process and the Wagenaar-Padmos procedure amounts of noise are
defined as a proportion. We first discuss the Ramsay-Young process, where the discrepancy between
L and M clearly depends upon the standard deviation, σce, of the error terms (σ in Fig. 1) added to the
coordinates, relative to the spread of the points in the true configuration. Young (1970, p. 465) defines
noise level, E, as "the proportion of error introduced":
(1)  E = σce / σV
where σ V "denotes the standard deviation of the true distances and serves as a standardization"
(op.cit. p. 462). This is by no means a natural choice of standardization term. An obvious alternative
would have been to use the standard deviation of the configuration, σC⁸, in the denominator of the
definition of E. The latter alternative is discussed by Young, but no convincing rationale is given for the
choice he made.
He does for instance point out that σV simultaneously will tend to increase with dimensionality and
decrease with mean true distance, and thus "unless the mean distance is changing in a way to
compensate, we might expect that the effective proportion of error variance is confounded with
dimensionality" (op.cit. p. 463).
Table 3 and comment 2) on p. 40 will illustrate that such compensation does take place.9
It is, however, important to note that "noise level" for a given proportion, E0, is not the same whether
σC or σV is used as standardization. It turns out that for a given configuration σV is substantially
smaller than σC. This implies that for a given configuration the error variance for E0 will be smaller
when σV is used than when σC is used. Results to substantiate this are presented in Table 3 (cfr.
comment 3) on p. 41). Unless one is very careful in specifying precisely the way E is defined, one may
risk not getting comparable values if results are reported in terms of E.
Turning to the Wagenaar-Padmos procedure, there is no problem of specifying a standardization term
since a multiplicative model is used. While their procedure, as pointed out, is similar to the Ramsay-Young
process in defining noise level in terms of a proportion (or in their terminology a "fractional random error",
Wagenaar and Padmos, 1971, p. 102), it should be stressed that their fractional error is not
comparable to E as defined by Young, cfr. Tables 3 and 4 and comment 5) on p. 41.
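For the record, the two candidate standardizations can be stated side by side as in the sketch below: σce = E·σV under Young's definition of equation (1), or E·σC under the alternative. The sketch assumes numpy, takes σC simply as the standard deviation of all n·t coordinates, and is meant as an illustration of the definitions rather than as a reproduction of Young's program.

import numpy as np

def error_sd(L, E, standard="V"):
    """Translate an error proportion E into the standard deviation sigma_ce
    of the coordinate errors, under either standardization.

    standard = "V" uses the standard deviation of the true interpoint
    distances (Young's choice); standard = "C" uses the standard deviation
    of the coordinates of the configuration."""
    diff = L[:, None, :] - L[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    iu = np.triu_indices(L.shape[0], k=1)
    sigma_v = d[iu].std()       # SD of the n(n-1)/2 true distances
    sigma_c = L.std()           # SD of the n*t coordinates
    return E * (sigma_v if standard == "V" else sigma_c)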
We can now see that there are several disadvantages with the approach of defining amounts of noise
exclusively in terms of some proportion. The preceding discussion implies that different ways of
specifying the proportion do not give comparable results, furthermore there is the problem that for any
given specification amounts of noise may depend upon irrelevant parameters, as for instance
dimensionality and Minkowski constant. Evidently some index different from a proportion must be
applied to substantiate these statements. The main such index used in the present work is simply the
correlation between L(V) and M, r (L, M).
If now for some specification of error proportion r (L, M) shows that noise depends upon some
irrelevant parameter, this is not an unproblematic statement, since it assumes that in some way r (L,M)
is more “basic”. Without making such an assumption one could alternatively say that it is r (L, M) which
depends upon the irrelevant parameter, that amount of noise per definition is the same when the error
proportion is the same. We regard r(L, M) as the basic index of noise level, NL, since it will turn out to
be fairly simple to relate this to other concepts - specifically the key concept purification - which does
not seem to be the case for any definition of an error proportion. Furthermore there does not seem to
be any satisfactory way of estimating error proportion from retest data. Wagenaar and Padmos (1971,
p. 105) refer to their proportion as ”measurement error” and imply that it can be estimated: “if the
measurement error is known beforehand for instance on the basis of repeated measurements", but no
procedure for “knowing measurement error” is described.
8 Young uses σc where we use σce, and σt where we use σV. The present use of subscripts is consistent with
the terminology introduced in Section 2.1, where C was introduced to denote the configuration mode (coordinates) and V was introduced to denote the vector (distance) mode.
9 While this makes a given level of E comparable across dimensionalities, E is not comparable across different
Minkowski values, p. This is because σV decreases as p increases. Increasing p tends to produce more
homogeneous distances, since then more and more only one of the coordinates "counts" in producing any given
distance; the variance among coordinates in different dimensions loses in importance. Conversely, the variance
of distances from a given configuration will be maximal for the city-block model, p = 1, since the coordinates in
all dimensions then contribute maximally to the distances. Since the simulation studies in Ch. 4 will be limited
to the Euclidean case, we will, however, not give further details on this problem here.
On the other hand we shall see that it is quite simple to estimate r (L, M) from retest ("parallel form”)
reliability. Conceptually this corresponds to the extended form of the metamodel, cfr. Section 2.2,
where we considered several M vectors generated from a single L.
In the discussion of the Ramsay-Young process we showed that it leads to correlations between true
and error scores, and thus error scores for parallel forms will also tend to be correlated. At present,
however, we choose to neglect this and use simple correlational theory. Assuming then that what M1
and M2 have in common is completely described by L, the partial correlation between M1 and M2
holding L constant will be 0, which is expressed below:
r(M1, M2) = r(L, M1) · r(L, M2)
If we now assume that r(L, M1) is approximately equal to r(L, M2), or alternatively that we are interested
in the geometric mean of these two noise levels, r(L, M), we then have:

(2)  r(L,M) = \sqrt{r(L,M_1) \cdot r(L,M_2)} = \sqrt{r(M_1,M_2)}
A very important advantage of this line of reasoning is that exactly the same argument can be made
concerning the relations between true fit and the two output configurations G1 and G2. The use of
correlation r(L, G) as an index for true fit will be discussed in detail in the next section, here we just
state the corresponding equation:
(3)  r(L,G) = \sqrt{r(L,G_1) \cdot r(L,G_2)} = \sqrt{r(G_1,G_2)}
Notice that the terms of equations (2), and (3) give precise meaning to the two left pointing arrows in
Fig. 4, Ch. 2. If we should compare the present concepts to conventional psychometric theory this
might be done as in the outline below:
          Validity    Reliability
Data      r(L, M)     r(M1, M2)
Output    r(L, G)     r(G1, G2)
Recalling a quotation from Shepard in Ch. 1 where he pointed out that the output can be both more
reliable and valid than the data from which it was derived (p. 10), we now see that the thesis of
equivalence between theoretical and empirical purification stated in Section 2.2 can give precise
meaning to Shepard's statement.
r (G1,G2) > r (M1,M2) and r (L,G) > r (L,M) imply precisely more reliable and valid output than
data.
Results to substantiate the general validity of equations (2) and (3) and further implications of these
equations will be presented in Section 4.7.
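The reasoning behind equations (2) and (3) is also easy to check numerically. The fragment below is a rough sketch, reusing the ramsay_young function assumed earlier in this section: it generates two "parallel form" M vectors from one true configuration and compares the square root of their intercorrelation with the geometric mean of their separate correlations with L(V). Under the assumptions above the two values should be close; the configuration size and noise level chosen here are arbitrary.

import numpy as np

rng = np.random.default_rng(1)
L = rng.normal(size=(12, 2))                 # a true configuration: 12 points in 2 dimensions
iu = np.triu_indices(12, k=1)
l_vec = np.sqrt(((L[:, None, :] - L[None, :, :]) ** 2).sum(-1))[iu]   # L(V), the true distances

M1 = ramsay_young(L, sigma=0.2, seed=2)      # first "parallel form"
M2 = ramsay_young(L, sigma=0.2, seed=3)      # second "parallel form"

def r(a, b):
    return np.corrcoef(a, b)[0, 1]

print(np.sqrt(r(M1, M2)))                    # noise level estimated from "retest" data, equation (2)
print(np.sqrt(r(l_vec, M1) * r(l_vec, M2)))  # geometric mean of the two actual noise levels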
A very important objection to the use of correlation as the basic index for NL must now be discussed.
A usual correlation coefficient assumes interval scale properties of both variables. There can be no
objection to treating L(V) as an interval scale, concerning M, however, we have repeatedly stressed
that an essential feature of nonmetric models is that the output is independent of whatever monotone
transformation of the dissimilarities we may consider. It is well known that such freedom will
completely play havoc with the product moment correlation. Have we then forfeited our claim to
staying within the nonmetric framework by introducing r(L, M)?
First we note that provided the relation between L and M is essentially linear it is defensible to use
correlation. As a matter of fact the studies in Section 4.6 use r(L, M). This can be defended because
the non-linearity for the Ramsay-Young process turns out to have little, if any, consequence in the
present context. For one thing, the non-linearity just turns up in the ends of the Shepard diagrams, in
most of the region linearity is excellent, cfr. the previously referred to Fig. 16 in Kruskal (1964a, p. 19).
In order to study this non-linearity more precisely, M vectors from 8 configurations with 20 points
(representing 4 different noise levels) in each of 3 dimensionalities (1, 2 and 3) (altogether 24
configurations) were generated and plots were made of the relations between L and M. (The Ramsay-Young
process was used.) Inspecting these 24 plots it was hardly possible to detect any non-linearity,
though the effects in Kruskal's figure were present, although somewhat less pronounced.
This may justify using r(L, M) for the Ramsay-Young process, but it does not justify general
applicability of the results based on using r(L, M). (The reader might ask why, if linearity has to be
assumed, one should use nonmetric models at all; this will, however, be answered in Section 4.33.) In
practice the safest assumption will be that there is generally not a linear relation between L and M; this
will also make the use of retest correlation to estimate r(L, M) as in equation (2) highly dubious. Even if
the Ramsay-Young process may have some validity, there is always the possibility that there will be
additional monotone transformations in the information processing.
A solution would be to find some transformation of M which would offset the distortions induced by
using correlation if there is non-linearity. A simple solution which was attempted was to use rank
correlation. This solution has the attractive feature that in some data collection methods the data are
given in the form of ranks, an example is the ranking procedure used by Shepard and Chipman
(1970). In some preliminary investigations it did, however, turn out that using rank correlations gave
very crude fit for equations (2) and (3).
A much better solution is to use the principle of monotonicity to find a transformation of M so that it will
always be defensible to use linear correlation. It will be recalled that there are two different approaches
to monotonicity, rank images, corresponding to strong monotonicity, and block partition values,
corresponding to weak monotonicity. Of these two approaches rank images were found preferable,
since this will not tie values in the transformed data when the raw data are untied.
In the simulation approach there are two possibilities for a rank image transformation of M, since either L
or G may be used as a basis for transforming M. Instead of assigning 1, 2, 3 etc. to the ranked values
of M as in rank correlation, we substitute for the ranked values of M the values of L(V) sorted in
ascending order. Thus the ordinal information in M is preserved but a new distribution is obtained
which is identical to the distribution of L, and this ensures maximal linearity. G(V) may be used in the
same way; the resulting transformations will be labelled M*L and M*G respectively.
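Computationally the rank image transformation is only a matter of sorting, as in the sketch below (assuming numpy; the function name is illustrative): the ranked values of M are replaced by the values of the reference vector, L(V) for M*L or G(V) for M*G, sorted in ascending order.

import numpy as np

def rank_image(M, reference):
    """Return the rank image of M with respect to `reference`: the values of
    `reference`, rearranged so that they stand in the same rank order as M."""
    M = np.asarray(M)
    out = np.empty(len(M), dtype=float)
    out[np.argsort(M)] = np.sort(reference)   # smallest M gets the smallest reference value, etc.
    return out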
For the 24 configurations previously mentioned, M*L was computed and correlated both with L and
with M. The lowest correlation between M and M*L was .997, differences between r(L, M) and r(L, M*L)
showed up first in the fourth decimal place. This further testifies to the linearity observed in the plots.
Since in practice, however, L is unknown there is no counterpart in usual empirical studies to using
M*L. It does, however, turn out that M*G and M*L give essentially identical results, even for relatively
high levels of noise. In Section 4.7 we report results comparing M*G and M to check the validity of
equation (2), cfr. also Section 4.33.
It may be noticed that M*G is just another symbol for D* if SSA-1 is used. By here using M*G as a
symbol, it is emphasized that the rank image transformation can be used independently of the
Guttmann-Lingoes algorithm.
Another advantage of using M*G may be mentioned. If one is interested in the type of relation in a
Shepard diagram, this relation tends to be obscured by the presence of errors. Instead of plotting data
against distances one may plot data against M*G. When using the correct form of the model M*G will
be linearly related to the true distances. The relation between M and M*G will reveal the type of
monotone transformation producing M unencumbered by errors, since there is a perfect rank
correlation between M and M*G.
When we further report results based on r(L, M) and r(M1, M2) it should be understood that this is valid
because the noise process being used does not produce significant non-linearities. However, when
there is reason to doubt linearity, M*G should be used; this will then give the necessary linearization.
Returning now to the relation between proportion of error and the resulting correlation, r (L, M), it may
be useful to regard proportion of error as the intended noise level, and r(L, M) may be regarded as the
actually obtained noise level. This distinction is similar to the familiar statistical distinction between a
universe parameter and a sample value. Using the same error proportion repeatedly on the same
configuration will give slightly different values of r(L, M) since the random error process will start with a
different value each time.
This conceptualization may point to a way of deciding what the best way to standardize error proportion
is in the Ramsay-Young process. Should σC or σV be used as the denominator in the definition of E,
cfr. equation (1)? One criterion of a specific method being the best is that the variance of the obtained
r(L, M) values for a given value of E is smaller than for competing methods (since for a given value of
E the intended variance is 0). A related alternative is that the best method should have minimal
variance within a given level of E, while the variance between levels should be maximal. This will lead
to higher correlation between E and r(L, M) for the best method. Some preliminary results using this
approach failed, however, to show any clear-cut difference between using σC or σV for standardization,
and consequently Young's procedure (using σV) is used throughout in the Ramsay-Young process.
The problem of describing the discrepancy between L and M may be regarded as the problem of
finding an index for the relation between an interval scale (L) and an ordinal scale (M) variable. We
have suggested using correlation either because in some cases M is linearly related to L, or else we
have suggested transforming M to produce linearity. The stress coefficient can, however, be regarded
as an alternative approach to describing the relation between an interval and an ordinal scale variable.
Consequently, stress should be well suited as an alternative index for NL. There is one advantage of
using stress as an index for NL, which is related to the problem of local minima. Recall that in terms of
Kruskal’s formulation of his algorithm, the true configuration, L, will be one point in the configuration
space, and thus one of the possible solutions to M. The algorithm searches for the global minimum
which (if found) by definition will be G. When noise is introduced L cannot be expected to be the global
minimum. The stress of G will then be the lowest possible stress. Consequently the stress of L, NL-stress¹⁰,
cannot be less than the stress of G, AF-stress, but will generally be higher. If conversely AF-stress
is larger than NL-stress we know that the global minimum has not been found (at least one
point with lower stress than G exists, namely L). If then AF-stress is less than NL-stress, this is an
indication that the program has not been trapped in a local minimum (though of course it is by no
means conclusive evidence).
Another possible advantage of using stress is that stress can also be used as an indication of true fit.
Just as stress can be used as an indication of the discrepancy (L(V), M), it can be used in exactly the same
way to indicate the discrepancy (L(V), G(V)). This makes it possible to study purification not only by
basing indices for NL and TF on correlation but also on stress. It may be of advantage to show that the
general conclusions on purification are independent of just one type of index. Since, however,
equations (2) and (3) will turn out to be powerful tools, correlation will be used as the basic index for
both NL and TF.
In the next section we review the various indices which have been used for TF, where it will be evident
that correlation is by no means the only alternative worth exploring.
10 NL-stress is a convenient way to express a specific index used for a general concept. The same convention
applies to TF and AF and other specific indices referred to.
Table 3. Illustrations of relations between different procedures for inducing noise, n = 12. In rows 1)-6)
each cell is a root mean square correlation based on results, r(L, M), from 10 different random
configurations. For a given dimensionality, t, the numbers in each column of the table are based on
the same configurations. For each error proportion, different configurations were used. Rows 7) and 8)
report mean correlations between error and true scores for the M vectors in row 1) and row 4),
respectively.

Error proportion, E             .05   .10   .15   .20   .25   .30   .35   .40   .45   .50

t = 1
1) Ramsay-Young, σV standard    .998  .990  .979  .962  .945  .919  .899  .873  .844  .824
2) Ramsay-Young, σC standard    .997  .986  .969  .947  .921  .887  .856  .829  .970  .767
3) Wagenaar-Padmos              .997  .985  .967  .950  .921  .887  .863  .829  .770  .739

t = 2
4) Ramsay-Young, σV standard    .998  .990  .976  .964  .944  .921  .897  .870  .827  .801
5) Ramsay-Young, σC standard    .997  .987  .967  .951  .923  .895  .869  .835  .794  .755
6) Wagenaar-Padmos              .992  .971  .943  .893  .851  .799  .789  .694  .685  .656

7) r(L, M−L), t = 1             -.08  -.01  -.06  -.15  -.10  -.18  -.17  -.15  -.18  -.12
8) r(L, M−L), t = 2             -.05  -.05  -.10   .03  -.07  -.13  -.10  -.13  -.14  -.15
Table 4. Overall means for results presented in Table 3. Means for rank correlations are included.

                              Linear correlation    Rank correlation
                              t = 1    t = 2        t = 1    t = 2
Ramsay-Young, σV standard     .925     .921         .909     .917
Ramsay-Young, σC standard     .898     .901         .879     .897
Wagenaar-Padmos               .895     .835         .916     .835
Comments to the results presented in Tables 3 and 4.
1) Correlations between true and error scores are presented in rows 7) and 8), based on σV
standardization of the error term in the Ramsay-Young process. The corresponding results for σC
were practically identical and are not reported. Increasing the error proportion beyond 0.50
tends to slightly increase the numerical value of the negative correlation. In the region E = 0.50-1.50
there may be correlations in the range -.20 to -.30. Further increases in error proportion,
however, will reduce the numerical value of the true-error score correlation. As E increases without
bounds r(L, M) will approach 0, the configuration is drowned in noise, and the correlation between true
and error scores will also approach 0. The implications of this will be further discussed in the next
chapter.
For the Wagenaar-Padmos procedure there were, as expected, no significant departures from 0 in the
correlations between true and error scores.
2) Comparing row 1) with 4), we see that the compensation referred to on p. 36 does take place; the
values are closely similar, and this is reflected in the very small difference in mean values in Table 4,
.925 vs. .921. Other studies, also using 3-dimensional configurations, show the same similarity as
here reported.
3) Comparing row 1) with 2), likewise row 4) with 5), we see that, as stated on p. 36, standardization
with σC gives consistently lower correlation, though the overall mean differences in Table 4 do
not seem too impressive. As for σV standardization, there is no interaction with dimensionality.
4) As will be further discussed in Section 4.6, Table 3 does not cover the whole range of E of
interest. For the Ramsay-Young process with σV standardization, values of E in the range .50-1.00
may be of interest; even values of E in the range 1.00-1.50 may give rise to acceptable
solutions if n is not too small.
E = 1.00 roughly corresponds to r(L, M) = .53
E = 1.50 roughly corresponds to r(L, M) = .31
Values of E larger than 1.50 almost invariably produce M vectors where L is beyond recovery.
5) For the Wagenaar-Padmos procedure there is a pronounced interaction with dimensionality; the
values in row 6) are consistently lower than the values in row 3). The reason for this is not clear,
neither is it clear why the rank correlation is substantially larger than the linear correlation for t = 1
(.916 vs. .895). This is highly significant (in 9 of 10 conditions the rank correlation was higher). It is
possible that the Wagenaar-Padmos procedure may be very sensitive to the detailed form of the
distribution of distances.
3.3
On indices of true fit (TF)
Although it seems important to agree on an appropriate measure of the discrepancy between G and L,
this topic has never been treated systematically in the literature. In no case have there been any
attempts to justify the type of index used.
Broadly we may distinguish two different types of indices:
1. Indices based on the distance mode, that is, discrepancy between L(V) and G(V).
2. Indices based on the configuration mode. In this case it will be assumed that G has been
orthogonally rotated to maximal similarity with L. Indices can then be computed to express the
discrepancy between L(C) and G(C).
Indices belonging to the first class are far more frequent than indices of the second class, and will be
treated first.
In his by now classical 1962 papers, Shepard used - as the first example illustrating the use of his
method - as true configuration 15 points in 2 dimensions, the coordinates taken from Coombs and Kao
(1960). The distances were subjected to a monotone transformation (but no noise was added). In
multidimensional scaling the output configuration is only determined up to a similarity transformation.
The origin can be freely chosen (translation), likewise the unit (dilation), and finally one may freely rotate
orthogonally, since all these transformations leave the order of the distances unchanged. Shepard
placed the origin at the centroid of the points (as is usually the case and will be assumed throughout
this work). The dilation factor was constrained by setting the mean of the interpoint distances to unity;
the configuration was then rotated to maximal similarity with the true configuration, L.
As a measure of true fit Shepard (1962b, p. 221) computed the "normalized mean square discrepancy
between the reconstructed and the true distances":

(4)  TF_1 = \sum (l_{ij} - d_{ij})^2 / \sum l_{ij}^2
Without further knowledge of this particular index the reported value of .00013 would be hard to
evaluate. It seems like a comfortably small value, but a graphical presentation is far more informative:
"That this discrepancy is indeed extremely small is shown graphically in fig. 3" (op.cit. p. 223). With
perfect true fit it should be the case that the reconstructed configuration should coincide exactly with
the true configuration. In Shepard's Fig. 3, where L(C) is represented by crosses, G(C) by circles, it is
evident that this is essentially the case, circles and crosses coincide. Indeed, it is hard to imagine any
situation where the minute discrepancies would make any difference.
Since Shepard used an errorless example this is as expected and substantiates the basic claim for
nonmetric scaling, that metric information on coordinates is implied by rank order information on
dissimilarities.
Presenting his improved version of Shepard’s program, Kruskal (1964a) used the same true
configuration as Shepard, but in this case the dissimilarities were first distorted by addition of a normal
deviate to each (transformed) distance. After transforming the reconstructed configuration, Kruskal
chose to compute a “percentage difference".
In the present terminology Kruskal's formula can be written:
(5)  TF_2 = \sum (l_{ij} - d_{ij})^2 / \sum ((l_{ij} + d_{ij})/2)^2
While the numerator is the same as in equation (4), the normalizing factor is different so that the sizes
of the two indices are not comparable. Comparing Fig. 11 in Kruskal's paper with the corresponding
figure in Shepard's paper, it is evident that the fit is somewhat worse in Kruskal's case, though for all
practical purposes the amount of fit illustrated by Kruskal would be acceptable. In the next section we
present figures to illustrate various levels of true fit, Figs. 2 and 3.
Since Kruskal had added noise to his dissimilarities it is to be expected that the fit would decrease. If
Kruskal and Shepard had used the same index for true fit, we would also have a numerical estimate
for the amount of loss of true fit suffered by the addition of noise.
None of the indices originally used by Shepard and Kruskal seem to have been used in later studies.
An index related to equation (4) would be the stress of G with respect to L. The general form of the
formula is identical, but stress would give smaller values since block partition values, d̂ij, would then be
substituted for the dij values, and as previously mentioned this implies a separate minimization
process. This can be written:

(6)  TF_3 = \sum (l_{ij} - \hat{d}_{ij})^2 / \sum l_{ij}^2
Lingoes and Roskam (1971, p. 168) use yet another index:

(7)  TF_4 = 1 - (\sum l_{ij} d_{ij})^2 / (\sum l_{ij}^2 \sum d_{ij}^2)
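For reference, the distance-mode indices of equations (4), (5) and (7) are straightforward to compute from the two vectors of distances, as in the sketch below (assuming numpy; the function name is illustrative, and the index of equation (6) is omitted since it requires the separate monotone-regression step producing the d̂ values).

import numpy as np

def distance_mode_indices(l, d):
    """True fit indices based on the distance mode.

    l : vector of true interpoint distances, L(V)
    d : vector of reconstructed distances, after the similarity
        transformation described in the text."""
    l, d = np.asarray(l, dtype=float), np.asarray(d, dtype=float)
    tf1 = ((l - d) ** 2).sum() / (l ** 2).sum()                  # equation (4), Shepard
    tf2 = ((l - d) ** 2).sum() / (((l + d) / 2) ** 2).sum()      # equation (5), Kruskal
    tf4 = 1 - (l @ d) ** 2 / ((l ** 2).sum() * (d ** 2).sum())   # equation (7), Lingoes-Roskam
    return tf1, tf2, tf4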
The only index which seems to have appeared in more than a single study is linear correlation, or some
transformation of correlation, which also pertains to the distance mode. The first example is Shepard
(1966) where, however, only the case with no noise was studied. Root mean square was used to
compute mean indices. In a much more extensive design - to be discussed in detail in Section 4.4,
Young (1970) defined “metric determinacy” (true fit in the present terminology) as the "squared
correlation between the true and the reconstructed distances" (op. cit. p. 458). When computing mean
indices, Fisher's z-transformation was used; this transformation was also used in various analyses of
variance "to improve the normality of the distribution" (op.cit. p. 470). In his various attempts to
predict true fit from stress, dimensionality and number of points, Fisher’s z was used as the index to be
predicted.
In elaborating Young's design his pupil Sherman (1970, p. 22) used yet another function of a true fit
correlation, TF-r. He wanted a function of r maximally linearly related to stress, and through trial and
error he found that the coefficient of alienation:

cal = \sqrt{1 - r^2}
was more closely linearly related to stress than were r, r² or Fisher's z transformation of r.
As will be further discussed in Section 4.6 linearity is desirable for purposes of estimating true fit. It
turns out, however, that Sherman’s transformation, cal, severely violates linearity when the whole
range of NL values is included. When the same type of transformation is applied to 1-cal, linearity was
restored practically over the whole range. The formula for this transformation, K, is:
(8)  K = \sqrt{1 - (1 - \sqrt{1 - r^2})^2}
A further transformation of K was found desirable to adjust linearity for values of r very close to 1.0 and
for values of r less than .70. These transformations will be further discussed in the next section and
data showing that these transformations do give linearity are presented in Section 4.6.
Any transformation of r does of course presuppose that r basically is a valid measure of true fit. In his
1969 review of scaling Zinnes strikes a highly critical note concerning the use of correlation as an
index for true fit: "Shepard’s use of the correlation coefficient as a measure of accuracy here seems
unfortunate. While the correlation coefficient is a useful index when two variables are crudely related, it
is practically useless when the variables agree on the ordering of the stimuli. This property of the
correlation coefficient is amply demonstrated by Abelson and Tukey” (op.cit. p. 465).
This critique may be stated too strongly. For one thing the general relevance of many of the examples
referred to from Abelson and Tukey (1963) may be questioned, since they use a variety of rather bizarre
linear orderings. Secondly, it is not clear from the critical comments made by Zinnes when correlation
becomes useless. In the cases of interest in the present work, the variables will in most cases not
agree on the ordering of the stimuli (rank correlation will be less than 1).
On the other hand, the variables will not be "crudely related". A possible implication of Zinnes’ criticism
may be that r is too insensitive in the region close to 1, small departures from 1 may be of large
practical interest. The transformations used in the present work to a much higher degree than
Sherman’s cal transformation, “dramatize" very small differences in r when r is close to 1. Fig. 6 in
Section 4.6 illustrates this for r > .99, cfr. also the general survey of the transformations in Table 7 in
the next section.
We may, however, conclude that the use of correlation is not unproblematic. One feature shared by all
indices of type 1 is an indirect quality. There are n(n−1)/2 elements involved in the index.
In contrast, for an index of type 2 there will be nt elements, which generally will be a much smaller
number than n(n−1)/2. A type 2 index is "closer to" the visual impression of discrepancies in figures where
true and reconstructed configurations are juxtaposed.
We first discuss one such index, then turn to a more general discussion of criteria for evaluation of
indices.
The familiar coefficient of congruence, C0, can be given some simple interpretations:

C_0 = \sum\sum l_{ik} g_{ik} / \sqrt{\sum\sum l_{ik}^2 \cdot \sum\sum g_{ik}^2}

It is convenient to fix the unit of G so that \sum\sum l_{ik}^2 = \sum\sum g_{ik}^2, and the formula for C0 then simplifies to:

C_0 = \sum\sum l_{ik} g_{ik} / \sum\sum l_{ik}^2

Another useful version of this formula is:

C_1 = 1 - C_0
Consider now the discrepancies between each pair of corresponding points in for example Fig. 4. For
each of the n pairs, say pair i:
∑k (lik − gik)²     (k = 1, …, t)

indicates the total error for all coordinates (in Euclidean space simply the squared "error distance"). Summing across all pairs of points and norming by the total sum of squares of coordinates, finally taking the square root, we get a normed root mean square of coordinate errors, for short NCE:

NCE = √( ∑∑ (lik − gik)² / ∑∑ l²ik )
Since:

∑∑ (lik − gik)² = 2( ∑∑ l²ik − ∑∑ lik gik )

we get:

NCE = C1 · √2
Consider next the root mean square of distance errors normed by the root mean square interpoint distance, for short "normed distance error", NDE. The mean square of the interpoint distances can be shown to be 2·∑∑ l²ik/(n − 1), and then:

NDE = √( [∑∑ (lik − gik)² / n] / [2·∑∑ l²ik / (n − 1)] ) = C1 · √((n − 1)/n)
Finally we may give a more abstract interpretation by considering L and G as two points in nt
dimensional space, what Kruskal (1964b, p.30) calls "configuration space". The distance between L
and G is simply:
√( ∑∑ (lik − gik)² )

Norming this by the length of L, √( ∑∑ l²ik ), we get the normed error length, NEL. The formula for NEL is the same as for NCE:

NEL = C1 · √2
In practice the coordinates of G in the preceding equations are determined by orthogonal rotation of
the preliminary output to maximal similarity with L. One algorithm for accomplishing this is described
by Cliff (1966), who maximizes ∑∑likgik , which is equivalent to maximizing C0. This algorithm is used in
the present work. Though Cliff does not explicitly describe an overall goodness of fit index, C0 is a
natural choice, and has been applied by Shepard and Chipman (1970, p. 11) under the name
“coefficient of agreement”.
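A minimal sketch of these indices for two coordinate matrices L and G is given below (Python, function names mine). G is brought to maximal similarity with L by the standard SVD solution of the orthogonal Procrustes problem, which maximizes ∑∑ lik gik as required; Cliff's (1966) own algorithm may differ in computational detail.

```python
import numpy as np

def true_fit_indices(L, G):
    """Congruence-based true fit indices for a latent configuration L and a
    reconstructed configuration G (both n x t arrays) - a sketch only.
    """
    L = L - L.mean(axis=0)                       # centre both configurations
    G = G - G.mean(axis=0)

    # Orthogonal rotation of G to maximal similarity with L (maximizes sum(l*g)).
    U, _, Vt = np.linalg.svd(G.T @ L)
    G = G @ (U @ Vt)

    # Fix the unit of G so that sum(l^2) = sum(g^2).
    G = G * np.sqrt((L**2).sum() / (G**2).sum())

    C0 = (L * G).sum() / (L**2).sum()            # coefficient of congruence
    C1 = np.sqrt(1.0 - C0)
    NCE = np.sqrt(((L - G)**2).sum() / (L**2).sum())   # equals C1 * sqrt(2)
    n = L.shape[0]
    NDE = C1 * np.sqrt((n - 1) / n)              # normed distance error
    return C0, C1, NCE, NDE
```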
Confronted with a variety of criteria as in the present section one needs a set of methodological rules
by which to judge the appropriateness of any proposed index, either considered singly or relative to
other indices. The following list of desiderata is presented very tentatively:
a) Predictability
We may prefer the index which lends itself best to prediction from other variables (for instance
stress, dimensionality, n). In a very general context, Cattell (1962) argues that the variables which
are most clearly related to other variables are more "basic" than others; this may be called
Cattell's criterion.
b) Intuitive appeal
Admittedly this sounds like a very subjective and ambiguous criterion. It is, however, listed since it
may be a useful criterion when the domain is uncharted as in the present case. By presenting
figures as done by Shepard and Kruskal (other examples will be given in the next section)
weaknesses of current indices might be apparent and other alternatives suggested. The next
section demonstrates how this criterion may be given experimental definition and discusses
further possibilities implied by this approach.
c) Simple boundaries
Indices where 0 and 1 are boundaries may be preferable. The stress measure has for instance
been criticized because there is no well defined upper boundary.
While the three criteria above may have general applicability, the next two are designed to illuminate
specific problems.
d) Comparability with index for noise level
In order to give operational specification to the concept of "purification" it is necessary to use the
same kind of index both for noise level and true fit. Since data are only given in vector form C0 is
then by definition inappropriate. Correlation and all other indices of type 1 may, however, be used
since all three terms in the metamodel can be stated in vector form. In the present work correlation
will be used as a basis for studying purification. Stress, however, has been found to give similar
results.
e) Component breakdown.
For some purposes it may be useful to break down the total discrepancy into separate
components for each point (or for each dimension). This is fairly simple for C0 (or rather C1). It is
then convenient to define each component so that the total discrepancy is a root mean square of
the components. The discrepancy for point i, hi, is:

(9)   hi = √( B0 · ∑k (lik − gik)² )     (k = 1, …, t)

where B0 is defined so that

C1 = √( ∑i h²i / n )     (i = 1, …, n)

that is

B0 = n / ∑∑ l²ik
Similarly one may study discrepancies for each dimension:
(10)   hk = √( B1 · ∑i (lik − gik)² )     (i = 1, …, n)

where B1 is defined so that

C1 = √( ∑k h²k / t )     (k = 1, …, t)

that is

B1 = t / ∑∑ l²ik
For equations (4), (5) and (6) it is also simple to give a similar breakdown for points. Equation (4) will
be used as an example. For point i the discrepancy is:
(11)   hi = √( B · ∑j (lij − dij)² )     (j = 1, …, n)

where B is defined so that:

TF1 = √( ∑i h²i / n )

that is:

B = n / ∑ l²ij
It does not, however, appear possible to give a similar breakdown for contributions of separate dimensions for equations (4), (5) and (6). A disadvantage of correlation, and also of equation (7), is that it does not easily admit of component breakdown.
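As an illustration, the per-point breakdown of equation (9) can be sketched as follows (assuming L and G have already been rotated and rescaled as in the sketch above); the scaling constant is obtained directly from the defining requirement that the root mean square of the hi equals C1, rather than from the closed form for B0 stated in the text.

```python
import numpy as np

def point_discrepancies(L, G):
    """Per-point breakdown of C1 in the spirit of equation (9) - a sketch.

    Assumes L and G (n x t) are already rotated to maximal congruence and
    scaled so that sum(l^2) = sum(g^2). Each h_i is proportional to the
    squared coordinate error of point i; B0 is chosen so that the root mean
    square of the h_i equals C1.
    """
    C0 = (L * G).sum() / (L**2).sum()
    C1 = np.sqrt(1.0 - C0)
    sq_err = ((L - G)**2).sum(axis=1)        # squared "error distance" per point
    B0 = C1**2 * len(L) / sq_err.sum()       # so that sqrt(mean(h_i**2)) = C1
    return np.sqrt(B0 * sq_err)
```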
A complete investigation of all the indices would be prohibitive. As an example of the general
approach we conclude this section by comparing correlation and C1 with respect to Cattell’s criterion.
Correlation is selected not only because it is usually used but because it may be a powerful tool by
virtue of equations (2) and (3). C1 is selected because it represents a quite different approach. It may
have a somewhat more direct quality than r, but perhaps a more important reason for including this
index is that it easily admits of component breakdown.
The ideal situation would be if both r and C1 gave equally convincing results both from the point of
view of predictability and intuitive appeal. The indices would then function in a complementary way.
Each could be put to work for special purposes where the other one did not apply (r for the study of
purification, C1 for studying component breakdown), yet true fit could be retained as a unitary concept
by virtue of the two general desiderata.
The relation between r and C1 was studied in six different conditions generated by two values of n (10
and 15) and three values of t (1,2 and 3). For each of the six different conditions 25 different random
configurations were analyzed (5 different values of E (noise level) and 5 replications for each noise
level). This study actually was a replication of the study done by Young (1970) and detailed results are given in Section 4.5. It was found that 1 − r² and C1 were practically linearly related and Table 5 reports the relation between these two indices.
Table 5. The relation between correlation and congruence as alternative indices for true fit. The correlational index is 1 − r², the congruence index C1 = √(1 − C0). Each cell reports a correlation based on 25 different configurations.

                      t - dimensionality
   n                 1         2         3
  10               .986      .970      .978
  15               .996      .972      .982

Root mean square of the 6 correlations above: .981
Correlation between 1 − r² and C1 across 150 configurations (merged correlation): .972
The correlations seem to be satisfactorily high. It is of special importance to note that the merged
correlation is not appreciably lower than the averaged correlation (.972 vs. .981). If the merged
correlation had been substantially lower this would have indicated that the relation between r and C1
as indices of true fit would have depended upon “irrelevant” parameters (as n and t). In this case true
fit based on r would have been a different concept than true fit based on C1. The present results
indicate that we may (tentatively) regard true fit as a unitary concept.
Since r and C1 are not perfectly related it is possible that one of the indices is more predictable than
the other. There are two main predictor variables, NL (where we also use correlation) and AF (where
we use stress). The results to be reported are based on rank correlations between these predictors
and the criteria r and C0.11 For NL the root mean square correlation with TF - r is .944 whereas the corresponding correlation with TF - C0 is .929. In 5 of the 6 conditions TF - r is the most predictable
criterion. For AF the root mean square correlation with TF - r is .933, the corresponding correlation
with TF - C0 is .927, for this criterion, however, TF - r was more predictable in only 2 of the 6
conditions.
The present results indicate that from the point of view of predictability, if there are any differences at
all, these differences will probably favour correlation as the basic index for true fit.
We may further note that C1 does not have a clearly defined upper boundary. Even if there is no
common structure for G and L, C1 will be less than 1 since the rotational procedure will capitalize on
noise. The higher the dimensionality and the lower the value of n, the further below 1.0 the expected upper boundary for C1 will be.
It does not seem likely that any of the other indices mentioned in this section would give appreciably
different results. There might, however, be special circumstances where the specific index used may
make some difference. One could conceive of elaborate designs using a variety of indices for each of
the three basic relations NL, AF and TF. Influenced by Cattell's criterion, perhaps also using the logic
propounded by Campbell and Fiske (1959) in their multitrait - multimethod approach one might work
out more precise criteria for deciding, first whether it was reasonable to regard each of the three basic
relations as a unitary concept, second if so what the best index for each relation would be. At present,
however, it is difficult to see that much could be gained by such refinements.
We conclude that it is legitimate to use correlation as the basic index for true fit. If n is not too low C1
may be used for special purposes, for instance if it is desired to study component breakdown. Since
C1 may be somewhat easier to interpret intuitively, the next section will give descriptions both in terms of r and C1.
11 Using linear correlation gave numerically highly similar results, though more clearly favouring TF - r. This might, however, be due to the fact that C1 may not be maximally linearly related with the predictor variables used; consequently results based on rank correlations are reported here.
3.4 Direct judgments of true fit. An empirical approach
In the experiment to be discussed in this section subjects judged varying degrees of true fit in visually
presented figures. This study has three major purposes.
a) Check intuitive appeal of numerical indices.
Intuitive appeal was mentioned as one of the criteria for evaluating indices on p. 45. Hopefully
there will be no serious discrepancies between the ordering given by judgments and the ordering
on the basis of numerical indices. This will give increased confidence in these indices. If on the
other hand there are pronounced discrepancies between the numerical and judgmental orderings
the situation will be quite problematic. Further theoretical and empirical analyses will then be
necessary to bring out possible flaws of present indices which may then suggest improved ones.
b) Define a cutoff point between acceptable and unacceptable degrees of true fit
At some point the discrepancy between L(C) and G(C) will be so pronounced that the results will
be useless. Is it possible to find a "natural" boundary between acceptable and unacceptable
degrees of true fit?
c) Provide true fit categories
This is a broader concern than just defining a cutoff point. The present study is the main basis for
providing categories to replace the use of stress and the associated verbal labels provided by
Kruskal, cfr. p. 5-6. (How the correct category in practice may be estimated is discussed in Section
4.6). A specific advantage of the present approach is that visual presentations may provide a
useful perceptual anchoring to these categories. This may reduce some arbitrariness associated
with interpretation of categories merely described by general verbal labels.
Stimulus material and judgment tasks.
Two sets of pairs of configurations were used as stimulus material. For 2 dimensional configurations
23 pairs of (L,G) configurations were constructed, presented in the same way as the previously
referred to examples by Kruskal (1964a) and Shepard (1962b). Fig. 3 gives four examples of the 2
dimensional pairs used. Each configuration consisted of 20 points. The pairs were constructed to
reflect several sources of variance, first, they covered the whole range of TF-values (the actual
distribution will be given later in Table 7), second, both systematic (circular and rectangular
arrangements) and random configurations were used. Similar principles were followed in constructing
22 pairs of 1 dimensional configurations, four examples from this set are presented in Fig 2, see p. 52.
In Fig. 2 the pairs have been reduced 10% in size from the original size used in the experiment, in
Fig. 3 the reduction in size is 50%. Each pair was supplied with an arbitrary label for convenience in
recording the judgements, of course no other information about the pairs was given.
Five colleagues served as judges. All of them had some training in multivariate techniques. For each
series the subjects were asked to judge how good one configuration was as a description of the other - how well they matched each other. The pairs were to be rated on a scale from 100 (perfect) to 0 (worst) so that scale distances reflected
differences in how well the pairs matched. In essence this instruction asked for both rank order and
interval scale information, the latter type of information will, however, be ignored since the subjects
complained that this aspect of the task was quite difficult. Finally they were asked to partition the scale
which they had used in 5 categories: “Excellent”, "good", "fair", "poor" (barely acceptable), "off"
(unacceptable). Comments made during the judgments and subsequent discussion with the judges will
play an important part in the following presentation and discussion of the results. For each of the three
main problems there will also be supplementary elaborative and technical comments.
Results and discussion
Intuitive appeal.
Concerning the intuitive appeal of r as an index of true fit, the results were very encouraging. For the 2
dimensional series the rank correlation between the subjects' rankings and TF -r ranged from .925 to
.960 with median value .935. For the 1 dimensional series the results were even better, rank
correlations ranging from .955 to .990 with median .980 and the deviations from perfect
correspondence showed little consistency between subjects.
During the discussions the pairs were sorted on the basis of the numerical indices and the judges
were encouraged to criticize this ordering on the basis of their own deviations from this ordering. This
produced little in the way of consistent answers. These qualitative observations from discussions with
the judges further support the validity of using r as an index of true fit.
As would be expected from the results reported in the previous section, the present design was
inadequate to give differentiating information on r vs. C1 as indices for true fit as the rank correlation
between these indices was above .99. It may, however, turn out to be possible to construct stimulus
material where r and C1 will give quite different orderings of the pairs. A possible basis for such
discrepancies is suggested by comments made by the judges. Occasionally they experienced conflicts
- how should they weigh against each other few large errors versus many small ones? Detailed
studies of judgments by experienced subjects where the pairs differ markedly in the distribution of
errors, might reveal flaws with r and/or C1 as indices of true fit.
The construction of the present stimulus material precluded a more detailed study of such problems
since this conflict was not pronounced. In the present cases the judgment of correspondence may be
a fairly simple, predominantly perceptual process. It might be mentioned that five 2 dimensional pairs - all within the acceptable range - were perfectly rank ordered by a 6 year old boy. The tasks were
performed with little hesitation.
The main reason that the present judgments give such clear results may be that the error process
used when generating the stimulus material assumed the same error variance for all points. When this
assumption is highly questionable, the present results must be used with extreme caution, cfr. also the
comments on specification 7 on p. 33.
Cutoff point.
The judgments showed a remarkable agreement not only concerning rank order but also in providing a
boundary differentiating acceptable from unacceptable degrees of fit. On the basis of the judgments,
r= .707 was found to be the best boundary (the precise value was decided on the basis that for this
value of r half the variance in L and G is common variance). Cross tabulating this boundary against
judgments of acceptability gave the results presented in Table 6.
Table 6.
Concordance between judgments of acceptability by 5 subjects and a numerical
criterion (r > .707).
                         Number of pairs       Number of pairs
                         judged acceptable     judged unacceptable     Sums
r < .707                         4                     61                65
r > .707                       157                      3               160
Sums                           161                     64               225

% concordance: (61 + 157)/225 = 96.9
We see that the judgments by the subjects and the mathematical criterion concorded in 96.9% of the
cases.
For the 1 dimensional series, 7 of the 22 pairs were in the unacceptable region (r < .707) and for the 2
dimensional series correspondingly 6 of 23 pairs. The high intersubjective agreement is not due to a
“lumping together” of unacceptable pairs in contrast to acceptable ones. For both series all subjects
had close to perfect rank ordering also within the unacceptable range. It was thus not that they simply
differentiated between “degrees of structure”, and "no structure", rather that at a certain point the
degree of structure seemed so "diluted" that it no longer seemed worthwhile to try to interpret the
configuration. Examples of unacceptable pairs are shown in Figs. 2d) and 3d).
A further problem is whether it is possible to find any support for the presently proposed cut-off point in
the literature. As we shall see in Section 4.4, both Shepard (1966) and Young (1970) would perhaps
prefer TF -r to be higher than .90. It is, however, not possible to give much weight to their views since
they are stated obliquely, lacking in precision.
It is, although by a fairly indirect procedure, possible to find support for the cutoff point here proposed
in a paper by Cliff (1966, p. 41). In the context of defining the number of factors - by determining
whether a factor is identifiable across parallel data sets (after orthogonal rotation to congruence) - he
writes:
Factors are matched one-by-one and correlations between loadings on the
corresponding factors are determined…. Preliminary experience indicates that a
correlation of .75 is minimal if the factors are to have recognizably the same
interpretations.
We now take the value .75 also to apply to whole configurations (a value for a configuration can be
regarded as averaged across dimensions). It is then possible to find a value of r roughly equivalent to
C0 = .75. First it must be noted that congruence between parallel data sets, (C0 (G1 G2), as in Cliff’s
procedure will be lower than congruence between a data set and a true configuration C0(G,L).
Unfortunately one does not have the same simple relation between C0(G1 G2) and C0(L, G) as
between r(M1 M2) and r(L, M), cfr. equation (2). By generating two different M vectors for each of a
number of L configurations the relation between C0 (G1 G2) and C0(L, G) can be studied, and it was
found that C0(G1 G2) = .75 corresponded to C0(L, G) = .85. From extensive studies of the relation
between r and C0 this was finally found to correspond to approximately r = .72, a value remarkably
close to our proposed cutoff point of r = .707.
True fit categories.
The final step is to divide the acceptable region into categories. There was, however, no uniform
agreement on the range to be encompassed by the descriptive labels supplied. Two subjects did for
instance use "poor" in a way which in interview turned out to be equivalent to: "I would never try to
interpret my data if the fit was that bad. Neither would I pay attention to any conclusions - however
tentatively stated – others might care to make from such data". On this basis - in agreement with the
subjects - their “poor” category was included under “off”, and further “poor” was found to be too
ambiguous to retain as a descriptive label.
The discussions with the subjects suggested three categories in the acceptable range: Excellent, good
and fair. Granted that there will be different requirements in different fields of psychology, the following
descriptions of the categories are tentatively offered.
Excellent: This is the required fit if the fine grain of the data is very important. This will be the case in
advanced fields of psychology. If for instance one should want to recommend changes in the Munsell colour system on the basis of multidimensional scaling, "excellent" true fit would be required. Fig. 3a
illustrates a case very close to this category, cfr. also Fig. 2a.
Good: This will be the fit typically found in good work in social psychology and personality. Not only
major clusters, but also rough ordering in various directions can be interpreted fairly confidently. Fig.3b
illustrates an example at the lower boundary of this category.
Fair: For this level of fit major clusters can be identified though there will be some error in allocation of
members. The provisional nature of interpretations of clusters should be stressed. Figs. 2c and 3c
provide illustrations of the worst level of fit where interpretation may be possible.
The discussions with the subjects only suggested ranges for the boundaries between categories; the precise boundaries were dictated by numerical considerations.
When the three acceptable categories were compared with the K transformation, equation (8), it was
found that the “good” category covered a much larger domain than the “fair” category. On this basis
the "good" category was further subdivided into two categories: “very good” and “good”. Table 7
summarizes the statistical properties of the final set of categories and also gives the distribution of the
figures in the stimulus material in terms of these categories.
Table 7: True fit categories and distribution of figures used in the experiment.
                                     Excellent   Very good    Good     Fair    Unacceptable
1) r - correlation                     .994        .976       .922     .707         0
2) K = √(1 − (1 − √(1 − r²))²)         .457        .623       .790     .956        1.0
3) TF - categories                   −.5 – 1.0      2.0        3.0      4.0        4.5
4) C0 - congruence*                    .998         .99        .96      .84         -
5) C1 - normed distance error*         .05          .10        .20      .40         -
No. of 1 dimensional pairs               5            2          3        5          7     Sum: 22
No. of 2 dimensional pairs               2            0          6        9          6     Sum: 23

*) Approximate values.
Unfortunately the "very good" category was not represented in the 2 dimensional series (but see Fig.
2b for a 1 dimensional example). The subdivision of the preliminary "good" category made the range
of each of the three middle categories equal in terms of the K transformation.
There are two different ways of using the description of true fit. The simplest is just to ask to which
category a given solution belongs, perhaps this will be sufficient in most cases. The K transformation
does, however, invite more precise numerical descriptions. This will for instance be useful in Section
4.6 when we discuss with what accuracy one can estimate true fit from stress, n and t.
It may be convenient to have simpler numerical representations of category boundaries, this is
provided in row 3) in Table 7. In the range K = .457 to K = .956 this is a linear transformation where
1.0 corresponds to .457 and 4.0 corresponds to .956. From the point of view of the end categories the
K transformation was not completely satisfactory. This transformation would make the "excellent" category more than twice the size of the middle categories, while the "unacceptable" region would be negligible. As will be seen from the results in Section 4.6, making the "excellent" category 50% larger than the middle ones, and the "unacceptable" category 50% smaller, gives maximally linear relations
with stress. This implies using a different linear transformation for each of the end categories.
The term TF-categories will be used to refer both to the numerical description and to the proposed
labels for the different categories. Statements where TF is assigned a numerical value always refer to
the above discussed linear transformations of the K values.
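One possible numerical reading of these transformations is sketched below; the middle range is fixed by the values stated above (K = .457 giving TF = 1.0 and K = .956 giving TF = 4.0), while the TF ranges of the two end categories (−.5 to 1.0 and 4.0 to 4.5) are taken from Table 7. The exact end-category transformations used in Section 4.6 may differ in detail.

```python
import numpy as np

def K_of_r(r):
    """Equation (8) as reconstructed earlier."""
    cal = np.sqrt(1.0 - r**2)
    return np.sqrt(1.0 - (1.0 - cal)**2)

def TF_of_r(r):
    """Numerical TF value for a true fit correlation r - a sketch only.

    Piecewise linear in K: the three middle categories map K = .457 -> TF = 1.0
    and K = .956 -> TF = 4.0; the end categories are mapped onto the TF ranges
    -.5 to 1.0 and 4.0 to 4.5 given in Table 7.
    """
    K = K_of_r(r)
    if K <= 0.457:                                      # "excellent" region
        return -0.5 + 1.5 * K / 0.457
    if K <= 0.956:                                      # the three middle categories
        return 1.0 + 3.0 * (K - 0.457) / (0.956 - 0.457)
    return 4.0 + 0.5 * (K - 0.956) / (1.0 - 0.956)      # "unacceptable" region

# the category boundaries quoted in the text:
for r in (0.994, 0.976, 0.922, 0.707):
    print(r, round(float(TF_of_r(r)), 2))               # about 1.0, 2.0, 3.0, 4.0
```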
Concluding comment.
At the present state of development of multivariate techniques some arbitrariness and lack of precision
in describing the true fit index is unavoidable.
There is bound to be individual differences among researchers as to the requirements they will make
in different cases. Further progress may be expected as experience is gained with the proposed index.
It will then be important to gain systematic knowledge of what actually occurs when researchers
interpret configurations in specific cases. Exactly what topological (and other) properties do they stress
in what contexts? How important may for instance serious misplacements of a few points versus small misplacements of many points be in concrete cases? Knowledge on questions like this may help the
construction of more refined indices and will be in line with the plea in Ch. 1 for obtaining descriptions
of how research actually occurs.
Fig. 3: Illustration of different levels of true fit for 2 dimensional configurations, n = 20.
° represents true position, ● reconstructed position. Corresponding points are connected by straight lines.
*) a) should be classified as "very good", but is labelled "excellent" since it is very close and it was found desirable to represent the best category.
Chapter 4
BEYOND STRESS – HOW TO ASSESS RESULTS FROM MULTIDIMENSIONAL SCALING BY MEANS OF SIMULATION STUDIES
4.1 Two approaches in simulation studies.
This chapter contains the promised guide lines for the problems of applicability, dimensionality and
precision. Since these are based on simulation studies the first step is to point out two different
strategies which have been employed in previous simulation studies.
The first approach is influenced by the perhaps unfortunately strong emphasis on statistical
significance testing in much of psychological research.
Systematic attempts to provide information on how to assess stress were made independently of each
other by Stenson and Knoll (1969) and Klahr (1969) who provided tables and graphs giving expected
stress values as functions of n and t.
The sampling distribution of stress for given values of n and t is simple to generate by analysis of a sufficient number of random sets, each set for instance being a permutation of the first n(n − 1)/2 integers.
Klahr reported complete sampling distributions, Stenson and Knoll just reported average values, and
also provided some rough rules of thumb for how far stress should deviate from the average to be
regarded as statistically significant. The conventional advice to be expected from such studies is to
provide stress values corresponding to the customary 5% limit and recommend this as a cutoff point
separating acceptable from unacceptable solutions. This is for instance done in the most recent of this
type of study (Wagenaar and Padmos, 1971).
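A sketch of this type of study follows; sklearn's SMACOF-based nonmetric MDS is used here only as a stand-in for the programs those authors used, and its stress_ value is not necessarily scaled like Kruskal's stress formula 1, so only the shape of the null distribution is illustrated.

```python
import numpy as np
from scipy.spatial.distance import squareform
from sklearn.manifold import MDS

def stress_null_distribution(n, t, n_sets=100, seed=0):
    """Sampling distribution of stress under "no structure" - a sketch.

    Each random set is a permutation of the first n(n-1)/2 integers used as
    dissimilarities, in the spirit of the Stenson-Knoll and Klahr studies.
    """
    rng = np.random.default_rng(seed)
    m = n * (n - 1) // 2
    stresses = []
    for _ in range(n_sets):
        diss = squareform(rng.permutation(np.arange(1, m + 1)).astype(float))
        mds = MDS(n_components=t, metric=False, dissimilarity="precomputed",
                  random_state=0)
        mds.fit(diss)
        stresses.append(mds.stress_)
    return np.array(stresses)
```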
Concerning this approach we first note that it represents a very different strategy to separate
acceptable from unacceptable solutions than the approach used in Section 3.4. Results comparing the
consequences of these two approaches will be discussed in Section 4.6. At present we note that in
many cases the scientist will not primarily be interested in whether he can safely reject the null
hypothesis of no structure or not.
Just as one will not be content to know that the reliability of a test significantly departs from zero, but
wants an estimate of the amount of internal structure, we believe that the researcher in
multidimensional scaling typically will be concerned with the amount of structure in his material.
This corresponds to the second approach in simulation studies where we offer TF - categories (as
discussed in Section 3.4) as an index for amount of structure. For a given number of points, n, and
dimensionality, t, we shall see that stress (AF) and TF are functions of the amount of noise added to
the true configuration. Analyzing random sets corresponds to a special case of this more general
approach. If a given true configuration is subjected to increasingly higher level of noise the
configuration will eventually be "drowned" in noise. As noise level increases without bounds we end up
with a random set of dissimilarities.
Analysis of random sets may thus be regarded as a special case of the more general approach of
studying consequences of noise levels. This will be apparent in the results in Section 4.6.
4.2 Classification of variables in simulation studies.
Very briefly the steps in simulation studies are: construct L, introduce error to get M, analyse M to get
G, then finally predict TF from AF (and n and t). Though this may sound simple we shall see that there
are a large number of variables to explore. The parameter space is of such a staggering complexity
that anything even remotely resembling a factorial design is completely out of the question.
The list of variables presented below is, however, intended to be exhaustive. We indicate which of the
variables will be explored in the present work. The variables which are listed and not explored may
then point to further studies. The classification offered below also serves as a brief summary of
Chapter 3.
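To make the steps just listed concrete, here is a minimal sketch of a single simulation trial. A simple additive perturbation of the latent distances stands in for the error processes specified in Section 3.2, sklearn's nonmetric MDS stands in for MDSCAL/TORSCA, and TF is taken as the correlation between latent and reconstructed distances; all names and parameter choices are illustrative, not the exact specifications used in the present work.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

def one_trial(n=15, t=2, noise=0.3, seed=0):
    """One simulation trial: construct L, add error to get the manifest
    dissimilarities, analyse them to get G, return apparent and true fit."""
    rng = np.random.default_rng(seed)
    L = rng.uniform(size=(n, t))                         # true configuration
    d_true = pdist(L)                                    # latent distances
    m = np.abs(d_true + noise * d_true.std() * rng.standard_normal(d_true.shape))

    mds = MDS(n_components=t, metric=False, dissimilarity="precomputed",
              random_state=0)
    G = mds.fit_transform(squareform(m))                 # reconstructed configuration

    AF = mds.stress_                                     # apparent fit (stress)
    TF_r = np.corrcoef(d_true, pdist(G))[0, 1]           # true fit as a correlation
    return AF, TF_r
```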
The basic distinction in the classification of variables is:
A. Specifications necessary to make when running a simulation study, what may here more
loosely be referred to as independent variables.
B. Consequences of the specifications made in A, loosely here referred to as dependent
variables.
It will be convenient to further subdivide the independent variables into three different classes:
A1. Simple quantitative variables.
A2. Complex quantitative variables.
A3. Qualitative variables.
where the three classes differ in complexity. The main focus of the present work is to outline methods
to deal with main effects and interactions for a subset of the variables in A1. Main effects and
especially interactions where variables from the more complex classes are involved are a nuisance
since this will limit the relevance of the present simulation studies for many experimental situations.
We now consider variables which belong in each of the three classes:
A1.
Simple quantitative variables.
n - number of points in the configuration
E - degree of error, error proportion
t - dimensionality of true configuration
m - dimensionality of analyzed configuration
p - Minkowski constant of true configuration (type of metric space)
pl - Minkowski constant of analyzed configuration.
In Section 4.6 we will set m = t, corresponding to the simplest form of the metamodel, cfr. Section 2.1.
In Section 4.7 we then explore several values of m for a given value of t and analyse the resulting
stress curves as implied by the extended form of the metamodel in Section 2.2.
Exploring several values of pl for a given value of p, may be relevant to finding the true type of metric
space, cfr. for instance the analysis of colour space by Kruskal (1964a, p. 23-24). This strategy also
produces “stress curves”, such stress curves will, however, not be studied in the present work which is
limited to the Euclidean case, that is p = pl = 2.
A2.
Complex quantitative variables.
A2.1 - relative variance in different dimensions of the true configuration.
A2.2 - relative error variance for points and/or dimensions.
A2.3 - degree of ties in the dissimilarities.
Concerning A2.1 it is clear that generally configurations will vary in the extent to which they have equal
variance in all dimensions. So far systematic studies have been limited to the special case of equal
variance in all dimensions. Studying the more general case will probably give results "between" those
which to now have been obtained for various values of t. Results for a "flattened" twodimensional
configuration may for instance be expected to fall between the results for t = 1 and t = 2.
A2.2 has already been discussed, cfr. specifications 6 to 8 in the discussion of noise level in Section
3.2.
Concerning A2.3, the present investigation will be limited to the study of sets of dissimilarities without
ties. Notice, however, that the metamodel implies a promising approach to the problem of how to treat
ties. As implied by the brief treatment in Section 3.1 the primary approach to ties - which allows untied
distances for tied data without downgrading stress - will automatically give lower stress than the
secondary approach. It is by no means, however, evident which approach will produce best true fit.
This will be the subject for a separate investigation.
In the present study we consistently make the simplest possible choices for all the variables in A2. The
simulation studies will be limited to configurations which have equal variance in all dimensions, error
variance will be assumed homogenous, and there will be no ties. In a variety of experimental situations
(perhaps most?) these assumptions will be unrealistic and it seems likely that generalizations from the
simplest cases to the more complex will be tenuous indeed. These simplifications are probably the
single most important limitation of the results to be discussed in Section 4.6 and 4.7.
Future studies will deal with the more complex cases. Probably the best strategy will be to do
simulation studies tailormade to the most reasonable specifications in given experimental contexts. In
the present introductory work the best strategy seems to be to concentrate on the simplest cases to
bring forth the general logic of the present approach as clearly as possible.
A3.
Qualitative variables.
A3.1 - Definition of configuration.
A3.2 - Type of error process.
A3.3 - Type of algorithm.
For each of the variables in A2 one can conceive of a set of quantitative parameters to describe them,
this does not, however, seem possible for the variables in A3. Consider first A3.1 which perhaps may
be considered as a domain of variables. One obvious distinction is between configurations defined by
a random process versus by some systematic procedure. One procedure has been to use subsets of a
fixed list of uniform random numbers from Coombs and Kao (1960) as coordinates (Coombs -Kao
coordinates have been used both by Shepard (1966) and Young (1970)). A highly related approach is
to simply use some routine generating uniform random numbers. In Section 4.4 we shall see that
these two procedures have been reported to give quite different results. If this really is the case it is
doubtful whether results from random configurations can be applied to systematic configurations, for
instance a circular configuration. For more examples see the configurations studied by Spaeth and
Guthery (1969). To the extent that results depend upon type of configuration their general relevance
will be highly limited.
A3.2 has already been extensively discussed, cfr. specifications 1 to 5 in Section 3.2. For both A3.1
and A3.2 one would hope that the results will be largely independent of a concrete set of
specifications, in other words that there are neither main effects nor interactions associated with these
qualitative variables. Obviously it is practically impossible to investigate this with any degree of
completeness. The strategy to be pursued here is to compare a few examples differing in many
aspects of A3.1, A3.2 and A3.3 and on this basis make some tentative generalizations.12
A3.3 has been discussed in detail in Section 3.1. The simplest case would be if one algorithm turned
out to be better than all the others, or if they turned out to be equally good. The choice between them
could then be made on the basis of other considerations, the most relevant such consideration in most
cases would probably be computer time. If it turned out that some algorithms gave best results for
some conditions, other algorithms for other conditions, we would have a fairly messy situation, making
life more difficult for users of multidimensional scaling. In Section 4.3 we compare the currently most
popular algorithms from the point of view of the metamodel.
B.
Dependent variables.
As a result of making specifications for Al, A2 and A3, there will be values of the three basic relations
NL, AF and TF. Since these relations have been discussed in detail in Ch. 3 a summary comment is
sufficient here. It will be recalled that for each of these three relations there are a variety of indices to
give them numerical expression. Preliminary results strongly indicate that results are independent of the specific indices chosen; this justifies treating each of the three basic relations as unitary concepts. Theoretical reasons indicate linear correlation as the basic index for NL and TF; stress is a convenient choice for AF since it is so well known. Occasionally, however, other indices will be used to illustrate more specific points.

12 See the study of systematic configurations in Section 4.7 for an example of this strategy for A3.1 and A3.2.
Notice that though in the present classification of variables it is logical to classify AF as a dependent
variable, it will in Section 4.6 be most convenient to consider AF, n and t as independent variables and
just TF as the dependent variable.
Having now summarized A and B there is a final distinction to be made: to differentiate between
unrepeated and repeated designs. As the terms are used here in the former case a given true
configuration is only analyzed once. This corresponds to the simple case of the metamodel and is the
only type previously reported in the literature. To simplify, t will be assumed known in this case and we
will set m = t. For repeated designs a given true configuration will be analyzed several times, each
time the actually introduced error components will be different. This corresponds to the extended form
of the metamodel. Unrepeated designs are discussed in Section 4.6, repeated ones in Section 4.7.
These two types of designs are regarded as supplementary, that is they should give the same results
on the aspects where they are comparable. A discussion of previous contradictory results in simulation
studies in Section 4.4 will bring this out.
From the point of view of our three main problems: precision, dimensionality and applicability the next
section - comparing algorithms - may seem a digression. Yet before proceeding with our major
concern it is of importance to explore whether there is one tool (algorithm) better suited for our main
purpose than alternative ones.
4.3 Comparison of algorithms.
Before discussing the broad question of comparing algorithms, it is first necessary to discuss the
widespread practice of using Kruskal's algorithm with an arbitrary initial configuration.
4.31 Choice of initial configuration and local minima.
Concerning local minima Kruskal (1964b, p. 35) writes: "Experience shows that this is not a serious
difficulty because a solution which appears satisfactory is unlikely to be merely a local minimum”.
If it is feared that a given solution does not represent an overall minimum it is recommended to start
again, for instance by "using a variety of different random configurations to repeatedly scale the same
data …if a variety of different starting configurations yield essentially the same local minimum, with
perhaps an occasional exception …..then there is little to worry about” (Kruskal, 1967, p. 23).
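The restart strategy Kruskal recommends can be sketched as follows; sklearn's SMACOF-based MDS stands in for MDSCAL, and only the restart logic is the point of the example.

```python
from sklearn.manifold import MDS

def best_of_random_starts(diss, t, n_starts=10, seed=0):
    """Rerun the scaling from several random initial configurations and keep
    the solution with the lowest stress - a sketch of Kruskal's advice."""
    best_stress, best_config = None, None
    for i in range(n_starts):
        mds = MDS(n_components=t, metric=False, dissimilarity="precomputed",
                  n_init=1, random_state=seed + i)
        X = mds.fit_transform(diss)
        if best_stress is None or mds.stress_ < best_stress:
            best_stress, best_config = mds.stress_, X
    return best_stress, best_config
```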
From the point of view of the user this state of affairs is, however, clearly a nuisance. No unequivocal criterion of a "satisfactory solution" is presented by Kruskal; furthermore having to run the same data repeatedly makes the program much more inconvenient to apply, both in terms of the time used in preparing runs and in terms of computer time.
There is now a fair amount of evidence that the risk of being trapped in a local minimum may be very
high if Kruskal's initial configuration is used. For 120 twodimensional error free configurations Shepard
(1966, p. 297 note 6) reported that in about 15% of the cases it proved necessary to begin again from a different starting position. On the other hand Spaeth and Guthery (1969, p. 509) in their analysis of
19 one and twodimensional simple geometric shapes reported that: "in well over half the runs
MDSCAL landed at a minimum other than the true overall minimum." Wagenaar and Padmos (1971,
p.102) more generally report that "without precautions a considerable number of calculations end in
local minima", and the analysis in one dimension "generally tended to end in local minima when the
initial configuration was chosen randomly." (op.cit.p.103.) Lingoes and Roskam (1971) defined as
"major local minimum" a solution deviating more than .025 K* units (cfr. p. 52 ##) from the best
obtainable. For 40 random cases (10 points generated in five dimensions and analyzed in two
dimensions) there were 45% major local minima (op.cit. Table 4, p. 91). Finally they report that for 40
cases generated and analyzed in two dimensions there were 35% major local minima in two
dimensions and 20% for the onedimensional solutions (op.cit. Table 8, p. 97).
As part of the present simulation studies a set of 25 random configurations of 15 points in 2
dimensions, covering 5 noise levels, were analyzed independently in 1, 2 and 3 dimensions. The
following results were obtained: for 3 dimensional solutions 12%, for two-dimensional 40% and for 1
dimensional 84% major local minima.
As might be expected local minima were reflected not only in higher stress than for non-arbitrary initial configurations, but also in substantially worse true fit. Further aspects of this study which compared
the main nonmetric algorithms will be given in Section 4.32.
Concerning the problem of local minima when arbitrary initial configurations are used, it should be borne
in mind that the number of such minima may be reduced by starting in a “too high” dimensionality and
going down. For 10 cases of 10 points in two dimensions Lingoes and Roskam (1971, p.93) found no
local minimum in two dimensions when they started in 5 dimensions, whereas there were 7 local
minima when working directly in two dimensions.
A more satisfactory way of avoiding the problem would be to start with an initial configuration directly
related to the dissimilarities, that is a non-arbitrary initial configuration. The most elaborate such initial
configuration is the procedure in TORSCA (Young, 1968a). Initially dissimilarities13 are treated as
distances and converted to scalar products by way of the well known formula presented by Torgerson
(1958, p.258) and eigenroots and vectors are computed. This is usually referred to as the Young-Householder-Torgerson decomposition. From this configuration Euclidean distances are computed
and the best monotonic transformation of the dissimilarities is computed. The transformed
dissimilarities are again converted to scalar products and the process is repeated until no
improvement is possible, this then is the starting point for the nonmetric algorithm.
This procedure has the undesirable feature of not being completely independent of metric properties,
and “it is possible that a presumption based on the metric of the data may get one into local minimum
traps”. (Lingoes and Roskam, 1971, p.128). Furthermore the procedure is very timeconsuming. A
much simpler procedure which has some of the same features is first to convert the dissimilarities to
rank numbers,14 and then simply treat these rank numbers as distances which are converted to scalar
products and factor analyzed. This is equivalent to using only the first step in the iterative procedure
for the initial configuration in TORSCA, but in such a way that the results are independent of any
monotone transformation. The procedure may be called a rank factor analysis approach.
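A sketch of the rank factor analysis start is given below (function name mine): the off-diagonal dissimilarities are replaced by their rank numbers, the ranks are treated as distances and converted to scalar products by Torgerson's double-centering, and the first t principal axes serve as the initial configuration.

```python
import numpy as np
from scipy.stats import rankdata

def rank_factor_start(diss, t):
    """Rank factor analysis initial configuration - a sketch.

    diss is an n x n symmetric dissimilarity matrix; returns an n x t
    starting configuration.
    """
    n = diss.shape[0]
    iu = np.triu_indices(n, k=1)
    ranks = rankdata(diss[iu])                 # rank numbers of the dissimilarities
    D = np.zeros((n, n))
    D[iu] = ranks
    D = D + D.T                                # symmetric matrix of "rank distances"

    # Torgerson / Young-Householder double-centering: B = -0.5 * J D^2 J
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D**2) @ J

    vals, vecs = np.linalg.eigh(B)             # principal axes of the scalar products
    order = np.argsort(vals)[::-1][:t]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))
```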
Lingoes and Roskam (1971, p.42 - 43) recommend the same procedure except for first squaring the
rank numbers. They do not, however, report any results from this procedure. The SSA-1 initial
configuration has a related but somewhat more involved rationale. For details on this procedure see
Lingoes and Roskam (1971, p.39 - 42). In Section 4.32 there will be no further discussion of Kruskal's
algorithm with an arbitrary initial configuration, but the studies to be reported there are relevant to the
problem of comparing the merits of the various non- arbitrary initial configurations.
The present survey confirms the conclusions by Lingoes and Roskam (1971, p.132) who strongly point out that arbitrary starts in Kruskal's MDSCAL are to be avoided.
4.32 Comparing MDSCAL, TORSCA and SSA-1
As argued in Section 2.1 the central issue when comparing various nonmetric algorithms is which one
of them gives best true fit. Though in practice an underlying true configuration is not known, it would
be reasonable to extrapolate from simulation studies if over a variety of different circumstances one
algorithm persistently did better than another.
13
Similarities are inverted, values of 0 are then unacceptable.
14
Similarities are then sorted in descending order, dissimilarities in ascending order.
The different features of algorithms, as outlined in Section 3.1, have no intrinsic psychological content,
at least no one has argued for the specific theoretical relevance of e.g. choosing weak vs. strong
monotonicity as basic in a given algorithm. The present section may then be said to deal with strictly
technical problems, the battle between the algorithms is to be fought on purely pragmatic grounds.
There is a widespread feeling that the main nonmetric algorithms largely give if not identical, then for
practical purposes close to indistinguishable results. Yet there have been surprisingly few studies
systematically comparing algorithms from the point of view of true fit. Young (1970) includes a
comparison of TORSCA with results from MDSCAL presented by Shepard (1966). This comparison only deals with the error-free case and will be commented on in Section 4.6. As previously pointed out Lingoes and Roskam (1971) mainly restrict themselves to comparing indices of apparent fit, though they do include one comparison between some algorithms from the point of view of true fit. This
comparison, however, is restricted to variants within their hybrid MINI-SSA15 program.
Analysis of order 4 matrices.
A priori it seems reasonable that any marked differences between the algorithms would be most likely
to show up for small values of n. Partly from this point of view we have been persuaded by the
argument made by Lingoes and Roskam (1971, p. 99) that analysis of all distinct order 4 matrices may
highlight problems only hinted at in analysis of larger matrices. Analysis of order 4 matrices is
facilitated by the fact that there are just 30 distinct order 4 matrices, (cfr. the proof op.cit. p. 105-107).
Notice, however, that analysis of order 4 matrices is of only marginal relevance from the point of view
of our main criterion since for these matrices no external L can be said to exist.
Our main concern in this study was to try out the rank factor analysis approach to initial configurations
in Kruskal’s algorithm and also to compare TORSCA (excluded by Lingoes and Roskam) with the
other variants. 16 The results are presented in Table 1.
15
For the uninitiated the acronym needs spelling out: Michigan (Lingoes), Israel (Guttman), Netherlands (Roskam) Integrated Smallest Space Analysis. As recognized by Lingoes and Roskam the acronym ought to include a reference to Kruskal since his block partition approach to monotonicity is one of the options in MINI-SSA.
16
A minor concern was to check that the program versions presently used were identical to the ones used by
Lingoes and Roskam by comparisons where possible. We shall later see that unfortunately there is reason to
doubt whether errorfree programs always have been used in the published literature.
Table 1. Mean results from analysis of 30 distinct order 4 matrices.

Algorithm    Initial configuration                          S           K*
MDSCAL       Kruskal arbitrary initial configuration      .106 a)
MDSCAL       rank factor analysis*                        .077 b)
MDSCAL       SSA-1 output                                 .060 c)     .108 b)
TORSCA                                                    .064 b)
SSA-1                                                     .065 b)     .111 c)
Lingoes-Roskam over all minima**)                         .053 a)     .095 a)

a) Result taken from Lingoes-Roskam (1971, Table 11 p.108)
b) Own result
c) Result from Lingoes-Roskam confirmed in own study.
*) A possible weakness in MDSCAL may be revealed by the fact that occasionally the stress of the
initial configuration did not decrease, but dramatically increased. In these cases the stress of the initial
configuration was used in the mean. The same phenomenon was occasionally observed in analysis of random sets of 6 points; the same strategy was used then. This phenomenon has, however, never
been observed with the thousands of other cases where rank factor analysis has been used in the
course of the present investigation.
**)These values represent “the best solutions obtained over all possible algorithms and initial
configurations tried in minimizing K* or S". (op.cit. p. 100).
The following conclusions are tentatively offered on the basis of the results presented in Table 1:
1) There is something to be gained by the factor analysis initial configuration compared with
Kruskal's arbitrary starts (stress = .077 vs. stress = .106). The other alternatives do, however,
appear to be clearly superior since they all give lower stress than .077.
2) If MDSCAL is provided with optimal start configurations (perhaps output from SSA-1?) not
very much can be gained by trying a variety of alternatives. In practice the latter strategy
would probably be prohibitive in terms of the time required both to prepare runs and in terms of computer time.
Algorithms compared with respect to true fit
Our next task is now to present results with larger values of n, using TF - categories as the major
criterion. We shall then see to what extent the tentative conclusions from the previous analysis must
be modified. For each of the conditions (a specific combination of n and t) several random
configurations were generated. In each condition different levels of noise were used so that the whole
range of TF-categories values was observed in each condition. Before discussing the results with
noise some comments on results for error-free data are made. 5 sets of error-free dissimilarities were
included in each condition. The comments serve to clarify why results from these sets are excluded
from the main results.
Remarks on results from error-free data.
Inclusion of these cases would have given the TORSCA analysis of M an unfair advantage since then
straight Euclidean distances are inputted, and the Young-Householder-Torgerson resolution gives
not only a solution with stress = 0, but a solution with perfect true fit.
Unfortunately this has not been clear in the literature. Commenting on results from Spaeth and
Guthery (1969) Lingoes and Roskam (1971, p.130) point out that “either the decomposition was not
carried out correctly or the TORSCA monotone distance algorithm is capable of messing up a perfect
fit”. The present results conclusively show that the TORSCA monotone algorithm is working perfectly
in all cases of analysis of Euclidean distances. As intended TORSCA then works as a metric method
in these cases and the expected perfect true fit has been found. Consequently there must have been
some error in the Spaeth - Guthery implementation of TORSCA.
A minor technical point is that generally MDSCAL17 is superior to the other algorithms for error-free but
monotonically distorted distances. In such cases TORSCA has in the present work not generally been
found to give a solution with stress = 018, while this generally is the case for MDSCAL. Likewise SSA-1
rarely ends up with perfect apparent fit; this may, however, be due to one of the criteria for termination in this algorithm - unlike the TORSCA and MDSCAL algorithms, there are no options for adjusting the criteria for termination in SSA-1. Stress = 0 for MDSCAL corresponds to slightly improved true fit compared with the other algorithms; this will be further commented on in relation to Figure 6.
Results from data with noise.
Table 2. Mean results of TF-categories for different algorithms and different combinations of n and t. n = 6, t = 1 is based on 35 different random configurations, for the other conditions 20 different random configurations were used. Noise was introduced for all the configurations. The Ramsay-Young error process was used.

Algorithm (initial configuration):
MDSCAL (rank factor analysis)
MDSCAL (SSA-1 output)
TORSCA* (mij)
TORSCA (rank numbers)
TORSCA (mij = m²ij + 10**)
SSA-1
n t
6 1
n t
7 1
3.125 3.370
2.784
2.943
3.243
3.269
n t
9 2
n t
9 3
n t
10 1
n t
5 2
2.984
3.059
2.992
2.967
3.570
3.531
3.510
3.553
2.014
2.221
2.022
3.000
3.477
2.204
2.140
* For TORSCA using various monotone transformations of the original raw data is equivalent to different initial
configurations for other algorithms, cfr. p. 58.
** This specific transformation is included because it was used by Young (1970).
From the results presented in Table 2 we see that the tentative conclusions from Table 1 do not have
general validity. Though slight, the difference disfavouring MDSCAL for n = 6 and n = 7 is probably significant. In view of this and the poorer performance of MDSCAL for order 4 matrices we
will not recommend MDSCAL when n is small. The main simulation studies in Section 4.6 use
TORSCA for n = 6 and n = 7, and for other conditions MDSCAL and TORSCA are both used.
For the other conditions, however, the use of MDSCAL (with rank factor analysis) does not give
noticeably different results from the other algorithms. Notice specifically that the elaborate procedure
used for the order 4 matrices - using the output of SSA-1 as the initial configuration for MDSCAL - does not improve on the much simpler procedure of rank factor analysis to give the initial configuration.
The main conclusion to be gained from Table 2 is that with an initial configuration from rank factor
analysis, Kruskal's MDSCAL has not been significantly improved on by any of the more recent
algorithms.
Specifically concerning TORSCA the results clearly show that rank numbers may just as well be used
as input data. Since otherwise TORSCA and MDSCAL are practically identical, the present results
show that the timeconsuming construction of an initial configuration in TORSCA really is superfluous.
The studies to be discussed in Section 4.6 and 4.7 are almost exclusively based on MDSCAL. Since
there are no clear advantages of other algorithms it is reasonable to make the choice on the basis of
computer time and MDSCAL is here far superior to the other algorithms.
17
Referring just to MDSCAL it will further be understood that rank factor analysis has supplied the initial
configuration.
18
Notice also that in Young (1970, Table 3, p. 466) stress ought to be 0 in the first column since this corresponds
to the error-free case, but of the 15 means in this column only one is given as .0000; in the other cases stress
varies between .0001 and .0117.
MDSCAL versus SSA-1. An unresolved problem.
A further analysis of the results of comparing SSA-1 and MDSCAL may yet bring out problems
worthy of further study. First we should note that for the 60 comparisons between MDSCAL and
SSA-1 the closely similar means reflect results that were highly similar for each single configuration. In 50
of the 60 cases the difference in TF was less than .20; the two largest differences were .38 and .59, and
probably only the latter difference could be of any importance in applied work. Secondly, a detailed
analysis of the 40 cases where both SSA-1 and MDSCAL (using the output from SSA-1 as the initial
configuration) were used does bring out some further problems.
In the following discussion it is necessary to keep in mind a distinction made by Lingoes and Roskam
(1971). They point out that the goal of minimizing K*(based on rank images) is not necessarily the
same goal as minimizing stress (based on block partition values). The first goal is called minimizing
goodness of measurement, the second goal minimizing goodness of fit. (The intended connotations of
“measurement” vs. “fit” are not made clear). We will just refer to minimizing K* vs. minimizing S,
wishing to emphasize that these two approaches may sometimes represent different targets.
The first point to note is that, perhaps not unexpectedly, SSA-1 does not minimize stress. For n = 9,
t = 2 mean stress of the SSA-1 output was .1027 while the final MDSCAL output had stress of .0946. For
n = 9, t = 3 the comparable results were .0647 vs. .0554. While these differences may not seem
dramatic the results are highly significant because in every one of the 40 cases MDSCAL reduced
stress. Since the final output from MDSCAL in this case was mostly (practically) identical to the output
from MDSCAL starting with rank factor analysis, it seems reasonable to assume that the
MDSCAL/SSA-1 approach succeeded in arriving at a global minimum in what may be called the block
partition configuration space. If minimizing stress had been an optimal goal from the point of view
of true fit, we would have expected to find that the MDSCAL/SSA-1 solutions would show better true fit than just
the SSA-1 solutions. But this very clearly did not happen; if anything it was rather the case that
improving stress made true fit worse (cfr. 3.059 vs. 3.000 and 3.551 vs. 3.477 in Table 2). The major
conclusion to be drawn from this analysis then is:
Minimizing stress is not necessarily an optimal goal from the point of view of achieving the best
possible true fit.
While MDSCAL minimizes stress, the next question is whether SSA-1 on the other hand minimizes
K*. If this were the case there would be a basis for asking under what conditions which target is optimal
from the point of view of true fit. If SSA-1 does minimize K*, further iterations on the output from
SSA-1 can do nothing but increase K*; the iterations will then move the output away from the global
minimum in the "rank image configuration space".
Consequently the next step was to compute K* for the MDSCAL/SSA-1 solutions. The results were
both clear and puzzling. For n = 9, t = 3, mean K* for SSA-1 was .0865, for MDSCAL/SSA-1 .0850.
For n = 9, t = 2 the results were .1397 vs. .1425 respectively. As indicated by these means it is entirely
unpredictable whether MDSCAL will increase K* or not. In 21 cases MDSCAL increased K* and in the
other 19 cases it decreased K*.
These results clearly indicate that it is not the case that SSA-1 generally minimizes K*. The puzzling
problem is then why SSA-1 did not do worse than MDSCAL. SSA-1 minimizes neither K* nor S, yet it
works as well as any other algorithm from the point of view of true fit. The present results do not permit
a more detailed analysis of this problem since the differences in true fit are small and one cannot
expect strict relations between targets for minimization and true fit.
One may, however, speculate that improving SSA-1 (by using some variant of MINI-SSA?) would lead to
minimization of K* and that there are a large number of cases where this is a better target than
minimizing S. Yet another possibility is that there may be still other targets that may profitably be
explored from the point of view of true fit.
Conclusions.
The present choice of mainly using MDSCAL is not contraindicated by the variety of analyses
presented by Lingoes and Roskam (1971). There are three major investigations in that report which
are of relevance:
a) From a large number of analyses of random cases they conclude that when provided with a good
initial configuration "MDSCAL operates as satisfactorily as any other procedure we investigated". (op.
cit. p. 98).
b) Analyses of order 4 matrices have already been discussed; these confirmed a suspicion in our own
data that MDSCAL may not be quite satisfactory when n is very small (7 or less).
c) In their study of "metricity" (true fit) MDSCAL was not used, but one of the variants within MINI-SSA
which was used (op. cit. Table G, p. 128) seems to differ from MDSCAL only in the choice of initial
configuration and step size calculation. The difference between this version and the others which were
tested turned out to be negligible.
Still there is every reason to reiterate the plea for more research. We would suggest working with a
few simple systematic configurations, experimenting with a variety of algorithmic strategies, always
evaluating any strategy from the point of view of true fit. Such investigations may well give rise to
different conclusions than the present general "no difference" verdict.
The estimates of TF from stress, n and t to be presented in Section 4.6 can be applied regardless of
whether MDSCAL or TORSCA has been used. Even though we have not made any claims for
superiority of these methods in relation to SSA-1, it must be emphasized that the estimates cannot
be applied if SSA-1 is used. In Section 3.1 we were at some pains to point out that K* and S will give
quite different results. To illustrate this we may mention that for order 4 matrices the relation between
K* and S is well described by K* = 1.8 x S. For n = 9, t = 2 and 3 the relation was K* = 1.3 x S and for
n above 15 the relation is generally K* = 1.2 x S (one may guess that the relation will tend to 1 as n
increases). It would, however, probably be difficult to find a general conversion formula from K* to S.
Another approach would be to compute stress of SSA-1 output directly. Since this, as previously
discussed, gives higher values of stress without giving worse true fit, such a strategy would lead to
biased estimates, as the estimates of true fit would be worse than really warranted.
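If one nevertheless wanted a rough conversion, the empirical ratios just quoted could be used directly.
The sketch below is only an illustration of that idea; the function name and the placement of the
breakpoints between the quoted values of n are our own assumptions, not something given in the text.

    def approx_stress_from_kstar(k_star, n):
        # Very rough conversion from K* to Kruskal's stress S, using the
        # empirical ratios quoted above (K* = c x S). The breakpoints between
        # the quoted n-values are assumptions made for this illustration.
        if n <= 4:
            c = 1.8      # order 4 matrices
        elif n <= 15:
            c = 1.3      # observed for n = 9, t = 2 and 3
        else:
            c = 1.2      # observed for n above 15
        return k_star / c

Even with such a conversion the estimates of Section 4.6 would remain uncalibrated for SSA-1 output,
which is the point made above.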
4.33
Metric vs. nonmetric methods
A major advantage of the nonmetric methods compared with metric methods is that the latter may give
too many dimensions and thus give rise to misleading interpretations. This will be the case if a metric
interpretation of the data is not appropriate. A favourite example is the study of colour vision by Ekman
(1954), where a metric analysis produced five factors. This was a somewhat surprising result in view
of classical colour theory. Nonmetric reanalyses of Ekman's data have consistently given the familiar
colour circle, a two-dimensional representation (Shepard 1962b, Kruskal 1964a, Guttman 1966,
Coombs 1964).
Torgerson (1965) reviews other studies on colour vision during the 1950's where metric analyses gave
disturbing results, and later (Torgerson, 1967) he points to simulation studies showing that if true
distances are distorted by some monotone transformation and the resulting "distances" are analyzed
metrically, too many dimensions will result. This clearly demonstrates the advantage of nonmetric
studies if there are nonlinearities in the data.
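The effect Torgerson points to is easy to reproduce numerically. The sketch below is purely illustrative
and not part of any of the studies discussed: it builds a two-dimensional configuration, applies an
arbitrary monotone distortion (here a square root) to the true distances, and inspects the eigenvalues of
the Young-Householder-Torgerson decomposition. For the undistorted distances only two eigenvalues
are clearly positive; for the distorted "distances" several more appear, which is exactly the spurious
extra dimensionality described above.

    import numpy as np

    rng = np.random.default_rng(0)
    n, t = 20, 2
    X = rng.uniform(size=(n, t))                                  # true 2-dimensional configuration
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)    # true Euclidean distances
    D_distorted = np.sqrt(D)                                      # an arbitrary monotone distortion

    def torgerson_eigenvalues(distances):
        # Eigenvalues of B = -0.5 * J * D^2 * J (Young-Householder-Torgerson decomposition).
        m = distances.shape[0]
        J = np.eye(m) - np.ones((m, m)) / m
        B = -0.5 * J @ (distances ** 2) @ J
        return np.sort(np.linalg.eigvalsh(B))[::-1]

    print(torgerson_eigenvalues(D)[:5])            # only the first two are appreciably non-zero
    print(torgerson_eigenvalues(D_distorted)[:5])  # several sizeable positive eigenvalues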
On the other hand one may ask: what if there are no nonlinearities in the data, will metric analysis
then give better results than nonmetric? We know that this must be the case if there is no noise, since
as repeatedly stated nonmetric methods then always imply some (even if slight) distortion, while metric
methods give perfect true fit. But what about cases where there are linear relations
between dissimilarities and distances, and also noise? Notice that this is a very good illustration of
how irrelevant AF-stress is in answering many questions. A nonmetric analysis will invariably turn out
with a solution with lower stress than the metric solution19; it may, however, be the case that the
nonmetric algorithm just messes up the best obtainable solution by capitalizing on noise.
If the researcher feels "fairly" certain that there are no pronounced nonlinearities in his data he may
face a difficult choice: should he use a metric analysis and perhaps gain in precision (if nonmetric
methods capitalize on noise and thus distort relative to the metric solution)? But if his presumption of
linearity is incorrect he risks ending up with too many dimensions.
This problem is the point of departure for the simulation studies to be reported here. For n = 20, 20
different random configurations were analyzed for t = 1, 2 and 3, altogether 60 different configurations.
In each of the conditions various levels of noise were introduced so as to cover the whole range of
values of TF-categories of interest.
It will be recalled from Section 3.2 that there may be some (even if slight) nonlinearity when the
Ramsay-Young error process is used. In the present study we did not wish to introduce any bias
whatever disfavouring the metric approach, so we used the rank image approach to correct for any
nonlinearities. The study then also serves the subsidiary purpose of providing more information on the
linearity of the relation between L and M for the Ramsay-Young process20.
As shown in Table 3, three different versions of M were analysed metrically: first M, then M*L and
finally M*G.21
Table 3.   Mean results of TF-categories for MDSCAL vs. metric methods. Each mean is based
           on 20 different random configurations, n = 20. Noise was introduced for all the
           configurations. The Ramsay-Young error process was used.

                                                        t
   Algorithm            What is analyzed         1         2         3
   MDSCAL               M                      2.730     2.535     2.743
   Factor analysis x)   M                      3.014     2.999     3.171
   Factor analysis      M*L                    2.886     2.913     3.087
   Factor analysis      M*G                    2.913     2.885     3.056

   x) Young-Householder-Torgerson decomposition.
The results were quite surprising. It had been anticipated that on "home ground", so to speak, the
metric approach would surpass the nonmetric approach, but the results speak very clearly otherwise.
Looking at the results in more detail, the impression from Table 3 is strikingly confirmed. For each of
the 60 configurations the true fit of the MDSCAL solution was compared with the best of the three
metric solutions. In 8 cases it was possible to find a metric solution which outdid the nonmetric one;
the largest of these 8 differences was, however, a trifling .011, hardly likely to make any
practical difference. It may be mentioned that for some of the conditions reported in the previous
section similar comparisons were made between metric and nonmetric approaches. The present
results are thus not restricted to a relatively high value of n.
19
The present studies partly used MDSCAL with a metric solution as start configuration and stress always
decreased. As expected from results reported in the previous section this gave identical results to those obtained
when rank factor analysis supplied the initial configuration.
20
Comparing r(L, M) with r(L, M*L) and r(L, M*G) failed to show any case of discrepancy likely to be serious
in practice.
21
As expected, and as also implied by Table 3, M*G and M*L were consistently highly interrelated; the correlation
between these two vectors never strayed below .99. This shows that while M*L may theoretically be the most
desirable transformation of M, M*G, which always can be computed in practice, is a perfectly viable alternative.
These results give a very clear answer to the problem of whether one should use a metric or
nonmetric approach if one for some reason is in doubt. Provided there is noise in the data (and who
can doubt that for empirical data?) there is nothing to lose by using a nonmetric algorithm; on the
contrary, it is highly likely that the results will be substantially better.
Perhaps this unqualified recommendation of nonmetric methods should now be somewhat tempered.
Part of the motivation for the study reported in this section was provided by some very important
experiments reported by Torgerson (1965). He showed that there were cases where the structure of
the data was not revealed, but rather distorted, by using nonmetric models. He used geometric figures
which reflected both qualitative and quantitative dimensions. In these cases the nonmetric programs
partly distorted the underlying structure “by eliminating the contribution [of qualitative dimensions]
entirely and then capitalizing on error” (op.cit. p. 389).
Is there then a contradiction between the results reported in this section, which come out clearly in
favour of nonmetric methods, and Torgerson's strong tempering of unbridled enthusiasm for nonmetric
methods? We do not think so. Spelling this out, however, requires a more synthetic view of types of
models for similarities data. This is the topic for Ch. 6.
4.4.
Previous simulation studies and two methodological problems.
Analytical versus graphical methods.
Unrepeated versus repeated designs.
In this section the systematic simulation studies most directly leading up to the main results presented
in Section 4.6 will be reviewed. In that section detailed results will be presented, extending the results
of the studies to be dealt with in this section. The present section will mainly focus on two
methodological problems.
The first systematic simulation study was done by Shepard (1966). He worked with two-dimensional
configurations. For each n (varying from 3 to 45) he constructed 10 random configurations from the
Coombs-Kao coordinates. From the results he concludes that (op. cit. p. 299) "while the reconstruction
of the configuration can occasionally be quite good for a small number of points, it is apt to be rather
poor (for n less than eight say)". For n = 7 minimal r = .919 and root mean square = .980. "As n
increases, however, the accuracy of the reconstruction systematically improves until even the worst of
the ten solutions becomes quite satisfactory with ten points [min r = .992] and, for all practical
purposes, essentially perfect with 15 or more points." We note that Shepard does not commit himself
as to how low a correlation must be in order for TF to be unacceptable, but it appears that a correlation
of, say, .90 would be considered unacceptable by him.
From the point of view of the metamodel, Shepard's study corresponds to one special case, the case
where there is no noise. As previously stated, even though TF will be acceptable in this case there will
always be some distortion. L and M will coincide since there is no noise, while there will be some
discrepancy between L and G. The other special case of the metamodel is the analysis of random sets
as briefly discussed in Section 4.1.
In practice the intermediate cases will be of greatest interest. The first systematic study of such cases
was probably done by Kruskal. “Kruskal (personal communication) has also investigated the
robustness of the solution by imposing random deviations on artificially generated data. Generally a
moderately high level of added 'noise' can be sustained before the recovered configuration suffers
serious deterioration" (Shepard, 1966, p. 308).
The first complete report giving details on the amount of noise and the resulting deterioration is given
by Young (1970). He studied five levels of n (6,8,10,15,30), three levels of t (1,2,3) and five levels of E
(0, 0.10, 0.20, 0.35, 0.50). The configurations were generated by using partly overlapping subsets of
the Coombs-Kao coordinates. This generated configurations which are not independent of each other.
Possible consequences of this will be discussed later in this section. For each of the 5x3x5 = 75
combinations means of TF and AF were computed across 5 different random configurations.
TF-correlations were converted to Fisher's z, which was used in the regression analysis; mean z values
were converted back to squared correlations. Young plotted the relation between (n, AF) and (n, TF)
separately for each level of E and t. For TF the main effects were as expected, TF improved with
increasing n, and deteriorated as E increased, and finally deteriorated as t increased. For AF the
results showed the same pattern for E and t as for TF. But increasing n leads to increase
(deterioration) of AF-stress. The fact that increasing n gave opposite results for TF and AF seemed to
be a cause of some concern for Young: "if one relies heavily on stress the unfortunate situation exists
that as he diligently gathers more and more data about an increasingly larger number of stimuli, he will
become less and less confident in the nonmetrically reconstructed configuration, even though it is
more accurately describing the structure underlying the data." (op. cit. p. 471). This result is of course
exactly as expected from the point of view of the metamodel. For a given NL, G will move back
towards L as n increases. Young's results can be taken to definitely show that it is not appropriate to
try to interpret stress without regard for the number of points involved.
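The z-averaging of true-fit correlations used here (and by Young) is simple to reproduce; a minimal
sketch in which the function name is our own:

    import numpy as np

    def mean_squared_correlation(r_values):
        # Average TF correlations via Fisher's z and report the result as a
        # squared correlation, as in the tables below.
        z = np.arctanh(np.asarray(r_values, dtype=float))   # Fisher's z transform
        return np.tanh(z.mean()) ** 2                       # back-transform, then square

    # e.g. mean_squared_correlation([.995, .990, .978]) is about .979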
Concerning definition of a lower limit for an acceptable true fit correlation, Young is no more explicit
than Shepard. In illustrating his main conclusion, that "nonmetric multidimensional scaling is able to
recover the metric information of a data structure, even when the structure contains error" (op.cit. p.
470) he implies that values of his index higher than .80 (which corresponds to a correlation higher than
.91) are acceptable (op. cit. p. 471).
Noting the regularities in his data Young tried in several ways to develop regression equations which
would enable one to estimate TF from stress, n and t. "although some of the attempts were able to
account for more than 90% of the variance in z, it was not possible to develop a regression equation
which permitted reasonable interpolation and extrapolation". (op.cit. p. 412). By looking at a previous
report (Young, 1968b) we can see why the attempts failed and this may then suggest a different
approach to the estimation of TF. Since TF - z was not linearly related to n, a term for points squared
was added. This, however, led to a non-monotone relation between n and TF for a given value of AF.
Beyond n = 22 his table showed that TF deteriorates with increasing n for a given stress value. This is
inconsistent with Young’s main results which showed improved TF as n increases. A reasonable
conclusion to draw from this is that while there is sufficient regularity in the results to justify attempts to
estimate TF, this regularity is not easily captured by regression analysis.
An alternative is to use graphical methods. While this approach perhaps gives less accuracy than an
optimal analytical method, it may be very difficult to find the optimal analytical method. The great
advantage of graphical methods is a far greater flexibility than usual analytical methods; there is also a
greater closeness to the data than when working with analytical methods. Since we are concerned not
only with the relation between AF and TF, but also with the relation between NL and TF, maximal
flexibility will be very important. In Section 4.6 a graphical approach will be used. This approach will be
evaluated in terms of the conventional criteria: multiple correlation and crossvalidation.
The remaining problem in this section is a discussion of a disturbing failure to replicate some of
Young’s results by Sherman (1970). Apparently trivial differences in defining true configurations have
been reported to give pronounced discrepancies in the results. These discrepancies will be discussed
in terms of the distinction between unrepeated and repeated designs (cfr. p. 107).
Sherman used the same basic design22 as Young (he has been one of Young's coworkers). The main
difference is that, unlike Young, Sherman generated true configurations which were all completely
independent of each other. While there were striking over-all similarities in the results, the
discrepancies are important in the present context. There were three types of discrepancies (cfr.
Sherman, 1970, Fig. 16 and Fig. 17, pp. 52-53).
a) For a given error proportion Sherman failed to find that stress uniformly increased with n.
While not pronounced, this result is quite disturbing from the point of view of the metamodel,
which leads us to expect G to move closer back to L with increasing n and thus AF-stress
consistently to increase with n (as Young found).
22
Actually Sherman's design was far more extensive than Young's. Using the terminology on p. ## Sherman for
p = 2 analyzed each M vector with pl = 1, 2 and 3, and for each value of t (1, 2 and 3) M was analyzed with
m = 1 and 3.
b) For given values of (n, t, E) Sherman found both TF and AF clearly worse than Young did.
This implies that Sherman's results would give a more pessimistic view than Young’s as to the
possibilities of recovering true configurations from noisy data.
c) For all plots of (n, AF) and (n, TF) Sherman found more irregular curves than Young.
Sherman attributes all these discrepancies to the differences in the ways the true configurations were
generated: "all of our configurations were random and independent. There were overriding
dependencies in his (Young's) similarities data which reduced variance and lessened the effect of
random error" (op. cit. pp. 51, 54). It is, however, by no means clear how this could produce all the
above mentioned discrepancies.
In terms of the distinction between unrepeated and repeated designs Young's design may be labelled
a partially repeated design. Since his configurations were not completely independent his design is
not an unrepeated one (as Sherman's clearly is); on the other hand the configurations were not
completely dependent (identical), so neither is Young's design a repeated one. We now argue that
neither discrepancy a) nor b) above is likely to occur as a consequence of unrepeated vs. repeated
designs. If this is the case there is even less reason to ascribe main effects to the distinction between
repeated and partially repeated designs, as done by Sherman.
A basic premise in the argument is that the major systematic source of variance in AF and TF is the
error proportion E, producing between-cell variance. Generally the variance within cells (specific
combinations of n, t and E) may be regarded as generated by two minor sources of variance. One
such source is that each replication within a cell will give a separate pattern of random error. This
source of within-cell variance is common to both unrepeated and repeated designs. The second
source of within-cell variance is the influence of different true configurations. This source is of course
restricted to unrepeated designs.
Sherman's argument may now be restated as a claim that different true configurations not only
contribute to within-cell variance but also produce systematic between-cell effects, cfr. a) and b)
above. A much more likely possibility is that different true configurations mainly contribute to within-cell
variance and also contribute somewhat to unsystematic between-cell variance. This will produce
more irregular curves, so Sherman's argument may have some validity for c) above.
If it really were the case that different true configurations produced pronounced systematic between-cell
variance, then two different repeated designs, each based on replications from just a single
configuration, would produce markedly discrepant results. Incidentally, if this were the case any
general attempt to predict TF would be doomed to failure.23
The basic design in Section 4.6 is an unrepeated design which gave results closely resembling
Young's and thus gives support to the critical comments on Sherman's analysis. Unfortunately,
however, it does not explain why he got such deviant results.
Hopefully this discussion serves to underscore not only the distinction between repeated and
unrepeated designs, but also the wider concern of general replicability of results. Since unexpected
error may creep in when complex chains of programs are used (as is necessary when investigating
true fit), replicability is a major concern.
In the next section the metamodel is put to use and implications drawn which give a simple framework
for assimilating the relations between the main variables and lead directly to the type of analysis done
23
By using the technical apparatus of analysis of variance one could get more precise information on the effect
of varying true configurations. More complex designs (for instance repeated design within cells, unrepeated
between cells) might also be illuminating. We do, however, feel that analysis of variance may not be the optimal
tool in an explorative work like the present. The results to be reported in Section 4.6 show overriding
consistencies in the results. At present the best strategy seems to be to focus on such consistencies. Later
research may then bring out the full complexities.
in Section 4.6. To illustrate the general relations we report some of Young’s results in detail and
compare them with our own to show that what we consider essential features are replicable.
4.5
Implications from the metamodel.
In the following treatment it is convenient to consider (n, NL) as independent variables and (TF, AF) as
dependent variables. Dimensionality will be considered fixed. A schematic representation of the
implications for two levels of n and two levels of NL is given in Fig. 1.
Fig. 1.
Schematic illustration of the relation between (n, NL) and (TF, AF). Note that AF is of the same
size in a) and d) and that TF is the same in b) and c).
Fig. 1 illustrates the two sets of main effects to be expected from the metamodel:
a) Increase n. TF decreases (improves) and AF increases ("looks worse"). In other words G moves away
from M towards L, that is, purification increases with n.
b) Decrease NL (reduce noise). Both TF and AF decrease (improve). For a given combination of (n, t),
NL, AF and TF are all highly intercorrelated. One might say that for any such combination there is just
one degree of freedom. This is a basic fact which makes estimation of TF from AF (or from knowledge
of NL) possible.
Before proceeding with the joint implications of a) and b) it may be advantageous to present empirical
support for these effects. Tables 4 and 5 also serve to confirm the critical analysis of Sherman’s failure
to replicate Young's findings discussed in the previous section. The only difference between our own
procedure and that of Young is the present use of an unrepeated design in contrast to Young's
partially repeated design.
Table 4.   Stress as a function of error proportion, E, number of points, n, and dimensionality, t. A
           replication of results reported by Young (1970, Table 3, p. 467).

   n = 10
                        t = 1                 t = 2                 t = 3
      E             own      Young        own      Young        own      Young
      0            .000     .0005       .0004     .0023       .0006     .0055
     .10           .0348    .0407       .0126     .0162       .0087     .0121
     .20           .0925    .0895       .0520     .0455       .0338     .0336
     .35           .1746    .1565       .0733     .0926       .0591     .0654
     .50           .2230    .2215       .1098     .1378       .0752     .0738
    means          .1050    .1017       .0504     .0591       .0355     .0381

   n = 15
      0            .0004    .0013       .0007     .0025       .0015     .0117
     .10           .0473    .0530       .0300     .0310       .0224     .0271
     .20           .1070    .1099       .0680     .0741       .0532     .0591
     .35           .1822    .1777       .1394     .1257       .0927     .1120
     .50           .2696    .2617       .1750     .1729       .1233     .1366
    means          .1213    .1208       .0826     .0813       .0586     .0673
Table 5.   True fit as a function of error proportion, E, number of points, n, and dimensionality, t. A
           replication of results reported by Young (1970, Table 3, p. 467). The measure of fit is here
           mean z values (from correlations) converted back to squared correlations.

   n = 10
                        t = 1                 t = 2                 t = 3
      E             own      Young        own      Young        own      Young
      0            .9958    .9998       .9968     .9965       .9910     .9863
     .10           .9876    .9954       .9896     .9887       .9823     .9808
     .20           .9844    .9826       .9539     .9567       .9428     .9342
     .35           .9011    .9419       .9117     .8986       .8037     .8082
     .50           .9089    .9281       .7489     .7157       .6864     .6806
    means          .9719    .9903       .9671     .9643       .9417     .9340

   n = 15
      0            .9986    .9993       .9996     .9900       .9988     .9946
     .10           .9968    .9951       .9920     .9950       .9885     .9841
     .20           .9865    .9870       .9780     .9744       .9552     .9625
     .35           .9646    .9621       .8961     .9486       .8772     .8790
     .50           .9057    .9062       .8222     .8135       .7373     .7327
    means          .9884    .9918       .9826     .9833       .9700     .9588
That the difference in design does not produce marked differences in the results is borne out by
inspection of Tables 4 and 5. In each of the tables there are 6 means, and in each of the tables 3
have the higher value in Young's results, 3 in our own results.
We now turn to the main effects. Concerning a) we see that as n increases from 10 to 15 TF improves
for 1, 2 and 3 dimensional solutions alike. This is most readily seen by comparing the means. Not all
the comparisons for separate noise levels are in the expected direction, but perfect results cannot be
expected for a quite limited range of n. For stress the higher mean values for n = 15 than for n = 10 are
completely supported by the results for separate noise levels, both for Young and in the present
results. Concerning b), inspection of each separate column verifies this effect in both investigations.
Consider now the joint implications of a) and b) above.
From the point of view of TF there must be a compensatory relation between n and NL since an
increase in n can be balanced by an increase in NL. This is illustrated in Fig. 1 b) and c).
For TF a convenient way to summarize these relations might then be to draw a set of indifference
curves, for instance one such curve for each of the proposed category boundaries in Section 4.4. A set
of such curves will be referred to as TF contours from NL. TF contours from NL are presented in Figs.
10, 11 and 12 in Section 4.6. These figures may for instance be useful in estimating TF from a known
(or presumed) reliability or conversely they may be used to estimate the reliability necessary to
achieve a desired level of TF.
From the point of view of AF there must also be a compensatory relation between n and NL, but
opposite to that for TF. An increase in n can be balanced by a decrease in NL, as illustrated in Fig. 1 a)
and d). This is perhaps the most dramatic way of underscoring the fact that stress cannot be
interpreted independently of n (and NL). For a high n and low NL a given stress value may indicate
close to perfect fit, while the same stress value for low n and high NL may indicate a practically
worthless solution.
In practice one will not as here consider AF as a “dependent variable”, but as a predictor of TF. Just as
we can construct TF contours from NL we can construct TF contours from AF-stress. Since NL and
AF are highly correlated (for given (n, t) combinations) we would expect these two sets of contours to
be similar in appearance. One might indeed say that the two sets of contours are nothing but
alternative ways of presenting the constraints among TF, AF and NL. The positive slope in the TF
contours constructed from AF-stress indicates that in order to maintain a given TF value, increase in
AF must be compensated by increase in n.
Since TF contours from AF-stress provide a convenient way of estimating TF from AF and are our
answer to the problem of precision, a detailed report of how these contours, cfr. Figs. 3, 4 and 5,
actually are constructed will be given in Section 4.6.
In principle the TF contours from NL contain information on amount of purification, but in a rather
inconvenient form since in these contours quite different units are used for NL and TF. More
convenient - and practically useful - ways of representing amounts of purification will be presented in
Section 4.7.
This section will be concluded by some supplementary comments on the metamodel. Generally we
have two ordinal relations on the three lines in the metamodel:
TF < NL (purification)
AF < NL
(In Section 3.2 we argued that AF-stress < NL-stress (p. 39), and since generally we have
found different indices for the same relation to be highly correlated, this inequality should be
fairly generally valid.)
The metamodel would be much more powerful and simpler to use if these two ordinal relations could
be supplemented by linearity. That would imply that G could be represented between L and M on the
line LM. Unfortunately there are considerations which rule out such linearity.
b) implies that as NL increases both TF and AF increase. As NL moves towards its maximal value
(corresponding to r(L, M) = 0) TF will move towards its maximal value. This value must be represented
as equal in length to the maximal value for NL since also the maximal value of TF will be represented
by a correlation of 0, r(L, G) = 0. Increasing NL forces G away from L, but since AF also increases, G
is simultaneously forced away from L and M and consequently away from the LM line. When
maximally distant from M, G will be equally distant from both L and M.
Concerning a) one might perhaps think that theoretically it could be possible for TF to decrease with
increasing n without simultaneously increasing AF. TF could for instance decrease just by G moving
closer to the LM line (if this movement took place along a circular curve AF would stay constant). But if
such were the case it would not be possible for G to move arbitrarily close to L as n increases without
bounds. This would seem to be an intuitively natural boundary condition. Unlike Sherman (1970), cfr.
a) p. 128, we have found no reason to doubt the general validity of a) in our rather extensive simulation
studies.
It is to be hoped that further research on the metamodel may turn up more powerful general
statements than appear possible at present.
4.6
Evaluation of precision.
Construction of TF contours from AF-stress
This and the following section will present the material likely to be of most use for empirical analyses
making use of multidimensional scaling. Here dimensionality will be taken for granted, and the
analyzed dimensionality will be the same as the true dimensionality.
Since TF contours from NL are tightly interlocked with TF contours from stress, this section also
presents TF contours from NL. This serves to bring out the joint (TF, AF, NL) structure and provides
an additional approach to validate the procedure here used.
As reported in Section 4.5 a replication of some of the results presented by Young (1970) for n = 10
and 15, t =1, 2 and 3 was done. Since the results were found satisfactory, cfr. Tables 4 and 5, the TF
contours from stress could in principle have been constructed from Young's results and in fact part of
his results were utilized. There were, however, two reasons making additional runs necessary.
First, since only mean results were reported by Young, it would not have been possible to check how
close estimated and known true fit would be for individual configurations. Second, Young's design did
not include sufficiently high noise levels.
Additional runs were made for error proportions E =.75, 1.00, 1.25, 1.50 and 2.00. Since Young had
no values of n between 15 and 30, several runs were also made for n = 20.
Finally, additional runs were made for n = 6 and n = 8. Altogether 450 configurations were run in order
to construct the contours. These 450 configurations were distributed as follows in the 5 TF categories:
60, 62, 81, 119 and 128 from “Excellent” to “Unacceptable”, that is with a pronounced emphasis on the
”Fair” and “Unacceptable” categories.
The preliminary replication of Young's n = 6 and n = 7 used TORSCA. All other runs were made with
MDSCAL using rank factor analysis to give the initial configuration. An unrepeated design was used.
The coordinates for each configuration were selected from a uniform random distribution. The
Ramsay-Young error process was used.
For each condition (separate combination of n and t) the first step was to compute the mean of AF and
TF over the 5 replicates for each value of E. For each condition a (TF, AF) curve was plotted. A sample
of such curves is presented in Fig. 2.
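In outline, each simulated configuration goes through the steps described above: random uniform
coordinates, noise added to the true distances, a nonmetric analysis, and computation of AF and TF.
The sketch below is only an illustration of that loop: the error model shown is a simple multiplicative
stand-in and is not the Ramsay-Young process actually used, the nonmetric analysis is a generic
implementation rather than MDSCAL, and TF is indexed here by the correlation between true and
recovered interpoint distances.

    import numpy as np
    from sklearn.manifold import MDS
    from sklearn.isotonic import IsotonicRegression

    def pairwise_distances(X):
        return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    def kruskal_stress1(dissimilarities, distances):
        # Kruskal's stress formula 1: monotone (isotonic) disparities fitted to the
        # configuration distances, both given as vectors over the point pairs.
        disparities = IsotonicRegression().fit_transform(dissimilarities, distances)
        return np.sqrt(np.sum((distances - disparities) ** 2) / np.sum(distances ** 2))

    def one_replication(n=15, t=2, E=0.35, seed=0):
        rng = np.random.default_rng(seed)
        L = rng.uniform(size=(n, t))                    # latent (true) configuration
        D = pairwise_distances(L)                       # true distances
        noise = 1 + E * rng.standard_normal((n, n))     # stand-in error process (NOT Ramsay-Young)
        noise = np.triu(noise, 1)
        noise = noise + noise.T
        M = np.clip(D * noise, 1e-6, None)              # "manifest" dissimilarities
        np.fill_diagonal(M, 0.0)
        G = MDS(n_components=t, metric=False, dissimilarity='precomputed',
                random_state=seed, n_init=4).fit_transform(M)   # reconstructed configuration
        iu = np.triu_indices(n, 1)
        AF = kruskal_stress1(M[iu], pairwise_distances(G)[iu])
        TF = np.corrcoef(D[iu], pairwise_distances(G)[iu])[0, 1]
        return AF, TF

Averaging AF and TF over the replications for each value of E then gives one point on a (TF, AF) curve
of the kind shown in Fig. 2.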
Fig 2. Sample of results from simulation studies showing the relation between TF – categories and AF –
stress for selected values of n and t. The points represent means.
These curves serve to illustrate that the TF-categories transformation does give satisfactory linear
relations with stress (linearity is perhaps even more clearly revealed in Figs. 7 to 9; these figures will
be commented on later). The curves also illustrate why conventional linear regression is inadequate:
such regression is based on an additive model, which requires the lines for the different conditions to
be parallel.
One might ask whether there is any specific reason why the TF correlation should be transformed to
give a linear relation with stress, and the answer is the pragmatic one that linear relations are easier to
work with.
Values of AF for each TF category boundary were read from the (TF, AF) plots. In this process
irregularities in the curves were smoothed out, and this smoothing was simplified by the fact that the
curves were basically linear.
For each dimensionality a preliminary table was made from the smoothed (TF, AF) curves. Each
column of such a table represented a specific TF-category boundary and contained the
corresponding stress values for the different levels of n included in the study. To a minor extent these
tables were filled out from Young's results; where both the present simulation studies and Young's
results gave values for the same cells, the results showed remarkably close correspondence.
A plot of a column in such a table is then a TF contour, cfr. Figs. 3 - 5. Actually the plots from the
preliminary tables did not produce quite such smooth curves as in these figures. Before arriving at the
final TF contours some additional smoothing was found necessary; this was again checked against
the original (TF, AF) plots. It was hoped that the smoothing processes would reduce noise in the
curves. As may be seen from Fig. 2, however, the amount of smoothing required was not extensive.
Fig 3.
TF contours from AF – stress for 1 dimensional configurations. Each curve shows a TF category
boundary (contour) as a function of AF and n. Also included is a curve showing the 5%
significance level.
Fig. 4. TF contours from AF – stress for 2 dimensional configurations. Each curve shows a TF category
boundary (contour) as a function of AF and n. Also included is a curve showing the 5%
significance level.
Fig 5. TF contours from AF – stress for 3 dimensional configurations. Each curve shows a TF category boundary
(contour) as a function of AF and n. Also included is a curve showing the 5% significance level.
Fig 6. Relation between TF (expressed as categories and correlations) and n for configurations in 1, 2 and
3 dimensions when stress = 0.
Fig 7. Relations between AF – stress and TF – categories for 7, 12 and 25 points and crossvalidation
results for 1 dimensional configurations.
Fig 8. Relations between AF – stress and TF – categories for 9, 12 and 25 points and crossvalidation
results for 2 dimensional configurations.
Fig 9. Relations between AF – stress and TF – categories for 9, 12 and 25 points and crossvalidation
results for 3 dimensional configurations.
Points from the curves in Fig. 6 are read off to provide the values at the TF-categories for stress = 0 in
the (TF, AF) curves in Figs. 7 to 9.
The curves in Fig. 6 were constructed in the same way as the TF contours from AF. First, preliminary
values for the curves were read off from the original (TF, AF) curves as presented in Fig. 2. A similar
smoothing process was used as that described for constructing TF contours from AF-stress.
A minor technical comment may be made concerning the results presented in Fig. 6. Young (1970, p.
466) stated that MDSCAL and TORSCA are equally adept at recovering the underlying true
configuration in the errorless case. A closer look at the curve for t = 2 in Fig. 6 may lead to some
qualification of that statement, at least for higher values of n. For the n = 30, t = 2 condition Young
found for instance a root mean square correlation 24 of .999914 vs. Shepard's .999998. In terms of the
TF-categories scale this difference is not as minor as it appears at first sight; it reflects roughly .4 units
in the TF-categories scale. The difference for n = 15, Young .99951 vs. Shepard .99991, corresponds to
roughly .2 units in the TF-categories scale. These differences do represent a slight, but nevertheless
clearly discernible, superiority of MDSCAL in the errorless case, a point also mentioned in the
comparison of algorithms in Section 4.32.
24
In the errorless case and for high n different random configurations give remarkably similar TF-correlations,
so that the way these correlations are averaged is unimportant.
The curves in Fig. 6 are, as previously mentioned, somewhat influenced by also using results from
Young. Generally, however, they give a picture of recovery falling between the results of Young and
Shepard. That the present results verify the superiority of MDSCAL for error-free data may be
discerned in Fig. 8, where the crossvalidation results both for n = 12 and n = 25 reveal that the points
for stress = 0 read from Fig. 6 fall short of the precision in the crossvalidation results. While more
extensive investigations probably would lead to revision of the curves in Fig. 6, it is hard to
see that these revisions would have much, if any, practical significance.
Turning now to the other special case, the analysis of random sets, there are, as previously mentioned,
several studies reported in the literature. Unfortunately, none of the studies were sufficiently complete
to be directly included in Figs. 3 to 5. The study by Klahr (1969) did not include values of n beyond 16,
and Wagenaar and Padmos (1971) stopped at n = 12. On the other hand Stenson and Knoll (1969) chose
to compute averages from just three cases; this could only suggest rough rules of thumb for testing
statistical significance.
The first step was then to run 50 random sets with 30 points in 1, 2 and 3 dimensions. Especially for 1
dimension there was a clear difference, as the present results gave a mean of .515, while the value
from Stenson and Knoll (1969, Fig. 1, p. 123) was .53. In view of the small standard deviation this was
judged to be a significant difference. A possible explanation could be that the present approach, using
rank factor analysis to give the initial configuration, avoided some local minima. Consequently, 50 sets
of random data for n = 6, 8, 10, 12, 16, 20, 30 were analyzed in 1, 2 and 3 dimensions. Especially for 1
dimension the present results consistently gave stricter values of stress than previous studies. While
for instance Wagenaar and Padmos for n = 12 report the 5% value as .395, the present results gave
.37. Using previously published results might thus give too many false rejections of the null hypothesis;
the present results are somewhat more exacting. Plotting the results from my own studies did not give
quite as smooth curves as those presented in Figs. 3 to 5, and some smoothing was done.
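The random-set baseline itself is a small Monte Carlo exercise: generate random dissimilarities, run the
nonmetric analysis, and take the 5% point of the resulting stress distribution. The sketch below is only
an illustration of that procedure (uniform random dissimilarities, a generic nonmetric MDS and 50 runs,
mirroring the counts used above); it is not the program actually used here.

    import numpy as np
    from sklearn.manifold import MDS
    from sklearn.isotonic import IsotonicRegression

    def stress1_of_random_set(n, t, rng):
        # Kruskal stress-1 of a nonmetric solution for one random dissimilarity matrix.
        iu = np.triu_indices(n, 1)
        M = np.zeros((n, n))
        M[iu] = rng.uniform(size=iu[0].size)        # random "dissimilarities"
        M = M + M.T
        G = MDS(n_components=t, metric=False, dissimilarity='precomputed',
                random_state=0).fit_transform(M)
        d = np.linalg.norm(G[:, None, :] - G[None, :, :], axis=-1)[iu]
        d_hat = IsotonicRegression().fit_transform(M[iu], d)
        return np.sqrt(np.sum((d - d_hat) ** 2) / np.sum(d ** 2))

    rng = np.random.default_rng(0)
    stresses = [stress1_of_random_set(n=12, t=2, rng=rng) for _ in range(50)]
    print(np.percentile(stresses, 5))   # Monte Carlo estimate of the 5% stress cutoff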
We may now see quite simply how the analysis of random sets is a limiting case of various noise
levels. The points with maximal stress on the curves in Fig. 2 correspond to points on the random set
contours in Figs. 3 to 5. Turning forward to Fig. 13 may perhaps serve to clarify this even further. The
terminal points on the (NL, AF) curves are directly read off from the random set contours. In Fig. 5 we
may for instance see that for n = 10 the expected stress value for random sets is .10. In Fig. 13 we
then see that when NL is maximal this value is used as the last point on the corresponding (NL, AF)
curve.
From Figs. 3 to 5 it is now easy to see what practical difference it makes to use the conventional 5%
limit vs. the cutoff point of TF = 4 arrived at in Section 3.4. For 1, 2 and 3 dimensional
configurations alike the 5% criterion is stricter for n less than 12, whereas the reverse is the case for n
greater than 12. We shall later see that the confidence one can have in estimates of TF for n less
than 12 is unfortunately not completely satisfactory, so we do not have any definite advice as to what
criterion to use in such cases.
Leaving out marginal cases for small values of n, there are strong arguments to be made for the
present approach as a better alternative than conventional hypothesis testing. A basic critique against
the customarily reported "p-value" is that it says nothing about what should be of major concern,
namely the strength of the relationship, or in the present analysis the amount of structure. The criticism
directed at conventional hypothesis testing by Edwards et al. (1963), and more generally discussed by
Bakan (1966), is highly relevant in the present context. The reader who clings to the magical number 5
should rest assured when he convinces himself from Figs. 3 to 5 that the presently proposed cutoff
point is clearly stricter when n is greater than 12. Our advice would be to look at Figs. 2 and 3 in
Section 3.4 and then the user may decide for himself whether he will accept the presently proposed
criterion, or whether (as might not be unlikely in many cases) he may prefer an even stricter criterion
before he looks further at his configuration.
It is to be hoped that the present approach will be seen in the much more fruitful context of estimating
strength of relation (here amount of structure) than the sterile emphasis on just achieving statistical
significance. One purpose of the approach in the present work is to help steer work in
multidimensional scaling away from the trap of overemphasizing statistical significance testing which
probably has done much to make psychological research as sterile as it often is. As Bakan (1966)
points out, one reason for the popularity of statistical tests of significance is that it relieves the
researcher of making explicit the basis for his decisions. We join Bakan in trying to restore the
responsibility of the researcher for his decisions and hope that the present approach may contribute in
that direction.
Turning now to the evaluation of the present procedure for estimating true fit, it will be most convenient
first to present graphically the results from a crossvalidation study, then to bring out the interlocking
(TF, NL, AF) structure by also bringing in the TF contours from NL, and finally to present a numerical
evaluation of the present procedure.
Since some amount of smoothing was involved in construction of the TF contours from AF-stress,
there is an undefined number of degrees of freedom in the procedure and also the possibility that the
smoothing mainly uses features unique to the results from the 450 configurations and that
extrapolation to other values of n would give substantially different results. Consequently
crossvalidation is very important. Will estimates of true fit for values of n not included in the original
study be as good as estimates in the original study? The original study used values of n = 6,10,15,20
and 30.
For the crossvalidation study n = 7 for 1 dimensional configurations, n = 9 for 2 and 3 dimensional
configurations, and n = 12 and n = 25 for 1, 2 and 3 dimensional configurations were deemed adequate
values 25. For each condition 5 levels of E were used, and 5 replications were run for each (n, t, E)
combination, making a total of 225 configurations in the crossvalidation study. For 1 dimensional
configurations E = 0, .20, .50, 1.00, 1.50 and for 2 and 3 dimensional configurations E = 0, .20, .35,
.50, 1.00.
From the TF contours from AF-stress the relevant functions relating TF-categories and AF-stress
were constructed. This step has already been specified as necessary to arrive at more precise
estimates of TF, cfr. Figs. 7 to 9.
If the procedure really does permit interpolation, then the results from the crossvalidation study should
fall on the (TF, AF) curve constructed on the basis of Figs. 3 to 5. In each of Figs. 7 to 9 a circle
represents the joint (TF, AF) mean from 5 crossvalidation configurations. Generally the
correspondence of these means with the independently constructed (TF, AF) curves will be seen to be
excellent.
When constructing TF contours from NL the first step was to find a transformation of r(L, M) which
was linearly related to TF-categories. It was found that

    NL1 = √(1 − r(L, M)²)

was a good transformation for this purpose. The steps in constructing the TF contours from NL exactly
paralleled those for TF contours from AF-stress and will not be further discussed. The resulting
contours are shown in Figs. 10 to 12.
25
For the results for 6 points in 2 and 3 dimensions it was not possible to estimate TF from AF. We recommend n =
8 as the lowest value to use for dimensionality greater than 1.
Fig 10. TF contours from NL for 1 dimensional configurations. Each curve shows a TF category boundary
(contour) as a function of NL and n.
Fig. 11 TF contour from NL for 2 dimensional configurations. Each curve shows a TF category boundary
(contour) as a function of NL and n.
Fig. 12 TF contour from NL for 3 dimensional configurations. Each curve shows a TF category boundary
(contour) as a function of NL and n.
Since by equation (2) in Ch. 3

    r(L, M)² = r(M1M2)

we then get:

    r(M1M2) = 1 − NL1²

For instance NL1 = .20 corresponds to r(M1M2) = .96. For convenience both NL1 and r(M1M2) are
given in Figs. 10 to 12.
This permits a simple way of estimating TF from knowing e.g. the test-retest reliability. Notice for
instance that for high values of n adequate levels of TF should result for surprisingly low values of
r(M1M2). If for instance n = 30 and t = 1 one should get a very good true fit even with a reliability of, say, .68.
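In code the two conversions are one-liners; a small illustration in which the function names are our own
and the numbers follow the example just given:

    import math

    def reliability_from_nl1(nl1):
        # Test-retest reliability r(M1, M2) corresponding to a given NL1.
        return 1.0 - nl1 ** 2

    def nl1_from_reliability(r_m1m2):
        # NL1 corresponding to a given reliability, since r(M1M2) = 1 - NL1**2.
        return math.sqrt(1.0 - r_m1m2)

    print(reliability_from_nl1(0.20))              # 0.96, as in the text
    print(round(nl1_from_reliability(0.68), 3))    # about 0.566; enter Fig. 10 with this value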
The relation between true fit and reliability will be further pursued in the next section. In the present
context we wish to emphasize that one may combine the results from TF contours from AF with results
from TF contours from NL to get results on (NL, AF) curves. These curves should be linear as there is
linearity both in the (TF, AF) curves and in the (TF, NL) curves.
To give a concrete example, consider the condition n = 15, t = 1 and take the contour TF = 2. We then
get:
AF = .205 from Fig. 3
NL = .47 from Fig. 10
and then the point (.47, .205) is located on the corresponding (NL, AF) curve from the original data in
Fig. 13.
Fig 13 Some comparisons between the relation of AF and NL based on Figs. 3 – 5 and Figs. 10 – 12 and
the relation of AF and NL in the original results
Notice both the linearity and the generally excellent fit this process gives on the representative curves
in Fig. 13. The concordance in Fig. 13 provides a very good overall check that the smoothing involved
in constructing both sets of TF contours has not removed us appreciably from the original data.
Fig. 13 also serves to bring out a difference between the present estimation process and usual
regression techniques. In usual regression techniques there is an asymmetry in that there are "two
regression lines". There is, however, no asymmetry in the present procedure; the results presented in
Fig. 13 imply that, regardless of what is regarded as the predictor and what as the variable to
be predicted, the present procedure will give the same results.
One drawback of using a graphical procedure is that the numerical evaluation of the estimation
procedure is a bit tedious. The following steps are necessary:
From the TF contours from AF-stress, construct (TF, AF) curves for all relevant conditions. Entering such a
curve with an observed value of AF one then reads off from the (TF, AF) curve the estimated value of
TF. This will subsequently be referred to as TF|AF. An exactly parallel procedure is used to estimate
TF from NL; these estimates will be referred to as TF|NL.
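Once a (TF, AF) curve has been read off, the estimation step itself is just piecewise-linear interpolation.
A minimal sketch, in which the boundary points are invented placeholders rather than values taken
from Figs. 7 to 9:

    import numpy as np

    # Hypothetical (AF, TF-category) points read off the (TF, AF) curve for one
    # (n, t) condition; replace them with values taken from the relevant figure.
    af_points = np.array([0.00, 0.05, 0.10, 0.15, 0.20])
    tf_points = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

    def tf_given_af(af_observed):
        # TF|AF: enter the curve with an observed stress value and read off the
        # estimated TF category (linear interpolation between the read-off points).
        return float(np.interp(af_observed, af_points, tf_points))

    print(tf_given_af(0.12))   # 3.4 with the placeholder points above

An exactly parallel function, fed with points read from Figs. 10 to 12, gives TF|NL.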
A necessary condition for the present procedure to be valid is that there is a fairly high
correspondence between these estimates of TF and the known true fit. Since in many cases one will
just be interested in the extent to which the present procedure gives the correct TF category, we first
present the % concordance for each condition.
Table 6.   Percent of cases in each condition where the estimation procedure gives correct category
           placement, concordance.
           TF     : short for TF category
           TF|AF  : TF categories estimated graphically from AF-stress
           TF|NL  : TF categories estimated graphically from NL
           N      : number of configurations in a given condition

a) Original study (450 configurations)

    n    t    N     TF, TF|AF    TF, TF|NL    TF|AF, TF|NL
    6    1   40        55            70            86
    8    2   20        75            95            90
   10    1   50        70            76            75
   10    2   25        72            80            95
   10    3   40        68            90            85
   15    1   50        86            70            90
   15    2   40        85            70            93
   15    3   40        83            82            79
   20    1   30       100            85           100
   20    2   35        80            70            90
   20    3   20        85            92            85
   30    1   20        80            93            95
   30    2   20        90            85           100
   30    3   20        90           100            95
             450

b) Crossvalidation study (225 configurations)

    n    t    N     TF, TF|AF    TF, TF|NL    TF|AF, TF|NL
    7    1   25        60            72            68
    9    2   25        76            80            88
    9    3   25        44            72            56
   12    1   25        80            88            92
   12    2   25        84            72            72
   12    3   25        76            88            80
   25    1   25        92            96            92
   25    2   25        84            84            88
   25    3   25        84            92            76
             225
The first thing to notice from Table 6 is that there are marked differences between the results for n less
than, say, 12 and for n greater than or equal to 12. This is especially pronounced in the column of most
importance in the present context, the concordance between TF and TF|AF. For n less than 12 the
percent never goes above 80, whereas for n greater than or equal to 12 there is just one stray
condition where the percent drops to 76 (n = 12, t = 3 in Table 6b).
Consequently summary statistics will be presented separately for these two regions of n. These
summary statistics include linear correlation and since TF, TF|AF and TF|NL are all expressed in the
same units (the TF-categories scale) the root mean square discrepancy between each of the three
pairs is also a relevant statistic.
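These summary statistics are straightforward to compute once TF and TF|AF (or TF|NL) are available for each configuration. A minimal sketch (Python with NumPy; the rounding to the nearest category used for concordance is an assumption about how category placement is scored):

```python
import numpy as np

def summary_statistics(tf_true, tf_est):
    """Correlation, root mean square discrepancy and percent concordance between
    true TF categories and estimated TF categories (both in the TF-categories scale)."""
    tf_true, tf_est = np.asarray(tf_true, float), np.asarray(tf_est, float)
    r = np.corrcoef(tf_true, tf_est)[0, 1]
    rms = np.sqrt(np.mean((tf_true - tf_est) ** 2))
    concordance = np.mean(np.round(tf_est) == np.round(tf_true)) * 100
    return r, rms, concordance

# Toy illustration with invented category values:
print(summary_statistics([1, 2, 3, 2, 1], [1.2, 2.1, 2.7, 2.4, 1.0]))
```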
Table 7. Summary of results from
a) original study
b) crossvalidation study
TF    : TF-categories
TF|AF : TF-categories estimated graphically from AF-stress
TF|NL : TF-categories estimated graphically from NL

                              Correlation            Root mean square discrepancy     Percent concordance
Type of comparison       n < 12  n ≥ 12  Total        n < 12  n ≥ 12  Total            n < 12  n ≥ 12  Total
TF, TF|AF        a)       .927    .984   .965          .51     .28     .39              .67     .86     .79
                 b)       .920    .983   .967          .52     .30     .39              .56     .83     .74
TF, TF|NL        a)       .961    .991   .989          .36     .22     .29              .81     .75     .75
                 b)       .975    .991   .986          .30     .25     .27              .71     .90     .87
TF|AF, TF|NL     a)       .956    .993   .980          .39     .17     .28              .91     .83     .86
                 b)       .949    .991   .980          .41     .20     .29              .93     .85     .79

Number of configurations: a) n < 12: 175, n ≥ 12: 275, total 450; b) n < 12: 75, n ≥ 12: 150, total 225.
These results seem highly encouraging. Notice that the correlations are analogous to multiple
correlations. The graphical procedure used to estimate TF from AF implicitly takes account of n and t.
Young (1970, p. 472) reported that numerous attempts at multiple regression generally accounted for
75 to 85% of the variance, occasionally more than 90%. In a previous report (Young, 1968, p.22) he
reported a maximal multiple correlation .965, by a curious coincidence this is exactly the value found in
the original study done here.
In view of our previous discussion of unrepeated designs (as the present) vs. partially repeated (as
Young) this is especially encouraging since that analysis would lead us to suspect more irregularity in
the present type of design. That this does not seem to be the case would then indicate that the effect
of completely different configurations is not very marked. This should be one basis for ascribing some
generality to the present results.
One might perhaps object that comparison with Young's maximal multiple correlation is somewhat
unfair, since a larger range of values was observed in the present investigation (E covered a much larger range here).
Restricted range (as in Young's study) might reduce correlation. This however, is probably balanced
by a tendency observed here for standard error of estimates to increase somewhat for the higher
values of TF - categories. Furthermore the root mean square discrepancy and % concordance which
are not similarly influenced by range as correlation is, bear out the present favourable results.
Comparing the original and crossvalidational results confirms the graphical correspondence displayed
by Figs. 7 to 9. Both for correlation and root mean square discrepancy there are no differences in the
results. 26
The simplest way to summarize the confidence one can have in the estimates of TF would be to say
that provided n is larger than 12 the odds in favour of the present procedure leading to the correct TF category are better than 4 to 1.
Yet of course there is room for improvement and especially refinement. The most important task would
be to try to replace the present graphical procedure by analytical methods. This would imply finding
analytical expressions for the coefficients of linear equations in (TF, AF) curves, cfr. Figs. 7 to 9.
These coefficients would have to be expressed as functions of n and t. Provided such analytical
expressions could be found it would be a very simple task to plug in an estimate of TF in MDSCAL.
The present procedure could be improved further by finding a more elegant transformation of r than
TF-categories. After first transforming r to give K, cfr. equation (8) in Ch. 3, it will be recalled from
Section 3.4 that K was further transformed by a broken curve composed of three linear segments. A
smooth transformation would be aesthetically more satisfying and perhaps also lead to some
improvement.
A somewhat puzzling aspect of the present study is that there is a perhaps slight but highly consistent
tendency for estimates of TF from stress to be poorer than estimates of TF from NL. The use of for
instance K* (based on rank images) would hardly improve the estimates, as in numerous conditions K*
and S were correlated and the correlation stayed well above .99. It may be the case - as implied by the
discussion of the fine grain of the results in Section 4.32- that minimizing S is not generally an optimal
target from the point of view of true fit and that this somehow is related to the poorer performance of
stress as a predictor variable than NL.
The major conclusion of the present section, however, should be to focus on the fact that largely
inspired by Young it has been possible to reach a goal which he first formulated. A major further goal
should probably at first not be refinement, but rather extension. The reader may wish to review Section
4.2 to bring into focus that we have explored only a very simple case of the general possibilities. A
major task would be to explore the more complex cases to provide a better basis for applying the
present results to the variety of empirical studies using multidimensional scaling.
4.7 Evaluation of dimensionality and applicability
Application of the extended form of the method
In order to provide information on dimensionality and applicability the extended form of the metamodel,
cfr. Section 2.2, must be applied. Since the design used in this section has a fairly complex structure, it
will be convenient first to have a general outline of the design. After the presentation of this outline
implications of the extended form of the metamodel will be schematically represented. The
representation will then serve as a framework for discussion of results from two simulation studies.
26
One might notice a small difference in the present concordances. This is probably a consequence of different
distributions of TF-categories in the original and crossvalidational study. A closer investigation of such detail
would at present add little or nothing to the overall picture. Further information on crossvalidation is included in
Section 4.7.
Fig. 14 Schematic representation of the design used for the extended form of the metamodel.
t – true dimensionality, m, m1 and m2 analyzed dimensionality.
Mij and Gij : Empirical correlations (r(Mi, Mj) and r(Gi, Gj)).
NL and TF : Theoretical correlations (r(L,M) and r(L,G)).
Superscripts refer to analyzed dimensionality.
With only two parallel forms there will be values only in the cells marked with *.
There are two basic features of the design used in the present section. The first is that for each true
configuration, L, several vectors, M1, M2…..Ms, are generated. Each M vector is generated by a
different stream parameter for the random process. In the present studies these M vectors are
generated from different noise levels, E1, E2…..Ene, furthermore there are a number of replications,
rep. for each noise level so that the total number, s, of the M vectors for a given L is s = ne x rep. This
corresponds to a far more extensive design than will usually be the case in practice, when there will
usually be only two parallel forms, M1 and M2. In contrast to a single retest reliability, r (M1, M2), the
present design generates a correlation matrix designated Mij in Fig. 14. The design used here
permits a detailed study of a single configuration, and corresponds to what in Section 4.2 was labelled
a repeated design. Notice that the present design may have a parallel in empirical research if one is
willing to assume that s individuals for e.g. a set of physical stimuli have the same underlying
structure. Since the correlation between any two M vectors can be observed in empirical research, the
Mij correlations are here called empirical correlations. This is in contrast to the correlations between L
and M which usually can not be observed in empirical research (barring the special case of completely
specified hypothesis, cfr. p. 20) and consequently these correlations are here called theoretical
correlations.
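A sketch of this part of the design (Python with NumPy; simple Gaussian noise is used here as a stand-in for the R-Y and W-P error processes of Section 3.2) may make the construction of the Mij matrix concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_m_vectors(L, noise_levels, rep):
    """Generate s = ne x rep manifest vectors from one latent distance vector L by adding
    independent noise streams; a stand-in for the actual error processes."""
    return np.array([L + rng.normal(0.0, e, size=L.shape)
                     for e in noise_levels for _ in range(rep)])

L = rng.random(15 * 14 // 2)                            # latent distances for n = 15 points
M = simulate_m_vectors(L, [0.10, 0.35, 0.50], rep=2)    # s = 6 parallel forms
Mij = np.corrcoef(M)                                    # the empirical correlation matrix of Fig. 14
print(Mij.shape)                                        # (6, 6)
```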
The second basic feature of the design is that each configuration is analyzed in several
dimensionalities, m = 1, 2 and 3. Each value of m generates a set of G vectors and for each such
value an inter-correlation matrix can be computed. cfr. the Gij matrices in Fig. 14. The correlation
between two G vectors can also be observed in empirical research and are also called empirical
correlations here. These correlations contrast with the correlations between L and G(TF), the latter
being labelled theoretical correlations.
The bidiagonals (Mi, Gim) correspond to AF, here indexed by stress.
One major task of this section is to show how comparison of Gij values for the same pair (ij) but for
different values of m can solve the problem of dimensionality, together with information from
corresponding AF values. The final task is then to show how comparison of the appropriate Gijt with
the corresponding Mij value can throw light on applicability. A schematic representation of implications
from the metamodel, both for theoretical. and empirical correlations is presented in Fig. 15.
Fig. 15. Application of the extended form of the metamodel when the analysis is done in varying
dimensionalities for a given true dimensionality, t.
schematic illustration of expected relative size of Theoretical Purification, TP, based on theoretical
correlations and Empirical Purification, EP, based on empirical correlations.
NL|Mij = r(Mi, Mj) converted to the TF – categories scale.
TF|Gij = r(Gi,Gj) converted to the TF – categories scale.
Superscripts refer to analyzed dimensionality. Hatched lines signify that there are no unequivocal
relations to be expected between the position of G relative to M and L.
Fig. 15 illustrates most of the implications from the extended form of the metamodel which will be
tested in this section. These implications have all been more or less directly discussed in Ch. 2.
Implications for theoretical correlations
a) TFt < TFm≠t
A basic premise in the metamodel is that the highest purification (lowest TF value) will result when the
analysis is guided by the correct assumption as to the form of L. This implies that the best TF value
will be found when analyzed dimensionality, m, is equal to true dimensionality, t.
b) TFt < NL
This is nothing but a restatement of the thesis that theoretical purification (TP = NL -TFt) does occur,
cfr. Section 2.l, and as stated above TP is highest in the correct dimensionality.
c) TFm<t > NL
If the analysis is done in too low dimensionality there is an insufficient number of degrees of freedom
to represent the information in L, consequently part of the structure will be lost and we will expect
negative values of TP, that is theoretical distortion. In Fig. 15 this is represented by for instance G1 for
t = 2 being further removed from L than M is.
Implications for empirical correlations
a1) TF|Gijt < TF|Gijm≠t
a1) is proposed as a basic rule for finding the correct dimensionality and states that (since the lower
the TF value the closer the correspondence) the correct dimensionality is found by simply picking the
value of t where the correlation between the corresponding G vectors is highest. a1) is just a
restatement of the selection rule stated on p. 22.
b1) TF|Gijt < NL|Mij
This is just a restatement of the thesis that empirical purification (EP = NL|Mij − TF|Gijt) does occur.
Since NL|Mij is the same regardless of m, the inequality a1) implies that EP will be highest in the
correct dimensionality. Negative values of EP denote empirical distortion.
Note the close parallelism between a) and a1), likewise for b) and b1). This parallelism is related to the
equivalence between EP and TP stated below. It is, however, difficult to state whether we should
expect an inequality
c1) TF|Gijm<t > NL|Mij
parallel to c), that is whether analysis in too low dimensionality will result in empirical distortion or not.
If for instance two M vectors generated from a two-dimensional configuration are analyzed in 1
dimension it could be the case that the analysis happened to come out with more or less the same
one-dimensional configuration in both cases and it would then not necessarily be the case that
r(Gi1,Gj1) would be less than r(Mi,Mj) (if this were the case we would per definition have empirical
distortion). This uncertainty is represented by hatched lines in Fig. 15.
Relation between empirical and theoretical correlations, equivalence between EP and TP
Perhaps the most central thesis in this work is the thesis of equivalence between TP and EP
(assuming correct dimensionality). This thesis was first stated on p. 21 and then given more precise
definition on p. 37 in the discussion of indices for NL in Section 3.2 and is for convenience repeated
here:
r(L,M)  =  r(L,Mi) · r(L,Mj)   =  r(Mi,Mj)        NL equation
 |TP|           |TP|                |EP|
r(L,G)  =  r(L,Git) · r(L,Gjt)  =  r(Git,Gjt)      TF equation
it being understood that TP and EP are both expressed in TF-categories. The equivalence stated
above will be relevant to the problem of applicability.
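The content of the NL equation is easily illustrated by simulation. In the toy check below (Python with NumPy; Gaussian noise again stands in for the actual error processes) the product of the two theoretical correlations approximately reproduces the empirical correlation:

```python
import numpy as np

rng = np.random.default_rng(1)

L = rng.standard_normal(105)                 # latent vector (e.g. a 15-point distance vector)
Mi = L + rng.normal(0.0, 0.5, L.shape)       # two parallel manifest vectors
Mj = L + rng.normal(0.0, 0.5, L.shape)

r_LMi = np.corrcoef(L, Mi)[0, 1]
r_LMj = np.corrcoef(L, Mj)[0, 1]
r_MiMj = np.corrcoef(Mi, Mj)[0, 1]
print(r_LMi * r_LMj, r_MiMj)                 # approximately equal when the structure holds
```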
Two simulation studies have been done to check the implications of the extended form of the
metamodel. One study deals with systematic configurations, the other with random configurations.
Before presenting the results both studies are described.
Description of simulation studies.
Unless otherwise specified, results are reported separately for each configuration for systematic configurations. Each configuration is then a separate condition. For random configurations a condition
includes results for 5 different true configurations.
There are 8 conditions for systematic configurations, 4 different true configurations, all with n = 12, are
analyzed separately with two different error procedures, the Ramsay-Young process (R-Y) and the
Wagenaar-Padmos procedure (W-P), cfr. Section 3.2 for a description of these noise procedures. The
4 configurations are:
t = 1:  i)  a line with equal distance between neighbouring points
        ii) a line where the distance between neighbouring points successively increases
t = 2:  i)  a circular configuration
        ii) a lattice configuration.
This study represents an example of the strategy described in the discussion of "complex qualitative
variables" in Section 4.2, to study a few examples differing in many respects.
There are 6 conditions for random configurations, only the R-Y error process is used. For both n = 15
and n = 20, 1, 2 and 3 dimensional configurations were analyzed. A specific feature of this study is
that there is no trace of interval assumptions of the elements in the M vectors since all results are
based on rank image transformations of M, cfr. p. 38 27. This study is analyzed in more detail than
systematic configurations.
27
Actually rank image transformations, based on G, were computed for all values of m. There was a slight but
fairly consistent tendency for the Mij intercorrelations to be highest when the correct dimensionality was used.
This may provide a further approach to dimensionality, but this possibility has not been explored in any detail.
Generally the results were fairly similar for all values of m, and for m = t the results were as expected very close
to the Mij correlations. In practice dimensionality may first be determined by inequality a1); it is then sufficient to compute the rank image transformation for t, and only these results will be reported. In Figs. 14 and 15 one may
replace the expressions with M with corresponding expressions for M*, cfr. also the survey of' terminology in
Table 8.
Further features of the two studies are outlined below:
                                                     Systematic config.      Random config.
ne  - nr. of noise levels (E)                                4                      3
values of E:   R-Y                                    .0  .20  .35  .50       .10  .35  .50
               W-P                                    .10 .20  .35  .40            -
rep - nr. of replications for each value of E                3                      2
s = ne x rep - nr. of M vectors for each L                  12                      6
con - nr. of configurations in each condition                1                      5
N(T) = con x s - nr. of observations in means
       for theoretical correlations                         12                     30
N(E) = con x s(s-1)/2 - nr. of observations in
       means for empirical correlations                     66                     75
Both these studies provide further crossvalidation results for the contours discussed in the previous
section, this will be separately discussed at the end of the section.
The results will be reported in four steps. First results based on theoretical correlations are reported.
Second, results based on empirical correlations and TF estimated from stress are reported. These two
latter types of results provide the opportunity to see how well the present approach contributes to
solving the problem of dimensionality. Third, results are presented on the relation between empirical
and theoretical purification, the latter based on results presented in Section 4.6. This illustrates our
approach to applicability. Finally crossvalidation results are summarized and compared to the results
presented in Section 4.6.
All results are presented in the TF-categories scale. For convenience Table 8 summarizes the main
symbols used.
Table 8. Survey of terminology.

Theoretical correlations
  What is converted to TF-categories               Expressed in TF-categories
  r(L,M)                                           NL
  r(L,M*Gt)                                        NL|M*Gt
  r(L,Gm)                                          TFm
  NL − TFt  or  NL|M*Gt − TFt                      TP

Empirical correlations
  r(Mi,Mj)                                         NL|Mij
  r(Mi*Gt, Mj*Gt)                                  NL|Mij*Gt
  r(Gim, Gjm)                                      TF|Gijm
  NL|Mij − TF|Gijt  or  NL|Mij*Gt − TF|Gijt        EP

TF|AFm : Graphical estimate of TF from AF-stress, cfr. Figs. 3 - 5.

Note that superscripts always refer to analyzed dimensionality, m. For each value of true dimensionality, t, m always takes on the values 1, 2 and 3.
M* refers to rank image transformations of M, cfr. p. 38.
Results based on theoretical correlations.
The main results are presented in Table 9.
Table 9.
Results based on theoretical correlations. Results where true dimensionality, t, equals analyzed dimensionality are underlined.

a) Systematic configurations

                 Ramsay-Young error process        Wagenaar-Padmos error process
                 t = 1           t = 2             t = 1           t = 2
Configuration    i)     ii)      i)     ii)        i)     ii)      i)     ii)
NL               2.64   2.58     2.67   2.60       3.06   2.84     3.51   3.46
TF3              2.90   2.65     2.26   2.36       3.33   3.11     3.14   3.24
TF2              2.52   2.30     1.80   1.90       3.22   3.02     2.56   2.79
TF1              1.48   1.46     4.33   4.25       2.00   1.65     4.33   4.28
TP               1.16   1.12      .87    .70       1.06   1.19      .95    .67

b) Random configurations

             n = 15  n = 15  n = 15  n = 20  n = 20  n = 20
             t = 1   t = 2   t = 3   t = 1   t = 2   t = 3
NL|M*Gt       2.72    2.78    2.79    2.72    2.71    2.70
TF3           2.85    2.62    2.42    2.83    2.34    2.05
TF2           2.59    2.21    3.38    2.60    1.89    3.44
TF1           1.75    3.75    4.13    1.62    4.04    4.11
TP             .97     .57     .37    1.10     .82     .66
It is readily seen in Table 9 that the implications stated on p. 92 hold good:
a) TF is best when m = t: TFt < TFm≠t
For both random and systematic configurations the underlined TF value is smaller than the other TF
values for each condition.
b) Theoretical purification does occur: TFt < NL (or NL|M*Gt)
As might be expected TP is more pronounced for higher values of n and for lower values of t. As we
shall see in more detail later this implies that for otherwise similar conditions it is easier to check
applicability of the model the higher the value of n and the lower the value of t (This should not be
unexpected since these conditions imply more constraints on the data, and generally the more
constraints, the more vulnerable any model will be).
c) Theoretical distortion will always occur when analyzed dimensionality is too low: TFm<t > NL (or NL|M*Gt)
This is very marked in the present results. Generally the amount of distortion is for instance more than
one unit in the TF scale if a two-dimensional configuration is analysed in one dimension. For example
for the R-Y error process, t = 2, configuration i) and m = 1 the distortion is 4.33 - 2.67 = 1.66.
On the other hand, if analyzed dimensionality is too high there is no consistent tendency either in the direction of purification or of distortion.
The general validity of the three inequalities a), b) and c) is strikingly confirmed by looking at the
individual results. For systematic configurations the relations between the means are based on 4 x 12
= 48 comparisons for each value of t, totally 96 comparisons for this study. For random configurations
there are a total of 6 x 30 = 180 comparisons, making a grand total of 276 comparisons. a) is violated
in 7 of the 276 cases or 2.5%; b) is violated in 4 cases or 1.5%. For c) 1 dimensional configurations are
not relevant, and there are thus 168 relevant comparisons and in 12 of these, or 7.1%, c) was violated.
It should be noted that the violations were by no means randomly distributed, for t = 1 and for n = 20, t
= 2 there were no violations whatever, furthermore there were no violations for the lowest level of E
(highest reliability), that is the violations occurred only with low reliability. This pattern is perhaps not
too surprising and we shall see the same pattern repeating itself in the more practically important
results dealing with how to assess dimensionality.
Results based on empirical correlations.
The main results are presented in Table 10.
Table 10. Results based on empirical correlations. Results where true dimensionality, t, equals analyzed dimensionality are underlined.

a) Systematic configurations

                 Ramsay-Young error process        Wagenaar-Padmos error procedure
                 t = 1           t = 2             t = 1           t = 2
Configuration    i)     ii)      i)     ii)        i)     ii)      i)     ii)
NL|Mij           2.84   2.78     2.88   2.84       3.33   3.03     3.69   3.62
TF|Gij3          2.90   2.78     2.45   2.63       3.10   2.65     3.38   3.33
TF|Gij2          2.54   2.45     1.97   2.09       2.78   2.36     2.80   2.98
TF|Gij1          1.58   1.54     4.04   3.90       2.26   1.74     4.05   4.07
EP               1.27   1.24      .91    .74       1.07   1.29      .89    .64

b) Random configurations

             n = 15  n = 15  n = 15  n = 20  n = 20  n = 20
             t = 1   t = 2   t = 3   t = 1   t = 2   t = 3
NL|Mij*Gt     3.05    3.10    3.12    3.04    3.03    3.02
TF|Gij3       3.01    2.90    2.74    2.95    2.65    2.28
TF|Gij2       2.75    2.50    2.91    2.70    2.10    2.71
TF|Gij1       1.92    2.75    3.55    1.76    3.40    3.07
EP            1.12     .61     .38    1.28     .93     .75
Corresponding to a) and b), a1) and b1) as stated on p. 92 are readily seen to be verified. In every
condition:
a1) TF|Gijt < TF|Gijm≠t

and

b1) TF|Gijt < NL|Mij (or NL|Mij*Gt)
c1) may have some validity for systematic configurations but not generally for random configurations.
The important question is now the extent to which a1) is valid for each single comparison. The
confidence with which we can answer this affirmatively will indicate how far the present approach
provides a simple solution to the problem of dimensionality.
This approach will further be referred to as dimensionality rule a1).
Individual results turn out to depend upon reliability (noise levels), n, t and also type of configuration
(systematic versus random). For systematic configurations there are no pronounced differences
between the two configurations for each t. For each combination of t and error procedure, there are
thus 2x66 = 132 comparisons to be made. A survey of the per cent violations of rule a1) for systematic
configuration is given below:
                  t = 1              t = 2
                R-Y    W-P         R-Y    W-P
% violations      0    5.4         3.7   16.5
The results are generally highly encouraging for such a relatively small value of n. There appears to be
a preponderance of violations for the W-P procedure. This, however, is mainly due to the fact that
generally the reliability is lower for the W-P procedure (cfr. the NL values in Table 10 and the NL|Mij
values in Table 11). When for instance all cases with r(Mi, Mj) < .65 were excluded from t = 2, the W-P % violations dropped to 5.5% while the R-Y % rose to 3.9.28
For random configurations the results were entirely clear for 1 dimensional configurations where there
were no violations whatsoever. For 2 dimensional configurations the results were quite different for
different noise levels. From Table 3, Ch. 3 E1 = .10 corresponds to r(L, M) = .99, similarly E2 = .35 to
r(L,M) = .92 and E3 = .50 to r(L,M) = .81. These values were confirmed in the present study. By the
equation on p. 93 the various combinations of noise levels then roughly corresponds to reliabilities as
outlined below:
         E1      E2      E3
E1      .98     .89     .80
E2              .81     .73
E3                      .66
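The outlined reliabilities follow, at least roughly, from the product rule of the NL equation on p. 93, as the following small computation (Python; the r(L,M) values .99, .92 and .81 are those quoted above) shows:

```python
# Expected retest reliability for a pair of noise levels as the product r(L,Mi) * r(L,Mj).
r_LM = {"E1": 0.99, "E2": 0.92, "E3": 0.81}
levels = ["E1", "E2", "E3"]
for a in levels:
    print(a, [round(r_LM[a] * r_LM[b], 2) for b in levels])
# The products roughly reproduce the outlined reliabilities (e.g. E1E3 = .80, E3E3 = .66);
# the outlined values themselves were obtained from the simulation results.
```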
It will be convenient to present the results separately for results generated only by E1 and E2 and the
results generated by E3. As will be seen from the outline above this roughly corresponds to a
distinction between cases with reliability above .80 vs. cases with .80 and below. Since there were 2
replications per noise level, this gives for each configuration 6 cases produced by E1 and E2 - 2
diagonal and 4 offdiagonal cases - that is a total of 30 such cases for each condition. There will be 9
cases produced by E3 for each configuration (a total of 45 such cases for each condition). Before we can give
details on individual results for 2 dimensional configurations it is necessary to distinguish between
three types of violations of rule a1):
Type 1)  TF|Gij1 < TF|Gij2
Type 2)  TF|Gij3 < TF|Gij2
Type 3)  both Type 1) and Type 2) above.
For Type 1) using rule a1) will lead to too low a dimensionality, correspondingly Type 2) will lead to too
high a dimensionality.
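Rule a1) and the classification of its violations are simple to express computationally; the sketch below (Python; the TF|Gij values are invented for the illustration) does so for a two-dimensional case:

```python
def rule_a1(tf_g):
    """Dimensionality rule a1): pick the analyzed dimensionality m whose TF|Gij value is
    lowest (i.e. whose G vectors correlate highest).  tf_g maps m -> TF|Gij^m."""
    return min(tf_g, key=tf_g.get)

def violation_type(tf_g, t):
    """Classify a violation of rule a1) for true dimensionality t."""
    type1 = tf_g[t - 1] < tf_g[t]     # too low a dimensionality preferred
    type2 = tf_g[t + 1] < tf_g[t]     # too high a dimensionality preferred
    if type1 and type2:
        return "Type 3"
    if type1:
        return "Type 1"
    if type2:
        return "Type 2"
    return "no violation"

tf_g = {1: 2.75, 2: 2.50, 3: 2.90}    # illustrative TF|Gij values for m = 1, 2, 3
print(rule_a1(tf_g), violation_type(tf_g, t=2))
```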
28
A more systematic comparison between the two error procedures would have been possible if different noise
levels had been chosen in such away that the reliabilities would have been the same for the two procedures.
Table 11.
Types of violations of dimensionality rule a1) for 2 dimensional configurations.
N: number of comparisons.

 n   t   Noise levels    Type 1)   Type 2)   Type 3)   Total nr. of violations    N    Per cent
15   2   E1 and E2           4         0         0               4               30      13.3
         E3                 18         1         5              24               45      53.3
         Total              22         1         5              28               75      37.3
20   2   E1 and E2           1         0         0               1               30       3.3
         E3                  6         0         0               6               45      13.3
         Total               7         0         0               7               75       9.3
We see that rule a1) works quite well for n = 15 when reliability is above .80 29 (noise levels E1 and E2)
and for all investigated noise levels when n = 20. There is also a very consistent pattern in the errors
such that when the selection rule fails, it is in the direction of giving an underestimate of true
dimensionality. No such pattern was observed for systematic configurations. We should expect less
violations for n = 15 than for n = 12, nevertheless there appears to be more violation for n = 15, t = 2
than for the systematic configurations with n = 12. These differences between systematic and random
configuration will be further discussed later.
For t = 3 the selection rule completely fails for n = 15, there are 30/75 = 45% violations. For n = 20 the
selection rule may have some value for the noise levels E1 and E2, where there were 7/30 = 23.3%
violations. Altogether there are 27/75 = 36.0% violations for n = 20 and t = 3.
TF estimates from stress.
We now turn to another proposed criterion for determining dimensionality not previously discussed in
this section but stated in Section 2.2, p. 24. The stress for each analyzed dimensionality, m, is
converted to a TF estimate from the figure among Figs. 3-5 with dimensionality m; t is then identified as the value of m where TF|AFm is at a minimum. This is suggested as an alternative to looking for an elbow in
the stress curve, cfr. p. 6. It is a much simpler criterion, since a minimum is easier to identify than an
elbow. Another advantage of this criterion is that it does not require retest procedures. The main
results are presented in Table 12.
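Expressed computationally, rule a2) amounts to nothing more than taking a minimum over the analyzed dimensionalities; the following sketch (Python; the TF|AF values are merely illustrative, in the spirit of Table 12) makes this explicit:

```python
def rule_a2(tf_af):
    """Dimensionality rule a2): tf_af maps analyzed dimensionality m -> TF|AF^m
    (stress converted to TF-categories); return the m with the smallest value."""
    return min(tf_af, key=tf_af.get)

print(rule_a2({1: 3.04, 2: 2.14, 3: 2.21}))   # -> 2
```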
29
It is possible that more refined results might give differences if a given reliability was generated by the same
levels of E for both Mi and Mj than if the levels were quite different. From the outline on p. ## it is for instance
apparent that both E1E3 and E2E2 generate roughly the same reliability. We suspect that the selection rule will not
work so well when a given reliability is generated by widely different noise levels. With just a single reliability
(retest) coefficient, one can in practice of course not know with any confidence whether the underlying noise
levels are similar or not.
Nevertheless a detailed comparison of the success of the selection rule for a given reliability generated by closely
similar versus widely different noise levels might be of interest in further studies.
Table 12. TF estimated from AF-stress.
Results where true dimensionality, t, equals analyzed dimensionality are underlined.

a) Systematic configurations

                 Ramsay-Young error process        Wagenaar-Padmos error procedure
                 t = 1           t = 2             t = 1           t = 2
Configuration    i)     ii)      i)     ii)        i)     ii)      i)     ii)
TF|AF3           2.28   2.43     2.20   2.08       2.68   2.18     3.13   3.15
TF|AF2           2.25   2.20     1.93   1.98       2.50   2.02     2.92   2.90
TF|AF1           1.69   1.63     4.15   3.90       2.10   1.79     4.28   3.88

b) Random configurations

             n = 15  n = 15  n = 15  n = 20  n = 20  n = 20
             t = 1   t = 2   t = 3   t = 1   t = 2   t = 3
TF|AF3        2.38    2.21    2.26    2.28    2.07    2.04
TF|AF2        2.19    2.14    2.98    2.01    1.82    2.85
TF|AF1        1.73    3.04    3.67    1.44    3.19    3.32
Table 12 immediately shows that for every condition:

a2) TF|AFt < TF|AFm≠t
The next question is then to what extent this inequality, dimensionality rule a2), holds good for each
separate condition. For systematic configurations there are 12 comparisons for each separate
configuration, 24 comparisons for each combination of t and configurations. For random configurations
there are 30 comparisons for each condition. (There is one comparison for the AFm values, generated
by each separate M vector). Again it is simple to summarize the results for 1 dimensional
configurations. For the R-Y error process, n = 15 and n = 20 there are no exceptions to rule a2), while
for the W-P error procedure there are 3 violations or 12.5%.
For 2 dimensional configurations we can distinguish the same types of violations of dimensionality rule
a2) as the previously discussed violations of dimensionality rule a1):
Type 1) TF|AF1 < TF|AF2
Type 2) TF|AF3 < TF|AF2
Type 3) both Type1) and Type 2) above.
Table 13. Types of violations of dimensionality rule a2) for 2 dimensional configurations.
N: number of comparisons.

           Type 1)   Type 2)   Type 3)   Total nr. of violations    N    Per cent
R-Y           0         3         0               3                24      12.5
W-P           1         6         1               8                24      33.5
n = 15        1         6         3              10                30      33.5
n = 20        0         1         1               2                30       6.7
As reflected in Table 11 there was a tendency for concentration of errors at higher noise levels; this aspect is not included in Table 13. Judging from over-all percentages of violations it is not possible to see any clear-cut difference between the two dimensionality rules (for n = 15, 37.3 vs. 33.5; for n = 20, 9.3 vs. 6.7). There is, however, a very interesting difference in the pattern of types of errors which
suggests that the two rules may serve in a complementary way. When rule a1) failed it was in the
direction of giving too low dimensionality, that is:
Rule a1) serves best in avoiding too high dimensionality.
With too many degrees of freedom, solutions from different M vectors may diverge in different
directions, thus avoiding Type 2) errors. On the other hand with too low dimensionality, stereotyped, or
“too similar” solutions may be found, thus giving Type 1) errors. It should, however, be pointed out that
the tendency towards Type 1) errors reflected in Table 11 did not occur for systematic configurations.
These configurations may all be characterized by a lack of clustering in the configurations. Perhaps
this feature tended to preclude a tendency towards stereotyped solutions, contrary to random
configurations where some clustering must be expected.
Turning now to dimensionality rule a2) there is a very clear tendency for this rule to avoid Type 1)
error, that is:
Rule a2) serves best in avoiding too low dimensionality.
This is not difficult to explain. With too low dimensionality, there are far too few degrees of freedom to
accommodate the structure of the material; this will force stress markedly upwards. As will be recalled
from Section 4.6, to a given TF there corresponds higher stress in one than in two dimensions but this
is far outweighed by forcing a solution into a too low dimensionality. On the other hand rule a2) is not
equally good in avoiding too high dimensionality, stress will then be low, the solutions may capitalize
on noise and we can not expect the rule to be highly differentiating. The latter reasoning may explain
the perhaps surprisingly low error rates found for 3 dimensional configurations, 10% for n = 15 and
6.7% for n = 20. If these configurations had been analyzed in 4 dimensions as well, we would have
expected a much higher error rate.
The main conclusion on dimensionality is that, provided certain conditions hold good, both our two
major rules work very well. These conditions are: sufficiently high value of n, and not too high
dimensionality. To some extent low n and high t can be compensated by having highly reliable data.
The more detailed discussion suggested that even finer diagnosis of the proper dimensionality may be
achieved by using the criteria in a supplementary way. Rule a1), picking the highest value of
r(Gim, Gjm), rules out too high values for t, while rule a2), picking the lowest value of TF|AFm, rules out
too low values for t. Working out the fine details of such a combined use of criteria will, however,
require extensive investigation of different types of configurations as there may be interactions
between this variable and rule a1).
Theoretical and empirical purification.
We now assume that dimensionality has been estimated and turn to discuss the relation between TP
and EP, that is the validity of the equations on p. 93. This will be seen to illustrate an approach to the
question of the applicability of the model. There are two approaches to checking the validity of these
equations, namely separate check and combined check.
In the first approach the equation relating empirical and theoretical correlations for NL, the NL equation
is checked separate from and independent of the corresponding TF equation. In contrast to the next
approach the separate check is only indirectly testing the equality between TP and EP.
In the second approach the relation between the two equations is directly checked, that is the equality
between TP and EP.
The separate and the combined check correspond to different strategies for testing applicability. This
is briefly commented on p. 110.
Separate check.
For both the NL and the TF equations there are again two different approaches to test the validity of
the equations. Perhaps the most straightforward approach is to study the relative size of the
discrepancies between the middle and right part of the equations. In terms of Fig. 14 the NL equation
states that the Mij 30 correlations can be reproduced from the NL column, correspondingly the TF
equation states that the Gijt correlations can be reproduced from the TF column, in other words that
the Mij and the Gijt matrices both have a perfect one-dimensional Spearman structure. The statistics31 which will be used to check the structure of these matrices are:

Res(Mij)  = Σi<j |r(Mi,Mj) − r(L,Mi) · r(L,Mj)| / (Σc Σi<j r(Mi,Mj) / c)

Res(Gijt) = Σi<j |r(Git,Gjt) − r(L,Git) · r(L,Gjt)| / Σi<j r(Git,Gjt),
simply indices for the relative size of the absolute deviations between the observed and the expected.
To rely exclusively on Res(Mij) and Res(Gijt) may obscure that small differences for very high
correlations may be of large practical importance.
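For a single condition the computation of Res(Mij) is straightforward; the sketch below (Python with NumPy; the averaging of the denominator across conditions described in the footnote is omitted, and the correlations are invented for the illustration) indicates the calculation:

```python
import numpy as np

def res_m(r_LM, R_M):
    """Res(Mij) for one condition: relative size of the absolute deviations between the
    observed correlations r(Mi,Mj) and the products r(L,Mi)*r(L,Mj), over the pairs i < j."""
    s = len(r_LM)
    iu = np.triu_indices(s, k=1)
    expected = np.outer(r_LM, r_LM)[iu]
    observed = np.asarray(R_M)[iu]
    return np.sum(np.abs(observed - expected)) / np.sum(observed)

# Toy illustration: 3 parallel forms with theoretical correlations .90, .80, .85
r_LM = np.array([0.90, 0.80, 0.85])
R_M = np.outer(r_LM, r_LM) + 0.01        # observed matrix, slightly off the product rule
print(round(res_m(r_LM, R_M), 3))
```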
The second approach checks the validity via the transformation to the TF-categories scale. First
theoretical correlations are multiplied, e.g. r(L,Mi) · r(L,Mj); then each of the s(s-1)/2 products is converted to TF-categories. NL|TCij (TC for Theoretical Correlation) denotes one such converted product; these values are then compared with NL|Mij.
Correspondingly the products r(L,Git) · r(L,Gjt) are converted, the resulting values being denoted TF|TCij.
A separate comment on this procedure (and the new symbols NL|TCij and TF|TCij) is in order since it
will serve to clarify an otherwise puzzling aspect of the results reported in Tables 9 and 10. Instead of
first multiplying theoretical correlations, then converting, one might have first converted then averaged.
The latter, averaging procedure would have introduced a systematic bias as illustrated in the following
example:
Averaging procedure:
  r(L,M1) = .90   converts to 3.20
  r(L,M2) = .80   converts to 3.76
  mean(NL) = 3.48

TCij procedure:
  (r(L,M1) · r(L,M2))^1/2 = .72^1/2 = .8485
  NL|TC12 = 3.55
Since the TCij procedure first multiplies theoretical correlations, more weight will be given to the lower
correlation in this approach than when averaging. This will give worse (higher) values in the TF-categories scale (3.55 > 3.48 in the example).
As a matter of fact, if the averaging procedure had been used, mean(NL) for checking the s(s-1)/2 NL|Mij values would have been identical to the NL values reported in Table 9. By comparing Tables 9 and 10 it will be seen that NL is systematically lower than NL|Mij. When NL|TCij is used there is, however, no such bias. The discrepancy between the two procedures will be larger the greater the discrepancies in the correlations. Correspondingly there are systematic differences between TFt in Table 9 and TF|Gijt in Table 10, though less pronounced since the Gijt correlations are more homogenous. The larger bias for NL and the smaller bias for TF combine to produce a systematic bias for TP vs. EP such that TP is generally smaller. The above argument implies that these two sets of values are not comparable. When
30
In order not to have the terminology too complicated the sub- and superscripts for rank images will be dropped
in the rest of the chapter. It should, however, be understood that all computations for random configurations are
still based on rank images.
31 Since these correlations are of the same size for all the conditions, the average across c conditions is used as denominator (but of course separately for the R-Y and the W-P procedures). This is in contrast to Res(Gijt), where the average correlation differs between conditions. It is always implied that the sums extend over the different configurations within a condition.
we later compute TPTC as the difference between NL|TCij and TF|TCij there is no longer any bias. TPTC
and EP are closely similar in general size, cfr. Table 16.
We are now ready to present the results for the separate check. Corresponding to Res(Mij) and Res(Gijt) we have for the transformation approach:

Res(NL|Mij)  = Σ |NL|Mij − NL|TCij| / (Σc Σ NL|Mij / c)

Res(TF|Gijt) = Σ |TF|Gijt − TF|TCij| / Σ TF|Gijt
Table 14.
Separate check.
Relative residuals in per cent for the validity of the relation between empirical and theoretical correlations.

a) Systematic configurations

                 Ramsay-Young error process      Wagenaar-Padmos error procedure
                 t = 1         t = 2             t = 1         t = 2
Res(Mij)          1.6           1.7               3.3           5.6
Res(Gijt)          .6           1.0               1.2           2.6
Res(NL|Mij)       2.1           3.2               3.1           2.5
Res(TF|Gijt)      9.1           5.6               6.7           4.3

b) Random configurations

               n = 15  n = 15  n = 15  n = 20  n = 20  n = 20
               t = 1   t = 2   t = 3   t = 1   t = 2   t = 3
Res(Mij)        1.4     1.6     1.9     1.3     1.5     1.1
Res(Gijt)       1.0     1.4     1.7      .7      .5      .7
Res(NL|Mij)     1.6     1.6     1.9     1.7     1.7     1.5
Res(TF|Gijt)    6.8     3.6     2.9     6.8     2.9     2.9
The results are very convincing. From both the Spearman point of view (Res (Mij) and Res (Gijt)) and
the transformation point of view (Res (NL|Mij) and Res (TF|Gijt)) the relative errors are acceptably
small. For the Mij matrices the relative error does not seem to be much influenced by whether
correlation residuals are computed or whether residuals in the TF scale are computed. For the Gijt
matrices, however, the relative error is more pronounced in the transformation approach. This is
probably due to the fact that in the Gijt matrices there will be differences between very high correlations
and such differences are magnified by the TF-categories scale.
While there is a tendency for relative error to decrease with n there is no very clear dependence on
dimensionality.
At present penetration to the fine details of these results does not add much to the over all picture.
There appears to be some relation between noise level and relative error, higher relative error for
higher noise level, but this tendency is itself of a fairly erratic nature. While different ways of presenting
deviations from the NL and TF equations might give somewhat different results, we think it is fair to
state that the general validity of these equations is excellent.
Before turning to the combined (direct) check on the equality between EP and TP some further
comments on Spearman structure will be made. Turning to Fig. 14 one might ask whether the square
MiGjt matrix should not also have a Spearman structure, that is:
r(L,Mi) · r(L,Gjt) = r(Mi,Gjt)
This is indeed the case; the relative residuals are roughly of the same size as Res(Gijt). There is, however, one important exception, the bidiagonal r(Mi,Git). This is a correlational index for AF, an alternative to AF-stress. Since a number of studies (not reported in detail in the present work) have shown that AF-stress and r(Mi,Git) are very highly interrelated, comments on r(Mi,Git) will serve to elaborate some comments made previously on AF.
There is first the principle of global minimum stated on p. 39, that M should be closer to G than to L. This implies:

r(Mi,Git) > r(L,Mi)

This inequality has been checked for the total of 6 x 30 = 180 comparisons for random configurations; in every single case the inequality was found to be valid. The global minimum inequality implies the weaker inequality:

r(Mi,Git) > r(L,Mi) · r(L,Git)
In terms of partial correlation this latter inequality states that the relation between Mi and Git can not be completely accounted for in terms of L, or that Mi and Git have more in common than can be accounted for by L. This "more" can be called "capitalizing on noise": in addition to L there will be noise components common to Mi and Git. This is another way of stating a conclusion arrived at earlier, cfr. p. 70, that G can not be represented on a straight line between L and M. As might be expected the amount of capitalizing on noise behaves very regularly. The discrepancy

r(Mi,Git) − r(L,Mi) · r(L,Git)

increases when n decreases, when E (noise level) increases and finally when t increases. When n = 20, t = 1, E = .1 the discrepancy is just .0028, while the maximal value observed in the present study was for n = 15, t = 3, E = .35, when the value was .2059.
Combined check.
In the combined check on the TF and NL equations there are again two approaches. The most immediate is first to compute

TPTC = NL|TCij − TF|TCij

and then the discrepancies

EP − TPTC = (NL|Mij − TF|Gijt) − (NL|TCij − TF|TCij) = (NL|Mij − NL|TCij) − (TF|Gijt − TF|TCij)

The latter expression shows that checking the relative discrepancies of (EP − TPTC) is equivalent to checking differences entering the expressions Res(NL|Mij) and Res(TF|Gijt). Since generally differences are far less reliable than the components we should expect substantially worse discrepancies for the direct check. Nevertheless it will be of interest to study the relative discrepancies:

Res(EP) = Σ |EP − TPTC| / Σ EP
The other approach in the combined check will turn out to be a validation of our basic proposal for
checking applicability of the model. On p. 23 we proposed that applicability could be evaluated by
comparing two values of purification, empirical purification and estimated theoretical purification, Est(TP).
A seemingly simpler approach would be to compare TF estimated from NL|Mij - this will be labelled TF|(NL|Mij) - with TF estimated from r(Git, Gjt), that is TF|Gijt.
Two independent estimates of TF are then compared, one based on retest reliability, the other on
results after multidimensional analysis. TF|(NL|Mij) can be read off from Figs. 10 to 12 just as TF|AF is
read off from Figs. 3 to 5. If now TF|Gijt is not appreciably higher than TF|(NL|Mij) this substantiates the
validity of the model. To take an example suppose that n = 15, t = 1 and r (Mi, Mj) = .935. Two
different values of r(Gi1, Gj1) for this reliability will be used to illustrate the procedure. As example 1 we set r(Gi1, Gj1) = .990; this gives TF|Gij1 = .93. From Fig. 10 we see that for n = 15, t = 1 and r(Mi,Mj) = .935, then TF|(NL|Mij) = 1.0, so in this case we would have a very good confirmation of applicability. As
example 2 we set r (Gi1, Gj1) = .960, this gives TF|Gij1 = 1.85. The latter value is appreciably higher
than TF|(NL|Mij), but is it so much higher that we have reason to doubt the validity of the model?
Instead of working out procedures to decide when the discrepancy between TF|Gijt and TF|(NL|Mij) is
so large as to throw doubt on the validity of the model, we choose to use the slightly more indirect
procedure of comparing two points in a usual coordinate system.
the observed point:     (NL|Mij, EP)       and
the theoretical point:  (NL|Mij, Est(TP))
where the first (common) coordinate is the abscissa, the second coordinate the ordinate and
Est (TP) = NL|Mij – TF|(NL|Mij)
The discrepancy between the ordinates of the observed and theoretical points is then:
EP − Est(TP) = (NL|Mij − TF|Gijt) − (NL|Mij − TF|(NL|Mij)) = TF|(NL|Mij) − TF|Gijt
So the second approach is to study:
ResEst (EP) = ∑ | EP – Est (TP) | / ∑ EP
that is the discrepancies between the empirical and theoretical estimates of purification.
The reason we choose to compare the observed and the theoretical point instead of just TF|Gijt and
TF|(NL|Mij) is that generally the latter discrepancy will depend upon the size of NL|Mij. Comparing the
observed and the theoretical point takes NL|Mij into account, in essence it is equivalent to comparing
TF|(NL|Mij) with TF|Gijt separately for each level of NL|Mij cfr. the expression for EP-Est(TP).
Before suggesting rules for when the discrepancy between the observed and the theoretical points is
suspiciously large, we need a convenient way to find the estimated theoretical purification, Est(TP).
This information is contained in Figs. 10 to 12, albeit in a fairly indirect way. In our example
r(Mi, Mj) =.935 corresponds to NL|Mij = 2.25 and this implies:
Est(TP) = 2.25 – 1.0 = 1.25.
Systematic information on Est (TP) is represented in Table 15.
To fill in Table 15 the first step was to construct (TF, NL) curves - parallel to the (TF, AF) curves in
Figs. 7- 9. In these curves NL was converted to the TF-categories scale and it was then simple to read
off values of NL-TF for different levels of NL.
To take an example: for n = 20 and t = 1, then r(Mi,Mj) = .85 corresponds to NL|Mij = 3.0. In Fig. 10
we see that (n = 20, r (Mi, Mj) = .85) is about halfway between the contours for TF=1 and TF=2 but
slightly closer to TF = 1. On a (TF, NL) curve we can read off the value TF = 1.48, that is TF|(NL|Mij).
This then gives Est (TP) = 3.0 - 1.48 = 1.52, the latter value is an entry in Table 15.
Table 15.
Amount of estimated purification, Est(TP), as a function of n, t and NL|Mij. NL|Mij is estimated from r(Mi,Mj) and is, like Est(TP), expressed in the TF-categories scale. Negative values denote distortion.

t = 1
 n \ NL|Mij    -.5      0     .5    1.0   1.5   2.0   2.5   3.0   3.5   4.0   4.25
 30           -.20    .20    .55   .85  1.15  1.40  1.62  1.70  1.72  1.42  1.00
 20           -.50   -.10    .28   .65   .98  1.25  1.42  1.52  1.50  1.15   .75
 15           -.70   -.25    .20   .52   .82  1.12  1.32  1.38  1.30   .90   .60
 10          -1.00   -.65   -.22   .20   .55   .80   .90   .98   .85   .55   .32
  6          -2.00  -1.57  -1.15  -.75  -.20   .10   .40   .50   .42   .18   .05

t = 2
 n \ NL|Mij    -.5      0     .5    1.0   1.5   2.0   2.5   3.0   3.5   4.0   4.25
 30           -.30    .02    .35   .70   .98  1.25  1.42  1.45  1.38  1.10   .75
 20           -.60   -.22    .17   .45   .78   .98  1.15  1.20  1.10   .80   .55
 15           -.80   -.40    .00   .30   .55   .73   .88   .90   .75   .55   .35
 10          -1.20   -.80   -.40   .00   .32   .55   .73   .70   .55   .35   .20
  8          -1.50  -1.03   -.55  -.16   .05   .16   .25   .18   .17   .13   .10

t = 3
 n \ NL|Mij    -.5      0     .5    1.0   1.5   2.0   2.5   3.0   3.5   4.0   4.25
 30           -.60   -.30    .05   .38   .63   .78   .75   .82   .84   .60   .38
 20           -.70   -.40   -.02   .25   .48   .62   .60   .68   .65   .40   .25
 15           -.90   -.55   -.20   .05   .28   .40   .42   .48   .45   .28   .18
 10          -1.40   -.92   -.50  -.17   .00   .12   .12   .15   .15   .02   .00
  8          -2.10  -1.70  -1.32  -.90  -.55  -.05  -.35  -.18  -.10   .00   .00
Notice that the information in the column for NL = -.5 is the same as the information presented in
Fig. 6, the error free case. Some representative curves illustrating the information in Table 15 are presented in Fig. 16.
Notice that from the bottom scale in Fig. 16 one may easily find the appropriate value of NL|Mij from
r(Mi, Mj).32 The same scale can be used to find TF|Gijt from r(Git, Gjt).
By interpolation curves for other values of n than the ones listed in Table 15 can be constructed, or as
often will be the case in practice, single values of Est(TP) can be computed by a double linear
interpolation.
If now the observed point is closely under (or above) the theoretical point everything is OK. If,
however, the observed point is far below the theoretical point the wrong type of model has most likely
been used. Ideally one would like to have a confidence belt surrounding each purification curve. This
32
For convenience the transformations leading from r(Mi,Mj) to NL|Mij are also summarized in FORTRAN
notation in the appendix.
is not possible to construct from the present material, but we can suggest some rules of thumb from
the maximum deviations observed in the present study. For n = 15, t =1 EP should always be greater
than .50 x Est (TP).
For n = 15, t = 2 and 3 there is a much simpler rule, EP should just be above 0. For n = 20, t =1 we
should have EP greater than .40 x Est (TP) and for n = 20, t = 2 and 3 EP greater than .50 x Est (TP).
The two previously mentioned examples will serve to summarize the procedure for checking
applicability.
                                   Ex. 1      Ex. 2
Finding the theoretical point
  r(Mi,Mj)                          .935       .935
  NL|Mij                            2.25       2.25
  Est(TP)                           1.25       1.25
Finding the observed point
  r(Gi1,Gj1)                        .990       .960
  TF|Gij1                            .93       1.85
  EP = NL|Mij − TF|Gij1             1.32        .40
From the bottom scale in Fig. 16 we see that r (Mi, Mj) = .935 corresponds to NL|Mij = 2.25. Entering
the curve for n = 15, t = 1 with 2.25 as abscissa gives the value 1.25 for Est(TP).
(Since NL|Mij − Est(TP) = TF|(NL|Mij), TF|(NL|Mij) is here 2.25 - 1.25 = 1.00, the same value as
previously read off from Fig. 10. This again illustrates that the information in Table 15 and Fig. 16 is
the same as in Figs. 10 to 12.)
Using the bottom scale in Fig. 16 also shows that r(Gi1, Gj1) = .99 gives the previously mentioned value
TF|Gij1 = .93. In example 1 the model looks slightly "too good": EP is higher than Est(TP). Such a
finding should not be surprising since the curve for Est(TP) is an expected curve where there will be
random deviations in both directions. We should, however, have EP greater than .50 x Est(TP). This
gives a lower boundary of .63 for EP in the present case, and this boundary is clearly not reached in
example 2. Even though there is some empirical purification in example 2 (as may be directly seen by
comparing r(Gi1 ,Gj1) and r (Mi, Mj)), the observed amount of EP is not sufficient to warrant faith in the
applicability of the model for this example. Provided the model is correct a reliability of .935 implies
more structure in the data than what is implied by r(Gi1, Gj1) = .960.
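The whole applicability check of the two examples can be condensed into a few lines (Python; the .50 factor is the rule of thumb for n = 15, t = 1 used in the examples above, and NL|Mij, TF|Gij1 and Est(TP) are assumed to have been read from Fig. 16):

```python
def check_applicability(nl_mij, tf_gij, est_tp, factor=0.50):
    """Compare the observed empirical purification with the rule-of-thumb lower bound."""
    ep = round(nl_mij - tf_gij, 2)
    return ep, ep >= factor * est_tp

print(check_applicability(2.25, 0.93, 1.25))   # example 1: (1.32, True)  -> no reason for doubt
print(check_applicability(2.25, 1.85, 1.25))   # example 2: (0.40, False) -> doubt the model
```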
It is now convenient to give systematic information on Res (EP) and ResEst (EP).
Table 16 presents information on the two approaches to the combined, or direct check on the equality
between EP and TP.
The first row of Table 16 is taken from Table 10. Comparing this to the second row we see that there
are no significant biases in the TPTC procedure; the mean results are closely similar. There is,
however, a consistent difference favouring Est(TP) over EP. The mean difference Est(TP) - EP across
the six conditions is .12 . To a large extent, however, this difference is due to the combination of error
levels E1E3 - if this combination is excluded, the overall difference drops to .07. Still this is probably a
significant difference and it reflects that TF|Gijt >TF|(NL|Mij).
Table 16. Combined check.
The relation between EP and TP by two different approaches.
Relative residuals in per cent.
Random configurations.

               n = 15  n = 15  n = 15  n = 20  n = 20  n = 20
               t = 1   t = 2   t = 3   t = 1   t = 2   t = 3
EP              1.12     .61     .38    1.28     .93     .75
TPTC            1.09     .59     .37    1.26     .94     .74
Est(TP)         1.27     .79     .41    1.45    1.10     .63
Res(EP)33        .12     .25     .20     .11     .07     .10
ResEst(EP)       .23     .34     .40     .20     .18     .22
Since the latter value is based on the TF contours this inequality implies that the contours are (slightly)
“too good” for the present study of random configurations. Notice that there is a similar tendency when
comparing TF|AFt in Table 12 with TFt in Table 9, again the results from the TF contours are slightly
too good. (On the other hand there is the reverse tendency for systematic configurations and for both
studies the crossvalidation, as later reported in Table 17, appears satisfactory, so there may not be
much reason to dwell on these discrepancies).
As expected the relative discrepancies are much higher than those for the separate check in Table 14.
The most important relative discrepancy, ResEst(EP), may appear uncomfortably high. These relative discrepancies do not, however, preclude useful rules of thumb for checking applicability, as exemplified on p. 108.
From p. 98 it will be recalled that the present study covered reliabilities in the range .66 to .98 (some
reliabilities were beyond .99). From the purification curves presented in Fig. 16 it might appear that it is
advantageous not to have too high reliability. For n = 20, t =1 there does for instance seem to be a
maximum for r(Mi, Mj) = .80. This apparent advantage of not too high reliability has no real basis,
however, as present results seem to indicate that the relative error increases with decreasing
reliability.
If one has reliabilities far exceeding those observed in the present study, the use of the purification
curves is not very meaningful. This is most clearly seen in the extreme case with reliability = 1. The
curves then indicate distortion, but this is meaningless from the point of view of r(Git, Gjt) which of
course also will be 1 in this case. When r(Mi, Mj) is greater than .99 we propose to use stress instead
of r(Git, Gjt). If for instance n = 15, t = 2 and r (Mi, Mj) = .998, this corresponds to
NL = .5 and NL – TF = 0 and thus TF = .5. From e.g. Fig. 2 it is seen that for TF = .5 we should have
stress in the neighbourhood of .01. If then the stress of both Gi2 and Gj2 is far above .01 this indicates
as stated on p. 23 that there is more structure in the material than is captured by the method and thus
that the wrong model has been applied.
Perhaps finer diagnosis of applicability could be developed by systematically combining stress and
r(Git, Gjt) in the assessment of applicability. For the regions of reliability we have investigated,
however, stress is slightly less related to TF|(NL|Mij) than TF|Gijt is and at present it is hard to see how
stress might give better procedures when we have the usual values of reliability.
There might appear to be a weakness in the present approach to applicability in that we first assume
that dimensionality is estimated, then proceed to ask whether the type of model is correctly chosen.
But how can dimensionality meaningfully be estimated if the model is basically inappropriate? First we
should notice that probably any approach to applicability would be very difficult (if not impossible) to
develop if dimensionality is very high. As true dimensionality increases, then expected purification will
33) In these results the combination of noise levels E1E3 is excluded since this condition severely disfavoured
ResEst(EP).
tend towards 0. r(Git, Gjt) will tend to be more and more equal to r(Mi, Mj). It is, however, not likely that
a spatial model really is appropriate if dimensionality is too high. In the limiting case where
dimensionality is n-2 (or n-1 for metric models) we have what will later be called an unfilled space. So
we assume that for the cases where one wants to check applicability of the model the alternatives are:
either a reasonably low dimensionality or that a spatial model is inappropriate. If a spatial model is
inappropriate one might expect dimensionality rule a1) and rule a2) to give contradictory or equivocal
results. This in itself might be an indication that the model is inappropriate.
It is then further possible to use our proposed comparison of the observed point and the
corresponding theoretical point on the purification curve for several alternative hypotheses of the value
of t. If for all values of t the observed point is far below the theoretical point this will be a strong
indication that the model is inappropriate.
It would have been valuable to try this procedure for data generated by other types of model than a
spatial model, for instance a tree structure model. This has not yet been done, but we would expect
that serious distortion would occur if one analyzed the data with the hypothesis of for instance a 2 or 3
dimensional structure. Perhaps many cases where the model is inappropriate will show a strong
empirical distortion for the potentially relevant values of t. Provided n is not too low (cfr. Table 15) and
reliability is fair (say above .80) a clear empirical distortion will be a very strong indication that the
wrong type of model has been used. For the rare cases of very high values of reliability, it is possible
from Figs. 3 to 5 and Figs. 10 to 12 to estimate what the corresponding value of stress should be if the
model is appropriate. The details of such an approach have, however, not yet been worked out.
While the design used in the present section includes a set of “parallel” forms, the results for the
combined check have been written out from the point of view of the user who has a single test, retest
design. As pointed out on p. 90, however, there may be cases where the design used in empirical
research has a similar structure to that outlined in Fig. 14. This will be the case either if there are
several replications for a single individual or if we are willing to assume that results from different
individuals can be considered as replications generated from the same L. In this case one may go "the
other way" from what was done in computing the values in Table 14. Instead of estimating Mij and Gijt
from the theoretical correlations, the theoretical correlations can be estimated by standard procedures
for dealing with a Spearman structure, cfr. Harman (1967), Thurstone (1947). The low residuals
reported for the separate check in Table 14, compared with the relatively high residuals in Table 16,
suggest that this will be a very sensitive procedure for testing applicability of the model. This design also
has the advantage that one will get “tailormade” estimates of TF, it is then not necessary to go via the
contours in Section 4.6.
If we finally take a closer look at the meaning of TF|Gijt this brings forth an incompleteness of the
present approach which calls for further research. Each of the configurations Git and Gjt has its
corresponding TF, and TF|Gijt is a kind of average of these separate true fits. But provided the
problems of dimensionality and applicability have been satisfactorily answered, one will not be
primarily interested in such an average. One will want in some way to get at the best configuration and
then to know the true fit of this one configuration. Just as generally a mean is more reliable than the
separate components one might hope it would be possible to derive one configuration which would
have a better true fit than either of the configurations Git and Gjt.
A promising approach to this problem might be to use the option in MDSCAL which allows a single
solution to be computed from several data vectors, the repeat option described by Kruskal (1967). A
special problem is then what to use as start configuration to avoid local minima problems. In some
preliminary runs Git was used as start configuration when Mi and Mj were inputted in one run.
Altogether 8 different random configurations from various combinations of n and t have been
analyzed. For each configuration 3 solutions were computed, the first from M1 and M2 when E1 = .10
(reliability ca. .98), the second from M3 and M4 when E2 = .35 (rel. ca. .81) and the third from M5 and
M6 when E3 = .50. (rel. ca. .66).
In all but 2 of the 18 runs the solution had better true fit than either of the separate true fits for Git or
Gjt. It might be noticed that in the two exceptions the G with the worse true fit was used as start
configuration and the program failed to change this configuration - that is we happened to start with
what might well have been a local minimum and this might have been avoided with a different start
configuration. Leaving out these two cases the improvement in TF compared to TF|Gijt turned out to be
roughly .20 for E1 and varying between .20 and .50 for E2 and E3. At this point we should not be
surprised that though true fit improved by using the repeat option, stress was higher (looked worse).
Using the repeat option is equivalent to increasing n, and we remember that this generally increases
stress and improves TF. It is, however, too early to attempt to give parametric expression to this
relation, likewise it is not known if the repeat option will give an improved solution if the true fits of the
separate G’s are widely different. A further possibility is to study to what extent further improvement
can be made by using the repeat option with more than two M vectors. Is it then possible to get equally
good true fit as otherwise can only be achieved with high n and highly reliable data? If this turns out to
be the case it might be possible to get excellent levels of true fit even with small values of n and
unreliable data merely by having enough replications.
Further crossvalidation results.
Further crossvalidation results from the random and the systematic configurations are given in Table
17.
Comparing these results with those presented in Table 8 (p. 95) we would expect the results for
systematic configurations to be between those for n ≥ 12 and n < 12, that is the correlation
between .92 and .98 and the rmsq discrepancy to be between .30 and .50.
Table 17. Further crossvalidation results.
r: correlation between TFt and TF|AFt.
rmsq: root mean square discrepancy between TFt and TF|AFt.

a) Systematic configurations

                   Ramsay-Young error procedure       Wagenaar-Padmos error process
                   t = 1             t = 2            t = 1             t = 2
configuration      i)      ii)       i)      ii)      i)      ii)       i)      ii)
r                  .951    .858      .941    .895     .919    .831      .878    .946
rmsq               .364    .481      .284    .443     .402    .484      .599    .363

Summary results
                   R-Y     W-P     Total
r                  .903    .918    .920
rmsq               .409    .466    .437

b) Random configurations

                   n = 15                              n = 20
                   t = 1    t = 2    t = 3             t = 1    t = 2    t = 3
r                  .957     .944     .972              .957     .980     .969
rmsq               .265     .293     .301              .293     .201     .266

Summary results (over all conditions)
r                  .962
rmsq               .285
We might thus have wished the total correlation to be slightly better but the rmsq is just as expected.
We tend to put more faith in the latter index since it is sensitive not only to general linear relation but
Finn Tschudi (1972)
The latent, the manifest and the reconstructed in multivariate data reduction methods.
112
also to differences in average and standard deviation. There is exactly the same pattern for random
configurations, the total rmsq here is .285 which fits just beautifully with the values .28 and .30 from
the studies in Table 7. The fact that the overall correlation is a bit lower in the present studies is
probably due to the fact that more limited ranges of noise levels were used at present than in Section
4.6.
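
The difference between the two indices is easy to illustrate. The following small sketch (modern Python notation, not part of the original study; the numerical values are invented purely for illustration) computes r and rmsq for two vectors that are perfectly linearly related but differ in level: r is 1 while rmsq remains substantial, which is why rmsq is the more demanding index.

    import numpy as np

    def rmsq(a, b):
        # root mean square discrepancy between two vectors of true fit values
        return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

    # hypothetical TF values and estimates that are shifted upwards by .30
    tf = np.array([.40, .55, .70, .85])
    tf_est = tf + .30

    r = np.corrcoef(tf, tf_est)[0, 1]
    print(round(r, 3), round(rmsq(tf, tf_est), 3))   # 1.0 and 0.3
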
These further crossvalidation results should finally settle the point that there is no systematic
difference to be expected between repeated and unrepeated designs, cfr. the discussion in Section
4.4, p. 66-67. Furthermore it is very encouraging that the results for systematic configurations do not
depart appreciably from those for random configurations and the similar results for the two different
error procedures suggest that it may not be of basic importance precisely how the error process is
specified. 34
34) It might here be further mentioned that for n = 12 in the crossvalidation results reported in Section 4.6 the
W-P error procedure was also used. The relation between AF and TF turned out to be the same for both the R-Y
and the W-P error procedures.
Part II.
ANALYSIS OF A TREE STRUCTURE MODEL AND SOME STEPS TOWARDS A GENERAL MODEL
Chapter 5. Johnson's "Hierarchical clustering schemes", HCS.
5.1 A presentation of "Hierarchical clustering schemes", HCS.
While Section 1.42 gave some intuitive background and a concrete example of a tree structure, this
section treats tree structures from a formal point of view. The formal definition of a HCS is followed by
the definition of distance for a HCS. The latter definition makes it possible to map a HCS into a
distance matrix and we then show that it is possible to go "the other way" - that is to reconstruct a HCS
from such a distance matrix. The definition of a HCS will be illustrated by reference to the example in
Fig. 1.
Fig. 1. An example of a HCS and the corresponding tree representation.
The definition of a HCS has two parts:
a) An ordered sequence of m+1 clusterings:
C = (C0, C1, ..., Cj-1, Cj, ..., Cm)
b) A set of corresponding values of clusterings:
α = (α0, α1, ..., αj-1, αj, ..., αm)
where α0 = 0 and αj-1 ≤ αj
The subscript j (j = 0, 1, ...m) denotes the level of the HCS.
A clustering is a partitioning of n objects into a set of non-overlapping clusters. In Fig. 1 n = 7 and the
integers 1, 2 ...7 are used as labels for the 7 objects. Each cluster in a clustering is delineated by a
parenthesis. At the lowest level, where j = 0 (see the row for j = 0 in Fig. 1) each cluster consists of just
a single object. The corresponding clustering – C0 - then consists of n clusters. This is called the weak
clustering, it is really a dummy (trivial) clustering and has no empirical significance. At level 1 there is
one non-trivial cluster, (12), the remaining clusters in C1 again consist of single objects, so there are 6
clusters in the clustering C1 (see the row for j = 1).
Finn Tschudi (1972)
The latent, the manifest and the reconstructed in multivariate data reduction methods.
114
The most important property of a HCS is that the clusterings are ordered, which is expressed:
Cj – 1 < Cj
The relation “<” can be interpreted as an inclusion relation. To say that Cj-1 is included in Cj is a
shorthand way of expressing that every cluster in Cj-1 is included in a cluster in Cj. Stated otherwise, every cluster in Cj is either a cluster in Cj-1 or a merging (union) of clusters in Cj-1.
As the level j increases the clusters in clustering Cj will also “increase” in the sense that they will
contain more objects. The clusterings might then also be said to “increase” or to get “less weak”.
Finally we get what is called the strong clustering, which consists of just one cluster. This cluster then
contains all the objects (C6 in the example). Just as the weak clustering it is a dummy clustering
without empirical significance.
The clusterings C1 to Cm-1 are non-trivial. Different groupings of the objects (cfr. the use of
parentheses) in each clustering will define different trees for a given n.
In the example the clustering C5 consists of two clusters (123) and (4567). The three clusters at the
preceding level C4 - (123), (45) and (67) - are included in C5. (123) is of course “included” in (123).
Both (45) and (67) are included in (4567), the latter being a merging of the two former clusters. This
property of inclusion makes it possible to represent a HCS as a tree. At a given level j some clusters
from previous levels are merged. These clusters are represented as nodes. The merging is
represented as branches from the “previous” cluster nodes to the node at level j which then represents
the “new” cluster being formed. In the example the node at level j = 5 represents the cluster (4567).
The two branches from this node to the level 3 and level 4 node signifies that (45) is formed at level 3,
(67) at level 4 and that these are merged at level 5 as stated above.
If the new clusters were not formed by merging of previous ones, we could not represent the structure
as a tree. If we had for instance (45) at one level and (56) at the next level, then of course the
structure would not be a HCS since (45) is not included in (56).
So far α - the values of clusterings - have not been discussed. Some properties of HCS, as in Section
5.2, may be stated independently of these values. The α values are used to define distances between
objects in a HCS structure. The distance between objects (x, y), d(x, y), is defined:
(1) d(x, y) = αj where j is the least integer such that x and y are in the same cluster in clustering Cj.
In the example 4 and 5 are necessarily in the strong clustering C6. The least integer j where they are in
the same cluster is 3, and in Fig. 1 we see that α3 = d(4, 5) = 14.
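
The definition can be stated very compactly in code. The sketch below (modern Python notation; the data structure and the function name hcs_distance are mine, not from the text) represents a HCS as an ordered list of (α, clustering) pairs and returns d(x, y) as the α value of the lowest clustering in which x and y share a cluster; with the clusterings of Fig. 1 it reproduces d(4, 5) = 14.

    # A HCS as an ordered list of (alpha, clustering) pairs; each clustering is a
    # list of clusters (sets).  Level 0 (all singletons, alpha = 0) is included.
    hcs = [
        (0,  [{1}, {2}, {3}, {4}, {5}, {6}, {7}]),
        (2,  [{1, 2}, {3}, {4}, {5}, {6}, {7}]),
        (6,  [{1, 2, 3}, {4}, {5}, {6}, {7}]),
        (14, [{1, 2, 3}, {4, 5}, {6}, {7}]),
        (18, [{1, 2, 3}, {4, 5}, {6, 7}]),
        (22, [{1, 2, 3}, {4, 5, 6, 7}]),
        (26, [{1, 2, 3, 4, 5, 6, 7}]),
    ]

    def hcs_distance(hcs, x, y):
        # d(x, y) = alpha_j for the least level j at which x and y share a cluster
        for alpha, clustering in hcs:
            if any(x in cluster and y in cluster for cluster in clustering):
                return alpha
        raise ValueError("objects never merged")

    print(hcs_distance(hcs, 4, 5))   # 14, cfr. the example above
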
Let us now see how equation (1) satisfies the distance axioms (cfr. p. 11-12). The definition of
distance immediately implies that d(x, x) = 0 (since α0 by definition is 0).
The first distance axiom also states that if d(x, y) = 0 then x = y. This requires that α1 > α0 (if α1 = 0 then
distinct objects x and y could have distance 0). Unless otherwise stated it will be assumed that α1 > 0.
Concerning the second distance axiom it is immediately evident that d(x, y) = d(y, x).
The most important of the distance axioms is the triangular inequality. Johnson shows that the
distance definition for a HCS satisfies a much stronger version than the usual statement of this
inequality, what he calls the ultrametric inequality. This inequality is simple to illustrate. Let:
(2) d(x, y) = αj, d(y, z) = αk, where we assume αj ≤ αk.
We then know that x and y are in the same cluster in Cj and y and z in the same cluster in Ck. Since Cj
is included in Ck x must be in the same cluster as y and z in Ck. (The cluster containing x and y must
Finn Tschudi (1972)
The latent, the manifest and the reconstructed in multivariate data reduction methods.
115
increase from Cj to Ck, x can then not be "dropped" from the cluster). Both x and y must join z in a
cluster at the same level. This implies according to the definition of distance:
d(x, z) = αk
The usual statement of the triangular inequality is:
d(x, z) ≤ d (x ,y) + d(y, z)
but we have shown:
(3) d(x, z) = d(y, z)
and then clearly d(x, z) < d(x, y) + d(y, z).
Johnson (1967, p. 245) states the ultrametric inequality:
(4)
d(x, z) ≤ max (d(x, y), d(y, z))
Equations (2) and (3) are an alternative formulation.
As Johnson points out the ultrametric inequality is clearly stronger than the triangular inequality in the
sense that the ultrametric inequality establishes a much smaller upper bound for d(x, z) than is
generally required by the triangular inequality. (In this sense the "weakest" requirement of the
triangular inequality would be: d(x, z) = d(x, y) + d(y, z) that is: y between x and z on a straight line, cfr.
p. 197).
From the HCS definition of distance it is straightforward to map a given HCS into a distance matrix as
shown in Table 1 for our example.
Table 1. The distance matrix for the HCS in Fig. 1.

        1     2     3     4     5     6     7
1       0     2     6    26    26    26    26
2       2     0     6    26    26    26    26
3       6     6     0    26    26    26    26
4      26    26    26     0    14    22    22
5      26    26    26    14     0    22    22
6      26    26    26    22    22     0    18
7      26    26    26    22    22    18     0
Note the large number of ties in this matrix. This is a characteristic feature of a distance matrix
satisfying a HCS, a direct consequence of the definition of distance. When two clusters are merged all
the distances between the objects in one cluster and the objects in the other cluster must be equal. In
the example all distances between for instance (123) and (4567) must be equal to 26.
This feature makes it simple to reconstruct a HCS from a distance matrix such as the one above. The
matrix is successively condensed. At each step the smallest distance in the matrix αj is picked and the
clusters (which may be single objects) with distance αj are merged. This creates Cj from Cj-1. If d(x, y)
= αj is picked at Cj-1 the ultrametric inequality implies that the distance between (x, y) and another
object z is uniquely defined.35) Since d(y, z) = d(x, z) it is natural to define d((x, y), z) = d(y, z) = d(x, z).
This process of condensation is illustrated at two stages in Table 2.
35) x, y and z may be clusters containing more than a single object.
Table 2. Illustration of condensation in finding the HCS from the distance matrix.

C1:
        (12)    3     4     5     6     7
(12)      0     6    26    26    26    26
3         6     0    26    26    26    26
4        26    26     0    14    22    22
5        26    26    14     0    22    22
6        26    26    22    22     0    18
7        26    26    22    22    18     0

C4:
        (123)  (45)  (67)
(123)      0    26    26
(45)      26     0    22
(67)      26    22     0
At stage 1 the only non-trivial cluster is (12) giving the matrix labelled C1. In this matrix 6 is the
smallest distance, and this gives the cluster (123) in C2. In the matrix corresponding to C2 14 is the
smallest distance and this gives the clustering (123), (45), (6), (7) in C3. Picking 18 in C3 gives the
clustering (123), (45), (67) - the corresponding condensed matrix is given to the right in Table 2.
For empirical matrices the ultrametric inequality will probably never be strictly satisfied. In the process
of condensation it will not generally be the case that d(x, z) =d(y, z) when x and y are merged to one
cluster. There will not be the large number (and pattern) of ties which the ultrametric inequality
demands. Methods for constructing HCS which "approximates" the structure in the data matrix will not
be discussed in any detail in the present work. Suffice it to mention that Johnson recommends two
extreme strategies,
a) always pick the smallest of the dissimilarities to be merged, the minimum method,
b) always pick the largest of the dissimilarities to be merged, the maximum method.
If the two strategies give "closely similar" results this gives some reassurance that HCS is an adequate
model. Goodness of fit and a complete discussion of HCS from the point of view of the metamodel will
be discussed at another occasion.
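
The condensation process, and Johnson's two strategies, can nevertheless be sketched in a few lines. The Python below is only an illustration under my own representation of the data (a list of object labels and a dictionary of pairwise dissimilarities, keyed by sorted pairs); the between-cluster dissimilarity is taken as the minimum or the maximum of the pairwise values, corresponding to the minimum and maximum methods respectively. For a matrix which exactly satisfies the ultrametric inequality, such as Table 1, the two rules give the same result and recover the HCS of Fig. 1.

    import itertools

    def reconstruct_hcs(labels, d, rule=min):
        # labels: object names; d[(x, y)] with x < y: pairwise dissimilarities.
        # rule = min gives Johnson's minimum method, rule = max the maximum method.
        clusters = [frozenset([x]) for x in labels]
        levels = [(0, list(clusters))]                     # the weak clustering C0
        while len(clusters) > 1:
            def between(a, b):
                # dissimilarity between two clusters under the chosen rule
                return rule(d[tuple(sorted((x, y)))] for x in a for y in b)
            a, b = min(itertools.combinations(clusters, 2),
                       key=lambda pair: between(*pair))
            alpha = between(a, b)
            clusters = [c for c in clusters if c not in (a, b)] + [a | b]
            levels.append((alpha, list(clusters)))         # the next clustering Cj
        return levels

Applied to the matrix of Table 1 the successive α values come out as 2, 6, 14, 18, 22 and 26, the values attached to the levels in Fig. 1; for empirical matrices the two rules will in general differ, which is precisely the situation discussed above.
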
Johnson does not explicitly treat the question of the possible number of clusterings (m+1) in relation to
n. The number of clusters in Cj must be at least one less than the number of clusters in Cj-1. Since
the process stops when all the objects have been merged to one cluster there can be at most m+1 = n
clusterings (the process starts with n clusters in C0). If for a given numerical value αj more than two
clusters are merged to a single cluster this implies that m is correspondingly reduced.
If for instance in the example d(1,2) = d(1,3) = 2 - then d(2,3) must also be 2 - and the three clusters
(1), (2) and (3) are merged to (123) at C1 and there will be at most m+1 = n-1 or 6 clusterings. In such
cases there will be more than two branches from a single node in the tree representation. We always
have that the number of clusterings will equal the number of nodes in the tree plus one (since there is
no node corresponding to the weak clustering C0).
Two cases can then be distinguished:
a) Binary tree. The number of clusterings: m+1 = n. Always two branches from each node in the
tree (as in the example here used).
b) General tree. The number of clusterings: m+1< n. More than two branches from some (or all)
of the nodes in the tree. Fig. 3 in Ch. 1 gives an example.
Unless otherwise specified binary trees will be assumed in the following.
Summary
In this section a HCS is defined and illustrated. The central notion in a HCS is the concept of an
ordered set of clusterings. Another concept is a set of ordered values - α. These values are the basis
for the definition of distance between objects. Since the clusterings are ordered (by an inclusion
relation) this distance function satisfies a stronger form of the triangular inequality - the ultrametric
inequality.
In the next section we discuss a specific implication of the inclusion relation between clusterings. The
set α is then not considered.
5.2.
HCS and the Guttmann scale.
In this section it will be shown that a HCS - considered as a sequence of ordered clusterings -is
isomorphic to a Guttmann scale. In order to prove this some concepts developed in a recent paper by
Johnson on "Metric clustering" Johnson (1968) are very helpful as are also some similar and more
general concepts in an important article by Restle (1959). The main concepts are illustrated by the
same tree structure as used in the previous section. It is simplest not to include clusters consisting of
just single objects in the clusterings.
Table 3. Illustration of "height" and "distance between clusterings".

j    Cj - clusterings                 h(Cj) = h(Cj ∩ Cj+1)    d(Cj, Cj+1)
0    -                                 0                       1
1    (1 2)                             1                       2
2    (1 2 3)                           3                       1
3    (1 2 3) (4 5)                     4                       1
4    (1 2 3) (4 5) (6 7)               5                       4
5    (1 2 3) (4 5 6 7)                 9                      12
6    (1 2 3 4 5 6 7)                  21                       -
h(Cj) is called the height of a clustering by Johnson. This term is probably inspired by “weak” and
“strong” clustering as defined in the previous section. A weak clustering has height 0 and height is
maximum for a strong clustering. More precisely, height is defined in terms of incidence matrices. An
incidence matrix is a symmetric n by n matrix with entries 1 for all pairs of objects which are in the
same cluster, 0 otherwise. The diagonal consists of 0 entries. Height is then defined:
h(Cj) = sum of the incidence matrix for Cj.
This is the same as the total number of relations "within clusters". Since the incidence matrix is
symmetrical it is sufficient to consider the offdiagonal halfmatrix. Examples of incidence matrices (for
C4 and C5) are given in Table 4.
Table 4. Examples of incidence matrices.

C4 (clusters (123), (45), (67)):
        1    2    3    4    5    6
2       1
3       1    1
4       0    0    0
5       0    0    0    1
6       0    0    0    0    0
7       0    0    0    0    0    1

C5 (clusters (123), (4567)):
        1    2    3    4    5    6
2       1
3       1    1
4       0    0    0
5       0    0    0    1
6       0    0    0    1    1
7       0    0    0    1    1    1
If there are s clusters in Cj, with ni objects in cluster i, the following formula can be used to compute
h(Cj):

(5)    h(Cj) = (n1 choose 2) + (n2 choose 2) + ... + (ns choose 2)

In the example we have for instance h(C5) = (3 choose 2) + (4 choose 2) = 3 + 6 = 9. See Table 3 for other values of
h(Cj). For the strong clustering h reaches its maximal value:

h(Cm) = (n choose 2)
since all the objects then are in one cluster. This can be used to norm the h values (dividing all of them
by (n choose 2) as Johnson does) but this is immaterial in the present context.
Johnson does not explicitly treat a clustering as a set. This is, however, very much implied by his
definitions. The elements of a (clustering) set are the ones in the corresponding incidence matrix. His
concept of height is then a measure of the set (see Restle 1959, p. 208), the measure simply being the
number of elements. The weak clustering corresponds to the empty set since it has no elements.
Johnson defines the intersection of two clusterings as the largest clustering contained in both. In terms
of incidence matrices the intersection of two clusterings is given by the ones which are common to
both matrices. This corresponds to the standard definition of intersection as the set containing just the
common elements.
Johnson's definition of distance between clusterings:
(6)
d(Ci,Cj) = h(Ci) + h(Cj) - 2h(Ci ∩ Cj)
is exactly the same as Restle's definition of distance between sets, and the specific proof Johnson has
that this measure satisfies the axioms for a distance function is implied by a more general proof by
Restle (1959, p. 209-210).
Restle has a general definition of betweenness which is useful:
Sj is between Si and Sk if and only if:
(7)    Si ∩ S̄j ∩ Sk = ∅, that is: Si and Sk have no common members which are not also in Sj
(8)    S̄i ∩ Sj ∩ S̄k = ∅, that is: Sj has no unique members which are in neither Si nor Sk
(where S̄ denotes the complement of the set S).
Restle (1959, p. 212) then specifically considers "the special case of nested sets - the Guttmann
scale" where for S1, S2, ..., Sn, Si ⊂ Si+1 for i = 1, 2, ..., n-1. In this case it is simple to see that for i < j < k (as
will be assumed in the following) then Sj is between Si and Sk.
Fig 2. Illustration of a nested sequence of sets.
Equation (7) can be written Si ∩ Sk ⊂ Sj which is clearly satisfied by a nested sequence since Si ∩
Sk = Si and by definition Si ⊂ Sj. All common members of Si and Sk (simply Si) must also be in Sj.
Equation (8) can similarly be written: Sj ⊂ Si ∪ Sk. It is evident that Sj has no unique members since
Sj ⊂ Sk by definition.
In the case where all triples of sets satisfy the betweenness relation Restle shows that:
(9)
d (Si,Sj) + d (Sj,Sk) = d (Si,Sk)
that is: distances are additive and in an abstract sense the sets can then be mapped as points on a
straight line, or the set theoretical definition of betweenness corresponds to betweenness on a straight
line.
Since now every cluster in Ci is included in some cluster in Cj, all elements in the set Ci must also be
elements of the set Cj (see for instance the incidence matrices for C4 and C5 in Table 4). The sets Cj
increase in the simple sense that new elements are added to the “old ones” as j increases. We have
then shown:
When each clustering is regarded as a set, the sequence of clusterings forms a linear array and can
be mapped as points on a straight line.
This can also be seen more directly from the definition of distance between clusterings. Since
clustering sets are included in each other:
(10)    h(Ci) = h(Ci ∩ Cj)
Inserting this in the distance definition, equation (6), gives:
(11)    d(Ci, Cj) = h(Ci) + h(Cj) - 2h(Ci) = h(Cj) - h(Ci)
Similarly d(Cj, Ck) = h(Ck) - h(Cj) and d(Ci, Ck) = h(Ck) - h(Ci), which gives:
(12)    d(Ci, Cj) + d(Cj, Ck) = d(Ci, Ck)
Q.E.D.
Notice that:
d(φ, Cj) = h(Cj) - h(φ) = h(Cj) and d(φ, C0) = h(C0) = 0.
The measures of the clustering sets - the heights - map the sets on a straight line. If we accept that all
elements in a clustering set are weighted equally (simply added) the h values can be regarded as an
interval scale representation of the clusterings. The endpoints, h(C0) = 0 and h(Cm) = (n choose 2), which are
empty of empirical meaning, correspond to the two "degrees of freedom" in an interval scale.
Structural characteristics of a given HCS might be studied by considering the differences in distance
between succeeding clusterings cfr. the last column in Table 3. In our example we note that d (C5, C6)
is very much larger than any of the other intervals. This is because clusters containing roughly the
same number of objects are joined in the last clustering. We may note that:
(13)    d(Cj, Cj+1) = h(Cj+1) - h(Cj) = nj1 · nj2
where nj1 and nj2 are the number of objects in the two clusters merged in Cj+1. The new cluster in
Cj+1 contributes (nj1+nj2)·(nj1+nj2-1)/2 elements to h(Cj+1) and thus adds
(nj1+nj2)·(nj1+nj2-1)/2 - nj1·(nj1-1)/2 - nj2·(nj2-1)/2 = nj1·nj2 new elements.
In Table 3 we have for instance:
d(C5, C6) = 3 · 4 = 12
Equation (13) might perhaps be useful in a description of structural characteristics of trees.
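
The quantities in Table 3 are simple to verify mechanically. The short sketch below (Python; the list representation of the clusterings is the same as in the earlier sketches and is my own, not the text's) computes h(Cj) as the number of within-cluster pairs, takes the successive differences as d(Cj, Cj+1), and checks equation (13) for the last merge.

    from math import comb

    clusterings = [
        [{1}, {2}, {3}, {4}, {5}, {6}, {7}],       # C0
        [{1, 2}, {3}, {4}, {5}, {6}, {7}],          # C1
        [{1, 2, 3}, {4}, {5}, {6}, {7}],            # C2
        [{1, 2, 3}, {4, 5}, {6}, {7}],              # C3
        [{1, 2, 3}, {4, 5}, {6, 7}],                # C4
        [{1, 2, 3}, {4, 5, 6, 7}],                  # C5
        [{1, 2, 3, 4, 5, 6, 7}],                    # C6
    ]

    def height(clustering):
        # equation (5): number of within-cluster pairs
        return sum(comb(len(c), 2) for c in clustering)

    h = [height(c) for c in clusterings]
    d = [h[j + 1] - h[j] for j in range(len(h) - 1)]   # equation (11)
    print(h)   # [0, 1, 3, 4, 5, 9, 21], cfr. Table 3
    print(d)   # [1, 2, 1, 1, 4, 12]
    assert d[5] == 3 * 4                               # equation (13): nj1 * nj2
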
While clusterings can be said to form a linear array it is clearly not the case that the objects conform to
any linear order. Indeed, it would represent a misunderstanding of the concept of a tree to impose any
linear order on the "horisontal" sequence of objects in presenting a tree, since the specific sequence is
largely arbitrary. Consider for instance the different presentations of the same tree structure in Fig. 3.
Fig. 3 Different presentations of the same tree structure.
Just from Fig. 3a) it might have been tempting to regard 1, 2, 3, 4, 5 as a linear order, but Fig. 3b) and
c) clearly show that this is unwarranted and there are still further ways of representing the same tree
structure.
In depicting a tree we do not want branches to cross each other. This of course implies some
constraints on the order in which we may list the objects. (In the example above it is for instance not
possible to list any of the objects 3, 4 or 5 "between" 1 and 2, since the branch from such an object
would then cross the branches joining 1 and 2). Johnson's computer program for HCS analysis
illustrates the degrees of freedom in an important way. In his program the object labelled n (the last
row in a similarity matrix) is always printed at the extreme right of the paper. Since it is completely
arbitrary which object the experimenter labels n, this implies that any of the objects may be placed as
the last. (Or as the first since it is evident that a tree for a given HCS might be “reversed”). In the
example above the same HCS is represented with 5, 4 and 3 respectively as the last object.
Since a given object can always be placed last for a given HCS, no object can then be between two
other objects. Stated otherwise no three objects in a HCS can be represented on a straight line. This
can also be seen directly from the ultrametric inequality. Suppose the contrary: that y is between x and
z and that
d(x, y) = αj < d(y, z) = αk.
d(x, y) + d(y, z) = d(x, z) then implies that d(x, z) = αj + αk, which according to our statement of the
ultrametric inequality is impossible since this inequality requires that d(x, z) = αk, cfr. equations (2)
and (3).
Since it is convenient to list clusterings vertically, cfr. Fig. 1, we might say more informally that a tree
can be considered as a linear order from a "vertical point of view" when clusterings are considered as
units. It has been shown that this is an implication of the fact that "the clusterings increase
hierarchically" (Johnson, 1967, p. 243). A similar notion is implied by the familiar Guttmann scaleobjects are rankordered and objects with higher ranks have all properties of objects with lower rank
plus some more. From a "horizontal point of view", however, when now objects are considered as
units, the sequence of objects (or leaves in a tree) is to a large extent arbitrary. Since no three objects
can be mapped on a straight line one might wonder whether a dimensional representation of the
objects is at all appropriate.
Is the ultrametric inequality so strong (cfr. p.115) that it precludes meaningful spatial representation of
the objects?
In the next section we argue that n -1 dimensions are required to represent n objects (conforming to a
HCS) as n points in a metric space. This will set the stage for a general discussion of HCS and spatial
models in Ch. 6.
5.3 A dimensional representation of the objects in a HCS (for binary trees) - a tree grid matrix
It will be recalled that a HCS (now including the clustering values α) can be mapped into a distance
matrix. It is possible to give a dimensional representation of this distance matrix which is simple in the
sense that the coordinates are closely related to the α values and the pattern of values in the
coordinate matrix clearly reflects the tree structure.
It is not “simple”, however, from the point of view of the number of dimensions required, since this
equals n-1. We first explain the nature of the coordinate matrix (which alternatively will be referred to
as a tree grid matrix). The underlying space is not the usual Euclidean one but the l∞ metric
(dominance model, cfr. Section 1.41). After showing that the coordinate matrix maps into the proper
distance matrix we prove that - at least for the l∞ metric - the dimensionality can not be reduced.
The coordinate matrix for the HCS used in Section 5.1 is presented in Table 5.
Table 5. Coordinate matrix (tree grid matrix) for the HCS in Fig. 1, l∞ metric.

                               objects
dimension   node        1     2     3     4     5     6     7
6           1          +1    -1     0     0     0     0     0
5           2          +3    +3    -3     0     0     0     0
4           3           0     0     0    +7    -7     0     0
3           4           0     0     0     0     0    +9    -9
2           5           0     0     0   +11   +11   -11   -11
1           6         +13   +13   +13   -13   -13   -13   -13
The matrix is oriented to bring forth as clearly as possible the similarity to the presentation in Fig. 1.
We notice that there is a strict one-to-one correspondence between dimensions and nodes in the tree
representation.
The dimension corresponding to the root node will be seen to be most important, or to have largest
scope, it is the most general dimension. This will be referred to as the first dimension, cfr. dimension 1
in the bottom row in Table 5.
The dimensions are labelled inversely with respect to levels so that dimension j corresponds to level
(node) n-j. Generally "lower" dimensions will have larger scope than "higher" dimensions, though
some dimensions are not ordered with respect to scope (In Table 5, e.g. dimensions 2 and 4 are thus
ordered, while this is not the case for dimensions 4 and 5).
On the first dimension all the points (which represent objects) subsumed under the left branch from the
root node have values + αn-1/2 and the values are – αn-1/2 for points subsumed under the right branch.
The first dimension partitions the n points in two subsets with n11 and n12 points respectively where n11
+ n12 = n. The next dimension has non-zero values only for one of the subsets formed by the first
dimension. All objects subsumed under the left branch corresponding to node n-2 are represented
with values + αn-2/2, and the values are - αn-2/2 for objects under the right branch. There will be n21
points with values +αn-2/2 and n22 points with values – α n-2/2 where n21 + n22 = n11 or n12. The rest of
the objects are represented with value 0 on the second dimension.
All the higher dimensions will have non-zero values only for one of the subsets previously formed. For
dimension j the objects subsumed under the left branch of node n-j are represented by the values
Finn Tschudi (1972)
The latent, the manifest and the reconstructed in multivariate data reduction methods.
122
+ αn-j/2 and - αn-j/2 for objects under the right branch. There will be nj1 “ + values” and nj2 “- values”.
Each nja (j >1, a = 1,2) will be the sum of a partition for a higher dimension. (The coordinates in each
dimension might be multiplied by -1, the assignments “left and +”, “right and – “ are arbitrary).
We now show that the type of coordinate matrix discussed above maps into the proper distance
matrix. First all the distances between the subsets formed by the first dimension are computed. The
difference between coordinates is the same for all pairs of objects belonging to different subsets:
αn-1/2 - (-αn-1/2) = αn-1
Since αn-1 is larger than any of the coordinate differences for higher dimensions, all the distances
between the first subsets equal αn-1 as they should. (Remember that only the largest difference is
relevant in computing a distance according to the definition of l∞ metric). In the example all the
distances between (123) and (4567) are computed from the first dimension, they are seen to be:
13 - (-13) = 26
cfr. Table 1.
Consider next the higher dimensions. For any dimension j the distances between the two subsets
formed by this dimension (containing nj1 and nj2 objects) are computed. First note that the non-zero
values on lower dimensions will be tied: the union of the subsets for dimension j forms one subset (with
the same value for all elements) of a lower dimension. Consequently the lower dimensions can not
contribute to the nj1 x nj2 distances which are computed from dimension j. Since, second, the non-zero
values for dimension j are larger than the non-zero values for higher dimensions, the distances
between the nj1 and nj2 objects will simply be αn-j as they should. In our example all the distances
between the subsets formed by e.g. dimension 2 will be α5 = 22. When a dimension forms subsets of just
one object each, only one distance will be computed from this dimension. This is the case with
dimensions 3, 4, and 6 in the example.
We now have shown that a grid matrix with n -1 dimensions gives a dimensional representation of the
distance matrix from a given HCS in a l∞ metric. The representation is “simple” in the sense that the
non-zero values for dimension j correspond to the branches from the node j.
The essence of the argument given above is that quite literally lower dimensions dominate the higher
ones. When computing distances the higher dimensions do not contribute anything if the objects have
different values on a lower dimension.
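
The construction of a tree grid matrix and the check that it reproduces the HCS distances can be written out directly. In the sketch below (Python with numpy; the list of nodes and all names are my own illustration, not part of the original work) each node is given a dimension with value +α/2 for the objects under its left branch, -α/2 for those under its right branch and 0 elsewhere, and the l∞ (dominance) distances between columns are then compared with the distances of Table 1.

    import numpy as np

    objects = [1, 2, 3, 4, 5, 6, 7]
    # nodes of the tree in Fig. 1: (alpha, objects under left branch, under right branch)
    nodes = [
        (2,  {1},        {2}),             # node 1
        (6,  {1, 2},     {3}),             # node 2
        (14, {4},        {5}),             # node 3
        (18, {6},        {7}),             # node 4
        (22, {4, 5},     {6, 7}),          # node 5
        (26, {1, 2, 3},  {4, 5, 6, 7}),    # node 6 (root)
    ]

    # tree grid matrix: one dimension per node, cfr. Table 5
    G = np.zeros((len(nodes), len(objects)))
    for dim, (alpha, left, right) in enumerate(nodes):
        for col, obj in enumerate(objects):
            G[dim, col] = alpha / 2 if obj in left else (-alpha / 2 if obj in right else 0)

    # l-infinity distance between two columns = largest coordinate difference
    def dom_dist(i, j):
        return np.max(np.abs(G[:, i] - G[:, j]))

    print(dom_dist(0, 1), dom_dist(3, 4), dom_dist(0, 3))   # 2.0 14.0 26.0, cfr. Table 1
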
A basic point is that a large number of zeros for a given dimension signifies a limited scope, range, for this
dimension. There are few distances which are influenced by such dimensions. The zero values should
be taken to signify that the dimension is irrelevant for the corresponding domain of objects. The
psychological significance of "irrelevance" and “scope" (range) will be prominent in Section 6.2.
While the tree grid matrix is appealing since it is so closely related to the tree representation, it is not
simple from the point of view of the number of dimensions involved. Indeed, it is well known that any
set of n points (satisfying the triangular inequality) can be represented in n-1 dimensions. (See for
instance Torgerson, 1958). When applying spatial models one usually hopes that the number of
dimensions is far less than n. Is it possible to give a dimensional representation in less than n -1
dimensions? This will be the case if we can show that a tree grid matrix has rank less than n -1 or
alternatively that the n -1 row vectors in a grid matrix are linearly dependent. If this is the case one (or
more) of them can be represented as linear combinations of the others and the dimensionality can be
correspondingly reduced. Conversely, if the n -1 row vectors are linearly independent dimensionality
can not be reduced. We will give a proof that the n-1 vectors are linearly independent, that is:
Theorem: In a l∞ metric it is not possible to represent n elements in a binary tree in less than n-1
dimensions.36
36) If this was not the case some of the nodes would be redundant. The reader may skip the proof.
First we give a proof for our example of a HCS and then we suggest a general proof. The proof is a bit
tedious but a straightforward application of linear algebra. For dimension j the vector of coordinates
will be written Xj. The non-zero values in a row will be written xj and -xj. In the example we then have:
(14)
                   object:   1     2     3     4     5     6     7
     X1 = (                 x1    x1    x1   -x1   -x1   -x1   -x1 )
     X2 = (                  0     0     0    x2    x2   -x2   -x2 )
     X3 = (                  0     0     0     0     0    x3   -x3 )
     X4 = (                  0     0     0    x4   -x4     0     0 )
     X5 = (                 x5    x5   -x5     0     0     0     0 )
     X6 = (                 x6   -x6     0     0     0     0     0 )
A necessary and sufficient condition for a set of n-1 vectors to be linearly independent is that the only
way of satisfying the vector equation:
(15)    k1X1 + k2X2 + ... + kn-1Xn-1 = 0
is that all the scalars kj (j = 1, 2, ..., n-1) must be 0. This is what will be proved.
If on the other hand equation (15) could have been satisfied with at least one kj different from 0 the
vectors would have been linearly dependent and then at least one of them could have been expressed
as a linear combination of the others and the dimensionality could then be reduced.
Equation (15) implies that the separate equations for each object must be satisfied. We insert equation
(14) in equation (15) for the objects subsumed under the left branch from the root node and get:
(16)
     k1x1 + k5x5 + k6x6 = 0      (1)
     k1x1 + k5x5 - k6x6 = 0      (2)
     k1x1 + k5x5        = 0      (12) = (1) + (2)
     k1x1 - k5x5        = 0      (3)
     k1x1               = 0      (123) = (12) + (3)
(1) in equation (16) is the equation for object 1, similarly (2) for object 2. Since (1) and (2) are only
differentiated by X6, we see that the last term drops out when we add (1) and (2) in equation
(16). This new equation corresponds to the cluster (12) and is labelled accordingly. Similarly (12) and
(3) are only differentiated by X5, so by adding (12) and (3) it is evident that k1 = 0. Having first gone up
we then go down. By inserting k1 = 0 in (3) in equation (16) we get k5 = 0 and by further inserting k1 =
k5 = 0 in (1) of equation (16) we finally get k6 = 0.
In order to better trace the details of this reasoning and thus get a general view of the line of thought it
may be instructive to study Fig.4.
The branches are labelled in such a way that it is easy to see which terms go into the equation for any
of the objects e.g. k1, k5 and k6 for object 1. Generally we start from a cluster composed of single
objects. Adding equations (going up) the term for the differentiating dimension drops out. When the
cluster thus "grows" a new equation is added and again a differentiating dimension drops out. This
process is repeated till we get to a cluster separated by a single branch from the root node. It is then
evident that k1 = 0 and by then going down it is seen that the k's nested under k1 must also disappear.
Below the same process is shown for the right part of the tree, (not in complete detail):
     -k1x1 + k2x2 + k4x4 = 0      (4)
     -k1x1 + k2x2 - k4x4 = 0      (5)
     -k1x1 + k2x2        = 0      (45)
     -k1x1 - k2x2 + k3x3 = 0      (6)
     -k1x1 - k2x2 - k3x3 = 0      (7)
     -k1x1 - k2x2        = 0      (67)
     -k1x1               = 0      (4567)
(forming (4567) was not strictly necessary since we already knew that k1 = 0, this step was just added
for additional clarity).
In general we start by considering two single objects c0 and c01 which are differentiated just by xim.
Adding the equations for these elements gives the equation for cluster c1 where xim drops out.
(17)
     ±k1x1 ± ki1xi1 ± ... ± kim-1xim-1 = 0      c0 + c01 = c1
     ±k1x1 ± ki1xi1 ± ... ± kim-2xim-2 = 0      c1 + c11 = c2
       ..........
     ±k1x1 ± ki1xi1                    = 0      cm-2 + cm-2,1 = cm-1
     ±k1x1                             = 0      cm-1 + cm-1,1 = cm
where im > im-1 > im-2 > ... > i1 > 1.
When going up the cluster c1 grows by having added a cluster c11 which then gives a new cluster c2.
This process is repeated till we finally reach cm. For each new cluster cj a differentiating term xim-j
drops out. cm then implies k1 = 0. We then work down and then cm-1 implies ki1 = 0 etc. till finally c1
implies kim-1 = 0 and at last c0 (or c01) implies kim = 0.
If at any stage in the process described by equation (17) cj1 is composed of more than one element we
must arrive at one equation for cj1 by a process similar to the first equations in (17). Some dimensions
different from ij (j = 1, 2, ..., m) will be involved here. When we get back to cj1 the k's for these dimensions
must be separately traced and will similarly be seen to be 0. The complete process is recursive, the
main process which leads to cm must be used to arrive at cj1 and perhaps again for subclusters of cj1.
cm will correspond either to the left or right branch from the root node, cfr. the ± notation in equation
(17). Exactly the same argument can then be used for the other branch from the root node (starting
with an arbitrary cluster of two elements nested under the other branch from the root node).
We have then shown that starting from equation (15) and an n-1 dimensional representation of a tree
we must have:
k1 = k2 = ... = kj = ... = kn-1 = 0
According to the definition of linear independence all the dimensions or vectors Xj are then linearly
independent and the theorem stated on p. 122 is thus established.
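
For the example this conclusion is easy to confirm numerically: the rank of the 6 by 7 coordinate matrix of Table 5 is 6 = n-1, so no row vector is a linear combination of the others. A minimal check (Python with numpy, my own illustration; the matrix is simply Table 5 written out) might look as follows.

    import numpy as np

    # the tree grid matrix of Table 5 (rows = dimensions 6, 5, 4, 3, 2, 1)
    G = np.array([
        [  1,  -1,   0,   0,   0,   0,   0],
        [  3,   3,  -3,   0,   0,   0,   0],
        [  0,   0,   0,   7,  -7,   0,   0],
        [  0,   0,   0,   0,   0,   9,  -9],
        [  0,   0,   0,  11,  11, -11, -11],
        [ 13,  13,  13, -13, -13, -13, -13],
    ])

    print(np.linalg.matrix_rank(G))   # 6, i.e. n-1: the row vectors are linearly independent
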
This proof assumes a l∞ metric. Can it be the case that a smaller dimensionality will suffice for another
lp metric? I have not seen any way of proving that this can not be the case. It does not, however,
appear likely. An indirect approach is to study dimensional representations of tree structures in the
most popular Euclidean (l2) metric. A large number of analyses of distance matrices generated from
HCS structures have shown that in every case n-1 dimensions are necessary. These runs are part of a
more specific comparison of tree structure models and spatial models, the details of which will be
reported elsewhere. Suffice it here to note that the Euclidean representation does not reveal the
structure as tree grid matrices do. Depending upon the type of tree it will be more or less difficult to
“decode” the structure as a tree. This problem will be further commented on in Section 6.41.
Chapter 6
FILLED, UNFILLED AND PARTIALLY FILLED SPACES
6.1.
A discussion of HCS and spatial models.
Any given similarity matrix may be analyzed either by a spatial model or a tree structure model. Are
these models exclusive in the sense that if one of them fits the data the other will not? More generally
do they represent different theories of underlying cognitive structure? Miller (1967) mentions as one of
the problems in need of clarification the relation between factor analysis (a spatial model) and tree
structures. In this chapter we will throw some light on the similarities and differences between the two
approaches. This will lead to an outline of a general model.
When I started to investigate this problem, I first noticed that no element y in a HCS can be placed
between two other elements x and z on a straight line (cfr. p. 120). The next step was then to consider
a simple tree with just 4 objects and the corresponding distance matrix, cfr. Fig. 1.
Fig. 1. A tree with 4 objects and the corresponding 3-dimensional spatial representation.
It is fairly simple to see that this tree can not be represented in a two-dimensional Euclidean space.
Three dimensions are necessary as illustrated in part c) of Fig. 1.
When 3 objects require 2 dimensions, 4 objects 3 dimensions, it was tempting to guess that generally
n-1 dimensions would be necessary to represent n objects as points. Intuitively it appeared that the
large number of ties in the distance matrix required by the ultrametric inequality would “force” the
dimensionality upwards when considering trees with new objects added. When I suggested this
argument to Johnson he pointed out (personal communication): " - however, n - 1 scalars (one
dimension) suffice if you expand slightly the notion of dimension. For example to place n points in one
dimension we may pick an order and n-1 interpoint distances. In clustering we pick a tree and n node
distances: the information seems to be about the same. An interesting question: is there a
twodimensional clustering representation in the above sense.”
This comment inspired the work reported in Section 5.2, where it was shown that a HCS could be
regarded as a one-dimensional scale. Each clustering was mapped as a point on a line. This is really a
fairly simple consequence of the cumulative nature of the clusterings (regarded as sets). New
elements are added to each new clustering in the ordered sequence and all the previous elements are
retained. The distance between succeeding clusterings is simply the number of new elements added.
The term "hierarchical" in HCS seems to refer mainly to this cumulative growth in the sequence of
clusterings.
Notice that the node distances (α values) mentioned by Johnson in the quotation above were not
considered when regarding HCS as a onedimensional scale. There may perhaps be other ways of
regarding a HCS as a one-dimensional scale. (Perhaps new elements added to a clustering set could
be weighted by the corresponding α value.)
If, however, not clusterings but objects are to be mapped as points not the Euclidean but the
dominance metric was found to be particularly appropriate. Each node in a tree then corresponds to a
dimension. Objects subsumed under the left branch from node j were given the value αj /2 and objects
under the right branch the value - αj /2 on dimension n-j. (The first dimension which is most important
corresponds to the root node, value α n-1). The l∞ metric insures that only one dimension is relevant in
computing the distance between any two objects.
How are these different representations related: a onedimensional representation of clusterings and a
n-1 dimensional l∞ representation of objects? The one-dimensional scale can be conceived of as a
series of intervals (cfr. the last column in Table 3, Ch. 5). The last interval corresponds to dimension 1,
the next last to dimension 2 and finally the first interval (C0 - C1) to the last dimension. (If the α values
were taken into account in defining the length of the intervals it could probably be shown that the
lengths of the intervals are related to the variances of the corresponding dimensions).
In Section 5.3 we proved that it is not possible to have less than n - 1 dimensions in representing n
objects in l ∞ metric. This immediately implies that if we in a given tree insert a new object (and thus a
new node) the dimensionality will increase. A tree with 3 objects can be represented as the corners of
a triangle with two equal sides. An added fourth point x can not be "between" the three points (a, b and
c) in the sense of being inside the triangle. The space inside the triangle can not be filled, it must be
empty. (Indeed the whole plane formed by extending the triangle must be empty except for a, b and c).
Generally objects can be regarded as the corners of a regular convex polyhedron in n-1 dimensions
and no new point can be located inside this polyhedron. Stated otherwise:
If we represent n objects forming a tree structure in a space, the space can not be filled.
Inherently there does not seem to be anything "wrong" with the spatial representation in the l∞ metric.
Yet it should be strongly pointed out that the notion of a space which can not be filled does run counter
to the general concept of space. This is stated as follows by Torgerson (1965, p. 385):
spatial models tend to imply continuity. We tend to interpret a dimension or direction in the
space as a continuous variable. Since space itself is nothing but a hole it seems to me that
this assumption implies that the hole can be filled, the hole should not have unfillable holes
in it.
In our case the space is - apart from "skeleton points" - nothing but an "unfillable hole"!
In a basic theoretical contribution on "the foundations of multidimensional scaling" Beals et al. (1968)
note in the beginning of the article that "the content and justification of multidimensional scaling have
not been explored" and furthermore: "such representations carry strong implications that should not be
overlooked", and a warning is sounded: "if the necessary consequences of such models are rejected
on empirical or theoretical grounds the blind application of multidimensional scaling techniques is quite
objectionable." (op.cit. p. 28). More specifically they consider "metrics with additive segments"
(underlined here) where "segmental additivity" implies that for any distinct points x and z there exists a
set Y of points such that for y in Y d(x, y) + d(y, z) = d(x, z). Clearly this is an aspect of continuity which
is not satisfied by HCS. A main part of their article is devoted to stating ordinal assumptions from
which the usual axioms for a distance function and segmental additivity can be derived. Some of these
ordinal assumptions clearly imply continuity, one of them is for instance commented: "this condition
implies that there are no "holes" in the set of stimulus objects". (op.cit. p. 131).
The main point in the present section is to bring to attention that spatial models imply continuity, while
from a spatial point of view the objects in a HCS can only be mapped in an "unfillable" space. The
usual notion of "dimension" - usually implied by spatial models -clearly implies continuity both from a
common sense point of view and also in a large bulk of psychological work. Considering white and
black shades of grey quickly comes to mind as an example. Also, we usually conceive of people not
just as intelligent or stupid, but as varying in ability. In learning theory some notion of "habit strength" is
usually prominent - again a dimensional - continuous concept.
At this point we could reiterate the point made in Ch. 1, that tree structure and spatial models
represent different types of model. Perhaps the metamodel, e.g. the selection rule stated on p. 22,
might be developed to provide a simple test as to which (if any) of the two types of models was most
appropriate for a given set of data.
But a more fascinating inquiry is to ask: if HCS represent unfilled spaces, spatial models filled spaces,
might there not be other, more general structures which subsume HCS and spatial models as special
cases? Tentatively such structures may be labelled partially filled spaces.
Section 6.2 will underscore the importance of partially filled spaces by giving examples of hierarchical
models in cognitive psychology and then argue for the inadequacy of such models. After the outline of
a general model in Section 6.3, some of the issues raised in Section 6.2 will be further developed in
Section 6.42.
6.2. The inadequacy of tree structure models. Comments on tree grid matrices, G.A. Miller's semantic
matrices and G.A. Kelly’s Rep Grid.
There is a quite obvious isomorphy between tree grid matrices and a type of semantic matrices
described by Miller (1967). This is best brought out by rearranging a demonstration example he gives.
Table 1. An example of a semantic matrix as a tree grid matrix. Adapted from Miller (1967, Figure 6,
p. 47).

Semantic markers (features)     mother  tree  chair  rock  fear  virtue  cow  tiger
object - nonobject                +      +     +      +     -     -       +    +
living - nonliving                +      +     -      -     0     0       +    +
mental - characterological        0      0     0      0     +     -       0    0
plant - animal                    -      +     0      0     0     0       -    -
artefact - natural                0      0     +      -     0     0       0    0
human - subhuman                  +      0     0      0     0     0       -    -
feral - domesticated              0      0     0      0     0     0       -    +

(+ and - denote the first and second pole of each feature, 0 that the feature is not marked for the word.)
It is immediately evident that the general structure is the same in Table 1 and in Fig. 4, Ch. 5 (The only
difference is that values of clusterings are excluded from Table 1, but that is unimportant in the present
context.)
So objects may be words and dimensions may be semantic features. Later Miller (1969), asking how
the subjective lexicon is organized, has given some evidence that a collection of English nouns fairly
well conforms to a tree structure model. In the present context we will concentrate on Miller's general
presentation of hierarchical semantic systems and show that exactly parallel concepts are part of the
impressive personal construct theory presented by Kelly (1955).
First we note that corresponding to features Kelly uses the term construct. Kelly's insistence on the
dichotomous nature of constructs is conveniently captured in Table 1. What Miller refers to as values
of features correspond to poles of constructs in Kelly's terminology.37 The main methodological tool
Kelly has provided is what he calls the Rep Grid. In most applications the objects are what in Lewinian
terminology would be called "relevant persons in the subject's life-space" (self, spouse, father etc.
etc.). Kelly usually refers to them as figures. Generally Kelly finds it convenient to say that constructs
deal with events, figures are thus an example of events. So far the semantic matrix is formally identical
to a Rep Grid, the former being flanked by features and items, the latter by constructs and figures.
Whether the internal structure of a Rep Grid is similar to that of the type of semantic matrices Miller is
especially interested in will be discussed later.
As a central aspect of hierarchical organization Miller (1969, p. 176) underscores that features from
one path of a tree are not defined for items subsumed under different paths. As an example he points
out that at one node in a taxonomic tree we have the animal-plant feature, F1. Further removed from
the root there is the vertebrate-invertebrate feature, F2, which is subsumed under the path from
animal. The vertebrate feature is not defined for plants. Generally “a feature F1 is said to dominate a
feature F2 just in case F2 is defined only for items having a particular value of F1." (op.cit. p. 176).
In Table 1 we should thus not regard features as three-valued functions; the zeros should be taken to
imply that for these items the corresponding features are simply undefined.
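To make this definition concrete, a minimal sketch in Python follows (the tiny matrix and its entries are illustrative assumptions, not Miller's actual values): each feature is coded as a mapping from items to '+', '-' or None (undefined), and dominance is checked mechanically.

matrix = {
    "living-nonliving": {"mother": "+", "tiger": "+", "chair": "-", "fear": None},
    "human-subhuman":   {"mother": "+", "tiger": "-", "chair": None, "fear": None},
}

def dominates(f1, f2, m):
    # F1 dominates F2 just in case F2 is defined only for items having one particular value of F1.
    f1_values = {m[f1][item] for item, value in m[f2].items() if value is not None}
    return len(f1_values) == 1 and None not in f1_values

print(dominates("living-nonliving", "human-subhuman", matrix))   # True in this toy example
print(dominates("human-subhuman", "living-nonliving", matrix))   # False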
Exactly the same idea has the status of a corollary in Kelly's theory:
Range Corollary: A construct is convenient for the anticipation of a finite range of events only. (Kelly,
1955, p. 68):
one may construe tall houses versus short houses, tall people versus short people, tall
trees versus short trees. But one does not find it convenient to construe tall weather
versus short weather, tall light versus short light, or tall fear versus short fear. Weather,
light and fear are, for most of us at least, clearly outside the range of convenience of tall
versus short." (op. cit. p. 69).
Kelly is further quite explicit in differentiating between a contrast pole and what is outside the range of
convenience. When discussing the personal construct approach to understanding what for instance
respect may mean to a person Kelly points out that "we cannot understand what he means by
“respect” unless we know what he sees as relevantly opposed". "We do not lump together what he
excludes as irrelevant with what he excludes as contrasting." (op.cit. p. 71).
Miller (1969) explicitly links the limited relevance of features to hierarchical organization. While Kelly is
not equally explicit on such linkage he does have an organization corollary which is highly relevant to
hierarchical organization:
Organization Corollary: Each person evolves, for his convenience in anticipating events, a construction
system embracing ordinal relationships between events" (Kelly, 1955, p. 56).
Generally
"there may be many levels of ordinal relationships with some constructs subsuming
others and those in turn, subsuming still others. When one construct subsumes
another, its ordinal relationship may be termed superordinal and the ordinal relationship
of the other becomes subordinal”. (op.cit. p. 56-57) .
We see that what Miller calls a dominating feature Kelly chooses to call a superordinal construct.
"Higher concepts" will be used as a common term for “dominating features” and "superordinal
constructs", “lower concept” similarly for "dominated features" and “subordinal constructs”.
37. A difference which will not be elaborated here is that features are provided by the linguist. Kelly, however,
insists on eliciting the subject's own constructs, and he never claims that the illustrating examples he uses should
apply to persons generally.
Looking closer at Kelly's examples illustrating his organization corollary, there are, however, some
differences compared to Miller's treatment of hierarchical organization. Kelly distinguishes two types of
ordinal relations. One construct may subsume another by:
a) extending the cleavage intended by the other or
b) abstracting across the other's cleavage line.
As an example of a) good may subsume intelligent (and other things falling outside the range of
convenience of intelligent) while similarly bad may subsume stupid (plus things which are neither
intelligent nor stupid). It is not entirely clear how the type of structure a) implies is related to the kinds
of trees we have considered so far; this will be further commented on in Section 6.42.
The examples Kelly gives to illustrate b) raise some interesting issues. As one example he states that
the evaluative pole (of the evaluative-descriptive construct) may subsume the construct intelligent-stupid (and others) while the descriptive pole may subsume for instance light-dark. While this seems
formally similar to the type of hierarchical organization Miller discusses there is an interesting
difference. If for instance a dominated feature applies to a specific item the higher or dominating
feature will also apply. If vertebrate applies to a specific creature, then animal will apply as well.
But if intelligent applies to a specific creature, evaluative will not apply to this creature, likewise if dark
applies to a specific physical condition, descriptive will not apply to this condition. A construct
"abstracting across" in the sense implied by this example is clearly a construct applying to its
subordinal constructs but not to the events subsumed by the subordinal constructs.
One way to capture the difference between Miller's and Kelly's examples is to say that the former
show transitivity (F1 applies to F2, F2 applies to an item and F1 applies to the item) while the latter do
not show this transitivity (evaluative applies to intelligent, intelligent applies to person x, but evaluative
does not apply to person x).
This distinction between transitive and not transitive relations in hierarchical organization may clarify
an otherwise puzzling problem. It may be tempting to identify concepts at higher levels as somehow
indicating more "abstract" thinking than concepts at lower levels. From a taxonomic point of view,
however, fish is at a higher level than herring, yet few would say that pointing at a creature and saying
“there is a fish” is more abstract than saying "there is a herring". For not transitive relations, however,
the intuitive notion of more abstract thinking seems to apply to the superordinal construct, since it is a
metaconstruct, a construct about a construct.
According to Kelly such metaconstructs are of profound importance for understanding how persons
may change (cfr. Fragmentation and Modulation Corollary). In the present context we just outline two
kinds of change, particularly emphasized in the most important of the recent theoretical contributions
to personal construct theory, Hinkle (1965). There is first what Hinkle calls slot change, e.g. one may
shift from regarding self (or others) as subsumed under one pole to being subsumed under the
contrasting pole. The hierarchical nature implied by Kelly's corollaries does, however, imply
possibilities of what may well be regarded as "deeper" changes. As an example Kelly (1955, p. 82)
asks us to consider a person who once construed people around him under the construct fear –
domination, there are those to fear and there are those to dominate. But there may be a metaconstruct
childish-mature, where childish subsumes fear-domination and mature subsumes respect-contempt
and the metaconstruct may permit a change from applying the fear-domination construct to the
respect-contempt construct. This is referred to as a shift change by Hinkle. We may note that a shift
change for subordinal constructs corresponds to a slot change for a superordinal construct. The
person may say: "whereas I formerly had a childish outlook, I have now shifted to a mature outlook
concerning other people".
These two different kinds of changes will be further commented on in Section 6.42. Turning now to the
Rep Grid where the person provides constructs to describe figures, Kelly might have chosen to
explore the hierarchical implications of his theory. We have pointed to both the formal similarity of a
semantic matrix and a Rep Grid, and also important similarities between Miller's description of
hierarchical organization and some of Kelly’s corollaries. One necessary condition for exploring
hierarchical organization would then be that the range corollary was explicitly included in the
instruction to the subject. This would imply that he would be permitted to mark certain constructs as
irrelevant for certain figures. As a matter of fact, however, Kelly chose to elaborate the spatial
(dimensional) implications of his system and explicitly assumed that all figures would fall within the
range of convenience of all the constructs the subject provided. For a given construct a check mark is
taken to imply that one pole applies to the figure, a void implies that the contrasting pole applies. This
gives a "filled" grid, there are no intersects corresponding to the zeros in Table 1. Of course he was
aware that "this may not be a good assumption in all cases: it may be that the client has left a void at a
certain intersect simply because the construct does not seem to apply, one way or the other, to this
particular figure" (op. cit. p. 271).
This assumption precludes finding a hierarchical structure in Rep Grid. As we have seen, such
structure necessitates not only a sizeable number of irrelevances, but also a specific pattern of them
(cfr. the zeros in Table 1, p. 129, and Fig. 4, Ch. 5). Did Kelly violate his own theory in not including
implications of the range corollary in the Rep Grid? There has been some discussion of this, cfr. the
recent summary of Grid methodology by Bannister and Mair (1968). They do for instance point out that
when a subject conscientiously carries out the instructions for the Rep Grid "he may quietly produce
what, in terms of his personal system, is nonsense." (op.cit. p. 204).
The main point in the present context is not, however, to add to this discussion but rather to point out
that there are overriding considerations in Kelly's theory which may justify the choice he made. These
considerations bring us to the main point in the present section, the inadequacy of tree structure
models. There is one construct used to describe construction systems which gives us a clue to this
inadequacy, the construct propositionality. This construct in a way runs counter to the notion of range
of convenience: "a propositional construct is one which does not disturb the other realm memberships
of its elements … Although this is a ball, there is no reason therefore to believe that it could not be
lopsided, valuable or have a French accent." (op.cit. p. 157). It is as if Kelly recognizes that for most of
us anything with a French accent must fall outside the range of convenience of the construct ball, but
he refuses to necessarily accept such "constricted" thinking. Struggling for years with this type of
problem Bannister and Mair (1968, p. 129-130) have tried to exclude false teeth from the range of
convenience of religious-atheist, only to end up realizing "that false teeth are clearly atheist".
Balls with French accents, atheist false teeth - what do they have in common? These constructions
share the attempt to break away from strict, pedestrian semantic rules, in other words they point to
what makes language come alive, metaphors. Metaphors may recall a charming game, guessing who
a person is thinking about by way of metaphorical questions. "If he were a flower, what would he be?
Or what would be his emblem as an animal, his symbol among colours, his style among painters.
What would he be if he were a dish?" (Gombrich, 1965, p. 36). A fascinating aspect of this game is
that it really may work, "the task of the guesser is by no means hopeless" (op.cit. p. 36). Perhaps one
should not be surprised that this game is one of the bag of tricks used in encounter groups.
So it may not simply be the by now proverbial malleability of subjects in psychological experiments
which makes them comply by filling out a complete grid. The metaphorical quality of language, which
may make just about any construct apply to just about any figure, seems to be a better explanation for
the fact that it is usually not difficult to make a person fill out a complete grid.
And how to deal with metaphors? We should not forget that metaphors and related phenomena were
the point of departure for perhaps the most widely discussed example of a spatial model in
psychology, Osgood’s semantic space. Since this is not always recognized among the numerous
critics and commentators of semantic space, we quote from the introductory chapter in Osgood et al. (1957, p. 20-21):
"The notion of using polar adjectives to define the termini of semantic dimensions grew out of research
on synesthesia with Karwoski …..” Pointing to the general relevance of synesthesia for thinking and
language there were the observations:
whereas fast exciting music might be pictured by the synesthete as sharply etched, bright
red forms, his less imaginative brethren would merely agree that words like “red-hot”,
"bright" and "fiery" as verbal metaphors adequately described the music. The relation of
this phenomenon to ordinary metaphor is evident. A happy man is said to feel "high", a
sad man "low", the pianist travels "up" and "down" the scale from treble to bass: soul
travels "up" to the good place and “down" to the bad place: hope is "white" and despair is
"black".
Reminiscing on the growth of "semantic space” Osgood (1969, p. vii to ix) recalls his childhood
infatuation with words and “a vivid and colourful image of words as clusters of starlike points in an
immense space." He then expresses his deep gratitude to the inspiration provided by Karwoski’s work
on synesthesia and later recalls how "I was swept up into the monumental edifice of learning theory
that Clark Hull was building.” Osgood ends up identifying semantic space with a wayward Pinocchio.
After stating what he is not, there is the positive assertion: "he is…. primarily reflecting affective
meaning by virtue of the metaphorical usage of his scales."
One possibility might now be to say that tree structures have a limited range of convenience, as they
provide a model for part of the psychological lexical organization and that there is a different range of
convenience for spatial models, the latter being relevant for metaphorical and affective aspects of
language. But this does not seem satisfactory; there is Kelly, clearly straddling both horses.
Kelly might be said to deal with phenomena on a macro level - the large issues in personality theory. It
is very interesting to note that the dissatisfaction with an either/or approach to these types of models
which we have read out of Kelly also finds support in a recent theoretical framework for
psycholinguistics. When discussing phenomena at a micro level, Rommetveit (1968, 1972) describes
referential, associative and emotional aspects of the experience of words but one of his basic points is
that this in some way is an artificial and arbitrary division of one process, since the different
components mutually influence each other. Representational processes release associative and
emotional aspects and are also continuously influenced by associative and emotive impulses.
(Rommetveit, 1972, p. 75).
So it appears that neither a tree structure model nor a spatial one can be adequate for complex
human functioning. This motivates the outline of a more general model in the next section.
6.3. Outline of a general model.
The first step towards a general model is to embed classes in a multidimensional space. Struggling
with the general problem of geometric representation of complex structures Attneave (1962, p. 638)
stated: "if a multidimensional psychophysical space is taken as the fundamental framework then
classes (e.g., of objects) may be conceived as regions or hypervolumes in that space." A very
significant further step was taken by Torgerson (1965) who suggested a variety of structures all of
which shared the characteristic that they violated the assumption of a filled space. In the present
terminology he suggested a variety of types of partially filled spaces. It is especially important to note
that he reported some experiments where the stimuli were constructed to reflect both qualitative (e.g.
sign of asymmetry) and quantitative (aspects of size) dimensions and the results supported
interpretation in terms of a partially filled space.
The most important of the structures Torgerson suggested is a mixture of class and dimensional
structures. This gave rise to his highly interesting contribution to a symposium on Classification in
Psychiatry where he refused to fall prey to the dichotomy between “dimensionalists” and "typologists"
(Torgerson, 1968). He (op.cit. p. 219) suggested that similarity between patients may be determined in
part by class membership "and also in part by degree of difference on one or more quantitative
dimensions that cut across class boundaries. This would occur, for example, if some of the variables
were sensitive to overall degree of disturbance, regardless of the type of disturbance involved". The
resulting structure is not too difficult to visualize. First we note that e.g. 3 classes will be represented
as three points (or tight clusters) in a two dimensional space. In the simplest case the 3 classes will be
represented as corners of an equilateral triangle. If we now add a dimension (e.g. overall disturbance) to class membership, "the points would … be located in a three-space, but only on the three lines
corresponding to the edges of a triangular prism" as in Fig. 2.
Fig. 2. Representation of a mixed class and dimensional structure, 3 classes and one continuous
dimension. Adapted from Torgerson (1968, p. 219).
With 1 quantitative dimension we get clusters organized as lines, with 2 quantitative dimensions we
get clusters organized as planes etc.
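A small numerical sketch may help here (the coordinates below are arbitrary assumptions, used only for illustration): three class points are placed at the corners of an equilateral triangle, and one global quantitative dimension runs along a third coordinate, so that the resulting points are confined to the edges of a triangular prism - the layout of the 15-stimulus experiment mentioned below.

import numpy as np

# Three "class" positions: corners of an equilateral triangle in the plane.
classes = np.array([[0.0, 0.0],
                    [1.0, 0.0],
                    [0.5, np.sqrt(3) / 2]])
levels = np.linspace(0.0, 1.0, 5)   # five levels on one global quantitative dimension

# Each stimulus is a class corner plus a level on the quantitative dimension:
# the 15 points lie only on the three vertical edges of a triangular prism.
points = np.array([[x, y, z] for x, y in classes for z in levels])
print(points.shape)                 # (15, 3)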
A purely dimensional interpretation of such structures would be highly misleading, since the basic
feature, the mixed structure, would then be lost. This should not, however, deter us from applying
multidimensional methods, since as Torgerson (1965) emphasized a multidimensional space in
principle can embed any kind of structure, spatial or not. Adherence to a conventional dimensional
framework is then of course objectionable but "we can think about - and look at - the shape of the
configuration itself" (op.cit. p. 390). But we may recall from Ch. 1 that our capacity to "look" is severely
restricted; this is why Shepard (cfr. p. 18) stressed the need for "artificial machinery" - or data
reduction models. Consequently it is a very important step when one of Torgerson's pupils, Degerman (1970),
described an algorithm for revealing "mixtures of class and quantitative variation".
Before giving a rough outline of the steps in his algorithm it may be helpful to describe the types of
experiments that he reported which conformed to his model. There were first “15 stimuli composed of
three classes (triangle, circle, square) varying in five levels of grey (brightness)", second an
experiment with “20 stimuli composed of four classes (triangle, circle, square, cross) varying in 5 levels
of brightness”. The third experiment contained three shapes and two quantitative dimensions, both
brightness and shape (op.cit. p. 484). For all experiments the subjects made judgments of dissimilarity
for each pair of stimuli. An important feature which these experiments share with the example from
Torgerson is that the quantitative dimensions apply to all the objects in the set; we have what will be
called global dimensions.
The first step in Degerman's algorithm is to perform a multidimensional scaling of similarities data
which gives an (n, k) coordinate matrix. n is as usual the number of points and k the total number of
dimensions. The basic purpose of the procedure is to partition the k space into two orthogonal
complements, q dimensions for q quantitative dimensions and k-q dimensions for k – q + 1 classes. In
experiment 1 above we would for instance expect a total of 3 dimensions, q = 1 quantitative dimension
and 2 dimensions (3-1) for 3 (3-1+1) classes. In the second experiment we would expect k = 4, q = 1
and k - q = 3 for 4 classes. Also in the third experiment we would expect k = 4, q = 2 and k - q = 2 for
3 classes.
First values of k and q must be preset and the next step is then to identify the k – q + 1 clusters. The
special case of q = 0 is the basis for previous cluster programs; this corresponds to a simple
classification (a nominal scale). The problem faced by Degerman is of a far greater complexity. For
q = 0, the clusters are organized around points, but for q = 1 around lines, etc. as already mentioned.
The task is performed by first calculating a set of what he calls hyperplanar coefficients. For the case
of q = 1 such a coefficient is computed for all 3-tuples (generally (q + 2)-tuples). This coefficient is
simply an index of the extent to which the 3 points fall on a straight line, or more concretely the length
of the perpendicular from the longest side to the opposite vertex. Generally a hyperplanar coefficient is
the "minimum distance from a vertex to the opposite face of a simplex" (op.cit. p. 480). The
hyperplanar coefficients will be small for all points belonging to the same cluster and large for points
belonging to different clusters. The next step is then to use an iterative procedure to identify the
members of each of the k – q + 1 clusters from the set of hyperplanar coefficients. 38
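For the q = 1 case the coefficient can be illustrated by a small sketch (Degerman's own equation (7) is more general and covers arbitrary q; the sketch below is only an illustration of the idea): the coefficient of a 3-tuple is the length of the perpendicular from the longest side to the opposite vertex, i.e. the smallest altitude of the triangle, which vanishes when the three points are collinear.

import numpy as np
from itertools import combinations

def hyperplanar_coefficient(p, q_, r):
    # Smallest altitude of the triangle p, q_, r: close to zero when the three points are nearly collinear.
    a, b, c = np.linalg.norm(q_ - r), np.linalg.norm(p - r), np.linalg.norm(p - q_)
    longest = max(a, b, c)
    if longest == 0.0:
        return 0.0
    s = (a + b + c) / 2.0
    area = np.sqrt(max(s * (s - a) * (s - b) * (s - c), 0.0))   # Heron's formula for the area
    return 2.0 * area / longest                                 # perpendicular onto the longest side

def all_coefficients(X):
    # One coefficient per 3-tuple of rows of the (n, k) configuration X.
    return {t: hyperplanar_coefficient(*(X[i] for i in t)) for t in combinations(range(len(X)), 3)}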
Finally a fairly complex rotational procedure is used to separate the class space from the quantitative
dimensions. In terms of Fig. 2 the prism would be tilted to stand squarely on the plane. The first two
dimensions would then reveal the class structure; here we would get 3 sets of superimposed points.
Plotting for instance the first against the third (the quantitative) dimension we would get the points on
three parallel vertical lines.
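Degerman's rotational procedure itself is fairly involved; the following sketch conveys only the geometric idea and is not his procedure: assuming the cluster memberships have already been identified, the class space can be taken as the span of the centred cluster centroids, and the quantitative subspace as its orthogonal complement. For a configuration like Fig. 2 this yields two class-space coordinates (three superimposed clusters) and one quantitative coordinate.

import numpy as np

def separate_class_and_quantitative(X, labels):
    # X: (n, k) configuration; labels: cluster membership of each point (assumed already identified).
    labels = np.asarray(labels)
    centroids = np.array([X[labels == g].mean(axis=0) for g in np.unique(labels)])
    centred = centroids - centroids.mean(axis=0)
    # Orthonormal basis for the space spanned by the centred centroids: the "class space".
    U, s, _ = np.linalg.svd(centred.T, full_matrices=True)
    rank = int(np.sum(s > 1e-8))
    class_basis, quant_basis = U[:, :rank], U[:, rank:]
    return X @ class_basis, X @ quant_basis   # class-space and quantitative-space coordinates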
This mixed model clearly subsumes the usual dimensional models as one special case. This special
case corresponds to setting q = k in Degerman's model; we then have the space organized in terms of
a single cluster. Furthermore a class structure is also clearly subsumed; this corresponds to q = 0.
The model is, however, not more general than a tree structure model. These two models are not
comparable in generality since neither one of them can subsume the other. We do, however, think that
it may be possible to extend Degerman’s model to a level of generality where it will subsume a tree
structure. As the next section shows, however, we do not underestimate the complexity of this task.
One way to generalize Degerman's model is to start by checking whether it is reasonable to set
q = 0. Consider now a case where in some way this is found tenable. We note (op.cit. p. 477) that
“when small amounts of random error are present, the class members disperse somewhat, and the
prototypal clusters for nominal classes resemble (k-q) dimensional spheroids centered at the vertices
of a simplex". It is now possible to conceive of each of these k + 1 spheroids (when q = 0)
as separate galaxies in the total universe. Technically the key concept is now recursivity. For each of
the separate galaxies the generalized procedure can be applied again. For galaxy i there will be the
choice as to whether qi should be set equal to 0 or not. If qi = 0 we get a set of subgalaxies. Again
each subgalaxy may be treated in exactly the same way as first the total universe then a galaxy. If now
for each galaxy, subgalaxies within galaxies etc. it is reasonable to set the corresponding q = 0 the
universe is described as a tree structure. The analysis then produces classes, further classes within
classes etc. and this is just what a tree is.
This general model will, however, be more interesting when q is not set to 0 for each galaxy and each
subgalaxy. Suppose that first it is decided to set q = 0 but that in galaxy i, qi is greater than 0.
Degerman's algorithm is then applied just to this galaxy and produces qi quantitative dimensions. Note
that these dimensions only apply to galaxy i and not to the whole universe.
In contrast to the global dimensions in Degerman's model, we get the possibility of local dimensions
in the generalized model. When q is first set to 0 this rules out global dimensions. The presently
proposed generalization of Degerman's model can, however, accommodate both global and local
dimensions. The previously referred to (k - q) dimensional spheroids may also be analyzed again by
Degerman's model and a single class might then turn out to have a complex substructure, perhaps
comprising both further classes and local dimensions.
In principle the concept of local dimensions solves a problem which worried Attneave (1962) as to the
applicability of multidimensional scaling. As he expressed the difficulty:
The concept of a psychological space of many dimensions, in which virtually any object
may be represented, runs into several difficulties of a still more fundamental nature [than
the previously referred to which deals with the Minkowski constant]. Only one of these
need to be discussed here: the problem of relevance. Consider the kind of dimensions
that might be important for the representation of a human face: e.g., height of forehead,
distance between eyes, length of nose etc. Now where, on such dimensions, is an object
like a chair located? We cannot say that the distance between the chair's eyes is
“medium”, nor that it is "zero": since the chair has no eyes, any question about the
distance between them is completely irrelevant. This is to say, in geometrical terms, that
a face and a chair belong to different representative spaces (or to partially overlapping
spaces), rather than to different regions of the same space". (op.cit. p. 632).
38. It is interesting to note that one of the procedures used by Johnson (1967) for noisy data is what he calls the
"connectedness" (or minimum) method (cfr. p. 191) which will identify "elongated", chainlike, clusters. This is
somewhat similar to the more refined procedure used by Degerman.
In our proposed general model "distance between eyes" would be a local dimension and a human face
would belong to a different galaxy than a chair. Galaxies are not just “different regions of the same
space”, there are not necessarily any “bridges” or “dimensions” spanning the galaxies, the space in
our general model is to an even greater extent than in Degermann's model an unfillable - or only
partially filled space.
The remainder of this chapter is devoted to difficulties in implementing such a general model as we
have sketched. The difficulties are of two sorts. First there are technical problems, some of which are
hinted at in Section 6.41. Perhaps a more serious shortcoming is that as yet we have no concrete
example where the full generality of the model will be useful. This we think partly reflects a lack of
theoretical sophistication in cognitive and social psychology. In Section 6.42 we offer some
speculations which perhaps in the future may lead to fruitful research.
6.4. Comments on the general model.
The points taken up in this section do not add up to any coherent overall picture. Specifically the
issues raised in Section 6.41 will be seen to be of a quite different kind from those taken up in Section
6.42.
6.41. Some technical problems. The metamodel and the general model.
Three points will be raised in this subsection. First we make more explicit the recursivity required and
the problems to be solved in order to implement the general model. Second we comment on the
relation between the general model and the metamodel and third we comment on nonmetric methods
in relation to the general model.
As a general term encompassing the previous terms "universe" and "galaxies", we use simply
"spheroid". A spheroid here denotes a set of points the detailed structure of which is to be decided.
"Degerman separation" will refer to the basic feature of Degerman's algorithm, separating a k
dimensional space into a q dimensional quantitative subspace and a k - q dimensional class space
(k – q + 1 classes).
The following steps describe an outline of the general model:
1) SPHEROID ANALYSIS
2) test if structure in the spheroid (versus just noise)
   a) if YES then go to 3)
   b) if NO then STOP
3) estimate q (and k)
4) DEGERMAN SEPARATION
   4.1 print out the quantitative subspace (if q greater than 0)
   4.2 test if q less than k
   a) if YES then RECURSIVE call of 1) for each of k – q + 1 spheroids
   b) if NO then STOP
For each analysis the recursive aspect will generate a process tree; furthermore the resulting output
can be described by a family tree, cfr. Eckblad (1971a, 1971b) for a further discussion of these
concepts.
Some examples will illustrate the steps a general program will go through for various special cases
previously discussed:
Type of example - Main steps gone through
pure tree: [2a), 3) q = 0, 4.2a)] Repeat [ ] until 2b).
pure class structure: 2a), 3) q = 0, 4.2a), 2b).
pure dimensional str.: 2a), 3) q = k, 4.2b).
Degerman mixed str. (only global dimensions): 2a), 3) k > q > 0, 4.2a), 2b).
not global, but local dim.: 2a), 3) q = 0, 4.2a), 2a), 3) q > 0, etc.
both global and local dim.: 2a), 3) q > 0, 4.2a), 2a), 3) q > 0, etc.
A small difficulty is that such an outline is difficult to implement in FORTRAN, since recursivity is
generally not possible in that language (there are other programming languages, e.g. ALGOL, where
recursivity is no problem). A greater difficulty will be to devise appropriate tests of structure (step 2)
and of dimensionality (step 3).
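In a language that permits recursion the outline can be expressed quite directly; the following is a minimal sketch only, in which test_structure, estimate_q_and_k and degerman_separation are hypothetical placeholder routines standing for steps 2), 3) and 4), which as just noted remain to be worked out.

def analyse_spheroid(points):
    # step 1) SPHEROID ANALYSIS
    if not test_structure(points):                 # step 2): structure versus just noise
        return {"stop": "no structure"}            # 2b) STOP
    q, k = estimate_q_and_k(points)                # step 3)
    quantitative, spheroids = degerman_separation(points, q, k)   # step 4)
    result = {"q": q, "k": k}
    if q > 0:
        result["quantitative subspace"] = quantitative             # 4.1
    if q < k:                                      # 4.2
        result["spheroids"] = [analyse_spheroid(s) for s in spheroids]   # 4.2a) recursive call
    return result                                  # 4.2b) q = k: STOP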
Concerning step 2) one simple condition would be that if the number of points is less than some
specified number, search for structure will be impossible and this will provide one simple STOP
condition (incidentally this may preclude the general model from describing a complete tree structure
as it may well be impossible by this approach to detect further structure in say 4 points). Otherwise
step 2) will probably prove difficult to implement since how can the program "know" if there is structure
without going through a complete analysis? If a complete analysis should be necessary in step 2) we
would seem to be involved in a too messy recursivity since it is difficult to envisage stop conditions.
Consequently it appears likely that it will be necessary to use heuristic devices - inspired by computer
work in artificial intelligence - to implement something resembling a general test of structure (perhaps
some general measure of amount of information may also be useful).
Heuristic devices may also be necessary in order to find appropriate estimates of q. If q must be
preset (as in Degerman's algorithm) it will not be possible to implement the recursive loops implied by
the general outline, since q necessarily must be reestimated as we move away from the root in the
process tree generated by the general program. The same holds good for k. One simple rule here will
be that if for instance spheroid i after step 4) contains ni points, then the corresponding k when the
program next enters step 3) must be less than ni.
It may also be possible to estimate k as part of step 4). The basic function of k is to regulate the
number of classes (spheroids) into which the current number of points inputted to step 4) is to be
partitioned. It may be of interest to note that for the special case of q = 0 the general formula for a
hyperplanar coefficient (Degerman, 1970, p. 480, equation (7)) can be shown to reduce to the
distance between two points. For this case there exists a variety of clustering procedures which also
estimate the number of classes, cfr. for instance Tryon and Bailey (1970). Perhaps some features of
"standard" clustering procedures can be incorporated into Degerman's more general clustering
procedure.
We may also note that there are a variety of partial generalizations of Degerman's algorithm which
should be fairly simple to implement but which fall short of the full recursivity described above. A
simple example would be to set q = 0 and then again to input each of the ni spheroids (with ki
computed as some specified function of ni) to Degerman's algorithm. In this second set of runs one
might for instance set qi = 1 for each of the spheroids if one guessed that there would be one local
dimension in each "subgalaxy".
Suppose now that in one way or the other it proves possible to implement the suggested
generalization of Degerman's model; how should we then conceptualize this general model in
relation to the metamodel? In Ch. 1 we treated a tree structure as one type of model and a
dimensional (spatial) model as another type of model, and partly on this background made applicability
of model one of the major problems to be investigated.
Type has till now been a more general concept than form (e.g. different dimensionalities), but in the
general model a tree structure model and a purely dimensional model must both be regarded as
different forms of the same type. There is no longer seen to be such a “fundamental” difference
between a spatial and tree structure model as to warrant a distinction in “types”. We can now regard
these two models as characterized by different specifications of parameters within the same type of
model. In principle then what we in Ch. 1 referred to as deciding which type of model is appropriate
now reduces to a question of justifying a specific set of parameters within one type of model (It may of
course be possible to conceptualize different “types” from our presently proposed "general model",
again further levels of abstraction will be possible and this will then reduce such types to alternative
forms within a still more general type, etc.).
From this point of view the strategy we illustrated in Section 4.7 for determining dimensionality may be
relevant to step 3) in the outline of the general model, though as indicated step 3) is probably of a far
greater complexity. Likewise one of the major concerns in Section 4.6 was to illustrate a methodology
to test whether there was structure in a material or not, this may be relevant to step 2). Perhaps the
present methodology may be incorporated into - or at least inspire - the heuristic strategies which are
necessary to provide a workable general model.
There is finally one more comment to make on the relation between metric and nonmetric methods
which adds to the investigation reported in Section 4.33. It will be recalled that we came out strongly in
favour of nonmetric models, though we took cognisance of a finding reported by Torgerson (1965)
that nonmetric models may distort data. The structure in the stimulus material reported by Torgerson
appears to be somewhat similar to the structure of the stimuli employed by Degerman (1970).
Consequently it is somewhat surprising that Degerman uses nonmetric multidimensional scaling to
provide the initial (n, k) configuration. Be that as it may, if the distance matrix from a tree structure is
used as input to a nonmetric program, most of the structure will be lost. This substantiates the claim
made by Torgerson (1965, p. 389) that nonmetric models may throw away information in the data. To
take a simple example, suppose we have n = 8 and the tree at each node forms subsets of equal size; then
any nonmetric program will give perfect fit in one dimension. This dimension will consist of two
clusters, each with four superimposed points. The perfect one-dimensional fit for the tree implies that
there will be the same values of ∆ (cfr. Section 3.1) for different values of the dissimilarities.
This finding is not surprising for MDSCAL and TORSCA since it is consistent with weak monotonicity.
The finding may, however, be surprising for SSA-1 since it clearly violates strong monotonicity.
Evidently the “strong monotonicity” in SSA-1 is not "sufficiently" strong to avoid degenerate solutions.
It should not be thought that this result is unique to a particular type of a tree structure. To take a quite
different example, if for each node subsuming k points subsets of points are formed which consist of
k-1 and 1 points respectively (k = n, n – 1, …, 2), there will again be a perfect one-dimensional fit with
one point in one cluster and n-1 superimposed points in the other cluster. The former example is
constructed by an "even split" principle, the latter by a "maximally uneven split" principle.
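To make the even split example concrete, a small sketch follows (an illustration only): it constructs the ultrametric dissimilarity matrix for n = 8 points generated by a balanced binary tree. There are three distinct dissimilarity values, yet a one-dimensional configuration with two clusters of four superimposed points gives the same distance (zero) for the two smallest values and a single larger distance for the third - which weak monotonicity permits, and which therefore yields zero stress.

import numpy as np

def balanced_tree_dissimilarities(levels=3):
    # Ultrametric dissimilarities for n = 2**levels leaves of an "even split" binary tree:
    # the dissimilarity of leaves i and j is the number of levels up to their lowest common ancestor.
    n = 2 ** levels
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            D[i, j] = (i ^ j).bit_length()
    return D

D = balanced_tree_dissimilarities(3)             # n = 8
print(np.unique(D[np.triu_indices(8, k=1)]))     # three distinct values: 1, 2 and 3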
This finding39 clearly implies that if there really is a tree structure in the data, it will not be revealed
if the analysis starts with nonmetric scaling. On the other hand
it will be recalled from Section 5.3 that a l∞ metric was required to give a coordinate representation
isomorphic to tree structure (a tree grid matrix). Numerous Euclidean metric analyses of similarities
data generated by tree structures (the details of these analyses will be reported elsewhere) have,
however, shown that it is not difficult to recognize the tree grid matrix if the underlying tree is based on
the even split principle. The more the tree departs from this (approaching maximally uneven split) the
more difficult it is to recognize the tree in the Euclidean representation (but perhaps not impossible).
The preliminary conclusion we draw from this is that in the general model it may not be possible to use
the standard nonmetric models. (A possibility which has not been investigated is that the reported
degeneracies may be less pronounced with data slightly infested with noise, though the possibility does not
a priori appear reasonable).
39. That tree structures are not captured by nonmetric methods may be said to elaborate what Shepard (1962b, p.
249) stated, that any nonmetric model would fail to reveal structure within classes characterized by "proximity
measures for all pairs of points within the same subset larger than the proximity measures for all pairs divided
between two subsets".
6.42. The general model as a conceptual model: new directions for psychological research.
Degerman (1970, p. 486-487) distinguishes between the value of his model as a tool for determining
structure in empirical data and as a "general conceptual model", an “aid in forming a framework for
experimentation, thereby allowing relevant questions to be asked”.
This section raises some problems which - when clarified - may suggest that our proposed general
model may be a powerful tool as a conceptual model. First we note that similarities data may be
inadequate if we wish to explore the potentialities of the general model. We also consider other types
of data. The focus will, however, not be types of data per se, but rather some theoretical problems.
Since dimensional models are fairly well understood, this section concentrates on the less explored
nature of hierarchical relations. What are the types of relations in hierarchical systems and how should
one go about assessing these types of relations?
These problems will have to be clarified before one can attempt to combine tree structures and
dimensional structures in the general model.
In Section 6.2 we distinguished between two types of relations in hierarchical systems, transitive and
not transitive relations. Not transitive relations were related to metaconstructs, which again point to
different levels of abstraction.
In the signal contribution by Bateson (1955) the notion of different levels of abstraction is basic.
Bateson has had a tremendous impact on clinical psychology and his students have provided the most
viable (and probably a superior) alternative to psychoanalysis and behaviour modification techniques,
cfr. e.g. Bateson et.al., (1956), Haley (1963, 1969), Watzlawick et.al. (1967), Weakland et. al, (1972).
One of the main contributions of Bateson (1955) was to describe a third type of relation in hierarchical
systems: he described intransitive relations in communication which simultaneously takes place at
different levels of abstraction. His main example is an analysis of the message "This is play" on the
basis of observations of monkeys at the Fleishhacker zoo in January 1952.
Somewhat simplified, his analysis (cfr. especially op.cit. p. 41) goes as follows:
a) the playful nip denotes a bite
b) a bite denotes intention to hurt,40 but
c) the playful nip does not denote, but rather denies intention to hurt.
a) and b) logically should imply (by transitivity) intention to hurt, but the essence of "this is play" is a
denial of this logical pattern.
An interesting example of communication which is at least partly characterized by the same structure
as "this is play" is the enchanting behaviour of flirtation. Briefly "this is flirtation" appears to have the
following structure:
a) the special smile (tone of voice, glance etc.) denotes a sexual approach
b) a sexual approach denotes intercourse, but
c) the special smile does not denote but rather denies intercourse,
and again a) and b) "logically" imply the reverse of c) and we all know that "violating" the transitivity
has its special charms.
The label "intransitivity" for the patterns described above is only incidentally used by Bateson,
basically he discusses "this is play" in relation to paradoxes and to Russell's theory of Logical types.
40. Bateson chooses not to specify what a "bite" denotes, but using a specific label for this makes it much simpler
to illustrate his basic idea.
We share Bateson's scorn for the logician's attempts to rule out messages such as those above as
inadmissible since: "without these paradoxes the evolution of communication would be at an end. Life
would then be an endless interchange of stylized messages, a game with rigid rules, unrelieved by
change or humour." (op.cit. p. 51).
On the other hand it is not difficult to see that the intransitivity in the examples above may be
precarious indeed: play may turn into dead serious fight, flirt may lose its special flavour and be
replaced by "transitive" behaviour.
When we ask about methods which may be directly useful in assessing types of relations in cognitive
structures we shall see that this raises new questions about perhaps yet other types of relations in
hierarchical systems. A particularly significant contribution is Hinkle's innovations in Kelly
methodology.
As noted in Section 6.2 a usual Rep Grid does not incorporate the hierarchical aspects of Kelly’s
theory. If one, as Bannister and Mair (1968) advocate, uses a rating form instead of dichotomous
marking, a Grid becomes formally identical to a semantic differential, the only difference being that in
the former case the subject provides his own constructs, whereas in the semantic differential the
scales (corresponding to constructs) are provided. To further emphasize the similarity, we note that in
some cases it may be profitable to combine personal constructs and provided constructs, cfr.
Fransella and Adams (1966).41 So Rep Grids may, just as semantic differentials, epitomize the
dimensional approach.
The basic methodological innovation by Hinkle (1965) is the laddering technique. Having provided
constructs to characterize triads where the self is always one of the figures, the subject is further
asked to state the preferred pole of each construct. Call this A1, the other pole A2. He is then asked to
provide superordinate constructs by answering "what is the advantage of A1, versus the disadvantage
of A2". This provides a new construct with poles B1 and B2. The same procedure is repeated on B1 and B2 till the
subject has no further constructs to provide. Neither Hinkle (1965) nor Bannister and Mair (1968)
provide concrete examples of results of such ladders for several constructs for a single subject;
an example may be useful to appreciate the special quality of the method.42
Only parts of some of the ladders from the preferred ends are outlined:
make exciting food - gives a richer life - self actualization
enjoy a drink while discussing - alcohol liberates - avoids standard norms - liberates my potentials
not having ceramics as a hobby - concentrate on other interests - experience oriented - gives more genuine relations to others - self actualization
wish to be interested in politics - find one's own position - find what is true for me - be independent - liberate potentials - self actualization.
The example illustrates how one person from seemingly quite different points of departures arrives at
the same "root".
Hinkle's technique may be extended to provide a valuable clinical tool. Basically one may regard a
symptom (or complaint as Kelly would have said) as a kind of behaviour which has undesirable
consequences for the person: the symptom is a construct with a desired contrast (e.g. "anxiety" vs.
“freedom from anxiety"). On the other hand there will also be advantages of the symptom, as it
provides ways of controlling one's environment (cfr. Haley, 1963) and correspondingly there are
disadvantages of being free from the symptom. Hinkle (1965, p. 18, p. 57) mentions - without exploring
further - implicative dilemmas, that is situations where both poles of a construct have both desirable and
undesirable features. “Implicative dilemma” conveniently captures the essence of the above outline of
41. Continued use of the dichotomous form of the Rep Grid is not recommended, since dichotomies are not well
suited to reveal dimensional structures. It should be noted that in 1966 Kelly would - if having to rewrite his
personal construct theory - have deleted the section on the Rep Grid, cfr. Hinkle (1970, p. 91).
42. The example and the general comment on the method are based on pilot data collected in a course in the fall of 1970.
symptomatic behaviour. It may be possible to provide (more or less) standardized procedures to
reveal such implicative dilemmas by systematically eliciting advantages and disadvantages both of the
symptom and the contrast to the symptom. This is further treated by Tschudi and Larsen (1970) and
Larsen (1972, Ch. 7), a major point being that revealing implicative dilemmas immediately suggests
therapeutic techniques.
In working further with a variety of examples of these kinds of techniques it is possible that one might
want to use e.g. a tree structure (or perhaps some different kind of model?) not as a descriptive tool,
but rather as a normative model. The model will then be regarded as a baseline, and the deviations
will be of interest in themselves. (This is the case with for instance the use of expected utility as a
model for decision making. The model is not rejected when it does not fit the data, rather the
deviations are regarded as giving important information on information processing, cfr. e.g. Edwards
et al. (1965). It should be clear that we would not necessarily give opprobrious labels to deviations
from logical patterns as the analysis of flirtation makes evident.)
One limitation of both Hinkle's laddering and methods for revealing implicative dilemmas is that these
methods mainly seem to "extend cleavages" rather than "abstract across", cfr. p. 131. From the point
of view of the Bateson group the latter concept seems far more relevant for an understanding both of
psychopathology and therapy. A recurring theme is that paradoxes (e.g. "double binds"), which
necessarily involve different levels of abstraction, are involved both in producing and alleviating
symptomatic behaviour. Consequently the most fruitful way for integration of the work of those who
draw their main inspiration from Kelly and those belonging to the Bateson group, would be to study
what we called “metaconstructs." One could for instance ask a person to sort the constructs he
provides in a Rep Grid and if possible to further sort his metaconstructs. This in essence would be to
ask the person to comment on his construct system. This may well be possible to wed to therapeutic
techniques.
At one point in such a procedure the person may perhaps come to realize "this is the way I have
regarded myself and presented myself to others" (a core construct in Kelly's terminology), "but there
may be different ways….” In Hinkle’s terminology this procedure may facilitate "shift changes", (cfr. p.
131) or change in metaconstructs. At this point it seems appropriate to quote the definition of therapy
given by Bateson (1955, p. 49) "therapy is an attempt to change the patient's metacommunicative
habits". This definition has not been improved or challenged by his students.
Before concluding a general precaution is necessary. All the methods we have mentioned in this
section share one basic feature: the results will be highly sensitive to the interpersonal context in which
the methods are used. The most striking impression from reading students' reports on the laddering
techniques was the wide range of comments on the meaningfulness of the method. This ranged from
"mere verbal exercise" to "deep (occasionally shocking) and highly revealing confrontation with one's
most personal values." It does not seem unreasonable to ascribe at least part of this variation to the
varying interpersonal relations.
Stated otherwise there are ample opportunities for arranging situations where the person from most
points of view will just produce nonsense.
This touches a basic issue which can not be elaborated in the present context:
We do not regard cognitive structure as something just “residing in the mind” but rather as strategies
the person may or may not choose to reveal in specific situations. One might even regard various
cognitive structures as (partly) generated by specific situations. This point of view may in the future
pave the way for experimental manipulations (therapy may be regarded as manipulation of strategies
on an intuitive basis).
Returning finally to our general model the strategy we propose is first to experiment with a variety of
methods of the sort proposed by Hinkle, and also to consider further innovations. The first goal should
be to clarify the nature and occurrence of various types of relations in hierarchical construct systems.
Dimensional aspects of construct systems may best be revealed by the Osgood - Kelly type of ratings.
Perhaps relations between dimensional and hierarchical aspects will be different according to the
prevailing type of hierarchical relations. Will we for instance find global (e.g. evaluative) dimensions in
hierarchical systems with mainly transitive relations? Will there only be local dimensions if there is a
preponderance of not transitive (meta) constructs?
Probably it is premature at present even to suggest such kinds of questions. A more immediate goal is
to call for close collaboration between the Bateson and Kelly groups of researchers who, if at all
aware of the other group, have done nothing more than pay a passing tribute to each other. Such
collaboration, we believe, will increase the likelihood of developing viable general models. And these
models will hardly be less complex than the model we tried to sketch in Sections 6.3 and 6.41.
CONCLUDING REMARKS.
There is nothing to add to Part II, so these remarks only concern Part I. First we briefly summarize the
main results and point to further extensions of the present approach. Finally critical views on the
concept 'latent' - or 'true' -structure are discussed.
The present work may be regarded as an extension of the basic contributions of Shepard; furthermore
the work of Kruskal and Young has also been of invaluable help. Perhaps Young should be given
credit for first explicitly formulating the goal of replacing goodness of fit (stress) with true fit. We believe
that making more explicit the theme of latent structure (where indirectly the work of Lazarsfeld has
been a valuable source of inspiration) and the related concept purification has made it possible to
show that the goal formulated by Young can be reached. Since stress - apparent fit - is heavily
influenced by irrelevant parameters such as n and t (dimensionality), any general description of how to
evaluate this index (notably Kruskal's widely quoted description) is inadequate. We think that the
presently proposed true fit index will prove to be a superior alternative.
The value of the conceptual framework we have formulated - the metamodel - is perhaps more clearly
revealed in the present approach to the problem of dimensionality. We go beyond the plea for showing
reliability of the output configuration, G - cfr. Cliff (1966) and Armstrong and Soelberg (1968) - by
showing that computing this reliability, r(Gim, Gjm), for several different dimensionalities quite simply
will reveal the correct dimensionality, since for this dimensionality the reliability will be largest.
Generally we have further shown that the output is more reliable than the data - which has not been done previously. This we have labelled empirical purification.
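A rough sketch of this idea follows (an illustration only: the scipy and sklearn routines merely stand in for the scaling programs of Part I, and the Procrustes disparity stands in for the reliability index r(Gim, Gjm), which is not reproduced exactly here): two replications of the same data are scaled at several dimensionalities, and the dimensionality at which the two output configurations agree best is taken as the candidate for the correct dimensionality.

from scipy.spatial import procrustes
from sklearn.manifold import MDS

def pick_dimensionality(D1, D2, max_dim=5):
    # D1, D2: dissimilarity matrices for the same objects from two replications of the experiment.
    disparity = {}
    for m in range(1, max_dim + 1):
        mds = MDS(n_components=m, metric=False, dissimilarity="precomputed", random_state=0)
        G1, G2 = mds.fit_transform(D1), mds.fit_transform(D2)
        _, _, disparity[m] = procrustes(G1, G2)   # small disparity = the two outputs agree well
    return min(disparity, key=disparity.get)      # dimensionality with the most reliable output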
The basic idea in the present approach is that true fit will be optimal when the analysis is guided by
correct specifications (e.g. dimensionality). Not only will this lead to maximal empirical purification, the
basic idea has also made it possible to replace inspection of stress curves (and e.g. the use of the
“elbow criterion”) by the simpler criterion of converting each stress value to a true fit value and then
finding the correct dimensionality by simply picking the lowest (optimal) of the converted true fit values.
The concept purification is made more valuable by the fact that distortion is also possible. In the
present work we have not shown conclusively that empirical distortion may occur for a broad class of
situations, but this is due to the fact that Part 1 is restricted to exploring dimensional models. It might
here be mentioned that in studies to be reported in detail elsewhere we have found that if a tree
structure model is used to analyse data generated by a dimensional model, then marked empirical
distortion will occur.43 The results reported in Section 4.7 do, however, indicate that if n is not too low
and t not too high, then for most realistic levels of retest reliability there should be a pronounced
empirical purification. Consequently there may be conditions where, even if there is some empirical
purification, this may not be sufficient for placing confidence in the underlying model. The converse of
the basic idea is that if the wrong type of model is used we can not expect optimal true fit, but must
expect either empirical distortion or pronounced deviations from the estimated theoretical purification.
Stated otherwise we believe that one of the main contributions of the present approach is that it
provides a new approach to the problem of applicability --the evaluation of the underlying model.
Models are not regarded as theoretically neutral - on the contrary they are regarded as carrying
implications which may or may not be warranted for a given set of data. So the present approach may
also be seen in the broader context of evaluating theories.
A point which perhaps should have been more emphasized is that these results are of obvious value
not only for the interpretation of results but also, of equal importance, for the planning of experiments. The
user may first study Section 3.4 and find out the desired level of true fit. Figs. 10 to 12 will then
indicate the set of combinations of n and reliability which will be necessary. If stringent tests of
dimensionality and applicability are also desired, the results in Section 4.7, particularly in Table 15 and
Fig. 16, may lead to modification of e.g. n.
43. HCS (Johnson, 1967) was used, and in the Oslo version we added an averaging procedure to the max and min methods originally proposed by Johnson. The empirical distortion was marked for all methods, but slightly less so for the averaging procedure.
Once more, however, we must sound a caution concerning all applications of these results since there are obvious limitations. Recall the complexity of the variables to be taken into account in simulation studies, cfr. Section 4.2, and the very limited region of the parameter space that has been explored thus far. It should be stressed, however, that the programming system outlined in the appendix can easily accommodate more complex cases than have been explored at present.44 It is fairly straightforward to provide results for, for instance, a specific value of n and a specified amount of variance for each of t dimensions. It may be a better strategy to perform separate runs of this programming system for specific cases than to grind out large (and probably fairly unmanageable) tables and graphs.
A very interesting task will be to extend the present methodology to other kinds of models than multidimensional scaling. A good starting point would be the recent program for Polynomial Conjoint Analysis by Young (1972). This program embodies a large set of measurement models (or combination rules), for instance multidimensional scaling, factor analysis, unfolding and additive conjoint measurement. For all these models there is a large set of choices of algorithmic strategies. The user can, for instance, choose not only between rank image and block partition transformations (cfr. Section 3.1); there are also two other types of transformations. Furthermore there are for each type of transformation several possible minimization strategies, and finally there are several stress formulas to choose from. Even after the user has settled on a specific measurement model, he still has the double problem of first choosing an algorithmic strategy and then estimating appropriate parameters and evaluating the resulting solution. Concerning the first problem the strategy in Section 4.3 should be used. Hopefully this will produce some viable generalizations so that the user will be relieved of the burden of choice in a situation where the consequences at present can be but dimly perceived.
Furthermore it is to be hoped that further explorations of the present methodology will suggest improvements in the methodology itself. For one thing the metamodel might be extended to cover more complex cases than those in Section 2.2 and Section 4.6, e.g. the choice between two related models such as factor analysis (a scalar product model) and multidimensional scaling (a distance model). It would further be convenient if the present graphical approach could be supplanted by an analytical approach. Provided correlation is reasonable as a basis for a more general index of true fit, a more elegant transformation than the present TF-categories transformation would also be an advantage. An even more important improvement would be to formalize the metamodel so that more precise deductions could be drawn. At present the metamodel is just a heuristic device.
To some, the present endeavours may perhaps appear to be nothing but idle exercises, since the basic concept of latent (or true) structure is thought to be irrelevant and misleading in social science. Such a critical view appears to be expressed by Lingoes and Roskam (1971, p. 124), who state:
At least one of us puts no store in what he considers the pseudoproblem of "recovering"
known configurations, since for most social science data a “true” set of distances does
not exist to be recovered. All that we typically have is a set of similarity/dissimilarity
coefficients and our task is to understand the observed patterns. A geometric
representation given certain specifications on the elements and properties of that
representation is largely a convenience for aiding such comprehension - nothing more (it is no more “true” than the original data)! (key phrases for subsequent discussion are
underlined here).
It might be noticed that Lingoes and Roskam partly echo remarks made earlier by Guttman (1967, p. 75), who laments "the unfortunate use of terminology (by Shepard and others) such as "recovering" configurations."
At this point we could note a differentiation into different schools, a "Shepard school" (to which the present work would belong) and a "Guttman school". This, however, would be a deplorable state of affairs. We think there is ground for a conciliatory view and will argue for this. (Note the lack of unanimity in the Lingoes-Roskam quotation and the inclusion of a study of "metricity" (true fit) in their report.)
44. On request, further information on the programming system will be given, or in special cases runs tailor-made to a specific problem can be carried out at the Computer Centre of Oslo.
The quotation raises four problems, of which the first is the central one and will be the most extensively discussed.
a) Does the present approach imply belief in the “existence” of a true set of distances?
b) How can one provide help for the researcher who of course wants to "understand the
observed patterns"?
c) How are "certain specifications" to be justified?
d) Can a geometric representation be more true than the original data?
The problem of “existence” has ramifications which obviously can not be explored here. We choose to
settle first on one structural interpretation which - we think - all parties will agree in refuting. This is
what Piaget (1971) calls the preformational (Platonic) view of structures where "they may be viewed
as given as such, in the manner of eternal essences." (op.cit. p. 60).
Using "latent structure" as a conceptual tool for a given empirical set of data should not be taken to
imply anything even faintly resembling an "eternal essence". To bring home this point it is convenient
to restate some of the specifications of noise processes in Section 3.2. If the researcher in a given
experiment chooses to conceive of the content of L as specified in one way at time t0, this does not
commit him to conceive of precisely the same content of L at time t1. One may explore specification
2b), cfr. p. 32, and conceive of some random process which produces a different configuration at time
t1 (other noise specifications may of course also be operative). Provided the perturbations are not too
pronounced one might still under some conditions expect empirical purification. This might at least be
explored. Specification 3b), cfr. p. 33, raises even more interesting possibilities. Generally one may
conceive of a fairly large dimensionality for a given domain. What may be "relevant" (or salient)
dimensions may, however, largely be dependent upon the context. What in one context may be
regarded as "error dimensions" (noise) may in other contexts be the salient dimensions, and vice
versa. Furthermore the relative salience of relevant dimensions may vary with context (an
experimental demonstration of this was given by Torgerson, 1965). While one would not expect any
purification across such diversity of conditions it might still be of interest to explore consequences of
such variation in simulation studies. The results could be valuable for evaluating empirical results
following from specific experimental manipulations.
The general point is that a given content of L may be regarded as just a convenient conceptual tool; it need not carry any connotation of "existence".
From this point of view it is interesting to note that attempts to do without concepts of "latent" or "true"
structure in other fields of psychometrics have been none too successful. There is for instance Tryon
(1957) lashing out against the "doctrine of true and error scores" and "underlying factors". Yet the
following quotation (op.cit. p. 237) where he defines his alternative, the domain score, is highly
revealing: "The domain score, usually called a “true score”, is defined as the sum (or average) of
scores on a large number of composites." The difference between a "domain score" and a “true score”
seems to be mainly semantic in nature.
A related example is stochastic models for e.g. intelligence tests where the basic formulations are in
terms of probabilities of a given number of correct answers (e.g. Rasch, 1960). Yet this can be
reformulated so that an observed number of correct answers can be expressed as an expected value
plus a deviation from this value.
More generally we do not see any fundamental difference between the conventional statistical
machinery of ‘universe parameters’ and 'stochastic variables' and the present 'latent structure' and
'manifest data'.
The basic point is that some structure (schema, image, hypothesis, conceptual tool) is necessary for the scientist in order to assimilate whatever there may be in the data. The scientist's structure is far from "static"; what is required is what Piaget calls a constructional view of structuralism: "there is no structure apart from construction" (Piaget, 1971, p. 140). This is the kind of view we tried to sketch in Ch. 1, cfr. p. 7-8, and in Ch. 2, p. 20, we more explicitly illustrated an example of this kind of view.
Turning now to the other questions raised by the Lingoes-Roskam quotation we first note that b) takes
for granted that there is an "observed pattern". This we believe can not be taken for granted. Just as
one may easily simulate a random pattern, subjects may resort to guessing or some quasi-random process. Some data just are no good! So here we are back to our proposed true fit index - loosely referred to as amount of structure.
As for c), no answer is provided by Lingoes and Roskam. Perhaps the present contribution would be less "offensive" if it were formulated in the language of parameter estimation, since this is what our procedure for evaluating dimensionality does. Concerning d) we can of course not accept their flat denial that a configuration can be more true than the observed data. The answer to this question will depend upon how it is further specified. We have interpreted "more true" as "purification", which we have been at some pains to show will occur when an appropriate model has been used.
In his letter Guttman (1967) repeatedly stresses the importance of looking directly for patterns in the observed data, for instance: "when I first saw Ekman's first colour vision data matrix, it was obvious - without any computing - that it was a circumplex." (op.cit. p. 76). We do not think this is in any way in opposition to Shepard's plea for "artificial machinery" (cfr. p. 9) to supplement the generally limited capacity most of us have for directly observing patterns. Scientific activity is one of the most poorly understood forms of human activity, and any attempts to guide or improve this activity, whether cultivation of "directly looking" or extensive computer simulation, will surely find their place in the joint scientific endeavour of figuring out the human complexity.
Appendix.
Main features of the programming system.
The present programming system is written for a CDC 3300 but it should be fairly straightforward to
adapt the basic ideas to other computer systems with good disk (tape) facilities.
There are three basic features of the programming system:
a) One job calls several programs - it will therefore be convenient to refer to a program as an
element. The programming system utilizes a general master system where a single card is
sufficient to call an element.
b) All elements communicate via disk (or tape) files which may be temporary (scratch files) or permanent. Input and output for all elements are by means of a standardized system where both configurations and symmetric matrices are strung out as vectors (a minimal sketch of this format is given after this list). Goodness of fit indices are also output on files. There is no card output, nor any necessity for punching printed output from one element as input to another element.
c) In each of the elements one parameter card is sufficient to process all the runs in one condition (a specific combination of n and t). Normally one (set of) parameter card(s) would be necessary to process each single data vector. In the present system, however, a special loop has been built into MDSCAL, TORSCA and SSA-1 so that one parameter on the parameter card specifies the number of configurations to be processed. For the simple form of the metamodel (unrepeated designs, cfr. Section 4.2) this parameter, c, will simply be the number of noise levels, ne, times the number of replications for each noise level, rep. c will of course also be a parameter in all the other elements in a specific job.
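The following sketch illustrates the standardized vector format mentioned under b). The program and routine names are of course only illustrative, and stringing out the lower triangle row by row is merely one possible convention.

   program strungout_demo
     implicit none
     integer, parameter :: n = 4
     real :: a(n, n), b(n, n), v(n*(n-1)/2)
     integer :: i, j
     ! build a small symmetric "distance" matrix as demonstration data
     do i = 1, n
        do j = 1, n
           a(i, j) = abs(real(i - j))
        end do
     end do
     call pack_sym(a, n, v)        ! string out the lower triangle, row by row
     call unpack_sym(v, n, b)      ! rebuild the full symmetric matrix
     print *, 'strung-out vector:  ', v
     print *, 'max rebuild error:  ', maxval(abs(a - b))
   contains
     subroutine pack_sym(a, n, v)
       integer, intent(in) :: n
       real, intent(in) :: a(n, n)
       real, intent(out) :: v(n*(n-1)/2)
       integer :: i, j, k
       k = 0
       do i = 2, n
          do j = 1, i - 1
             k = k + 1
             v(k) = a(i, j)
          end do
       end do
     end subroutine pack_sym
     subroutine unpack_sym(v, n, a)
       integer, intent(in) :: n
       real, intent(in) :: v(n*(n-1)/2)
       real, intent(out) :: a(n, n)
       integer :: i, j, k
       a = 0.0
       k = 0
       do i = 2, n
          do j = 1, i - 1
             k = k + 1
             a(i, j) = v(k)
             a(j, i) = v(k)
          end do
       end do
     end subroutine unpack_sym
   end program strungout_demo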
We will briefly outline the typical sequence of elements in the simplest simulation studies and also
indicate some of the more complex possibilities within the present system.
One job might make six calls for elements as follows (most elements require one or two parameter cards):
1) DISTANCE. This element will typically generate c different L vectors and c corresponding M vectors.
2) MAIN PROGRAM. This will be either TORSCA, SSA-1 or MDSCAL. In the latter case a special preprocessing program is used first to compute the initial configuration from M, cfr. Section 4.32. The file output will be c G(C) vectors and a different file containing c goodness of fit values.
3) DISTANCE. This time this element will simply convert G(C) to G(V), cfr. Section 2.1.
4) FILE-1. Typically this element will merge the separate files for the L, M and G vectors to one file: (L1 M1 G1) (L2 M2 G2) ... (Lc Mc Gc).
5) RELATE. In the simplest case this program will compute correlations within each set (L, M, G). The program will select the correlations of interest and output them on a special file; a minimal sketch of this communication pattern is given after this list.
6) FILE-2. This program handles the set of NL, TF and AF indices computed by the preceding elements. When necessary particular indices are transformed. The program contains a variety of transformations and may easily be extended. Means for each noise level of (transformed) indices are computed.
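As announced above, a minimal sketch of the communication pattern between FILE-1 and RELATE follows. The file unit, record layout and all names are illustrative only, and the random numbers stand in for actual L, M and G vectors.

   program file1_relate
     implicit none
     integer, parameter :: c = 3, nd = 10
     real :: l(nd), m(nd), g(nd)
     integer :: u, i

     open(newunit=u, status='scratch', form='unformatted')
     do i = 1, c                           ! FILE-1: merge L, M and G into one file, one record per set
        call random_number(l); call random_number(m); call random_number(g)
        write(u) l, m, g
     end do

     rewind(u)
     do i = 1, c                           ! RELATE: pick out the correlations of interest per set
        read(u) l, m, g
        print '(a,i2,3f8.3)', ' set', i, corr(l, m), corr(l, g), corr(m, g)
     end do
     close(u)

   contains
     real function corr(x, y)              ! product-moment correlation
       real, intent(in) :: x(:), y(:)
       real :: mx, my
       mx = sum(x) / size(x); my = sum(y) / size(y)
       corr = sum((x - mx)*(y - my)) / sqrt(sum((x - mx)**2) * sum((y - my)**2))
     end function corr
   end program file1_relate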
This system has been growing over some years and will continue to grow (perhaps to unwieldiness). The element DISTANCE will at present generate either distances or scalar products. A separate vector containing information on the number of noise levels desired is read in. Special parameters determine the type of noise process and the type of configuration generation. One initial value to start a random routine is sufficient both to generate configurations and noise processes. It is, however, also
possible to read in systematic configurations (from cards or from a file generated by yet other
programs) which will then be perturbed by noise processes.
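As an illustration of what a single call of an element like DISTANCE amounts to in the unrepeated case, the following sketch (with hypothetical names) generates one latent distance vector L from a random configuration and one noise-perturbed manifest vector M; additive normal noise is used here merely as one of the possible noise specifications.

   program distance_sketch
     implicit none
     integer, parameter :: n = 10, t = 2            ! points and latent dimensionality
     integer, parameter :: nd = n*(n-1)/2
     real, parameter :: sigma = 0.1                 ! noise level (illustrative)
     real, parameter :: pi = 3.1415927
     real :: x(n, t), l(nd), m(nd), u1(nd), u2(nd)
     integer :: i, j, k

     call random_number(x)                          ! latent configuration (uniform coordinates)
     k = 0
     do i = 2, n                                    ! string out the interpoint distances: this is L
        do j = 1, i - 1
           k = k + 1
           l(k) = sqrt(sum((x(i, :) - x(j, :))**2))
        end do
     end do
     call random_number(u1)
     call random_number(u2)
     ! manifest data M: L perturbed by additive normal noise (Box-Muller)
     m = l + sigma * sqrt(-2.0*log(max(u1, 1.0e-6))) * cos(2.0*pi*u2)
     print *, 'first few L values: ', l(1:3)
     print *, 'first few M values: ', m(1:3)
   end program distance_sketch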
For repeated designs one call of DISTANCE will produce s = ne x rep M vectors for each of con different configurations. In this case FILE-1 will generate con sets of vectors where each of the sets is of the form (L, M1, M2, ..., Ms, G1, G2, ..., Gs), and RELATE will pick out the correlations (or subcorrelation matrices) of interest to be further processed by FILE-2. When each configuration is analyzed in different dimensionalities, m, FILE-1 will separate out the various sets of Gm vectors according to m, or various subcorrelation matrices may be picked out by RELATE.
One job may also include several different MAIN PROGRAMS for each M vector (cfr. Section 4.32). Furthermore RELATE may be supplemented by MATCH (Cliff, 1966). Before FILE-1 is called one may also call CS-1 (Lingoes, 1967), which has been adapted so that rank images of M (M*G and/or M*L, cfr. Section 3.2, p. 38) are output and then further sorted out by FILE-1. Again RELATE may pick out any subcorrelation matrix of interest, however complex the structure inputted to RELATE by preceding elements may be.
In complex cases FILE-2 may receive a variety of indices of NL, TF and AF (from MATCH, from several MAIN PROGRAMS, and several varieties of correlations from RELATE). If desired, (transformed or untransformed) indices may be output from FILE-2, and then the interrelations between all these indices (or any desired subset) may be studied by a new call of RELATE, which now may compute correlations or, if desired, root mean square discrepancies. In Section 3.3 a simple example of this strategy is reported; otherwise reports from such runs have mostly been tucked away in footnotes.
As implied by the Concluding remarks there are many features incorporated in the present system which so far have not been explored. One example is to generate configurations with different variance in different dimensions; this is simple to do by reading in a separate vector in DISTANCE. Similarly separate vectors can be read in to specify different noise parameters for different points or dimensions. It might here be mentioned that, if desired, a vector containing stress components for separate points may be output on file from MDSCAL.
A simple extension of the present system will be to include fairly detailed tables in FILE-2 and incorporate a double linear interpolation process so that e.g. TF|AF (cfr. Sections 4.6 and 4.7) can be computed with sufficient precision without the more cumbersome graphical procedures now being used.
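A double linear interpolation of this kind might be sketched as follows. The grid, the table values and all names are illustrative only and are not taken from the system itself; the first grid argument could for instance be n and the second the retest reliability.

   program tf_given_af
     implicit none
     integer, parameter :: nx = 3, ny = 3
     real :: xg(nx) = (/ 8.0, 12.0, 16.0 /)      ! hypothetical grid of the first argument
     real :: yg(ny) = (/ 0.5, 0.7, 0.9 /)        ! hypothetical grid of the second argument
     real :: tab(nx, ny)
     ! hypothetical table of the index (say TF given AF) on the grid
     tab = reshape( (/ 1.2, 1.6, 1.9,  1.8, 2.3, 2.7,  2.4, 3.0, 3.5 /), (/ nx, ny /) )
     print *, 'interpolated value:', bilin(13.0, 0.62)
   contains
     real function bilin(x, y)                   ! double (bilinear) linear interpolation
       real, intent(in) :: x, y
       integer :: i, j
       real :: tx, ty
       ! locate the grid cell containing (x, y), clamping at the borders
       i = max(1, min(nx - 1, count(xg <= x)))
       j = max(1, min(ny - 1, count(yg <= y)))
       tx = max(0.0, min(1.0, (x - xg(i)) / (xg(i + 1) - xg(i))))
       ty = max(0.0, min(1.0, (y - yg(j)) / (yg(j + 1) - yg(j))))
       ! interpolate first in x along the two bracketing rows, then in y
       bilin = (1.0 - ty) * ((1.0 - tx) * tab(i, j)     + tx * tab(i + 1, j))     &
             + ty         * ((1.0 - tx) * tab(i, j + 1) + tx * tab(i + 1, j + 1))
     end function bilin
   end program tf_given_af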
Concerning the general Polynomial Conjoint Analysis program (POLYCON, Young, 1972), it will (in principle) be fairly simple to incorporate it in the present system. Features b) and c), cfr. p. 145, indicate the necessary additions to POLYCON. For each of the new measurement models in POLYCON a corresponding extension of DISTANCE can be plugged in; otherwise the system does not need any major revision. (Perhaps other true fit indices may be found appropriate, but this will not call for anything but minor revisions.)
It finally remains to give the details of the TF-categories transformation (which is part of FILE-2).
For r(Gi, Gj) or r(Mi, Mj) the square root of r is first computed. The succeeding steps are then the same as for r(L, G). Call the starting point R; TF-categories (TFCAT) is then computed as follows:

C     TF-CATEGORIES TRANSFORMATION (PART OF FILE-2).
C     R IS THE STARTING VALUE DEFINED ABOVE, TFCAT THE TRANSFORMED INDEX.
C     THE TRANSFORMATION IS PIECEWISE LINEAR IN K, WITH SLOPE A AND
C     INTERCEPT B DEPENDING ON WHICH OF THREE INTERVALS K FALLS IN.
      REAL K
      HELP = 1. - SQRT(1. - R*R)
      K = SQRT(1. - HELP*HELP)
      IF (K - .956) 2, 1, 1
    1 A = 11.364
      B = -6.864
      GO TO 5
    2 IF (K - .457) 4, 3, 3
    3 A = 6.012
      B = -1.747
      GO TO 5
    4 A = 3.282
      B = -.5
    5 TFCAT = A*K + B
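For readers who wish to reproduce the transformation outside the original CDC 3300 system, the same piecewise linear mapping can be restated as a self-contained function in modern Fortran; the program and function names are ours, and the constants are those given above.

   program tf_categories
     implicit none
     print *, tfcat(0.90), tfcat(0.97), tfcat(0.995)   ! a few illustrative values
   contains
     real function tfcat(r)
       real, intent(in) :: r          ! the starting value R defined above
       real :: help, k, a, b
       help = 1.0 - sqrt(1.0 - r*r)
       k = sqrt(1.0 - help*help)
       if (k >= 0.956) then           ! the same three intervals and constants as above
          a = 11.364; b = -6.864
       else if (k >= 0.457) then
          a = 6.012;  b = -1.747
       else
          a = 3.282;  b = -0.5
       end if
       tfcat = a*k + b
     end function tfcat
   end program tf_categories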
References:
Abelson, R.P., and Sermat, V. (1962) Multidimensional scaling of facial expressions. Journal of Experimental Psychology, 63, 546-554.
Abelson, R.P., and Tukey, J.W. (1963) Efficient utilization of non-numerical information in quantitative analysis: General theory and the case of simple order. Annals of Mathematical Statistics, 34, 1347-1369.
Armstrong, J.S., and Soelberg, P. (1968) On the interpretation of factor analysis. Psychological Bulletin, 70, 361-364.
Attneave, F. (1950) Dimensions of similarity. American Journal of Psychology, 63, 516-556.
Attneave, F. (1962) Perception and related areas. In S. Koch (Ed.), Psychology: A study of a science, Volume 4. New York: McGraw-Hill. 619-659.
Bakan, D. (1966) The test of significance in psychological research. Psychological Bulletin, 66, 423-436.
Bannister, D., and Mair, J.M.M. (1968) The evaluation of personal constructs. New York: Academic Press.
Bateson, G. (1955) A theory of play and fantasy. Psychiatric Research Reports, 2, 39-51.
Bateson, G., Jackson, D.D., Haley, J., and Weakland, J. (1955) Toward a theory of schizophrenia. Behavioral Science, 1, 251-264.
Beals, R., Krantz, D.H., and Tversky, A. (1968) Foundations of multidimensional scaling. Psychological Review, 75, 127-143.
Campbell, D.T., and Fiske, D.W. (1959) Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.
Cattell, R.B. (1962) The relational simplex theory of equal interval and absolute scaling. Acta Psychologica, 20, 139-153.
Cattell, R.B. (Ed.) (1966) Handbook of multivariate experimental psychology. Chicago: Rand McNally.
Cliff, N. (1966) Orthogonal rotation to congruence. Psychometrika, 31, 33-42.
Coombs, C.H. (1964) A theory of data. New York: Wiley.
Coombs, C.H., Dawes, R.M., and Tversky, A. (1970) Mathematical psychology: An elementary introduction. New Jersey: Prentice-Hall.
Coombs, C.H., and Kao, R.C. (1960) On a connection between factor analysis and multidimensional unfolding. Psychometrika, 25, 219-231.
Deese, J. (1962) On the structure of associative meaning. Psychological Review, 69, 161-176.
Degerman, R. (1970) Multidimensional analysis of complex structure: Mixtures of class and quantitative variation. Psychometrika, 35, 475-490.
Eckart, C., and Young, G. (1936) The approximation of one matrix by another of lower rank. Psychometrika, 1, 211-218.
Eckblad, G. (1971a) On hierarchical structures in programs and plans. In G. Eckblad (Ed.), Hierarchical models in the study of cognition. University of Bergen.
Eckblad, G. (1971b) Comments on Rommetveit's "On concepts of hierarchical structures." Part II. In G. Eckblad (Ed.), Hierarchical models in the study of cognition. University of Bergen.
Edwards, W., Lindman, H., and Phillips, L.D. (1965) Emerging technologies for making decisions. In New directions in psychology II. New York: Holt, Rinehart and Winston.
Edwards, W., Lindman, H., and Savage, L.J. (1963) Bayesian statistical inference for psychological research. Psychological Review, 70, 193-242.
Ekman, G. (1954) Dimensions of color vision. Journal of Psychology, 38, 467-474.
Fransella, F., and Adams, B. (1966) An illustration of the use of the Repertory Grid technique in a clinical setting. British Journal of Social and Clinical Psychology, 5, 51-62.
Gibson, E.J. (1970) The ontogeny of reading. American Psychologist, 25, 136-143.
Gombrich, E.H. (1965) The use of art for the study of symbols. American Psychologist, 20, 34-50.
Green, B.F. Jr. (1966) The computer revolution in psychology. Psychometrika, 31, 437-445.
Guttman, L. (1966) Order analysis of correlation matrices. In R.B. Cattell (Ed.), Handbook of multivariate experimental psychology. Chicago: Rand McNally. 439-458.
Guttman, L. (1967) The development of nonmetric space analysis: A letter to Professor John Ross. Multivariate Behavioral Research, 2, 71-82.
Guttman, L. (1968) A general nonmetric technique for finding the smallest Euclidean space for a configuration of points. Psychometrika, 33, 469-506.
Haley, J. (1963) Strategies of psychotherapy. New York: Grune and Stratton.
Haley, J. (Ed.) (1969) Advanced techniques of hypnosis and therapy. New York: Grune and Stratton.
Harman, H.H. (1967) Modern factor analysis. Second edition, revised. Chicago: The University of Chicago Press.
Henrysson, S. (1957) Applicability of factor analysis in the behavioral sciences: A methodological study. Stockholm: Almqvist & Wiksell.
Hinkle, D.N. (1965) The change of personal constructs from the viewpoint of a theory of implications. Unpublished Ph.D. thesis, University of Colorado.
Hinkle, D.N. (1970) The game of personal constructs. In D. Bannister (Ed.), Perspectives in personal construct theory. New York: Academic Press. 91-110.
Indow, T., and Kanazawa, K. (1960) Multidimensional mapping of colors varying in hue, chroma and value. Journal of Experimental Psychology, 59, 330-336.
Johnson, S.C. (1967) Hierarchical clustering schemes. Psychometrika, 32, 241-254.
Johnson, S.C. (1968) Metric clustering. Mimeographed report, Bell Telephone Laboratories, Murray Hill, New Jersey.
Kelly, G.A. (1955) The psychology of personal constructs. Volumes I-II. New York: Norton.
Klahr, D. (1969) A Monte Carlo investigation of the statistical significance of Kruskal's nonmetric scaling procedure. Psychometrika, 34, 319-330.
Krantz, D.H. (1972) Measurement structures and psychological laws. Science, 175, 1427-1435.
Krantz, D.H., and Tversky, A. (1971) Conjoint-measurement analysis of composition rules in psychology. Psychological Review, 78, 151-169.
Kruskal, J.B. (1964a) Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29, 1-27.
Kruskal, J.B. (1964b) Nonmetric multidimensional scaling: A numerical method. Psychometrika, 29, 28-42.
Kruskal, J.B. (1967) How to use MDSCAL, a multidimensional scaling program. Mimeographed report, Bell Telephone Laboratories, Murray Hill, New Jersey.
Kruskal, J.B., and Hart, R.E. (1966) A geometric interpretation of diagnostic data for a digital machine: Based on a study of the Morris, Illinois Electronic Central Office. Bell System Technical Journal, 45, 1299-1338.
Larsen, E. (1972) Valget - et strategisk og terapeutisk virkemiddel i psykoterapi. En tentativ syntese av ulike teoretiske tilnærmingsmåter til psykoterapi. [The choice - a strategic and therapeutic instrument in psychotherapy. A tentative synthesis of different theoretical approaches to psychotherapy.] Hovedoppgave (thesis), Psykologisk institutt, University of Oslo.
Lashley, K.S. (1942) An examination of the "continuity theory" as applied to discriminative learning. Journal of General Psychology, 26, 241-265.
Lazarsfeld, P.F. (1959) Latent structure analysis. In S. Koch (Ed.), Psychology: A study of a science, Volume 3. New York: McGraw-Hill. 476-543.
Lingoes, J.C. (1965) An IBM 7090 program for Guttman-Lingoes Smallest Space Analysis-1. Behavioral Science, 10, 183-184.
Lingoes, J.C. (1966) Recent computational advances in nonmetric methodology for the behavioral sciences. In Proceedings of the international symposium: Mathematical and computational methods in social sciences. International Computation Center, Rome. 1-38.
Lingoes, J.C. (1967) An IBM 7090 program for Guttman-Lingoes Configurational Similarity-1. Behavioral Science, 12, 502-503.
Lingoes, J.C., and Roskam, E. (1971) A mathematical and empirical study of two multidimensional scaling algorithms. Michigan Mathematical Psychology Program, 1.
Mandelbrot, B. (1965) A class of long-tailed probability distributions and the empirical distribution of city sizes. In Proceedings of the Seminars of Menthon-Saint-Bernard, France (1-27 July 1960) and of Gösing, Austria (3-27 July 1962). Paris and The Hague: Mouton & Co.
Miller, G.A. (1967) Psycholinguistic approaches to the study of communication. In D. Arm (Ed.), Journeys in science: Small steps - great strides. Albuquerque: The University of New Mexico Press. 22-73.
Miller, G.A. (1969) A psychological method to investigate verbal concepts. Journal of Mathematical Psychology, 6, 169-191.
Osgood, C.E. (1969) Introduction. In J.G. Snider and C.E. Osgood (Eds.), Semantic Differential Technique. Chicago: Aldine Publishing Co. vii-ix.
Osgood, C.E., and Luria, Z. (1954) A blind analysis of a case of multiple personality using the Semantic Differential. Journal of Abnormal and Social Psychology, 49, 579-591.
Osgood, C.E., Suci, G.J., and Tannenbaum, P.H. (1957) The measurement of meaning. Urbana: University of Illinois Press.
Piaget, J. (1971) Structuralism. London: Routledge and Kegan Paul.
Ramsay, J.O. (1969) Some statistical considerations in multidimensional scaling. Psychometrika, 34, 167-182.
Rasch, G. (1960) Probabilistic models for some intelligence and attainment tests. København: Danmarks pædagogiske institut.
Restle, F. (1959) A metric and an ordering on sets. Psychometrika, 24, 207-219.
Rommetveit, R. (1968) Words, meanings and messages: Theory and experiments in psycholinguistics. New York: Academic Press, and Oslo: Universitetsforlaget.
Rommetveit, R. (1972) Språk, tanke og kommunikasjon. [Language, thought and communication.] Oslo: Universitetsforlaget.
Roskam, E. (1969) A comparison of principles for algorithm construction in nonmetric scaling. Michigan Mathematical Psychology Program, 2.
Shepard, R.N. (1962a) The analysis of proximities: Multidimensional scaling with an unknown distance function. I. Psychometrika, 27, 125-140.
Shepard, R.N. (1962b) The analysis of proximities: Multidimensional scaling with an unknown distance function. II. Psychometrika, 27, 219-245.
Shepard, R.N. (1963a) Analysis of proximities as a technique for the study of information processing in man. Human Factors, 5, 3-48.
Shepard, R.N. (1963b) Comments on Professor Underwood's paper: Stimulus selection in verbal learning. In C.N. Cofer and B.S. Musgrave (Eds.), Verbal behaviour and learning: Problems and processes. New York: McGraw-Hill. 48-70.
Shepard, R.N. (1964) On subjectively optimum selection among multiattribute alternatives. In M.W. Shelly and G.L. Bryan (Eds.), Human judgments and optimality. New York: Wiley. 257-281.
Shepard, R.N. (1966) Metric structures in ordinal data. Journal of Mathematical Psychology, 3, 287-315.
Shepard, R.N., and Carroll, J.D. (1966) Parametric representation of non-linear data structures. In Krishnaiah (Ed.), Multivariate Analysis. New York: Academic Press. 561-592.
Shepard, R.N., and Chipman, S. (1970) Second-order isomorphism of internal representations: Shapes of states. Cognitive Psychology, 1, 1-17.
Sherman, C.R. (1970) Nonmetric multidimensional scaling: The role of the Minkowski metric. Chapel Hill, North Carolina: L.L. Thurstone Psychometric Laboratory Report No. 82.
Smedslund, J. (1967) Noen refleksjoner om Rorschach-testen. [Some reflections on the Rorschach test.] Nordisk Psykologi, 19, 203-209.
Spaeth, H.J., and Guthery, S.B. (1969) The use and utility of the monotone criterion in multidimensional scaling. Multivariate Behavioral Research, 4, 501-515.
Stenson, H.H., and Knoll, R.L. (1969) Goodness of fit for random rankings in Kruskal's nonmetric scaling procedure. Psychological Bulletin, 71, 122-126.
Thurstone, L.L. (1947) Multiple factor analysis. Chicago, Illinois: The University of Chicago Press.
Torgerson, W.S. (1952) Multidimensional scaling: I. Theory and method. Psychometrika, 17, 401-419.
Torgerson, W.S. (1958) Theory and methods of scaling. New York: Wiley.
Torgerson, W.S. (1965) Multidimensional scaling of similarity. Psychometrika, 30, 379-393.
Torgerson, W.S. (1967) Psychological scaling. In Psychological measurement theory: Proceedings of the NUFFIC international summer session in science. Psychological Institute of the University of Leyden: The Netherlands. 151-180.
Torgerson, W.S. (1968) Multidimensional representation of similarity structures. In M.M. Katz, J.O. Cole and W.E. Barton (Eds.), The role and methodology of classification in psychiatry and psychopathology. Washington, D.C.: U.S. Government Printing Office. 212-220.
Tryon, R.C. (1957) Reliability and behavior domain validity: Reformulation and historical critique. Psychological Bulletin, 54, 229-249.
Tryon, R.C., and Bailey, D.E. (1970) Cluster analysis. New York: McGraw-Hill.
Tschudi, F., and Larsen, E. (1970) Notes on Harold Greenwald's technique: Pointing out the advantage of the symptom. Mimeographed paper, University of Oslo.
Wagenaar, W.A., and Padmos, P. (1971) Quantitative interpretation of stress in Kruskal's multidimensional scaling technique. British Journal of Mathematical and Statistical Psychology, 24, 101-110.
Watzlawick, P., Beavin, J.H., and Jackson, D.D. (1967) Pragmatics of human communication. New York: W.W. Norton.
Weakland, J.H., Fisch, R., Watzlawick, P., and Bodin, A.H. (1972) Brief therapy: Focused problem resolution. Mimeographed report, Mental Research Institute, Palo Alto, California.
Xhignesse, L.V., and Osgood, C.E. (1967) Bibliographic citation characteristics of the psychological journal network in 1950 and 1960. American Psychologist, 22, 778-792.
Young, F.W. (1968a) A FORTRAN IV program for nonmetric multidimensional scaling. Chapel Hill, North Carolina: L.L. Thurstone Psychometric Laboratory Report No. 56.
Young, F.W. (1968b) Nonmetric multidimensional scaling: Development of an index of metric determinacy. Chapel Hill, North Carolina: L.L. Thurstone Psychometric Laboratory Report No. 68.
Young, F.W. (1970) Nonmetric multidimensional scaling: Recovery of metric information. Psychometrika, 35, 455-473.
Young, F.W. (1972) POLYCON users manual: A FORTRAN IV program for Polynomial Conjoint Analysis. Chapel Hill, North Carolina: L.L. Thurstone Psychometric Laboratory Report No. 104.
Young, F.W., and Appelbaum, M.L. (1968) Nonmetric multidimensional scaling: The relationship of several methods. Chapel Hill, North Carolina: L.L. Thurstone Psychometric Laboratory Report No. 71.
Young, F.W., and Torgerson, W.S. (1967) TORSCA, a FORTRAN IV program for Shepard-Kruskal multidimensional scaling analysis. Behavioral Science, 12, 498.
Zinnes, J.L. (1969) Scaling. In P.H. Mussen and M.R. Rosenzweig (Eds.), Annual Review of Psychology, 20, 447-478.