COGS Proposal

COGS Proposal – Owain Evans (February 2011)
Project on Computational Theories of Aesthetic Judgments
I will first introduce the ideas behind the project and then describe the project itself in
more detail. Experiments are described on page 12. Most of the ideas below are joint
work with Peli Grietzer (Harvard).
Introduction: Beauty as Compressibility
Jürgen Schmidhuber, an AI theorist and theoretical computer scientist, has proposed a
computational account of aesthetic judgments1. On his view, a stimulus is judged to be
beautiful or attractive by a subject S to the extent that the stimulus is compressible
relative to S’s compression scheme. This notion of compressibility is taken from
computer science and information theory. Roughly, compressible stimuli are those that
can be given a short description (in a compression scheme or formal language) because
they contain structure or repeating patterns. For example, the tessellation and fractal
images (below left) have a simple repeating structure, whereas the random noise
images (below right) do not contain any pattern.
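As a rough illustration of the contrast between patterned and random images, the sketch below compares the compressed size of a chessboard-style tessellation with that of a random noise image. It uses zlib as a stand-in compression scheme; this is an assumption made for illustration and not a claim about the scheme human subjects use. The image size, tile size and random seed are likewise arbitrary choices.

```python
import random
import zlib

SIDE = 64  # the image is SIDE x SIDE pixels, one byte per pixel


def compressed_size(pixels: bytes) -> int:
    """Length in bytes of the zlib-compressed pixel data."""
    return len(zlib.compress(pixels, 9))


# Chessboard-style tessellation built from 8x8 tiles.
tessellation = bytes(
    255 if (x // 8 + y // 8) % 2 == 0 else 0
    for y in range(SIDE)
    for x in range(SIDE)
)

# Random noise: every pixel is independently black or white.
rng = random.Random(0)
noise = bytes(rng.choice((0, 255)) for _ in range(SIDE * SIDE))

print("tessellation:", compressed_size(tessellation), "bytes")
print("noise:       ", compressed_size(noise), "bytes")
```

Both images have the same uncompressed size, so any difference in the printed numbers reflects the structure zlib was able to find.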
1
Schmidhuber’s page http://www.idsia.ch/~juergen/beauty.html contains links to his papers on the
topic.
There are obvious problems with this account if we take it as a full account of beauty.
A chessboard, while very simple, would rarely be called beautiful. The same goes for a
C major chord played on a piano. These stimuli are certainly not unpleasant, but people
would quickly find them boring2. Moreover, it seems that the most profound aesthetic
experiences often come from complex stimuli: the city of Rome, the philosophy of
Plato or Wittgenstein, art by Picasso, Joyce or Stravinsky. These complex stimuli seem
especially hard to compress. They contain patterns, but it doesn’t seem that the works
as a whole can be reduced to a concise description3.
Schmidhuber argues for explaining beauty as compressibility. However, it may be
better to identify compressibility with attractiveness, pleasantness, niceness, likability
and prettiness. I mean these terms as used in a specific aesthetic sense: for instance,
the description of a face or garden as “attractive” or “nice”, as opposed to an
“attractive” or “nice” job offer. In this aesthetic sense, these terms can sometimes
function as weaker claims along the same scale as judgments of “beauty”. That is,
beautiful things tend to be attractive and pleasant, and are often also nice or pretty4.
(Why care about these attractiveness judgments, from a philosophical or psychological
perspective? Not everyone devotes time to the aesthetic consumption of the higher arts,
but these attractiveness judgments seem omnipresent. They influence the choice of
where to live and how to organize one’s home space, the choice of partners, friends
and job candidates, and the choice between different products based on packaging,
advertising and design. In various domains, attractiveness judgments have been shown
to have significant subconscious effects on other kinds of non-aesthetic evaluation
(“the Halo Effect”). Finally, it seems plausible that attractiveness judgments are related
in important ways to the more transcendent aesthetic experiences.)
2
Another issue is that randomness does not always seem unpleasant. Some art works contain major
random elements while being aesthetically pleasing as a whole. See Ellsworth Kelly’s paintings of color
spectra arranged by chance.
3
That they can’t be so reduced is part of what makes them great.
4
This is only a heuristic. Some objects may be beautiful but not have any of the other terms apply to
them.
Considering again very simple stimuli, such as a C Major chord, a chessboard, or
Google’s homepage design, it is more plausible to describe these as pleasant or nice,
rather than beautiful. What then makes an object beautiful and what does this have to
do with compressibility? I will discuss some ideas for how to answer these questions
below. For now, I’ll introduce another computational idea from Schmidhuber. The lack
of strong preference for very simple stimuli is explained (in common sense terms) by
their not being interesting. Schmidhuber gives interestingness a simple formal analysis
in terms of compressibility. Whereas beauty (or attractiveness, etc.) is the subjective
compressibility of a stimulus, interestingness is the rate at which the subjective
compressibility changes over time:
Beauty (etc.) of stimulus S for subject T
= compressibility of S for T (higher when T’s mental representation of S uses fewer bits)
Interestingness of S for T
= rate at which the # of bits used to represent S by T decreases over time
= d(Beauty) / d(Time)
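A minimal sketch of these two quantities follows. It assumes that fewer bits means more compressible, so beauty is scored as the negative bit count, and it takes interestingness to be the discrete change in beauty between time steps. The subject's compression scheme is imitated with zlib primed by a preset dictionary (`memory`) that grows as the subject absorbs more of the stimulus. The names `code_length_bits`, `beauty` and `memories` are illustrative and do not come from Schmidhuber's papers.

```python
import random
import zlib


def code_length_bits(stimulus: bytes, memory: bytes = b"") -> int:
    """Bits zlib needs for `stimulus` when primed with `memory` as a preset dictionary."""
    if memory:
        compressor = zlib.compressobj(level=9, zdict=memory)
    else:
        compressor = zlib.compressobj(level=9)
    return 8 * len(compressor.compress(stimulus) + compressor.flush())


def beauty(stimulus: bytes, memory: bytes = b"") -> int:
    """Beauty proxy: the fewer bits needed, the higher the score."""
    return -code_length_bits(stimulus, memory)


# A stimulus with no internal pattern, so it is incompressible on first sight.
rng = random.Random(0)
stimulus = bytes(rng.randrange(256) for _ in range(256))

# Hypothetical learning trajectory: the subject has absorbed 0, 64, ... 256
# bytes of the stimulus; the final step adds nothing new.
memories = [stimulus[:k] for k in (0, 64, 128, 192, 256, 256)]

beauty_over_time = [beauty(stimulus, m) for m in memories]
# Interestingness at each step is the change in beauty since the previous step.
interestingness = [
    later - earlier
    for earlier, later in zip(beauty_over_time, beauty_over_time[1:])
]

print("beauty:", beauty_over_time)
print("interestingness:", interestingness)
```

On this toy model the stimulus stops being interesting once nothing further about it can be learned, even though its beauty score stays at its maximum.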
Schmidhuber then tries to account for people’s aesthetic preferences in terms of these
two notions. I will discuss some limitations of these two concepts below. Before doing
that, I want to focus on compressibility as an account of aesthetic attractiveness. I will
first summarize existing experimental and conceptual support for the view, and then
discuss some new experiments for testing this theory.
Empirical Evidence for the Compression View
Schmidhuber presents a range of anecdotal examples in favor of his proposal5.
However, there is a body of literature in experimental psychology that broadly supports
his theory:

Research going back to the Gestalt psychologists has shown that people prefer
abstract images with more symmetry, with more redundancy, with strong
figure/ground contrast (especially when making judgments quickly). These are
all features of images that make them more compressible.

A body of work has shown that “mere exposure” to a stimulus (e.g. a Chinese
character the subject has never seen before) makes a stimulus more attractive.
A weaker effect is observed if the mere exposure is to a similar stimulus. The
effect is robust and holds for a range of stimuli. A fun related fact is that people
prefer the mirrored version of their own face but the non-mirrored version of
their friends’ faces.

People find prototypical6 instances of a concept more attractive than instances
further from the prototype. This holds even when the concepts are randomly
generated dot patterns. People prefer prototypical (or most common) chord
progressions, prototypical instances of animals, and prototypical (or averaged)
faces. Once you have a prototype, you can store instances just in terms of
deviation from the prototype. Thus, the prototype itself will be most
compressible, and compressibility will decrease with distance from the
prototype. The compressibility of prototypes may explain the mystical Platonic
idea that the forms (i.e. prototypes) are more beautiful than their instances.
5
He gives examples of images generated by very simple programs and notes that they are fairly
attractive. There are many other anecdotes that are explained by the theory: Why people find other
people’s messy rooms unpleasant but their own much less so (they’ve seen enough of the room to begin to
compress it and they might know the internal logic of the arrangement of stuff in the room). Why a
well-stocked and orderly store can have many more objects than a messy room but not be unpleasant (the
order makes it compressible). Why it’s worse to overhear someone talking on a cellphone than to
overhear a conversation where you hear both parties (you can’t compress the conversation because you
are lacking lots of relevant information). Why unpredictable torture is especially unpleasant. Why faces
tend to look more attractive from a distance (or with blurred vision) or in dim light (hence club and bar
lighting) or when the person is wearing sunglasses (which remove from view the eyes, a very
informative part of the face). Why people (with no idea of the relevant mathematics) like fractal images.
[Accents example].
6
The prototype could be the average or the most stereotypical instance. There is an extensive literature
in psychology on the relevance of prototypes to various aspects of cognition.
Some of this empirical evidence has been used to argue for the theory that
attractiveness judgments depend on the “processing fluency” of the stimulus7, which
seems to consist in how quickly or accurately the stimulus can be categorized or
otherwise used in thought. This notion is closely related to compressibility, and I won’t
at this point go into the details of distinctions that can be made between the two
notions. Another notion, related both to the speed of categorization of a stimulus and
its compressibility, is the probability that the subject assigns to the stimulus. There are
theoretical reasons (as well as some recent empirical reasons) to think that the
probability and compressibility of stimuli for humans will be very closely related. That
is, more probable stimuli will be more compressible8. Again, I won’t discuss
experiments here that could be used to distinguish between these notions in human
judgment. The main point here is that there is a closely related set of formal notions
(compressibility, probability, categorizability) that will often be correlated in
experiments. Schmidhuber focuses on compressibility (for various good reasons) but
the project of investigating his theory of aesthetics is really a project of investigating
how this family of notions relates to aesthetic judgments.
Despite the body of work in support of Schmidhuber’s theory, there remains a lot of
empirical work to be done. First, most experiments have looked at extremely simple
stimuli for which aesthetic judgments are likely to be very mild. It’s important to see if
the theory does equally well in accounting for judgments made about larger, more
cognitively demanding stimuli. Second, even though many of the stimuli used have
been extremely simple 2D abstract patterns, the models presented of these judgments
have mostly been informal or qualitative. For example, the experiments involving
processing fluency test their theory by looking at the correlation between
categorization speed and attractiveness. They expect that categorization speed will
correlate in some way with attractiveness. However, they don’t have a model of what
about stimuli makes them quickly categorizable. To make predictions about
attractiveness, they need to first gather data about categorization speed, as it’s hardly
transparent (in general) what will make a categorization task hard or easy. In contrast,
compressibility is a well understood computational notion. For an arbitrary
experimental setup and a fixed compression scheme, we can compute the compressibility
of the stimuli (it’s a precise numerical quantity) and then compare these quantitative
predictions to human judgments. Of course, there are practical problems with testing the
compression theory. We don’t necessarily have access to the compression scheme that
people are using and we don’t know how good people are at solving the hard
computational problem of finding the optimal compression for a stimulus. To the extent
that we can learn about the compression scheme and computational capacity (the latter of
which will be especially difficult because it may be very heterogeneous), we will be able
to make precise quantitative predictions about attractiveness for a very wide range of
stimuli9. I discuss concrete plans for experiments below.
7
“Processing fluency and aesthetic pleasure”, Reber, Schwarz, Winkielman (2004).
8
This correspondence is discussed at length in MacKay’s book on information theory.
Plausibility arguments for the Schmidhuber theory
Here is a brief discussion of how compression theory could account for various
properties that aesthetic judgments seem to have.
1. Beauty or attractiveness are experienced as properties of their objects
Once we fix an individual with a particular compression scheme (or internal language
of thought), the compressibility of a stimulus will depend only on properties of the
stimulus. However, the actual compression achieved by the subject can vary depending
on the computational efforts of the subject, because finding the patterns in a stimulus
that allow it to be compressed is a hard computational problem10 and may require time
and effort. This explains why the attractiveness of a stimulus may not be immediate
(e.g. music or an intricate image requiring repeated listening or viewing) but that the
experience of something as attractive seems to be an experience of features of the
image itself. (For contrast, consider the case where one feels euphoria or joy while
perceiving a stimulus without attributing one’s affect to features of the stimulus).
9
It’s worth noting that one limitation of the categorization theory is its lack of generality. People may
make aesthetic judgments about stimuli they’ve never had to categorize.
10
Finding the shortest Turing machine that outputs a given n-bit string is uncomputable (Kolmogorov).
One might think that this aspect of the phenomenology of beauty or attractiveness
clashes with the relativity of compressibility to a language or compression scheme. If
people are aware that the compressibility of an object varies depending on compression
scheme (for example, one’s compression scheme could change with time and different
people could have different schemes) then we might expect their experience of beauty
to reflect that awareness. (For comparison: when deeming something to be exciting or
novel, we seem to be more aware of the subjectivity of this property. “It was really
exciting for me” vs. “It was really beautiful for me”). However, compressibility, at
least on the Kolmogorov model, is subjective only up to a small constant number11. For
large stimuli (i.e. those which are represented in uncompressed form by a large number
of bits) compressibility is almost the same across different compression schemes. This
predicts that people (if aware of the relativity of their judgments at all) should
acknowledge more relativity in judgments about small stimuli than large ones12. (Some
potential problems for this line of argument are the following. First, computers can
deal with enormous inputs, but it’s not clear whether humans can in the same way.
Second, it may be that human compression schemes vary more than existing programming
languages).
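The claim that compressibility is subjective only up to a constant can be stated a little more precisely. The block below is a sketch of the standard invariance property for Kolmogorov complexity, written with the assumed notation K_T(S) for the length in bits of the shortest description of stimulus S in scheme T. It holds when both schemes are universal (each can simulate the other). The constant depends on the pair of schemes but not on the stimulus, and nothing guarantees that it is small in absolute terms.

```latex
% K_T(S): length in bits of the shortest description of stimulus S in scheme T.
\[
  \lvert K_T(S) - K_{T'}(S) \rvert \le c_{T,T'} \quad \text{for all stimuli } S,
\]
% so the relative disagreement between the two schemes vanishes
% for stimuli whose shortest descriptions are long:
\[
  \frac{\lvert K_T(S) - K_{T'}(S) \rvert}{K_T(S)} \le \frac{c_{T,T'}}{K_T(S)} \to 0
  \quad \text{as } K_T(S) \to \infty .
\]
```

Note that the bound becomes negligible when the compressed description is long, which is a somewhat stronger condition than the uncompressed stimulus being large.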
2. Variability in judgments of beauty often isn’t based on mere brute preference
Compression can also help explain the intersubjective variability of aesthetic
judgments. Two subjects can vary in how much they compress a stimulus for two
different reasons. First, one subject may be better able to solve the computational
problem of finding a compressed representation. This would correspond to different aesthetic
judgments arising from different skill levels in finding patterns or structure that both
subjects would acknowledge as existing. For example, someone may judge a piece of
music or a movie to be bad because he fails to recognize the structure behind the
superficial disorder (e.g. due to being distracted during the performance or lacking
practice in lateral thinking of the right sort). In this kind of case, it can be possible to
show someone the structure and have them experience the beauty for themselves or at
least recognize their initial mistake. However, in some cases the difference in skill
levels between subjects may be too great.
11
The size of the compressed representation of stimulus S w.r.t. scheme T ≤ the size of the compressed
representation of S w.r.t. scheme T’ + a constant c (where c depends only on T and T’, not on S, and so
shrinks in significance as the sizes of the compressed representations increase).
12
There is some difficulty here of translating in a principled way familiar stimuli like faces, natural
scenes and works of art into digital representations of different sizes. Should a representation of a piece
of abstract visual art preserve every detail that is visible when looking at the canvas up close? On the
other hand, it could be easy to rule that a four-bar piano tune is smaller than an hour-long symphony.
Compressibility can also vary because of differences in compressive scheme.
Consider the first fifty digits of pi. This can be expressed in a short English phrase.
Likewise, we can write very short computer programs in most programming languages
that output these fifty digits13. However, we can imagine some hunter-gatherers
who would have no similarly short representation of pi’s first fifty digits. They would
have no name for the number (like ‘pi’ in English) and no names for concepts that
make it easy to pick out by definite description (as in ‘the ratio of the circumference of
a circle to its diameter’ or ‘the area of the unit circle’). Some variability in aesthetic
judgments may be the result of different compressive schemes. From a large body of
research, we know that American subjects like arbitrary Chinese characters more if
they have had more recent exposure to them (“the mere exposure effect”). Here we can
suppose that exposure to the character leads to better compression for the character.
We can expect this to make a difference to people’s appreciation of works of art. For
instance, complex forms like the outline of the British Queen’s head or the outline map
of the US (as in Johns’ famous work) would appear as simple, recognizable elements
for people familiar with them and would look random otherwise. A large classical
portico may look more anomalous on a colorful, wooden C19th house for someone
without any experience of the portico as a standard architectural unit. The same kind
of phenomenon may explain people’s first reaction to music based on very different
scales or conventions. People may find the music structure-less and very hard to
remember with much accuracy. Musical training may consist both in developing skills
for discerning pre-existing patterns and in teaching the arbitrary conventions that are
employed in a particular musical tradition14.
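To make the point about short programs for the first fifty digits of pi concrete, here is one such program. It is only a sketch: it uses Machin's arctangent formula, pi / 4 = 4 arctan(1/5) - arctan(1/239), with integer arithmetic, which is a choice made here for fast convergence and not a method named in the text. The function names and the number of guard digits are illustrative.

```python
def arctan_inverse(x: int, scale: int) -> int:
    """Approximate scale * arctan(1/x) with the alternating Taylor series."""
    term = scale // x          # scale / x**1
    total = term
    x_squared = x * x
    n = 1
    sign = -1
    while term:
        term //= x_squared     # scale / x**(n + 2)
        n += 2
        total += sign * (term // n)
        sign = -sign
    return total


def pi_digits(digits: int = 50) -> str:
    """Return pi truncated to `digits` significant digits, as a string."""
    guard = 10  # extra digits that absorb rounding error in the series
    scale = 10 ** (digits - 1 + guard)
    # Machin's formula: pi = 4 * (4 * arctan(1/5) - arctan(1/239)).
    pi_scaled = 4 * (4 * arctan_inverse(5, scale) - arctan_inverse(239, scale))
    text = str(pi_scaled // 10 ** guard)
    return text[0] + "." + text[1:]


print(pi_digits(50))
# 3.1415926535897932384626433832795028841971693993751
```

The program is far shorter than a typical description of fifty arbitrary digits would be, which is the sense in which pi's digits are highly compressible for anyone who has the relevant mathematical vocabulary.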
13
For example, you can use the Taylor expansion of arctan(x) at x = 1: pi / 4 = 1 - 1/3 + 1/5 - 1/7 + ....
14
Likewise, someone learning about medieval art may have to spend some time learning the arbitrary
conventions for how various Christian scenes were to be painted. Mere exposure may also explain part
of the phenomenon that the fashions of previous decades look ridiculous to us now while today’s
fashions look great.
An additional point: Gaining the ability to compress something very contingent like a particular
person’s face or the outline of a particular country implies that some other stimulus has become less
compressible. That is, changing a compression scheme to fit some frequently occurring stimulus makes
you worse at compressing some other stimulus. So we might expect that whenever mere exposure makes
something more attractive, there are some stimuli that must become less attractive than they were before.
This could be tested by exhaustively testing some range of stimuli (e.g. all 10-bit strings). A possible
implication is that if you have to memorize lots of arbitrary structure in order to store (say) familiar faces
or concepts in compressed format, there will be other faces or concepts that you store less well as a
result. This issue may be interestingly relevant to the phenomenon of recognition. There’s an
experiential difference between recognizing (say) your dog as your dog and recognizing the same dog as
a very prototypical dog (or prototypical Dalmatian).
I’ve put variation in compression schemes forward as a theoretical option that
could do work in explaining variation in aesthetic judgments. It’s plausible that it is
relevant to some of the cases noted in the previous paragraph, but its wider significance
is unclear. As discussed in the previous section, the compressibility of large stimuli is
almost independent of compression scheme15. Hence we should only expect limited
intersubjective variation based on different compression schemes.
3. Beauty and attractiveness are applied to a diverse range of phenomena
Here are some things that humans find beautiful or attractive (where “attractive” is
meant in the aesthetic sense):

Paintings and sculpture, ranging from photorealistic to very abstract

Architecture, videogame virtual worlds, furniture and other man-made objects

Highly organized and structured sequences of sounds (i.e. music), some of
which may resemble everyday sounds (animal or human voices, leaves rustling)
and some of which do not

Sunsets, coastal areas, waterfalls, canyons, flowers and trees, people, animals
and other natural objects

Dances

Poems and other literary works

Proofs of mathematical theorems

Scientific, mathematical or philosophical ideas
Some of these objects seem to be beautiful in different senses. Maybe an idea is
beautiful in part because of its explanatory power. However, it’s plausible that the first
items on the list (works of art and natural objects) can all be beautiful or aesthetically
attractive in the same way16. It’s also plausible that there is at least some important
similarity between this kind of beauty and the beauty of mathematical proofs and
intellectual ideas.
15
At least on the Kolmogorov model.
16
Works of art can have aesthetic properties that natural objects cannot have. Poems, for instance, can
be aesthetically impressive in part because of the richness of the ideas they express. However, it is also
possible for poems or paintings to be attractive, even beautiful, without expressing very rich ideas.
Similarly, a musical tune or dance can be very pretty without having any other aesthetic virtues.
So, what do these objects have in common, apart from being called “beautiful” or
“attractive”? They do not seem to share many non-trivial intrinsic properties. Yet they
are all things that humans like, and that provoke a positive affective response in
humans. The problem is that there are other things that humans like which we don’t
call beautiful or attractive. For example: the experience of satisfying a strong urge (e.g.
for food or drink, or physical warmth), the experience of one’s own team winning, the
experience of seeing a friend after a long time apart, the experience of an outcome
being morally just (from one’s standpoint). There are also sensory experiences that
seem to have more in common with the attractive/beautiful experiences listed above
but which are not usually described in the same terms. We can have rich and satisfying
experiences from the senses of touch, smell and taste, but we are less likely to describe
these experiences in terms of beauty or attractiveness17. Likewise, most non-music
sound experiences (e.g. the sea, traffic, glass shattering, animal calls) can be pleasant
or unpleasant, but can’t be beautiful in the way that even a simple melody played on a
piano can be.
The compression theory suggests an answer to these questions. The items on the
list above (at least up to math proofs) are the kind of stimuli from which humans
construct rich mental representations. These representations can start complex but then
become smaller due to the discovery of regularities or redundancies in the initial
representation. This allows the phenomena in question to be generated from a simpler
mental program (i.e. set of stored instructions). In the case of natural scenes or natural
objects, finding structure in these stimuli is finding regular structure in the world and
so has clear practical use in navigating the world. If you realize that all the apples on a
tree (or in an orchard) look a particular way, then you have a
simple rule for identifying this tree (or orchard) in the future. Also, you can infer that
any apple’s falling from the tree will be the same, and so your being hit by an apple
will have the same effect wherever you are standing under the tree18. So the experience
of finding patterns in a set of visual experiences will often be the experience of finding
structure in the world around us. This holds much less in the case of other sense
modalities. The senses of taste and touch are often involved in actions that manipulate
or alter the world (regardless of what its previous state was) in ways we find desirable.
Actions of eating or touching generally change aspects of the world, whereas vision is
used to merely observe (without manipulating) parts of the world in which we hope to
discern regularity or structure. The senses of smell and hearing (in the non-music,
non-speech case) are clearly used for non-destructive “observation”. However, they seem to
be mostly used for identifying particular objects or events in the world. With vision,
we are given an initial visual presentation of a scene and then seek to find patterns in it
(e.g. all apples are the same color and size, or all the houses on one side of a street are
built in the same style). It’s harder to think of cases where we have an initial smell or
sound percept and then subsequently discover some underlying regularity in that
percept19.
17
As David Chalmers notes in “Fall from Eden”, we are also less likely to describe them as veridical or
non-veridical.
18
Similarly: if I look from above and see that a river splits into two very similar streams, then I can
exploit this similarity by just storing information about one of the rivers.
So we have a general observation: the kind of experiences we call attractive or
beautiful are those in which we have an initial mental representation in which we seek
to find regularities (i.e. structure that makes for a more compressed representation).
This explains why some sensory modalities (touch, smell, taste, most instances of
hearing) do not typically produce such experiences. This also explains why the
desirable experiences listed above (satisfying urges, having one’s team win, etc.) are
not described using these aesthetic terms. What to say about beautiful ideas? Take
Newtonian Mechanics. As a successful scientific theory, Newtonian Mechanics allows a
compression of a huge number of empirical phenomena. Its laws explain not just
mechanical interactions on earth, but the motions of the planets and the moon. At the
same time, the laws themselves can be written on a postage stamp20. Thus you have a
huge compression of empirical evidence whose source, when written down in
mathematical terms, is extremely concisely expressible. (Compare with a theory that is
similarly successful empirically but has a large number of special cases for different
domains. This theory would be less beautiful though still have the same power to
compress empirical evidence).
19
Not to say we couldn’t, and in cases where we do, maybe “beauty” and “attractiveness” would start to
be applied to such experiences.
20
They are simple once you have vector calculus. But vector calculus is a very simple and obvious idea
in mathematics that has many applications in mathematics itself as well as in areas outside physics.
Plan for work over next few months
Our first goal is quantitative testing of the compression theory of attractiveness
judgments. This involves devising and implementing computational models of human
learning and compression abilities, and testing the models against human experimental
data. We have designed three experiments for this purpose.
Experiment 1: Learning prototypes from noisy instances
As noted above, previous experimental work has shown that people find prototypical
instances of a concept more attractive than atypical instances. In Experiment 1, we will
provide a computational model for learning prototypes. As someone learns a prototype,
images very close to the prototype will become increasingly probable (and so
compressible). We thus predict that the very same image will become more attractive
over the course of learning.
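The prediction can be illustrated with a deliberately simple learner. The sketch below assumes binary images, independent pixel-flip noise, and a learner that keeps a smoothed per-pixel estimate of the probability of black. The code length of an image is its negative log probability under those estimates, so more probable images are more compressible. The class name, the noise level and the image size are illustrative assumptions and not the model the experiment will necessarily use.

```python
import math
import random


def noisy_instance(prototype, flip_prob, rng):
    """Copy of the prototype with each pixel flipped independently."""
    return [1 - p if rng.random() < flip_prob else p for p in prototype]


class PixelwiseLearner:
    """Tracks how often each pixel has been black (1) across observed images."""

    def __init__(self, n_pixels):
        self.black_counts = [0] * n_pixels
        self.n_seen = 0

    def observe(self, image):
        for i, pixel in enumerate(image):
            self.black_counts[i] += pixel
        self.n_seen += 1

    def code_length_bits(self, image):
        """Negative log2 probability of the image under the current estimates."""
        total = 0.0
        for count, pixel in zip(self.black_counts, image):
            p_black = (count + 1) / (self.n_seen + 2)  # Laplace smoothing
            total += -math.log2(p_black if pixel else 1 - p_black)
        return total


rng = random.Random(0)
prototype = [rng.randint(0, 1) for _ in range(64)]
learner = PixelwiseLearner(len(prototype))

bits_before = learner.code_length_bits(prototype)
for _ in range(50):
    learner.observe(noisy_instance(prototype, 0.1, rng))
bits_after = learner.code_length_bits(prototype)

# An image far from the prototype: every pixel inverted.
atypical = [1 - p for p in prototype]
bits_atypical = learner.code_length_bits(atypical)

print(bits_before, bits_after, bits_atypical)
```

Before any learning, every 64-pixel image costs 64 bits. After learning, the prototype costs far fewer bits and the inverted image costs more, which is the pattern the compression theory maps onto rising and falling attractiveness.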
We will also test the converse effect in order to control for mere exposure. If
subjects start with a slightly unrepresentative set of samples from the prototype, then
an image may seem more compressible early on in their learning than it does having
learned the prototype21.
Finally, we will consider stimuli that have a very simple compositional structure.
Since compositional operations (e.g. placing two prototypes side-by-side in the same
image) are very computationally simple, we expect that compositional stimuli will be
judged attractive as a simple function of the attractiveness of the components.
The stimuli for the experiment will be tiled images. A very similar set of stimuli was
used successfully by Winkielman et al. Some examples of the kind of stimuli we’ll
use are below. The large image shows a prototype. Surrounding it are stimuli randomly
generated from this prototype according to different models of noise. The examples
also vary in how much noise is applied in each case.
[Figure: a prototype (large image) surrounded by stimuli generated from it under different noise models and noise levels.]
21
There’s a worry that if the starting sequence is too misleading, subjects will think that they are being
misled and that there’s not a true i.i.d. process generating images. A difficulty here is that there are
multiple Bayesian algorithms for finding the prototype and so it’s not clear which one to try to fit human
judgments for this task. I guess you could find the one that has best overall fit and use that to make
predictions about how compressible the test image should seem after the misleading evidence.
Alternatively, we could teach subjects one prototype relative to which an image X is prototypical. Then
spend more time showing another, different prototype, relative to which X is atypical. This should result
in X looking less attractive.
Experiment 2: Learning Taboo Regions
In Experiment 1, subjects judge the attractiveness of images that are very similar to the
stimuli from which they learn prototypes. However, the compression theory predicts
that any images that become more compressible on learning the prototype should
become more attractive, not just very similar images. For instance, if subjects learn a
particular prototype from noisy images like those shown above, they should find
images involving simple computable transformations of that prototype more attractive
(i.e. more attractive after the learning than before). Some examples of simple
transformations are shown below:
[Figure: examples of simple transformations of a prototype.]
Experiment 2 will test this prediction about the attractiveness of transformations of
newly-learned prototypes. Stimuli are based on a “taboo region”, an area of the 2D
plane where no black pixels occur. Subjects see many images that respect the taboo
region. After enough examples, they will have learned (at least subpersonally) that in
all the images they’ve seen, certain areas of the image have always been white. This
learning of the Taboo Region should make it more compressible for them. Subjects
will then be presented with the Taboo Region. We predict that it will be more attractive
after the learning stage of the experiment than before. Here’s an example of the kind of
stimuli we will use:
[Figure: nine example training images that respect the taboo region.]
[Figure: Taboo Region]
The nine images on the previous page all have no black pixels in the taboo region
(shown directly above). Subjects would be shown a set of training stimuli like the nine
images above (but more numerous and with images that make the taboo region less
obvious than those above) and then shown the taboo region. We can also test subjects
on simple transformations of the taboo region. For instance, the size and position of
some elements of the taboo region could be varied. Or, if multiple taboo regions are
learned by subjects in the training phase, we can present them with images that combine
multiple taboo regions simultaneously.
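A minimal sketch of how training stimuli of this kind could be generated, assuming 40x40 binary images, rectangular taboo regions, and independent random pixels elsewhere. The function names, the rectangle coordinates and the 0.5 black-pixel probability are hypothetical choices for illustration; the actual stimuli would be designed so that the region is less obvious.

```python
import random

SIZE = 40  # assumed image side length in pixels


def rectangle_mask(top, left, height, width, size=SIZE):
    """Return a size x size grid that is True inside the taboo rectangle."""
    return [[top <= y < top + height and left <= x < left + width
             for x in range(size)]
            for y in range(size)]


def union(mask_a, mask_b):
    """Combine two taboo regions: a pixel is taboo if it is taboo in either."""
    return [[a or b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_a, mask_b)]


def make_stimulus(mask, rng, p_black=0.5):
    """Random binary image (1 = black) that is always white inside the mask."""
    return [[0 if taboo else int(rng.random() < p_black) for taboo in row]
            for row in mask]


def respects(image, mask):
    """True if the image has no black pixel inside the taboo region."""
    return not any(pixel and taboo
                   for img_row, mask_row in zip(image, mask)
                   for pixel, taboo in zip(img_row, mask_row))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the stimuli are reproducible
    # Two hypothetical taboo regions combined into one, as in the last test above.
    region = union(rectangle_mask(5, 5, 10, 20), rectangle_mask(25, 15, 8, 8))
    training_set = [make_stimulus(region, rng) for _ in range(100)]
    print(all(respects(img, region) for img in training_set))
```

The `union` helper corresponds to the case where several taboo regions are learned and then combined in a single test image.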
Finally, the Taboo Region experiment may be a starting point for investigating the
importance of the change in compressibility of a stimulus. If subjects see a large
number of noisy images that respect the taboo region, then we can imagine that they
will learn the taboo region very well by the time they see it. In that case, the
taboo region itself will be compressible, but seeing it won’t give them any increased
ability to compress previously seen images. However, if subjects have seen only a
small number of noisy images, they will have some uncertainty about the properties of
the taboo region before they see it. Thus, seeing the region in its entirety will
presumably give them a more compressed model of their past observations22. Note that
there are two related notions of an increase in compressibility over time:
1. Increased compressibility for a single fixed image
Someone could be viewing a static image and discover patterns in the image
after staring at it for long enough.
2. Increased compressibility of past observations attained by viewing or reading a
new stimulus
Someone can come to recognize a pattern in previous observations by viewing a
new stimulus that communicates the pattern.
Seeing the taboo region will provide the second kind of compression gain. Of course,
every subsequent image viewed in the training phase will make previous images more
compressible. However, the taboo region (depending on how exactly subjects interpret
it) will provide subjects with a concentrated compression increase.
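One way to make “uncertainty about the taboo region” concrete is to compute how surprising the true region is to an idealized learner after a given number of training images. The sketch below assumes (hypothetically) an 8x8 grid, a hypothesis space of all axis-aligned rectangles with a uniform prior, and pixels outside the region that are black with probability 0.5. None of these modelling choices comes from the proposal itself, and the sketch does not settle how compression gain should be defined.

```python
import math
import random

GRID = 8  # small grid so that every rectangle can be enumerated (assumption)


def all_rectangles(n=GRID):
    """Every axis-aligned rectangle as (top, left, bottom, right), ends exclusive."""
    return [(t, l, b, r)
            for t in range(n) for b in range(t + 1, n + 1)
            for l in range(n) for r in range(l + 1, n + 1)]


def area(rect):
    t, l, b, r = rect
    return (b - t) * (r - l)


def sample_image(rect, rng, n=GRID):
    """White (0) inside the taboo rectangle, fair coin flips elsewhere."""
    t, l, b, r = rect
    return [[0 if (t <= y < b and l <= x < r) else rng.randint(0, 1)
             for x in range(n)]
            for y in range(n)]


def is_consistent(rect, images):
    """True if no image has a black pixel inside the candidate rectangle."""
    t, l, b, r = rect
    return all(img[y][x] == 0
               for img in images
               for y in range(t, b) for x in range(l, r))


def surprisal_of_true_region(true_rect, images):
    """Bits gained when the true region is revealed: -log2 P(true | images)."""
    n_images = len(images)
    # Under a candidate region, each pixel inside it is white with probability 1
    # instead of 0.5, so the log2 likelihood grows by one bit per such pixel
    # per image. Inconsistent candidates have likelihood zero and are dropped.
    log_weight = {rect: n_images * area(rect)
                  for rect in all_rectangles()
                  if is_consistent(rect, images)}
    top = max(log_weight.values())
    log_total = top + math.log2(sum(2.0 ** (w - top) for w in log_weight.values()))
    return log_total - log_weight[true_rect]


if __name__ == "__main__":
    rng = random.Random(0)
    true_rect = (2, 2, 5, 5)  # hypothetical 3x3 taboo region
    for n in (2, 50):
        images = [sample_image(true_rect, rng) for _ in range(n)]
        print(n, "images:", surprisal_of_true_region(true_rect, images), "bits")
```

Under these assumptions the true region carries more information for a learner who has seen 2 images than for one who has seen 50, which is the qualitative contrast between the small-sample and large-sample cases described above.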
Experiment 3: Investigating people’s existing compression scheme
A general problem with testing the compression theory is that we don’t have access to
people’s compression scheme. The previous two experiments got around this because
they involved a learning task, where we can predict that subjects will alter their
compression scheme in a particular direction. Thus, after the learning task, a class of
stimuli will become more compressible than they were before. We can’t say, however,
how compressible they were before or how compressible they become relative to other
stimuli we haven’t tested. Experiment 3 aims to learn subjects’ existing compression
scheme in a particular domain and then use this to make a much wider range of
attractiveness predictions.
22
I am not fully clear on this. The question is how good a compression subjects could get with a
distribution over taboo regions. I think the basic claim here is correct.
The basic idea for learning a compression scheme is as follows. Take some
simple domain such as n-bit 1D images or n x n 2D images. Get people to judge the
probability of a substantial sample of the images. Now find a compression scheme that
fits the human probability judgments. This will allow you to generalize to
untested images and (plausibly) to images of a larger size.
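The step from probability judgments to a compression scheme can be sketched using the standard correspondence between probabilities and ideal code lengths: an item judged to have probability p gets a code of about -log2(p) bits. The judgment values below are made-up placeholders, not data, and the function names are hypothetical. Fitting a scheme that generalizes to untested images would need a structured model rather than this lookup table.

```python
import math


def code_lengths(judged):
    """Map judged probabilities to ideal code lengths in bits.

    The judgments are normalised to sum to 1 first, then each item gets
    a length of -log2(p).
    """
    total = sum(judged.values())
    return {item: -math.log2(p / total) for item, p in judged.items()}


def kraft_sum(lengths):
    """Sum of 2^-length; a value of at most 1 means a prefix code with these lengths exists."""
    return sum(2.0 ** -length for length in lengths.values())


def rank_by_predicted_attractiveness(judged):
    """On the compression account, shorter codes predict higher attractiveness."""
    lengths = code_lengths(judged)
    return sorted(lengths, key=lengths.get)


if __name__ == "__main__":
    # Placeholder judgments for four 4-bit images (illustrative values only).
    toy_judgments = {"0000": 4, "0101": 2, "0110": 1, "0010": 1}
    for item, bits in code_lengths(toy_judgments).items():
        print(item, bits)
    print(rank_by_predicted_attractiveness(toy_judgments))
```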
This kind of design was employed by Tenenbaum and Griffiths (2003). They
constructed a model that successfully predicted human probability judgments for
sequences of 8 coin tosses (which is a similar case to 8-bit images). A simple extension
of their approach would be to use their model to predict attractiveness judgments.
However, 8-bit strings are probably not the most interesting set of stimuli for getting
attractiveness judgments. It might be better to use the Tenenbaum and Griffiths model
as a starting point for judgments about n x n images. One could take samples of (say)
10x10 images, and augment the model as suggested by subject judgments. (We might
imagine that images which involve iterations of a shape or structure will be seen as
simple, for instance the image below, which shows a simple shape in three size variants.
So we need a compression scheme that makes this possible.)
[Figure: a simple shape shown in three size variants]
The road ahead: compressibility in art and science
One aim of this project is to connect the attractiveness judgments made about the
simple stimuli above to aesthetic responses to art and science. I will sketch some ideas
that are currently very rough. We have lots more material in preparation but it is not
very organized.
Our basic idea23 is that the aesthetic properties of ideas in mathematics24, philosophy
and science come from their ability to compress lots of phenomena (e.g. empirical
evidence or true mathematical statements) while themselves being simple (i.e. short,
elegant). By contrast, there are intellectual works with elegant theories that are
arguably not very explanatory (e.g. various mathematical models in economics and
cognitive science, false theories in the history of physics, theories like logical
positivism in philosophy or an ontologically austere religious metaphysics) and works
with inelegant theories that may nevertheless be explanatory (e.g. works in historical
sciences, history, ecology, meteorology, and philosophical works by Rawls).
The phenomena that science or philosophy explain are not always phenomena that
we have full or conscious awareness of. In a murder mystery or a long narrative joke,
we are first presented with a curious sequence of events and then receive an
explanation (compression) of these events in concentrated form at the end of the
mystery or joke. Everyone who sees the ending of the mystery or joke holds in
memory the events that the ending will explain25. In science or philosophy, the
explananda of the theory may be phenomena that we only become fully aware of after
learning about the theory. Successful theories will often impact us in two ways. First,
they show us a new sphere of phenomena, teaching us to look more closely at everyday
happenings or to recognize rich structures at very small or large scales. Second, they
provide a compression of these new phenomena, as well as compressing phenomena
with which we are more familiar. Both aspects are important to the overall appeal of
scientific inquiry but (on the compression account) it is the second that should be most
important aesthetically.
23
Our ideas are related to ideas that Schmidhuber expresses in short passages in various papers.
However, his treatment (though insightful) is very brief, and so we’ve gone beyond it in various ways.
24
The case of mathematics is somewhat different from the scientific cases. In science there is a diverse
body of empirical phenomena that we seek to compress by finding regularities in the data. In
mathematics, one might suppose that almost all of the interesting facts of mathematics have already been
compressed (to a dramatic degree) by the rigorization of calculus in the C19, and then by the full
axiomatization in terms of set theoretic axioms and formal logic. One approach is to think about a set of
theorems in a particular area of math as providing a compressed (relative to limited human inferential
abilities) representation of the known mathematical facts in that area. Thus, a set of theorems compresses an
area by making it tractable for a mathematician, i.e. making it relatively easy for him to decide
statements in that area (this is crucial for practical applications of an area of math, where particular cases
matter). This is not compression in the formal sense, and there’s a serious issue of which notion is more
important in human cognition.
25
It’s noteworthy that we care about having a murder mystery explained even though the explanandum,
a series of fictional events, has no possible practical relevance. Still, scientists and philosophers care
about explaining many phenomena whose understanding would have little practical significance.
Works of art, like scientific works, can alert us to novel phenomena and explain or
connect up a diverse range of familiar phenomena. Sometimes it makes sense to see a
work of art as communicating a generalization about an aspect of the world26. Often,
works of art exhibit a connection or correspondence. This kind of correspondence
could be related to compression, but it’s not clear exactly how27. As with scientific
works, it is this semantic28 component of art (the aspect which points out from the
work of art to things in the world) as well as the syntactic component (elegance and
beauty in the syntax itself) that contribute to the aesthetics of the work of art. (With art
it may be very hard to separate or distinguish the contributions of these two components
to the overall aesthetic effect). In science, it is often easy to paraphrase the semantic
component. In art, this can be much more difficult. For example, it is hard to explain in
general terms the way in which art conveys mood, sensibility, emotion (especially in
the case of music) or correspondences and relations between different kinds of
experiences29. The key question for our project is not to characterize the effects
produced by art (and its means of producing them). The question is whether the
aesthetic judgments that we apply to art can be understood in terms of compression.
Some of the things that art conveys (mood, emotion, insights into subjective
experience, all kinds of generalization about the social and moral world) can be
conveyed by non-artistic means30 that aren’t usually seen as candidates for beauty or
other aesthetic judgments. So there is some reason to think that some of the more
distinctive effects of art are not necessarily bound up with art’s aesthetic effects.
26
One simple example is literature’s display of prototypical individuals or situations. We might
describe real individuals in terms of fictional characters (just as we might use historical figures for the
same purpose).
27
Showing a correspondence between X and Y can allow you to more parsimoniously represent Y by
storing it in terms of X. The more parsimonious representation of Y may make certain cognitive
operations (generating predictions or logical consequence, testing consistency, storing in memory, etc.)
easier.
28
I’m using “semantic” very loosely here.
29
Davidson on metaphor and Peacocke on the perception of emotion in music are examples of accounts
of how art can convey “content” (construed broadly). Both writers are concerned with how these
phenomena relate to human cognition more broadly.
30
You can learn about these things by reading non-fiction, or hanging out with the right kind of people
or in the right cities. These are all kinds of things that can be beautiful, but they are not necessarily
beautiful in their ability to communicate the kind of things that art communicates. Of course, these
claims are contentious and need to be developed in much more detail.