COGS Proposal – Owain Evans (February 2011)

Project on Computational Theories of Aesthetic Judgments

I will first introduce the ideas behind the project and then describe the project itself in more detail. Experiments are described in the section “Plan for work over next few months”. Most of the ideas below are joint work with Peli Grietzer (Harvard).

Introduction: Beauty as Compressibility

Jürgen Schmidhuber, an AI theorist and theoretical computer scientist, has proposed a computational account of aesthetic judgments [1]. On his view, a stimulus is judged to be beautiful or attractive by a subject S to the extent that the stimulus is compressible relative to S’s compression scheme. This notion of compressibility is taken from computer science and information theory. Roughly, compressible stimuli are those that can be given a short description (in a compression scheme or formal language) because they contain structure or repeating patterns. For example, the tessellation and fractal images (below left) have a simple repeating structure, whereas the random noise images (below right) do not contain any pattern.

[Footnote 1: Schmidhuber’s page http://www.idsia.ch/~juergen/beauty.html contains links to his papers on the topic.]

There are obvious problems with this account if we take it as a full account of beauty. A chessboard, while very simple, would rarely be called beautiful. The same goes for a C major chord played on a piano. These stimuli are certainly not unpleasant, but people would quickly find them boring [2]. Moreover, it seems that the most profound aesthetic experiences often come from complex stimuli: the city of Rome, the philosophy of Plato or Wittgenstein, art by Picasso, Joyce or Stravinsky. These complex stimuli seem especially hard to compress. They contain patterns, but it doesn’t seem that the works as a whole can be reduced to a concise description [3]. Schmidhuber argues for explaining beauty as compressibility.
However, it may be better to identify compressibility with attractiveness, pleasantness, niceness, likability and prettiness. I mean these terms as used in a specific aesthetic sense: compare the description of a face or garden as “attractive” or “nice” with the description of a job offer as “attractive” or “nice”. In this aesthetic sense, these terms can sometimes function as weaker claims along the same scale as judgments of “beauty”. That is, beautiful things tend to be attractive and pleasant, and are often also nice or pretty [4].

[Footnote 2: Another issue is that randomness does not always seem unpleasant. Some art works contain major random elements while being aesthetically pleasing as a whole. See Ellsworth Kelly’s paintings of color spectra arranged by chance.]

[Footnote 3: That they can’t be so reduced is part of what makes them great.]

[Footnote 4: This is only a heuristic. Some objects may be beautiful but not have any of the other terms apply to them.]

(Why care about these attractiveness judgments, from a philosophical or psychological perspective? Not everyone devotes time to the aesthetic consumption of the higher arts, but these attractiveness judgments seem omnipresent. They influence the choice of where to live and how to organize one’s home space, the choice of partners, friends and job candidates, and the choice between different products based on packaging, advertising and design. In various domains, attractiveness judgments have been shown to have significant subconscious effects on other kinds of non-aesthetic evaluation (“the Halo Effect”). Finally, it seems plausible that attractiveness judgments are related in important ways to the more transcendent aesthetic experiences.)

Considering again very simple stimuli, such as a C major chord, a chessboard, or Google’s homepage design, it is more plausible to describe these as pleasant or nice, rather than beautiful. What then makes an object beautiful and what does this have to do with compressibility?
I will discuss some ideas for how to answer these questions below. For now, I’ll introduce another computational idea from Schmidhuber. The lack of strong preference for very simple stimuli is explained (in common sense terms) by their not being interesting. Schmidhuber gives interestingness a simple formal analysis in terms of compressibility. Whereas beauty (or attractiveness, etc.) is the subjective compressibility of a stimulus, interestingness is the rate at which the subjective compressibility changes over time:

Beauty (etc.) of stimulus S for subject T = # bits in T’s mental representation of S (fewer bits meaning more compressible, and so more beautiful)

Interestingness of S for T = rate of change in the # of bits used to represent S by T over time = d(Beauty) / d(Time)

Schmidhuber then tries to account for people’s aesthetic preferences in terms of these two notions. I will discuss some limitations of these two concepts below. Before doing that, I want to focus on compressibility as an account of aesthetic attractiveness. I will first summarize existing experimental and conceptual support for the view, and then discuss some new experiments for testing this theory.

Empirical Evidence for the Compression View

Schmidhuber presents a range of anecdotal examples in favor of his proposal [5]. However, there is also a body of literature in experimental psychology that broadly supports his theory:

Research going back to the Gestalt psychologists has shown that people prefer abstract images with more symmetry, with more redundancy, and with strong figure/ground contrast (especially when making judgments quickly). These are all features of images that make them more compressible.

A body of work has shown that “mere exposure” to a stimulus (e.g. a Chinese character the subject has never seen before) makes the stimulus more attractive. A weaker effect is observed if the mere exposure is to a similar stimulus. The effect is robust and holds for a range of stimuli.
A fun related fact is that people prefer the mirrored version of their own face but the non-mirrored version of their friends’ faces.

People find prototypical [6] instances of a concept more attractive than instances further from the prototype. This holds even when the concepts are randomly generated dot patterns. People prefer prototypical (or most common) chord progressions, prototypical instances of animals, and prototypical (or averaged) faces. Once you have a prototype, you can store instances just in terms of deviation from the prototype. Thus, the prototype itself will be most compressible, and compressibility will decrease with distance from the prototype.

[Footnote 5: He gives examples of images generated by very simple programs and notes that they are fairly attractive. There are many other anecdotes that are explained by the theory: Why people find other people’s messy rooms unpleasant but their own much less so (they’ve seen enough of the room to begin to compress it and they might know the internal logic of the arrangement of stuff in the room). Why a well-stocked and orderly store can have many more objects than a messy room but not be unpleasant (the order makes it compressible). Why it’s worse to overhear someone talking on a cellphone than to overhear a conversation where you hear both parties (you can’t compress the conversation because you are lacking lots of relevant information). Why unpredictable torture is especially unpleasant. Why faces tend to look more attractive from a distance (or with blurred vision) or in dim light (hence club and bar lighting) or when the person is wearing sunglasses (which remove from view the eyes, a very informative part of the face). Why people (with no idea of the relevant mathematics) like fractal images. [Accents example].]

[Footnote 6: The prototype could be the average or the most stereotypical instance. There is an extensive literature in psychology on the relevance of prototypes to various aspects of cognition.]
The compressibility of prototypes may explain the mystical Platonic idea that the forms (i.e. prototypes) are more beautiful than their instances.

Some of this empirical evidence has been used to argue for the theory that attractiveness judgments depend on the “processing fluency” of the stimulus [7], which seems to consist in how quickly or accurately the stimulus can be categorized or otherwise used in thought. This notion is closely related to compressibility, and I won’t at this point go into the details of distinctions that can be made between the two notions. Another notion, related both to the speed of categorization of a stimulus and its compressibility, is the probability that the subject assigns to the stimulus. There are theoretical reasons (as well as some recent empirical reasons) to think that the probability and compressibility of stimuli for humans will be very closely related. That is, more probable stimuli will be more compressible [8]. Again, I won’t discuss experiments here that could be used to distinguish between these notions in human judgment. The main point here is that there is a closely related set of formal notions (compressibility, probability, categorizability) that will often be correlated in experiments. Schmidhuber focuses on compressibility (for various good reasons) but the project of investigating his theory of aesthetics is really a project of investigating how this family of notions relates to aesthetic judgments.

Despite the body of work in support of Schmidhuber’s theory, there remains lots of empirical work to be done. First, most experiments have looked at extremely simple stimuli for which aesthetic judgments are likely to be very mild. It’s important to see if the theory does equally well in accounting for judgments made about larger, more cognitively demanding stimuli.
Second, even though many of the stimuli used have been extremely simple 2D abstract patterns, the models presented of these judgments have mostly been informal or qualitative. For example, the experiments involving processing fluency test their theory by looking at the correlation between categorization speed and attractiveness. They expect that categorization speed will correlate in some way with attractiveness. However, they don’t have a model of what it is about stimuli that makes them quickly categorizable. To make predictions about attractiveness, they need to first gather data about categorization speed, as it’s hardly transparent (in general) what will make a categorization task hard or easy. In contrast, compressibility is a well understood computational notion. For an arbitrary experimental setup, we can compute the compressibility of the stimuli relative to a specified compression scheme (it’s a precise numerical quantity) and then compare these quantitative predictions to human judgments.

[Footnote 7: Reber, Schwarz, Winkielman (2004), “Processing fluency and aesthetic pleasure”.]

[Footnote 8: This correspondence is discussed at length in MacKay’s book on information theory.]

Of course, there are practical problems with testing the compression theory. We don’t necessarily have access to the compression scheme that people are using and we don’t know how good people are at solving the hard computational problem of finding the optimal compression for a stimulus. To the extent that we can learn about the compression scheme and computational capacity (the latter of which will be especially difficult because it may be very heterogeneous), we will be able to make precise quantitative predictions about attractiveness for a very wide range of stimuli [9]. I discuss concrete plans for experiments below.

Plausibility arguments for the Schmidhuber theory

Here is a brief discussion of how compression theory could account for various properties that aesthetic judgments seem to have.

1.
Beauty or attractiveness are experienced as properties of their objects

Once we fix an individual with a particular compression scheme (or internal language of thought), the compressibility of a stimulus will depend only on properties of the stimulus. However, the actual compression achieved by the subject can vary depending on the computational efforts of the subject, because finding the patterns in a stimulus that allow it to be compressed is a hard computational problem [10] and may require time and effort. This explains why the attractiveness of a stimulus may not be immediate (e.g. music or an intricate image requiring repeated listening or viewing) even though the experience of something as attractive seems to be an experience of features of the stimulus itself. (For contrast, consider the case where one feels euphoria or joy while perceiving a stimulus without attributing one’s affect to features of the stimulus.)

[Footnote 9: It’s worth noting that one limitation of the categorization theory is its lack of generality. People may make aesthetic judgments about stimuli they’ve never had to categorize.]

[Footnote 10: Finding the shortest Turing machine that outputs a given n-bit string is uncomputable (Kolmogorov).]

One might think that this aspect of the phenomenology of beauty or attractiveness clashes with the relativity of compressibility to a language or compression scheme. If people are aware that the compressibility of an object varies depending on compression scheme (for example, one’s compression scheme could change with time and different people could have different schemes) then we might expect their experience of beauty to reflect that awareness. (For comparison: when deeming something to be exciting or novel, we seem to be more aware of the subjectivity of this property. Compare “It was really exciting for me” with “It was really beautiful for me”.) However, compressibility, at least on the Kolmogorov model, is subjective only up to a small constant number [11]. For large stimuli (i.e.
those which are represented in uncompressed form by a large number of bits) compressibility is almost the same across different compression schemes. This predicts that people (if aware of the relativity of their judgments at all) should acknowledge more relativity in judgments about small stimuli than large ones [12]. (Some potential problems for this line of argument are the following. First, computers can deal with enormous inputs, but it’s not clear whether humans can in the same way. Second, it may be that human compression schemes vary more than existing programming languages do.)

[Footnote 11: The size of the compressed representation of stimulus S w.r.t. scheme T is at most the size of the compressed representation of S w.r.t. scheme T’ plus a constant c (where c depends only on T and T’, not on S, and so shrinks in significance as the sizes of the compressed representations increase).]

[Footnote 12: There is some difficulty here of translating in a principled way familiar stimuli like faces, natural scenes and works of art into digital representations of different sizes. Should a representation of a piece of abstract visual art preserve every detail that is visible when looking at the canvas up close? On the other hand, it could be easy to rule that a four-bar piano tune is smaller than an hour-long symphony.]

2. Variability in judgments of beauty often isn’t based on mere brute preference

Compression can also help explain the intersubjective variability of aesthetic judgments. Two subjects can vary in how much they compress a stimulus for two different reasons. First, one subject may be better able to solve the computational problem of finding a compressed representation. This would correspond to different aesthetic judgments arising from different skill levels in finding patterns or structure that both subjects would acknowledge as existing. For example, someone may judge a piece of music or a movie to be bad because he fails to recognize the structure behind the superficial disorder (e.g. due to being distracted during the performance or lacking practice in lateral thinking of the right sort). In this kind of case, it can be possible to show someone the structure and have them experience the beauty for themselves or at least recognize their initial mistake. However, in some cases the difference in skill levels between subjects may be too great.

Second, compressibility can vary because of differences in compression scheme. Consider the first fifty digits of pi. This can be expressed in a short English phrase. Likewise, we can write very short computer programs in most programming languages that output these fifty digits [13]. However, we can imagine some hunter-gatherers who would have no similarly short representation of pi’s first fifty digits. They would have no name for the number (like ‘pi’ in English) and no names for concepts that make it easy to pick out by definite description (as in ‘the ratio of the circumference of a circle to its diameter’ or ‘the area of the unit circle’).

Some variability in aesthetic judgments may be the result of different compression schemes. From a large body of research, we know that American subjects like arbitrary Chinese characters more if they have had more recent exposure to them (“the mere exposure effect”). Here we can suppose that exposure to the character leads to better compression for the character. We can expect this to make a difference to people’s appreciation of works of art. For instance, complex forms like the outline of the British Queen’s head or the outline map of the US (as in Johns’ famous work) would appear as simple, recognizable elements for people familiar with them and would look random otherwise. A large classical portico may look more anomalous on a colorful, wooden nineteenth-century house for someone without any experience of the portico as a standard architectural unit. The same kind of phenomenon may explain people’s first reaction to music based on very different scales or conventions.
People may find the music structure-less and very hard to remember with much accuracy. Musical training may consist both in developing skills for discerning pre-existing patterns and in teaching the arbitrary conventions that are employed in a particular musical tradition [14].

[Footnote 13: For example, you can use the Taylor expansion of arctan(x) at x = 1: pi / 4 = 1 – 1/3 + 1/5 – 1/7 + ....]

[Footnote 14: Likewise, someone learning about medieval art may have to spend some time learning the arbitrary conventions for how various Christian scenes were to be painted. Mere exposure may also explain part of the phenomenon that the fashions of previous decades look ridiculous to us now while today’s fashions look great. An additional point: Gaining the ability to compress something very contingent like a particular person’s face or the outline of a particular country implies that some other stimulus has become less compressible. That is, changing a compression scheme to fit some frequently occurring stimulus makes you worse at compressing some other stimulus. So we might expect that whenever mere exposure makes something more attractive, there are some stimuli that must become less attractive than they were before. This could be tested by exhaustively testing some range of stimuli (e.g. all 10-bit strings). A possible implication is that if you have to memorize lots of arbitrary structure in order to store (say) familiar faces or concepts in compressed format, there will be other faces or concepts that you store less well as a result. This issue may be interestingly relevant to the phenomenon of recognition. There’s an experiential difference between recognizing (say) your dog as your dog and recognizing the same dog as a very prototypical dog (or prototypical Dalmatian).]

I’ve put variation in compression schemes forward as a theoretical option that could do work in explaining variation in aesthetic judgments. It’s plausible that it is relevant to some of the cases noted in the previous paragraph, but its wider significance is unclear. As discussed in the previous section, the compressibility of large stimuli is almost independent of compression scheme [15]. Hence we should only expect limited intersubjective variation based on different compression schemes.

3. Beauty and attractiveness are applied to a diverse range of phenomena

Here are some things that humans find beautiful or attractive (where “attractive” is meant in the aesthetic sense):

Paintings and sculpture, ranging from photorealistic to very abstract
Architecture, videogame virtual worlds, furniture and other man-made objects
Highly organized and structured sequences of sounds (i.e. music), some of which may resemble everyday sounds (animal or human voices, leaves rustling) and some of which do not
Sunsets, coastal areas, waterfalls, canyons, flowers and trees, people, animals and other natural objects
Dances
Poems and other literary works
Proofs of mathematical theorems
Scientific, mathematical or philosophical ideas

Some of these objects seem to be beautiful in different senses. Maybe an idea is beautiful in part because of its explanatory power. However, it’s plausible that the first items on the list (works of art and natural objects) can all be beautiful or aesthetically attractive in the same way [16]. It’s also plausible that there is at least some important similarity between this kind of beauty and the beauty of mathematical proofs and intellectual ideas.

[Footnote 15: At least on the Kolmogorov model.]

[Footnote 16: Works of art can have aesthetic properties that natural objects cannot have. Poems, for instance, can be aesthetically impressive in part because of the richness of the ideas they express. However, it is also possible for poems or paintings to be attractive, even beautiful, without expressing very rich ideas. Similarly, a musical tune or dance can be very pretty without having any other aesthetic virtues.]

So, what do these objects have in common, apart from being called “beautiful” or “attractive”? They do not seem to share many non-trivial intrinsic properties. Yet they are all things that humans like, and that provoke a positive affective response in humans. The problem is that there are other things that humans like which we don’t call beautiful or attractive. For example: the experience of satisfying a strong urge (e.g. for food or drink, or physical warmth), the experience of one’s own team winning, the experience of seeing a friend after a long time apart, the experience of an outcome being morally just (from one’s standpoint). There are also sensory experiences that seem to have more in common with the attractive/beautiful experiences listed above but which are not usually described in the same terms. We can have rich and satisfying experiences from the senses of touch, smell and taste, but we are less likely to describe these experiences in terms of beauty or attractiveness [17]. Likewise, most non-music sound experiences (e.g. the sea, traffic, glass shattering, animal calls) can be pleasant or unpleasant, but can’t be beautiful in the way that even a simple melody played on a piano can be.

The compression theory suggests an answer to these questions. The items on the list above (at least up to math proofs) are the kind of stimuli from which humans construct rich mental representations. These representations can start complex but then become smaller due to the discovery of regularities or redundancies in the initial representation. This allows the phenomena in question to be generated from a simpler mental program (i.e. set of stored instructions). In the case of natural scenes or natural objects, finding structure in these stimuli is finding regular structure in the world and so has clear practical use in navigating the world.
If you realize that all the apples on a tree (or in an orchard) look the same particular way, then you have a simple rule for identifying this tree (or orchard) in the future. Also, you can infer that any apple’s falling from the tree will be the same, and so your being hit by an apple will have the same effect wherever you are standing under the tree [18]. So the experience of finding patterns in a set of visual experiences will often be the experience of finding structure in the world around us.

[Footnote 17: As David Chalmers notes in “Fall from Eden”, we are also less likely to describe them as veridical or non-veridical.]

[Footnote 18: Similarly: if I look from above and see that a river splits into two very similar streams, then I can exploit this similarity by just storing information about one of the streams.]

This holds much less in the case of other sense modalities. The senses of taste and touch are often involved in actions that manipulate or alter the world (regardless of what its previous state was) in ways we find desirable. Actions of eating or touching generally change aspects of the world, whereas vision is used to merely observe (without manipulating) parts of the world in which we hope to discern regularity or structure. The senses of smell and hearing (in the non-music, non-speech case) are clearly used for non-destructive “observation”. However, they seem to be mostly used for identifying particular objects or events in the world. With vision, we are given an initial visual presentation of a scene and then seek to find patterns in it (e.g. all apples are the same color and size, or all the houses on one side of a street are built in the same style). It’s harder to think of cases where we have an initial smell or sound percept and then subsequently discover some underlying regularity in that percept [19].
So we have a general observation: the kind of experiences we call attractive or beautiful are those in which we have an initial mental representation in which we seek to find regularities (i.e. structure that makes for a more compressed representation). This explains why some sensory modalities (touch, smell, taste, most instances of hearing) do not typically produce such experiences. This also explains why the desirable experiences listed above (satisfying urges, having one’s team win, etc.) are not described using these aesthetic terms.

[Footnote 19: Not to say we couldn’t, and in cases where we do, maybe “beauty” and “attractiveness” would start to be applied to such experiences.]

What to say about beautiful ideas? Take Newtonian Mechanics. As a successful scientific theory, Newtonian Mechanics allows a compression of a huge number of empirical phenomena. Its laws explain not just mechanical interactions on earth, but the motions of the planets and the moon. At the same time, the laws themselves can be written on a postage stamp [20]. Thus you have a huge compression of empirical evidence whose source, when written down in mathematical terms, is extremely concisely expressible. (Compare with a theory that is similarly successful empirically but has a large number of special cases for different domains. This theory would be less beautiful though it would still have the same power to compress empirical evidence.)

[Footnote 20: They are simple once you have vector calculus. But vector calculus is a very simple and obvious idea in mathematics that has many applications in mathematics itself as well as in areas outside physics.]

Plan for work over next few months

Our first goal is quantitative testing of the compression theory of attractiveness judgments. This involves devising and implementing computational models of human learning and compression abilities, and testing the models against human experimental data. We have designed three experiments for this purpose.
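As a rough illustration of what a quantitative compressibility measure could look like, here is a minimal sketch that compares the compressed size of a tiled binary image with that of a random-noise image of the same dimensions. Everything in it is an assumption made for illustration: zlib is an off-the-shelf compressor standing in for the (unknown) human compression scheme, the 8x8 tile and 128x128 image size are arbitrary, and the function names are hypothetical. It is not the model we propose to fit to subjects.

```python
import random
import zlib


def pack_bits(bits):
    """Pack a flat list of 0/1 pixels into bytes, 8 pixels per byte."""
    padded = bits + [0] * (-len(bits) % 8)
    out = bytearray()
    for i in range(0, len(padded), 8):
        byte = 0
        for b in padded[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)


def compressed_size(bits):
    """Bytes used by zlib for the image: a crude proxy for description length.

    Assumption: zlib stands in for the subject's compression scheme.
    """
    return len(zlib.compress(pack_bits(bits), 9))


def tiled_image(tile, tile_size, reps):
    """Repeat a square tile reps x reps times; returns flat row-major pixels."""
    side = tile_size * reps
    return [tile[(r % tile_size) * tile_size + (c % tile_size)]
            for r in range(side) for c in range(side)]


def noise_image(side, rng):
    """Independent fair-coin pixels: no structure for a compressor to exploit."""
    return [rng.randint(0, 1) for _ in range(side * side)]


rng = random.Random(0)                           # fixed seed for repeatability
tile = [rng.randint(0, 1) for _ in range(8 * 8)]  # hypothetical 8x8 tile
tiled = tiled_image(tile, 8, 16)                  # 128 x 128 tiled image
noise = noise_image(128, rng)                     # 128 x 128 noise image

print("tiled image, compressed bytes:", compressed_size(tiled))
print("noise image, compressed bytes:", compressed_size(noise))
```

On the compression view, the tiled image (far fewer compressed bytes) is predicted to be judged more attractive than the noise image. A general-purpose compressor only gives an upper bound on description length, so a proxy like this would at best be a baseline against which models of subjects’ own compression schemes could be compared.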
Experiment 1: Learning prototypes from noisy instances

As noted above, previous experimental work has shown that people find prototypical instances of a concept more attractive than atypical instances. In Experiment 1, we will provide a computational model for learning prototypes. As someone learns a prototype, images very close to the prototype will become increasingly probable (and so compressible). We thus predict that the very same image will become more attractive over the course of learning. We will also test the converse effect in order to control for mere exposure. If subjects start with a slightly unrepresentative set of samples from the prototype, then an image may seem more compressible early on in their learning than it does once they have learned the prototype [21]. Finally, we will consider stimuli that have a very simple compositional structure. Since compositional operations (e.g. placing two prototypes side-by-side in the same image) are very computationally simple, we expect that compositional stimuli will be judged attractive as a simple function of the attractiveness of the components.

[Footnote 21: There’s a worry that if the starting sequence is too misleading, subjects will think that they are being misled and that there’s not a true i.i.d. process generating images. A difficulty here is that there are multiple Bayesian algorithms for finding the prototype and so it’s not clear which one to try to fit human judgments for this task. I guess you could find the one that has best overall fit and use that to make predictions about how compressible the test image should seem after the misleading evidence. Alternatively, we could teach subjects one prototype relative to which an image X is prototypical, then spend more time showing another, different prototype, relative to which X is atypical. This should result in X looking less attractive.]

The stimuli for the experiment will be tiled images. A very similar set of stimuli was used successfully by Winkielman et al. Some examples of the kind of stimuli we’ll use are below. The large image shows a prototype. Surrounding it are stimuli randomly generated from this prototype according to different models of noise. The examples also vary in how much noise is applied in each case.

[Figure: a prototype (large image) surrounded by stimuli generated from it under different noise models and noise levels.]

Experiment 2: Learning Taboo Regions

In Experiment 1, subjects judge the attractiveness of images that are very similar to the stimuli from which they learn prototypes. However, the compression theory predicts that any images that become more compressible on learning the prototype should become more attractive, not just very similar images. For instance, if subjects learn a particular prototype from noisy images like those shown above, they should find images involving simple computable transformations of that prototype more attractive (i.e. more attractive after the learning than before). Some examples of simple transformations are shown below:

[Figure: examples of simple transformations of the prototype.]

Experiment 2 will test this prediction about the attractiveness of transformations of newly-learned prototypes. Stimuli are based on a “taboo region”, an area of the 2D plane where no black pixels occur. Subjects see many images that respect the taboo region. After enough examples, they will have learned (at least subpersonally) that in all the images they’ve seen, certain areas of the image have always been white. This learning of the Taboo Region should make it more compressible for them. Subjects will then be presented with the Taboo Region. We predict that it will be more attractive after the learning stage of the experiment than before. Here’s an example of the kind of stimuli we will use:

[Figure: nine example training images, followed by a panel labeled “Taboo Region”.]

The nine example images all have no black pixels in the taboo region (shown in the panel labeled “Taboo Region”). Subjects would be shown a set of training stimuli like the nine example images (but more numerous and with images that make the taboo region less obvious) and then shown the taboo region. We can also test subjects on simple transformations of the taboo region. For instance, the size and position of some elements of the taboo region could be varied. Or, if multiple taboo regions are learned by subjects in the training phase, we can present them with images that combine multiple taboo regions simultaneously.

Finally, the Taboo Region experiment may be a starting point for investigating the importance of the change in compressibility of a stimulus. If subjects see a large number of noisy images that respect the taboo region, then we can imagine that they will learn the taboo region very well by the time they see it. In that case, the taboo region itself will be compressible when they see it, but it won’t provide them any increased ability to compress previously seen images.
However, if subjects have seen only a small number of noisy images, they will have some uncertainty about the properties of the taboo region before they see it. Thus, seeing the region in its entirety will presumably give them a more compressed model of their past observations22.

Note that there are two related notions of an increase in compressibility over time:

1. Increased compressibility for a single fixed image
Someone could be viewing a static image and discover patterns in the image after staring at it for long enough.

2. Increased compressibility of past observations attained by viewing or reading a new stimulus
Someone can come to recognize a pattern in previous observations by viewing a new stimulus that communicates the pattern.

Seeing the taboo region will provide the second kind of compression gain. Of course, every subsequent image viewed in the training phase will make previous images more compressible. However, the taboo region (depending on how exactly subjects interpret it) will provide subjects with a concentrated compression increase.

22 Not fully clear on this. The question is how good a compression they could get with a distribution over taboo regions. I think the basic claim here is correct.

Experiment 3: Investigating people’s existing compression scheme

A general problem with testing the compression theory is that we don’t have access to people’s compression scheme. The previous two experiments got around this because they involved a learning task, where we can predict that subjects will alter their compression scheme in a particular direction. Thus, after the learning task, a class of stimuli will become more compressible than they were before. We can’t say, however, how compressible they were before or how compressible they become relative to other stimuli we haven’t tested. Experiment 3 aims to learn subjects’ existing compression scheme in a particular domain and then use this to make a much wider range of attractiveness predictions.
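To make the idea of compressibility relative to a compression scheme concrete, one common formalization (an assumption here, not something the proposal commits to) treats the scheme as a probability model and measures the cost of a stimulus by its Shannon code length, -log2 p(x). The sketch below uses a hypothetical independent-pixel model to show how learning that a region is always white shortens the code length of an image that respects it. The model, the toy image and the function name are illustrative assumptions.

```python
import math

def code_length_bits(image, p_black):
    """Shannon code length, in bits, of a binary image under an
    independent-pixel model where p_black[r][c] is P(pixel is black).
    Probabilities must lie strictly between 0 and 1."""
    total = 0.0
    for row, p_row in zip(image, p_black):
        for pixel, p in zip(row, p_row):
            prob = p if pixel == 1 else 1.0 - p
            total += -math.log2(prob)
    return total

# A 4 x 4 toy image whose left half is always white (a tiny "taboo region").
image = [[0, 0, 1, 0],
         [0, 0, 0, 1],
         [0, 0, 1, 1],
         [0, 0, 0, 1]]

# Before learning: every pixel is treated as a fair coin.
naive = [[0.5] * 4 for _ in range(4)]
# After learning: left-half pixels are believed black with probability 0.01.
learned = [[0.01, 0.01, 0.5, 0.5] for _ in range(4)]

before = code_length_bits(image, naive)    # 16 pixels * 1 bit = 16.0 bits
after = code_length_bits(image, learned)   # about 8.12 bits
```

An independent-pixel model is far too crude to stand in for a human compression scheme, since it cannot represent repeated shapes or symmetry. It only illustrates the direction of the predicted change: after learning, images that respect the learned regularity get shorter codes.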
The basic idea for learning a compression scheme is as follows. Take some simple domain such as n-bit 1D images or n x n 2D images. Get people to judge the probability of a substantial sample of the images. Now find a compression scheme that fits the human probability judgments. This will allow you to generalize to untested images and (plausibly) to images of a larger size. This kind of design was employed by Tenenbaum and Griffiths (2003). They constructed a model that successfully predicted human probability judgments for sequences of 8 coin tosses (which is a similar case to 8-bit images). A simple extension of their approach would be to use their model to predict attractiveness judgments. However, 8-bit strings are probably not the most interesting set of stimuli for getting attractiveness judgments. It might be better to use the Tenenbaum and Griffiths model as a starting point for judgments about n x n images. One could take samples of (say) 10 x 10 images, and augment the model as suggested by subject judgments. (We might imagine that images which involve iterations of a shape or structure will be seen as simple. For instance, the image below shows a simple shape in three size variants. So we need a compression scheme that makes this possible.)

The road ahead: compressibility in art and science

One aim of this project is to connect the attractiveness judgments made about the simple stimuli above to aesthetic responses to art and science. I will sketch some ideas that are currently very rough. We have lots more material in preparation but it is not very organized. Our basic idea23 is that the aesthetic properties of ideas in mathematics24, philosophy and science come from their ability to compress lots of phenomena (e.g. empirical evidence or true mathematical statements) while themselves being simple (i.e. short, elegant). For contrast, there are intellectual works with elegant theories that are arguably not very explanatory (e.g.
various mathematical models in economics and cognitive science, false theories in the history of physics, theories like logical positivism in philosophy or an ontologically austere religious metaphysics) and works with inelegant theories that may nevertheless be explanatory (e.g. works in historical sciences, history, ecology, meteorology, and philosophical works by Rawls).

The phenomena that science or philosophy explain are not always phenomena that we have full or conscious awareness of. In a murder mystery or a long narrative joke, we are first presented with a curious sequence of events and then receive an explanation (compression) of these events in concentrated form at the end of the mystery or joke. Everyone who sees the ending of the mystery or joke holds in memory the events that the ending will explain25. In science or philosophy, the explananda of the theory may be phenomena that we only become fully aware of after learning about the theory. Successful theories will often impact us in two ways. First, they show us a new sphere of phenomena, teaching us to look more closely at everyday happenings or to recognize rich structures at very small or large scales. Second, they provide a compression of these new phenomena, as well as compressing phenomena with which we are more familiar. Both aspects are important to the overall appeal of scientific inquiry but (on the compression account) it is the second that should be most important aesthetically.

23 Our ideas are related to ideas that Schmidhuber expresses in short passages in various papers. However, his treatment (though insightful) is very brief, and so we’ve gone beyond it in various ways.

24 The case of mathematics is somewhat different from the scientific cases. In science there is a diverse body of empirical phenomena that we seek to compress by finding regularities in the data. In mathematics, one might suppose that almost all of the interesting facts of mathematics have already been compressed (to a dramatic degree) by the rigorization of calculus in the nineteenth century, and then by the full axiomatization in terms of set theoretic axioms and formal logic. One approach is to think about a set of theorems in a particular area of math as providing a compressed (relative to limited human inferential abilities) representation of the known mathematical facts in that area. Thus, a set of theorems compresses an area by making it tractable for a mathematician, i.e. making it relatively easy for him to decide statements in that area (this is crucial for practical applications of an area of math, where particular cases matter). This is not compression in the formal sense, and there’s a serious issue of which notion is more important in human cognition.

25 It’s noteworthy that we care about having a murder mystery explained even though the explanandum, a series of fictional events, has no possible practical relevance. Still, scientists and philosophers care about explaining many phenomena whose understanding would have little practical significance.

Works of art, like scientific works, can alert us to novel phenomena and explain or connect up a diverse range of familiar phenomena. Sometimes it makes sense to see a work of art as communicating a generalization about an aspect of the world26. Often, works of art exhibit a connection or correspondence. This kind of correspondence could be related to compression, but it’s not clear exactly how27. As with scientific works, it is this semantic28 component of art (the aspect which points out from the work of art to things in the world) as well as the syntactic component (elegance and beauty in the syntax itself) that contribute to the aesthetics of the work of art. (With art it may be very hard to separate or distinguish the contributions of these two components to the overall aesthetic effect.)
In science, it is often easy to paraphrase the semantic component. In art, this can be much more difficult. For example, it is hard to explain in general terms the way in which art conveys mood, sensibility, emotion (especially in the case of music) or correspondences and relations between different kinds of experiences29. The key question for our project is not how to characterize the effects produced by art (and its means of producing them). The question is whether the aesthetic judgments that we apply to art can be understood in terms of compression. Some of the things that art conveys (mood, emotion, insights into subjective experience, all kinds of generalization about the social and moral world) can be conveyed by non-artistic means30 that aren’t usually seen as candidates for beauty or other aesthetic judgments. So there is some reason to think that some of the more distinctive effects of art are not necessarily bound up with art’s aesthetic effects.

26 One simple example is in literature’s displaying prototypical individuals or situations. We might describe real individuals in terms of fictional characters (just as we might use historical figures for the same purpose).

27 Showing a correspondence between X and Y can allow you to more parsimoniously represent Y by storing it in terms of X. The more parsimonious representation of Y may make certain cognitive operations (generating predictions or logical consequences, testing consistency, storing in memory, etc.) easier.

28 I’m using “semantic” very loosely here.

29 Davidson on metaphor and Peacocke on the perception of emotion in music are examples of accounts of how art can convey “content” (construed broadly). Both writers are concerned with how these phenomena relate to human cognition more broadly.

30 You can learn about these things by reading non-fiction, or hanging out with the right kind of people or in the right cities.
These are all kinds of things that can be beautiful, but they are not necessarily beautiful in their ability to communicate the kind of things that art communicates. Of course, these claims are contentious and need to be developed in much more detail.