CULTURAL ANALYTICS: ANALYSIS AND VISUALIZATION OF LARGE CULTURAL DATA SETS

White paper from the SOFTWARE STUDIES INITIATIVE @ Calit2/UCSD

Prepared by Dr. Lev Manovich <manovich@ucsd.edu>
Director, Software Studies Initiative @ Calit2
Professor, Visual Arts Department @ UCSD
email: manovich@ucsd.edu
URL: www.manovich.net

Original idea: October 2005. Full version: May 2007. Updates: May 2008.
Please do not distribute without our permission.

SUMMARY

Can we create quantitative measures of cultural innovation? Can we have a real-time, detailed map of global cultural production and consumption? Can we visualize flows of cultural ideas, images, and trends? Can we visually represent how cultural and lifestyle preferences – whether for music, forms, designs, or products – gradually change over time?

Today sciences, business, governments, and other agencies rely on computer-based analysis and visualization of large data sets and data flows. They employ statistical data analysis, data mining, information visualization, scientific visualization, visual analytics, and simulation. We believe that it is time to start applying these techniques to cultural data. The large data sets are already here – the result of the digitization efforts by museums, libraries, and companies over the last ten years (think of book scanning by Google and Amazon) and the explosive growth of newly available cultural content on the web. (For instance, as of February 2008, Flickr had 1.2 billion images – together with tags created by users and other metadata automatically logged by Flickr servers.)

The envisioned highly detailed interactive visualizations of cultural flows, patterns, and relationships will be based on the analysis of large sets of data comparable in size to data sets used in the sciences (i.e., a few or even dozens of terabytes). The data sets will come from a number of sources. The first source is media content: games / visual design / music / videos / photos / art / photos of architecture, space design, etc. / blogs / web pages, etc. In visualizing this content, we will use not only already existing metadata (such as image tags created by the users) but also new metadata that we will generate by analyzing the media content (for instance, using computer vision techniques to detect various image features). The second source is the digital traces left when people discuss, create, publish, consume, share, edit, and remix these media. The third source is various web sites that provide statistics about cultural preferences, popularity, and cultural consumption in different areas. Yet another source is what we can call "meta channels" – blogs which track the most interesting developments in various cultural areas.

Visualizations should provide rich information presented in different formats: graphics, text, numbers, images, video, time graphs, maps, time-space interfaces, etc. This information will be placed in larger contexts – for instance, geo maps overlaid with economic, sociological, and/or historical data. Visualizations should be designed to take full advantage of the largest gigapixel wall-size displays available today, which are being constructed at Calit2, where our lab is located: www.calit2.net. Scaled-down versions of these visualizations running on the web or on a user's PC or mobile phone should also be available.
http://vis.ucsd.edu/mediawiki/index.php/Research_Projects:_HIPerSpace

While existing cultural visualizations typically present a single graph and are hard-wired to the particular data they show, our goal is to construct an open and modular Cultural Analytics research environment which will allow the user to work with different kinds of data and media all shown together: original cultural objects and conversations, cultural patterns over space and time, statistical results, etc. A user should be able to perform analysis of the data herself using visualization as a starting point (as in GIS); as more processing power becomes available, such analysis would eventually be done in or close to real time. The data sets can be assembled beforehand or harvested from the web in real time. Ideally, such an environment will be general enough that a user would be able to connect new cultural databases and also add new visualization/analysis modules.

Why do we need Cultural Analytics?

Why do we need a Cultural Analytics research environment? What problem would it be solving? One answer is that it is a logical next step given the growing number of visualization projects done by historians, the similarly growing number of artistic projects visualizing contemporary and historical data, and other related developments. But while Cultural Analytics will provide new tools for research and teaching about the culture of the past (the traditional domain of the humanities), we feel that it is truly necessary if we are to have a meaningful conversation about contemporary culture.

The exponential growth in the number of both non-professional and professional media producers over the last decade has created a fundamentally new cultural situation. Hundreds of millions of people are routinely creating and sharing cultural content (blogs, photos, videos, online comments and discussions, etc.). With the number of mobile phones projected to grow from 2.2 billion to 3 billion during 2008, this number is only going to increase. At the same time, the rapid growth of professional educational and cultural institutions in many newly globalized countries, along with the instant availability of cultural news over the web, has also dramatically increased the number of "culture professionals" who participate in global cultural production and discussions. Hundreds of thousands of students, artists, and designers now have access to the same ideas, information, and tools. It is no longer possible to talk about centers and provinces. In fact, the students, culture professionals, and governments in newly globalized countries are often more ready to embrace the latest ideas than their equivalents in the "old centers" of world culture. If you want to see this in action, visit the following web sites and note the range of countries the authors come from: student projects on archinect.com/gallery/; design portfolios at coroflot.com; motion graphics at xplsv.tv.

Before, cultural theorists and historians could generate theories and histories based on small data sets (for instance, "classical Hollywood cinema," "Italian Renaissance," etc.). But how can we track "global digital culture" (or cultures), with its billions of cultural objects and hundreds of millions of contributors? Previously, one could write about culture by following what was going on in a small number of world capitals and schools. But how can we follow the developments in tens of thousands of cities and educational institutions? Impossible as this may sound, this actually can be done.
Welcome to Cultural Analytics.

Background

Our vision for Cultural Analytics draws on the ideas in the NSF "Cyberinfrastructure Vision" report (2006), the ACLS "Cyberinfrastructure for Humanities and Social Sciences" report (2006), as well as current work in data mining and visualizing the web and blogosphere, web and business analytics, information visualization, scientific visualization, and software art. The NSF report emphasizes high-performance (petascale) computing, data analysis, and visualization. The 2006 summer workshop on Cyberinfrastructure for the Humanities, Arts and Social Sciences at the San Diego Supercomputer Center similarly encouraged the participants to think about research projects on that scale. However, so far I do not know of any project in the humanities that has begun to address this challenge using contemporary cultural data sets (as opposed to historical ones).

In the present decade our ability to capture, store, and analyze data is increasing exponentially, and this growth has already affected many areas of science, media industries, and the patterns of cultural consumption. Think, for instance, of how search has become the interface to global culture, while at the same time recommendation systems have emerged to help consumers navigate the ever-increasing range of products. We feel that the ground has been set to start thinking of culture as data (including media content and people's creative and social activities around this content) that can be mined and visualized. In other words, if data analysis, data mining, and visualization have been adopted by scientists, businesses, and government agencies as a new way to generate knowledge, let us apply the same approach to understanding culture.

[Figure: Traffic monitoring, Florida Department of Transportation]

Imagine a real-time traffic display (à la car navigation systems) – except that the display is wall-size, the resolution is thousands of times greater, and the traffic shown is not cars on highways but real-time cultural flows around the world. Imagine the same wall-sized display divided into multiple frames, each showing different data about cultural, social, and economic news and trends – thus providing situational awareness for cultural analysts. Imagine the same wall-sized display playing an animation of what looks like an earthquake simulation produced on a supercomputer – except in this case the "earthquake" is the release of a new version of popular software, the announcement of an important architectural project, or any other important cultural event. What we would be seeing are the effects of such a "cultural earthquake" unfolding over time and space. Imagine a wall-sized computer graphic showing the long tail of cultural production that allows you to zoom in to see each individual product together with rich data about it (à la the real estate map on zillow.com) – while the graph is constantly updated in real time by pulling data from the web. Imagine a visualization that shows how people around the world remix new videos created in a fan community, or how a new design software gradually affects the kinds of forms being imagined today (the way Alias and Maya led to "blobs" in architecture). These are the kinds of projects we want to create.

[Figure: AT&T Global Network Operations Center]

Existing Work

In this decade we see a growing body of projects that visualize patterns in cultural content, cultural flows, and cultural dynamics.
People who create such visualizations come from many fields: design (for instance, MIT Media Lab alumnus Ben Fry), media design (Imaginary Forces' visualization of the paintings in the MoMA collection, commissioned by MoMA for its lobby), media and software art (for instance, George Legrady's visualization of the flow of books in the Seattle Public Library, commissioned by the library; the Listening Post installation by Mark Hansen and Ben Rubin), computer graphics, and computer and information science. Not surprisingly, the pioneering and best projects in this area are done by people who combine more than one background – such as the seminal History Flow created by Martin Wattenberg and Fernanda B. Viégas from IBM's Collaborative User Experience research group.

[Figure: 'Valence' by MIT Media Lab's Ben Fry, http://acg.media.mit.edu/people/fry/valence/]
[Figure: George Legrady's visualization of the flow of books in the Seattle Public Library]

Our proposal builds on this work. At this time, we want to overcome what we see as the limitations of many of the projects created so far:

(A) In comparison to the amounts of data used today in science or being processed by Google, Amazon, Nielsen, and social media companies, such projects use very small amounts of data;

(B) As data sources, they usually use media and metadata available on the web (often using the APIs of Amazon, Google Maps, etc. – for instance, see liveplasma). In other words, the projects are driven by what data is easily available, rather than by more challenging theoretical questions and research agendas in art history, film and media studies, communication, urban anthropology, sociology, etc., or by currently important cultural issues discussed in the media.

[Figure: Screen capture of a liveplasma visualization]

(C) They typically rely on existing metadata and do not look "inside the data," i.e., they don't perform data analysis. For instance, existing visualizations of Flickr data use existing metadata and don't do any image processing on the images themselves. Similarly, Amazon's recommendation system uses available metadata about the books and users' activities on amazon.com; as far as I know, it is not connected to Amazon's software that analyzes books' contents ("Statistically Improbable Phrases"). (A minimal sketch of such image-level analysis follows this list.)

(D) They do not layer many types of data together (as is done in GIS applications, including GIS-type applications for the general public such as Google Earth).

[Figure: Screen capture of Google Earth, demonstrating layers]

(E) Cultural visualizations that show lots of detail are often static (for instance, Listening History, Bradford Paley's map of science). Visualizations that are dynamic usually have less detail, don't let the user drill down into the data, and rarely use multiple kinds of data. As an example of the functionality and interface we would like to see in a Cultural Analytics research environment, look at the real estate web site Zillow. It allows the user to zoom into a street-level view of the US to see estimated housing prices for every block and get detailed information about houses on the market, including time data of how a house's price changed over a 10-year period.

(F) Most important, with very few exceptions (such as the paper published by the authors of the seminal History Flow), the authors of cultural visualization projects do not offer theoretical interpretations of their visualizations or discussions of the findings.
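To make point (C) concrete, here is a minimal Python sketch of what "looking inside the data" could mean at the simplest level: computing a few low-level visual features directly from image files rather than relying on existing metadata. The file paths and the particular features are illustrative assumptions, not a description of any existing project discussed above.

from PIL import Image
import numpy as np

def basic_features(path):
    """Return a few low-level visual features for one image."""
    img = Image.open(path).convert("RGB")
    pixels = np.asarray(img, dtype=np.float64) / 255.0
    gray = pixels.mean(axis=2)                      # grayscale version of the image
    # Coarse warm/cool histogram: difference between red and blue channels
    warm_cool, _ = np.histogram(pixels[..., 0] - pixels[..., 2], bins=16, range=(-1, 1))
    return {
        "brightness": float(gray.mean()),           # overall lightness
        "contrast": float(gray.std()),              # spread of gray values
        "mean_rgb": pixels.reshape(-1, 3).mean(axis=0).tolist(),
        "warm_cool_hist": (warm_cool / warm_cool.sum()).tolist(),
    }

if __name__ == "__main__":
    # Hypothetical local copies of harvested images
    for path in ["images/photo_001.jpg", "images/photo_002.jpg"]:
        print(path, basic_features(path))

Even features this simple, computed over millions of images, already yield new metadata that no tagging community would ever produce by hand.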
Thus, while the number of projects keeps growing – as can be seen by looking at the following web sites which collect projects in visualization of cultural data: www.visualcomplexity.com, infoaesthetics.com, Viz4All, and mapping science – it is hard to know how useful many of these projects are.

[Figure: Bradford Paley's "Map of Science"]
[Figure: Lee Byron's "Listening History"]

In Digital Humanities, people are doing statistical analysis of historical texts – but these texts almost always come from the past, and the results are almost never turned into compelling visualizations. Currently, there is work at NCSA to create platforms to enable mining and visualization of large sets of humanities data (MONK, NORA) – but once again, they are designed to work on texts rather than visual media. For example, "The NORA project, funded by the Andrew W. Mellon Foundation, aims to produce software for discovering, visualizing, and exploring significant patterns across large collections of full-text humanities resources in existing digital libraries."

[Figure: Screen capture, infoaesthetics.com]
[Figure: Screen capture, visualcomplexity.com]

We see more use of graphics – timelines, GIS-style interfaces, and even the beginning of thinking about time-space interfaces – in historical research. While this is a very encouraging development, the projects produced so far deal with relatively small amounts of (usually textual) data – not surprisingly, since our historical records have a scale very different from the scale of contemporary culture (cultural objects created by both professionals and amateurs, plus digital traces of cultural activities and conversations).

Since digital humanities work so far has focused on the analysis of canonical historical texts that belong to "high culture" and has not taken advantage of the wealth of contemporary cultural data available on the web, let us consider another body of work by non-academic actors that focuses on current cultural and social data. Examples are software such as Google Earth and Microsoft's Photosynth, web analytics tools such as Nielsen BuzzMetrics' BlogPulse, GIS applications, recommendation systems, and projects such as the Music Genome Project and NewsMap.

This software and these projects have their own limitations. The projects that do visualize large amounts of data – for instance, mapofscience and similar work, maps of cyberspace, Google Earth, GIS applications – do not explicitly focus on culture (i.e., media content and the related activities of people who consume, create, share, and remix this content). Other types of applications are designed to run on the web in real time and be accessible via a browser. This limits the amount of data being analyzed and the visual presentation of these data. In other words, this software is not designed to take advantage of the components of the next-generation cyberinfrastructure developed at Calit2 and similar places: display walls with resolutions in the hundreds of megapixels, data storage measured in petabytes, petascale computing, new scientific visualization techniques, grid computing, etc. The same applies to recently emergent web-based visualization software efforts which aim to democratize visualization by letting users upload their own data sets and then visualize them using provided tools (IBM's Many Eyes and Swivel). (As the public interest in these software projects demonstrates, the idea of visualizations of social and cultural data either designed for the general public or created by the public is gaining momentum.
Similarly, the number of cultural visualization projects – even if they are done on a small scale for now – has been growing exponentially over the last few years, as demonstrated by the stats at visualcomplexity.com. It is also relevant to mention here the Netflix Prize, which invites the public to work on data mining algorithms using a dataset provided by the company.)

While Google, Amazon, Netflix, and other large web companies capture and analyze very large data sets (users' searches, product purchases, movie rentals, etc.), they have not focused their efforts on visualizing this data or analyzing it to ask larger theoretical questions about how culture works and how it changes today (at least, according to publicly available information about what these companies are doing). Other projects have focused on using digital traces to create detailed records of social activities (MIT Media Lab's Reality Mining; Microsoft's MyLifeBits). However, they also are not concerned with creating rich visualizations of these data or using the captured data to explore larger cultural and social issues relevant today. The same goes for the recently emerged paradigm of "live blogging" or "life logs," where people frequently (potentially continuously) send photos, video, text (and potentially other media) to a private or a public site/blog. (Of course, such life logs are a rich source of data for Cultural Analytics.)

The New Paradigm: Approaching Culture as Big Data

We propose a new paradigm for cultural analysis and visualization that, in parallel to web and business analytics and visual analytics, can be called Cultural Analytics. (Other terms that can also be used are Cultural Data Mining, Culture as Data, or Big Humanities.) The projects in this paradigm will share certain features that will make them different from the existing work summarized above:

(A) We will focus on visual and media data (as opposed to text, which has been the focus of digital humanities so far). This type of data requires more storage and computation and expertise in computer graphics and visualization – thus making Calit2 an ideal place for this work.

(B) We will focus on visualizing cultural dynamics, flows, and cultural changes on all possible dimensions – an approach which we think is particularly relevant today given the rapid speed of cultural changes around the world.

(C) We will use very large data sets comparable in size to data sets used in the most data-intensive science projects – thus participating in the vision set by NSF – and in the recently announced Humanities High-Performance Computing initiative by NEH: http://www.neh.gov/ODH/ResourceLibrary/HumanitiesHighPerformanceComputing/tabid/62/Default.aspx

(D) While we also want to data-mine cultural data from the past, we will focus on analyzing and mapping contemporary global cultural production, consumption, preferences, and patterns. Consequently, we will use the web as a data resource (photos, video, music, books bought, games, etc.) in addition to other data sources. To put this another way, while our proposal is informed by work in the digital preservation and digital museum communities, our emphasis is different. In contrast to digital humanities, we want to focus on current cultural data, with historical data acting as support for the present. Similarly, in contrast to the digital preservation paradigm, our emphasis is not on archiving the past for its own sake but on understanding the present and thinking about the future.
In other words, the goal of our work is not preservation for its own sake but analysis – whether by human analysts looking at visualizations or by software doing analysis of the data.

(E) In contrast to existing work, we will combine existing metadata with computer analysis of the actual data (feature extraction, segmentation, data clustering, data mining, machine learning, etc.).

(F) Taking a cue from visual information systems used today in a variety of industries, we want to construct not only single visualizations but also suites of visualizations, graphs, and maps which show different kinds of information next to each other – thus taking advantage of the latest large-resolution display systems. (For an example of layouts used in existing commercial applications, see Barco's iCommand.) We believe that such displays are necessary if we want to really make sense of rapidly changing global cultural patterns and trends today. In other words, we want to provide "situational awareness" for "culture analysts" – ranging from media critics to trend forecasters to decision makers at companies that produce media and lifestyle products worldwide.

[Figure: Barco's iCommand]

(G) The projects should use the largest display systems in the world (resolution-wise) – i.e., the displays currently being built at the Calit2 VIS Lab and IVL. (Of course, they should also be available as desktop and/or web applications in scaled-down versions for wider dissemination.) This will allow us to present different kinds of information next to each other in a way that has not been done before. In 2007, the UCSD division of Calit2 completed the largest display system in the world – a wall made from 55 30-inch monitors with a combined resolution of 500 megapixels. This display wall is driven by 18 Dell XPS personal computers with 80 NVIDIA Quadro FX 5600 graphics cards. The result is therefore not simply a large passive monitor but rather a very large "visual computer" which can calculate and display at the same time.

(H) In contrast to existing analysis and visualizations done by individual designers and commercial companies (such as the data-intensive recommendation systems created by Amazon, Netflix, TiVo, and other media companies), we will explicitly position our work in relation to the humanities, cultural criticism, contemporary cultural debates, as well as NSF's vision for next-generation cyberinfrastructure, grid computing, and other cultural and scientific agendas;

(I) Our proposed paradigm of Cultural Analytics can also be connected to the recently established paradigm of visual analytics. We believe that the vision of visual analytics – combining data analysis and data visualization to enable "discovery of the unexpected within massive, dynamically changing information spaces" – is perfectly applicable to cultural data sets.

(J) The most important difference between our proposal and the existing work is conceptual. We want to use analysis and visualization of large cultural data sets to ask more challenging questions than the current generation of visualization projects is able to do. For example: given that the U.S. government has recently focused on creating a better set of metrics for innovation initiatives, can we create quantitative measures of cultural innovation around the world (using analysis and visualization of cultural data)?
Can we track and visualize the flow of cultural ideas, images, and influences between countries in the last decade – thus providing the first-ever data-driven detailed map of how cultural globalization actually works? If we feel that the availability of information on the web makes ideas, forms, images, and other cultural "atoms" travel faster than before, can we track this quantitatively, visualizing how the development of the web sped up cultural communications over the last decade?

How to Do Cultural Analytics

First, get lots of cultural data – preferably well structured and with some metadata. Here are some examples: thousands of twentieth-century movies already available in digital format (P2P networks and commercial providers); 700,000 art history images collected by Artstor (funded by the Mellon Foundation); millions of hours of video in the BBC Motion Gallery; 1.5 billion photos on Flickr; records of activity in Massively Multiplayer Online Role-Playing Games (MMORPGs) and Second Life; over 500,000 hours (~60 years) of continuous data on daily human behavior captured by MIT's Reality Mining project. Ideally, we will get data from one of the leading social media companies such as Google, Flickr, Amazon, Sony, MySpace, etc., since these companies today have more well-structured data on current cultural interactions (along with the media itself) than any public resource. Alternatively, we can harvest the data from the web ourselves and/or obtain the data from public collections such as Artstor. And if we want to start with smaller data sets, well-structured media collections are easy to find (for instance, the project 6 Billion Others will soon make available 450 hours of video interviews with approximately 6,000 people from 65 different countries; see infosthetics.com for more examples of publicly available data sets).

The data set(s) can either be visualized directly, or they can first be statistically analyzed and then visualized. (While at present this analysis may have to be done in advance for very large data sets, in the future increasing computation speed should enable both analysis and visualization to be performed in real time at the user's request.) The visualizations should allow the user to zoom between an overview of all the data and the individual data. (For instance, the user should be able to view individual photographs or video clips, read blog entries, etc.) Visualizations should have resolution and detail comparable to the most demanding scientific visualization done today. At the same time, we want to use the more abstract visual language developed in the information visualization and software art fields (for examples, see the gallery at processing.org). Depending on the size of the data set, a visualization can be static, animated, or interactive. Of course, interactive is best! Thus, the user will be given controls which allow her/him to change what the dimensions are, filter which data are shown, change the graphical style, etc.

The following are some possible scenarios (a sketch of the second scenario follows the list):

Harvest data from the web or get an existing data set from a third party -> visualize using existing metadata

Harvest data from the web or get an existing data set from a third party -> perform data analysis such as image processing to generate new metadata -> visualize

Get an existing data set or harvest data from the web -> perform data analysis such as image processing to generate new metadata -> analyze the metadata using data mining, other statistical analysis, etc. -> combine with other data sets -> visualize
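The following is a minimal Python sketch of the second scenario, from harvesting to visualization. The feed URL is a placeholder assumption – any source that returns a JSON list of items with metadata could be substituted – and the analysis step is deliberately trivial (counting tags per item) so that the shape of the pipeline stays visible.

import json
import urllib.request
import matplotlib.pyplot as plt

FEED_URL = "https://example.org/cultural-feed.json"  # hypothetical data source

def harvest(url):
    """Download a JSON list of media items (each item is a dict of metadata)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)

def analyze(items):
    """Generate one simple piece of new metadata: the tag count per item."""
    for item in items:
        item["n_tags"] = len(item.get("tags", []))
    return items

def visualize(items):
    """Plot the distribution of the generated metadata."""
    counts = [item["n_tags"] for item in items]
    plt.hist(counts, bins=20)
    plt.xlabel("tags per item")
    plt.ylabel("number of items")
    plt.title("Harvested items by tag count")
    plt.show()

if __name__ == "__main__":
    visualize(analyze(harvest(FEED_URL)))

In practice each stage would be far richer – image processing instead of tag counting, a gigapixel display instead of a histogram – but the harvest -> analyze -> visualize structure stays the same.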
Ideally, we would like to construct a general-purpose Cultural Analytics research environment that can be used by many users from different disciplines. Just as one can do with GeoFusion tools in relation to geo databases, a user should be able to connect on the fly to various cultural databases, quickly analyze and visualize them, and also add her own or third-party visualization modules.

Applications

Cultural Analytics has the potential to generate new approaches for studying cultural history and contemporary culture. If slides made art history possible, and if the movie projector and video recorder enabled film studies, what new cultural disciplines may emerge out of the use of visualization and data analysis? At the same time, Cultural Analytics can provide a new application area for research in large-scale visualization, HCI, data storage, data analysis, and petascale computing as outlined in the previously cited NSF report. Other applications: humanists, social scientists, cultural critics, museums, digital heritage projects, as well as cultural consumers at large. In fact, everybody involved in culture today – from the individual members of a global "cultural class" to governments around the world which are competing in knowledge production and innovation – would be potentially interested in what we want to do: measures of cultural innovation, detailed pictures of global cultural changes, real-time views of global cultural consumption and production. Cultural Analytics projects will also complement the work done at the Calit2 Center of Interdisciplinary Science for Art, Architecture and Archaeology.

An Example of Cultural Analytics: Visualizing Cultural Change

Humanities disciplines, cultural critics, museums, and other cultural institutions usually present culture in terms of self-contained cultural periods. Modern theories of history such as the work of Kuhn ("scientific paradigms") and Foucault ("epistemes") also focus on stable periods rather than transitions. In fact, very little intellectual energy has been spent in the modern period on thinking about how cultural change happens. Perhaps this was appropriate given that until recently cultural changes of all kinds were usually slow. However, since the beginnings of globalization in the 1990s, not only have these changes accelerated worldwide, but the emphasis on change rather than stability has become key to global business and institutional thinking (expressed in the popularity of terms such as "innovation" and "disruptive change"). Recently, software such as Google Trends and Nielsen's BlogPulse, as well as individual projects such as History Flow and visualizations of Flickr activity, have started to diagram how cultural preferences or particular cultural activities change over time. However, so far these projects and services either use small data sets or generate very simple graphs.

In order to create the first rich and detailed visualizations of cultural change, we will use data sets representing not one but two or more "cultural periods." As an example, consider a visualization that shows how the Renaissance changed into the Baroque in Italian paintings of the 15th-17th centuries. Such a visualization can rely solely on already available metadata about the data sets: for instance, paintings' titles, dates, and sizes, together with available sociological data about the areas of Italy during the given period, etc.
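A minimal Python sketch of such a metadata-only view, assuming a hypothetical CSV file of painting records with "year", "width_cm", and "height_cm" columns (the column names and the plotted measure are illustrative, not a claim about what any actual collection's metadata contains):

import csv
from collections import defaultdict
import matplotlib.pyplot as plt

def load_records(path="paintings.csv"):
    """Read painting records from a hypothetical metadata file."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def mean_area_by_decade(records):
    """Average canvas area per decade, one simple indicator of changing taste."""
    areas = defaultdict(list)
    for r in records:
        decade = (int(r["year"]) // 10) * 10
        areas[decade].append(float(r["width_cm"]) * float(r["height_cm"]))
    return {d: sum(v) / len(v) for d, v in sorted(areas.items())}

if __name__ == "__main__":
    trend = mean_area_by_decade(load_records())
    plt.plot(list(trend.keys()), list(trend.values()), marker="o")
    plt.xlabel("decade")
    plt.ylabel("mean canvas area (cm^2)")
    plt.title("Painting size by decade, 15th-17th century (hypothetical data)")
    plt.show()

Any other piece of existing metadata – dates of commissions, locations, subject keywords – could be charted over time in exactly the same way.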
Alternatively, it can combine these metadata with an analysis of the data set itself – in this case, we would use image processing to perform image segmentation, feature extraction, and object classification. This new metadata can also be subjected to statistical analysis.

Other examples of interesting and culturally and economically important data sets:

(K) The changes in patterns of cultural consumption around the world as a result of globalization;

(L) The proliferation of styles in popular and dance music over the last few decades;

(M) Changes in the characteristics of video games, movies, blogs, or any other currently popular cultural form over periods of time;

(N) Visualizing the changes and development of concepts using Wikipedia entries – comparing how frequently different articles get updated over time, and tracking the development of new articles. (This project is currently underway in my undergraduate class – L.M.)

In the case of visualizing cultural change over the last few decades, the visualizations can also use economic and sociological data about the people and places involved (many such data sets are now available for free on the web).

An Example of a Cultural Analytics Interface: Globalization Viewer

Real-time (or near real-time) interactive visual information displays are used in a variety of professional settings: business information dashboards; financial analysis and trading; situational awareness; command and control; visual analytics applications; GIS applications; monitoring of large systems such as factories, telecommunication systems, or space missions; interfaces of vehicles, etc. Yet such displays have never been deployed to present detailed views of cultural processes. What currently exist are simple applications that can show one kind of cultural activity without any context: photos being uploaded to Flickr in real time, the most discussed themes in the blogosphere (BlogPulse), popular search terms (the display at the Googleplex entrance).

Using a map of the earth as the interface, our Globalization Viewer would show real-time (or near real-time) information about the patterns of cultural consumption and production around the world. The information is rich, detailed, and covers many categories. The user can zoom into particular geographical areas to see what movies are playing there, popular books, music, and other products, the proportions between local and foreign brands, new architectural projects, exhibitions and festivals, topics discussed in the local press and in local blogs, etc. Optionally, the spatial interface can be combined with a timeline interface that shows how cultural patterns change over time in different areas (see the timeline visualizations of social and economic data produced by Gapminder – recently bought by Google, which made its animation software available in Google Docs spreadsheets – for interface examples). Although such a real-time viewer can be connected to many current cultural discussions and commercial interests, we feel that thinking about it as a visual tool that reveals in detail how local cultures change due to globalization would be particularly relevant today. By taking advantage of the resolution of Calit2 wall-size displays, culture around the world can, for the first time, be visualized as a dynamic process in sufficient detail. The web provides enough data sources to drive such visualizations; it is simply a matter of collecting these data from various web sites and displaying them together in a compelling manner.
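As a minimal sketch of one such layer, the Python fragment below aggregates harvested items by country and computes the share of locally produced items. The input format – a list of records with "country" and "origin" fields – is an assumption made for illustration; real web sources would each need their own parsers.

from collections import Counter
import matplotlib.pyplot as plt

def local_share(items):
    """For each country, compute the fraction of items of local origin."""
    totals, local = Counter(), Counter()
    for item in items:
        totals[item["country"]] += 1
        if item["origin"] == item["country"]:
            local[item["country"]] += 1
    return {c: local[c] / totals[c] for c in totals}

if __name__ == "__main__":
    sample = [  # stand-in for data harvested from local listings and press
        {"country": "JP", "origin": "JP"}, {"country": "JP", "origin": "US"},
        {"country": "BR", "origin": "US"}, {"country": "BR", "origin": "BR"},
    ]
    shares = local_share(sample)
    plt.bar(list(shares.keys()), list(shares.values()))
    plt.ylabel("share of locally produced items")
    plt.title("Local vs. foreign cultural products by country (sample data)")
    plt.show()

In the full Globalization Viewer, a summary of this kind would be one of many layers drawn over the map, updated as new data arrives.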
We believe that such a display will be of interest to a variety of players – from cultural institutions to potentially any company, including the social media companies operating on the web.

An Example of a Cultural Analytics Interface: Virtual World GeoViewer

(Prepared by Noah Wardrip-Fruin, Assistant Professor, Communication Department, UCSD; Associate Director, Software Studies Initiative; nwf@ucsd.edu)

Most of the research on virtual worlds (such as massively multiplayer games and worlds like Second Life) falls into two camps: (a) qualitative work with small numbers of users and (b) quantitative work on topics such as world population, activities of users, and so on. Almost none of the research engages with the fact that these are *worlds* which are moved through geographically. It also does not combine them with the kinds of geographic visualizations that have proven powerful for understanding our own world (from popular web services such as Zillow to high-end ocean-atmosphere science). This project will bring a spatial approach to studying virtual worlds, providing a powerful research tool as well as a compelling visualization for users and the interested public. It has the potential to significantly shift how we see this important emerging cultural phenomenon.

As we imagine the results, users will be able to see a virtual world not just as a geography, but also as software, updated in real time. This allows for the visual traversal and data querying present in applications such as those built with GeoFusion, but we can see much more: what are the users doing, and what is the world software doing? This might make it possible, for example, to see how common patterns of leveling behavior emerge in an MMO. The pathways of the fastest-leveling players could be traced and examined, especially in relation to software operations such as instance creation, joining and leaving guilds, item acquisition, etc. This would be of interest to researchers, to the game companies that design and publish MMOs, and also to players eager to discover new ways of engaging with the world. Conversely, a researcher doing qualitative work in a virtual world might (with permission) trace the journeys of their informants, with markers for significant events (the first purchase of land in Second Life, the early leveling events in an MMO), looking for patterns of relation with the world not apparent through interviews with the subjects.

We believe the most powerful approach to this sort of work would involve the types of high-resolution tiled displays being created at Calit2. At the same time, if a smaller-scale web version could be made available, this might spark broad community interest and conversation. Possible partners include Sony Online Entertainment (which has expressed an interest in working with us) and Linden Lab (which seems to be interested in alternative visualizations of Second Life).

More Examples of Cultural Analytics Projects

Projects that analyze the contents of a set of cultural products (each question below can also be asked in relation to any other art/design field):

(1) "Are there any differences today between the portfolios of designers (in a particular field) from different countries? If so, how significant are these differences? How has this changed over the last 10 years?" (Examples of data sets: design portfolios on coroflot.com; student blogs of architectural schools at archinect.com; short films on Vimeo, etc.)
(2) "If we describe contemporary motion graphics (or works in any other art/design field) using a number of attributes, what patterns can we observe if we sample a large number of works? Do some attributes tend to occur together? Are there any significant differences between the works in the different areas specified on xplsv.tv – commercial, experimental, VJ live, etc.?"

(3) "If we graph the gradual simplification of forms in European art of the end of the nineteenth and early twentieth century, which eventually led to pure abstraction, what patterns can we observe? Will we see a linear development (i.e., a straight line), or will we see jumps, accelerations, etc.? Will these patterns be different for different arts and media (i.e., painting, furniture, decorative arts, architecture)?"

Examples of projects which look at the patterns in the development of cultural institutions, ideas, and cultural perceptions:

[Figure: History Flow visualization]

(4) "How do designers in different parts of Asia incorporate the concept of 'Asia' in their designs? Do these different 'Asia' concepts work in distinct ways in different media? What is the 'cultural DNA' which makes 'Asia' in design, and how is it distributed across different regions, media, and markets?"

(5) "What geopolitical patterns would we see if we map the growth of art museums (or: art biennials, design biennials, fashion weeks, film festivals, design schools, university programs in media, university majors for new subjects, publications in art history, articles on media art, etc.) around the world over the last 5 (7, 10, etc.) years?"

(6) "If we look at the new cultural concepts (or a selected set of concepts) which emerged in this decade, how much attention and 'mindshare' have these concepts captured relative to each other? To answer this question we can analyze the Wikipedia pages for these concepts. We can use the dates when the pages were established and also compare how much and how frequently people contributed to each page." (I.e., doing a History Flow type of analysis, but on a large set of pages.)

An Example of a Cultural Analytics Interface: Media Explorer

The interfaces that we use to navigate media collections – from Artstor to Flickr and YouTube – are still largely derived from nineteenth-century media. We use digital equivalents of light tables, filmstrips, or photo books. If digital media collections already contain billions of images, why can I only see a handful of them at any one time, and why are they arranged in a grid? Why can't a computer help me find new connections between the images, or lead me to images that stand out on some dimensions? Rather than using a computer as a fancy slide viewer, we can use it as an "intelligence amplifier" that will help us explore image collections (or collections of any other media) in new ways.

Using computer vision techniques, we will first analyze all the images in a collection. Every image will be automatically described on dozens of dimensions: histograms showing the distribution of colors, shapes, and textures; the presence of human figures and faces; composition; and so on. This new metadata will be added to the already existing metadata, such as the year the artwork was created, its size, medium, the name of the artist, place of creation, description of content, genre, national school, user tags, etc. Statistical techniques will be used to cluster images that are similar on some dimensions into new groupings, to highlight the images that have unusual combinations of features, etc.
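A minimal Python sketch of this statistical step, assuming feature vectors have already been extracted (for instance, by code like the earlier image-feature sketch); the number of clusters and the outlier threshold are illustrative choices, not part of the proposal.

import numpy as np
from sklearn.cluster import KMeans

def cluster_and_flag(features, n_clusters=8, outlier_quantile=0.95):
    """Return cluster labels and a boolean mask of unusually placed images."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = km.fit_predict(features)
    # Distance of each image to the center of its own cluster
    dists = np.linalg.norm(features - km.cluster_centers_[labels], axis=1)
    threshold = np.quantile(dists, outlier_quantile)
    return labels, dists > threshold

if __name__ == "__main__":
    # Stand-in for a collection of 500 images described by 32 features each
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(500, 32))
    labels, unusual = cluster_and_flag(feats)
    print("cluster sizes:", np.bincount(labels))
    print("flagged as unusual:", int(unusual.sum()))

The clusters become the new, data-driven groupings shown to the user, while the flagged images are candidates for "images that stand out on some dimensions."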
(This can be done either beforehand or in real time, if the system runs on a sufficiently fast platform such as a grid computer.) As the user explores a media collection, the software uses the metadata and statistical analysis to show related media objects on whatever dimensions the user selects. This allows an exploration of cultures across time, space, media, genre, and other conventional groupings. For instance, the user may choose to see all images with similar content, composition, color scheme, etc. within a particular historical or national space, or across the whole collection. Additionally, constantly updated statistical graphs may signal whether a particular image is unique, or at least significantly different, within a given set.
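To illustrate how "related on whatever dimensions the user selects" might work, here is a minimal Python sketch of nearest-neighbor retrieval restricted to a user-chosen subset of feature columns; the feature names and the data are stand-ins introduced for illustration.

import numpy as np

FEATURE_NAMES = ["brightness", "contrast", "mean_r", "mean_g", "mean_b"]

def most_similar(features, query_index, selected, k=5):
    """Return indices of the k images closest to the query image,
    measured only on the user-selected dimensions."""
    cols = [FEATURE_NAMES.index(name) for name in selected]
    sub = features[:, cols]
    dists = np.linalg.norm(sub - sub[query_index], axis=1)
    order = np.argsort(dists)
    return [i for i in order if i != query_index][:k]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    feats = rng.random((200, len(FEATURE_NAMES)))  # stand-in feature table
    print(most_similar(feats, query_index=0, selected=["brightness", "contrast"]))

In the Media Explorer, a query of this kind would be re-run every time the user changes the selected dimensions, so that the display of related images continuously follows the user's current point of view on the collection.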