>> Andres Monroy: Welcome, everyone. My name is Andres Monroy. I'm a researcher here at MSR, and I have the pleasure to introduce Cesar Hidalgo. Cesar is a professor at the MIT Media Lab. And actually I was a student at the Media Lab, but I left right when he started, so I feel like when I left the Media Lab, the Media Lab became a lot more interesting, not only because of Cesar, but there's a lot of [inaudible] doing really interesting work there as well. So Cesar is the head of the Macro Connections group at the Media Lab. He's also the ABC Career professor of media arts and sciences. And he actually just published -- he has one book he published last year, right? And then one, too, this year.

>> Cesar Hidalgo: Yeah, we have an MIT Press book from 2014, and now the new book, Why Information Grows: The Evolution of Order, From Atoms to Economies, coming out on June 2nd.

>> Andres Monroy: Awesome. All right. Welcome, Cesar Hidalgo.

>> Cesar Hidalgo: Thank you. Thank you, Andres. So it's a pleasure for me to be in this environment because, first of all, I've been a user of Microsoft products from a very young age. But also, you know, it's kind of like a technical environment in which I think I can probably get good feedback and ideas about some of the projects that we're doing that are not necessarily publications but actually platforms that have now been adopted and used by large numbers of people. So what I'm going to do today is I'm going to show you a number of different sites that fall into a category that I like to call data visualization engines. What these sites try to solve is a little bit like the opposite of the problem that many big companies in the tech sector have been solving.
Many companies like EMC or SAP or Oracle have solved the problem of like, hey, you're a client, you have shit loads of data, we'll help you catch all of that data, make sure that it's properly indexed and stored and everything, so then you can possibly retrieve it and, you know, operate on that data. But on the other hand you need -- well, the other side of the coin would be to have some technologies that would allow you to visualize that data and to basically draw insight from it. And unfortunately all of the large companies that have solved the problem of storing the data and catching the data have not solved very well the problem of giving the data back to the people, you know, in a way that makes it easy to understand. And I've been solving this problem not because I'm very interested in databases or database visualization, per se. I am interested now, but originally I was very interested in understanding the world. But I've always been a very macro guy. You know? And if you're a macro guy, you basically want to look at the world at the large type of structure, you know, at a large scale. And in that context I had to develop these visualization engines to help inform the research that I was doing and eventually found that this -- hey, Shahar, how are you doing? -- that these visualization engines, you know, were something that also helped distribute research in a different way than papers. So what I want to do today is I want to show you seven different projects. The first one is the Observatory of Economic Complexity. The second one is going to be DataViva, you know, which makes available data for the entire formal sector economy of Brazil, more than 50 million people, a billion visualizations. Pantheon, which looks at data on cultural production, which we worked on with Shahar. The Global Language Network, which actually was Shahar's thesis at MIT. For some reason this one, you know, like it got [inaudible].
Immersion, which is a redesign of the e-mail interface, Place Pulse, and Street Score. Okay? And what I want to do is I want to show you that these data visualization engines are not just a way of transforming bits into colors and shapes but actually are tools that we can use to get stories out of data and draw insight so we end up learning about the world. So the first example that I want to show you is the Observatory of Economic Complexity. This is a tool that makes available data for international trade for basically all countries in the world and for the last 50 years. It's the number one destination nowadays if you search for international trade data. So if you search "what are the exports of Argentina" in Google, the first link is this site. And what this site does is allow you to display all of that data, but the Trojan horse aspect of the Observatory of Economic Complexity is that by displaying international trade data and by focusing on the types of products that countries export, not just on values, you are starting to introduce the idea that actually what matters for economic development is the mix of products that you make. So to illustrate that, let's look at a few examples. The first example here shows the products exported by Chile. And as you see, Chile, you know, is a country that for its size -- it's around 18 million people -- exports quite a bit. That's $78 billion. You know, that's a good chunk of change. But you see the products that Chile exports are mostly refined copper, copper ore, raw copper, grapes, a little bit of wine. You know, wine, you know, even if you export a little, doesn't kind of like make an economy, you know? It's very hard to export, you know, too many billions of wine. And now let's look at the products that South Korea exports.
Now, South Korea obviously is a much larger country, a much larger economy, it exports $562 billion, but also, you know, the difference is not just in the amount of the exports but also in what they export. They export integrated circuits, cars, LCDs, refined petroleum, broadcasting equipment and so forth. Now, the question is, are these differences in the mix of products a country makes consequential or are they just cosmetic? Should we care about those things? So to introduce this very simply, let's just first look at the bilateral trade between Chile and South Korea to make a contrast between the monetary values and the mix of products that they're exchanging. So if you look here at the top, you see now that Chile exports $4.6 billion to South Korea. Okay? So from about $76 billion that it exports, 4.6 goes to South Korea. And these are mostly atoms -- refined copper, raw copper, copper ore, sulfate chemical woodpulp, pig meat, you know, grapes. You know, products of relatively low sophistication. Now, Chile imports $2.5 billion from Korea. So Chile has a trade surplus with Korea. They export 4.6 billion; they import 2.5 billion. But it has kind of like a negative imagination balance because what Chile's exporting is atoms. What Chile is importing from Korea is the way in which those atoms are arranged. And that's kind of like information. Like in cars, delivery trucks, you know, you actually have products that involve the knowledge, know-how, and imagination of Korean workers. So the question is what -- you know, Chile is exporting atoms and is making a shit load of money; Korea is exporting the way that the atoms are arranged. It appears that that might matter for something, but can we get some quantitative evidence that these things are consequential? And actually we can. There are like three things that are very important that we can predict very accurately by knowing the mix of products that countries make.
The first one is that we can anticipate what products they're going to make in the future. So we can predict the future paths of [inaudible]. The second one is we can predict which countries are going to grow and which countries are not going to grow. So we can actually predict economic growth, long-term economic growth. Finally we can predict the levels of inequality. Because at the end of the day, the mix of products that countries make demands certain types of institutions. So let's say, imagine you are a country that has tobacco plantations. Well, that's a type of product that you can be readily productive at with a really shitty institutional environment. Okay? But if you're a software industry, and imagine that in the software industry you were treating workers the way workers were being treated on the tobacco plantations 300 years ago, probably you guys wouldn't be very productive. You know? Because you demand a different type of institution given the type of work that you produce. And in that context, the mix of products that a country makes also helps moderate even the distribution of that income. So let's look at the first of these predictions. Yes.

>>: So a question. So it is true that you can make these inferences based on the data you show, and I think that's very valuable, but I guess my question is what is the bar? Because I just did this search and I can find articles in The Economist that talk about the connection between these two countries, or there's a four-page report by a Ph.D. candidate that, you know, talks about these things at length. So it's possible the problem identify a pattern that sort of summarizes your inference as well. The question is, well, how to -- what and how to compare the results of this to something else.

>> Cesar Hidalgo: Yeah. So just to give an idea, like, first of all, what I'm showing you here is the database visualization technology.
On the other hand, we have a number of papers that we have published in Science, in PNAS, and a book that came out with MIT Press that document all of the statistics about this. And the way that you compare this is that in some way when you are looking at the economic growth of countries, what you need to make sure of, you know, is that you are able to control for other possible confounding factors. And for that there are many statistical methods that you can use to try to discount those other factors. So, for instance, if I'm looking at the economic growth of countries and I want to try to show you that the mix of products that countries make is what matters for economic growth, I have to be able to show you that, well, that is robust to controlling for a country's initial level of income, for its level of education, for its institutional environment and so forth. And we have shown in our research that basically the mix of products that countries make is a much stronger predictor than traditional [inaudible] controls. Actually it incorporates the information of all of those controls and provides additional information in the predictions. Yeah.

>>: [inaudible] statistics, but I guess I wonder, shouldn't you be looking to the qualitative insight you can glean from looking at the data presented like so as opposed to, you know, using insight [inaudible] foreign affairs or economics.

>> Cesar Hidalgo: Why is it an or? I don't understand why it's an or. So, for example, one of the things that this site is used for is that journalists sometimes, you know, would say they want to discuss the economic ties between India and China. And what they would do is they would go to Google and search, hey, what does China export to India and, you know, and so forth. And then they would bump into the site, and then they would embed this visualization as part of the narrative. So it's not really that you either have narratives or you have charts.
Like here you have a site that makes around 30 million charts, and you can use those 30 million charts to construct narratives as well. And what I'm doing here is actually using a few of these charts to tell you some little narratives. One of these narratives is like, look, you know, Chile and Korea, even though the trade balance goes in one direction, you know, the balance of imagination goes in the opposite direction. I could certainly write that as like a 300-word op-ed in which I would embed these stories. You know? Like the other things also you can embed in narratives, and that's why we have written papers about it. Because in the papers we include the narratives and the charts. But I don't think it's an or. Yeah. So just to recapitulate, what I want to try to show you guys very briefly is that there are three things that we can predict based on the mix of products that countries make: what products they're going to make in the future, how much they're going to grow, and how unequal they're going to be. So let's look at the first of those predictions. So to look at the first of the predictions, I'm going to go to a visualization that is a little bit more complex. You know? This is the product space. Okay. And the product space in this case is also showing the products that Chile exports. Now, in this case it's for the year 1979, but now the nodes that are painted show the products that Chile exports a lot of. The ones that are not painted are the ones they export little of. So Chile in 1979 exports a lot of preserved fish but very little fresh fish, very little frozen fish fillet. They don't have comparative advantage in the light gray products. Now, Chile also exports, for instance, miscellaneous fruit but doesn't export vegetables or fruit and vegetable juices. But we know which products are connected and what the opportunities for the Chilean economy are because we know the probability that two products are co-exported. Okay?
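As a rough illustration of the co-export idea, here is a minimal sketch of how links between products, and a score for a candidate product, could be computed. It assumes proximity is the smaller of the two conditional co-export probabilities and that a candidate's score is the share of its total proximity coming from products the country already exports. The toy data, country letters, and the function names `proximity` and `density` are hypothetical, and this is not the code behind the site.

```python
# Hypothetical toy data: for each product, the set of countries that export it
# with comparative advantage. Names and memberships are made up for illustration.
EXPORTERS = {
    "preserved fish": {"A", "B", "C"},
    "fresh fish": {"B", "C", "D"},
    "fruit": {"A", "B", "E"},
    "vegetables": {"B", "E"},
    "machine tools": {"D", "F"},
}


def proximity(p, q):
    """Co-export probability: the smaller of P(p | q) and P(q | p)."""
    both = len(EXPORTERS[p] & EXPORTERS[q])
    return both / max(len(EXPORTERS[p]), len(EXPORTERS[q]))


def density(country, target):
    """Share of the target's total proximity that comes from products
    the country already exports (0.0 if the target has no links)."""
    total = 0.0
    exported = 0.0
    for other in EXPORTERS:
        if other == target:
            continue
        phi = proximity(target, other)
        total += phi
        if country in EXPORTERS[other]:
            exported += phi
    return exported / total if total else 0.0


# Country "A" already exports preserved fish and fruit.
for product in ("vegetables", "machine tools"):
    print(product, round(density("A", product), 2))
```

In this toy setup, vegetables score high for country A because most of their neighbors are already exported, while machine tools score zero, which mirrors the kind of contrast the talk draws for Chile.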
So now, for instance, if I ask, well, is Chile going to export vegetables in the future, I can count one, two, three, four products that are connected to vegetables that Chile does not export, and one, two, three, four, you know, five products that are connected to vegetables that Chile does export. So in some sense vegetables is a product for which Chile has kind of like a high density, there's a lot of activity happening around it, which will predict that maybe it should be a product that Chile would export in the future. Now, if I look, for example, at parts of metalworking machine tools, this is a product that is also connected to many products, none of which Chile's able to export successfully, so I will predict that that's not going to be a future export. This -- I'm sure that is very simple for you guys, but this is just like any standard recommender system, more or less it's kind of doing this thing. So this is in some way making explicit the graph that underlies any recommender system. Okay? So let's look at whether the prediction pans out. So the prediction was that Chile was going to diversify in the fishing cluster, in the processed food clusters. And if I go from 1979 to the year 1996, you know, I'm going to find that actually, you know, Chile has diversified, so now they do export vegetables, they do export the fruit and vegetable juices. They do now export a little fresh fish and so forth. Okay? Yes.

>>: How do you control for things like political stability and just, you know --

>> Cesar Hidalgo: So like the institutional barriers that are -- like there are variables out there that we use, and I can show you in the book, but they are -- they are very self-service so they're kind of shitty [inaudible].
So you have like the World Governance Indicators, you know, that are available since 1996, and they have, for example, control of corruption, political instability, and variables like that that are determined by the people at the World Bank that developed the World Governance Indicators. What we find, though, is that those variables have very little predictive power, and the reason why they have very little predictive power is that at the end of the day the institutions of a country are also reflected in the mix of products that a country is able to make. So in some sense the component that matters for an economy is probably already reflected in this vector that has 700 different features, you know, each one of which is whether this country exports that product or not.

>>: So if you were to look at Syria or Iraq in 1979 versus today --

>> Cesar Hidalgo: Oh, yeah, yeah. Like but these countries -- like that's like we can start to [inaudible] political economy, but let's look at like -- let's say let's look at Iraq, and let's go before the U.S. kind of like made a mess out of it, so 1985 -- oh, shit. I'm not online. This was also cached in the browser. Do you guys have an MSFTP open? Let's see.

>>: [inaudible].

>> Cesar Hidalgo: Oh, yeah, give us your little rights away. Yeah. Okay. There. Sure. Okay. Thank you. So let's look at, you know, Iraq. And, for example, if you look at Iraq in 1985, it's a very boring product space. You just have like oil there. Oil is a [inaudible]. It's very large. If you look at it as a treemap, you kind of like see what share it represents. But this is something that is a [inaudible] of countries that are concentrated in natural resource exports, which is the problem of political capture, economic capture that is well described by [inaudible] in his book about a million. And the idea is the following: let's say I have a country that has a [inaudible] diverse economy.
Controlling that country politically or economically is kind of hard because there are a lot of different actors that are playing in that economy. But in a country like this, if I become the president and petroleum happens to be a state industry, I have absolute control. I have political control on the one hand and I have economic control on the other hand. So that's why countries that find petroleum, well, the second thing that they find is a bad president, because they are countries that are very easy to capture. While in a country that has a very diverse economy like Japan, Germany, or the U.S., even if you controlled the entire tech sector, there are a lot of other sectors that still -- that have weight, and that makes it much harder for the political process to fall into the hands of a single individual. But that's a good question. So okay. So the first idea was that given the mix of products a country makes, you can predict what they're going to do next. Hopefully that was something that I was able to communicate. The second idea, you know, is that actually if you know the mix of products that a country makes, you can predict their future level of economic growth. And this is a little bit more complex. I'm going to do a very superficial description because I don't want to go into the details of the math. But basically what you do is you take the matrix that connects countries to products, you project it over the space of countries, and then you take an eigenvector of that matrix. And that is a measure that tells you how complex the economy of that country is, because in some sense the eigenvector of that matrix is telling you how diverse the countries that make what you make are and how ubiquitous the products that are made by those countries are. So if you're a country that makes a lot of products that few countries can make, you're kind of a sophisticated economy. If you're a country that makes few products that everyone makes, then your economy is kind of simple. Okay?
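The matrix-and-eigenvector recipe just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the speaker's code: it assumes a binary country-by-product matrix, projects it onto countries by weighting each shared product by one over its ubiquity and normalizing each row by the country's diversity, and then takes the eigenvector of the second-largest eigenvalue (the leading one is constant and carries no information). The power iteration, the standardization, the sign convention, and the three made-up countries are all choices made for this sketch.

```python
import math


def economic_complexity(M):
    """Return a standardized complexity score per country (row of M).

    M is a binary country-by-product matrix; every country must make at
    least one product and every product must be made by some country.
    """
    n_c, n_p = len(M), len(M[0])
    diversity = [sum(row) for row in M]                       # products per country
    ubiquity = [sum(M[c][p] for c in range(n_c)) for p in range(n_p)]

    # Projection onto countries: shared products weighted by 1/ubiquity,
    # rows normalized by diversity (so each row sums to 1).
    M_tilde = [
        [sum(M[c][p] * M[d][p] / ubiquity[p] for p in range(n_p)) / diversity[c]
         for d in range(n_c)]
        for c in range(n_c)
    ]

    # Power iteration for the second eigenvector: after each step, remove the
    # constant (leading) component using the diversity-weighted mean.
    weights = [k / sum(diversity) for k in diversity]
    v = [float(i) for i in range(n_c)]
    for _ in range(500):
        v = [sum(M_tilde[c][d] * v[d] for d in range(n_c)) for c in range(n_c)]
        shift = sum(w * x for w, x in zip(weights, v))
        v = [x - shift for x in v]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]

    # Standardize, and orient the sign so more diversified countries score higher.
    mean = sum(v) / n_c
    std = math.sqrt(sum((x - mean) ** 2 for x in v) / n_c)
    z = [(x - mean) / std for x in v]
    mean_k = sum(diversity) / n_c
    if sum((k - mean_k) * x for k, x in zip(diversity, z)) < 0:
        z = [-x for x in z]
    return z


# Hypothetical toy matrix: rows are countries, columns are products.
NAMES = ["diverse", "middle", "simple"]
M = [
    [1, 1, 1, 1],  # makes everything, including two products nobody else makes
    [1, 1, 0, 0],
    [1, 0, 0, 0],  # makes only the product everyone makes
]
scores = dict(zip(NAMES, economic_complexity(M)))
print(sorted(scores, key=scores.get, reverse=True))
```

On this toy matrix the country that makes many products, including ones few others make, ranks first, and the country that makes only the ubiquitous product ranks last.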
It's the same as a multiple choice test. If you're a kid that only answers the questions that everybody got right, and answers very few questions, probably you know very little. If you're a kid that answered all of the questions, even those that nobody else was able to answer, you're probably a kid that has a lot of capacities. So you run this test over countries, and you get this measure that is the economic complexity index, which you can interpret as the ability of countries to generate income. And on the Y axis you have the amount of income that they actually generate. So, for example, here you have a country like China, and China is a country that you see is kind of like below the cloud. Below the cloud of points, not the cloud that is someone else's computer. Here you have that China is a country that has an economy that is too complex, given its level of income, because basically the complexity of the Chinese economy is comparable to that of Hong Kong, the Netherlands, or Norway; meaning that it is a country that is doomed to grow. At that level of complexity, you know, the income should be higher. Now, if you look at a country like, for example, Qatar or Kuwait, given what they're able to make, their incomes are too high, and that's kind of clear. Why? Well, because they're selling a lot of atoms, you know, basically they're digging money out of the ground, but they're not actually generating much economic value. You have Australia, Saudi Arabia, and Greece, for instance. If you go back in time, Greece, for example, was one of the countries that was most out of whack before the crisis in terms of this measure. And what we find statistically -- here you have Norway and here you're going to have Greece -- is that over time the countries that are below the line tend to grow faster than the ones that are above the line. And that's a very strong predictor.
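One simple way to make "below the line" concrete is to fit a straight line of income against complexity and look at the sign of each country's residual. The sketch below assumes an ordinary least-squares fit of log income on the complexity index; the four countries and their numbers are hypothetical, and the published analysis uses additional controls that are not reproduced here.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical (complexity index, log income per capita) pairs.
COUNTRIES = {
    "P": (-1.0, 7.0),
    "Q": (0.0, 8.0),
    "R": (1.0, 9.0),
    "S": (1.0, 8.0),  # as complex as R, but poorer
}

slope, intercept = fit_line(
    [x for x, _ in COUNTRIES.values()],
    [y for _, y in COUNTRIES.values()],
)

# Negative residual: income is lower than the complexity level would suggest,
# which is the "below the line" case the talk associates with faster growth.
residuals = {
    name: y - (intercept + slope * x) for name, (x, y) in COUNTRIES.items()
}
for name in sorted(residuals, key=residuals.get):
    print(name, "below" if residuals[name] < 0 else "above")
```

Here country S, which is as complex as R but has lower income, sits furthest below the fitted line.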
It actually would predict like 50 percent of future economic growth like 50 years in advance in the best cases -- sorry, 20 years in advance. Okay? So it's long-term economic growth, because the capacity of countries to make products is sort of like a fundamental that helps us explain the capacity that they have to generate income. Okay? Now, the last thing that I wanted to show you is that we're also working now on a new paper that is making a connection between the complexity of a country's economy and its level of income inequality. But to illustrate that, I'm going to change to a new platform, you know, which is a much more ambitious platform, which is called DataViva. And DataViva is a platform that makes one billion visualizations. This data contains -- this platform contains data for the entire formal sector economy of Brazil. This is 50 million people for the last 12 years. It includes data on exports on a monthly basis. It includes data on employment, on salary, on industries and location. It's the most ambitious data visualization Web site for public data to date. And we launched it last week in this new version, and we're actually working now on all of the marketing to get it out through a number of different venues. But what I'm going to show you here is a little bit of what DataViva does, and then I'm going to tell you how we can start looking at this idea that maybe the mix of products that a country makes also might affect the way in which they distribute income. So let's look at a profile that is generated by DataViva. So if you were to search for Sao Paulo, what you would get is a relatively long profile in which you can learn about trade, wages and employment, economic opportunities and so forth. So, for example, here you see that Sao Paulo's trade balance was kind of like roughly neutral. They were importing as much as they were exporting until 2008.
And after the crisis obviously both exports and imports went down, but then imports rebounded and exports did not. And now they have been carrying this large gap for a period of like five to six years. Then you can ask yourself the question, well, what are the things that Sao Paulo actually exports? And you see that in Sao Paulo most of the industries that are located there are still industries that are managing the exports of soybeans, raw sugar, coffee, corn. Not a very sophisticated economy. What are the destinations that Sao Paulo exports to, China, India, or what are the products that they import. And the nice thing about this platform is that you can also ask follow-up questions. So, for instance, when you look at destinations, you say, wow, the exports go to China, to the United States. What do they export to China? Okay? And I can ask what are the exports of Sao Paulo to China, and I would get visually like soybeans, sulfate chemical woodpulp, raw sugar. Okay? You know, then --

>>: [inaudible].

>> Cesar Hidalgo: Yeah.

>>: How do they have some room for soybean?

>> Cesar Hidalgo: Because remember that this is data based on the industry. So you -- to be a soybean executive, you don't have to be sitting in a field. You have to be sitting in a building in downtown Sao Paulo trading soybeans, you know, on the Internet, on the phone, and stuff like that. So in that context, Sao Paulo exports soybeans because the people that are moving the soybeans around, you know, yeah, are sitting there and getting salaries there and sending their kids to school there than, you know -- okay. Then, you know, you can look at industries. Then like other things that are kind of interesting, then you can look at wage data, you can look at, for example, the distribution of income in Sao Paulo.
You can look at the occupations that exist in the city and how much money each one of them -- so how much does it cost to hire an information analyst in Sao Paulo, on average 5,000 reals, 5,000 -- 5.3 thousand reals. What are the common majors that people study? You could look at the universities that people would have to go to. What are the industries that would employ information analysts in Sao Paulo? You guys were looking to open a research lab in Sao Paulo, I heard recently when we were preparing for the conversation. Now here you see these are the industries that hire them. You know, for example, when it comes to IT consultancy, in that case they actually make a little bit more than average, they may make 6.2 thousand reals a month. So you then get an idea of the average salaries that are commanded by each occupation in each industry. But in this context what I wanted to do is to highlight a little story that would connect the inequality of countries with the industries that are present in them. So here we have now the exports for the entirety of Brazil. Brazil also gets a profile; in that way we have profiles for the entire country, for each state and for each of the 5,000 municipalities, which would be the equivalent of a city here in the U.S. And if you do the entirety of Brazil, you see Brazil exports $225 billion, but they export a lot of iron ore and crude petroleum, refined petroleum. These are extractive industries. And they also export actually quite a bit of different types of machinery and transportation, like aircraft and cars. Like these little Embraer jets. They're Brazilian. You know, the engines are not Brazilian, but the jet is, you know, Brazilian made. So the question is, well, you know, in terms of exports, you can think that maybe the resources actually contribute a lot to the Brazilian economy. But let's see how those things look in terms of employment.
So here now you have the occupations in Brazil that are employed in the extractive industries. And in total, let me -- let me do [inaudible] because I think here there was one portion that we didn't get to yesterday. There's like a little -- yeah. So let's look now, you know, at the people that are employed, you know, in the extractive industries, you know, in Brazil and how many of those there are and what type of jobs they do. And what we're going to find, you know, I don't know -- I don't think this is my site. This is your Internet, guys. You know? We're going to start with the processing industries. So if you look at manufacturing, you're going to find that manufacturing, even though it doesn't export that much, represents 7.9 million jobs. The entire [inaudible] Brazil is 50 million jobs. So actually this is a big fraction. This is like 20 percent of all employment that goes into this processing industry. Now, if you look at extractive industries -- let me load it again. Maybe I'm not online anymore. If Google doesn't work. Oh, no, yeah. Did Amazon go down? This is in U.S. Okay. Sorry about that, guys. So sorry about the back. But what you would find is that the mineral extraction industries employ around 300,000 people in total. So while on the one hand, you know, they represent around 20 percent of the total exports of Brazil, on the other hand they employ as few as 300,000 people. The manufacturing sector, you know, and the processing industries represent a relatively comparable fraction of total exports. They're not bigger, or much bigger, than the -- than the extractive industries, but they employ 8 million people. So you can see that obviously they are a much more inclusive type of economic activity in which, you know, you involve a large number of individuals in the process.
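The employment comparison comes down to a couple of divisions. The short sketch below only restates the figures quoted in the talk (7.9 million manufacturing jobs, around 300,000 extractive jobs, about 50 million formal jobs in total); the variable names are made up for illustration, and the quoted figures are rounded, so the "like 20 percent" mentioned in the talk comes out closer to 16 percent with these numbers.

```python
# Figures as quoted in the talk (rounded by the speaker).
TOTAL_FORMAL_JOBS = 50_000_000
MANUFACTURING_JOBS = 7_900_000
EXTRACTIVE_JOBS = 300_000

manufacturing_share = MANUFACTURING_JOBS / TOTAL_FORMAL_JOBS
extractive_share = EXTRACTIVE_JOBS / TOTAL_FORMAL_JOBS
jobs_ratio = MANUFACTURING_JOBS / EXTRACTIVE_JOBS

print(f"manufacturing share of jobs: {manufacturing_share:.1%}")
print(f"extractive share of jobs:    {extractive_share:.1%}")
print(f"manufacturing jobs per extractive job: {jobs_ratio:.0f}")
```

With comparable export shares, the roughly 26-to-1 gap in jobs is what makes manufacturing the more inclusive activity in the speaker's argument.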
So what I want to show next, you know, is another set of projects that look not only at the economy of countries but at their cultural production. As you can imagine, I spent a lot of my life looking at the mix of products that countries make or at the economic activities that are present in each part of Brazil and how this evolves over time. And at some point what happened is that that shit got old and I got bored and I got tired. So what did we start to do next? We started to look at cultural production, because basically as I was presenting this type of research around the world, one of the questions that I was always getting was, well, you're looking at products, how about services, you know? How about the hotel industry, how about the restaurant industry? And I was saying, well, you know, first of all, in the real world we kind of like started to include that, because this includes industry data and employment data, and that will include industries. Second, you know, there's no reason to believe that that's a very interesting question because all of the mechanisms that we describe in terms of products probably hold in the case of industries, like you're going to have related varieties and you're going to move from industries that you are good at to industries that are similar. So one day I realized that a question that nobody had asked me was like, well, what about Elvis Presley? And when it dawned on me, I said, well, that must be a good question because in some sense, well, the U.S. does export let's say soybeans and aircraft engines, but they also export culture. Like, for instance, Miles Davis or Elvis Presley or [inaudible] Armstrong. Where do we count that? What are the patterns that are defined actually by cultural production rather than industrial production? So we created another project called Pantheon in which we started to accumulate data on cultural production for the last, you know, 6,000 years.
So just to give an idea, like if Chile looks like this in the context of the [inaudible] complexity, a country that exports refined copper and copper ore, wine, and grapes, in the case of Pantheon, Chile looks like a country that has exported politicians like Pinochet, Allende, O'Higgins, soccer players like Zamorano and Salas, or writers like Pablo Neruda and Gabriela Mistral. Now, how do we know which people go into Pantheon? I want to just give you the short answer to that question. For the long answer, you can go to the methods section. But what we do is first, you know, we look at all people that have a presence in at least 25 different languages in the Wikipedia. Why do we do that? Well, first because if we were to use only the English Wikipedia, we're going to be very biased towards the English language. So in some sense we look at people that have a presence in a large number of languages, because that gives us a bit of an idea of which persons have global fame rather than local fame. So, for instance, American football players don't make it into Pantheon because they tend to be locally famous. They have long pages in the English Wikipedia; they have basically pages in few other languages. Someone like Isaac Newton, on the other hand, will have a page in almost 200 Wikipedia language editions. Okay? So that's the first thing. So we look at people that have a presence in many different languages in the Wikipedia. Why 25? Because with 25 languages in the Wikipedia, we got a set of 11,334 people that we then had to curate manually. And that was kind of like at the limit of what we could do with that manual curation. Why manually? Because in some sense, even the ontology that you're seeing here to describe the cultural production of Chile did not exist, and we had to create it by using uncontrolled vocabularies and a lot of manual labor.
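The first selection step described here is a simple threshold on the number of Wikipedia language editions. A minimal sketch, assuming the edition counts have already been collected into a dictionary: the counts below are hypothetical placeholders (the talk only says Newton has "almost 200"), and `globally_known` is a name made up for this illustration. The manual curation step that follows in the talk is not modeled.

```python
MIN_EDITIONS = 25  # threshold mentioned in the talk

# Hypothetical counts of Wikipedia language editions per biography.
EDITION_COUNTS = {
    "Isaac Newton": 190,        # placeholder for "almost 200"
    "globally known writer": 120,
    "locally famous athlete": 4,
    "borderline case": 25,
}


def globally_known(counts, min_editions=MIN_EDITIONS):
    """Keep biographies present in at least `min_editions` language editions."""
    return sorted(name for name, n in counts.items() if n >= min_editions)


print(globally_known(EDITION_COUNTS))
```

Raising or lowering `min_editions` trades coverage against the amount of manual curation needed, which is the reason the talk gives for settling on 25.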
So what you can do with Pantheon, well, is on the one hand, sure, you can look at differences between countries, so, for example, if you look at the pattern of cultural production of Chile, you know, that includes 26 people; or the U.S., which includes 2,000 people, you see great differences. For instance, there's a lot of actors, singers, and magicians in the U.S., also a lot of scientists that were not present in the case of Chile or even, you know, a much larger diversity of sports. You know, because in Chile it was mostly soccer and tennis. But what is interesting about Pantheon is not so much that you can look at differences between countries, it's that you can look at the way in which cultural production evolves over time. So what I want to do here, so I'm going to start -- yep. >>: One question. How do you deal with people who are born in one place but become [inaudible]. >> Cesar Hidalgo: Like where you're born, that's where we put you in because that's the only thing that we can like really encode with a hundred percent certainty. So the Greek people hate us because there's a lot of famous Greeks that were born in present day Turkey. But all the geocoding APIs use present-day boundaries. So there's no geocoding API that I can use to know if like let's say a priest born in the year 1200 in what is now Spain was a Visigoth or an Ostrogoth. You know? It's going to tell me Spain. So we have to kind of like deal with those type of constraints. But what is interesting is actually now when you look at the entire world and you see things changing over time. And in that context, you know, here is like basically our 6,000 years of history. And what I want to do is I want to start looking at these 6,000 years of history but concentrating on different technological eras. And what we're going to see is that there's a few things that happen when you change technological eras. The first one is that the composition of cultural production changes dramatically. Okay. 
So who we remember changes. And the other thing is that the number of people that we remember also changes. So let's start, you know, by looking at the world up to year 1400. And this is all of the people that we remember up to year 1400. This is basically before the printing press. This is the era of writing but prior to a printing press. And you see that most of the cultural production of the world, most of the people that we do remember involves politicians and religious figures. Okay. What also is kind of curious here is that the arts, for instance, are quite inconspicuous. You have nine painters that fit into this period from which most of them were born in the late 1380s or 1390s. You know? Because basically these are people that were still in some way a little bit famous by the time that printing was invented, like Donatello, Van Eyck [inaudible]. So what we're going to find is that when we change, you know, now the time window to a time window in which we look at the period after printing but prior to film and radio, this matrix of cultural production for the world is going to change completely. And why is this going to change? Well, like some of the theories of why this should change, you know, would involve the work of Marshall McLuhan, which I'm sure that some people might be familiar with. Marshall McLuhan said the medium is the message. Now, what he meant by that is what changes society is not what people say but the technology that they use to say those things. So in some sense like what people say on the radio is gone with the wind, but the invention of the radio was a transformative technology that changed the type of discussions that were happening, the type of people that were involved in those discussions and so forth. The other person that argues this forcefully is Elizabeth Eisenstein. She wrote a book called The Printing Press As an Agent of Change. 
And in that book Eisenstein [inaudible] the printing press did not only change the number of books being produced but it changed what was in those books. It changed, you know, like basically like -- it changes a lot of things. First of all, it creates the information that is more permanent than the one that was passed on before. So with that it develops the idea of spelling that didn't exist. Then eventually also, you know, it starts reviving a lot of the classics. Because I don't know if you guys know, but like in the year like 1200 not too many people in Europe knew about Aristotle or Socrates. That information was basically more or less preserved in the Arab empire and then it was reimported back in Europe and it was disseminated once again with the printing press. And then eventually, you know, that involved in cooperation of new people into publications because now printers were for-profit people that wanted to find books, you know, to print that other people wanted to buy. And these books could be like the dialogues of Galileo that became very popular when the church make them illegal or other books of scientists of that time. So like what Eisenstein argues is that with the invention of the printing press, there is the shift towards the arts and the sciences. Do we see that here? So this is the matrix of cultural production for people born between the year 1400 and 1900. And you see that is very different. Now religious figures are just a mere 3 percent. Okay? So quite minor. And you have a lot of physicists, biologists, mathematicians, chemists, astronomers, you know, physicians, economists that are born in that period. You also have a lot of painters, composers and artists. So it's a very different set of people and types of people that were remembering from that era than from the era prior to printing. Now, the nice thing is that this is not the last time, you know, that technology has changed. 
The next change came at the beginning of the 20th century with the introduction of film and radio. And with the introduction of film and radio, once again we get a new matrix of cultural production. So what happens now is that the matrix rearranges itself and now the arts continue to increase but they're quite different. They're not painters and composers anymore. Now they are performers. So you have actors, musicians, singers, and film directors. So people don't talk about that movie that was written by that guy; they talk about that movie in which Brad Pitt was in. Now, why is that? Well, because the medium before was captured in the words. It was text. It was books. So you worried about the author. The medium now is capturing the faces. So you worry about the actor. And the actor is actually something that tells us that in this case [inaudible] must go from the medium to the fame of the individual. Because actors existed all along. They were not invented with film. The Greeks had actors. Shakespeare had actors. But no one remembers the actors from the time of Shakespeare. Nobody remembers the actors from the time of the Greeks. Or there are very few that people would remember. While, you know, when the silver screen comes along, you know, basically the performers -- actors, musicians, singers, and also some creative people like film directors -- are the ones that get enhanced. Now, the second half of the 20th century has the introduction of a new technology, which is television. And with the introduction of television, the matrix changes again. What we have is the rise of the famous sportsman. So TV is perfect to stay at home, drinking a beer and watching a game. You know? You don't do that like if the Super Bowl you have to go to like a movie theater to watch it, probably people would -- you know, it would not be that popular, you know? Like in some way like TV is an intimate thing that you're like you're watching sport in your underwear. 
And in that context, you know, you have like the fame of the famous sports player that gets enhanced with it. Now you might ask, you know, what happens with the Internet. You have to notice that I'm looking at people that are globally famous based on date of birth. So the number of people that are globally famous based on date of birth that were born after the Internet, you know, still were born like after 1996, so it would be too early to make any conclusions based on data. What I can do later if you guys want is to be speculative about how I think the Internet is changing this matrix. But I cannot give you any evidence of my speculation. The other thing, though, that this helps explain -- which I find fascinating, I was not expecting in the beginning -- is that, as I showed you, the matrix of cultural production changes with technologies because the composition of who becomes famous changes. But also the fraction of people that become famous also changes. So here what I have in the Y axis -- and I apologize for this chart because it's a PDF for a printout. We don't have it in the visualization engine yet -- what I'm looking at here is the births of globally famous people divided by the population of the world at that time. Okay? So it's kind of like the -- have you guys played Civilization? Yeah? So it's kind of like the birthrate of these like great architects in Civilization, stuff like that. Okay? And this number, you know, this is the year 500 BC, is basically constant all the way to year 1450. You see some things going up and down. That's Gaussian noise. Actually, we measured it, and there's sort of -- like there are a lot of small deviations. Once in a while you get a big one. They also tend to coincide with like whole numbers. So I think it's just historians putting a lot of people in the year like, you know, 200 or shit like that. But it's actually Gaussian noise. 
Then you get this -- you know, we use that change point analysis technique, which is a statistical technique, to determine when the mean of a time series changes. And actually, you know, the change point analysis technique identifies the time of the printing press as a time in which the rate of producing globally famous people doubles. >>: Or the rate of remembering them. >> Cesar Hidalgo: What? >>: Or the rate of remembering them. >> Cesar Hidalgo: Yeah, yeah, yeah. Yeah. This is obviously the people that we remember. Yeah. Sadly. It's not if they were famous at that time, it's if we know them now. Yeah. >>: Your first library [inaudible] before that. >> Cesar Hidalgo: Oh, yeah, yeah. No, these are, sorry, public libraries. Yeah. >>: What? >> Cesar Hidalgo: These are kind of like modern public libraries, like as an institution. Yeah. Yeah. And then eventually then with the introduction of, you know, like new communication and broadcasting technologies, this rate starts to explode. Obviously we don't know if it's because these people are recent or because they're memorable. Okay? Because a lot of the people that are now in our dataset, you know, are going to be forgotten, but some of them are not going to be. The other thing is if we look at the [inaudible] of fame, no matter which window we look at, it is a power law with more or less the same exponent, around 4 and 5. So it means that like the number of -- the ratio of people with a certain level of fame remains constant despite changes in communication technologies. And that is something that I find to be quite interesting. Now, hopefully with this Pantheon study I was able to show you that by looking at this dataset of cultural production we can learn about how broadcasting technologies have changed who we remember, at which rate we remember people, and what the distribution of fame looks like. 
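To make the change point idea concrete, here is a minimal Python sketch of one common formulation: find the single split of a time series that minimizes the squared error around a separate mean on each side. This is an assumption for illustration, not necessarily the technique or software used in the Pantheon analysis, and the series below is made up.

```python
def mean(xs):
    """Arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)


def sse(xs):
    """Sum of squared deviations of xs from its own mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


def single_change_point(series):
    """Return the index k that best splits series into series[:k] and series[k:].

    "Best" means the split with the lowest total squared error when each
    side is summarized by its own mean (a single mean-shift model).
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    best_k, best_cost = None, float("inf")
    for k in range(1, len(series)):
        cost = sse(series[:k]) + sse(series[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


# Made-up rate of "globally famous births per capita": flat, then roughly double.
rate = [1.0, 1.1, 0.9, 1.0, 1.05, 2.0, 2.1, 1.9, 2.05]
k = single_change_point(rate)
print(k, mean(rate[:k]), mean(rate[k:]))
```

A real analysis would also test whether the detected shift is statistically significant and allow for more than one change point; this sketch only locates the single most likely break.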
But what we can do also is to look at how other factors, the connectivity of languages, affect the number of globally famous people that each language will be able to produce. And this is a paper that was created by Shahar. He's the first author. This is Ronen, et al. Shahar is there. Say hi, Shahar. >>: Hi. >> Cesar Hidalgo: And it appeared in PNAS last December, in sort of a collaboration with Steven Pinker from Harvard, and basically what we did here is, a lot of people when they were looking at the importance of languages, they were looking at intensive measures. How many people speak the language. How rich are those people. How big is the area. How big is their military power. But in reality thinking of language in the context of the power of the people that speak it is not the best way to think about language because the whole point of language is that you use it to communicate. It's not an intrinsic property. It is kind of a medium. I can communicate with you guys because we share this language of English. Even though you can hear from my accent it's not my native tongue. So in some way when you think about the language, the language can do two things. One thing is it can help people that speak the language communicate, and the other is it can help transmit information indirectly between groups that do not speak that language. So, for instance, Shahar can learn a very nice joke, you know, in Israel, in Hebrew, then we meet together, you know, he tells it to me, and then I go back to Chile and I tell that joke in Spanish, and in some sense English was not the final destination of that joke, but it was kind of like this intermediate language through which that information went through. So we decided to go and try to map these networks. 
And, of course, you know, like finding data on which languages [inaudible] spoken was kind of difficult, but we were able to get our hands on three datasets that you should not interpret as datasets that are reflective of the entire world but that are datasets that are representative of very specific elites. Which are important because at the end of the day, you know, most of the information that is produced and generated in the world is produced by elites. And here by elite I don't mean kind of like the king and the countess, but kind of like everybody, for instance, that already has read a few books in their life. In a global context, they're an elite. And in this context what we have is that this is the network of languages that would come up if you look at 2.2 million book translations, you know, from a dataset from UNESCO. And you see that here English is kind of like this big global hub which connects to a large number of languages. Then you have Russian here. Anybody speaks Russian here? Okay. So here, for example, you see Russian, and Russian is kind of like a strong local hub that is connected to a lot of languages that are not very connected to anything else. You know, for example, like [inaudible] or Georgian [inaudible] in part, you know, is because there was explicit policy during the Soviet time to translate books to and from Russian, and there were many countries that had a political affiliation with the regime of the time, and therefore they received much of the information from Russian sources. >>: So the size of the circle represents how many -- >> Cesar Hidalgo: How many people. Exactly. So, for instance, here you have that English and Chinese have the same number of speakers if you count native and nonnative speakers. If you count people that speak English as a second language, there's 1.5 billion people in the world that speak English. >>: And the thickness of the link -- >> Cesar Hidalgo: The number of books translated from one to the other. 
So, for example, between English and French, there have been many books that have been translated between English and French. >>: Books translated from one to the other or the number of books that exist in both languages? >> Cesar Hidalgo: No, translated. >>: [inaudible] to say Harry Potter book 2 exists in both French and Spanish. Is there an edge from French to Spanish because of Harry Potter book 2 ->> Cesar Hidalgo: No. >>: Okay. >> Cesar Hidalgo: No, no, no. So let's say Harry Potter book 2, you know, was written in English, then it gets translated to Spanish. That's a link in that direction. Now, let's say that then there is a translation that goes from Spanish to French. Then in that case it would count from a Spanish to French. So even, for example, let's say Mark Twain's Tom Sawyer was translated from English to Spanish and there's some translation to go from Spanish to Catalan. In that case that counts as a Spanish-Catalan link because an expression of the co-usage of those languages. >>: You think [inaudible] arrow, it should be ->> Cesar Hidalgo: Yeah. So in the paper we have the arrows. In the Web site we're calling it the arrows [inaudible]. >>: Sure. >> Cesar Hidalgo: Like this one we hacked it in a week. You know? This is not kind of like a big data [inaudible]. It's kinds of like a mini site. Because the difference, for example, this site, you know, gets hundreds of thousands of people every month. This one gets like 400,000 people every month. So it's kind of like a resource and the traffic increase over time. This one go like 300,000 people in like three days. And then by now there's very few traffic because it's just kind of like one idea that you're putting out. You're not putting out a resource. You know? So in that -- that's why something that you hack relatively quickly because it has a different type of lifecycle. >>: [inaudible] speakers but there's not so much literary activity in terms of translation, can you identify? 
>> Cesar Hidalgo: So, for example, like Chinese is definitely like the language that for its size tends to be very peripheral. But Hindi actually tends to be quite peripheral, too, because obviously in India there is also a lot of use of English, and through English, India connects itself to the world. Spanish for me was actually a little bit surprisingly disconnected in the books. But depending on the type of media you look at, you see something different. So, for instance, in Twitter Spanish actually tends to be like I think the second most important [inaudible] language. Arabic is the language that tends to be kind of peripheral in most of them and also is spoken by many people. In Wikipedia, German, you know, ends up being the second most important language in the Wikipedia. And German is a language that always ranks high because we're using eigenvectors [inaudible] languages, not because it's connected to too many languages, but because it tends to be connected to languages that are always influential. So German tends to be connected to all of like, you know, languages of Western Europe and Eastern Europe, and in that context, you know, it has kind of like some good neighbors when you're doing eigenvectors and trying to get a measure. But what we find, which is kind of cool, is that when you look at this network, you can ask yourself the question, now, okay, now I have measures of the connectivity of languages. These are not intensive measures of how many people speak the language or how rich they are. Are those measures better at explaining the number of famous people produced by that language than those intensive measures? And the answer, you know, is that in the case of Wikipedia and the book translation network, yes, they actually are much better at explaining the number of famous people produced by a language. 
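The point about "good neighbors" is what eigenvector centrality captures: a node scores high when it is linked to other high-scoring nodes, not merely when it has many links. A minimal Python sketch using power iteration follows. The edge weights are invented for illustration, the graph is treated as undirected for simplicity, and the exact computation used in the paper is not described in this talk.

```python
def build_graph(edges):
    """Turn (a, b, weight) triples into a symmetric adjacency dictionary."""
    graph = {}
    for a, b, w in edges:
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    return graph


def eigenvector_centrality(graph, iterations=200):
    """Power iteration on (A + I), which has the same eigenvectors as A.

    Adding the identity avoids the oscillation that plain power iteration
    shows on bipartite graphs. Scores are scaled so the top node is 1.0.
    """
    score = {node: 1.0 for node in graph}
    for _ in range(iterations):
        new = {
            node: score[node] + sum(w * score[nb] for nb, w in nbrs.items())
            for node, nbrs in graph.items()
        }
        top = max(new.values())
        score = {node: value / top for node, value in new.items()}
    return score


# Made-up link strengths between languages, for illustration only.
edges = [
    ("English", "German", 5),
    ("English", "Russian", 3),
    ("English", "Spanish", 4),
    ("German", "Russian", 2),
    ("Russian", "Georgian", 1),
]
centrality = eigenvector_centrality(build_graph(edges))
for language, value in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{language}: {value:.3f}")
```

In this toy graph a language attached only to a local hub ends up with a low score even though its hub is reasonably central, which mirrors the "information is trapped" intuition in the talk.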
In fact, if you look at the PNAS paper, in the best case it will explain almost 90 percent of the variance in number of famous people produced by a language only by looking at the eigenvector centrality of that language. So obviously we don't have a causal story in this case because we cannot differentiate between the hypothesis that people are learning a language because the language is producing good content or that the languages that are more connected are better at diffusing the content. I think that probably the strongest part of the effect is in the latter, that once you become famous in English, it's very easy to become globally famous, whereas when you become famous let's say in [inaudible] you are probably famous in [inaudible]. Okay? And it's very hard for your fame to become global simply because that information is trapped. You know? It'd be kind of like if you do like a great piece of code for [inaudible] not going to go very far. So with that in mind, I want to show you like a few other projects. You know? Yeah. >>: Real quick. You've showed us three different images based on three different datasets. Naively they would seem to have very different interpretations. They seem to have very different interconnectivities, different sets of nodes. Which one do I believe? >> Cesar Hidalgo: Like that's the wrong question, and I'll tell you why. Because you're interpreting all of these images to be representative of something that is beyond the dataset. But the way that you should always interpret these datasets is that, well, this is Wikipedia. What is the Wikipedia dataset representative of? Wikipedia. What is the Twitter dataset representative of? Twitter. Not of the world. So for Wikipedia you should believe the Wikipedia. For Twitter -- >>: So you convince [inaudible] of the National Academy of Sciences that Wikipedia is intrinsically interesting enough that it was worth writing an article only about translations between Wikipedias? 
>> Cesar Hidalgo: In Wikipedia and Twitter -- >>: I mean, I assume that they -- that somewhere in your article you suggested there was some conclusion to be drawn beyond these three particular measures of social media. >> Cesar Hidalgo: No, no, no. Like we actually like were very [inaudible] that they should be [inaudible] because like if you say that Twitter is irrelevant -- governments have come down because of things that happened on Twitter. Like books. Books is kind of like most of our history has been based on the fact that actually we write things down. You know? So in some sense, sure, this is not representative of all our communication, but it is representative of three elites that are very globally influential. This is not people that are not globally influential. Like book translations, Wikipedia, and Twitter are outlets that are worth studying. >>: I found that people are trying to push us into saying, like come up with one definitive conclusion, like put a single number on everything. And this is not really possible. What we found, though, is that different media have different ways. And like an interesting conclusion that we make in the paper, at least allude to, is the fact that we see a different variance on Twitter than what we see in book translations. It means that potentially the languages -- the languages that we see on Twitter are more associated with developing countries, so there's more -- it's a more democratic medium than book translation, so possibly this map 20 or 30 or 50 years from now will be very different because the medium involved -- the media involved will be very different and also the cultures involved will be very different as well. So the world is also changing. It's always changing. Unfortunately we cannot make -- we don't have the long data to make like the long-term conclusions about how the global language network is changing over time. So we're not trying to proclaim something we cannot do. 
But still these are different -- each one of those is true to its own -- >>: [inaudible] say you gave me a whole bunch of post hoc just-so stories which sounded really nice. Behold here's the [inaudible] cluster in books world it's because of we can [inaudible] this happened. What did you find in here that actually surprised you? What did you learn from this that you could not have learned had you not drawn this diagram? >> Cesar Hidalgo: Like for me I would have never expected that like the connectivity of a language in the network of translation explains 90 percent of the variance of the famous people that are being produced by that language. You know? You don't see like those 90 percent R-squareds in social science research very commonly. I would have been happy with like a 30 percent and three stars next to it to be honest. So in that context like the strength of the effect for me was very important. Also, you know, like what you highlight here is exactly something that points to the misconception that you had, which is people tend to think of data as something that is a reflection of a world that is exogenous to it while in reality what you have here is a representation of languages and the languages cannot be separated from the medium in which they were expressed. So I'm pretty sure if I were to look at a different set of media, I would get a different network. And in that context I would be learning about the expressions that exist in that medium. So those are a couple of things. Now what you have is that you have a very interesting hypothesis because now you have the hypothesis, well, in the context of global fame and attention. Is this network actually something that matters to a point in which we maybe should try to do something about it? Does this network help explain other things that we never tried to explain in this network? 
For instance, you know, one of the things that we are looking at is like well, you know, if you do a gravity model of trade, you know, does this network help explain trade between two countries after controlling for distance and the size of the economies. And if the answer is yes, what would that tell you? Well, it would give validity to the [inaudible] theory of the economy in which social interactions are the ones that precondition or preestablish the network that is going to make a possible economic activity because these people have to speak to each other. >>: So for Twitter, does it [inaudible] Twitter, what does it mean? >> Cesar Hidalgo: Ah. So we look at all the usage that we were able to get our hands on, which is like a billion tweets. And then detect the language. So then if you tweeted let's say in English and then you tweeted in Chinese ->>: The same person doing both. >> Cesar Hidalgo: Exactly. Yeah. So the same person. >>: I see. >> Cesar Hidalgo: So I know, for example, like I contribute to this link because I tweet in English and I tweet in Spanish. >>: I see. >> Cesar Hidalgo: You know? But I don't tweet in other languages. You know? Exactly. So this is the same person tweet both. In Wikipedia is the same editor has to edit article in both languages. >>: So I think as a scientist this is very satisfying, making use of our [inaudible]. There's a lot of elemental surprise and there's a lot of developmental sort of explanation of the world, exploration as well, but as an engineer my question is [inaudible] how is this actionable. >> Cesar Hidalgo: Oh, yeah, yeah, like that will have to go back to like the other projects that are not about languages and their influence. But, for example, like this project like DataViva, which makes available a billion visualizations and it's actually in the context of a world in which most governments around the world have passed on laws that mandate to open up their data and they have IT teams that [inaudible]. 
So in that context what I'm trying to do is to develop a set of technologies in which we can deliver the data for them, you know, in the context of these large data visualization engines that make that information, you know, globally available. So in that context, for example, this data visualization is actionable. Another thing that we're doing is with EMC they're looking at, hey, shit, we sell to a lot of people our fucking back end, you know, now we're saying they can hook up any front end that they want. But it's kind of a pain. Can we like ship it together with our own front end. So, for example, we send them all the data of DataViva that is [inaudible] back end. Then we're going to go there, visit them for a week, create like a front end, you know, for them to show them that it is very quick and easy to do, and we can do that because all of the libraries that we use to create this we have created ourselves too. So this is -- we don't just create the [inaudible] but we create the libraries. And in that context, you know, that is kind of like something actionable for like EMC or for Oracle that would want to distribute that front end to have something better than phpMyAdmin if you're looking at your data. >>: [inaudible] I appreciate that. But then in going back to the previous one, I was hoping for an answer like, well, I don't know, we should encourage more translation [inaudible] or we should make, I don't know, the Skype Translator technology [inaudible] or we should make access to Twitter more easy for some population. Something like that. There is some discrepancy that you observe and you can perhaps -- >> Cesar Hidalgo: So in that context, for example, once I remember -- like I think Shahar was also in that conversation, there was these people at the middle [inaudible] hundred companies. So we get people from all different type of sectors. 
And one of these groups of people was people that were -- they build mostly like the router and Internet infrastructure for like Southeast Asia. And what they're always asking themselves the question is like, hey, where is traffic going to be in five years from now? Because traffic changes. And building infrastructure is not that easy. Takes a lot of time and effort. So what they were thinking is like, well, can we use this network to try to say, hey, these are two languages that are actually like kind of spoken so as the level of incomes are going to go up, the number of people that are going to be integrated into the communication technologies and want to communicate across, you know, is going to increase and therefore we should expect a larger increase, you know, let's say in bandwidth between Thailand and Indonesia -- let's say between Thailand and Malaysia, just to give you like a made-up example. So in that context you could try to look at those things as well. For me kind of like what is more interesting is actually that scientific experiment aspect in which you are learning about what are the things that determine how information grows and diffuses in our society and in our economy. And in that context this was like what I would say until now heretofore not very a popular factor that people were using to describe that, and it happens to be one that is actually quite strong in the diffusion at least of cultural information, and therefore should need to be considered as a main explanation, not as a cite collection. >>: Predictive context, which I appreciate a lot, actually, I really like the examples you gave about builds and capacities, network bandwidths and cables and things like that. Are they happy with the data that's inferred from Twitter or Wikipedia, or are they asking the same question as has been asked, which is, well, I don't care so much about Twitter, I care about the world. 
>> Cesar Hidalgo: Yeah, so in that context, you know, like I would say when you're working in a practical context, people are very understanding of like the constraints of implementation. So obviously if it would be possible to get better data and if that better data made a real difference, like companies there would like, I don't know, start surveying people, you know, and see which languages they speak or try to find a different way. In that context, I would say as a first approach at least to see if let's say this has legs, if looking at it ten years ago it would have made the right predictions, it's something that people would be very much willing to give a shot. Yes. >>: So this is all fascinating. You've shown almost like an evolution where the medium has slowly shifted our perception more and more and more towards a performer. And I'm wondering if you almost foresee the medium -- the performer now playing the role and selling the next medium basically in a way. Because the onus being more put on -- I mean, looking at language being a much stronger predictive factor and looking at the evolution towards more of a performance-based production basically, I'm wondering if almost our next evolution is more of a cultural one that is more enhanced by the medium -- sorry, the performer than the medium. >> Cesar Hidalgo: So I can see like how the medium and the performer interact in a different way now because I think we're going through like a new age of invention in which there's a lot of people that have become famous because of helping create a new medium. You can think of like Bill Gates or Steve Jobs or Mark Zuckerberg, all these people that have global fame, people that have contributed to the creation of like a new medium, whether it is a personal computer or social media or Twitter or whatever those things. 
>>: There's almost like a bias associated with that where the performer kind of reinforces the use of the medium or the growing of the medium and how much you're willing to forgive the medium if you want in a way and stick with it and allow it to evolve. I don't know if -- >> Cesar Hidalgo: You [inaudible]. >>: Different translations. >>: You had famous people by date of birth. >> Cesar Hidalgo: Yes. >>: Does it change much if you do it by date of death, especially in the more recent centuries? Seems like a more sensible thing to -- >> Cesar Hidalgo: So if you do date of death, what you find is that there's actually a much stronger bias in favor of like big developed countries as cultural centers. Because what happened is that people with talent, you know, are basically born everywhere but they don't die everywhere. You know? Like everybody that -- I think it would be very hard to find, for instance, someone in Wikipedia that has a presence in a number of languages and that had the same place of birth and place of death. You know? >>: [inaudible]. >> Cesar Hidalgo: Yeah. And the data that like we could look into -- although there are a lot of people that are still alive in our dataset, from Justin Bieber to George Bush. >>: You wait long enough. >> Cesar Hidalgo: Yeah. So what I wanted to show you is like a few other examples. I don't know if you guys want to look into that, some that look at cities. I don't know if any of you are interested in maps. Or some that look at e-mail. You know? So the first one is this example that involves e-mail, which I call e-mail is the revenge of the Internet. And this is my e-mail. So this is all of the e-mails that I received let's say between noon and like 2 p.m. when I guess I lost my Internet connection. And they pile up. And e-mail has a horrible interface design because basically what e-mail is designed to do is make you act urgently. So what is e-mail designed around? It's designed around messages, not people.
Each one of these is a message, not a person, or a message thread. And it's designed around time. What's the most important thing in e-mail? What's on top. So if e-mail was the newspaper, the headline would be the latest e-mail that came to your inbox. Now, in reality, obviously there are some contexts in which we want to push that sense of urgency, but we need technology to help us push a sense of reflection. So what we did is a design experiment in which we decided, let's turn e-mail a hundred percent around. Let's flip it on its toes. And how would e-mail look if we flip it on its toes? If we flip it on its toes, it would do two things. The first thing is we wouldn't center the e-mail on messages, we would center on people. Why? Because people think about people. I think of Andres, I don't think of like -- I don't remember the title of the e-mail that we're exchanging, but I know it was you. So that's what I'm going to search and I know who you connect me to and so forth. Now, we center it on people. And also let's not just put like a narrow window of time, let's look at the network that you're weaving over the long race, not over the short race. And that's Immersion. And anybody can try it. You guys can log in with your Gmail if you have Gmail. And basically what it does is now showing me the network that I have built over the last 10.6 years. Okay? I can like include more people if I want, you know -- >>: That's your personal e-mail? >> Cesar Hidalgo: Yeah, this is my personal e-mail. And people are connected if they have been cc'd in the same message. >>: So the connection means that you send e-mail to somebody else. >> Cesar Hidalgo: Yeah. So, for example, this is my mom and this is my sister. And my mom, my sister, and this is -- you know, since this is ten years, my girlfriend in the year 2004 are connected there. Because there were e-mails that involved the three of them.
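The person-centered network described above, in which two people are linked when they appear in the same message, can be sketched in a few lines of Python. This is a minimal illustration and not Immersion's actual code: the message dictionaries, the `build_network` function, and the choice to size each person by the number of messages they appear in are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

def build_network(messages, owner):
    """Build a person-centered network from a list of messages.

    Hypothetical helper: each message is a dict with "from", "to" and "cc".
    Two people are linked when they appear together in the same message;
    the inbox owner is left out, since the owner is in every message.
    """
    node_sizes = Counter()    # person -> number of messages they appear in
    edge_weights = Counter()  # (person, person) -> number of shared messages
    for msg in messages:
        people = ({msg["from"]} | set(msg["to"]) | set(msg["cc"])) - {owner}
        node_sizes.update(people)
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(people), 2):
            edge_weights[(a, b)] += 1
    return node_sizes, edge_weights

# Made-up messages, loosely following the family example in the talk.
messages = [
    {"from": "me", "to": ["mom", "sister"], "cc": []},
    {"from": "mom", "to": ["me"], "cc": ["sister", "girlfriend"]},
    {"from": "advisor", "to": ["me"], "cc": []},
]
sizes, edges = build_network(messages, owner="me")
```

In this toy data, the mom, the sister and the girlfriend end up mutually connected because one message involves all three, while the advisor has no links to anyone else and would sit at the border of the picture.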
And these were my friends in Chile, you know, and then here, then I start moving here and I go through like this is my advisor, the Ph.D. [inaudible] and then I go from there, and then this is when I went to Harvard, and then this is when I go to MIT, and this is the people at MIT and the red is kind of like my group. You know? Now, obviously, you know, this network changes a lot over time. So what I usually do, this is like what is active on my inbox like in the past month. So these are the balls that I'm like juggling or that I'm dropping. So like I have Maggie, my admin, you know, she's the one that keeps my universe from colliding. Here I have, for instance, this is people in my company and the clients that we have. Then here, you know, Kevin is one of my students. And here is like also having these collaborators from Colgate, Kevin is creating like a tool that allows you to create a DataViva in like 50 seconds. Okay? So and we're doing it with -- >>: The size indicates how many e-mails that you send to -- >> Cesar Hidalgo: Exactly. And here you have -- you know, like you have other groups. Nicky works on the project that I want to show you next and so forth. So it allows you to reflect and to see how these social interactions have been evolving and so forth. The original idea was to transform this into a full-fledged inbox, like a visual inbox. So then when I would like send e-mails, receive e-mails, I would know that the e-mail is coming from here or from there. Now an e-mail from here is more important than an e-mail from there. So now you're prioritizing based on the position that they're occupying in your social network, not on the time that they decided to press send. Which I thought would be a better way to try to weave your own network. But the students that did this, one now is working at Google and the other one is doing consulting by himself. So basically this product is just parked on a server at the Media Lab. You know?
But it's a nice project. >>: Some of the people who are in the periphery could be very important and you might want to respond quickly, right? >> Cesar Hidalgo: Yeah. They're like the new people. Or the mistresses. Yeah. No, it's true. Like sometimes you show this to people and they're like who's that one, who's that one. And then they blush. Because it's a person that you don't connect with anyone else, the mistress. That's why they show up in the border. But what I find is that if you have people in the border, if you don't connect them to your network, it's very hard to keep that link. So the ability for you to preserve things with other people also depends on your ability to embed the people in your social network. Because your friends provide a service of keeping their friends connected to you also. >>: [inaudible] tiny little dot [inaudible] but an e-mail from that person -- >> Cesar Hidalgo: Could be very important. Exactly. >>: [inaudible]. >> Cesar Hidalgo: No, no, like there you could have like some sort of like importance algorithm that is exogenous to the social network but maybe has information of the social network of everyone that is using e-mail. So in some sense I'm pretty sure that if you were to mine, you know, like e-mail data, you should discover that [inaudible] is kind of like an important guy in the network. You know? And you could use that. So it would go beyond your own inbox, the data you wanted to use. >>: But also there's some -- has something to do with whether the e-mail is -- you know, [inaudible]. >> Cesar Hidalgo: Oh, yeah, yeah. >>: For about two years out. >> Cesar Hidalgo: But that's something that, for example, is already done very well with the priority inbox of Google and everything. >>: I see. >> Cesar Hidalgo: Yeah. So like mailing lists, they tend to be like lower priority. And those are easy to [inaudible] because mailing lists are places that everybody like basically sends e-mail to but they never go in the other direction.
And in that context I think we're out of time. But what I wanted to show you is just kind of like this side project in which we have been looking at urban perception and we have created like computer vision and machine learning algorithms to create very high resolution maps, you know, of urban perception. So we have collected around 1.2 million preferences in this Place Pulse Web site that allows us to determine which place looks safer, livelier, more depressing, et cetera. And then we take those preferences and we use them to train machine learning algorithms that then we can use to generate, you know, maps that are very, very, very high resolution. So this map of New York has 300,000 points, but it's based on 2,000 images from New York because you wouldn't be able to crowdsource 300,000 points for multiple comparisons. I never was able to get that traffic. And what we're doing now is that now that we have the technology to create these maps that tell us which places look good or bad, we are looking at which places are changing positively and changing negatively. And that is something that is very interesting because actually like we're starting to look at the dynamics of the city. So these are places, for example, these are the before-after pictures, 2007, 2014. And these are places that have been highly improved in Williamsburg. And we're starting to see, well, now that we can detect the places that change, you know, how much of our, you know, like exclusion of minorities is happening in those places because we have a measure of gentrification that we can use to actually look at which are the places that had new construction. We can see what predicts whether this place or that place is going to be the next one to get improved, is it the proximity to other things, is it the demographic component that they have.
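The talk does not say how the pairwise Place Pulse preferences are turned into a score per image before training. One simple possibility, shown here purely as an assumption, is a win ratio: the fraction of comparisons in which an image was picked. The function name `win_ratio_scores` and the image identifiers are made up for this sketch.

```python
from collections import defaultdict

def win_ratio_scores(comparisons):
    """Score each image by the share of its comparisons that it won.

    Hypothetical helper: `comparisons` is a list of (winner, loser) pairs,
    one per crowdsourced answer to a question such as "which place looks
    safer?". This is an assumed scoring rule, not the project's method.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    return {img: wins[img] / appearances[img] for img in appearances}

# Made-up preferences between three images.
comparisons = [
    ("img_a", "img_b"),
    ("img_a", "img_c"),
    ("img_b", "img_c"),
    ("img_c", "img_a"),
]
scores = win_ratio_scores(comparisons)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Scores of this kind for a few thousand crowdsourced images could then serve as labels for a model trained on image features, which is what would let a map be extended to many more points than were ever compared by people.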
Can we actually now that we have a measure of urban change start predicting urban change and informing what are the things that are causing the urban environment to change, is it private investment or is it public investment? Is it that people basically got allowed building permits and they just put their own money and they build shit themselves, or is it that the government decided to like clean up all the streets, made them nice into parks and then eventually the buildings came later. These are questions that are hard to answer right now because you don't have good measures of urban change, but we hope that with this technology you're going to start creating maps of urban change that can be refreshed with a relatively high frequency because this is just computer vision and machine learning that you can keep on cranking and turning and improving. And in that context, we can start asking questions of what causes urban change and what are the effects of a changing urban environment. And with that in mind, I would like to finish up. I'll just put a plug for my book, it's called Why Information Grows. It has zero to do with anything that I talked about today. It's actually what I like doing. It's about the evolution of physical order from atoms to economies. So I start the book by describing, you know, [inaudible] statistical physics, then I go into [inaudible] statistical physics. And from there, you know, I contrast that with the information theory of Shannon. I explain why there are some important differences in there and why eventually information is related primarily to physical order. And then I describe the mechanisms that explain the origin of physical order in the universe, and I show how those mechanisms are reembodied in society and economy to ultimately conclude that the growth of economies is nothing other than an epiphenomenon of the growth of information in the world. Thank you. [applause]. >> Andres Monroy: We have some time for some questions.
>> Cesar Hidalgo: We had questions during the talk, too, but if anybody has anything else. >>: So you don't talk about exactly how [inaudible]. >> Cesar Hidalgo: What did I use to explain? >>: Just a standard technique for [inaudible] like they don't look cluttered, everybody can see -- >> Cesar Hidalgo: Yeah, yeah. So like there's two things. So, for example, this site is built custom on D3, which is a JavaScript visualization library. But what we've done, you know, is that for all of these sites, for example, like DataViva, when you're making a billion charts, you don't want to use D3 to create each one of those charts. You know, you're going to go crazy. So we created a library ourselves that is called D3plus. You know? And the D3plus library, what it does is that like it provides you kind of like cookie-cutter, well-designed D3 visualizations that you can incorporate with like one line of code. So let's say you want to create like kind of like this nice pie chart that has mouseover and makes the sizes of the font proportionate to the size of the [inaudible]. Just have to do that code and that's it. You know? So that allows us to then scale and create these more ambitious online projects that create lots of visualizations because we have kind of like that level building block that we can put, we have to figure out what the query is, what I'm going to show, connect those two, and then we can create these like large, you know, informative profiles, like the ones that we have here for industries or occupations or for -- yeah. Yeah. So that's what we use. Okay. Yeah. Okay? >> Andres Monroy: Thank you very much. >> Cesar Hidalgo: Thank you.
