Cesar A. Hidalgo, MIT: Measurements for many: A brief history on the coevolution of data collection and statecraft
Andy Lippman: Cesar Hidalgo will be addressing us from Cambridge. Cesar’s background is economics,
specifically global economic analysis.
Cesar Hidalgo: My apologies for not being there, but I am glad to be invited to speak.
When I thought about what I could contribute on communications, I decided to talk about how data
collection has evolved, its effect on statecraft, and the many implications of its analysis.
Policies are about actions: how to modify the world.
I’ll start by asking you a question.
Consider fire. It’s very hot. If you touch a hot pan, will you burn your hand? Probably. You will move your
hand right away. You do not go through a deliberation process. You just pull your hand away. The problem is
easy.
Let’s complicate it a bit. Increase the scale.
Soccer: The ball is being crossed for a shot on goal. This situation is difficult for the defending team. What
is their response? They do not share a nervous system. They each have a separate system. How do you
combine those networks? It is dealt with by a simplification of the actions that could be taken. The coach
teaches players ahead of time how to respond. Forget about complicating emotions and other extraneous
info and concentrate on the simple task at hand.
Deal with the world by simplifying into scenarios and predefined responses.
Define a language to describe the system. Make it legible. Define methods so players can measure
possible responses.
Let’s now consider what was called, in Saxon and Prussian history, cameral science, which was essentially
an effort to reduce the fiscal management of a kingdom to simple scientific principles.
Take a forest of a certain size. It takes 20 years to produce. One way to distribute the product of the forest
was for each villager to take a certain amount of wood per year, spreading the take evenly across the 20-year
period. But this was not completely satisfactory. Some people had a greater need for wood than
others. They needed more. So, they classified trees by ages and used a bidding system to distribute the
wood. This system also had inherent flaws. There were a lot of things in the forest that were not so
important to the villagers: brush, squirrels, etc. The demands were for certain types of wood—for ships
and other types of building. They needed trees that grew fast. Why wait for the forest to regenerate?
They could simply plant trees as needed.
After they first did this, the next 50 years were very good. They obtained the highest yields ever. But after
those first 50 years, yields decreased markedly. What happened was that removing the “unwanted” flora
and fauna that made the forest work had severe long-term consequences. The way we act on the
world has consequences.
As an attempt to improve the resource distribution system they developed the idea of creating land
property rights. Villages then were different from villages now. Land around the village was used for
cultivation. Shared narrow strips of land were collectively owned and cooperatively cultivated. Groups
could trade land to meet their needs. This system works okay for single small villages, but as thousands of
villages developed they had to come up with a better system. They developed maps showing tracts of
land divided into parcels. People were provided land tenure and were taxed by the size of the parcels.
Making Society Legible:
There were also taxes on property. But, there was a measurement problem. They had to measure the size
of each house. Property owners were not pleased to let the assessors in for taking measurements to
determine their taxes. To adapt the system to this reality they decided that they could judge the size of a
house by the number of windows and doors. Bigger houses had more of them, so they taxed each window
and door. These could be counted from outside, which was not only less intrusive but also faster.
The long-term result? People started building with fewer windows and doors.
Two modern yet popular simplifications that are highly distortive:
(1) Foods measured by calories and
(2) The relative wealth of nations.
Other data are left out, such as the source of the food (some are animal products, others are plant
products) and how the nutritional value varies. Over millions of years, norms have developed for measuring
along ONE dimension. But often this is insufficient for our needs: carrots and M&Ms affect the body
very differently. We need to provide food for children in school. In determining how to do this, one
constraint is the budget; another consideration is the nutritional content. This requires measuring each
different plant, each different food item. When the measurement used is calories, then, due to budget
constraints, higher-calorie foods, which are not necessarily healthy or nutritious, are used.
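The school-food reasoning above can be made concrete with a minimal Python sketch. Everything in it is an assumption for illustration and does not come from the talk: the menu items, the prices, the calorie counts, the 0-10 "nutrition score", and the helper name best_per_dollar are all hypothetical. It only shows how a buyer who optimizes one metric per dollar ends up with a different choice than a buyer who optimizes another.

```python
# Hypothetical menu, invented for illustration only:
# (name, cost per serving in dollars, calories, nutrition score from 0 to 10)
FOODS = [
    ("candy", 0.50, 250, 1),
    ("carrots", 0.50, 50, 8),
    ("bread", 0.40, 160, 4),
]


def best_per_dollar(foods, metric_index):
    """Return the name of the food with the highest metric per dollar.

    metric_index selects which column of the tuple is treated as "value":
    2 for calories, 3 for the hypothetical nutrition score.
    """
    return max(foods, key=lambda food: food[metric_index] / food[1])[0]


# A budget-constrained buyer who measures food only by calories...
by_calories = best_per_dollar(FOODS, 2)
# ...versus one who measures it by the (hypothetical) nutrition score.
by_nutrition = best_per_dollar(FOODS, 3)

print("Best calories per dollar:", by_calories)
print("Best nutrition per dollar:", by_nutrition)
```

With these made-up numbers the calorie measure favors candy while the nutrition measure favors carrots, which is the distortion the paragraph describes.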
Making a complex system more legible is great. It allows communication between its parts and it allows
action at a distance.
The relative wealth of nations.
What makes countries rich or poor?
Three popular answers are: (1) education, (2) governance and (3) competitiveness. (See chart for details)
Think about ancient Greek science. Why come up with 12 pillars for competitiveness? Why 12? Look at the
Epicycloid Model. (See illustration)
The Atlas of Economic Complexity
Instead of trying to define which inputs matter, we look at the outcomes.
Which countries are productive? Which countries have put resources to productive use, connecting the
structure of the observable network with the structure of its primitive components?
If a country makes jet engines it has to have all the parts to build them.
How do you go from that model of the world to terms that are usable in the network
language a country has? The diversity of products a country can produce depends on the
complexity of its set of capabilities. A country with a diversity of complex capabilities can produce more
products than a country that has a limited set.
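One way to picture the capability argument is the small Python sketch below. It is a toy model under stated assumptions, not the method used in the Atlas of Economic Complexity: the capability names, the product requirements, the countries "A" and "B", and the function producible are all hypothetical. It encodes only the idea stated above, that a country can make a product when it has every capability the product needs.

```python
# Hypothetical requirements: each product needs ALL of its listed capabilities.
PRODUCT_REQS = {
    "jet engine": {"metallurgy", "precision machining", "aerodynamics"},
    "bicycle": {"metallurgy", "precision machining"},
    "cut lumber": {"forestry"},
}

# Hypothetical countries and the capabilities each one has.
COUNTRY_CAPS = {
    "A": {"metallurgy", "precision machining", "aerodynamics", "forestry"},
    "B": {"forestry", "metallurgy"},
}


def producible(country_caps, product_reqs):
    """Return, sorted by name, the products whose requirements are all met."""
    return sorted(
        product
        for product, required in product_reqs.items()
        if required <= country_caps  # subset test: nothing required is missing
    )


# Diversity here is simply the number of products a country can make.
diversity = {
    country: len(producible(caps, PRODUCT_REQS))
    for country, caps in COUNTRY_CAPS.items()
}

for country, caps in COUNTRY_CAPS.items():
    print(country, producible(caps, PRODUCT_REQS), diversity[country])
```

In this toy setup, country A has the larger capability set and can make all three products, while country B can make only one. Missing a single required capability is enough to rule a product out.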
Top 5 products by complexity. (See chart)
Bottom 5 products by complexity. (See chart)
Complexity by country. (See chart)
Why does this matter? It allows the prediction of the diversity of products possible in the future and
allows us to predict the need for education, governance and competitiveness.
Conclusion:
Remarks I want to make: States and large organizations need to act across complex systems. The
management of these systems requires statistical analysis of large volumes of data that are
comprehensible to us only after grotesque simplifications.
These simplifications are highly distortive and tend to gear systems toward outcomes that are more
compatible with the measures themselves than with the outcomes that inspired the measures in the first place.
Science can blind us, but it can also set us free. One goal of communications is to generate languages that
are adequate to describe resources needed to reach the desired end result.
Again, sorry I was not able to be there in person. Thank you for your time.
QUESTIONS
1. [Questioner from Cisco]: In discussions about measurements of GDP, are non-economic metrics
like happiness appropriate as measures of success?
Absolutely. During the Great Depression they needed some sort of measure of public confidence. How is
it that the US has, over the past 50 years, had a decline in manufacturing? There were measures that
happened in the US that did not happen in other countries. Can attitudes/emotions be important to this
outcome? Collective emotion can enter into the formula. We need better measures to take this into
account.
2. Wondering if you have thoughts on generalizing from large amounts of data. It can be biased. How do
you analyze this large amount of data to make measurements unbiased?
Large data sets are very complex to analyze. They push the boundaries of statistics. In data sets with
millions of points of data, is it all statistically relevant? We have to develop models that are statistically
relevant to the data set. When you work with networks you want to have a model and deviations. Look at
a communications network. Take two that have the same number of connections but somehow the
outcome is different. We can end up with hypotheses that seem logical but do not fit the data.
3. How would this apply practically? To companies? To markets?
I had a conversation with ____________ at the Media Lab. The question, from the perspective of a country
(Turkey), was what to do about this particular problem, but it can also be applied to companies. We can use
appropriate data to make informed decisions. In this work, the number of decisions a company has to
make is mind-boggling.
With 1,000 products and 1,000 variables, there are a million decisions to make.
Switzerland makes chocolate, aircraft... diverse products. Models can predict the possibility of success and
help make those decisions.