
Reality Hedging: Social System Approach for Understanding Economic and Financial Dynamics

by

Wei Pan


B.Eng., Tsinghua University (2007)

Submitted to the Program in Media Arts and Sciences, School of Architecture and Planning, in partial fulfillment of the requirements for the degree of

Doctor of Philosophy


at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

February 2015

© Massachusetts Institute of Technology 2015. All rights reserved.

Signature redacted

Author

Program in Media Arts and Sciences

School of Architecture and Planning

September 15th, 2014

Signature redacted

Certified by

Prof. Alex (Sandy) Pentland

Toshiba Professor of Media Arts and Sciences

Thesis Supervisor

Accepted by

Prof. Pattie Maes

Interim Academic Head

Program in Media Arts and Sciences

Reality Hedging: Social System Approach for Understanding Economic and Financial Dynamics

by

Wei Pan

Submitted to the Program in Media Arts and Sciences, School of Architecture and Planning, on September 15th, 2014, in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

Abstract

This dissertation's main contribution is a new methodology, Reality Hedging, which uses big-data-driven approaches and tools from Computational Social Science to understand, monitor and design economic and financial systems. The central idea of this approach is to treat economic and financial systems as systems of connected people.

We are entering a new age in which many aspects of our lives are digitized thanks to social media and smartphones. As many research areas use these datasets to establish new behavioral and social theories about our society (e.g., Reality Mining and Social Network Analysis), it is natural to ask whether and how we can use these advances to build better economic and financial structures and institutions, especially after the recent financial crisis. After all, economic systems are systems of people, not systems of atoms that always follow the same physical principles and mechanisms. I collected and analyzed several large economic and financial social systems, from the individual level to the city level, and examined many connections between financial dynamics and social dynamics. I also focus on results and findings that can be used to build resilient and productive economic systems and to hedge out risks arising from social connectivity.

In this thesis, I will discuss my research efforts in collecting valuable large-scale behavioral data using smartphones. I will show that such datasets are helpful in inferring individual financial status. I will then expand individual observations into new models for understanding the innovation economics of cities. Finally, I will elaborate on the role of idea flow in behavior change by studying an online trading platform that allows traders to discuss and share trades with each other.

Thesis Supervisor: Prof. Alex (Sandy) Pentland

Title: Toshiba Professor of Media Arts and Sciences


Doctoral Committee:

Thesis Supervisor:

Alex (Sandy) Pentland

Toshiba Professor of Media Arts and Sciences

Massachusetts Institute of Technology

Thesis Reader:

Andrew W. Lo

Harris & Harris Group Professor of Finance

Massachusetts Institute of Technology

Thesis Reader:

Signature redacted

Michael W. Macy

Goldwin Smith Professor of Arts and Sciences

Cornell University

Acknowledgments

I spent five wonderful years in my advisor Sandy Pentland's Human Dynamics Group, and I am truly grateful to Professor Pentland for taking me in and providing me with the best research training. You taught me to think big and to do great research. If there is any merit in my work, it is because of your guidance.

I want to thank my thesis committee members, Professor Andrew Lo and Professor Michael Macy, for your support and advice. Your works are among those that have inspired me the most. It is such an honor to be able to work with the heroes you admire, and I happen to be such a lucky person.

I want to thank everyone in the Human Dynamics Group for all your support, suggestions and encouragement, especially Dr. Erez Shmueli for preparing the eToro data for this thesis. I am especially fortunate to have received guidance from some of the most brilliant minds I know, who happened to spend a couple of years in the Human Dynamics Group and to share an office with me: Dr. Manuel Cebrian and Dr. Yaniv Altshuler.

I want to thank my wife, Zhe An, for your endless support over the last seven years. You followed me wherever I went, and you supported me in whatever I pursued. Seven years ago we arrived in the US by ourselves, and we made it. And Emma, my dear daughter, please forgive Dad for not spending enough time with you.


Contents

1 Introduction 17

2 Big Data: New Measurements for Social Systems 23

2.1 Overview 23

2.2 Big Data from the Internet 25

2.2.1 Review of Online Data 25

2.2.2 Online Social Networks for Trading 26

2.3 Big Data from Mobile and Wearable Devices 29

2.3.1 Existing Studies Based on Mobile Phones 29

2.3.2 The Friends and Family Study 31

2.3.3 Results from the Friends and Family Study 34

2.4 From Individual Financial Behavior to Policy Making: The Big Picture 40

3 Idea Flow: Social Interactions and Urban Economic Development 43

3.1 Introduction 43

3.2 The Superlinear Growth of Cities 44

3.3 Idea Flow and Economic Development 45

3.3.1 The Social Tie Density Model 46

3.3.2 From Social Ties to Idea Flows 48

3.3.3 Empirical Evidences 50

3.3.4 Limitations on Social Tie Density 52

3.4 Conclusion: Data-driven Economic Measurements 55

4 Social Trading: A Microscopic Look Into Idea Flow 57

4.1 Introduction 57

4.1.1 Financial Systems As Social Systems 57

4.1.2 Relevant Work 59

4.2 eToro: The Social Trading Platform 60

4.3 Idea Flow for Optimal Trading 61

4.3.1 Idea Flow and Trading Performance 61

4.3.2 Social Dilemma in Structuring Idea Flows 65

4.3.3 Discussion 68

4.4 Idea Flow and Idea Overflow 72

4.4.1 Social Influence in Copy Trades 72

4.4.2 Excessive Copying and Idea Overflow 73

4.4.3 eToro Idea Overflow to Market Idea Overflow 77

4.4.4 Conclusion: Bubbles and Overflow 79

4.5 Discussion 82

5 Conclusion 83

5.0.1 Future Directions 83

5.0.2 Caveat 84

5.0.3 Privacy Concern 85

A Red Balloons: Using Social Media Data for Monitoring a Large-scale Social Incentive Task 87

A.1 Introduction 87

A.1.1 Social Incentive 87

A.1.2 Large-scale Social Incentive 89

A.2 Background 89

A.2.1 The Challenge 89

A.2.2 Our Winning Strategy 90

A.2.3 Academic Significance of Our Winning 92

A.3 Using Tweets to Analyze the MIT Incentive Structure 93

A.3.1 Comparison with Other Strategies 94

A.3.2 Understanding the "MIT Brand" Effects in Our Strategy 96

A.4 Conclusion 97

B Tables 99

List of Figures

2-1 Screenshots of the eToro social trading platform: (a) the general landing page, showing current trades by other users and top-ranked traders (users can click any trade to copy it); (b) a public profile page for an eToro user (images and names removed), which contains his current trades, messages and, most importantly, the number of followers mirroring his trades. 28

2-2 The evolution of wearable human behavior sensing systems from 1997 to 2007. On the left is the original MIT wearable computing group's jacket hardware [51] from the late 1990s; to the right are the smart cell phones and special badge-like sensors that eventually became widely available and small enough to be used by billions of people every day. 30

2-3 High-level timeline for the Friends and Family study. 32

2-4 Sample screenshots: sync-state and version display (left), survey (center), and probe preferences debug screen (right). 34

2-5 We show here the mean Bluetooth interaction diversity $D_{call}(i)$ and its standard error for individuals in different income categories. The top plot is based on previous household income, and the bottom plot is based on current household income. There is a borderline positive correlation between current coarse household income and call diversity ($r = 0.32$, $p < 0.10$), and the correlation is much stronger among native English speakers in the participant pool ($r = 0.53$, $p < 0.06$). However, there is no correlation between previous estimated household income and face-to-face interaction diversity ($r = -0.28$, $p > 0.60$). 36

2-6 We show here the mean call diversity $D_{call}(i)$ and its standard error for individuals in different income categories. The top plot is based on reported previous household income, and the bottom plot is based on reported current household income. There is a positive correlation between current coarse household income and call diversity ($r = 0.28$, $p < 0.08$). However, there is no correlation between previous estimated household income and call diversity ($r = 0.003$, $p > 0.80$). 37

2-7 We demonstrate the prediction performance using each single network. For comparison, we also show the result of random guessing and the result of our approach, which combines all potential evidence. 40

3-1 Overall time of calls between residents of a county as a function of its population. The points refer to the data (adapted from Calabrese et al. [33], computed from ten million users' mobile phone call records within the US during July 2010), while the solid line is the theoretical prediction from the model in Eq. (3.6) adapted to raw population. The model captures both the super-linear growth and the tilts at both ends of the curve, while providing a superior fit to the data (based on adjusted $R^2$) compared to a pure power-law relation (dashed curve). 49

3-2 The spreading rate as a function of density for two different contagion models. (a) The mean spreading rate as a function of density $\rho$. The points correspond to $n = 30$ realizations of simulations of the SI model on a $200 \times 200$ grid. The dashed line corresponds to a fit of the form $R(\rho) \sim \rho^{1+\alpha}$ with $\alpha = 0.18$. The solid line is a fit to the social-tie density model. (b) The mean spreading rate as a function of $\rho$ under the complex contagion diffusion model, based on $n = 30$ realizations of simulations. The dashed line corresponds to the power-law fit of the form $R(\rho) \sim \rho^{1+\alpha}$ with $\alpha = 0.17$. Once again, the solid line is the fit to the model described in Chapter 3. In both cases, the social-tie density model provides a better fit than a simple power law, with much lower mean-square errors (29% and 41% lower, respectively). 51

3-3 Spreading rate of HIV as a function of density in United States Metropolitan Statistical Areas: the relationship between density and the AIDS/HIV spreading rate for the 90 metropolitan statistical areas, from recent CDC and US Census surveys. The model captures the qualitative trends in the data. 53

3-4 Correlation between GDP and population, and between GDP and population density, for all 247 NUTS 2 regions in the European Union. Left panel: the correlation between density and GDP, suggesting a strong correlation with a super-linear functional form, as predicted by the model. A pure power-law fit to the data is also shown for illustrative purposes. Right panel: the correlation between population and GDP, this time showing a sub-linear functional form. However, the poor $R^2$ value suggests that raw population does not correlate with GDP growth in cities as well as density does. 53

4-1 The mean ROI for all trades of the three social types, based on earlier data from 2010 to 2012. The returns are significantly different from each other (ANOVA, $p < 10^{-10}$), and mirror trades generate a significant positive return (t-test, $p < 0.005$). 63

4-2 The mean ROI for all trades of the three social types, based on all recent data from three instruments: EUR/USD, OIL and SPX500. For all three instruments, the return of individual trades differs from that of copy trades (K-S test, $p < 0.001$) and from that of mirror trades ($p < 0.00001$). The returns of copy trades and mirror trades also differ ($p < 0.00001$). 64

4-3 The return of the three strategies over the trading period covered by the data. We note that the Simple Best Short-Term Return strategy actually generates the largest return. 68

4-4 The return of the three strategies over the trading period covered by the data. The best strategy is to look for individuals with the best long-term risk-adjusted performance. Therefore, the long-term Sharpe ratio seems to be the best metric for valuing a trader. 69

4-5 The distribution of the number of followers per trader shows a strong power-law pattern. 70

4-6 The single market perspective versus the crowd market perspective on all trading days for all EUR/USD trades. The x-axis of each of the three plots is the percentage of buy orders among all individual trades, also referred to as the Single Market Perspective. The y-axis is the percentage of buy orders among all copy trades (referred to as the Crowd Market Perspective). 74

4-7 The single market perspective versus the crowd market perspective on all trading days for three different asset classes: EUR/USD, OIL and SPX500. The x-axis of each of the three plots is the percentage of buy orders among all individual trades, also referred to as the Single Market Perspective. The y-axis is the percentage of buy orders among all copy trades (referred to as the Crowd Market Perspective). 75

4-8 The correlation coefficient between $C_d$ and $t$ for different window sizes, shifted from -3 days to 3 days. The largest correlation occurs when the market trend is taken one day later, with averaging windows of 7 and 14 days. 78

4-9 We plot the crowd market perspective together with the one-day-behind real market trend, smoothed by a one-week averaging window. We notice that the crowd market perspective is strongly negatively correlated with the next-day market price. 78

4-10 We show the Sharpe ratio of our strategy for different values of the threshold and of the holding days. The strategy shows a potential return from betting on market mean reversion one day after eToro traders show a sudden, significantly large copy-trading tendency. 80

4-11 We show the Sharpe ratio of our mean reversion strategy, based on eToro traders' social tendency on the GOLD instrument, for different values of the threshold and of the holding days. The strategy shows a potential return from betting on market mean reversion three days after eToro traders show a sudden, significantly large copy-trading tendency. 81

A-1 A demonstration of the three mechanisms we use to increase physical activity. Left: Control, where subjects are paid by their own performance. Middle: "Peer-View", where subjects see their peers' performance but are paid by their own performance. Right: "Peer-Reward", where subjects are paid by their peers' performance. 88

A-2 Recursive incentive mechanism: (a) Suppose that in this network, agent $a_1$ recruits all of his neighbors, namely $a_2$, $a_5$ and $a_8$. Suppose that $a_8$ recruits $a_6$, who finds balloon $b_1$. (b) We have a winning sequence $S(b_1) = (a_1, a_8, a_6)$ with $|S(b_1)| = 3$. The finder receives $p_6 = \frac{4{,}000}{2^{(3-3+1)}} = 2{,}000$. Since $a_8$ recruited $a_6$, $p_8 = \frac{4{,}000}{2^{(3-2+1)}} = 1{,}000$, and $a_1$ receives $\frac{4{,}000}{2^{(3-1+1)}} = 500$. Likewise, looking at the left recruitment path, we have a winning sequence $S(b_2) = (a_1, a_2, a_3, a_4)$ with $|S(b_2)| = 4$. The finder receives $p_4 = \frac{4{,}000}{2^{(4-4+1)}} = 2{,}000$. As above, we have $p_3 = \frac{4{,}000}{2^{(4-3+1)}} = 1{,}000$ and $p_2 = \frac{4{,}000}{2^{(4-2+1)}} = 500$. From this sequence, $a_1$ receives $\frac{4{,}000}{2^{(4-1+1)}} = 250$. Adding up its payments from the two sequences it initiated, $a_1$ receives a total payment of $p_1 = 750$. Assuming there are only two tasks, the surplus in this case is $S = (4{,}000 - 3{,}500) + (4{,}000 - 3{,}750) = 750$. 92

A-3 Raw tweet counts for five teams from the announcement of the challenge to the announcement of the winner. The time series starts at the announcement of the challenge and ends at the announcement of the winner. The dotted line marks the time at which the balloons were launched. Note that the MIT team launched its website and mechanism only 2 days before the balloon launch. 95

A-4 Tweet count over time for the MIT red balloon team versus another MIT-related event during the same month, on December 15. For easy comparison, we shifted the data temporally by matching the day when the MIT team launched the online campaign with the day when the MIT bike news was released: (a) daily Twitter counts for both events (data are scaled so that both peaks have the same value); (b) raw daily increase in Twitter counts. The vertical blue dashed line indicates the day of the DARPA Challenge competition. 97
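The payment arithmetic in the Figure A-2 caption follows a simple halving rule. The Python sketch below reproduces it, assuming (as the caption's numbers imply) that the i-th agent of a winning sequence of length n, counted from the root recruiter, receives 4,000/2^(n-i+1). The function names and the tuple encoding of recruitment paths are hypothetical choices for this illustration, not code from the thesis.

```python
# Minimal sketch of the recursive incentive payments in the Figure A-2 caption.
# Assumption: the i-th agent (1-indexed, root first, finder last) in a winning
# sequence of length n receives reward / 2**(n - i + 1): the finder gets half,
# its recruiter a quarter, and so on. Names here are hypothetical.
from collections import defaultdict

REWARD_PER_BALLOON = 4_000


def sequence_payments(sequence, reward=REWARD_PER_BALLOON):
    """Map each agent in one winning sequence to its payment."""
    n = len(sequence)
    return {agent: reward / 2 ** (n - i + 1) for i, agent in enumerate(sequence, start=1)}


def total_payments(sequences, reward=REWARD_PER_BALLOON):
    """Sum payments over all winning sequences and compute the unpaid surplus."""
    totals = defaultdict(float)
    surplus = 0.0
    for seq in sequences:
        payments = sequence_payments(seq, reward)
        for agent, amount in payments.items():
            totals[agent] += amount
        # Whatever is not paid out along this sequence stays with the organizer.
        surplus += reward - sum(payments.values())
    return dict(totals), surplus


# The two winning sequences from the caption example.
sequences = [("a1", "a8", "a6"), ("a1", "a2", "a3", "a4")]
totals, surplus = total_payments(sequences)
print(totals)
print(surplus)
```

Running it gives $a_1$ a total of 750 and leaves a surplus of 750, as in the caption. Because the payments along a sequence of length n sum to 4,000 × (1 - 2^-n), the rule never pays out more than the reward for a single balloon.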


List of Tables

2.1 Network data used in this study. 39

3.1 Growth factors β for some urban economic factors. 45

4.1 Number of trades in my dataset categorized by trading type. 61

4.2 Number of trades in my dataset categorized by instrument (top 15 shown here). 61

4.3 The position size of all trades. 62

4.4 Linear regression results of the two regressions in Eq. 4.2 and Eq. 4.4. The first column is for newly added users, and the second column is for users who newly left. N/A indicates that this factor is not in the regression model. (*: p < 0.05, **: p < 0.01, ***: p < 0.001) 66

4.5 Linear regression results for different potential factors that might explain the social tendency level for copy trading on both EUR/USD and GOLD. (*: p < 0.05, **: p < 0.01, ***: p < 0.001) 76

4.6 In-sample and out-of-sample testing for my social mean reversion strategy. 79

B.1 Linear regression results of coefficients for model variables of regression Eq. 4.4 with _ included. (*: p < 0.05, **: p < 0.01, ***: p < 0.001) 99

Chapter 1

Introduction

You are so complex that you don't always respond to danger.

Jenny Holzer

One of the biggest disruptions of the last decade is the availability of data, primarily from the Internet and wearable devices. This trend has brought enormous opportunities for understanding societies. These data have helped to establish a new field, "Computational Social Science" [89], which aims to understand societies better. To name two examples: researchers can now analyze data on millions of individuals from Twitter and Facebook [29], and the mobility of millions of people has been widely studied [72], which has already transformed the transportation industry.

Among many new initiatives, Reality Mining was developed as one of the cornerstones of this new field [59]. Reality Mining is the idea of using real, individual-level, passively measured mobile sensing data on social interactions to re-examine fundamental social science concepts, from the notion of friendship [60] to the dynamics of influence [114]. These projects have not only shown the limitations and inaccuracy of traditional economic and social science methods, such as surveys of small samples, but have also in many cases improved social theories. One of the key contributions of Reality Mining is to show that survey data on social ties is vastly different from measurements from mobile phones [60] [114]. Reality data thus opens a new door to understanding how societies work.

Naturally, one may ask whether such large-scale reality datasets can also be used for economic and financial research.

The Complexity in Economic and Financial Studies

The 2008 crisis inspired many researchers to re-examine classical economic thinking and theories [99]. One of the main shifts among researchers and practitioners in economics is a broad recognition of the complex nature of economic systems [143]. The notion of systemic risk [45], the risk arising from the interconnectivity of institutions, has attracted a lot of attention since the 2008 crisis, in which a few failing banks almost took down the whole financial network [1] [75] [79].

While the interconnectivity of banking and financial institutions created enormous stability problems in financial markets, we also need to recognize that the finest elements of any economic and financial system, the individuals, are also interconnected by social ties. These connections naturally change individuals' economic and financial behavior and development. For instance, how does a person form an opinion about his or her confidence in the economy? How does a person become more productive in creating goods and innovations? These two questions are at the foundation of macroeconomics: the former determines consumption, and the latter determines production. There is solid evidence that social interactions and human connectivity in modern society play a huge role in production and consumption [64] [150]. Even government decision making is a social process. One example is the US Federal Open Market Committee (FOMC), the Federal Reserve body that steers the economy by setting monetary policy through the money supply and interest rates. As its meeting minutes show, no single decision rests solely on a fixed set of models and theories, and the same economic data can produce vastly different opinions because of social influence [103].

Reality Hedging: A New Perspective for Economic and Financial Systems

Considering the complex, socially interactive nature of economic systems described above, this dissertation centers on using computational social science methodologies to study certain financial and economic problems. I call this approach "Reality Hedging", meaning that I use knowledge from massive, passively sensed reality data from phones and the Internet to build economic and financial strategies that avoid (hedge out) certain problems and risks in our modern economic lives.

I argue that this proposed approach, "Reality Hedging", can provide additional benefits in the following aspects of economic and financial studies.

The first contribution is to provide better measurements of our societies. We have built sophisticated machines, such as the Large Hadron Collider [53], to study physical systems down to every detected particle. Economists, on the other hand, rely on surveys of very small samples to understand the status and trends of the economy.

The current mainstream datasets are abstract and heavily aggregated. For instance, the US monthly consumer confidence survey calls only 500 households [95], which is simply too small a sample for a population of more than 300 million. Surveys are difficult and expensive to conduct, so the information collected is high-level and limited.

Such surveys cannot reveal the distribution of, or the underlying reasons for, changes in confidence: Is it because rising stock prices increase the value of individuals' retirement savings [81], or is it influenced by the outstanding earnings of their employers? If the government announces a new policy, it takes a month for economists to gauge the policy's impact because of the release cycle of these surveys. It would be helpful if economists could see responses the next day. Passively measured big data, by contrast, can provide much broader insights about individuals without the burden of conducting surveys.

Surveys are also generally poor instruments for social studies: people are often unable to recall and describe their social connections accurately [57]. Moreover, to understand a person's behavior under social interactions, researchers must sample and survey enough of that person's social ties as well. Therefore, surveying a subset of the population that both represents the population and is sufficient for analyzing every subject's social influence is a very challenging task [22].

In contrast, large-scale sensor-based social study can generally avoid these pitfalls.

In this thesis, I show that by using mobile phone sensors we can measure individual financial confidence directly. I also show how social interactions in cities, which can easily be measured from phone records, can change overall productivity outcomes.

In Appendix A, I demonstrate with a real-world social competition that social media data can be used to measure the effectiveness of a theoretical social mechanism. I argue that big data can be a powerful tool for detecting financial and economic status.

The second contribution is to help build better behavioral and social models of economic decision-making processes. Economic theories often rely heavily on the assumption of rational individuals. However, human behavior is more complicated: in 2008, during the financial crisis, the US government issued a tax rebate check of 800 dollars to every taxpayer in order to encourage consumption. Yet a later study shows that giving free money to distressed consumers was less effective than other stimulus measures [133].

Policies such as tax rebates require finer-grained behavioral and social models that go beyond rational thinking. People's spending decisions are largely influenced by their peers, and in a crisis, behavioral factors such as social influence can further depress people's perception of the economic future. Therefore, their spending tends to be irrational.

In this thesis, I show how fear and uncertainty can spread through social networks and, in addition to rational thinking, dominate individuals' social behavior. The examples in this thesis on the behavior of financial market participants demonstrate a more complicated behavioral model when social systems are considered, and provide clear guidance for designing new incentives.

The third contribution is the ability to connect individual micro-level observations to macro-level observations. There is often a disconnect between macro- and micro-level economic analysis, as micro-level analysis is often theoretical and less data-driven. For instance, a prediction market is a place where users can choose to trade virtual stocks in order to harvest the best crowd-wisdom predictions [10]. The incentives in such a market are carefully crafted to elicit the best estimates from every rational participant. However, data sometimes show that these markets do not achieve optimal predictive power [70], contrary to what game-theoretic micro models suggest. Similar observations occur in general markets too, where the aggregation of individual decisions fails to converge to the predicted optimal points.

Unlike these carefully designed prediction markets, stock markets formed naturally, without any prior experimentation or design. They show even more uncertainty, as bubbles and crashes happen often [143].

In this thesis, I show that by carefully studying the dynamics of a social group of individual investors, one can use their behavior to predict aggregate market movements. The detailed, social-tie-level observations provide better insights for predictions at the macro level, and help researchers understand which parts of a mechanism design work and which fail.

Chapter Overview

This dissertation presents several years of my work on building a different perspective for tackling many of our economic challenges. Though this type of analysis is still at its very beginning, I hope these ideas will inspire more people to think differently about economic systems.

This dissertation adopts Reality Hedging from multiple angles to understand economic systems. Chapter Two is dedicated to measurements: I discuss modern sensors, modern phones and the Internet, and how these data can be used to develop economic measurement systems. In Chapter Three, I discuss how we apply social structure to understanding the economic and creative development of cities, proposing the idea of "idea flow" as one key aspect of economic development. I explain how individual-level social interactions can be integrated into a city-scale model for predicting productivity levels. In Chapter Four, we take a micro-level look at how individuals adopt idea flows from their social networks. I use a large social dataset of individual traders to examine exactly how idea flow affects individual investors' behavior. Chapter Five concludes this dissertation. In Appendix A, I also include another example of how to use social media to monitor a practical incentive mechanism involving hundreds of thousands of real participants.

Chapter 2

Big Data: New Measurements for Social Systems

2.1 Overview

Probably one of the most important achievements at the beginning of the new millennium is the availability of so much data about humans and crowds. While datasets such as high-resolution images of the universe can also be considered big data, in this dissertation by data I mean data on human activities and behaviors. Two major technological advances have enabled researchers to instrument human society at a level of detail and precision that was simply impossible a decade ago.

* The advancement of Internet and cloud technologies. Almost every book and record that can be archived electronically is being archived in digital format, for example through the Google Books initiative [104]. Business transactions of all types are being converted into digital systems. People have started to record and share their own lives and social interactions online on platforms like Facebook.

* Advances in mobile and wearable technologies. Modern cell phones have dozens of built-in sensors that can collect a wide range of information about their owners [3]. Mobile technology makes it easy for people to record and share their lives.


While many researchers refer to this trend as the "Big Data" movement, the idea of using large amounts of data to study a complicated system is not new to fields such as Biology and Physics [89]. In Physics, for example, it is well known that high-energy particle studies generate enormous amounts of data: the LHC (Large Hadron Collider) generates around 700Mb of data every second. To generate a data stream of this magnitude from humans, every one of Facebook's 800 million users would have to type new messages on their keyboards 24x7 without stopping. As a matter of fact, the World Wide Web was created at CERN (European Organization for Nuclear Research) so that scientists around the world could share data easily.
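As a rough check of the comparison above, the sketch below divides the LHC data rate across Facebook's user base. It assumes that "700Mb" means 700 megabytes (700 x 10^6 bytes) per second and that one typed character takes one byte; both are assumptions, and if the figure is in megabits, the per-user rate is eight times smaller.

```python
# Assumptions for this check (not stated in the source):
# "700Mb" is read as 700 megabytes (700 * 10**6 bytes) per second,
# and one typed character takes one byte.
LHC_BYTES_PER_SECOND = 700 * 10**6
FACEBOOK_USERS = 800 * 10**6
SECONDS_PER_DAY = 24 * 60 * 60


def per_user_rate(total_bytes_per_second: float, users: int) -> float:
    """Characters per second each user must type to match the total rate."""
    return total_bytes_per_second / users


rate = per_user_rate(LHC_BYTES_PER_SECOND, FACEBOOK_USERS)
print(f"{rate:.3f} characters per second per user")                  # 0.875
print(f"{rate * SECONDS_PER_DAY:,.0f} characters per user per day")  # 75,600
```

Under these assumptions, every user would have to type almost one character per second, around the clock, which is the scale the comparison in the text describes.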

Still, data about humans and society enable us to perform deep analyses that were simply impossible a while ago, just as the data from the LHC enable physicists to look into the dynamics of subatomic particles. Researchers in social science are now able to run planet-scale analyses and experiments with Facebook [29].

On the mobile phone side, researchers have used nationwide phone records to understand economic change in those nations [102].

Can these data be used to help us understand economics and financial markets better?

In 2001, economics students from 17 countries released an International Open Letter to all economics departments calling for reform of economics education. The letter includes seven points, and the first is a broader conception of human behavior [67]. Understanding human behavior itself, the collective outcomes of human behavior, and how to change human behavior are precisely the focus of this big data movement. Neoclassical economists of the past can hardly be blamed for undervaluing human behavior in their theories: in Physics, advances in theory often come after advances in measurement and experiment.

Without measurements of the speed of light, there would have been no theory of relativity from Einstein. Likewise, economists simply lacked the data on human behavior needed to build better and more robust theories of economic activity. Therefore, I firmly believe that the new "Big Data" movement on human and crowd behavior will close some of the gaps between theory and reality in economic research, just as data did for Physics. This is also the focus of this dissertation.

We discuss two types of data that I used in my research. First, we go over data from Internet-based social network applications and discuss how such data can be used to understand individuals' financial trading behavior.

We then discuss a second data source, smartphones and other mobile devices, and cover some current and future work in this area.

2.2 Big Data from the Internet

The increasing digitization of human interactions, behavior and communication, driven by the information technology revolution, gives researchers a unique opportunity to explore new ideas about human behavior. The examples are numerous; here I provide a brief outline of the different types of online data used in academic research.

2.2.1 Review of Online Data

I categorize online human behavior datasets into the following two types. I will generally discuss these two types and how I use them in my dissertation.

* People share information and communicate within their social networks online.

Using data on people's interactions on systems such as Facebook, social scientists can now study millions of people and conduct large-scale experiments easily [29]. This revolution has changed the way we understand influence and information flow in crowds [147], and it has created many practical opportunities, from online marketing to political activities [41]. For economic research, these data suggest ideas for new incentive systems. Other, more direct information and idea sharing services, such as Twitter, enable researchers to monitor the crowd's opinions, such as opinions on stock investments [28].

I have studied both topics in my PhD research. In the Red Balloon project [123], I designed an Internet-based crowd incentive structure aimed at solving critical problems with a crowd of thousands of individuals on a very small budget. In the same paper, I also use social media to understand the effectiveness of the incentive structure. My research also covers financial trading social networks, aiming to understand the dynamics of collective human investment and trading behavior [118]. These two lines of work are discussed in Appendix A and Chapter Four, respectively.

* People's activities are recorded online. Because of the adoption of cloud services, many routine activities, such as shopping, paying bills, keeping medical records and taking notes, are now stored online. While potential privacy issues cannot be ignored, the availability of these datasets has also created enormous opportunities for economic research. For instance, Google mines search keywords to predict flu trends and employment rates [12]. One of the most exciting datasets studied so far is the credit card dataset in [88], which revealed that individual shopping behavior is predictable to a considerable degree.

As the authors argue, traditional economic research focuses only on aggregate shopping behavior, whereas they are able to trace individuals across different income classes and countries and demonstrate properties at very high resolution.

In one project, using digitized daily flu data from the US Centers for Disease Control and Prevention, I was able to infer the dynamic travel patterns across US states from the dataset itself [115]. Simply applying statistical learning models to large datasets of almost any measurement can yield substantial insight.

2.2.2 Online Social Networks for Trading

Social networks in general have been the subject of much interesting research. However, with respect to financial dynamics, we rarely see research combining social networks with financial trading behavior. Scholars have long suspected that humans act differently under small and large incentives [70].

Because financial activities come with cash incentives, it is natural to suggest that humans are more rational in financial markets. However, even in 2013, Nobel laureates were still arguing over whether humans are rational when trading money (i.e., under strong financial incentives) [135]. While some researchers have long suspected that humans are not rational and are influenced by their social ties in financial trading [130], empirical studies and evidence have been lacking.

Just as in every other industry in the big data age, innovative online trading systems have emerged to replace traditional brokerage services. With the rise of social platforms such as Facebook capturing the social interactions of millions of people, entrepreneurs are building new online trading platforms with social networks built in. Examples include eToro, Myfxbook, Zulutrade and Currensee, to name a few. On these platforms, traders can explicitly share and use their social interactions for trading purposes. Platforms such as eToro have created new opportunities for finance research: for the first time, researchers are able to study social mechanisms in financial pricing and trading decisions. A major part of my dissertation centers on one such service: the eToro social trading brokerage.

eToro (see http://www.etoro.com) is an online discount retail broker for foreign exchange and commodities trading, with easy-to-use buying and short-selling mechanisms as well as leverage of up to 400 times. In other words, eToro makes trading accessible and fun, as it allows any user to take both long and short positions with a minimal bid of a few dollars. eToro also magnifies the risks of trading, as it allows users to trade with leverage. As a result, traders sometimes lose more than 100% of their position value in a single transaction. In Chapter Four of this dissertation, I discuss research based on the eToro data.
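To make the leverage risk concrete, the sketch below computes the profit or loss of a leveraged position as a multiple of the trader's margin, i.e. the amount the trader actually puts up. It is a simplified model assumed for illustration (no fees, spreads, financing costs or automatic stop-outs) and does not describe eToro's actual margin rules.

```python
def return_on_margin(price_change: float, leverage: float, long: bool = True) -> float:
    """Profit or loss as a multiple of the trader's margin (simplified).

    price_change: relative move of the instrument, e.g. -0.01 for a 1% drop.
    leverage: position value divided by margin, e.g. 400.
    Ignores fees, spreads, financing costs and any automatic stop-out.
    """
    direction = 1.0 if long else -1.0
    return direction * leverage * price_change


def wipeout_move(leverage: float) -> float:
    """Size of the adverse relative price move that consumes 100% of the margin."""
    return 1.0 / leverage


# A 1% move against a 400x long position costs about 4 times the margin,
# and a move of only 0.25% is enough to wipe the margin out.
print(return_on_margin(-0.01, 400))
print(wipeout_move(400))
```

In this model, any adverse move larger than 1/leverage produces a loss exceeding the trader's own stake, which is how highly leveraged traders can lose more than they put in.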

Among all its new features, the most interesting is that eToro provides a social network platform (known as The Open Book) for all traders. We illustrate the main social trading interface in Fig. 2-1. Users can easily look up other users' trades, portfolios, and past performance. Users can place three types of trades on eToro:


[Figure 2-1 screenshots: (a) Social Trading Landing Page; (b) Public Profile for a User]

Figure 2-1: Screenshots of the eToro social trading platform: (a) the general landing page, showing current trades by other users and top-ranked traders; users can click any trade to copy it; (b) a public profile page for an eToro user (images and names removed), which contains his current trades, messages and, most importantly, the number of followers mirroring his trades.


* Single trade: Users can place a normal trade on their own.

* Copy trade: This mechanism allows a user to replicate exactly one single trade of another user. In the following discussion, we refer to this type of copying as "copy trade". As shown in Fig. 2-1(a), users can review all current real-time trades and choose any one to copy.

* Mirror trade: This mechanism allows a user to pick an example user. For every trade the example user makes, eToro automatically executes the same trade on behalf of the user. In the following discussion, we refer to this type of copying as "mirror trade". Fig. 2-1(b) shows a user's profile page, where other users can follow him and mirror all of his trades. The words "follow" and "mirror" are used interchangeably in the following content.
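The difference between the three trade types can be captured by a toy model: a single trade is placed directly, a copy duplicates one existing trade once, and a mirror relationship replicates every later trade of the chosen user. The class and method names below are hypothetical and do not reflect eToro's actual system or API; position sizing and mirrors of already-mirrored trades are deliberately left out.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Trade:
    trader: str
    instrument: str          # e.g. "EUR/USD"
    side: str                # "buy" or "sell"
    amount: float
    kind: str = "single"     # "single", "copy" or "mirror"


@dataclass
class SocialBroker:
    """Hypothetical toy model of the three trade types (not eToro's system)."""
    trades: list = field(default_factory=list)
    mirrors: defaultdict = field(default_factory=lambda: defaultdict(set))

    def place(self, trader, instrument, side, amount):
        """Single trade; it is also executed for everyone mirroring `trader`."""
        trade = Trade(trader, instrument, side, amount)
        self.trades.append(trade)
        for follower in sorted(self.mirrors[trader]):
            self.trades.append(Trade(follower, instrument, side, amount, "mirror"))
        return trade

    def copy(self, follower, trade):
        """Copy trade: replicate exactly one existing trade, once."""
        copied = Trade(follower, trade.instrument, trade.side, trade.amount, "copy")
        self.trades.append(copied)
        return copied

    def mirror(self, follower, leader):
        """Mirror trade: from now on, replicate every trade `leader` places."""
        self.mirrors[leader].add(follower)


broker = SocialBroker()
broker.mirror("alice", leader="bob")
first = broker.place("bob", "EUR/USD", "buy", 100.0)  # alice receives a mirror trade
broker.copy("carol", first)                            # carol copies only this trade
broker.place("bob", "AUD/USD", "sell", 50.0)          # alice again, carol not
for t in broker.trades:
    print(t.trader, t.kind, t.side, t.instrument)
```

Keeping mirror relationships as a mapping from leader to followers makes each new trade by a leader fan out to all of its mirrorers, while a copy never creates an ongoing relationship.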

2.3 Big Data from Mobile and Wearable Devices

The second movement in measuring human behavior is the growth of wearable systems. Measuring every word and every movement of a person has long been a dream [144], yet the technology was not mature enough for everyday use until recently. In Fig. 2-2, I illustrate how the original wearable computing system developed at MIT has evolved into much smaller systems such as sociometric badges [111]. Over twenty years, the wearable computer has evolved from a bulky piece of electronics into a small handheld device that pervades modern life in 2014.

2.3.1 Existing Studies Based on Mobile Phones

Many research projects have collected data at the level of the individual. Eagle and Pentland [59] introduced the term "Reality Mining" to describe the collection of sensor data pertaining to human social behavior. They show that, using call records, cell-tower IDs, and Bluetooth proximity logs collected via mobile phones, the subjects' social network can be accurately detected, as well as regular patterns in daily activity [59, 60]. This initial study was then expanded by Madan et al. [97], who conducted a similar experiment and showed that mobile social sensing can be used to measure and predict the health status of individuals based on mobility and communication patterns. They also investigated the spread of political opinion within the community [98]. Other examples of using mobile phones for social sensing include Montoliu et al. [106] and Lu et al. [94].

Figure 2-2: The evolution of wearable human behavior sensing systems from 1997 to 2007. On the left is the original MIT wearable computing group's jacket hardware [51] from the late 1990s; on the right are smart cell phones and badge-like sensors, which eventually became widely available and small enough to be used by billions of people every day.

Here I would like to briefly discuss a mobile phone based study in which I am a core member. This work, known as the Friends and Family Study, is widely recognized as one of the most comprehensive efforts to collect every possible piece of information about hundreds of people living in a community, using both advanced mobile sensing technologies and traditional surveys. Details of this work can be found in Aharony et al. [2].

2.3.2 The Friends and Family Study

The general framework of the Friends and Family Study is a longitudinal living-laboratory/social-observatory study, coupled with a supporting system infrastructure that enables sensing and data collection, data processing, and a set of tools for feedback and communication with the subject population. This project implements and extends the ideas of the Reality Mining approach [59] by (1) adding much greater data richness and dimensionality, and (2) adding a strong element of active interaction and carefully designed experimental stimulation of the study population.

Living Laboratory: The "Friends and Family" Community: Starting in March 2010, we conducted a living laboratory study with members of a young-family residential community adjacent to a major research university in North America. The community consists entirely of couples, and at least one member of each couple is affiliated with the university. The community is composed of over 400 residents, approximately half of whom have children. The residence has a vibrant community life and many friendships among its members. We refer to this residence as the "Friends and Family" community.


[Figure: study timeline. Pilot Phase: 55 participants, launched March 2010 (6 months); Phase II: 130 participants, launched Sept. 2010 (12 months); Fitness Intervention, Oct-Dec 2010; Additional Interventions]

Figure 2-3: High level timeline for the Friends and Family study.

This study involves a different subject population from previous ubiquitous computing observatory studies, such as the colleagues and coworkers in Reality Mining [59] and the undergraduates in [97]. The Friends and Family community includes a much more heterogeneous subject pool and provides a unique perspective into a phase of life that has not traditionally been studied in the field of ubiquitous computing: married couples and young families.

As depicted in Figure 2-3, a pilot phase with 55 participants launched in March 2010. In September 2010, phase two of the study included 130 participants, approximately 64 families. Participants were selected from approximately 200 applicants in a way that would achieve a representative sample of the community and its sub-communities.

One of the reasons for keeping the number of participants below 150 is that this fits well with Dunbar's social evolutionary theory regarding the number of people with whom humans are able to maintain relationships [54]. Throughout the study we ask about social closeness between all participants, and surveys covering more people than Dunbar's number would become quite tedious. We refer to experiments at our scale as "Dunbar scale" experiments. The research goals of the longitudinal study touch on many aspects of life, from better understanding of social dynamics to health to purchasing behavior to community organization. The two high-level themes that unify these varied aspects are: (a) how people make decisions, with emphasis on the social aspects involved, and (b) how we can empower people to make better decisions using personal and social tools.
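The survey burden behind the Dunbar-scale choice grows quickly with group size. The sketch below assumes, for illustration only, that each participant rates closeness to every other participant once per survey round; the actual survey design may differ.

```python
def closeness_ratings(n: int) -> int:
    """Ratings per survey round if each of n participants rates every other one."""
    return n * (n - 1)


# 130 is the phase-two size, 150 is roughly Dunbar's number,
# and 400 is roughly the size of the whole community.
for n in (130, 150, 400):
    print(f"{n} participants: {n - 1} ratings each, {closeness_ratings(n)} in total")
```

Each participant's own workload grows linearly (129 ratings at 130 participants, 399 if the whole community of about 400 residents took part), while the total number of ratings grows quadratically.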

Study Data Collection: One of the key goals of this study is to collect a multimodal and highly diverse range of signals from the subject population, as was done for a corporate setting [111] and an undergraduate community [97]. We wanted to gather data on numerous network modalities so that their properties and interrelations could be better understood. We applied a user-centric, bottom-up approach utilizing the following components:

Mobile Phone Sensing Platform: This is the core of the study's data collection. Android OS-based mobile phones are used as in-situ social sensors to map users' activity features, proximity networks, media consumption, and behavior diffusion patterns. The mobile phone platform is described in more detail in the next section.

We did not sponsor phone or data plans: users received a mobile phone that fit their preferred provider, and they were responsible for porting their existing account to it or opening a new account. The condition was that the study phone be their primary phone for the duration of the study. The phones run our software platform, which periodically senses and records information such as cell tower IDs and wireless LAN IDs; proximity to nearby phones and other Bluetooth devices; accelerometer and compass data; call and SMS logs; statistics on installed applications, running applications, media files, and general phone usage; and other accessible information. Over 25 different types of data signals are currently collected. The system also supports integration of user-level apps, like an alarm clock app we developed, for additional data collection and interventions. The phone system also includes a survey application.

Sample screenshots can be seen in Figure 2-4. The configuration is set so that battery-intensive actions (e.g., GPS scans) are performed at intervals that keep the data useful while minimizing battery drain. A remote configuration capability allows the system to be fine-tuned, with a goal of enabling a minimum of 16 hours between charges. The software is released as an open source project at http://www.funf.org.

Surveys: Subjects complete surveys at regular intervals, combining web-based and on-phone surveys. Monthly surveys include questions about self-perception of relationships, group affiliation, and interactions, as well as standard scales like the Big-Five personality test [82]. Daily surveys include questions on mood, sleep, and other activity logging.


Figure 2-4: Sample screenshots: sync-state and version display (left), survey (center), and probe preferences debug screen (right).

Purchasing Behavior: Information on purchases is collected through receipts and credit card statements submitted at the participants' discretion. This component targets categories that might be influenced by peers, like entertainment and dining choices.

Facebook Data Collection Application: Participants could opt to install a Facebook application that logs information on their online social network and communication activities. About 70% of subjects opted to install it.
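The battery trade-off described for the mobile phone sensing platform (battery-intensive probes such as GPS run at intervals, tuned toward at least 16 hours between charges) can be made concrete with a simple drain estimate. The sketch below is not the funf configuration format; the probe names, per-scan energy costs, battery capacity and idle draw are all hypothetical numbers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    name: str
    interval_s: float      # seconds between scans
    mah_per_scan: float    # hypothetical energy cost of one scan, in mAh


def hours_between_charges(probes, battery_mah: float, idle_ma: float) -> float:
    """Estimated battery life in hours for a given probe schedule.

    Average drain (mA) = idle draw + sum over probes of
    (energy per scan in mAh) * (scans per hour).
    """
    drain_ma = idle_ma + sum(p.mah_per_scan * 3600.0 / p.interval_s for p in probes)
    return battery_mah / drain_ma


# All numbers below are made up for illustration.
frequent = [Probe("gps", 60, 1.0), Probe("bluetooth", 60, 0.2), Probe("wifi", 60, 0.1)]
relaxed = [Probe("gps", 900, 1.0), Probe("bluetooth", 300, 0.2), Probe("wifi", 300, 0.1)]

TARGET_HOURS = 16  # the study's stated goal between charges
for label, schedule in (("frequent", frequent), ("relaxed", relaxed)):
    hours = hours_between_charges(schedule, battery_mah=1400, idle_ma=40)
    print(f"{label}: {hours:.1f} h, meets target: {hours >= TARGET_HOURS}")
```

With these made-up costs, scanning every minute misses the 16-hour goal, while stretching the GPS interval to 15 minutes meets it; remote configuration lets such intervals be adjusted after deployment.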

2.3.3 Results from the Friends and Family Study

Using Friends and Family Study Data for Understanding Individual Financial Status

Probably the most interesting economics-related social result from the Friends and Family Study is the relationship established between one's individual income level and one's social behavior statistics [117]. The former is collected in the study via traditional methods, as many economics researchers do, while the latter is measured using automated mobile phone sensing software. Such results could not have been obtained without a large-scale crowd data collection effort such as the Friends and Family Study.

The discovery of a strong correlation between social interaction patterns and the wealth and economic development of a community has attracted a lot of attention [58]. The current challenge is to understand the causality behind this finding.

Researchers tend to believe that diverse relationships bring benefits such as increased information and external opportunities [30] [113]. Such thinking comes from a long line of classical social science literature: Granovetter's weak tie theory [74] and Burt's structural hole theory [32], to name two.

The Friends and Family Study provides a unique opportunity to understand this relationship. Taking both the survey data and the sensor data from phones into consideration, we are able to carefully examine the individual-level relationship between one's financial status (household income) and one's interaction diversity. The richness of the study also allows us to observe changes in correlation over time rather than a single static correlation.

The prevailing causal explanations imply the following reasoning: if successful individuals are suddenly deprived of their incomes, as many participants in this study were, they will naturally continue their diverse interaction behavior. Their previous success suggests that they understand and benefit from their social diversity, and their future success still relies on continued diverse interaction. Since many participants returned to graduate school from decent jobs, there are considerable income changes among them. However, we surprisingly discover that users' social diversity patterns correlate only with their current income, as illustrated in Fig. 2-5 and Fig. 2-6.

Using the Friends and Family Study data, I reveal the opposite: individuals quickly lose interaction diversity when their financial status worsens, and quickly gain interaction diversity when their financial status improves. We suspect that a more behavioral and psychological mechanism plays an important role in the other direction of causality: individuals' social diversity patterns are influenced by their financial status. We believe that good financial status gives people safer and more satisfying living conditions [65], so they naturally feel more confident [120] and secure in exploring new social opportunities [40] [125].
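The comparison behind Fig. 2-5 and Fig. 2-6 involves two steps: scoring each participant's interaction diversity, then correlating that score with previous and with current income. The sketch below assumes a definition that is common in the network diversity literature, the Shannon entropy of a person's interactions across contacts normalized by the logarithm of the number of distinct contacts; the study's own diversity measures may be defined differently. The income codes and diversity values are made up to exercise the code, not taken from the study, and significance testing is omitted.

```python
import math
from collections import Counter


def interaction_diversity(contacts) -> float:
    """Normalized Shannon entropy of one person's interactions (assumed definition).

    contacts: one contact ID per interaction (e.g. per call or per
    Bluetooth scan hit). Returns 0.0 when every interaction involves the
    same contact and 1.0 when interactions are spread evenly.
    """
    counts = Counter(contacts)
    k = len(counts)
    if k <= 1:
        return 0.0
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(k)


def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up numbers for six hypothetical participants, only to exercise the code.
diversity = [0.62, 0.66, 0.71, 0.74, 0.69, 0.77]
previous_income = [3, 1, 2, 0, 3, 2]   # income bracket codes, 0 = lowest
current_income = [0, 1, 1, 2, 1, 3]

r_previous = pearson_r(diversity, previous_income)
r_current = pearson_r(diversity, current_income)
print(f"r with previous income: {r_previous:.2f}")
print(f"r with current income:  {r_current:.2f}")
```

Computing both coefficients on the same participants is what allows the change in correlation, rather than a single correlation, to be observed.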


[Figure 2-5 plots: (a) Bluetooth Diversity - Previous Income; (b) Bluetooth Diversity - Current Income]

Figure 2-5: The mean Bluetooth interaction diversity and its standard error for individuals in different income categories. The top plot is based on previous household income, and the bottom plot is based on current household income. There is a borderline positive correlation between current household coarse income and Bluetooth diversity (r = 0.32, p < 0.10), and the correlation is much stronger among native English speakers in the participant pool (r = 0.53, p < 0.06). However, there is no correlation between previous estimated household income and face-to-face interaction diversity (r = -0.28, p > 0.60).

Using Friends and Family Study Data for Consolidating Online and Offline Social Interactions

We have discussed two big human behavior data sources: online data and offline data. A natural question is how an individual's social ties on, say, Facebook relate to his or her real-world social network.

I have studied this problem by modeling the influence of a certain behavior, the installation of mobile smartphone apps, using the Friends and Family datasets. Here are some interesting take-aways; the details can be found in my paper [116].

We are interested in studying network-based prediction of mobile application ("app") installation, as the mobile application business is growing rapidly [61]. App market makers, such as the iPhone AppStore and the Android Market, run on almost all modern smart phones and have access to phone data and sensor data. As a result, app market makers can infer different types of networks, such as the call log network and the Bluetooth proximity network, from phone data.

However, whether these data can be used for app marketing remains an open and important question.


[Figure 2-6 plots: (a) Call Log Diversity - Previous Income; (b) Call Log Diversity - Current Income]

Figure 2-6: The mean call diversity D_call(i) and its standard error for individuals in different income categories. The top plot is based on reported previous household income, and the bottom plot is based on reported current household income. There is a positive correlation between current household coarse income and call diversity (r = 0.28, p < 0.08). However, there is no correlation between previous estimated household income and call diversity (r = 0.003, p > 0.80).

It is natural to speculate that there are network effects in users' app installations, but we eventually realized that it is very difficult to adapt existing tools from large-scale social network research to model and predict the installation of specific mobile apps for each user, for the following reasons:

1. The underlying network is not observable. While many projects assume that phone call logs represent the true social/friendship network [158], others use whatever network is available as the underlying social network. Researchers have discovered that the call network may not be a good approximation [56]. On the other hand, smart phones can easily sense multiple networks using built-in sensors and software: a) call logs can be used to form phone call networks; b) Bluetooth radio can be used to infer proximity networks [56]; c) GPS data can be used to infer users' movement patterns, and furthermore their workplaces and affiliations [63]; d) social network apps (such as the Facebook app and the Twitter app) can observe users' online friendship networks. In this work, our key idea is to infer an optimal composite network, the network that best describes app installation, from multiple layers of different networks easily observed by modern smart phones, rather than assuming a certain network to be the real social network explaining app installation.


2. Analyses of epidemics [68] and Twitter networks [155] rely on the fact that the network is the only mechanism for adoption. The only way to get the flu is to catch it from someone else, and the only way to retweet is to see the tweet from someone else. For mobile apps, however, this is not true at all.

Any user can simply open the AppStore (on iPhones) or the Android Market (on Android phones), browse different lists of apps, and install whichever one appears most interesting, without peer influence. One big challenge that makes modeling the spread of apps difficult is that one can install an app without any external influence or information. One major contribution of this work is that we demonstrate it is still possible to build a tool to observe network effects despite such randomness.

3. The individual behavioral variance in app installation is so large that any network effect might be rendered unobservable in the data. For instance, some tech-savvy users may try and install every popular app on the market, while many inexperienced users find it troublesome even to go through the process of installing an app and, as a result, install very few apps.

4. There are exogenous factors in app installation behavior. One particular factor is app popularity. For instance, the Pandora Radio app is hugely popular and highly ranked in the app store, while most other apps are not. Our model takes this issue into account too, and we show that exogenous factors are important for increasing prediction precision.
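The idea of a composite network can be sketched as a weighted sum of candidate networks, where a user's score for an app grows with the weighted number of composite-network neighbors who already installed it, plus a term for exogenous app popularity (point 4). This is a simplified illustration under assumed choices: the weights are hand-picked here, whereas the work described above infers the optimal composite network from data, and the scoring rule is a hypothetical stand-in rather than the model in [116].

```python
def composite_network(networks, weights):
    """Weighted sum of same-sized adjacency matrices given as lists of lists."""
    n = len(networks[0])
    return [
        [sum(w * g[i][j] for g, w in zip(networks, weights)) for j in range(n)]
        for i in range(n)
    ]


def adoption_scores(composite, installed, popularity, alpha=1.0):
    """Score every user who has not yet installed the app.

    installed: 0/1 adoption flags x_j for one app.
    popularity: exogenous popularity of the app (hypothetical term),
    scaled by the hypothetical weight alpha.
    """
    scores = {}
    for i, row in enumerate(composite):
        if installed[i]:
            continue  # only rank users who could still adopt
        peer_signal = sum(w * x for w, x in zip(row, installed))
        scores[i] = peer_signal + alpha * popularity
    return scores


# Toy example with three users; user 0 has installed the app.
calls = [[0, 2, 0], [2, 0, 1], [0, 1, 0]]        # like a call log network
bluetooth = [[0, 5, 3], [5, 0, 0], [3, 0, 0]]    # like a Bluetooth proximity network
installed = [1, 0, 0]
weights = [0.5, 0.1]                             # hand-picked, not learned

composite = composite_network([calls, bluetooth], weights)
popularity = sum(installed) / len(installed)
scores = adoption_scores(composite, installed, popularity)
ranking = sorted(scores, key=scores.get, reverse=True)  # most likely adopter first
print(ranking)
```

Because the networks are reciprocal, each candidate matrix is symmetric, and so is any weighted sum of them; fitting the weights to observed installations is what would turn this into an optimized composite network.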

We then use the following networks to construct an optimized network for predicting app installations. We summarize all the networks obtained from both phones and surveys in Table 2.1. We refer to all networks in Table 2.1 as candidate networks; all candidate networks are used to compute the optimal composite network. Note that all networks in this work are reciprocal.

Network | Type | Source | Notation
Call Log | Undirected, Weighted | # of calls | Gc
Bluetooth Proximity | Undirected, Weighted | # of Bluetooth scan hits | Gb
Friendship | Undirected, Binary | Survey results (1: friend; 0: not friend) | Gf
Affiliation | Undirected, Binary | Survey results (1: same; 0: different) | Ga

Table 2.1: Network data used in this study.

Our algorithm predicts the probability of adoption (i.e., installing an app) given the adoption status of a user's neighbors. $p_i \in [0, 1]$ denotes the predicted probability of installation, while $x_i \in \{0, 1\}$ denotes the actual outcome. The most common prediction measure is the Root Mean Square Error, $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(p_i - x_i)^2}$, where $N$ is the number of predictions. This measure is known to assess a prediction method's ability poorly [70]. Since most users in our dataset have installed very few apps, a baseline approach can simply predict the same small $p_i$ for everyone and still achieve a very low RMSE.
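The weakness of RMSE on sparse adoption data can be seen with a small example. The sketch below compares a constant baseline with a predictor that places every actual installer at the top of the ranking; the numbers are hypothetical, not from the Friends and Family data.

```python
import math


def rmse(predictions, outcomes) -> float:
    """Root mean square error between predicted p_i and observed x_i."""
    if len(predictions) != len(outcomes) or not outcomes:
        raise ValueError("need equal-length, non-empty sequences")
    n = len(outcomes)
    return math.sqrt(sum((p - x) ** 2 for p, x in zip(predictions, outcomes)) / n)


# Hypothetical sparse data: 5 installs among 100 user-app pairs.
outcomes = [1] * 5 + [0] * 95

# Baseline: the same small probability for everyone, so it cannot rank users.
constant = [0.05] * 100

# A predictor that puts all 5 installers at the top of the ranking,
# but with modest probabilities (0.5 vs 0.2).
informative = [0.5] * 5 + [0.2] * 95

print(round(rmse(constant, outcomes), 3))     # 0.218
print(round(rmse(informative, outcomes), 3))  # 0.225
```

The constant baseline gets the lower (better) RMSE even though it carries no ranking information at all, which is why a rank-aware measure suits the marketing objective better.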

For app marketing, the key objective is not to predict the installation probability for each app, but to rank users and identify a sub-group of individuals who are more likely than average users to appreciate and install certain apps.

Therefore, we mainly adopt rank-aware measures from information retrieval practice [101]. For each app, we rank users by the adoption likelihood computed by the prediction algorithms, and study the following factor:

b) Optimal F1-score (referred to later simply as the F score). The optimal F1-score is obtained by computing the F1-score, $F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$, for each point on the Precision-Recall curve and selecting the largest value. Unlike MP-k, the optimal F score is used to measure the overall prediction performance of our algorithms. For instance, F = 0.5 suggests the algorithm can reach 50% precision at 50% recall.
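The optimal F1 computation described above can be sketched as follows: rank users by predicted likelihood, treat every cutoff of the ranking as one point on the Precision-Recall curve, and keep the best F1. This is a minimal illustration, not the evaluation code used in the original study; breaking ties in scores by input order is a simplifying assumption.

```python
def optimal_f1(scores, labels) -> float:
    """Largest F1 over all cutoffs of a ranking (points on the PR curve).

    scores: predicted adoption likelihoods; labels: actual 0/1 outcomes.
    Ties in scores keep their input order (sorted() is stable).
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best, true_pos = 0.0, 0
    for k, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        precision = true_pos / k
        recall = true_pos / total_pos
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy ranking: five users, two of whom actually installed the app.
# The best cutoff is the top 3 users: precision 2/3, recall 1, F1 = 0.8.
print(optimal_f1([0.9, 0.8, 0.7, 0.6, 0.5], [1, 0, 1, 0, 0]))
```

Scanning all cutoffs in one pass keeps the computation linear after sorting, since true positives are accumulated as the cutoff moves down the ranking.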

We now illustrate the key finding of this work: the prediction performance when our algorithm is only allowed to use a single network. The results are shown in Fig. 2-7. We find that, except for the affiliation network, almost all other networks predict well above chance level. The Bluetooth network performs much better than the friendship network, which agrees well with previous work [57]. The call log network seems to achieve the best results, which may be due to the fact that we only take calls within the study community into consideration.

The conclusions of this work are quite interesting. First, I found that social networks do help predict app installations. However, different networks have different effects.

Figure 2-7: Prediction performance using each single network. For comparison, we also show the result of random guessing and the result of our approach, which combines all potential evidence.

Since Facebook asks users to explicitly state who their friends are, we consider the friendship survey from the Friends and Family Study a good approximation of online social networks like Facebook. According to Fig. 2-7, such Facebook-type social networks generally contribute slightly less to prediction power than real-world social networks measured by call logs and Bluetooth co-location. Also, by combining both online and offline networks, we can generate better predictions. These results suggest, as common sense may suspect, that people have different personas, friendships and lives online and offline, and that people are influenced by both.

Real life still weighs more on individuals than virtual life, but one needs to consider both.

2.4 From Individual Financial Behavior to Policy Making: The Big Picture

In addition to my results, Singh et al. [139] suggest that shopping behavior (i.e., overspending, loyalty and diversity in shopping) is more predictable using data from mobile sensors than using traditional surveys and personal metrics, which implies that the big data approach can help politicians and economists understand consumers' behavior and credit risk [139] better than traditional sources of data.

These strong and positive results in understanding financial status and shopping behavior from users' social interaction activity on mobile phones suggest the potential of such big data for better understanding consumers and individuals.

As Singh et al. [139] pointed out, mobile phone service providers can use detailed records of users' social interactions to predict their spending behavior. Moreover, social data from mobile phones can be used to infer individual credit ratings and risk.

As a future direction, I see an even bigger application in measuring key economic factors such as consumer income levels, as I illustrated in Pan et al. [117]. There are many potential ways of using cell phone data to improve these century-old types of surveys, and such advances will change the way politicians and governments measure the health of the economy and adjust policies.


Chapter 3

Idea Flow: Social Interactions and Urban Economic Development

3.1 Introduction

In this chapter, I discuss how new measurements from big data can help politicians and scientists understand the economic development of cities.

Rather than surveying many problems in economic development, this chapter focuses on one central problem: the growth of cities. I present a new model I developed to understand the modern cities in which many of us live. Unlike traditional theories, this model suggests that the social components of cities, i.e., the interactions and flow of information between individuals, are an important factor in cities' success. The full paper describing this model is Pan et al. [119]. One implication of this model is a new way for urban planners to measure the productivity of cities, using the big data approach discussed in Chapter Two. In the last section of this chapter, I discuss this aspect and potential future research directions.

Readers may find that many results here are still theoretical or simulation-based, owing to the difficulty of collecting idea flow measurement data. At the time of writing this thesis, many findings in this section have been validated with real mobile datasets from Portugal and the UK by Schlapfer et al. [132].


3.2 The Superlinear Growth of Cities

A larger percentage of people live in cities than at any point in human history [43], while the density of urban areas is generally increasing [148]. One of the enduring paradoxes of urban economics concerns why people continue to move to cities, despite elevated levels of crime, pollution, disease, and wage premiums that have steadily lost ground to premiums on rent [69]. New York in the 18th century, according to Thomas Jefferson, was "a toilet of all the depravities of human nature". Since Jefferson's day, the city has grown to host the depravities of 100-fold more people, yet the stream of new arrivals has not abated.

While the forces behind any urban migration are complex, the advantages afforded by urban density comprise an important driver. Smith [140] was one of the first to point to urban centers as exceptional aggregators, whether of innovations or depravities. Cities appear to support levels of enterprise impossible in the countryside, and urban areas use resources more efficiently, producing more patents and inventions with fewer roads and services per capita than rural areas [105, 87, 19, 66, 20, 21].

To give readers a better idea, here are some details about the super-linear benefits of city growth. Bettencourt et al. [20] report a common scaling behavior of the form

$$Y(t) \sim N(t)^{\beta}, \qquad (3.1)$$

where $Y(t)$ is some urban economic indicator and $N(t)$ is the population size at time $t$. They find that many urban indicators, from disease to productivity, grow with surprisingly similar values of the exponent, $1.1 < \beta < 1.3$, as shown in Table 3.1. They suggest that such a scaling pattern reflects quantities such as information, innovation and wealth creation, and conjecture that these are intrinsically related to social capital, crucial to the growth and sustainability of cities. While such findings, viz. the qualitative dependence of economic indicators on population size, potentially have a profound impact (implying that global urbanization is very efficient and a key driver of economic development), there is some debate as to the underlying mechanism as well as the precise functional relationship between the two.

Table 3.1: Growth factors β for some urban economic indicators.

Urban Economic Indicator | Growth Factor β
New Patents | 1.27
GDP | 1.13, 1.26
R&D Establishments | 1.19
Intra-city Call Time | 1.14
New AIDS Cases | 1.23
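The exponent β in Eq. (3.1) is commonly estimated by ordinary least squares on log-transformed data. The sketch below uses a hypothetical helper `fit_power_law` and synthetic numbers generated exactly from $Y = 2N^{1.15}$, not any dataset behind Table 3.1; it is an illustration of the technique, not the estimation procedure of Bettencourt et al. [20].

```python
import math

def fit_power_law(x, y):
    """OLS fit of log y = log a + beta * log x; returns (a, beta) for y ~ a * x**beta."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    beta = sxy / sxx
    a = math.exp(mean_y - beta * mean_x)
    return a, beta

# Synthetic (not empirical) indicator values generated from Y = 2 * N**1.15.
populations = [1e4, 5e4, 1e5, 5e5, 1e6, 5e6]
indicator = [2 * pop ** 1.15 for pop in populations]

a, beta = fit_power_law(populations, indicator)
print(round(a, 3), round(beta, 3))  # 2.0 1.15
```

On real, noisy data the same fit returns an estimate with an error bar, and β > 1 indicates super-linear scaling.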

3.3 Idea Flow and Economic Development

Despite the widespread focus on density as a driver of the uniqueness of cities, among both scientific and popular audiences, we still lack a compelling generative model for why an agglomeration of people might confer an advantage. Important advances on several fronts have highlighted the difficulty of understanding urban processes beyond the level of density description. Early economic models of agglomeration point to the role of technology diffusion in creating intellectual capital [80, 13, 8], but lack a quantitative description of the generative mechanism for how this diffusion happens. Hierarchies have also been proposed as an elegant mechanism for this growth [9]; however, recent studies hint at the absence of well-defined hierarchies across geographical scales [91, 4, 107, 62, 112]. It has also been observed [31] that diversity among residents, and their intermingling, display only a weak correlation with cities' success, prompting the authors to conclude that "more fine-scale data on interactions among people of different disciplines-or the culture, laws and peculiarities of cities-is required to better assess the under- or over-performance of innovation of cities."

Recent developments in the study of social networks shed some light on this challenge. Empirical evidence suggests that interactions and information exchange on social networks are often the driving force for idea creation, productivity and individual prosperity. Examples include the theory of weak ties [74, 73], structural holes [32], the strong effect of social interaction on economic and social success [55], the influence of face-to-face interactions on productivity [154], and the importance of information flow in the management of Research and Development [5, 128]. Consequently, understanding the mechanism of tie formation in cities seems to be the key to a general theory relating a city's growth, as described by its economic indicators, to its population. Following this line of thinking, our proposed answer for the super-linear growth of cities can be regarded as a natural extension of Krugman's insights on industries [87]. Krugman pointed out the connection between manufacturing efficiency and transportation of goods as a function of the proximity of factories. Similarly, our theory connects the efficiency of idea creation and information flow to the proximity of the individuals generating them.

The overall idea of the new model is to show that as a city's population, and hence its density, grows, the density of social ties increases super-linearly. As a result, the flow and amount of information an individual is exposed to in a city also grows super-linearly with population density.

While the idea of this model is novel, its creation was largely inspired by some known observations. For example, it has been shown that communication volume decreases with distance in a large network of mobile phone calls spanning the United States [33, 109]. Additionally, geography-based routing alone has been found to account for around 80% of completed chains in a message-forwarding simulation, with population density comprising an essential component of optimal information routing [92].

3.3.1 The Social Tie Density Model

We propose to model the formation of ties between individuals (represented as nodes) at the resolution of urban centers. Since our model is based on geography, a natural setting for it is a 2D Euclidean space, with nodes denoted by coordinates $y_i \in \mathbb{R}^2$ on the infinite plane. Furthermore, we assume that these nodes are distributed uniformly in space with density $\rho$, defined as the number of nodes per unit area.


While the assumption of uniform density is an approximation, the qualitative features of the model are unaffected by other, more realistic choices of the density distribution [119]. Following Liben-Nowell et al. [92], we define the probability of a tie forming between two nodes $i, j$ in the plane as

$$P_{ij} \propto \frac{1}{\mathrm{rank}_i(j)}, \qquad (3.2)$$

where the rank is defined as

$$\mathrm{rank}_i(j) := |\{k : d(i,k) < d(i,j)\}| \qquad (3.3)$$

and $d(i,j)$ is the Euclidean distance between two nodes. If $j$ lies at a radial distance $r$ from node $i$, then the number of neighbors closer to $i$ than $j$ is the product of the density and the area of the circle of radius $r$, and thus the rank is simply

$$\mathrm{rank}_i(j) = \rho \pi r^2, \qquad (3.4)$$

which implies that the probability that an individual forms a tie at distance $r$ goes as $P(r) \propto 1/(\rho \pi r^2)$, similar in spirit to a gravity model [86].
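Eq. (3.4) can be checked numerically. The sketch below assumes a hypothetical setting: points drawn uniformly in a unit square (so $\rho$ equals the number of points), with the focal node at the centre and only distances that keep the circle inside the square, to limit edge effects. It computes $\mathrm{rank}_i(j)$ directly from definition (3.3) and compares it with $\rho \pi r^2$. The helper names and parameters are illustrative; note that the literal definition counts node $i$ itself (distance 0), so the nearest neighbour has rank 1 and $1/\mathrm{rank}_i(j)$ is always defined.

```python
import bisect
import math
import random

def ranks_from(points, i):
    """Return {j: rank_i(j)} for every j != i, following Eq. (3.3).

    rank_i(j) counts nodes k with d(i, k) < d(i, j); node i itself (distance 0)
    is included, so the nearest neighbour gets rank 1.
    """
    dists = [math.dist(points[i], p) for p in points]
    sorted_d = sorted(dists)
    # bisect_left returns how many distances are strictly smaller than d.
    return {j: bisect.bisect_left(sorted_d, d) for j, d in enumerate(dists) if j != i}

def rank_vs_theory(n_points=20000, seed=1, r_lo=0.1, r_hi=0.45):
    """Mean of rank_i(j) / (rho * pi * r**2) for nodes j at distance r_lo < r < r_hi."""
    rng = random.Random(seed)
    center = (0.5, 0.5)
    points = [center] + [(rng.random(), rng.random()) for _ in range(n_points)]
    rho = n_points  # hypothetical: nodes per unit area on the unit square
    ratios = []
    for j, rank in ranks_from(points, 0).items():
        r = math.dist(center, points[j])
        if r_lo < r < r_hi:  # circle of radius r stays inside the square
            ratios.append(rank / (rho * math.pi * r ** 2))
    return sum(ratios) / len(ratios)

mean_ratio = rank_vs_theory()
print(f"mean rank / (rho*pi*r^2) = {mean_ratio:.3f}")  # close to 1 for uniform density
```

A ratio near 1 is what the continuum argument behind Eq. (3.4) predicts; deviations grow for small $r$, where only a handful of nodes are counted.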

For a randomly chosen node, integrating over $r$ up to an urban mobility "boundary" denoted $r_{max}$, we obtain the expected number of social ties $t(\rho)$:

$$t(\rho) = \ln \rho + C, \qquad (3.5)$$

where $C = 2 \ln r_{max} + \ln \pi + 1$. We note that $r_{max}$ may well be unique for each city, and is often determined by geographical constraints as well as city infrastructure (cf. Supplementary Note 3 and Supplementary Figures S2-S3). Integrating over the number of social ties for all nodes within a unit area gives the social tie density $T(\rho)$:

$$T(\rho) = \rho \ln \rho + C' \rho, \qquad (3.6)$$

with $C' = C - 1$. Thus the density of social ties formed between individuals grows as $T(\rho) \sim \rho \ln \rho$, a super-linear scaling consistent with the observations made by Calabrese et al. [33] (also discussed below). We argue that $T(\rho)$ is, to a first approximation, the individual dyadic-level ingredient behind the empirically observed growth of city indicators. For more detail on the theoretical analysis and support for the assumptions involved, see Pan et al. [119].
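Equations (3.5) and (3.6) can be evaluated directly. The sketch below (hypothetical helper names; $\rho$ in nodes per unit area and $r_{max}$ in the same length unit, both assumptions) computes $t(\rho)$ and $T(\rho)$, checks numerically that integrating $t$ over density from 0 to $\rho$ reproduces $T(\rho)$ with $C' = C - 1$, and evaluates the effective scaling exponent $d\ln T / d\ln \rho = 1 + 1/(\ln \rho + C')$, which exceeds 1 whenever $\ln \rho + C' > 0$.

```python
import math

def tie_constant(r_max):
    """C = 2 ln(r_max) + ln(pi) + 1, from Eq. (3.5)."""
    return 2 * math.log(r_max) + math.log(math.pi) + 1

def expected_ties(rho, r_max):
    """t(rho) = ln(rho) + C, Eq. (3.5)."""
    return math.log(rho) + tie_constant(r_max)

def tie_density(rho, r_max):
    """T(rho) = rho ln(rho) + C' rho with C' = C - 1, Eq. (3.6)."""
    return rho * math.log(rho) + (tie_constant(r_max) - 1) * rho

def integrated_ties(rho, r_max, steps=100000):
    """Midpoint-rule integral of t(x) for x from 0 to rho; should approximate T(rho)."""
    h = rho / steps
    return h * sum(expected_ties((k + 0.5) * h, r_max) for k in range(steps))

def local_exponent(rho, r_max):
    """Effective scaling exponent d ln T / d ln rho = 1 + 1 / (ln rho + C')."""
    return 1 + 1 / (math.log(rho) + tie_constant(r_max) - 1)

# Hypothetical r_max = 10 (arbitrary length unit) and a few densities.
for rho in (0.5, 1.0, 2.0, 4.0):
    print(rho, round(local_exponent(rho, 10.0), 3))
```

For these hypothetical values the effective exponent stays between 1.1 and 1.3, and it drifts slowly toward 1 as density grows, which is why a single power law only approximates $\rho \ln \rho$.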

Empirical evidence for the effect of social tie density. Recent work [33] shows a super-linear relationship between calling volume (time) and population across different counties in the United States. As Figure 3-1 illustrates, the authors approximate this super-linear relationship by a power law $y = a x^{\beta}$ with $\beta \approx 1.14$. However, by assuming a uniform distribution of county sizes and treating population as a proxy for density, we show that our density-driven model captures the distribution of call volume closely. The model reproduces the shape of the curve, including the power-law growth pattern ($\beta = 1.14$) and the tilts at both ends, with an adjusted $R^2 = 0.99$ (see Figure 3-1). Consequently, we propose that the model may well provide a reasonable explanation for the communication patterns observed in US counties.
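For reference, adjusted $R^2$ penalises $R^2$ by the number of fitted predictors, $R^2_{adj} = 1 - (1 - R^2)(n - 1)/(n - p - 1)$, which matters when comparing a fitted power law with the model curve. The sketch below uses hypothetical helper names and toy numbers; the sample size and predictor counts of the original comparison are not stated here, so `n_params` is an assumption the reader must supply.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat))
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n_obs, n_params):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), with p = number of predictors."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_params - 1)

# Toy observations and fitted values (not the Calabrese et al. data).
r2 = r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])
print(round(r2, 3), round(adjusted_r_squared(r2, 4, 1), 3))  # 0.98 0.97
```

With few observations the penalty is noticeable; with thousands of counties it becomes small.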

3.3.2 From Social Ties to Idea Flows

The expected pattern of link formation is in itself insufficient to explain how growth processes in cities create the observed scaling phenomena in productivity and innovation. Rather, the manner in which these links spread information and encourage the adoption of ideas and behaviors determines value creation and productivity. Since social network structure is known to have a dramatic effect on access to information and ideas [74, 55, 32, 154, 5, 128], it seems plausible that higher social tie density should engender greater levels of idea spreading, leading to the observed increases in productivity and innovation.

To test the hypothesis that a city's productivity is related to how far information travels and how fast its citizens gain access to innovations or information, it is natural to examine how this information flow scales with population density, and to quantify the functional relationship between link topology and the speed of information spreading.

(Plot legend: data; power-law β = 1.14, R² = 0.81 [Calabrese et al.]; our model, R² = 0.99. Horizontal axis: county population.)

Figure 3-1: Overall time of calls between residents of a county as a function of its population. The points refer to the data (adapted from Calabrese et al. [33], computed from ten million users' mobile phone call records within the US during July 2010), while the solid line is the theoretical prediction from the model Eq. (3.6) adapted to raw population. The model captures both the super-linear growth and the tilts at both ends of the curve, while providing a superior fit to the data (based on adjusted R²) when compared to a pure power-law relation (dashed curve).

Here, I simulate two models of information contagion [83, 7, 37] on networks generated by our model. The first, the susceptible-infected (SI) model, simulates the diffusion of simple facts, where a single exposure is enough to guarantee transmission. The second, a complex contagion model, is typical of behavior adoption, where multiple exposures to a new influence or idea are required before an individual adopts it. As Figure 3-2 shows, in both the SI and complex contagion models the mean diffusion speed grows super-linearly with $\beta \approx 1.2$, in line with our previous results and matching well the disease spreading indicators in cities [20]. We therefore conclude that one explanation for the observed super-linear scaling of productivity with increasing population density is the super-linear scaling of information flow within the social network.
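The difference between the two contagion models can be sketched with a deterministic, synchronous threshold rule: with `threshold=1` a single informed neighbour suffices (the simple-fact case with guaranteed transmission), while `threshold=2` requires two exposures (complex contagion). This is a hypothetical sketch on a toy ring lattice; it is not the simulation code behind Figure 3-2, which ran on networks generated by the social tie density model and averaged over repeated realizations.

```python
def ring_lattice(n, k=2):
    """Hypothetical toy network: each node is linked to its k nearest neighbours on each side."""
    return {v: [(v + d) % n for d in range(-k, k + 1) if d != 0] for v in range(n)}

def spread(adj, seeds, threshold=1, max_steps=1000):
    """Deterministic, synchronous threshold contagion.

    threshold=1: simple contagion, one informed neighbour is enough.
    threshold>=2: complex contagion, a node adopts only after `threshold`
    of its neighbours have adopted.
    Returns the cumulative number of adopters after each step.
    """
    informed = set(seeds)
    history = [len(informed)]
    for _ in range(max_steps):
        newly = {v for v in adj if v not in informed
                 and sum(1 for u in adj[v] if u in informed) >= threshold}
        if not newly:
            break  # no further adoption is possible
        informed |= newly
        history.append(len(informed))
    return history

lattice = ring_lattice(10)
print(spread(lattice, [0], threshold=1))     # [1, 5, 9, 10]
print(spread(lattice, [0, 1], threshold=2))  # [2, 4, 6, 8, 10]
print(spread(lattice, [0], threshold=2))     # [1]: one seed never gives anyone two exposures
```

A spreading rate can then be read off as adopters per step; complex contagion is slower and depends on clustered ties, which is why it is the stricter test of the model.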

3.3.3 Empirical Evidence

While in most cases it is not possible to obtain the social tie density of a city directly, our model suggests that population density is strongly correlated with social tie density across cities with similar transportation infrastructure and economic situations (i.e., similar $r_{max}$). Therefore, we explore social tie density indirectly through population density, and we compare only cities at similar levels of economic development, such as US cities and European Union cities.

As a test case for our hypothesis, we study the prevalence of AIDS/HIV infections in cities in the United States. In Figure 3-3, we plot the prevalence of AIDS/HIV in 90 metropolitan areas in 2008 [149] as a function of population density. As the figure indicates, there is fairly good agreement between the data and the curve generated by our model of diffusion using both the simple and complex contagion models.

The same agreement holds for European cities on economic indicators. In Figure 3-4, we plot the overall GDP per square km in NUTS-2 (Nomenclature of Territorial Units for Statistics, level 2) regions in the EU as a function of population density $\rho$ as well as population size. The NUTS-2 regions are defined by the EU as the city-size level territorial partition for census and statistics purposes [42]. We find a strong positive correlation between density and the corresponding urban metric, with a super-linear scaling component, but conversely a much weaker, sub-linear growth pattern with respect to raw population size. While it is not the main focus of this chapter, we show that super-linear growth with density often appears in data as super-linear growth with population, and that density is a better indicator of socio-economic growth than population (see Supplementary Note 4).

Figure 3-2: The spreading rate as a function of density for two different contagion models. (a) The mean spreading rate as a function of density $\rho$. The points correspond to n = 30 realizations of simulations of the SI model on a 200 x 200 grid. The dashed line corresponds to a fit of the form $R(\rho) \sim \rho^{1+\alpha}$ with $\alpha = 0.18$. The solid line is a fit to the social tie density model. (b) The mean spreading rate as a function of $\rho$ under the complex contagion diffusion model, based on n = 30 realizations of simulations. The dashed line corresponds to the power-law fit of the form $R(\rho) \sim \rho^{1+\alpha}$ with $\alpha = 0.17$. Once again the solid line is the fit to the model described in the paper. In both cases, the social tie density model provides a better fit than a simple power law, with much lower mean-square errors (29% and 41% lower, respectively).

Note that in both datasets the scaling exponents lie within a narrow band, $1.1 < \beta < 1.3$, potentially suggesting a common mechanism behind both the prevalence of AIDS/HIV and the scaling of GDP with respect to population density.

An advantage of our model is that it dispenses with the need for parameter tuning, as it naturally produces this scaling within a reasonable margin of error. Thus, by considering social structure and information/disease flow as a major driving force behind many city indicators, our approach provides a unique and general theory of the super-linear scaling phenomena of cities.

Both the spreading of information (potentially leading to increased productivity and innovation) and of contagious diseases rely on the mechanism of social interactions. However, while information can spread via exogenous influence (mass media) and/or endogenous (word-of-mouth) processes, we chose to highlight the AIDS/HIV spreading data to validate our model as an example of a purely endogenous process.

3.3.4 Limitations on Social Tie Density

Our model predicts that social tie density scales super-linearly with population density, while naturally accounting for the narrow band of scaling exponents empirically observed across multiple features and different geographies. We note that this is achieved without the need to resort to parameter tuning or assumptions about heterogeneity, modularity, social hierarchies, specialization, or similar social constructs.

We therefore suggest that population density, rather than population size per se, is at the root of the extraordinary nature of urban centers. As a single example, metropolitan Tokyo has roughly the same population as Siberia while showing remarkable differences in criminal profile, energy usage, and economic productivity. We provide empirical evidence based on studies of indicators in European and American cities (both categories representing comparable economic development), demonstrating that density is a better metric than population size for explaining various urban indicators.

(Plot legend: data; power-law β = 1.21, R² = 0.668; our model, R² = 0.671. Horizontal axis: population density per square mile.)

Figure 3-3: Spreading rate of HIV as a function of density in United States Metropolitan Statistical Areas. The relationship between density and AIDS/HIV spreading rate of the 90 metropolitan statistical areas from recent CDC and US Census surveys. As is visible, the model captures the qualitative trends in the data.

(Left panel legend: data; power-law β = 1.26, R² = 0.82; our model, R² = 0.82; horizontal axis: population density per square km (×10³). Right panel legend: data; power-law β = 0.85, R² = 0.51; horizontal axis: population (×10³).)

Figure 3-4: Correlation between GDP and population, and between GDP and population density, for all 247 NUTS-2 regions in the European Union. Left panel: correlation between density and GDP, suggesting a strong correlation with a super-linear functional form as predicted by the model. A pure power-law fit to the data is also shown for illustrative purposes. Right panel: the correlation between population and GDP, this time showing a sub-linear functional form. However, the poor R² value suggests that raw population does not correlate as well as density with GDP growth in cities.

However, readers should also be aware of what this model does not address. While our model provides a first-principles basis for explaining the productivity of cities, higher-order variables such as transportation infrastructure are important when tailoring the model to specific cases. As an example, the density of social ties is intrinsically a function of the ease of access between residents living in the same city. Consider the case of Beijing, which has a very high population density but, due to its traffic jams, is currently de facto divided into many smaller cities with limited transportation capacity between them. Consequently, it may not demonstrate a higher social tie density than other cities with a much lower population density. Thus a direct comparison of the model predictions with a similarly dense area such as Manhattan needs to take this refinement into account. In keeping with the simplicity and bottom-up approach of our model, we chose to use data from cities within the United States and the European Union, so that extraneous variables are controlled for.

In examining call logs from UK cities, Eagle et al. [55] discovered that call log diversity, which measures how evenly individuals spend time across their social ties, is positively correlated with regions' socio-economic development. I believe that they are also describing another higher-order variable: for two cities with the same population density, the city whose interactions are more evenly distributed over all social ties (i.e., higher social diversity under Eagle et al.'s measure) can pass ideas more quickly than a city where social interactions are concentrated on a subset of ties.

Interestingly, Eagle et al. also suggest that interaction activity distributed more evenly across different geographical regions leads to higher economic success, which is precisely what a city means for economic development in our theory: we show in our study that interactions across different parts of a city are more evenly distributed than in areas outside the city.
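The social diversity measure of Eagle et al. discussed above can be sketched as a normalised Shannon entropy of one person's interaction volume across contacts, $D = -\sum_j p_j \ln p_j / \ln k$ with $p_j$ the share of interaction volume spent on contact $j$ and $k$ the number of contacts. The function name, the use of raw call volumes, and the convention of returning 0 for fewer than two contacts are assumptions of this sketch rather than details taken from [55].

```python
import math

def social_diversity(call_volumes):
    """Normalised Shannon entropy of interaction volume across a person's contacts.

    call_volumes: total call time (or number of calls) with each contact.
    Returns a value in [0, 1]; 1 means time is spread perfectly evenly.
    """
    volumes = [v for v in call_volumes if v > 0]
    k = len(volumes)
    if k < 2:
        return 0.0  # assumed convention: diversity undefined, report 0
    total = sum(volumes)
    entropy = -sum((v / total) * math.log(v / total) for v in volumes)
    return entropy / math.log(k)

print(round(social_diversity([10, 10, 10, 10]), 3))  # 1.0: perfectly even
print(round(social_diversity([37, 1, 1, 1]), 3))     # low: time concentrated on one tie
```

Averaging such scores over residents gives one possible per-city input for the "higher-order variable" argued for above.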

3.4 Conclusion: Data-driven Economic Measurements

This model provides the foundation for a new approach to city planning: for the first time, some aspects of the productivity and growth of a city can be measured and quantified simply by counting interactions between people. As we demonstrated in Chapter Two, this can be done with mobile phone interaction data.

Recall that in the Friends and Family Study, social interactions could be used to infer individuals' financial status. A large-scale data collection effort can provide better measurements of social tie density and allow us to better understand cities.

Technically, this is not difficult. Mobile carriers routinely allow researchers to analyze samples of the data streams from their mobile customers.

British Telecom allowed Eagle et al. [55] to study the relationship between calls and economic indices by sharing all call data from the UK.

More telecom companies are becoming aware of the value of their data: at the recent NetMob conference, Orange released a large dataset from its mobile operation in Côte d'Ivoire for a research competition [27] known as the D4D challenge. Some researchers are finding interesting results from this dataset [141].

Challenges in using big data for economic measurements

Despite the great promise of big data from mobile devices for economic measurement, I would like to offer some words of caution about such big-data econometrics. As a potential future direction, let us compare this big data approach with the Census Bureau. In the US, the Census Bureau releases many monthly statistics about the country's economic progress, such as retail sales [110]. If we were to replace the Census Bureau with mobile phone data, the statistical measurements from mobile devices would need to satisfy the following core requirements, which traditional Census Bureau econometrics meet well but mobile device datasets might not.

1. Short-term Variation: Many monthly figures reveal short-term changes in the economy. However, researchers working with mobile data often compute statistics over a multi-month window [27]. Whether mobile data can capture weekly and monthly changes in the economy is a major challenge for the future.

2. Temporal Comparability: The Census Bureau's numbers are comparable across their entire history. However, mobile devices are relatively new, and the technology and adoption of mobile devices have already changed dramatically over the years: new phones, networks and services are introduced continually. Since it is unrealistic to assume that mobile technology stays the same, it is safer to assume that statistics computed from this month's mobile data cannot be directly compared with statistics from the same month last year.

3. Risk and Error Tolerance: The Census Bureau's indices are stable and generally robust to many unexpected events. Human behavior measurements from sources such as mobile phone datasets are not. For instance, if we used social tie density and mobile calls to measure a city's economy, what would happen if a small earthquake hit? Because of the earthquake, people would be more likely to call friends and family to confirm each other's safety [6]. The additional social interactions we measured would then not be due to economic growth, and we would report a higher level of economic activity while traditional Census Bureau surveys would not.

For these reasons, I still believe in the value of traditional survey methods for understanding social and economic development, and I propose these challenges as future directions for this line of research.


Chapter 4

Social Trading: A Microscopic Look Into Idea Flow

4.1 Introduction

In the last chapter we discussed idea flow in cities, and found that, in general, more ties bring more opportunities. If this is true, should each individual simply seek more and more connections? Let us take another look at this idea from the macroscopic model of cities: in this chapter, we examine, at a microscopic scale, individual traders who are connected to other traders. I will show some interesting characteristics of financial decisions made under social idea flow and social influence.

4.1.1 Financial Systems As Social Systems

Interest in social influence and social dynamics has recently been growing with the rise of Computational Social Science [89]. There are increasing research efforts to measure and understand different social systems: online systems like Twitter, and real living systems ranging from discussion groups [153] and a dormitory community [2] to a New England town [38], to name a few.

One particularly interesting type of social system is the financial system.


Though financial systems have been analyzed predominantly with stochastic calculus models borrowed from physics (e.g., random walks), such as Black-Scholes [26], they are without doubt driven by the collective behavior of humans. Key aspects of social science, such as the existence of social influence and topological restrictions on information flow, are rarely discussed by finance researchers [130]. By applying social theories to the analysis of financial systems, we may be able to better explain many puzzling phenomena in the markets, such as overreactions and market crashes.

On the other hand, financial systems are among the best quantitatively documented systems, with transaction datasets at millisecond resolution. Although most of these data describe trades rather than networks, researchers can use newly developed tools [115] [24] to infer the underlying connectivity of financial systems from individual trades. In addition, new financial data with explicit social relationships are also becoming available [130].

From the literature, one may expect financial systems to be among the best crowd-wisdom systems, as researchers believe that real cash incentives are the driving force behind optimal, rational crowd wisdom [70] [11]. In simple words, when you are trading with a material amount of real money, you are more likely to pay attention and make your best rational judgements. Other online experiments, such as the health influence study by Centola et al. [35] and the music market study [131], are casual settings with minimal potential rewards: users are less likely to give their best thinking effort, and participants' instant intuitions and even poor user interface design can dramatically change decision behavior.

Because of this belief in monetary incentives, the foundation of finance research is the idea that price is an equilibrium among informed, fully rational individuals, a direct derivative of economics. Therefore, the study of finance often concentrates on key statistics such as the price of a security, which is an aggregated agreement among financial traders. If the price were a combination of equilibrium agreement and some behavioral and social characteristics of investors, the study of price would naturally become less meaningful. Many researchers argue that this is actually the case [25] [44] [17]. In cycles of bubbles and crashes, it is difficult to argue that price changes were built on top of some consistent equilibrium theory. This core struggle between reality and theory in finance is at the center of the great debate between two of the 2013 Nobel Economics laureates, Professor Fama and Professor Shiller [135].

Rather than looking at this great debate through aggregated data such as price and price-earnings ratio, I use our dataset to examine the financial market at its finest level: the individuals and their networks. The lack of previous studies on the social systems of markets may be due to the simple fact that such big data did not exist. Now these data enable us to study finance better, just as mobile data enable us to study cities better. Contrary to the efficient market theory, my thesis in this chapter attributes certain undesirable market movements to aggregated human social behavioral tendencies. I show, in the following research, that humans are benefitting from social interactions.

I also argue that the social behaviors documented in purely social studies of idea influence and idea diffusion also exist in financial systems, even when traders are under strong monetary incentives. The aggregation of these social behaviors leads to non-optimal financial results.

4.1.2 Relevant Work

Some finance researchers have recently started to explore the potential social component of individual financial traders [138] [77], and Heimer et al. suggest that social interaction does promote certain trading activities. Independently, I have been exploring the eToro social trading system since 2012 [118]. The main difference between my methodology and that of Heimer et al. is that I apply an existing computational social science framework to the study of individual traders, focusing on crowd wisdom, social interactions and the idea flow of financial social traders, while Heimer et al. focused on the connections between social interactions and trading returns. Interest in this line of research is growing, and many new papers continue to discuss other aspects of trading behavior with eToro data [137] [136].

A large body of research on behavioral finance exists, yet most of it is built around small lab experiments. Behavioral finance also lacks a social flavor, and mainly discusses individual psychological reactions to market movements [18].

4.2 eToro: The Social Trading Platform

Our data come from eToro (see http://www.etoro.com), an online discount retail broker for foreign exchange and commodity trading with easy-to-use buying and short-selling mechanisms and leverage of up to 400 times. In other words, eToro makes trading accessible and fun: any user can take both long and short positions with a minimum bid of a few dollars. eToro also magnifies the risk of each trade because users can be leveraged. As a result, traders sometimes lose more than 100% of their position value in a single transaction.

Readers should refer to Section 2.2 in Chapter Two for a detailed description of the eToro trading platform.

Our data comprise over 87.5 million trades from August 2010 to December 2013 (social trading features were launched on eToro in early 2011). Most of the trades are mirror trades, as shown in Table 4.1. The most traded instruments are generally currencies, followed by commodities and indices; the breakdown is reported in Table 4.2. Most importantly, the total capital in eToro is extremely small compared with the real trading volumes of these instruments, so eToro traders do not affect global market prices in any way with their trades (see the position size summary in Table 4.3). This is an important aspect of our discussion.

Readers may argue that eToro is a special type of financial market and that our results may not be representative. We believe that in the real financial world, even though market participants can usually chat with any other participant on open platforms such as Bloomberg instant chat, information flow, opinions and influence from peers, and the eventual trading decisions are often largely constrained by the network connections of traders, in a manner similar to the eToro user network.

Table 4.1: Number of trades in my dataset categorized by trading type.

Trading Type                               Total Trades
Individual                                 17.2 Million
Copy Others' Trade (Copy Trade)            0.39 Million
Mirror Others' Portfolio (Mirror Trade)    70 Million

Table 4.2: Number of trades in my dataset categorized by instrument (Top 15 are shown here).

Instrument    Total Trades
EUR/USD       23.2 Million
GBP/USD       11.8 Million
AUD/USD       11.8 Million
NZD/USD       11.8 Million
GOLD          4.4 Million
USD/CHF       4.1 Million
EUR/JPY       3.5 Million
USD/CAD       2.9 Million
GBP/JPY       2.7 Million
EUR/CHF       1.2 Million
CHF/JPY       1.1 Million
SILVER        1.1 Million
OIL           0.88 Million
SPX500        0.70 Million

4.3 Idea Flow for Optimal Trading

4.3.1 Idea Flow and Trading Performance

As we have discussed in Section 4, there are extraordinary benefits in large, urban-scale social interactions. If the idea flow within these social interactions is valuable to the success of individuals in their lives, shouldn't we observe such phenomena in the financial world as well?

This is indeed the case. In my 2012 paper [118], I discovered the following principle: the promise of eToro lies in its social features. In Fig. 4-1, we plot the average daily ROI (Return On Investment) of EUR/USD for all trades in each of the different categories. We find that different types of trades lead to different levels of returns (ANOVA p < 1e-10), and mirror trades actually generate a positive return (t-test, p < 0.005). At first glance, following other users in the crowd seems to be a simple way to make money, and social trading does outperform single trades.

Table 4.3: The position size of all trades.

Property                           Value
Average Leverage                   200
Average Cash Position per Trade    34.89

In fact, as I received more data from eToro, I realized that the benefits of social trading extend to other assets as well. To illustrate the idea, I select three instruments covering three different domains of trading: EUR/USD, OIL, and SPX500, which are the euro/US dollar currency trade, the crude oil commodity trade, and the S&P 500 index trade, respectively. In the real world, trading in these three domains has very different characteristics and strategies, yet I observe similar results for all three in Fig. 4-2.
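The group comparison behind Figs. 4-1 and 4-2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the thesis: the per-trade ROI definition (relative price move, sign-flipped for short positions and scaled by leverage) and the helper names `trade_roi` and `one_way_anova_f` are hypothetical, and the ROI values are made up. Only the F statistic is computed; its p-value comes from the F distribution with (k - 1, n - k) degrees of freedom.

```python
from __future__ import annotations

from statistics import mean


def trade_roi(open_price: float, close_price: float, is_long: bool,
              leverage: float = 1.0) -> float:
    """Hypothetical per-trade ROI: relative price move, signed by direction, times leverage."""
    move = (close_price - open_price) / open_price
    return leverage * (move if is_long else -move)


def one_way_anova_f(groups: dict[str, list[float]]) -> float:
    """One-way ANOVA F statistic: between-group mean square / within-group mean square."""
    values = [x for group in groups.values() for x in group]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy ROIs grouped by trade type (made-up numbers, not eToro data).
rois = {
    "mirror": [0.02, 0.01, 0.03],
    "copy": [-0.01, 0.00, -0.02],
    "single": [-0.05, -0.04, -0.06],
}
print({name: round(mean(r), 4) for name, r in rois.items()})
print(one_way_anova_f(rois))
```

If SciPy is available, `scipy.stats.f_oneway(*rois.values())` returns both the F statistic and its p-value.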

The results suggest two fundamental principles in trading with other people:

• There exist private information and superior traders. A lot of research in theoretical computational crowdsourcing, such as the various prediction market designs [76], takes private information as a key assumption. By narrowing our focus to trading a single instrument, which eliminates portfolio construction effects, we suggest that the private information assumption is indeed reasonable and essential.

• Idea flow helps traders. As I pointed out in the previous chapter, information flow provides money-making intelligence and opportunities for traders who choose to receive it. The superior knowledge, skills and information carried by mirroring and copying behaviors greatly benefit the overall trading performance of eToro users.

[Figure 4-1: ROI for different types of trades. Mean ROI for mirror trades, copy trades and single trades.]

Figure 4-1: The mean ROI for all trades data between 2010 and 2012. The returns of the three social types based on earlier data are significantly different from each other (ANOVA p < 1e-10), and mirror trades generate a significant positive return (t-test, p < 0.005).

[Figure 4-2: mean ROI of Individual, Copy and Mirror trades for EUR/USD, OIL and SPX500.]

Figure 4-2: The mean ROI for all trades of the three social types based on all new recent data for three instruments: EUR/USD, OIL and SPX500. For all three instruments, the returns of individual trades are different from those of copy trades (K-S test p < 0.001), the returns of individual trades are different from those of mirror trades (p < 0.00001), and the returns of copy trades and mirror trades are different from each other (p < 0.00001).

4.3.2 Social Dilemma in Structuring Idea Flows

In eToro, one of the best strategies for users is actually to mirror other traders' strategies, as seen in Fig. 4-2. User A can choose to mirror a more sophisticated user B, and eToro will then automatically copy and execute all of user B's future trades on behalf of user A. Naturally, this mirroring behavior implies strong trust of user A in user B, and the users who are mirrored and followed the most are likely to be the best traders.

Since the crowd in eToro can freely choose to copy and mirror any other user after reviewing users' performances, is the crowd able to find the best information and form the right network structure to achieve excellent trading returns? The importance of this question can also be framed as follows: we have individuals who are brilliant and able to foresee future financial developments, and we know that learning from these people makes us do better. Why, then, can't we create the right social interaction networks and pass the optimal ideas through them, rather than making big investment mistakes again and again?

We construct the following models for our study. For each user $u$ in the eToro database, we define the number of followers (those who mirror the user) at week $w$ as $f_{u,w}$. The number of newly added followers in week $w$ is denoted as $a_{u,w}$, and the number of people who used to follow but leave in week $w$ is $l_{u,w}$. Therefore, $f_{u,w} = f_{u,w-1} + a_{u,w} - l_{u,w}$. We use three performance metrics for each user: the mean ROI of all the trades executed by user $u$ between weeks $w_1$ and $w_2$ is $r_{u,w_1,w_2}$; the total cash profit the user made during this period is $p_{u,w_1,w_2}$; and the Sharpe ratio [134] of these trades during this period is $s_{u,w_1,w_2}$.
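The bookkeeping above can be sketched in a few lines. This is a hedged illustration rather than the thesis code: the helper names `update_followers` and `window_metrics`, the `(week, roi, cash_profit)` trade layout, and the Sharpe convention (mean ROI divided by the sample standard deviation of ROI, zero risk-free rate) are all assumptions.

```python
from statistics import mean, stdev


def update_followers(f_prev: int, added: int, left: int) -> int:
    """Follower bookkeeping: f_{u,w} = f_{u,w-1} + a_{u,w} - l_{u,w}."""
    return f_prev + added - left


def window_metrics(trades, w1, w2):
    """Return (r, p, s) for one user over weeks w1 < week <= w2.

    trades: iterable of (week, roi, cash_profit) tuples for that user.
    r: mean ROI, p: total cash profit, s: Sharpe ratio computed here as
    mean ROI / sample std of ROI with a zero risk-free rate (an assumption).
    """
    window = [(roi, profit) for week, roi, profit in trades if w1 < week <= w2]
    if not window:
        return 0.0, 0.0, 0.0  # no trades in the window
    rois = [roi for roi, _ in window]
    r = mean(rois)
    p = sum(profit for _, profit in window)
    sd = stdev(rois) if len(rois) > 1 else 0.0
    s = r / sd if sd > 0 else 0.0
    return r, p, s


# Hypothetical trades of one user: (week, ROI, cash profit).
history = [(1, 0.05, 10.0), (3, -0.02, -4.0), (10, 0.1, 25.0), (40, 0.3, 60.0)]
w = 40
print(window_metrics(history, w - 4, w))   # 4-week window
print(window_metrics(history, w - 12, w))  # 12-week window
print(window_metrics(history, w - 52, w))  # 52-week window
```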

We run the following two regressions. For the regression on followers who leave, we normalize the count by the total base of followers.

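\begin{align}
a_{u,w} \sim{} & \beta_1 p_{u,w-4,w} + \beta_2 r_{u,w-4,w} + \beta_3 p_{u,w-12,w} + \beta_4 r_{u,w-12,w} \tag{4.1}\\
& + \beta_5 s_{u,w-12,w} + \beta_6 p_{u,w-52,w} + \beta_7 r_{u,w-52,w} + \beta_8 s_{u,w-52,w} + \beta_9 f_{u,w-1} + \beta_0 + \epsilon \tag{4.2}
\end{align}

\begin{align}
\frac{l_{u,w}}{f_{u,w-1}} \sim{} & \beta_1 p_{u,w-4,w} + \beta_2 r_{u,w-4,w} + \beta_3 p_{u,w-12,w} + \beta_4 r_{u,w-12,w} \tag{4.3}\\
& + \beta_5 s_{u,w-12,w} + \beta_6 p_{u,w-52,w} + \beta_7 r_{u,w-52,w} + \beta_8 s_{u,w-52,w} + \beta_0 + \epsilon \tag{4.4}
\end{align}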
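A least-squares fit of Eqs. 4.2 and 4.4 can be sketched without external libraries by solving the normal equations. This is a hedged illustration, not the estimation code behind Table 4.4: the helper names `ols` and `adjusted_r2` are hypothetical, and a statistics package would additionally report the standard errors and significance levels shown in the table. For Eq. 4.2, each design row would be assumed to hold $p_{u,w-4,w}, r_{u,w-4,w}, \dots, s_{u,w-52,w}, f_{u,w-1}$ followed by a constant 1.0.

```python
def ols(X, y):
    """Least-squares coefficients for y ~ X.

    X: list of rows; each row must already contain a 1.0 column for the intercept.
    Solves the normal equations (X^T X) beta = X^T y by Gauss-Jordan elimination.
    """
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X^T X | X^T y]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))  # partial pivoting
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        if abs(pv) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col] = [v / pv for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


def adjusted_r2(X, y, beta):
    """Adjusted R^2; k counts every column of X, including the intercept."""
    n, k = len(X), len(X[0])
    preds = [sum(b * x for b, x in zip(beta, row)) for row in X]
    y_bar = sum(y) / n
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - k)
```

Solving the normal equations directly is the shortest self-contained route; for thousands of rows and nearly collinear regressors, a QR-based solver is numerically safer.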

Table 4.4: Linear regression results of the two regressions in Eq. 4.2 and Eq. 4.4. The first column is for newly added followers, and the second column is for followers who left. N/A indicates that this factor is not in the regression model. Standard errors are shown in parentheses. (*: p < 0.05, **: p < 0.01, ***: p < 0.001)

Regressors (rows): $p_{u,w-4,w}$, $r_{u,w-4,w}$, $p_{u,w-12,w}$, $r_{u,w-12,w}$, $s_{u,w-12,w}$, $p_{u,w-52,w}$, $r_{u,w-52,w}$, $s_{u,w-52,w}$, $f_{u,w-1}$; N; adj. $R^2$

Columns: New followers $a_{u,w}$ | Followers left $l_{u,w}$

9.09e-5

(8.88e 5)

-4.54e-5**

(1.59e-6)

-2.46

(2.22)

-0.012

(0.039)

6.08e-5

(7.08e-5)

24.50***

(4.13)

-0.041

(0.049)

2.71e-6

(1.27e-06)

-0.076

(0.074)

-0.0049 ***

(0.00089)

-5.78e-5

(2.95e-5)

-15.725

(7.93)

-0.17

(0.14)

-2.82e-6 ***

(5.29e-7)

-0.079

(0.14)

-0.018***

(0.0024)

0.11***

(0.0035)

3699

0.24

N/A

N/A

0.087

Based on the regression results in Table 4.4, we find that the behavior of following a target trader is heavily correlated with the recent (12-week) performance of the target trader, as well as with the existing number of followers the target trader has. On the other hand, leaving a target trader is generally correlated with that trader's long-term as well as short-term risk-adjusted return.

Some additional information: if we run the regression in Eq. 4.2 without the current number of followers, the adj. $R^2$ drops to 0.029. The number of followers is therefore the key driver of following behavior. We also run the unfollowing regression with the current number of followers included, and the result is shown in Appendix B. Although our dependent variable $l_{u,w}$ is already normalized by the total number of followers, Table B.1 shows that unfollowing behavior is still negatively correlated with the total number of followers $f_{u,w-1}$ of the trader being unfollowed. Including this variable also doubles the $R^2$ of our model. This suggests that users are even less likely to unfollow a trader with many followers.

While we will shortly discuss the theoretical background of these behaviors, I would like to first discuss one key question: if I were to harvest the best crowd wisdom, what would be the optimal strategy?

We study this problem by constructing so-called optimized trading strategies using the wisdom of the crowd of eToro traders. Our strategies are mainly informed by the actual behavior of eToro users. We construct three portfolio strategies, each rebalanced daily.

• Simple Best Long-Term Sharpe: The first strategy, referred to as the Simple Best LT strategy, is constructed by looking at the top t users on the eToro platform. For each trading day, we rank each user by their accumulated, continuously compounded risk-adjusted return (i.e., Sharpe ratio) up to that day, and execute the same trades as the top t users, dividing our capital evenly among them. If none of the top t users is trading on a particular day, we do not trade either.

• Simple Best Short-Term Return: The second strategy, referred to as the Simple Best ST strategy, is constructed by analogy to the Simple Best LT strategy. Everything remains the same except that we rank users by their return over the previous 12 weeks rather than by their long-term risk-adjusted return. This mirrors the actual behavior of eToro users.

• Social Best: As in the first two strategies, we pick the top t users, but we rank them by their number of followers rather than by their performance. The intuition behind this strategy is that the best users recognized by the crowd are the best users on the platform. We execute the same trades as the top t mirrored users, dividing capital evenly among them.

[Figure 4-3: return of the Simple Best - Long-Term Sharpe, Simple Best - Short-Term Return and Social Best strategies for the Top 10, Top 50, Top 100, Top 200 and Top 400 users.]

Figure 4-3: The return of the three strategies over the data trading period. We note that the Simple Best Short-Term Return strategy is actually the best way to generate the largest return.

We show the return (Fig. 4-3) and the Sharpe ratio (Fig. 4-4) of all three strategies, using the top 10 up to the top 400 users.
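One trading day of the three strategies can be sketched as below; the strategies differ only in the ranking metric. This is a hedged sketch, not the backtest behind Figs. 4-3 and 4-4: the function names and the dictionary inputs are hypothetical, and we assume "evenly dividing capital" means t equal slices, where the slice of a leader who does not trade that day stays in cash (so a day on which no leader trades returns 0, matching "we don't trade either").

```python
def top_t_users(scores, t):
    """The t users with the highest ranking score; ties broken by user id."""
    return sorted(scores, key=lambda u: (-scores[u], u))[:t]


def daily_strategy_return(scores, todays_returns, t):
    """One day of a 'copy the top t' strategy.

    scores: user -> ranking metric known before today. The metric defines the strategy:
      long-term Sharpe (Simple Best LT), previous 12-week return (Simple Best ST),
      or number of followers (Social Best).
    todays_returns: user -> that user's return today, only for users who traded today.
    Capital is split evenly into t slices (an assumption); the slice of a leader who
    does not trade today stays in cash, so if no leader trades the return is 0.
    """
    leaders = top_t_users(scores, t)
    return sum(todays_returns.get(u, 0.0) for u in leaders) / t


# Hypothetical rankings for one day under the three strategies.
long_term_sharpe = {"ann": 1.8, "bob": 0.4, "cai": 1.1}
short_term_return = {"ann": 0.05, "bob": 0.30, "cai": 0.12}
followers = {"ann": 120, "bob": 5400, "cai": 800}
today = {"ann": 0.01, "bob": -0.03}
for name, scores in [("Simple Best LT", long_term_sharpe),
                     ("Simple Best ST", short_term_return),
                     ("Social Best", followers)]:
    print(name, daily_strategy_return(scores, today, t=2))
```

Compounding these daily returns over the trading period gives a strategy's total return, and their mean divided by their standard deviation gives a (daily) Sharpe ratio.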

4.3.3 Discussion

The traders clearly rely on different criteria when deciding to follow or to unfollow another trader. We established at the very beginning of this section that idea flow is important in improving trading performance. However, the way traders leverage such crowd wisdom does not realize the full potential value of social trading. Traders unfollow another trader based on our standard and best financial performance metric, the long-term risk-adjusted return, but rarely follow other traders using this metric. It is important to emphasize that in eToro users can freely view the long-term and short-term performance of any trader, so our findings cannot be explained by the design of the eToro user interface. It is extremely interesting to discuss why this is the case.

[Figure 4-4: Sharpe ratio of the Simple Best - Long-Term Sharpe, Simple Best - Short-Term Return and Social Best strategies for the Top 10, Top 50, Top 100, Top 200 and Top 400 users.]

Figure 4-4: The Sharpe ratio of the three strategies over the data trading period. The best strategy is to look for individuals with the best long-term risk-adjusted performance. Therefore, the long-term Sharpe ratio seems to be the best metric for valuing a trader.

[Figure 4-5: log-log distribution of the number of followers per trader (mirror trades): data and a power-law fit with exponent α = 1.5.]

Figure 4-5: The distribution of the number of followers for each trader shows a strong power-law pattern.

In social research, especially recent work, we have already seen similar observations supporting social feedback theory. Our results match very well with a well-known cultural market study [131], in which the crowd votes for the best songs on an artificial online music-sharing website where previous user votes are recorded and displayed. The website has multiple independent "universes" to allow repeated studies. The researchers discovered that the "best" songs always rank at the top, but most songs have uncertain ranking outcomes across universes. In eToro, the top 10 best traders, whether recognized by accumulated performance or by crowd choice, generate comparable returns. However, when we further diversify our portfolio by including more of the top followed users, we start to realize that the crowd's selected experts have less certain skills, just like many of the top-ranked songs in the cultural market study.

We suspect that the process of expert elicitation on eToro is similar to a preferential attachment model [16], because eToro users are also shown the number of followers of every other user. We notice that the number of followers in Fig. 4-5 forms a distribution well fit by a power-law curve, which is a strong indicator of preferential attachment behavior [16].
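A rich-get-richer following process, and a power-law exponent estimate on its output, can be sketched as follows. This is a hedged illustration, not the procedure behind Fig. 4-5: the "+1" base attractiveness, the function names and all parameters are assumptions, and the estimator is the standard continuous maximum-likelihood formula α = 1 + n / Σ ln(x_i / x_min), which is only an approximation for integer follower counts. We do not know how the α = 1.5 in Fig. 4-5 was obtained.

```python
import math
import random


def simulate_preferential_following(n_arrivals, n_traders, seed=0):
    """Each arriving follower picks a trader with probability proportional to
    (current followers + 1), implemented with one 'ticket' per follower plus a base ticket."""
    rng = random.Random(seed)
    followers = [0] * n_traders
    tickets = list(range(n_traders))  # one base ticket per trader
    for _ in range(n_arrivals):
        chosen = rng.choice(tickets)  # uniform over tickets = proportional to followers + 1
        followers[chosen] += 1
        tickets.append(chosen)
    return followers


def power_law_alpha_mle(xs, x_min):
    """Continuous power-law MLE: alpha = 1 + n / sum(ln(x / x_min)) over x >= x_min."""
    tail = [x for x in xs if x >= x_min]
    return 1 + len(tail) / sum(math.log(x / x_min) for x in tail)


counts = simulate_preferential_following(n_arrivals=20000, n_traders=2000, seed=42)
print(max(counts), sorted(counts)[len(counts) // 2])  # a few hubs vs. the median trader
print(power_law_alpha_mle(counts, x_min=10))
```

The ticket list keeps each arrival O(1); recomputing weights for every trader at every arrival would make the simulation quadratic.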

Another potential reason for such observations may come from the idea of rational herd behavior, which can be found in the classic literature of Banerjee [15] and Bikhchandani et al. [23]. The central idea, as described by Hey et al. [78], is that traders assume others are in possession of private information. The mere fact that more people are following someone is good evidence of a strong crowd belief that the person being followed is superior to others. As most herd models suggest, this is a positive feedback loop: the more people follow someone, the more likely it is that even more people will follow this individual. This provides a theoretical explanation for observations in social network experiments such as Salganik et al. [131] and Centola et al. [36].

Even though eToro users have full information about a trader's performance simply by opening that trader's profile, I suspect that evolution is likely the force behind such "social stupidity" or "socially undesirable behavior", as termed by Hey [78]. Generally, we are unable to observe the full track records of financial advisors, and we often seek information from others to diversify our choices. In the real world, there does exist a significant amount of private information, and certain herd behavior gives us quick responses and decisions when we cannot find or verify information. For instance, imagine an ancient society: if everyone in your village starts to run, you will join them before verifying what is happening. Such a rapid response might save you from an invading army more quickly than spending time to observe the enemy coming.

Another potential reason is that it is simply impossible to go through many different individuals and evaluate each one's performance, as social science has shown that there is a limit to how many contacts one can maintain [71]. It is socially awkward, costly and nearly impossible for one to screen many others. Therefore, while the simple strategy we proposed can theoretically generate a great return, it would be hard for an individual to implement in real life.

We also see that humans have a strong appetite for short-term return, and with our Simple Best Short-Term Return strategy we find that such an approach actually yields a decent return, although not the best risk-adjusted return. After all, investors are rewarded by pure return rather than by a lack of risk, so this mirroring approach does produce the highest realized returns (Fig. 4-3).

4.4 Idea Flow and Idea Overflow

Besides mirroring other users, eToro also allows users to copy a single trade order from another user rather than every trade (the copy trade type). Users can log into the Open Book to look at other users' trades and click to copy one trade. If a user copies a trade, it implies that the user has come to the Open Book and consulted other users' trades: he or she made the decision based on his or her own judgment as well as the social influence of others. Readers should note that individual trades (trades users placed on their own in eToro) might also be influenced by others, as users may look at other trades and then enter a trade themselves rather than click the "copy" button. It should also be mentioned that copy trades are relatively uncommon (~2% of all transactions) in eToro.

4.4.1 Social Influence in Copy Trades

We define the following notation: for each day $d$, we compute the percentage of long orders among all single trade orders, denoted as $s_d$, and the percentage of long orders among all copy trade orders, denoted as $c_d$. We also refer to $s_d$ as the single market perspective, and to $c_d$ as the crowd market perspective.

We plot $s_d$ and $c_d$ in Fig. 4-6, reproduced from my initial paper [118]. Our discovery is surprising: while individual beliefs in buys and sells are fairly stable, speculation in copy trades is far more volatile ($\sigma^2$: 0.006 vs. 0.03, i.e. about five times the variance; F-test $p < 10^{-36}$). Therefore, with explicit social inputs, users tend to become more extreme rather than converge. Social influence seems to play an important role in driving market volatility.
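Computing $s_d$, $c_d$ and the variance ratio used in the F-test can be sketched as follows. This is a hedged sketch under assumptions: the `(day, kind, is_long)` record layout and the function names are hypothetical, and only the F statistic (ratio of sample variances) is computed.

```python
from collections import defaultdict
from statistics import variance


def daily_long_share(trades):
    """trades: iterable of (day, kind, is_long) with kind in {"single", "copy"}.

    Returns (s, c): dicts day -> share of long orders among single trades (s_d)
    and among copy trades (c_d).
    """
    counts = {"single": defaultdict(lambda: [0, 0]), "copy": defaultdict(lambda: [0, 0])}
    for day, kind, is_long in trades:
        tally = counts[kind][day]
        tally[0] += 1 if is_long else 0  # long orders
        tally[1] += 1                    # all orders
    s = {d: longs / total for d, (longs, total) in counts["single"].items()}
    c = {d: longs / total for d, (longs, total) in counts["copy"].items()}
    return s, c


def variance_ratio(c, s):
    """F statistic var(c_d) / var(s_d), using sample variances over days."""
    return variance(list(c.values())) / variance(list(s.values()))


# Two hypothetical trading days.
toy_trades = [(1, "single", True), (1, "single", False), (1, "copy", True),
              (2, "single", True), (2, "single", True), (2, "single", True),
              (2, "single", False), (2, "copy", False)]
s_toy, c_toy = daily_long_share(toy_trades)
print(s_toy, c_toy, variance_ratio(c_toy, s_toy))
```

With SciPy, the one-sided p-value of such a ratio would be `scipy.stats.f.sf(F, n_c - 1, n_s - 1)`, where `n_c` and `n_s` are the numbers of days in each series.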

The reason for such volatility lies in social influence: exposing someone to others' ideas will change that person's idea towards those of others.

To illustrate this idea, we plot $(s_d, c_d)$ for all days $d$ in a 2-D space, and the results are illustrated in Fig. 4-7. To better compare different asset classes, we again show the plots for all trade transactions on the EUR/USD currency, the OIL commodity, and the SPX500 index. Fig. 4-7 clearly suggests a linear correlation with a slope of roughly 1.5 (the fitted slopes range from about 1.4 to 1.8 across the three instruments).

In other words, explicit exposure to others' trades drives the market further from parity. For example, with a slope of 1.5 around parity, when 60% of the crowd individually decides to short an instrument, exposing the crowd to others' trading behavior would push about 65% of copy trades to the short side. If we believe that the average individual opinion of the market is the best price of the market, then social influence seems to encourage overreaction and to drive the market to extremes.
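The amplification arithmetic can be made concrete with a small sketch. It assumes, for simplicity, that the fitted line passes through the parity point (50% long in single trades maps to 50% long in copy trades), so that $c_d - 0.5 = k\,(s_d - 0.5)$; the fits in Fig. 4-7 may include a free intercept, and the function names are hypothetical.

```python
def fit_slope_through_parity(s_values, c_values, parity=0.5):
    """Least-squares slope k in (c_d - parity) = k * (s_d - parity), no free intercept."""
    num = sum((s - parity) * (c - parity) for s, c in zip(s_values, c_values))
    den = sum((s - parity) ** 2 for s in s_values)
    return num / den


def crowd_long_share(s, k, parity=0.5):
    """Predicted long share among copy trades, clipped to the valid range [0, 1]."""
    return min(1.0, max(0.0, parity + k * (s - parity)))


# 60% of single traders short (s_d = 0.4); with k = 1.5 the copy-trade long share is 0.35,
# i.e. about 65% of copy trades are short.
print(1 - crowd_long_share(0.4, k=1.5))
```

Any slope above 1 moves the crowd perspective further from parity than the individual perspective, which is the over-reaction described above.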

4.4.2 Excessive Copying and Idea Overflow

One of the most interesting questions in all crowdsourcing settings is how much external influence each individual should incorporate into his or her own idea (we later refer to this as a user's susceptibility level). One extreme is none: a user simply ignores any input from their social ties and sticks to his or her own idea. The other extreme is everything: a user abandons his or her idea and relies completely on others' ideas. In the game-theoretic literature this problem does not arise, as there is generally a rigid inference framework based on pre-set, well-defined priors and distributions [152]. However, this is not the case in reality, and I will now give some examples.

While users tend to go to these two extremes, we suspect that both extremes are generally not optimal [122]. One of the simplest ways to measure the susceptibility levels of traders is to see how likely the crowd is to look at others' trades.

[Figure 4-6: daily percentage of long trades (from all short to all long) for single trades and copy trades, Mar-11 to Dec-11.]

Figure 4-6: The single market perspective versus the crowd market perspective for all trading days for all EUR/USD trades. The x-axis for each of the three plots is the percentage of buy orders in all individual trades, also referred to as Single Market Perspective. The y-axis is the percentage of buy orders in all copy trades (referred to as Crowd Market Perspective).

In our eToro case, for a single instrument, if the total number of single trades on day $d$ is $n^{ind}_d$ and the total number of copy trades is $n^{copy}_d$, then we can define the social tendency level $t_d$ as $t_d = n^{copy}_d / n^{ind}_d$. We recognize that this social tendency level $t_d$ is different from the susceptibility level of each individual. Yet we argue that it serves as a reasonable approximation: if $t_d$ is large, we know that more people are actively looking into others' ideas before trading on their own, which further suggests that they will be more susceptible to social ideas.

[Figure 4-7: percentage of buys in copy trades against percentage of buys in single trades, with linear fits: EUR/USD (a = 1.8392, R^2 = 0.41872), OIL (a = 1.4212, R^2 = 0.63421), SPX500 (a = 1.6543, R^2 = 0.39754).]

Figure 4-7: The single market perspective versus the crowd market perspective for all trading days for three different asset classes: EUR/USD, OIL and SPX500. The x-axis for each of the three plots is the percentage of buy orders in all individual trades, also referred to as Single Market Perspective. The y-axis is the percentage of buy orders in all copy trades (referred to as Crowd Market Perspective).

What drives $t_d$? Let us focus on one instrument, EUR/USD, and collect the following independent variables for the regression.

The first variable is a global tendency indicator: $g_d$ is the overall trend for keyword searches on "ECB Rates" on Google Trends (roughly the count of searches containing this keyword). Google Trends has recently become a potential data source for financial purposes [126]. There are many other keywords relevant to the euro exchange rate, and the problem of weighting different keywords for EUR/USD is beyond the scope of this work. I explored a few different keywords, and during the euro crisis the keyword "ECB rate" seems to explain a high level of variation in $t_d$.

The second variable we suspect matters is volatility: $V_d$ is the implied volatility level for EUR/USD; in this case we use the value of EVZ, the CBOE implied EUR/USD volatility index.

The last variable we introduce is the entropy $e_d$ of the percentages of buys and sells in the single market perspective $s_d$.
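The two daily quantities can be computed as sketched below. This is a hedged sketch: `social_tendency` follows the definition $t_d = n^{copy}_d / n^{ind}_d$ given above, while the use of base-2 Shannon entropy for $e_d$ is our assumption (the text does not state the base; a different base only rescales the regression coefficient). Function names are hypothetical.

```python
import math


def social_tendency(n_copy, n_ind):
    """t_d = n_copy / n_ind for one instrument on day d."""
    return n_copy / n_ind


def buy_sell_entropy(s):
    """Shannon entropy (in bits) of the buy/sell split with long share s = s_d.

    1.0 at parity (s = 0.5) and 0.0 when every single trade is on one side.
    """
    if s <= 0.0 or s >= 1.0:
        return 0.0
    return -(s * math.log2(s) + (1 - s) * math.log2(1 - s))


# One hypothetical day: 1,000 single trades (62% long) and 30 copy trades.
print(social_tendency(30, 1000), buy_sell_entropy(0.62))
```

Low entropy means the single traders are mostly on one side, i.e. a directional market.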

We study the following linear regression for each day in our trading data:

$$t_d \sim \beta_1 g_d + \beta_2 V_d + \beta_3 e_d + \beta_0 + \epsilon \tag{4.5}$$

We also extend the same study to all GOLD trades, because GOLD is the most traded commodity in our list of most traded instruments (see Table 4.2).

Considering that there are many factors related to the gold commodity price [1], we decide not to include the Google Trends variable in this regression. We also use the CBOE GVZ gold volatility index as the volatility measure instead of EVZ. The regression results are presented in Table 4.5. As copy trades represent only a small portion of total trades, the data for other, less popular instruments on eToro, such as OIL and SPX500, are not sufficient to run any statistical tests.

Table 4.5: Linear regression results for different potential factors that might explain the social tendency level of copy trading for both EUR/USD and GOLD. (*: p < 0.05, **: p < 0.01, ***: p < 0.001)

                          EUR/USD       GOLD
$g_d$ (Google Trends)     0.00019***    N/A
                          (6.53e-5)     (N/A)
$V_d$ (Volatility)        0.00084**     -0.00024
                          (0.00017)     (0.00014)
$e_d$ (Entropy)           -0.18***      -0.038**
                          (0.010)       (0.011)
N                         305
adj. R^2                  0.50          0.038

The conclusion is that for EUR/USD, users seem to be heavily exposed to others' opinions when market volatility is high and the baseline trading activity is more directional. For commodity trading, by contrast, only the directional entropy is significantly associated with copy trading behavior.

For EUR/USD, we discover a strong correlation between certain Google Trends keyword counts and copy trading behavior. The more people seek information about these underlying instruments, the more likely they are to seek other people's ideas, and the more susceptible they become. We suggest that the discoveries of Preis et al. [126] are partially explained by our data: users' susceptibility levels can be measured by their Internet search behavior. A rise in susceptibility increases the correlation among the opinions of individual traders without any real new information. Such correlation due to idea overflow accelerates the movement of prices out of their normal rational bounds, which is often referred to as a bubble or a crash.

4.4.3 eToro Idea Overflow to Market Idea Overflow

We already know that eToro users have different levels of social susceptibility in different time periods. We also know that social influence is generally stronger in highly directional (low-entropy) periods. Therefore, I suspect that at times of high social susceptibility, eToro traders in aggregate might over-react to the market. We wonder whether such over-reaction extends to the whole population of traders.

We first define the market trend $t^i_d$:

$$t^i_d = p_d - \mathrm{avg}(p_{d-i:d}), \tag{4.6}$$

where $p_d$ is today's price, and $\mathrm{avg}(p_{d-i:d})$ is the average price over the previous $i$ days, with $i$ as the averaging window size. The averaging window is meant to eliminate high-frequency noise from the market trend. If $t^i_d$ is positive, the market has an upward trend; otherwise the market tends to go down.
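Eq. 4.6 translates directly into code. In this hedged sketch the function name is hypothetical, the window covers the $i$ days before day $d$ (not including day $d$ itself), and days without $i$ prior prices get no trend value.

```python
from typing import List, Optional


def market_trend(prices: List[float], i: int) -> List[Optional[float]]:
    """t^i_d = p_d - mean(p_{d-i}, ..., p_{d-1}); None until i prior prices exist."""
    trend: List[Optional[float]] = []
    for d, p in enumerate(prices):
        if d < i:
            trend.append(None)
        else:
            trend.append(p - sum(prices[d - i:d]) / i)
    return trend


# Hypothetical EUR/USD closes.
closes = [1.30, 1.31, 1.33, 1.32, 1.29, 1.28]
print(market_trend(closes, i=3))
```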

Let us now focus on EUR/USD. We correlate $c_d$ with $t^i_d$ for different values of $i \in \{1, 3, 5, 7, 14, 20\}$, and shift the trend days forward and backward to check whether eToro users' market perspective predicts or follows the market trend. In my early work [118], we found no significant correlation between the market trend and the individual perspective $s_d$ ($r < 0.02$, $p > 0.10$). However, we observed a strong correlation between the crowd market perspective and the next day's price trend ($r = -0.35$, $p < 0.1$).

The results are shown in Fig. 4-8 for different window sizes for market trends and different shifts.
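The shift scan behind Fig. 4-8 can be sketched as a lagged Pearson correlation. This is a hedged sketch with toy series and hypothetical function names; a positive shift pairs today's crowd perspective $c_d$ with a later day's trend $t^i_{d+\text{shift}}$, and days with a missing value on either side are dropped.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def lagged_correlation(c, trend, shift):
    """Correlate c_d with trend_{d+shift}; shift = +1 pairs today's crowd perspective
    with tomorrow's trend. Days with a missing (None) value on either side are dropped."""
    pairs = [(c[d], trend[d + shift]) for d in range(len(c))
             if 0 <= d + shift < len(trend)
             and c[d] is not None and trend[d + shift] is not None]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


# Scan shifts from -3 to +3 days (toy series, not eToro data).
c_toy = [0.6, 0.8, 0.4, 0.9, 0.3, 0.7, 0.5]
trend_toy = [None, -0.01, -0.03, 0.02, -0.04, 0.03, -0.02]
for shift in range(-3, 4):
    print(shift, round(lagged_correlation(c_toy, trend_toy, shift), 3))
```

Repeating the scan for each window size $i$ (using the trend series computed with that window) produces one curve per window, as in Fig. 4-8.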

Fig. 4-8 clearly shows that the strongest correlation is at $i \approx 10$ with tomorrow's market trend. As a result, social copy trades can actually be used to predict tomorrow's market trend. Also notice that the coefficient in Fig. 4-8 is negative, which means that the more long social trades there are today (long trades imply an expectation of market recovery), the more likely the market is to go down tomorrow relative to its weekly average. We plot both $c_d$ and $t^i_d$ (shifted by one day) with $i = 14$ in Fig. 4-9, and it is clearly visible that when $c_d$ goes to an extreme in one direction, $t^i_d$ usually goes in the opposite direction.

[Figure 4-8: correlation between longing copy trades and the market trend for 1-, 3-, 7-, 14- and 20-day averages, for shifts from -3 to 3 days.]

Figure 4-8: The correlation coefficient between $c_d$ and $t^i_d$ for different window sizes $i$, shifted from -3 days to 3 days. The largest correlation occurs when the market trend is one day after, with an average window size of 7 and 14 days.

[Figure 4-9: EUR/USD: social long trades predict next-day trend. Time series from Mar-11 to Dec-11.]

Figure 4-9: We plot the crowd market perspective together with the one-day-behind real market trend, smoothed by a one-week averaging window. We notice that the crowd market perspective is strongly negatively correlated with the next-day market price.

We can borrow ideas from mean reversion strategies to construct a trading strategy, rebalancing the portfolio daily. The trading strategy is as follows. We first define a parameter threshold: if at any time the social tendency level $t_d$ is more than threshold standard deviations above the mean of $t_d$, we consider that there is a strong possibility of over-reaction in the eToro group. We then trade in the direction opposite to $t^i_d$, and try to profit from the correction of the market. We hold the position for holding days before cashing out.
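The rule above can be sketched as follows. This is a hedged reconstruction, not the thesis backtest: the function names are hypothetical, the z-score uses the full-sample mean and standard deviation of $t_d$ as the text describes (a live version should use only past data to avoid look-ahead), positions start the day after the signal, a later signal overrides an earlier one, and the annualization by 252 trading days with a zero risk-free rate is an assumption.

```python
import math
from statistics import mean, pstdev


def social_mean_reversion_positions(tendency, trend, threshold, holding):
    """Daily positions (+1 long, -1 short, 0 flat) for the social mean-reversion rule.

    A signal fires on day d when t_d is more than `threshold` standard deviations
    above the mean of t_d (full-sample mean and std, as in the text). We then take
    the side opposite to the market trend on day d, starting the next day, for
    `holding` days. A later signal overrides an earlier one.
    """
    mu, sd = mean(tendency), pstdev(tendency)
    pos = [0] * len(tendency)
    for d, (t_d, tr) in enumerate(zip(tendency, trend)):
        if sd > 0 and tr is not None and tr != 0 and (t_d - mu) / sd > threshold:
            side = -1 if tr > 0 else 1
            for e in range(d + 1, min(d + 1 + holding, len(pos))):
                pos[e] = side
    return pos


def strategy_returns(pos, prices):
    """Return on day d = position held on day d times the price change from d-1 to d."""
    return [pos[d] * (prices[d] / prices[d - 1] - 1) for d in range(1, len(prices))]


def sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio with a zero risk-free rate (252 trading days assumed)."""
    sd = pstdev(returns)
    return mean(returns) / sd * math.sqrt(periods_per_year) if sd > 0 else 0.0
```

For example, a spike in `tendency` on a day with a positive trend produces a short position over the following `holding` days, which profits if the price then falls back.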

The key assumption of this trading strategy is that the whole world behaves similarly to eToro traders: when eToro users seek more social advice, many traders in the real world will do so as well, and when eToro traders are influenced by social advice, so are traders in the real world. We show the Sharpe ratio for different settings of threshold and holding in Fig. 4-10 below.

Our trading strategy generates a reasonable return, with a Sharpe ratio above 1, as shown in Fig. 4-10. We also use standard in-sample and out-of-sample testing to avoid overfitting: we fit threshold and holding on the 2011 and 2012 data, and apply the trained parameters to the rest of the data (mainly 2013). Overall, our trading strategy performs well in both phases, as shown in Table 4.6.
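The in-sample/out-of-sample protocol can be sketched independently of the strategy itself. In this hedged sketch, `score_on(start, end, threshold, holding)` is a hypothetical caller-supplied backtest returning a Sharpe ratio for days `[start, end)`, and `split_day` stands for the boundary between the fitting period (2011 and 2012) and the test period (mainly 2013).

```python
from itertools import product


def grid_search(score, thresholds, holdings):
    """(threshold, holding) pair with the highest score(threshold, holding)."""
    return max(product(thresholds, holdings), key=lambda params: score(*params))


def in_and_out_of_sample(score_on, split_day, n_days, thresholds, holdings):
    """Fit parameters on days [0, split_day) and evaluate them unchanged on [split_day, n_days).

    score_on(start, end, threshold, holding) is a caller-supplied function that
    backtests the strategy on days [start, end) and returns its Sharpe ratio.
    """
    best = grid_search(lambda th, ho: score_on(0, split_day, th, ho), thresholds, holdings)
    in_sample = score_on(0, split_day, *best)
    out_of_sample = score_on(split_day, n_days, *best)
    return best, in_sample, out_of_sample
```

The key design choice is that the out-of-sample period never influences the parameter choice, so its Sharpe ratio is an honest estimate of performance on unseen data.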

Similarly, we apply the same technique to create a gold trading strategy. The results are shown in Fig. 4-11.

Table 4.6: In-sample and out-of-sample testing for my social mean reversion strategy.

Sharpe Ratio     EUR/USD   GOLD
In Sample        1.064     2.382
Out of Sample    1.036     1.029

4.4.4 Conclusion: Bubbles and Overflow

The success of this over-reaction strategy suggests that by measuring the social interactions of a small group, we are able to predict over-reactions of the overall market. If we agree that bubbles and crashes in the financial market are some kind of over-reaction and mispricing by market participants, then the observations on eToro can provide further insight into the formation of bubbles.

[Figure 4-10 plot: Sharpe ratio surface; axes: Threshold (in z-score) and Holding Time (in days)]

Figure 4-10: We show the Sharpe ratio of our strategy using different values for threshold and for holding days. The strategy shows a potential return from betting on market mean reversion one day after eToro traders show a sudden, significantly large copy-trading tendency.

My work confirms the existence of a positive feedback mechanism, as Sornette et al. [142] suggest. In our regression, we show that people seek social answers more urgently when the market is directional. We thus provide micro-level evidence for the macro-level phenomenon of over-reaction.

There exist other explanations for over-reactions of the market. One argument is agent-based: bubbles and crashes can be generated if speculative trend-following agents have a stronger presence than fundamental trading agents in financial markets [157] [96]. However, these models assume that agents are either completely technical and information driven, or completely speculation driven. We know that as humans we belong to neither. Our approach suggests a homogeneous set of individual traders. We show that the underlying assumption of trading on given information is unrealistic. When traders are presented with information from the news, from the mass media, or from others, it is impossible for them to determine the value and authenticity of all the information.

[Figure 4-11 plot: Sharpe ratio surface; axes: Threshold (in z-score) and Holding Time (in days)]

Figure 4-11: We show the Sharpe ratio of our mean reversion strategy based on eToro traders' social tendency on the GOLD instrument, using different values for threshold and for holding days. The strategy shows a potential return from betting on market mean reversion three days after eToro traders show a sudden, significantly large copy-trading tendency.

When presented with solid, material information, such as, hypothetically, a war between two large countries, one can expect all traders to start selling without hesitation. However, this type of information is not the prevailing type in the market. Ideas and information flow through the financial market, and the key, as suggested in this research, is to study how users respond to this information and influence. I believe that understanding and monitoring such responses is key to preventing financial market crises.

4.5 Discussion

We study two major micro-level factors of social idea flow: how networks are constructed and how users adopt external information.

On the first point, we find that the well-documented preferential attachment process driven by social influence exists in social trading systems as well. Therefore, even under strong economic incentives, traders do not necessarily act optimally. Rather, they still follow their social intuition, and decide to trust the people who are trusted the most by others. Such ties of idea flow increase the correlation of trading strategies, as one individual's idea can change many others'. However, traders do re-evaluate ties during their mirroring period, and they tend to ignore social influence in these re-evaluations. This dynamic process both creates and eventually ends herding behavior.

The second important point is that in real settings, users can often decline to adopt ideas from social ties. We show that, in general, social influence in financial trading is strong, and that it is stronger when the market moves in a more volatile and more extremely directional fashion. We might also be able to measure the strength of global social influence using Google Trends search-volume data for key words. We find that the social susceptibility measured on eToro can be extrapolated to predict and measure the global financial market.


Chapter 5

Conclusion

This dissertation covers new techniques for the measurement and analysis of economic and financial systems: I started with the community of the Friends and Family Study, then moved to city-scale analysis, and finally to traders' networks in the financial market.

I have shown how data from mobile devices, Internet, and social networks can be applied to better understand these economic systems.

5.0.1 Future Directions

The key concept of reality hedging is a new approach for understanding financial and economic systems. In classical economic research, we measure financial systems by macro indicators such as price, earnings, etc. [34], and we measure socio-economic systems by their total output, such as GDP, infrastructure construction, unemployment, etc. [93]. This thesis, however, examines a core part of these systems that such measures miss: I suggest that a crucial part of human society, the interactions and flows of information among people, is useful but often neglected in mainstream work.

Just as experiments and data have enabled physicists to build new models of matter, new big data developments provide better data for understanding our society.

We are the first generation of economic researchers able to observe the behaviors of millions of individuals in economic and social systems. Reality Hedging is the idea of using these data to understand these systems better, and to provide valuable tools for explaining problematic phenomena that traditional rational economic models cannot reasonably explain.

The applications of reality hedging are broad.

For city developers, measuring cities by how easily people can commute, meet and work with others face-to-face can be extremely valuable.

As we showed in our paper [119], cities such as Beijing are de facto several cities attached to each other because of the terrible traffic situation, which may prevent them from harvesting the superlinear productivity growth benefits of their continuously growing population density.

For market regulators and investors, by understanding the collective behavior tendencies of market participants, we can better understand the underlying forces for price movements in the markets, and estimate the efficiency and risks in the market.

Therefore, we are able to measure bubbles even without price information.

For governments and policy makers, reality hedging can also be valuable. In Appendix A, I present another example of reality hedging: how can we use social media to understand political incentives? I show how a practical challenge with a social incentive involving tens of thousands of people was developed, and how my research was able to assess the effectiveness of an economic incentive structure by monitoring social interactions. This is another great application of reality hedging.

5.0.2 Caveat

Readers should note that there are obvious caveats to reality hedging as presented in this dissertation. I discussed the limitations of each chapter in some of my published works [119] [124] [118] [2].

One key takeaway is that these new data and new measurements supplement existing measurements and models rather than replace them. We discussed in the urban paper that the economic development of cities also depends largely on factors such as infrastructure, government, education, and many others [119]. Simply increasing the social tie density of a poor African city will not create the same effects as in an American city. In financial markets, price is driven by important fundamentals and the rational inference of all available information: no matter how good our social dynamics model, reality hedging cannot predict the next move of the Federal Reserve or the next war. These types of information will ultimately change the market price regardless of what social interaction models suggest.

5.0.3 Privacy Concern

The most important question in this big data movement is the protection of privacy.

There is no doubt that privacy is important in all areas of data research, not just the research in this dissertation.

There can be potential danger in the massive collection of personal data. Since the benefits are tremendous, as I demonstrated in this dissertation, it would make much more sense to better regulate data sharing than to completely ban the sharing and analysis of user data. In addition to strict data anonymization and other privacy technologies, I think the right direction is to give individuals ownership and control over their own data, as Pentland suggests [121].

Some practical systems have already emerged to help solve these privacy problems [47].

The privacy concerns raised by big data research often prohibit the most fundamental practice of any scientific work: the reproduction and verification of research results. Due to the privacy concerns of the owners of many big data platforms, researchers are often not allowed to share the datasets used in their papers published in academic journals and conferences. One instance among many is research from Facebook [14] [29].

While the subject of these works is among the most interesting online social systems, it is difficult for other researchers to gain access to such datasets or research platforms. However, there is strong reasoning behind this practice, as some researchers have shown that so-called anonymous data from mobile phones and other big datasets can easily be re-identified [46] [108].

More research is absolutely necessary in these sensitive areas of data sharing and anonymization. Some theoretical work on verification without sharing data might be a potential solution [127].


Appendix A

Red Balloons: Using Social Media Data for Monitoring a Large-scale Social Incentive Task

A.1 Introduction

This appendix illustrates one practical application of Reality Hedging. I show how to use social media to monitor a policy incentive in a real crowdsourcing task. This example illustrates the benefits of using social media responses to monitor and analyze the effectiveness of an innovative peer-to-peer (p2p) incentive policy on social systems.

A.1.1 Social Incentive

With the rise of social networks, we have started to see an increasing number of interesting economic works that build microeconomic theories on social interactions. For instance, Ankur Mani et al. have provided new and interesting frameworks based on such ideas [100]. In fact, many utility companies in both the US and Europe have started to use this idea of peer pressure to encourage energy-efficient lifestyles.

One example is FunfFit, a peer influence incentive that I conducted in the Friends and Family Study [3] to encourage the adoption of a healthy and active lifestyle. Between October and December 2010, an intervention was presented to participants as a wellness game to help them increase their daily physical activity levels, as measured by the accelerometer sensors in their smartphones. 108 of the 123 subjects active at the time elected to participate. Subjects were divided into three experimental conditions: Control, Peer-View, and Peer-Reward. Following an initial period in which baseline activity levels were collected, all intervention subjects were given feedback on their performance in the form of a monetary reward, R, calculated as a function of their activity. A reward of up to $5 was allocated every three days.

In the control condition, subjects saw only their own progress on their phones, and their reward depended only on their own activity. "Peer-View" subjects were shown their own progress and the progress of two "Buddies" in the same experimental group, who were assigned to them before the experiment. In turn, each subject's progress was visible to two other peers in the same experimental group. Each subject's reward still depended on his or her own activity. "Peer-Reward" subjects were shown their own progress as well as that of two Buddies, but this time their rewards depended solely on the performance of their Buddies. See Fig. A-1 for an illustration of these different mechanisms.
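The payment rules of the three conditions can be written down as a small Python sketch. The text states only that R was a function of activity, with up to $5 allocated every three days; the linear form and rate of `capped_linear`, and averaging over the two Buddies in the Peer-Reward condition, are hypothetical choices made purely for illustration.

```python
def capped_linear(activity_gain, dollars_per_unit=0.05, cap=5.0):
    """Hypothetical reward function: linear in activity gain, floored at $0, capped at $5."""
    return max(0.0, min(cap, dollars_per_unit * activity_gain))


def funffit_reward(condition, own_activity, buddy_activities, reward_fn):
    """Reward for one three-day period under the three experimental conditions."""
    if condition in ("control", "peer-view"):
        # Control and Peer-View: paid on one's own activity only
        return reward_fn(own_activity)
    if condition == "peer-reward":
        # Peer-Reward: paid solely on the Buddies' activity (averaging is an assumption)
        return sum(reward_fn(a) for a in buddy_activities) / len(buddy_activities)
    raise ValueError(f"unknown condition: {condition}")


print(funffit_reward("peer-reward", 40, [100, 0], capped_linear))  # 2.5
```

Control and Peer-View share the same payment rule and differ only in what subjects see; Peer-Reward changes whose activity is paid for.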

[Figure A-1 panels: Control; Peer-See; Peer-Reward]

Figure A-1: A demonstration of the three mechanisms we use to increase physical activity. Left: Control, where subjects are paid by their own performance. Middle: "Peer-View", where subjects see their peers' performance but are paid by their own performance. Right: "Peer-Reward", where subjects are paid by their peers' performance.

Measuring the increase in activity level per reward dollar, we found that "Peer-Reward" performs much better. "Peer-See" generates twice the increase in activity per reward dollar of the Control condition, and "Peer-Reward" doubles that of "Peer-See" [3], all with p < 0.001. Social relationships can therefore greatly leverage the intrinsic value of small monetary rewards, as we showed in the small Friends and Family Study community.

A.1.2 Large-scale Social Incentive

One of the challenges of these research projects is that they are all at a small scale.

It is easy to monitor and survey a small population to study an incentive. What if we want to design some planet-scale new social incentive programs to encourage certain activities for the benefits of the society? How could we understand the responses of the society with respect to a special new social incentive/policy?

This example is one of the first large-scale social incentive experiments for an important crowdsourcing task, and I will focus on how we actually used big data from the Internet to better understand the incentive we developed. I believe this is the starting point of larger research initiatives to better monitor and understand economic policies at large scale and in the long run.

The story we are going to tell here is how I, together with the team, won the DARPA Network Challenge, a worldwide research competition hosted by DARPA to celebrate the 40th birthday of the Internet and to facilitate social media research. However, the focus of this appendix will not be winning the challenge itself, but how social media helped us monitor and analyze the effectiveness of our winning strategy. Details can be found in Pan et al. [124].

A.2 Background

A.2.1 The Challenge

Recognizing the difficulty of time-critical social mobilization, the Defense Advanced Research Projects Agency (DARPA) announced the DARPA Network Challenge. The announcement, which coincided with the 40th anniversary of the first remote log-in on the ARPANET (considered the "birthday" of the Internet), was made at the University of California, Los Angeles on October 29, 2009. Through this challenge, DARPA aimed to "explore the roles the Internet and social networking play in the timely communication, wide-area team-building, and urgent mobilization required to solve broad-scope, time-critical problems."

The challenge was to provide the coordinates of ten red weather balloons placed at different locations in the continental United States. The distinctive look of the weather balloons was announced a month in advance on the DARPA website. On the day of the competition, all balloons would be put out and displayed in populated areas for a full day, with DARPA officers next to each balloon to hand out its exact geo-coordinates. The first team to report the exact locations of all ten balloons would win the competition. According to DARPA, "a senior analyst at the National Geospatial Intelligence Agency characterized the problem as impossible" by conventional intelligence gathering methods.

A.2.2 Our Winning Strategy

According to the DARPA report, between 50 and 100 serious teams participated in the DARPA Network Challenge, out of a total of 4,000 teams [50]. Moreover, approximately 350,000 people participated in the DARPA Network Challenge in various ways, ranging from searching for balloons to simply being aware of the challenge and willing to report a balloon if spotted.

The MIT Team, which won the challenge [49], completed it in 8 hours and 52 minutes. In the approximately 36 hours before the beginning of the challenge, the MIT Team was able to recruit almost 4,400 individuals through a recursive incentive mechanism.

The MIT Team's approach was based on the idea that achieving large-scale mobilization towards a task requires diffusion of information about the tasks through social networks, as well as incentives for individuals to act, both towards the task and towards the recruitment of other individuals.


We consider the MIT Team's approach to the DARPA Network Challenge to be an instance of a more general class of mechanisms for distributed task execution.

We define a diffusion-based task environment, which consists of the following: $N = \{a_1, \ldots, a_n\}$ is a set of agents; $E \subseteq N \times N$ is a set of edges characterizing social relationships between agents; $\Psi = \{\psi_1, \ldots, \psi_m\}$ is a set of tasks; $P : N \times \Psi \to [0, 1]$ returns the success probability of a given agent in executing a given task; and $B \in \mathbb{R}$ is the budget that can be spent by the mechanism.

In a diffusion-based task environment, unlike in traditional task allocation mechanisms (e.g. based on auctions), agents are not aware of the tasks a priori. Instead, they become aware of tasks as a result of either (1) being directly informed by the mechanism through advertising; or (2) being informed through recruitment by an acquaintance agent [151]. Another characteristic of diffusion-based task environments is that, when a task is completed, the mechanism is able to identify not only the agent who executed it, but also the information pathway that led to that agent learning about the task. The pathway leading to the successful completion of task $\psi_i$ is captured by the sequence $S(\psi_i) = (a_1, \ldots, a_r)$ of unique agents, where $a_r$ is the agent who completed the task, $a_r$ was informed of the task by $a_{r-1}$, and so on up to agent $a_1$, who was initially informed of the task by the mechanism. By slightly overloading notation, let $|S(\psi_i)|$ denote the length of the sequence (i.e. the number of agents in the chain), and let $a_j \in S(\psi_i)$ denote that agent $a_j$ appears in sequence $S(\psi_i)$.
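Recovering $S(\psi_i)$ amounts to walking recruitment links backwards from the agent who completed the task. In the Python sketch below, the `recruited_by` dictionary (each agent mapped to the agent who informed it, or to None for agents targeted directly by the mechanism) is a hypothetical record format; the example network follows Figure A-2.

```python
def pathway(finder, recruited_by):
    """Return S(psi) = (a_1, ..., a_r) ending at the agent `finder` who completed the task.

    recruited_by maps each agent to the agent who informed it of the task,
    or to None if the mechanism informed it directly (e.g. via advertising).
    """
    chain = [finder]
    seen = {finder}
    while recruited_by.get(chain[-1]) is not None:
        informer = recruited_by[chain[-1]]
        if informer in seen:  # agents in S(psi) must be unique
            raise ValueError("recruitment records contain a cycle")
        chain.append(informer)
        seen.add(informer)
    return tuple(reversed(chain))  # a_1 first, finder last


# Recruitment links from Figure A-2: a1 recruits a2, a5, a8; a8 recruits a6; a2 -> a3 -> a4.
recruited_by = {"a1": None, "a2": "a1", "a5": "a1", "a8": "a1",
                "a3": "a2", "a4": "a3", "a6": "a8"}
print(pathway("a6", recruited_by))  # ('a1', 'a8', 'a6')
```

The length of the returned tuple is $|S(\psi_i)|$, which the recursive incentive mechanism uses to set payments.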

We can now define a class of mechanisms that operate in the above setting. A diffusion-based task execution mechanism specifies the following: $I \subseteq N$ is a set of initial nodes to target (e.g. via advertising); $p_i$ is the payment made to agent $a_i$; such that the following constraint is satisfied: $c|I| + \sum_{i \in N} p_i \le B$.

In words, the mechanism makes two decisions. First, it decides which nodes to target initially via advertising. Second, it decides on the payment (if any) to be made to each agent. The mechanism must do this within its budget B.

In the DARPA Network Challenge, each $\psi_b$ represents finding a balloon, and $v(\psi_b) = 4{,}000$ for all $\psi_b \in \Psi$. Moreover, we assume that the ten tasks are all identical (namely, finding a balloon) and indistinguishable: $\forall a_i \in N$, $\forall \psi_k, \psi_l \in \Psi$, we have $P(a_i, \psi_k) = P(a_i, \psi_l)$. That is, the success probability of a particular agent is the same for all balloons.

We are now ready to define the MIT Team mechanism, referred to as a recursive incentive mechanism.

Figure A-2 illustrates how this mechanism works.

[Figure A-2 panels: (a) Example social network. (b) Recruitment tree with two paths (shown in thick lines) initiated by $a_1$ that led to finding balloons.]

Figure A-2: Recursive incentive mechanism: (a) Suppose that in this network, agent $a_1$ recruits all of his neighbors, namely $a_2$, $a_5$ and $a_8$. Suppose that $a_8$ recruits $a_6$, who finds balloon $\psi_1$. (b) We have a winning sequence $S(\psi_1) = (a_1, a_8, a_6)$ with $|S(\psi_1)| = 3$. The finder receives $p_6 = v(\psi_1)/2^{(3-3+1)} = 2{,}000$. Since $a_8$ recruited $a_6$, $p_8 = v(\psi_1)/2^{(3-2+1)} = 1{,}000$. From this sequence, $a_1$ receives $v(\psi_1)/2^{(3-1+1)} = 500$. Likewise, looking at the left recruitment path, we have a winning sequence $S(\psi_2) = (a_1, a_2, a_3, a_4)$ with $|S(\psi_2)| = 4$. The finder receives $p_4 = v(\psi_2)/2^{(4-4+1)} = 2{,}000$. As above, we have $p_3 = v(\psi_2)/2^{(4-3+1)} = 1{,}000$ and $p_2 = v(\psi_2)/2^{(4-2+1)} = 500$. From this sequence, $a_1$ receives $v(\psi_2)/2^{(4-1+1)} = 250$. Adding up its payments from the two sequences it initiated, $a_1$ receives a total payment of $p_1 = 750$. Assuming there are only two tasks, the surplus in this case is $S = (4{,}000 - 3{,}500) + (4{,}000 - 3{,}750) = 750$.
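The payment rule in Figure A-2 can be reproduced with a short Python sketch: on a winning sequence of length $n$, the agent at position $j$ receives $v(\psi)/2^{(n-j+1)}$, and whatever remains of $v(\psi)$ is surplus (given to charity in the figure). The function name and the use of exact fractions are our own choices; the usage lines reproduce the numbers in the caption.

```python
from collections import defaultdict
from fractions import Fraction


def recursive_payments(paths, value=4000):
    """Payments of the recursive incentive mechanism over all winning sequences.

    Each path is S(psi) = (a_1, ..., a_n) for one completed task. The agent at
    1-based position j receives value / 2**(n - j + 1): the finder gets half,
    its recruiter a quarter, and so on back to a_1. The unpaid remainder of
    `value` for each task is accumulated as surplus.
    """
    pay = defaultdict(Fraction)
    surplus = Fraction(0)
    for path in paths:
        n = len(path)
        spent = Fraction(0)
        for j, agent in enumerate(path, start=1):
            amount = Fraction(value, 2 ** (n - j + 1))
            pay[agent] += amount  # an agent on several paths is paid on each
            spent += amount
        surplus += value - spent
    return dict(pay), surplus


paths = [("a1", "a8", "a6"), ("a1", "a2", "a3", "a4")]
pay, surplus = recursive_payments(paths)
print(pay["a1"], pay["a6"], surplus)  # 750 2000 750
```

Because the payments along any chain form a geometric series, total payouts per task always stay below $v(\psi)$, however long the chain.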

A.2.3 Academic Significance of Our Win

As Tang pointed out, "the Challenge demonstrated that geospatial intelligence is potentially available to anyone with an Internet connection, not just to government intelligence analysts" [145]. The highly publicized DARPA challenge suggested that the crowd is very powerful if utilized correctly.

What surprised me was the overwhelming interest from the game theory and mechanism design community in the Red Balloon Challenge. While game-theoretic social incentive ideas existed before the DARPA challenge [84], winning the DARPA Challenge with a scientifically sound strategy is one of the very few successful studies of the actual performance of a game-theoretic financial incentive [90]. This work has also inspired many new theoretical [85] and practical ideas [129] in crowd-based mechanism design.

All of this academic recognition rests on the legitimacy of the MIT strategy.

The DARPA Challenge was not a well-controlled experiment, and it was a one-time contest. Fellow scholars raised many research questions about whether we were simply "lucky" to win the challenge. The academic world also asked us to provide a quantitative approach for measuring the effects of the financial incentive. We turned to the analysis of big social data to tackle these problems.

A.3 Using Tweets to Analyze the MIT Incentive Structure

DARPA announced the balloon challenge on October 29, 2009, and launched the balloons on the morning of December 5th; all balloons were found on the same day. Since neither the challenge nor our strategy implementation was designed as an experiment, our ability to perform empirical analysis is limited: there was simply no control group and no A/B test during the competition.

I managed to partially solve this problem. The richness of social media data gave us a unique opportunity to study the effectiveness of our strategy, which would not have been possible had we entered the DARPA competition ten years earlier. While we were busy participating in the competition, researchers on the west coast, by pure chance, were working on another project collecting public tweets on the web [156]. This dataset was eventually passed to us, and became the cornerstone of our analysis. Here we discuss the two questions raised most often by other researchers, both of which we answered using the social media data:

* How does our strategy compare to other strategies?
* How large a role did other factors (mainly the reputation of the MIT name) play in our winning strategy?

A.3.1 Comparison with Other Strategies

Our first task here is to use the social media data to show the difference between our strategy and the strategies of other teams.

To provide a qualitative comparison of diffusion between the MIT team and other teams, we analyzed data from the micro-blogging site Twitter. We obtained around 100 million tweets for the period between November 10th and December 9th. This dataset covers an estimated 20-30% of all public tweets for that period [156]. First, we filtered out all tweets except those containing the string "balloon" (case-insensitive). For each team in our analysis, we counted as tweets about the team those that include any of the following: the team name (e.g. MIT Red Balloon Team, I Spy A Red Balloon, DeciNena, etc.), the team website (e.g. balloon.mit.edu, ispyaredballoon.com, etc.), the hashtag for the team (e.g. #mitredballoon, #ispyaredballoon, #decinena, etc.), a short link for the team website (e.g. bit.ly/5chum7, etc.), or the team's affiliation name, including its abbreviation (e.g. mit, gatech, George Hotz, Geocachers, etc.). In Figure A-3, we show the tweet counts for five teams, ordered by their final position in the competition [48].
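The filtering and counting steps can be sketched in Python. The keyword lists below are an illustrative subset of the examples above, and matching keywords only when they are not embedded in a longer word (so that "mit" does not match "submit") is our assumption, since the text does not say how short abbreviations were matched.

```python
import re
from collections import Counter
from datetime import date


def mentions(text, keyword):
    """True if `keyword` occurs in `text` and is not embedded inside a longer word."""
    return re.search(r"(?<!\w)" + re.escape(keyword) + r"(?!\w)", text) is not None


def daily_team_counts(tweets, team_keywords):
    """Count, per team and per day, tweets that mention 'balloon' and any team keyword.

    tweets        -- iterable of (date, text) pairs
    team_keywords -- {team: [lower-case names, URLs, hashtags, short links, affiliations]}
    """
    counts = {team: Counter() for team in team_keywords}
    for day, text in tweets:
        lowered = text.lower()
        if "balloon" not in lowered:  # case-insensitive pre-filter
            continue
        for team, keywords in team_keywords.items():
            if any(mentions(lowered, k) for k in keywords):
                counts[team][day] += 1  # a tweet counts at most once per team
    return counts


tweets = [
    (date(2009, 12, 4), "Join the MIT Red Balloon Team: http://balloon.mit.edu"),
    (date(2009, 12, 4), "Submit your balloon photos"),  # 'mit' only inside 'submit'
    (date(2009, 12, 5), "Go #gatech balloon hunters!"),
    (date(2009, 12, 5), "Nice weather today"),          # no 'balloon': filtered out
]
keywords = {
    "MIT": ["mit red balloon team", "balloon.mit.edu", "#mitredballoon", "mit"],
    "Gatech": ["gatech"],
}
counts = daily_team_counts(tweets, keywords)
print(counts["MIT"][date(2009, 12, 4)], counts["Gatech"][date(2009, 12, 5)])  # 1 1
```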

The first class of strategies we compare against can be termed an altruism-based strategy. The Georgia Institute of Technology (Gatech) team adopted a charity-based incentive method, offering to donate all proceeds to the American Red Cross. It appears that their strategy generated very few tweets.

This suggests that relying purely on altruistic propagation is not sufficient to achieve large-scale social mobilization. This may explain the Gatech team's anecdotal observation¹ that other aspects, such as obtaining a high Google PageRank and relying on conventional mass media, contributed to their success in attracting balloon spotters to contact them.

1 See http://www.gtri.gatech.edu/casestudy/red-balloon-darpa-challenge


[Figure A-3 plot: daily tweet counts by date, Nov 11 to Dec 10; series: Tweets about MIT Team, Tweets about Gatech Team, Tweets about George Hotz, Tweets about Geocacher Team, Tweets about DeciNena Team]

Figure A-3: Raw tweet counts for five teams. The time series starts at the announcement of the challenge and ends at the announcement of the winner. The dotted line marks the time at which the balloons were launched. Note that the MIT team launched its website and mechanism only 2 days before the balloon launch.

Another class of strategies is to capitalize on an existing community of interest to which a team had direct access. We refer to this as the community-based strategy.

George Hotz is a Twitter celebrity with more than 35,000 followers, and his strategy was to use his fame on Twitter. As it turned out, he successfully created a burst on Twitter on the day he announced his participation in the competition.

Similar to Hotz, Geocacher is another team whose strategy was based on an existing community, that of geocaching, a sport based on using navigational techniques to hide and seek objects. As can be seen, it also created a burst by announcing its participation to the geocacher community. DeciNena aimed to assemble a balloon hunting team by posting their participation on every related blog on the Internet to gain attention, but they failed to achieve a reasonably wide response.

It is interesting to note that Hotz and Geocacher were able to create a sudden response peak by efficiently propagating the news to an existing audience. However, this response was very short-lived for both teams: the tweet response almost disappeared the day after their announcement. On the other hand, the MIT strategy was able to sustain social response for a longer period, stretching up until the end of the competition. This happened despite not having access to a large community of followers. Instead, we started with only 4 people, and after a couple of days, our Twitter response reached a level comparable to that of Hotz, who started with 35,000 existing followers. Another interesting observation is that after the competition, when mass media came to report the winning story of the MIT team, the tweet count actually decreased instead of increasing. This suggests that the incentives provided by the MIT strategy, rather than the mass media effect, played the dominant role in generating Twitter response.

A.3.2 Understanding the "MIT Brand" Effects in Our Strategy

The second task is to use social media to understand other potential factors in this incentive structure. After we won the competition, many people argued that the MIT brand helped promote our strategy and increase participation for the MIT team.

Here I argue, using the same Twitter analysis, that the "MIT brand" did not play a major role in the success of the MIT team. In particular, as shown above, the burst of tweets about the MIT team was more sustained than those of other strategies, including those based on celebrity following, which experienced very short-lived bursts. I attribute this qualitative difference to the MIT incentive mechanism.

To further support this claim, bearing in mind the limited data available, we compared the MIT red balloon team's tweet count with that of another MIT-related event, namely the launch of a hybrid electric bicycle. This event took place in the same month, received significant mass media attention², and was the only MIT news item exceeding 50 tweets in December in our Twitter dataset³. While it is difficult to conduct a systematic comparison, Figure A-4 suggests that this event produced only a short-lived burst despite major media coverage, whereas the MIT team achieved a sustained burst with our mechanism.

2 E.g. see http://www.nytimes.com/2009/12/15/science/earth/15bike.html

3 See http://senseable.mit.edu/copenhagenwheel/.


[Figure A-4 panels: (a) Normalized; (b) Raw; series: Tweets about MIT Team, Tweets about MIT Bike News]

Figure A-4: Tweet counts over time for the MIT red balloon team versus another MIT-related event in the same month (December 15). For easy comparison, we shifted the data temporally, matching the day the MIT team launched its online campaign with the day the MIT bike news was released: (a) daily Twitter counts for both events (scaled so that both peaks have the same value); (b) raw daily increase in Twitter counts. The vertical blue dashed line indicates the day of the DARPA Challenge competition.
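The temporal alignment and peak scaling described in the caption can be sketched as follows. The counts are invented toy numbers, not data from the study, and reading panel (b) as day-over-day differences of the daily counts is our assumption.

```python
def align_and_normalize(daily_counts, launch_day, length):
    """Shift a daily series so day 0 is its launch day, then scale it so its peak equals 1."""
    window = daily_counts[launch_day:launch_day + length]
    peak = max(window)
    return [c / peak for c in window] if peak > 0 else list(window)


def daily_increase(daily_counts):
    """Day-over-day change in counts (one possible reading of a 'raw daily increase' panel)."""
    return [b - a for a, b in zip(daily_counts, daily_counts[1:])]


mit_team = [0, 0, 5, 20, 40, 60, 30]   # toy counts; campaign launched on day 2
bike_news = [3, 50, 10, 4, 2, 1, 0]    # toy counts; news released on day 1
print(align_and_normalize(mit_team, 2, 5)[-2:])  # [1.0, 0.5]
print(daily_increase(bike_news[1:6]))            # [-40, -6, -2, -1]
```

Scaling each series by its own peak removes the difference in absolute volume, so only the shape of the burst (sustained versus short-lived) is compared.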

A.4 Conclusion

Our win in the Red Balloon Challenge has had a fundamental impact on research in crowdsourcing and Internet economics, though not without doubts [145].

What is really exciting is that I was able to use social media big data to dig into people's actual behavioral responses during the competition. Such analysis provides a unique and broad view of what was going on, and helped researchers understand the MIT team's strategy better and more clearly.

There are other interesting projects that use Internet and social media datasets to evaluate the adoption of an economic policy: for instance, Twitter feeds have been used to predict elections [146] and responses to a presidential debate [52], and social media can predict early flu epidemics [39]. As we showed in the Red Balloon Challenge, monitoring human behavior over the Internet may benefit policy makers by providing instant and broad information on the actual adoption and effectiveness of a new policy. Such approaches can help researchers and administrators confidently design and adopt innovative policies, such as social policies, and promote important issues such as health with very small budgets.


Appendix B

Tables

Table B.1: Linear regression results of coefficients for model variables of regression Eq. 4.4 with f_{u,w-1} included. (*: p < 0.05, **: p < 0.01, ***: p < 0.001)

Variable          Followers left: Value (S.E.)
P_{u,w-4,w}       -5.077e-6 ** (1.54e-6)
r_{u,w-4,w}       -0.002 (0.038)
P_{u,w-12,w}      2.63e-6 * (1.23e-6)
r_{u,w-12,w}      0.053 (0.071)
S_{u,w-12,w}      -0.0048 *** (0.00086)
P_{u,w-52,w}      -2.04e-6 *** (5.13e-7)
r_{u,w-52,w}      -0.042 (0.13)
S_{u,w-52,w}      -0.011 *** (0.0024)
f_{u,w-1}         -0.0010 (0.000062)
N                 3699
adj. R^2          0.15


Bibliography

[1] Viral V Acharya, Lasse Heje Pedersen, Thomas Philippon, and Matthew P Richardson. Measuring systemic risk. 2010.

[2] N. Aharony, W. Pan, C. Ip, I. Khayal, and A. Pentland. Social fMRI: Investigating and shaping social mechanisms in the real world. Pervasive and Mobile Computing, 2011.

[3] Nadav Aharony, Wei Pan, Cory Ip, Inas Khayal, and Alex Pentland. The social fMRI: Measuring, understanding and designing social mechanisms in the real world. In Proceedings of the 13th ACM International Conference on Ubiquitous Computing, UbiComp '11, Beijing, China, 2011 (to appear).

[4] Y.Y. Ahn, J.P. Bagrow, and S. Lehmann. Link communities reveal multiscale complexity in networks. Nature, 466(7307):761-764, 2010.

[5] T.J. Allen. Managing The Flow Of Technology: Technology Transfer And The Dissemination Of Technological Information Within The R&D Organization. MIT Press, 2003.

[6] Yaniv Altshuler, Michael Fire, Erez Shmueli, Yuval Elovici, Alfred Bruckstein, Alex Sandy Pentland, and David Lazer. Detecting anomalous behaviors using structural properties of social networks. In Social Computing, Behavioral-Cultural Modeling and Prediction, pages 433-440. Springer, 2013.

[7] R.M. Anderson and R.M. May. Infectious diseases of humans: dynamics and control. Oxford University Press, 1991.

[8] L. Anselin, A. Varga, and Z. Acs. Local geographic spillovers between university research and high technology innovations. Journal of Urban Economics, 42(3):422-448, 1997.

[9] S. Arbesman, J.M. Kleinberg, and S.H. Strogatz. Superlinear scaling for innovation in cities. Physical Review E, 79(1):16115, 2009.

[10] Kenneth J. Arrow, Robert Forsythe, Michael Gorham, Robert Hahn, Robin Hanson, John O. Ledyard, Saul Levmore, Robert Litan, Paul Milgrom, Forrest D. Nelson, George R. Neumann, Marco Ottaviani, Thomas C. Schelling, Robert J. Shiller, Vernon L. Smith, Erik Snowberg, Cass R. Sunstein, Paul C. Tetlock, Philip E. Tetlock, Hal R. Varian, Justin Wolfers, and Eric Zitzewitz. The Promise of Prediction Markets. Science, 320(5878):877-878, 2008.

[11] K.J. Arrow, R. Forsythe, M. Gorham, R. Hahn, R. Hanson, J.O. Ledyard, S. Levmore, R. Litan, P. Milgrom, F.D. Nelson, et al. The promise of prediction markets. Science, 320(5878):877, 2008.

[12] Nikos Askitas and Klaus F Zimmermann. Google econometrics and unemployment forecasting. 2009.

[13] D.B. Audretsch and M.P. Feldman. R&D spillovers and the geography of innovation and production. The American Economic Review, 86(3):630-640, 1996.

[14] Lars Backstrom, Eytan Bakshy, Jon M Kleinberg, Thomas M Lento, and Itamar Rosenn. Center of attention: How Facebook users allocate attention across friends. ICWSM, 11:23, 2011.

[15] Abhijit V Banerjee. A simple model of herd behavior. The Quarterly Journal of Economics, pages 797-817, 1992.

[16] A.L. Barabasi and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509-512, 1999.

[17] Brad M Barber and Terrance Odean. All that glitters: The effect of attention and news on the buying behavior of individual and institutional investors. Review of Financial Studies, 21(2):785-818, 2008.

[18] Nicholas Barberis and Richard Thaler. A survey of behavioral finance. Handbook of the Economics of Finance, 1:1053-1128, 2003.

[19] G.S. Becker, E.L. Glaeser, and K.M. Murphy. Population and economic growth. The American Economic Review, 89(2):145-149, 1999.

[20] L. Bettencourt, J. Lobo, D. Helbing, C. Kühnert, and G.B. West. Growth, innovation, scaling, and the pace of life in cities. Proceedings of the National Academy of Sciences, 104(17):7301, 2007.

[21] L. Bettencourt and G. West. A unified theory of urban living. Nature, 467(7318):912-913, 2010.

[22] Patrick Biernacki and Dan Waldorf. Snowball sampling: Problems and techniques of chain referral sampling. Sociological Methods & Research, 10(2):141-163, 1981.

[23] Sushil Bikhchandani, David Hirshleifer, and Ivo Welch. A theory of fads, fashion, custom, and cultural change as informational cascades. Journal of Political Economy, pages 992-1026, 1992.

[24] M. Billio, L. Pelizzon, A. Lo, and M. Getmansky. Econometric measures of connectedness and systemic risk in the finance and insurance sectors. 2011.


[25] Geir Hoidal Bjonnes and Dagfinn Rime. Dealer behavior and trading systems in foreign exchange markets. Journal of Financial Economics, 75(3):571-605, 2005.

[26] F. Black and M. Scholes. The pricing of options and corporate liabilities. The Journal of Political Economy, pages 637-654, 1973.

[27] Vincent D Blondel, Markus Esch, Connie Chan, Fabrice Clerot, Pierre Deville, Etienne Huens, Frederic Morlot, Zbigniew Smoreda, and Cezary Ziemlicki. Data for development: the D4D challenge on mobile phone data. arXiv preprint arXiv:1210.0137, 2012.

[28] Johan Bollen, Huina Mao, and Xiaojun Zeng. Twitter mood predicts the stock market. Journal of Computational Science, 2(1):1-8, 2011.

[29] Robert M Bond, Christopher J Fariss, Jason J Jones, Adam DI Kramer, Cameron Marlow, Jaime E Settle, and James H Fowler. A 61-million-person experiment in social influence and political mobilization. Nature, 489(7415):295-298, 2012.

[30] J. Bruggeman. Network diversity and economic development: a comment. arXiv preprint arXiv:1011.0208, 2010.

[31] C.D. Brummitt, A. Gomez-Lievano, N. Goudemand, and G. Haslam. Hunting for keys to innovation: The diversity and mixing of occupations do not explain a city's patent and economic productivity. Santa Fe Institute Technical Report, 2012.

[32] R.S. Burt. Structural holes: The social structure of competition. Harvard University Press, 1995.

[33] F. Calabrese, D. Dahlem, A. Gerber, D.D. Paul, X. Chen, J. Rowland, C. Rath, and C. Ratti. The connected states of America: Quantifying social radii of influence. Proc. of IEEE International Conference on Social Computing, 2011.

[34] John Y Campbell and Robert J Shiller. Stock prices, earnings, and expected dividends. The Journal of Finance, 43(3):661-676, 1988.

[35] D. Centola. The Spread of Behavior in an Online Social Network Experiment. Science, 329(5996):1194, 2010.

[36] D. Centola and M. Macy. Complex contagions and the weakness of long ties. American Journal of Sociology, 113(3):702-734, 2007.

[37] Damon Centola and Michael Macy. Complex contagions and the weakness of long ties. The American Journal of Sociology, 113(3):702-734, 2007.

[38] N.A. Christakis and J.H. Fowler. The spread of obesity in a large social network over 32 years. New England Journal of Medicine, 357(4):370-379, 2007.

[39] Nicholas A Christakis and James H Fowler. Social network sensors for early detection of contagious outbreaks. PLoS ONE, 5(9):e12948, 2010.

[40] T.T. Clydesdale. Family behaviors among early US baby boomers: Exploring the effects of religion and income change, 1965-1982. Soc. F., 76:605, 1997.

[41] Derrick L Cogburn and Fatima K Espinoza-Vasquez. From networked nominee to networked nation: Examining the impact of web 2.0 and social media on political participation and civic engagement in the 2008 Obama campaign. Journal of Political Marketing, 10(1-2):189-213, 2011.

[42] European Commission et al. Regulation (EC) No 1059/2003 of the European Parliament and of the Council of 26 May 2003 on the establishment of a common classification of territorial units for statistics (NUTS). Official Journal of the European Union, 21:2003, 2003.

[43] P. Crane and A. Kinzig. Nature in the metropolis. Science, 308(5726):1225, 2005.

[44] Kent Daniel, David Hirshleifer, and Avanidhar Subrahmanyam. Investor psychology and security market under- and overreactions. The Journal of Finance, 53(6):1839-1885, 1998.

[45] Olivier De Bandt and Philipp Hartmann. Systemic risk: A survey. Technical report, CEPR Discussion Papers, 2000.

[46] Yves-Alexandre de Montjoye, Cesar A Hidalgo, Michel Verleysen, and Vincent D Blondel. Unique in the crowd: The privacy bounds of human mobility. Scientific Reports, 3, 2013.

[47] Yves-Alexandre de Montjoye, Erez Shmueli, Samuel S Wang, and Alex Sandy Pentland. openPDS: Protecting the privacy of metadata through SafeAnswers. PLoS ONE, 9(7):e98790, 2014.

[48] Defense Advanced Research Projects Agency. DARPA Network Challenge, Accessed May 2010. http://networkchallenge.darpa.mil/.

[49] Defense Advanced Research Projects Agency. MIT Red Balloon Team wins DARPA Network Challenge, December 5, 2009. Press Release.

[50] Defense Advanced Research Projects Agency. DARPA Network Challenge Project Report, February 16, 2010.

[51] Rich DeVaul, Michael Sung, Jonathan Gips, et al. MIThril 2003: Applications and architecture. In 2012 16th International Symposium on Wearable Computers, pages 4-4. IEEE Computer Society, 2003.

[52] Nicholas A Diakopoulos and David A Shamma. Characterizing debate performance via aggregated Twitter sentiment. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 1195-1198. ACM, 2010.

[53] Savas Dimopoulos and Greg Landsberg. Black holes at the Large Hadron Collider. Physical Review Letters, 87(16):161602, 2001.

[54] R. I. M. Dunbar. Co-evolution of neocortex size, group size and language in humans. Behavioral and Brain Sciences, 16(4):681-735, 1993.

[55] N. Eagle, M. Macy, and R. Claxton. Network diversity and economic development. Science, 328(5981):1029, 2010.

[56] N. Eagle and A. Pentland. Reality mining: sensing complex social systems. Personal and Ubiquitous Computing, 10(4):255-268, 2006.

[57] N. Eagle, A.S. Pentland, and D. Lazer. Inferring friendship network structure by using mobile phone data. Proceedings of the National Academy of Sciences, 106(36):15274, 2009.

[58] Nathan Eagle, Michael Macy, and Rob Claxton. Network Diversity and Economic Development. Science, 328, 2010.

[59] Nathan Eagle and Alex Pentland. Reality mining: Sensing complex social systems. Personal and Ubiquitous Computing, (10):255-268, 2006.

[60] Nathan Eagle, Alex (Sandy) Pentland, and David Lazer. Inferring friendship network structure by using mobile phone data. Proc. Natl. Academy of Sciences, 106(36), 2009.

[61] Scott Ellison. Worldwide and U.S. Mobile Applications, Storefronts, and Developer 2010-2014 Forecast and Year-End 2010 Vendor Shares: The "Appification" of Everything. Dec 2010.

[62] P. Expert, T.S. Evans, V.D. Blondel, and R. Lambiotte. Uncovering space-independent communities in spatial networks. Proceedings of the National Academy of Sciences, 108(19):7663, 2011.

[63] K. Farrahi and D. Gatica-Perez. Probabilistic Mining of Socio-Geographic Routines From Mobile Phone Data. IEEE Journal of Selected Topics in Signal Processing, 4(4):746-755, 2010.

[64] James H Fowler and Nicholas A Christakis. Dynamic spread of happiness in a large social network: longitudinal analysis over 20 years in the Framingham Heart Study. BMJ, 337, 2008.

[65] P. Frijters, J.P. Haisken-DeNew, and M.A. Shields. Money does matter! Evidence from increasing real income and life satisfaction in East Germany following reunification. The American Economic Review, 94(3):730-740, 2004.

[66] Masahisa Fujita, P.R. Krugman, and A. Venables. The Spatial Economy. MIT Press, 1999.

[67] Edward Fullbrook. A guide to what's wrong with economics. Anthem Press, 2004.

[68] A. Ganesh, L. Massoulié, and D. Towsley. The effect of network topology on the spread of epidemics. In INFOCOM 2005. 24th Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings IEEE, volume 2, pages 1455-1466. IEEE, 2005.

[69] Edward L Glaeser, Jed Kolko, and Albert Saiz. Consumer city. Journal of Economic Geography, 1(1):27-50, 2001.

[70] S. Goel, D.M. Reeves, D.J. Watts, and D.M. Pennock. Prediction without markets. In Proceedings of the 11th ACM conference on Electronic commerce, pages 357-366. ACM, 2010.

[71] Bruno Gonçalves, Nicola Perra, and Alessandro Vespignani. Modeling users' activity on Twitter networks: Validation of Dunbar's number. PLoS ONE, 6(8):e22656, 2011.

[72] Marta C. Gonzalez, Cesar A. Hidalgo, and Albert-Laszlo Barabasi. Understanding individual human mobility patterns. Nature, 453(7196):779-782, June 2008.

[73] M. Granovetter. The impact of social structure on economic outcomes. The Journal of Economic Perspectives, 19(1):33-50, 2005.

[74] M.S. Granovetter. The strength of weak ties. American Journal of Sociology, 78(6):1360, 1973.

[75] Andrew G Haldane and Robert M May. Systemic risk in banking ecosystems. Nature, 469(7330):351-355, 2011.

[76] Robin Hanson. Logarithmic market scoring rules for modular combinatorial information aggregation. George Mason University, 2002.

[77] Rawley Z Heimer. Friends do let friends buy AAPL, and F, and IPET... 2011.

[78] John D Hey and Andrea Morone. Do markets drive out lemmings, or vice versa? Economica, 71(284):637-659, 2004.

[79] Xin Huang, Hao Zhou, and Haibin Zhu. A framework for assessing the systemic risk of major financial institutions. Journal of Banking & Finance, 33(11):2036-2049, 2009.

[80] A.B. Jaffe, M. Trajtenberg, and R. Henderson. Geographic localization of knowledge spillovers as evidenced by patent citations. The Quarterly Journal of Economics, 108(3):577, 1993.

[81] W Jos Jansen and Niek J Nahuis. The stock market and consumer confidence: European evidence. Economics Letters, 79(1):89-98, 2003.

[82] Oliver P John and Sanjay Srivastava. The Big-Five trait taxonomy: History, measurement, and theoretical perspectives, volume 2, pages 102-138. Guilford, 1999.

[83] W.O. Kermack and A.G. McKendrick. A contribution to the mathematical theory of epidemics. Proc. R. Soc. Lond. A, 115:700-721, 1927.

[84] J. Kleinberg and P. Raghavan. Query incentive networks. In Foundations of Computer Science, 2005. FOCS 2005. 46th Annual IEEE Symposium on, pages 132-141. IEEE, 2005.

[85] Jon Kleinberg. Algorithms, networks, and social phenomena. In Automata, Languages, and Programming, pages 1-3. Springer, 2013.

[86] G. Krings, F. Calabrese, C. Ratti, and VD Blondel. A gravity model for intercity telephone communication networks, 2009.

[87] P. Krugman. On the number and location of cities. European Economic Review, 37(2-3):293-298, 1993.

[88] Coco Krumme, Manuel Cebrian, and Alex Pentland. Patterns of individual shopping behavior. arXiv preprint arXiv:1008.2556, 2010.

[89] David Lazer, Alex Sandy Pentland, Lada Adamic, Sinan Aral, Albert Laszlo Barabasi, Devon Brewer, Nicholas Christakis, Noshir Contractor, James Fowler, Myron Gutmann, et al. Life in the network: the coming age of computational social science. Science (New York, NY), 323(5915):721, 2009.

[90] Matthew Lease and Omar Alonso. Crowdsourcing and human computation, introduction.

[91] J. Leskovec, K.J. Lang, A. Dasgupta, and M.W. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1):29-123, 2009.

[92] D. Liben-Nowell, J. Novak, R. Kumar, P. Raghavan, and A. Tomkins. Geographic routing in social networks. Proceedings of the National Academy of Sciences, 102(33):11623, 2005.

[93] George Lin. The growth and structural change of Chinese cities: a contextual and geographic analysis. Cities, 19(5):299-316, 2002.

[94] Hong Lu et al. The Jigsaw continuous sensing engine for mobile phone applications. In Proceedings of the 8th ACM Conference on Embedded Networked Sensor Systems, SenSys '10, New York, NY, USA, 2010.

[95] Sydney C Ludvigson. Consumer confidence and consumer spending. Journal of Economic Perspectives, pages 29-50, 2004.


[96] Thomas Lux and Michele Marchesi. Scaling and criticality in a stochastic multiagent model of a financial market. Nature, 397(6719):498-500, 1999.

[97] Anmol Madan, Manuel Cebrian, David Lazer, and Alex Pentland. Social sensing for epidemiological behavior change. In Proceedings of the 12th ACM International Conference on Ubiquitous Computing, UbiComp '10, pages 291-300, New York, NY, USA, 2010. ACM.

[98] Anmol Madan, Katayoun Farrahi, Daniel Gatica-Perez, and Alex Pentland. Pervasive sensing to model political opinions in face-to-face networks. In Pervasive'11 (in press), 2011.

[99] Economist Magazine. The other-worldly philosophers, 2009.

[100] Ankur Mani, Iyad Rahwan, and Alex Pentland. Inducing peer pressure to promote cooperation. Scientific Reports, 3, 2013.

[101] C.D. Manning, P. Raghavan, H. Schutze, and Ebooks Corporation. Introduction to information retrieval, volume 1. Cambridge University Press, Cambridge, UK, 2008.

[102] Huina Mao, Xin Shuai, Yong-Yeol Ahn, and Johan Bollen. Mobile communications reveal the regional economy in Côte d'Ivoire. In Proceedings of the 3rd Conference on the Analysis of Mobile Phone Datasets (NetMob), 2013.

[103] Ellen E Meade and David Stasavage. Publicity of debate and the incentive to dissent: Evidence from the US Federal Reserve. The Economic Journal, 118(528):695-717, 2008.

[104] Jean-Baptiste Michel, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K Gray, Joseph P Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon Orwant, et al. Quantitative analysis of culture using millions of digitized books. Science, 331(6014):176-182, 2011.

[105] S. Milgram. The experience of living in cities. Crowding and Behavior, 167:41, 1974.

[106] Raul Montoliu and Daniel Gatica-Perez. Discovering human places of interest from multimodal mobile phone data. In Proc. of 9th Int. Conference on Mobile and Ubiquitous Multimedia (MUM '10), December 2010.

[107] P.J. Mucha, T. Richardson, K. Macon, M.A. Porter, and J.P. Onnela. Community structure in time-dependent, multiscale, and multiplex networks. Science, 328(5980):876, 2010.

[108] Arvind Narayanan and Vitaly Shmatikov. Robust de-anonymization of large sparse datasets. In Security and Privacy, 2008. SP 2008. IEEE Symposium on, pages 111-125. IEEE, 2008.

[109] A. Noulas, S. Scellato, R. Lambiotte, M. Pontil, and C. Mascolo. A tale of many cities: universal patterns in human urban mobility. PLoS ONE, 7(5):e37027, 2012.

[110] United States. Bureau of the Census, United States. Dept. of the Treasury. Bureau of Statistics, Labor. Bureau of Statistics, United States. Bureau of Foreign, Domestic Commerce, et al. Statistical abstract of the United States. Government Printing Office, 1989.

[111] Daniel Olguin Olguin, Benjamin N. Waber, Taemie Kim, Akshay Mohan, Koji Ara, and Alex Pentland. Sensible organizations: Technology and methodology for automatically measuring organizational behavior. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 39(1):43-55, 2009.

[112] J.P. Onnela, S. Arbesman, M.C. Gonzalez, A.L. Barabasi, and N.A. Christakis. Geographic constraints on social network groups. PLoS ONE, 6(4):e16939, 2011.

[113] S.E. Page. The difference: How the power of diversity creates better groups, firms, schools, and societies. Princeton University Press, 2008.

[114] W. Pan, N. Aharony, and A. Pentland. Composite social network for predicting mobile apps installation. In Proceedings of the 25th Conference on Artificial Intelligence, AAAI-11, San Francisco, CA, 2011.

[115] W. Pan, W. Dong, M. Cebrian, T. Kim, JH Fowler, and AS Pentland. Modeling dynamical influence in human interaction: Using data to make better inferences about influence within social systems. IEEE Signal Processing Magazine, 29(2):77-86, 2012.

[116] Wei Pan, Nadav Aharony, and Alex Pentland. Composite social network for predicting mobile apps installation. In Proceedings of the 25th Conference on Artificial Intelligence (AAAI-11), San Francisco, CA, August 2011.

[117] Wei Pan, Nadav Aharony, and Alex Sandy Pentland. Fortune monitor or fortune teller: Understanding the connection between interaction patterns and financial status. In 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust (PASSAT) and 2011 IEEE Third International Conference on Social Computing (SocialCom), pages 200-207. IEEE, 2011.

[118] Wei Pan, Yaniv Altshuler, and Alex Pentland. Decoding social influence and the wisdom of the crowd in financial trading network. In Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference on and 2012 International Conference on Social Computing (SocialCom), pages 203-209. IEEE, 2012.

[119] Wei Pan, Gourab Ghoshal, Coco Krumme, Manuel Cebrian, and Alex Pentland. Urban characteristics attributable to density-driven tie formation. Nature Communications, 4, 2013.

[120] T. Paridon, SM Carraher, and SC Carraher. The income effect in personal shopping value, consumer self-confidence, and information sharing (word of mouth communication) research. Academy of Marketing Studies Journal, 10(2):107-124, 2006.

[121] Alex Pentland. Social Physics: How Good Ideas Spread - The Lessons from a New Science. Penguin, 2014.

[122] Alex Sandy Pentland. Beyond the echo chamber. Harvard Business Review, 91(11):80-+, 2013.

[123] G. Pickard, I. Rahwan, W. Pan, M. Cebrian, R. Crane, A. Madan, and A. Pentland. Time critical social mobilization: The DARPA Network Challenge winning strategy. arXiv preprint arXiv:1008.3172, 2010.

[124] Galen Pickard, Wei Pan, Iyad Rahwan, Manuel Cebrian, Riley Crane, Anmol Madan, and Alex Pentland. Time-critical social mobilization. Science, 334(6055):509-512, 2011.

[125] S.L. Pong and D.B. Ju. The effects of change in family structure and income on dropping out of middle and high school. Journal of Family Issues, 21(2):147, 2000.

[126] Tobias Preis, Helen Susannah Moat, and H Eugene Stanley. Quantifying trading behavior in financial markets using Google Trends. Scientific Reports, 3, 2013.

[127] Charles Rackoff and Daniel R Simon. Non-interactive zero-knowledge proof of knowledge and chosen ciphertext attack. In Advances in Cryptology: CRYPTO '91, pages 433-444. Springer, 1992.

[128] R. Reagans and E.W. Zuckerman. Networks, diversity, and productivity: The social capital of corporate R&D teams. Organization Science, 12(4):502-517, 2001.

[129] Alex Rutherford, Manuel Cebrian, Iyad Rahwan, Sohan Dsouza, James McInerney, Victor Naroditskiy, Matteo Venanzi, Nicholas R Jennings, Eero Wahlstedt, Steven U Miller, et al. Targeted social mobilization in a global manhunt. PLoS ONE, 8(9):e74628, 2013.

[130] S. Saavedra, K. Hagerty, and B. Uzzi. Synchronicity, instant messaging, and performance among financial traders. Proceedings of the National Academy of Sciences, 108(13):5296, 2011.

[131] M.J. Salganik, P.S. Dodds, and D.J. Watts. Experimental study of inequality and unpredictability in an artificial cultural market. Science, 311(5762):854-856, 2006.

[132] Markus Schläpfer, Luis MA Bettencourt, Mathias Raschke, Rob Claxton, Zbigniew Smoreda, Geoffrey B West, and Carlo Ratti. The scaling of human interactions with city size.

[133] Matthew D Shapiro and Joel B Slemrod. Did the 2008 tax rebates stimulate spending? Technical report, National Bureau of Economic Research, 2009.

[134] William F Sharpe. The Sharpe ratio. Streetwise: the Best of the Journal of Portfolio Management, pages 169-185, 1998.

[135] Robert J Shiller. Sharing Nobel honors, and agreeing to disagree. New York Times, 26, 2013.

[136] Erez Shmueli, Yaniv Altshuler, et al. Temporal dynamics of scale-free networks. In Social Computing, Behavioral-Cultural Modeling and Prediction, pages 359-366. Springer, 2014.

[137] Erez Shmueli, David Lazer, Yaniv Altshuler, and Alex Pentland. Wisdom of the network.

[138] David Simon and Rawley Heimer. Facebook finance: How social interaction propagates active investing. In AFA 2013 San Diego Meetings Paper, 2012.

[139] Vivek K Singh, Laura Freeman, Bruno Lepri, and Alex Sandy Pentland. Classifying spending behavior using socio-mobile data. HUMAN, 2(2):pp-99, 2013.

[140] A. Smith. The wealth of nations (1776). New York: Modern Library, page 740, 1937.

[141] Christopher Smith, Afra Mashhadi, and Licia Capra. Ubiquitous sensing for mapping poverty in developing countries. Paper submitted to the Orange D4D Challenge, 2013.

[142] D Sornette and JV Andersen. A nonlinear super-exponential rational model of speculative financial bubbles. International Journal of Modern Physics C, 13(02):171-187, 2002.

[143] Didier Sornette. Why stock markets crash: critical events in complex financial systems. Princeton University Press, 2009.

[144] Thad Starner. Human-powered wearable computing. IBM Systems Journal, 35(3.4):618-629, 1996.

[145] John C Tang, Manuel Cebrian, Nicklaus A Giacobe, Hyun-Woo Kim, Taemie Kim, and Douglas Beaker Wickert. Reflecting on the DARPA Red Balloon Challenge. Communications of the ACM, 54(4):78-85, 2011.

[146] Andranik Tumasjan, Timm Oliver Sprenger, Philipp G Sandner, and Isabell M Welpe. Predicting elections with Twitter: What 140 characters reveal about political sentiment. ICWSM, 10:178-185, 2010.

[147] John C Turner. Social influence. Thomson Brooks/Cole Publishing Co, 1991.

[148] US Census Bureau. Population, Housing Units, Area Measurements, and Density: 1790 to 2000, PHC-3-1. Feb 2012.

[149] US Center for Disease Control, 2013.

[150] Benjamin N Waber, Daniel Olguin Olguin, Taemie Kim, Akshay Mohan, Koji Ara, and Alex Pentland. Organizational engineering using sociometric badges. In International Conference on Network Science, New York, NY, 2007.

[151] D. J. Watts and J. Peretti. Viral marketing in the real world. Harvard Business Review, May 2007.

[152] Justin Wolfers and Eric Zitzewitz. Prediction markets. Technical report, National Bureau of Economic Research, 2004.

[153] A.W. Woolley, C.F. Chabris, A. Pentland, N. Hashmi, and T.W. Malone. Evidence for a collective intelligence factor in the performance of human groups. Science, 330(6004):686-688, 2010.

[154] Lynn Wu, Ben Waber, Sinan Aral, Erik Brynjolfsson, and Alex Pentland. Mining face-to-face interaction networks using sociometric badges: Predicting productivity in an IT configuration task. 2008.

[155] J. Yang and J. Leskovec. Modeling Information Diffusion in Implicit Networks. 2010.

[156] J. Yang and J. Leskovec. Temporal variation in online media. In Proceedings of the ACM International Conference on Web Search and Data Mining. ACM, 2011.

[157] Michael Youssefmir, Bernardo A Huberman, and Tad Hogg. Bubbles and market crashes. Dynamics of Computation Group, Xerox Palo Alto Research Center, Palo Alto, CA, 1994.

[158] H. Zhang and R. Dantu. Discovery of Social Groups Using Call Detail Records. In On the Move to Meaningful Internet Systems: OTM 2008 Workshops, pages 489-498. Springer, 2010.


Doctoral Committee:

Thesis Supervisor: Alex (Sandy) Pentland, Toshiba Professor of Media, Arts, and Sciences, Massachusetts Institute of Technology

Thesis Reader: Andrew W. Lo, Harris & Harris Group Professor of Finance, Massachusetts Institute of Technology

Thesis Reader: Michael W. Macy, Goldwin Smith Professor of Arts and Sciences, Cornell University

Acknowledgments

Contents

List of Figures

List of Tables

Chapter 1

Introduction


Chapter 2

Big Data: New Measurements for Social Systems

2.1 Overview

2.2 Big Data from the Internet

2.2.1 Review of Online Data

2.2.2 Online Social Networks for Trading


2.3 Big Data from Mobile and Wearable Devices


2.3.1 Existing Studies Based on Mobile Phones

2.3.2 The Friends and Family Study

Pilot Phase: 55 participants; Phase II: 130 participants


2.3.3 Results from the Friends and Family Study


2.4 From Individual Financial Behavior to Policy Making: The Big Picture

Chapter 3

Idea Flow: Social Interactions and Urban Economic Development

3.1 Introduction