"Riots and Sociability: A Case Study of Human Interaction Networks in Qatif, Saudi Arabia"

by Michael Angelo Greco III
B.S. Computer Science, Mathematics, Art, University of Wisconsin-Madison, 2006

Submitted to the Department of Urban Studies and Planning and the Engineering Systems Division in partial fulfillment of the requirements for the degrees of Master in City Planning and Master of Science in Technology Policy at the Massachusetts Institute of Technology, September 2014

© 2014 Massachusetts Institute of Technology. All rights reserved.

Signature of Author (signature redacted): Department of Urban Studies and Planning, Engineering Systems Division
Certified by (signature redacted): Carlo Ratti, Professor of the Practice, Department of Urban Studies and Planning, Thesis Supervisor
Accepted by (signature redacted): Dennis Frenchman, Professor of Urban Design and Planning, Chair, MCP Committee, Department of Urban Studies and Planning
Accepted by (signature redacted): Dava J. Newman, Professor of Aeronautics and Astronautics and Engineering Systems, Director, Technology and Policy Program

ABSTRACT

Since the onset of the Arab Spring in late 2010, waves of political activism have reverberated across much of the Arab world. A growing body of literature has emerged that explores how new communications and social media technologies have contributed to, and in certain cases instigated, various forms of collective action. However, little research has examined the effect of these activities on communication patterns themselves.
This thesis aims to investigate the reorganization of sociability under civil duress at an aggregate, urban scale. The study employs a novel approach to communications analysis, applying the Synthetic Control Method to estimate the causal effect of riots on different characteristics of human interaction within Qatif, Saudi Arabia, after an exogenous shock triggered a surge in public demonstrations. The analysis reveals a strong, statistically significant drop in total call volume relative to other cities in Saudi Arabia. This is combined with a similarly strong and statistically significant drop in unique daily callers, demonstrating that people were not only making fewer calls, but that fewer people were participating in the telecom network each day. Interestingly, daily phone activity is shown to increase within the subnetwork of users identified to hold strong spatiotemporal ties to the city, even though their total activity measures (which include connections both internal and external to the subnetwork) remain constant. This suggests a shift in callee preference for individuals who are more directly affected by urban unrest. Lastly, information transmission tests are performed on Qatif's pre- and post-treatment interaction networks. Initial research shows that, beyond a 26% diffusion threshold, information reaches more people faster through the post-treatment network. This provides some support for the hypothesis that communities under duress intelligently reorganize communications to increase dissemination speed and breadth; however, further research will be required to refine these findings and demonstrate a causal link.

Thesis Supervisor: Carlo Ratti
Title: Professor of the Practice, Department of Urban Studies and Planning

Contents

1 Introduction
2 Data and Processing
  2.1 Call Detail Records
  2.2 Tweets
  2.3 City Selection and Data Aggregation
  2.4 Data Limitations
3 Methods
  3.1 Synthetic Control Methods
4 Analysis: Call Behavior
5 Analysis: Inter and Intracity Calling Patterns
  5.1 Location Identification
  5.2 Urban Call Counts
6 Analysis: Twitter Activity
  6.1 Geotagged Activity
7 Future Directions
  7.1 Location Estimation for Non-Geotagged Tweets
  7.2 Communication Networks
  7.3 Religiosity
8 Discussion
Appendices
A Appendix: Call Behavior
B Appendix: Inter and Intracity Calling Patterns
C Appendix: Twitter Activity
References

Listing of figures

1.0.1 Protest Images From Qatif Following Ahmad al-Matar's Death. Found at: http://khaleejsaihat.com/web3/showthread.php?t=129754
2.1.1 Geographic Distribution of Cell Towers in Saudi Arabia
2.1.2 Left: Service Type Histogram, Right: Service Detail Description Histogram
2.1.3 Phone Activity Timeline Over Study Period (Top), Daily Phone Activity Timeline of Saudi Arabia, Dec. 12th (Bottom)
2.2.1 Tweet Timeline Over Study Period (Top), Daily Tweet Timeline of Saudi Arabia, Dec. 12th (Bottom)
4.0.1 Daily call distributions for Dec. 21st and Dec. 28th for All KSA governorates (Left), and Qatif (Right)
4.0.2 Trends in total daily network activity, Qatif vs. Other Saudi Governorates, Dec. 20th - Jan. 3rd (Top), and Trends in Average Daily Call Duration, Qatif vs. Other Saudi Governorates, Dec. 20th - Jan. 3rd (Bottom). "Treatment" indicated by dashed pink line
4.0.3 Trends in Total Network Activity, Qatif and Synthetic Qatif (Left), and Total Network Activity Gap Between Qatif and Synthetic Qatif (Right)
4.0.4 Trends in Average Daily Call Duration, Qatif and Synthetic Qatif (Left), and Average Daily Call Duration Gap Between Qatif and Synthetic Qatif (Right)
4.0.5 Trends in Unique Callers, Qatif and Synthetic Qatif (Left), and Gap in Unique Callers, Qatif and Synthetic Qatif (Right)
4.0.6 Synthetic Control Placebo Tests with Sabya. Total Daily Network Activity (Left), Average Call Duration (Middle), and Daily Unique Callers (Right)
4.0.7 Across-Unit Placebo Tests: Total Activity (all, 500x or less, 100x or less, 50x or less)
4.0.8 Across-Unit Placebo Tests: Daily Unique Callers (all, 500x or less, 100x or less, 50x or less)
4.0.9 In-Time Placebo Tests with Qatif. Total Daily Network Activity (Left) and Average Call Duration (Right)
5.2.1 Trends in standardized intra (top), inter-in (middle), and inter-out (bottom) call volumes, Qatif (solid) vs. other Saudi governorates (dashed). Dec. 20th - Jan. 3rd. Treatment indicated by dashed pink line
5.2.2 Trends in Intra Call Volumes
5.2.3 Trends in Inter-In Call Volumes
5.2.4 Trends in Inter-Out Call Volumes
6.1.1 Trends in standardized daily Tweet volume (top), Tweet length (middle), and Tweets per user (bottom), Qatif (solid) vs. other Saudi governorates (dashed). Dec. 20th - Jan. 3rd. Treatment indicated by dashed pink line
6.1.2 Trends in Total Tweet Activity, Qatif and Synthetic Qatif
6.1.3 Trends in Average Tweet Length, Qatif and Synthetic Qatif
6.1.4 Trends in Tweets Per User, Qatif and Synthetic Qatif
7.1.1 Trends in Total Tweet Activity, Qatif and Synthetic Qatif
7.1.2 Trends in Average Tweet Length, Qatif and Synthetic Qatif
7.1.3 Trends in Tweets Per User, Qatif and Synthetic Qatif
7.2.1 Total Degree Distribution (Left), and Edge Weight Distribution (Right) of the Complete Reciprocated Network, KSA
7.2.2 Fraction of Infected Nodes as Function of Time (Top), Number of Infected Nodes at Each Instance of t (Middle), and Distributions of Edge Weights Responsible for Infection
7.3.1 Daily Network Activity Distributions from Jeddah (Western Saudi Arabia), Riyadh (Central Saudi Arabia), and the Eastern Region
7.3.2 Trends in Daily Prayer Time Disruption, All KSA Cities. Qatif drawn in pink.
A.0.1 Total Network Activity, Qatif and Synthetic Qatif (3 Weeks)
A.0.2 Number of Unique Daily Callers, Qatif and Synthetic Qatif (3 Weeks)
B.0.1 Intra Call Activity Synthetic Control Placebo Test with Samteh (Left), In-time Intra Call Activity Placebo with Qatif (Right)
B.0.2 Daily Local Call Activity
B.0.3 Across-Unit Placebo Tests: Intra Call Activity (all, 20x or less, 10x or less, 5x or less)
B.0.4 Intra Call Activity, Qatif and Synthetic Qatif (3 Weeks)
C.0.1 Tweets Per User Synthetic Control Placebo Test with Ahad Rufaydah (Left), In-time Tweets Per User Placebo with Qatif (Right)
C.0.2 Across-Unit Placebo Tests: Tweets Per User (all, 50x or less, 20x or less, 5x or less)

1 Introduction

This paper seeks to explore how social unrest affects broad-scale sociability in a city or region.
Since the Arab Spring in the early 2010s, there have been waves of political activism across much of the Arab world, including the Kingdom of Saudi Arabia. A growing body of literature has developed to investigate how new communications and social media technologies have contributed to, and in certain cases instigated, various forms of collective action. A few studies have considered the impact of mobile phone access in facilitating collective action, though most have narrowed in on the effects of a specific emerging technology, like Twitter or Facebook. Furthermore, these studies follow a wide range of methods that can be broadly grouped into three categories: qualitative approaches that rely on survey data and expert interviews; quantitative approaches that characterize the nature of communication patterns through these new media outlets; and more advanced analytical methods that seek to isolate the role various communications media played during social unrest. Examples of each type of research are discussed briefly over the remainder of this section.

Expert interviews are a popular method for subjectively exploring the impact of social connectivity in urban environments. Tufekci et al. examined the protests in Egypt by surveying participants in Tahrir Square. They argued that respondents who used social media were much more likely to attend the demonstrations on the first day. They further noted that approximately half of those questioned spread media from the protests online, and that in many cases communication through social media, mobile phones, and face-to-face conversations superseded the role of traditional news media during the protests. Thus, the authors concluded that social technologies were critical in diffusing information related to, and encouraging involvement in, the activist intervention in Tahrir Square [1]. Similarly, Breuer et al. used a qualitative approach to characterize the role of social media in the Tunisian Revolution.
Using a combination of expert interviews with protest participants and preference survey data from Tunisian internet users, the authors claimed that new social media forms helped overcome a government-enforced media blackout by enabling activists to broadcast information on the movement. Further, they found these collaborative technologies facilitated the emergence of partnerships between activist groups, and encouraged a kind of 'emotional mobilization' through the depiction of the regime's atrocities in the uncensored content [2]. Qualitative research has been a popular method for identifying a symbiosis between online communication technologies and social unrest.

Another body of research pushed the relationship between social media and collective action further by quantifying aspects of user behavior in response to specific conflict events. Lotan et al. compared and contrasted Tweet broadcasting and news consumption in Tunisia and Egypt [3]. At a high level, this study reiterated the importance of social media as a communications medium during periods of cultural instability. Furthermore, the authors built on the existing literature by highlighting important differences and similarities in information flow across cultures [3]. They found that tweets produced by both independent activists and mainstream news outlets produced larger responses in Egypt than in Tunisia. On the other hand, Tunisians responded more to news disseminated by blogger sites, while both countries showed a heavy reliance on information spread by journalists. This research suggests a link between social media platforms and a city's cultural fabric, even though pinpointing this exact relationship was beyond the scope of the paper.

The papers by Szell et al. and Bagrow et al. examined how the social temperature of a city or region manifested itself in communication patterns. Szell et al.
studied the nature of collective reaction to major events like presidential elections, sports tournaments, and weather abnormalities on Twitter. They found a negative correlation between users' excitation and message length [4]. Bagrow et al. also explored how emotions impacted communication patterns. They used geotagged mobile phone call records gathered from cities in emergency states. They found that spikes in communications volume were spatially and temporally bounded, and asserted that affected individuals "will only invoke the social network to propagate information under the most extreme circumstances" [5].

On the other hand, even though the research of Szell et al. and Bagrow et al. suggested that the importance of communication media increased during periods of social excitation, not all elements of social media played a strong role in these events. Aday et al. investigated the role of specific features of social media in intra-country collective action and regional diffusion. They limited their analysis to exploring how bit.ly links (bit.ly being a URL shortening service popular on Twitter) were used to share and consume relevant protest data [6]. They found no evidence that the service played a significant role in Tunisia, Egypt, Bahrain, or Libya.

A final subset of papers on social media and sociability posits that a causal relationship exists between communication technology market penetration and the probability of social action. Pierskalla et al. explored the effects of mobile phone access on collective action in Africa. The authors employed UCDP conflict data from 1989 to 2010 alongside data on mobile phone coverage from the GSMA. Using a number of binary dependent variable models, they showed that the availability of mobile phone coverage significantly and substantially increased the likelihood of violent conflict in the region [7].
They further concluded that the adoption of communication technologies produced intrinsic changes in a city or region's communications patterns and willingness to collaborate. They also noted that the ability to assemble did not automatically yield a social good; in their case it facilitated violent collective action and increased overall instability in the study region.

Most of the research on communication technologies and collective action has focused on the utilization of social media and mobile phones. Researchers tend to agree that these new technologies facilitate collective action, although there are disputes over the scale of their impact. Researchers also agree that engagement attributes measured through these technologies change to reflect states of heightened tension. However, few studies have addressed the broader social implications of these communication devices.

This analysis builds on existing studies by estimating the causal effect of civil unrest on various characteristics of human interaction networks as expressed through Call Detail Records (CDRs), and of social media as expressed through Twitter messages. Using Synthetic Control Methods, differences in call and tweet behavior are calculated between a region experiencing riots (Qatif, Saudi Arabia, the treated unit) and regions that do not experience riots (other cities in Saudi Arabia, the control units). Surprisingly, the results show that total daily network activity across the city of study significantly decreases with treatment, while call activity within the subset of individuals who hold strong spatiotemporal ties to the region significantly increases. The effects on average call duration and inter-city call volumes remain inconclusive. When examining tweets, the number of Tweets per user per day drastically increases at the symbolic climax of the demonstrations, but placebo tests indicate that this finding is not robust.
Lastly, preliminary information transmission tests performed on city-scale interaction networks show that information reaches more people faster during the post-treatment period; however, further research will be required to refine these findings and demonstrate a causal link.

CONTEXT: THE DEATH OF AHMAD AL-MATAR

Large-scale public demonstrations have plagued Saudi Arabia since February 2011. The protests have been mainly concentrated in the oil-rich Eastern Province. The Eastern Province, and the city of Qatif in particular, is home to the largest proportion of Shiites, members of the minority denomination of Islam in Saudi Arabia. Sunni-Shia relations have a history of tension in the nation, and, following the early events of the Arab Spring in late 2010, sparks of unrest began to reignite. A movement began to coalesce around the early demonstrations of 2011, with protesters calling for the release of political prisoners, freedom of expression and assembly, and an end to widespread discrimination against Shiites. In February, after seven young Shiites were killed, the country experienced its largest collective uprising since 1979 [8]. Protests then occurred on an almost regular basis before cooling off through the spring and early summer of 2012. This period of relative tranquility was broken on July 8th with the shooting and arrest of Nimr al-Nimr [8], a Shia Sheikh and outspoken leader of the movement, which re-escalated tensions and sparked a new wave of demonstrations [9]. Large protests engulfed the city of Qatif and quickly became violent, leading to the deaths of two more protestors [10]. Security forces began to crack down on dissidents, pursuing 23 men whom the government claimed were wanted for inciting unrest in Qatif.
Raids in late September brought about the deaths or injuries of several of these men [11]. As time passed, younger activists in the region began to incorporate protest tactics employed by Bahraini youth [10], which included the nightly burning of tires on the roads around the city. It was at one of these demonstrations, near midnight on December 27th, that teenage Shia activist Ahmad al-Matar was shot dead by security forces. In spite of little media coverage, reports indicate the protest was held to demand the release of political prisoners, and several other protesters were injured and/or arrested [12]. This event set in motion a wave of riots and demonstrations throughout the city that culminated in a funeral procession on December 31st with an estimated crowd of 50,000 [10]. Activists also took to Twitter, starting a campaign that used the Arabic hashtag "We All Are Qatif" [13]. Since protests occur on a regular basis in Qatif, the study has been framed around al-Matar's death as an exogenous 'treatment' applied to the social fabric of the city.

Figure 1.0.1: Protest Images From Qatif Following Ahmad al-Matar's Death. Found at: http://khaleejsaihat.com/web3/showthread.php?t=129754

2 Data and Processing

2.1 CALL DETAIL RECORDS

Call Detail Records (CDRs) are the primary dataset of interest in this study. The records cover 12/03/12 to 1/3/13. They were obtained from a Saudi telecommunications agency that offers mobile services to the Kingdom. Cellular activity is one of the most powerful real-time sensing mechanisms currently available; the ubiquity of digital devices allows us to capture extremely high-resolution traces of humanity across a variety of dimensions. Saudi Arabia's mobile phone penetration is above 198%, an astonishing figure suggesting that many across the Kingdom own more than one mobile device.
The data cover the entire country of Saudi Arabia, with over 100 million daily network connections to over 10 thousand unique cell towers from approximately 18 million unique phones.

Figure 2.1.1: Geographic Distribution of Cell Towers in Saudi Arabia

The CDR dataset consists of anonymous location measurements generated each time a device connects to the cellular network. Each anonymized record holds a precise time and duration measure for the connection, the caller's location (by cell tower), the 'service type', and the 'service detail description.' The service type consists of an identifier that logs the type of origin and destination telephones. The 'service detail description' describes the record's type of communication, e.g. voice, data, SMS, etc. There are 486 unique codes; however, only 135 appear in the dataset. SMS activities are among those excluded. Internet requests were found to hold broken spatial identifiers and were consequently excluded. Thus, for the purposes of this study, all but voice activity have been culled from the dataset.

Figure 2.1.2: Left: Service Type Histogram, Right: Service Detail Description Histogram

To construct the composite data table, all tower pings were summed per city, per day to arrive at a total activity measure. Average daily call duration, the number of unique callers, and the number of calls per individual were constructed in a similar fashion. Lastly, these results were combined with population statistics obtained from KSA's Ministry of Economy & Planning, Central Department of Statistics & Information, Department of Analysis & Reports. The final dataset includes percentages of men and women, Saudis and non-Saudis, for each governorate from the year 2010.

The top panel of Figure 2.1.3 presents a snapshot of daily activity across the nation. To capture activity at a more granular scale, a daily histogram of call activity was recorded at 15-minute intervals over the course of each day (shown in the bottom panel).
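The per-city, per-day aggregation and the 15-minute histogram described above can be sketched as follows. This is a minimal illustration only: the record layout `(caller_id, tower_id, start_time, duration_seconds)`, the `tower_to_city` lookup, and the function names are hypothetical, since the actual CDR schema and processing code are not given in this study.

```python
from collections import defaultdict
from datetime import date, datetime


def aggregate_daily(records, tower_to_city):
    """Sum voice connections per (city, day) and derive related measures.

    `records` is an iterable of hypothetical tuples:
    (caller_id, tower_id, start_time, duration_seconds).
    """
    calls = defaultdict(int)       # total connections per (city, day)
    seconds = defaultdict(float)   # total call duration per (city, day)
    callers = defaultdict(set)     # distinct caller ids per (city, day)
    for caller_id, tower_id, start_time, duration in records:
        city = tower_to_city.get(tower_id)
        if city is None:           # tower lies outside every study city
            continue
        key = (city, start_time.date())
        calls[key] += 1
        seconds[key] += duration
        callers[key].add(caller_id)
    return {
        key: {
            "total_activity": calls[key],
            "unique_callers": len(callers[key]),
            "avg_duration": seconds[key] / calls[key],
            "calls_per_caller": calls[key] / len(callers[key]),
        }
        for key in calls
    }


def quarter_hour_histogram(records):
    """Count connections in the 96 fifteen-minute bins of a day."""
    bins = [0] * 96
    for _, _, start_time, _ in records:
        bins[(start_time.hour * 60 + start_time.minute) // 15] += 1
    return bins


# Toy data (invented for illustration; not taken from the dataset).
records = [
    ("u1", "t1", datetime(2012, 12, 21, 13, 5), 60.0),
    ("u1", "t1", datetime(2012, 12, 21, 18, 40), 120.0),
    ("u2", "t2", datetime(2012, 12, 21, 18, 50), 30.0),
    ("u3", "t9", datetime(2012, 12, 21, 9, 0), 45.0),  # unmapped tower
]
tower_to_city = {"t1": "Qatif", "t2": "Qatif"}

daily = aggregate_daily(records, tower_to_city)
hist = quarter_hour_histogram(records)
```

Keeping the set of caller ids per city and day makes the unique-caller count exact, at the cost of memory; at the scale reported here a real pipeline would likely need a database or distributed aggregation instead.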
Each day follows a very stable pattern of low early-morning activity, a mid-day peak around 1:00pm, a lull in calls until approximately 3:00pm, and a daily maximum between 6:00 and 7:00pm. The overall stability of these plots suggests that it may be possible to detect a city-wide disruption.

Figure 2.1.3: Phone Activity Timeline Over Study Period (Top), Daily Phone Activity Timeline of Saudi Arabia, Dec. 12th (Bottom)

2.2 TWEETS

The second dataset consists of messages posted on Twitter from Saudi Arabia over the study period (12/20/12 to 1/3/13). Twitter is a social networking service that allows users to share and read 'Tweets,' which the company defines as expressions of a moment or idea [14]. Tweets are limited to 140 characters and can be posted through a website interface, text message, or mobile app. Since its founding in 2006, Twitter has become increasingly popular across the globe. While its usage varies from country to country, a 2012 survey of European and Middle East markets conducted by GlobalWebIndex found that 51% of Saudi Arabian internet users are active on Twitter, the highest penetration rate of any locale in their report [15]. Saudi Arabia was also found to hold the fastest rate of growth over much of 2012.

The Twitter dataset used in this study consists of geotagged Tweets from Twitter's Decahose service, which provides a "statistically valid sample of at least 10% of all Tweets, selected at random" [16]. Each record contains a user identifier, message (up to 140 characters long), timestamp, and location (represented as a pair of latitude and longitude coordinates). These messages were posted by users who chose to include additional locational metadata with each Tweet. Roughly 0.001% of the total Tweet stream is geotagged.
For the Saudi dataset (tweets corresponding to the cities under comparison), this amounts to about 160,000 tweets from 62,000 unique users with an average message length of 70.5 characters. While this dataset is meager in comparison to the CDRs, a method for estimating locations for non-geotagged Tweets is presented in Chapter 7.

Figure 2.2.1: Tweet Timeline Over Study Period (Top), Daily Tweet Timeline of Saudi Arabia, Dec. 12th (Bottom)

Unlike the phone activity shown above, the daily tweet distributions in Figure 2.2.1 are significantly noisier and do not appear to follow an obvious pattern. It may therefore be harder to characterize daily patterns and detect changes introduced by an exogenous shock.

2.3 CITY SELECTION AND DATA AGGREGATION

To isolate the treatment effects on Qatif, additional Saudi cities were used as units of comparison. All cities with a population of over 100,000 were selected, amounting to a total of 40 urban governorates (including Qatif). A city's collection of cell towers was identified by intersecting all towers with its geographic boundaries. The call records were then clustered by tower ID. The geotagged tweets were simply aggregated by city boundaries.

2.4 DATA LIMITATIONS

After exploring the basic properties of the phone call and tweet datasets, it is worth underscoring their limitations. Both are susceptible to some degree of sampling bias. Regarding the phone records, the data lack explicit market share values per city. While this could be calculated indirectly through demographic indicators, it remains difficult to know how representative the sample is of each location. In spite of concerns related to possible sampling biases of CDRs [17], they remain one of the most comprehensive data sources available for representing large-scale human interaction.
The geotagged tweets, on the other hand, capture high spatial resolution, but their population coverage is not nearly as high as that of the CDRs. Data from Twitter are also liable to other demographic biases, as sampling individuals who participate in online media is inherently biased towards groups who have access to the internet. Lastly, and this is true of both Tweets and CDRs, a user may not be tied to a single individual, and a single individual may not be tied to a single user. It is entirely possible for one person to hold multiple accounts (e.g. an individual owning separate phones for business and personal use), just as it is possible for a group to communicate through one account (e.g. a company utilizing bots for automated calling or Tweeting). This should not be detrimental to this study due to the level of aggregation in much of the analysis. However, care has been taken to eliminate this bias in specific instances which examine narrower subpopulations.

3 Methods

3.1 SYNTHETIC CONTROL METHODS

The Synthetic Control Method (SCM) is used as the primary methodological tool in this study. The statistical technique was developed by Abadie et al. as a means to investigate causal inference in comparative case studies with aggregate data [18]. SCM was primarily developed to assess the impact of policy interventions that are applied at an aggregate level, or the effects of a 'treatment' that has been implemented at an aggregate scale (e.g., over a country, region, or city), to a small number of units. The traditional approach to comparative case studies of this nature is to use a control group's outcome to approximate the outcome that would have been observed for the treatment group in the absence of treatment. The choice of control units is typically at the researcher's discretion, which has raised questions over whether or not the control can be interpreted as a plausible counterfactual.
It is also difficult to find a single untreated unit that appropriately approximates the unit that has received treatment. SCM overcomes this by implementing a data-driven selection process for the control group, offering a much more empirically defined means of inference. SCM uses a weighted combination of units to better approximate the unit that has been exposed to the treatment [19]. The method was first used to examine the economic effects of conflict in the Basque Country, where the authors found that after an outbreak of terrorism in the 1970s, the region's per capita GDP declined about 10 percent [18]. SCM was then applied to California's Proposition 99, a cigarette tax enacted in 1988. The authors estimated that in 2000 the annual per-capita cigarette sales were roughly 26 packs lower than what they would have been without the tax [20]. This study represents the first time this method has been applied to data on a daily timescale.

In this study the synthetic control approach is employed to select a combination of urban governorates (a special Saudi designation for cities at the second level of regional administration within the country) that provides a better comparison for the governorate exposed to the treatment than any single governorate alone. The potential controls were chosen from a list of all Saudi governorates that had a population greater than 100,000 as of the 2010 official census. Qatif is the treated unit, as the riots were concentrated there. Other cities in the Eastern Province may have experienced heightened unrest during the treatment period as well. With this in mind, these cities were included in the donor pool, but the synthetic control method did not make significant use of them in its construction of synthetic Qatif. This supports the assumption of no interference between units, i.e. that the stable unit treatment value assumption is not violated.
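To make the weighting idea concrete, the sketch below finds nonnegative weights that sum to one so that a weighted combination of donor cities matches a treated city's pre-treatment predictors. It is a minimal illustration under stated assumptions, not the estimation code used in this study: all predictors are weighted equally unless `v_diag` is supplied, the constrained least-squares problem is solved with projected gradient descent (a swapped-in generic solver), and the function names and toy numbers are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_j >= 0, sum_j w_j = 1}."""
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(sorted(v, reverse=True), start=1):
        cumulative += u_k
        candidate = (cumulative - 1.0) / k
        if u_k - candidate > 0:    # holds for a prefix of the sorted values
            theta = candidate      # keep the threshold of the last such k
    return [max(x - theta, 0.0) for x in v]


def synthetic_weights(x_treated, x_donors, v_diag=None, steps=2000):
    """Minimize sum_k v_k * (sum_j w_j * x_donors[j][k] - x_treated[k])**2
    over the simplex, by projected gradient descent.

    x_treated: K predictor values for the treated unit.
    x_donors:  J lists of K predictor values, one list per donor unit.
    v_diag:    optional K predictor weights (assumed equal if omitted).
    """
    n_pred, n_donors = len(x_treated), len(x_donors)
    v = v_diag if v_diag is not None else [1.0] * n_pred
    w = [1.0 / n_donors] * n_donors   # start from equal weights
    # The trace of the Hessian bounds its largest eigenvalue, so 1 / trace
    # is a safe (conservative) step size.
    bound = 2.0 * sum(v[k] * x_donors[j][k] ** 2
                      for j in range(n_donors) for k in range(n_pred))
    step = 1.0 / bound if bound > 0 else 1.0
    for _ in range(steps):
        residual = [sum(w[j] * x_donors[j][k] for j in range(n_donors))
                    - x_treated[k] for k in range(n_pred)]
        grad = [2.0 * sum(v[k] * residual[k] * x_donors[j][k]
                          for k in range(n_pred)) for j in range(n_donors)]
        w = project_to_simplex([w[j] - step * grad[j]
                                for j in range(n_donors)])
    return w


# Toy example (invented numbers): three donors, two standardized predictors.
donors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
treated = [0.3, 0.5]
weights = synthetic_weights(treated, donors)
synthetic = [sum(w * d[k] for w, d in zip(weights, donors)) for k in range(2)]
```

In this toy case the treated unit lies inside the convex hull of the donors, so the fit is exact. When it does not, the weights give the closest attainable combination and some donors typically receive zero weight, which mirrors how a synthetic control can ignore parts of the donor pool.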
Additionally, it is assumed that the treatment has no effect on the outcome variables before the implementation period. However, this may be a strong assumption since, as stated previously, the Eastern Province has been experiencing unrest since February 2011.

Following Abadie et al. [21], the model works out as follows. For units $i = 1, \dots, J+1$ and time periods $t = 1, \dots, T$, let:

* $T_0$ be the number of pre-treatment periods, with $1 \le T_0 < T$
* $Y_{it}^{N}$ be the dependent variable for unit $i$ at time $t$ in the absence of treatment
* $Y_{it}^{I}$ be the dependent variable for unit $i$ at time $t$ if unit $i$ is exposed to treatment in periods $T_0 + 1$ to $T$.

Only the first city (Qatif), $i = 1$, is exposed to treatment after period $T_0$, thus $D_{it} = 1$ if $i = 1$ and $t > T_0$, and $D_{it} = 0$ otherwise. The observed outcome for unit $i$ at time $t$ is

$$Y_{it} = Y_{it}^{N} + \alpha_{it} D_{it}.$$

The desired estimate is

$$\alpha_{1t} = Y_{1t}^{I} - Y_{1t}^{N} = Y_{1t} - Y_{1t}^{N} \quad \text{for } t > T_0.$$

$Y_{1t}^{I}$ can be observed, so to estimate $\alpha_{1t}$ an estimate of $Y_{1t}^{N}$ is required, which can be given by the factor model

$$Y_{it}^{N} = \delta_t + \theta_t Z_i + \lambda_t \mu_i + \varepsilon_{it},$$

where $\delta_t$ represents the unobserved common time-dependent factor; $\theta_t$ is a vector of unknown parameters; $Z_i$ is a vector of observed covariates not affected by the treatment; $\lambda_t$ is a vector of unobserved common factors; $\mu_i$ is a vector of unobserved covariates; and $\varepsilon_{it}$ are error terms representing unobserved transitory shocks.

The estimate of $\alpha_{1t}$ will be unbiased if a $(J \times 1)$ vector of weights $W = (w_2, \dots, w_{J+1})'$ is chosen such that $w_j \ge 0$ for $j = 2, \dots, J+1$ and $w_2 + \dots + w_{J+1} = 1$, where each particular value of the vector $W$ represents a potential synthetic control. Suppose that there are weights $(w_2^*, \dots, w_{J+1}^*)$ such that

$$\sum_{j=2}^{J+1} w_j^* Y_{j1} = Y_{11}, \quad \dots, \quad \sum_{j=2}^{J+1} w_j^* Y_{jT_0} = Y_{1T_0}, \quad \text{and} \quad \sum_{j=2}^{J+1} w_j^* Z_j = Z_1.$$

The synthetic control units are selected such that this equation can hold approximately given an appropriate number of pre-treatment time periods. The treatment effect is then estimated as $\hat{\alpha}_{1t} = Y_{1t} - \sum_{j=2}^{J+1} w_j^* Y_{jt}$ for $t > T_0$.

Let $J$ be the number of available control units and $W = (w_2, \dots, w_{J+1})'$ be a $(J \times 1)$ vector of nonnegative weights which sum to 1. The scalar $w_j$ ($j = 2, \dots, J+1$) is the weight of region $j$ in synthetic Qatif. Let $X_1$ be a $(K \times 1)$ vector of pre-treatment characteristics for the treated unit Qatif. Let $X_0$ be a $(K \times J)$ matrix which contains the values of the same variables for the $J$ possible control governorates. A vector of weights $W^*$ is chosen to minimize

$$\| X_1 - X_0 W \|_V = \sqrt{(X_1 - X_0 W)' V (X_1 - X_0 W)}$$

where $w_j \ge 0$ ($j = 2, \dots, J+1$) and $w_2 + \dots + w_{J+1} = 1$. $V$ is a diagonal $(K \times K)$ matrix that assigns weights to the variables in $X_0$ and $X_1$ so as to minimize the mean square prediction error (MSPE) of the synthetic control estimator.

4 Analysis: Call Behavior

As an introductory sanity check, daily call distributions were computed for Qatif by itself, and for all other governorates in aggregate, for December 21st (the first pre-treatment Friday in the dataset) and December 28th (the first post-treatment Friday in the dataset). The first panel depicts the daily distributions for all governorates except Qatif and shows very little variation between the two dates over the course of the day (Figure 4.0.1, left), while the second panel demonstrates a clear decrease in combined activity between 9am and 9pm (Figure 4.0.1, right).

Figure 4.0.1: Daily call distributions for Dec. 21st and Dec. 28th for all KSA governorates (Left), and Qatif (Right)

Figure 4.0.2 plots the trends in total network activity (top) and call duration (bottom) in Qatif and the rest of the governorates in the KSA. There is some similarity between the plots during the pre-treatment period, with some divergence from the treatment day onward. However, it remains difficult to judge how closely the aggregate group of cities compares with Qatif. The plot of average call duration is even harder to compare visually, but it is worth noting the pronounced jump in Qatif's average call duration at the beginning of the post-treatment period, before it again falls below the country-wide average.

Figure 4.0.2: Trends in total daily network activity, Qatif vs. other Saudi governorates, Dec. 20th - Jan. 3rd (Top), and trends in average daily call duration, Qatif vs. other Saudi governorates, Dec. 20th - Jan. 3rd (Bottom). "Treatment" indicated by dashed pink line

Synthetic Qatif is constructed as a weighted average of potential control cities, with weights chosen so that the result best reproduces the values of a set of predictors of sociability before the riots began on December 28th. The initial variables of interest are: total daily network activity, daily unique users, and the average duration of daily calls. Using the synthetic control method described previously, two distinct synthetic Qatifs are constructed such that they mirror the selected predictors. The treatment effects are then estimated as the differences between Qatif and its synthetic versions in the days following.

The predictors of total daily network activity, daily unique callers, and average call duration are the same. They include:

1. total network activity in the week before treatment, or daily unique callers in the week before treatment
2. average duration of calls in the week before treatment
3. percent of males as obtained from the Department of Analysis & Reports in 2010
4. percent of Saudis, also obtained from the Department of Analysis & Reports in 2010

Table 4.0.1 displays the balance between Qatif and synthetic Qatif for the outcome variable total network activity. As explained previously, the value V was chosen to minimize the MSPE during the pre-treatment week; the values associated with total network activity, average call duration, and percent men were the largest (see Appendix A). Table 4.0.2 displays the weights of each control city in synthetic Qatif. Synthetic Qatif is largely composed of a combination of Sabya and Abu Arish. Both Sabya and Abu Arish are less developed than Qatif. Sabya, like Qatif, is known for having a higher concentration of Shiites than most cities in Saudi Arabia.

Table 4.0.1: Total Daily Activity Predictor Means

Variables                   Treated    Synthetic   Sample Mean
avgDuration (days 1-7)      122.141    122.208     137.957
totalActivity (days 1-7)    12.400     12.402      13.059
percentMen                  0.545      0.570       0.600
percentSaudi                0.870      0.855       0.822

Table 4.0.2: Total Daily Network Activity Governorate Weights in Synthetic Qatif

Governorate Weight Al Bahah 0.010 Ar ar 0.007 Sakaka Yanbu Al Bahar Buraydah 0.007 Al Kharj 0.005 Khamis Mushayt Ha il 0.004 Sabya Al Qunfidhah 0.435 0.080 Najran 0.002 Tabuk Ad Dammam Governorate Ar Rass Unayzah Weight 0.000 0.007 Al Riyadh Ad Duwadimi 0.000 Al Majmaah Al Quwayiyah 0.009 Abha Ahad Rufaydah Al Majardah 0.003 0.006 0.006 0.001 Bishah Al Qhazaiah Abu Arish 0.003 Ahad Al Masarihah Al Ahsa Al Jubayl Al Khubar Haft Al Babin 0.003 Jizan 0.004 Samtah 0.008 0.006 0.008 0.004 0.002 0.006 Al Taif Al Lith Al Quryyat 0.013 Jiddah 0.008 Medina 0.004 Mecca 0.004 Muhayil 0.003 0.004 0.004 0.003 0.010 0.000 0.005 0.003 0.311 0.008
Figure 4.0.3: Trends in Total Network Activity, Qatif and Synthetic Qatif (Left), and Total Network Activity Gap Between Qatif and Synthetic Qatif (Right)

The first panel of Figure 4.0.3 shows the total network activity in Qatif relative to synthetic Qatif in the pre and post treatment periods. Total network activity of synthetic Qatif closely tracks real Qatif in the week before the riots. This data, along with the relatively good balance in Table 4.0.1, suggests that synthetic Qatif provides a good approximation of the total network activity in Qatif before treatment. To assess whether this approximation holds throughout the pre-treatment period, the graph was extended an additional week prior to treatment (see Figure B.0.4 in the Appendix). After Ahmad al-Matar was shot at midnight of December 27th (the end of day 7, beginning of day 8), the two lines begin to diverge substantially, with total network activity decreasing for actual Qatif. The second panel of Figure 4.0.3 plots the daily gaps in total network activity between Qatif and its synthetic counterpart and suggests that the treatment had a large effect on total activity for the following few days but, as expected, the effect is not sustained.
Table 4.0.3: Daily Average Call Duration Predictor Means

Variables                   Treated    Synthetic   Sample Mean
avgDuration (days 1-7)      122.141    122.153     137.957
totalActivity (days 1-7)    12.4       12.402      13.059
percentMen                  0.545      0.551       0.600
percentSaudi                0.870      0.839       0.822

Table 4.0.4: Daily Average Call Duration Governorate Weights in Synthetic Qatif

Governorate Weight Weight 0.005 0.006 0.001 0.008 0.006 0.006 Al Bahah 0.007 Governorate Ar Rass Ar ar 0.008 Unayzah Sakaka 0.007 Yanbu Al Bahar 0.003 Al Riyadh Ad Duwadimi Buraydah Al Kharj 0.005 0.005 Al Majmaah Al Quwayiyah Khamis Mushayt 0.004 Ha il Sabya 0.003 Al Qunfidhah 0.009 Abha Ahad Rufaydah Al Majardah Bishah 0.006 Najran 0.001 Al Qhazaiah 0.007 Tabuk 0.001 0.826 Ad Dammam 0.003 Al Ahsa Al Jubayl Al Khubar Haft Al Babin 0.005 Abu Arish Ahad Al Masarihah Jizan 0.003 Samtah 0.003 0.004 Al Quryyat 0.008 Al Taif Al Lith Jiddah Medina 0.005 Mecca 0.004 Muhayil 0.002 0.000 0.007 0.003 0.006 0.001 0.005 0.005 0.005 0.004 0.002

Table 4.0.3 displays the balance between Qatif and its synthetic control for the outcome variable of average call duration. As expected under this scenario, the diagonal element of V associated with pre-treatment average call duration is the largest; total network activity was not a predictor at all (see Table A.0.2 in the Appendix). Table 4.0.4 displays the weights of each control city in synthetic Qatif for average call duration. The weights indicate that in this case, synthetic Qatif is constructed mainly from Abu Arish, although others in the donor pool played a small part.

Figure 4.0.4: Trends in Average Daily Call Duration, Qatif and Synthetic Qatif (Left), and Average Daily Call Duration Gap Between Qatif and Synthetic Qatif (Right)

The first panel of Figure 4.0.4 shows average call duration for Qatif and its synthetic control. The two lines remain closely aligned before and after treatment.
This outcome demonstrates that the current predictors are not good at estimating the causal effect of treatment on average call duration.

Figure 4.0.5 shows the total number of unique callers per day for Qatif and synthetic Qatif. The plots are similar to the trends in daily network activity, in that the synthetic and real plots look consistent leading up to the treatment event, after which they deviate considerably. Table 4.0.5 displays the balance between Qatif and synthetic Qatif for Daily Unique Callers. Table 4.0.6 displays the weights of each control city in synthetic Qatif. Sabya and Abu Arish again make up the majority of the counterfactual.

Figure 4.0.5: Trends in Unique Callers, Qatif and Synthetic Qatif (Left), and Gap in Unique Callers, Qatif and Synthetic Qatif (Right)

Table 4.0.5: Daily Unique Callers Predictor Means

Variables                   Treated    Synthetic   Sample Mean
uniqueUser (days 1-7)       10.905     10.904      11.210
avgDuration (days 1-7)      122.141    122.208     137.957
percentMen                  0.545      0.570       0.600
percentSaudi                0.870      0.855       0.822

INFERENCE

Figure 4.0.6: Synthetic Control Placebo Tests with Sabya. Total Daily Network Activity (Left), Average Call Duration (Middle), and Daily Unique Callers (Right)

To assess the significance of the estimates, a variety of placebo tests are now conducted.
Table 4.0.6: Daily Unique Callers Governorate Weights in Synthetic Qatif

Governorate Weight Al Bahah 0.004 Ar ar 0.006 Sakaka Yanbu Al Bahar 0.007 Governorate Ar Rass Weight 0.009 Al Kharj Khamis Mushayt Ha ii Sabya Al Qunfidhah 0.007 Najran Tabuk 0.005 0.005 Ad Dammam 0.008 Al Ahsa Al Jubayl Al Khubar Haft Al Babin Al Quryyat 0.013 Unayzah Al Riyadh Ad Duwadimi Al Majmaah Al Quwayiyah Abha Ahad Rufaydah Al Majardah Bishah Al Qhazaiah Abu Arish Ahad Al Masarihah Jizan 0.006 Samtah 0.005 0.007 0.006 0.009 0.001 Al Taif Al Lith Jiddah Medina 0.018 Mecca 0.038 Muhayil 0.005 Buraydah 0.005 0.008 0.0078 0.0086 0.006 0.471 0.003 0.009 0.004 0.004 0.003 0.000 0.006 0.007 0.004 0.006 0.004 0.274 0.005 0.008 0.004 0.004 0.005

First, the synthetic control method is applied to Sabya, a city that is similar to Qatif in its daily network activity profile and that contributed the highest weight to the synthetic control for the outcome variable total network activity. Figure 4.0.6 shows there is no difference in outcome trajectory between pre and post treatment for Sabya.

Across-unit and in-time permutation tests are now performed. The across-unit placebo test iteratively assigns treatment status to every other city in the donor pool and applies the synthetic control method. If the placebo studies create gaps of similar magnitude to the one estimated for Qatif, the analysis does not provide significant evidence of a negative effect of the treatment on total network activity.

Figure 4.0.7: Across-Unit Placebo Tests: Total Activity (all, 500x or less, 100x or less, 50x or less)

In the four graphs in Figure 4.0.8, the gray lines show the divergence of each control city from its synthesized analog, and the black line shows the same divergence for Qatif.
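The across-unit placebo logic can be sketched in a few lines. The example below is a hedged illustration, not the procedure as implemented for this study: it assumes the gap series (actual minus synthetic) have already been computed for every unit, the function names and the `max_ratio` cutoff are hypothetical, the toy gap values are invented, and every unit is assumed to have a nonzero pre-treatment MSPE.

```python
def mspe(values):
    """Mean squared prediction error of a gap series."""
    return sum(v * v for v in values) / len(values)


def placebo_summary(gaps, treated, t0, max_ratio=None):
    """Compare the treated unit's post/pre MSPE ratio with placebo units.

    gaps:      dict mapping unit name -> list of daily gaps (actual - synthetic)
    treated:   name of the truly treated unit
    t0:        number of pre-treatment periods
    max_ratio: optional cutoff; placebo units whose pre-treatment MSPE exceeds
               max_ratio times the treated unit's are dropped (poor fit)
    Assumes every unit has a nonzero pre-treatment MSPE.
    """
    treated_pre = mspe(gaps[treated][:t0])
    ratios = {}
    for unit, series in gaps.items():
        pre = mspe(series[:t0])
        if unit != treated and max_ratio is not None and pre > max_ratio * treated_pre:
            continue  # synthetic control fits this placebo too poorly to be informative
        ratios[unit] = mspe(series[t0:]) / pre
    # Share of retained units whose ratio is at least as extreme as the treated unit's.
    share = sum(1 for r in ratios.values() if r >= ratios[treated]) / len(ratios)
    return ratios, share


toy_gaps = {
    "Qatif": [0.1, -0.1, 0.1, -1.0, -1.2],
    "A": [0.1, 0.1, -0.1, 0.1, -0.1],
    "B": [2.0, -2.0, 2.0, 0.5, 0.5],
}
ratios, share = placebo_summary(toy_gaps, "Qatif", t0=3, max_ratio=20)
print(sorted(ratios), share)
```

The returned share plays the role of a permutation-style significance level: the smaller it is, the less the treated unit's gap resembles the gaps produced by placebo assignments.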
This helps in assessing whether the estimated treatment effect for the treated unit is distinguishable from randomness. The top left graph includes all placebo cases. The top right graph excludes cities whose MSPEs are 20 times greater than Qatif's, the bottom left graph excludes cities whose MSPEs are 10 times greater, and the bottom right graph excludes cities whose MSPEs are 5 times greater. Once cities that do not provide a good counterfactual are excluded, the gaps are smaller in magnitude than Qatif's, suggesting that the result is significant.

Figure 4.0.8: Across-Unit Placebo Tests: Daily Unique Callers (all, 500x or less, 100x or less, 50x or less)

The in-time placebo test assigns the treatment period to a time $t < T_0$ for the actual treated unit. A $T_0$ four days before the shooting of al-Matar was chosen. As shown in the leftmost panel of Figure 4.0.9, total network activity of synthetic Qatif closely tracks real Qatif during both pre and post treatment periods, demonstrating that no effect is detected in the absence of treatment. The middle panel of the figure shows the average call duration of synthetic and real Qatif. Although the actual treatment at day 8 has no effect on average duration, there is a slight but discernible difference at the placebo treatment of day 4. Finally, the rightmost panel of unique daily callers shows a strong correspondence between the synthetic and actual plots over the entire time period, again signifying no effect.

Figure 4.0.9: In-Time Placebo Tests with Qatif. Total Daily Network Activity (Left) and Average Call Duration (Right)

5 Analysis: Inter and Intracity Calling Patterns

The aggregate activity analysis in the previous section serves as a strong indicator that the population of Qatif altered its communication patterns in response to the violence of December 27th. However, little can be said about the nature of this change. The study now turns its attention to some of the more nuanced characteristics of call behavior, namely the relationships between and across cities. Three distinct properties of city communications are examined:

* Intra call patterns: calls whose source and destination are within the city
* Inter-in call patterns: calls whose source is outside of the city and destination is within the city
* Inter-out call patterns: calls whose source is within the city and destination is outside the city

These measures may provide a better understanding of whether people turn inward to their communities during times of political duress, or whether they turn outward to spread news beyond their localities. Due to the structure of the dataset, callee location must first be inferred to calculate each of the above quantities per city. The process is known in the literature as home/work estimation, and has been used often to explore urban mobility behavior [22] [23] [24] [25].

5.1 LOCATION IDENTIFICATION

Initially, the location identification procedure followed a filtration procedure similar to the one discussed in Phithakkitnukoon et al. [26]. First, all weekend activities (weekends in Saudi Arabia are Thursdays and Fridays) were culled from the dataset. Then, call sequences that were too infrequent were removed to focus on meaningful estimates; only calls that were made within a 16 hour time window were included.
Night and daytime periods were defined as 10:00pm to 6:00am and 9:00am to 3:00pm respectively, and calls were binned accordingly. For each phone user, day and night locations were ranked by activity (with a small correction for call hops amongst nearby towers), and representative towers were selected if the user made more than 60% of his or her calls in this location. Although this helps eliminate false traces, it does limit the study to users whose occupations follow traditional business hours. While this may encapsulate students, it is likely that the jobless, disenfranchised youth who may orient the protest movement have been culled from the sample. On the whole, this filtration method resulted in well-defined home/work location pairs for roughly 11% of the unique identifiers in the full dataset. Unfortunately, the resulting coverage was too sparse to identify weekly patterns at a city scale, making the filtration too stringent to capture phenomena related to the protests.

The location identification procedure was then modified to capture a greater sample size. After all, this study is only interested in a caller's primary city of residence, which permits a greater degree of leniency in the filtration process. In the modified approach all unique users are selected from the full, month-long dataset. Then a record of each user's activities is aggregated per city in the donor pool. To guarantee a meaningful sample of call records, any user who made fewer than one call every 2 days is culled from the dataset. Then, for each user, governorates are ranked by total activity. A city is designated as a user's home location if more than 75% of his or her total calls are made there. In general, some care should be taken to prevent misidentification due to users moving or vacationing, but this is not considered in this instance due to the relatively tight time window of the dataset.
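The modified home-city rule, together with the intra, inter-in, and inter-out labels defined at the start of this chapter, can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout `(user_id, city)`, the function names, and the toy users are hypothetical, and only the two thresholds (at least one call every 2 days, more than 75% of calls in one city) come from the text.

```python
from collections import Counter, defaultdict


def assign_home_cities(calls, n_days, min_calls_per_day=0.5, min_share=0.75):
    """Assign each sufficiently active user a home city.

    calls:  iterable of (user_id, city) pairs, one per call the user made
    n_days: length of the observation window in days
    """
    per_user = defaultdict(Counter)
    for user, city in calls:
        per_user[user][city] += 1

    homes = {}
    for user, counts in per_user.items():
        total = sum(counts.values())
        if total / n_days < min_calls_per_day:
            continue  # fewer than one call every 2 days: culled
        city, top = counts.most_common(1)[0]
        if top / total > min_share:  # strictly more than 75% of calls
            homes[user] = city
    return homes


def classify_call(caller_home, callee_home, city):
    """Label a call relative to `city`, or return None if it does not touch it."""
    if caller_home == city and callee_home == city:
        return "intra"
    if callee_home == city:
        return "inter-in"
    if caller_home == city:
        return "inter-out"
    return None


toy_calls = (
    [("u1", "Qatif")] * 4
    + [("u2", "Qatif")] * 3 + [("u2", "Riyadh")]
    + [("u3", "Qatif")]
    + [("u4", "Qatif")] * 4 + [("u4", "Jiddah")]
)
print(assign_home_cities(toy_calls, n_days=4))
```

In the toy data, user `u2` is dropped because exactly 75% of calls is not "more than 75%", and `u3` is dropped for low activity.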
The new dataset consists of roughly 7.3 million users, or about 40% of the total dataset.

5.2 URBAN CALL COUNTS

After subsetting the set of users whose locations were identifiable, intra, inter-in, and inter-out activity counts are made per city, per day. Figure 5.2.1 shows the standardized profiles for Qatif (in black), and all other Saudi cities (in gray) over the study period. Contrary to the total daily call activity plot in the previous section (Figure 4.0.2), one sees a gradual uptick in daily calls from pre to post treatment weeks. In fact, in terms of week-over-week changes, Qatif experiences a 9.08% increase in intra call activity, against a country-wide 6.64% increase. Similarly, Qatif sees a 7.81% increase in inter-in activity and a 10.20% increase in inter-out call activity, while the other cities in the donor pool experience increases of roughly 7.7% for both activity types. Looking more closely at the weekly rhythms of the charts, one can see a clear drop in activity on Fridays, similar to the total activity plot from the previous section.

Figure 5.2.1: Trends in standardized daily intra (top), inter-in (middle), and inter-out (bottom) call volumes, Qatif (solid) vs. other Saudi governorates (dashed). Dec. 20th - Jan. 3rd. Treatment indicated by dashed pink line

SCM is now applied, using the following predictors:

* Log of daily intra city call activity in the week before treatment
* Log of daily inter-in call activity in the week before treatment
* Log of daily inter-out call activity in the week before treatment
* Percent of males as obtained from the Department of Analysis & Reports in 2010
* Percent of Saudis as obtained from the Department of Analysis & Reports in 2010

Figure 5.2.2: Trends in Intra Call Volumes

Figure 5.2.3: Trends in Inter-In Call Volumes

Figure 5.2.4: Trends in Inter-Out Call Volumes

Most strikingly, Figure 5.2.2 shows a marked increase in phone activity immediately following al-Matar's death. The plot shows a strong increase in intracity call activity at the onset of the protests on December 27th (day 8 of the study period), and another peak on January 1st (day 13 of the study period). In a western context one could mistake this as an effect of the New Year's holiday; however, this measure is relative to all cities in the donor pool, so synthetic Qatif would have seen an increase as well. Additionally, Saudis follow the Hijri, as opposed to the Gregorian, calendar year. January 1st was significant to Qatif for a different reason: this was the day after al-Matar's funeral procession.

See Tables C.0.1 and B.0.2 in the appendix for more detail on these results, and Figures B.0.1 and B.0.3 for their robustness checks. As an additional investigation, Figure B.0.2 shows daily measures of all calls made by individuals identified as residents against a synthetic control. The plot shows a strong correspondence between the real and synthetic measures, both pre and post treatment. This adds more nuance to the shift in communications patterns, suggesting that while a residential call 'budget' remained fixed before and after the event, a greater proportion of calls were made to within-city individuals.

Unfortunately, the trends in inter-in and inter-out activity (Figures 5.2.3 and 5.2.4) do not tell as clear a story.
Qatif's aggregate week-over-week change in inter-in activity is roughly consistent with the nation-wide average, and the daily plot closely mirrors the synthetic control. On the other hand, Qatif's increase in inter-out activity is a good deal higher than the mean for all cities, yet the daily plot holds a consistent profile with the synthetic profile before and during the period of unrest. The next section will apply SCM to activity on Twitter.

6 Analysis: Twitter Activity

The Twitter dataset is examined through the following dimensions:

1. The total number of tweets across the city.
2. The number of tweets per unique user.
3. The average message length.

6.1 GEOTAGGED ACTIVITY

The first pass of the analysis looks at trends in Twitter activity through the geotagged dataset. The changes from the pre-treatment to post-treatment weeks show only a slight increase in overall tweet volume for Qatif (0.43%) and a 7.91% decrease in unique daily users, against an average 18.32% increase in tweet volume for the rest of the country, with a corresponding 13.12% increase in unique daily users. The standardized daily activity plots of Qatif and the rest of Saudi Arabia (reproduced in the appendix) show profiles that are much noisier than the phone activity trends. This may indicate that the Twitter sample is too small, or the temporal resolution is too fine to quantify an impact on behavior.

Figure 6.1.1: Trends in standardized daily Tweet volume (top), Tweet length (middle), and Tweets per user (bottom), Qatif (solid) vs. other Saudi governorates (dashed). Dec. 20th - Jan. 3rd.
Treatment indicated by dashed pink line

The following predictors are used:

* Log of the tweet volume or the number of tweets per unique user, per day in the week before treatment
* The average tweet length per day in the week before treatment
* Percent of males as obtained from the Department of Analysis & Reports in 2010
* Percent of Saudis, also obtained from the Department of Analysis & Reports in 2010

Figure 6.1.2: Trends in Total Tweet Activity, Qatif and Synthetic Qatif

Figure 6.1.3: Trends in Average Tweet Length, Qatif and Synthetic Qatif

Figure 6.1.4: Trends in Tweets Per User, Qatif and Synthetic Qatif

As evidenced in Figures 6.1.2 and 6.1.3, SCM is able to capture and quantify effects on neither geotagged tweet volume nor average tweet length. The plots of daily Tweets per user are slightly better, however. Figure 6.1.4 shows a slight, but imperfect correspondence between Qatif and its counterfactual for days 1 through 11, before a striking divergence for the remainder of the study period. These dates somewhat coincide with the funeral procession of Ahmad al-Matar, which took place on December 31st, or day 12. Predictor weights, governorate weights, and v-weights are displayed in Tables C.0.1, C.0.2, and C.0.3 in Appendix C.

Similar robustness checks are now performed using the techniques explained in previous chapters. The second panel of Figure C.0.1 shows the in-time placebo test over the pre-treatment period. The lines for Qatif and synthetic Qatif are mostly consistent, showing that no effects are detected in the absence of treatment. The first across-unit placebo is conducted using Ahad Rufaydah, the city that represents the largest share of Qatif's synthetic counterfactual. As the first panel in Figure C.0.1 depicts, SCM does not produce a consistent counterfactual for the governorate. The complete across-unit permutation test is shown in Figure C.0.2. The results suggest that the estimated treatment effect for Qatif is not distinguishable from randomness. These are indications that the change seen in daily Tweets Per User may be the result of a statistical fluke, not the onset of city riots. Ultimately, it is likely that the Tweet data is simply too noisy at this scale. In order to capture a more comprehensive view of Twitter activity in each city, the "Location Estimation for Non-Geotagged Tweets" section in Chapter 7 presents a procedure to estimate locational data from non-geotagged tweets using a probabilistic classifier.

7 Future Directions

7.1 LOCATION ESTIMATION FOR NON-GEOTAGGED TWEETS

While the majority of Twitter users do not share latitude and longitude data with their tweets, many record an individually-defined, publicly-accessible location string in their account information. Typically, this is the city and country in which they reside. Using this assumption, it is possible to estimate where tweets were made using a Naive Bayes text classifier. The Naive Bayes model is a straightforward probabilistic learning method that has found great popularity in text classification problems due to its relative simplicity yet highly effective performance in many real-world problem domains [27]. It works particularly well for classification problems with high-dimensional feature spaces, which makes it well suited for location identification based on a number of diverse textual indicators. Following the approach of Manning et al. [28], the probability of tweet $t$ originating from city $c$ is based on Bayes' Rule:

$$P(C = c \mid t = \{w_1, w_2, \dots, w_{n_t}\}) = \frac{P(C = c)\, P(t = \{w_1, w_2, \dots, w_{n_t}\} \mid C = c)}{P(t = \{w_1, w_2, \dots, w_{n_t}\})}$$

Where:

* City $c$ exists in the set of all Saudi cities in the donor pool: $c \in C = \{c_1, c_2, \dots, c_{41}\}$
* Each tweet $t \in X$
* $n_t$ is the number of terms in the tweet's location field
* $P(w_i \mid c)$ is the conditional probability of term $w_i$ occurring in a tweet of city $c$
* $P(c)$ is the prior probability of a tweet originating from city $c$.

The denominator of the above expression is constant over the target cities, as it is a function only of values in $t$. Additionally, Naive Bayes assumes conditional independence between the attribute values (the tweet location tokens) given a target value (each city). Hence:

$$P(c \mid t) \propto P(c) \prod_{1 \le i \le n_t} P(w_i \mid c)$$

Using a training set $T$ of tweets labeled such that $(t, c) \in X \times C$, a classifier $\gamma$ is developed that maps tweets to cities:

$$\gamma : X \to C$$

Under the maximum a posteriori decision rule, the most probable city is chosen; if a tweet's location terms don't produce clear evidence for a specific city, this is the city that has the highest prior probability. The Naive Bayes classifier is defined as:

$$\arg\max_{c \in C} P(c \mid t) = \arg\max_{c \in C} P(c) \prod_{1 \le i \le n_t} P(w_i \mid c)$$

Thus, the estimate to determine which city a tweet belongs to is the product of the probability of each location term of the tweet given a specific city, multiplied by the probability of the city. After calculating this for each $c \in C$, the $c$ with the highest probability is selected. To avoid floating point underflow (situations where the probabilities are so small they cannot be represented), instead of multiplying the probabilities they are logged and summed:

$$\arg\max_{c \in C} \Big[ \log P(c) + \sum_{1 \le i \le n_t} \log P(w_i \mid c) \Big]$$

This effectively treats every conditional parameter $\log P(w_i \mid c)$ as a weight that indicates how good an indicator $w_i$ is for $c$, while the prior $\log P(c)$ is a weight that reflects the relative frequency of $c$. The intuition here is that more frequently cited cities are more likely to be correct. The maximum likelihood estimate is used to estimate the parameter $P(c)$. Given the training set of geotagged tweets this is the relative frequency of $c$, i.e.:

$$\hat{P}(c) = \frac{N_c}{N}$$

where $N_c$ is the number of tweets from city $c$, and $N$ is the total number of tweets.
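The log-sum decision rule above can be sketched directly. This is an illustrative sketch rather than the classifier built for this study: the function names, the toy counts, and the toy conditional probabilities are invented, and the small `floor` probability assigned to unseen terms is an added assumption (the text does not describe any smoothing).

```python
import math


def log_priors(city_counts):
    """Relative-frequency prior: log(N_c / N) for each city."""
    total = sum(city_counts.values())
    return {city: math.log(n / total) for city, n in city_counts.items()}


def classify(terms, priors, cond_prob, floor=1e-6):
    """Return the city maximising log P(c) + sum_i log P(w_i | c).

    terms:     location-field tokens of one tweet
    priors:    dict city -> log prior
    cond_prob: dict city -> dict term -> P(term | city)
    floor:     probability used for unseen (city, term) pairs; an assumption
               made here so that log() is always defined
    """
    best_city, best_score = None, -math.inf
    for city, log_prior in priors.items():
        score = log_prior + sum(
            math.log(cond_prob.get(city, {}).get(w, floor)) for w in terms
        )
        if score > best_score:
            best_city, best_score = city, score
    return best_city


priors = log_priors({"Qatif": 10, "Jiddah": 30})
cond_prob = {
    "Qatif": {"qatif": 0.6, "ksa": 0.4},
    "Jiddah": {"jeddah": 0.7, "ksa": 0.3},
}
print(classify(["qatif", "ksa"], priors, cond_prob))
print(classify([], priors, cond_prob))
```

With an empty location field the sum of conditional terms is zero, so the rule falls back to the city with the highest prior, which matches the behaviour described above.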
The conditional probability $P(w \mid c)$ is estimated as the relative frequency of the location term $w$ within tweets belonging to city $c$:

$$\hat{P}(w \mid c) = \frac{T_{cw}}{\sum_{w'} T_{cw'}}$$

Here $T_{cw}$ is the count of the location term $w$ from city $c$, and the sum in the denominator runs over all location terms $w'$. The count runs over all different positions $k$ in the training set of tweets, thus the positions bear no impact on the estimates. This may pose problems in other applications of Naive Bayes; however, the problem at hand is narrowly defined, and this is unlikely to skew the results. As a safeguard against misclassification, location terms were only associated with Saudi cities during the training process if at least 50% of their occurrences were found in that city.

Applying the supervised learning procedure to a stream of 10% of total tweets resulted in about 1,000,000 new tweets from roughly 69,000 unique users over the entire study period. The analysis from Chapter 6 was redone with the new dataset and the results are reproduced below. As the plots demonstrate, it seems that the data are still too sparse to identify any city-level trends through SCM. While it is possible to lower the classifier's stringency and accumulate more urban-level data, it is worth noting that, overall, the scale of coverage between CDRs and Tweet data is completely different: when looking at Qatif only, the CDR dataset holds roughly 235,000 records per day, while the Tweet dataset holds approximately 90. It is entirely possible that Twitter's user base in Saudi Arabia was simply too underdeveloped and/or uneven at this point in time, making it impossible to aggregate and compare at the spatial scale of a city, or the temporal scale of a day. Further research will work to balance the classifier's rigidity and total output in an effort to better characterize the treatment effect through social media.
7.1.1 INITIAL RESULTS: NON-GEOTAGGED TWEETS

As in Chapter 6, the following predictors are used:

* Log of the tweet volume or the number of tweets per unique user, per day in the week before treatment
* The average tweet length per day in the week before treatment
* Percent of males as obtained from the Department of Analysis & Reports in 2010
* Percent of Saudis, also obtained from the Department of Analysis & Reports in 2010

Figure 7.1.1: Trends in Total Tweet Activity, Qatif and Synthetic Qatif

Figure 7.1.2: Trends in Average Tweet Length, Qatif and Synthetic Qatif

Figure 7.1.3: Trends in Tweets Per User, Qatif and Synthetic Qatif

7.2 COMMUNICATION NETWORKS

The analyses on aggregate activity through call records and social media demonstrate a profound change in Qatif's communication patterns in response to civil unrest. The most important next step will be to explore the compositional changes that occur in the city's social network: is it possible to identify any emergent reorganization strategies that either encourage or impede information flow? This section presents a few initial investigations in this direction.

NETWORK GENERATION

Human interaction networks based on CDR data have been constructed for each city in the Saudi donor pool. The generation procedure follows [29], wherein each mobile phone user is defined as a node, and links are formed among nodes according to the communications records. This study focuses only on cities' reciprocal networks, in which two nodes are connected if and only if both of the corresponding users initiated at least one call to the other over the study period.
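The reciprocity filter can be sketched as follows. The record layout `(caller, callee)` and the function name are assumptions made for illustration, and keeping a separate call count for each direction of a reciprocated pair is one possible convention rather than a detail confirmed by the text so far.

```python
from collections import Counter


def reciprocal_edges(call_records):
    """Keep only directed edges whose reverse direction also appears.

    call_records: iterable of (caller, callee) pairs, one per call
    Returns a dict mapping (caller, callee) -> number of calls in that
    direction, restricted to pairs where both users called each other.
    """
    directed = Counter((a, b) for a, b in call_records if a != b)
    return {
        (a, b): count
        for (a, b), count in directed.items()
        if (b, a) in directed
    }


records = [("a", "b"), ("b", "a"), ("a", "b"), ("a", "c")]
print(reciprocal_edges(records))
```

In the toy records, the one-way contact from `a` to `c` is dropped, while both directions of the `a`/`b` pair are retained with their own call counts.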
A non-reciprocal network, in which a link exists if either side initiated activity, may contain unidirectional communications, possibly interactions between individuals who do not know each other. Thus, it is presumed to represent a more superficial social network than the reciprocal alternative. Again, following Schläpfer et al., all nodes which neither receive nor initiate calls are eliminated, in an effort to remove potential bias from call centers and/or other business hubs. The network is composed only of users whose home locations have been identified following the procedure described in Chapter 5. This set of users represents roughly 40% of the total individuals in the dataset. Each edge weight $w_{ij}$ is defined by the number of communications initiated by individual $n_i$ to individual $n_j$. The degree and edge weight distributions of the two-week nationwide network are shown in Figure 7.2.1.

Once the complete network was constructed, it was split into 40 different city networks by severing intercity edges. A variety of timescales were used, but single-day networks proved too sparse to offer meaningful insight. Ultimately, two week-long, directed networks were built, corresponding to the pre- and post-treatment periods. This analysis looks at the largest connected cluster (LCC, giant component) extracted from each Qatif network. The networks' basic properties are summarized in Table 7.2.1.

Figure 7.2.1: Total Degree Distribution (Left), and Edge Weight Distribution (Right) of the Complete Reciprocated Network, KSA

Week         n       m       Avg Degree   GCC     LCC
Qatif Post   23193   60379   5.21         0.092   76%
Qatif Pre    22522   56392   5.01         0.092   73%

Table 7.2.1: Summary statistics for Communication Networks. The size of the largest connected component (LCC) is presented as a percentage of the number of nodes in the full city network.
The total number of nodes (n), number of links (m), average degree, and global clustering coefficient (GCC) correspond to the complete city networks.

INFORMATION DIFFUSION

Following Onnela et al. [30], Figure 7.2.2 explores global information diffusion across both the pre- and post-treatment networks for Qatif. The process is based on a simple infection model in which the probability that an infected node passes the disease to its nearest-neighbor node is proportional to the strength of their connection. The procedure randomly selects an individual and 'infects' him or her with information at time $t_0 = 0$. At each following time step, each infected individual $n_i$ passes the information to another individual $n_j$ in its contact list with probability $P_{ij} = x w_{ij}$, where $w_{ij}$ is the edge weight. Thus, the more communications two individuals share, the more likely they are to pass information to each other.

The parameter $x$ is used as a control parameter for the rate of overall spread through the network. The most straightforward choice is $x = 1/\max(w_{ij})$, such that the strongest weight results in a probability of 1. However, as Onnela et al. state, this creates very long simulation times due to the skewness of the weight distribution; normalizing by the maximum weight creates very small transmission probabilities for the majority of connections. By increasing the value of $x$, the simulation can be sped up without dramatically altering the overall system. This produces a cutoff $w^*$ in the transmission probability, such that transmission always occurs for weights above $w^*$. The cutoff $w^*$ is chosen so that $P_{\mathrm{cum}}(w^*) \approx 0.965$, or $w^* \approx 14$. Thus $P_{ij} \propto w_{ij}$ for 96.5% of the weights. The diffusion simulation has been conducted 1,000 times over each network.

As the top panel in Figure 7.2.2 demonstrates, beyond a threshold of roughly 25% infected, the rate of transmission is actually faster in the post-treatment network.
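A single run of the infection model can be sketched as follows. This is a hedged illustration only: the function name `simulate_diffusion`, its parameters, and the toy network are hypothetical. It assumes that every infected node tries each of its uninfected contacts once per time step, and that the transmission probability is $P_{ij} = \min(1, w_{ij}/w^*)$, i.e. $x = 1/w^*$.

```python
import random


def simulate_diffusion(weights, seed_node, w_star=14, steps=50, rng=None):
    """Run one infection process over a weighted, directed contact network.

    weights: dict mapping (i, j) to the edge weight w_ij.
    Returns the number of infected nodes after each time step,
    starting with the single seed at t = 0.
    """
    rng = rng or random.Random()
    contacts = {}
    for (i, j), w in weights.items():
        # Transmission is certain for weights at or above the cutoff w_star.
        contacts.setdefault(i, []).append((j, min(1.0, w / w_star)))

    infected = {seed_node}
    history = [len(infected)]
    for _ in range(steps):
        newly_infected = set()
        for i in infected:
            for j, p in contacts.get(i, []):
                if j not in infected and rng.random() < p:
                    newly_infected.add(j)
        infected |= newly_infected
        history.append(len(infected))
    return history


# Toy chain a - b - c whose weights all reach the cutoff, so the run is deterministic.
toy_weights = {("a", "b"): 14, ("b", "a"): 14, ("b", "c"): 20, ("c", "b"): 20}
history = simulate_diffusion(toy_weights, "a", steps=3)
print(history)
```

Averaging such histories over many random seeds and runs would give curves comparable in spirit to the top panel of Figure 7.2.2, although the exact update rule used there may differ from this sketch.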
Additionally, as seen in the bottom panel of Figure 7.2.2, the distributions of edge weights of the links responsible for infecting an individual favor low edge strengths for both networks, suggesting that the majority of individuals get their information through weak ties, a finding that is consistent with the widely recognized role of weak ties in information sharing [31]. Moreover, the phenomenon is slightly more pronounced in the post-treatment distribution, which may imply that weak ties become increasingly important during times of duress.

It must be stated, however, that these results are strictly preliminary. They provide some indication that communities intelligently reorganize communications to increase dissemination speed and breadth during periods of civil unrest, but further research will be required to clarify this relationship and demonstrate a causal link.

7.3 RELIGIOSITY

An intriguing pattern was found in the Saudi mobile activity distributions: at various points in the day, activity would simply drop off for around 30 to 40 minutes before returning to its typical trend. These inactivity "valleys" were the result of daily prayer times. Millions of Muslims across the country put down their phones to turn and face the holy city of Mecca to give prayer five times a day. Shops and businesses essentially close for 20-30 minutes while the religious police (the Mutaween) surveil the streets in the hopes of sending all loiterers to the nearest mosques. Interestingly, the activity distributions capture this behavior very closely. The precise timing of these calls to prayer depends on the position of the sun in the sky; thus, by separating the CDR distributions into western, central, and eastern regions, one can see the prayer times moving across the country, as shown in Figure 7.3.1.

This prayer time disruption could function as a rough proxy for urban-level religiosity. The following method has been created to catalogue daily disruption.
It uses the fourth prayer, Maghrib (the sunset call to prayer), because of its strong presence in the data.

Method 1: For day $d$, let $n_{d,\max_0}$ be the maximum total network traffic at the beginning of the window, occurring at $t_{d,\max_0}$; let $n_{d,\max_1}$ be the maximum total network traffic at the end of the window, occurring at $t_{d,\max_1}$; and let $n_{d,\min}$ be the minimum total network traffic over the window, occurring at $t_{d,\min}$. Now let $C(t)$ equal the call count at time $t$, and define $\hat{C}(t)$, the estimated curve had no disruption occurred, as:

$$\hat{C}(t) = \frac{n_{d,\max_1} - n_{d,\max_0}}{t_{d,\max_1} - t_{d,\max_0}}\, t + n_{d,\min}$$

The disruption is then calculated as the ratio of the disturbance area over the total possible area:

$$\Omega_d = \frac{\int_{t_{d,\max_0}}^{t_{d,\max_1}} \left( \hat{C}(t) - C(t) \right) dt}{\int_{t_{d,\max_0}}^{t_{d,\max_1}} \hat{C}(t)\, dt}$$

Figure 7.3.2 shows disruption for all cities over the pre- and post-treatment periods. The plot is messy at best. Unfortunately, it seems that religiosity is not tractable using this method over a daily time scale. Quantifying this phenomenon remains an open question for the future.

Figure 7.2.2: Fraction of Infected Nodes as Function of Time (Top), Number of Infected Nodes at each instance of t (Middle), and Distributions of Edge Weights Responsible for Infection (Bottom)

Figure 7.3.1: Daily Network Activity Distributions from Jeddah (Western Saudi Arabia), Riyadh (Central Saudi Arabia), and the Eastern Region

Figure 7.3.2: Trends in Daily Prayer Time Disruption, All KSA Cities. Qatif drawn in pink.
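The disruption ratio lends itself to a short numerical sketch. The function name `disruption_ratio`, the trapezoidal integration, and the sample values below are all assumptions made for illustration. In particular, the no-disruption baseline is taken here to be the straight line joining the traffic levels at the two ends of the window, which is only one possible reading of the estimated curve.

```python
def disruption_ratio(times, counts):
    """Ratio of the disturbance area to the total area under the baseline.

    times, counts: samples of the call count C(t) across the prayer window,
    where the first and last samples are the bounding traffic maxima.
    """
    t0, t1 = times[0], times[-1]
    n0, n1 = counts[0], counts[-1]
    slope = (n1 - n0) / (t1 - t0)
    # Assumed baseline: straight line through the two window-edge maxima.
    baseline = [n0 + slope * (t - t0) for t in times]

    def trapezoid(values):
        # Trapezoidal rule over the (possibly unevenly spaced) samples.
        return sum((times[k + 1] - times[k]) * (values[k] + values[k + 1]) / 2
                   for k in range(len(times) - 1))

    gap = [b - c for b, c in zip(baseline, counts)]
    return trapezoid(gap) / trapezoid(baseline)


# Invented samples: traffic falls from 100 to 20 calls and recovers over 40 minutes.
minutes = [0, 10, 20, 30, 40]
call_counts = [100, 60, 20, 60, 100]
ratio = disruption_ratio(minutes, call_counts)
print(round(ratio, 3))
```

A value near 0 would indicate almost no dip in activity during the window, while values approaching 1 would indicate that traffic nearly vanished.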
8 Discussion

The analysis presented above points to a number of compelling, statistically significant changes in communications across Qatif, relative to other urban agglomerations in Saudi Arabia, in the week following Ahmad al-Matar's death. The effects of social unrest reverberate through phone behavior: Qatif's daily call activity appears to shift in both magnitude and composition in response to the exogenous shock.

The most powerful trend identified in Qatif is the decrease in city-wide phone activity over the post-treatment time window; the volume of daily calls exhibits a dramatic drop immediately following the boy's death, and holds well below the synthetic counterfactual for the remainder of the study period. A similarly significant drop in unique daily callers is also found, which is evidence that not only were fewer calls being made, but fewer people were actively communicating through the city's telecom infrastructure. Interestingly, the differences between the treated and control units at the extremes of the post-treatment period are slightly less pronounced for unique daily callers than for total daily calls, suggesting that the treatment effect on unique callers has a slower initial response time and a faster falloff.

These results raise two possible, but not mutually exclusive, post-treatment scenarios: (1) individuals who had the means to leave the city did so; or (2) people limited their daily calls, potentially switching to other forms of communication. To begin with the former, it is possible that individuals who had been relative 'outsiders' in Qatif, those who had ties to other areas, turned away at the first sign of violence. This is not unlikely, given allegations that the Government had labeled the city as a dangerous place to visit [13]. It is within the realm of possibility that the alleged scaremongering had made individuals apprehensive about spending time in Qatif.
To address the latter, there seems to be some sensitivity in the Saudi population regarding privacy concerns related to mobile phones. The government has a huge stake in mobile infrastructure; STC, for instance, is majority-owned by the Saudi government through Saudi Arabia's Public Investment Fund. There have long been rumors circulating that the government monitors its citizens' activities through mobile devices [39]. These rumors came to a head recently, when Human Rights Watch, an international human rights advocacy group, accused the Kingdom of tapping Qatif residents' phones and monitoring their activity [42]. Allegedly, the surveillance software had been propagated through the city as malware masquerading as a local news app since the onset of Shia-led protests in 2011 [43]. It is possible that people, even those not involved with the demonstrations, wanted to avoid any potential scrutiny and avoided phone use at the first indication of civil disruption.

While daily call volume and unique daily callers experience steep declines, no evidence of a change in the average duration of daily calls is found. Social unrest appears to have a stronger, more generalized effect on how often people make calls and whom they call than on how long they stay on the phone. Duration measures appear too noisy to isolate patterns at the levels of aggregation employed here. The study period may need to be constrained to a tighter time window around the event, and/or measurements may need to be obtained at more granular intervals to extract any changes in response to treatment. It is also possible that changes in call duration are only measurable within subpopulations who are active in the protest movement. These remain issues to be explored in the future.

Beyond changes in daily aggregate activity, strong evidence exists of a transformation in the call composition of individuals who are identified as local residents.
The analysis shows an increase in daily activity within the subnetwork of users identified as holding strong spatiotemporal ties to the city, even though their total activity (the number of connections both internal and external to this subnetwork) remains constant. This increase in intra-network communication suggests that people strengthen their connections with others in the urban community during periods of social unrest. Interestingly, the call measures within this network peak on the day of al-Matar's funeral procession, adding credence to the notion that these changes were tied to the treatment.

Inter-in activity (calls originating in other cities and terminating in Qatif) sees a slight increase week over week, but the change is statistically consistent with all other cities in the donor pool. On the other hand, inter-out activity (calls originating in Qatif and terminating in other cities) sees an increase of roughly 10% week over week, which is higher than the national increase of 7.7%. However, daily trends do not appear to be tractable using SCM.

Additionally, an initial exploration of Qatif's city-wide human interaction networks provides some evidence that information diffusion increases in breadth and speed after treatment. The transmission simulations also point to a higher reliance on weak ties in the post-treatment network, which is consistent with the leading theory on the topic. While these findings are strictly preliminary, they offer some suggestion that communities under duress intelligently reorganize communications to increase overall information flow. Further research will be required to better identify and articulate the structural changes to Qatif's human interaction network before work can be done to determine a causal link.

Finally, examining the behavior on Twitter yields some interesting, if not altogether robust, findings.
Both geotagged and location-estimated tweet activity, aggregated to the city scale, show no recognizable trends from one week to the next. Average Tweet length is similarly noisy for both Tweet datasets. However, when looking at daily Tweets per user, there appears to be a striking increase before, during, and after al-Matar's funeral on December 31st. It should be stated, however, that this finding is tenuous; the robustness checks do not provide much support that this is more than a statistical anomaly. Ultimately it seems that, while Twitter adoption is quite high in Saudi Arabia, Tweet coverage per city is simply too uneven and sparse to capture any recognizable trends during this time. Upcoming research will attempt to soften the locational classifier's stringency in the hopes of creating a more comprehensive view of social media usage during this period.

Appendices

A Appendix: Call Behavior

Table A.0.1: V-Weights for Total Daily Activity

Predictor                         v.Weights
log(Total Activity) (days 1-7)    0.421
Avg. Duration (days 1-7)          0.315
Percent Men                       0.26
Percent Saudi                     0.003

Table A.0.2: V-Weights for Average Daily Duration

Predictor                         v.Weights
log(Total Activity) (days 1-7)    0
Avg. Duration (days 1-7)          0.912
Percent Men                       0.068
Percent Saudi                     0.02

Table A.0.3: V-Weights for Daily Unique Callers

Predictor                         v.Weights
log(Unique Callers) (days 1-7)    0.893
Avg. Duration (days 1-7)          0.055
Percent Men                       0.048
Percent Saudi                     0.004

Figure A.0.1: Total Network Activity, Qatif and Synthetic Qatif (3 Weeks)
Figure A.0.2: Number of Unique Daily Callers, Qatif and Synthetic Qatif (3 Weeks)

B Appendix: Inter and Intracity Calling Patterns

Table B.0.1: Daily Intra Call Activity Predictor Means

Variables                 Treated   Synthetic   Sample Mean
intraLog (days 1-7)       10.521    10.521      11.643
interInLog (days 1-7)     9.059     9.030       10.120
interOutLog (days 1-7)    9.008     9.068       10.145
percentMen                0.545     0.546       0.602
percentSaudi              0.870     0.804       0.822

Table B.0.2: Governorate Weights in Synthetic Qatif (Daily Intra Call Activity)

Weight Governorate Al Bahah 0.002 Ar'ar 0.029 Governorate Ar Rass Unayzah Sakaka 0.000 Al Riyadh Yanbu Al Bahar 0.002 Buraydah 0.002 Al Kharj 0.002 Ad Duwadimi Al Majmaah Al Quwayiyah Khamis Mushayt 0.002 Ha'il Sabya 0.003 Al Qunfidhah 0.001 Najran 0.002 0.004 Weight 0.005 0.003 0.001 0.005 0.003 0.004 Abha Ahad Rufaydah Al Majardah 0.002 0.416 0.002 0.227 Tabuk 0.001 Ad Dammam 0.001 Al Ahsa Al Jubayl 0.001 Bishah Al Qhazaiah Abu Arish Ahad Al Masarihah Jizan 0.001 Samtah 0.004 Al Khubar 0.012 0.224 Hafr Al Batin Al Quryyat 0.002 0.002 Al Taif Al Lith Jiddah Medina 0.002 Mecca 0.001 Muhayil 0.004 0.012 0.006 0.09 0.002 0.022 0.001

Figure B.0.1: Intra Call Activity Synthetic Control Placebo Test with Samteh (Left), In-time Intra Call Activity Placebo with Qatif (Right)

Figure B.0.2: Daily Local Call Activity

Table B.0.3: V-Weights for Daily Unique Callers

Predictor                     v.Weights
Intra Calls (days 1-7)        0.267
Inter-In Calls (days 1-7)     0.222
Inter-Out Calls (days 1-7)    0.12
Percent Men                   0.232
Percent Saudi                 0.16

Figure B.0.3: Across-Unit Placebo Tests: Intra Call Activity (all, 20x or less, 10x or less, 5x or less)

Figure B.0.4: Intra Call Activity, Qatif and Synthetic Qatif (3 Weeks)

C
Appendix: Twitter Activity

Table C.0.1: Daily Tweets Per User Predictor Means

Variables                            Treated   Synthetic   Sample Mean
Daily Tweets Per User (days 1-7)     0.267     0.267       0.246
Log Total Daily Tweets (days 1-7)    3.501     3.501       3.740
percentMen                           0.545     0.546       0.602
percentSaudi                         0.870     0.804       0.822

Table C.0.2: Governorate Weights in Synthetic Qatif (Tweets Per User)

Governorate Weight Al Bahah 0.007 Ar'ar Sakaka 0.023 0.011 Yanbu Al Bahar 0.008 Buraydah Al Kharj 0.013 Khamis Mushayt 0.014 Ha'il Sabya 0.018 Al Qunfidhah 0.098 Najran 0.010 Tabuk Ad Dammam Weight Governorate Ar Rass Unayzah 0.014 0.009 Al Riyadh 0.009 Ad Duwadimi Al Majmaah Al Quwayiyah 0.012 Abha Ahad Rufaydah Al Majardah Bishah 0.013 0.004 0.028 Al Qhazaiah Abu Arish 0.007 Ahad Al Masarihah 0.000 Al Ahsa Al Jubayl Al Khubar 0.029 Jizan 0.007 0.004 Samtah 0.012 0.002 0.001 0.007 0.010 0.006 Hafr Al Batin 0.032 Al Quryyat 0.013 Al Taif Al Lith Jiddah Medina Muhayil 0.010 Mecca 0.007 0.006 0.311 0.072 0.105 0.007 0.010 0.007 0.042

Table C.0.3: V-Weights for Daily Unique Callers

Predictor                            v.Weights
Daily Tweets Per User (days 1-7)     0.745
Log Total Daily Tweets (days 1-7)    0.069
Percent Men                          0.185
Percent Saudi                        0.001

Figure C.0.1: Tweets Per User Synthetic Control Placebo Test with Ahad Rufaydah (Left), In-time Tweets Per User Placebo with Qatif (Right)

Figure C.0.2: Across-Unit Placebo Tests: Tweets Per User (all, 50x or less, 20x or less, 5x or less)

References

[1] Z. Tufekci and C. Wilson, "Social media and the decision to participate in political protest: Observations from tahrir square," Journal of Communication, vol. 62, pp. 363-379, Apr. 2012.
[2] A. Breuer, T. Landman, and D.
Farquhar, "Social media and protest mobilization: Evidence from the tunisian revolution," SSRN Scholarly Paper ID 2133897, Social Science Research Network, Rochester, NY, Aug. 2012.
[3] G. Lotan, E. Graeff, M. Ananny, D. Gaffney, I. Pearce, and D. Boyd, "The revolutions were tweeted: Information flows during the 2011 tunisian and egyptian revolutions," International Journal of Communication, vol. 5, pp. 1375-1405, 2011.
[4] M. Szell, S. Grauwin, and C. Ratti, "Contraction of online response to major events," PLoS ONE, vol. 9, p. e89052, Feb. 2014.
[5] J. P. Bagrow, D. Wang, and A.-L. Barabási, "Collective response of human populations to large-scale emergencies," PLoS ONE, vol. 6, p. e17680, Mar. 2011.
[6] S. Aday, H. Farrell, M. Lynch, and D. Freelon, Blogs and Bullets II: New Media and Conflict after the Arab Spring. United States Institute of Peace Press, Mar. 2014.
[7] J. H. Pierskalla and F. M. Hollenbach, "Technology and collective action: The effect of cell phone coverage on political violence in africa," American Political Science Review, vol. 107, pp. 207-224, May 2013.
[8] T. Matthiesen, "Saudi arabia's shiite escalation," July 2012.
[9] "Saudi police arrest prominent shi'ite muslim cleric," Reuters, July 2012.
[10] T. Matthiesen, Sectarian Gulf: Bahrain, Saudi Arabia, and the Arab Spring that wasn't. Stanford, California: Stanford Briefs, an imprint of Stanford University Press, 2013.
[11] R. Staff, "Two killed as saudi security forces try to arrest shi'ite man," Reuters, Sept. 2012.
[12] "Saudis protest killing of teen protester in qatif," Press TV, Jan. 2013.
[13] B. Perazzo, "Propaganda & sectarianism: How the saudi government stifles the truth about qatif," Jan. 2013.
[14] "The story of a tweet," Aug. 2014.
[15] M. Mari, "Twitter usage is booming in saudi arabia - GlobalWebIndex," Mar. 2013.
[16] "Realtime twitter data access," Aug. 2014.
[17] D. Boyd and K. Crawford, "Six provocations for big data," SSRN Electronic Journal, 2011.
[18] A.
Abadie and J. Gardeazabal, "The economic costs of conflict: A case-control study for the basque country," Tech. Rep. w8478, National Bureau of Economic Research, Cambridge, MA, Sept. 2001.
[19] A. Abadie, A. Diamond, and J. Hainmueller, "Comparative politics and the synthetic control method," SSRN Scholarly Paper ID 1950298, Social Science Research Network, Rochester, NY, Feb. 2014.
[20] A. Abadie, A. Diamond, and J. Hainmueller, "Synthetic control methods for comparative case studies: Estimating the effect of california's tobacco control program," Journal of the American Statistical Association, vol. 105, pp. 493-505, June 2010.
[21] A. Abadie, A. Diamond, and J. Hainmueller, "Synth: An r package for synthetic control methods in comparative case studies," Journal of Statistical Software, vol. 42, pp. 1-17, June 2011.
[22] F. Calabrese, M. Colonna, P. Lovisolo, D. Parata, and C. Ratti, "Real-time urban monitoring using cell phones: A case study in rome," IEEE Transactions on Intelligent Transportation Systems, vol. 12, pp. 141-151, Mar. 2011.
[23] F. Calabrese, G. Di Lorenzo, L. Liu, and C. Ratti, "Estimating origin-destination flows using mobile phone location data," IEEE Pervasive Computing, vol. 10, pp. 36-44, Apr. 2011.
[24] M. C. González, C. A. Hidalgo, and A.-L. Barabási, "Understanding individual human mobility patterns," Nature, vol. 453, pp. 779-782, June 2008.
[25] P. Wang, T. Hunter, A. M. Bayen, K. Schechtner, and M. C. González, "Understanding road usage patterns in urban areas," Scientific Reports, vol. 2, Dec. 2012.
[26] S. Phithakkitnukoon, Z. Smoreda, and P. Olivier, "Socio-geography of human mobility: A study using longitudinal mobile phone data," PLoS ONE, vol. 7, p. e39253, June 2012.
[27] D. J. Hand and K. Yu, "Idiot's bayes: Not so stupid after all?," International Statistical Review / Revue Internationale de Statistique, vol. 69, p. 385, Dec. 2001.
[28] C. D. Manning, Introduction to information retrieval. New York: Cambridge University Press, 2008.
[29] M. Schläpfer, L. M. A. Bettencourt, S. Grauwin, M. Raschke, R. Claxton, Z. Smoreda, G. B. West, and C. Ratti, "The scaling of human interactions with city size," Journal of The Royal Society Interface, vol. 11, pp. 20130789-20130789, July 2014.
[30] J.-P. Onnela, J. Saramäki, J. Hyvönen, G. Szabó, D. Lazer, K. Kaski, J. Kertész, and A.-L. Barabási, "Structure and tie strengths in mobile communication networks," Proceedings of the National Academy of Sciences, vol. 104, pp. 7332-7336, May 2007.
[31] M. Granovetter, "The strength of weak ties," The American Journal of Sociology, vol. 78, pp. 1360-1380, May 1973.
[32] J.-P. Onnela, J. Saramäki, J. Hyvönen, G. Szabó, M. A. d. Menezes, K. Kaski, A.-L. Barabási, and J. Kertész, "Analysis of a large-scale weighted network of one-to-one human communication," New Journal of Physics, vol. 9, pp. 179-179, June 2007.
[33] "Another young man shot dead in qatif," Saudi Shia, Dec. 2012.
[34] H. al Khoei, "Deadly shootings in saudi arabia, but arab media look the other way," The Guardian, Nov. 2011.
[35] J.-P. Onnela, S. Arbesman, M. C. González, A.-L. Barabási, and N. A. Christakis, "Geographic constraints on social network groups," PLoS ONE, vol. 6, p. e16939, Apr. 2011.
[36] P. Wood, "How saudis are learning to protest," Mar. 2011.
[37] N. Eagle, A. Pentland, and D. Lazer, "Inferring friendship network structure by using mobile phone data," Proceedings of the National Academy of Sciences, vol. 106, pp. 15274-15278, Sept. 2009.
[38] F. Calabrese, Z. Smoreda, V. D. Blondel, and C. Ratti, "Interplay between telecommunications and face-to-face interactions: A study using mobile phone data," PLoS ONE, vol. 6, p. e20814, July 2011.
[39] Anonymous, "Interview with saudi lawyer," May 2013.
[40] C. Song, Z. Qu, N. Blumm, and A.-L. Barabási, "Limits of predictability in human mobility," Science, vol. 327, pp. 1018-1021, Feb. 2010.
[41] A.-M.
Staff, "Questions over death of protester in saudi arabia's eastern province - al-monitor: the pulse of the middle east," Jan. 2013.
[42] "Riyadh accused of tapping dissidents' phones," June 2014.
[43] "Saudi arabia: Malicious spyware app identified | human rights watch," June 2014.
[44] "Saudi government monitoring internet to stifle protests."
[45] "Saudi telecom sought US researcher's help in spying on mobile users (wired UK)."
[46] G. Tavares and A. Faisal, "Scaling-laws of human broadcast communication enable distinction between human, corporate and robot twitter users," PLoS ONE, vol. 8, p. e65774, July 2013.
[47] P. A. Grabowicz, J. J. Ramasco, E. Moro, J. M. Pujol, and V. M. Eguiluz, "Social features of online networks: The strength of intermediary ties in online social media," PLoS ONE, vol. 7, p. e29358, Jan. 2012.
[48] M. Granovetter, "The impact of social structure on economic outcomes," Journal of Economic Perspectives, vol. 19, pp. 33-50, Jan. 2005.
[49] M. Batty, "The size, scale, and shape of cities," Science, vol. 319, pp. 769-771, Feb. 2008.
[50] M. Fujita, P. R. Krugman, and A. J. Venables, The spatial economy: cities, regions and international trade, vol. 213. Wiley Online Library, 1999.
[51] M. Karsai, N. Perra, and A. Vespignani, "Time varying networks and the weakness of strong ties," Scientific Reports, vol. 4, Feb. 2014.

Colophon

THIS THESIS WAS TYPESET using LaTeX, originally developed by Leslie Lamport and based on Donald Knuth's TeX. The body text is set in 11 point Arno Pro, designed by Robert Slimbach in the style of book types from the Aldine Press in Venice, and issued by Adobe in 2007. A template, which can be used to format a PhD thesis with this look and feel, has been released under the permissive MIT (X11) license, and can be found online at github.com/suchow/ or from the author at suchow@post.harvard.edu.