"RIOTS AND SOCIABILITY:
A CASE STUDY OF HUMAN
INTERACTION NETWORKS IN QATIF, SAUDI ARABIA"
BY
MICHAEL ANGELO GRECO III
B.S. COMPUTER SCIENCE, MATHEMATICS, ART
UNIVERSITY OF WISCONSIN MADISON, 2006
SUBMITTED TO THE DEPARTMENT OF URBAN STUDIES AND PLANNING AND THE
ENGINEERING SYSTEMS DIVISION
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREES OF
MASTER IN CITY PLANNING
AND
MASTER OF SCIENCE IN TECHNOLOGY POLICY
AT THE
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
SEPTEMBER 2014
© 2014 MASSACHUSETTS INSTITUTE OF TECHNOLOGY. ALL RIGHTS RESERVED.
Signature of Author: Signature redacted
Department of Urban Studies and Planning
Engineering Systems Division
September 3, 2014
Certified by: Signature redacted
Carlo Ratti
Professor of the Practice, Department of Urban Studies and Planning
Thesis Supervisor
Accepted by: Signature redacted
Dennis Frenchman
Professor of Urban Design and Planning
Chair, MCP Committee, Department of Urban Studies and Planning
Accepted by: Signature redacted
Dava J. Newman
Professor of Aeronautics and Astronautics and Engineering Systems
Director, Technology and Policy Program
"Riots and Sociability: A Case Study of Human Interaction
Networks in Qatif, Saudi Arabia"
by
Michael Angelo Greco III
Submitted to the Department of Urban Studies and Planning and the Engineering
Systems Division in partial fulfillment of the requirements for the degrees of Master
in City Planning and Master of Science in Technology Policy
ABSTRACT
Since the onset of the Arab Spring in late 2010, waves of political activism have reverberated across much of the Arab world. A growing body of literature has emerged that
explores how new communications and social media technologies have contributed
to, and in certain cases instigated, various forms of collective action. However, little
research has examined the effect of these activities on communication patterns themselves. This thesis aims to investigate the reorganization of sociability under civil duress
at an aggregate, urban scale.
The study employs a novel approach to communications analysis, applying the Synthetic Control Method to estimate the causal effect of riots on different characteristics
of human interaction within Qatif, Saudi Arabia, after an exogenous shock triggered
a surge in public demonstrations. The analysis reveals a strong, statistically significant drop in total call volume, relative to other cities in Saudi Arabia. This is combined with a similarly strong and statistically significant drop in unique daily callers, demonstrating that not only were people making fewer calls, but fewer people were participating in the telecom network each day. Interestingly, daily phone activity is shown
to increase within the subnetwork of users identified to hold strong spatiotemporal
ties to the city, even though their total activity measures (which include connections
both internal and external to the subnetwork) remain constant. This suggests a shift
in callee preference for individuals who are more directly affected by urban unrest.
Lastly, information transmission tests are performed on Qatif's pre- and post-treatment
interaction networks. Initial research shows that, beyond a 26% diffusion threshold, information reaches more people faster through the post-treatment network. This provides some support for the hypothesis that communities under duress intelligently reorganize communications to increase dissemination speed and breadth; however, further
research will be required to refine these findings and demonstrate a causal link.
Thesis Supervisor: Carlo Ratti
Title: Professor of the Practice, Department of Urban Studies and Planning
Contents
1 INTRODUCTION
2 DATA AND PROCESSING
2.1 Call Detail Records
2.2 Tweets
2.3 City Selection and Data Aggregation
2.4 Data Limitations
3 METHODS
3.1 Synthetic Control Methods
4 ANALYSIS: CALL BEHAVIOR
5 ANALYSIS: INTER AND INTRACITY CALLING PATTERNS
5.1 Location Identification
5.2 Urban Call Counts
6 ANALYSIS: TWITTER ACTIVITY
6.1 Geotagged Activity
7 FUTURE DIRECTIONS
7.1 Location Estimation for Non-Geotagged Tweets
7.2 Communication Networks
7.3 Religiosity
8 DISCUSSION
APPENDICES
A APPENDIX: CALL BEHAVIOR
B APPENDIX: INTER AND INTRACITY CALLING PATTERNS
C APPENDIX: TWITTER ACTIVITY
REFERENCES
Listing of figures
1.0.1 Protest Images From Qatif Following Ahmad al-Matar's Death. Found at: http://khaleejsaihat.com/web3/showthread.php?t=129754
2.1.1 Geographic Distribution of Cell Towers in Saudi Arabia
2.1.2 Left: Service Type Histogram, Right: Service Detail Description Histogram
2.1.3 Phone Activity Timeline Over Study Period (Top), Daily Phone Activity Timeline of Saudi Arabia, Dec. 12th (Bottom)
2.2.1 Tweet Timeline Over Study Period (Top), Daily Tweet Timeline of Saudi Arabia, Dec. 12th (Bottom)
4.0.1 Daily call distributions for Dec. 21st and Dec. 28th for All KSA governorates (Left), and Qatif (Right)
4.0.2 Trends in total daily network activity, Qatif vs. Other Saudi Governorates, Dec. 20th - Jan. 3rd (Top), and Trends in Average Daily Call Duration, Qatif vs. Other Saudi Governorates, Dec. 20th - Jan. 3rd (Bottom). "Treatment" indicated by dashed pink line
4.0.3 Trends in Total Network Activity, Qatif and Synthetic Qatif (Left), and Total Network Activity Gap Between Qatif and Synthetic Qatif (Right)
4.0.4 Trends in Average Daily Call Duration, Qatif and Synthetic Qatif (Left), and Average Daily Call Duration Gap Between Qatif and Synthetic Qatif (Right)
4.0.5 Trends in Unique Callers, Qatif and Synthetic Qatif (Left), and Gap in Unique Callers, Qatif and Synthetic Qatif (Right)
4.0.6 Synthetic Control Placebo Tests with Sabya. Total Daily Network Activity (Left), Average Call Duration (Middle), and Daily Unique Callers (Right)
4.0.7 Across-Unit Placebo Tests: Total Activity (all, 500x or less, 100x or less, 50x or less)
4.0.8 Across-Unit Placebo Tests: Daily Unique Callers (all, 500x or less, 100x or less, 50x or less)
4.0.9 In-Time Placebo Tests with Qatif. Total Daily Network Activity (Left) and Average Call Duration (Right)
5.2.1 Trends in standardized intra (top), inter-in (middle), and inter-out (bottom) call volumes daily network activity, Qatif (solid) vs. other Saudi governorates (dashed). Dec. 20th - Jan. 3rd. Treatment indicated by dashed pink line
5.2.2 Trends in Intra Call Volumes
5.2.3 Trends in Inter-In Call Volumes
5.2.4 Trends in Inter-Out Call Volumes
6.1.1 Trends in standardized daily Tweet volume (top), Tweet length (middle), and Tweets per user (bottom), Qatif (solid) vs. other Saudi governorates (dashed). Dec. 20th - Jan. 3rd. Treatment indicated by dashed pink line
6.1.2 Trends in Total Tweet Activity, Qatif and Synthetic Qatif
6.1.3 Trends in Average Tweet Length, Qatif and Synthetic Qatif
6.1.4 Trends in Tweets Per User, Qatif and Synthetic Qatif
7.1.1 Trends in Total Tweet Activity, Qatif and Synthetic Qatif
7.1.2 Trends in Average Tweet Length, Qatif and Synthetic Qatif
7.1.3 Trends in Tweets Per User, Qatif and Synthetic Qatif
7.2.1 Total Degree Distribution (Left), and Edge Weight Distribution (Right) of the Complete Reciprocated Network, KSA
7.2.2 Fraction of Infected Nodes as Function of Time (Top), Number of infected Nodes at each instance of t (Middle), and Distributions of Edge Weights Responsible for Infection
7.3.1 Daily Network Activity Distributions from Jeddah (Western Saudi Arabia), Riyadh (Central Saudi Arabia), and the Eastern Region
7.3.2 Trends in Daily Prayer Time Disruption, All KSA Cities. Qatif drawn in pink.
A.0.1 Total Network Activity, Qatif and Synthetic Qatif (3 Weeks)
A.0.2 Number of Unique Daily Callers, Qatif and Synthetic Qatif (3 Weeks)
B.0.1 Intra Call Activity Synthetic Control Placebo Test with Samteh (Left), In-time Intra Call Activity Placebo with Qatif (Right)
B.0.2 Daily Local Call Activity
B.0.3 Across-Unit Placebo Tests: Intra Call Activity (all, 20x or less, 10x or less, 5x or less)
B.0.4 Intra Call Activity, Qatif and Synthetic Qatif (3 Weeks)
C.0.1 Tweets Per User Synthetic Control Placebo Test with Ahad Rufaydah (Left), In-time Tweets Per User Placebo with Qatif (Right)
C.0.2 Across-Unit Placebo Tests: Tweets Per User (all, 50x or less, 20x or less, 5x or less)
Introduction
This paper seeks to explore how social unrest affects broad-scale sociability in a city
or region. Since the Arab Spring in the early 2010s, there have been waves of political activism across much of the Arab world, including the Kingdom of Saudi Arabia.
A growing body of literature has developed to investigate how new communications
and social media technologies have contributed to, and in certain cases instigated, various forms of collective action. A few studies have considered the impact of mobile
phone access in facilitating collective action, though most have narrowed in on the effects of a specific emerging technology, like Twitter or Facebook. Furthermore, these
studies follow a wide range of methods that can be broadly grouped into the following
three categories: qualitative approaches that relied on survey data and expert interviews; quantitative approaches that characterized the nature of communication patterns
through these new media outlets; or more advanced analytical methods that sought to
isolate the role various communications media played during social unrest. Examples
of these types of research will be discussed in brief over the remainder of this section.
Expert interviews are a popular method for subjectively exploring the impact of social connectivity in urban environments. Tufekci et al. examined the protests in Egypt
by surveying participants in Tahrir Square. They argued that respondents who used
social media were much more likely to attend the demonstrations on the first day.
They further noted that approximately half of those questioned spread media from the
protests online, and that in many cases communication through social media, mobile
phones and face-to-face conversations superseded the role of traditional news media
during the protests. Thus, the authors concluded that social technologies were critical in diffusing information related to, and encouraging involvement in the activist
intervention in Tahrir Square [1]. Similarly, Breuer et al. used a qualitative approach to
characterize the role of social media in the Tunisian Revolution. Using a combination
of expert interviews with protest participants and preference survey data from Tunisian
internet users, the authors claimed that new social media forms helped overcome a
government-enforced media blackout by enabling activists to broadcast information
on the movement. Further, they found these collaborative technologies facilitated the
emergence of partnerships between activist groups, and encouraged a kind of 'emotional mobilization' through the depiction of the regime's atrocities in the uncensored
content [2].
Qualitative research has been a popular method for identifying a symbiosis between online communication technologies and social unrest. Another body of research pushed
the relationship between social media and collective action further by quantifying aspects of user behavior in response to specific conflict events.
Lotan et al. compared
and contrasted Tweet broadcasting and news consumption in Tunisia and Egypt [3].
At a high level, this study reiterated the importance of social media as a communications medium during periods of cultural instability. Furthermore, they built on the
existing state of the literature by highlighting important differences and similarities in
information flow across cultures [3]. They found that tweets produced by both independent activists and mainstream news outlets produced larger responses in Egypt
than in Tunisia. On the other hand, Tunisians responded more to news disseminated by blogger
sites, while both countries showed a heavy reliance on information spread by journalists. This research suggests a link between social media platforms and a city's cultural
fabric, even though pinpointing this exact relationship was beyond the scope of the paper.
The papers by Szell et al. and Bagrow et al. examined how the social temperature
of a city or region manifested itself in communication patterns. Szell et al. studied
the nature of collective reaction to major events like presidential elections, sports tournaments, and weather abnormalities on Twitter. They found a negative correlation
between users' excitation and message length [4]. Bagrow et al. also explored how
emotions impacted communication patterns. They used geotagged mobile phone call
records gathered from cities in emergency states. They found that spikes in communications volume were spatially and temporally bounded, and asserted that affected
individuals "will only invoke the social network to propagate information under the
most extreme circumstances" [5]. On the other hand, even though Szell et al.'s and
Bagrow et al.'s research suggested that the importance of communication media increased during periods of social excitation, not all elements of social media played a
strong role in these events. Aday et al. investigated the role of specific features of social
media in intra-country collective and regional diffusion. They limited their analysis to
exploring how bit.ly links (produced by a URL shortening service popular on Twitter) were used
to share and consume relevant protest data [6]. They found no evidence that the service
played a significant role in Tunisia, Egypt, Bahrain, or Libya.
A final subset of papers on social media and sociability posit that a causal relationship
exists between communication technology market penetration and the probability of
social action. Pierskalla et al. explored the effects of mobile phone access on collective action in Africa. The authors employed UCDP conflict data from 1989 to 2010
alongside mobile phone coverage data from the GSMA. Using a number of
binary dependent variable models, they showed the availability of mobile phone coverage significantly and substantially increased the likelihood of violent conflict in the
region [7]. They further concluded that the adoption of communication technologies
produced intrinsic changes in a city or region's communications patterns and willingness to collaborate. They also noted that the ability to assemble did not automatically
yield a social good; in their case it facilitated violent collective action and increased
overall instability in the study region.
Most of the research on communication technologies and collective action has focused on the utilization of social media and mobile phones. Researchers tend to agree
that these new technologies facilitate collective action, although there are disputes over
the scale of their impact. Researchers also agree that engagement attributes measured
through these technologies change to reflect states of heightened tension. However, few
studies have addressed the broader social implications of these communication devices.
This analysis builds on existing studies by estimating the causal effect of civil unrest
on various characteristics of the human interaction networks as expressed through Call
Detail Records (CDRs), and social media as expressed through Twitter messages. Using Synthetic Control Methods, differences in call and tweet behavior are calculated
between a region experiencing riots (Qatif, Saudi Arabia, the treated unit), and regions that do not experience riots (other cities in Saudi Arabia, the control units).
Surprisingly, the results show that total daily network activity across the city of study
significantly decreases with treatment, while call activity within the subset of individuals who hold strong spatiotemporal ties to the region significantly increases. The effects
on average call duration, and inter-city call volumes remain inconclusive. When examining tweets, the number of Tweets per user, per day drastically increases at the
symbolic climax of the demonstrations, but placebo tests indicate that this finding is
not robust. Lastly, preliminary information transmission tests performed on city-scale
interaction networks show that information reaches more people faster during the post-treatment
period; however, further research will be required to refine these findings and
demonstrate a causal link.
CONTEXT: THE DEATH OF AHMAD AL-MATAR
Large-scale public demonstrations have plagued Saudi Arabia since February 2011. The
protests have been mainly concentrated in the oil-rich Eastern Province. The Eastern
Province, and the city of Qatif in particular, is home to the largest proportion of Shiites,
members of the minority denomination of Islam in Saudi Arabia. Sunni-Shia relations
have a history of tension in the nation, and, following the early events of the Arab
Spring in late 2010, sparks of unrest began to reignite. A movement began to coalesce
around the early demonstrations of 2011, with protesters calling for the release of political
prisoners, freedom of expression and assembly, and an end to widespread discrimination against Shiites. In February, after seven young Shiites were killed, the country
experienced its largest collective uprising since 1979 [8]. Protests then occurred on an
almost regular basis before cooling off through the spring and early summer of 2012.
This period of relative tranquility was broken on July 8th with the shooting and arrest of Nimr al-Nimr [8], a Shia Sheikh and outspoken leader of the movement, which
re-escalated tensions and sparked a new wave of demonstrations [9]. Large protests
engulfed the city of Qatif and quickly became violent, leading to the deaths of two more
protestors [10]. Security forces began to crack down on dissidents, pursuing 23 men
whom the government claimed were wanted for inciting unrest in Qatif. Raids in late
September brought about the deaths or injuries of several of these men [11].
As time passed, younger activists in the region began to incorporate protest tactics employed by Bahraini youth [10], which included the nightly burning of tires on the roads
around the city. It was at one of these demonstrations, near midnight on December
27th, that teenage Shia activist Ahmad al-Matar was shot dead by security forces. In
spite of little media coverage, reports indicate the protest was held to demand the release of political prisoners, and several other protesters were injured and/or arrested
[12]. This event set in motion a wave of riots and demonstrations throughout the city
that culminated in a funeral procession on December 31st with an estimated crowd of
50,000 [10]. Activists also took to Twitter, starting a campaign that used the Arabic
hashtag "We All Are Qatif" [13].
Since protests occur on a regular basis in Qatif, the study has been framed around
al-Matar's death as an exogenous 'treatment' applied to the social fabric of the city.
Figure 1.0.1: Protest Images From Qatif Following Ahmad al-Matar's Death. Found at:
http://khaleejsaihat.com/web3/showthread.php?t=129754
2
Data and Processing
2.1 CALL DETAIL RECORDS
CALL DETAIL RECORDS (CDRs) are the primary dataset of interest in this study. The
records cover 12/03/12 to 1/3/13. They were obtained from a Saudi telecommunications agency that offers mobile services to the Kingdom. Cellular activity is one of the
most powerful real-time sensing mechanisms currently available to us; the ubiquity of
digital devices allows us to capture extremely high-resolution traces of humanity across a
variety of dimensions. Saudi Arabia's mobile phone penetration is above 198%, an astonishing figure suggesting that many across the Kingdom own more than one mobile
device.
The data cover the entire country of Saudi Arabia, with over 100 million daily network
connections to over 10 thousand unique cell towers, with approximately 18 million
unique phones. The CDR dataset consists of anonymous location measurements generated each time a device connects to the cellular network. Each anonymized record
holds a precise time and duration measure for the connection, the caller's location (by
cell tower), the 'service type', and 'service detail description.' The service type consists
of an identifier that logs the type of origin and destination telephones. The 'service detail description' describes the record's type of communication, e.g. voice, data, SMS,
etc. There are 486 unique codes; however, only 135 appear in the dataset. SMS activities are among those excluded. Internet requests were found to hold broken spatial
identifiers and were consequently excluded. Thus, for the purposes of this study, all
but voice activity have been culled from the dataset.
Figure 2.1.1: Geographic Distribution of Cell Towers in Saudi Arabia
To construct the composite data table, all tower pings were summed per city, per day
to arrive at a total activity measure. Average daily call duration, the number of unique
callers, and the number of calls per individual were constructed in a similar fashion.
Lastly, these results were combined with population statistics obtained from KSA's
Ministry of Economy & Planning, Central Department of Statistics & Information,
Department of Analysis & Reports. The final dataset includes percentages of men and
women, Saudis and non-Saudis, for each governorate from the year 2010.
Figure 2.1.2: Left: Service Type Histogram, Right: Service Detail Description Histogram
The top panel of Figure 2.1.3 presents a snapshot of daily activity across the nation. To
capture activity at a more granular scale, a daily histogram of call activity was recorded
at 15-minute intervals over the course of each day (shown in the bottom panel). Each
day follows a very stable pattern of low early-morning activity, a mid-day peak around
1:00pm, a lull in calls until approximately 3:00pm, and a daily maximum between
6:00 and 7:00pm. The overall stability of these plots suggests that it may be possible to
detect a city-wide disruption.
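The aggregation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (caller identifier, city, start time, duration in seconds), the sample values, and the function names are hypothetical and are not taken from the thesis dataset or its processing code.

```python
from collections import defaultdict
from datetime import date, datetime

# Hypothetical voice records: (caller_id, city, start_time, duration_seconds).
records = [
    ("a", "Qatif", "2012-12-21 09:05:00", 60),
    ("a", "Qatif", "2012-12-21 09:12:00", 120),
    ("b", "Qatif", "2012-12-21 18:40:00", 30),
    ("c", "Riyadh", "2012-12-21 13:01:00", 90),
]

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def aggregate_daily(records):
    """Per (city, day): total calls, mean duration, unique callers, calls per caller."""
    groups = defaultdict(list)
    for caller, city, start, duration in records:
        day = datetime.strptime(start, TIME_FORMAT).date()
        groups[(city, day)].append((caller, duration))
    table = {}
    for key, calls in groups.items():
        callers = {caller for caller, _ in calls}
        table[key] = {
            "total_activity": len(calls),
            "avg_duration": sum(d for _, d in calls) / len(calls),
            "unique_callers": len(callers),
            "calls_per_caller": len(calls) / len(callers),
        }
    return table


def quarter_hour_histogram(records, city, day):
    """Counts of call start times in 96 fifteen-minute bins for one city and day."""
    bins = [0] * 96
    for _, rec_city, start, _ in records:
        ts = datetime.strptime(start, TIME_FORMAT)
        if rec_city == city and ts.date() == day:
            bins[(ts.hour * 60 + ts.minute) // 15] += 1
    return bins


daily = aggregate_daily(records)
qatif_day = daily[("Qatif", date(2012, 12, 21))]
hist = quarter_hour_histogram(records, "Qatif", date(2012, 12, 21))
```

Keying the table on (city, day) keeps each outcome measure in the unit-by-period shape that the later comparison between cities relies on.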
2.2 TWEETS
The second dataset consists of messages posted on Twitter from Saudi Arabia over the
study period (12/20/12 to 1/3/13). Twitter is a social networking service that allows
users to share and read 'Tweets,' which they define as expressions of a moment or idea
[14]. Tweets are limited to 140 characters and can be posted through a website interface, text message, or mobile app. Since its founding in 2006, Twitter has become
increasingly popular across the globe. While its usage varies from country to country,
a 2012 survey of European and Middle East markets conducted by GlobalWebIndex
found that 51% of Saudi Arabian internet users are active on Twitter, the highest penetration rate of any locale in their report [15]. Saudi Arabia was also found to hold the
fastest rate of growth over much of 2012.
Figure 2.1.3: Phone Activity Timeline Over Study Period (Top), Daily Phone Activity Timeline of Saudi
Arabia, Dec. 12th (Bottom)
The Twitter dataset used in this study is comprised of geotagged Tweets from Twitter's Decahose service, which provides a "statistically valid sample of at least 10% of all
Tweets, selected at random" [16]. Each record contains a user identifier, message (up
to 140 characters long), timestamp, and location (represented as a pair of latitude and
longitude coordinates). These messages were posted by users who chose to include
additional locational metadata with each Tweet. Roughly 0.001% of the total Tweet
stream is geotagged. For the Saudi dataset (tweets corresponding to the cities under
comparison), this amounts to about 160,000 tweets from 62,000 unique users with an
average message length of 70.5 characters. While this dataset is meager in comparison
to CDRs, a method for estimating locations for non-geotagged Tweets is presented in
Chapter 7.
Figure 2.2.1: Tweet Timeline Over Study Period (Top), Daily Tweet Timeline of Saudi Arabia, Dec. 12th
(Bottom)
Unlike the phone activity shown above, daily tweet distributions shown in Figure 2.2.1
are significantly noisier and don't appear to follow an obvious pattern. It may be harder
to characterize daily patterns and detect changes introduced by an exogenous shock.
2.3 CITY SELECTION AND DATA AGGREGATION
To isolate the treatment effects on Qatif, additional Saudi cities were used as units of
comparison. All cities that held a population of over 100,000 were selected, amounting
to a total of 40 urban governorates (including Qatif). A city's collection of cell towers
was identified by intersecting all towers with its geographic boundaries. The call records
were then clustered by tower ID. The geotagged tweets were simply aggregated by city
boundaries.
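As a rough illustration of the tower-to-city assignment, the sketch below tests each tower against simple polygon boundaries with a ray-casting check and then counts calls per city by tower ID. The coordinates, city names, tower IDs, and function names are all hypothetical, and the thesis does not state which tool performed the intersection; in practice a GIS library would normally handle real boundary geometries.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_towers(towers, city_boundaries):
    """Map each tower ID to the first city whose boundary contains it."""
    tower_city = {}
    for tower_id, (lon, lat) in towers.items():
        for city, polygon in city_boundaries.items():
            if point_in_polygon(lon, lat, polygon):
                tower_city[tower_id] = city
                break
    return tower_city


# Hypothetical rectangular boundaries and tower coordinates.
city_boundaries = {
    "CityA": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "CityB": [(3.0, 0.0), (5.0, 0.0), (5.0, 2.0), (3.0, 2.0)],
}
towers = {"t1": (1.0, 1.0), "t2": (4.0, 0.5), "t3": (10.0, 10.0)}
tower_city = assign_towers(towers, city_boundaries)

# Cluster call records by tower ID; towers outside every boundary are dropped.
call_tower_ids = ["t1", "t1", "t2", "t3"]
calls_per_city = {}
for tower_id in call_tower_ids:
    city = tower_city.get(tower_id)
    if city is not None:
        calls_per_city[city] = calls_per_city.get(city, 0) + 1
```

Resolving towers to cities once, and then joining records on tower ID, avoids repeating the geometric test for every call record.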
2.4 DATA LIMITATIONS
After exploring the basic properties of the phone call and tweet datasets it's worth underscoring their limitations.
Both are susceptible to some degree of sampling bias.
Regarding the phone records, the data lack explicit market share values per city. While
this could be calculated indirectly through demographic indicators, it remains
difficult to know how representative the sample is of each location. In spite of concerns
related to possible sampling biases of CDRs [17], they remain one of the most comprehensive data sources available in representing large-scale human interaction. The
geotagged tweets, on the other hand, capture high spatial resolution, but the population coverage is not nearly as high as the CDRs. Data from Twitter are also liable to
other demographic biases, as sampling individuals who participate in online media is
inherently biased towards groups who have access to the internet. Lastly (and this is
true of both Tweets and CDRs), a user may not be tied to a single individual, and a
single individual may not be tied to a single user. It's entirely possible for one person
to hold multiple accounts (e.g. an individual owning a phone for business and personal use), as it's possible for a group to communicate through one account (e.g. a
company utilizing bots for automated calling or Tweeting). This should not be detrimental to this study due to the level of aggregation in much of the analysis. However,
care has been taken to eliminate this bias in specific instances which examine narrower
subpopulations.
3
Methods
3.1 SYNTHETIC CONTROL METHODS
Synthetic Control Methods (SCM) are used as the primary methodological tool in this
study. The statistical technique was developed by Abadie et al. as a means to investigate causal inference in comparative case studies with aggregate data [18]. SCM was
primarily developed as a means to assess the impact of policy interventions, or other
'treatments', that are applied at an aggregate scale (e.g., over a country, region, or city) to a small number of units.
The traditional approach to comparative case studies of this nature is to use a control
group's outcome to approximate the outcome that would have been observed for the
treatment group in the absence of treatment. The choice of control units is typically at
the researcher's discretion, which has aroused questions over whether or not the control can be interpreted as a plausible counterfactual. It is also difficult to find a single
untreated unit that appropriately approximates the unit that has received treatment.
SCM overcomes this by implementing a data-driven selection process for the control
group, offering a much more empirically-defined means of inference. SCM incorporates a weighted combination of units to better approximate the unit that has been
exposed to the treatment [19].
The method was first used to examine the economic effects of conflict in the Basque
Country, where the authors found that after an outbreak of terrorism in the 1970s,
the region's per capita GDP declined about 10 percent [18]. SCM was then applied to
California's Proposition 99, a cigarette tax enacted in 1988. The authors estimated that
in 2000 the annual per-capita cigarette sales were roughly 26 packs lower than what
they would have been without the tax [20]. This study represents the first time this
method has been applied to data on a daily timescale.
In this study the synthetic control approach will be employed to select a combination of
urban governorates (a special Saudi designation for cities at the second level of regional
administration within the country) to construct a better comparison for the governorate
exposed to the treatment than any single governorate alone. The potential controls
were chosen from a list of all Saudi governorates that had a population greater than
100,000 as of the official 2010 census.
Qatif is the treated unit, as the riots were concentrated there. Other cities in the Eastern
Province may have experienced heightened unrest during the treatment period as well.
With this in mind, these cities were included in the donor pool, but the synthetic control method did not make significant use of them in its construction of synthetic Qatif.
This permits the assumption of no interference between units, that is, no violations of the stable
unit treatment value assumption. Additionally, it is assumed that the treatment has no
effect on the outcome variables before the implementation period. However, this may
be a strong assumption since, as stated previously, the Eastern Province has been experiencing unrest since February 2011. Following Abadie et al. [21], the model works out as follows:
For units $i = 1, \ldots, J+1$ and time periods $t = 1, \ldots, T$, let:
* $T_0$ be the number of pre-treatment periods with $1 \le T_0 < T$
* $Y_{it}^N$ be the dependent variable for unit $i$ at time $t$ in the absence of treatment
* $Y_{it}^I$ be the dependent variable for unit $i$ at time $t$ if unit $i$ is exposed to treatment in period $T_0 + 1$ to $T$.
Only the first city (Qatif), $i = 1$, is exposed to treatment after period $T_0$, thus:
$$D_{it} = \begin{cases} 1 & \text{if } i = 1 \text{ and } t > T_0 \\ 0 & \text{otherwise.} \end{cases}$$
The observed outcome for unit $i$ at time $t$ is $Y_{it} = Y_{it}^N + \alpha_{it} D_{it}$. The desired estimate is:
$$\alpha_{1t} = Y_{1t}^I - Y_{1t}^N = Y_{1t} - Y_{1t}^N \quad \text{for } t > T_0.$$
$Y_{1t}$ can be observed, so to estimate $\alpha_{1t}$ an estimate of $Y_{1t}^N$ is required, which can be given by the factor model:
$$Y_{it}^N = \delta_t + \theta_t Z_i + \lambda_t \mu_i + \varepsilon_{it}$$
where $\delta_t$ represents the unobserved common time-dependent factor; $\theta_t$ is a vector of
unknown parameters; $Z_i$ is a vector of observed covariates not affected by the treatment; $\lambda_t$ is a vector of unobserved common factors;
$\mu_i$ is a vector of unobserved covariates, and $\varepsilon_{it}$ are error terms representing unobserved transitory shocks. The estimate of $\alpha_{1t}$ will
be unbiased if a $(J \times 1)$ vector of weights $W = (w_2, \ldots, w_{J+1})'$ is chosen such that
$w_j \ge 0$ $(j = 2, \ldots, J+1)$ and $w_2 + \cdots + w_{J+1} = 1$, where each particular value of the vector
$W$ represents a potential synthetic control. Suppose that there are $(w_2^*, \ldots, w_{J+1}^*)$ such that
$$\sum_{j=2}^{J+1} w_j^* Y_{j1} = Y_{11}, \quad \ldots, \quad \sum_{j=2}^{J+1} w_j^* Y_{jT_0} = Y_{1T_0}, \quad \text{and} \quad \sum_{j=2}^{J+1} w_j^* Z_j = Z_1.$$
The synthetic control units are selected such that this equation can hold approximately
given an appropriate number of pre-treatment time periods.
Let $J$ be the number of available control units and $W = (w_2, \ldots, w_{J+1})'$ be a $(J \times 1)$ vector of nonnegative weights which sum to 1. The scalar $w_j$ $(j = 2, \ldots, J+1)$ is the weight of region $j$ in synthetic Qatif. Let $X_1$ be a $(K \times 1)$ vector of pre-treatment characteristics for the treated unit Qatif. Let $X_0$ be a $(K \times J)$ matrix which contains the values of the same variables for the $J$ possible control governorates. A vector of weights $W^*$ is chosen to minimize
$$\lVert X_1 - X_0 W \rVert_V = \sqrt{(X_1 - X_0 W)' V (X_1 - X_0 W)}$$
where $w_j \ge 0$ $(j = 2, \ldots, J+1)$ and $w_2 + \ldots + w_{J+1} = 1$. $V$ is a diagonal $(K \times K)$ matrix that assigns weights to linear combinations of the variables in $X_0$ and $X_1$ to
minimize the mean square prediction error (MSPE) of the synthetic control estimator.
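The weight selection step can be illustrated with a minimal sketch, assuming a fixed diagonal V. The function names (`project_to_simplex`, `v_loss`, `synthetic_weights`) and the projected-gradient solver are illustrative choices and not the routine used in this study; the outer search over V that minimizes the pre-treatment MSPE is omitted. The sketch minimizes the squared distance, which has the same minimizer as the norm itself.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_j >= 0, sum_j w_j = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        candidate = (cumulative - 1.0) / k
        if u_k - candidate > 0:
            theta = candidate  # last k satisfying the condition sets the shift
    return [max(x - theta, 0.0) for x in v]


def v_loss(x1, X0, w, v):
    """(X1 - X0 W)' V (X1 - X0 W), with the diagonal of V given as the list v."""
    total = 0.0
    for k in range(len(x1)):
        resid = x1[k] - sum(X0[k][j] * w[j] for j in range(len(w)))
        total += v[k] * resid * resid
    return total


def synthetic_weights(x1, X0, v, n_iter=5000):
    """Projected gradient descent over nonnegative weights that sum to 1.

    x1: K predictor values for the treated unit; X0: K rows by J donor columns.
    """
    K, J = len(x1), len(X0[0])
    w = [1.0 / J] * J  # start from equal weights
    # 2 * trace(X0' V X0) is an upper bound on the gradient's Lipschitz constant.
    lipschitz = 2.0 * sum(v[k] * X0[k][j] ** 2 for k in range(K) for j in range(J))
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        resid = [x1[k] - sum(X0[k][j] * w[j] for j in range(J)) for k in range(K)]
        grad = [-2.0 * sum(v[k] * X0[k][j] * resid[k] for k in range(K))
                for j in range(J)]
        w = project_to_simplex([w[j] - step * grad[j] for j in range(J)])
    return w
```

Given the resulting weights, the synthetic outcome on any day is the weighted sum of the donor outcomes, and the estimated effect is the gap between Qatif and that sum.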
4
Analysis: Call Behavior
As an introductory sanity check, daily call distributions were computed for Qatif by
itself, and all other governorates in aggregate, for the days of December 21st, the first
pre-treatment Friday in the dataset, and December 28th, the first post-treatment Friday in the dataset. The first figure depicts the daily distributions for all governorates
except Qatif and shows very little variation over the course of the day (Figure 4.0.1),
while the second figure demonstrates a clear decrease in combined activity between
9am and 9pm (Figure 2).
Figure 4.0.2 plots the trends in total network activity (top) and call duration (bottom) in Qatif and the rest of the governorates in the KSA. There exists some similarity between the plots during the pre-treatment period, with some divergence from the treatment day onward. However, it remains difficult to judge how closely the aggregate group of cities compares with Qatif. The plot of average call duration is even harder to compare visually, but it is worth noting the pronounced jump in Qatif's average call duration at the beginning of the post-treatment period, before it again falls below the country-wide average.
Figure 4.0.1: Daily call distributions for Dec. 21st and Dec. 28th for All KSA governorates (Left), and Qatif (Right)
Synthetic Qatif is constructed as a weighted average of potential control cities with
weights chosen so that the result best reproduces the values of a set of predictors of
sociability before the riots began on December 28th. The initial variables of interest
are: total daily network activity, daily unique users, and the average duration of daily
calls.
Using the synthetic control method described previously, two distinct synthetic Qatifs are constructed such that they mirror the selected predictors. The treatment
effects are then estimated as the differences between Qatif and its synthetic versions in
the days following.
The predictors of total daily network activity, daily unique callers, and average call
duration are the same. They include:
1. total network activity in the week before treatment, or daily unique callers in the week before treatment
2. average duration of calls in the week before treatment
3. percent of males as obtained from the Department of Analysis & Reports in 2010
4. percent of Saudis, also obtained from the Department of Analysis & Reports in 2010
Figure 4.0.2: Trends in total daily network activity, Qatif vs. Other Saudi Governorates, Dec. 20th - Jan. 3rd (Top), and Trends in Average Daily Call Duration, Qatif vs. Other Saudi Governorates, Dec. 20th - Jan. 3rd (Bottom). "Treatment" indicated by dashed pink line
Table 4.0.1 displays the balance between Qatif and synthetic Qatif for the outcome variable total network activity. As explained previously, the value V was chosen to minimize the MSPE during the pre-treatment week; the values associated with total network activity, average call duration, and percent men were the largest (see Appendix A). Table 4.0.2 displays the weights of each control city in synthetic Qatif. Synthetic Qatif is largely composed of a combination of Sabya and Abu Arish. Both Sabya and Abu Arish are less developed than Qatif. Sabya, like Qatif, is known for having a higher concentration of Shiites than most cities in Saudi Arabia.
Table 4.0.1: Total Daily Activity Predictor Means
Variables                   Treated    Synthetic   Sample Mean
avgDuration (days 1-7)      122.141    122.208     137.957
totalActivity (days 1-7)     12.400     12.402      13.059
percentMen                    0.545      0.570       0.600
percentSaudi                  0.870      0.855       0.822
Table 4.0.2 Total Daily Network Activity Governorate Weights in Synthetic Qatif
Governorate
Weight
Al Bahah
0.010
Ar ar
0.007
Sakaka
Yanbu Al Bahar
Buraydah
0.007
Al Kharj
0.005
Khamis Mushayt
Ha il
0.004
Sabya
Al Qunfidhah
0.435
0.080
Najran
0.002
Tabuk
Ad Dammam
Governorate
Ar Rass
Unayzah
Weight
0.000
0.007
Al Riyadh
Ad Duwadimi
0.000
Al Majmaah
Al Quwayiyah
0.009
Abha
Ahad Rufaydah
Al Majardah
0.003
0.006
0.006
0.001
Bishah
Al Qhazaiah
Abu Arish
0.003
Ahad Al Masarihah
Al Ahsa
Al Jubayl
Al Khubar
Haft Al Babin
0.003
Jizan
0.004
Samtah
0.008
0.006
0.008
0.004
0.002
0.006
Al Taif
Al Lith
Al Quryyat
0.013
Jiddah
0.008
Medina
0.004
Mecca
0.004
Muhayil
0.003
0.004
0.004
0.003
0.010
0.000
0.005
0.003
0.311
0.008
Figure 4.0.3: Trends in Total Network Activity, Qatif and Synthetic Qatif (Left), and Total Network Activity Gap Between Qatif and Synthetic Qatif (Right)
The first panel of Figure 4.0.3 shows the total network activity in Qatif relative to synthetic Qatif in the pre and post treatment periods. Total network activity of synthetic Qatif closely tracks real Qatif in the week before the riots. This data, along with the relatively good balance in Table 4.0.1, suggests that synthetic Qatif provides a good approximation of the total network activity in Qatif before treatment. To assess if this approximation holds throughout the pre-treatment period, the graph was extended an additional week prior to treatment (see Figure B.0.4 in the Appendix). After Ahmad al-Matar was shot at midnight of December 27th (the end of day 7, beginning of day 8), the two lines begin to diverge substantially, with total network activity decreasing for actual Qatif. The second panel of Figure 4.0.3 plots the daily gaps in total network activity between Qatif and its synthetic counterpart and suggests that the treatment had a large effect on total activity for the following few days but, as expected, the effect is not sustained.
Table 4.0.3: Daily Average Call Duration Predictor Means
Variables                   Treated    Synthetic   Sample Mean
avgDuration (days 1-7)      122.141    122.153     137.957
totalActivity (days 1-7)     12.400     12.402      13.059
percentMen                    0.545      0.551       0.600
percentSaudi                  0.870      0.839       0.822
Table 4.0.4: Daily Average Call Duration Governorate Weights in Synthetic Qatif
Governorate
Weight
Weight
0.005
0.006
0.001
0.008
0.006
0.006
Al Bahah
0.007
Governorate
Ar Rass
Ar ar
0.008
Unayzah
Sakaka
0.007
Yanbu Al Bahar
0.003
Al Riyadh
Ad Duwadimi
Buraydah
Al Kharj
0.005
0.005
Al Majmaah
Al Quwayiyah
Khamis Mushayt
0.004
Ha il
Sabya
0.003
Al Qunfidhah
0.009
Abha
Ahad Rufaydah
Al Majardah
Bishah
0.006
Najran
0.001
Al Qhazaiah
0.007
Tabuk
0.001
0.826
Ad Dammam
0.003
Al Ahsa
Al Jubayl
Al Khubar
Haft Al Babin
0.005
Abu Arish
Ahad Al Masarihah
Jizan
0.003
Samtah
0.003
0.004
Al Quryyat
0.008
Al Taif
Al Lith
Jiddah
Medina
0.005
Mecca
0.004
Muhayil
0.002
0.000
0.007
0.003
0.006
0.001
0.005
0.005
0.005
0.004
0.002
Table 4.0.3 displays the balance between Qatif and its synthetic control for the outcome variable of average call duration. As expected under this scenario, the diagonal element of V associated with pre-treatment average call duration is the largest; total network activity was not a predictor at all (see Table A.0.2 in the Appendix). Table 4.0.4 displays the weights of each control city in synthetic Qatif for average call duration. The weights indicate that in this case, synthetic Qatif is constructed mainly from Abu Arish, although others in the donor pool played a small part.
Figure 4.0.4: Trends in Average Daily Call Duration, Qatif and Synthetic Qatif (Left), and Average Daily
Call Duration Gap Between Qatif and Synthetic Qatif (Right)
The first panel of Figure 4.0.4 shows average call duration for Qatif and its synthetic control. The two lines remain closely aligned before and after treatment. This outcome
suggests that the current predictors do not capture a causal effect of
treatment on average call duration.
Figure 4.0.5 shows the total number of unique callers per day for Qatif and synthetic Qatif. The plots are similar to the trends in daily network activity, in that both
the synthetic and real plots look consistent leading up to the treatment event, after
which they deviate considerably.
Table 4.0.5 displays the balance between Qatif and synthetic Qatif for Daily Unique
Callers. Table 4.0.6 displays the weights of each control city in synthetic Qatif. Sabya
and Abu Arish again make up the majority of the counterfactual.
Figure 4.0.5: Trends in Unique Callers, Qatif and Synthetic Qatif (Left), and Gap in Unique Callers, Qatif
and Synthetic Qatif (Right)
Table 4.0.5: Daily Unique Callers Predictor Means
Variables                   Treated    Synthetic   Sample Mean
uniqueUser (days 1-7)        10.905     10.904      11.210
avgDuration (days 1-7)      122.141    122.208     137.957
percentMen                    0.545      0.570       0.600
percentSaudi                  0.870      0.855       0.822
INFERENCE
Figure 4.0.6: Synthetic Control Placebo Tests with Sabya. Total Daily Network Activity (Left), Average
Call Duration (Middle), and Daily Unique Callers (Right)
To assess the significance of the estimates, a variety of placebo tests are now conducted.
Table 4.0.6: Daily Unique Callers Governorate Weights in Synthetic Qatif
Governorate
Weight
Al Bahah
0.004
Ar ar
0.006
Sakaka
Yanbu Al Bahar
0.007
Governorate
Ar Rass
Weight
0.009
Al Kharj
Khamis Mushayt
Ha il
Sabya
Al Qunfidhah
0.007
Najran
Tabuk
0.005
0.005
Ad Dammam
0.008
Al Ahsa
Al Jubayl
Al Khubar
Haft Al Babin
Al Quryyat
0.013
Unayzah
Al Riyadh
Ad Duwadimi
Al Majmaah
Al Quwayiyah
Abha
Ahad Rufaydah
Al Majardah
Bishah
Al Qhazaiah
Abu Arish
Ahad Al Masarihah
Jizan
0.006
Samtah
0.005
0.007
0.006
0.009
0.001
Al Taif
Al Lith
Jiddah
Medina
0.018
Mecca
0.038
Muhayil
0.005
Buraydah
0.005
0.008
0.0078
0.0086
0.006
0.471
0.003
0.009
0.004
0.004
0.003
0.000
0.006
0.007
0.004
0.006
0.004
0.274
0.005
0.008
0.004
0.004
0.005
First, the synthetic control method is applied to Sabya, a city that is similar to Qatif based on
its daily network activity profile and that contributed the highest weight in the synthetic
control for the outcome variable total network activity. Figure 4.0.6 shows
there is no difference in outcome trajectory between pre and post treatment for Sabya.
Across-unit and in-time permutation tests are now performed. The across-unit placebo
test iteratively assigns treatment status to every other city in the donor pool and applies
the synthetic control method. If the placebo studies create gaps of similar magnitude
to the one estimated for Qatif, the analysis does not provide significant evidence of a
negative effect of the treatment on total network activity.
Figure 4.0.7: Across-Unit Placebo Tests: Total Activity (all, 500x or less, 100x or less, 50x or less)
In the four graphs in Figure 4.0.8, the gray lines are the control cities and their divergence from their synthesized analogs, and the black line is the same divergence for Qatif.
This helps in assessing whether the estimated treatment effect for the treated unit is
distinguishable from randomness. The top left graph includes all placebo cases. The
top right graph excludes cities whose MSPEs are 20 times greater than Qatif's, the bottom left graph
excludes cities whose MSPEs are 10 times greater, and the bottom right graph excludes
cities whose MSPEs are 5 times greater. Once cities that do not provide a good counterfactual are excluded, the gaps are smaller in magnitude than Qatif's, suggesting that
the result is significant.
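The exclusion step can be sketched as follows. This is a minimal illustration under stated assumptions: `pre_gaps` and `post_gaps` are hypothetical dictionaries mapping each city to its daily gaps (actual minus synthetic), the cutoff is applied to pre-treatment MSPE relative to the treated unit, and the final share is an illustrative numerical summary rather than a statistic reported in this study.

```python
def mspe(gaps):
    """Mean squared prediction error of a sequence of gaps (actual minus synthetic)."""
    return sum(g * g for g in gaps) / len(gaps)


def filter_placebos(pre_gaps, treated, max_ratio):
    """Keep placebo units whose pre-treatment MSPE is at most
    max_ratio times the treated unit's pre-treatment MSPE."""
    cutoff = max_ratio * mspe(pre_gaps[treated])
    return [unit for unit in pre_gaps
            if unit != treated and mspe(pre_gaps[unit]) <= cutoff]


def share_with_gap_as_large(post_gaps, treated, kept):
    """Fraction of kept placebo units whose post-treatment MSPE
    is at least as large as the treated unit's."""
    if not kept:
        return 0.0
    target = mspe(post_gaps[treated])
    return sum(1 for unit in kept if mspe(post_gaps[unit]) >= target) / len(kept)
```

A small share after filtering corresponds to the visual impression in the figure: few well-fitted placebo cities show gaps as large as Qatif's.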
Figure 4.0.8: Across-Unit Placebo Tests: Daily Unique Callers (all, 500x or less, 100x or less, 50x or less)
The in-time placebo test assigns the treatment period to a time $t < T_0$ for the actual treated unit. A $T_0$ of 4 days before the shooting of al-Matar was chosen. As shown
in the leftmost panel of Figure 4.0.9, total network activity of synthetic Qatif closely
tracks real Qatif during both pre and post treatments, demonstrating that no effect is
detected in the absence of treatment. The middle panel of the figure shows the average call duration of synthetic and real Qatif. Although the actual treatment at day 8
has no effect on average duration, there exists a slight but discernible difference at the
placebo treatment of day 4. Finally, the rightmost panel of unique daily callers shows
a strong correspondence between the synthetic and actual plots over the entire time
period, again signifying no effect.
Figure 4.0.9: In-Time Placebo Tests with Qatif. Total Daily Network Activity (Left) and Average Call Duration (Right)
5
Analysis: Inter and Intracity Calling Patterns
The aggregate activity analysis in the previous section serves as a strong indicator that the
population of Qatif altered their communications patterns in response to the violence
of December 27th. However, little can be said about the nature of this change. The
study will now turn its attention to some of the more nuanced characteristics of call
behavior, namely, the relationships between and across cities. Three distinct properties
of city communications will now be examined:
* Intra call patterns: calls whose source and destination are within the city
* Inter-in call patterns: calls whose source is outside of the city and destination is within the city
* Inter-out call patterns: calls whose source is within the city and destination is outside the city
These measures may provide a better understanding of whether people turn inward to
their communities during times of political duress, or whether they turn outward to
spread news beyond their localities. Due to the structure of the dataset, callee location
must first be inferred to calculate each of the above quantities per city. The process is known in the literature as home/work
estimation, and has been used often to explore urban mobility behavior [22] [23] [24] [25].
5.1
LOCATION IDENTIFICATION
Initially, the location identification procedure followed a filtration procedure similar to the one discussed in Phithakkitnukoon et al. [26]. First, all weekend activities
(weekends in Saudi Arabia are Thursdays and Fridays) were culled from the dataset.
Then, call sequences that are too infrequent were removed to focus on meaningful estimates; only calls that were made within a 16 hour time window were included. Night
and daytime periods were defined as 10:00pm to 6:00am and 9:00am to 3:00pm
respectively, and calls were binned accordingly. For each phone user, day and night
locations were ranked by activity (with a small correction for call hops amongst nearby
towers), and representative towers were selected if the user made more than 60% of his
or her calls in this location. Although this helps eliminate false traces, it does limit the
study to users who hold occupations that follow traditional business hours. While this
may encapsulate students, it is likely that the jobless, disenfranchised youth who may orient
the protest movement have been culled from the sample. On the whole, this filtration
method resulted in well-defined home/work location pairs for roughly 11% of the
unique identifiers in the full dataset.
Unfortunately, the resulting coverage was too sparse to identify weekly patterns at a city
scale, making the filtration too stringent to capture phenomena related to the protests.
The location identification procedure was then modified to encapsulate a greater sample
size. After all, this study is only interested in a caller's primary city of residence, which
permits a greater degree of leniency in the filtration process. In the modified approach
all unique users are selected from the full, month-long dataset. Then a record of each
user's activities is aggregated per city in the donor pool. To guarantee a meaningful
sample of call records, any user who made fewer than one call every 2 days is culled from
the dataset. Then, for each user, governorates are ranked by total activity. A city is
designated as a user's home location if more than 75% of his or her total calls are made
there. In general, some care should be taken to prevent misidentification due to users
moving or vacationing, but this is not considered in this instance due to the relatively
tight time window of the dataset.
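The modified procedure can be sketched as follows. This is a minimal illustration, assuming the call records have already been reduced to hypothetical `(user_id, city)` pairs, one per call made; the function name and parameters are illustrative and do not reflect the actual processing pipeline.

```python
from collections import Counter, defaultdict


def identify_home_cities(calls, n_days, min_calls_per_day=0.5, min_share=0.75):
    """Assign each sufficiently active user a home city.

    calls: iterable of (user_id, city) pairs, one per call made by the user.
    n_days: number of days covered by the dataset.
    """
    per_user = defaultdict(Counter)
    for user, city in calls:
        per_user[user][city] += 1

    homes = {}
    for user, counts in per_user.items():
        total = sum(counts.values())
        if total / n_days < min_calls_per_day:
            continue  # fewer than one call every two days: user is culled
        city, top = counts.most_common(1)[0]  # city ranked first by activity
        if top / total > min_share:           # strictly more than 75% of calls
            homes[user] = city
    return homes
```

Users whose activity is split across cities without a dominant one are left unassigned, which is why only a portion of the full dataset receives a home location.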
The new dataset consists of roughly 7.3 million users, or about 40% of the total dataset.
5.2
URBAN CALL COUNTS
After subsetting the set of users whose locations were identifiable, intra, inter-in, and
inter-out activity counts are made per city, per day.
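The per-city, per-day counts can be sketched as follows. This is a minimal illustration that assumes both the source and the destination of a call are approximated by the inferred home cities of caller and callee, and that calls involving users without an identified home city are skipped; the record layout `(day, caller_id, callee_id)` is hypothetical.

```python
from collections import defaultdict


def count_call_types(calls, home_city):
    """Count intra, inter-in, and inter-out calls per (city, day).

    calls: iterable of (day, caller_id, callee_id).
    home_city: dict mapping user_id to the inferred home city.
    """
    counts = defaultdict(lambda: {"intra": 0, "inter_in": 0, "inter_out": 0})
    for day, caller, callee in calls:
        src, dst = home_city.get(caller), home_city.get(callee)
        if src is None or dst is None:
            continue  # location of one party could not be identified
        if src == dst:
            counts[(src, day)]["intra"] += 1
        else:
            counts[(src, day)]["inter_out"] += 1  # leaves the source city
            counts[(dst, day)]["inter_in"] += 1   # enters the destination city
    return dict(counts)
```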
Figure 5.2.1 shows the standardized profiles for Qatif (in black), and all other Saudi
cities (in gray) over the study period. Contrary to the total daily call activity plot in
the previous section (Figure 4.0.2), one sees a gradual uptick in daily calls from pre to
post treatment weeks. In fact, in terms of week-over-week changes, Qatif experiences a
9.08% increase in intra call activity, against a country-wide 6.64% increase. Similarly,
Qatif sees a 7.81% increase in inter-in activity and a 10.20% increase in inter-out call
activity, while the other cities in the donor pool experience an increase of roughly 7.7%
for both activity types. Looking more closely at the weekly rhythms of the charts, one
can see a clear drop in activity on Fridays, similar to the total activity plot from the
previous section.
SCM is now applied, using the following predictors:
* Log of daily intra city call activity the week before treatment
* Log of daily inter-in call activity the week before treatment
* Log of daily inter-out call activity the week before treatment
* Percent of males as obtained from the Department of Analysis & Reports in 2010
* Percent of Saudis as obtained from the Department of Analysis & Reports in 2010
Figure 5.2.1: Trends in standardized intra (top), inter-in (middle), and inter-out (bottom) daily call volumes, Qatif (solid) vs. other Saudi governorates (dashed). Dec. 20th - Jan. 3rd. Treatment indicated by dashed pink line
Most strikingly, Figure 5.2.2 shows a marked increase in phone activity immediately following al-Matar's death. The plot shows a strong increase in intracity call activity at the onset of the protests on December 27th (day 8 of the study period), and another peak on January 1st (day 13 of the study period). In a western context one could mistake this as an effect of the New Year's holiday; however, this measure is relative to all cities in the donor pool, so synthetic Qatif would have seen an increase as well. Additionally, Saudis follow the Hijri, as opposed to the Gregorian, calendar year. January 1st was significant to Qatif for a different reason: this was the day after al-Matar's funeral procession.
Figure 5.2.2: Trends in Intra Call Volumes
Figure 5.2.3: Trends in Inter-In Call Volumes
Figure 5.2.4: Trends in Inter-Out Call Volumes
See Tables C.0.1 and B.0.2 in the appendix for more detail on these results, and Figures B.0.1 and B.0.3 for their robustness checks. As an additional investigation, Figure
B.0.2 shows daily measures of all calls made by individuals identified as residents against
a synthetic control. The plot shows a strong correspondence between the real and synthetic measures, both pre and post treatment. This adds more nuance to the shift in
communications patterns, suggesting that while a residential call 'budget' remained
fixed before and after the event, a greater proportion of calls were made to within-city
individuals.
Unfortunately, the trends in inter-in and inter-out activity (Figures 5.2.3 and 5.2.4)
do not tell as clear a story. Qatif's aggregate change in inter-in activity week over week
is roughly consistent with the nation-wide average, and the daily plot closely mirrors the
synthetic control. On the other hand, Qatif's increase in inter-out activity is a good
deal higher than the mean for all cities, yet the daily plot holds a consistent profile with
the synthetic profile before and during the period of unrest.
The next section will apply SCM to activity on Twitter.
6
Analysis: Twitter Activity
The Twitter dataset is examined through the following dimensions:
1. The total number of tweets across the city.
2. The number of tweets per unique user.
3. The average message length.
6.1
GEOTAGGED ACTIVITY
The first pass of the analysis looks at trends in Twitter activity through the geotagged
dataset. The changes from the pre-treatment to post-treatment weeks show only a slight
increase in overall tweet volume for Qatif (0.43%) and a 7.91% decrease in unique daily
users, against an average 18.32% increase in tweet volume for the rest of the country, with a corresponding 13.12% increase in unique daily users. The standardized daily activity plots of Qatif
and the rest of Saudi Arabia (reproduced in the appendix) show profiles that are much
noisier than the phone activity trends. This may indicate that the Twitter sample is too
small, or the temporal resolution is too detailed, to quantify an impact in behavior.
Figure 6.1.1: Trends in standardized daily Tweet volume (top), Tweet length (middle), and Tweets per user
(bottom), Qatif (solid) vs. other Saudi governorates (dashed). Dec. 20th - Jan. 3rd. Treatment indicated
by dashed pink line
The following predictors are used:
* Log of the tweet volume or the number of tweets per unique user, per day in the week before treatment
* The average tweet length per day in the week before treatment
* Percent of males as obtained from the Department of Analysis & Reports in 2010
* Percent of Saudis, also obtained from the Department of Analysis & Reports in 2010
Figure 6.1.2: Trends in Total Tweet Activity, Qatif and Synthetic Qatif
Figure 6.1.3: Trends in Average Tweet Length, Qatif and Synthetic Qatif
Figure 6.1.4: Trends in Tweets Per User, Qatif and Synthetic Qatif
As evidenced in Figures 6.1.2 and 6.1.3, SCM is able to capture and quantify effects on neither geotagged tweet volume nor average tweet length. The plots of daily Tweets per user are slightly better, however. Figure 6.1.4 shows a slight but imperfect correspondence between Qatif and its counterfactual for days 1 through 11, before a striking divergence for the remainder of the study period. These dates somewhat coincide with the funeral procession of Ahmad al-Matar, which took place on December 31st, or day 12. Predictor weights, governorate weights, and v-weights are displayed in Tables C.0.1, C.0.2, and C.0.3 in Appendix C.
Similar robustness checks are now performed using the techniques explained in previous chapters. The second panel of Figure C.0.1 shows the in-time placebo test over the pretreatment period. The lines for Qatif and synthetic Qatif are mostly consistent, showing that no effects are detected in the absence of treatment. The first across-unit placebo is conducted using Ahad Rufaydah, the city that represents the largest share of Qatif's synthetic counterfactual. As the first panel in Figure C.0.1 depicts, SCM does not produce a consistent counterfactual for the governorate. The complete across-unit permutation test is shown in Figure C.0.2. The results suggest that the estimated treatment effect for Qatif is not distinguishable from randomness. These are indications that the change seen in daily Tweets Per User may be the result of a statistical fluke, not the onset of city riots. Ultimately, it is likely that the Tweet data is simply too noisy at this scale. In order to capture a more comprehensive view of Twitter activity in each city, the "Location Estimation for Non-Geotagged Tweets" section in Chapter 7 presents a procedure to estimate locational data from non-geotagged tweets using a probabilistic classifier.
7
Future Directions
7.1
LOCATION ESTIMATION FOR NON-GEOTAGGED TWEETS
While the majority of Twitter users do not share latitude and longitude data with their
tweets, many record an individually-defined, publicly-accessible location string in their
account information. Typically, this is the city and country in which they reside. Under
this assumption, it is possible to estimate where tweets were made using a naive Bayes
text classifier.
The Naive Bayes model is a straightforward probabilistic learning method that has found great popularity in text classification problems due to its relative simplicity yet highly effective performance in many real-world problem domains [27]. It works particularly well for classification problems with high-dimensional feature spaces, which makes it well suited for location identification based on a number of diverse textual indicators. Following the approach of Manning et al. [28], the probability of tweet $t$ originating from city $c$ is based on Bayes Rule:
$$P(C = c \mid t = \{w_1, w_2, \ldots, w_{n_t}\}) = \frac{P(C = c)\, P(t = \{w_1, w_2, \ldots, w_{n_t}\} \mid C = c)}{P(t = \{w_1, w_2, \ldots, w_{n_t}\})}$$
Where:
* City $c$ exists in the set of all Saudi cities in the donor pool: $c \in C = \{c_1, c_2, \ldots, c_{41}\}$
* Each tweet $t \in X$
* $n_t$ is the number of terms in the tweet's location field.
* $P(w_i \mid c)$ is the conditional probability of term $w_i$ occurring in a tweet of city $c$.
* $P(c)$ is the prior probability of a tweet originating from city $c$.
The denominator of the above expression is constant over the target cities, as it is a
function of values in $t$. Additionally, Naive Bayes assumes conditional independence
between the attribute values (the tweet location tokens) given a target value (each city).
Hence:
$$P(c \mid t) \propto P(c) \prod_{1 \le i \le n_t} P(w_i \mid c)$$
Using a training set $T$ of tweets labeled such that $(t, c) \in X \times C$, a classifier $\gamma$ is developed that maps tweets to cities: $\gamma: X \rightarrow C$
If a tweet's location terms don't produce clear evidence for a specific city, the city that
has the highest prior probability is chosen (the maximum a posteriori decision rule).
The Naive Bayes classifier is defined as:
$$\arg\max_{c \in C} P(c \mid t) = \arg\max_{c \in C} P(c) \prod_{1 \le i \le n_t} P(w_i \mid c)$$
Thus, the estimate to determine which city a tweet belongs to is the product of the
probability of each location term of the tweet given a specific city, multiplied by the
probability of the city. After calculating for each $c \in C$, the $c$ with the highest probability is selected.
To avoid floating point underflow (situations where the probabilities are so small that they
cannot be represented in memory), instead of multiplying the probabilities they will be logged
and summed:
$$\arg\max_{c \in C} \left[ \log P(c) + \sum_{1 \le i \le n_t} \log P(w_i \mid c) \right]$$
This effectively treats every conditional parameter $\log P(w_i \mid c)$ as a weight that indicates how good an indicator $w_i$ is for $c$, where the prior $\log P(c)$ is a weight that tells
the relative frequency of $c$. The intuition here is that more frequently cited cities are
more likely to be correct.
The maximum likelihood estimate is used to estimate the parameter $P(c)$. Given the
training set of geotagged tweets this is the relative frequency of $c$, i.e.:
$$P(c) = \frac{N_c}{N}$$
Where $N_c$ is the number of tweets from city $c$, and $N$ is the total number of tweets.
The conditional probability $P(w \mid c)$ is estimated as the relative frequency of the location term $w$ within tweets belonging to city $c$:
$$P(w \mid c) = \frac{T_{cw}}{\sum_{w' \in V} T_{cw'}}$$
Here $T_{cw}$ is the count of the location term $w$ from city $c$, and the sum in the denominator runs over all location terms $w'$ in the vocabulary $V$. The count runs over all
different positions $k$ in the training set of tweets, thus the positions bear no impact on
the estimates. This may pose problems in other applications of Naive Bayes; however,
the problem at hand is narrowly defined, and this is unlikely to skew the results.
As a safeguard against misclassification, location terms were only associated with Saudi
cities during the training process if at least 50% of their occurrences were found in that
city.
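The training and classification steps can be sketched as follows. This is a minimal illustration and not the implementation used in this study: the function names are hypothetical, the relative frequencies are computed after the 50% filter (an assumption), no smoothing is applied, terms never seen in training are ignored, and the highest-prior city is returned when no city is compatible with the terms.

```python
import math
from collections import Counter, defaultdict


def train(labeled, min_share=0.5):
    """labeled: list of (location_terms, city) pairs from geotagged tweets."""
    n = len(labeled)
    city_counts = Counter(city for _, city in labeled)
    log_prior = {c: math.log(k / n) for c, k in city_counts.items()}

    term_counts = defaultdict(Counter)  # city -> term -> count
    term_totals = Counter()             # term -> count over all cities
    for terms, city in labeled:
        term_counts[city].update(terms)
        term_totals.update(terms)

    log_like = {}
    for c in city_counts:
        # Safeguard: keep a term for a city only if at least min_share
        # of the term's occurrences were found in that city.
        kept = {w: k for w, k in term_counts[c].items()
                if k / term_totals[w] >= min_share}
        total = sum(kept.values())
        log_like[c] = {w: math.log(k / total) for w, k in kept.items()}
    return log_prior, log_like


def classify(terms, log_prior, log_like):
    """Return the city maximizing log P(c) + sum_i log P(w_i | c)."""
    vocab = set().union(*log_like.values())
    known = [w for w in terms if w in vocab]  # ignore terms unseen in training
    scores = {c: log_prior[c]
              + sum(log_like[c].get(w, float("-inf")) for w in known)
              for c in log_prior}
    best = max(scores, key=scores.get)
    if scores[best] == float("-inf"):
        # No clear evidence for any city: fall back to the highest prior.
        return max(log_prior, key=log_prior.get)
    return best
```

Summing logarithms rather than multiplying probabilities keeps the scores in a representable range even for long location strings.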
Applying the supervised learning procedure to a stream of 10% of total tweets resulted
in about 1,000,000 new tweets from roughly 69,000 unique users over the entire study
period. The analysis from Chapter 6 was redone with the new dataset and the results are
reproduced below. As the plots demonstrate, it seems that the data are still too sparse to
identify any city-level trends through SCM. While it is possible to lower the classifier's
stringency and accumulate more urban-level data, it is worth noting that, overall, the
scale of coverage between CDRs and Tweet data is completely different: when looking
at Qatif only, the CDR dataset holds roughly 235,000 records per day, while the Tweet
dataset holds approximately 90. It is entirely possible that Twitter's user base in Saudi
Arabia was simply too underdeveloped and/or uneven at this point in time, making it
impossible to aggregate and compare at the spatial scale of a city, or the temporal scale
of a day. Further research will work to balance the classifier's rigidity and total output
in an effort to better characterize the treatment effect through social media.
7.1.1
INITIAL RESULTS: NON-GEOTAGGED TWEETS
As in Chapter 6, the following predictors are used:
* Log of the tweet volume or the number of tweets per unique user, per day in the week before treatment
* The average tweet length per day in the week before treatment
* Percent of males as obtained from the Department of Analysis & Reports in 2010
* Percent of Saudis, also obtained from the Department of Analysis & Reports in 2010
Figure 7.1.1: Trends in Total Tweet Activity, Qatif and Synthetic Qatif
7.2
COMMUNICATION NETWORKS
The analyses on aggregate activity through call records and social media demonstrate
a profound change in Qatif's communication patterns in response to civil unrest. The
most important next step will be exploring the compositional changes that occur in
the city's social network: is it possible to identify any emergent reorganization strategies
that either encourage or impede information flow? This section presents a few initial
investigations in this direction.
Figure 7.1.2: Trends in Average Tweet Length, Qatif and Synthetic Qatif
Figure 7.1.3: Trends in Tweets Per User, Qatif and Synthetic Qatif
NETWORK GENERATION
Human interaction networks based on CDR data have been constructed for each city
in the Saudi donor pool. The generation procedure follows [29], wherein each mobile phone user is defined as a node, and links are formed among nodes according to the communications records. This study focuses only on cities' reciprocal networks, in which two nodes are connected if and only if both of the corresponding users initiated at least one call to the other over the study period. A non-reciprocal network, in which a link exists if either side initiated activity, may contain unidirectional communications, possibly interactions between individuals who do not know each other. Thus, it is presumed to represent a more superficial social network than the
reciprocal alternative. Again, following Schlapfer et al., all nodes which never receive
nor initiate calls are eliminated, in an effort to remove potential bias from call centers
and/or other business hubs.
The network is composed only of users whose home locations have been identified
following the procedure described in Chapter 5. This set of users represents roughly
40% of the total individuals in the dataset. Each edge weight $w_{ij}$ is defined by the
number of communications initiated by individual $n_i$ to individual $n_j$. The degree
and edge weight distributions of the two-week nationwide network are shown in Figure 7.2.1. Once the complete network was constructed, it was split into 40 different city
networks by severing intercity edges. A variety of timescales were used, but single-day
networks proved too sparse to offer meaningful insight. Ultimately, two week-long,
directed networks were built, corresponding to the pre- and post-treatment periods. This
analysis looks at the largest connected cluster (LCC, giant component) extracted from
each Qatif network. The networks' basic properties are summarized in Table 7.2.1.
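The construction described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the pipeline used in the study: the function names, the `(caller, callee)` record layout, and the toy records are hypothetical, and the home-location filter and intercity edge severing are omitted.

```python
from collections import Counter, defaultdict


def reciprocal_network(calls):
    """Return directed edge weights w[(i, j)] = number of calls from i to j,
    keeping only pairs where both i->j and j->i occur at least once."""
    w = Counter((a, b) for a, b in calls if a != b)
    return {(a, b): c for (a, b), c in w.items() if (b, a) in w}


def largest_connected_component(edges):
    """Return the node set of the largest connected cluster, treating
    every reciprocated pair as an undirected link."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best


# Invented call records: (caller, callee).
calls = [
    ("a", "b"), ("b", "a"), ("a", "b"),
    ("b", "c"), ("c", "b"),
    ("d", "e"), ("e", "d"),
    ("a", "f"),  # never reciprocated, so it is dropped
]
weights = reciprocal_network(calls)
lcc = largest_connected_component(weights)
```

Here the one-way call from "a" to "f" leaves no edge, and the largest connected cluster is the three-node group around "b".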
Figure 7.2.1: Total Degree Distribution (Left), and Edge Weight Distribution (Right) of the Complete Reciprocated Network, KSA
City    Week   n       m       Avg Degree   GCC     LCC
Qatif   Post   23193   60379   5.21         0.092   76%
Qatif   Pre    22522   56392   5.01         0.092   73%
Table 7.2.1: Summary statistics for Communication Networks. The size of the largest connected component (LCC) is presented as a percentage of the number of nodes in the full city network. The total number
of nodes (n), number of links (m), average degree, and global clustering coefficient (GCC) correspond to
the complete city networks.
INFORMATION DIFFUSION
Following Onnela et al. [30], Figure 7.2.2 explores global information diffusion across
both the pre- and post-treatment networks for Qatif. The process is based on a simple infection model in which the probability that an infected node passes the disease
to a neighboring node is proportional to the strength of their connection. The
procedure randomly selects an individual and 'infects' him or her with information at
time $t_0 = 0$. At each following time step, each infected individual $n_i$ will pass the
information to another individual $n_j$ in its contact list with probability $P_{ij} = x w_{ij}$,
where $w_{ij}$ is the edge weight. Thus, if two individuals have a higher number of connections between them, they will be more likely to pass information to each other. $x$ is
used as a control parameter for the rate of overall spread through the network. The most
straightforward choice is $x = 1/\max(w_{ij})$, such that the strongest weight will result
in a probability of 1. However, as Onnela et al. state, this creates very long simulation
times due to the skewness of the weight distribution; normalizing by the maximum
weight creates very small transmission probabilities for the majority of connections.
By increasing the value of $x$, the simulation can be sped up without dramatically altering the overall system. This produces a cutoff $w^*$ to the transmission probability,
such that transmission will always occur for weights above $w^*$. $w^*$ is chosen so that
$P_{cum}(w^*) \approx 0.965$, or $w^* \approx 14$. Thus $P_{ij} \propto w_{ij}$ for 96.5% of the weights.
The diffusion simulation has been conducted 1,000 times over each network. As the
top panel in Figure 7.2.2 demonstrates, beyond a threshold of roughly 25% infected,
the rate of transmission is actually faster in the post-treatment network. Additionally,
as seen in the bottom panel of Figure 7.2.2, the distributions of edge weights of the links
responsible for infecting an individual favor low edge strengths for both networks, suggesting that the majority of individuals get their information through weak ties, a finding that is
consistent with the revered role of weak ties in information sharing [31]. Moreover,
the phenomenon is slightly more pronounced in the post-treatment distribution, which may
imply that weak ties become increasingly important during times of duress. It must
be stated, however, that these results are strictly preliminary. They provide some indication
that communities intelligently reorganize communications to increase dissemination
speed and breadth during periods of civil unrest, but further research will be required
to clarify this relationship and demonstrate a causal link.
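A minimal sketch of the infection model may help make the role of the cutoff concrete. It assumes that, at each time step, every infected node attempts transmission to each of its uninfected contacts with probability min(1, w/w*); the function name `simulate_spread`, the toy chain network, and the seed are hypothetical and are not taken from the study.

```python
import random


def simulate_spread(weights, w_star, steps, seed=0):
    """Run one diffusion simulation on directed edge weights {(i, j): w_ij}.

    Transmission along an edge succeeds with probability min(1, x * w_ij),
    where x = 1 / w_star, so every weight at or above the cutoff always
    transmits. Returns the infected fraction after each time step."""
    rng = random.Random(seed)
    contacts, nodes = {}, set()
    for (i, j), w in weights.items():
        contacts.setdefault(i, []).append((j, w))
        nodes.update((i, j))

    infected = {rng.choice(sorted(nodes))}  # random initial individual
    x = 1.0 / w_star
    history = [len(infected) / len(nodes)]
    for _ in range(steps):
        newly = set()
        for i in infected:
            for j, w in contacts.get(i, []):
                if j not in infected and rng.random() < min(1.0, x * w):
                    newly.add(j)
        infected |= newly
        history.append(len(infected) / len(nodes))
    return history


# Invented four-node chain whose weights all exceed the cutoff of 14,
# so each contact is infected with certainty.
chain = {
    ("a", "b"): 20, ("b", "a"): 20,
    ("b", "c"): 20, ("c", "b"): 20,
    ("c", "d"): 20, ("d", "c"): 20,
}
history = simulate_spread(chain, w_star=14, steps=3, seed=1)
```

Averaging such histories over many random starting nodes would give a curve comparable in spirit to the top panel of Figure 7.2.2; recording the weight of each successful edge would give the bottom panel.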
7.3
RELIGIOSITY
An intriguing pattern was found in the Saudi mobile activity distributions: at various
points in the day, activity would simply drop off for around 30 to 40 minutes before
returning to its typical trend. These inactivity "valleys" were the result of daily
prayer times. Millions of Muslims across the country put down their phones to turn
and face the holy city of Mecca in prayer five times a day. Shops and businesses
essentially close for 20-30 minutes while the religious police, the Mutaween, surveil
the streets in the hopes of sending all loiterers to the nearest mosques. Interestingly,
the activity distributions capture this behavior very closely. The precise timing of these
calls to prayer depends on the position of the sun in the sky, and thus, by differentiating
the CDR distributions into western, central, and eastern regions, one can see the prayer
times moving across the country, as shown in Figure 7.3.1.
This prayer time disruption could function as a rough proxy for urban-level religiosity.
The following method has been created to catalogue daily disruption. It utilizes the
fourth prayer, Maghrib (the sunset call to prayer), because of its strong presence in the data.
Figure 7.2.2: Fraction of Infected Nodes as a Function of Time (Top), Number of Infected Nodes at Each Instance of t (Middle), and Distributions of Edge Weights Responsible for Infection (Bottom), Qatif Pre Treatment and Qatif Post Treatment
Figure 7.3.1: Daily Network Activity Distributions from Jeddah (Western Saudi Arabia), Riyadh (Central Saudi Arabia), and the Eastern Region
Method I:
Let $n_{d,\max_0}$ be the maximum total network traffic at the beginning of the window, at $t_{d,\max_0}$ for day $d$; $n_{d,\max_1}$ be the maximum total network traffic at the end of the window, at $t_{d,\max_1}$ for day $d$; and $n_{d,\min}$ be the minimum total network traffic over the window, at $t_{d,\min}$ for day $d$. Now let $C(t)$ equal the call count for time $t$, and define $\hat{C}(t)$ as:
\[
\hat{C}(t) = \left( \frac{n_{d,\max_1} - n_{d,\max_0}}{t_{d,\max_1} - t_{d,\max_0}} \right) t + n_{d,\min}
\]
(the estimated curve had no disruption occurred). The disruption is then calculated as the ratio of the disturbance
area over total possible area:
\[
\Omega_d = \frac{\int_{t_{d,\max_0}}^{t_{d,\max_1}} \left( \hat{C}(t) - C(t) \right) dt}{\int_{t_{d,\max_0}}^{t_{d,\max_1}} \hat{C}(t)\, dt}
\]
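The ratio of disturbance area to total possible area can be approximated numerically. The sketch below is an assumption-laden illustration: it takes the no-disruption estimate to be the straight line joining the two window-edge maxima (one reading of the estimated curve, which may differ in detail from the expression above), integrates with the trapezoidal rule, and uses invented call counts. The name `disruption_ratio` is hypothetical.

```python
def disruption_ratio(times, counts, i0, i1):
    """Approximate the daily disruption ratio over one prayer window.

    times, counts: equal-length samples of the call count C(t).
    i0, i1: indices of the maxima at the start and end of the window.
    Assumption: the no-disruption baseline is the line through
    (times[i0], counts[i0]) and (times[i1], counts[i1])."""
    t0, t1 = times[i0], times[i1]
    n0, n1 = counts[i0], counts[i1]

    def baseline(t):
        return n0 + (n1 - n0) * (t - t0) / (t1 - t0)

    gap_area = 0.0    # area between the baseline and the observed counts
    total_area = 0.0  # area under the baseline
    for k in range(i0, i1):
        dt = times[k + 1] - times[k]
        b_left, b_right = baseline(times[k]), baseline(times[k + 1])
        total_area += 0.5 * (b_left + b_right) * dt
        gap_area += 0.5 * ((b_left - counts[k]) + (b_right - counts[k + 1])) * dt
    return gap_area / total_area


# Invented counts sampled every 10 minutes around a prayer-time valley.
times = [0, 10, 20, 30, 40]
counts = [100, 60, 40, 60, 100]
ratio = disruption_ratio(times, counts, 0, 4)
```

With these toy values the baseline is flat at 100 calls, so the ratio is simply the share of expected calls that went missing during the valley.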
Figure 7.3.2 shows disruption for all cities over pre and post treatment periods. The
plot is messy at best. Unfortunately it seems that religiosity is not tractable using this
method over a daily time scale. Quantifying this phenomenon remains an open question for the future.
Figure 7.3.2: Trends in Daily Prayer Time Disruption, All KSA Cities. Qatif drawn in pink.
8
Discussion
The analysis presented above points to a number of compelling, statistically significant changes in communications across Qatif, relative to other urban agglomerations
in Saudi Arabia, in the week following Ahmad al-Matar's death. The effects of social
unrest undoubtedly reverberate through phone behavior, as Qatif's daily call activity appears to shift in both magnitude and composition in response to the exogenous shock.
The most powerful trend identified in Qatif is the decrease in city-wide phone activity
over the post-treatment time window; the volume of daily calls exhibits a dramatic drop
immediately following the boy's death, and holds well below the synthetic counterfactual for the remainder of the study period. A similarly significant drop in unique daily
callers is also found, evidence that not only were fewer calls being made, but fewer
people were actively communicating through the city's telecom infrastructure. Interestingly, the differences between the treated and control units at the extremes of the
post-treatment period for unique daily callers are slightly less pronounced than those
of total daily calls, suggesting that the treatment effect on unique callers has a slower initial
response time and a faster falloff.
These results raise two possible, but not mutually exclusive, post-treatment scenarios:
(1) individuals who had the means to leave the city did so; or (2) people limited their
daily calls, potentially switching to other forms of communication. To begin with
the former, it's possible that individuals who had been relative 'outsiders' in Qatif, individuals who had ties to other areas, turned away at the first sign of violence. This is
not unlikely, given allegations that the Government had labeled the city as a dangerous
place to visit [13]. It's within the realm of possibility that the alleged scaremongering
had made individuals apprehensive about spending time in Qatif.
To address the latter, there seems to be some sensitivity in the Saudi population regarding privacy concerns related to mobile phones. The government has a huge stake in
mobile infrastructure; STC, for instance, is majority-owned by the Saudi government
through Saudi Arabia's Public Investment Fund. There have long been rumors circulating that the government monitors its citizens' activities through mobile devices [39].
These rumors came to a head recently, when Human Rights Watch, an international
human rights advocacy group, accused the Kingdom of tapping Qatif residents' phones
and monitoring their activity [42]. Allegedly, the surveillance software had been propagated through the city as malware masquerading as a local news app since the onset
of Shia-led protests in 2011 [43]. It's possible that people, even those not involved
with the demonstrations, wanted to avoid any potential scrutiny and avoided phone
use at the first indication of civil disruption.
While daily call volume and unique daily callers experience steep declines, no evidence
of a change in the average duration of daily calls is found. Social unrest appears to have
a stronger, more generalized effect on how often people make calls and whom they
call than how long they stay on the phone. Duration measures appear too noisy to
isolate patterns at the levels of aggregation employed here. The study period may need
to be constrained to a tighter time window around the event, and/or measurements
may need to be obtained at more granular intervals to extract any changes in response
to treatment. It's also possible that changes in call duration are only measurable
within subpopulations who are active in the protest movement. These remain issues to
be explored in the future.
Beyond changes in daily aggregate activity, strong evidence exists of a transformation
in the call composition of individuals who are identified as local residents. The analysis
presents an increase in daily activity within the subnetwork of users identified to hold
strong spatiotemporal ties to the city, even though their total activity (the number
of connections both internal and external to this subnetwork) remains constant. This
increase in intra-network communication suggests that people strengthen their connections with others in the urban community during periods of social unrest. Interestingly,
the call measures within this network peak on the day of al-Matar's funeral procession, adding credence to the notion that these changes were tied to the treatment.
Inter-in activity (calls originating in other cities and terminating in Qatif) sees a slight
increase week over week, but the change is statistically consistent with all other cities
in the donor pool. On the other hand, inter-out activity (calls originating in Qatif
and terminating in other cities) sees an increase of roughly 10% week over week, which
is higher than the national increase of 7.7%. However, daily trends do not appear to
be tractable using SCM.
Additionally, an initial exploration of Qatif's city-wide human interaction networks
provides some evidence that information diffusion increases in breadth and speed after treatment. The transmission simulations also point to a higher reliance on weak
ties in the post-treatment network, which is consistent with the leading theory on the
topic. While these findings are strictly preliminary, they offer some suggestion that
communities under duress intelligently reorganize communications to increase overall
information flow. Further research will be required to better identify and articulate
the structural changes to Qatif's human interaction network before work can be done to
determine a causal link.
Finally, examining the behavior on Twitter yields some interesting, if not altogether
robust, findings.
Both geotagged and location-estimated tweet activity, aggregated
to the city scale, show no recognizable trends from one week to the next. Average
Tweet length is similarly noisy for both Tweet datasets. However, when looking at
daily Tweets per user, there appears to be a striking increase before, during, and after
al-Matar's funeral on December 31st. It should be stated, however, that this finding
is tenuous; the robustness checks do not provide much support that this is more than
a statistical anomaly. Ultimately it seems that, while Twitter adoption is quite high
in Saudi Arabia, Tweet coverage per city is simply too uneven and sparse to capture
any recognizable trends during this time. Upcoming research will attempt to soften the
locational classifier's stringency in the hopes of creating a more comprehensive view of
social media usage during this period of time.
Appendices
A
Appendix: Call Behavior
Table A.0.1: V-Weights for Total Daily Activity
Variable                          v.Weights
log(Total Activity) (days 1-7)    0.421
Avg. Duration (days 1-7)          0.315
Percent Men                       0.26
Percent Saudi                     0.003
Table A.0.2: V-Weights for Average Daily Duration
Variable                          v.Weights
log(Total Activity) (days 1-7)    0
Avg. Duration (days 1-7)          0.912
Percent Men                       0.068
Percent Saudi                     0.02
Table A.0.3: V-Weights for Daily Unique Callers
Variable                          v.Weights
log(Unique Callers) (days 1-7)    0.893
Avg. Duration (days 1-7)          0.055
Percent Men                       0.048
Percent Saudi                     0.004
Figure A.0.1: Total Network Activity, Qatif and Synthetic Qatif (3 Weeks)
Figure A.0.2: Number of Unique Daily Callers, Qatif and Synthetic Qatif (3 Weeks)
B
Appendix: Inter and Intracity Calling Patterns
Table B.0.1: Daily Intra Call Activity Predictor Means
Variables                  Treated   Synthetic   Sample Mean
intraLog (days 1-7)        10.521    10.521      11.643
interInLog (days 1-7)      9.059     9.030       10.120
interOutLog (days 1-7)     9.008     9.068       10.145
percentMen                 0.545     0.546       0.602
percentSaudi               0.870     0.804       0.822
Table B.0.2: Governorate Weights in Synthetic Qatif (Daily Intra Call Activity)
Weight
Governorate
Al Bahah
0.002
Ar ar
0.029
Governorate
Ar Rass
Unayzah
Sakaka
0.000
Al Riyadh
Yanbu Al Bahar
0.002
Buraydah
0.002
Al Kharj
0.002
Ad Duwadimi
Al Majmaah
Al Quwayiyah
Khamis Mushayt
0.002
Ha ii
Sabya
0.003
Al Qunfidhah
0.001
Naj ran
0.002
0.004
Weight
0.005
0.003
0.001
0.005
0.003
0.004
Abha
Ahad Rufaydah
Al Majardah
0.002
0.416
0.002
0.227
Tabuk
0.001
Ad Dammam
0.001
Al Ahsa
Al Jubayl
0.001
Bishah
Al Qhazaiah
Abu Arish
Ahad Al Masarihah
Jizan
0.001
Samtah
0.004
Al Khubar
0.012
0.224
Haft Al Babin
Al Quryyat
0.002
0.002
Al Taif
Al Lith
Jiddah
Medina
0.002
Mecca
0.001
Muhayil
0.004
0.012
0.006
0.09
0.002
0.022
0.001
Figure B.0.1: Intra Call Activity Synthetic Control Placebo Test with Samtah (Left), In-time Intra Call Activity Placebo with Qatif (Right)
Figure B.0.2: Daily Local Call Activity
Table B.0.3: V-Weights for Daily Unique Callers
Variable                      v.Weights
Intra Calls (days 1-7)        0.267
Inter-In Calls (days 1-7)     0.222
Inter-Out Calls (days 1-7)    0.12
Percent Men                   0.232
Percent Saudi                 0.16
Figure B.0.3: Across-Unit Placebo Tests: Intra Call Activity (all, 20x or less, 10x or less, 5x or less)
Figure B.0.4: Intra Call Activity, Qatif and Synthetic Qatif (3 Weeks)
C
Appendix: Twitter Activity
Table C.0.1: Daily Tweets Per User Predictor Means
Variables                            Treated   Synthetic   Sample Mean
Daily Tweets Per User (days 1-7)     0.267     0.267       0.246
Log Total Daily Tweets (days 1-7)    3.501     3.501       3.740
percentMen                           0.545     0.546       0.602
percentSaudi                         0.870     0.804       0.822
Table C.0.2: Governorate Weights in Synthetic Qatif (tweets Per User)
Governorate
Weight
Al Bahah
0.007
Ar ar
Sakaka
0.023
0.011
Yanbu Al Bahar
o.oo8
Buraydah
Al Kharj
0.013
Khamis Mushayt
0.014
Ha ii
Sabya
0.018
Al Qunfidhah
0.098
Najran
0.010
Tabuk
Ad Dammam
Weight
Governorate
Ar Rass
Unayzah
0.014
0.009
Al Riyadh
0.009
Ad Duwadimi
Al Majmaah
Al Quwayiyah
0.012
Abha
Ahad Rufaydah
Al Majardah
Bishah
0.013
0.004
0.028
Al Qhazaiah
Abu Arish
0.007
Ahad Al Masarihah
0.000
Al Ahsa
Al Jubayl
Al Khubar
0.029
Jizan
0.007
0.004
Samtah
0.012
0.002
0.00I
0.007
0.010
o.oo6
Haft Al Babin
0.032
Al Quryyat
0.013
Al Taif
Al Lith
Jiddah
Medina
Muhayil
0.010
Mecca
0.007
0.006
0.311
0.072
0.105
0.007
0.010
0.007
0.042
Table C.0.3: V-Weights for Daily Unique Callers
Variable                             v.Weights
Daily Tweets Per User (days 1-7)     0.745
Log Total Daily Tweets (days 1-7)    0.069
Percent Men                          0.185
Percent Saudi                        0.001
Figure C.0.1: Tweets Per User Synthetic Control Placebo Test with Ahad Rufaydah (Left), In-time Tweets Per User Placebo with Qatif (Right)
Figure C.0.2: Across-Unit Placebo Tests: Tweets Per User (all, 50x or less, 20x or less, 5x or less)
References
[1] Z. Tufekci and C. Wilson, "Social media and the decision to participate in political protest: Observations from tahrir square," Journal of Communication, vol. 62, pp. 363-379, Apr. 2012.
[2] A. Breuer, T. Landman, and D. Farquhar, "Social media and protest mobilization: Evidence from the tunisian revolution," SSRN Scholarly Paper ID 2133897, Social Science Research Network, Rochester, NY, Aug. 2012.
[3] G. Lotan, E. Graeff, M. Ananny, D. Gaffney, I. Pearce, and D. Boyd, "The revolutions were tweeted: Information flows during the 2011 tunisian and egyptian revolutions," International Journal of Communication, vol. 5, pp. 1375-1405, 2011.
[4] M. Szell, S. Grauwin, and C. Ratti, "Contraction of online response to major events," PLoS ONE, vol. 9, p. e89052, Feb. 2014.
[5] J. P. Bagrow, D. Wang, and A.-L. Barabasi, "Collective response of human populations to large-scale emergencies," PLoS ONE, vol. 6, p. e17680, Mar. 2011.
[6] S. Aday, H. Farrell, M. Lynch, and D. Freelon, Blogs and Bullets II: New Media and Conflict after the Arab Spring. United States Institute of Peace Press, Mar. 2014.
[7] J. H. Pierskalla and F. M. Hollenbach, "Technology and collective action: The effect of cell phone coverage on political violence in africa," American Political Science Review, vol. 107, pp. 207-224, May 2013.
[8] T. Matthiesen, "Saudi arabia's shiite escalation," July 2012.
[9] "Saudi police arrest prominent shi'ite muslim cleric," Reuters, July 2012.
[10] T. Matthiesen, Sectarian Gulf: Bahrain, Saudi Arabia, and the Arab Spring that wasn't. Stanford, California: Stanford Briefs, an imprint of Stanford University Press, 2013.
[11] R. Staff, "Two killed as saudi security forces try to arrest shi'ite man," Reuters, Sept. 2012.
[12] "Saudis protest killing of teen protester in qatif," Press TV, Jan. 2013.
[13] B. Perazzo, "Propaganda & sectarianism: How the saudi government stifles the truth about qatif," Jan. 2013.
[14] "The story of a tweet," Aug. 2014.
[15] M. Mari, "Twitter usage is booming in saudi arabia - GlobalWebIndex," Mar. 2013.
[16] "Realtime twitter data access," Aug. 2014.
[17] D. Boyd and K. Crawford, "Six provocations for big data," SSRN Electronic Journal, 2011.
[18] A. Abadie and J. Gardeazabal, "The economic costs of conflict: A case-control study for the basque country," Tech. Rep. w8478, National Bureau of Economic Research, Cambridge, MA, Sept. 2001.
[19] A. Abadie, A. Diamond, and J. Hainmueller, "Comparative politics and the synthetic control method," SSRN Scholarly Paper ID 1950298, Social Science Research Network, Rochester, NY, Feb. 2014.
[20] A. Abadie, A. Diamond, and J. Hainmueller, "Synthetic control methods for comparative case studies: Estimating the effect of california's tobacco control program," Journal of the American Statistical Association, vol. 105, pp. 493-505, June 2010.
[21] A. Abadie, A. Diamond, and J. Hainmueller, "Synth: An r package for synthetic control methods in comparative case studies," Journal of Statistical Software, vol. 42, pp. 1-17, June 2011.
[22] F. Calabrese, M. Colonna, P. Lovisolo, D. Parata, and C. Ratti, "Real-time urban monitoring using cell phones: A case study in rome," IEEE Transactions on Intelligent Transportation Systems, vol. 12, pp. 141-151, Mar. 2011.
[23] F. Calabrese, G. Di Lorenzo, L. Liu, and C. Ratti, "Estimating origin-destination flows using mobile phone location data," IEEE Pervasive Computing, vol. 10, pp. 36-44, Apr. 2011.
[24] M. C. Gonzalez, C. A. Hidalgo, and A.-L. Barabasi, "Understanding individual human mobility patterns," Nature, vol. 453, pp. 779-782, June 2008.
[25] P. Wang, T. Hunter, A. M. Bayen, K. Schechtner, and M. C. Gonzalez, "Understanding road usage patterns in urban areas," Scientific Reports, vol. 2, Dec. 2012.
[26] S. Phithakkitnukoon, Z. Smoreda, and P. Olivier, "Socio-geography of human mobility: A study using longitudinal mobile phone data," PLoS ONE, vol. 7, p. e39253, June 2012.
[27] D. J. Hand and K. Yu, "Idiot's bayes: Not so stupid after all?," International Statistical Review / Revue Internationale de Statistique, vol. 69, p. 385, Dec. 2001.
[28] C. D. Manning, Introduction to information retrieval. New York: Cambridge University Press, 2008.
[29] M. Schlapfer, L. M. A. Bettencourt, S. Grauwin, M. Raschke, R. Claxton, Z. Smoreda, G. B. West, and C. Ratti, "The scaling of human interactions with city size," Journal of The Royal Society Interface, vol. 11, pp. 20130789-20130789, July 2014.
[30] J.-P. Onnela, J. Saramaki, J. Hyvonen, G. Szabo, D. Lazer, K. Kaski, J. Kertesz, and A.-L. Barabasi, "Structure and tie strengths in mobile communication networks," Proceedings of the National Academy of Sciences, vol. 104, pp. 7332-7336, May 2007.
[31] M. Granovetter, "The strength of weak ties," The American Journal of Sociology, vol. 78, pp. 1360-1380, May 1973.
[32] J.-P. Onnela, J. Saramaki, J. Hyvonen, G. Szabo, M. A. d. Menezes, K. Kaski, A.-L. Barabasi, and J. Kertesz, "Analysis of a large-scale weighted network of one-to-one human communication," New Journal of Physics, vol. 9, pp. 179-179, June 2007.
[33] "Another young man shot dead in qatif," Saudi Shia, Dec. 2012.
[34] H. al Khoei, "Deadly shootings in saudi arabia, but arab media look the other way," The Guardian, Nov. 2011.
[35] J.-P. Onnela, S. Arbesman, M. C. Gonzalez, A.-L. Barabasi, and N. A. Christakis, "Geographic constraints on social network groups," PLoS ONE, vol. 6, p. e16939, Apr. 2011.
[36] P. Wood, "How saudis are learning to protest," Mar. 2011.
[37] N. Eagle, A. Pentland, and D. Lazer, "Inferring friendship network structure by using mobile phone data," Proceedings of the National Academy of Sciences, vol. 106, pp. 15274-15278, Sept. 2009.
[38] F. Calabrese, Z. Smoreda, V. D. Blondel, and C. Ratti, "Interplay between telecommunications and face-to-face interactions: A study using mobile phone data," PLoS ONE, vol. 6, p. e20814, July 2011.
[39] Anonymous, "Interview with saudi lawyer," May 2013.
[40] C. Song, Z. Qu, N. Blumm, and A.-L. Barabasi, "Limits of predictability in human mobility," Science, vol. 327, pp. 1018-1021, Feb. 2010.
[41] A.-M. Staff, "Questions over death of protester in saudi arabia's eastern province - al-monitor: the pulse of the middle east," Jan. 2013.
[42] "Riyadh accused of tapping dissidents' phones," June 2014.
[43] "Saudi arabia: Malicious spyware app identified | human rights watch," June 2014.
[44] "Saudi government monitoring internet to stifle protests."
[45] "Saudi telecom sought US researcher's help in spying on mobile users (wired UK)."
[46] G. Tavares and A. Faisal, "Scaling-laws of human broadcast communication enable distinction between human, corporate and robot twitter users," PLoS ONE, vol. 8, p. e65774, July 2013.
[47] P. A. Grabowicz, J. J. Ramasco, E. Moro, J. M. Pujol, and V. M. Eguiluz, "Social features of online networks: The strength of intermediary ties in online social media," PLoS ONE, vol. 7, p. e29358, Jan. 2012.
[48] M. Granovetter, "The impact of social structure on economic outcomes," Journal of Economic Perspectives, vol. 19, pp. 33-50, Jan. 2005.
[49] M. Batty, "The size, scale, and shape of cities," Science, vol. 319, pp. 769-771, Feb. 2008.
[50] M. Fujita, P. R. Krugman, and A. J. Venables, The spatial economy: cities, regions and international trade, vol. 213. Wiley Online Library, 1999.
[51] M. Karsai, N. Perra, and A. Vespignani, "Time varying networks and the weakness of strong ties," Scientific Reports, vol. 4, Feb. 2014.
Colophon
THIS THESIS WAS TYPESET using LaTeX,
originally developed by Leslie Lamport
and based on Donald Knuth's TeX. The
body text is set in 11 point Arno Pro, designed
by Robert Slimbach in the style of book types
from the Aldine Press in Venice, and issued by
Adobe in 2007. A template, which can be used
to format a PhD thesis with this look and feel,
has been released under the permissive MIT
(X11) license, and can be found online at
github.com/suchow/ or from the author at
suchow@post.harvard.edu.