L10 - Stony Brook University

CSE 592
INTERNET CENSORSHIP
(FALL 2015)
LECTURE 10
PHILLIPA GILL – STONY BROOK UNIVERSITY
WHERE WE ARE
Last time:
• Case study: Iran + Pakistan
• Questions?
• Revisit hands on activity …
RIPEstat page for AS 12880:
https://stat.ripe.net/AS12880#tabId=at-a-glance
Try looking up other Iranian networks
NDT data in Google
http://www.google.com/publicdata/explore?ds=e9krd11m38onf_&ctype=l&strail=false&bcs=d&nselm=h&met_y=download_throughput&scale_y=lin&ind_y=false&rdim=country&idim=country:364&ifdim=country&ind=false
OOKLA Speed test:
http://www.google.com/publicdata/explore?ds=z8ii06k9csels2_&ctype=l&met_y=avg_download_speed
ASSIGNMENT 1 DISCUSSION
Some questions led to confusion
Part 1
• Q2 who are the source and destination (most people got this):
• Source ICSI ; Destination Baidu
• (accepted anything from the destination whois since it was a
weird looking record).
• Q3 Why did Wireshark flag Packet 8 as a retransmission?
• Q4 What is unusual about the response in packet 9?
• (there was a typo in this question, I reduced total by 3 for it)
• Q5 (most people got this): Is it the same device responding to
packet 8 as the first GET (packet 4)?
ASSIGNMENT 1 DISCUSSION
Some questions led to confusion
Part 2
• Q2 What is missing from the TCP connection?
• Handshake!
• Q3 Flow or packet level censor? (a lot of people had trouble
with this!)
• Q4 What is returned in response to the HTTP GET?
Part 3
• On path or in path censor? Why?
TEST YOUR KNOWLEDGE (IRAN)
1. How did the government assert control over the Internet in
2001?
2. What were the conflicting goals of Iran in implementing
censorship?
3. What was the end result of these?
4. What is the “campaign for halal Internet”? Why did people fear
this?
5. What is the purpose of IP addresses designated in RFC1918?
6. What are two techniques used by Anderson to map an internal
10.x.x.x address to an external IP?
7. What is the idea of “dimming the Internet”?
8. How can “dimming” be measured?
9. How did the pseudonymous paper identify filtering based on the
host header?
10. What type of proxy did they find?
TEST YOUR KNOWLEDGE (PAKISTAN)
1. Why is censorship in Pakistan well known?
2. Which protocol allowed collateral damage when they
censored YouTube?
3. What is the main form of censorship in Pakistan identified by
the Web censorship paper?
4. What are the most common circumvention techniques in
Pakistan?
5. What are some circumvention techniques that were found to
work for Pakistan censorship?
6. Which product did CitizenLab find in Pakistan?
7. How did they find the IP of installations of the product?
8. How did they verify that the installations were used for
censorship?
TODAY
Case Study: China
• Background (ONI report)
• Concept Doppler (Crandall et al.)
• http://www.csd.uoc.gr/~hy558/papers/conceptdoppler.pdf
• Locating the censors (Xu et al.)
• http://pam2011.gatech.edu/papers/pam2011--Xu.pdf
• Great Cannon: seminar Tuesday October 13 @ 1pm!
BACKGROUND
• China: one-party state, ruled by the Chinese Communist Party
(CCP)
• Conflict between IT development and ability to contain sensitive
or threatening information
Several milestones challenged gov’t control:
• Anniversary of Tibetan Uprising in 2008: Protests in Lhasa
• 2008 Beijing Olympics, pressures to lessen censorship
• Foreign reporters had unprecedented access within the country
• Domestic news still highly restricted
• Unfettered Internet access for foreign journalists restricted to
“games-related” Web sites
• 2009 Riots in Urumqi led to Internet restrictions to “quench the
riot … and prevent violent”
• State sponsored Web site access (31 sites) slowly restored but
Internet access in Xinjiang was effectively severed for 10 months
BACKGROUND 2
• 2010 Google refuses to comply with legal requirements of content
filtering in China
• Attempts to hack Gmail accounts of human rights activists
• Google wanted to establish a truly free and open search engine or
officially close Google.cn
• Google’s actions placed PRC’s censorship practices in the
international spotlight
• China’s population means that even at 28.9% Internet penetration
the country has the most Internet users in the world
• Mobile networking is important for bridging the rural/urban divide
• 10% of users access the Internet only on a mobile device
• Widespread use of the Internet has changed public discourse,
exposing corruption among government officials and even leading to
the dismissal of senior officials
INTERNET FILTERING IN CHINA
• Initial project: Green Dam Youth Escort project
• Filtering at the level of the user’s computer
• Analysis by ONI + StopBadware showed that it wasn’t effective
in blocking all pornography and would unpredictably block
political and religious content
• Follow on project: “Blue Dam” with more features mandated
to be installed by ISPs
• Server side/ISP-level blocking of content deemed to be
inappropriate
• Blog services are also responsible for policing content on
their sites
• Must install keyword filters + delete accounts of violating users
• More on this in the online social networking lecture …
TODAY’S READINGS
ConceptDoppler: A Weather Tracker for Internet Censorship.
Crandall et al. 2007
Internet Censorship in China: Where Does the Filtering Occur?
Xu et al. 2011
CONCEPTDOPPLER
• Goals: Develop the capability to monitor both the technical
censorship mechanism and how it is used
• E.g., to answer questions such as why was “keyword blocked”
in a given place and time
• Focus on keyword blocking: less collateral damage, less
negative fall out than blunt censorship techniques
• Specific aim to monitor the set of blocked keywords over time
and monitor for variations between regions
• Useful for circumvention (e.g., encoding keywords to make
them unidentifiable to DPI boxes).
• The need for continuous monitoring of censored keywords
requires efficient probing:
• Latent semantic analysis + techniques from Web search to
minimize the set of keywords that must be probed.
EXPERIMENT 1: TEST GFC
72 hours of HTTP GETs to www.yahoo.cn
1. Send FALUN (blocked keyword) until a RST is received
2. Switch to send TEST (benign word) until a valid HTTP
response is received
Observation: hosts are blocked from communicating for 90
seconds after sending the bad keyword.
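The two-step alternation above can be sketched as a small loop. This is only an illustrative sketch, not the authors' probing code: `send_get` is a hypothetical callable standing in for a real HTTP GET to the target, and `SimulatedCensor` is a toy model that counts probes rather than seconds (its `penalty` of 3 probes is an arbitrary stand-in for the 90 second window).

```python
from typing import Callable, List, Tuple

RST = "RST"  # probe was answered with a TCP reset
OK = "OK"    # probe received a valid HTTP response


def run_probe_cycle(send_get: Callable[[str], str],
                    max_probes: int = 1000) -> List[Tuple[str, str]]:
    """Alternate between the blocked and the benign keyword.

    send_get(keyword) is a hypothetical helper that issues one GET
    containing `keyword` and returns RST or OK.
    """
    log = []
    keyword = "FALUN"
    for _ in range(max_probes):
        result = send_get(keyword)
        log.append((keyword, result))
        if keyword == "FALUN" and result == RST:
            keyword = "TEST"    # step 2: wait for the block to lift
        elif keyword == "TEST" and result == OK:
            keyword = "FALUN"   # back to step 1
    return log


class SimulatedCensor:
    """Toy censor: resets the blocked keyword and the next `penalty` probes."""

    def __init__(self, penalty: int = 3):
        self.penalty = penalty
        self.remaining = 0

    def __call__(self, keyword: str) -> str:
        if keyword == "FALUN":
            self.remaining = self.penalty
            return RST
        if self.remaining > 0:
            self.remaining -= 1
            return RST
        return OK


if __name__ == "__main__":
    for entry in run_probe_cycle(SimulatedCensor(penalty=3), max_probes=6):
        print(entry)
```

In a real measurement the number of benign probes that still get reset, together with their timestamps, is what would reveal the length of the blocking window.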
RESULTS: BLOCKING IS NOT 100% RELIABLE
Diurnal trends in filtering effectiveness
White = not blocked; grey = blocked
X = 0 is 15:00 in Beijing
Y = # of probes
EXPERIMENT 2: FIND GFC FIREWALLS
Target URLs probed = top 100 URLs returned by Google for the queries site:X, where
X = .cn, .com.cn, .edu.cn, .org.cn, and .net.cn
No RSTs observed if the connection was not open -> GFC is stateful
WHERE ARE THE CENSORS?
FINDING KEYWORDS EFFICIENTLY
• Latent semantic analysis (LSA) to find keywords related to
concepts the government might filter
• Details of LSA in the paper.
• Experimental set up:
• Gather all the pages of Chinese language Wikipedia
• 12 term lists based on 12 general concepts (terms are selected
to be most related to the concept via LSA)
• Probed 2,500 terms from each of the 12 lists
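To give a feel for how LSA ranks terms by relatedness to a concept, here is a minimal sketch on a toy corpus. It is not the paper's pipeline: the four English "documents" are invented for illustration (the paper used Chinese language Wikipedia), the function names are hypothetical, and the truncated SVD is approximated with plain power iteration so that the example needs only the standard library.

```python
import math
import random

# Invented toy corpus: two "politics" documents and two "sports" documents.
DOCS = [
    "protest police crackdown",
    "protest crackdown arrest",
    "football match goal",
    "football goal score",
]


def lsa_term_vectors(docs, k=2, iters=200, seed=0):
    """Return {term: k-dim vector} from a rank-k approximation of the
    term-document count matrix (power iteration with deflation)."""
    vocab = sorted({w for d in docs for w in d.split()})
    a = [[float(d.split().count(t)) for d in docs] for t in vocab]
    n = len(vocab)
    # Gram matrix A * A^T; its eigenvectors are the left singular vectors of A.
    gram = [[sum(x * y for x, y in zip(a[i], a[j])) for j in range(n)]
            for i in range(n)]
    rng = random.Random(seed)
    basis = []  # list of (singular value, unit vector)
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            v = [sum(g * x for g, x in zip(row, v)) for row in gram]
            for _, u in basis:  # remove directions already found
                dot = sum(x * y for x, y in zip(v, u))
                v = [x - dot * y for x, y in zip(v, u)]
            norm = math.sqrt(sum(x * x for x in v))
            v = [x / norm for x in v]
        lam = sum(v[i] * sum(g * x for g, x in zip(gram[i], v))
                  for i in range(n))
        basis.append((math.sqrt(max(lam, 0.0)), v))
    return {t: [s * u[i] for s, u in basis] for i, t in enumerate(vocab)}


def cosine(p, q):
    dot = sum(x * y for x, y in zip(p, q))
    return dot / (math.sqrt(sum(x * x for x in p)) *
                  math.sqrt(sum(x * x for x in q)))


def rank_terms(vectors, concept):
    """Terms sorted from most to least similar to the concept term."""
    return sorted(vectors, key=lambda t: -cosine(vectors[t], vectors[concept]))


if __name__ == "__main__":
    vecs = lsa_term_vectors(DOCS)
    for term in rank_terms(vecs, "protest"):
        print(term, round(cosine(vecs[term], vecs["protest"]), 3))
```

Note that "police" and "arrest" never appear in the same toy document, yet they end up close in the reduced space because they share context words. That is the property that lets LSA propose candidate keywords related to a sensitive concept.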
NUMBER OF KEYWORDS FOUND
X axis = bins of 250 keywords
Y axis = # of blocked keywords found
Key takeaway: the 2,500 tested terms selected via LSA contain many more
blocked keywords than randomly chosen terms
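The axes of this plot correspond to a simple binned count. The sketch below assumes a ranked term list and a set of terms observed to be blocked; both inputs and the function name are hypothetical, and the default bin size of 250 follows the X axis described above.

```python
def blocked_per_bin(ranked_terms, blocked, bin_size=250):
    """Count blocked keywords in consecutive bins of a ranked term list."""
    counts = []
    for start in range(0, len(ranked_terms), bin_size):
        chunk = ranked_terms[start:start + bin_size]
        counts.append(sum(1 for term in chunk if term in blocked))
    return counts


if __name__ == "__main__":
    # Tiny made-up example with a bin size of 3 instead of 250.
    terms = ["a", "b", "c", "d", "e", "f", "g"]
    print(blocked_per_bin(terms, {"a", "b", "f"}, bin_size=3))
```

If LSA ranking works, the counts should be concentrated in the first bins, whereas a randomly ordered list would spread them roughly evenly.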
LIMITATIONS
• Scale: Querying each 2,500-word list takes 1.2 to 6.7 hours;
heavy use of network resources
• False positives + false negatives
• No way to claim these lists are exhaustive …
Potential for evasion:
• If we know which keywords are filtered, we can be clever and hide
them from the filter
• E.g., fragment IP packets on the keyword
• HTML comments in the word: fa<!-- comment -->lun
• Use different encodings for keywords: F%61lun Gong
• Put keywords in Captchas or insert other characters: f@lun
g0ng
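Two of the rewriting tricks above (percent-encoding a character and splitting the word with an HTML comment) can be sketched in a few lines. The helper names are hypothetical, the encoder assumes an ASCII character, and `naive_filter_matches` is only a toy substring matcher; whether a real DPI box normalizes such input before matching is not something this sketch can tell you.

```python
from urllib.parse import unquote


def percent_encode_char(keyword: str, index: int) -> str:
    """Replace one ASCII character with its %XX URL encoding."""
    ch = keyword[index]
    return keyword[:index] + "%{:02X}".format(ord(ch)) + keyword[index + 1:]


def split_with_html_comment(keyword: str, index: int,
                            comment: str = "comment") -> str:
    """Insert an HTML comment inside the keyword at the given position."""
    return keyword[:index] + "<!-- " + comment + " -->" + keyword[index:]


def naive_filter_matches(payload: str, keyword: str) -> bool:
    """Toy censor: case-insensitive substring match on the raw bytes."""
    return keyword.lower() in payload.lower()


if __name__ == "__main__":
    encoded = percent_encode_char("Falun Gong", 1)
    print(encoded)            # F%61lun Gong
    print(unquote(encoded))   # Falun Gong
    print(split_with_html_comment("falun", 2))
    print(naive_filter_matches(encoded, "falun"))
```

The receiving server or browser still recovers the original word (URL decoding, or ignoring the comment when rendering), while a matcher that looks only at the raw string does not see it.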
READING 2: MAPPING THE GFC
• Presentation
HANDS ON ACTIVITY
Online reports of Chinese censorship:
https://en.greatfire.org/
http://www.herdict.org/explore/indepth?fc=CN
http://www.google.com/transparencyreport/traffic/explorer/?r=CN&l=EVERYTHING
China Chats data: https://china-chats.net/
Video chat censorship:
https://www.usenix.org/sites/default/files/conference/protectedfiles/foci15_slides_knockel.pdf