CSE 592 INTERNET CENSORSHIP (FALL 2015)
LECTURE 10
PHILLIPA GILL – STONY BROOK UNIVERSITY

WHERE WE ARE
Last time:
• Case study: Iran + Pakistan
• Questions?
• Revisit hands on activity …
RIPEstat page for AS 12880: https://stat.ripe.net/AS12880#tabId=at-a-glance
Try looking up other Iranian networks
NDT data in Google: http://www.google.com/publicdata/explore?ds=e9krd11m38onf_&ctype=l&strail=false&bcs=d&nselm=h&met_y=download_throughput&scale_y=lin&ind_y=false&rdim=country&idim=country:364&ifdim=country&ind=false
OOKLA Speed test: http://www.google.com/publicdata/explore?ds=z8ii06k9csels2_&ctype=l&met_y=avg_download_speed

ASSIGNMENT 1 DISCUSSION
Some questions led to confusion
Part 1
• Q2 Who are the source and destination? (most people got this)
  • Source: ICSI; Destination: Baidu
  • (accepted anything from the destination whois since it was a weird looking record)
• Q3 Why did Wireshark flag Packet 8 as a retransmission?
• Q4 What is unusual about the response in packet 9?
  • (there was a typo in this question, I reduced total by 3 for it)
• Q5 Is it the same device responding to packet 8 as the first GET (packet 4)? (most people got this)

ASSIGNMENT 1 DISCUSSION
Some questions led to confusion
Part 2
• Q2 What is missing from the TCP connection?
  • Handshake!
• Q3 Flow or packet level censor? (a lot of people had trouble with this!)
• Q4 What is returned in response to the HTTP GET?
Part 3
• On path or in path censor? Why?

TEST YOUR KNOWLEDGE (IRAN)
1. How did the government assert control over the Internet in 2001?
2. What were the conflicting goals of Iran in implementing censorship?
3. What was the end result of these?
4. What is the “campaign for halal Internet”? Why did people fear this?
5. What is the purpose of IP addresses designated in RFC1918?
6. What are two techniques used by Anderson to map an internal 10.x.x.x address to an external IP?
7. What is the idea of “dimming the Internet”?
8. How can “dimming” be measured?
9. How did the pseudonymous paper identify filtering based on the host header?
10. What type of proxy did they find?

TEST YOUR KNOWLEDGE (PAKISTAN)
1. Why is censorship in Pakistan well known?
2. Which protocol allowed collateral damage when they censored YouTube?
3. What is the main form of censorship in Pakistan identified by the Web censorship paper?
4. What are the most common circumvention techniques in Pakistan?
5. What are some circumvention techniques that were found to work for Pakistan censorship?
6. Which product did CitizenLab find in Pakistan?
7. How did they find the IP of installations of the product?
8. How did they verify that the installations were used for censorship?

TODAY
Case Study: China
• Background (ONI report)
• Concept Doppler (Crandall et al.)
  • http://www.csd.uoc.gr/~hy558/papers/conceptdoppler.pdf
• Locating the censors (Xu et al.)
  • http://pam2011.gatech.edu/papers/pam2011--Xu.pdf
• Great Cannon
Seminar Tuesday October 13 @ 1pm!

BACKGROUND
• China: one-party state, ruled by the Chinese Communist Party (CCP)
• Conflict between IT development and ability to contain sensitive or threatening information
Several milestones challenged gov’t control:
• Anniversary of Tibetan Uprising in 2008: Protests in Lhasa
• 2008 Beijing Olympics, pressures to lessen censorship
  • Foreign reporters had unprecedented access within the country
  • Domestic news still highly restricted
  • Unfettered Internet access for foreign journalists restricted to “games-related” Web sites
• 2009 Riots in Urumqi led to Internet restrictions to “quench the riot … and prevent violent”
  • State sponsored Web site access (31 sites) slowly restored but Internet access in Xinjiang was effectively severed for 10 months

BACKGROUND 2
• 2010 Google refuses to comply with legal requirements of content filtering in China
  • Attempts to hack gmail accounts of human rights activists
  • Google wanted to establish a truly free and open search engine or officially close Google.cn
  • Google’s actions placed PRC’s censorship practices in the international spotlight
• China’s population means that even at 28.9% Internet penetration the country has the most Internet users in the world
• Mobile networking is important for bridging the rural/urban divide
  • 10% of users access the Internet only on a mobile device
• Widespread Internet use has changed public discourse, exposing corruption among government officials and even leading to the dismissal of senior officials

INTERNET FILTERING IN CHINA
• Initial project: Green Dam Youth Escort project
  • Filtering at the level of the user’s computer
  • Analysis by ONI + StopBadware showed that it wasn’t effective in blocking all pornography and would unpredictably block political and religious content
• Follow on project: “Blue Dam” with more features mandated to be installed by ISPs
  • Server side/ISP-level blocking of content deemed to be inappropriate
• Blog services are also responsible for policing content on their sites
  • Must install keyword filters + delete accounts of violating users
  • More on this in the online social networking lecture …

TODAY’S READINGS
ConceptDoppler: A Weather Tracker for Internet Censorship. Crandall et al. 2007
Internet Censorship in China: Where Does the Filtering Occur? Xu et al. 2011

CONCEPTDOPPLER
• Goals: Develop the capability to monitor both the technical censorship mechanism and how it is used
  • E.g., to answer questions such as why was “keyword blocked” in a given place and time
• Focus on keyword blocking: less collateral damage, less negative fall out than blunt censorship techniques
• Specific aim to monitor the set of blocked keywords over time and monitor for variations between regions
  • Useful for circumvention (e.g., encoding keywords to make them unidentifiable to DPI boxes).
• The need for continuous monitoring of censored keywords requires efficient probing:
  • Latent semantic analysis + techniques from Web search to minimize the set of keywords that must be probed.
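
To make the circumvention point above concrete, here is a minimal sketch of percent-encoding one character of a keyword so that a filter doing only literal string matching no longer sees it, while a server that URL-decodes the request still recovers the original word. The function names and the toy filter are hypothetical and are not taken from the ConceptDoppler paper; a real DPI box may normalize encodings before matching, so this illustrates the idea only.

```python
from urllib.parse import unquote


def percent_encode_char(keyword: str, index: int) -> str:
    """Replace the character at `index` with its %XX percent-encoding."""
    encoded = "".join(f"%{b:02X}" for b in keyword[index].encode("utf-8"))
    return keyword[:index] + encoded + keyword[index + 1:]


def naive_filter_matches(payload: str, blocked_keyword: str) -> bool:
    """Hypothetical filter: case-insensitive literal substring match only."""
    return blocked_keyword.lower() in payload.lower()


# Encode the 'a' (byte 0x61) in the keyword.
obfuscated = percent_encode_char("falun", 1)

print(obfuscated)                                          # encoded form sent on the wire
print(naive_filter_matches(obfuscated, "falun"))           # what the toy filter sees
print(naive_filter_matches(unquote(obfuscated), "falun"))  # what a decoding server sees
```

A monitoring system that knows the current blocked keyword list could apply a transformation like this only to the terms that need it, which is one reason tracking the list over time is useful.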

EXPERIMENT 1: TEST GFC
72 hours of HTTP GETs to www.yahoo.cn
1. Send FALUN (blocked keyword) until a RST is received
2. Switch to send TEST (benign word) until a valid HTTP response is received
Observation: hosts are blocked from communicating for 90 seconds after sending the bad keyword.

RESULTS: BLOCKING IS NOT 100% RELIABLE
Diurnal trends in filtering effectiveness
White = not blocked; grey = blocked
X = 0 is 15:00 in Beijing
Y = # of probes

EXPERIMENT 2: FIND GFC FIREWALLS
Target URLs probed = top 100 URLs returned by Google for the queries site:X, where X = .cn, .com.cn, .edu.cn, .org.cn, and .net.cn
No RSTs observed if the connection was not open -> GFC is stateful

WHERE ARE THE CENSORS?

FINDING KEYWORDS EFFICIENTLY
• Latent semantic analysis (LSA) to find keywords related to concepts the government might filter
  • Details of LSA in the paper.
• Experimental set up:
  • Gather all the pages of Chinese language Wikipedia
  • 12 term lists based on 12 general concepts (terms are selected to be most related to the concept via LSA)
  • Probed 2,500 terms from each of the 12 lists

NUMBER OF KEYWORDS FOUND
X axis = bins of 250 keywords
Y axis = # of blocked keywords found
Key takeaway: the 2,500 tested terms selected via LSA contain many more blocked keywords than randomly chosen terms

LIMITATIONS
• Scale: Querying each 2,500 word list takes 1.2 - 6.7 hours; heavy use of network resources
• False positives + false negatives
• No way to claim these lists are exhaustive …
Potential for evasion:
• If we know what keywords are filtered when we observe them, we can be clever and hide them
  • E.g., fragment IP packets on the keyword
  • HTML comments in the word: fa<!-- comment -->lun
  • Use different encodings for keywords: F%61lun Gong
  • Put keywords in Captchas or insert other characters: f@lun g0ng

READING 2: MAPPING THE GFC
• Presentation

HANDS ON ACTIVITY
Online reports of Chinese censorship:
https://en.greatfire.org/
http://www.herdict.org/explore/indepth?fc=CN
http://www.google.com/transparencyreport/traffic/explorer/?r=CN&l=EVERYTHING
China Chats data: https://china-chats.net/
Video chat censorship: https://www.usenix.org/sites/default/files/conference/protectedfiles/foci15_slides_knockel.pdf