Shining Light in Dark Places: Understanding the Tor Network

Damon McCoy¹, Kevin Bauer¹, Dirk Grunwald¹, Tadayoshi Kohno², Douglas Sicker¹
¹ University of Colorado
² University of Washington

Rumors Abound: How Are PETs Being Used?

This talk shows how Tor is used in practice
– How is Tor being used? What application layer protocols are common in the network?
– How is Tor being mis-used? What sorts of malicious behavior can one find?
– Who is using Tor? From where do Tor users and routers originate?

Background: Tor uses layered encryption to protect each client’s identity

[Figure: a circuit from the Client (Tor Proxy) through an Entry Guard, a Middle Router, and an Exit Router to the Destination Server; the Directory Server provides the router list]

Tor provides anonymity for TCP by tunneling traffic through a circuit of three Tor routers using a layered encryption technique

Running a Tor router can tell us who uses Tor and what applications are used

• Who uses Tor? Entry guards can directly observe client identities
• How is Tor used? Exit routers can observe
– destination server
– decrypted payload

[Figure: client “Papa Smurf” reaching www.google.com over the HTTP protocol through our Tor router]

Talk Outline
• Data collection methodology
– Entrance router logging
– Exit router logging
• How is Tor being used?
– Protocol distribution
– Interactive/non-interactive protocols
– Insecure/secure (SSL) protocols
• How is Tor being mis-used?
– Router abuse
– Client abuse
• Who is using Tor?
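To make the layered encryption from the background slide concrete, here is a toy sketch in Python. It is not Tor's actual cryptography: the keystream construction, the key values, and the function names are all assumptions chosen for illustration. It only shows the idea that the client wraps a payload in one layer per router and each router on the circuit removes exactly one layer.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode). Illustration only, NOT secure."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(out[:length])


def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one layer (XOR with the keystream is its own inverse)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def client_wrap(payload: bytes, hop_keys: list[bytes]) -> bytes:
    """Client applies the exit router's layer first and the entry guard's last."""
    for key in reversed(hop_keys):
        payload = xor_layer(key, payload)
    return payload


# Hypothetical per-hop keys, ordered: entry guard, middle router, exit router.
HOP_KEYS = [b"guard-key", b"middle-key", b"exit-key"]
PAYLOAD = b"GET / HTTP/1.1"

cell = client_wrap(PAYLOAD, HOP_KEYS)
after_guard = xor_layer(HOP_KEYS[0], cell)          # entry guard strips its layer
after_middle = xor_layer(HOP_KEYS[1], after_guard)  # middle router strips its layer
after_exit = xor_layer(HOP_KEYS[2], after_middle)   # exit router sees the payload
```

In this sketch only the last hop recovers the payload, which mirrors why the exit router is the vantage point for observing destinations and decrypted traffic.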
We deploy a Tor router and collect data about clients and exit traffic

• We set up a Tor router on a 1 Gb/s network link during December 2007 and January 2008
• It is necessary to run the router twice, in two distinct configurations:
– To maximize client observations: block all exit traffic (necessary to be marked “entry guard”)
– To maximize destination observations: use default exit policy

Entry and Middle Router Logging Details

• Ran our router from January 15-30, 2008 (15 days)
• No exit traffic allowed
• Marked as an entry guard
• Client identification: for each cell received, ask the directory server whether the previous hop is a router
– Example: Cell 10 received. From: 128.138.243.151, port 5544 (cs.colorado.edu). Next: 128.31.0.34, port 9001 (moria.csail.mit.edu)
– No, the previous hop is not a router: 128.138.243.151 is a client
– Yes, the previous hop is a router: we’re the middle router

Exit Router Logging Details

• Ran our router from December 15-19, 2007 (4 days)
• Used tcpdump to capture only the first 150 bytes of each packet:

  Ethernet header             14 bytes
  IP header                   20 bytes
  TCP header (no options)     20 bytes
  Application header          96 bytes
  Total captured             150 bytes

• Our router transported 709 GB of TCP exit traffic
• Used ethereal for application protocol identification
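The client-identification rule from the entry and middle router logging slide can be sketched as follows. This is a minimal illustration, not the authors' logging code: the function name and the in-memory set standing in for the directory server's router list are assumptions; the two addresses are the ones shown in the slide example.

```python
def observe_cell(prev_ip: str, router_ips: set[str], clients: set[str]) -> str:
    """Classify our position on a circuit from the previous hop's address.

    If the previous hop is a listed router we are the middle router;
    otherwise the previous hop is a Tor client and we are its entry guard.
    """
    if prev_ip in router_ips:
        return "middle"
    clients.add(prev_ip)  # record the client address for later country mapping
    return "entry"


# Stand-in for the router list obtained from the directory servers.
ROUTER_IPS = {"128.31.0.34"}  # moria.csail.mit.edu in the slide example

clients: set[str] = set()
position_a = observe_cell("128.138.243.151", ROUTER_IPS, clients)  # slide example
position_b = observe_cell("128.31.0.34", ROUTER_IPS, clients)
```

Keeping the observed clients in a set means a client that builds many circuits through the router is still counted once.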
Understanding the Protocol Distribution (1)
Most connections are interactive HTTP

Number of TCP Connections by Protocol

  HTTP                 12,160,437 (92.45%)
  SSL                     534,666 (4.06%)
  BitTorrent              438,395 (3.33%)
  Instant Messaging        10,506 (0.08%)
  E-Mail                    7,611 (0.06%)
  FTP                       1,338 (0.01%)
  Telnet                    1,045 (0.01%)

• Over 13 million exit connections were observed
• HTTP makes up over 92% of the connections through Tor
• Only 3.5% of HTTP connections transported more than 1 MB

Understanding the Protocol Distribution (2)
Non-interactive traffic uses too much bandwidth

Total Number of Bytes Transported by Protocol

  HTTP                 411 GB (57.90%)
  SSL                   11 GB (1.55%)
  BitTorrent           285 GB (40.20%)
  Instant Messaging    735 MB (0.10%)
  E-Mail               291 MB (0.04%)
  FTP                  792 MB (0.11%)
  Telnet               110 MB (0.02%)

HTTP makes up 58% of the bytes, but BitTorrent accounts for a disproportionate amount of bandwidth: ~40% of bandwidth used by ~3% of connections

Insecure protocols are common (and dangerous!)

Problem: Insecure protocols (non-SSL) are very common
• Only 2% of destination servers provide SSL

Why is this bad?

[Figure: one circuit (Circuit ID: 3) through a malicious exit router carries both plaintext instant messaging traffic (IM name: PapaSmurf) and an SSL connection to Bank of Belgium]

• The malicious exit knows the client’s identity (PapaSmurf)
• Even worse, the exit knows the client for all traffic from circuit 3

Mitigating Information Leakage through Insecure Protocols

(1) Block insecure protocols (port-based blocking)
– Example: “telnet cs.colorado.edu” through the local Tor proxy: port 23, denied
– WarnPlaintextPorts and RejectPlaintextPorts options now in Tor spec
(2) Improve HTTP proxy to recognize identity leakage over HTTP
– Example request that leaks an identity:

    POST /login.jsp HTTP/1.1
    Host: www.myspace.com

    Userid=papa_smurf&password=this_is_my_password

(3) Do not multiplex secure and non-secure traffic on same circuits

Malicious Router Behavior: Detecting Routers that Log Exit Traffic

How can we tell if an exit router is logging traffic?
– Remotely detecting a network interface in promiscuous mode is hard
– Remotely detecting that a packet sniffer is running is hard

Key observation: tcpdump performs reverse DNS queries in real time (by default)

[Figure: the client (or the client’s DNS server) asks the authoritative DNS server “Who is 128.138.242.23?” and receives the answer www.cs.colorado.edu]

Malicious Router Behavior: Method to Detect Routers that Log Exit Traffic

[Figure: a Tor client (we control) sends a SYN for the vacant IP address 1.1.1.1 over a circuit through the Tor network; a logging exit router triggers a query to the authoritative DNS server for IP 1.1.1.1 (we control)]

Step 1. Run authoritative DNS server for a vacant IP address
Step 2. Use Tor client to send a TCP SYN packet to our IP address through every exit router

Malicious Router Behavior: Results and Detection Optimizations

• In one day, we observed reverse DNS queries from an exit router’s DNS server shortly after probing port 110 (POP3) through that router
• But what if reverse DNS is not performed? Use a honeypot and send unique (username, password) combinations through each exit router

[Figure: a Tor client (we control) runs “ssh 1.1.1.1” with user root and password passwd over a circuit through a logging exit router to the honeypot at IP 1.1.1.1 (we control); the exit router sees root/passwd]

Malicious Client Behavior: What types of Abuse are Common?

Excerpts from abuse complaints:

• Dear XXXXXXX: We are writing this letter on behalf of Warner Bros. Entertainment Inc. ("Warner Bros."). We have received information that an individual has utilized the below-referenced IP address at the noted date and time to offer downloads of copyrighted motion picture(s) through a "peer-to-peer" service, including such title(s) as V For Vendetta – DVD. The distribution of unauthorized copies of copyrighted motion pictures constitutes copyright infringement under the Copyright Act, Title 17 United States Code Section 106(3).

• Dear Security coordinators, looking I found these suspicious connections on the Undernet IRC Chat Network connecting from a netblock you control. They were mostly connecting to the servers that XXXXXX resolves to, most likely on port 6667. Other possible ports include 6660-7000, 8888, and 8080. Please check for a compromise, possible hidden running process and an altered process listing. Run the updates for your system to close possible exploit holes, … and send any unusual programs found to info@cyberabuse.org for investigation.

• Hacking attempts: Dear Sir, We are sorry to inform you that earlier this morning your server was used by hackers to attack a Swedish newspaper. We kindly ask you to check your logs.

Who uses Tor? Mapping Client and Router IPs to Countries

• Tor clients’ IPs can be observed when our router is used as the client’s entry guard
• We map client IPs to their countries of origin:
– Remove Tor routers (using directory servers)
– Query IP assignment authorities (ARIN, APNIC, LACNIC, RIPE, and AFRINIC)

[Figure: asking the American Registry for Internet Numbers where 128.138.243.151 (cs.colorado.edu) is located; answer: United States of America]

Who uses Tor? Geopolitical Client Distribution: One Day Snapshot

Geopolitical Client Distribution

  Germany          2,304 (30.4%)
  China              988 (13.0%)
  United States      864 (11.4%)
  Italy              254 (3.4%)
  Turkey             221 (2.9%)

• 7,571 unique clients observed during the one-day snapshot
• Clients from over 126 (of 195) countries observed during the 15-day observation
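The client-to-country mapping behind the one-day snapshot (drop known routers, look up each remaining address, count per country) can be sketched as below. This is an illustration under stated assumptions, not the authors' tooling: a small in-memory prefix table stands in for queries to the regional registries, the /16 prefix length for the slide's example address is assumed, and the country labels on the documentation-range prefixes are made up.

```python
import ipaddress
from collections import Counter

# Hypothetical prefix-to-country table standing in for registry lookups.
PREFIX_COUNTRY = {
    ipaddress.ip_network("128.138.0.0/16"): "United States",  # slide example address; prefix length assumed
    ipaddress.ip_network("192.0.2.0/24"): "Germany",           # documentation range, made-up label
    ipaddress.ip_network("198.51.100.0/24"): "China",          # documentation range, made-up label
}


def country_of(ip: str) -> str:
    """Return the country for an address, or 'Unknown' if no prefix matches."""
    addr = ipaddress.ip_address(ip)
    for network, country in PREFIX_COUNTRY.items():
        if addr in network:
            return country
    return "Unknown"


def client_distribution(observed_ips, router_ips):
    """Count unique non-router addresses per country with their percentage share."""
    clients = set(observed_ips) - set(router_ips)
    counts = Counter(country_of(ip) for ip in clients)
    total = sum(counts.values())
    return {country: (n, 100.0 * n / total) for country, n in counts.items()}


observed = ["128.138.243.151", "192.0.2.7", "192.0.2.8",
            "198.51.100.1", "128.31.0.34", "192.0.2.7"]
routers = ["128.31.0.34"]  # listed by the directory servers, so not a client
distribution = client_distribution(observed, routers)
```

The same share calculation applied to the slide's counts gives, for example, 2,304 / 7,571 ≈ 30.4% for Germany.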
Geopolitical Client Distribution: Relative Tor Usage

Relative Tor Usage

  Germany    7.73
  Turkey     2.47
  Italy      1.37
  Russia     0.89
  China      0.84

Relative Tor popularity(X) = (Fraction of Tor users from country X) / (Fraction of Internet users from country X)

A value above 1 means Tor is disproportionately popular in that country.

Example, Germany: 30.4% of Tor users, 3.9% of Internet users, reported ratio 7.73 (the rounded percentages shown give about 7.8, so the reported value was presumably computed from unrounded shares)

Who uses Tor? Geopolitical Router Distribution

Geopolitical Router Distribution

  Germany          374 (31.5%)
  United States    326 (27.4%)
  France            69 (5.9%)
  China             40 (3.4%)
  Italy             36 (3.0%)

• Over 7 days, hourly snapshots of router IPs are taken from directory servers
• On average 1,188 unique routers were observed per hour

Who uses Tor? Geopolitical Router Distribution: Implications for Enforcing Location Diversity

Router Bandwidth Distribution

  Germany          45%
  United States    23%
  France            9%
  Netherlands       4%
  China             2%
  Other            17%

• Most bandwidth is in just a few countries
• Germany and US provide 59% of running routers, but 68% of the bandwidth

Who uses Tor? Modeling Router Utilization: One hour snapshot

Compute a probability density function (PDF) of router utilization
• ~1,400 unique routers observed in 1 hour
• Most popular router transports 4.1% of all traffic observed
• 2% (27) of the routers transport 50% of the total traffic; located in 6 countries

This utilization model provides:
1. Insight into traffic analysis vulnerabilities
2. The basis for simulating Tor-like networks

Summary
• How is Tor used?
– Primarily interactive HTTP and BitTorrent
– Non-SSL protocols are common
• How is Tor mis-used?
– Exit routers logging sensitive information
– Copyright infringement allegations, hacking attempts, and web page defacement
• Who uses Tor?
– Most clients come from Germany, China, and the US
– Location diversity in Tor’s routing is currently difficult to guarantee

Conclusion and Future Research Directions
• This study highlights the challenges of deploying a real anonymity service
• Additional research:
– Enforcing client accountability w/o degrading anonymity
– Protecting users from disclosing identifying information
– Encouraging router participation and location diversity

Dataset Release

We definitely want to share our data with the community
But, the obstacles are:
• Completely removing application-layer data
• Scrubbing all network identifiers in headers (IP addresses, TCP ports, etc.)
• Server to host such a large data set (several GB compressed)
We’re currently working on these issues

Ethics of Our Data Collection Procedure

We understand that there are serious privacy concerns
• We capture only 96 bytes of application-level headers/data
• The data we collected cannot be used to link a sender/receiver pair
• Since we transported so much exit traffic, it was not feasible to perform protocol identification in real-time
• Application-level data is captured only for the purpose of automatic protocol identification
• We DO NOT examine the decrypted payloads of any packets, collect login credentials, or any personally identifying data

Was It Necessary to Consult the IRB?

• Institutional Review Boards (IRB) ensure that human research subjects are treated ethically
• It is the norm within the networking community to collect live network traces without consulting IRBs
• We present aggregated and anonymous results – absolutely no personally identifying information

Why Didn’t We Use GeoIP for Localization?

We found that GeoIP provided adequate coverage for North American and European IP addresses, but more limited coverage of other regions

Overcounting in Geopolitical Distributions?
We restricted our client observations to 24 hours to limit the influence of highly dynamic IP addresses, particularly in China and Germany

What Did We Do When We Caught The Eavesdropping Exit Router?

Upon detecting reverse DNS queries and failed login attempts from an exit router, we immediately contacted the Tor developers and shared our detection method and observations
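The failed login attempts mentioned above come from the honeypot technique described in the router-abuse slides: a unique (username, password) pair is sent through each exit router, so any later reuse of a pair at the honeypot points back to the router that carried it. The bookkeeping can be sketched as follows; the function names, credential format, and router labels are hypothetical, and this is not the authors' implementation.

```python
import secrets


def assign_credentials(exit_routers):
    """Give every exit router its own random (username, password) pair."""
    return {
        router: (f"user-{secrets.token_hex(4)}", secrets.token_hex(8))
        for router in exit_routers
    }


def attribute_logins(honeypot_attempts, credentials):
    """Return the exit routers whose unique credentials were replayed.

    honeypot_attempts holds (username, password) pairs seen at the honeypot
    that were NOT sent by our own client; pairs we never issued are ignored.
    """
    by_credential = {cred: router for router, cred in credentials.items()}
    return {by_credential[a] for a in honeypot_attempts if a in by_credential}


# Hypothetical exit router labels.
creds = assign_credentials(["exitA", "exitB", "exitC"])

# Suppose the honeypot later sees exitB's pair replayed, plus an unrelated guess.
replayed = [creds["exitB"], ("root", "passwd")]
suspects = attribute_logins(replayed, creds)
```

Because each pair is issued to exactly one router, a single replayed pair is enough to single out that router, even when no reverse DNS query is observed.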