Who uses Tor?

Shining Light in Dark Places:
Understanding the Tor Network
Damon McCoy (1)
Kevin Bauer (1)
Dirk Grunwald (1)
Tadayoshi Kohno (2)
Douglas Sicker (1)
(1) University of Colorado
(2) University of Washington
Rumors Abound: How Are PETs Being Used?
This talk shows how Tor is used in practice
– How is Tor being used? What application
layer protocols are common in the
network?
– How is Tor being mis-used? What sorts of
malicious behavior can one find?
– Who is using Tor? From where do Tor
users and routers originate?
Background: Tor uses layered encryption to
protect each client’s identity
[Figure: the client (Tor proxy) obtains the router list from a directory server and builds a circuit through an entry guard, a middle router, and an exit router to the destination server]
Tor provides anonymity for TCP by tunneling traffic through a
circuit of three Tor routers using a layered encryption technique
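The layering idea can be illustrated with a toy sketch. This is not Tor's actual cryptography: the SHA-256 based keystream, the key values, and the function names below are all assumptions made for illustration. It only shows that the client adds one layer per router and that each router on the circuit removes exactly one.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream from SHA-256 in counter mode (illustration only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one layer; XOR with the same keystream undoes itself."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def client_wrap(payload: bytes, hop_keys: list[bytes]) -> bytes:
    """Client applies one layer per hop; the entry guard's layer is outermost."""
    cell = payload
    for key in reversed(hop_keys):
        cell = xor_layer(key, cell)
    return cell

def route(cell: bytes, hop_keys: list[bytes]) -> list[bytes]:
    """Each router, in path order, strips its own layer.

    Returns what each hop holds after removing its layer.
    """
    views = []
    for key in hop_keys:
        cell = xor_layer(key, cell)
        views.append(cell)
    return views

# Hypothetical per-hop keys for entry guard, middle router, exit router.
keys = [b"entry-key", b"middle-key", b"exit-key"]
views = route(client_wrap(b"GET / HTTP/1.1", keys), keys)
```

In this sketch only the last element of `views` (the exit router's view) equals the original payload; the entry guard and middle router still hold layered data.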
Running a Tor router can tell us who uses Tor
and what applications are used
Who uses Tor?
– Entry guards can directly observe client identities
How is Tor used?
– Exit routers can observe the destination server and the decrypted payload
[Figure: our Tor router acting as entry guard for the client "Papa Smurf" and as exit router for HTTP traffic to www.google.com]
Talk Outline
• Data collection methodology
– Entrance router logging
– Exit router logging
• How is Tor being used?
– Protocol distribution
– Interactive/non-interactive protocols
– Insecure/secure (SSL) protocols
• How is Tor being mis-used?
– Router abuse
– Client abuse
• Who is using Tor?
We deploy a Tor router and collect data about
clients and exit traffic
• We set up a Tor router on a 1 Gb/s network link during December 2007 and January 2008
• It is necessary to run the router twice, in two distinct configurations:
– To maximize client observations: block all exit traffic (necessary to be marked “entry guard”)
– To maximize destination observations: use the default exit policy
Entry and Middle Router Logging Details
• Ran our router from January 15-30, 2008 (15 days)
• No exit traffic allowed
• Marked as an entry guard
• Client identification: for each cell received on a circuit, ask the directory server whether the previous hop is a router
– No: the previous hop is a client (and our router is its entry guard)
– Yes: we’re the middle router
[Figure: Cell 10 received. From: 128.138.243.151, port 5544 (cs.colorado.edu), a possible Tor client. Next: 128.31.0.34, port 9001 (moria.csail.mit.edu), the next router. 128.138.243.151 is not a router, so it is a client]
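The client identification rule can be sketched as a set lookup. This is a minimal sketch, assuming the router list from the directory server has already been loaded into a set of IP strings; the function names and the `(previous_hop_ip, next_hop_ip)` cell representation are hypothetical.

```python
def classify_previous_hop(prev_ip: str, known_router_ips: set[str]) -> str:
    """Return "router" if the previous hop is a listed Tor router, else "client"."""
    return "router" if prev_ip in known_router_ips else "client"

def collect_client_ips(cells, known_router_ips: set[str]) -> set[str]:
    """Collect previous-hop IPs that are not listed routers.

    `cells` is an iterable of (previous_hop_ip, next_hop_ip) pairs,
    a hypothetical stand-in for whatever the router logs per cell.
    """
    clients = set()
    for prev_ip, _next_ip in cells:
        if classify_previous_hop(prev_ip, known_router_ips) == "client":
            clients.add(prev_ip)
    return clients

# Addresses taken from the slide's example; the router set is an assumption.
routers = {"128.31.0.34"}
observed = [("128.138.243.151", "128.31.0.34")]
clients = collect_client_ips(observed, routers)
```

Here `128.138.243.151` is absent from the router set, so it is counted as a client, matching the slide's example.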
Exit Router Logging Details
• Ran our router from December 15-19, 2007 (4 days)
• Used tcpdump to capture only the first 150 bytes
– 150 bytes total is captured:
    Ethernet header            14 bytes
    IP header                  20 bytes
    TCP header (no options)    20 bytes
    Application header         96 bytes
• Our router transported 709 GB of TCP exit traffic
• Used ethereal for application protocol identification
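The 150-byte budget can be made concrete with a small parser. This is a sketch under the slide's stated layout (14-byte Ethernet header, 20-byte IPv4 header, 20-byte TCP header without options); the function name and returned field names are hypothetical, and a real capture would also need to handle IP or TCP options.

```python
import struct

ETH_LEN, IP_LEN, TCP_LEN, APP_LEN = 14, 20, 20, 96
SNAPSHOT_LEN = ETH_LEN + IP_LEN + TCP_LEN + APP_LEN  # 150 bytes

def split_snapshot(frame: bytes) -> dict:
    """Split a truncated frame into destination address, ports, and application bytes.

    Assumes IPv4 and TCP headers without options, as on the slide.
    """
    if len(frame) < ETH_LEN + IP_LEN + TCP_LEN:
        raise ValueError("frame too short for Ethernet + IP + TCP headers")
    frame = frame[:SNAPSHOT_LEN]                   # keep at most 150 bytes
    ip = frame[ETH_LEN:ETH_LEN + IP_LEN]
    tcp = frame[ETH_LEN + IP_LEN:ETH_LEN + IP_LEN + TCP_LEN]
    src_port, dst_port = struct.unpack("!HH", tcp[:4])
    dst_ip = ".".join(str(b) for b in ip[16:20])   # IPv4 destination address field
    app = frame[ETH_LEN + IP_LEN + TCP_LEN:]       # at most 96 application bytes
    return {"dst_ip": dst_ip, "src_port": src_port, "dst_port": dst_port, "app": app}
```

Only `app`, the first 96 application bytes, is available for protocol identification; the rest of each packet is never stored.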
Understanding the Protocol Distribution (1)
Most connections are interactive HTTP
Number of TCP Connections by Protocol
    Protocol             Connections    Share
    HTTP                 12,160,437     92.45%
    SSL                     534,666      4.06%
    BitTorrent              438,395      3.33%
    Instant Messaging        10,506      0.08%
    E-Mail                    7,611      0.06%
    FTP                       1,338      0.01%
    Telnet                    1,045      0.01%
• Over 13 million exit connections were observed
• HTTP makes up over 92% of the connections through Tor
• Only 3.5% of HTTP connections transported more than 1 MB
Understanding the Protocol Distribution (2)
Non-Interactive traffic uses too much bandwidth
Total Number of Bytes Transported by Protocol
    Protocol             Bytes      Share
    HTTP                 411 GB     57.90%
    SSL                   11 GB      1.55%
    BitTorrent           285 GB     40.20%
    Instant Messenger    735 MB      0.10%
    E-Mail               291 MB      0.04%
    FTP                  792 MB      0.11%
    Telnet               110 MB      0.02%
HTTP makes up 58% of the bytes, but BitTorrent accounts for a disproportionate amount of bandwidth: ~40% of bandwidth used by ~3% of connections
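The disproportion can be checked directly from the reported figures. The sketch below recomputes connection shares from the connection counts and divides the reported byte shares by them; the variable and function names are illustrative, and the byte shares are copied from the slide rather than recomputed.

```python
# Connection counts per protocol, as reported on the previous slide.
connections = {
    "HTTP": 12_160_437,
    "SSL": 534_666,
    "BitTorrent": 438_395,
    "Instant Messaging": 10_506,
    "E-Mail": 7_611,
    "FTP": 1_338,
    "Telnet": 1_045,
}

# Byte shares in percent, copied from the slide (not recomputed here).
byte_share_reported = {"HTTP": 57.90, "SSL": 1.55, "BitTorrent": 40.20}

def shares(counts: dict) -> dict:
    """Convert raw counts into percentages of the total."""
    total = sum(counts.values())
    return {name: 100.0 * value / total for name, value in counts.items()}

conn_share = shares(connections)

def bandwidth_weight(protocol: str) -> float:
    """Byte share divided by connection share; above 1 means heavier than average."""
    return byte_share_reported[protocol] / conn_share[protocol]
```

A weight above 1 means a protocol moves more bytes than its share of connections would suggest; for BitTorrent the weight is well above 10, while for HTTP it is below 1.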
Insecure protocols are common (and dangerous!)
Problem: Insecure protocols (non-SSL) are very common
Only 2% of destination servers provide SSL
Why is this bad?
[Figure: over one circuit (Circuit ID: 3), a client sends plaintext IM traffic (IM name: PapaSmurf) and SSL traffic to the Bank of Belgium through a malicious exit router]
• The malicious exit knows the client’s identity (PapaSmurf)
• Even worse, the exit knows the client for all traffic from circuit 3
Mitigating Information Leakage through Insecure Protocols
WarnPlaintextPorts and RejectPlaintextPorts options now in Tor spec
(1) Block insecure protocols (port-based blocking)
    Example: telnet cs.colorado.edu through the local Tor proxy: Port 23 - denied
(2) Improve HTTP proxy to recognize identity leakage over HTTP
    POST /login.jsp HTTP/1.1
    Host: www.myspace.com
    Userid=papa_smurf&password=this_is_my_password
(3) Do not multiplex secure and non-secure traffic on same circuits
    [Figure: traffic split across Circuit 1 and Circuit 2]
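Mitigation (1) amounts to a port lookup at the local proxy before a stream is opened. The sketch below is not Tor's implementation: the function name, the three-way return value, and the port sets are assumptions chosen for illustration (port 23 comes from the telnet example, the others are common plaintext mail ports).

```python
# Illustrative port sets; these are assumptions, not Tor's configured values.
REJECT_PLAINTEXT_PORTS = {23}            # telnet, as in the slide's example
WARN_PLAINTEXT_PORTS = {109, 110, 143}   # POP2, POP3, IMAP

def check_destination_port(port: int) -> str:
    """Decide what a local proxy should do with a connection to `port`."""
    if port in REJECT_PLAINTEXT_PORTS:
        return "reject"   # refuse to open the stream
    if port in WARN_PLAINTEXT_PORTS:
        return "warn"     # open it, but warn the user about plaintext
    return "allow"

decision = check_destination_port(23)
```

Port-based checks are cheap but coarse: they cannot see plaintext credentials sent over port 80, which is why mitigation (2) looks at HTTP content instead.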
Malicious Router Behavior
Detecting Routers that Log Exit Traffic
How can we tell if an exit router is logging traffic?
– Remotely detecting a network interface in promiscuous
mode is hard
– Remotely detecting that a packet sniffer is running is hard
Key observation: tcpdump performs reverse DNS queries in real
time (by default)
[Figure: a client (or the client’s DNS server) asks the authoritative DNS server “Who is 128.138.242.23?” and receives the answer www.cs.colorado.edu]
Malicious Router Behavior
Method to Detect Routers that Log Exit Traffic
[Figure: a Tor client (we control) sends SYN 1.1.1.1 over a circuit through the Tor network; a logging exit router then queries the authoritative DNS server for IP 1.1.1.1 (we control); 1.1.1.1 is a vacant IP address]
Step 1. Run authoritative DNS server for a vacant IP address
Step 2. Use Tor client to send a TCP SYN packet to our IP address
through every exit router
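The two steps can be sketched as a timing correlation between probes and reverse DNS queries. This is a minimal sketch under assumptions not stated on the slide: probes are sent one exit router at a time, spaced further apart than the matching window, and the DNS server's query log is available as a list of timestamps. The function names and the 60-second window are hypothetical.

```python
import ipaddress

def ptr_name(ip: str) -> str:
    """Name that a reverse DNS lookup for `ip` would query."""
    return ipaddress.ip_address(ip).reverse_pointer

def suspected_loggers(probes, ptr_query_times, window: float = 60.0) -> list:
    """Flag exit routers whose probe was followed by a reverse DNS query.

    probes: list of (exit_router_name, probe_sent_time) pairs
    ptr_query_times: times at which our DNS server saw a PTR query
    window: seconds after a probe during which a query is attributed to it
    """
    flagged = []
    for exit_name, sent in probes:
        if any(sent <= t <= sent + window for t in ptr_query_times):
            flagged.append(exit_name)
    return flagged

# Hypothetical probe schedule and DNS log (times in seconds).
probes = [("exitA", 0.0), ("exitB", 300.0), ("exitC", 600.0)]
flagged = suspected_loggers(probes, ptr_query_times=[305.2])
```

Because the probed address is vacant and its reverse zone is under the experimenter's control, a PTR query arriving shortly after a probe is attributed to a host that observed that probe.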
Malicious Router Behavior
Results and Detection Optimizations
• In one day, we observed reverse DNS queries from an exit router’s DNS server shortly after sending a TCP SYN to port 110 (POP3) through that router
But what if reverse DNS is not performed?
• Use a honeypot and send unique (username, password) combinations through each exit router
[Figure: a Tor client (we control) runs "$ ssh 1.1.1.1" with User: root, Password: passwd through the Tor network; the circuit passes a logging exit router, which sees root/passwd, and ends at the honeypot IP 1.1.1.1 (we control)]
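The honeypot approach depends on each exit router seeing a credential pair that no other router sees. The sketch below shows one way to generate and later attribute such pairs; the function names and the credential format are assumptions, not the authors' tooling.

```python
import secrets

def assign_credentials(exit_routers) -> dict:
    """Map a unique random (username, password) pair to each exit router."""
    creds = {}
    for router in exit_routers:
        pair = (f"user-{secrets.token_hex(4)}", secrets.token_hex(8))
        while pair in creds:  # regenerate on the (unlikely) collision
            pair = (f"user-{secrets.token_hex(4)}", secrets.token_hex(8))
        creds[pair] = router
    return creds

def attribute_login(attempt, creds: dict):
    """Return the exit router that was sent this credential pair, or None."""
    return creds.get(attempt)

# Hypothetical router names.
creds = assign_credentials(["exitA", "exitB", "exitC"])
```

If a login attempt using one of these pairs later arrives at the honeypot from outside the experiment, `attribute_login` identifies which exit router must have recorded the plaintext credentials.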
Malicious Tor Client Behavior
What types of Abuse are Common?
Dear XXXXXXX:
We are writing this letter on behalf of Warner Bros. Entertainment Inc. ("Warner Bros.").
We have received information that an individual has utilized the below-referenced IP address at the noted date and time to offer downloads of copyrighted motion picture(s) through a "peer-to-peer" service, including such title(s) as V For Vendetta – DVD.
The distribution of unauthorized copies of copyrighted motion pictures constitutes copyright infringement under the Copyright Act, Title 17 United States Code Section 106(3).
…
Dear Security coordinators,
I found these looking at suspicious connections on the Undernet IRC Chat Network connecting from a netblock you control. They were mostly connecting to the servers that XXXXXX resolves to, most likely on port 6667. Other possible ports include 6660-7000, 8888, and 8080.
Please check for a compromise, possible hidden running process and an altered process listing. Run the updates for your system to close possible exploit holes, and send any unusual programs found to info@cyberabuse.org for investigation.
Dear Sir,
We are sorry to inform you that earlier this morning your server was used by hackers to attack a Swedish newspaper.
We kindly ask you to check your logs.
Hacking Attempts
Who uses Tor?
Mapping Client and Router IPs to Countries
• Tor clients’ IPs can be observed when our router is
used as the client’s entry guard
• We map client IPs to their countries of origin:
– Remove Tor routers (using directory servers)
– Query IP assignment authorities
(ARIN, APNIC, LACNIC, RIPE, and AFRINIC)
Example: asking the American Registry for Internet Numbers where 128.138.243.151 (cs.colorado.edu) is located returns the answer United States of America
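The mapping procedure can be sketched as a filter followed by a tally. The registry query itself is not shown: `country_of` is a hypothetical lookup function standing in for queries to ARIN, APNIC, LACNIC, RIPE, and AFRINIC, and the table used in the example is made up apart from the slide's own address.

```python
from collections import Counter

def client_country_counts(observed_ips, router_ips, country_of) -> Counter:
    """Count unique client IPs per country.

    observed_ips: IPs seen as previous hops at our entry guard
    router_ips:   IPs listed by the directory servers (removed first)
    country_of:   hypothetical lookup, IP string -> country name
    """
    clients = set(observed_ips) - set(router_ips)
    return Counter(country_of(ip) for ip in clients)

# Illustrative lookup table; only the first entry comes from the slide.
table = {
    "128.138.243.151": "United States of America",
    "192.0.2.1": "Exampleland",  # documentation address, made-up country
}
counts = client_country_counts(
    observed_ips=["128.138.243.151", "128.138.243.151", "128.31.0.34", "192.0.2.1"],
    router_ips={"128.31.0.34"},
    country_of=table.__getitem__,
)
```

Using a set before counting means a client IP seen many times is counted once, which matches the "unique clients" wording used in the results.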
Who uses Tor?
Geopolitical Client Distribution: One Day Snapshot
Geopolitical Client Distribution
    Country          Clients    Share
    Germany          2,304      30.4%
    China              988      13.0%
    United States      864      11.4%
    Italy              254       3.4%
    Turkey             221       2.9%
• 7,571 unique clients observed during the one-day snapshot
• Clients from over 126 of 195 countries observed during the 15-day observation
Who uses Tor?
Geopolitical Client Distribution: Relative Tor Usage
Relative Tor Usage
    Country    Relative Tor usage
    Germany    7.73
    Turkey     2.47
    Italy      1.37
    Russia     0.89
    China      0.84
Values above 1 mean Tor is disproportionately popular
Relative Tor popularity(X) = (Fraction of Tor users from country X) / (Fraction of Internet users from country X)
Example, Germany: 30.4% of Tor users, 3.9% of Internet users, so 30.4% / 3.9% = 7.73 (as reported; the rounded percentages shown here give about 7.8)
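The ratio is simple enough to compute directly. The sketch below implements the definition as a function; the function name is illustrative, and the inputs are the rounded percentages from the Germany example.

```python
def relative_popularity(tor_fraction: float, internet_fraction: float) -> float:
    """Fraction of Tor users from a country divided by its fraction of Internet users."""
    if internet_fraction <= 0:
        raise ValueError("internet_fraction must be positive")
    return tor_fraction / internet_fraction

# Germany, using the rounded percentages from the example
# (about 7.79 with these rounded inputs).
germany = relative_popularity(0.304, 0.039)
```

A value of 1 would mean a country contributes Tor users in proportion to its share of Internet users; values above 1 indicate over-representation.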
Who uses Tor?
Geopolitical Router Distribution
Geopolitical Router Distribution
    Country          Routers    Share
    Germany          374        31.5%
    United States    326        27.4%
    France            69         5.9%
    China             40         3.4%
    Italy             36         3.0%
• Over 7 days, hourly snapshots of router IPs are taken from directory servers
• On average, 1,188 unique routers were observed per hour
Who uses Tor?
Geopolitical Router Distribution: Implications for
Enforcing Location Diversity
Router Bandwidth Distribution
    Country          Share of bandwidth
    Germany          45%
    United States    23%
    France            9%
    Netherlands       4%
    China             2%
    Other            17%
Most bandwidth is in just a few countries
Germany and US provide 59% of running routers, but 68% of the bandwidth
Who uses Tor?
Modeling Router Utilization: One hour snapshot
Compute a probability density function
(PDF) of router utilization
Most popular router
transports 4.1% of all
traffic observed
~1,400 unique routers observed in 1 hour
2% (27) of the routers transport 50% of
the total traffic; located in 6 countries
This utilization model provides:
1. Insight into traffic analysis vulnerabilities
2. The basis for simulating Tor-like networks
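The concentration figures above can be derived from per-router traffic totals. The sketch below normalizes such totals into a utilization distribution and finds the smallest set of routers that carries a given share of traffic; the function names and the example numbers are hypothetical.

```python
def utilization_pdf(traffic_by_router: dict) -> dict:
    """Normalize per-router traffic into fractions that sum to 1."""
    total = sum(traffic_by_router.values())
    if total <= 0:
        raise ValueError("no traffic observed")
    return {router: t / total for router, t in traffic_by_router.items()}

def routers_for_share(traffic_by_router: dict, share: float = 0.5) -> list:
    """Smallest set of busiest routers that together carry at least `share` of traffic."""
    pdf = utilization_pdf(traffic_by_router)
    chosen, running = [], 0.0
    for router, fraction in sorted(pdf.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(router)
        running += fraction
        if running >= share:
            break
    return chosen

# Made-up traffic totals for four routers (arbitrary units).
traffic = {"a": 50, "b": 30, "c": 10, "d": 10}
top = routers_for_share(traffic, share=0.5)
```

Applied to the one-hour snapshot, this kind of computation is what yields statements such as a small percentage of routers transporting half of the traffic.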
Summary
• How is Tor used?
– Primarily interactive HTTP and BitTorrent
– Non-SSL protocols are common
• How is Tor mis-used?
– Exit routers logging sensitive information
– Copyright infringement allegations, hacking
attempts, and web page defacement
• Who uses Tor?
– Most clients come from Germany, China, and US
– Location diversity in Tor’s routing is currently
difficult to guarantee
Conclusion and Future Research Directions
• This study highlights the challenges of deploying a
real anonymity service
• Additional research:
– Enforcing client accountability without degrading anonymity
– Protecting users from disclosing identifying information
– Encouraging router participation and location diversity
Dataset Release
We definitely want to share our data with the community
But, the obstacles are:
• Completely removing application-layer data
• Scrubbing all network identifiers in headers (IP addresses, TCP
ports, etc.)
• Server to host such a large data set (several GB compressed)
We’re currently working on these issues
Ethics of Our Data Collection Procedure
We understand that there are serious privacy concerns
We capture only 96 bytes of application-level headers/data
The data we collected cannot be used to link a sender/receiver pair
Since we transported so much exit traffic, it was not feasible to
perform protocol identification in real-time
Application-level data is captured only for the purpose of
automatic protocol identification
We DO NOT examine the decrypted payloads of any packets, and we do not collect login credentials or any personally identifying data
Was It Necessary to Consult the IRB?
Institutional Review Boards (IRB) ensure that human research
subjects are treated ethically
It is the norm within the networking community to collect live
network traces without consulting IRBs
We present aggregated and anonymous results – absolutely no
personally identifying information
Why Didn’t We Use GeoIP for Localization?
We found that GeoIP provided adequate coverage for North American and European IP addresses, but more limited coverage of other regions
Overcounting in Geopolitical Distributions?
We restricted our client observations to 24 hours to limit the
influence of highly dynamic IP addresses, particularly in China
and Germany
What Did We Do When We Caught The
Eavesdropping Exit Router?
Upon detecting reverse DNS queries and failed login attempts from an exit router, we immediately contacted the Tor developers and shared our detection method and observations