Disambiguation of Residential Wired and Wireless Access in a Forensic Setting
Sookhyun Yang, Jim Kurose, Brian Neil Levine
University of Massachusetts Amherst
shyang@cs.umass.edu
This research is supported by NSF awards CNS-0905349 and CNS-1040781.
UNIVERSITY OF MASSACHUSETTS, AMHERST • Department of Computer Science
Outline
• Introduction
• Problem Statement
• Experimental Methodology
• Classification Results
• Conclusion
Illegal content distributed P2P from known location
[Figure: a law enforcement peer in a P2P network observes an illegal content distributor (e.g., CP) located behind a wireless router. Labels: Step 1, public IP address; Step 2, known sender location. Callouts: “Someone used my open Wi-Fi!” and “wired or wireless access?”]
Challenge:
“Can we legally determine that a suspect used wired access, thus
making the resident user more likely to be a responsible party?”
Can We Intercept Data at Intermediate Nodes?
[Figure: law enforcement intercepts data, via a sniffer, at intermediate routers on the path between the illegal content distributor (behind a wireless router) and a peer.]
No, law enforcement cannot legally take traces at intermediate nodes without a warrant or wiretap.
• Reasonable expectation of privacy (REP) for the sources of data.
• The Wiretap Act and the Pen Register statute.
Can We Intercept Data as a Peer?
[Figure: law enforcement joins the P2P network as a peer and exchanges data directly with the illegal content distributor, who is behind a wireless router.]
Yes, measurements taken at a peer, before a warrant, are legal!
• Users of P2P file sharing networks have no “reasonable expectation of privacy”.
• Software designed for law enforcement to monitor P2P activity does not violate US Fourth Amendment protections.
Our Problem Setting
Challenge: can we classify the access network type of a target sender using remotely measured P2P traces?
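[Figure: the target connects through either Ethernet or a Wi-Fi AP to a cable modem, then across the cable network and the Internet to the law enforcement peer in the P2P network. Question marks surround the home network; the question posed is “Wired access?”]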
Challenges in this forensic setting: hidden and unknown
residential factors can affect classification results.
Our Contribution
• Investigate the performance of several wired-vs-wireless classification algorithms in various home network scenarios.
• Observe how several scenario factors affect classifier performance.
  • Single flow vs. multiple flows from a target.
  • Operating systems.
  • P2P application rate limit.
  • Wireless channel contention.
• Explain when, why, and how the classifier works reliably or poorly.
See Tech. Rep. UM-CS-2013-001, Dept. of CS, UMass Amherst.
Diversely Emulated P2P Traces in Controlled Settings
[Figure: experimental setup. A target device in houses near UMass connects through a Wi-Fi AP (label: “Less than 1m (the worst case)”) or Ethernet to a cable modem, then across the cable network and the Internet to a UMass server and a Purdue server. A wired sniffer is also shown. Pairs of wired and wireless datasets are collected remotely.]
Factors varied:
• Host side: single full-rate TCP flow vs. multiple TCP flows; 802.11g vs. 1 Gbps Ethernet; Linux vs. Windows XP.
• Cable network effect: different times and houses.
We take measurements at the wired sniffer to help us explain and understand classification, but do NOT use them in classification.
Classification Procedure
• Classification features.
  • 25th, 50th, and 75th percentiles and entropy of the packet inter-arrival time distribution for the datasets.
• We train and cross-validate decision tree, logistic regression, SVM, and EM classifiers.
• Classification performance metrics.
  • TPR (True Positive Rate).
  • FPR (False Positive Rate).
  • FPR ≤ 0.10 and TPR ≥ 0.90 are acceptable classification results.
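The four features listed above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the quartile interpolation method, and the fixed-width binning used for the entropy estimate (`bin_width`) are all assumptions, since the slides do not say how the entropy was computed.

```python
import math
from collections import Counter
from statistics import quantiles


def interarrival_times(timestamps):
    """Gaps between consecutive packet arrival timestamps (seconds)."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def shannon_entropy(values, bin_width):
    """Entropy in bits of the empirical distribution after fixed-width binning.

    The binning scheme is an assumption made for this sketch.
    """
    counts = Counter(math.floor(v / bin_width) for v in values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def extract_features(timestamps, bin_width=1e-4):
    """Return the 25th/50th/75th percentiles and entropy for one trace."""
    gaps = interarrival_times(timestamps)
    if len(gaps) < 2:
        raise ValueError("need at least three packets to compute quartiles")
    p25, p50, p75 = quantiles(gaps, n=4, method="inclusive")
    return {
        "p25": p25,
        "p50": p50,
        "p75": p75,
        "entropy": shannon_entropy(gaps, bin_width),
    }
```

Each trace then yields one feature vector, which could be fed to any of the classifiers named above.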
Single-flow Classification Results
Feature           Linux            Windows XP
25th percentile   Inconsistent     Not acceptable
Entropy           Not acceptable   Inconsistent
Accurate classification is difficult in single full-rate flow cases.
Multiple Flows Classification Results
Feature           Linux            Windows XP
25th percentile   Acceptable       Not acceptable
Entropy           Acceptable       Acceptable
Multiple-flow cases can show better classification results than single full-rate flow cases.
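The “Acceptable” and “Not acceptable” labels in these results follow the thresholds stated in the classification procedure (FPR ≤ 0.10 and TPR ≥ 0.90). A minimal sketch of that check is given here; the function names are hypothetical, and treating "wireless" as the positive class is an assumption, since the slides do not say which class is positive.

```python
def rates(true_labels, predicted_labels, positive="wireless"):
    """Return (TPR, FPR) for a binary wired-vs-wireless classification.

    Which class counts as positive is an assumption of this sketch.
    """
    tp = fp = tn = fn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == positive:
            if guess == positive:
                tp += 1
            else:
                fn += 1
        else:
            if guess == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one trace of each class")
    return tp / (tp + fn), fp / (fp + tn)


def is_acceptable(tpr, fpr, min_tpr=0.90, max_fpr=0.10):
    """Apply the acceptance thresholds from the slides."""
    return tpr >= min_tpr and fpr <= max_fpr
```

For example, a classifier that labels 9 of 10 wireless traces correctly and mislabels 1 of 10 wired traces sits exactly on both thresholds and would count as acceptable.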
Classification: insight into how it works
[Figure: packets travel from the target device over the 802.11 or Ethernet access protocol, through the Wi-Fi AP and cable modem, across the cable network and its access protocol, to the UMass server. Packet inter-arrival times are shown before and after the cable network.]
Key insight: classify at the receiver using packet inter-arrival times at the sender that were not significantly changed by the cable network access protocol or by the network at the sender.
Discussion
• Classification features showing acceptable results are different for Linux and Windows XP.
  • Windows’s small 8 KB TCP send buffer.
  • This is also found in other Windows versions.
• Single full-rate flow vs. multiple flows.
  • A flow generated with multiple competing flows from a target would be less affected by a cable network.
See Tech. Rep. UM-CS-2013-001, Dept. of CS, UMass Amherst.
Conclusion
• We justified the legality of our trace-gathering method based on US law.
• We proposed a classifier for determining whether a target used wired or wireless access.
• Through extensive experimentation, we determined scenarios where the classifier works reliably.
• Traces: traces.cs.umass.edu.
Open Questions
• Other hidden or unknown residential factors.
  • Mac OS.
  • 802.11n, MIMO.
  • Modified TCP implementation.
  • Multiple flows across multiple sites.
  • Long-term traces.
End
Questions or comments welcome!