Disambiguation of Residential Wired and Wireless Access in a Forensic Setting
Sookhyun Yang, Jim Kurose, Brian Neil Levine
University of Massachusetts Amherst
shyang@cs.umass.edu
This research is supported by NSF awards CNS-0905349 and CNS-1040781.
UNIVERSITY OF MASSACHUSETTS, AMHERST • Department of Computer Science

Outline
- Introduction
- Problem Statement
- Experimental Methodology
- Classification Results
- Conclusion

Illegal content distributed P2P from known location
[Figure: law enforcement joins the P2P network as a peer. Step 1: it learns the public IP address of the sender, an illegal content distributor (e.g., CP). Step 2: the address gives a known location, a residence with a wireless router. The resident responds: "Someone used my open Wi-Fi!" Open question: wired or wireless access?]
Challenge: “Can we legally determine that a suspect used wired access, thus making the resident user more likely to be a responsible party?”

Can We Intercept Data at Intermediate Nodes?
[Figure: law enforcement intercepts data with a sniffer at routers on the path between the illegal content distributor (behind a wireless router) and a peer.]
- No, law enforcement cannot legally take traces at intermediate nodes without a warrant or wiretap.
  - Reasonable expectation of privacy (REP) for the sources of data.
  - The Wiretap Act and the Pen Register statute.

Can We Intercept Data as a Peer?
[Figure: law enforcement acts as a peer in the P2P network and exchanges data with the illegal content distributor behind a wireless router.]
- Yes, measurements taken at a peer, before a warrant, are legal!
  - Users of P2P file sharing networks have no “reasonable expectation of privacy”.
  - Software designed for law enforcement to monitor P2P activity does not violate US 4th Amendment protections.

Our Problem Setting
Challenge: can we classify the access network type of a target sender using remotely measured P2P traces?
[Figure: the target reaches a cable modem either over Ethernet or through a Wi-Fi AP (wired access?). Traffic crosses the cable network and the Internet to the law enforcement peer in the P2P network. The home side of the path is unknown to the observer.]
Challenges in this forensic setting: hidden and unknown residential factors can affect classification results.

Our Contribution
- Investigate the performance of several wired-vs-wireless classification algorithms in various home network scenarios.
- Observe how several scenario factors affect classifier performance.
  - Single flow vs. multiple flows from a target.
  - Operating systems.
  - P2P application rate limit.
  - Wireless channel contention.
- Explain when, why, and how the classifier works reliably or poorly.
See Tech. Rep. UM-CS-2013-001, Dept. of CS, UMass Amherst.

Diversely Emulated P2P Traces in Controlled Settings
Host-side vs. cable network factors:
- Host side:
  - Single full-rate TCP flow vs. multiple TCP flows.
  - 802.11g or 1 Gbps Ethernet.
  - Linux vs. Windows XP.
  - …
- Cable network effect (different times and houses).
Remotely collecting pairs of wired and wireless datasets.
[Figure: houses near UMass, each with a target device less than 1 m from the Wi-Fi AP (the worst case), a wired sniffer, and a cable modem. Traffic crosses the Internet to a UMass server and a Purdue server.]
Note on the figure: we take measurements here to help us explain and understand classification, but do NOT use them in classification.

Classification Procedure
- Classification features.
  - 25th, 50th, and 75th percentiles and entropy of the packet inter-arrival time distribution of each dataset.
- We train and cross-validate decision tree, logistic regression, SVM, and EM classifiers.
- Classification performance metrics.
  - TPR (True Positive Rate).
  - FPR (False Positive Rate).
  - Results with FPR ≤ 0.10 and TPR ≥ 0.90 are acceptable.

Single-flow Classification Results
Feature            Linux             Windows XP
25th percentile    Inconsistent      Not acceptable
Entropy            Not acceptable    Inconsistent
Accurate classification is difficult in single full-rate flow cases.

Multiple Flows Classification Results
Feature            Linux             Windows XP
25th percentile    Acceptable        Not acceptable
Entropy            Acceptable        Acceptable
Multiple-flow cases can show better classification results than single full-rate flow cases.

Classification: insight into how it works
[Figure: the target device sends through a Wi-Fi AP (802.11 or Ethernet access protocol) and a cable modem (cable network access protocol) to the UMass server. Packet inter-arrival times are shown before and after the cable network.]
Key insight: classify at the receiver using packet inter-arrival times from the sender that were not significantly changed by the cable network access protocol or by a network at the sender.

Discussion
- Classification features showing acceptable results are different for Linux and Windows XP.
  - Windows’s small 8 KB TCP send buffer. This is also found in other Windows versions.
- Single full-rate flow vs. multiple flows.
  - A flow generated with multiple competing flows from a target would be less affected by a cable network.
See Tech. Rep. UM-CS-2013-001, Dept. of CS, UMass Amherst.

Conclusion
- We justified the legality of our trace-gathering method based on US law.
- We proposed a classifier for determining whether a target used wired or wireless access.
- Through extensive experimentation, we determined scenarios where the classifier works reliably.
- Traces: traces.cs.umass.edu.

Open Questions
- Other hidden or unknown residential factors.
- Mac OS.
- 802.11n, MIMO.
- Modified TCP implementation.
- Multiple flows across multiple sites.
- Long-term traces.

End
Questions or comments welcome!
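
Backup sketch: the features and acceptance thresholds listed on the Classification Procedure slide can be illustrated in a few lines of Python. This is only a minimal sketch under stated assumptions, not the authors' implementation. The slides do not say how entropy is estimated or which class counts as positive, so the fixed histogram bin width (`bin_width`), the choice of "wireless" as the positive class, and all function names here are hypothetical.

```python
import math
from collections import Counter
from statistics import quantiles


def interarrival_times(timestamps):
    """Gaps between consecutive packet arrival timestamps."""
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:])]


def shannon_entropy(values, bin_width):
    """Entropy in bits of the values after fixed-width binning.

    The binning scheme is an assumption; the slides do not specify one.
    """
    counts = Counter(math.floor(v / bin_width) for v in values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def features(timestamps, bin_width):
    """25th/50th/75th percentiles and entropy of the inter-arrival times."""
    gaps = interarrival_times(timestamps)
    p25, p50, p75 = quantiles(gaps, n=4, method="inclusive")
    return {"p25": p25, "p50": p50, "p75": p75,
            "entropy": shannon_entropy(gaps, bin_width)}


def tpr_fpr(y_true, y_pred, positive="wireless"):
    """True and false positive rates; the positive class is an assumption.

    Assumes both classes occur at least once in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp / (tp + fn), fp / (fp + tn)


def acceptable(tpr, fpr):
    """Acceptance rule from the slides: FPR <= 0.10 and TPR >= 0.90."""
    return fpr <= 0.10 and tpr >= 0.90


if __name__ == "__main__":
    # Toy timestamps in arbitrary integer units, not measured data.
    print(features([0, 1, 2, 4, 8], bin_width=1))
```

In practice the feature dictionaries from many labeled wired and wireless traces would be fed to the classifiers named on the slide (decision tree, logistic regression, SVM, EM), and `acceptable` applied to the cross-validated TPR and FPR.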