Architecture and Techniques for Diagnosing Faults in IEEE 802.11 Infrastructure Networks
Atul Adya, Victor Bahl, Ranveer Chandra, Lili Qiu
Microsoft Research

Wireless Network Woes
• How many times have you heard users say:
  – “My machine says: wireless connection unavailable”
  – “Why can’t my machine authenticate?”
  – “My performance on wireless really sucks”
IT Dept: Several hundred complaints per month
• You may have heard network admins say:
  – “I wonder if someone has sneakily installed an unauthorized access point”
  – “Do we have complete coverage in all the buildings?”

Enterprise Wireless Problems
Main problems observed by IT department:
  – Connectivity: RF Holes
  – Authentication: 802.1x protocol issues
  – Performance: Unexplained delays
  – Security: Rogue APs

Existing Products
• Provide management/diagnostic functions
  – E.g., AirWave, CA’s NSM, Air Defense, Air Magnet
• Insufficient functionality:
  – No support for disconnected clients
  – Weak root-cause analysis (raw data, mostly)
  – Diagnosis only from the AP perspective
  – Sometimes need expensive sensor deployment

Our Contributions
• Flexible client-based framework for detection and diagnosis of wireless faults
• Client Conduit: communication for disconnected clients via nearby connected clients
• Diagnostic mechanisms
  – Approximate location of disconnected clients
  – Rogue AP detection
  – Performance problem analysis

Talk Outline
• Diagnostics architecture and implementation
• Client Conduit: diagnosing disconnected clients
• Diagnostic mechanisms
  – Locating disconnected clients
  – Detecting unauthorized APs
  – Analyzing performance problems
• Summary and Future Work

Assumptions
• Can install diagnostic software on clients
  – APs are typically closed platforms
  – Can provide improved diagnosis with modified APs
• Nearby clients available for fault diagnosis
  – At least 13 active clients on our floor (approx. 2500 sq.
    feet)
• Network admins maintain AP Location Database

Client-Centric Architecture
Figure labels: Diagnostic Server (DS); Authentication/User Info; RADIUS; Kerberos; Diagnostic AP Module (DAP); Legacy AP; Diagnostic Client Module (DC); Disconnected Client; Client Conduit

Diagnostic Architecture Properties
• Exploits client-view of network (not just APs)
• Supports proactive and reactive mechanisms
• Scalable
• Secure

Client Implementation
• Prototype system on Windows
• Native WiFi: Extensibility framework for 802.11 [Microsoft Networking 2003]
• Daemon: most of functionality and main control flow
• IM driver: limited changes
  – Packet capture & monitoring
Figure labels: User Mode; Kernel Mode; Diagnostics Daemon; TCP/IP; Diagnostics IM Module; NDIS; Native WiFi IM Driver; Diagnostics Miniport Module; Native WiFi Miniport Driver; Native WiFi NIC

Cause of Disconnection
• Lack of coverage
  – In an RF Hole
  – Just outside AP range
• Authentication issues, e.g., stale certificates
• Protocol problems, e.g., no DHCP address
Can we communicate via nearby connected clients?

Communication via Nearby Clients
Figure labels: Adhoc Mode; Access Point; Disconnected Client “Grumpy”; “Cannot be on 2 networks. Packet dropped!”;
Connected Client “Happy” (Infrastructure)
Possible (unsatisfactory) solutions:
• Multiple radios: extra radio for diagnostics
• MultiNet [InfoCom04]: Multiplex “Happy” between Infrastructure/Adhoc modes
Penalizing normal case behavior for rare scenario

Our Solution: Client Conduit
Figure labels: Access Point; Disconnected Client “Grumpy” / “Not-so-Grumpy”; Becomes an Access Point (Starts beaconing); SOS (Beacon); Connected Client “Happy”; Disconnected station detected; SOS Ack (Probe Req); Stops beaconing; Ad hoc network via MultiNet
Help disconnected wireless clients with:
• Online diagnosis
• Certificate bootstrapping

Client Conduit Features
• Incurs no extra overhead for connected clients
  – Use existing 802.11 messages: beacons & probes
• Works with legacy APs
• Includes security mechanisms to avoid abuses

Client Conduit Performance
Figure: bar chart of time (seconds), broken down by step: Adhoc-mode association; Become Station; Get Ack; Set Beacon Period; Set SSID; Become AP; Set channel. Annotations: “No mode changes”; 2.7 seconds; 6.7 seconds
• Time for “Grumpy” to get connected < 7 seconds
  – Reduced time can enable transparent recovery
• Bandwidth available for diagnosis > 400 Kbps (when “Happy” donates only 20% of time)

Locating Disconnected Clients
Goal: Approximately locate to determine RF Holes
Solution: Use nearby connected clients
• “Grumpy” starts beaconing
• Nearby clients report signal strength to server
• Diagnostic server uses RADAR [InfoCom00] twice
  – Locates connected clients
  – Locates “Grumpy” with clients as “anchor points”
• Location error: 10 – 15 meters

Rogue AP Problems
Why problematic?
• Allow network access to unauthorized users
• Hurt performance: interfere with existing APs
Detection goals:
• Common case: mistakes by employees
• Detect unauthorized IEEE 802.11 APs
  – Not considering non-compliant APs
Solution: Use clients for monitoring nearby APs

Rogue AP Detection
• Clients monitor nearby APs. Send to server:
  – MAC address, Channel, SSID, RSSI (for location)
• Server checks 4-tuple in AP Location Database
• Obtaining AP Information at clients:
  – Same/overlapping channel as client: from Beacons
  – AP on non-overlapping channel:
    • Active Scan periodically
    • AP information from Probe Response

Rogue AP Detection Overheads
• Bandwidth usage < 0.2 Kbps per client
• Can active scans be performed without disruption?
  – Sufficient idleness available (2½ – 3 min.)
  – Simple threshold-based prediction: Active scan completed in idle period for 95% of cases

Summary
• Diagnostics critical for 802.11 deployments
• Client-centric architecture
• Client Conduit
• Diagnosis using nearby clients
  – Locate disconnected clients
  – Detect rogue APs
  – Analyze performance problems
• Prototype in Windows using Native WiFi
  – Mechanisms are effective with low overheads

Future Work
• Detecting Rogue Ad Hoc networks
• 802.1x protocol analyzer
• Detailed wireless delay analyzer
• Automated recovery after fault diagnosis
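
Illustrative Sketch: Server-Side Rogue AP Check
The Rogue AP Detection slide says clients report (MAC address, Channel, SSID, RSSI) and the server checks the tuple against the AP Location Database. The talk does not give the matching rule, so the following is only a minimal sketch under stated assumptions: the names APReport, AP_LOCATION_DB, classify, and find_suspects are hypothetical, the database is modeled as a plain dictionary keyed by MAC address, and RSSI is carried along for later location estimation rather than matched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class APReport:
    """One AP observation sent by a client (field names are hypothetical)."""
    mac: str
    channel: int
    ssid: str
    rssi_dbm: int  # not matched here; kept so the server could estimate location


# Hypothetical stand-in for the AP Location Database:
# MAC address -> (SSID, channel) of each authorized AP.
AP_LOCATION_DB = {
    "00:11:22:33:44:55": ("CORP", 1),
    "00:11:22:33:44:66": ("CORP", 6),
}


def classify(report: APReport, db=AP_LOCATION_DB) -> str:
    """Return 'authorized', 'mismatch', or 'unknown' for a reported AP."""
    entry = db.get(report.mac.lower())
    if entry is None:
        return "unknown"  # MAC not in the database: candidate rogue AP
    ssid, channel = entry
    if (report.ssid, report.channel) != (ssid, channel):
        return "mismatch"  # known MAC, but unexpected SSID or channel
    return "authorized"


def find_suspects(reports, db=AP_LOCATION_DB):
    """Collect reports that fail the check, strongest signal first."""
    suspects = [r for r in reports if classify(r, db) != "authorized"]
    return sorted(suspects, key=lambda r: r.rssi_dbm, reverse=True)
```

Sorting suspects by signal strength is a design choice of this sketch, not something the talk states: the reports with the strongest RSSI come from clients likely closest to the suspect AP, which is where an administrator would start looking.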