Wireshark in the Large Enterprise
June 16, 2010
Hansang Bae
Senior Vice President | Citi (f.k.a. Citigroup)
Email: hansang@gmail.com

PLEASE REFER TO THE “ANSWERSHEET.DOCX” FILE FOR ADDITIONAL INFORMATION ABOUT THIS PRESENTATION.
THESE SESSIONS WILL BE AVAILABLE ON YOUTUBE: HTTP://WWW.YOUTUBE.COM/USER/HANSANGB

SHARKFEST ‘10
Stanford University
June 14-17, 2010

Please Let TCP Do Its Job
Problem: Application developers escalate an issue with slow file (MQ) transfers.
Troubleshooting Steps:
1. What should you rule out immediately?
2. What affects throughput, and why?
3. Look for patterns and ask the right questions. What would a quick examination reveal? Doesn’t it look normal? Can you spot the issue quickly? Were you guys paying attention yesterday?!?
4. Use the graphing tools. A picture is worth a thousand words.
5. Set up your Wireshark environment in a standard way. Use Configuration Manager to help you.

Don’t Jump to Conclusions!
Another application development team escalates a “slowness” problem.
Troubleshooting Steps:
1. Trust but verify (tcp.analysis.flags).
2. Look for telltale signs of problems.
3. Who’s sending and who’s receiving? Besides looking at the name of the file, can you figure it out?
4. Apply Occam’s Razor when solving problems.

Another (Unusual) Hidden Danger!
Application testing with an external vendor doesn’t work, although it tested fine with internal resources.
Troubleshooting Steps:
1. If it works internally but not with an external vendor (reachable via the Internet), what device should you suspect? Learn to divide and conquer: the power of binary search!
2. Have “high bandwidth conversations” with qualified peers.
3. Look out for “defaults.” HSB’ism: Defaults are the guardian angels for the clueless!
4. Another case of “a picture is worth a thousand words.”

Odd Numbers are Evil? Really?
The Software Update System is slow to deliver packages to staging servers, and it impacts 300,000+ users!
Troubleshooting Steps:
1. Usual suspects: duplex, window size, packet loss, and LFN (long fat network).
2. Use the information in the trace to eliminate some of the “usual suspects.” Not all inefficiencies come into play. Does the window come into play here?
3. Do I need to see the SYN/SYN-ACK to determine what environment this is? What other options are there?
4. Use Time Reference markings liberally.
5. A case of “too much of a good thing.”

Another Zebra Case!
Users are calling the helpdesk because their Citrix sessions are dying.
Main Concepts:
1. Applications traversing the Internet play by a different set of rules/standards. Packet loss is a way of life.
2. Do you **REALLY** know TCP?
3. Did you pick up on why the 500 ms delay is significant?
4. What is Fast Retransmit, and how is it different from “regular” retransmission?
5. Learn the art of spotting something unusual. But first, you need to understand “what’s unusual.”

WAN Optimization
After the WAN optimization appliances were upgraded, tellers started reporting intermittent printing issues. Transient problems like these are the toughest to resolve.
What was the time to resolution? Three days, thanks to packet captures.
Main Concepts:
1. The last change was an OS upgrade on the WAN optimization appliances, so start there.
2. Capturing at the right capture points is critical. Why?
3. Is it worth looking at TCP session #2?
4. What should you compare? What can you compare?
5. Sake Blok’s session last year on SSL decryption was VERY helpful!
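The “Please Let TCP Do Its Job” slide asks what affects throughput and why. The “Odd Numbers are Evil?” slide lists window size and LFN among the usual suspects. One way to reason about both: a single TCP connection can have at most one receive window of unacknowledged data in flight per round trip. Its throughput is therefore capped at roughly window / RTT, however fast the link is.

The Python sketch below works through that arithmetic. The function names and the 64 KiB / 100 ms / 1 Gbit/s figures are hypothetical illustrations, not numbers taken from the cases in this deck.

```python
# Window-limited TCP throughput: at most one window of unacknowledged
# data can be in flight per round trip, so throughput <= window / RTT.


def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound, in bits per second, for one TCP connection whose
    in-flight data is capped at `window_bytes` (hypothetical helper)."""
    if rtt_seconds <= 0:
        raise ValueError("RTT must be positive")
    return window_bytes * 8 / rtt_seconds


def bandwidth_delay_product_bytes(link_bps: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to keep a `link_bps` path busy (the BDP)."""
    return link_bps * rtt_seconds / 8


if __name__ == "__main__":
    # Hypothetical numbers: 65,535-byte window (the largest without the
    # window scale option) over a 100 ms round trip.
    print(f"{max_throughput_bps(65_535, 0.100) / 1e6:.2f} Mbit/s")  # 5.24 Mbit/s
    # Hypothetical 1 Gbit/s path with the same RTT.
    print(f"{bandwidth_delay_product_bytes(1e9, 0.100) / 1e6:.1f} MB")  # 12.5 MB
```

Under those assumed numbers, a single connection tops out near 5.24 Mbit/s regardless of link speed. Keeping a 1 Gbit/s path full at the same RTT would take about 12.5 MB in flight. That gap is why window size matters so much on long fat networks.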

WAN Optimization (Cont’d)
[Figures from two continuation slides not reproduced in this text.]
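The “Another Zebra Case!” slide asks how Fast Retransmit differs from “regular” retransmission. In brief, per RFC 5681, a sender that receives three duplicate ACKs retransmits the missing segment immediately, without waiting for the retransmission timer (RTO) to expire. If fewer duplicates arrive, only the RTO recovers the loss.

The Python sketch below is a simplified, hypothetical model of that counting rule over a list of ACK numbers. Real stacks apply extra conditions (for example, the ACK must carry no data and must not change the advertised window) that this sketch ignores. Inside Wireshark, the tcp.analysis.duplicate_ack and tcp.analysis.fast_retransmission display filters surface these events directly.

```python
from typing import List, Optional, Tuple

# RFC 5681: the arrival of 3 duplicate ACKs is taken as a sign of loss.
DUP_ACK_THRESHOLD = 3


def fast_retransmit_points(ack_numbers: List[int]) -> List[Tuple[int, int]]:
    """Return (index, ack) pairs where a sender following the simplified
    duplicate-ACK rule would fast-retransmit the segment starting at `ack`.

    Simplifications (assumptions of this hypothetical sketch): every repeated
    ACK number counts as a duplicate, and stale ACKs below the highest one
    seen so far are ignored.
    """
    triggers: List[Tuple[int, int]] = []
    highest_ack: Optional[int] = None
    dup_count = 0
    for i, ack in enumerate(ack_numbers):
        if highest_ack is None or ack > highest_ack:
            # New data acknowledged: restart the duplicate counter.
            highest_ack = ack
            dup_count = 0
        elif ack == highest_ack:
            dup_count += 1
            if dup_count == DUP_ACK_THRESHOLD:
                triggers.append((i, ack))
        # ack < highest_ack: stale or reordered ACK, ignored here.
    return triggers


if __name__ == "__main__":
    # Hypothetical ACK stream: the segment at 3000 is lost, later ones arrive.
    print(fast_retransmit_points([1000, 2000, 3000, 3000, 3000, 3000, 7000]))  # [(5, 3000)]
    # Only two duplicates (e.g. the loss hit the tail of a burst): no fast
    # retransmit, so the sender has to wait for the RTO.
    print(fast_retransmit_points([1000, 2000, 2000, 2000]))  # []
```

The second case is the one to watch for in traces: with too few packets behind the lost one, recovery falls back to the much slower timeout-driven retransmission.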