TMG Data Packager Analysis, Part 1
By Suraj Singh, Security Support Escalation Engineer, Microsoft Forefront EDGE team

Scope
• Intro
• Scenario
• Analyzing the individual logs collected by the TMG data packager.
• Correlating the different logs to complete the picture and understand what is going on.

INTRO
• The TMGpackage.cab file is generated when we collect data with the TMG data packager tool while reproducing the issue.
• More info about the TMG data packager tool: http://blogs.technet.com/b/soorajsec/archive/2010/04/10/instructions-for-isa-data-packager-to-collect-datain-repro-mode.aspx
• This presentation explains how to analyze the data collected by this tool.
• The first step is to extract the cab file, then proceed as explained in the following slides.
• The idea is to give you a starting point so you can begin the data analysis yourself.

Logs collected
• The logs collected depend on the scenario, the issue, and the corresponding template chosen in the TMG data packager wizard.
• They also depend on the options selected under Modify Options.
For a basic repro, the logs look like this after extracting the contents of the cab file.

BPA Reports
This report can be opened with the ISABPA tool.

What we get from BPA reports
• Static configuration of the TMG
• Build
• EMS
• Array
• Monitoring, logging and reporting
• Firewall policy
• Networking (network configuration, ipconfig, routing table, NIC config, NLB, etc.)
• Services, processes
• Registry keys

TMG Logs, Network Captures, ISATracing (screenshot slides)

Tools
• Netmon (to read network traces)
• Excel (to read TMG logs)
• Text reader/editor tools for searching text.

Scenario
• Issue: Outlook Anywhere users disconnect intermittently, and the admin sees a huge number of failed connections in the TMG live logs with different status codes, e.g. 10054, 64 and 1460.
• Troubleshooting: status codes 10054 and 64 are explained in the following blog post by Thomas Detzner: http://blogs.technet.com/b/isablog/archive/2007/06/25/rpc-overhttp-logging-wildness.aspx. However, there was no explanation for 1460.

Data Analysis
• In this case the admin had already collected the data as a Basic repro at the time of the issue, following the previous engineer's instructions.
• We used the IP address of the client machine that was facing the issue to filter the data.
• Given the nature of the issue, it was clear that four logs would help us most: TMG live logs, network captures, ISATracing (this log can only be used by Microsoft engineers because parsing it requires certain files; I am including it here as a reference for my peers) and ISA BPA (to look at the configuration). We needed to understand why the issue was happening, and these on-the-fly logs were our best bets.

Where to start
• We can start with the BPA reports to understand the configuration: which IP ranges are in the internal network, NIC and IP configuration, NLB configuration, and any obvious alerts.
• Then, in scenarios like this, we can move on to the TMG logs. The admin provided the client machine IP, so we used it to filter all the traffic that came from this client.
• The client IP is masked in the next snapshot, but the idea is that we can use Excel data filters to filter the live logs by client IP.

TMG Logs
After scrolling right in Excel, we looked for the Filter Info column and the corresponding request ID.

ISATracing (for my Microsoft peers only)
• Used a text editor/reader tool to filter the ISATracing log by the request ID found in the TMG live logs.
• Filtered the log and located result code 1460 in it, as shown in the next slide.
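The TMG Logs and ISATracing steps above can also be scripted once the data is exported to text. This is a minimal Python sketch under stated assumptions, not the method used in this case: the live log is assumed to be exported as tab-separated text with the placeholder column names `Client IP` and `Request ID`, and the ISATracing log is assumed to be already formatted as text (which, as noted above, needs Microsoft-internal files) with the request ID appearing on each relevant line. Neither format is confirmed by these slides.

```python
import csv
import io
import re

# Assumption: the TMG live log was exported (for example from Excel) as
# tab-separated text with a header row. These column names are placeholders;
# rename them to match the columns in your own export.
CLIENT_IP_COL = "Client IP"
REQUEST_ID_COL = "Request ID"

# Trailing "<code>(<NAME>)" result field of a formatted trace line, as in
# '..., 0 bytes, "<NULL>", 1460(ERROR_TIMEOUT)'.
RESULT_RE = re.compile(r"(\d+)\((\w+)\)\s*$")


def request_ids_for_client(log_text, client_ip):
    """Step 1: request IDs logged for one client IP, de-duplicated, in log order."""
    ids = []
    for row in csv.DictReader(io.StringIO(log_text), delimiter="\t"):
        if (row.get(CLIENT_IP_COL) or "").strip() != client_ip:
            continue
        rid = (row.get(REQUEST_ID_COL) or "").strip()
        if rid and rid not in ids:
            ids.append(rid)
    return ids


def results_for_request(trace_lines, request_id):
    """Step 2: (code, name) result pairs from trace lines mentioning request_id."""
    # Whole-word match so that ID "A1" does not also hit "A10"
    # (assumes the ID starts and ends with letters or digits).
    id_re = re.compile(rf"\b{re.escape(request_id)}\b")
    results = []
    for line in trace_lines:
        if id_re.search(line):
            m = RESULT_RE.search(line)
            if m:
                results.append((int(m.group(1)), m.group(2)))
    return results
```

The first function stands in for the Excel filter and the second for the text-editor search; running the second on every ID returned by the first quickly shows which requests ended in 1460.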
Zoom in
Noise:WPPISAPUBLIC:(x.x.x.x:1234 ==> x.x.x.x:443) (x.x.x.x:56789 --x.x.x.x:443), 0 bytes, "<NULL>", 1460(ERROR_TIMEOUT)

Network captures
• Filter the Network Monitor traces taken on the internal NIC for this combination: 1.2.3.4:12345 --- 7.8.9.10:443 (sample IPs of the TMG internal NIC and the CAS server, and sample ports, because I am masking the original IPs/ports).
• A filter like tcp.port == 12345 and tcp.port == 443 isolates this unique session, but we could not find the traffic in the network capture because the Netmon buffers had run out due to the heavy traffic.
From the error it was clear that a timeout had occurred because the server did not respond properly. Since we could not get information about this unique session, the next best plan was to check how good the connectivity between TMG and the CAS server was, so we looked at the traces again with a different display filter.

Conclusion and Action Plan
• After looking at these three logs in this sequence, we found why we were getting 1460: it was due to a timeout (ISATracing). The network traces then showed a really bad connection between TMG and the CAS server. To get the complete picture, we told the admin that the current data shows really bad connectivity between TMG and CAS, and that we should also look at it from the CAS end to determine how CAS responds to the traffic sent by TMG at the time of the issue.
• This would tell us whether the CAS server fails to respond because of bad NICs/drivers etc., or whether the cause is a networking device between TMG and CAS. So we suggested that he collect simultaneous data from CAS, TMG and the client at the time of the issue:
  - CAS: Network Monitor captures
  - TMG server: TMG data packager with the web proxy and publishing template
  - Client machine: Network Monitor captures

Few tricks with ISATracing using text editor/reader tools, apart from Request ID
• Keywords:
  - Result code
  - Error (keyword)
  - The hexadecimal error you are getting while reproducing the issue, for example:
    0x8007003a
  - IP address:port combination
  - Failed (keyword)
  - Status code, e.g. 1460

What to do if you see results for the keyword used
• This is where hard work turns into smart work.
• Locate your repro or issue traffic by finding the IP address, port or user name of the user who experienced the issue. Once you have located it, note down the context ID or request ID and search for that ID in the text analysis tool. You will get a single conversation; read through it and check whether you can see the issue in it.
• If your initial search returned multiple request IDs, you might have to repeat this for each of them to locate the problem.
• With practice this process becomes quicker: the more you practice, the better and faster you become at reading this log.

Q&A
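The keyword-then-ID workflow above can also be scripted. The sketch below is a hypothetical helper, not part of any TMG or ISATracing tooling: it assumes the trace is already formatted to text and that each line carries its context/request ID in square brackets (an invented format; adjust `ID_PATTERN`). It also converts between a Win32 status code such as 1460 and its HRESULT form, since `HRESULT_FROM_WIN32` sets facility 7 (0x8007003a, for example, wraps Win32 error 58); searching for both forms is an assumption, not something the slides confirm.

```python
import re

# Hypothetical line format: each formatted trace line is assumed to carry its
# context/request ID in square brackets, e.g. "[r1] ...". Adjust as needed.
ID_PATTERN = re.compile(r"\[(\w+)\]")

# Win32/Winsock names for the codes mentioned in this deck.
WIN32_NAMES = {
    58: "ERROR_BAD_NET_RESP",
    64: "ERROR_NETNAME_DELETED",
    1460: "ERROR_TIMEOUT",
    10054: "WSAECONNRESET",
}


def hresult_from_win32(code):
    """Wrap a positive Win32 code the way HRESULT_FROM_WIN32 does (facility 7)."""
    return 0x80070000 | (code & 0xFFFF)


def win32_from_hresult(hr):
    """Return the Win32 code inside a FACILITY_WIN32 failure HRESULT, else None."""
    if hr & 0xFFFF0000 == 0x80070000:
        return hr & 0xFFFF
    return None


def keywords_for_code(code):
    """Search terms for one status code: decimal, HRESULT hex and name if known.
    Note: plain substring search, so "1460" would also hit e.g. port 51460."""
    terms = [str(code), f"0x{hresult_from_win32(code):08x}"]
    if code in WIN32_NAMES:
        terms.append(WIN32_NAMES[code])
    return terms


def find_ids(lines, keywords):
    """Pass 1: IDs of lines containing any keyword (case-insensitive), first-seen order."""
    wanted = [k.lower() for k in keywords]
    ids = []
    for line in lines:
        text = line.lower()
        if any(k in text for k in wanted):
            m = ID_PATTERN.search(line)
            if m and m.group(1) not in ids:
                ids.append(m.group(1))
    return ids


def conversations(lines, ids):
    """Pass 2: every line for each ID, i.e. one conversation per ID, in log order."""
    convs = {i: [] for i in ids}
    for line in lines:
        m = ID_PATTERN.search(line)
        if m and m.group(1) in convs:
            convs[m.group(1)].append(line)
    return convs
```

Searching first for the client's IP:port, a user name or `keywords_for_code(1460)`, and then pulling every line for the resulting IDs, mirrors the manual process: one search to locate the traffic, a second to read a single conversation end to end.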