[SSPCPP-401] IIS App Pool Crash

Created: 28/Oct/11  Updated: 06/Aug/12  Resolved: 06/Aug/12

Status: Closed
Project: Shibboleth SP - C++
Component/s: Error Handling
Affects Version/s: 2.4.3
Fix Version/s: 2.5.0
Type: Bug
Priority: Major
Reporter: jlaweave@idp.protectnetwork.org
Assignee: Scott Cantor
Resolution: Fixed
Labels: Session
Remaining Estimate: 0 minutes
Time Spent: 5 hours, 30 minutes
Original Estimate: 1 day, 4 hours
Environment: Windows 2008R2 (SP1 + current patches), IIS 7.5 and .NET 3.5 framework
Attachments: isapi_shib-2.4.3a.zip, isapi_shib-2.4.3a.zip.asc
Operating System: Windows
CPU Type: x86_64
C/C++ Compiler: Unknown
Web Server: IIS 7 (Windows 2008)

Description

Our IIS web service keeps crashing with the following error:

A process serving application pool 'app.Production' suffered a fatal communication error with the Windows Process Activation Service. The process id was '2576'. The data field contains the error number.

A debug of the crash shows an access violation that appears to be caused by Shibboleth as referenced from a support request we opened with Microsoft (below):

//From Microsoft PSS
I have finished analyzing the dumps you sent Friday and the crash is being caused by the Shibboleth ISAPI filter. In the crashing call stack below you can see that isapi_shib is calling into our ServerSupportFunction. When I look at the instruction that is causing the AV in our code, I can see that the Shibboleth component is passing in a bad pointer for the ul1 parameter, see http://msdn.microsoft.com/en-us/library/aa503395.aspx.
0:030> kL
Child-SP          RetAddr           Call Site
00000000`0b23b970 00000001`8000174e filter!W3_FILTER_CONTEXT::ServerSupportFunction+0x174
00000000`0b23bb80 00000001`8000c566 isapi_shib!TerminateFilter+0x45e
00000000`0b23bbc0 00000000`745e6f60 isapi_shib!GetFilterVersion+0x2896
00000000`0b23bc00 00000000`745b3b3c msvcr90!_CallSettingFrame+0x20
00000000`0b23bc30 00000000`77990c21 msvcr90!__CxxCallCatchBlock+0xfc
00000000`0b23bd00 00000001`80007ff2 ntdll!RcFrameConsolidation+0x3
00000000`0b23ea20 000007fe`f67e17e4 isapi_shib!HttpFilterProc+0x2d2
00000000`0b23eec0 000007fe`f67e1e01 filter!W3_FILTER_CONTEXT::NotifyFilters+0x178
00000000`0b23f0e0 000007fe`f8f6a185 filter!GlobalDoWork+0x351
00000000`0b23f310 000007fe`f8f6ab24 iiscore!W3_CONTEXT::SetupStateMachine+0x685
00000000`0b23f820 000007fe`fb4310d2 iiscore!W3_MAIN_CONTEXT::OnNewRequest+0x1b0
00000000`0b23f850 000007fe`fb43109c w3dt!UL_NATIVE_REQUEST::DoWork+0x126
00000000`0b23f8b0 000007fe`f8b01fba w3dt!OverlappedCompletionRoutine+0x1c
00000000`0b23f8e0 000007fe`f8b02024 w3tp!THREAD_POOL_DATA::ThreadPoolThread+0x7a
00000000`0b23f930 000007fe`f8b020a1 w3tp!THREAD_POOL_DATA::ThreadPoolThread+0x34
00000000`0b23f960 00000000`7783652d w3tp!THREAD_MANAGER::ThreadManagerThread+0x61
00000000`0b23f990 00000000`7796c521 kernel32!BaseThreadInitThunk+0xd
00000000`0b23f9c0 00000000`00000000 ntdll!RtlUserThreadStart+0x1d

As you can see from below the AV is caused by an attempted read from 8000e6f0.
0:030> .exr -1
ExceptionAddress: 000007fef67ebc14 (filter!W3_FILTER_CONTEXT::ServerSupportFunction+0x0000000000000174)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 0000000000000000
Parameter[1]: 000000008000e6f0
Attempt to read from address 000000008000e6f0

The memory at the referenced address is free and is Page_Protect

0:030> !address 000000008000e6f0
Usage: Free
Base Address: 00000000`7fff0000
End Address: 00000000`ffb00000
Region Size: 00000000`7fb10000
Type: 00000000
State: 00010000 MEM_FREE
Protect: 00000001 PAGE_NOACCESS

0:030> lmvm isapi_shib
start             end               module name
00000001`80000000 00000001`80020000 isapi_shib (export symbols) isapi_shib.dll
Loaded symbol image file: isapi_shib.dll
Image path: D:\opt\shibboleth-sp\lib\shibboleth\isapi_shib.dll
Image name: isapi_shib.dll
Timestamp: Sun Jul 03 17:00:27 2011 (4E10D86B)
CheckSum: 00026390
ImageSize: 00020000
File version: 2.4.3.0
Product version: 2.4.3.0
File flags: 0 (Mask 3F)
File OS: 40004 NT Win32
File type: 2.0 Dll
File date: 00000000.00000000
Translations: 0409.04b0
CompanyName: UCAID
ProductName: Shibboleth 2.4.3
InternalName: isapi_shib
OriginalFilename: isapi_shib.dll
ProductVersion: 2, 4, 3, 0
FileVersion: 2, 4, 3, 0
PrivateBuild: 2, 4, 3, 0
SpecialBuild: 2, 4, 3, 0
FileDescription: Shibboleth ISAPI Filter / Extension
LegalCopyright: Copyright © 2011 UCAID
LegalTrademarks: Copyright © 2011 UCAID
Comments: Copyright © 2011 UCAID

Comments

Comment by jlaweave@idp.protectnetwork.org [ 28/Oct/11 ]
Hi Scott,
The actual crash dump is larger than the 10MB upload size allowed here, but I can upload it elsewhere if required.
Thanks

Comment by Scott Cantor [ 28/Oct/11 ]
Not necessary. You are using the 64-bit SP? Just want to be sure what to try and reproduce with.
Also, something you might try is to set port and/or sslport attributes in your ISAPI <Site> element.
One of the things the filter is fetching that could lead to the error is SERVER_PORT, and it only does that if the port isn't set explicitly in the element. I doubt that's the one failing, but it's worth a try.

Comment by jlaweave@idp.protectnetwork.org [ 28/Oct/11 ]
Scott,
Yes, we are using the 64-bit SP. This is the current setting in shibboleth-2.xml

<ISAPI normalizeRequest="true" safeHeaderNames="true">
    <Site id="3" name="app.servicename.com" scheme="https" port="443"/>
</ISAPI>

Comment by Scott Cantor [ 28/Oct/11 ]
Ok, so it's offloaded SSL, which means it's reading the port from the entry. That eliminates one source of the problem. I know how to handle bypassing the requirement for the other two.
I would be willing to patch the 2.4 branch with that fix and get a DLL built that you could use as a test. I don't have plans to release a 2.4 patch right now, so it would be unreleased code, but it would be good as a test, and I suspect not crashing on unreleased but checked in code is of more use than crashing on a release.
Probably will be next week when I have a chance to do it.

Comment by jlaweave@idp.protectnetwork.org [ 28/Oct/11 ]
Hi Scott,
If you get me a test dll, I am willing to put the patch up to test immediately. Let me know when you have a patch ready.
Thanks in advance,
Joe

Comment by jlaweave@idp.protectnetwork.org [ 31/Oct/11 ]
Hi Scott,
Not to be a pest but do you have an idea on when you will be able to have that test DLL ready? We're trying to determine what direction to take in handling our production site issues.
Thanks,
Joe

Comment by Scott Cantor [ 31/Oct/11 ]
I have no ETA. Probably some time this week.

Comment by jlaweave@idp.protectnetwork.org [ 01/Nov/11 ]
Hi Scott,
I have been corresponding with Microsoft PSS regarding this issue (and passed along your observations) and they have replied with the following observations.
Can you take a look at their synopsis and let me know if that helps with resolving the issue or if you can give us any idea as to what we can do on our end to help? This issue is really becoming a problem for us. Let me know if you need any additional information or if you have any other suggestions on how to fix the problem.

//Microsoft PSS response:
I am from the IIS/ASP.NET escalation team and have taken ownership of your support case. I have been assisting Anjum behind the scenes on the issue so I am already familiar with the issue and the concerns that the developer from Shibboleth has raised. I have analyzed the latest crash dumps with the new version of the filter and the problem is still the same as we saw previously. I will first speak to the issues raised by the developer.

<Shibboleth Dev>
Here, IIS is shutting down my filter and probably unloading it from memory. But it also lets the call into the IIS server support function to send the response to the client proceed. When that happens, the static string constant I passed into the function is gone from memory by the time the call happens.
</Shibboleth Dev>

The developer is misinterpreting part of the call stack, so his assertions about what is happening are incorrect. The debugger is building the call stack using exported methods since we do not have the correct symbols for the component. He is partially correct in that his filter throws an exception and goes into the catch block; however, the calls back into his filter are not to GetFilterVersion and TerminateFilter. These are calls back into the catch block of the filter, but because we do not have symbols the debugger is showing the wrong methods. Notice the offset (highlighted in yellow). These offsets show where we are in the function. These are very large offsets and the functions are not even that long, so we can say for sure that these aren't the correct methods.
What is happening here is that the C++ catch code is creating the exception object to pass back into his handler and then his handler calls into ServerSupportFunction to send the response (as he has acknowledged).

0:006> kL
Child-SP          RetAddr           Call Site
00000000`0165bf70 00000001`8000184e filter!W3_FILTER_CONTEXT::ServerSupportFunction+0x174
00000000`0165c180 00000001`80011426 isapi_shib!TerminateFilter+0x50e
00000000`0165c1c0 00000000`74d48a40 isapi_shib!GetFilterVersion+0x2746
00000000`0165c200 00000000`74d40463 msvcr100!_CallSettingFrame+0x20
00000000`0165c230 00000000`77730c21 msvcr100!__CxxCallCatchBlock+0xeb
00000000`0165c2f0 00000001`8000cf46 ntdll!RcFrameConsolidation+0x3
00000000`0165ecd0 000007fe`f58817e4 isapi_shib!HttpFilterProc+0x3b6
00000000`0165f120 000007fe`f5881e01 filter!W3_FILTER_CONTEXT::NotifyFilters+0x178
00000000`0165f340 000007fe`f5daa185 filter!GlobalDoWork+0x351
00000000`0165f570 000007fe`f5daab24 iiscore!W3_CONTEXT::SetupStateMachine+0x685
00000000`0165fa80 000007fe`f64c10d2 iiscore!W3_MAIN_CONTEXT::OnNewRequest+0x1b0
00000000`0165fab0 000007fe`f64c109c w3dt!UL_NATIVE_REQUEST::DoWork+0x126
00000000`0165fb10 000007fe`f6391fba w3dt!OverlappedCompletionRoutine+0x1c
00000000`0165fb40 000007fe`f6392024 w3tp!THREAD_POOL_DATA::ThreadPoolThread+0x7a
00000000`0165fb90 000007fe`f63920a1 w3tp!THREAD_POOL_DATA::ThreadPoolThread+0x34
00000000`0165fbc0 00000000`774d652d w3tp!THREAD_MANAGER::ThreadManagerThread+0x61
00000000`0165fbf0 00000000`7770c521 kernel32!BaseThreadInitThunk+0xd
00000000`0165fc20 00000000`00000000 ntdll!RtlUserThreadStart+0x1d

<Shibboleth Dev>
Something that is supposed to be set for every request by IIS is apparently not set. My logging is unfortunately not sufficient in that release to tell which piece of information it is, but the set of possibilities is small.
</Shibboleth Dev>

We have looked at the disassembly and this is not really the case. We have traced the exception to be coming from the following call in the ISAPI code.
res = stf.getServiceProvider().doAuthentication(stf);

The exception that is getting raised is matching ERROR_NO_DATA so this is giving the false reading that there is a missing header or server variable. This is something that Shibboleth support will have to help you track down. While this exception is the trigger for the AV that is crashing the process, it is a separate issue from the actual AV which I will speak to now.

The AV is happening because of an incorrect cast of a pointer to a DWORD which is causing the incorrect pointer to be used in the ServerSupportFunction call. The operation that causes the AV is seen here where we are dereferencing the pointer in register RDI. RDI is supposed to contain the pointer to the string the ISAPI filter is attempting to send in the response.

filter!W3_FILTER_CONTEXT::ServerSupportFunction+0x174:
000007fe`f588bc14 f2ae repne scas byte ptr [rdi]

As we see below the pointer value in RDI is 0000000080012798

0:006> r
rax=0000000000000000 rbx=000000000112a0f0 rcx=ffffffffffffffff
rdx=000000000165c030 rsi=000000000112b0f8 rdi=0000000080012798
rip=000007fef588bc14 rsp=000000000165bf70 rbp=000007fef588f600
r8=0000000000000100 r9=0000000080012798 r10=0000000000000008
r11=000000000165c170 r12=0000000000000001 r13=0000000080012798
r14=000000018001278c r15=0000000180011400
iopl=0 nv up ei pl zr na po nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010244
filter!W3_FILTER_CONTEXT::ServerSupportFunction+0x174:
000007fe`f588bc14 f2ae repne scas byte ptr [rdi]

If we dump out the contents of what’s at RDI we see

0:006> dc @rdi
00000000`80012798 ???????? ???????? ???????? ???????? ????????????????
00000000`800127a8 ???????? ???????? ???????? ???????? ????????????????
00000000`800127b8 ???????? ???????? ???????? ???????? ????????????????
00000000`800127c8 ???????? ???????? ???????? ???????? ????????????????
00000000`800127d8 ???????? ???????? ???????? ???????? ????????????????
00000000`800127e8 ???????? ???????? ???????? ???????? ????????????????
00000000`800127f8 ???????? ???????? ???????? ???????? ????????????????
00000000`80012808 ???????? ???????? ???????? ???????? ????????????????

And if we check the memory usage at that address we see it is free memory that has not been allocated yet so this is not a valid pointer.

0:006> !address @rdi
Usage: Free
Base Address: 00000000`7fff0000
End Address: 00000000`ff4a0000
Region Size: 00000000`7f4b0000
Type: 00000000
State: 00010000 MEM_FREE
Protect: 00000001 PAGE_NOACCESS

On a hunch we checked what’s at 0000000180012798 which would be the address if the value wasn’t truncated by the invalid cast. Here we see clearly the string that the developer was intending to return.

0:006> dc 0000000180012798
00000001`80012798 6e6e6f43 69746365 203a6e6f 736f6c63 Connection: clos
00000001`800127a8 430a0d65 65746e6f 542d746e 3a657079 e..Content-Type:
00000001`800127b8 78657420 74682f74 0a0d6c6d 00000a0d  text/html......
00000001`800127c8 00000000 00000000 4d54483c 483c3e4c ........<HTML><H
00000001`800127d8 3e444145 5449543c 533e454c 62626968 EAD><TITLE>Shibb
00000001`800127e8 74656c6f 69462068 7265746c 72724520 oleth Filter Err
00000001`800127f8 2f3c726f 4c544954 2f3c3e45 44414548 or</TITLE></HEAD
00000001`80012808 4f423c3e 3c3e5944 533e3148 62626968 ><BODY><H1>Shibb
0:006> d
00000001`80012818 74656c6f 69462068 7265746c 72724520 oleth Filter Err
00000001`80012828 2f3c726f 003e3148 4f422f3c 3c3e5944 or</H1>.</BODY><
00000001`80012838 4d54482f 00003e4c 4d54483c 483c3e4c /HTML>..<HTML><H
00000001`80012848 3e444145 5449543c 533e454c 62626968 EAD><TITLE>Shibb
00000001`80012858 74656c6f 72452068 3c726f72 5449542f oleth Error</TIT
00000001`80012868 3c3e454c 4145482f 423c3e44 3e59444f LE></HEAD><BODY>
00000001`80012878 3e31483c 62696853 656c6f62 45206874 <H1>Shibboleth E
00000001`80012888 726f7272 31482f3c 0000003e 00000000 rror</H1>.......
0:006> d
00000001`80012898 69727473 7420676e 6c206f6f 00676e6f string too long.
00000001`800128a8 61766e69 2064696c 69727473 7020676e invalid string p
00000001`800128b8 7469736f 006e6f69 800144d0 00000001 osition..D......
00000001`800128c8 80002b10 00000001 8000f7dc 00000001 .+..............
00000001`800128d8 8000f7e2 00000001 80003d70 00000001 ........p=......
00000001`800128e8 800027f0 00000001 8000f7e8 00000001 .'..............
00000001`800128f8 80002750 00000001 8000f7ee 00000001 P'..............
00000001`80012908 8000f7f4 00000001 8000f7fa 00000001 ..............

Below is where the bug is (the (DWORD)ctype cast, highlighted in yellow in the original):

DWORD WriteClientError(PHTTP_FILTER_CONTEXT pfc, const char* msg)
{
    LogEvent(nullptr, EVENTLOG_ERROR_TYPE, 2100, nullptr, msg);
    static const char* ctype="Connection: close\r\nContent-Type: text/html\r\n\r\n";
    pfc->ServerSupportFunction(pfc,SF_REQ_SEND_RESPONSE_HEADER,"200 OK",(DWORD)ctype,0);
    static const char* xmsg="<HTML><HEAD><TITLE>Shibboleth Filter Error</TITLE></HEAD><BODY>"
        "<H1>Shibboleth Filter Error</H1>";

We found the same problem in several other places in the vendor's code. This is a problem because a DWORD will only hold a 32-bit address on x86 or x64. The vendor should probably use a UINT_PTR instead: http://msdn.microsoft.com/en-us/library/aa384242(v=VS.85).aspx

You will need to take this up with the developer for a fix in their ISAPI filter. Please let me know if you have any questions or concerns.

Comment by Scott Cantor [ 01/Nov/11 ]
Yes, that's very helpful. The log entry you provided does prove that the source of the exception is a piece of request data that's missing.
They are in turn correct that the crash is because of the pointer cast. Their API used to be based on a DWORD parameter (the cast was imposed by their API), and it's now been changed to take a larger value to accommodate the 64-bit pointer. So that's an easy fix I will make, but I have to be somewhat careful not to break older IIS support.
That fix alone will just change it from a crash to a failure page the browser will display about the missing data, as it does for me when I test it. The other fixes still need to be made to figure out which piece of data is missing and work around it.
I'm sorry this is an emergency for you, but all I can do is work on it when my local work schedule permits. I expect that I should be able to produce a build to try by tomorrow evening.

Comment by Scott Cantor [ 01/Nov/11 ]
Patch should fix the 64-bit casts and eliminate exceptions when request variables aren't set by IIS. Missing port is assumed to be 80/443, missing name is assumed to be the canonical vhost name. Missing INSTANCE_ID is fatal and returns an error to the client.
http://svn.shibboleth.net/view/cpp-sp?rev=3536&view=rev

Comment by Scott Cantor [ 01/Nov/11 ]
Ok, I've lightly tested this patched version incorporating the check-in. I tested primarily with a 64-bit IIS process to match what you're using. I signed the archive with my PGP key so you can verify the files if you want to.
Use the appropriate architecture DLL from the ZIP and copy it into lib/shibboleth.
Aside from the obvious monitoring, I think you'll need to keep an eye on the Windows Event log and see if it reports any problems accessing IIS variables. Logging has been added to try and report on that.
It's my expectation that your system may still have a problem, but hopefully this will stop crashing it, and will report the problem back to the user. But that won't ultimately fix things for the user, so we need to monitor it and find out why it was throwing this exception to begin with.

Comment by jlaweave@idp.protectnetwork.org [ 02/Nov/11 ]
Hi Scott,
Due to our circumstances, we put your patch into production early this morning but had to back out after experiencing problems.
What we saw upon starting IIS and the Shibboleth Daemon was an immediate 99% processor utilization from the w3wp process associated with our application; a subsequent IIS recycle and a box reboot did not fix it. Backing out the fix resolved the problem. Unfortunately, we had a small window of opportunity to put the patch out and I did not have time to try and diagnose what was causing the problem.
For reference, I copied over the two DLLs from the x64 zip directory and put them in their respective shibboleth installation directories (after stopping IIS/Shibboleth Daemon). Can you think of anything else that I should have done differently?

Comment by Scott Cantor [ 02/Nov/11 ]
No, I saw something similar but it appeared to be unrelated to the SP so I wrote it off at the time. I don't know what could have been caused by any changes I made, but I'll have to dig deeper.
In the meantime, I can back out to a simpler patch that just tries to fix the crash alone.

Comment by Scott Cantor [ 02/Nov/11 ]
I replaced the patched version with one that only fixes the pointer cast problem. Since I'm not seeing the crash, I can't be sure of that, but the cast matches the new API documentation from MS.
With this change, if the crashing stops, you'll definitely get errors in the browser about the missing variable, but we won't know for sure what's causing that.

Comment by Scott Cantor [ 02/Nov/11 ]
Disregard the last, I found the CPU bug, simple accident. Patch is now updated again with the full fix. Will wait for your feedback.

Comment by Scott Cantor [ 02/Nov/11 ]
Patch ported back to development branch.
http://svn.shibboleth.net/view/cpp-sp?rev=3539&view=rev

Comment by Scott Cantor [ 06/Aug/12 ]
No response from reporter to interim fix, but patch was backported into the new release.

Generated at Mon Mar 07 01:20:34 EST 2016 using JIRA 7.0.10#70120sha1:37e3d7a6fc4d580639533e7f7c232c925e554a6a.