[SSPCPP-401] IIS App Pool Crash
Created: 28/Oct/11 Updated: 06/Aug/12 Resolved: 06/Aug/12
Status: Closed
Project: Shibboleth SP - C++
Component/s: Error Handling
Affects Version/s: 2.4.3
Fix Version/s: 2.5.0
Type: Bug
Priority: Major
Reporter: jlaweave@idp.protectnetwork.org
Assignee: Scott Cantor
Resolution: Fixed
Labels: Session
Remaining Estimate: 0 minutes
Time Spent: 5 hours, 30 minutes
Original Estimate: 1 day, 4 hours
Environment: Windows 2008R2 (SP1 + current patches), IIS 7.5 and .NET 3.5 framework
Attachments: isapi_shib-2.4.3a.zip, isapi_shib-2.4.3a.zip.asc
Operating System: Windows
CPU Type: x86_64
C/C++ Compiler: Unknown
Web Server: IIS 7 (Windows 2008)
Description
Our IIS web service keeps crashing with the following error:
A process serving application pool 'app.Production' suffered a fatal communication error with
the Windows Process Activation Service. The process id was '2576'. The data field contains the
error number.
A debug of the crash shows an access violation that appears to be caused by Shibboleth as
referenced from a support request we opened with Microsoft (below):
//From Microsoft PSS
I have finished analyzing the dumps you sent Friday and the crash is being caused by the
Shibboleth ISAPI filter. In the crashing call stack below you can see that isapi_shib is calling
into our ServerSupportFunction. When I look at the instruction that is causing the AV in our
code, I can see that the Shibboleth component is passing in a bad pointer for the ul1 parameter,
see http://msdn.microsoft.com/en-us/library/aa503395.aspx.
0:030> kL
Child-SP RetAddr Call Site
00000000`0b23b970 00000001`8000174e filter!W3_FILTER_CONTEXT::ServerSupportFunction+0x174
00000000`0b23bb80 00000001`8000c566 isapi_shib!TerminateFilter+0x45e
00000000`0b23bbc0 00000000`745e6f60 isapi_shib!GetFilterVersion+0x2896
00000000`0b23bc00 00000000`745b3b3c msvcr90!_CallSettingFrame+0x20
00000000`0b23bc30 00000000`77990c21 msvcr90!__CxxCallCatchBlock+0xfc
00000000`0b23bd00 00000001`80007ff2 ntdll!RcFrameConsolidation+0x3
00000000`0b23ea20 000007fe`f67e17e4 isapi_shib!HttpFilterProc+0x2d2
00000000`0b23eec0 000007fe`f67e1e01 filter!W3_FILTER_CONTEXT::NotifyFilters+0x178
00000000`0b23f0e0 000007fe`f8f6a185 filter!GlobalDoWork+0x351
00000000`0b23f310 000007fe`f8f6ab24 iiscore!W3_CONTEXT::SetupStateMachine+0x685
00000000`0b23f820 000007fe`fb4310d2 iiscore!W3_MAIN_CONTEXT::OnNewRequest+0x1b0
00000000`0b23f850 000007fe`fb43109c w3dt!UL_NATIVE_REQUEST::DoWork+0x126
00000000`0b23f8b0 000007fe`f8b01fba w3dt!OverlappedCompletionRoutine+0x1c
00000000`0b23f8e0 000007fe`f8b02024 w3tp!THREAD_POOL_DATA::ThreadPoolThread+0x7a
00000000`0b23f930 000007fe`f8b020a1 w3tp!THREAD_POOL_DATA::ThreadPoolThread+0x34
00000000`0b23f960 00000000`7783652d w3tp!THREAD_MANAGER::ThreadManagerThread+0x61
00000000`0b23f990 00000000`7796c521 kernel32!BaseThreadInitThunk+0xd
00000000`0b23f9c0 00000000`00000000 ntdll!RtlUserThreadStart+0x1d
As you can see from below the AV is caused by an attempted read from 8000e6f0.
0:030> .exr -1
ExceptionAddress: 000007fef67ebc14 (filter!W3_FILTER_CONTEXT::ServerSupportFunction+0x0000000000000174)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 0000000000000000
Parameter[1]: 000000008000e6f0
Attempt to read from address 000000008000e6f0
The memory at the referenced address is free and is Page_Protect
0:030> !address 000000008000e6f0
Usage: Free
Base Address: 00000000`7fff0000
End Address: 00000000`ffb00000
Region Size: 00000000`7fb10000
Type: 00000000
State: 00010000 MEM_FREE
Protect: 00000001 PAGE_NOACCESS
0:030> lmvm isapi_shib
start end module name
00000001`80000000 00000001`80020000 isapi_shib (export symbols) isapi_shib.dll
Loaded symbol image file: isapi_shib.dll
Image path: D:\opt\shibboleth-sp\lib\shibboleth\isapi_shib.dll
Image name: isapi_shib.dll
Timestamp: Sun Jul 03 17:00:27 2011 (4E10D86B)
CheckSum: 00026390
ImageSize: 00020000
File version: 2.4.3.0
Product version: 2.4.3.0
File flags: 0 (Mask 3F)
File OS: 40004 NT Win32
File type: 2.0 Dll
File date: 00000000.00000000
Translations: 0409.04b0
CompanyName: UCAID
ProductName: Shibboleth 2.4.3
InternalName: isapi_shib
OriginalFilename: isapi_shib.dll
ProductVersion: 2, 4, 3, 0
FileVersion: 2, 4, 3, 0
PrivateBuild: 2, 4, 3, 0
SpecialBuild: 2, 4, 3, 0
FileDescription: Shibboleth ISAPI Filter / Extension
LegalCopyright: Copyright © 2011 UCAID
LegalTrademarks: Copyright © 2011 UCAID
Comments: Copyright © 2011 UCAID
Comments
Comment by jlaweave@idp.protectnetwork.org [ 28/Oct/11 ]
Hi Scott,
The actual crash dump exceeds the 10MB upload size allowed here, but I can upload it elsewhere
if required.
Thanks
Comment by Scott Cantor [ 28/Oct/11 ]
Not necessary.
You are using the 64-bit SP? Just want to be sure what to try and reproduce with.
Also, something you might try is to set port and/or sslport attributes in your ISAPI <Site>
element. One of the things the filter is fetching that could lead to the error is SERVER_PORT,
and it only does that if the port isn't set explicitly in the element.
I doubt that's the one failing, but it's worth a try.
Comment by jlaweave@idp.protectnetwork.org [ 28/Oct/11 ]
Scott,
Yes, we are using the 64-bit SP.
This is the current setting in shibboleth-2.xml
<ISAPI normalizeRequest="true" safeHeaderNames="true">
<Site id="3" name="app.servicename.com" scheme="https" port="443"/>
</ISAPI>
Comment by Scott Cantor [ 28/Oct/11 ]
Ok, so it's offloaded SSL, which means it's reading the port from the entry. That eliminates one
source of the problem.
I know how to handle bypassing the requirement for the other two. I would be willing to patch
the 2.4 branch with that fix and get a DLL built that you could use as a test. I don't have plans to
release a 2.4 patch right now, so it would be unreleased code, but it would be good as a test, and
I suspect not crashing on unreleased but checked in code is of more use than crashing on a
release.
Probably will be next week when I have a chance to do it.
Comment by jlaweave@idp.protectnetwork.org [ 28/Oct/11 ]
Hi Scott,
If you get me a test dll, I am willing to put the patch up to test immediately.
Let me know when you have a patch ready.
Thanks in advance,
Joe
Comment by jlaweave@idp.protectnetwork.org [ 31/Oct/11 ]
Hi Scott,
Not to be a pest but do you have an idea on when you will be able to have that test DLL ready?
We're trying to determine what direction to take in handling our production site issues.
Thanks,
Joe
Comment by Scott Cantor [ 31/Oct/11 ]
I have no ETA. Probably some time this week.
Comment by jlaweave@idp.protectnetwork.org [ 01/Nov/11 ]
Hi Scott,
I have been corresponding with Microsoft PSS regarding this issue (and passed along your
observations) and they have replied with the following observations. Can you take a look at
their synopsis and let me know if that helps with resolving the issue or if you can give us any
idea as to what we can do on our end to help?
This issue is really becoming a problem for us. Let me know if you need any additional
information or if you have any other suggestions on how to fix the problem.
//Microsoft PSS response:
I am from the IIS/ASP.NET escalation team and have taken ownership of your support case. I
have been assisting Anjum behind the scenes on the issue so I am already familiar with the issue
and the concerns that the developer from Shibboleth has raised. I have analyzed the latest crash
dumps with the new version of the filter and the problem is still the same as we saw previously.
I will first speak to the issues raised by the developer.
<Shibboleth Dev>
Here, IIS is shutting down my filter and probably unloading it from memory. But it also lets the
call into the IIS server support function to send the response to the client proceed. When that
happens, the static string constant I passed into the function is gone from memory by the time
the call happens.
</Shibboleth Dev>
The developer is misinterpreting part of the call stack so his assertions about what is happening
are incorrect. The debugger is building the call stack using exported methods since we do not
have the correct symbols for the component. He is partially correct in that his filter throws an
exception and goes into the catch block; however, the calls back into his filter are not to
GetFilterVersion and TerminateFilter. These are calls back into the catch block of the filter, but
because we do not have symbols the debugger is showing the wrong methods. Notice the offsets
(highlighted in yellow in the original). These offsets show where we are in the function. They
are very large, and the functions are not even that long, so we can say for sure that these aren’t
the correct methods. What is happening here is that the C++ catch code is creating the exception
object to pass back into his handler and then his handler calls into ServerSupportFunction to
send the response (as he has acknowledged).
0:006> kL
Child-SP RetAddr Call Site
00000000`0165bf70 00000001`8000184e filter!W3_FILTER_CONTEXT::ServerSupportFunction+0x174
00000000`0165c180 00000001`80011426 isapi_shib!TerminateFilter+0x50e
00000000`0165c1c0 00000000`74d48a40 isapi_shib!GetFilterVersion+0x2746
00000000`0165c200 00000000`74d40463 msvcr100!_CallSettingFrame+0x20
00000000`0165c230 00000000`77730c21 msvcr100!__CxxCallCatchBlock+0xeb
00000000`0165c2f0 00000001`8000cf46 ntdll!RcFrameConsolidation+0x3
00000000`0165ecd0 000007fe`f58817e4 isapi_shib!HttpFilterProc+0x3b6
00000000`0165f120 000007fe`f5881e01 filter!W3_FILTER_CONTEXT::NotifyFilters+0x178
00000000`0165f340 000007fe`f5daa185 filter!GlobalDoWork+0x351
00000000`0165f570 000007fe`f5daab24 iiscore!W3_CONTEXT::SetupStateMachine+0x685
00000000`0165fa80 000007fe`f64c10d2 iiscore!W3_MAIN_CONTEXT::OnNewRequest+0x1b0
00000000`0165fab0 000007fe`f64c109c w3dt!UL_NATIVE_REQUEST::DoWork+0x126
00000000`0165fb10 000007fe`f6391fba w3dt!OverlappedCompletionRoutine+0x1c
00000000`0165fb40 000007fe`f6392024 w3tp!THREAD_POOL_DATA::ThreadPoolThread+0x7a
00000000`0165fb90 000007fe`f63920a1 w3tp!THREAD_POOL_DATA::ThreadPoolThread+0x34
00000000`0165fbc0 00000000`774d652d w3tp!THREAD_MANAGER::ThreadManagerThread+0x61
00000000`0165fbf0 00000000`7770c521 kernel32!BaseThreadInitThunk+0xd
00000000`0165fc20 00000000`00000000 ntdll!RtlUserThreadStart+0x1d
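The point about oversized offsets can be illustrated with a minimal sketch of how a debugger without private symbols attributes an address: it picks the nearest exported symbol at or below the address and reports the distance as the offset. The export addresses used here are not taken from the DLL itself; they are derived by subtracting the displayed offsets from the return addresses in the trace above, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Result of attributing an address to a symbol.
struct Resolved {
    std::string symbol;
    std::uint64_t offset;
};

// With only export symbols available, an address is attributed to the
// closest export at or below it, however far away that export is.
inline Resolved resolve_nearest_export(
    const std::map<std::uint64_t, std::string>& exports, std::uint64_t address) {
    auto it = exports.upper_bound(address);  // first export strictly above
    if (it == exports.begin()) {
        return Resolved{"<no symbol>", address};
    }
    --it;  // nearest export at or below the address
    return Resolved{it->second, address - it->first};
}

// Reproduces one frame of the trace. Export addresses are assumptions
// derived from the trace (return address minus displayed offset).
inline void check_against_trace() {
    const std::map<std::uint64_t, std::string> exports{
        {0x180001340ULL, "TerminateFilter"},
        {0x18000CB90ULL, "HttpFilterProc"},
        {0x18000ECE0ULL, "GetFilterVersion"},
    };
    const Resolved r = resolve_nearest_export(exports, 0x180011426ULL);
    assert(r.symbol == "GetFilterVersion");
    assert(r.offset == 0x2746);
}
```

Any code located after the last export, such as a compiler-generated catch block, is reported against that export with a large offset, which is why the frame names in the trace are not trustworthy.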
<Shibboleth Dev>
Something that is supposed to be set for every request by IIS is apparently not set. My logging
is unfortunately not sufficient in that release to tell which piece of information it is, but the set
of possibilities is small.
</Shibboleth Dev>
We have looked at the disassembly and this is not really the case. We have traced the exception
to be coming from the following call in the ISAPI code.
res = stf.getServiceProvider().doAuthentication(stf);
The exception that is getting raised is matching ERROR_NO_DATA so this is giving the false
reading that there is a missing header or server variable. This is something that Shibboleth
support will have to help you track down. While this exception is the trigger for the AV that is
crashing the process, it is a separate issue from the actual AV which I will speak to now.
The AV is happening because of an incorrect cast of a pointer to a DWORD which is causing
the incorrect pointer to be used in the ServerSupportFunction call. The operation that causes the
AV is seen here where we are dereferencing the pointer in register RDI. RDI is supposed to
contain the pointer to the string the ISAPI filter is attempting to send in the response.
filter!W3_FILTER_CONTEXT::ServerSupportFunction+0x174:
000007fe`f588bc14 f2ae repne scas byte ptr [rdi]
As we see below the pointer value in RDI is 0000000080012798
0:006> r
rax=0000000000000000 rbx=000000000112a0f0 rcx=ffffffffffffffff
rdx=000000000165c030 rsi=000000000112b0f8 rdi=0000000080012798
rip=000007fef588bc14 rsp=000000000165bf70 rbp=000007fef588f600
r8=0000000000000100 r9=0000000080012798 r10=0000000000000008
r11=000000000165c170 r12=0000000000000001 r13=0000000080012798
r14=000000018001278c r15=0000000180011400
iopl=0 nv up ei pl zr na po nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010244
filter!W3_FILTER_CONTEXT::ServerSupportFunction+0x174:
000007fe`f588bc14 f2ae repne scas byte ptr [rdi]
If we dump out the contents of what’s at RDI we see
0:006> dc @rdi
00000000`80012798 ???????? ???????? ???????? ???????? ????????????????
00000000`800127a8 ???????? ???????? ???????? ???????? ????????????????
00000000`800127b8 ???????? ???????? ???????? ???????? ????????????????
00000000`800127c8 ???????? ???????? ???????? ???????? ????????????????
00000000`800127d8 ???????? ???????? ???????? ???????? ????????????????
00000000`800127e8 ???????? ???????? ???????? ???????? ????????????????
00000000`800127f8 ???????? ???????? ???????? ???????? ????????????????
00000000`80012808 ???????? ???????? ???????? ???????? ????????????????
And if we check the memory usage at that address we see it is free memory that has not been
allocated yet so this is not a valid pointer.
0:006> !address @rdi
Usage: Free
Base Address: 00000000`7fff0000
End Address: 00000000`ff4a0000
Region Size: 00000000`7f4b0000
Type: 00000000
State: 00010000 MEM_FREE
Protect: 00000001 PAGE_NOACCESS
On a hunch we checked what’s at 0000000180012798 which would be the address if the value
wasn’t truncated by the invalid cast. Here we see clearly the string that the developer was
intending to return.
0:006> dc 0000000180012798
00000001`80012798 6e6e6f43 69746365 203a6e6f 736f6c63 Connection: clos
00000001`800127a8 430a0d65 65746e6f 542d746e 3a657079 e..Content-Type:
00000001`800127b8 78657420 74682f74 0a0d6c6d 00000a0d text/html......
00000001`800127c8 00000000 00000000 4d54483c 483c3e4c ........<HTML><H
00000001`800127d8 3e444145 5449543c 533e454c 62626968 EAD><TITLE>Shibb
00000001`800127e8 74656c6f 69462068 7265746c 72724520 oleth Filter Err
00000001`800127f8 2f3c726f 4c544954 2f3c3e45 44414548 or</TITLE></HEAD
00000001`80012808 4f423c3e 3c3e5944 533e3148 62626968 ><BODY><H1>Shibb
0:006> d
00000001`80012818 74656c6f 69462068 7265746c 72724520 oleth Filter Err
00000001`80012828 2f3c726f 003e3148 4f422f3c 3c3e5944 or</H1>.</BODY><
00000001`80012838 4d54482f 00003e4c 4d54483c 483c3e4c /HTML>..<HTML><H
00000001`80012848 3e444145 5449543c 533e454c 62626968 EAD><TITLE>Shibb
00000001`80012858 74656c6f 72452068 3c726f72 5449542f oleth Error</TIT
00000001`80012868 3c3e454c 4145482f 423c3e44 3e59444f LE></HEAD><BODY>
00000001`80012878 3e31483c 62696853 656c6f62 45206874 <H1>Shibboleth E
00000001`80012888 726f7272 31482f3c 0000003e 00000000 rror</H1>.......
0:006> d
00000001`80012898 69727473 7420676e 6c206f6f 00676e6f string too long.
00000001`800128a8 61766e69 2064696c 69727473 7020676e invalid string p
00000001`800128b8 7469736f 006e6f69 800144d0 00000001 osition..D......
00000001`800128c8 80002b10 00000001 8000f7dc 00000001 .+..............
00000001`800128d8 8000f7e2 00000001 80003d70 00000001 ........p=......
00000001`800128e8 800027f0 00000001 8000f7e8 00000001 .'..............
00000001`800128f8 80002750 00000001 8000f7ee 00000001 P'..............
00000001`80012908 8000f7f4 00000001 8000f7fa 00000001 ..............
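For readers unfamiliar with the dc output above: each 32-bit column is printed as a little-endian value, so the bytes appear reversed relative to the ASCII column on the right. A small sketch of that decoding follows; the function names are hypothetical, and non-printable bytes are rendered as '.' in the same way the debugger does.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Converts 32-bit words, as printed by a debugger dump, back into the byte
// sequence they occupy in memory on a little-endian machine. Bytes outside
// the printable ASCII range are shown as '.'.
inline std::string decode_dwords(const std::vector<std::uint32_t>& words) {
    std::string out;
    for (std::uint32_t w : words) {
        for (int shift = 0; shift < 32; shift += 8) {  // lowest byte first
            const unsigned char byte =
                static_cast<unsigned char>((w >> shift) & 0xFFu);
            out.push_back((byte >= 0x20 && byte < 0x7F)
                              ? static_cast<char>(byte)
                              : '.');
        }
    }
    return out;
}

// Checks the first dump row at 00000001`80012798 against its ASCII column.
inline void check_first_row() {
    const std::string text =
        decode_dwords({0x6e6e6f43u, 0x69746365u, 0x203a6e6fu, 0x736f6c63u});
    assert(text == "Connection: clos");
}
```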
Below (highlighted in yellow in the original) is where the bug is, namely the (DWORD) cast of
ctype in the ServerSupportFunction call:
DWORD WriteClientError(PHTTP_FILTER_CONTEXT pfc, const char* msg)
{
    LogEvent(nullptr, EVENTLOG_ERROR_TYPE, 2100, nullptr, msg);
    static const char* ctype="Connection: close\r\nContent-Type: text/html\r\n\r\n";
    pfc->ServerSupportFunction(pfc,SF_REQ_SEND_RESPONSE_HEADER,"200 OK",(DWORD)ctype,0);
    static const char* xmsg="<HTML><HEAD><TITLE>Shibboleth Filter Error</TITLE></HEAD><BODY>"
                            "<H1>Shibboleth Filter Error</H1>";
We found the same problem in several other places in the vendor's code. This is a problem
because a DWORD will only hold a 32-bit address on x86 or x64. The vendor should probably use a
UINT_PTR instead:
http://msdn.microsoft.com/en-us/library/aa384242(v=VS.85).aspx
You will need to take this up with the developer for a fix in their ISAPI filter. Please let me
know if you have any questions or concerns.
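The truncation described in this analysis can be reproduced without any Windows headers. The following is a minimal sketch, not the filter's code: Dword32 is a stand-in for the 32-bit DWORD type, the helper names are hypothetical, and the addresses are the ones shown in the debugger output above.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the Windows DWORD type, which is 32 bits wide on both
// x86 and x64 builds.
using Dword32 = std::uint32_t;

// Models passing a 64-bit address through a DWORD-typed value:
// only the low 32 bits survive the cast.
inline std::uint64_t through_dword(std::uint64_t address) {
    return static_cast<Dword32>(address);
}

// True when the round trip through a 32-bit value changes the address.
inline bool truncated_by_dword(std::uint64_t address) {
    return through_dword(address) != address;
}

// The string lives at 0x180012798, but the value that reached
// ServerSupportFunction was 0x80012798.
inline void check_ticket_addresses() {
    assert(through_dword(0x0000000180012798ULL) == 0x0000000080012798ULL);
    assert(truncated_by_dword(0x0000000180012798ULL));
}
```

An address below 4 GB passes through unchanged, which is consistent with this class of bug going unnoticed on 32-bit builds and surfacing only when the module is loaded above the 4 GB boundary in a 64-bit process.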
Comment by Scott Cantor [ 01/Nov/11 ]
Yes, that's very helpful. The log entry you provided does prove that the source of the exception
is a piece of request data that's missing. They are in turn correct that the crash is because of the
pointer cast. Their API used to be based on a DWORD parameter (the cast was imposed by their
API), and it's now been changed to take a larger value to accommodate the 64-bit pointer. So
that's an easy fix I will make, but I have to be somewhat careful not to break older IIS support.
That fix alone will just change it from a crash to a failure page the browser will display about
the missing data, as it does for me when I test it. The other fixes still need to be made to figure
out which piece of data is missing and work around it.
I'm sorry this is an emergency for you, but all I can do is work on it when my local work
schedule permits. I expect that I should be able to produce a build to try by tomorrow evening.
Comment by Scott Cantor [ 01/Nov/11 ]
Patch should fix the 64-bit casts and eliminate exceptions when request variables aren't set by
IIS. Missing port is assumed to be 80/443, missing name is assumed to be the canonical vhost
name. Missing INSTANCE_ID is fatal and returns an error to the client.
http://svn.shibboleth.net/view/cpp-sp?rev=3536&view=rev
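The fallback rules described in this comment could look roughly like the sketch below. This is an illustration of the stated behavior only, not the code from the check-in: the RequestTarget struct, the resolve_target function, and the convention that an empty string means "IIS did not set the variable" are all assumptions made for the example.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical holder for the values the filter needs per request.
struct RequestTarget {
    std::string host;
    int port;
    std::string instance_id;
};

// Applies the fallbacks described above. An empty string stands for a
// request variable that IIS did not supply (an assumption of this sketch).
inline RequestTarget resolve_target(const std::string& server_name,
                                    const std::string& server_port,
                                    const std::string& instance_id,
                                    bool secure,
                                    const std::string& canonical_vhost) {
    if (instance_id.empty()) {
        // Missing INSTANCE_ID is fatal; the caller reports an error page.
        throw std::runtime_error("INSTANCE_ID was not set for this request");
    }
    RequestTarget target;
    target.instance_id = instance_id;
    // Missing name falls back to the canonical vhost name.
    target.host = server_name.empty() ? canonical_vhost : server_name;
    // Missing port falls back to the scheme default.
    target.port = server_port.empty() ? (secure ? 443 : 80)
                                      : std::stoi(server_port);
    return target;
}

// Example using the site id and name from the configuration in this ticket.
inline void check_defaults() {
    const RequestTarget t =
        resolve_target("", "", "3", true, "app.servicename.com");
    assert(t.host == "app.servicename.com");
    assert(t.port == 443);
}
```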
Comment by Scott Cantor [ 01/Nov/11 ]
Ok, I've lightly tested this patched version incorporating the check-in. I tested primarily with a
64-bit IIS process to match what you're using.
I signed the archive with my PGP key so you can verify the files if you want to.
Use the appropriate architecture DLL from the ZIP and copy into lib/shibboleth.
Aside from the obvious monitoring, I think you'll need to keep an eye on the Windows Event
log and see if it reports any problems accessing IIS variables. Logging has been added to try and
report on that.
It's my expectation that your system may still have a problem, but hopefully this will stop
crashing it, and will report the problem back to the user. But that won't ultimately fix things for
the user, so we need to monitor it and find out why it was throwing this exception to begin with.
Comment by jlaweave@idp.protectnetwork.org [ 02/Nov/11 ]
Hi Scott,
Due to our circumstances, we put your patch into production early this morning but had to back
out after experiencing problems. What we saw upon starting IIS and the Shibboleth Daemon
was an immediate 99% processor utilization from the w3wp process associated with our
application and subsequent IIS recycle and a box reboot did not fix it. Backing out the fix
resolved the problem.
Unfortunately, we had a small window of opportunity to put the patch out and I did not have
time to try and diagnose what was causing the problem.
For reference, I copied over the two dll's from the x64 zip directory and put them in their
respective shibboleth installation directories (after stopping IIS/Shibboleth Daemon).
Can you think of anything else that I should have done differently?
Comment by Scott Cantor [ 02/Nov/11 ]
No, I saw something similar but it appeared to be unrelated to the SP so I wrote it off at the
time. I don't know what could have been caused by any changes I made, but I'll have to dig
deeper.
In the meantime, I can back out to a simpler patch that just tries to fix the crash alone.
Comment by Scott Cantor [ 02/Nov/11 ]
I replaced the patched version with one that only fixes the pointer cast problem. Since I'm not
seeing the crash, I can't be sure of that, but the cast matches the new API documentation from
MS.
With this change, if the crashing stops, you'll definitely get errors in the browser about the
missing variable, but we won't know for sure what's causing that.
Comment by Scott Cantor [ 02/Nov/11 ]
Disregard the last, I found the CPU bug, simple accident. Patch is now updated again with the
full fix. Will wait for your feedback.
Comment by Scott Cantor [ 02/Nov/11 ]
Patch ported back to development branch.
http://svn.shibboleth.net/view/cpp-sp?rev=3539&view=rev
Comment by Scott Cantor [ 06/Aug/12 ]
No response from reporter to interim fix, but patch was backported into the new release.
Generated at Mon Mar 07 01:20:34 EST 2016 using JIRA 7.0.10#70120sha1:37e3d7a6fc4d580639533e7f7c232c925e554a6a.