[ofw] svn.2079 checkins and hung dapltest.exe server process

Smith, Stan stan.smith at intel.com
Wed Apr 15 17:46:01 PDT 2009


Hello,

Tzachi Dar wrote:
> Have you been using ipoib? (the other option that I see is wsd).

Yes. IPoIB and WSD both exhibit the hung (zombie) process on exit: WSD on the sending/client side, with dapltest over IPoIB on the server side.

>
> In that case, the only problem that I can think of is sends
> that were not completed in time.
>
> Do you have any message in the event log?

I always seem to forget about the event log; I will check and get back to you.
I suspect there will be no event log entries, since the process has seen no device/send/recv errors and has proceeded to call exit; the user-mode process is stuck in the kernel, a zombie process unable to die.
Fab had suggested a lost IRP or non-zero reference counts.

Stan.

>
> Thanks
> Tzachi
>
>> -----Original Message-----
>> From: ofw-bounces at lists.openfabrics.org
>> [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Smith, Stan
>> Sent: Tuesday, April 14, 2009 3:16 AM
>> To: Fab Tillier
>> Cc: ofw at lists.openfabrics.org
>> Subject: [ofw] svn.2079 checkins and hung dapltest.exe server process
>>
>>
>> Further testing of the hung Windows process has shown the same problem
>> in ttcp.exe (sender) when an IPoIB IPv4 remote address is used.
>>
>> A debug version of ibbus.sys did not fire any asserts w.r.t.
>> reference counting or anything else.
>>
>> Attaching windbg to the hung process shows it's waiting in
>> the NT kernel - somewhat supports the ref counting theory.
>> More attention needed here as I did not have current kernel symbols.
>>
>> The salient point: this is not 'just' a dapltest.exe issue.
>>
>> Cause and effect point to svn.2079: before it, dapltest.exe
>> worked fine; afterwards, problems.
>>
>> Stan.



