[ofw] svn.2079 checkins and hung dapltest.exe server process
Tzachi Dar
tzachid at mellanox.co.il
Wed Apr 15 23:38:00 PDT 2009
Please note that once the process is stuck and can't die, you can break in
with a kernel debugger (local or remote) and dump the zombie process. This
will give us a much better understanding of where the problem is.
To do that, please run "!process 0 0" to see the list of processes and
then "!process xxx 7", where xxx is the address of the zombie process taken
from that list.
Please send the results and we will be able to tell more about where the
problem is.
Thanks
Tzachi
> -----Original Message-----
> From: Smith, Stan [mailto:stan.smith at intel.com]
> Sent: Thursday, April 16, 2009 3:46 AM
> To: Tzachi Dar; Fab Tillier
> Cc: ofw at lists.openfabrics.org
> Subject: RE: [ofw] svn.2079 checkins and hung dapltest.exe
> server process
>
> Hello,
>
> Tzachi Dar wrote:
> > Have you been using IPoIB? (The other option that I see is WSD.)
>
> Yes, IPoIB and WSD both exhibit the hung (zombie) process on
> exit: WSD on the sending/client side, with dapltest over IPoIB
> on the server side.
>
> >
> > In that case, the only problem that I can think of is sends
> > that were not completed in time.
> >
> > Do you have any message in the event log?
>
> I always seem to forget about the event log; I will check and
> get back to you.
> I suspect there are no event log entries, since the process has
> seen no device/send/recv errors and has proceeded to call exit.
> The user-mode process is stuck in the kernel: a zombie
> process unable to die.
> Fab had suggested a lost IRP or non-zero reference counts.
>
> Stan.
>
> >
> > Thanks
> > Tzachi
> >
> >> -----Original Message-----
> >> From: ofw-bounces at lists.openfabrics.org
> >> [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Smith, Stan
> >> Sent: Tuesday, April 14, 2009 3:16 AM
> >> To: Fab Tillier
> >> Cc: ofw at lists.openfabrics.org
> >> Subject: [ofw] svn.2079 checkins and hung dapltest.exe server process
> >>
> >>
> >> Further testing of the hung Windows process has shown the problem
> >> also occurs in ttcp.exe (sender) when an IPoIB IPv4 remote address
> >> is used.
> >>
> >> A debug version of ibbus.sys did not fire any asserts w.r.t.
> >> reference counting or anything else.
> >>
> >> Attaching WinDbg to the hung process shows it is waiting in the NT
> >> kernel, which somewhat supports the ref-counting theory.
> >> More attention is needed here, as I did not have current kernel
> >> symbols.
> >>
> >> The salient point: this is not 'just' a dapltest.exe issue.
> >>
> >> Cause and effect points out that dapltest.exe worked fine before
> >> svn.2079 and has had problems since.
> >>
> >> Stan.
> >> _______________________________________________
> >> ofw mailing list
> >> ofw at lists.openfabrics.org
> >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
>
>