[ofw] svn.2079 checkins and hung dapltest.exe server process

Tzachi Dar tzachid at mellanox.co.il
Wed Apr 15 23:38:00 PDT 2009


Please note that once the process can't die you can break in with a
kernel debugger (local or remote) and dump the zombie process. This will
give us a much better understanding of where the problem is.

To do that, please run "!process 0 0" to list all processes, and then
"!process xxx 7", where xxx is the address of the zombie process as shown
in that list. Please send the results and we will be able to tell more
about where the problem is.

Thanks
Tzachi
 

> -----Original Message-----
> From: Smith, Stan [mailto:stan.smith at intel.com] 
> Sent: Thursday, April 16, 2009 3:46 AM
> To: Tzachi Dar; Fab Tillier
> Cc: ofw at lists.openfabrics.org
> Subject: RE: [ofw] svn.2079 checkins and hung dapltest.exe 
> server process
> 
> Hello,
> 
> Tzachi Dar wrote:
> > Have you been using ipoib? (the other option that I see is wsd).
> 
> Yes - IPoIB & WSD both exhibit the hung (zombie) process in 
> exit. WSD on the sending/client side with dapltest IPoIB on 
> the server side.
> 
> >
> > In that case, the only problem that I can think of is sends that
> > were not completed in time.
> >
> > Do you have any message in the event log?
> 
> I always seem to forget about the event log, will check and 
> get back with you.
> I suspect no event log entries in that the process has seen 
> no device/send/recv errors and has proceeded to call exit; 
> the user-mode process is stuck in the kernel - a zombie 
> process unable to die.
> Fab had suggested a lost IRP or non-zero reference counts.
> 
> Stan.
> 
> >
> > Thanks
> > Tzachi
> >
> >> -----Original Message-----
> >> From: ofw-bounces at lists.openfabrics.org 
> >> [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Smith, Stan
> >> Sent: Tuesday, April 14, 2009 3:16 AM
> >> To: Fab Tillier
> >> Cc: ofw at lists.openfabrics.org
> >> Subject: [ofw] svn.2079 checkins and hung dapltest.exe server process
> >>
> >>
> >> Further testing of the hung Windows process has shown the problem in
> >> ttcp.exe (sender) when an IPoIB IPv4 remote address is used.
> >>
> >> A debug version of ibbus.sys did not fire any asserts w.r.t.
> >> reference counting or anything else.
> >>
> >> Attaching windbg to the hung process shows it's waiting in the NT 
> >> kernel - somewhat supports the ref counting theory.
> >> More attention needed here as I did not have current kernel symbols.
> >>
> >> The salient point: this is not 'just' a dapltest.exe issue.
> >>
> >> Cause and effect point to svn.2079: before it, dapltest.exe worked
> >> fine; afterwards, problems.
> >>
> >> Stan.
> >> _______________________________________________
> >> ofw mailing list
> >> ofw at lists.openfabrics.org
> >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
> 
> 


