[Openib-windows] A problem in ib_close_al

Leonid Keller leonid at mellanox.co.il
Tue Jul 25 01:50:34 PDT 2006


 

> -----Original Message-----
> From: ftillier.sst at gmail.com [mailto:ftillier.sst at gmail.com] 
> On Behalf Of Fabian Tillier
> Sent: Tuesday, July 25, 2006 4:11 AM
> To: Leonid Keller
> Cc: openib-windows at openib.org
> Subject: Re: [Openib-windows] A problem in ib_close_al
> 
> Hi again Leo,
> 
> On 7/23/06, Leonid Keller <leonid at mellanox.co.il> wrote:
> > Hi Fab,
> > It seems I have found the cause of the hang on shutdown.
> > Attached are 2 patches for problems that I came across while
> > investigating this case.
> > Here is a short description.
> > 1. (a bug responsible for the hang)
> >        If a send MAD times out, it is sent once more, so one can
> >        get 2 responses for it.
> 
> I'm confused here - the code will retry a send only as many 
> times as specified by the retry_cnt field.  I don't see where 
> the extra send comes from.  Can you explain?

I didn't check retry_cnt, and I'm not sure I can explain why it gets
here, but it does.
> 
> I do however see that a timeout of a preceding send could 
> result in a retry, and two responses could be received before 
> that send completes.
> This, however, seems extremely unlikely, and that is the only 
> time that the response MAD could be leaked.  It's not 
> impossible, though, so the check you suggest is correct - 
> I've committed a similar fix in revision 429.
> 
> Please let me know if this solves the leak or if there is 
> still some other issue.
> 
> Thanks,
> 
> - Fab
> 
Thank you, we'll check.
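
For readers of the archive, the scenario discussed above (a send that is
retried after a timeout and then receives two responses) can be sketched
roughly as follows. This is only an illustration under stated assumptions,
not the code from the patches or from revision 429: the types mad_buf_t and
send_req_t, the counter g_live_mads, and the functions mad_alloc, mad_free,
on_response and demo_duplicate_response are all hypothetical stand-ins. The
point is the check in on_response: if a response is already attached to the
send, the second one is freed instead of overwriting the first, which would
otherwise leak a MAD.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the real IBAL structures. */
typedef struct mad_buf {
    unsigned long tid;        /* transaction ID carried by the MAD */
} mad_buf_t;

typedef struct send_req {
    unsigned long tid;        /* transaction ID of the outstanding send */
    mad_buf_t    *p_resp;     /* response matched to this send, if any */
} send_req_t;

static int g_live_mads = 0;   /* MAD buffers allocated and not yet freed */

static mad_buf_t *mad_alloc(unsigned long tid)
{
    mad_buf_t *p_mad = malloc(sizeof(*p_mad));
    assert(p_mad != NULL);    /* keep the sketch short: no error path */
    p_mad->tid = tid;
    g_live_mads++;
    return p_mad;
}

static void mad_free(mad_buf_t *p_mad)
{
    if (p_mad != NULL) {
        free(p_mad);
        g_live_mads--;
    }
}

/* Returns 1 if the response was attached to the send, 0 if it was dropped. */
static int on_response(send_req_t *p_send, mad_buf_t *p_mad)
{
    /* Drop unrelated MADs and, crucially, a second response to a retried
     * send. Overwriting p_resp here would leak the first response. */
    if (p_mad->tid != p_send->tid || p_send->p_resp != NULL) {
        mad_free(p_mad);
        return 0;
    }
    p_send->p_resp = p_mad;
    return 1;
}

/* Simulates one send that was retried and answered twice.
 * Returns the number of MAD buffers still outstanding afterwards. */
int demo_duplicate_response(void)
{
    send_req_t send = { 42UL, NULL };

    int kept_first  = on_response(&send, mad_alloc(42UL));
    int kept_second = on_response(&send, mad_alloc(42UL));
    assert(kept_first == 1 && kept_second == 0);

    /* The send completes; hand back the single attached response. */
    mad_free(send.p_resp);
    send.p_resp = NULL;

    return g_live_mads;
}
```

Calling demo_duplicate_response() from a small main() should report no
outstanding buffers; without the p_resp check one buffer would remain
allocated, which is the kind of leak that could keep a close from finishing.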



