[Openib-windows] races on __destrot_obj function
Fabian Tillier
ftillier at silverstorm.com
Tue Jul 11 09:05:50 PDT 2006
Hi Yossi,
On 7/11/06, Yossi Leybovich <sleybo at mellanox.co.il> wrote:
>
>
> > -----Original Message-----
> > From: ftillier.sst at gmail.com [mailto:ftillier.sst at gmail.com]
> > On Behalf Of Fabian Tillier
> > Sent: Tuesday, July 11, 2006 6:16 PM
> > To: Yossi Leybovich
> > Cc: openib-windows at openib.org
> > Subject: Re: [Openib-windows] races on __destrot_obj function
> >
> > Hi Yossi,
> >
> > On 7/11/06, Yossi Leybovich <sleybo at mellanox.co.il> wrote:
> > > Attached is patch that solve this problem
> > >
> > > It includes:
> > >
> > > 1. fix to cl_obj to return value if the parent port is not in
> > > CL_INITIALIZED
> >
> > I think this should be an assertion - the client should be
> > able to check its state before trying to make this call - the
> > assertion would trap improper usage.
> >
>
> Why just ASSERT ? Look at the al_obj it uses similar mechanism but al
> obj does not allowed to add child to parent obj that is not in
> CL_INITIALIZED state.
> Why we need to remember to check the state in all passes of child
> creation let the code do that for us (same as attched_al..)
>
> > A state check in ipoib_port_resume should be all that's
> > needed to prevent the MC join from being kicked off.
>
> I want general solution not just to the ipoib_port, but to all clients
> that use cl_obj (adapter,endpt).
> This way we can keep resume similar to send so that the code try to send
> but fail because the QP is in error state or because ethere is no endpt,
> no need to check port/adapter state.
Actually, you're right - we'd need to do more than just check the port
state, but sychronize against it changing. Having the function return
is cleaner.
> > > 2. fix to insert/insert_locked function to check return value
> >
> > Same as above.
>
> Even if we move the check from cl_obj
> the client (ipoib) still need to check its state before trying to add
> itself to the father list.
> So insert/insert_lock might fail so they should be change to return
> value.
>
> >
> > > 3. add ref count mechanism to help debug ref count problems
> >
> > This is good.
> >
> > > 4. not call port_resume in endpt cleanup but in port destroy
> >
> > I think this is needed for the case where the adapter was
> > part of an MC group, had some sends posted, and then the MC
> > group is being explicitly left. If the sends to that MC
> > group where posted before the MC group is resolved, they
> > would block the send queue until some other code path calls
> > ipoib_port_resume.
>
> But ipoib_port_resume does not guaranty that you flush all the sends to
> that MC group.
> The send to the MC group might be at the end of the pending_list, so if
> the sq_depth is full you still end up destroy the endpt while having
> outstanding sends to that MC.
> Any way if IPoIB come to MC send with out matching endpt :
> 1. if its 10005E.. MAC ,it will create new endpt and try to send - so
> its OK
> 2. other MC MAC drop the send - OK.
You're right, I didn't think of the case where a different send would
block the send queue...
I'm looking closer at the patch and have some other comments - but
I'll send those in a different message.
Thanks!
- Fab
More information about the ofw
mailing list