[Openib-windows] IPoIB crash

Fabian Tillier ftillier at silverstorm.com
Mon Sep 11 10:19:37 PDT 2006


Hi Anatoly,

On 9/5/06, Fabian Tillier <ftillier at silverstorm.com> wrote:
> Hi Anatoly,
>

<snip...>

> > From analyzing the crash dump, I've found that
> > p_endpt->lid_item->pool_item.list_item->p_next is NULL.
> >
> > The crash itself happens in the line "p_list_item->p_next->p_prev=
> > p_list_item->p_prev" in the inline function __cl_primitive_remove() called
> > from cl_fmap_remove_item()
> >
> > I've searched for unprotected changes of lid_item, and found the following (in
> > __path_query_cb):
> >
> > if( !p_endpt->dlid )
> > {
> >     cl_map_item_t   *p_qitem;
> >
> >     /* This is a subnet local endpoint that does not have its LID set. */
> >     p_endpt->dlid = p_path->dlid;
> >
> >     /*
> >      * Insert the item in the LID map so that locally routed unicast
> >      * traffic will resolve it properly.
> >      */
> >     cl_obj_lock( &p_port->obj );
> >     p_qitem = cl_qmap_insert( &p_port->endpt_mgr.lid_endpts,
> >         p_endpt->dlid, &p_endpt->lid_item );
> >     CL_ASSERT( p_qitem == &p_endpt->lid_item );
> >     cl_obj_unlock( &p_port->obj );
> > }
> >
> > What do you say ?
>
> That's definitely a bug.
>
> > Do we need to lock the reference to p_endpt->dlid with cl_obj_lock/unlock(
> > &p_endpt->obj ) ?
>
> You need a lock, but it needs to be the port object's lock since that
> is what is held when the LID is checked in __endpt_mgr_reset_all.
>
> We need to take the port lock before the if( !p_endpt->dlid ).
>
> Can you try the attached patch and see if it resolves the issue?  If
> it does, let me know and I will check it in.

I checked in the patch I had sent as revision 491.
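
For anyone reading the archive without the attachment, here is a rough sketch of the locking pattern discussed above. It is not the checked-in patch. The types and functions (port_t, endpt_t, path_query_set_lid, reset_all) are hypothetical stand-ins, a pthread mutex stands in for cl_obj_lock( &p_port->obj ), and a plain linked list stands in for the lid_endpts qmap. The reset routine's behavior is an assumption made only for illustration. The point is that the dlid test and the map insert happen under the same lock that the reset path holds.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the real IPoIB/complib types. */
typedef struct endpt {
    uint16_t      dlid;      /* 0 means "LID not yet known" */
    struct endpt *lid_next;  /* stands in for lid_item */
} endpt_t;

typedef struct port {
    pthread_mutex_t lock;       /* stands in for the port object's lock */
    endpt_t        *lid_endpts; /* stands in for endpt_mgr.lid_endpts */
    size_t          lid_count;
} port_t;

static void port_init(port_t *p_port)
{
    pthread_mutex_init(&p_port->lock, NULL);
    p_port->lid_endpts = NULL;
    p_port->lid_count = 0;
}

/*
 * Path query completion: take the port lock BEFORE testing dlid, so the
 * test and the insert form one critical section.
 * Returns 1 if the endpoint was inserted, 0 if it already had a LID.
 */
static int path_query_set_lid(port_t *p_port, endpt_t *p_endpt,
                              uint16_t path_dlid)
{
    int inserted = 0;

    assert(p_port != NULL && p_endpt != NULL);

    pthread_mutex_lock(&p_port->lock);
    if (!p_endpt->dlid) {
        p_endpt->dlid = path_dlid;
        p_endpt->lid_next = p_port->lid_endpts;
        p_port->lid_endpts = p_endpt;
        p_port->lid_count++;
        inserted = 1;
    }
    pthread_mutex_unlock(&p_port->lock);

    return inserted;
}

/*
 * Reset path (assumed behavior): under the same port lock, remove every
 * endpoint from the LID list and clear its dlid.
 */
static void reset_all(port_t *p_port)
{
    pthread_mutex_lock(&p_port->lock);
    while (p_port->lid_endpts) {
        endpt_t *p_endpt = p_port->lid_endpts;
        p_port->lid_endpts = p_endpt->lid_next;
        p_endpt->lid_next = NULL;
        p_endpt->dlid = 0;
    }
    p_port->lid_count = 0;
    pthread_mutex_unlock(&p_port->lock);
}
```

With the check inside the lock, a reset cannot run between the dlid test and the insert, and a second completion for the same endpoint sees a non-zero dlid and leaves the list alone.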

- Fab



