[Openib-windows] IPoIB crash
Fabian Tillier
ftillier at silverstorm.com
Mon Sep 11 10:19:37 PDT 2006
Hi Anatoly,
On 9/5/06, Fabian Tillier <ftillier at silverstorm.com> wrote:
> Hi Anatoly,
>
<snip...>
> > From analyzing the crash dump, I've found that
> > p_endpt->lid_item->pool_item.list_item->p_next is NULL.
> >
> > The crash itself happens in the line "p_list_item->p_next->p_prev=
> > p_list_item->p_prev" in the inline function __cl_primitive_remove() called
> > from cl_fmap_remove_item()
> >
> > I searched for unprotected changes to lid_item and found the following (in
> > __path_query_cb):
> >
> > if( !p_endpt->dlid )
> > {
> >     cl_map_item_t *p_qitem;
> >
> >     /* This is a subnet local endpoint that does not have its LID set. */
> >     p_endpt->dlid = p_path->dlid;
> >
> >     /*
> >      * Insert the item in the LID map so that locally routed unicast
> >      * traffic will resolve it properly.
> >      */
> >     cl_obj_lock( &p_port->obj );
> >     p_qitem = cl_qmap_insert( &p_port->endpt_mgr.lid_endpts,
> >         p_endpt->dlid, &p_endpt->lid_item );
> >     CL_ASSERT( p_qitem == &p_endpt->lid_item );
> >     cl_obj_unlock( &p_port->obj );
> > }
> >
> > What do you think?
>
> That's definitely a bug.
>
> > Do we need to lock the reference to p_endpt->dlid with cl_obj_lock/unlock(
> > &p_endpt->obj ) ?
>
> You need a lock, but it needs to be the port object's lock since that
> is what is held when the LID is checked in __endpt_mgr_reset_all.
>
> We need to take the port lock before the if( !p_endpt->dlid ).
>
> Can you try the attached patch and see if it resolves the issue? If
> it does, let me know and I will check it in.
I checked in the patch I had sent as revision 491.
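
For anyone reading this in the archive without the attachment, here is a rough, self-contained sketch of the idea. It is not the checked-in patch. The types and function names (port_t, endpt_t, path_query_set_lid, endpt_reset) are hypothetical stand-ins, a pthread mutex stands in for cl_obj_lock/cl_obj_unlock on the port object, and a counter stands in for the lid_endpts map. The reset function is only an assumption about what the reset path does with the LID. The point is that the test of dlid and the map update sit under the same port lock on both paths:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the endpoint object. */
typedef struct endpt {
    uint16_t dlid;       /* 0 means the LID has not been set yet */
    int      in_lid_map; /* models membership of lid_item in the LID map */
} endpt_t;

/* Hypothetical stand-in for the port object. */
typedef struct port {
    pthread_mutex_t lock;      /* models the port object's lock */
    size_t          lid_count; /* models the number of items in the LID map */
} port_t;

static void port_init(port_t *p_port)
{
    pthread_mutex_init(&p_port->lock, NULL);
    p_port->lid_count = 0;
}

/* Models the path query callback: check and insert under one lock hold. */
static void path_query_set_lid(port_t *p_port, endpt_t *p_endpt,
                               uint16_t path_dlid)
{
    pthread_mutex_lock(&p_port->lock);   /* taken BEFORE the dlid test */
    if (!p_endpt->dlid) {
        p_endpt->dlid = path_dlid;
        assert(!p_endpt->in_lid_map);    /* must not already be in the map */
        p_endpt->in_lid_map = 1;         /* models cl_qmap_insert() */
        p_port->lid_count++;
    }
    pthread_mutex_unlock(&p_port->lock);
}

/* Models a reset path (assumed behavior): check and remove under the
 * same lock, so it never sees dlid set without the map entry present. */
static void endpt_reset(port_t *p_port, endpt_t *p_endpt)
{
    pthread_mutex_lock(&p_port->lock);
    if (p_endpt->dlid) {
        assert(p_endpt->in_lid_map);
        p_endpt->in_lid_map = 0;         /* models removal from the map */
        p_port->lid_count--;
        p_endpt->dlid = 0;
    }
    pthread_mutex_unlock(&p_port->lock);
}
```

With the lock taken only around the insert, as in the quoted code, another thread holding the port lock could observe a non-zero dlid while lid_item is not yet in the map, which would be consistent with the NULL p_next seen in the dump.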
- Fab