[openib-general] RE: Some opensm/osm_vl15intf.c questions

Hal Rosenstock halr at voltaire.com
Thu Jan 5 06:49:07 PST 2006


Hi Yael,

On Thu, 2006-01-05 at 07:52, Yael Kalka wrote:
> Hi Hal,
> I've reviewed the code and as you noted - there is a bug in the code,
> actually 2 bugs.
> The qp0_mads_outstanding counter is incremented on mads with response
> expected.
> In the "regular" flow this counter is decremented through
> __osm_sm_mad_ctrl_retire_trans_mad,
> after the response was received and the mad was handled.
> The qp0_mads_outstanding is used for the signalling of
> "NO_PENDING_TRANSACTIONS", so I 
> do not think this counter should be incremented when the mads are not
> ones with response expected.
> The 2 bugs are, thus:
> 1. In the _osm_vl15_poller in case of osm_vendor_send failure. This code
> is similar to
>    the one in __osm_sm_mad_ctrl_retire_trans_mad, but should decrement
> and handle the 
>    qp0_mads_outstanding only if response_expected == TRUE. 
> 2. In the shutdown process. Again, only mads with response_expected ==
> TRUE should decrement 
>    the counter.

Makes sense.

> The reason we haven't seen errors due to this is because the main and
> usual flow is fine.
> I will issue a patch for fixing this soon.

Thanks.

-- Hal

> Thanks,
> Yael
> 
> -----Original Message-----
> From: Eitan Zahavi 
> Sent: Saturday, December 31, 2005 11:43 AM
> To: 'Hal Rosenstock'; Yael Kalka
> Cc: openib-general at openib.org
> Subject: RE: Some opensm/osm_vl15intf.c questions
> 
> 
> Hi Hal,
> 
> As Yael was working on the ref-counting issues (a month or two ago) I
> will let her answer. It is very possible we are missing some.
> 
> Eitan Zahavi
> Design Technology Director
> Mellanox Technologies LTD
> Tel:+972-4-9097208
> Fax:+972-4-9593245
> P.O. Box 586 Yokneam 20692 ISRAEL
> 
> 
> > -----Original Message-----
> > From: Hal Rosenstock [mailto:halr at voltaire.com]
> > Sent: Friday, December 30, 2005 6:03 PM
> > To: Eitan Zahavi
> > Cc: openib-general at openib.org
> > Subject: Some opensm/osm_vl15intf.c questions
> > 
> > Hi Eitan,
> > 
> > In chasing an issue with a trap repress not being sent in a certain
> > scenario, I stumbled across the following questions about
> > opensm/osm_vl15intf.c.
> > 
> > 1. osm_vl15_post increments qp0_mads_outstanding when a response is
> > expected (rfifo) and not when unsolicited (ufifo) (what appears to be
> > called unicasts):
> > 
> > osm_vl15_post:
> >   if( p_madw->resp_expected == TRUE )
> >   {
> >     cl_qlist_insert_tail( &p_vl->rfifo, (cl_list_item_t*)p_madw );
> >     cl_atomic_inc( &p_vl->p_stats->qp0_mads_outstanding );
> >   }
> >   else
> >   {
> >     cl_qlist_insert_tail( &p_vl->ufifo, (cl_list_item_t*)p_madw );
> >   }
> > 
> > osm_vl15_shutdown retires all outstanding MADs as follows:
> > 
> > osm_vl15_shutdown:
> >   while ( p_madw != (osm_madw_t*)cl_qlist_end( &p_vl->ufifo ) )
> >   {
> >     if( osm_log_is_active( p_vl->p_log, OSM_LOG_DEBUG ) )
> >     {
> >       osm_log( p_vl->p_log, OSM_LOG_DEBUG,
> >                "osm_vl15_shutdown: "
> >                "Releasing Response p_madw = %p\n", p_madw );
> >     }
> > 
> >     osm_mad_pool_put( p_mad_pool, p_madw );
> >     cl_atomic_dec( &p_vl->p_stats->qp0_mads_outstanding );
> > 
> >     p_madw = (osm_madw_t*)cl_qlist_remove_head( &p_vl->ufifo );
> >   }
> > 
> > Either post should increment qp0_mads_outstanding for unsolicited or
> > shutdown shouldn't decrement it when removing from ufifo. If you
> agree,
> > which should it be ?
> > 
> > 2. In the case of a failure from osm_vendor_send, __osm_vl15_poller
> > decrements qp0_mads_outstanding regardless of whether a response is
> > expected. This is inconsistent with the increment. This leads me to
> > believe that this should also be incremented for unsolicited
> (unicasts)
> > as well as those for which responses are expected. Is this correct or
> am
> > I missing something ?
> > 
> > So my conclusion is that in osm_vl15_post, it should be:
> > 
> >   if( p_madw->resp_expected == TRUE )
> >   {
> >     cl_qlist_insert_tail( &p_vl->rfifo, (cl_list_item_t*)p_madw );
> >   }
> >   else
> >   {
> >     cl_qlist_insert_tail( &p_vl->ufifo, (cl_list_item_t*)p_madw );
> >   }
> >   cl_atomic_inc( &p_vl->p_stats->qp0_mads_outstanding );
> > 
> > If you agree, I will generate a patch for this. Thanks.
> > 
> > -- Hal




More information about the general mailing list