[openib-general] RE: Some opensm/osm_vl15intf.c questions

Yael Kalka yael at mellanox.co.il
Thu Jan 5 04:52:31 PST 2006


Hi Hal,
I've reviewed the code and as you noted - there is a bug in the code,
actually 2 bugs.
The qp0_mads_outstanding counter is incremented on mads with response
expected.
In the "regular" flow this counter is decremented through
__osm_sm_mad_ctrl_retire_trans_mad,
after the response was received and the mad was handled.
The qp0_mads_outstanding is used for the signalling of
"NO_PENDING_TRANSACTIONS", so I 
do not think this counter should be incremented when the mads are not
ones with response expected.
The 2 bugs are, thus:
1. In the _osm_vl15_poller in case of osm_vendor_send failure. This code
is similar to
   the one in __osm_sm_mad_ctrl_retire_trans_mad, but should decrement
and handle the 
   qp0_mads_outstanding only if response_expected == TRUE. 
2. In the shutdown process. Again, only mads with response_expected ==
TRUE should decrement 
   the counter.
The reason we haven't seen errors due to this is because the main and
usual flow is fine.
I will issue a patch for fixing this soon.
Thanks,
Yael

-----Original Message-----
From: Eitan Zahavi 
Sent: Saturday, December 31, 2005 11:43 AM
To: 'Hal Rosenstock'; Yael Kalka
Cc: openib-general at openib.org
Subject: RE: Some opensm/osm_vl15intf.c questions


Hi Hal,

As Yael was working on the ref-counting issues (a month or two ago) I
will let her answer. It is very possible we are missing some.

Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL


> -----Original Message-----
> From: Hal Rosenstock [mailto:halr at voltaire.com]
> Sent: Friday, December 30, 2005 6:03 PM
> To: Eitan Zahavi
> Cc: openib-general at openib.org
> Subject: Some opensm/osm_vl15intf.c questions
> 
> Hi Eitan,
> 
> In chasing an issue with a trap repress not being sent in a certain
> scenario, I stumbled across the following questions about
> opensm/osm_vl15intf.c.
> 
> 1. osm_vl15_post increments qp0_mads_outstanding when a response is
> expected (rfifo) and not when unsolicited (ufifo) (what appears to be
> called unicasts):
> 
> osm_vl15_post:
>   if( p_madw->resp_expected == TRUE )
>   {
>     cl_qlist_insert_tail( &p_vl->rfifo, (cl_list_item_t*)p_madw );
>     cl_atomic_inc( &p_vl->p_stats->qp0_mads_outstanding );
>   }
>   else
>   {
>     cl_qlist_insert_tail( &p_vl->ufifo, (cl_list_item_t*)p_madw );
>   }
> 
> osm_vl15_shutdown retires all outstanding MADs as follows:
> 
> osm_vl15_shutdown:
>   while ( p_madw != (osm_madw_t*)cl_qlist_end( &p_vl->ufifo ) )
>   {
>     if( osm_log_is_active( p_vl->p_log, OSM_LOG_DEBUG ) )
>     {
>       osm_log( p_vl->p_log, OSM_LOG_DEBUG,
>                "osm_vl15_shutdown: "
>                "Releasing Response p_madw = %p\n", p_madw );
>     }
> 
>     osm_mad_pool_put( p_mad_pool, p_madw );
>     cl_atomic_dec( &p_vl->p_stats->qp0_mads_outstanding );
> 
>     p_madw = (osm_madw_t*)cl_qlist_remove_head( &p_vl->ufifo );
>   }
> 
> Either post should increment qp0_mads_outstanding for unsolicited or
> shutdown shouldn't decrement it when removing from ufifo. If you
agree,
> which should it be ?
> 
> 2. In the case of a failure from osm_vendor_send, __osm_vl15_poller
> decrements qp0_mads_outstanding regardless of whether a response is
> expected. This is inconsistent with the increment. This leads me to
> believe that this should also be incremented for unsolicited
(unicasts)
> as well as those for which responses are expected. Is this correct or
am
> I missing something ?
> 
> So my conclusion is that in osm_vl15_post, it should be:
> 
>   if( p_madw->resp_expected == TRUE )
>   {
>     cl_qlist_insert_tail( &p_vl->rfifo, (cl_list_item_t*)p_madw );
>   }
>   else
>   {
>     cl_qlist_insert_tail( &p_vl->ufifo, (cl_list_item_t*)p_madw );
>   }
>   cl_atomic_inc( &p_vl->p_stats->qp0_mads_outstanding );
> 
> If you agree, I will generate a patch for this. Thanks.
> 
> -- Hal



More information about the general mailing list