[Openib-windows] RE: [PATCH] client reregister

Yossi Leybovich sleybo at mellanox.co.il
Tue Sep 13 07:32:05 PDT 2005



> -----Original Message-----
> From: Fab Tillier [mailto:ftillier at silverstorm.com] 
> Sent: Monday, September 12, 2005 11:00 PM
> To: 'Yossi Leybovich'
> Cc: openib-windows at openib.org
> Subject: RE: [PATCH] client reregister 
> 
> 
> > From: Yossi Leybovich [mailto:sleybo at mellanox.co.il]
> > Sent: Monday, September 12, 2005 10:20 AM
> > 
> > Fab
> >
> > Attach is patch for supporting client reregister event in the HCA , 
> > PnP manager and in the IPoIB I followed your recommendation to 
> > implement reregistration just in the level of IPoIB till 
> you will have 
> > the full design to the mcast/service_record mngr in the AL
> >
> > Main changes:
> > 1. for the HCA async event I add the support to parse the 
> event, but I 
> > didn't have the chance to check that it actually work (I encounter 
> > problem to get async events from the HCA  , I don't get the 
> cq async 
> > events as well. I will debug that later thus week) The code 
> just add 
> > defines and cases and I think t safe to add it.
> 
> This code looks fine to me.  If you want, we can split this 
> patch into three parts - adding reregister support to IBAL, 
> to the HCA driver, and to IPoIB. This might help keep the 
> discussions more focused.  It's up to you, though.

I don't really mind , of course the important part is of the IPoIB.

> 
> > 2. In PnP manager I trapped the port_info and set flag to 
> signal that 
> > client reregister was set by the SM .
> 
> Is there any difference in client behavior between SM_CHANGE 
> and CLIENT_REREGISTER?  If not, can we use SM_CHANGE for 
> both?  In either case, the SM has lost all information about 
> the end node.  Is the only difference that the SM LID didn't 
> change?  If so, rather than introducing an internal field in 
> a public structure, why not set the SM LID to zero in the PnP 
> manager's cached attributes?  This would cause the poll 
> operation to detect the change as an SM_CHANGE, thus 
> collapsing the two events into one.
> 

You right the what really matter is that the SM lost all privios records. 
So you can remove the new field and set the LID to zero

> > 3. In the IPoIB after getting the PNP_CLIENT_REREGISTER 
> event I remove 
> > the bcast entry and issue get_bcast again. After the IpoIB join the 
> > bcast group it register the IP address and refresh the 
> mcast array (as 
> > it do in the PNP_ACTIVE).
> 
> Since neither the mcast or service registrations where 
> flushed, refresh_mcast and reg_addrs will do nothing.  We 
> need to add support in IBAL to reregister these given an 
> existing object handle, or add code to IPoIB to first 
> deregister and then reregister.  I would prefer to add this to IBAL.

OK . I will set the adaper->hang to TRUE as you suggets.
That will solve the problem.
After we will implement the reregistration in IBAL we can remove this.

> 
> > Of course that wont be needed after we will implement the 
> > mcast/service_records reregistration mngr in the AL (BTW 
> did you find 
> > the time to put some initial design draft)
> 
> An alternative to these changes in IPoIB is to do something 
> similar to what the Linux driver does - down the interface 
> and then up it again.  The easiest way to do this is to punt 
> on NDIS and set the adapter's hung flag.  This flag will 
> cause NDIS to reset the adapter, which will re-register 
> everything properly.
> 
> This is probably the simplest solution, albeit a little 
> crude.  It will get the work done, though, and we can refine it later.

OK.




> > 
> > Singned-off-by:Yossi Leybovich (sleybo at mellanox.co.il)
> > Index: core/al/al_init.c 
> > ===================================================================
> > --- core/al/al_init.c   (revision 355)
> > +++ core/al/al_init.c   (working copy)
> > @@ -39,7 +39,7 @@
> > 
> > 
> > 
> > -uint32_t                               g_al_dbg_lvl = AL_DBG_ERROR;
> > +uint32_t                               g_al_dbg_lvl = 
> AL_DBG_ERROR |
> > AL_DBG_SMI;
> 
> The debug level should stay as AL_DBG_ERROR in SVN.  Note 
> that you can set the debug level via the registry, as a key 
> under the ibal service parameters - this value will be set 
> when the driver loads.  This might make it easier to test 
> without having to recompile when changing debug levels.  You 
> can also change the debug level via the kernel debugger.
> 

Sorry ,my mistake I didn't intend to send it in the patch .


> (p_port_info->subnet_timeout & 0x80)
> > +                               {
> > +                                       
> AL_TRACE(AL_DBG_PNP,(" setting 
> > + the
> > client reregister event \n"));
> > +                                       
> > + p_spl_qp_svc->obj.p_ci_ca->p_pnp_attr-
> > >p_port_attr->client_reregistration = 1;
> > +                               }
> 
> Access to the port attributes needs to be synchronized with 
> the PnP manager through a call to ci_ca_lock_attr and 
> ci_ca_unlock_attr.  Can we just set the SM LID to zero in the 
> PnP manager's cached attributes?  This would result in 
> existing clients potentially seeing a SM LID of zero, but 
> they would receive a SM_CHANGE event pretty quickly updating 
> them with the new value.  This would avoid adding an internal 
> flag to a public structure.  Of course, this would only work 
> if we reuse the SM_CHANGE event rather than define a new one.
>  

As said above I will use the SM change event.

> >                         }
> >                 }
> >         }
> > Index: hw/mt23108/kernel/hca_verbs.c 
> > ===================================================================
> > --- hw/mt23108/kernel/hca_verbs.c       (revision 355)
> > +++ hw/mt23108/kernel/hca_verbs.c       (working copy)
> > @@ -414,6 +414,10 @@
> >                 hca_attr_mask |= HCA_ATTR_IS_VENDOR_CLS_SUP;
> >                 hca_attr.is_vendor_cls_sup = 
> (MT_bool)p_port_attr->cap.vend;
> >         }
> > +       if (modca_cmd & IB_CA_MOD_IS_CLIENT_REREGISTER_SUPPORTED) {
> > +               hca_attr_mask |= 
> HCA_ATTR_IS_CLIENT_REREGISTRATION_SUP;
> > +               hca_attr.is_client_reregister_sup= 
> > + (MT_bool)p_port_attr-
> > >cap.client_reregister;
> > +       }
> 
> Do we want to let users modify this?  It seems that making 
> such a change would have global effects on unsuspecting 
> clients.  I would rather not support disabling the client 
> reregister flag as I don't see it providing any value except 
> potentially for DoS type of activities.  This falls into a 
> larger port capability management issue that we should tackle 
> at some point.
> 

This is bigger issue, do we want to let the user change Q_KEY violation SNMP
support etc. ?
I think that like Gen2 we should remove modify CA from user level and allow
it only in kernel level
(In Gen2 Modify CA change only the sys image GUID and in kernel one can
change the cap mask by modify_port )
Still this code is in low level driver and should support the set/reset of
this value.

If we do that we need to supply hack to set the flag of IsSM from user.

 
> > @@ -7177,6 +7177,7 @@
> >         IB_AE_WQ_ACCESS_ERROR,
> >         IB_AE_PORT_ACTIVE,
> >         IB_AE_PORT_DOWN,
> > +       IB_AE_CLIENT_REREGISTER,
> >         IB_AE_UNKNOWN           /* ALWAYS LAST ENUM VALUE */
> 
> This event should be trapped in al_ci_ca_shared.c, in the 
> function ci_ca_process_event_cb and processed like other port 
> events (~line 227).
> 
> > 
> >  }      ib_async_event_t;
> > @@ -9279,6 +9288,10 @@
> >  *
> >  *      IB_CA_MOD_SYSTEM_IMAGE_GUID
> >  *              Used to modify the system image GUID for the port.
> > +*
> > +*      IB_CA_MOD_IS_CLIENT_REREGISTER_SUPPORTED
> > +*              Used to modify the system image GUID for the port.
> 
> Minor nit - the comment is wrong.  Goes away if we disallow 
> changing the client reregister support bit.

As I wrote above , we should leave it the same as the other capabilities.
Need to change the comment.
> >         }
> > @@ -855,7 +876,20 @@
> >         case IB_PNP_PORT_REMOVE:
> >                 ipoib_resume_oids( p_adapter );
> >                 break;
> > +       case IB_PNP_CLIENT_REREGISTER:
> > 
> > +               /* Join all programmed multicast groups. */
> > +               ipoib_refresh_mcast(p_adapter, 
> p_adapter->mcast_array, 
> > +p_adapter->mcast_array_size);
> 
> This won't re-join the groups because they will already have 
> an endpoint.  You either have to flush those endpoints just 
> like you did with the broadcast group, or we need to find a 
> way to re-register the existing groups.  Flushing the 
> endpoints will cause packets to be dropped.
> 
> Should we be reporting a cable disconnected event from the 
> time we detect the SM has changed to the time everything has 
> been restored?

I will use your suggestion to signal the adapterr as hang.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/ofw/attachments/20050913/9d429590/attachment.html>


More information about the ofw mailing list