<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META content="text/html; charset=us-ascii" http-equiv=Content-Type>
<META name=GENERATOR content="MSHTML 8.00.6001.18241"></HEAD>
<BODY>
<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 face=Arial>Hi 
All,</FONT></SPAN></DIV>
<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 face=Arial>Sorry, 
I have not been following the exact details of the changes made.</FONT></SPAN></DIV>
<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 face=Arial>But 
one thing must be clear: </FONT></SPAN><SPAN class=234461418-09092009><FONT 
color=#0000ff size=2 face=Arial>the deferred multicast handling is mandatory for 
the SM.</FONT></SPAN></DIV>
<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 face=Arial>The 
reason is that many requests may arrive together. Deferred execution enables an 
immediate response to the client and reduces the computation time 
dramatically.</FONT></SPAN></DIV>
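<DIV><FONT size=2 face=Arial>A minimal sketch of the two effects described above, assuming a simple single-threaded model: the request path only records that work is pending and replies at once, while an idle-time pass does the expensive work a single time for everything recorded so far. handle_mcast_request, idle_time_process and the counters are hypothetical names, not OpenSM functions; in the quoted patch the equivalent wake-up is osm_sm_signal(sm, OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST).</FONT></DIV>

```c
/* Hypothetical model of deferred multicast handling; none of these names are
 * OpenSM APIs. The request path only records that work is pending and
 * replies at once; the costly MFT computation runs later, from an idle-time
 * pass, and covers every request recorded since the previous pass. */

static unsigned pending_requests;  /* requests recorded since the last pass */
static unsigned responses_sent;    /* replies already returned to clients */
static unsigned mft_computations;  /* times the expensive step actually ran */

/* Request path: constant time, never waits for MFT calculation or setting. */
static void handle_mcast_request(void)
{
	pending_requests++;  /* stands in for marking work and signalling the SM */
	responses_sent++;    /* the client gets its answer right away */
}

/* Idle-time path: one computation for the whole batch. Returns how many
 * requests that computation covered (0 if nothing was pending). */
static unsigned idle_time_process(void)
{
	unsigned covered = pending_requests;

	if (covered == 0)
		return 0;        /* nothing queued: skip the expensive step */
	pending_requests = 0;
	mft_computations++;  /* stands in for re-routing plus MFT download */
	return covered;
}
```

<DIV><FONT size=2 face=Arial>In this model a burst of 1000 requests produces 1000 immediate replies and, once the idle pass runs, a single computation.</FONT></DIV>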
<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 face=Arial>The 
need to respond immediately, rather than wait for the MFT calculation and setting, 
arises from the fact that there is only one timeout value for MAD requests. 
Increasing that timeout to cover the expected delay of N MFT changes would make 
error handling too slow.</FONT></SPAN></DIV>
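<DIV><FONT size=2 face=Arial>The single-timeout argument can be put as back-of-the-envelope arithmetic. This is a simplified sketch, not a model of OpenSM internals: the per-change delay, the timeout values and the client retry count are all hypothetical inputs, and sync_reply_ms, reply_fits and error_detect_ms are illustrative helpers.</FONT></DIV>

```c
#include <stdbool.h>

/* Back-of-the-envelope model of the single-timeout argument. Every number
 * fed to these helpers is a hypothetical input, not a measured OpenSM or
 * IBA value, and the retry count is an assumption about the client. */

/* Reply latency if the SM answered only after applying all n MFT changes. */
static unsigned sync_reply_ms(unsigned n_changes, unsigned ms_per_change)
{
	return n_changes * ms_per_change;
}

/* Does a reply arriving after reply_ms beat the one shared MAD timeout? */
static bool reply_fits(unsigned reply_ms, unsigned timeout_ms)
{
	return reply_ms <= timeout_ms;
}

/* Worst case before a client gives up on a MAD that really was lost:
 * each attempt (first try plus retries) waits the full shared timeout. */
static unsigned error_detect_ms(unsigned timeout_ms, unsigned retries)
{
	return timeout_ms * (retries + 1);
}
```

<DIV><FONT size=2 face=Arial>With made-up figures of 100 changes at 20 ms each, a synchronous reply takes 2000 ms and no longer fits a 200 ms timeout. Raising the shared timeout to 2000 ms makes it fit, but with 3 retries a genuinely lost MAD then takes 8000 ms to detect instead of 800 ms. Replying before the MFT work keeps the reply short, so the timeout can stay small.</FONT></DIV>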
<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 face=Arial>The 
reduced computation comes from the fact that the routing is not 
recalculated for each request, but only after a set of changes has been 
made.</FONT></SPAN></DIV>
<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 face=Arial>This 
happens automatically: when the deferred execution runs, all the changes 
made since the last MFT routing are handled together.</FONT></SPAN></DIV>
<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 face=Arial>I am 
not sure what the change is, but please make sure to test it on a large cluster 
where all nodes join or leave multiple groups at once.</FONT></SPAN></DIV>
<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 
face=Arial>Eitan</FONT></SPAN></DIV>
<DIV> </DIV>
<DIV dir=ltr align=left><FONT size=2 face=Arial>
<P><SPAN lang=en-gb><B><I><FONT color=#0000ff size=6 
face="Monotype Corsiva">Eitan Zahavi</FONT></I></B><I></I></SPAN> <BR><SPAN 
lang=en-gb><FONT size=2 face=Tahoma>Senior Engineering 
Director</FONT></SPAN><BR><SPAN lang=en-gb><FONT size=2 face=Tahoma>Mellanox 
Technologies LTD</FONT></SPAN> <BR><SPAN lang=en-gb><FONT size=2 
face=Tahoma>Tel:+972-4-9097208<BR>Fax:+972-4-9593245</FONT></SPAN> <BR><SPAN 
lang=en-gb><FONT size=2 face=Tahoma>P.O. Box 586 Yokneam 20692 
ISRAEL</FONT></SPAN> </P></FONT></DIV>
<DIV> </DIV><BR>
<BLOCKQUOTE 
style="BORDER-LEFT: #0000ff 2px solid; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; MARGIN-RIGHT: 0px">
  <DIV dir=ltr lang=en-us class=OutlookMessageHeader align=left>
  <HR tabIndex=-1>
  <FONT size=2 face=Tahoma><B>From:</B> Hal Rosenstock 
  [mailto:hal.rosenstock@gmail.com] <BR><B>Sent:</B> Wednesday, September 09, 
  2009 1:23 AM<BR><B>To:</B> Sasha Khapyorsky<BR><B>Cc:</B> OpenIB; Eli Dorfman; 
  Eitan Zahavi; Yevgeny Kliteynik<BR><B>Subject:</B> Re: [ofa-general] [PATCH] 
  opensm: improve multicast re-routing requests processing<BR></FONT><BR></DIV>
  <DIV></DIV><BR><BR>
  <DIV class=gmail_quote>On Sun, Sep 6, 2009 at 1:39 PM, Sasha Khapyorsky <SPAN 
  dir=ltr><<A target=_blank 
  href="mailto:sashak@voltaire.com">sashak@voltaire.com</A>></SPAN> 
wrote:<BR>
  <BLOCKQUOTE 
  style="BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex; PADDING-LEFT: 1ex" 
  class=gmail_quote><BR>When there are two or more changes in the same multicast 
    group, multiple<BR>multicast re-routing requests are created and 
    processed. To prevent<BR>this, we will use an array of requests indexed by the mlid 
    value minus<BR>IB_LID_MCAST_START_HO, and for each multicast group change we 
    will just<BR>mark that the specific mlid requires re-routing; "duplicated" 
    requests<BR>will be merged there.<BR><BR>In this way we will also be able to 
    process the deletion of multicast<BR>routing entries for already removed 
    groups knowing only their MLID<BR>and without using their content. This will let 
    us avoid delaying multicast<BR>group deletion (the 'to_be_deleted' flag) and will 
    simplify many multicast-related<BR>code flows.<BR></BLOCKQUOTE>
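  <DIV><FONT size=2 face=Arial>The array-of-requests scheme can be sketched in plain C. This is a minimal, self-contained approximation, not the patch itself: struct mlid_requests, mark_mlid and collect_mlids are hypothetical names, and MCAST_START_HO/MCAST_END_HO assume the 0xC000..0xFFFE multicast LID range that, as far as I know, IB_LID_MCAST_START_HO and IB_LID_MCAST_END_HO describe in OpenSM's ib_types.h.</FONT></DIV>

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match IB_LID_MCAST_START_HO / IB_LID_MCAST_END_HO in OpenSM's
 * ib_types.h (multicast LIDs 0xC000..0xFFFE); check against your tree. */
#define MCAST_START_HO 0xC000u
#define MCAST_END_HO   0xFFFEu
#define MCAST_COUNT    (MCAST_END_HO - MCAST_START_HO + 1u)

/* Hypothetical stand-in for the mlids_req / mlids_req_max pair in osm_sm_t. */
struct mlid_requests {
	uint8_t req[MCAST_COUNT]; /* req[i] != 0: MLID i + MCAST_START_HO needs re-routing */
	unsigned max;             /* highest index marked since the last pass */
};

/* Mark one MLID (host order). Repeated changes to the same group set the
 * same entry again, so "duplicated" requests merge for free. */
static void mark_mlid(struct mlid_requests *r, uint16_t mlid_ho)
{
	unsigned idx;

	assert(mlid_ho >= MCAST_START_HO && mlid_ho <= MCAST_END_HO);
	idx = mlid_ho - MCAST_START_HO;
	r->req[idx] = 1;
	if (r->max < idx)
		r->max = idx;
}

/* Deferred pass: copy every marked MLID into out[] exactly once, clearing
 * entries as they are visited, then reset max. Indices 0..max are scanned
 * inclusively. Returns the number of MLIDs written to out[]. */
static unsigned collect_mlids(struct mlid_requests *r, uint16_t *out,
			      unsigned out_len)
{
	unsigned i, n = 0;

	for (i = 0; i <= r->max; i++) {
		if (!r->req[i])
			continue;
		r->req[i] = 0;
		assert(n < out_len);
		out[n++] = (uint16_t)(i + MCAST_START_HO);
	}
	r->max = 0;
	return n;
}
```

  <DIV><FONT size=2 face=Arial>Marking the same group twice, as in a burst of joins to one group, yields a single entry in the pass. Because the scan covers indices 0 through max inclusive, any bulk clear of the array after a pass has to span max + 1 entries; this sketch sidesteps that by clearing each entry as it is visited.</FONT></DIV>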
  <DIV> </DIV>
  <DIV>While the delay adds complexity, it is a feature. Delayed deletion (and 
  join) is allowed by the IBA specification and is needed in a fast-changing subnet 
  where many groups are changing. This was observed quite a while ago, and it is how 
  OpenSM evolved based on field experience and other testing. Eitan is the 
  expert here. IMO, support for this needs to be added (back in).</DIV>
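  <DIV><FONT size=2 face=Arial>A minimal sketch of the delayed-deletion idea, assuming that a group emptied by its last leave is only marked (in the spirit of the 'to_be_deleted' flag) and is torn down by the deferred pass, so a join arriving in between simply revives it. struct mgrp and the mgrp_* functions are hypothetical; this is not OpenSM's osm_mgrp_t code.</FONT></DIV>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical group record; not OpenSM's osm_mgrp_t. */
struct mgrp {
	unsigned members;
	bool to_be_deleted;  /* emptied, but not yet destroyed */
};

/* The last member leaving only marks the group; nothing is freed here. */
static void mgrp_leave(struct mgrp *g)
{
	assert(g && g->members > 0);
	if (--g->members == 0)
		g->to_be_deleted = true;
}

/* A join that arrives before the deferred pass revives the group. */
static void mgrp_join(struct mgrp *g)
{
	assert(g);
	g->members++;
	g->to_be_deleted = false;
}

/* Deferred pass: only now is a still-empty group actually destroyed.
 * Returns NULL if the group was freed, otherwise the group itself. */
static struct mgrp *mgrp_deferred_pass(struct mgrp *g)
{
	if (g->to_be_deleted && g->members == 0) {
		free(g);
		return NULL;
	}
	return g;
}
```

  <DIV><FONT size=2 face=Arial>With this arrangement the group, and with it its MLID, survives a leave that is immediately followed by a join, which is the kind of churn a fast-changing subnet produces.</FONT></DIV>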
  <DIV> </DIV>
  <DIV>-- Hal</DIV>
  <DIV> </DIV>
  <DIV> </DIV>
  <BLOCKQUOTE 
  style="BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex; PADDING-LEFT: 1ex" 
  class=gmail_quote><SPAN></SPAN><BR>Signed-off-by: Sasha Khapyorsky <<A 
    target=_blank 
    href="mailto:sashak@voltaire.com">sashak@voltaire.com</A>><BR>---<BR> opensm/include/opensm/osm_sm.h 
    |    4 +-<BR> opensm/opensm/osm_mcast_mgr.c  |   27 
    +++++++++------------<BR> opensm/opensm/osm_sm.c       
      |   49 +++++++++++++---------------------------<BR> 3 files 
    changed, 30 insertions(+), 50 deletions(-)<BR><BR>diff --git 
    a/opensm/include/opensm/osm_sm.h b/opensm/include/opensm/osm_sm.h<BR>index 
    0914a95..986143a 100644<BR>--- a/opensm/include/opensm/osm_sm.h<BR>+++ 
    b/opensm/include/opensm/osm_sm.h<BR>@@ -126,8 +126,8 @@ typedef struct 
    osm_sm {<BR>       cl_dispatcher_t *p_disp;<BR>  
         cl_plock_t *p_lock;<BR>      
     atomic32_t sm_trans_id;<BR>-       cl_spinlock_t 
    mgrp_lock;<BR>-       cl_qlist_t mgrp_list;<BR>+   
        unsigned mlids_req_max;<BR>+       uint8_t 
    *mlids_req;<BR>       osm_sm_mad_ctrl_t 
    mad_ctrl;<BR>       osm_lid_mgr_t lid_mgr;<BR>  
         osm_ucast_mgr_t ucast_mgr;<BR>diff --git 
    a/opensm/opensm/osm_mcast_mgr.c b/opensm/opensm/osm_mcast_mgr.c<BR>index 
    d7c5ce1..dd504ef 100644<BR>--- a/opensm/opensm/osm_mcast_mgr.c<BR>+++ 
    b/opensm/opensm/osm_mcast_mgr.c<BR>@@ -1116,7 +1116,6 @@ static int 
    mcast_mgr_set_mftables(osm_sm_t * sm)<BR> int 
    osm_mcast_mgr_process(osm_sm_t * sm)<BR> {<BR>      
     cl_qmap_t *p_sw_tbl;<BR>-       cl_qlist_t *p_list = 
    &sm->mgrp_list;<BR>       osm_mgrp_t 
    *p_mgrp;<BR>       int i, ret = 0;<BR><BR>@@ -1150,16 
    +1149,14 @@ int osm_mcast_mgr_process(osm_sm_t * sm)<BR>      
                    
     mcast_mgr_process_mgrp(sm, p_mgrp);<BR>      
     }<BR><BR>+       memset(sm->mlids_req, 0, 
    sm->mlids_req_max + 1);<BR>+       sm->mlids_req_max = 
    0;<BR>+<BR>       /*<BR>        
      Walk the switches and download the tables for each.<BR>    
        */<BR>       ret = 
    mcast_mgr_set_mftables(sm);<BR><BR>-       while 
    (!cl_is_qlist_empty(p_list)) {<BR>-           
        cl_list_item_t *p = cl_qlist_remove_head(p_list);<BR>-   
                free(p);<BR>-       
    }<BR>-<BR> exit:<BR>      
     CL_PLOCK_RELEASE(sm->p_lock);<BR><BR>@@ -1174,11 +1171,10 @@ 
    exit:<BR> **********************************************************************/<BR> int 
    osm_mcast_mgr_process_mgroups(osm_sm_t * sm)<BR> {<BR>-     
      cl_qlist_t *p_list = &sm->mgrp_list;<BR>      
     osm_mgrp_t *p_mgrp;<BR>       ib_net16_t 
    mlid;<BR>-       osm_mcast_mgr_ctxt_t *ctx;<BR>    
       int ret = 0;<BR>+       unsigned 
    i;<BR><BR>       OSM_LOG_ENTER(sm->p_log);<BR><BR>@@ 
    -1192,14 +1188,12 @@ int osm_mcast_mgr_process_mgroups(osm_sm_t * 
    sm)<BR>               goto 
    exit;<BR>       }<BR><BR>-       while 
    (!cl_is_qlist_empty(p_list)) {<BR>-           
        ctx = (osm_mcast_mgr_ctxt_t *) 
    cl_qlist_remove_head(p_list);<BR>-<BR>-           
        /* nice copy no warning on size diff */<BR>-     
              memcpy(&mlid, &ctx->mlid, 
    sizeof(mlid));<BR>+       for (i = 0; i <= 
    sm->mlids_req_max; i++) {<BR>+             
      if (!sm->mlids_req[i])<BR>+           
                continue;<BR>+     
              sm->mlids_req[i] = 0;<BR><BR>-   
                /* we can destroy the context now 
    */<BR>-               free(ctx);<BR>+ 
                  mlid = cl_hton16(i + 
    IB_LID_MCAST_START_HO);<BR><BR>            
       /* since we delayed the execution we prefer to pass 
    the<BR>                  mlid 
    as the mgrp identifier and then find it or abort */<BR>@@ -1223,6 +1217,9 @@ 
    int osm_mcast_mgr_process_mgroups(osm_sm_t * sm)<BR>      
             mcast_mgr_process_mgrp(sm, 
    p_mgrp);<BR>       }<BR><BR>+       
    memset(sm->mlids_req, 0, sm->mlids_req_max + 1);<BR>+       
    sm->mlids_req_max = 0;<BR>+<BR>       /*<BR>  
            Walk the switches and download the tables for 
    each.<BR>        */<BR>diff --git 
    a/opensm/opensm/osm_sm.c b/opensm/opensm/osm_sm.c<BR>index 50aee91..e446c9d 
    100644<BR>--- a/opensm/opensm/osm_sm.c<BR>+++ b/opensm/opensm/osm_sm.c<BR>@@ 
    -166,7 +166,6 @@ void osm_sm_construct(IN osm_sm_t * p_sm)<BR>    
       cl_event_construct(&p_sm->subnet_up_event);<BR>  
        
     cl_event_wheel_construct(&p_sm->trap_aging_tracker);<BR>  
         cl_thread_construct(&p_sm->sweeper);<BR>-   
        cl_spinlock_construct(&p_sm->mgrp_lock);<BR>  
        
     osm_sm_mad_ctrl_construct(&p_sm->mad_ctrl);<BR>    
       osm_lid_mgr_construct(&p_sm->lid_mgr);<BR>    
       osm_ucast_mgr_construct(&p_sm->ucast_mgr);<BR>@@ -234,8 
    +233,8 @@ void osm_sm_destroy(IN osm_sm_t * p_sm)<BR>      
     cl_event_destroy(&p_sm->signal_event);<BR>      
     cl_event_destroy(&p_sm->subnet_up_event);<BR>    
       cl_spinlock_destroy(&p_sm->signal_lock);<BR>-   
        cl_spinlock_destroy(&p_sm->mgrp_lock);<BR>    
       cl_spinlock_destroy(&p_sm->state_lock);<BR>+   
        free(p_sm->mlids_req);<BR><BR>      
     osm_log(p_sm->p_log, OSM_LOG_SYS, "Exiting SM\n");     
     /* Format Waived */<BR>      
     OSM_LOG_EXIT(p_sm->p_log);<BR>@@ -288,11 +287,14 @@ ib_api_status_t 
    osm_sm_init(IN osm_sm_t * p_sm, IN osm_subn_t * p_subn,<BR>    
       if (status != CL_SUCCESS)<BR>          
         goto Exit;<BR><BR>-       
    cl_qlist_init(&p_sm->mgrp_list);<BR>-<BR>-       
    status = cl_spinlock_init(&p_sm->mgrp_lock);<BR>-     
      if (status != CL_SUCCESS)<BR>+       
    p_sm->mlids_req_max = 0;<BR>+       p_sm->mlids_req = 
    malloc((IB_LID_MCAST_END_HO - IB_LID_MCAST_START_HO +<BR>+     
                          
          1) * sizeof(p_sm->mlids_req[0]));<BR>+     
      if (!p_sm->mlids_req)<BR>            
       goto Exit;<BR>+       memset(p_sm->mlids_req, 
    0,<BR>+              (IB_LID_MCAST_END_HO 
    - IB_LID_MCAST_START_HO +<BR>+             
      1) * sizeof(p_sm->mlids_req[0]));<BR><BR>      
     status = osm_sm_mad_ctrl_init(&p_sm->mad_ctrl, 
    p_sm->p_subn,<BR>                
                        
     p_sm->p_mad_pool, p_sm->p_vl15,<BR>@@ -441,32 +443,15 @@ 
    Exit:<BR><BR> /**********************************************************************<BR> **********************************************************************/<BR>-static 
    ib_api_status_t sm_mgrp_process(IN osm_sm_t * p_sm,<BR>-     
                          
               IN osm_mgrp_t * p_mgrp)<BR>+static 
    void request_mlid(osm_sm_t * sm, uint16_t mlid)<BR> {<BR>-   
        osm_mcast_mgr_ctxt_t *ctx;<BR>-<BR>-       
    /*<BR>-        * 'Schedule' all the QP0 traffic for when 
    the state manager<BR>-        * isn't busy trying to do 
    something else.<BR>-        */<BR>-       
    ctx = malloc(sizeof(*ctx));<BR>-       if (!ctx)<BR>-   
                return IB_ERROR;<BR>-   
        memset(ctx, 0, sizeof(*ctx));<BR>-       
    ctx->mlid = p_mgrp->mlid;<BR>-<BR>-       
    cl_spinlock_acquire(&p_sm->mgrp_lock);<BR>-       
    cl_qlist_insert_tail(&p_sm->mgrp_list, &ctx->list_item);<BR>- 
          cl_spinlock_release(&p_sm->mgrp_lock);<BR>-<BR>- 
          osm_sm_signal(p_sm, 
    OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST);<BR>-<BR>-       return 
    IB_SUCCESS;<BR>+       mlid -= IB_LID_MCAST_START_HO;<BR>+ 
          sm->mlids_req[mlid] = 1;<BR>+       
    if (sm->mlids_req_max < mlid)<BR>+           
        sm->mlids_req_max = mlid;<BR>+       
    osm_sm_signal(sm, 
    OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST);<BR> }<BR><BR>-/**********************************************************************<BR>- 
    **********************************************************************/<BR> ib_api_status_t 
    osm_sm_mcgrp_join(IN osm_sm_t * p_sm, IN osm_mgrp_t *mgrp,<BR>    
                          
           IN const ib_net64_t port_guid)<BR> {<BR>@@ 
    -519,7 +504,7 @@ ib_api_status_t osm_sm_mcgrp_join(IN osm_sm_t * p_sm, IN 
    osm_mgrp_t *mgrp,<BR>              
     goto Exit;<BR>       }<BR><BR>-     
      status = sm_mgrp_process(p_sm, mgrp);<BR>+       
    request_mlid(p_sm, cl_ntoh16(mgrp->mlid));<BR> Exit:<BR>  
         CL_PLOCK_RELEASE(p_sm->p_lock);<BR>    
       OSM_LOG_EXIT(p_sm->p_log);<BR>@@ -527,8 +512,6 @@ 
    Exit:<BR>       return 
    status;<BR> }<BR><BR>-/**********************************************************************<BR>- 
    **********************************************************************/<BR> ib_api_status_t 
    osm_sm_mcgrp_leave(IN osm_sm_t * p_sm, IN osm_mgrp_t *mgrp,<BR>    
                          
            IN const ib_net64_t port_guid)<BR> {<BR>@@ 
    -557,7 +540,7 @@ ib_api_status_t osm_sm_mcgrp_leave(IN osm_sm_t * p_sm, IN 
    osm_mgrp_t *mgrp,<BR><BR>      
     osm_port_remove_mgrp(p_port, mgrp);<BR><BR>-       
    status = sm_mgrp_process(p_sm, mgrp);<BR>+       
    request_mlid(p_sm, cl_ntoh16(mgrp->mlid));<BR> Exit:<BR>  
         CL_PLOCK_RELEASE(p_sm->p_lock);<BR>    
      
     OSM_LOG_EXIT(p_sm->p_log);<BR>--<BR>1.6.4.2<BR><BR>_______________________________________________<BR>general 
    mailing list<BR><A target=_blank 
    href="mailto:general@lists.openfabrics.org">general@lists.openfabrics.org</A><BR><A 
    target=_blank 
    href="http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general">http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general</A><BR><BR>To 
    unsubscribe, please visit <A target=_blank 
    href="http://openib.org/mailman/listinfo/openib-general">http://openib.org/mailman/listinfo/openib-general</A><BR></BLOCKQUOTE></DIV><BR></BLOCKQUOTE></BODY></HTML>