<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<HTML><HEAD>

<META content="text/html; charset=us-ascii" http-equiv=Content-Type>

<META name=GENERATOR content="MSHTML 8.00.6001.18241"></HEAD>

<BODY>

<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 face=Arial>Hi 

All,</FONT></SPAN></DIV>

<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 

face=Arial></FONT></SPAN> </DIV>

<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 face=Arial>Sorry, 

I am not following the exact details of the changes made.</FONT></SPAN></DIV>

<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 face=Arial>But 

one thing must be clear: </FONT></SPAN><SPAN class=234461418-09092009><FONT 

color=#0000ff size=2 face=Arial>deferred multicast handling is mandatory for 

the SM.</FONT></SPAN></DIV>

<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 face=Arial>The 

reason is that many requests may arrive together. Deferred execution enables 

an immediate response to the client </FONT></SPAN></DIV>

<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 

face=Arial>and reduces the computation time dramatically. 

</FONT></SPAN></DIV>

<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 

face=Arial></FONT></SPAN> </DIV>

<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 face=Arial>The 

need to respond immediately, without waiting for the MFT calculation and setting, 

arises from the fact that there</FONT></SPAN></DIV>

<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 face=Arial>is 

only one timeout value for MAD requests. Increasing that timeout to cover the 

expected delay of N MFT changes would make error handling </FONT></SPAN><SPAN 

class=234461418-09092009><FONT color=#0000ff size=2 face=Arial>too 

slow.</FONT></SPAN></DIV>

<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 

face=Arial></FONT></SPAN> </DIV>

<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 face=Arial>The 

reduced computation comes from the fact that the routing is not 

recalculated for each request but only after a set of changes has been 

made.</FONT></SPAN></DIV>

<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 face=Arial>This 

happens automatically: when the deferred execution runs, all the changes 

made since the last MFT routing are handled together.</FONT></SPAN></DIV>
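<DIV>To make the batching argument concrete, here is a minimal sketch in C. It is a toy model, not OpenSM code: <CODE>struct sm_model</CODE>, <CODE>handle_request</CODE> and <CODE>run_deferred</CODE> are hypothetical names invented for illustration. The request path only records the change and returns, and the deferred path performs one routing pass for everything accumulated since the previous pass.</DIV>

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these names are illustrative assumptions, not OpenSM's API. */
struct sm_model {
	unsigned pending_changes;	/* joins/leaves since the last routing pass */
	unsigned routing_passes;	/* full MFT recalculations performed so far */
};

/*
 * Request path: record the change and return at once, so the reply to the
 * client never waits for the MFT calculation. Returns 0 for success.
 */
static int handle_request(struct sm_model *sm)
{
	assert(sm != NULL);
	sm->pending_changes++;
	return 0;
}

/*
 * Deferred path: a single routing pass covers every change accumulated
 * since the previous pass. Returns how many changes that pass covered.
 */
static unsigned run_deferred(struct sm_model *sm)
{
	unsigned handled;

	assert(sm != NULL);
	handled = sm->pending_changes;
	if (handled == 0)
		return 0;	/* nothing changed: skip the recalculation */

	sm->routing_passes++;	/* stands in for the real MFT routing */
	sm->pending_changes = 0;
	return handled;
}
```

<DIV>In this model, calling <CODE>handle_request</CODE> 1000 times and then <CODE>run_deferred</CODE> once leaves <CODE>routing_passes</CODE> at 1 rather than 1000, which is the saving described above.</DIV>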

<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 

face=Arial></FONT></SPAN> </DIV>

<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 face=Arial>I am 

not sure what the change is, but please make sure to test it on a large cluster 

where all nodes join or leave multiple groups at once.</FONT></SPAN></DIV>

<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 

face=Arial></FONT></SPAN> </DIV>

<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 

face=Arial>Eitan</FONT></SPAN></DIV>

<DIV> </DIV>

<DIV dir=ltr align=left><FONT size=2 face=Arial>

<P><SPAN lang=en-gb><B><I><FONT color=#0000ff size=6 

face="Monotype Corsiva">Eitan Zahavi</FONT></I></B><I></I></SPAN> <BR><SPAN 

lang=en-gb><FONT size=2 face=Tahoma>Senior Engineering 

Director</FONT></SPAN><BR><SPAN lang=en-gb><FONT size=2 face=Tahoma>Mellanox 

Technologies LTD</FONT></SPAN> <BR><SPAN lang=en-gb><FONT size=2 

face=Tahoma>Tel:+972-4-9097208<BR>Fax:+972-4-9593245</FONT></SPAN> <BR><SPAN 

lang=en-gb><FONT size=2 face=Tahoma>P.O. Box 586 Yokneam 20692 

ISRAEL</FONT></SPAN> </P></FONT></DIV>

<DIV> </DIV><BR>

<BLOCKQUOTE 

style="BORDER-LEFT: #0000ff 2px solid; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; MARGIN-RIGHT: 0px">

  <DIV dir=ltr lang=en-us class=OutlookMessageHeader align=left>

  <HR tabIndex=-1>

  <FONT size=2 face=Tahoma><B>From:</B> Hal Rosenstock 

  [mailto:hal.rosenstock@gmail.com] <BR><B>Sent:</B> Wednesday, September 09, 

  2009 1:23 AM<BR><B>To:</B> Sasha Khapyorsky<BR><B>Cc:</B> OpenIB; Eli Dorfman; 

  Eitan Zahavi; Yevgeny Kliteynik<BR><B>Subject:</B> Re: [ofa-general] [PATCH] 

  opensm: improve multicast re-routing requests processing<BR></FONT><BR></DIV>

  <DIV></DIV><BR><BR>

  <DIV class=gmail_quote>On Sun, Sep 6, 2009 at 1:39 PM, Sasha Khapyorsky <SPAN 

  dir=ltr><<A target=_blank 

  href="mailto:sashak@voltaire.com">sashak@voltaire.com</A>></SPAN> 

wrote:<BR>

  <BLOCKQUOTE 

  style="BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex; PADDING-LEFT: 1ex" 

  class=gmail_quote><BR>When we have two or more changes in a same multicast 

    group multiple<BR>multicast rerouting requests will be created and 

    processed. To prevent<BR>this we will use array of requests indexed by mlid 

    value minus<BR>IB_LID_MCAST_START_HO and for each multicast group change we 

    will just<BR>mark that specific mlid requires re-routing and "duplicated" 

    requests<BR>will be merged there.<BR><BR>Also in this way we will be able to 

    process multicast group routing<BR>entries deletion for already removed 

    groups by just knowing its MLID<BR>and not using its content - this will let 

    us to not delay multicast<BR>groups deletion ('to_be_deleted' flag) and will 

    simplify many multicast<BR>related code flows.<BR></BLOCKQUOTE>

  <DIV> </DIV>

  <DIV>While the delay adds complexity, it is a feature. Delayed deletion (and 

  join) is allowed by IBA and is needed in a fast-changing subnet when a lot of 

  groups are changing. This was seen quite a while ago, and it is how 

  OpenSM evolved based on field experience and other testing. Eitan is the 

  expert here. IMO support for this needs to be added (back in).</DIV>

  <DIV> </DIV>

  <DIV>-- Hal</DIV>

  <DIV> </DIV>

  <DIV> </DIV>

  <BLOCKQUOTE 

  style="BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex; PADDING-LEFT: 1ex" 

  class=gmail_quote><SPAN></SPAN><BR>Signed-off-by: Sasha Khapyorsky <<A 

    target=_blank 

    href="mailto:sashak@voltaire.com">sashak@voltaire.com</A>><BR>---<BR> opensm/include/opensm/osm_sm.h 

    |    4 +-<BR> opensm/opensm/osm_mcast_mgr.c  |   27 

    +++++++++------------<BR> opensm/opensm/osm_sm.c       

      |   49 +++++++++++++---------------------------<BR> 3 files 

    changed, 30 insertions(+), 50 deletions(-)<BR><BR>diff --git 

    a/opensm/include/opensm/osm_sm.h b/opensm/include/opensm/osm_sm.h<BR>index 

    0914a95..986143a 100644<BR>--- a/opensm/include/opensm/osm_sm.h<BR>+++ 

    b/opensm/include/opensm/osm_sm.h<BR>@@ -126,8 +126,8 @@ typedef struct 

    osm_sm {<BR>       cl_dispatcher_t *p_disp;<BR>  

         cl_plock_t *p_lock;<BR>      

     atomic32_t sm_trans_id;<BR>-       cl_spinlock_t 

    mgrp_lock;<BR>-       cl_qlist_t mgrp_list;<BR>+   

        unsigned mlids_req_max;<BR>+       uint8_t 

    *mlids_req;<BR>       osm_sm_mad_ctrl_t 

    mad_ctrl;<BR>       osm_lid_mgr_t lid_mgr;<BR>  

         osm_ucast_mgr_t ucast_mgr;<BR>diff --git 

    a/opensm/opensm/osm_mcast_mgr.c b/opensm/opensm/osm_mcast_mgr.c<BR>index 

    d7c5ce1..dd504ef 100644<BR>--- a/opensm/opensm/osm_mcast_mgr.c<BR>+++ 

    b/opensm/opensm/osm_mcast_mgr.c<BR>@@ -1116,7 +1116,6 @@ static int 

    mcast_mgr_set_mftables(osm_sm_t * sm)<BR> int 

    osm_mcast_mgr_process(osm_sm_t * sm)<BR> {<BR>      

     cl_qmap_t *p_sw_tbl;<BR>-       cl_qlist_t *p_list = 

    &sm->mgrp_list;<BR>       osm_mgrp_t 

    *p_mgrp;<BR>       int i, ret = 0;<BR><BR>@@ -1150,16 

    +1149,14 @@ int osm_mcast_mgr_process(osm_sm_t * sm)<BR>      

                    

     mcast_mgr_process_mgrp(sm, p_mgrp);<BR>      

     }<BR><BR>+       memset(sm->mlids_req, 0, 

    sm->mlids_req_max);<BR>+       sm->mlids_req_max = 

    0;<BR>+<BR>       /*<BR>        

      Walk the switches and download the tables for each.<BR>    

        */<BR>       ret = 

    mcast_mgr_set_mftables(sm);<BR><BR>-       while 

    (!cl_is_qlist_empty(p_list)) {<BR>-           

        cl_list_item_t *p = cl_qlist_remove_head(p_list);<BR>-   

                free(p);<BR>-       

    }<BR>-<BR> exit:<BR>      

     CL_PLOCK_RELEASE(sm->p_lock);<BR><BR>@@ -1174,11 +1171,10 @@ 

    exit:<BR> **********************************************************************/<BR> int 

    osm_mcast_mgr_process_mgroups(osm_sm_t * sm)<BR> {<BR>-     

      cl_qlist_t *p_list = &sm->mgrp_list;<BR>      

     osm_mgrp_t *p_mgrp;<BR>       ib_net16_t 

    mlid;<BR>-       osm_mcast_mgr_ctxt_t *ctx;<BR>    

       int ret = 0;<BR>+       unsigned 

    i;<BR><BR>       OSM_LOG_ENTER(sm->p_log);<BR><BR>@@ 

    -1192,14 +1188,12 @@ int osm_mcast_mgr_process_mgroups(osm_sm_t * 

    sm)<BR>               goto 

    exit;<BR>       }<BR><BR>-       while 

    (!cl_is_qlist_empty(p_list)) {<BR>-           

        ctx = (osm_mcast_mgr_ctxt_t *) 

    cl_qlist_remove_head(p_list);<BR>-<BR>-           

        /* nice copy no warning on size diff */<BR>-     

              memcpy(&mlid, &ctx->mlid, 

    sizeof(mlid));<BR>+       for (i = 0; i <= 

    sm->mlids_req_max; i++) {<BR>+             

      if (!sm->mlids_req[i])<BR>+           

                continue;<BR>+     

              sm->mlids_req[i] = 0;<BR><BR>-   

                /* we can destroy the context now 

    */<BR>-               free(ctx);<BR>+ 

                  mlid = cl_hton16(i + 

    IB_LID_MCAST_START_HO);<BR><BR>            

       /* since we delayed the execution we prefer to pass 

    the<BR>                  mlid 

    as the mgrp identifier and then find it or abort */<BR>@@ -1223,6 +1217,9 @@ 

    int osm_mcast_mgr_process_mgroups(osm_sm_t * sm)<BR>      

             mcast_mgr_process_mgrp(sm, 

    p_mgrp);<BR>       }<BR><BR>+       

    memset(sm->mlids_req, 0, sm->mlids_req_max);<BR>+       

    sm->mlids_req_max = 0;<BR>+<BR>       /*<BR>  

            Walk the switches and download the tables for 

    each.<BR>        */<BR>diff --git 

    a/opensm/opensm/osm_sm.c b/opensm/opensm/osm_sm.c<BR>index 50aee91..e446c9d 

    100644<BR>--- a/opensm/opensm/osm_sm.c<BR>+++ b/opensm/opensm/osm_sm.c<BR>@@ 

    -166,7 +166,6 @@ void osm_sm_construct(IN osm_sm_t * p_sm)<BR>    

       cl_event_construct(&p_sm->subnet_up_event);<BR>  

        

     cl_event_wheel_construct(&p_sm->trap_aging_tracker);<BR>  

         cl_thread_construct(&p_sm->sweeper);<BR>-   

        cl_spinlock_construct(&p_sm->mgrp_lock);<BR>  

        

     osm_sm_mad_ctrl_construct(&p_sm->mad_ctrl);<BR>    

       osm_lid_mgr_construct(&p_sm->lid_mgr);<BR>    

       osm_ucast_mgr_construct(&p_sm->ucast_mgr);<BR>@@ -234,8 

    +233,8 @@ void osm_sm_destroy(IN osm_sm_t * p_sm)<BR>      

     cl_event_destroy(&p_sm->signal_event);<BR>      

     cl_event_destroy(&p_sm->subnet_up_event);<BR>    

       cl_spinlock_destroy(&p_sm->signal_lock);<BR>-   

        cl_spinlock_destroy(&p_sm->mgrp_lock);<BR>    

       cl_spinlock_destroy(&p_sm->state_lock);<BR>+   

        free(p_sm->mlids_req);<BR><BR>      

     osm_log(p_sm->p_log, OSM_LOG_SYS, "Exiting SM\n");     

     /* Format Waived */<BR>      

     OSM_LOG_EXIT(p_sm->p_log);<BR>@@ -288,11 +287,14 @@ ib_api_status_t 

    osm_sm_init(IN osm_sm_t * p_sm, IN osm_subn_t * p_subn,<BR>    

       if (status != CL_SUCCESS)<BR>          

         goto Exit;<BR><BR>-       

    cl_qlist_init(&p_sm->mgrp_list);<BR>-<BR>-       

    status = cl_spinlock_init(&p_sm->mgrp_lock);<BR>-     

      if (status != CL_SUCCESS)<BR>+       

    p_sm->mlids_req_max = 0;<BR>+       p_sm->mlids_req = 

    malloc((IB_LID_MCAST_END_HO - IB_LID_MCAST_START_HO +<BR>+     

                          

          1) * sizeof(p_sm->mlids_req[0]));<BR>+     

      if (!p_sm->mlids_req)<BR>            

       goto Exit;<BR>+       memset(p_sm->mlids_req, 

    0,<BR>+              (IB_LID_MCAST_END_HO 

    - IB_LID_MCAST_START_HO +<BR>+             

      1) * sizeof(p_sm->mlids_req[0]));<BR><BR>      

     status = osm_sm_mad_ctrl_init(&p_sm->mad_ctrl, 

    p_sm->p_subn,<BR>                

                        

     p_sm->p_mad_pool, p_sm->p_vl15,<BR>@@ -441,32 +443,15 @@ 

    Exit:<BR><BR> /**********************************************************************<BR> **********************************************************************/<BR>-static 

    ib_api_status_t sm_mgrp_process(IN osm_sm_t * p_sm,<BR>-     

                          

               IN osm_mgrp_t * p_mgrp)<BR>+static 

    void request_mlid(osm_sm_t * sm, uint16_t mlid)<BR> {<BR>-   

        osm_mcast_mgr_ctxt_t *ctx;<BR>-<BR>-       

    /*<BR>-        * 'Schedule' all the QP0 traffic for when 

    the state manager<BR>-        * isn't busy trying to do 

    something else.<BR>-        */<BR>-       

    ctx = malloc(sizeof(*ctx));<BR>-       if (!ctx)<BR>-   

                return IB_ERROR;<BR>-   

        memset(ctx, 0, sizeof(*ctx));<BR>-       

    ctx->mlid = p_mgrp->mlid;<BR>-<BR>-       

    cl_spinlock_acquire(&p_sm->mgrp_lock);<BR>-       

    cl_qlist_insert_tail(&p_sm->mgrp_list, &ctx->list_item);<BR>- 

          cl_spinlock_release(&p_sm->mgrp_lock);<BR>-<BR>- 

          osm_sm_signal(p_sm, 

    OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST);<BR>-<BR>-       return 

    IB_SUCCESS;<BR>+       mlid -= IB_LID_MCAST_START_HO;<BR>+ 

          sm->mlids_req[mlid] = 1;<BR>+       

    if (sm->mlids_req_max < mlid)<BR>+           

        sm->mlids_req_max = mlid;<BR>+       

    osm_sm_signal(sm, 

    OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST);<BR> }<BR><BR>-/**********************************************************************<BR>- 

    **********************************************************************/<BR> ib_api_status_t 

    osm_sm_mcgrp_join(IN osm_sm_t * p_sm, IN osm_mgrp_t *mgrp,<BR>    

                          

           IN const ib_net64_t port_guid)<BR> {<BR>@@ 

    -519,7 +504,7 @@ ib_api_status_t osm_sm_mcgrp_join(IN osm_sm_t * p_sm, IN 

    osm_mgrp_t *mgrp,<BR>              

     goto Exit;<BR>       }<BR><BR>-     

      status = sm_mgrp_process(p_sm, mgrp);<BR>+       

    request_mlid(p_sm, cl_ntoh16(mgrp->mlid));<BR> Exit:<BR>  

         CL_PLOCK_RELEASE(p_sm->p_lock);<BR>    

       OSM_LOG_EXIT(p_sm->p_log);<BR>@@ -527,8 +512,6 @@ 

    Exit:<BR>       return 

    status;<BR> }<BR><BR>-/**********************************************************************<BR>- 

    **********************************************************************/<BR> ib_api_status_t 

    osm_sm_mcgrp_leave(IN osm_sm_t * p_sm, IN osm_mgrp_t *mgrp,<BR>    

                          

            IN const ib_net64_t port_guid)<BR> {<BR>@@ 

    -557,7 +540,7 @@ ib_api_status_t osm_sm_mcgrp_leave(IN osm_sm_t * p_sm, IN 

    osm_mgrp_t *mgrp,<BR><BR>      

     osm_port_remove_mgrp(p_port, mgrp);<BR><BR>-       

    status = sm_mgrp_process(p_sm, mgrp);<BR>+       

    request_mlid(p_sm, cl_hton16(mgrp->mlid));<BR> Exit:<BR>  

         CL_PLOCK_RELEASE(p_sm->p_lock);<BR>    

      

     OSM_LOG_EXIT(p_sm->p_log);<BR>--<BR>1.6.4.2<BR></BLOCKQUOTE></DIV><BR></BLOCKQUOTE></BODY></HTML>