<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META content="text/html; charset=us-ascii" http-equiv=Content-Type>
<META name=GENERATOR content="MSHTML 8.00.6001.18241"></HEAD>
<BODY>
<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 face=Arial>Hi
All,</FONT></SPAN></DIV>
<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 face=Arial>Sorry,
I have not followed the exact details of the changes made.</FONT></SPAN></DIV>
<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 face=Arial>But
one thing must be clear: </FONT></SPAN><SPAN class=234461418-09092009><FONT
color=#0000ff size=2 face=Arial>deferred multicast handling is mandatory for
the SM.</FONT></SPAN></DIV>
<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 face=Arial>The
reason is that many requests may arrive together. Deferred execution enables an
immediate response to the client </FONT></SPAN></DIV>
<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2
face=Arial>and dramatically reduces the computation time.
</FONT></SPAN></DIV>
<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 face=Arial>The
need to respond immediately, without waiting for the MFT calculation and setting,
arises from the fact that there</FONT></SPAN></DIV>
<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 face=Arial>is
only one timeout value for MAD requests. Increasing that timeout to cover the
expected delay of N MFT changes would make error handling </FONT></SPAN><SPAN
class=234461418-09092009><FONT color=#0000ff size=2 face=Arial>too
slow.</FONT></SPAN></DIV>
<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 face=Arial>The
reduced computation comes from the fact that the routing is not
recalculated for each request but only after a set of changes has been
made.</FONT></SPAN></DIV>
<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 face=Arial>This
happens automatically: when the deferred execution runs, all the changes
made since the last MFT routing are handled together.</FONT></SPAN></DIV>
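<DIV>The batching described above can be illustrated with a minimal C sketch. This is not OpenSM code: the names deferred_router, handle_request and run_deferred, and the MAX_GROUPS limit, are hypothetical and exist only for illustration. The counters stand in for the real client response and the real MFT calculation.</DIV>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: none of these names are OpenSM APIs. */
#define MAX_GROUPS 64

struct deferred_router {
    bool dirty[MAX_GROUPS];   /* groups changed since the last routing pass */
    unsigned recalcs;         /* per-group routing passes performed so far */
    unsigned replies;         /* client responses sent so far */
};

/* Handle a join/leave: record the change and answer the client at once. */
static void handle_request(struct deferred_router *r, size_t group)
{
    assert(group < MAX_GROUPS);
    r->dirty[group] = true;   /* repeated changes to one group collapse here */
    r->replies++;             /* the response does not wait for routing work */
}

/* Deferred step: route each group that changed since the last pass, once. */
static unsigned run_deferred(struct deferred_router *r)
{
    unsigned routed = 0;

    for (size_t g = 0; g < MAX_GROUPS; g++) {
        if (!r->dirty[g])
            continue;
        r->dirty[g] = false;
        r->recalcs++;         /* stand-in for the real MFT calculation */
        routed++;
    }
    return routed;
}
```

<DIV>With this shape, one hundred join or leave requests spread over two groups produce one hundred immediate responses, but the following call to run_deferred routes only those two groups.</DIV>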
<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2 face=Arial>I am
not sure what the change is, but please make sure to test it on a large cluster
where all nodes join or leave multiple groups at once.</FONT></SPAN></DIV>
<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2
face=Arial></FONT></SPAN> </DIV>
<DIV><SPAN class=234461418-09092009><FONT color=#0000ff size=2
face=Arial>Eitan</FONT></SPAN></DIV>
<DIV> </DIV>
<DIV dir=ltr align=left><FONT size=2 face=Arial>
<P><SPAN lang=en-gb><B><I><FONT color=#0000ff size=6
face="Monotype Corsiva">Eitan Zahavi</FONT></I></B><I></I></SPAN> <BR><SPAN
lang=en-gb><FONT size=2 face=Tahoma>Senior Engineering
Director</FONT></SPAN><BR><SPAN lang=en-gb><FONT size=2 face=Tahoma>Mellanox
Technologies LTD</FONT></SPAN> <BR><SPAN lang=en-gb><FONT size=2
face=Tahoma>Tel:+972-4-9097208<BR>Fax:+972-4-9593245</FONT></SPAN> <BR><SPAN
lang=en-gb><FONT size=2 face=Tahoma>P.O. Box 586 Yokneam 20692
ISRAEL</FONT></SPAN> </P></FONT></DIV>
<DIV> </DIV><BR>
<BLOCKQUOTE
style="BORDER-LEFT: #0000ff 2px solid; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; MARGIN-RIGHT: 0px">
<DIV dir=ltr lang=en-us class=OutlookMessageHeader align=left>
<HR tabIndex=-1>
<FONT size=2 face=Tahoma><B>From:</B> Hal Rosenstock
[mailto:hal.rosenstock@gmail.com] <BR><B>Sent:</B> Wednesday, September 09,
2009 1:23 AM<BR><B>To:</B> Sasha Khapyorsky<BR><B>Cc:</B> OpenIB; Eli Dorfman;
Eitan Zahavi; Yevgeny Kliteynik<BR><B>Subject:</B> Re: [ofa-general] [PATCH]
opensm: improve multicast re-routing requests processing<BR></FONT><BR></DIV>
<DIV></DIV><BR><BR>
<DIV class=gmail_quote>On Sun, Sep 6, 2009 at 1:39 PM, Sasha Khapyorsky <SPAN
dir=ltr>&lt;<A target=_blank
href="mailto:sashak@voltaire.com">sashak@voltaire.com</A>&gt;</SPAN>
wrote:<BR>
<BLOCKQUOTE
style="BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex; PADDING-LEFT: 1ex"
class=gmail_quote><BR>When there are two or more changes in the same multicast
group, multiple<BR>multicast rerouting requests are created and
processed. To prevent<BR>this, we use an array of requests indexed by the mlid
value minus<BR>IB_LID_MCAST_START_HO, and for each multicast group change we
just<BR>mark that the specific mlid requires re-routing; "duplicated"
requests<BR>are merged there.<BR><BR>This also lets us
process deletion of multicast routing<BR>entries for already removed
groups knowing only the MLID<BR>and not the group's content. That in turn lets
us stop delaying multicast<BR>group deletion (the 'to_be_deleted' flag) and
simplifies many multicast<BR>related code flows.<BR></BLOCKQUOTE>
<DIV> </DIV>
<DIV>While the delay adds complexity, it is a feature. Delayed deletion (and
join) is allowed by IBA and is needed in a fast-changing subnet where
many groups are changing. This was observed quite a while ago, and
OpenSM evolved this way based on field experience and other testing. Eitan is the
expert here. IMO support for this needs to be added (back in).</DIV>
<DIV> </DIV>
<DIV>-- Hal</DIV>
<DIV> </DIV>
<DIV> </DIV>
<BLOCKQUOTE
style="BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex; PADDING-LEFT: 1ex"
class=gmail_quote><SPAN></SPAN><BR>Signed-off-by: Sasha Khapyorsky &lt;<A
target=_blank
href="mailto:sashak@voltaire.com">sashak@voltaire.com</A>&gt;<BR>---<BR> opensm/include/opensm/osm_sm.h
| 4 +-<BR> opensm/opensm/osm_mcast_mgr.c | 27
+++++++++------------<BR> opensm/opensm/osm_sm.c
| 49 +++++++++++++---------------------------<BR> 3 files
changed, 30 insertions(+), 50 deletions(-)<BR><BR>diff --git
a/opensm/include/opensm/osm_sm.h b/opensm/include/opensm/osm_sm.h<BR>index
0914a95..986143a 100644<BR>--- a/opensm/include/opensm/osm_sm.h<BR>+++
b/opensm/include/opensm/osm_sm.h<BR>@@ -126,8 +126,8 @@ typedef struct
osm_sm {<BR> cl_dispatcher_t *p_disp;<BR>
cl_plock_t *p_lock;<BR>
atomic32_t sm_trans_id;<BR>- cl_spinlock_t
mgrp_lock;<BR>- cl_qlist_t mgrp_list;<BR>+
unsigned mlids_req_max;<BR>+ uint8_t
*mlids_req;<BR> osm_sm_mad_ctrl_t
mad_ctrl;<BR> osm_lid_mgr_t lid_mgr;<BR>
osm_ucast_mgr_t ucast_mgr;<BR>diff --git
a/opensm/opensm/osm_mcast_mgr.c b/opensm/opensm/osm_mcast_mgr.c<BR>index
d7c5ce1..dd504ef 100644<BR>--- a/opensm/opensm/osm_mcast_mgr.c<BR>+++
b/opensm/opensm/osm_mcast_mgr.c<BR>@@ -1116,7 +1116,6 @@ static int
mcast_mgr_set_mftables(osm_sm_t * sm)<BR> int
osm_mcast_mgr_process(osm_sm_t * sm)<BR> {<BR>
cl_qmap_t *p_sw_tbl;<BR>- cl_qlist_t *p_list =
&sm->mgrp_list;<BR> osm_mgrp_t
*p_mgrp;<BR> int i, ret = 0;<BR><BR>@@ -1150,16
+1149,14 @@ int osm_mcast_mgr_process(osm_sm_t * sm)<BR>
mcast_mgr_process_mgrp(sm, p_mgrp);<BR>
}<BR><BR>+ memset(sm->mlids_req, 0,
sm->mlids_req_max);<BR>+ sm->mlids_req_max =
0;<BR>+<BR> /*<BR>
Walk the switches and download the tables for each.<BR>
*/<BR> ret =
mcast_mgr_set_mftables(sm);<BR><BR>- while
(!cl_is_qlist_empty(p_list)) {<BR>-
cl_list_item_t *p = cl_qlist_remove_head(p_list);<BR>-
free(p);<BR>-
}<BR>-<BR> exit:<BR>
CL_PLOCK_RELEASE(sm->p_lock);<BR><BR>@@ -1174,11 +1171,10 @@
exit:<BR> **********************************************************************/<BR> int
osm_mcast_mgr_process_mgroups(osm_sm_t * sm)<BR> {<BR>-
cl_qlist_t *p_list = &sm->mgrp_list;<BR>
osm_mgrp_t *p_mgrp;<BR> ib_net16_t
mlid;<BR>- osm_mcast_mgr_ctxt_t *ctx;<BR>
int ret = 0;<BR>+ unsigned
i;<BR><BR> OSM_LOG_ENTER(sm->p_log);<BR><BR>@@
-1192,14 +1188,12 @@ int osm_mcast_mgr_process_mgroups(osm_sm_t *
sm)<BR> goto
exit;<BR> }<BR><BR>- while
(!cl_is_qlist_empty(p_list)) {<BR>-
ctx = (osm_mcast_mgr_ctxt_t *)
cl_qlist_remove_head(p_list);<BR>-<BR>-
/* nice copy no warning on size diff */<BR>-
memcpy(&mlid, &ctx->mlid,
sizeof(mlid));<BR>+ for (i = 0; i <=
sm->mlids_req_max; i++) {<BR>+
if (!sm->mlids_req[i])<BR>+
continue;<BR>+
sm->mlids_req[i] = 0;<BR><BR>-
/* we can destroy the context now
*/<BR>- free(ctx);<BR>+
mlid = cl_hton16(i +
IB_LID_MCAST_START_HO);<BR><BR>
/* since we delayed the execution we prefer to pass
the<BR> mlid
as the mgrp identifier and then find it or abort */<BR>@@ -1223,6 +1217,9 @@
int osm_mcast_mgr_process_mgroups(osm_sm_t * sm)<BR>
mcast_mgr_process_mgrp(sm,
p_mgrp);<BR> }<BR><BR>+
memset(sm->mlids_req, 0, sm->mlids_req_max);<BR>+
sm->mlids_req_max = 0;<BR>+<BR> /*<BR>
Walk the switches and download the tables for
each.<BR> */<BR>diff --git
a/opensm/opensm/osm_sm.c b/opensm/opensm/osm_sm.c<BR>index 50aee91..e446c9d
100644<BR>--- a/opensm/opensm/osm_sm.c<BR>+++ b/opensm/opensm/osm_sm.c<BR>@@
-166,7 +166,6 @@ void osm_sm_construct(IN osm_sm_t * p_sm)<BR>
cl_event_construct(&p_sm->subnet_up_event);<BR>
cl_event_wheel_construct(&p_sm->trap_aging_tracker);<BR>
cl_thread_construct(&p_sm->sweeper);<BR>-
cl_spinlock_construct(&p_sm->mgrp_lock);<BR>
osm_sm_mad_ctrl_construct(&p_sm->mad_ctrl);<BR>
osm_lid_mgr_construct(&p_sm->lid_mgr);<BR>
osm_ucast_mgr_construct(&p_sm->ucast_mgr);<BR>@@ -234,8
+233,8 @@ void osm_sm_destroy(IN osm_sm_t * p_sm)<BR>
cl_event_destroy(&p_sm->signal_event);<BR>
cl_event_destroy(&p_sm->subnet_up_event);<BR>
cl_spinlock_destroy(&p_sm->signal_lock);<BR>-
cl_spinlock_destroy(&p_sm->mgrp_lock);<BR>
cl_spinlock_destroy(&p_sm->state_lock);<BR>+
free(p_sm->mlids_req);<BR><BR>
osm_log(p_sm->p_log, OSM_LOG_SYS, "Exiting SM\n");
/* Format Waived */<BR>
OSM_LOG_EXIT(p_sm->p_log);<BR>@@ -288,11 +287,14 @@ ib_api_status_t
osm_sm_init(IN osm_sm_t * p_sm, IN osm_subn_t * p_subn,<BR>
if (status != CL_SUCCESS)<BR>
goto Exit;<BR><BR>-
cl_qlist_init(&p_sm->mgrp_list);<BR>-<BR>-
status = cl_spinlock_init(&p_sm->mgrp_lock);<BR>-
if (status != CL_SUCCESS)<BR>+
p_sm->mlids_req_max = 0;<BR>+ p_sm->mlids_req =
malloc((IB_LID_MCAST_END_HO - IB_LID_MCAST_START_HO +<BR>+
1) * sizeof(p_sm->mlids_req[0]));<BR>+
if (!p_sm->mlids_req)<BR>
goto Exit;<BR>+ memset(p_sm->mlids_req,
0,<BR>+ (IB_LID_MCAST_END_HO
- IB_LID_MCAST_START_HO +<BR>+
1) * sizeof(p_sm->mlids_req[0]));<BR><BR>
status = osm_sm_mad_ctrl_init(&p_sm->mad_ctrl,
p_sm->p_subn,<BR>
p_sm->p_mad_pool, p_sm->p_vl15,<BR>@@ -441,32 +443,15 @@
Exit:<BR><BR> /**********************************************************************<BR> **********************************************************************/<BR>-static
ib_api_status_t sm_mgrp_process(IN osm_sm_t * p_sm,<BR>-
IN osm_mgrp_t * p_mgrp)<BR>+static
void request_mlid(osm_sm_t * sm, uint16_t mlid)<BR> {<BR>-
osm_mcast_mgr_ctxt_t *ctx;<BR>-<BR>-
/*<BR>- * 'Schedule' all the QP0 traffic for when
the state manager<BR>- * isn't busy trying to do
something else.<BR>- */<BR>-
ctx = malloc(sizeof(*ctx));<BR>- if (!ctx)<BR>-
return IB_ERROR;<BR>-
memset(ctx, 0, sizeof(*ctx));<BR>-
ctx->mlid = p_mgrp->mlid;<BR>-<BR>-
cl_spinlock_acquire(&p_sm->mgrp_lock);<BR>-
cl_qlist_insert_tail(&p_sm->mgrp_list, &ctx->list_item);<BR>-
cl_spinlock_release(&p_sm->mgrp_lock);<BR>-<BR>-
osm_sm_signal(p_sm,
OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST);<BR>-<BR>- return
IB_SUCCESS;<BR>+ mlid -= IB_LID_MCAST_START_HO;<BR>+
sm->mlids_req[mlid] = 1;<BR>+
if (sm->mlids_req_max < mlid)<BR>+
sm->mlids_req_max = mlid;<BR>+
osm_sm_signal(sm,
OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST);<BR> }<BR><BR>-/**********************************************************************<BR>-
**********************************************************************/<BR> ib_api_status_t
osm_sm_mcgrp_join(IN osm_sm_t * p_sm, IN osm_mgrp_t *mgrp,<BR>
IN const ib_net64_t port_guid)<BR> {<BR>@@
-519,7 +504,7 @@ ib_api_status_t osm_sm_mcgrp_join(IN osm_sm_t * p_sm, IN
osm_mgrp_t *mgrp,<BR>
goto Exit;<BR> }<BR><BR>-
status = sm_mgrp_process(p_sm, mgrp);<BR>+
request_mlid(p_sm, cl_ntoh16(mgrp->mlid));<BR> Exit:<BR>
CL_PLOCK_RELEASE(p_sm->p_lock);<BR>
OSM_LOG_EXIT(p_sm->p_log);<BR>@@ -527,8 +512,6 @@
Exit:<BR> return
status;<BR> }<BR><BR>-/**********************************************************************<BR>-
**********************************************************************/<BR> ib_api_status_t
osm_sm_mcgrp_leave(IN osm_sm_t * p_sm, IN osm_mgrp_t *mgrp,<BR>
IN const ib_net64_t port_guid)<BR> {<BR>@@
-557,7 +540,7 @@ ib_api_status_t osm_sm_mcgrp_leave(IN osm_sm_t * p_sm, IN
osm_mgrp_t *mgrp,<BR><BR>
osm_port_remove_mgrp(p_port, mgrp);<BR><BR>-
status = sm_mgrp_process(p_sm, mgrp);<BR>+
request_mlid(p_sm, cl_ntoh16(mgrp->mlid));<BR> Exit:<BR>
CL_PLOCK_RELEASE(p_sm->p_lock);<BR>
OSM_LOG_EXIT(p_sm->p_log);<BR>--<BR>1.6.4.2<BR></BLOCKQUOTE></DIV><BR></BLOCKQUOTE></BODY></HTML>