<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.6000.16640" name=GENERATOR></HEAD>
<BODY>
<DIV><FONT face=Arial size=2>Bug description and reproduction:</FONT></DIV>
<DIV><FONT face=Arial size=2>1. <SPAN
class=872394014-11022009>Con</SPAN>nect two machines (A and B) via an IB
switch</FONT></DIV>
<DIV><SPAN class=872394014-11022009><FONT face=Arial size=2>2. Run subnet
manager (say, opensm) on B</FONT></SPAN></DIV>
<DIV><SPAN class=872394014-11022009><FONT face=Arial size=2>3. Kill opensm and
clear the ARP tables</FONT></SPAN></DIV>
<DIV><SPAN class=872394014-11022009><FONT face=Arial size=2>4. Rerun opensm -
ping will no longer work</FONT></SPAN></DIV>
<DIV><SPAN class=872394014-11022009><FONT face=Arial size=2>5. That's because
the new opensm instance clears the old multicast groups, while side A is not
aware of the opensm restart and does not request to join the new MCAST
group</FONT></SPAN></DIV>
<DIV><SPAN class=872394014-11022009><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=872394014-11022009><FONT face=Arial
size=2>Explanations:</FONT></SPAN></DIV>
<DIV><SPAN class=872394014-11022009><FONT face=Arial size=2>There are 2 types of
events relevant in our case: PnP and AE.</FONT></SPAN></DIV>
<DIV><SPAN class=872394014-11022009><FONT face=Arial size=2>The problem was
caused by the following:</FONT></SPAN></DIV>
<DIV><SPAN class=872394014-11022009><FONT face=Arial size=2>1. During an opensm
restart, the port generates an AE event: IB_EVENT_LID_CHANGE or (in other
cases) IB_EVENT_CLIENT_REREGISTER</FONT></SPAN></DIV>
<DIV><SPAN class=872394014-11022009><FONT face=Arial size=2>These events are
generated even when the SM was restart<SPAN
class=872394014-11022009>e</SPAN>d but the LID did not actually
change.</FONT></SPAN></DIV>
<DIV><SPAN class=872394014-11022009><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=872394014-11022009><FONT face=Arial size=2>2. All PnP events
were handled properly, but the IB_EVENT_* events above were mapped to
IB_AE_FATAL</FONT></SPAN></DIV>
<DIV><SPAN class=872394014-11022009><FONT face=Arial size=2>This patch fixes that:
it maps IB_EVENT_* events to the appropriate IB_AE_* events and then to IB_PNP_*
events</FONT></SPAN></DIV>
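<DIV><SPAN class=872394014-11022009><FONT face=Arial size=2>The mapping chain
described above can be sketched in isolation as follows. This is only an
illustration, not the patch itself: every sketch_* and SKETCH_* name is a
hypothetical, simplified stand-in for the real IB_EVENT_*, IB_AE_* and IB_PNP_*
definitions, and returning "no PnP event" for the remaining codes is an
assumption of the sketch.</FONT></SPAN></DIV>
<PRE>
```c
/* Hypothetical stand-in for the AL async event enum (ib_async_event_t). */
typedef enum {
    SKETCH_AE_FATAL,
    SKETCH_AE_LID_CHANGE,
    SKETCH_AE_CLIENT_REREGISTER,
    SKETCH_AE_PKEY_CHANGE,
    SKETCH_AE_SM_CHANGE,
    SKETCH_AE_UNKNOWN            /* always last, as in the real enum */
} sketch_ae_t;

/* Hypothetical stand-in for the driver enum: each code aliases the
 * matching AE code instead of being numbered past SKETCH_AE_UNKNOWN. */
typedef enum {
    SKETCH_EVENT_LID_CHANGE        = SKETCH_AE_LID_CHANGE,
    SKETCH_EVENT_CLIENT_REREGISTER = SKETCH_AE_CLIENT_REREGISTER,
    SKETCH_EVENT_PKEY_CHANGE       = SKETCH_AE_PKEY_CHANGE,
    SKETCH_EVENT_SM_CHANGE         = SKETCH_AE_SM_CHANGE
} sketch_event_t;

/* Hypothetical stand-in for the PnP event enum. */
typedef enum {
    SKETCH_PNP_NONE,
    SKETCH_PNP_LID_CHANGE
} sketch_pnp_t;

/* Second step of the chain: AE code to PnP event. Both a LID change and a
 * client reregister are reported to subscribers as a LID change. */
sketch_pnp_t sketch_ae_to_pnp(sketch_ae_t ae)
{
    switch (ae) {
    case SKETCH_AE_LID_CHANGE:
    case SKETCH_AE_CLIENT_REREGISTER:
        return SKETCH_PNP_LID_CHANGE;
    default:
        return SKETCH_PNP_NONE;   /* assumption: not forwarded here */
    }
}
```
</PRE>
<DIV><SPAN class=872394014-11022009><FONT face=Arial size=2>Because the driver
codes alias the AE codes, the first step of the chain is a plain cast and only
the second step needs a switch.</FONT></SPAN></DIV>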
<DIV><SPAN class=872394014-11022009><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=872394014-11022009><FONT face=Arial size=2>3. The function
force_smi_poll() notifies its subscribers about a LID change
event if and only if the LID has changed.</FONT></SPAN></DIV>
<DIV><SPAN class=872394014-11022009><FONT face=Arial size=2>So the problem
remains when opensm is restarted and none of the port attributes has
changed</FONT></SPAN></DIV>
<DIV><SPAN class=872394014-11022009><FONT face=Arial size=2>This patch generates
the appropriate IB_PNP event to resolve this issue</FONT></SPAN></DIV>
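<DIV><SPAN class=872394014-11022009><FONT face=Arial size=2>The difference
between the polling path and the forced path can be sketched as follows. This is
a simplified model and not the AL code: sketch_port_t, sketch_poll() and
sketch_force_event() are hypothetical names, and a counter stands in for the
delivery of a PnP event to subscribers.</FONT></SPAN></DIV>
<PRE>
```c
#include "assert.h"   /* quoted form keeps angle brackets out of the HTML */

/* Hypothetical per-port state: cached LID plus a counter that stands in
 * for "a LID-change PnP event was delivered to subscribers". */
typedef struct {
    unsigned short lid;
    int notifications;
} sketch_port_t;

/* Polling path: notify only when the LID differs from the cached value.
 * An SM restart that keeps the same LID is therefore invisible here. */
void sketch_poll(sketch_port_t *port, unsigned short new_lid)
{
    assert(port);
    if (port->lid != new_lid) {
        port->lid = new_lid;
        port->notifications++;
    }
}

/* Forced path: always notify, whether or not the LID changed.
 * port_num is 1-based, so it is converted to a 0-based index,
 * mirroring PORT_INDEX_OFFSET in the patch. Returns that index. */
int sketch_force_event(sketch_port_t *ports, unsigned char port_num)
{
    int port_index;

    assert(ports);
    assert(port_num != 0);

    port_index = port_num - 1;
    ports[port_index].notifications++;
    return port_index;
}
```
</PRE>
<DIV><SPAN class=872394014-11022009><FONT face=Arial size=2>In this model,
polling a port whose LID is unchanged leaves the counter untouched, while the
forced path increments it, which is the behaviour subscribers need after an SM
restart.</FONT></SPAN></DIV>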
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><SPAN class=872394014-11022009><FONT face=Arial size=2>Signed-off-by:
Alexander Naslednikov (xalex at mellanox.co.il)</FONT></SPAN></DIV>
<DIV><SPAN class=872394014-11022009><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><FONT face=Arial size=2>Index:
core/al/al_ci_ca_shared.c<BR>===================================================================<BR>---
core/al/al_ci_ca_shared.c (revision 3889)<BR>+++
core/al/al_ci_ca_shared.c (working copy)<BR>@@ -299,10 +299,27
@@<BR> cq_async_event_cb( &p_event_item->event_rec
);<BR> break;<BR> <BR>+#ifdef
CL_KERNEL<BR>+<BR>+ case IB_AE_LID_CHANGE:<BR>+ case
IB_AE_CLIENT_REREGISTER:<BR>+ // These AE event will be generated
even in the case when<BR>+ // SM was restaretd but LID will not
actually change.<BR>+ // It's important to propagate these event (via
PnP mechanism)<BR>+ // up to subscribers. Otherwise, there will be no
ping after<BR>+ // subnet manager restart<BR>+ //if
(AL_OBJ_IS_TYPE(p_obj, AL_OBJ_TYPE_CI_CA)<BR>+ if (AL_BASE_TYPE(
p_obj->type) == AL_OBJ_TYPE_CI_CA)
{<BR>+ pnp_force_event( (struct _al_ci_ca *) p_obj,
IB_PNP_LID_CHANGE,<BR>+ p_event_item->event_rec.port_number
);<BR>+ }<BR>+ break;<BR>+#endif
//CL_KERNEL<BR>+<BR> case IB_AE_PORT_TRAP:<BR> case
IB_AE_PORT_DOWN:<BR> case IB_AE_PORT_ACTIVE:<BR>- case
IB_AE_CLIENT_REREGISTER:<BR>+ <BR> #ifdef
CL_KERNEL<BR> /* The SMI polling routine may report a PnP
event. */<BR> force_smi_poll();<BR>Index:
core/al/al_pnp.h<BR>===================================================================<BR>---
core/al/al_pnp.h (revision 3889)<BR>+++ core/al/al_pnp.h (working
copy)<BR>@@ -216,6 +216,13
@@<BR> IN KEVENT *p_sync_event,<BR> OUT ib_pnp_handle_t*
const ph_pnp );<BR> <BR>+void<BR>+pnp_force_event(<BR>+ IN
struct _al_ci_ca * p_ci_ca,<BR>+ IN
ib_pnp_event_t pnp_event,<BR>+ IN uint8_t
port_num);<BR>+<BR>+<BR> #endif /* CL_KERNEL
*/<BR> <BR> static inline ib_pnp_class_t<BR>Index:
core/al/kernel/al_ci_ca.c<BR>===================================================================<BR>---
core/al/kernel/al_ci_ca.c (revision 3889)<BR>+++
core/al/kernel/al_ci_ca.c (working copy)<BR>@@ -347,6 +347,7
@@<BR> event_rec.code =
p_event_record->type;<BR> event_rec.context =
p_event_record->context;<BR> event_rec.vendor_specific =
p_event_record->vendor_specific;<BR>+ event_rec.port_number =
p_event_record->port_number;<BR> <BR> ci_ca_async_event(
&event_rec );<BR> <BR>Index:
core/al/kernel/al_pnp.c<BR>===================================================================<BR>---
core/al/kernel/al_pnp.c (revision 3889)<BR>+++
core/al/kernel/al_pnp.c (working copy)<BR>@@ -1740,3 +1740,26
@@<BR> AL_EXIT( AL_DBG_PNP );<BR> return
IB_UNSUPPORTED;<BR> }<BR>+<BR>+void<BR>+pnp_force_event(<BR>+ IN
struct _al_ci_ca * p_ci_ca,<BR>+ IN
ib_pnp_event_t pnp_event,<BR>+ IN uint8_t
port_num)<BR>+{<BR>+ <BR>+#define PORT_INDEX_OFFSET
1<BR>+ al_pnp_ca_event_t event_rec;<BR>+<BR>+ ASSERT(p_ci_ca);<BR>+ <BR>+ if
(!p_ci_ca)<BR>+ return;<BR>+ <BR>+ event_rec.p_ci_ca =
p_ci_ca;<BR>+ event_rec.port_index = port_num -
PORT_INDEX_OFFSET;<BR>+ event_rec.pnp_event =
pnp_event;<BR>+ __pnp_process_port_forward( &event_rec
);<BR>+}<BR>+<BR>+<BR>Index:
hw/mlx4/kernel/bus/inc/ib_verbs.h<BR>===================================================================<BR>---
hw/mlx4/kernel/bus/inc/ib_verbs.h (revision 3889)<BR>+++
hw/mlx4/kernel/bus/inc/ib_verbs.h (working copy)<BR>@@ -274,10 +274,11
@@<BR> IB_EVENT_RESET_CLIENT =
IB_AE_RESET_CLIENT, // device will be reset upon client
request<BR> IB_EVENT_RESET_END =
IB_AE_RESET_END, // device has been reset
<BR> IB_EVENT_RESET_FAILED =
IB_AE_RESET_FAILED, // device has been reset
<BR>- IB_EVENT_LID_CHANGE = IB_AE_UNKNOWN +
1,<BR>- IB_EVENT_PKEY_CHANGE,<BR>- IB_EVENT_SM_CHANGE,<BR>- IB_EVENT_CLIENT_REREGISTER<BR>+ IB_EVENT_LID_CHANGE =
IB_AE_LID_CHANGE,<BR>+ IB_EVENT_CLIENT_REREGISTER =
IB_AE_CLIENT_REREGISTER,<BR>+ IB_EVENT_PKEY_CHANGE =
IB_AE_PKEY_CHANGE,<BR>+ IB_EVENT_SM_CHANGE =
IB_AE_SM_CHANGE<BR>+ <BR> };<BR> <BR> struct ib_event
{<BR>Index:
inc/iba/ib_al.h<BR>===================================================================<BR>---
inc/iba/ib_al.h (revision 3889)<BR>+++ inc/iba/ib_al.h (working
copy)<BR>@@ -473,6 +473,8 @@<BR> TO_LONG_PTR(struct
_ib_srq*, h_srq);<BR> <BR> }
handle;<BR>+ <BR>+ uint8_t port_number;<BR> <BR> } ib_async_event_rec_t;<BR> /*<BR>Index:
inc/iba/ib_types.h<BR>===================================================================<BR>---
inc/iba/ib_types.h (revision 3889)<BR>+++ inc/iba/ib_types.h (working
copy)<BR>@@ -8797,6 +8797,9
@@<BR> IB_AE_RESET_CLIENT,<BR> IB_AE_RESET_END,<BR> IB_AE_RESET_FAILED,<BR>+ IB_AE_LID_CHANGE,<BR>+ IB_AE_PKEY_CHANGE,<BR>+ IB_AE_SM_CHANGE,<BR> IB_AE_UNKNOWN /*
ALWAYS LAST ENUM VALUE */<BR> <BR> } ib_async_event_t;<BR>Index:
ulp/opensm/user/include/iba/ib_types_extended.h<BR>===================================================================<BR>---
ulp/opensm/user/include/iba/ib_types_extended.h (revision 3889)<BR>+++
ulp/opensm/user/include/iba/ib_types_extended.h (working copy)<BR>@@ -234,6
+234,9
@@<BR> IB_AE_SRQ_LIMIT_REACHED,<BR> IB_AE_SRQ_CATAS_ERROR,<BR> IB_AE_SRQ_QP_LAST_WQE_REACHED,<BR>+ IB_AE_LID_CHANGE,<BR>+ IB_AE_PKEY_CHANGE,<BR>+ IB_AE_SM_CHANGE,<BR> IB_AE_UNKNOWN /*
ALWAYS LAST ENUM VALUE
*/<BR> <BR> } ib_async_event_t;<BR></DIV></FONT></BODY></HTML>