<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.6000.16587" name=GENERATOR></HEAD>
<BODY>
<DIV><FONT face=Arial size=2><SPAN class=023533720-06112008>First, I'm happy to
say that I have found the source of the blue screens that we had in the
lists.</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=023533720-06112008></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=023533720-06112008>The problem happens
when the function __mcast_cb tries to insert an endpoint into the dlid
list and fails (see the call stack below).</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=023533720-06112008></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=023533720-06112008>As a result, we have
an endpoint that is not in the dlid list but has a nonzero dlid. When the
endpoint is later removed, we also try to remove it from the dlid list and
crash.</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=023533720-06112008></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=023533720-06112008>This check-in makes
sure that if inserting into the list fails, the endpoint's dlid is set to 0, so
we never try to remove it from the list and there is no blue screen.</SPAN></FONT></DIV>
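<DIV><FONT face=Arial size=2>To illustrate the failure and the fix outside the
driver, here is a minimal user-space sketch in C. It does not use complib:
toy_map_t, toy_insert, toy_remove_item, endpt_t, insert_endpt and remove_endpt
are hypothetical stand-ins, and toy_insert is assumed to behave the way the patch
expects cl_qmap_insert to behave, i.e. on a duplicate key it leaves the new item
unlinked and returns the item already in the map. The dlid field doubles as the
"is in the dlid map" flag, which is what this check-in relies on.</FONT></DIV>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: a map item embedded in each endpoint. */
typedef struct map_item {
    unsigned key;
    int linked;              /* 1 while the item is stored in the map */
} map_item_t;

typedef struct endpt {
    unsigned short dlid;     /* 0 means "not in the dlid map" */
    map_item_t lid_item;
} endpt_t;

#define MAP_CAP 16

typedef struct toy_map {
    map_item_t *items[MAP_CAP];
    size_t count;
} toy_map_t;

/* Assumed cl_qmap_insert-like semantics: on a duplicate key the new
 * item is NOT linked and the item already holding the key is returned. */
static map_item_t *toy_insert(toy_map_t *m, unsigned key, map_item_t *item)
{
    for (size_t i = 0; i < m->count; i++)
        if (m->items[i]->key == key)
            return m->items[i];
    assert(m->count < MAP_CAP);
    item->key = key;
    item->linked = 1;
    m->items[m->count++] = item;
    return item;
}

/* Returns 0 if the item is not in the map; in the driver, trying to
 * unlink such an item is the path that crashes. */
static int toy_remove_item(toy_map_t *m, map_item_t *item)
{
    for (size_t i = 0; i < m->count; i++) {
        if (m->items[i] == item) {
            m->items[i] = m->items[--m->count];  /* swap-remove */
            item->linked = 0;
            return 1;
        }
    }
    return 0;
}

/* The check-in's idea: if the insert did not link our item, clear dlid. */
static void insert_endpt(toy_map_t *m, endpt_t *ep)
{
    map_item_t *p = toy_insert(m, ep->dlid, &ep->lid_item);
    if (p != &ep->lid_item)
        ep->dlid = 0;        /* not in the map, so never try to remove it */
}

/* Removal path: only an endpoint with a nonzero dlid is in the map. */
static int remove_endpt(toy_map_t *m, endpt_t *ep)
{
    if (ep->dlid == 0)
        return 1;            /* nothing to unlink */
    return toy_remove_item(m, &ep->lid_item);
}
```

<DIV><FONT face=Arial size=2>With two endpoints sharing the same dlid, the second
insert fails, its dlid becomes 0, and remove_endpt skips the map for it instead of
trying to unlink an item that was never linked.</FONT></DIV>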
<DIV><FONT face=Arial size=2><SPAN
class=023533720-06112008></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=023533720-06112008>The real issue is
what else we should do. I'm afraid that things will not work, since this endpoint
has no dlid.</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=023533720-06112008>My ideas
are:</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=023533720-06112008></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=023533720-06112008>1) Remove this
endpoint from the list.</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=023533720-06112008>2) Remove the other
endpoint from the list (the one that has the same dlid).</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=023533720-06112008>3) Force a reset by
NDIS to start everything over again.</SPAN></FONT></DIV>
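<DIV><FONT face=Arial size=2>Option 2 can be sketched the same way. This is a
hypothetical illustration, not existing IPoIB code: insert_replacing_owner,
ENDPT_FROM_ITEM and the toy map types are invented for the example. Assuming the
dlid map holds one endpoint per dlid, the endpoint that currently owns the dlid is
unlinked and its dlid cleared (so its own removal path skips the map), and then the
new endpoint is inserted. ENDPT_FROM_ITEM recovers the owning endpoint from the
embedded map item with offsetof, the usual container-of idiom.</FONT></DIV>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types, not the IPoIB structures. */
typedef struct map_item {
    unsigned key;
} map_item_t;

typedef struct endpt {
    unsigned short dlid;      /* 0 means "not in the dlid map" */
    map_item_t lid_item;
} endpt_t;

#define MAP_CAP 16

typedef struct toy_map {
    map_item_t *items[MAP_CAP];
    size_t count;
} toy_map_t;

/* Recover the endpoint that embeds a given map item. */
#define ENDPT_FROM_ITEM(p) \
    ((endpt_t *)((char *)(p) - offsetof(endpt_t, lid_item)))

static map_item_t *toy_find(toy_map_t *m, unsigned key)
{
    for (size_t i = 0; i < m->count; i++)
        if (m->items[i]->key == key)
            return m->items[i];
    return NULL;
}

static void toy_unlink(toy_map_t *m, map_item_t *item)
{
    for (size_t i = 0; i < m->count; i++) {
        if (m->items[i] == item) {
            m->items[i] = m->items[--m->count];  /* swap-remove */
            return;
        }
    }
}

/* Option 2, sketched: evict whichever endpoint owns the dlid, clear its
 * dlid, then insert the new endpoint. Returns the evicted endpoint or NULL. */
static endpt_t *insert_replacing_owner(toy_map_t *m, endpt_t *ep)
{
    endpt_t *old = NULL;
    map_item_t *cur = toy_find(m, ep->dlid);

    if (cur != NULL) {
        old = ENDPT_FROM_ITEM(cur);
        toy_unlink(m, cur);
        old->dlid = 0;        /* the evicted endpoint is no longer in the map */
    }
    assert(m->count < MAP_CAP);
    ep->lid_item.key = ep->dlid;
    m->items[m->count++] = &ep->lid_item;
    return old;
}
```

<DIV><FONT face=Arial size=2>The evicted endpoint would still have to be dropped
from the other endpoint lists, which this sketch does not model.</FONT></DIV>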
<DIV><FONT face=Arial size=2><SPAN
class=023533720-06112008></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=023533720-06112008>What are the
community's thoughts?</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=023533720-06112008></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=023533720-06112008>Call stack at the
failure:</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=023533720-06112008>Child-SP RetAddr Call Site<BR>
fffffa60`051fa648 fffff800`017374a8 nt!DbgBreakPoint<BR>
fffffa60`051fa650 fffffa60`053bfdd5 nt!RtlAssert+0x108<BR>
fffffa60`051fab70 fffffa60`052e8f62 ipoib!__mcast_cb+0xc45 [s:\builds\3433\branches\mlnx_winof_2-0\ulp\ipoib\kernel\ipoib_port.c @ 6096]<BR>
fffffa60`051fabf0 fffffa60`05264e0f ibbus!join_async_cb+0x4b2 [s:\builds\3433\branches\mlnx_winof_2-0\core\al\al_mcast.c @ 535]<BR>
fffffa60`051fac90 fffffa60`0526ade5 ibbus!__cl_async_proc_worker+0xbf [s:\builds\3433\branches\mlnx_winof_2-0\core\complib\cl_async_proc.c @ 153]<BR>
fffffa60`051face0 fffffa60`0526c0cc ibbus!__cl_thread_pool_routine+0x75 [s:\builds\3433\branches\mlnx_winof_2-0\core\complib\cl_threadpool.c @ 67]<BR>
fffffa60`051fad20 fffff800`018c1de3 ibbus!__thread_callback+0x3c [s:\builds\3433\branches\mlnx_winof_2-0\core\complib\kernel\cl_thread.c @ 49]<BR>
fffffa60`051fad50 fffff800`016d8536 nt!PspSystemThreadStartup+0x57<BR>
fffffa60`051fad80 00000000`00000000 nt!KiStartSystemThread+0x16<BR></SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>Index: Q:/projinf4/trunk/ulp/ipoib/kernel/ipoib_port.c<BR>
===================================================================<BR>
--- Q:/projinf4/trunk/ulp/ipoib/kernel/ipoib_port.c (revision 3441)<BR>
+++ Q:/projinf4/trunk/ulp/ipoib/kernel/ipoib_port.c (revision 3442)<BR>
@@ -5007,6 +5007,10 @@<BR>
 p_qitem = cl_qmap_insert(<BR>
 &p_port->endpt_mgr.lid_endpts, p_endpt->dlid, &p_endpt->lid_item );<BR>
 CL_ASSERT( p_qitem == &p_endpt->lid_item );<BR>
+ if (p_qitem != &p_endpt->lid_item) {<BR>
+ // Since we failed to insert into the list, make sure it is not removed<BR>
+ p_endpt->dlid =0;<BR>
+ }<BR>
 }<BR>
 <BR>
 IPOIB_EXIT( IPOIB_DBG_ENDPT );<BR>
@@ -6094,6 +6098,11 @@<BR>
 p_qitem = cl_qmap_insert(<BR>
 &p_port->endpt_mgr.lid_endpts, p_endpt->dlid, &p_endpt->lid_item );<BR>
 CL_ASSERT( p_qitem == &p_endpt->lid_item );<BR>
+ if (p_qitem != &p_endpt->lid_item) {<BR>
+ // Since we failed to insert into the list, make sure it is not removed<BR>
+ p_endpt->dlid =0;<BR>
+ }<BR>
+ <BR>
 }<BR>
 /* set flag that endpoint is use */<BR>
 p_endpt->is_in_use = TRUE;<BR></FONT></DIV></BODY></HTML>