<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.6000.16587" name=GENERATOR></HEAD>
<BODY>
<DIV><SPAN class=840311217-10112008><FONT face=Arial color=#0000ff size=2>I have
applied the minimum change (set the dlid to 0) on 1745,
1746.</FONT></SPAN></DIV>
<DIV><SPAN class=840311217-10112008><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=840311217-10112008><FONT face=Arial color=#0000ff size=2>This
should stop the blue screen.</FONT></SPAN></DIV>
<DIV><SPAN class=840311217-10112008><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=840311217-10112008><FONT face=Arial color=#0000ff
size=2>Thanks</FONT></SPAN></DIV>
<DIV><SPAN class=840311217-10112008><FONT face=Arial color=#0000ff
size=2>Tzachi</FONT></SPAN></DIV>
<DIV><SPAN class=840311217-10112008><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV><BR>
<BLOCKQUOTE dir=ltr
style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #0000ff 2px solid; MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> ofw-bounces@lists.openfabrics.org
[mailto:ofw-bounces@lists.openfabrics.org] <B>On Behalf Of </B>Tzachi
Dar<BR><B>Sent:</B> Thursday, November 06, 2008 10:47 PM<BR><B>To:</B>
ofw@lists.openfabrics.org<BR><B>Subject:</B> [ofw] Patch: [ipoib] Make sure
that the dlid is zero if it is not in the list.<BR></FONT><BR></DIV>
<DIV></DIV>
<DIV><FONT face=Arial size=2><SPAN class=023533720-06112008>First, I'm happy
to say that I have found the source of the blue screens that we had in the
lists.</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=023533720-06112008></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=023533720-06112008>The problem
happens when the function __mcast_cb tries to insert an endpoint into the
dlid list and fails (see the call stack below).</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=023533720-06112008></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=023533720-06112008>As a result we
have an endpoint that is not in the dlid list but has a dlid that is not
zero. When we later take the endpoint from the list, </SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=023533720-06112008>we also try to remove
it from the dlid list and crash.</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=023533720-06112008></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=023533720-06112008>This check-in makes
sure that if the insertion into the list fails, the dlid is set to 0, so we do
not try to remove the endpoint from the list and there is no blue screen.</SPAN></FONT></DIV>
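<DIV><FONT face=Arial size=2>The guard can be illustrated with a minimal,
self-contained sketch. It assumes a simplified array-backed map in place of the
driver's real one; lid_map_t, lid_map_insert, lid_map_remove, endpt_track and
endpt_untrack are hypothetical names used only for illustration. The sketch
assumes that an insert with a duplicate key returns the entry already stored
under that key, which is the condition the patch tests for.</FONT></DIV>
<PRE>
```c
/* Hypothetical stand-ins for the driver's types; not the real complib API. */
#define MAX_SLOTS 8

typedef struct endpt {
    unsigned short dlid;            /* 0 means "not in the dlid map" */
} endpt_t;

typedef struct lid_map {
    endpt_t *slots[MAX_SLOTS];
    int count;
} lid_map_t;

/* Insert p_endpt under its dlid. On a duplicate key, return the entry
 * already stored under that key; return 0 if the map is full. */
static endpt_t *lid_map_insert(lid_map_t *p_map, endpt_t *p_endpt)
{
    int i;
    for (i = 0; i < p_map->count; i++) {
        if (p_map->slots[i]->dlid == p_endpt->dlid)
            return p_map->slots[i];
    }
    if (p_map->count == MAX_SLOTS)
        return 0;
    p_map->slots[p_map->count++] = p_endpt;
    return p_endpt;
}

/* Remove p_endpt from the map; return 1 if it was found, 0 otherwise. */
static int lid_map_remove(lid_map_t *p_map, endpt_t *p_endpt)
{
    int i;
    for (i = 0; i < p_map->count; i++) {
        if (p_map->slots[i] == p_endpt) {
            p_map->slots[i] = p_map->slots[--p_map->count];
            return 1;
        }
    }
    return 0;
}

/* Guarded insert: if the insert did not store this endpoint, clear its
 * dlid so that later cleanup does not try to remove it from the map. */
static int endpt_track(lid_map_t *p_map, endpt_t *p_endpt)
{
    if (p_endpt->dlid == 0)
        return 0;
    if (lid_map_insert(p_map, p_endpt) != p_endpt) {
        p_endpt->dlid = 0;
        return 0;
    }
    return 1;
}

/* Cleanup keyed on dlid: a zero dlid means the endpoint was never
 * inserted, so there is nothing to remove. */
static int endpt_untrack(lid_map_t *p_map, endpt_t *p_endpt)
{
    if (p_endpt->dlid == 0)
        return 0;
    return lid_map_remove(p_map, p_endpt);
}
```
</PRE>
<DIV><FONT face=Arial size=2>With two endpoints that share a dlid, the second
endpt_track call fails and zeroes the dlid, so a later endpt_untrack on that
endpoint is a harmless no-op. As noted, the endpoint is then left without a
usable dlid.</FONT></DIV>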
<DIV><FONT face=Arial size=2><SPAN
class=023533720-06112008></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=023533720-06112008>The real issue is
what else we should do. I'm afraid that things will not work, as this
endpoint has no dlid.</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=023533720-06112008>My ideas
are:</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=023533720-06112008></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=023533720-06112008>1) Remove this
endpoint from the list.</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=023533720-06112008>2) Remove the
other endpoint from the list (the one that has the same
dlid)</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=023533720-06112008>3) Force a reset
by NDIS, to start things all over again.</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=023533720-06112008></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=023533720-06112008>What are the
community's thoughts?</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=023533720-06112008></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN
class=023533720-06112008></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN
class=023533720-06112008></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=023533720-06112008>call stack of the
program:</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=023533720-06112008>Child-SP
RetAddr Call
Site<BR>fffffa60`051fa648 fffff800`017374a8
nt!DbgBreakPoint<BR>fffffa60`051fa650 fffffa60`053bfdd5
nt!RtlAssert+0x108<BR>fffffa60`051fab70 fffffa60`052e8f62
ipoib!__mcast_cb+0xc45
[s:\builds\3433\branches\mlnx_winof_2-0\ulp\ipoib\kernel\ipoib_port.c @
6096]<BR>fffffa60`051fabf0 fffffa60`05264e0f ibbus!join_async_cb+0x4b2
[s:\builds\3433\branches\mlnx_winof_2-0\core\al\al_mcast.c @
535]<BR>fffffa60`051fac90 fffffa60`0526ade5 ibbus!__cl_async_proc_worker+0xbf
[s:\builds\3433\branches\mlnx_winof_2-0\core\complib\cl_async_proc.c @
153]<BR>fffffa60`051face0 fffffa60`0526c0cc
ibbus!__cl_thread_pool_routine+0x75
[s:\builds\3433\branches\mlnx_winof_2-0\core\complib\cl_threadpool.c @
67]<BR>fffffa60`051fad20 fffff800`018c1de3 ibbus!__thread_callback+0x3c
[s:\builds\3433\branches\mlnx_winof_2-0\core\complib\kernel\cl_thread.c @
49]<BR>fffffa60`051fad50 fffff800`016d8536
nt!PspSystemThreadStartup+0x57<BR>fffffa60`051fad80 00000000`00000000
nt!KiStartSystemThread+0x16<BR></SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>Index:
Q:/projinf4/trunk/ulp/ipoib/kernel/ipoib_port.c<BR>===================================================================<BR>---
Q:/projinf4/trunk/ulp/ipoib/kernel/ipoib_port.c (revision 3441)<BR>+++
Q:/projinf4/trunk/ulp/ipoib/kernel/ipoib_port.c (revision 3442)<BR>@@
-5007,6 +5007,10 @@<BR> p_qitem =
cl_qmap_insert(<BR> &p_port->endpt_mgr.lid_endpts,
p_endpt->dlid, &p_endpt->lid_item );<BR> CL_ASSERT(
p_qitem == &p_endpt->lid_item );<BR>+ if (p_qitem !=
&p_endpt->lid_item) {<BR>+ // Since we failed to
insert into the list, make sure it is not
removed<BR>+ p_endpt->dlid
=0;<BR>+ }<BR> }<BR> <BR> IPOIB_EXIT(
IPOIB_DBG_ENDPT );<BR>@@ -6094,6 +6098,11 @@<BR> p_qitem =
cl_qmap_insert(<BR> &p_port->endpt_mgr.lid_endpts,
p_endpt->dlid, &p_endpt->lid_item );<BR> CL_ASSERT(
p_qitem == &p_endpt->lid_item );<BR>+ if (p_qitem !=
&p_endpt->lid_item) {<BR>+ // Since we failed to
insert into the list, make sure it is not
removed<BR>+ p_endpt->dlid
=0;<BR>+ }<BR>+ <BR> }<BR> /* set
flag that endpoint is use */<BR> p_endpt->is_in_use =
TRUE;<BR></DIV></BLOCKQUOTE></FONT></BODY></HTML>