<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">

<HTML>

<HEAD>

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=US-ASCII">

<META NAME="Generator" CONTENT="MS Exchange Server version 5.5.2654.45">

<TITLE>RE: [openib-general] SM Bad Port Handling</TITLE>

</HEAD>

<BODY>


<P><FONT SIZE=2>Yes, the algorithm looks reasonable. </FONT>

<BR><FONT SIZE=2>I would make the number of packets required to qualify each port along the path a parameter, with a default value of 10 or 20 (surely not 4).</FONT></P>
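
<P><FONT SIZE=2>To make the suggestion concrete, here is a minimal Python sketch of the path-walking check Hal describes in the quoted message below, with the retry count exposed as a parameter. It is an illustration only, not OpenSM code: find_bad_link, probe and DEFAULT_QUALIFY_RETRIES are hypothetical names, and probe stands in for whatever sends a directed-route MAD along a path prefix and reports whether a reply came back.</FONT></P>

<PRE>
```python
from typing import Callable, List, Optional, Tuple

# Assumed default, following the "10 or 20" suggestion above (hypothetical name).
DEFAULT_QUALIFY_RETRIES = 10


def find_bad_link(
    dr_path: List[int],
    probe: Callable[[List[int]], bool],
    retries: int = DEFAULT_QUALIFY_RETRIES,
) -> Optional[Tuple[List[int], int]]:
    """Walk a directed-route path hop by hop, starting at the SM.

    probe(prefix) is a hypothetical callback: it sends one MAD along the
    given DR prefix and returns True if a response arrived.

    Returns (path_to_last_good_node, egress_port) for the first hop that
    fails `retries` probes in a row, or None if every hop responds.
    """
    retries = max(1, retries)  # always probe at least once per hop
    for hop in range(1, len(dr_path) + 1):
        prefix = dr_path[:hop]
        # One reply is enough to qualify this hop; only an unbroken run
        # of `retries` failures marks the link as suspect.
        if not any(probe(prefix) for _ in range(retries)):
            return (dr_path[:hop - 1], dr_path[hop - 1])
    return None
```
</PRE>

<P><FONT SIZE=2>For the path [0][1][1][2][3] from the example below, a probe that always fails once the prefix reaches four entries makes this sketch return ([0, 1, 1], 2), that is, egress port 2 on the node reached via [0][1][1]. The target HCA port itself is never blamed unless the last hop is the one that stops answering.</FONT></P>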


<P><FONT SIZE=2>Eitan Zahavi</FONT>

<BR><FONT SIZE=2>Design Technology Director</FONT>

<BR><FONT SIZE=2>Mellanox Technologies LTD</FONT>

<BR><FONT SIZE=2>Tel:+972-4-9097208</FONT>

<BR><FONT SIZE=2>Fax:+972-4-9593245</FONT>

<BR><FONT SIZE=2>P.O. Box 586 Yokneam 20692 ISRAEL</FONT>

</P>

<BR>


<P><FONT SIZE=2>> -----Original Message-----</FONT>

<BR><FONT SIZE=2>> From: Hal Rosenstock [<A HREF="mailto:halr@voltaire.com">mailto:halr@voltaire.com</A>]</FONT>

<BR><FONT SIZE=2>> Sent: Tuesday, April 19, 2005 5:00 PM</FONT>

<BR><FONT SIZE=2>> To: Eitan Zahavi</FONT>

<BR><FONT SIZE=2>> Cc: 'shaharf'; openib-general@openib.org</FONT>

<BR><FONT SIZE=2>> Subject: RE: [openib-general] SM Bad Port Handling</FONT>

<BR><FONT SIZE=2>> </FONT>

<BR><FONT SIZE=2>> On Thu, 2005-04-14 at 07:13, Eitan Zahavi wrote:</FONT>

<BR><FONT SIZE=2>> > [EZ] Not at all. Although the target port is known, the flaky link</FONT>

<BR><FONT SIZE=2>> > that fails the MAD might be anywhere along the path to the port. So,</FONT>

<BR><FONT SIZE=2>> > if you mark the target port as bad you might be marking the wrong</FONT>

<BR><FONT SIZE=2>> > port!</FONT>

<BR><FONT SIZE=2>> </FONT>

<BR><FONT SIZE=2>> OpenSM does look for traps 128-131, which include Local Link Integrity</FONT>

<BR><FONT SIZE=2>> (129), which is likely to come from these noisy ports, right?</FONT>

<BR><FONT SIZE=2>> </FONT>

<BR><FONT SIZE=2>> >  [EZ] Let me clarify with an example:</FONT>

<BR><FONT SIZE=2>> > SM=HCA1/P1 -></FONT>

<BR><FONT SIZE=2>> > SW1/P1....SW1/P2->SW2/P1..SW2/P2->SW3/P1....SW3/P3->HCA2/P1</FONT>

<BR><FONT SIZE=2>> ></FONT>

<BR><FONT SIZE=2>> > \..SW4/P4->SW3/P4..SW3/P5->SW3/P2../</FONT>

<BR><FONT SIZE=2>> ></FONT>

<BR><FONT SIZE=2>> > If the flaky link is between SW2/P2 and SW3/P1 then the packet sent to</FONT>

<BR><FONT SIZE=2>> > HCA2 using DR : [0][1][1][2][3] might fail. If you mark HCA2/P1 as</FONT>

<BR><FONT SIZE=2>> > bad then you actually will lose that HCA for no good reason since</FONT>

<BR><FONT SIZE=2>> > another path from SM to HCA2 exists.</FONT>

<BR><FONT SIZE=2>> </FONT>

<BR><FONT SIZE=2>> OK. To be really sure about the failed port, one could then walk the</FONT>

<BR><FONT SIZE=2>> entire DR path from the SM to the perceived non responding port and if</FONT>

<BR><FONT SIZE=2>> the same port along the path doesn't respond some number of times (say</FONT>

<BR><FONT SIZE=2>> 4) in a row, that port's peer port can be marked as unhealthy. Is this</FONT>

<BR><FONT SIZE=2>> algorithm acceptable?</FONT>

<BR><FONT SIZE=2>> </FONT>

<BR><FONT SIZE=2>> -- Hal</FONT>

</P>


</BODY>

</HTML>