<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=US-ASCII">
<META NAME="Generator" CONTENT="MS Exchange Server version 5.5.2654.45">
<TITLE>RE: [openib-general] SM Bad Port Handling</TITLE>
</HEAD>
<BODY>
<P><FONT SIZE=2>Hi Hal,</FONT>
</P>
<P><FONT SIZE=2>I have looked up the mail thread. </FONT>
<BR><FONT SIZE=2>I could not find a log file indicating there was a repetitive query of the bad port.</FONT>
</P>
<P><FONT SIZE=2>I know the code, and I cannot find a reason for it to query a port repeatedly. </FONT>
<BR><FONT SIZE=2> </FONT>
<BR><FONT SIZE=2>Can you explain how this can happen? </FONT>
</P>
<BR>
<P><FONT SIZE=2>Eitan Zahavi</FONT>
<BR><FONT SIZE=2>Design Technology Director</FONT>
<BR><FONT SIZE=2>Mellanox Technologies LTD</FONT>
<BR><FONT SIZE=2>Tel:+972-4-9097208</FONT>
<BR><FONT SIZE=2>Fax:+972-4-9593245</FONT>
<BR><FONT SIZE=2>P.O. Box 586 Yokneam 20692 ISRAEL</FONT>
</P>
<BR>
<P><FONT SIZE=2>> -----Original Message-----</FONT>
<BR><FONT SIZE=2>> From: Hal Rosenstock [<A HREF="mailto:halr@voltaire.com">mailto:halr@voltaire.com</A>]</FONT>
<BR><FONT SIZE=2>> Sent: Friday, April 08, 2005 10:07 PM</FONT>
<BR><FONT SIZE=2>> To: Eitan Zahavi</FONT>
<BR><FONT SIZE=2>> Cc: openib-general@openib.org</FONT>
<BR><FONT SIZE=2>> Subject: RE: [openib-general] SM Bad Port Handling</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> On Fri, 2005-04-08 at 11:55, Eitan Zahavi wrote:</FONT>
<BR><FONT SIZE=2>> > Hi Hal,</FONT>
<BR><FONT SIZE=2>> ></FONT>
<BR><FONT SIZE=2>> > This is a physical port attribute so the file is osm_port.h and the</FONT>
<BR><FONT SIZE=2>> > structure is osm_physp_t.</FONT>
<BR><FONT SIZE=2>> > From the doc on the structure:</FONT>
<BR><FONT SIZE=2>> > *</FONT>
<BR><FONT SIZE=2>> > * healthy</FONT>
<BR><FONT SIZE=2>> > * Tracks the health of the port. Normally should be TRUE but</FONT>
<BR><FONT SIZE=2>> > * might change as a result of incoming traps indicating the port</FONT>
<BR><FONT SIZE=2>> > * healthy is questionable.</FONT>
<BR><FONT SIZE=2>> > *</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> Yup. It's definitely there in the gen2 code base.</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> > I have been trying my best to find how it can happen that a port that</FONT>
<BR><FONT SIZE=2>> > does not respond will cause OpenSM to continuously poll it. This can</FONT>
<BR><FONT SIZE=2>> > not happen so unless you can explain how it happens please do not</FONT>
<BR><FONT SIZE=2>> > contaminate the code with un-needed code.</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> This part of the code has not been touched. I've put all meaningful</FONT>
<BR><FONT SIZE=2>> patches and ideas on how things might change out on this list.</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> > The only thing that comes to mind is the case of failure to "Set" some</FONT>
<BR><FONT SIZE=2>> > attributes of devices in the fabric.</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> I dug out Ron's emails on this. I don't think that was what was going</FONT>
<BR><FONT SIZE=2>> on. It was a SubnGet(NodeInfo) which failed. See</FONT>
<BR><FONT SIZE=2>> <A HREF="http://openib.org/pipermail/openib-general/2005-February/009125.html" TARGET="_blank">http://openib.org/pipermail/openib-general/2005-February/009125.html</A></FONT>
<BR><FONT SIZE=2>> for more details.</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> -- Hal</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> > This happens only after discovery is completed and only after the</FONT>
<BR><FONT SIZE=2>> > validity of the data base is verified (i.e. each node has ports, each</FONT>
<BR><FONT SIZE=2>> > port has a node ...)</FONT>
<BR><FONT SIZE=2>> ></FONT>
<BR><FONT SIZE=2>> > In the case of a failure to set some attributes (e.g. LFT, PortInfo etc.)</FONT>
<BR><FONT SIZE=2>> > OpenSM will output a clear error message: "Errors in intialization"</FONT>
<BR><FONT SIZE=2>> > and will restart a full sweep of the fabric.</FONT>
<BR><FONT SIZE=2>> ></FONT>
<BR><FONT SIZE=2>> > This is the only way one can get infinite polling on the entire subnet.</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> > In general, it might make sense to try and improve how OpenSM</FONT>
<BR><FONT SIZE=2>> > qualifies each fabric port for the statistics of the number of packet</FONT>
<BR><FONT SIZE=2>> > drops versus good packets it passed through. Note this is complex due</FONT>
<BR><FONT SIZE=2>> > to the fact a port might affect packets that go through it. And</FONT>
<BR><FONT SIZE=2>> > there is no way to know on which hop on the path the packet was</FONT>
<BR><FONT SIZE=2>> > dropped.</FONT>
</P>
</BODY>
</HTML>