<br><br>
<div><span class="gmail_quote">On 6/12/08, <b class="gmail_sendername">Hal Rosenstock</b> <<a href="mailto:hrosenstock@xsigo.com">hrosenstock@xsigo.com</a>> wrote:</span>
<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">On Thu, 2008-06-12 at 14:08 +0300, Olga Shern (Voltaire) wrote:<br>><br>><br>> On 6/12/08, Hal Rosenstock <<a href="mailto:hrosenstock@xsigo.com">hrosenstock@xsigo.com</a>> wrote:<br>
> Hi Olga,<br>><br>> On Thu, 2008-06-12 at 09:46 +0300, Olga Shern wrote:<br>> > Hi All,<br>> ><br>> ><br>> ><br>> > We have found something that seems like Infiniband Spec<br>
> hole,<br>><br>> What's the spec hole ?<br>><br>> According to the Infiniband spec - partial member cannot "talk" with<br>> partial member only with full member.<br>> Therefore if partial member sending MC packet - all other partial<br>
> members of this partition will generate BAD PKEY trap.<br>> It means that the behavior that we see is according to Infiniband<br>> Spec - but very problematic<br><br>Originally, multicast groups were all full member only; only more recently<br>
was this extended to allow partial members, so this case was missed. A<br>comment should be filed against the spec on this.<br><br>> > This issue is system issue that prevents from partial P_Key<br>> setup to<br>
> > go into production.<br>><br>> Indeed :-(<br>><br>> > Short Setup & test description:<br>> > ------------------------------------------<br>> > * Node A: P_Key XXX (full member)<br>
> > * Node B, C, D, E, F: P_Key XXx (partial member)<br>> ><br>> > 1. Send ping from B -> A : ping is OK<br>> > 2. Send ping from C -> A : ping is OK<br>> > 3. Send ping from B -> C : no ping also OK<br>
> > * Get traps Bad P_Key in SM - from all HCA in the fabric<br>> both for<br>> > test 1 & 2 (one time) and also for test 3 (all the time).<br><br>What does all the time mean ? Does this mean with one test 3 ping, the<br>
traps are repeated ? If so, at what rate ?</blockquote>
<div> </div>
<div>Every ping generates an ARP request, and each ARP request generates a Bad P_Key trap.</div><br>
<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">> > Probably the ARP request that is MC traffic generate the<br>> trap in HCA,<br>
> > for test 1<br>> > & 2 we have only one ARP but for test 3 we send ARP all the<br>> time<br>> > because<br>> > we do not get any ARP reply.<br>> ><br>
> > * The trap number SM get is 257 (HCA trap) if we will do<br>> P_Key<br>> > switch enforcement we will probably get 259<br>><br>> Is this with OpenSM or VSM ?<br>><br>
> We tested it with Voltaire SM but it should behave the same with<br>> OpenSM.<br><br>That's likely but I'm not sure yet.<br><br>> -- Hal<br>><br>> > * We get trap also from the originator of the MC traffic<br>
> even<br>> > though that receive switch relay error counter is increased<br>> (when out<br>> > port==in port), the switch does not drop the packet ?<br><br>The implementation of that counter is broken and occurs "normally". The<br>
increment of this counter is relatively meaningless :-(<br><br>> > Additional questions/issues:<br>> > * Do we have a way to suppress port traps from SMA ?? i.e.<br>> that<br>> > the port will not generate traps that can "kill the SM" - as<br>
> its look<br>> > this is bug in the spec where we can't send any mc traffic<br>> (even ARP)<br>> > when we have partial members and we do not have a way to<br>> suppress the<br>
> > traps.<br><br>All the SM can do is TrapRepress.<br><br>> > * What will happen in the HCA when we get many traps (mc<br>> packets<br>> > from many nodes) and they need to keep all events until SM<br>
> will<br>> > acknowledge? - Is there limitation in the number of on-<br>> going<br>> > traps (any HCA specific issues)?<br><br>Assuming you mean events from which traps are generated, I think this is<br>
left as an implementation dependent detail in terms of the spec. An<br>implementation needs to take care not to lose certain events; others<br>like this aren't critical but that's left to the specific SMA<br>implementation.<br>
<br>-- Hal<br><br>> ><br>> ><br>> ><br>> ><br>> > Best Regards<br>> ><br>> > Olga<br>> ><br>> ><br>
> > _______________________________________________<br>> > general mailing list<br>> > <a href="mailto:general@lists.openfabrics.org">general@lists.openfabrics.org</a><br>> > <a href="http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general">http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general</a><br>
> ><br>> > To unsubscribe, please visit<br>> > <a href="http://openib.org/mailman/listinfo/openib-general">http://openib.org/mailman/listinfo/openib-general</a><br>
<br></blockquote></div><br>
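<div>For reference, the partition rule discussed in this thread can be sketched roughly as follows. This is only an illustration, not code from any SM or HCA: it assumes the usual P_Key layout (bit 15 is the membership bit, the low 15 bits identify the partition), the P_Key values 0x8001 and 0x0001 are made up because the thread only says "XXX", and the helper names <code>pkeys_match</code> and <code>bad_pkey_receivers</code> are hypothetical. It also assumes every listed node receives the multicast ARP, and it leaves out the originator, even though the thread reports traps from the originator as well.</div>
<pre>
```python
FULL_MEMBER_BIT = 0x8000  # bit 15 of the 16-bit P_Key: 1 = full, 0 = partial (limited)
BASE_MASK = 0x7FFF        # low 15 bits identify the partition


def pkeys_match(sender, receiver):
    """Return True if a packet carrying `sender` passes the check against `receiver`."""
    same_partition = (sender & BASE_MASK) == (receiver & BASE_MASK)
    # Two partial members never match, even inside the same partition.
    one_is_full = bool((sender | receiver) & FULL_MEMBER_BIT)
    return same_partition and one_is_full


def bad_pkey_receivers(sender_name, nodes):
    """Names of the other nodes that would fail the P_Key check on a multicast."""
    sender_pkey = nodes[sender_name]
    return sorted(
        name
        for name, pkey in nodes.items()
        if name != sender_name and not pkeys_match(sender_pkey, pkey)
    )


# Hypothetical values: A is a full member, B..F are partial members.
nodes = {"A": 0x8001, "B": 0x0001, "C": 0x0001, "D": 0x0001, "E": 0x0001, "F": 0x0001}

# A multicast ARP from B is accepted by A but rejected by C, D, E and F.
print(bad_pkey_receivers("B", nodes))
```
</pre>
<div>Under these assumptions, an ARP from a partial member fails the check on every other partial member, while an ARP from the full member fails nowhere, which is consistent with tests 1 to 3 above.</div>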