<br><br>
<div><span class="gmail_quote">On 6/12/08, <b class="gmail_sendername">Hal Rosenstock</b> <<a href="mailto:hrosenstock@xsigo.com">hrosenstock@xsigo.com</a>> wrote:</span>
<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">On Thu, 2008-06-12 at 16:31 +0300, Olga Shern (Voltaire) wrote:<br>><br>><br>> On 6/12/08, Hal Rosenstock <<a href="mailto:hrosenstock@xsigo.com">hrosenstock@xsigo.com</a>> wrote:<br>
> On Thu, 2008-06-12 at 14:08 +0300, Olga Shern (Voltaire)<br>> wrote:<br>> ><br>> ><br>> > On 6/12/08, Hal Rosenstock <<a href="mailto:hrosenstock@xsigo.com">hrosenstock@xsigo.com</a>> wrote:<br>
> > Hi Olga,<br>> ><br>> > On Thu, 2008-06-12 at 09:46 +0300, Olga Shern wrote:<br>> > > Hi All,<br>> > ><br>> > ><br>
> > ><br>> > > We have found something that seems like Infiniband Spec hole,<br>> ><br>> > What's the spec hole ?<br>
> ><br>> > According to the Infiniband spec - partial member cannot "talk" with<br>> > partial member only with full member.<br>> > Therefore if partial member sending MC packet - all other partial<br>
> > members of this partition will generate BAD PKEY trap.<br>> > It means that the behavior that we see is according to Infiniband<br>> > Spec - but very problematic<br>
><br>> Originally, multicast groups were all full member only and<br>> more recently<br>> was this extended to allow partial members and this was<br>> missed. A<br>> comment should be filed against the spec on this.<br>
><br>> > > This issue is system issue that prevents from<br>> partial P_Key<br>> > setup to<br>> > > go into production.<br>> ><br>
> > Indeed :-(<br>> ><br>> > > Short Setup & test description:<br>> > > ------------------------------------------<br>> > > * Node A: P_Key XXX (full member)<br>
> > > * Node B, C, D, E, F: P_Key XXx (partial member)<br>> > ><br>> > > 1. Send ping from B -> A : ping is OK<br>> > > 2. Send ping from C -> A : ping is OK<br>
> > > 3. Send ping from B -> C : no ping also OK<br>> > > * Get traps Bad P_Key in SM - from all HCA in the<br>> fabric<br>> > both for<br>
> > > test 1 & 2 (one time) and also for test 3 (all the<br>> time).<br>><br>> What does all the time mean ? Does this mean with one test 3<br>> ping, the<br>
> traps are repeated ? If so, at what rate ?<br>><br>> every ping will generate ARP that will generate BAD PKEY trap<br><br>OK; so what do you mean by one time v. all the time ? Is that really the<br>case ?<br>
<br>> > > Probably the ARP request that is MC traffic generate the trap in HCA,<br>> > > for test 1 & 2 we have only one ARP but for test 3 we send ARP all the time<br>
> > > because we do not get any ARP reply.<br>> > ><br>> > > * The trap number SM get is 257 (HCA trap) if we will do P_Key<br>
> > > switch enforcement we will probably get 259<br>> ><br>> > Is this with OpenSM or VSM ?<br>> ><br>
> > We tested it with Voltaire SM but it should behave the same<br>> with<br>> > OpenSM.<br>><br>> That's likely but I'm not sure yet.<br><br>Would you try this with OpenSM (and validate your theory about getting<br>
switch bad PKey traps v. end port bad PKey traps) or does VSM have such<br>a mode (ingress/egress partition filtering) ?<br><br>-- Hal</blockquote>
<div> </div>
<div>Yes, I will test it with OpenSM</div><br>
<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">> > -- Hal<br>> ><br>> > > * We get trap also from the originator of the MC<br>
> traffic<br>> > even<br>> > > though that receive switch relay error counter is<br>> increased<br>> > (when out<br>> > > port==in port), the switch does not drop the<br>
> packet ?<br>><br>> The implementation of that counter is broken and occurs<br>> "normally". The<br>> increment of this counter is relatively meaningless :-(<br>><br>
> > > Additional questions/issues:<br>> > > * Do we have a way to suppress port traps from<br>> SMA ?? i.e.<br>> > that<br>> > > the port will not generate traps that can "kill<br>
> the SM" - as<br>> > its look<br>> > > this is bug in the spec where we can't send any mc<br>> traffic<br>> > (even ARP)<br>
> > > when we have partial members and we do not have a<br>> way to<br>> > suppress the<br>> > > traps.<br>><br>> All the SM can do is TrapRepress.<br>
><br>> > > * What will happen in the HCA when we get many<br>> traps (mc<br>> > packets<br>> > > from many nodes) and they need to keep all events<br>
> until SM<br>> > will<br>> > > acknowledge? - Is there limitation in the number<br>> of on-<br>> > going<br>> > > traps (any HCA specific issues)?<br>
><br>> Assuming you mean events from which traps are generated, I<br>> think this is<br>> left as an implementation dependent detail in terms of the<br>> spec. An<br>> implementation needs to take care not to lose certain events;<br>
> others<br>> like this aren't critical but that's left to the specific SMA<br>> implementation.<br>><br>> -- Hal<br>><br>> > ><br>> > ><br>
> > ><br>> > ><br>> > > Best Regards<br>> > ><br>> > > Olga<br>> > ><br>
> > ><br>> > > _______________________________________________<br>> > > general mailing list<br>> > > <a href="mailto:general@lists.openfabrics.org">general@lists.openfabrics.org</a><br>
> > > <a href="http://lists.openfabrics.org/cgi-">http://lists.openfabrics.org/cgi-</a><br>> > bin/mailman/listinfo/general<br>> > ><br>> > > To unsubscribe, please visit<br>
> > <a href="http://openib.org/mailman/listinfo/openib-general">http://openib.org/mailman/listinfo/openib-general</a><br>> ><br>
> ><br>><br>><br><br></blockquote></div><br>
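<div>The partial-member behaviour discussed in this thread can be illustrated with a minimal sketch. It assumes the usual P_Key layout, where the top bit of the 16-bit value marks full membership and the low 15 bits identify the partition. The partition value 0x0001, the node table, and the function names are hypothetical, chosen only to mirror the test setup described above; this is not code from OpenSM or any SMA implementation.</div>

```python
# Sketch of the P_Key match rule, assuming bit 15 is the membership bit
# (1 = full member, 0 = partial member) and bits 0-14 are the partition base.
FULL_MEMBER_BIT = 0x8000
PKEY_BASE_MASK = 0x7FFF


def pkeys_match(a: int, b: int) -> bool:
    """Return True if two P_Keys may communicate.

    They must name the same partition, and at least one side must be a
    full member. Two partial members never match.
    """
    same_partition = (a & PKEY_BASE_MASK) == (b & PKEY_BASE_MASK)
    at_least_one_full = bool((a | b) & FULL_MEMBER_BIT)
    return same_partition and at_least_one_full


def bad_pkey_receivers(sender: str, nodes: dict) -> list:
    """List the nodes that would reject a multicast packet from `sender`.

    Every other group member receives the packet; those whose P_Key does
    not match the sender's are the ones expected to report a bad P_Key.
    """
    sender_pkey = nodes[sender]
    return sorted(
        name
        for name, pkey in nodes.items()
        if name != sender and not pkeys_match(sender_pkey, pkey)
    )


# Hypothetical setup mirroring the thread: A is a full member, B-F are partial.
NODES = {
    "A": 0x8001,
    "B": 0x0001,
    "C": 0x0001,
    "D": 0x0001,
    "E": 0x0001,
    "F": 0x0001,
}

if __name__ == "__main__":
    print("B -> A unicast allowed:", pkeys_match(NODES["B"], NODES["A"]))
    print("B -> C unicast allowed:", pkeys_match(NODES["B"], NODES["C"]))
    print("Rejecting B's multicast ARP:", bad_pkey_receivers("B", NODES))
```

<div>Under these assumptions, a multicast ARP from partial member B is accepted only by A and rejected by every other partial member, which matches the flood of bad P_Key traps reported above. The sketch does not model the packet being relayed back to the originator.</div>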