[ofa-general] Multicast traffic generates Bad P_Key trap in SM when working in partial member setup

Olga Shern (Voltaire) olga.shern at gmail.com
Thu Jun 12 04:08:47 PDT 2008


On 6/12/08, Hal Rosenstock <hrosenstock at xsigo.com> wrote:
>
> Hi Olga,
>
> On Thu, 2008-06-12 at 09:46 +0300, Olga Shern wrote:
> > Hi All,
> >
> >
> >
> > We have found what looks like a hole in the InfiniBand spec.
>
> What's the spec hole ?


According to the InfiniBand spec, a partial member cannot "talk" to another
partial member, only to a full member.
Therefore, if a partial member sends an MC packet, all other partial members
of this partition will generate a Bad P_Key trap.
This means that the behavior we see conforms to the InfiniBand spec, but it
is very problematic.
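
To make the rule above concrete, here is a minimal sketch of the P_Key check, assuming the usual encoding in which the top bit of the 16-bit P_Key is the membership bit (1 = full, 0 = partial, which the spec calls "limited") and the low 15 bits identify the partition. The partition value 0x1234, the node table and the helper names are hypothetical, and each node is simplified to a single P_Key rather than a real P_Key table.

```python
MEMBERSHIP_BIT = 0x8000  # assumed encoding: top bit set = full member
BASE_MASK = 0x7FFF       # low 15 bits identify the partition


def pkeys_match(sender: int, receiver: int) -> bool:
    """Return True if the receiver accepts a packet carrying the sender's P_Key."""
    same_partition = (sender & BASE_MASK) == (receiver & BASE_MASK)
    # At least one side must be a full member; partial-to-partial is rejected.
    one_is_full = bool((sender | receiver) & MEMBERSHIP_BIT)
    return same_partition and one_is_full


def receivers_raising_trap(sender: str, table: dict) -> list:
    """Nodes other than the sender that would reject its multicast packet."""
    return sorted(
        name
        for name, pkey in table.items()
        if name != sender and not pkeys_match(table[sender], pkey)
    )


BASE = 0x1234  # hypothetical partition number, for illustration only
nodes = {"A": BASE | MEMBERSHIP_BIT, "B": BASE, "C": BASE,
         "D": BASE, "E": BASE, "F": BASE}

print(receivers_raising_trap("B", nodes))  # ['C', 'D', 'E', 'F']
print(receivers_raising_trap("A", nodes))  # []
```

This sketch only models the receive-side check on other nodes; it does not model the multicast packet being delivered back to the originator, which is a separate question raised in the original report.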

> > This is a system-level issue that prevents partial P_Key setups from
> > going into production.
>
> Indeed :-(
>
> > Short Setup & test description:
> > ------------------------------------------
> > * Node A: P_Key XXX (full member)
> > * Node B, C, D, E, F: P_Key XXx (partial member)
> >
> > 1. Send ping from B -> A : ping is OK
> > 2. Send ping from C -> A : ping is OK
> > 3. Send ping from B -> C : no ping, which is also OK (expected)
> > * The SM gets Bad P_Key traps from all HCAs in the fabric, both for
> > tests 1 & 2 (once) and for test 3 (continuously).
> >
> > Probably the ARP request, which is MC traffic, generates the trap in
> > the HCA. For tests 1 & 2 we have only one ARP, but for test 3 we send
> > ARPs all the time because we do not get any ARP reply.
> >
> > * The trap number the SM gets is 257 (HCA trap); if we enable P_Key
> > switch enforcement we will probably get 259.
>
> Is this with OpenSM or VSM ?


We tested it with Voltaire SM but it should behave the same with OpenSM.

> -- Hal
>
> > * We also get a trap from the originator of the MC traffic, even
> > though the receive switch relay error counter is increased (when out
> > port == in port). Does the switch not drop the packet?
> >
> > Additional questions/issues:
> > * Do we have a way to suppress port traps from the SMA, i.e. so that
> > the port will not generate traps that can "kill the SM"? It looks like
> > this is a bug in the spec: we can't send any MC traffic (even ARP)
> > when we have partial members, and we have no way to suppress the
> > traps.
> >
> >
> > * What will happen in the HCA when we get many traps (MC packets
> > from many nodes) and it needs to keep all events until the SM
> > acknowledges them? Is there a limit on the number of outstanding
> > traps (any HCA-specific issues)?
> >
> >
> >
> >
> >
> > Best Regards
> >
> > Olga
> >
> >
> > _______________________________________________
> > general mailing list
> > general at lists.openfabrics.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> >
> > To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general