[ofa-general] Multicast traffic generates Bad P_Key trap in SM when working in partial member setup

Yevgeny Kliteynik kliteyn at dev.mellanox.co.il
Thu Jun 12 05:58:42 PDT 2008


Hal Rosenstock wrote:
> On Thu, 2008-06-12 at 14:08 +0300, Olga Shern (Voltaire) wrote:
>>
>> On 6/12/08, Hal Rosenstock <hrosenstock at xsigo.com> wrote: 
>>         Hi Olga,
>>         
>>         On Thu, 2008-06-12 at 09:46 +0300, Olga Shern wrote:
>>         > Hi All,
>>         >
>>         >
>>         >
>>         > We have found something that looks like a hole in the
>>         > InfiniBand spec,
>>         
>>         What's the spec hole ?
>>  
>> According to the InfiniBand spec, a partial member cannot "talk" with
>> another partial member, only with a full member.
>> Therefore, if a partial member sends an MC packet, all other partial
>> members of this partition will generate a Bad P_Key trap.
>> This means that the behavior we see conforms to the InfiniBand
>> spec, but it is very problematic.
> 
> Originally, multicast groups were all full member only; more recently
> this was extended to allow partial members, and this case was missed. A
> comment should be filed against the spec on this.
> 
>>         > This is a system-level issue that prevents partial P_Key
>>         > setups from going into production.
>>         
>>         Indeed :-(
>>         
>>         > Short Setup & test description:
>>         > ------------------------------------------
>>         > * Node A: P_Key XXX (full member)
>>         > * Nodes B, C, D, E, F: P_Key XXX (partial member)
>>         >
>>         > 1. Send ping from B -> A : ping is OK
>>         > 2. Send ping from C -> A : ping is OK
>>         > 3. Send ping from B -> C : no ping, which is also OK
>>         > * We get Bad P_Key traps in the SM from all HCAs in the
>>         > fabric, for both tests 1 & 2 (one time) and for test 3 (all
>>         > the time).
> 
> What does all the time mean ? Does this mean with one test 3 ping, the
> traps are repeated ? If so, at what rate ?

Also, why do the HCAs issue these traps? Is P_Key enforcement on the
switch external ports turned off? AFAIK, by default, OpenSM should
configure P_Keys on the switch ports that are connected to these HCAs,
so that a partial member wouldn't get a packet from another partial
member.
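
To make the rule under discussion concrete, here is a minimal sketch of
the P_Key match check, assuming the usual encoding in which bit 15 of the
16-bit P_Key is the membership bit (1 = full, 0 = partial) and the low
15 bits identify the partition. The base value 0x1234, the node names and
the helper names are hypothetical and only mirror the test setup quoted
in this thread; this is not OpenSM or HCA code, and it ignores the
loopback case for the sender itself.

```python
# Assumption: bit 15 is the membership bit, low 15 bits are the partition base.
FULL_MEMBER_BIT = 0x8000
BASE_MASK = 0x7FFF


def pkeys_match(pkey_a: int, pkey_b: int) -> bool:
    """Return True if two P_Keys may communicate.

    They must share the same 15-bit base, and at least one of them
    must have the full-membership bit set.
    """
    same_partition = (pkey_a & BASE_MASK) == (pkey_b & BASE_MASK)
    at_least_one_full = bool((pkey_a | pkey_b) & FULL_MEMBER_BIT)
    return same_partition and at_least_one_full


def receivers_rejecting(sender: str, nodes: dict) -> list:
    """Names of nodes that would fail the P_Key check on a multicast
    packet from `sender` (the sender itself is not considered)."""
    sender_pkey = nodes[sender]
    return sorted(
        name
        for name, pkey in nodes.items()
        if name != sender and not pkeys_match(sender_pkey, pkey)
    )


# Hypothetical setup: A is a full member, B..F are partial members.
nodes = {
    "A": 0x1234 | FULL_MEMBER_BIT,
    "B": 0x1234,
    "C": 0x1234,
    "D": 0x1234,
    "E": 0x1234,
    "F": 0x1234,
}

if __name__ == "__main__":
    # A multicast ARP from partial member B fails the check at C..F.
    print(receivers_rejecting("B", nodes))
    # A multicast from full member A passes the check everywhere.
    print(receivers_rejecting("A", nodes))
```

With switch P_Key enforcement enabled, the same check would be applied at
the switch port in front of each HCA, so the mismatching packet would not
reach the HCA at all.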

-- Yevgeny

>>         > Probably the ARP request, which is MC traffic, generates the
>>         > trap in the HCA. For tests 1 & 2 we have only one ARP, but for
>>         > test 3 we send ARPs all the time because we do not get any
>>         > ARP reply.
>>         >
>>         > * The trap number the SM gets is 257 (HCA trap); if we enable
>>         > P_Key switch enforcement we will probably get 259
>>         
>>         Is this with OpenSM or VSM ?
>>  
>> We tested it with Voltaire SM but it should behave the same with
>> OpenSM.
> 
> That's likely but I'm not sure yet.
> 
>>         -- Hal
>>         
>>         > * We also get a trap from the originator of the MC traffic,
>>         > even though the receive switch relay error counter is
>>         > increased (when out port == in port). Does the switch not
>>         > drop the packet?
> 
> The implementation of that counter is broken and increments occur
> "normally". The increment of this counter is relatively meaningless :-(
> 
>>         > Additional questions/issues:
>>         > * Do we have a way to suppress port traps from the SMA? I.e.,
>>         > so that the port will not generate traps that can "kill the
>>         > SM". As it looks, this is a bug in the spec: we can't send any
>>         > MC traffic (even ARP) when we have partial members, and we do
>>         > not have a way to suppress the traps.
> 
> All the SM can do is TrapRepress.
> 
>>         > * What will happen in the HCA when we get many traps (MC
>>         > packets from many nodes) and it needs to keep all events until
>>         > the SM acknowledges them? Is there a limit on the number of
>>         > ongoing traps (any HCA-specific issues)?
> 
> Assuming you mean events from which traps are generated, I think this is
> left as an implementation dependent detail in terms of the spec. An
> implementation needs to take care not to lose certain events; others
> like this aren't critical but that's left to the specific SMA
> implementation.
> 
> -- Hal
> 
>>         >
>>         >
>>         >
>>         >
>>         > Best Regards
>>         >
>>         > Olga
>>         >
>>         >
>>         > _______________________________________________
>>         > general mailing list
>>         > general at lists.openfabrics.org
>>         > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>>         >
>>         > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>>         
>>
> 