[ofa-general] [RFC] OpenSM and IPv6 Scalability Proposal

Hal Rosenstock hrosenstock at xsigo.com
Sat Jun 7 06:03:17 PDT 2008


There have been several discussions on SM issues with IPv6 solicited
node multicast (SNM) scalability. There was a thread entitled
"IPv6 and IPoIB scalability issue"
(http://lists.openfabrics.org/pipermail/general/2006-November/029621.html)
and a couple of subsequent threads on a workaround.

It is proposed here to remove the workaround and replace it with a complete
solution for this issue.

PROBLEM STATEMENT
The primary issue is that IPv6 SNM trades broadcast for separate
multicast groups when performing neighbor discovery (ND).
SMs that use a simple 1:1 mapping of multicast group (MGID)
to multicast LID (MLID) consume too many MLIDs in large clusters.
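
To make the scaling problem concrete, here is a minimal Python sketch of
how a solicited node multicast address ends up as its own MGID. The
function names are made up for illustration, the scope and P_Key values
are just example inputs, and the bit layout follows the MGID pattern
quoted later in this proposal (ff1Z:601b:P_Key followed by the low 80
bits of the IPv6 group address).

```python
import ipaddress

IPV6_SIGNATURE = 0x601B  # the "601b" field in the MGID pattern


def solicited_node_addr(unicast: str) -> ipaddress.IPv6Address:
    """Return ff02::1:ffXX:XXXX, where XX:XXXX is the low 24 bits of the unicast address."""
    low24 = int(ipaddress.IPv6Address(unicast)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)


def ipoib_mgid(mcast: ipaddress.IPv6Address, pkey: int, scope: int = 0x2) -> int:
    """Build a 128-bit MGID: ff1Z : 601b : P_Key : low 80 bits of the IPv6 group."""
    low80 = int(mcast) & ((1 << 80) - 1)
    return (
        (0xFF << 120)
        | (0x1 << 116)            # flags nibble, the "1" in ff1Z
        | (scope << 112)          # Z
        | (IPV6_SIGNATURE << 96)
        | (pkey << 80)            # XXXX
        | low80
    )


# Example inputs only: three hosts whose addresses differ in the low 24 bits.
hosts = ["fe80::202:c903:12:3456", "fe80::202:c903:12:3457", "fe80::202:c903:ab:cdef"]
mgids = {ipoib_mgid(solicited_node_addr(h), pkey=0xFFFF) for h in hosts}
print(len(mgids), "distinct MGIDs for", len(hosts), "hosts")
```

With a 1:1 MGID to MLID mapping, each distinct MGID in that set would
need its own MLID, which is where the consumption comes from.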

CURRENT DESIGN
The current workaround in OpenSM for this is an option called
consolidate_ipv6_snm_req. This workaround attempts to compress the
IPv6 SNM groups to 1 MLID. Limitations of this workaround have been
discussed on this list previously.

Underlying this, the current OpenSM design assumes a 1:1 mapping of
multicast group to MLID. It currently utilizes a "quick" map, which
is a red/black tree supporting keys of up to 64 bits. Unfortunately,
the multicast group ID (MGID) is a 128 bit key.

PROPOSED APPROACH
The approach is in 2 steps:

1. Change the current underlying multicast tree from being MLID based
to MGID based. This involves using fleximap rather than qmap. The
downside of this is that MLID lookups will be slower, because the MLID
will no longer be the key in the map. Rather than searching by MLID key,
the tree will need to be scanned entry by entry for MLID matches.
It's unclear how much this will slow down MLID searches, but it is
thought that none of these searches are time critical (and so they
shouldn't cause any existing timeouts to "pop").

2. Add support for overloading MLIDs. On the configuration side, a
number of additional options would be added to consolidate_ipv6_snm_req.
These include the number of MLIDs to compress down to (default 16),
and a multicast group (MGID) base address and (full MGID) mask. These would
default to 0xff1Z601bXXXX0000 : 0x00000001ffYYYYYY, where Z is the scope,
XXXX is the P_Key, and YYYYYY is the last 24 bits of the port GUID
(the YYYYYY bits would be masked out by default). This is what the
current workaround uses for collapsing the multicast groups.
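
As a rough illustration of step 2, the sketch below matches an MGID
against a base/mask pair and folds the matching groups onto a small pool
of MLIDs. Nothing here is existing OpenSM code: the function names, the
starting MLID, and the choice of "masked-out bits modulo pool size" as
the folding rule are all assumptions made for the example, and the mask
shown keeps scope and P_Key significant, which is only one possible
default.

```python
FULL_128 = (1 << 128) - 1

# Example base/mask for scope 2 and P_Key 0xffff; only the low 24 bits are masked out.
BASE = 0xFF12601BFFFF0000_00000001FF000000
MASK = 0xFFFFFFFFFFFFFFFF_FFFFFFFFFF000000


def matches(mgid: int, base: int = BASE, mask: int = MASK) -> bool:
    """True if the MGID equals the base in every bit position the mask keeps."""
    return (mgid & mask) == (base & mask)


def compressed_mlid(mgid, base=BASE, mask=MASK, num_mlids=16, first_mlid=0xC000):
    """Pick one of num_mlids shared MLIDs for a matching MGID, else None.

    first_mlid is a hypothetical start of the shared pool (0xC000 is the
    bottom of the multicast LID range). The modulo rule is an assumption.
    """
    if not matches(mgid, base, mask):
        return None
    varying = mgid & ~mask & FULL_128   # the bits the mask ignores (YYYYYY)
    return first_mlid + (varying % num_mlids)


snm_group = 0xFF12601BFFFF0000_00000001FF123456
ipv4_group = 0xFF12401BFFFF0000_00000000FFFFFFFF   # different signature, no match
print(hex(compressed_mlid(snm_group)), compressed_mlid(ipv4_group))
```

Groups that do not match the base/mask would keep going through the
normal MLID allocation path.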

The criteria for overloading MLIDs include any group parameters that
need to be in common (e.g. rate, MTU, perhaps P_Key (see below), etc.).

Aside from changing the underlying implementation of MLID searches,
multicast group deletion will need another check when there are no ports
left in a group. If that group is on a compressed MLID (this part of the
check is an optimization), then the multicast group tree needs to be checked
to ensure there are no other groups sharing that MLID.
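
The following Python sketch shows the shape of the MGID-keyed table, the
entry-by-entry MLID scan, and the extra deletion check. It is only a
model: a dict stands in for fleximap, and the class and method names are
invented for the example rather than taken from OpenSM.

```python
class McastTable:
    """Toy multicast group table keyed by MGID instead of MLID."""

    def __init__(self):
        # MGID (128-bit int) -> {"mlid": int, "ports": set of port ids}
        self.groups = {}

    def join(self, mgid, mlid, port):
        group = self.groups.setdefault(mgid, {"mlid": mlid, "ports": set()})
        group["ports"].add(port)

    def groups_on_mlid(self, mlid):
        # MLID is no longer the key, so this is a scan over every entry.
        return [mgid for mgid, g in self.groups.items() if g["mlid"] == mlid]

    def leave(self, mgid, port):
        """Remove a port; return True only if the MLID can now be released."""
        group = self.groups[mgid]
        group["ports"].discard(port)
        if group["ports"]:
            return False                      # group still has members
        del self.groups[mgid]
        # Extra check: another group may still be sharing this MLID.
        return not self.groups_on_mlid(group["mlid"])


table = McastTable()
table.join(0xA, mlid=0xC001, port="p1")
table.join(0xB, mlid=0xC001, port="p2")      # second group overloads the same MLID
print(table.leave(0xA, "p1"))                 # MLID still used by group 0xB
print(table.leave(0xB, "p2"))                 # last sharer gone, MLID is free
```

The scan in groups_on_mlid is the cost discussed in step 1; it runs on
group deletion and MLID lookups rather than on every join.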

IBA 1.2.1 v1 p.151 4.1.3 Local Identifiers item 10) states:
"When a multicast LID is overloaded, the multicast groups
sharing the same MLID must have the same P_Key. This simplification
is required to allow switches and routers that implement optional
P_Key enforcement for multicast operations."
This is part of the C4-5 compliance.
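
A small sketch of what the sharing check might look like, given the
requirement above and the criteria listed earlier (rate, MTU, P_Key).
The Group record, its field encodings, and the function names are
hypothetical; the P_Key position is read off the MGID pattern given in
step 2 (the XXXX field).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    mgid: int   # 128-bit MGID
    pkey: int   # P_Key the group was created with
    rate: int   # encoded rate (opaque value in this sketch)
    mtu: int    # encoded MTU (opaque value in this sketch)


def mgid_pkey(mgid: int) -> int:
    """Extract the XXXX field (bits 95..80) from an IPoIB-style MGID."""
    return (mgid >> 80) & 0xFFFF


def can_share_mlid(a: Group, b: Group) -> bool:
    """Two groups may overload one MLID only if P_Key, rate and MTU all agree."""
    return (a.pkey, a.rate, a.mtu) == (b.pkey, b.rate, b.mtu)


g1 = Group(0xFF12601BFFFF0000_00000001FF123456, pkey=0xFFFF, rate=3, mtu=4)
g2 = Group(0xFF12601BFFFF0000_00000001FFABCDEF, pkey=0xFFFF, rate=3, mtu=4)
g3 = Group(0xFF12601B80010000_00000001FF123456, pkey=0x8001, rate=3, mtu=4)
print(can_share_mlid(g1, g2), can_share_mlid(g1, g3))
```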
 
OPEN ISSUE
As the P_Key is part of the MGID, does this need to be addressed, and if
so, how?

More on the above as I get further.

If the approach above seems reasonable, I will work on such a set of patches.

Comments ? Thoughts ?

-- Hal



