[ofa-general] Re: [RFC] OpenSM and IPv6 Scalability Proposal

Sun Jun 8 08:02:51 PDT 2008

On Sat, 2008-06-07 at 23:44 -0600, Jason Gunthorpe wrote:
> On Sat, Jun 07, 2008 at 06:03:17AM -0700, Hal Rosenstock wrote:
> 
> > 2. Add in support for overloading MLIDs. On the configuration side, a
> > number of additional options would be added to consolidate_ipv6_snm_req.
> > These include the number of MLIDs to compress down to (default 16),
> > a multicast group (MGID) base address and (full MGID) mask. This would
> > default to 0xff1Z601bXXXX0000 : 0x00000001ffYYYYYY, where Z is the scope,
> > XXXX is the P_Key, and YYYYYY is the last 24 bits of the port GUID (the
> > YYYYYY bits would be masked out by default). This is what the
> > current workaround uses for collapsing the multicast groups.
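A minimal sketch of the base/mask test described above, assuming the MGID is held as a 16-byte array and that a group is collapsed when it agrees with the base on every bit set in the mask. How a matching group is then spread over the configured number of MLIDs (default 16) is not specified in the proposal; the hash over the masked-out bits below is a hypothetical choice, as are both function names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if mgid equals base on every bit set in mask. */
static bool mgid_match(const uint8_t mgid[16], const uint8_t base[16],
                       const uint8_t mask[16])
{
    for (int i = 0; i < 16; i++)
        if ((mgid[i] & mask[i]) != (base[i] & mask[i]))
            return false;
    return true;
}

/* Hypothetical bucket choice: hash only the masked-out bits (e.g. the
 * 24-bit YYYYYY field) and reduce modulo the number of MLIDs, so groups
 * that differ only in those bits are spread across nmlids MLIDs. */
static unsigned mgid_bucket(const uint8_t mgid[16], const uint8_t mask[16],
                            unsigned nmlids)
{
    uint32_t h = 0;
    for (int i = 0; i < 16; i++)
        h = h * 31u + (uint8_t)(mgid[i] & ~mask[i]);
    return h % nmlids;
}
```

With the default mask (all ones except the last 24 bits), every solicited-node group in one scope and P_Key matches the same base, and only the low 24 bits decide which of the 16 MLIDs it shares.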
> 
> When I originally wrote, I was imagining a list like:
> 
>        MGID/PREFIX MLIDS
> ff11:601b::/32      4
> ff1e:601b::/32      4
> ff00::/8            128     # Everything else
> 
> Please, please use IPv6 notation for this (and all other GIDs). Since
> the addressing scheme is taken from IPv6, the interesting bits align
> sensibly. The double 64-bit number notation used in osm is horrible
> and should be killed off :)
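To illustrate the notation point, a sketch that takes a GID as the two 64-bit halves OpenSM prints (host-order integers here, an assumption) and renders it with the standard inet_ntop(AF_INET6, ...). The helper name is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: format a GID given as two host-order 64-bit
 * halves (high = first 8 bytes, low = last 8 bytes) in IPv6 notation.
 * buf should hold at least INET6_ADDRSTRLEN bytes. */
static const char *gid_halves_to_str(uint64_t hi, uint64_t lo, char *buf,
                                     socklen_t len)
{
    struct in6_addr a;
    for (int i = 0; i < 8; i++) {
        a.s6_addr[i]     = (uint8_t)(hi >> (56 - 8 * i));
        a.s6_addr[8 + i] = (uint8_t)(lo >> (56 - 8 * i));
    }
    return inet_ntop(AF_INET6, &a, buf, len);
}
```

For example, 0xff12601bffff0000 : 0x00000001ff123456 comes out as ff12:601b:ffff::1:ff12:3456, where the scope, signature and P_Key fields each sit in their own 16-bit group.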

This is a better approach with the one caveat below:

The only issue with the prefix length is that the scope and PKey fields
could make the match non-contiguous, but I think the syntax can be
extended to handle this if needed.

In terms of IB routers, if an MLID was overloaded, wouldn't they also
filter on scope (link-local scope would be filtered)?
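A sketch of where the two fields sit, assuming the IPoIB MGID layout quoted earlier (ff1Z:601b:XXXX:..., Z = scope nibble, XXXX = P_Key) and that link-local scope is 0x2. The mask_is_prefix helper (hypothetical, like the others) shows the non-contiguity problem: a mask for "this P_Key in any scope" leaves a hole at the scope nibble, so it cannot be written as a single /N prefix.

```c
#include <stdbool.h>
#include <stdint.h>

/* Scope is the low nibble of byte 1 (ff1Z). */
static unsigned mgid_scope(const uint8_t mgid[16]) { return mgid[1] & 0x0f; }

/* P_Key is bytes 4-5 (ff1Z:601b:XXXX:...). */
static uint16_t mgid_pkey(const uint8_t mgid[16])
{
    return (uint16_t)((mgid[4] << 8) | mgid[5]);
}

/* Link-local scope (0x2): the groups a router would filter. */
static bool mgid_is_link_local(const uint8_t mgid[16])
{
    return mgid_scope(mgid) == 0x2;
}

/* True if mask is N leading one-bits followed only by zero-bits,
 * i.e. expressible as a /N prefix. */
static bool mask_is_prefix(const uint8_t mask[16])
{
    bool seen_zero = false;
    for (int i = 0; i < 128; i++) {
        bool bit = (mask[i / 8] >> (7 - i % 8)) & 1;
        if (bit && seen_zero)
            return false;
        if (!bit)
            seen_zero = true;
    }
    return true;
}
```

A mask of ff:f0:ff:ff:ff:ff:00... (ignore the scope, match signature and P_Key) fails this check, while a /32 mask passes, which is the case where the prefix syntax would need extending.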

> inet_pton/inet_ntop w/ AF_INET6 can be used, and is compatible with
> the existing GID structure.
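A sketch of parsing one "MGID/PREFIX" entry with inet_pton(AF_INET6, ...). Since struct in6_addr and a GID are both 128 bits in network byte order, the result can be copied byte for byte; struct gid_prefix and parse_gid_prefix are hypothetical stand-ins, not OpenSM types.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a GID plus prefix length (hypothetical). */
struct gid_prefix {
    uint8_t raw[16];  /* network byte order, same layout as in6_addr */
    unsigned len;     /* prefix length in bits, 0..128 */
};

/* Parse "ff12:601b::/32"; a missing "/N" means /128. False on error. */
static bool parse_gid_prefix(const char *s, struct gid_prefix *out)
{
    char addr[INET6_ADDRSTRLEN];
    const char *slash = strchr(s, '/');
    size_t n = slash ? (size_t)(slash - s) : strlen(s);
    struct in6_addr a;
    unsigned long len = 128;

    if (n >= sizeof(addr))
        return false;
    memcpy(addr, s, n);
    addr[n] = '\0';
    if (inet_pton(AF_INET6, addr, &a) != 1)
        return false;
    if (slash) {
        char *end;
        len = strtoul(slash + 1, &end, 10);
        if (end == slash + 1 || *end != '\0' || len > 128)
            return false;
    }
    memcpy(out->raw, a.s6_addr, 16);  /* in6_addr -> GID bytes */
    out->len = (unsigned)len;
    return true;
}
```

inet_pton also does the syntax checking for free: a malformed entry such as a trailing single colon is rejected rather than silently misread.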
> 
> I would recommend a longest-prefix-match scheme rather than a mask for
> administrator sanity. The IPv6 addressing scheme that MGID/GID borrows
> from was designed to work like this, and administrators will be
> familiar with the concept. If you feel the complexity of non-prefix
> matching is needed, then it is easier on administrators to copy the
> existing idiom used in ACLs and use a two-level label/property scheme:
> 
> Table 1:
>        MGID/PREFIX  GROUP NAME
> ff11:601b::/32      IPV6-SNM
> ff1e:601b::/32      IPV6-SNM
> ff1e:601b:89::/48   IPV6-SNM-PKEY_89
> ff00::/8            OTHERS
> 
> Table 2:
>        NAME        MLIDS
>   IPV6-SNM         4
>   IPV6-SNM-PKEY_89 128
>   OTHERS           128
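A sketch of the two-table lookup, assuming a simple linear scan: Table 1 is searched for the longest matching prefix to get a label, and the label is then looked up in Table 2 to get the MLID budget. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct label_rule {      /* one row of Table 1 */
    uint8_t prefix[16];
    unsigned len;        /* prefix length in bits */
    const char *label;
};

struct label_limit {     /* one row of Table 2 */
    const char *label;
    unsigned mlids;
};

/* True if the first len bits of a and b are equal. */
static int prefix_eq(const uint8_t a[16], const uint8_t b[16], unsigned len)
{
    unsigned full = len / 8, rem = len % 8;
    if (memcmp(a, b, full) != 0)
        return 0;
    if (rem) {
        uint8_t m = (uint8_t)(0xff << (8 - rem));
        if ((a[full] & m) != (b[full] & m))
            return 0;
    }
    return 1;
}

/* Longest-prefix match in Table 1, then label lookup in Table 2.
 * Returns the MLID budget, or 0 if nothing matches. */
static unsigned mlid_budget(const uint8_t mgid[16],
                            const struct label_rule *rules, size_t nrules,
                            const struct label_limit *limits, size_t nlimits)
{
    const struct label_rule *best = NULL;
    for (size_t i = 0; i < nrules; i++)
        if (prefix_eq(mgid, rules[i].prefix, rules[i].len) &&
            (!best || rules[i].len > best->len))
            best = &rules[i];
    if (!best)
        return 0;
    for (size_t i = 0; i < nlimits; i++)
        if (strcmp(limits[i].label, best->label) == 0)
            return limits[i].mlids;
    return 0;
}
```

With the rows above, a group in P_Key 0x89 hits the /48 entry and gets 128 MLIDs, while other ff1e:601b groups fall back to the /32 entry and share 4.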
> 
> But I think that is probably overkill, unless you know of other heavy
> multicast users besides SNM.

I'm not aware of any other "applications" that might "require" this, but
I don't think the added flexibility to accommodate them costs much.

> Also note that a longest prefix match data structure can likely be
> re-used for future work on IB routing, along with this use for MGID
> compaction.
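One candidate for such a reusable structure is a plain binary (one bit per level) trie over 128-bit GIDs; the sketch below assumes that choice, though a real implementation might prefer a path-compressed variant. Names are hypothetical, and the stored int could be an MLID budget today or a route later.

```c
#include <stdint.h>
#include <stdlib.h>

struct lpm_node {
    struct lpm_node *child[2];
    int has_value;
    int value;
};

static int gid_bit(const uint8_t gid[16], unsigned i)
{
    return (gid[i / 8] >> (7 - i % 8)) & 1;
}

/* Insert prefix/len -> value. Returns 0, or -1 on allocation failure. */
static int lpm_insert(struct lpm_node *root, const uint8_t prefix[16],
                      unsigned len, int value)
{
    struct lpm_node *n = root;
    for (unsigned i = 0; i < len; i++) {
        int b = gid_bit(prefix, i);
        if (!n->child[b]) {
            n->child[b] = calloc(1, sizeof(*n));
            if (!n->child[b])
                return -1;
        }
        n = n->child[b];
    }
    n->has_value = 1;
    n->value = value;
    return 0;
}

/* Follow the key's bits, remembering the deepest node with a value. */
static int lpm_lookup(const struct lpm_node *root, const uint8_t key[16],
                      int *value)
{
    const struct lpm_node *n = root;
    int found = 0;
    for (unsigned i = 0; n; i++) {
        if (n->has_value) {
            *value = n->value;
            found = 1;
        }
        if (i == 128)
            break;
        n = n->child[gid_bit(key, i)];
    }
    return found;
}

/* Free a heap-allocated trie, including the node passed in. */
static void lpm_free(struct lpm_node *n)
{
    if (!n)
        return;
    lpm_free(n->child[0]);
    lpm_free(n->child[1]);
    free(n);
}
```

Lookup cost is bounded by 128 steps regardless of table size, which is why the same structure would suit a router's forwarding table as well as MGID compaction.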
> 
> > The criteria for overloading MLIDs include any group parameters that
> > need to be in common (e.g. rate, MTU, perhaps PKey (see below), etc.).
> 
> It seems reasonable to me that every MTU+rate combination gets a separate
> allocation space, each limited to the configured size.
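A sketch of one allocation space per MTU+rate pair, each capped at the configured size. It assumes fresh MLIDs are handed out from a counter starting at 0xC000 (the bottom of the IB multicast LID range) and that, once a space is full, its existing MLIDs are shared round-robin; the sharing policy, the fixed table sizes and all names are hypothetical. Adding the P_Key to the key would give per-partition spaces.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_POOLS 16   /* hypothetical number of MTU+rate spaces */
#define POOL_MAX  64   /* hypothetical upper bound on the configured size */

struct mlid_pool {
    uint8_t mtu, rate;        /* encoded IB MTU and rate of the groups */
    unsigned used;            /* MLIDs allocated in this space so far */
    unsigned next;            /* round-robin cursor once the space is full */
    uint16_t mlids[POOL_MAX];
};

struct mlid_alloc {
    unsigned limit;           /* configured size of each space */
    uint16_t next_free;       /* next unused MLID (exhaustion not handled) */
    size_t npools;
    struct mlid_pool pools[MAX_POOLS];
};

/* Return an MLID for a group with this MTU and rate: a fresh one while
 * the space is under its limit, otherwise an existing one, shared.
 * Returns 0 on a bad limit or when the pool table is full. */
static uint16_t mlid_get(struct mlid_alloc *a, uint8_t mtu, uint8_t rate)
{
    struct mlid_pool *p = NULL;

    if (a->limit == 0 || a->limit > POOL_MAX)
        return 0;
    for (size_t i = 0; i < a->npools; i++)
        if (a->pools[i].mtu == mtu && a->pools[i].rate == rate)
            p = &a->pools[i];
    if (!p) {
        if (a->npools == MAX_POOLS)
            return 0;
        p = &a->pools[a->npools++];
        p->mtu = mtu;
        p->rate = rate;
        p->used = p->next = 0;
    }
    if (p->used < a->limit) {
        p->mlids[p->used] = a->next_free++;
        return p->mlids[p->used++];
    }
    uint16_t m = p->mlids[p->next];
    p->next = (p->next + 1) % p->used;
    return m;
}
```

Because groups only ever share an MLID with groups of the same MTU and rate, the overloading never forces a slower or smaller-MTU setting onto an existing group.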
> 
> > As PKey is part of the MGID, does this need to be addressed, and if
> > so, how?
> 
> I don't think pkey needs special treatment; prefix matching takes care
> of it. If you feel separate allocations for each pkey may someday be
> necessary, then having a table scheme rather than a single value takes
> care of it nicely.

So you think it's "safe" to ignore the related IBA compliance issue? It
does look to me like that should be changed. I think that was also
Roland's take a while ago.

-- Hal

> Regards,
> Jason