[openib-general] [PATCH] [CM] add private data comparison to match REQs with listens

Rimmer, Todd trimmer at silverstorm.com
Fri Dec 2 13:22:19 PST 2005

> -----Original Message-----
> From: Caitlin Bestler [mailto:caitlinb at broadcom.com]
> Sent: Friday, December 02, 2005 1:59 PM
> To: Sean Hefty
> Cc: openib-general at openib.org
> Subject: RE: [openib-general] [PATCH] [CM] add private data comparison to
> match REQs with listens
> openib-general-bounces at openib.org wrote:
> > Sean Hefty wrote:
> >> As an update: further testing revealed that there is an issue with
> >> this implementation that is also found in the original code.  The
> >> issue deals with how listen requests that rely on a data mask are
> >> inserted and located in the red/black tree.  I'm trying to come up
> >> with a fix for this. 
> > 
> > After researching into this, I'm coming to the conclusion
> > that there does not exist an efficient way to sort/search for
> > listens without adding some restrictions.
> > 
> > For example, a client listens on id1 with mask1.  A request
> > is matched with the listen if its serviceid & mask1 = id1.
> > If a second client listens on id2 with mask2, then a request
> > must check against both requests for a match, or until a
> > match is found.  There's no method that I can find that can
> > be used to filter checks that works in a generic fashion,
> > resulting in requests needing to walk a linear list of
> > listens.  There are several potential fixes for this, with
> > only a couple mentioned below.
> > 
> > One solution around this is to have the IB CM only listen on
> > service IDs, and remove the mask parameter from the API.
> > This requires SDP to change to only listen on ports that have a
> > listener. 
> > 
> > Another alternative is to restrict the type of masks that are
> > supported.  If masks are restricted to a series of most
> > significant bits, then the existing algorithm can be used.
> > For instance, we can support masks 0xFF00 and 0xFFF0, but not
> > 0x00FF or 0xFF0F.  This restriction would work for both SDP and the
> > CMA.  To be clear, the API could change from a mask to the number of
> > bits to match.
> > 
> > Matching on private data can either be done by clients, or
> > restrictions can be placed on it as well.  For private data,
> > I believe that a restriction that all listen requests on the
> > same service ID use the same mask is sufficient.
> > 
> > Hopefully this makes sense to people.  Thoughts?
> > 
> Just listen on the Service ID / Port and let the ULP sort them
> out by destination IP address.
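
To make the prefix-mask restriction quoted above concrete, here is a minimal sketch of how such a mask could be validated and turned into a bit count. The function names are hypothetical and not part of the IB CM API; the only rule taken from the thread is that a request matches when (serviceid & mask) == id and that masks must be a run of most significant bits.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if mask is a contiguous run of most-significant 1 bits
 * (the all-zero and all-one masks count as well), 0 otherwise. */
static int is_prefix_mask(uint64_t mask)
{
        uint64_t inv = ~mask;           /* the low bits that are ignored */

        return (inv & (inv + 1)) == 0;  /* inv must have the form 2^k - 1 */
}

/* Number of leading bits covered by a prefix mask (0..64). */
static unsigned prefix_bits(uint64_t mask)
{
        unsigned bits = 0;

        assert(is_prefix_mask(mask));
        while (mask & UINT64_C(0x8000000000000000)) {
                bits++;
                mask <<= 1;
        }
        return bits;
}

/* The match rule from the thread: masked service ID equals the listen ID. */
static int sid_matches(uint64_t req_sid, uint64_t listen_id, uint64_t mask)
{
        return (req_sid & mask) == listen_id;
}
```

With this restriction each listen covers one contiguous range of service IDs, which is presumably why the existing sorted lookup can still be used.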

One approach is to make the sort criteria of the tree depend on a comparison function.

For example, the sort could use a multi-faceted compare.

We solved this problem in our stack (which allows listening by SID, sender GUID, receiver port, private data, etc.) with the following set of functions.  They were called for each red/black tree comparison; inserts and searches each took a compare function, and the two could differ.  I realize these would not be used exactly as given, but they can provide some ideas on how to do it.  ListenMap is the red/black tree our stack used to keep track of all listening CEPs in the system.
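
As a rough illustration of using one compare function to order the entries and a different one to search them, here is a small sketch built on the standard qsort and bsearch calls. A sorted array stands in for the red/black tree, the struct and function names are hypothetical, and wildcards are left out to keep the ordering simple.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for a listening CEP and an inbound REQ. */
struct listener { uint64_t sid; uint16_t local_lid; };
struct request  { uint64_t service_id; uint16_t remote_lid; };

static int cmp_u64(uint64_t a, uint64_t b)
{
        return (a > b) - (a < b);
}

/* Ordering used when building the table: listener vs. listener. */
static int listener_cmp(const void *a, const void *b)
{
        const struct listener *l1 = a, *l2 = b;
        int res = cmp_u64(l1->sid, l2->sid);

        return res ? res : cmp_u64(l1->local_lid, l2->local_lid);
}

/* Ordering used when searching: request (key) vs. listener (element).
 * The sender's "remote" LID is compared to the listener's "local" LID. */
static int request_cmp(const void *key, const void *elem)
{
        const struct request *req = key;
        const struct listener *l = elem;
        int res = cmp_u64(req->service_id, l->sid);

        return res ? res : cmp_u64(req->remote_lid, l->local_lid);
}

static void sort_listeners(struct listener *tbl, size_t n)
{
        qsort(tbl, n, sizeof(*tbl), listener_cmp);
}

static const struct listener *find_listener(const struct listener *tbl,
                                            size_t n,
                                            const struct request *req)
{
        assert(tbl != NULL || n == 0);
        return bsearch(req, tbl, n, sizeof(*tbl), request_cmp);
}
```

The two compare functions must agree on field order, otherwise the search walks the sorted data in the wrong direction.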

// ListenMap Key Compare functions
// Three functions are provided:
// CepListenAddrCompare - is used to insert cep entries into the ListenMap and
// is the primary key_compare function for the ListenMap
// CepReqAddrCompare - is used to search the ListenMap as part of processing
// an inbound REQ
// CepSidrReqAddrCompare - is used to search the ListenMap as part of
// processing an inbound SIDR_REQ
// To provide the maximum flexibility, the key for a CEP bound address is
// sophisticated and allows wildcarded/optional fields.  This allows
// a listener to simply bind for all traffic of a given SID or to refine the
// scope by binding for traffic to/from specific addresses, or specific
// private data.  The QPN/EECN/CaGUID aspect is used to allow multiple
// outbound Peer Connects to still be considered unique.
// The result of this approach is very flexible CM bind.  The same SID
// can be used on different ports or between different node pairs for
// completely different meanings.  However a SID used between a given
// pair of nodes must be used for a single model (Listen, Peer, Sidr)
// In addition for Peer connects, each connect must have a unique
// Comparison allows for wildcarding in all but SID
// A value of 0 is a wildcard.  See ib_helper.h:WildcardGidCompare for
// the rules of GID comparison, which are more involved due to multiple Gid
// formats
//                                          Field is Used by models as follows:
// Collating order is:                   Listen     Peer Connect   Sidr Register
// SID                                     Y             Y              Y
// local GID                             option          Y         future option
// local LID                             option          Y         future option
// QPN                                  wildcard         Y           wildcard
// EECN                                 wildcard         Y           wildcard
// CaGUID                               wildcard         Y           wildcard
// remote GID                            option          Y         future option
// remote LID                            option          Y         future option
// private data discriminator length     option        option         option
// private data discriminator value      option        option         option
// if bPeer is 0 for either CEP, the QPN, EECN and CaGUID are treated as a match
// FUTURE: add a sid masking option so can easily listen on a group
// of SIDs with 1 listen (such as if low bits of sid have a private meaning)
// FUTURE: add a pkey option so can easily listen on a partition
// FUTURE: for SIDR to support GID/LID they will have to come from the LRH
// and GRH headers to the CM mad.  local GID and lid could be used to merely
// select the local port number

// A qmap key_compare function to compare the bound address for
// two listener, SIDR or Peer Connect CEPs
// key1 - CEP1 pointer
// key2 - CEP2 pointer
// Returns:
//      -1:     cep1 bind address < cep2 bind address
//      0:      cep1 bind address = cep2 bind address (accounting for wildcards)
//      1:      cep1 bind address > cep2 bind address
int
CepListenAddrCompare(uint64 key1, uint64 key2)
{
        IN CM_CEP_OBJECT* pCEP1 = (CM_CEP_OBJECT*)(uintn)key1;
        IN CM_CEP_OBJECT* pCEP2 = (CM_CEP_OBJECT*)(uintn)key2;
        int res;

        if (pCEP1->SID < pCEP2->SID)
                return -1;
        else if (pCEP1->SID > pCEP2->SID)
                return 1;
        res = WildcardGidCompare(&pCEP1->PrimaryPath.LocalGID, &pCEP2->PrimaryPath.LocalGID);
        if (res != 0)
                return res;
        res = WildcardCompareU64(pCEP1->PrimaryPath.LocalLID, pCEP2->PrimaryPath.LocalLID);
        if (res != 0)
                return res;
        // QPN/EECN/CaGUID are only compared when both CEPs are Peer Connects
        if (pCEP1->bPeer && pCEP2->bPeer)
        {
                res = CompareU64(pCEP1->LocalEndPoint.QPN, pCEP2->LocalEndPoint.QPN);
                if (res != 0)
                        return res;
                res = CompareU64(pCEP1->LocalEndPoint.EECN, pCEP2->LocalEndPoint.EECN);
                if (res != 0)
                        return res;
                res = CompareU64(pCEP1->LocalEndPoint.CaGUID, pCEP2->LocalEndPoint.CaGUID);
                if (res != 0)
                        return res;
        }
        res = WildcardGidCompare(&pCEP1->PrimaryPath.RemoteGID, &pCEP2->PrimaryPath.RemoteGID);
        if (res != 0)
                return res;
        res = WildcardCompareU64(pCEP1->PrimaryPath.RemoteLID, pCEP2->PrimaryPath.RemoteLID);
        if (res != 0)
                return res;
        // a length of 0 matches any private data, so this too is a wildcard compare
        if (pCEP1->DiscriminatorLen == 0 || pCEP2->DiscriminatorLen == 0)
                return 0;
        res = CompareU64(pCEP1->DiscriminatorLen, pCEP2->DiscriminatorLen);
        if (res != 0)
                return res;
        res = MemoryCompare(pCEP1->Discriminator, pCEP2->Discriminator, pCEP1->DiscriminatorLen);
        return res;
}

// A qmap key_compare function to search the ListenMap for a match with
// a given REQ
// key1 - CEP pointer
// key2 - REQ pointer
// Returns:
//      -1:     cep bind address < req remote address
//      0:      cep bind address = req remote address (accounting for wildcards)
//      1:      cep bind address > req remote address
// The QPN/EECN/CaGUID are not part of the search, hence multiple Peer Connects
// could be matched (and the one which was started earliest should then be
// searched for linearly among the neighbors of the matching CEP)
int
CepReqAddrCompare(uint64 key1, uint64 key2)
{
        IN CM_CEP_OBJECT* pCEP = (CM_CEP_OBJECT*)(uintn)key1;
        IN CMM_REQ* pREQ = (CMM_REQ*)(uintn)key2;
        int res;

        if (pCEP->SID < pREQ->ServiceID)
                return -1;
        else if (pCEP->SID > pREQ->ServiceID)
                return 1;
        // local and remote are from the perspective of the sender (the remote
        // node in this case), so we compare local to remote and vice versa
        res = WildcardGidCompare(&pCEP->PrimaryPath.LocalGID, &pREQ->PrimaryRemoteGID);
        if (res != 0)
                return res;
        res = WildcardCompareU64(pCEP->PrimaryPath.LocalLID, pREQ->PrimaryRemoteLID);
        if (res != 0)
                return res;
        // do not compare QPN/EECN/CaGUID
        res = WildcardGidCompare(&pCEP->PrimaryPath.RemoteGID, &pREQ->PrimaryLocalGID);
        if (res != 0)
                return res;
        res = WildcardCompareU64(pCEP->PrimaryPath.RemoteLID, pREQ->PrimaryLocalLID);
        if (res != 0)
                return res;
        // a length of 0 matches any private data, so this too is a wildcard compare
        if (pCEP->DiscriminatorLen == 0)
                return 0;
        res = MemoryCompare(pCEP->Discriminator, pREQ->PrivateData+pCEP->DiscrimPrivateDataOffset, pCEP->DiscriminatorLen);
        return res;
}
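
The neighbor search mentioned in the comment for CepReqAddrCompare could look roughly like the sketch below. It assumes a sorted array rather than our qmap, and the struct, the start_seq field and the function name are all hypothetical; the point is only that once the search lands on some matching entry, the equal-keyed entries next to it are checked for the one that was started first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified peer-connect entry; a lower start_seq means the
 * connect was started earlier. */
struct peer_cep { uint64_t sid; uint32_t start_seq; };

/* tbl is sorted by sid and hit indexes some entry whose sid matched the REQ.
 * Walk the equal-sid neighbors on both sides and return the index of the
 * entry that was started earliest. */
static size_t earliest_match(const struct peer_cep *tbl, size_t n, size_t hit)
{
        size_t best = hit, i;

        assert(hit < n);
        /* neighbors to the left of the hit */
        for (i = hit; i > 0 && tbl[i - 1].sid == tbl[hit].sid; i--)
                if (tbl[i - 1].start_seq < tbl[best].start_seq)
                        best = i - 1;
        /* neighbors to the right of the hit */
        for (i = hit + 1; i < n && tbl[i].sid == tbl[hit].sid; i++)
                if (tbl[i].start_seq < tbl[best].start_seq)
                        best = i;
        return best;
}
```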

// A qmap key_compare function to search the ListenMap for a match with
// a given SIDR_REQ
// key1 - CEP pointer
// key2 - SIDR_REQ pointer
// Returns:
//      -1:     cep bind address < sidr_req address
//      0:      cep bind address = sidr_req address (accounting for wildcards)
//      1:      cep bind address > sidr_req address
// The QPN/EECN/CaGUID are not part of the search.
int
CepSidrReqAddrCompare(uint64 key1, uint64 key2)
{
        IN CM_CEP_OBJECT* pCEP = (CM_CEP_OBJECT*)(uintn)key1;
        IN CMM_SIDR_REQ* pSIDR_REQ = (CMM_SIDR_REQ*)(uintn)key2;
        int res;

        if (pCEP->SID < pSIDR_REQ->ServiceID)
                return -1;
        else if (pCEP->SID > pSIDR_REQ->ServiceID)
                return 1;
        // GID and LIDs are wildcarded/not available at this time
        // do not compare QPN/EECN/CaGUID
        // a length of 0 matches any private data, so this too is a wildcard compare
        if (pCEP->DiscriminatorLen == 0)
                return 0;
        res = MemoryCompare(pCEP->Discriminator, pSIDR_REQ->PrivateData+pCEP->DiscrimPrivateDataOffset, pCEP->DiscriminatorLen);
        return res;
}
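
The private data discriminator check used by both search functions can be sketched on its own as follows. The struct and function names are hypothetical, memcmp stands in for MemoryCompare, and the bounds check (with its arbitrary return value of 1) is an addition for this sketch rather than something taken from our stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical listener-side description of the bytes to match. */
struct discriminator {
        const uint8_t *value;   /* bytes the listener expects */
        size_t len;             /* 0 means "match any private data" */
        size_t offset;          /* where in the private data to look */
};

/* Returns 0 on a match, otherwise -1 or 1 by byte-wise order.  If the
 * private data is too short to hold the discriminator, returns 1 (an
 * arbitrary choice made for this sketch). */
static int discriminator_compare(const struct discriminator *d,
                                 const uint8_t *priv, size_t priv_len)
{
        int res;

        assert(d != NULL);
        if (d->len == 0)
                return 0;       /* wildcard: any private data matches */
        if (d->offset > priv_len || d->len > priv_len - d->offset)
                return 1;       /* discriminator would run past the data */
        res = memcmp(d->value, priv + d->offset, d->len);
        return (res > 0) - (res < 0);
}
```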

/* non-Wildcarded compare of 2 64 bit values
 * Return:
 *      0 : v1 == v2
 *      -1: v1 < v2
 *      1 : v1 > v2
 */
static __inline int
CompareU64(uint64 v1, uint64 v2)
{
        if (v1 == v2)
                return 0;
        else if (v1 < v2)
                return -1;
        else
                return 1;
}

/* Wildcarded compare of 2 64 bit values
 * Return:
 *      0 : v1 == v2
 *      -1: v1 < v2
 *      1 : v1 > v2
 *      if v1 or v2 is 0, they are considered wildcards and match any value
 */
static __inline int
WildcardCompareU64(uint64 v1, uint64 v2)
{
        if (v1 == 0 || v2 == 0 || v1 == v2)
                return 0;
        else if (v1 < v2)
                return -1;
        else
                return 1;
}
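
One property of the wildcard rule is worth seeing in isolation: equality under it is not transitive, because a wildcard matches two values that do not match each other. The sketch below restates the rule of WildcardCompareU64 with standard types (the lower-case names are hypothetical) and adds a small helper that detects this situation.

```c
#include <assert.h>
#include <stdint.h>

/* Same rule as WildcardCompareU64, restated with standard types. */
static int wildcard_compare_u64(uint64_t v1, uint64_t v2)
{
        if (v1 == 0 || v2 == 0 || v1 == v2)
                return 0;
        return v1 < v2 ? -1 : 1;
}

/* Returns 1 if w compares equal to both a and b while a and b themselves
 * compare as different, which only happens when w is the wildcard. */
static int wildcard_bridges(uint64_t a, uint64_t w, uint64_t b)
{
        return wildcard_compare_u64(a, w) == 0 &&
               wildcard_compare_u64(w, b) == 0 &&
               wildcard_compare_u64(a, b) != 0;
}

/* Small self-check of the behaviour described above. */
static void wildcard_demo(void)
{
        assert(wildcard_compare_u64(1, 0) == 0);   /* 1 matches wildcard */
        assert(wildcard_compare_u64(0, 2) == 0);   /* wildcard matches 2 */
        assert(wildcard_compare_u64(1, 2) == -1);  /* but 1 < 2 */
        assert(wildcard_bridges(1, 0, 2));
}
```

So a tree keyed this way is not sorted by a strict total order, and which entry a search lands on can depend on what else has been inserted under the same SID.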

/* Compare Gid1 to Gid2 (host byte order)
 * Return:
 *      0 : Gid1 == Gid2
 *      -1: Gid1 < Gid2
 *      1 : Gid1 > Gid2
 * This also allows for Wildcarded compare.
 * An MC Gid with the lower 56 bits all 0, will match any MC gid
 * A SubnetPrefix of 0 will match any top 64 bits of a non-MC gid
 * An InterfaceID of 0 will match any low 64 bits of a non-MC gid
 * Collating order:
 *      non-MC Subnet Prefix (0 is wildcard and comes first)
 *      non-MC Interface ID (0 is wildcard and comes first)
 *      MC wildcard
 *      MC by value of low 56 bits (0 is wildcard and comes first)
 */
static __inline int
WildcardGidCompare(IN const IB_GID* const pGid1, IN const IB_GID* const pGid2 )
{
        if (pGid1->Type.Multicast.s.FormatPrefix == IPV6_MULTICAST_PREFIX
                && pGid2->Type.Multicast.s.FormatPrefix == IPV6_MULTICAST_PREFIX)
        {
                /* Multicast compare: compare low 120 bits, 120 bits of 0 is wildcard */
                uint64 h1 = pGid1->AsReg64s.H & ~IB_GID_MCAST_FORMAT_MASK_H;
                uint64 h2 = pGid2->AsReg64s.H & ~IB_GID_MCAST_FORMAT_MASK_H;
                /* check for 120 bits of wildcard */
                if ((h1 == 0 && pGid1->AsReg64s.L == 0)
                        || (h2 == 0 && pGid2->AsReg64s.L == 0))
                {
                        return 0;
                } else if (h1 < h2) {
                        return -1;
                } else if (h1 > h2) {
                        return 1;
                } else {
                        /* high parts equal: order by the low 64 bits of each Gid */
                        return CompareU64(pGid1->AsReg64s.L, pGid2->AsReg64s.L);
                }
        } else if (pGid1->Type.Multicast.s.FormatPrefix == IPV6_MULTICAST_PREFIX) {
                /* Gid1 is MC, Gid2 is other, treat MC as > others */
                return 1;
        } else if (pGid2->Type.Multicast.s.FormatPrefix == IPV6_MULTICAST_PREFIX) {
                /* Gid1 is other, Gid2 is MC, treat other as < MC */
                return -1;
        } else {
                /* Non-Multicast compare: compare high 64 bits */
                /* Note all other GID formats are essentially a prefix in upper */
                /* 64 bits and an identifier in the low 64 bits */
                /* so this covers link local, site local, global formats */
                int res = WildcardCompareU64(pGid1->AsReg64s.H, pGid2->AsReg64s.H);
                if (res == 0)
                {
                        return WildcardCompareU64(pGid1->AsReg64s.L, pGid2->AsReg64s.L);
                } else {
                        return res;
                }
        }
}
