[openib-general] [PATCH] [ib_addr] generalize address to RDMA device translation

Caitlin Bestler caitlinb at broadcom.com
Tue Jan 3 10:46:40 PST 2006


openib-general-bounces at openib.org wrote:
> The following patch changes the ib_addr interface to make it
> more generic.
> The interface now translates IP addresses directly into
> source and destination device and broadcast addresses.  The
> CMA is then responsible for interpreting these addresses as
> GIDs/PKey or MAC addresses.  The intent is that this will
> simplify integrating support for other RDMA devices in the CMA.
> 

My understanding is that these should all translate to
essentially nops for IP networks. But I'd like to review
that, and have definitions that make that clear.
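As a sketch of what such a definition could look like for an iWARP device: the provider records only the device type and leaves every device-address field cleared, with no ARP/neighbor lookup. The hook name `iwarp_translate_ip` and the enum `sketch_node_type` (standing in for `enum ib_node_type`) are hypothetical, chosen for illustration only; the struct mirrors the `rdma_dev_addr` proposed in the patch.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

#define MAX_ADDR_LEN 32	/* same value as <linux/netdevice.h> */

/* Hypothetical stand-in for enum ib_node_type, for this sketch only. */
enum sketch_node_type {
	SKETCH_NODE_IB_CA,
	SKETCH_NODE_RNIC,
};

/* Mirrors the struct rdma_dev_addr layout proposed in the patch. */
struct rdma_dev_addr {
	unsigned char src_dev_addr[MAX_ADDR_LEN];
	unsigned char dst_dev_addr[MAX_ADDR_LEN];
	unsigned char broadcast[MAX_ADDR_LEN];
	enum sketch_node_type dev_type;
};

/*
 * Hypothetical iWARP provider hook: the RNIC consumes IP addresses
 * directly, so no ARP/neighbor or VLAN lookup is done. All device-address
 * fields are cleared and only the device type is recorded.
 */
static int iwarp_translate_ip(const struct sockaddr *addr,
			      struct rdma_dev_addr *dev_addr)
{
	(void)addr;	/* the IP address itself is all the RNIC needs */
	assert(dev_addr != NULL);
	memset(dev_addr, 0, sizeof *dev_addr);
	dev_addr->dev_type = SKETCH_NODE_RNIC;
	return 0;
}
```

Leaving the fields zeroed makes the "nop" explicit: nothing below the IP layer is queried, and a consumer that inspects the fields for an RNIC finds nothing to act on.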

There are times when fetching lower-layer information, say an
Ethernet MAC address or a VLAN ID, could violate layering
in an iWARP implementation and might incur real performance
costs to fetch data that is not naturally available to the
RDMA layer.

More in-line.

> I'd like to get some feedback from the iWarp community on
> whether this approach works for them, or if
> different/additional changes are needed.
> 
> Signed-off-by: Sean Hefty <sean.hefty at intel.com>
> 
> 
> Index: include/rdma/rdma_cm.h
> ===================================================================
> --- include/rdma/rdma_cm.h	(revision 4651)
> +++ include/rdma/rdma_cm.h	(working copy)
> @@ -68,9 +68,7 @@ struct rdma_addr {
>  	struct sockaddr dst_addr;
>  	u8		dst_pad[sizeof(struct sockaddr_in6) -
>  				sizeof(struct sockaddr)];
> -	union {
> -		struct ib_addr	ibaddr;
> -	} addr;
> +	struct rdma_dev_addr dev_addr;
>  };
>

"dev_addr" is any clarifying/lower layer address that matches
the sockaddr *and is required by the RDMA layer*.

If the RDMA layer does not require this data, the contents
may be logically void. This interface MUST NOT be used to
query for a deterministic lower-layer address.

Rationale: if the RDMA layer is cleanly layered over IP,
then the MAC-layer address is not accessible to it without
additional queries. It is not a natural by-product
and should not be fetched automatically, at extra cost,
when most applications do not care.
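On the consumer side, a CMA following this rule would read device-address fields only when the device type says they carry meaning. The accessor `sketch_get_sgid` and the enum `sketch_node_type` are hypothetical. The sketch assumes the IB source GID sits after the 4-byte QPN field of the 20-byte IPoIB hardware address. For an RNIC it returns an error instead of triggering any lower-layer query.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_ADDR_LEN 32	/* same value as <linux/netdevice.h> */

/* Hypothetical stand-in for enum ib_node_type, for this sketch only. */
enum sketch_node_type {
	SKETCH_NODE_IB_CA,
	SKETCH_NODE_RNIC,
};

/* Mirrors the struct rdma_dev_addr layout proposed in the patch. */
struct rdma_dev_addr {
	unsigned char src_dev_addr[MAX_ADDR_LEN];
	unsigned char dst_dev_addr[MAX_ADDR_LEN];
	unsigned char broadcast[MAX_ADDR_LEN];
	enum sketch_node_type dev_type;
};

/*
 * Hypothetical accessor: copy out the source GID only for an IB device,
 * assuming the GID follows the 4-byte QPN field of the IPoIB hardware
 * address. For an RNIC the device address is logically void, so nothing
 * is read and -1 is returned.
 */
static int sketch_get_sgid(const struct rdma_dev_addr *dev_addr,
			   uint8_t gid[16])
{
	assert(dev_addr != NULL && gid != NULL);
	if (dev_addr->dev_type != SKETCH_NODE_IB_CA)
		return -1;
	memcpy(gid, dev_addr->src_dev_addr + 4, 16);
	return 0;
}
```

Keying every access on `dev_type` keeps the interpretation of the bytes in the consumer, as the patch description intends, without forcing an iWARP provider to populate fields it has no natural source for.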

 
>  struct rdma_route {
> Index: include/rdma/ib_addr.h
> ===================================================================
> --- include/rdma/ib_addr.h	(revision 4654)
> +++ include/rdma/ib_addr.h	(working copy)
> @@ -32,26 +32,28 @@
> 
>  #include <linux/in.h>
>  #include <linux/in6.h>
> +#include <linux/netdevice.h>
>  #include <linux/socket.h>
>  #include <rdma/ib_verbs.h>
> 
>  extern struct workqueue_struct *rdma_wq;
> 
> -struct ib_addr {
> -	union ib_gid	sgid;
> -	union ib_gid	dgid;
> -	u16		pkey;
> +struct rdma_dev_addr {
> +	unsigned char src_dev_addr[MAX_ADDR_LEN];
> +	unsigned char dst_dev_addr[MAX_ADDR_LEN];
> +	unsigned char broadcast[MAX_ADDR_LEN];
> +	enum ib_node_type dev_type;
>  };
> 
>  /**
> - * ib_translate_addr - Translate a local IP address to an Infiniband GID and
> - *   PKey.
> + * rdma_translate_ip - Translate a local IP address to an RDMA hardware
> + *   address.
>   */
> -int ib_translate_addr(struct sockaddr *addr, union ib_gid *gid, u16 *pkey);
> +int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr);
> 
>  /**
> - * ib_resolve_addr - Resolve source and destination IP addresses to
> - *   Infiniband network addresses.
> + * rdma_resolve_ip - Resolve source and destination IP addresses to
> + *   RDMA hardware addresses.
>   * @src_addr: An optional source address to use in the resolution.  If a
>   *   source address is not provided, a usable address will be returned via
>   *   the callback.
> @@ -64,13 +66,13 @@ int ib_translate_addr(struct sockaddr *a
>   *   or been canceled.  A status of 0 indicates success.
>   * @context: User-specified context associated with the call.
>   */
> -int ib_resolve_addr(struct sockaddr *src_addr, struct sockaddr *dst_addr,
> -		    struct ib_addr *addr, int timeout_ms,
> +int rdma_resolve_ip(struct sockaddr *src_addr, struct sockaddr *dst_addr,
> +		    struct rdma_dev_addr *addr, int timeout_ms,
>  		    void (*callback)(int status, struct sockaddr *src_addr,
> -				     struct ib_addr *addr, void *context),
> +				     struct rdma_dev_addr *addr, void *context),
>  		    void *context);
> 
> -void ib_addr_cancel(struct ib_addr *addr);
> +void rdma_addr_cancel(struct rdma_dev_addr *addr);
> 
>  static inline int ip_addr_size(struct sockaddr *addr)
>  {
> @@ -78,5 +80,38 @@ static inline int ip_addr_size(struct so
>  	       sizeof(struct sockaddr_in6) : sizeof(struct sockaddr_in);
>  }
> 
> +static inline u16 ib_addr_get_pkey(struct rdma_dev_addr *dev_addr)
> +{
> +	return ((u16)dev_addr->broadcast[8] << 8) | (u16)dev_addr->broadcast[9];
> +}
> +
> +static inline void ib_addr_set_pkey(struct rdma_dev_addr *dev_addr, u16 pkey)
> +{
> +	dev_addr->broadcast[8] = pkey >> 8;
> +	dev_addr->broadcast[9] = (unsigned char) pkey;
> +}
> +

The closest IP equivalent of a PKey is the VLAN. It is handled
well below the transport layer and is not visible to the RDMA
layer at all. These routines should not attempt to define
some sort of transport-neutral PKey/VLAN ID concept. That
is fabric discovery/configuration, not RDMA networking, and
transport-neutral discovery/configuration is not something
we want to attempt.
