[ewg] bug 1918 - openmpi broken due to rdma-cm changes

Steve Wise swise at opengridcomputing.com
Sat Feb 6 08:31:04 PST 2010


> rdma/cm: disallow loopback address for iwarp devices
>
> From: Sean Hefty <sean.hefty at intel.com>
>
> The current RDMA iWarp devices cannot be used to establish
> connections using the loopback address.  Prevent rdma_bind_addr
> from associating the loopback address with an iWarp device.
>
> This fixes an issue with openmpi, where it tries to identify which
> IP addresses map to RDMA devices by calling rdma_bind_addr on
> each address and seeing if the bind succeeds.  Prior to patch
> 6f8372b6 "RDMA/cm: fix loopback address support", this process
> worked.  But the rdma_cm now allows rdma_bind_addr to bind to an
> RDMA device using the loopback address, and attaches the rdma_cm_id
> to the RDMA device as part of the bind.
>
> Signed-off-by: Sean Hefty <sean.hefty at intel.com>
> ---
>
>  drivers/infiniband/core/cma.c |   14 ++++++++++----
>  1 files changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index cc9b594..5850411 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -1739,6 +1739,9 @@ err:
>  }
>  EXPORT_SYMBOL(rdma_resolve_route);
>  
> +/*
> + * Only IB devices support loopback connections.
> + */
>  static int cma_bind_loopback(struct rdma_id_private *id_priv)
>  {
>  	struct cma_device *cma_dev;
> @@ -1753,11 +1756,16 @@ static int cma_bind_loopback(struct rdma_id_private *id_priv)
>  		ret = -ENODEV;
>  		goto out;
>  	}
> -	list_for_each_entry(cma_dev, &dev_list, list)
> +	list_for_each_entry(cma_dev, &dev_list, list) {
> +		if (rdma_node_get_transport(cma_dev->device->node_type) !=
> +		    RDMA_TRANSPORT_IB)
> +			continue;
> +
>  		for (p = 1; p <= cma_dev->device->phys_port_cnt; ++p)
>  			if (!ib_query_port(cma_dev->device, p, &port_attr) &&
>  			    port_attr.state == IB_PORT_ACTIVE)
>  				goto port_found;
> +	}
>   

Here, when the loop finishes without finding an active IB port, you need to
fail with:
                    ret = -ENODEV;
                    goto out;

instead of falling through to:
>  
>  	p = 1;
>  	cma_dev = list_entry(dev_list.next, struct cma_device, list);
>   

Otherwise it will still bind to the first device in dev_list, even if
it's an iwarp device.

With this change applied on top of the patch, it works.
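
To make the suggested control flow concrete, here is a minimal userspace sketch of the selection logic with the -ENODEV change applied. It is a model only, not the kernel code: struct model_dev, the two enums, MAX_PORTS and pick_loopback_dev() are hypothetical stand-ins for cma_device, the transport and port-state checks, and the loop in cma_bind_loopback().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's transport and port-state values. */
enum transport { TRANSPORT_IB, TRANSPORT_IWARP };
enum port_state { PORT_DOWN, PORT_ACTIVE };

#define MAX_PORTS 4

/* Simplified model of one RDMA device; ports[0] describes port 1. */
struct model_dev {
	enum transport transport;
	int port_cnt;
	enum port_state ports[MAX_PORTS];
};

/*
 * Pick the first active port on the first IB device in the list.
 * Returns 0 and fills *dev_idx and *port (1-based) on success.
 * Returns -ENODEV when no IB device has an active port, rather than
 * falling back to the first device in the list.
 */
int pick_loopback_dev(const struct model_dev *devs, size_t ndev,
		      size_t *dev_idx, int *port)
{
	size_t i;
	int p;

	assert(devs != NULL || ndev == 0);

	for (i = 0; i < ndev; i++) {
		/* Skip anything that is not IB, as the patch does. */
		if (devs[i].transport != TRANSPORT_IB)
			continue;

		for (p = 1; p <= devs[i].port_cnt; ++p) {
			if (devs[i].ports[p - 1] == PORT_ACTIVE) {
				*dev_idx = i;
				*port = p;
				return 0;
			}
		}
	}

	/* The suggested change: fail instead of using the first device. */
	return -ENODEV;
}
```

With a list holding only an iwarp device, this model returns -ENODEV, which is the result openmpi's bind probe needs. Note that in this sketch an IB device with no active port also yields -ENODEV, since the fallback to port 1 of the first device is gone.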

> @@ -1771,9 +1779,7 @@ port_found:
>  	if (ret)
>  		goto out;
>  
> -	id_priv->id.route.addr.dev_addr.dev_type =
> -		(rdma_node_get_transport(cma_dev->device->node_type) == RDMA_TRANSPORT_IB) ?
> -		ARPHRD_INFINIBAND : ARPHRD_ETHER;
> +	id_priv->id.route.addr.dev_addr.dev_type = ARPHRD_INFINIBAND;
>  
>  	rdma_addr_set_sgid(&id_priv->id.route.addr.dev_addr, &gid);
>  	ib_addr_set_pkey(&id_priv->id.route.addr.dev_addr, pkey);