[ewg] OFED-1.5.1 failure over iWarp

Eli Cohen eli at dev.mellanox.co.il
Wed Feb 3 13:31:27 PST 2010


On Wed, Feb 03, 2010 at 03:10:40PM -0600, Steve Wise wrote:
> Eli Cohen wrote:
> >On Wed, Feb 03, 2010 at 02:28:05PM -0600, Steve Wise wrote:
> >>Here is the patched cma_acquire_dev() function.  Where does it
> >>"build the gid in the pre rocee patches fashion and search again"
> >>for the iwarp case?  Maybe I'm missing it?
> >>
> >>---------------
> >>static int cma_acquire_dev(struct rdma_id_private *id_priv)
> >>{
> >>       struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr;
> >>       struct cma_device *cma_dev;
> >>       union ib_gid gid;
> >>       int ret = -ENODEV;
> >>
> >>       if (dev_addr->dev_type != ARPHRD_INFINIBAND) {
> >>               rocee_addr_get_sgid(dev_addr, &gid);
> >>               list_for_each_entry(cma_dev, &dev_list, list) {
> >>                       ret = ib_find_cached_gid(cma_dev->device, &gid,
> >>                                                &id_priv->id.port_num, NULL);
> >>                       if (!ret)
> >>                               break;
> >>               }
> >>       } else {
> >
> >here it is - it's the memcpy below:
> >
> How does it get here if it was already in the above block?  I.e., it
> won't fall into this block, right?

Oops, you're right.

Please try this one:

commit 483fe703b03b1db99fa4a968fc3a918aa43f856f
Author: Eli Cohen <eli at mellanox.co.il>
Date:   Wed Feb 3 13:10:14 2010 +0200

    CMA: Fix iWarp failures to bind to a device
    
    rdma_addr_get_sgid() relies on dev_addr->transport to retrieve the correct GID
    based on the hardware address. However, when called from cma_acquire_dev(), the
    transport field is not yet valid. The solution is to avoid calling
    rdma_addr_get_sgid() from cma_acquire_dev() and find the device based on its
    GID: for Ethernet, first assume it is rocee and search the GID table; if the
    GID is not found, generate it by copying it from the hardware address and
    search again.
    
    Signed-off-by: Eli Cohen <eli at mellanox.co.il>

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index a2d5aad..3c5c59f 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -348,15 +348,29 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv)
 	union ib_gid gid;
 	int ret = -ENODEV;
 
-	rdma_addr_get_sgid(dev_addr, &gid);
+	if (dev_addr->dev_type != ARPHRD_INFINIBAND) {
+		rocee_addr_get_sgid(dev_addr, &gid);
+		list_for_each_entry(cma_dev, &dev_list, list) {
+			ret = ib_find_cached_gid(cma_dev->device, &gid,
+						 &id_priv->id.port_num, NULL);
+			if (!ret)
+				goto out;
+		}
+	}
+
+	memcpy(&gid, dev_addr->src_dev_addr +
+	       rdma_addr_gid_offset(dev_addr), sizeof gid);
 	list_for_each_entry(cma_dev, &dev_list, list) {
 		ret = ib_find_cached_gid(cma_dev->device, &gid,
 					 &id_priv->id.port_num, NULL);
-		if (!ret) {
-			cma_attach_to_dev(id_priv, cma_dev);
+		if (!ret)
 			break;
-		}
 	}
+
+out:
+	if (!ret)
+		cma_attach_to_dev(id_priv, cma_dev);
+
 	return ret;
 }
 

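To make the lookup order in the patch above easier to follow, here is a
minimal userspace sketch of the same two-pass search. It is not the kernel
code: struct fake_device, make_rocee_gid(), find_gid() and acquire_dev() are
hypothetical names, the rocee GID layout (fe80::/64 prefix plus an EUI-64
built from the MAC) and the offset of 4 for InfiniBand addresses are
assumptions, and the device list is a plain array instead of dev_list.

```c
#include <stddef.h>
#include <string.h>

#define GID_LEN     16  /* size of a GID in bytes */
#define HW_ADDR_LEN 32  /* padded hardware address buffer (assumed size) */
#define MAX_GIDS    4   /* GID table entries per fake device */

/* Hypothetical stand-in for a device and its cached GID table. */
struct fake_device {
	unsigned char gid_table[MAX_GIDS][GID_LEN];
	int ngids;
};

/* Assumed rocee-style GID: link-local prefix plus EUI-64 derived from the MAC. */
static void make_rocee_gid(const unsigned char *mac, unsigned char *gid)
{
	memset(gid, 0, GID_LEN);
	gid[0] = 0xfe;
	gid[1] = 0x80;
	gid[8] = mac[0] ^ 0x02;
	gid[9] = mac[1];
	gid[10] = mac[2];
	gid[11] = 0xff;
	gid[12] = 0xfe;
	gid[13] = mac[3];
	gid[14] = mac[4];
	gid[15] = mac[5];
}

/* Return the index of the first device whose table holds gid, or -1. */
static int find_gid(const struct fake_device *devs, size_t ndevs,
		    const unsigned char *gid)
{
	size_t i;
	int j;

	for (i = 0; i < ndevs; i++)
		for (j = 0; j < devs[i].ngids; j++)
			if (!memcmp(devs[i].gid_table[j], gid, GID_LEN))
				return (int)i;
	return -1;
}

/*
 * Two-pass lookup: for non-InfiniBand addresses try the rocee-style GID
 * first; if no device has it, fall back to copying the GID straight out
 * of the hardware address and search again.
 */
static int acquire_dev(const struct fake_device *devs, size_t ndevs,
		       const unsigned char *hw_addr, int is_infiniband)
{
	unsigned char gid[GID_LEN];
	int idx;

	if (!is_infiniband) {
		make_rocee_gid(hw_addr, gid);
		idx = find_gid(devs, ndevs, gid);
		if (idx >= 0)
			return idx;
	}

	/* Assumed offsets: 4 for InfiniBand, 0 otherwise. */
	memcpy(gid, hw_addr + (is_infiniband ? 4 : 0), GID_LEN);
	return find_gid(devs, ndevs, gid);
}
```

With one fake device whose table holds a GID copied directly from its
hardware address (the iwarp case) and another whose table holds the
rocee-style GID, acquire_dev() finds the second in the first pass and the
first only in the fallback pass. The hw_addr buffer must be at least
HW_ADDR_LEN bytes, zero padded.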
> 
> >>               memcpy(&gid, dev_addr->src_dev_addr +
> >>                      rdma_addr_gid_offset(dev_addr), sizeof gid);
> >>               list_for_each_entry(cma_dev, &dev_list, list) {
> >>                       ret = ib_find_cached_gid(cma_dev->device, &gid,
> >>                                                &id_priv->id.port_num, NULL);
> >>                       if (!ret)
> >>                               break;
> >>               }
> >>       }
> >>
> >>       if (!ret)
> >>               cma_attach_to_dev(id_priv, cma_dev);
> >>
> >>       return ret;
> >>}
> >>----------------
> >>
> >>
> >>
> >>Eli Cohen wrote:
> >>>On Wed, Feb 03, 2010 at 09:20:05AM -0600, Steve Wise wrote:
> >>>>>diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> >>>>>index a2d5aad..76dce2b 100644
> >>>>>--- a/drivers/infiniband/core/cma.c
> >>>>>+++ b/drivers/infiniband/core/cma.c
> >>>>>@@ -348,15 +348,28 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv)
> >>>>>	union ib_gid gid;
> >>>>>	int ret = -ENODEV;
> >>>>>-	rdma_addr_get_sgid(dev_addr, &gid);
> >>>>>-	list_for_each_entry(cma_dev, &dev_list, list) {
> >>>>>-		ret = ib_find_cached_gid(cma_dev->device, &gid,
> >>>>>-					 &id_priv->id.port_num, NULL);
> >>>>>-		if (!ret) {
> >>>>>-			cma_attach_to_dev(id_priv, cma_dev);
> >>>>>-			break;
> >>>>>+	if (dev_addr->dev_type != ARPHRD_INFINIBAND) {
> >>>>>+		rocee_addr_get_sgid(dev_addr, &gid);
> >>>>>+		list_for_each_entry(cma_dev, &dev_list, list) {
> >>>>>+			ret = ib_find_cached_gid(cma_dev->device, &gid,
> >>>>>+						 &id_priv->id.port_num, NULL);
> >>>>>+			if (!ret)
> >>>>>+				break;
> >>>>>+		}
> >>>>The above if statement is true for iwarp devices, so this patch is
> >>>>just wrong.   rocee_addr_get_sgid() should only be used for ROCEE
> >>>>interfaces, correct?
> >>>No, the idea is this: for non ARPHRD_INFINIBAND devices (e.g. rocee or
> >>>iwarp) I first assume this is rocee, get the rocee gid, and check if this
> >>>gid appears in any device's gid table. If the mac address belongs to a
> >>>rocee device then it will be found; if it belongs to an iwarp device
> >>>then it won't be found. In the latter case I build the gid in the pre
> >>>rocee patches fashion and search again.
> >>>>>+	} else {
> >>>>>+		memcpy(&gid, dev_addr->src_dev_addr +
> >>>>>+		       rdma_addr_gid_offset(dev_addr), sizeof gid);
> >>>>>+		list_for_each_entry(cma_dev, &dev_list, list) {
> >>>>>+			ret = ib_find_cached_gid(cma_dev->device, &gid,
> >>>>>+						 &id_priv->id.port_num, NULL);
> >>>>>+			if (!ret)
> >>>>>+				break;
> >>>>>		}
> >>>>>	}
> >>>>>+
> >>>>>+	if (!ret)
> >>>>>+		cma_attach_to_dev(id_priv, cma_dev);
> >>>>>+
> >>>>>	return ret;
> >>>>>}