[ewg] rdma/cm: disallow loopback address for iwarp devices

Jeff Squyres jsquyres at cisco.com
Mon Feb 8 13:56:23 PST 2010


Sorry -- I missed many of these mails today due to mail filtering (don't ask).

FWIW:

- I'm not opposed to adding LOOPBACK checks into OMPI to avoid this problem (I'm waiting for a patch, actually).  I'm just saying that we're not going to get a release out immediately with this fix.  Our next release was scheduled to be 1.4.2, and it is still at least several weeks away.  So allowing this in 2.6.33 would be Bad because a) we know it breaks OMPI, and b) OMPI can't get a release out immediately to fix the issue.

- There are customers who are using RDMA CM with IB (e.g., Sandia with their Mesh/IB routing stuff).

- I see the following in rdma_bind_addr(3):

-----
DESCRIPTION
       Associates a source address with an rdma_cm_id.  The  address  may  be
       wildcarded.   If  binding  to a specific local address, the rdma_cm_id
       will also be bound to a local RDMA device.
-----

What RDMA device is bound to when you use 127.0.0.1?  I'm not 100% sure, but I think this might be where we got the rationale that we didn't need additional LOOPBACK tests in OMPI...  (If anyone else agrees with this interpretation, then it's at least one argument that allowing binding to LOOPBACK devices *is* a change in semantics, and should therefore be treated extremely carefully.)
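For illustration only, a LOOPBACK check of the sort discussed for OMPI could look roughly like the minimal C sketch below. It assumes the caller already has the candidate address as a struct sockaddr and would run the test before handing that address to rdma_bind_addr(), skipping RDMA CM (or reporting an error) on a match. The name addr_is_loopback is hypothetical; it is not an existing OMPI or librdmacm function.

```c
#include <stdbool.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper (not part of OMPI or librdmacm): returns true if
 * sa is an IPv4 address in 127.0.0.0/8 or the IPv6 loopback address ::1.
 * A caller would test this before calling rdma_bind_addr(). */
static bool addr_is_loopback(const struct sockaddr *sa)
{
    if (sa->sa_family == AF_INET) {
        const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;
        /* s_addr is in network byte order; the top octet 127 marks 127/8. */
        return (ntohl(sin->sin_addr.s_addr) >> 24) == 127;
    }
    if (sa->sa_family == AF_INET6) {
        const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)sa;
        return IN6_IS_ADDR_LOOPBACK(&sin6->sin6_addr);
    }
    /* Other address families are not treated as loopback here. */
    return false;
}
```

Checking the whole 127.0.0.0/8 block rather than only 127.0.0.1 is a deliberate choice in this sketch, since any address in that range is loopback.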


On Feb 8, 2010, at 4:16 PM, Steve Wise wrote:

> 
> Sean Hefty wrote:
> >> IMO 127.0.0.1 should be for SW loopback, not HW RDMA loopback.
> >>    
> >
> > I disagree, but what does it matter?  So, we add a 'software' loopback that uses
> > 127.0.0.1.  Openmpi still wouldn't work.
> >
> >  
> 
> I guess that's true.
> 
> >> I will commit to get the fix in openmpi asap.
> >>    
> >
> > If we don't care if the fix is in the kernel or user space, then we could add a
> > 'disable-loopback-support' build option to librdmacm, which can fail any
> > attempt to bind to a loopback address.
> >
> >  
> 
> I'd rather see it removed from the 2.6.33 kernel before it ships, and then
> we fix openmpi, and then re-submit 127.0.0.1 support once openmpi
> publishes a release with its fix.  See my other email that submits a
> potential commit to remove 127.0.0.1 support for 2.6.33.
> 
> Steve.
> 


-- 
Jeff Squyres
jsquyres at cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
