[ewg] bug 1918 - openmpi broken due to rdma-cm changes

Steve Wise swise at opengridcomputing.com
Fri Feb 5 08:45:08 PST 2010


> I agree that we should probably not allow 127.0.0.1 binds in 
> ofed-1.5.1 at all because it regresses OpenMPI.  Even with IB systems, 
> if the bind to 127.0.0.1 succeeds, then OpenMPI assumes 127.0.0.1 is 
> bound to that rdma interface and advertises this address to its peer 
> as an address to which that peer can rdma connect!  This will break IB 
> clusters too, not just T3/iWARP clusters.  While I think OpenMPI needs 
> to skip 127.0.0.1 in its logic, I think we should probably defer 
> allowing 127.0.0.1 binds until ofed-1.6.
>
> But Jeff, note that if someone uses the upstream kernel and OpenMPI, 
> it's busted...
>
> So I recommend:
>
> 1) Don't allow 127.0.0.1 binds in ofed-1.5.1
>
> 2) Fix OpenMPI ASAP to never advertise 127.0.0.1 as a valid rdma-cm 
> connect address (get it in ofed-1.5.2 or ofed-1.6).

Also, there is a good argument for never allowing 127.0.0.1 for rdma 
anyway.  It implies a _software_ loopback.  It should NEVER be bound to 
a real NIC interface and thus rdma binds shouldn't be allowed to it 
since there is no software rdma loopback support...

Unless someone implements software rdma loopback...  ;)
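
For what it's worth, the OpenMPI side of this (never advertising 127.0.0.1) 
could look roughly like the sketch below.  This is only an illustration, 
not OpenMPI or librdmacm code: the names is_ipv4_loopback, drop_loopback 
and make_addr are made up for the example, it handles IPv4 only, and it 
assumes everything in 127.0.0.0/8 should be treated as loopback.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if addr lies in 127.0.0.0/8, else 0. */
static int is_ipv4_loopback(const struct sockaddr_in *addr)
{
    /* s_addr is in network byte order; convert before inspecting it. */
    uint32_t host = ntohl(addr->sin_addr.s_addr);
    return (host >> 24) == 127;
}

/*
 * Hypothetical helper: compacts the candidate array in place so that it
 * holds only non-loopback addresses, and returns how many were kept.
 * Only the first 'kept' entries are meant to be advertised to a peer.
 */
static size_t drop_loopback(struct sockaddr_in *candidates, size_t n)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (!is_ipv4_loopback(&candidates[i]))
            candidates[kept++] = candidates[i];
    }
    return kept;
}

/* Hypothetical helper: builds a sockaddr_in from a dotted-quad string. */
static struct sockaddr_in make_addr(const char *dotted)
{
    struct sockaddr_in a;
    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    int rc = inet_pton(AF_INET, dotted, &a.sin_addr);
    assert(rc == 1); /* the example only passes well-formed addresses */
    (void)rc;
    return a;
}
```

The idea is that the filter runs on the list of addresses a process could 
bind before any of them is handed to a peer, so a successful bind to 
127.0.0.1 never turns into an advertised connect address.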