[ewg] bug 1918 - openmpi broken due to rdma-cm changes
Steve Wise
swise at opengridcomputing.com
Fri Feb 5 08:45:08 PST 2010
> I agree that we should probably not allow 127.0.0.1 binds in
> ofed-1.5.1 at all because it regresses OpenMPI. Even with IB systems,
> if the bind to 127.0.0.1 succeeds, then OpenMPI assumes 127.0.0.1 is
> bound to that rdma interface and advertises this address to its peer
> as an address to which that peer can rdma connect! This will break IB
> clusters too, not just T3/iWARP clusters. While I think OpenMPI needs
> to skip 127.0.0.1 in its logic, I think we should probably defer
> allowing 127.0.0.1 binds until ofed-1.6.
>
> But Jeff, note that if someone uses the upstream kernel and OpenMPI,
> it's busted...
>
> So I recommend:
>
> 1) Don't allow 127.0.0.1 binds in ofed-1.5.1
>
> 2) Fix OpenMPI ASAP to never advertise 127.0.0.1 as a valid rdma-cm
> connect address (get it in ofed-1.5.2 or ofed-1.6).
Also, there is a good argument for never allowing 127.0.0.1 for rdma
anyway. It implies a _software_ loopback. It should NEVER be bound to
a real NIC interface and thus rdma binds shouldn't be allowed to it
since there is no software rdma loopback support...
Unless someone implements software rdma loopback... ;)
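
For illustration only, the OpenMPI-side fix in (2) might look roughly like
the sketch below. It is written in Python for brevity rather than in
OpenMPI's C, and the function name advertisable_addresses and the sample
addresses are hypothetical, not taken from the OpenMPI source. It only
shows the idea: drop any loopback address from the candidate list before
advertising addresses to a peer.

```python
import ipaddress


def advertisable_addresses(candidates):
    """Return the candidate addresses that may be advertised to a peer.

    Hypothetical helper: loopback addresses (127.0.0.0/8 and ::1) are
    skipped, because a remote peer can never reach this host through them.
    """
    usable = []
    for text in candidates:
        addr = ipaddress.ip_address(text)
        if addr.is_loopback:
            # A successful bind to loopback says nothing about a real
            # rdma interface, so never hand this address to a peer.
            continue
        usable.append(text)
    return usable


if __name__ == "__main__":
    # Sample addresses are made up for the example.
    print(advertisable_addresses(["127.0.0.1", "192.168.1.10", "::1"]))
```

The check is done on the parsed address rather than by comparing against
the string "127.0.0.1", so the rest of 127.0.0.0/8 and the IPv6 loopback
are skipped as well.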