[ewg] bug 1918 - openmpi broken due to rdma-cm changes

Sean Hefty sean.hefty at intel.com
Sun Feb 7 21:27:05 PST 2010


>Can you identify the source of the regression?  ie what was the change
>that broke things?

My understanding is that support for loopback addresses exposes an existing bug
in openmpi.  It tries to bind to 127.0.0.1, which now succeeds.  Openmpi then
passes that address to a remote node for use in connections, where 127.0.0.1
refers to the remote node itself rather than to the sender.
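
To illustrate the kind of check that would avoid handing a loopback address to
a peer, here is a minimal sketch.  It is not openmpi code and not the actual
fix; addr_is_loopback() is a hypothetical helper name, and it only assumes the
standard sockets headers.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not part of openmpi or librdmacm): returns true if
 * addr is an IPv4 address in 127.0.0.0/8 or the IPv6 loopback address ::1.
 * Such an address is only meaningful on the local node, so it should not be
 * advertised to a remote peer.
 */
bool addr_is_loopback(const struct sockaddr *addr)
{
    if (addr->sa_family == AF_INET) {
        const struct sockaddr_in *sin = (const struct sockaddr_in *)addr;

        /* Top octet 127 in host byte order covers all of 127.0.0.0/8. */
        return (ntohl(sin->sin_addr.s_addr) >> 24) == 127;
    }

    if (addr->sa_family == AF_INET6) {
        const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)addr;

        return IN6_IS_ADDR_LOOPBACK(&sin6->sin6_addr);
    }

    /* Unknown address families are not treated as loopback. */
    return false;
}
```

A caller building the list of addresses to send to a remote node could skip
any entry for which this returns true, regardless of whether the bind to that
address succeeded.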

>I'm most concerned that there is another regression in 2.6.33, and if so
>I would like to try and avoid letting that get into the final release.

Unless we never support loopback addresses, openmpi will see a regression.  The
only other problem that I'm aware of for 2.6.33 is that a bind to a loopback
address will succeed even when the RDMA device does not support loopback.
This is the case for the Chelsio and Ammasso drivers.  Connections should still
fail, but the bind is basically useless in this case.  I will try to get a patch
out for that tomorrow.

- Sean



