[ewg] bug 1918 - openmpi broken due to rdma-cm changes

Steve Wise swise at opengridcomputing.com
Fri Feb 5 13:53:45 PST 2010


Jeff Squyres wrote:
> On Feb 5, 2010, at 4:14 PM, Jason Gunthorpe wrote:
>
>   
>> Well, I think you are right. This kind of change seems appropriate to
>> me for mainline, but OFED/RHEL should carry a responsibility to manage
>> an identified incompatibility, either patch their kernel, patch their
>> OMPI, or publish an errata. That is the role of a distribution.
>>     
>
> RHEL has said, multiple times, that they rely on OpenFabrics to do the Right Thing.  They don't do a lot of testing, validating, etc.
>
>   
>> Sounds like this is taken care of for now anyhow; Sean's patch to remove
>> it for iwarp, since it doesn't work today with any iwarp drivers, does
>> obscure the problem. But it does seem like rdma_cm mode for IB
>> networks will still be broken in OMPI with the new kernels.
>>     
>
> Correct.
>
> So why not back off putting this in the kernel that's coming out now now now?  Why not put it in *next* kernel?  (or even better, the one after that)
>
> Is there a rush / need to have this in *now*?
>
>   

There is still some inconsistency here. Sean, you claimed binds to
127.0.0.1 succeed in ofed-1.4 for IB devices. If so, then folks running
IB/openmpi/rdmacm should be seeing issues. We need to dig a little more...




