[ofa-general] mpi failures on large ia64/ofed/IB clusters

Roland Dreier rdreier at cisco.com
Fri Oct 5 15:51:21 PDT 2007


 > I don't really see anything racy in the FW command stuff, but it's
 > possible that there's something like an mmiowb() missing somewhere (I
 > have a hard time spotting that type of race for some reason).

Another possibility (independent of the hack I suggested before) would
be to add an mmiowb() before the mutex_unlock() in mthca_cmd_post().

I actually have a good feeling about this theory....
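
To make the ordering concrete, here is a rough userspace sketch of the
idea, not the actual mthca code. Everything in it is a hypothetical
stand-in: mmiowb() is stubbed (the real one is a kernel primitive),
write_hcr() stands in for whatever MMIO writes post the command, a
pthread mutex replaces the kernel mutex, and trace[] exists only to
record the order of operations. The assumption being illustrated is
that on some platforms (ia64 being the case at hand) MMIO writes issued
under a lock need an mmiowb() before the unlock so that they reach the
device ahead of writes from the next CPU to take the lock.

```c
#include <pthread.h>
#include <string.h>

/* Hypothetical stand-ins so the sketch builds outside the kernel. */
static char trace[8];	/* order of operations for one call */
static pthread_mutex_t hcr_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Stub: in the kernel this orders MMIO writes against a later unlock. */
static void mmiowb(void)
{
	strcat(trace, "B");
}

/* Stub: stands in for the MMIO writes that post a FW command. */
static void write_hcr(void)
{
	strcat(trace, "W");
}

/* Sketch of the proposed ordering: lock, MMIO writes, mmiowb(), unlock. */
static int cmd_post_sketch(void)
{
	trace[0] = '\0';

	pthread_mutex_lock(&hcr_mutex);
	strcat(trace, "L");

	write_hcr();

	/* The proposed addition: barrier before dropping the mutex. */
	mmiowb();

	strcat(trace, "U");
	pthread_mutex_unlock(&hcr_mutex);

	return 0;
}
```

After a call to cmd_post_sketch(), trace[] holds "LWBU": the barrier
sits between the last MMIO write and the unlock, which is the only
point this sketch is meant to show.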

 - R.


