[ofa-general] mpi failures on large ia64/ofed/IB clusters
Roland Dreier
rdreier at cisco.com
Fri Oct 5 15:51:21 PDT 2007
> I don't really see anything racy in the FW command stuff, but it's
> possible that there's something like an mmiowb() missing somewhere (I
> have a hard time spotting that type of race for some reason).
Another possibility (independent of the hack I suggested before) would
be to add an mmiowb() before the mutex_unlock() in mthca_cmd_post().
I actually have a good feeling about this theory....
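
To make the ordering concrete, here is a rough userspace sketch of the
idea. It is not the real mthca code: sketch_cmd_post(), fake_writel(),
fake_mmiowb(), fake_hcr and the fake mutex helpers are all hypothetical
stand-ins, and the C11 fence only marks where the kernel's mmiowb() would
go (it does not have the same hardware effect). The point is just that
the barrier sits after the last register write and before the unlock, so
the MMIO writes of one lock holder should not be reordered past those of
the next.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Events recorded so the ordering can be inspected afterwards. */
enum ev { EV_LOCK, EV_WRITE, EV_MMIOWB, EV_UNLOCK };

#define TRACE_MAX 16
static enum ev trace[TRACE_MAX];
static int trace_len;

static void record(enum ev e)
{
    assert(trace_len < TRACE_MAX);
    trace[trace_len++] = e;
}

/* Hypothetical stand-ins for the command register block and its mutex. */
static volatile uint32_t fake_hcr[2];
static int fake_hcr_locked;

static void fake_mutex_lock(void)
{
    assert(!fake_hcr_locked);
    fake_hcr_locked = 1;
    record(EV_LOCK);
}

static void fake_mutex_unlock(void)
{
    assert(fake_hcr_locked);
    record(EV_UNLOCK);
    fake_hcr_locked = 0;
}

/* Stand-in for an MMIO register write. */
static void fake_writel(uint32_t val, volatile uint32_t *addr)
{
    *addr = val;
    record(EV_WRITE);
}

/* Placeholder for mmiowb(); a CPU fence is NOT an MMIO write barrier. */
static void fake_mmiowb(void)
{
    atomic_thread_fence(memory_order_seq_cst);
    record(EV_MMIOWB);
}

/* Assumed shape of a command post: lock, write registers, barrier, unlock. */
static int sketch_cmd_post(uint32_t param, uint32_t op)
{
    fake_mutex_lock();
    fake_writel(param, &fake_hcr[0]);
    fake_writel(op, &fake_hcr[1]);   /* last write kicks off the command */
    fake_mmiowb();                   /* proposed: order MMIO before unlock */
    fake_mutex_unlock();
    return 0;
}
```

In the driver itself the change would presumably be a single mmiowb()
line just before the mutex_unlock() in mthca_cmd_post(); everything else
above is scaffolding for illustration.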
- R.