[ewg] [PATCH 1/2] fmr_pool flush serials can get out of sync
Roland Dreier
rdreier at cisco.com
Wed Jan 16 13:54:52 PST 2008
> Normally, the serial numbers for flush requests and flushes
> executed should be in sync.
>
> When we decide to flush dirty MRs because there are too many of them, we
> wake up the cleanup thread and let it do its stuff. As a side effect, it
> increments pool->flush_ser, which leaves it one higher than req_ser. The
> next time the user calls ib_flush_fmr_pool, it will wake up the cleanup
> thread, but won't wait for the flush to complete. This can cause memory
> corruption, as the user expects the flush to have taken place.
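
To make the sequence described above concrete, here is a minimal single-threaded sketch in C. It is a toy model and not the fmr_pool code: struct toy_pool and all of the toy_* functions are hypothetical names, the wait test (flush_ser - serial < 0) is assumed from the description rather than copied from the source, and the count_request flag only illustrates one possible shape of a fix.

```c
#include <stdbool.h>

/* Toy model of the two serial counters; not the real struct ib_fmr_pool. */
struct toy_pool {
	int req_ser;   /* flushes requested */
	int flush_ser; /* flushes executed */
};

/* One pass of the cleanup thread: do the flush, then count it. */
void toy_cleanup_pass(struct toy_pool *pool)
{
	pool->flush_ser++;
}

/*
 * Too many dirty MRs: wake the cleanup thread. With count_request false
 * (the behaviour described above) no request is recorded, so flush_ser
 * ends up one ahead of req_ser. With count_request true (an assumed
 * shape of a fix) the two counters stay in step.
 */
void toy_watermark_wakeup(struct toy_pool *pool, bool count_request)
{
	if (count_request)
		pool->req_ser++;
	toy_cleanup_pass(pool);
}

/*
 * Explicit flush request. Returns true if the caller would have to wait
 * for a cleanup pass, false if the wait condition is already satisfied.
 */
bool toy_flush_would_wait(struct toy_pool *pool)
{
	int serial = ++pool->req_ser;

	return pool->flush_ser - serial < 0;
}

/* Run the scenario: one watermark wakeup, then one explicit flush. */
bool toy_scenario(bool count_request)
{
	struct toy_pool pool = { 0, 0 };

	toy_watermark_wakeup(&pool, count_request);
	return toy_flush_would_wait(&pool);
}
```

In this model toy_scenario(false) returns false, meaning the explicit flush would return without waiting, while toy_scenario(true) returns true, meaning the caller would block until a cleanup pass has run.
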
Thanks, good catch, and I applied this (except I removed the BUG_ON,
since I don't think killing the system with minimal info available on
how the counts got out of sync is that useful...)

However I'm a little puzzled about how this can lead to memory
corruption in practice: the only thing that flushing FMRs should do is
make memory keys that should no longer be in use anyway become
invalid. So the only effect of this fix should be to expose a bug in
your ULP by having some RDMA operation complete with a protection
error -- and you're not relying on that behavior in normal operation,
are you? What am I missing?
- R.