[ewg] [PATCH 1/2] fmr_pool flush serials can get out of sync

Olaf Kirch olaf.kirch at oracle.com
Wed Jan 16 23:38:28 PST 2008


Hi Roland,

On Wednesday 16 January 2008 22:54, Roland Dreier wrote:
> However I'm a little puzzled about how this can lead to memory
> corruption in practice: the only thing that flushing FMRs should do is
> make memory keys that should no longer be in use anyway become
> invalid.  So the only effect of this fix should be to expose a bug in
> your ULP by having some RDMA operation complete with a protection
> error -- and you're not relying on that behavior in normal operation,
> are you?  What am I missing?

The corruption happened when the process that allocated the MRs went
away in the middle of the operation. We would free and invalidate the
MR, and expect the in-flight RDMA to error out. RDS does not know who is
doing RDMA to or from an MR at any given time.

There is a second potential issue, however.

When RDS performs an RDMA, the initiator queues two work requests: one
for the actual RDMA, immediately followed by a normal SEND carrying an
RDS packet. When the consumer sees that RDS packet, it releases the MR
to which the RDMA was directed.

Is that a safe thing to do? I found the spec a little unclear on
the ordering rules. It *seems* that RDMA writes always fence
subsequent operations, and RDMA reads fence them if we ask for it.
But I'm not entirely sure whether the ordering applies only to the
sending system, or whether IB also guarantees that the RDMA has
completed by the time the incoming message is placed on the
completion queue at the consumer.

If there is no such guarantee, then we have a second potential issue
in RDS with respect to RDMA and memory corruption.
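To make the sender-side sequence described above concrete, here is a minimal toy model in C. Every name in it (`toy_work_request`, `TOY_WR_FLAG_FENCE`, `toy_build_rdma_then_send`, `toy_chain_length`) is hypothetical and is not the RDS code or a verbs API; in real code the chain would be `struct ibv_send_wr` entries linked through `next` and passed to `ibv_post_send()` (or the kernel's `ib_post_send()`), with `IBV_SEND_FENCE` / `IB_SEND_FENCE` as the fence flag. The fencing rule it encodes (no flag after a write, an explicit fence after a read) is the tentative reading of the spec given above, an assumption rather than a settled answer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified stand-ins for verbs work requests. These are
 * NOT the libibverbs or RDS kernel types; they only model the shape of
 * the "RDMA op, then SEND" chain.
 */
enum toy_wr_opcode { TOY_WR_RDMA_WRITE, TOY_WR_RDMA_READ, TOY_WR_SEND };

#define TOY_WR_FLAG_FENCE 0x1u /* hypothetical "wait for prior reads" flag */

struct toy_work_request {
	enum toy_wr_opcode opcode;
	unsigned int flags;
	struct toy_work_request *next; /* next WR in the posted chain */
};

/*
 * Link an RDMA work request to the SEND carrying the RDS header, so both
 * are queued in one post. Assumed rule (not quoted from the spec): RDMA
 * writes need no extra flag, while a SEND following an RDMA READ
 * explicitly requests a fence.
 */
void toy_build_rdma_then_send(struct toy_work_request *rdma,
			      struct toy_work_request *send,
			      enum toy_wr_opcode rdma_op)
{
	assert(rdma != NULL && send != NULL);
	assert(rdma_op == TOY_WR_RDMA_WRITE || rdma_op == TOY_WR_RDMA_READ);

	memset(rdma, 0, sizeof(*rdma));
	memset(send, 0, sizeof(*send));

	rdma->opcode = rdma_op;
	rdma->next = send;      /* SEND immediately follows the RDMA */

	send->opcode = TOY_WR_SEND;
	send->next = NULL;      /* end of chain */
	if (rdma_op == TOY_WR_RDMA_READ)
		send->flags |= TOY_WR_FLAG_FENCE;
}

/* Count the WRs in a chain by walking the next pointers. */
size_t toy_chain_length(const struct toy_work_request *wr)
{
	size_t n = 0;

	for (; wr != NULL; wr = wr->next)
		n++;
	return n;
}
```

Calling `toy_build_rdma_then_send(&rdma, &send, TOY_WR_RDMA_READ)` leaves a two-entry chain whose SEND carries the fence flag. Note that the model only covers what the initiator posts; it says nothing about whether the consumer may safely release the MR when the SEND shows up in its completion queue, which is exactly the open question.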

Thanks,
Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
okir at lst.de |    / | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
