[openib-general] FMR and how they work

Fab Tillier ftillier at infiniconsys.com
Mon May 2 14:59:26 PDT 2005


> From: Dror Goldenberg [mailto:gdror at mellanox.co.il]
> Sent: Monday, May 02, 2005 7:18 AM
> 
> > -----Original Message-----
> > From: Fab Tillier [mailto:ftillier at infiniconsys.com]
> > Sent: Monday, May 02, 2005 4:50 AM
> >
> > Is the HW2SW_MPT command more expensive than the SYNC_TPT
> > command?
>
> SYNC_TPT wipes out the caches. HW2SW_MPT cleans caches for the
> specific MPT. You're welcome to compare the two execution times. I believe
> that the main impact on CPU comes from the fact that you need to submit a
> command and wait for it to complete (interrupt->EQE).

The PRM doesn't clearly state that HW2SW_MPT invalidates cache entries.  Nor
does it say whether only the MPT is invalidated, or also its associated
MTTs.

>
> > Does it just flush the MPT out of the cache?  What
> > happens to that MPT's MTTs - do they get flushed out too
> > (assuming the MPT still references them)?
>
> For correctness you'd have to flush MPTs and MTTs.

Does the HW2SW_MPT command flush the MTTs referenced by the MPT (if any)?
That is, if an MPT has mtt_seg_adr_h and mtt_seg_adr_l set during HW2SW_MPT,
do the MTTs get flushed?

> >
> > Why aren't FMRs bindable?
>
> They are not. Check out the PRM.

The PRM says they're not bindable but doesn't explain why.  What happens if
an FMR is marked as bindable?

> > It seems that if the HW2SW_MPT
> > flushes the MPT out of the cache, then one could use it for
> > normal memory registrations and avoid the WRITE_MTT command,
> > no?
>
> Please explain. Maybe you meant to use HW2SW_MPT for Deregistration.
> If this is what you mean, then you're right. But it is against the spirit
> of "lazy deregistration".  This will only speed the IO operations halfway.
> On the creation of mapping it'll be fast - just posted writes to the
> MPT/MTT, on the destruction of the mapping, it'll be slow - because you
> need HW2SW_MPT and you'd probably need to create a new blank MR
> for the next reuse.
> The intent was to use bulk deregistrations...

No, I'm thinking about improving the "slow" registration path.  If I could
avoid using the WRITE_MTT command, this would save a few commands (and
potential error cases) from the registration path.  I'm envisioning a case
where the MTTs are written via a posted write to memory, and then the
SW2HW_MPT command is issued for an MPT that references the just-written
MTTs.

Currently (assuming 64 MTTs per region), registration is a two-command
process for buffers up to 256K.  It becomes 3 commands for registrations up
to 512K, 4 up to 768K, 5 up to 1M, and so forth.  Having all of these be
just a single SW2HW_MPT command would be nice.
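
The command counts above can be reproduced with a short sketch.  It rests on
assumptions that are not spelled out in this thread: 4 KB pages, page-aligned
buffers, one MTT entry per page, and at most 64 MTT entries carried by each
WRITE_MTT command.  The function and constant names are made up for
illustration.

```python
PAGE_SIZE = 4 * 1024        # assumption: 4 KB pages
MTTS_PER_WRITE_MTT = 64     # assumption: MTT entries per WRITE_MTT command


def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def commands_current(buf_bytes):
    """Current path: one WRITE_MTT per 64 pages, then one SW2HW_MPT."""
    pages = ceil_div(buf_bytes, PAGE_SIZE)
    write_mtt_cmds = ceil_div(pages, MTTS_PER_WRITE_MTT)
    return write_mtt_cmds + 1


def commands_proposed(buf_bytes):
    """Proposed path: MTTs go out as posted writes, leaving only SW2HW_MPT."""
    return 1


if __name__ == "__main__":
    for kb in (256, 512, 768, 1024):
        size = kb * 1024
        print(f"{kb:>5}K: current={commands_current(size)} "
              f"proposed={commands_proposed(size)}")
```

Under these assumptions the current path costs 2, 3, 4 and 5 commands at
256K, 512K, 768K and 1M, while the proposed path stays at one.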

Doing this would require the MTTs associated with an MPT to get flushed when
the HW2SW_MPT command is invoked so that they can be reused in a subsequent
registration without hitting stale cached values.
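
To make the stale-entry concern concrete, here is a toy model.  It is not a
description of how the HCA actually behaves (that is exactly the open
question); the class, its methods, and the caching policy are hypothetical,
and only the command names come from this thread.

```python
class ToyHca:
    """Toy model: MTTs live in memory, and the HCA keeps a cache of them."""

    def __init__(self, flush_mtts_on_hw2sw):
        self.mtt_mem = {}     # MTT index -> page, as written by software
        self.mtt_cache = {}   # MTT index -> page, as cached by the HCA
        self.mpts = {}        # key -> list of MTT indices (HW-owned MPTs)
        self.flush_mtts_on_hw2sw = flush_mtts_on_hw2sw

    def posted_write_mtt(self, index, page):
        # A posted write updates memory only; the cache is not notified.
        self.mtt_mem[index] = page

    def sw2hw_mpt(self, key, mtt_indices):
        self.mpts[key] = list(mtt_indices)

    def hw2sw_mpt(self, key):
        indices = self.mpts.pop(key)
        if self.flush_mtts_on_hw2sw:
            # The behavior being asked about: drop the MPT's cached MTTs.
            for i in indices:
                self.mtt_cache.pop(i, None)

    def translate(self, key, n):
        index = self.mpts[key][n]
        if index not in self.mtt_cache:
            self.mtt_cache[index] = self.mtt_mem[index]
        return self.mtt_cache[index]


def reuse_mtt(hca):
    """Register, use, deregister, then reuse MTT 0 for a different page."""
    hca.posted_write_mtt(0, "page_A")
    hca.sw2hw_mpt("mr1", [0])
    hca.translate("mr1", 0)           # HCA caches MTT 0 -> page_A
    hca.hw2sw_mpt("mr1")
    hca.posted_write_mtt(0, "page_B")
    hca.sw2hw_mpt("mr2", [0])
    return hca.translate("mr2", 0)


if __name__ == "__main__":
    print(reuse_mtt(ToyHca(flush_mtts_on_hw2sw=True)))
    print(reuse_mtt(ToyHca(flush_mtts_on_hw2sw=False)))
```

In this model the second registration resolves to page_B only when HW2SW_MPT
also flushes the MTTs; without the flush it still resolves to the stale
page_A.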

This is also why I'm asking about the bindable properties.  Regular memory
regions need to be bindable.  Can I do posted writes for the MTTs and use
the SW2HW_MPT and HW2SW_MPT commands and retain the bindable property of the
MPT?

> > That is, during registration, do posted writes for the
> > MTTs and then a SW2HW_MPT command for an MPT entry that
> > references those MTTs.  Would this work?
> >
> > How much slower are memory windows compared to FMRs (assuming
> > the underlying MR is already registered)?
>
> You're welcome to measure. I think that FMRs will be the fastest way
> to create mapping. Faster than MWs.

The reason I asked is that SDP went down the FMR path, but that can only work
properly if the remote peer is trusted.  I don't know if this assumption
should be made - the same way that we shouldn't trust user-mode processes,
we shouldn't trust remote user-mode processes.  It is possible for an SDP
implementation to be in user-mode.  Thus a window exists where a user-mode
process on a remote node could corrupt memory via RDMA to stale (but still
cached in the HCA) regions.  The pages that used to represent those regions
could well have been freed and reallocated.  This design decision led me to
believe that MWs were significantly slower than FMRs (hence the tradeoff
with system security).

Note that I'm not fully versed in the SDP implementation, so if it addresses
this, great!

> Also note that FMRs are not bindable. So, there are certain applications
> that can benefit from FMRs (e.g. SRP), but can not leverage MWs, at least
> I can't think of a model for that...

Which gets back to my question of why they're not bindable.  What happens if
an FMR indicates that it is bindable, and a MW gets bound to it?

- Fab



