[ofa-general] [PATCH RFC v3 1/2] RDMA/Core: MEM_MGT_EXTENSIONS support
Steve Wise
swise at opengridcomputing.com
Tue May 20 06:55:28 PDT 2008
Or Gerlitz wrote:
> Steve Wise wrote:
>> Support for the IB BMME and iWARP-equivalent memory extensions to
>> non-shared memory regions. Usage model:
>> - MR allocated with ib_alloc_fast_reg_mr()
>> - Page lists allocated via ib_alloc_fast_reg_page_list().
>> - MR made VALID and bound to a specific page list via
>> ib_post_send(IB_WR_FAST_REG_MR)
>> - MR made INVALID via ib_post_send(IB_WR_INVALIDATE_MR)
> Steve,
>
> I am trying to better understand what a real-life ULP design would
> look like here, and I think there are some more issues to clarify/define
> for the case of a ULP that has to create a mapping for a list of pages
> and send this mapping (e.g., the IB rkey or iWARP stag) to a remote
> party that uses it for RDMA.
>
> AFAIK, the idea was to let the ULP post --two-- work requests, where
> the first creates the mapping and the second sends this mapping to the
> remote side, such that the second does not start before the first
> completes (i.e., a fence).
>
> Now, the above scheme means that the ULP knows the value of the
> rkey/stag at the time it posts these two work requests (since it has
> to encode it in the second one), so something needs clarifying about
> the rkey/stag here: does it change each time this MR is used? How many
> bits can be changed, etc.?
The ULP knows the rkey/stag because it's returned up front by
ib_alloc_fast_reg_mr(). And it doesn't change (ignoring the key portion
of the rkey/stag, which we haven't exposed to the ULP yet). The same
rkey/stag can be used for multiple mappings. It can be made invalid at
any time via IB_WR_INVALIDATE_MR, so leaving the same rkey/stag
advertised is not a risk.
So you allocate the rkey/stag up front and allocate page_lists up front;
then, as needed, you populate a page list and bind it to the rkey/stag
via IB_WR_FAST_REG_MR, and invalidate that mapping via
IB_WR_INVALIDATE_MR. You can do this any number of times, and with
proper fencing you can pipeline these mappings. Eventually, when
you're done doing IO (e.g., for NFSRDMA, when the filesystem is
unmounted), you free the page list(s) and the mr/rkey/stag.
So NFSRDMA will keep these fast_reg_mrs and page_list structs
pre-allocated and hung off some context so that, for each RPC, they can
be bound/registered, the IO executed, and the MR then invalidated as
part of processing the RPC.
>
> I guess my questions are to some extent RTFM ones, but first, a quick
> look through the IB spec did not give me enough answers (pointers
> appreciated...), and second, you are proposing an implementation here,
> so I think it makes sense to review the actual usage model to make sure
> all aspects needed by ULPs are covered...
>
> Speaking of usage, do you plan to patch the mainline nfs-rdma code to
> use these verbs?
Yes. Tom Tucker will be doing this. Jon Mason is implementing RDS
changes to utilize this too. The hope is that all this makes it into
2.6.27/OFED-1.4.
I can also post test code (krping module) if anyone is interested. I'm
developing that now.
Steve.