[ofa-general] [PATCH RFC v3 1/2] RDMA/Core: MEM_MGT_EXTENSIONS support

Steve Wise swise at opengridcomputing.com
Tue May 20 06:55:28 PDT 2008


Or Gerlitz wrote:
> Steve Wise wrote:
>> Support for the IB BMME and iWARP-equivalent memory extensions to
>> non-shared memory regions.  Usage Model:
>> - MR allocated with ib_alloc_mr()
>> - Page lists allocated via ib_alloc_fast_reg_page_list().
>> - MR made VALID and bound to a specific page list via 
>> ib_post_send(IB_WR_FAST_REG_MR)
>> - MR made INVALID via ib_post_send(IB_WR_INVALIDATE_MR)
> Steve,
>
> I am trying to further understand what a real-life ULP design would
> look like here, and I think there are some more issues to clarify/define
> for the case of a ULP that has to create a mapping for a list of pages
> and send this mapping (e.g. the IB rkey or iWARP stag) to a remote party
> that uses it for RDMA.
>
> AFAIK, the idea was to let the ULP post --two-- work requests, where
> the first creates the mapping and the second sends this mapping to the
> remote side, such that the second does not start before the first
> completes (i.e. a fence).
>
> Now, the above scheme means that the ULP knows the value of the
> rkey/stag at the time it posts these two work requests (since it has
> to encode it in the second one), so something needs clarifying about
> the rkey/stag here: does it change each time this MR is used? How many
> bits can be changed, etc.?

The ULP knows the rkey/stag because it's returned up front by
ib_alloc_fast_reg_mr().  And it doesn't change (ignoring the key portion
of the rkey/stag, which we haven't exposed to the ULP yet).  The same
rkey/stag can be used for multiple mappings.  It can be invalidated at
any time by posting an IB_WR_INVALIDATE_MR, so leaving the same
rkey/stag advertised is not a risk.
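To make the "same rkey/stag, many mappings" point concrete, here is a minimal user-space model of a single fast-reg MR. It is not the kernel verbs API: `struct model_mr` and the `model_*` functions are hypothetical stand-ins for ib_alloc_fast_reg_mr(), IB_WR_FAST_REG_MR and IB_WR_INVALIDATE_MR. It also assumes that re-binding an MR that is still valid is rejected, so a mapping must be invalidated before the next one is registered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model of one fast-reg MR; not the kernel API. */
#define MODEL_MAX_PAGES 16

struct model_mr {
	uint32_t rkey;                   /* fixed at allocation, never changes */
	bool valid;                      /* true between FAST_REG and INVALIDATE */
	size_t npages;                   /* pages in the current mapping */
	uint64_t pages[MODEL_MAX_PAGES]; /* current page list */
};

/* Stand-in for ib_alloc_fast_reg_mr(): the rkey is known up front. */
static void model_alloc_mr(struct model_mr *mr, uint32_t rkey)
{
	assert(mr != NULL);
	mr->rkey = rkey;
	mr->valid = false;
	mr->npages = 0;
}

/* Stand-in for posting IB_WR_FAST_REG_MR: bind a page list, make MR valid.
 * Assumption of this model: binding a still-valid MR is an error. */
static bool model_fast_reg(struct model_mr *mr, const uint64_t *pages, size_t n)
{
	if (mr->valid || n > MODEL_MAX_PAGES)
		return false;
	for (size_t i = 0; i < n; i++)
		mr->pages[i] = pages[i];
	mr->npages = n;
	mr->valid = true;
	return true;
}

/* Stand-in for posting IB_WR_INVALIDATE_MR: revoke remote access. */
static void model_invalidate(struct model_mr *mr)
{
	mr->valid = false;
}

/* Would a remote RDMA request carrying this rkey be accepted? */
static bool model_remote_access_ok(const struct model_mr *mr, uint32_t rkey)
{
	return mr->valid && mr->rkey == rkey;
}
```

The advertised rkey stays the same across mappings; only the valid flag and the bound page list change, which is why leaving the rkey advertised after an invalidate grants nothing.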

So you allocate the rkey/stag and the page_lists up front, then, as
needed, populate a page list and bind it to the rkey/stag via
IB_WR_FAST_REG_MR, and invalidate that mapping via IB_WR_INVALIDATE_MR.
You can do this any number of times, and with proper fencing you can
pipeline these mappings.  Eventually, when you're done doing IO (as for
NFSRDMA when the filesystem is unmounted), you free the page list(s) and
the MR/rkey/stag.

So NFSRDMA will keep these fast_reg_mrs and page_list structs 
pre-allocated and hung off some context so that per RPC, they can be 
bound/registered, the IO executed, and then the MR invalidated as part 
of processing the RPC.
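One way to picture the per-RPC reuse is a small pool of pre-allocated MRs hung off a per-mount context. This is a hypothetical sketch: `struct frmr_pool`, `pool_init()`, `pool_get()` and `pool_put()` are illustrative names, not NFSRDMA code, and the rkeys are synthesized here, whereas the real ones would come from the MR allocation. Page lists and the actual work requests are left out to keep the focus on the allocate-once, reuse-per-RPC pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-mount pool of pre-allocated fast-reg MRs. */
#define POOL_SIZE 4

struct frmr_slot {
	uint32_t rkey;   /* known from the up-front MR allocation */
	bool in_use;     /* currently bound to an RPC */
};

struct frmr_pool {
	struct frmr_slot slot[POOL_SIZE];
};

/* Done once (e.g. at mount time).  Rkeys are synthesized for the sketch;
 * real ones would be returned by the MR allocation. */
static void pool_init(struct frmr_pool *p, uint32_t first_rkey)
{
	assert(p != NULL);
	for (size_t i = 0; i < POOL_SIZE; i++) {
		p->slot[i].rkey = first_rkey + (uint32_t)i;
		p->slot[i].in_use = false;
	}
}

/* Per RPC: take a free MR to register this RPC's pages with. */
static struct frmr_slot *pool_get(struct frmr_pool *p)
{
	for (size_t i = 0; i < POOL_SIZE; i++) {
		if (!p->slot[i].in_use) {
			p->slot[i].in_use = true;
			return &p->slot[i];
		}
	}
	return NULL;     /* all MRs busy: the caller must wait or fail the RPC */
}

/* After the IO is done and the MR has been invalidated. */
static void pool_put(struct frmr_slot *s)
{
	s->in_use = false;
}
```

Because the pool is sized once, the number of RPCs that can have memory registered at the same time is bounded by POOL_SIZE; a returned slot hands the same rkey to the next RPC.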

>
> I guess my questions are to some extent RTFM ones, but, first, a
> quick look at the IB spec did not give me enough answers (pointers
> appreciated...), and second, you are proposing an implementation here,
> so I think it makes sense to review the actual usage model to make sure
> all the aspects ULPs need are covered...
>
> Speaking of usage, do you plan to patch the mainline NFS-RDMA code to
> use these verbs?

Yes.  Tom Tucker will be doing this.  Jon Mason is implementing RDS
changes to use this too.  The hope is that all of this makes it into
2.6.27/OFED-1.4.

I can also post test code (a krping module) if anyone is interested.  I'm
developing that now.

Steve.




