[ofa-general] [PATCH RFC] RDMA: New Memory Extensions.

Steve Wise swise at opengridcomputing.com
Wed May 14 18:05:30 PDT 2008



Roland Dreier wrote:
>  > Can the same ib_alloc_fast_reg_page_list() page list be
>  > bound to more than one MR?
> 
> Yes, but as the IB spec describes, the page list belongs to the
> low-level driver until the fast-reg operation has completed.
> 
>  > What happens if a user tries to issue a
>  > ib_post_send(IB_WR_FAST_REG_MR) to a VALID MR?
> 
> The operation completes with an error status.
> 
>  > How can the memory be read/written?
> 
> what memory?
> 
>  > > +struct ib_mr *ib_alloc_mr(struct ib_pd *pd, int pbl_depth, int remote_access)
> 
>  > What does pbl_depth actually control?
> 
> pbl_depth is actually a terrible name.  I would suggest calling the
> parameter something like max_page_list_len.
>

Terrible?  :(

max_page_list_len is ok.
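A toy, userspace-only model may help pin down the page-list and MR semantics Roland describes above. The MR is allocated with a fixed max_page_list_len. The page list belongs to the driver from the moment a fast-register is posted until it completes, and the same list may be bound to two MRs at once. A fast-register against an MR that is already VALID completes with an error status. Everything named toy_* is hypothetical and is not the kernel verbs API. The invalidate step, the TOY_MAX_PAGES capacity and the specific error codes are assumptions made only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_PAGES 8   /* hypothetical capacity of a toy page list */

/* Hypothetical completion statuses for the toy model. */
enum toy_status { TOY_SUCCESS, TOY_ERR_MR_VALID, TOY_ERR_TOO_LONG };

struct toy_page_list {
    uint64_t addr[TOY_MAX_PAGES]; /* DMA addresses filled in by the consumer */
    size_t len;                   /* entries in use */
    int in_flight;                /* >0: the list belongs to the driver */
};

struct toy_mr {
    size_t max_page_list_len;     /* fixed when the MR is allocated */
    bool valid;                   /* true once a fast-register succeeded */
    struct toy_page_list *bound;  /* list of the outstanding fast-reg, if any */
};

/* Consumer-side write: only legal while the driver does not own the list. */
static bool toy_set_page(struct toy_page_list *pl, size_t i, uint64_t dma)
{
    if (pl->in_flight > 0 || i >= TOY_MAX_PAGES)
        return false;
    pl->addr[i] = dma;
    return true;
}

/* Posting the fast-register work request hands the list to the driver. */
static void toy_post_fast_reg(struct toy_mr *mr, struct toy_page_list *pl)
{
    mr->bound = pl;
    pl->in_flight++;
}

/* Executing the work request yields a completion status and returns the
 * list to the consumer once no fast-register using it is outstanding. */
static enum toy_status toy_complete_fast_reg(struct toy_mr *mr)
{
    struct toy_page_list *pl = mr->bound;
    enum toy_status st;

    if (mr->valid)
        st = TOY_ERR_MR_VALID;          /* MR must not already be VALID */
    else if (pl->len > mr->max_page_list_len)
        st = TOY_ERR_TOO_LONG;
    else {
        mr->valid = true;
        st = TOY_SUCCESS;
    }
    pl->in_flight--;
    mr->bound = NULL;
    return st;
}

/* Assumed invalidate step, loosely modelled on an IB local invalidate. */
static void toy_invalidate(struct toy_mr *mr)
{
    mr->valid = false;
}
```

In this model the only way to reuse a VALID MR is to invalidate it first, which is why a second fast-register on the same MR fails.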

> I wonder if we really need the remote access flag.  I know the iWARP and
> IB verbs both call this out, but is there really a case where specifying
> the exact permissions when doing the fast register is insufficient?
> 

I agree.  I don't know why they specify this.  Let's remove it.

> also I wonder if it's clearer if we call this verb
> ib_alloc_fast_reg_mr().

Ok.

> 
>  > What is fbo? First byte offset?
> 
> yes... too many abbreviations in this API, better to make things
> self-documenting at the cost of a bit more typing.
>

ooh_kay

:)

>  > So I'm guessing the fbo and length select a subset from page_list for
>  > initializing the mr. Otherwise, the ib_fast_reg_page_list has the
>  > info.
> 
> If you pass in one page, you might want the MR to start after the
> beginning of the page, and end before the end of the page.
> 
>  > We should define what error return values are possible
>  > and what they mean. Obviously ENOSYS is being used as
>  > the call is not supported by the device. ENOMEM is
>  > obvious. But what about EPERM, EINVAL, etc.
> 
> This is a big project, given we haven't done this for any other functions.
> 
>  > Is the page size always assumed to be PAGE_SIZE?
> 
> I think we want a page_size member here for sure.
> 

So you want the page size specified in the fast_reg_page_list as opposed 
to when the page list is bound to the fast_reg mr (via post_send)?
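To make the first-byte-offset discussion concrete, here is a hedged sketch of the arithmetic. It assumes the MR's byte window starts first_byte_offset bytes into the region described by the page list, that each entry covers page_size bytes, and that the pages need not be physically contiguous. The struct and function names are hypothetical. With one 4096-byte page at 0x10000, an offset of 256 and a length of 1024, the MR would cover 0x10100 through 0x104FF. That window starts after the beginning of the page and ends before its end, which is the case Roland mentions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result: DMA addresses of the first and last byte of the MR. */
struct toy_region {
    uint64_t first_dma;
    uint64_t last_dma;   /* inclusive */
};

/* DMA address of byte `off` within the region described by the page list.
 * The caller guarantees off < npages * page_size. */
static uint64_t toy_dma_at(const uint64_t *page_list, uint64_t page_size,
                           uint64_t off)
{
    return page_list[off / page_size] + off % page_size;
}

/* Compute the bytes selected by (first_byte_offset, length); fail if the
 * window runs past the npages * page_size bytes covered by the list. */
static bool toy_region_from_list(const uint64_t *page_list, size_t npages,
                                 uint64_t page_size, uint64_t first_byte_offset,
                                 uint64_t length, struct toy_region *out)
{
    if (npages == 0 || page_size == 0 || length == 0)
        return false;
    if (first_byte_offset + length > (uint64_t)npages * page_size)
        return false;
    out->first_dma = toy_dma_at(page_list, page_size, first_byte_offset);
    out->last_dma = toy_dma_at(page_list, page_size,
                               first_byte_offset + length - 1);
    return true;
}
```

Whether page_size is stored in the page list or supplied with the work request does not change this arithmetic. The sketch simply takes it as a parameter.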


>  > The interface definition should say whether the page_list
>  > values are meaningful to the verbs caller.
> 
> not sure what you mean... the values are initialized by the verbs
> consumer so they better mean something.
>

The idea is the (kernel) application will allocate the page_list memory 
via ib_alloc_fast_reg_page_list(), then map the desired physical IO 
memory page-by-page, filling in the page_list with the resulting DMA 
addresses.  This page_list is then bound to an MR via the 
post_send(IB_WR_FAST_REG_MR).  The rkey can then be advertised to peers 
for remote IO, or the lkey used for local IO.
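The consumer-side steps in the paragraph above can be sketched roughly as follows. This is a userspace toy. toy_map_page() stands in for DMA-mapping one page and simply returns its input, which is an assumption and not how a real mapping behaves. max_page_list_len is the limit the MR was allocated with. The function fills one entry per page touched by the buffer and reports the buffer's offset into its first page. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for DMA-mapping one page.  The identity mapping is purely an
 * assumption of this toy; a kernel consumer would use the DMA API. */
static uint64_t toy_map_page(uint64_t page_addr)
{
    return page_addr;
}

/* Fill page_list for the buffer [addr, addr + len) and report the offset of
 * addr within its first page.  Returns the number of entries written, or 0
 * if len is 0 or the buffer needs more than max_page_list_len entries.
 * page_size must be a power of two. */
static size_t toy_fill_page_list(uint64_t addr, uint64_t len, uint64_t page_size,
                                 uint64_t *page_list, size_t max_page_list_len,
                                 uint64_t *first_byte_offset)
{
    if (len == 0)
        return 0;

    uint64_t mask = page_size - 1;
    uint64_t first = addr & ~mask;             /* page holding the first byte */
    uint64_t last = (addr + len - 1) & ~mask;  /* page holding the last byte */
    size_t npages = (size_t)((last - first) / page_size) + 1;

    if (npages > max_page_list_len)
        return 0;
    for (size_t i = 0; i < npages; i++)
        page_list[i] = toy_map_page(first + i * page_size);
    *first_byte_offset = addr & mask;
    return npages;
}
```

For example, a 0x3000-byte buffer starting at 0x12345 with 4 KiB pages touches four pages (0x12000 through 0x15000) with a first byte offset of 0x345. It would therefore not fit an MR allocated with max_page_list_len = 3. Posting the work request and advertising the rkey or using the lkey are not modelled here.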


>  > Can this
>  > list be used only for ib_post_send(IB_WR_FAST_REG_MR)
>  > or also by ib_map_phys_fmr() for example.
> 
> It's just for posting sends, because it gives us a way to let low-level
> drivers enforce requirements they have for the page_list passed into the
> fast register via send queue operation, e.g. it may need to be DMA-able
> memory (since the adapter fetches it as part of executing the WQE),
> there may be alignment restrictions, etc.
> 
> I think we should consider the fmr interface as legacy and try to phase
> out using it over the long term.

Agreed.
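Roland argues that routing page-list allocation through the low-level driver lets it enforce its own rules. As a hedged illustration, the check below is the kind of validation a driver might apply before building the WQE. The limits are assumptions chosen for illustration, not the requirements of any specific adapter: a power-of-two page size within a device-advertised range, a bounded list length, and page-aligned entries. A userspace toy cannot express whether the list memory itself is DMA-able, so that requirement is left out. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device limits a low-level driver might advertise. */
struct toy_dev_limits {
    uint64_t min_page_size;
    uint64_t max_page_size;
    size_t max_page_list_len;
};

static bool toy_is_pow2(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Returns true if the page list satisfies this toy device's constraints. */
static bool toy_check_page_list(const struct toy_dev_limits *lim,
                                const uint64_t *page_list, size_t len,
                                uint64_t page_size)
{
    if (!toy_is_pow2(page_size) ||
        page_size < lim->min_page_size || page_size > lim->max_page_size)
        return false;
    if (len == 0 || len > lim->max_page_list_len)
        return false;
    for (size_t i = 0; i < len; i++)
        if (page_list[i] & (page_size - 1))   /* entry not page-aligned */
            return false;
    return true;
}
```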

Steve.



