[ofa-general] [PATCH RFC] RDMA: New Memory Extensions.
Steve Wise
swise at opengridcomputing.com
Wed May 14 18:05:30 PDT 2008
Roland Dreier wrote:
> > Can the same ib_alloc_fast_reg_page_list() page list be
> > bound to more than one MR?
>
> Yes, but as the IB spec describes, the page list belongs to the
> low-level driver until the fast-reg operation has completed.
>
> > What happens if a user tries to issue a
> > ib_post_send(IB_WR_FAST_REG_MR) to a VALID MR?
>
> The operation completes with an error status.
>
> > How can the memory be read/written?
>
> what memory?
>
> > > +struct ib_mr *ib_alloc_mr(struct ib_pd *pd, int pbl_depth, int remote_access)
>
> > What does pbl_depth actually control?
>
> pbl_depth is actually a terrible name. I would suggest calling the
> parameter something like max_page_list_len.
>
Terrible? :(
max_page_list_len is ok.
> I wonder if we really need the remote access flag. I know the iWARP and
> IB verbs both call this out, but is there really a case where specifying
> the exact permissions when doing the fast register is insufficient?
>
I agree. I don't know why they specify this. Let's remove it.
> also I wonder if it's clearer if we call this verb
> ib_alloc_fast_reg_mr().
Ok.
>
> > What is fbo? First byte offset?
>
> yes... too many abbreviations in this API, better to make things
> self-documenting at the cost of a bit more typing.
>
ooh_kay
:)
> > So I'm guessing the fbo and length select a subset from page_list for
> > initializing the mr. Otherwise, the ib_fast_reg_page_list has the
> > info.
>
> If you pass in one page, you might want the MR to start after the
> beginning of the page, and end before the end of the page.
>
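The point about a one-page MR that starts after the beginning of the page and ends before its end comes down to a little arithmetic. Below is a minimal userspace sketch, assuming the page size is a power of two passed as a shift. `struct region_layout` and `layout_region()` are hypothetical names for illustration only and are not part of the proposed verbs API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type for illustration; not part of the patch. */
struct region_layout {
	uint64_t first_byte_offset;	/* offset of the region start within its first page */
	size_t npages;			/* number of pages the region touches */
};

/*
 * Describe how the byte range [addr, addr + length) falls onto pages of
 * size (1 << page_shift). Overflow of addr + length is not handled here.
 */
struct region_layout layout_region(uint64_t addr, uint64_t length,
				   unsigned int page_shift)
{
	struct region_layout l;
	uint64_t page_mask = ((uint64_t)1 << page_shift) - 1;

	assert(length > 0 && page_shift < 63);

	/* Bytes skipped at the start of the first page. */
	l.first_byte_offset = addr & page_mask;
	/* Round the end of the range up to a whole number of pages. */
	l.npages = (size_t)((l.first_byte_offset + length + page_mask) >> page_shift);
	return l;
}
```

With 4 KB pages, a 0x200-byte region at address 0x1100 has a first byte offset of 0x100 and fits in one page list entry, while the same length at 0x1f00 crosses a page boundary and needs two entries.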
> > We should define what error return values are possible
> > and what they mean. Obviously ENOSYS is being used as
> > the call is not supported by the device. ENOMEM is
> > obvious. But what about EPERM, EINVAL, etc.
>
> This is a big project, given we haven't done this for any other functions.
>
> > Is the page size always assumed to be PAGE_SIZE?
>
> I think we want a page_size member here for sure.
>
So you want the page size specified in the fast_reg_page_list as opposed
to when the page list is bound to the fast_reg mr (via post_send)?
> > The interface definition should say whether the page_list
> > values are meaningful to the verbs caller.
>
> not sure what you mean... the values are initialized by the verbs
> consumer so they better mean something.
>
The idea is the (kernel) application will allocate the page_list memory
via ib_alloc_fast_reg_page_list(), then map the desired physical IO
memory page-by-page, filling in the page_list with the resulting dma
addresses. This page_list is then bound to an MR via the
post_send(IB_WR_FAST_REG_MR). The rkey can then be advertised to peers
for remote IO, or the lkey used for local IO.
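To make that sequence concrete, here is a rough userspace model of it (not kernel code). Every `model_*` name is a hypothetical stand-in invented for this sketch, and the real verbs signatures are still under discussion in this thread. The model also collapses the work-request completion status into a plain return value, which the real interface would not do.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these are the kernel's types. */
enum model_mr_state { MODEL_MR_INVALID, MODEL_MR_VALID };

struct model_page_list {
	uint64_t *page_list;		/* DMA addresses, filled in by the consumer */
	unsigned int max_page_list_len;
};

struct model_mr {
	enum model_mr_state state;
	unsigned int max_page_list_len;	/* limit chosen when the MR was allocated */
	const struct model_page_list *bound;
	unsigned int bound_len;
};

/* Stand-in for allocating the page list (step 1 in the description above). */
struct model_page_list *model_alloc_page_list(unsigned int max_len)
{
	struct model_page_list *pl;

	if (max_len == 0)
		return NULL;
	pl = malloc(sizeof(*pl));
	if (!pl)
		return NULL;
	pl->page_list = calloc(max_len, sizeof(*pl->page_list));
	if (!pl->page_list) {
		free(pl);
		return NULL;
	}
	pl->max_page_list_len = max_len;
	return pl;
}

void model_free_page_list(struct model_page_list *pl)
{
	if (pl) {
		free(pl->page_list);
		free(pl);
	}
}

/*
 * Stand-in for posting the fast-register work request. Returns 0 on
 * success or -EINVAL where the real operation would complete in error.
 */
int model_fast_reg(struct model_mr *mr, const struct model_page_list *pl,
		   unsigned int len)
{
	/* Per the thread: fast-registering an already VALID MR is an error. */
	if (mr->state == MODEL_MR_VALID)
		return -EINVAL;
	if (len == 0 || len > pl->max_page_list_len || len > mr->max_page_list_len)
		return -EINVAL;

	mr->bound = pl;
	mr->bound_len = len;
	mr->state = MODEL_MR_VALID;
	return 0;
}
```

A caller would allocate the list, store one DMA address per page in `page_list[]`, and then call `model_fast_reg()`. A second call against the same MR fails until the MR has been invalidated, which this model leaves out.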
> > Can this
> > list be used only for ib_post_send(IB_WR_FAST_REG_MR)
> > or also by ib_map_phys_fmr() for example.
>
> It's just for posting sends, because it gives us a way to let low-level
> drivers enforce requirements they have for the page_list passed into the
> fast register via send queue operation, e.g. it may need to be DMA-able
> memory (since the adapter fetches it as part of executing the WQE),
> there may be alignment restrictions, etc.
>
> I think we should consider the fmr interface as legacy and try to phase
> out using it over the long term.
Agreed.
Steve.