[ofa-general] [PATCH RFC v3 1/2] RDMA/Core: MEM_MGT_EXTENSIONS support

Or Gerlitz ogerlitz at voltaire.com
Mon May 19 01:26:29 PDT 2008


Steve Wise wrote:
> Support for the IB BMME and iWARP equivalent memory extensions to 
> non shared memory regions. Usage Model:
>
> - MR allocated with ib_alloc_mr()
> - Page lists allocated via ib_alloc_fast_reg_page_list().
> - MR made VALID and bound to a specific page list via ib_post_send(IB_WR_FAST_REG_MR)
> - MR made INVALID via ib_post_send(IB_WR_INVALIDATE_MR)
> - MR deallocated with ib_dereg_mr()
> - page lists dealloced via ib_free_fast_reg_page_list().
Steve,

Does this design go hand-in-hand with remote invalidation, such that 
if the remote side has invalidated the mapping there is no need to issue 
the IB_WR_INVALIDATE_MR work request?

Also, does the proposed design support fmr pages of a granularity 
different from the OS page size? For example, the OS pages are 4K and 
the ULP wants to use an fmr with 512-byte "pages" (the "block lists" 
feature). In that case, doesn't the size of each page have to be 
specified as a param to the alloc_fast_reg_mr() verb?
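
To make the granularity point concrete, here is a rough sketch of how the
number of page list entries depends on the block size. It is plain
userspace C, not part of the patch; the helper name fast_reg_entries() is
made up for illustration, and it assumes a power-of-two block size
expressed as a shift.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): number of page list entries
 * needed to cover `length` bytes starting at `iova`, when each entry
 * describes a block of (1 << page_shift) bytes.
 */
static uint64_t fast_reg_entries(uint64_t iova, uint32_t length,
				 unsigned int page_shift)
{
	uint64_t mask = ((uint64_t)1 << page_shift) - 1;
	uint64_t first_byte_offset = iova & mask; /* offset into first block */

	/* Round the covered span up to a whole number of blocks. */
	return (first_byte_offset + length + mask) >> page_shift;
}
```

With this sketch, an 8K buffer starting at iova 0x1100 needs 3 entries
with 4K pages (shift 12) but 17 entries with 512-byte blocks (shift 9),
so the block size clearly affects how large a page list has to be.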
>
> Applications can allocate a fast_reg mr once, and then can repeatedly
> bind the mr to different physical memory SGLs via posting work requests
> to the send queue.  For each outstanding mr-to-pbl binding in the SQ
> pipe, a fast_reg_page_list needs to be allocated.  Thus pipelining can
> be achieved while still allowing device-specific page_list processing.
Hmm, is it a must for the ULP to issue a page list alloc/free per 
IB_WR_FAST_REG_MR call?
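
The alternative I have in mind would look roughly like the sketch below:
the ULP allocates one page list per SQ slot up front and recycles them,
instead of calling alloc/free around every work request. This is only a
userspace model of the bookkeeping, under the assumption that a page list
may be reused once its work request has completed; every name here
(pl_pool, pl_pool_get, pl_pool_put, PL_POOL_DEPTH) is hypothetical and
not part of the proposed API.

```c
#include <assert.h>

#define PL_POOL_DEPTH 4	/* assumed max outstanding fast-reg WRs */

/* One slot stands in for one pre-allocated page list. */
struct pl_pool {
	int in_use[PL_POOL_DEPTH];
};

/* Take a free slot before posting a fast-reg WR; -1 if all are busy. */
static int pl_pool_get(struct pl_pool *pool)
{
	int i;

	for (i = 0; i < PL_POOL_DEPTH; i++) {
		if (!pool->in_use[i]) {
			pool->in_use[i] = 1;
			return i;
		}
	}
	return -1;
}

/* Return a slot once the WR that used it has completed. */
static void pl_pool_put(struct pl_pool *pool, int idx)
{
	if (idx >= 0 && idx < PL_POOL_DEPTH)
		pool->in_use[idx] = 0;
}
```

In this model the pool depth bounds the number of bindings in flight, and
no allocation happens on the fast path.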

> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -676,6 +683,20 @@ struct ib_send_wr {
>  			u16	pkey_index; /* valid for GSI only */
>  			u8	port_num;   /* valid for DR SMPs on switch only */
>  		} ud;
> +		struct {
> +			u64				iova_start;
> +			struct ib_mr 			*mr;
> +			struct ib_fast_reg_page_list	*page_list;
> +			unsigned int			page_size;
> +			unsigned int			page_list_len;
> +			unsigned int			first_byte_offset;
> +			u32				length;
> +			int				access_flags;
> +			
> +		} fast_reg;
> +		struct {
> +			struct ib_mr 	*mr;
> +		} local_inv;
>  	} wr;
>  };
I suggest using "page_shift" rather than "page_size", to stay consistent 
with the conventions of other kernel APIs.
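
As a rough illustration of why the shift form is convenient (a userspace
sketch with made-up helper names, not code from the patch): a shift can
only describe a power-of-two size, while a raw size has to be validated
before it can be used.

```c
#include <assert.h>

/* Hypothetical helper: a shift always maps to a valid power-of-two size. */
static unsigned int shift_to_size(unsigned int page_shift)
{
	return 1u << page_shift;
}

/*
 * Hypothetical helper: going the other way needs a validity check.
 * Returns 0 and stores the shift on success, -1 if `page_size` is
 * zero or not a power of two.
 */
static int size_to_shift(unsigned int page_size, unsigned int *page_shift)
{
	unsigned int shift = 0;

	if (page_size == 0 || (page_size & (page_size - 1)) != 0)
		return -1;

	while ((1u << shift) != page_size)
		shift++;

	*page_shift = shift;
	return 0;
}
```

For example, shift 12 gives 4096 and shift 9 gives 512, whereas a
page_size of 3000 would have to be rejected by the provider.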


Or.



