[ofa-general] [PATCH RFC] RDMA: New Memory Extensions.

Steve Wise swise at opengridcomputing.com
Thu May 15 11:39:25 PDT 2008


Ralph Campbell wrote:
> On Wed, 2008-05-14 at 19:49 -0700, Roland Dreier wrote:
>   
>>  > So you want the page size specified in the fast_reg_page_list as
>>  > opposed to when the page list is bound to the fast_reg mr (via
>>  > post_send)?
>>
>> It's kind of the same thing, since the fast_reg_page_list is part of the
>> send work request... the structures you have at the moment are:
>>
>>  > +		struct {
>>  > +			u64				iova_start;
>>  > +			struct ib_fast_reg_page_list	*page_list;
>>  > +			int				fbo;
>>  > +			u32				length;
>>  > +			int				access_flags;
>>  > +			struct ib_mr 			*mr;
>>
>> (side note... move this pointer up with the other pointers, so you don't
>> end up with a hole in the structure due to alignment... or stick an int
>> page_size in to fill the hole)
>>
>>  > +		} fast_reg;
>>
>>  > +struct ib_fast_reg_page_list {
>>  > +	struct ib_device 	*device;
>>  > +	u64			*page_list;
>>  > +	int			page_list_len;
>>  > +};
>>
>> is page_list_len the maximum length of the page_list, or is it filled in
>> by the consumer?  The driver could figure out the length of the
>> page_list for any given work request by looking at the MR length and the
>> page_size I suppose.
>>
>>  - R.
>>     
>
> I think Roland and Steve misunderstood what I was asking about
> the struct ib_fast_reg_page_list * returned from
> ib_alloc_fast_reg_page_list().
>
> The question is "what can the caller do with the pointer?"
> Clearly, the caller can pass the pointer to
> ib_post_send(IB_WR_FAST_REG_MR) and use the [LR]_Key in the
> normal ways.
>
> Can the caller dereference the pointer and look at the
> values in page_list[]? Are these values understood to be
> physical addresses that can be passed to phys_to_virt(), for example?
> Are they byte addresses always aligned to a page boundary?
>
>   

The caller must _fill in_ the values in the page list.  That's the whole 
point.  I.e., all this function does is allocate the _memory_ that holds 
the page list the caller is building.  The special function is 
needed because some devices might need to DMA the page list array from 
this memory as part of processing the FAST_REG_MR work request, so the 
driver needs to be able to allocate it DMA-coherently.  The pointer 
returned is a kernel virtual address and can be read from/written to by 
the caller.

> The reason I ask is that the address used with the [LR]_Key from
> ib_get_dma_mr() has to be translated with ib_dma_map_single(), etc.
> because the ipath driver doesn't necessarily use physical addresses
> for the address in the send WQEs. Normally, the address in the
> send WQE is a kernel virtual address so the ib_ipath driver can
> memcpy() the data to the chip.
>   

> Let's say that ib_ipath uses vmalloc() to allocate the pages
> instead of dma_alloc_coherent(). As long as the ULP only uses
> the page_list values as an uninterpreted number that is passed
> back to the driver via subsequent verbs calls, it wouldn't
> matter to the ULP what the number represents. But if the ULP
> expects to be able to call some other kernel function to
> map or translate that value, then the ULP has to know what
> kind of number it represents, its size and alignment, etc.
>   


We're not talking about allocating the pages themselves. 

Here's an example (ignoring errors):

page_list = ib_alloc_fast_reg_page_list(device, 1);

v = (void *)__get_free_page(GFP_KERNEL);

/* the device may both read and write the page */
page_list->page_list[0] = ib_dma_map_single(device, v, PAGE_SIZE,
                                            DMA_BIDIRECTIONAL);

wr.opcode = IB_WR_FAST_REG_MR;
wr.next = NULL;
wr.send_flags = 0;
wr.wr_id = 0xdeadbeef;
wr.wr.fast_reg.mr = mr;
wr.wr.fast_reg.page_list = page_list;
wr.wr.fast_reg.page_size = PAGE_SIZE;
wr.wr.fast_reg.page_list_len = 1;
wr.wr.fast_reg.first_byte_offset = 0;
wr.wr.fast_reg.iova_start = (u64)v;
wr.wr.fast_reg.length = PAGE_SIZE;
wr.wr.fast_reg.access_flags = IB_ACCESS_LOCAL_WRITE |
                              IB_ACCESS_REMOTE_READ |
                              IB_ACCESS_REMOTE_WRITE;

ib_post_send(qp, &wr, &bad_wr);
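
For a region bigger than one page, the number of page_list entries 
follows from the length, the page size and the first byte offset, along 
the lines of what Roland describes above.  A minimal userspace sketch of 
that arithmetic follows.  fast_reg_npages() is a hypothetical helper 
name, not something in the patch, and it assumes the first byte offset 
is counted from the start of the first page:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): how many page_list
 * entries are needed to cover `length` bytes that begin
 * `first_byte_offset` bytes into the first page.
 */
static uint32_t fast_reg_npages(uint32_t first_byte_offset, uint32_t length,
                                uint32_t page_size)
{
    /* do the sum in 64 bits so offset + length cannot wrap */
    uint64_t span = (uint64_t)first_byte_offset + length;

    assert(page_size != 0);

    /* round up to a whole number of pages */
    return (uint32_t)((span + page_size - 1) / page_size);
}
```

With 4K pages, fast_reg_npages(0, 4096, 4096) is 1, which matches the 
single-page example above, while fast_reg_npages(100, 4096, 4096) is 2 
because the region straddles a page boundary.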




