[openib-general] Re: FW: [PATCH 1 of 3] mad: large RMPP support

Thu Feb 9 15:46:54 PST 2006

Quoting r. Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: FW: [PATCH 1 of 3] mad: large RMPP support
> 
> Roland Dreier wrote:
> >My rule of thumb is that we shouldn't rely on being able to allocate a
> >contiguous buffer bigger than 4 KB, but assuming we can allocate 4 KB
> >is fine.  4 KB is the lowest page size of any real architecture, and
> >if the kernel is out of free pages then any allocation is likely to
> >fail.  Allocations of larger buffers may fail because of memory
> >fragmentation, even with plenty of free memory.
> >
> >That is: a 4 KB buffer is fine.
> 
> Given this, I think that we'll need to go with the linked list then.  Maybe 
> something like:
> 
> struct ib_mad_segment {
> 	struct list_head list;
> 	u8 data[0];
> };
> 
> struct ib_mad_send_buf {
> 	...
> 	void		*mad;	/* first segment */
> 	struct list_head rmpp_list;
> 	u32		 segment_size;
> 	...
> };

Given that the last segment has a different size, it seems cleaner
to just keep the segment size as part of the ib_mad_segment structure.
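
A minimal userspace sketch of that idea, not the actual patch: the
names mad_segment_sketch, seg_link and seg_alloc are hypothetical,
seg_link only stands in for the kernel's struct list_head, and malloc
stands in for a kernel allocator.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct seg_link {
	struct seg_link *next;
};

/* Sketch: every segment records how many payload bytes it holds,
 * so a short last segment needs no special casing by the caller. */
struct mad_segment_sketch {
	struct seg_link link;
	size_t size;		/* valid bytes in data[] */
	unsigned char data[];	/* C99 flexible array member */
};

/* Allocate one segment and copy 'size' payload bytes into it.
 * Returns NULL if the allocation fails. */
static struct mad_segment_sketch *seg_alloc(const void *payload, size_t size)
{
	struct mad_segment_sketch *seg = malloc(sizeof(*seg) + size);

	if (!seg)
		return NULL;
	seg->link.next = NULL;
	seg->size = size;
	memcpy(seg->data, payload, size);
	return seg;
}
```

With the size stored per segment, code walking the list can read
seg->size directly instead of working out whether it has reached the
last entry.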

> I'm undecided about whether all MADs should use the rmpp_list, with *mad 
> referencing the data of the first segment.  This keeps the code consistent, 
> but would result in the first segment being larger (256-bytes) than 
> additional segments (say 220-bytes).

I don't think it's a good idea.
Recall that when you send segments starting from the second one,
you need the header from *mad. So this gets ugly very quickly.

> Users could then walk the list of buffers without calling a routine that 
> needs to start at the beginning of the list every time.
> 
> - Sean

On the other hand, it makes sense to keep the single-MAD case
as simple as possible. So that's a good reason to have the rmpp list
include only the segments starting from the second one.
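
A rough userspace sketch of that layout, under stated assumptions: the
256 and 220 byte sizes are just the illustrative figures from the
quoted text, and send_buf_sketch, tail_seg and build_segment are
hypothetical names, not anything in the patch. A real sender would
also have to update the per-segment header fields (for example the
segment number), which is left out here.

```c
#include <stddef.h>
#include <string.h>

enum {
	SKETCH_MAD_SIZE = 256,	/* one full MAD, header included */
	SKETCH_SEG_DATA = 220,	/* payload per later segment (illustrative) */
	SKETCH_HDR_SIZE = SKETCH_MAD_SIZE - SKETCH_SEG_DATA
};

/* Segments 2..N: payload only, no header of their own. */
struct tail_seg {
	struct tail_seg *next;
	size_t size;				/* valid bytes in data[] */
	unsigned char data[SKETCH_SEG_DATA];
};

struct send_buf_sketch {
	unsigned char mad[SKETCH_MAD_SIZE];	/* first segment: header + data */
	struct tail_seg *rmpp_list;		/* NULL in the single-MAD case */
};

/* Build the wire image of segment seg_num (1-based) into out, which
 * must hold SKETCH_MAD_SIZE bytes. Returns 0 on success, -1 if the
 * segment does not exist or is oversized. */
static int build_segment(const struct send_buf_sketch *buf,
			 unsigned int seg_num, unsigned char *out)
{
	const struct tail_seg *seg;
	unsigned int i;

	if (seg_num == 0)
		return -1;
	if (seg_num == 1) {
		/* Single-MAD path: the list is never touched. */
		memcpy(out, buf->mad, SKETCH_MAD_SIZE);
		return 0;
	}
	seg = buf->rmpp_list;
	for (i = 2; seg && i < seg_num; i++)
		seg = seg->next;
	if (!seg || seg->size > SKETCH_SEG_DATA)
		return -1;
	/* Later segments reuse the header stored in the first MAD. */
	memcpy(out, buf->mad, SKETCH_HDR_SIZE);
	memset(out + SKETCH_HDR_SIZE, 0, SKETCH_SEG_DATA);
	memcpy(out + SKETCH_HDR_SIZE, seg->data, seg->size);
	return 0;
}
```

The seg_num == 1 branch is the whole single-MAD send path; only
multi-segment sends ever walk rmpp_list.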

-- 
Michael S. Tsirkin
Staff Engineer, Mellanox Technologies