[ofa-general] RE: [UPDATE] [V3] [PATCH 3/3] ib/ipoib: IPoIB-UD RX S/G support for 4K MTU

Eli Cohen eli at mellanox.co.il
Mon Feb 4 05:54:05 PST 2008


On Sun, 2008-02-03 at 09:36 -0800, Shirley Ma wrote:
> Does your recommendation is the same as Roland's before? I hope it's
> not, otherwise, it doesn't work. Since the first buffer is GRH + IPoIB
> HEAD = 44 bytes not 40 bytes. If we put all skb data in the first
> frag, then the IP header is not aligned to 16 bytes. I am copying
> Roland's comments regarding this approach:
> ---------
> However, I now realize that my earlier idea of allocating a scratch
> buffer for the GRH and just allocating a 4096 byte skb doesn't work,
> because the skb_shinfo ends up being allocated along with the buffer,
> so trying to allocate a 4096-byte skb will bloat the data past a
> single page, which is what we're trying to avoid.
> 
> So how about the following?  When using a UD MTU of 4096 with a page
> size of 4096, allocate an skb of size 44 for the GRH and ethertype,
> and then allocate a single page for the fragment list.  This means
> that the IP packet will start nicely 16-byte aligned for free, and all
> the bookkeeping is very simple. 
> -------
> 

What I am actually suggesting is to allocate, for example, 128 bytes in
the linear data and then a 4K page. The first 128 bytes will hold the
GRH, the encapsulation header, and the IP and TCP/UDP headers. The 4K
fragment that follows is large enough to contain the rest of the
packet.
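
To make the split concrete, here is a rough userspace sketch of how a
received length would divide between the linear part and the page. The
names (RX_HEAD_SIZE, RX_PAGE_SIZE, rx_split_len) are made up for
illustration and are not from the patch; 128 is only the example value
mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names and values, for illustration only. */
#define RX_HEAD_SIZE 128   /* linear data: GRH + IPoIB header + IP/TCP/UDP headers */
#define RX_PAGE_SIZE 4096  /* single page attached as the first fragment */

struct rx_split {
    size_t head_len;  /* bytes that land in the linear data */
    size_t frag_len;  /* bytes that spill into the page fragment */
};

/* Split a received length (GRH included) between linear data and page. */
static struct rx_split rx_split_len(size_t wc_len)
{
    struct rx_split s;

    /* The posted buffer cannot receive more than head + page. */
    assert(wc_len <= RX_HEAD_SIZE + RX_PAGE_SIZE);

    s.head_len = wc_len < RX_HEAD_SIZE ? wc_len : RX_HEAD_SIZE;
    s.frag_len = wc_len - s.head_len;
    return s;
}
```

With a 4096-byte UD payload plus the 40-byte GRH (4136 bytes in total),
this gives 128 bytes in the linear part and 4008 bytes in the page, so
the whole packet fits.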

Another thing to consider is using a 3-entry receive scatter list:
1. The first entry points to a generic 40-byte buffer (allocated once
per netdevice). All receive buffers point to this same buffer. As Roland
suggested before, this saves us the skb_pull on the GRH.

2. A 128-byte buffer which comes from the linear part of the SKB - we
can align this buffer to ensure the IP header starts on a 16-byte
boundary.

3. A 4K page in the first fragment.
When a packet is received, we can then check whether its overall
length is small enough that it did not touch the page. If it did
not, we can reuse this page for the newly posted buffer.

** The 128-byte value above can be a macro, and we can work out what
the correct value is.
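
A rough userspace sketch of the 3-entry idea follows. Everything in it
is an assumption made for illustration: struct sge is a simplified
stand-in for the real scatter/gather entry (no lkey, plain pointers),
and the macro and function names are invented. It shows the shared GRH
buffer, the offset needed so that the IP header after the 4-byte IPoIB
header lands on a 16-byte boundary, and the "did the packet touch the
page" check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; 128 is only the example value discussed above. */
#define RX_GRH_LEN    40    /* GRH, scattered into the shared buffer */
#define RX_ENCAP_LEN  4     /* IPoIB encapsulation header */
#define RX_HEAD_LEN   128   /* linear part of the skb */
#define RX_PAGE_LEN   4096  /* first fragment */

/* Simplified stand-in for a scatter/gather entry. */
struct sge {
    void    *addr;
    uint32_t length;
};

/* One GRH scratch buffer shared by all receive buffers of a netdevice. */
static unsigned char shared_grh[RX_GRH_LEN];

/* Fill the three entries: shared GRH buffer, skb linear data, page. */
static void build_rx_sge(struct sge sg[3], void *head, void *page)
{
    sg[0].addr = shared_grh; sg[0].length = RX_GRH_LEN;
    sg[1].addr = head;       sg[1].length = RX_HEAD_LEN;
    sg[2].addr = page;       sg[2].length = RX_PAGE_LEN;
}

/*
 * Bytes to skip at the start of the linear buffer so that the IP header,
 * which follows the 4-byte IPoIB header, starts on a 16-byte boundary.
 */
static size_t head_align_offset(uintptr_t head_addr)
{
    return (16 - (head_addr + RX_ENCAP_LEN) % 16) % 16;
}

/* byte_len includes the GRH; true if nothing was written to the page. */
static bool page_untouched(uint32_t byte_len)
{
    return byte_len <= RX_GRH_LEN + RX_HEAD_LEN;
}
```

In the driver the offset would presumably be applied with skb_reserve()
before posting, and the linear allocation would need to be that much
larger than 128 bytes. When page_untouched() is true, the page can be
moved to the newly posted buffer instead of allocating a fresh one.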



