[ofa-general] IPOIB CM performance issues

Thu Mar 22 14:35:16 PDT 2007

While working on the non-SRQ support for IPOIB CM I observed that using
scatter-gather (sg) lists adversely impacts performance compared to not
using them. On the whole, CM mode does improve performance, with or
without sg lists, but we lose a lot of throughput (roughly 15% or more)
with sg lists.

I looked at the profiles and found that ipoib_cm_alloc_rx_skb() (and the
associated alloc_page()) shows up far more (> 10X) in the profile with sg
lists than without them. To put this in perspective, upon receipt of a
packet we call ipoib_cm_alloc_rx_skb(), which in turn ends up calling
alloc_page() 16 times (every time!). I believe that is where we are
taking a big hit with sg lists. This, together with the associated sg
list processing, is what I believe causes the throughput drop.
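
To make the cost concrete, here is a back-of-the-envelope sketch in
user-space C (not driver code). It assumes 4 KiB pages and a 16-page
receive buffer, which would account for the 16 alloc_page() calls per
packet; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, for illustration only. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_RX_PAGES  16u

/* Number of pages a payload of len bytes actually occupies. */
static size_t pages_occupied(size_t len)
{
    return (len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Page allocations needed to refill buffers for npackets received
 * packets, when every receive replaces all SKETCH_RX_PAGES pages. */
static size_t refill_page_allocs(size_t npackets)
{
    return npackets * SKETCH_RX_PAGES;
}

/* Allocated pages that carry no data for a packet of len bytes. */
static size_t pages_unused(size_t len)
{
    size_t used = pages_occupied(len);

    assert(used <= SKETCH_RX_PAGES);
    return SKETCH_RX_PAGES - used;
}
```

Under these assumptions a 1500-byte packet occupies a single page, yet
refilling the buffer still costs 16 page allocations.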

I looked at the e1000 driver to see how it handles this issue. Here are
a few things I learned that we may want to incorporate as we find
suitable:

1. e1000 driver does not use sg lists in all cases
2. e1000 driver uses a max of 3 fragments (to handle jumbo frames)
3. e1000 driver uses "copybreak" as a module parameter. For small packets
(less than copybreak) they actually go ahead and unsplit the packet. In
fact they specifically call out alloc_page() and put_page() as eating up
CPU cycles and try to avoid them when feasible.
4. A decision is made (rx_ps_pages) on whether to use packet split or
not. This decision is based on several factors such as MTU, page size
and the like.
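
As a rough illustration of the copybreak idea in item 3, here is a
minimal user-space sketch. It is not e1000 code: the struct, the function
name and the use of malloc()/memcpy() are stand-ins chosen for this
example, and the threshold is passed in by the caller. The point is only
that a packet shorter than the threshold is copied into a small buffer,
so the large receive buffer can be reposted as-is without allocating new
pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Result of delivering one received packet (hypothetical type). */
struct rx_result {
    unsigned char *data; /* buffer handed up the stack */
    int copied;          /* 1 if the small-packet copy path was taken */
};

/*
 * rx_buf is the large receive buffer the packet landed in, len is the
 * packet length and copybreak is the copy threshold in bytes.
 * When copied == 1 the caller owns a fresh small buffer (free() it when
 * done) and may repost rx_buf unchanged. When copied == 0 the caller
 * hands rx_buf itself up and has to allocate a replacement.
 */
static struct rx_result rx_deliver(unsigned char *rx_buf, size_t len,
                                   size_t copybreak)
{
    struct rx_result r = { rx_buf, 0 };

    if (len < copybreak) {
        unsigned char *small = malloc(len ? len : 1);

        if (small) {
            memcpy(small, rx_buf, len);
            r.data = small;
            r.copied = 1;
        }
        /* On allocation failure, fall back to handing up rx_buf. */
    }
    return r;
}
```

With a threshold of 256 bytes, for example, a 5-byte packet would take
the copy path while a 1000-byte packet would be handed up in the original
buffer.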

Can we try to incorporate items 1, 3 and 4 into the implementation of
IPOIB CM? What is the general opinion about this? Should we look at some
other drivers?

Pradeep
pradeep at us.ibm.com