[ofa-general] IPOIB CM performance issues
Pradeep Satyanarayana
pradeep at us.ibm.com
Thu Mar 22 14:35:16 PDT 2007
While working on the non-SRQ support for IPOIB CM I observed that
scatter-gather lists adversely impact performance (as compared to running
without them). On the whole, CM mode does improve performance, with or
without scatter-gather lists, but we lose a lot of throughput (something
like >15%) with sg lists.
I looked at the profiles and found that ipoib_cm_alloc_rx_skb() (and the
associated alloc_page()) show up far more (> 10X) in the profile with sg
lists than without them. To put this in perspective, upon receipt of a
packet we call ipoib_cm_alloc_rx_skb(), which in turn ends up calling
alloc_page() 16 times (every time!). I believe that is where we are
taking a big hit with sg lists. This, together with the associated sg list
processing, is what causes the throughput drop.
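
To make the cost concrete, here is a rough user-space sketch (not kernel
code) of how many of those 16 pages a given payload actually needs. The 4096
byte page size and the helper names are assumptions made for illustration
only; they are not taken from the IPOIB CM source.

```c
#include <stddef.h>

/* Assumed values for illustration only. */
#define SKETCH_PAGE_SIZE 4096u  /* assumed page size in bytes */
#define SKETCH_RX_PAGES  16u    /* alloc_page() calls per received packet */

/* Number of pages a payload of `len` bytes occupies (at least one). */
static unsigned int pages_touched(size_t len)
{
    if (len == 0)
        return 1;
    return (unsigned int)((len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);
}

/* Pages allocated for this receive that end up carrying no data. */
static unsigned int pages_unused(size_t len)
{
    unsigned int used = pages_touched(len);

    return used >= SKETCH_RX_PAGES ? 0 : SKETCH_RX_PAGES - used;
}
```

Under these assumptions, pages_unused(1500) is 15: a 1500 byte packet pays
for 16 page allocations but needs only one of them.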
I looked at the e1000 driver to see how they handle this issue, and here are
a few things that I learnt, which we may try to incorporate as we find
suitable:
1. The e1000 driver does not use sg lists in all cases.
2. The e100 driver uses a maximum of 3 fragments (to handle jumbo frames).
3. The e1000 driver uses "copybreak" as a module parameter. For small
   packets (less than copybreak) they actually go ahead and unsplit the
   packet. In fact they specifically call out alloc_page() and put_page()
   as eating up CPU cycles and try to avoid them when feasible.
4. A decision is made (rx_ps_pages) on whether or not to use packet split.
   This decision is based on several factors like MTU, page size and the
   like.
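
The copybreak idea in item 3 can be sketched in user-space C as follows.
This is a simplified illustration, not the e1000 code: struct rx_buf,
rx_copybreak() and the 256 byte threshold are hypothetical names and values,
and malloc()/memcpy() stand in for the skb allocation and copy a driver
would do.

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_COPYBREAK 256u  /* assumed threshold, illustrative only */

/* Hypothetical stand-in for a posted receive buffer. */
struct rx_buf {
    unsigned char *data;  /* large, pre-allocated receive buffer */
    size_t         cap;   /* its capacity in bytes */
};

/*
 * Returns the buffer to pass up the stack; the caller owns it.
 * The caller guarantees len <= rx->cap.
 * *recycled is 1 if the large buffer stays in rx and can be re-posted
 * without a fresh allocation, 0 if rx was emptied and must be refilled.
 */
static unsigned char *rx_copybreak(struct rx_buf *rx, size_t len,
                                   int *recycled)
{
    if (len < SKETCH_COPYBREAK) {
        unsigned char *small = malloc(len ? len : 1);

        if (small) {
            /* Small packet: copy it out and keep the big buffer. */
            memcpy(small, rx->data, len);
            *recycled = 1;
            return small;
        }
        /* Allocation failed: fall through and hand off the big buffer. */
    }

    /* Large packet: hand the receive buffer itself up the stack. */
    unsigned char *pkt = rx->data;

    rx->data = NULL;
    rx->cap = 0;
    *recycled = 0;
    return pkt;
}
```

The point of the design is that a small packet costs one small allocation
plus a short copy, while the large receive buffer is reused as-is instead of
being freed and reallocated page by page.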
Can we try to incorporate items 1, 3 and 4 into the implementation of
IPOIB CM? What is the general opinion about this? Should we look at some
other drivers?
Pradeep
pradeep at us.ibm.com