[ofa-general] [PATCHv2] IB/ipoib: S/G and HW checksum support

Jason Gunthorpe jgunthorpe at obsidianresearch.com
Tue Sep 4 22:51:08 PDT 2007


On Wed, Sep 05, 2007 at 08:10:40AM +0300, Michael S. Tsirkin wrote:
> > With the new changes to ip_forward, maybe you could get away with
> > setting CHECKSUM_PARTIAL in your RX path to get the TX of the final
> > output device to regenerate the L4 checksum?
> 
> Good idea, the comment in linux/skbuff.h says

Ooh fancy, the comments are updated now :) Yes, this matches my
expectation: CHECKSUM_PARTIAL should definitely be used instead of
CHECKSUM_UNNECESSARY in the case of a 'known to be invalid on the wire'
checksum.

> > Even so, sending out malformed UD packets
> 
> When you say UD, you really mean UDP, don't you?

No, I do mean UD. I singled out UD here just because of the multicast
problem, and there really seems to be no way to fix that.

To summarize for clarity, both TCP and UDP have a checksum at the top
of the data payload, and IPv4 also has a header checksum. This
discussion, and your optimization, is all about the L4 TCP/UDP
checksum. Linux does not offload computation of the header
checksum. If you TX a packet with ip_summed == CHECKSUM_PARTIAL
without doing the hardware csum offload procedure, then on the wire the
L4 TCP/UDP checksum bytes are garbage. The stack no longer conforms to
the various current RFCs -> the packet is malformed.
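
For reference, the L4 checksum under discussion is the standard Internet
checksum (RFC 1071). The block below is a minimal userspace sketch of it,
not kernel code: the function name inet_checksum is made up for
illustration, and the TCP/UDP pseudo-header that the real checksum also
covers is left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Internet checksum as described in RFC 1071: one's-complement sum of
 * 16-bit big-endian words, folded to 16 bits, then inverted.
 * Illustrative only. The 32-bit accumulator is fine for packet-sized
 * buffers but is not meant for arbitrarily large inputs.
 */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    /* Add up all complete 16-bit words. */
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];

    /* An odd trailing byte is padded with a zero low byte. */
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

A receiver that runs the same sum over the data plus a correct checksum
field gets 0. With stale or never-filled checksum bytes, as in the
CHECKSUM_PARTIAL case above, it generally gets something else and drops
the packet.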

> > strikes me as a
> > compatability killer..
> 
> Filling in PARTIAL will address that, right?

No, I don't think so. PARTIAL will make Linux forward packets
correctly between different network interfaces, but it does not
address the on-the-ib-wire problem of old/new hosts interoperating.

To do that you must call skb_checksum_help for CHECKSUM_PARTIAL
packets in the tx path when a new host is talking to an old host.

> > This would be much better as a RC only
> > negotiated at CM feature.
> 
> We can always go there, and it would be easy to
> enable this per-destination for datagram mode, too, by setting a
> bit in HW address, and thus enabling inter-operability with IETF
> compliant ipoib (at a slower rate, since we'll have to do an extra
> pass over data to calculate the checksum in software).

Right, combine this with enforcing correct checksum on all TX
multicast and this looks much better.

I don't think adding a conditional call to skb_checksum_help in the
ipoib tx path will make performance any worse for those packets
than it is today - dev_queue_xmit today calls skb_checksum_help on
behalf of ipoib for every packet.

Also, my other thought was about the RX path, it should work more like

   if (header->flags & cpu_to_be16(IPOIB_HEADER_F_HWCSUM))
      ip_summed = CHECKSUM_PARTIAL;       // Sender says the csum is bad
   else if (enabled_hw_csum_support)
      ip_summed = CHECKSUM_UNNECESSARY;   // Sender says the csum should be good
   else
      ip_summed = CHECKSUM_NONE;          // Force checking

(Of course, if the underlying hardware supports checksum offload then
 the hardware's calculation should just unconditionally be used on the
 rx path)
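
The RX decision above can be written as a small standalone function for
experimenting outside the kernel. This is only a sketch under
assumptions: the enum is a local stand-in for the kernel's CHECKSUM_*
values, the bit value given to HDR_F_HWCSUM is invented, htons() stands
in for cpu_to_be16(), and rx_csum_state is a hypothetical name.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the kernel's CHECKSUM_* states (assumption). */
enum csum_state { CSUM_NONE, CSUM_UNNECESSARY, CSUM_PARTIAL };

/* Hypothetical bit value; the real flag layout comes from the patch. */
#define HDR_F_HWCSUM 0x0001

/*
 * flags_be is the header flags field in network byte order.
 * hw_csum_enabled says whether this host has the feature turned on.
 */
enum csum_state rx_csum_state(uint16_t flags_be, bool hw_csum_enabled)
{
    if (flags_be & htons(HDR_F_HWCSUM))
        return CSUM_PARTIAL;        /* sender says the csum is bad */
    if (hw_csum_enabled)
        return CSUM_UNNECESSARY;    /* sender says the csum should be good */
    return CSUM_NONE;               /* force checking */
}
```

The flag test comes first, so a packet marked by the sender is treated
as PARTIAL regardless of the local setting.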

Tx is more like:

   header->flags = 0;
   if (ip_summed == CHECKSUM_PARTIAL) {
      if (destination_is_compatible)
         header->flags = cpu_to_be16(IPOIB_HEADER_F_HWCSUM);
      else
         skb_checksum_help(skb);
   }

(And again, if the HW supports offload, then don't bother with
 F_HWCSUM)
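
The TX side, together with the "always checksum multicast" rule
mentioned earlier, might look like this as a standalone sketch. All
names here (tx_csum_action, dest_is_compatible, peer_advertises_hwcsum,
the HDR_F_HWCSUM bit value) are hypothetical; how a peer would actually
advertise support, for example a bit in the HW address, is still open
in this thread. htons() again stands in for cpu_to_be16().

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit value; the real flag layout comes from the patch. */
#define HDR_F_HWCSUM 0x0001

enum tx_action {
    TX_SEND_AS_IS,   /* checksum already filled in, nothing to do */
    TX_MARK_HWCSUM,  /* leave the checksum unfinished, flag it in the header */
    TX_SW_CHECKSUM   /* finish it in software (skb_checksum_help in the kernel) */
};

/* Multicast is never compatible: some receiver may be an old host. */
bool dest_is_compatible(bool is_multicast, bool peer_advertises_hwcsum)
{
    return !is_multicast && peer_advertises_hwcsum;
}

/* csum_partial mirrors ip_summed == CHECKSUM_PARTIAL on the outgoing skb. */
enum tx_action tx_csum_action(bool csum_partial, bool compatible,
                              uint16_t *flags_be)
{
    *flags_be = 0;
    if (!csum_partial)
        return TX_SEND_AS_IS;
    if (compatible) {
        *flags_be = htons(HDR_F_HWCSUM);
        return TX_MARK_HWCSUM;
    }
    return TX_SW_CHECKSUM;
}
```

Only the TX_MARK_HWCSUM case ever puts an unfinished checksum on the
wire, and only towards a unicast peer known to understand the flag.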

One thing I'm missing here is why care about UD csum performance?
ConnectX fixes it, and using RC on older cards with something like
this patch will give best possible speed. Why use an older card and UD
without csumming and then care about speed? Why not make it RC only
where it can be done safely and compatibly? RC already gives a good
uptick over UD, so if you are concerned about speed, you are already
using it - right?

Jason


