[ofw] changes to ib_wc committed to svn

Tzachi Dar tzachid at mellanox.co.il
Thu Jul 24 00:57:57 PDT 2008


Hi Fab,

Xalex has sent a mail to the community describing the proposed patch.
See attached.
As their have been no objections from the community I did the checkin.
Looking at the mail again (see attached) it seems that the mail had a
problem in it's addresses.
 
Bellow there are answers to the different issues that you have raised.

Thanks
Tzachi

> -----Original Message-----
> From: ofw-bounces at lists.openfabrics.org 
> [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Fab Tillier
> Sent: Thursday, July 24, 2008 1:30 AM
> To: Sean Hefty; ofw at lists.openfabrics.org
> Subject: RE: [ofw] changes to ib_wc committed to svn
> 
> > From: Sean Hefty
> > Sent: Wednesday, July 23, 2008 1:51 PM
> >
> > I must have missed seeing the patch to review the following 
> change to
> > ib_wc:
> 
> Me too...
> 
> Besides agreeing with Sean about the vendor_specific field, I 
> have issues with the changes to IPoIB.  I'll put the changes 
> here, as if they had been sent out for review, and comment inline.
> 
> >Index: ulp/ipoib/kernel/ipoib_port.c
> >===================================================================
> >--- ulp/ipoib/kernel/ipoib_port.c      (revision 1429)
> >+++ ulp/ipoib/kernel/ipoib_port.c      (revision 1430)
> >@@ -1954,6 +1954,7 @@
> >
> >               }
> >               /* Successful completion.  Get the receive 
> information. 
> > */
> >+              p_desc->ndis_csum.Value = (ULONG) p_wc->csum_ok;
> 
> If the csum_ok field is vendor specific (last I checked the 
> IB spec didn't define a standard method for reporting IP 
> checksum capabilities) then this assignment needs to check 
> that the underlying HCA hardware supports checksum offload.
> 
> >               cl_perf_start( GetRecvEndpts );
> >               __recv_get_endpts( p_port, p_desc, p_wc, 
> &p_src, &p_dst );
> >               cl_perf_stop( &p_port->p_adapter->perf, 
> GetRecvEndpts ); 
> >@@ -2501,7 +2502,6 @@
> >       NDIS_STATUS                                           
>           status;
> >       uint32_t                                              
>           pkt_filter;
> >       ip_stat_sel_t                                           type;
> >-      NDIS_TCP_IP_CHECKSUM_PACKET_INFO        chksum;
> >       PERF_DECLARE( GetNdisPkt );
> >
> >       IPOIB_ENTER( IPOIB_DBG_RECV );
> >@@ -2582,14 +2582,9 @@
> >               return IB_INSUFFICIENT_RESOURCES;
> >       }
> >
> >-      /* Flag the checksums as having been calculated. */
> >-      chksum.Value = 0;
> >-      chksum.Receive.NdisPacketTcpChecksumSucceeded = TRUE;
> >-      chksum.Receive.NdisPacketUdpChecksumSucceeded = TRUE;
> >-      chksum.Receive.NdisPacketIpChecksumSucceeded = TRUE;
> >-      NDIS_PER_PACKET_INFO_FROM_PACKET( *pp_packet, 
> TcpIpChecksumPacketInfo ) =
> >-              (void*)(uintn_t)chksum.Value;
> >-
> >+      /* Get the checksums directly from packet information. */
> >+      NDIS_PER_PACKET_INFO_FROM_PACKET( *pp_packet, 
> TcpIpChecksumPacketInfo ) =
> >+              (PVOID) (uintn_t) (p_desc->ndis_csum.Value);
> 
> Ok, so here, you assume that p_desc->ndis_csum.Value is 
> properly formatted. That means that the assignment above 
> (p_desc->ndis_csum.Value = (ULONG) p_wc->csum_ok) does the 
> right thing.  However, the MTHCA driver doesn't clear this field.

Agreed, The mthca driver will be changed to give the correct checksum
value where possibale.

> 
> You've also lost the ability to eliminate receive checksum 
> validation.  IB has 32-bit CRC which would cause any 
> corrupted packets to get dropped. If you assume that packets 
> are being put on the wire correctly, then if they are 
> received at all they haven't been corrupted.

This approach has a few problems.
First, Ethernet also has CRC in it and still IP check sum is used on
Ethernet. This means that the users of IP believe that there still
should be an end to end check.
Second, your approach works only inside a small cluster that is all
windows. If the cluster has Linux machines in it they will through the
packets. If there is a gateway to Ethernet, that gateway should be
configured to do the checksum once packets are moved to the Ethernet. I
don't think there is such a gateway.

In any case, I believe that we should have 3 possibilities:
1) Never do checksum in hw. Windows will always do checksum.
2) Never do checksum by windows. The hw will do checksum if it can or
send packets with bad checksum if the card doesn't support that.
3) [default] Do checksum in HW if the hw supports that and do it by
software if HW doesn't support that.

This will give anyone the chance to choose the right option for him.

> 
> >       ipoib_inc_recv_stat( p_port->p_adapter, type, p_desc->len );
> >
> >       IPOIB_EXIT( IPOIB_DBG_RECV );
> >@@ -3618,6 +3613,8 @@
> > {
> >       NDIS_STATUS                     status;
> >       int32_t                         hdr_idx;
> >+      PNDIS_PACKET_EXTENSION                          PktExt;
> >+      PNDIS_TCP_IP_CHECKSUM_PACKET_INFO       
> pChecksumPktInfo; //NDIS 5.1
> >
> >       PERF_DECLARE( SendMgrFilter );
> >
> >@@ -3652,6 +3649,23 @@
> >       p_desc->wr.wr_id = (uintn_t)p_desc->p_pkt;
> >       p_desc->wr.wr_type = WR_SEND;
> >       p_desc->wr.send_opt = IB_SEND_OPT_SIGNALED;
> >+
> >+      PktExt = NDIS_PACKET_EXTENSION_FROM_PACKET(p_desc->p_pkt);
> >+      pChecksumPktInfo = 
> (PNDIS_TCP_IP_CHECKSUM_PACKET_INFO)&PktExt->NdisPacketInfo[Tcp
> IpChecksumPacketInfo];
> >+      if(p_port->p_adapter->params.send_chksum_offload &
> >+              
> (pChecksumPktInfo->Transmit.NdisPacketChecksumV4 || 
> pChecksumPktInfo->Transmit.NdisPacketChecksumV6))
> >+      {
> >+              // Set transimition checksum offloading
> >+              if (pChecksumPktInfo->Transmit.NdisPacketIpChecksum)
> >+              {
> >+                      p_desc->wr.send_opt |= IB_SEND_OPT_TX_IP_CSUM;
> >+              }
> >+              if(pChecksumPktInfo->Transmit.NdisPacketTcpChecksum  )
> >+              {
> >+                      p_desc->wr.send_opt |= 
> IB_SEND_OPT_TX_TCP_UDP_CSUM;
> >+              }
> 
> What happens if these options are set and the underlying HCA 
> doesn't support checksum offload?  Does the WR just get sent 
> or does it fail?

In the existing code, this flags are not checked by other drivers so
things should be fine. This might have to be changed once the 3 modes
suggested above will be available.

> 
> All in all, I think using the vendor specific field for 
> checksum notifications seems lacking proper design, and more 
> importantly, community input. A few issues from a quick glance:
> - it should be clear that it matches the 
> NDIS_TCP_IP_CHECKSUM_PACKET_INFO definition.

Indeed it matches this field exactly. This was done to avoid doing un
needed work on the data path.

> - either checksum offload is vendor specific or it isn't.  If 
> it is, then:
>         1. the IB_SEND_OPT_TX_XXX_CSUM values should be in the
>         IB_SEND_OPT_VEND_MASK range
>         2. The ib_wc_t structure stays unchanged
>         3. IPoIB checks which HCA it runs on and has 
> HCA-specific code to
>         handle checksum offload.
>         4. ib_ca_attr_t is unchanged
> If it isn't then the vendor_specific field should get 
> overloaded with non-vendor specific information and csum_ok 
> should be put into the recv.ud part of the ib_wc_t structure.
> - MTHCA needs to be updated so that it properly clears csum_ok.

What is your suggestion, do you want to put it as vendor specific or
not?
I suggest to have it non vendor specific.

> 
> -Fab
> _______________________________________________
> ofw mailing list
> ofw at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
> 
-------------- next part --------------
An embedded message was scrubbed...
From: "Alex Naslednikov" <xalex at mellanox.co.il>
Subject: [ofw] [patch] Implementation of checksum offload
Date: Mon, 21 Jul 2008 14:47:50 +0300
Size: 14318
URL: <http://lists.openfabrics.org/pipermail/ofw/attachments/20080724/54ac74ab/attachment.mht>


More information about the ofw mailing list