[ofw] changes to ib_wc committed to svn

Tzachi Dar tzachid at mellanox.co.il
Tue Jul 29 05:05:21 PDT 2008


Just to summarize:

The structure ib_ca_attr_t alread contains the field ipoib_csum
(according to your request).
The structure ib_wc_t will look as:

typedef struct _ib_wc
{
        TO_LONG_PTR(struct _ib_wc* ,    p_next) ;
        uint64_t                                wr_id;
        ib_wc_type_t                    wc_type;

        uint32_t                                length;
        ib_wc_status_t                  status;
        uint64_t                                vendor_specific;

        union _wc_recv
        {
                struct _wc_conn
                {
                        ib_recv_opt_t   recv_opt;
                        ib_net32_t              immediate_data;

                }       conn;

                struct _wc_ud
                {
                        ib_recv_opt_t   recv_opt;
                        ib_net32_t              immediate_data;
                        ib_net32_t              remote_qp;
                        uint16_t                pkey_index;
                        ib_net16_t              remote_lid;
                        uint8_t         remote_sl;
                        uint8_t         path_bits;
                        uint8_t           csum_ok;

                }       ud;

                struct _wc_raw_ipv6
                {
                        ib_net16_t              remote_lid;
                        uint8_t                 remote_sl;
                        uint8_t                 path_bits;

                }       raw_ipv6;

                struct _wc_raw_ether
                {
                        ib_net16_t              remote_lid;
                        uint8_t                 remote_sl;
                        uint8_t                 path_bits;
                        ib_net16_t              ether_type;

                }       raw_ether;

        }       recv;

}       ib_wc_t;

As Jan has said,  Hardware send checksum generation has to be
controllable on a per packet basis.
For ipoib connected mode check sum is not being done by the hardware.
We will soon add support for the three types of recieve checksum to the
code.

Thanks
Tzachi

> -----Original Message-----
> From: Fab Tillier [mailto:ftillier at windows.microsoft.com]
> Sent: Thursday, July 24, 2008 7:12 PM
> To: Tzachi Dar; Sean Hefty; ofw at lists.openfabrics.org
> Subject: RE: [ofw] changes to ib_wc committed to svn
>
> Hi Tzachi,
>
> > -----Original Message-----
> > From: Tzachi Dar [mailto:tzachid at mellanox.co.il]
> > Sent: Thursday, July 24, 2008 12:58 AM
> >
> > Hi Fab,
> >
> >> You've also lost the ability to eliminate receive checksum
> >> validation.  IB has 32-bit CRC which would cause any corrupted
> >> packets to get dropped. If you assume that packets are
> being put on
> >> the wire correctly, then if they are received at all they haven't
> >> been corrupted.
> >
> > This approach has a few problems.
> > First, Ethernet also has CRC in it and still IP check sum
> is used on
> > Ethernet. This means that the users of IP believe that there still
> > should be an end to end check.
> > Second, your approach works only inside a small cluster that is all
> > windows. If the cluster has Linux machines in it they will
> through the
> > packets.
>
> No, the checksum is still calculated and set properly by the
> sender.  I agree that if you enable send checksum offload and
> don't perform the calculation then the packets can get
> discarded at the receiver.  However, if you let the OS do
> send checksums, you can pass packets to the stack at the
> receiver as if checksum had been done.  This should interop
> just fine with Linux since the checksums in the packet should
> be correct.
>
> > If there is a gateway to Ethernet, that gateway should be
> configured
> > to do the checksum once packets are moved to the Ethernet. I don't
> > think there is such a gateway.
>
> This only applies if you don't calculate checksums when
> putting a packet on the wire.  Receive checksum calculation
> is independent.
>
> > In any case, I believe that we should have 3 possibilities:
> > 1) Never do checksum in hw. Windows will always do checksum.
> > 2) Never do checksum by windows. The hw will do checksum if
> it can or
> > send packets with bad checksum if the card doesn't support that.
>
> This one I think can be refined - you can enable receive
> checksum 'offload' without packet drop.  Enabling send
> checksum 'offload' without hardware support will put packets
> on the wire with incorrect checksums though, and I think that
> should be discouraged.
>
> > 3) [default] Do checksum in HW if the hw supports that and do it by
> > software if HW doesn't support that.
> >
> > This will give anyone the chance to choose the right option for him.
>
> Agreed, this should be configurable.
>
> >> What happens if these options are set and the underlying
> HCA doesn't
> >> support checksum offload?  Does the WR just get sent or
> does it fail?
> >
> > In the existing code, this flags are not checked by other
> drivers so
> > things should be fine. This might have to be changed once
> the 3 modes
> > suggested above will be available.
>
> It seems to me that checksum offload should be a QP
> attribute, not a work request option.  That way clients can
> do a check to see if the CA attributes indicate the HW
> supports checksum offload, and then create their QP accordingly.
>
> Ignoring send flags is a bit odd - right now none of the
> flags get ignored do they?  If someone requests a send with
> immediate, it's expected that the send will happen with
> immediate.  Sending the data without immediate data would
> affect the wire protocol.
>
> >> All in all, I think using the vendor specific field for checksum
> >> notifications seems lacking proper design, and more importantly,
> >> community input. A few issues from a quick glance:
> >> - it should be clear that it matches the
> >> NDIS_TCP_IP_CHECKSUM_PACKET_INFO definition.
> >
> > Indeed it matches this field exactly. This was done to
> avoid doing un
> > needed work on the data path.
>
> A comment to that extent would be good.
>
> >> - either checksum offload is vendor specific or it isn't. 
> If it is,
> >> then:
> >>         1. the IB_SEND_OPT_TX_XXX_CSUM values should be in the
> >>         IB_SEND_OPT_VEND_MASK range 2. The ib_wc_t structure stays
> >>         unchanged 3. IPoIB checks which HCA it runs on and has
> >>         HCA-specific code to handle checksum offload. 4.
> ib_ca_attr_t
> >>         is unchanged
> >> If it isn't then the vendor_specific field should get
> overloaded with
> >> non-vendor specific information and csum_ok should be put into the
> >> recv.ud part of the ib_wc_t structure.
> >> - MTHCA needs to be updated so that it properly clears csum_ok.
> >  What is your suggestion, do you want to put it as vendor
> specific or
> > not? I suggest to have it non vendor specific.
>
> I think it makes sense as non-vendor specific.  So think I'd
> like to see:
> - checksum offload capabilities are reported in CA attributes.
> - checksum offload as QP creation attribute (do you support
> this for both UD and RC QPs?  That is, can IPoIB-CM take
> advantage of this too?).  If offload is requested but HW
> doesn't support, QP creation should fail.  Clients should be
> expected to check the capabilities of the HW.
> - checksum results are reported in ib_wc_t::recv.ud.csum_ok
> - ib_wc_t::vendor_specific remains as a uint64_t.
>
> -Fab
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/ofw/attachments/20080729/e9a4734a/attachment.html>


More information about the ofw mailing list