[ofw] Re: Completion with bad status: IBV_WC_EXC_RETRY_EXC_ERROR

Smith, Stan stan.smith at intel.com
Wed Nov 21 10:03:26 PST 2007


Fab Tillier wrote:
> Hi Diego,
> 
> It sounds like you're still having a QP configuration issue, and that
> you're not yet at the point where RDMA operations would work.  Have
> you tried send/receive operations to isolate potential rkey issues? 
> I suspect these won't work either.   
> 
> My current theory is an endianness issue somewhere in your
> application.  If you look at the ib_qp_mod_t structure in ib_types.h,
> the structure used as input to the ib_modify_qp function, you will
> see many fields as 'ib_net32'.  These are fields that are treated in
> network order by the drivers, and the 'ib_netxx' types (or simply
> 'netxx') are used to identify which fields are network order vs. host
> order.  

Yes, the RKEY byte-ordering difference was discovered recently during
DAPL socket-cm testing.

Windows expects the RKEY in network byte order.
Linux expects the RKEY in host byte order.

S.
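
For illustration, a minimal sketch of one way to keep the RKEY exchange
byte-order safe in a hand-rolled connection protocol: always put the key
on the wire in network order. The helper names below (put_be32,
rkey_host_to_wire, and so on) are hypothetical and are not part of ib_al
or libibverbs. The sketch assumes the Linux side holds the RKEY as a
host-order value and the Windows side holds it already in network order,
as described above.

```c
#include <stdint.h>
#include <string.h>

/* Write a host-order 32-bit value as 4 big-endian (network order) bytes. */
static void put_be32(uint8_t out[4], uint32_t host_val)
{
    out[0] = (uint8_t)(host_val >> 24);
    out[1] = (uint8_t)(host_val >> 16);
    out[2] = (uint8_t)(host_val >> 8);
    out[3] = (uint8_t)host_val;
}

/* Read 4 big-endian bytes back into a host-order 32-bit value. */
static uint32_t get_be32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}

/* Side that keeps the RKEY in host order (assumed: Linux verbs). */
static void rkey_host_to_wire(uint8_t out[4], uint32_t rkey_host)
{
    put_be32(out, rkey_host);
}

static uint32_t rkey_wire_to_host(const uint8_t in[4])
{
    return get_be32(in);
}

/* Side that keeps the RKEY in network order (assumed: Windows ib_al).
 * The in-memory bytes are already in wire order, so copy them as-is. */
static void rkey_net_to_wire(uint8_t out[4], uint32_t rkey_net)
{
    memcpy(out, &rkey_net, 4);
}

static uint32_t rkey_wire_to_net(const uint8_t in[4])
{
    uint32_t rkey_net;
    memcpy(&rkey_net, in, 4);
    return rkey_net;
}
```

With this convention both peers see the same four bytes on the wire
regardless of CPU endianness, and each side converts only at its own
boundary.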

    
> 
> Here's the list of fields that you need to treat in network order on
> Windows.  I don't know how they're handled in Linux: 
> ->INIT: qkey
> ->RTR: rq_psn, dest_qp, primary_av.dlid
> ->RTS: sq_psn
> 
> It sounds like you have the DLID issue handled correctly, but do you
> set the destination QP and PSNs properly? 
> 
> -Fab
> 
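
As a rough sketch of the list above: the fields Fab names can be
converted in one place before they are handed to the QP modify call. The
types and names here (net32_t, qp_conn_host_t, qp_conn_net_t,
qp_conn_to_net) are made up for illustration and only stand in for the
corresponding ib_qp_mod_t members; the actual layout is in ib_types.h.
to_net32 and to_net16 behave like htonl and htons.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical marker types for values stored in network byte order. */
typedef uint16_t net16_t;
typedef uint32_t net32_t;

/* Host order -> network order, independent of CPU endianness. */
static net32_t to_net32(uint32_t h)
{
    uint8_t b[4] = { (uint8_t)(h >> 24), (uint8_t)(h >> 16),
                     (uint8_t)(h >> 8),  (uint8_t)h };
    net32_t n;
    memcpy(&n, b, sizeof n);
    return n;
}

static net16_t to_net16(uint16_t h)
{
    uint8_t b[2] = { (uint8_t)(h >> 8), (uint8_t)h };
    net16_t n;
    memcpy(&n, b, sizeof n);
    return n;
}

/* Connection parameters as the application holds them (host order). */
typedef struct {
    uint32_t qkey;     /* used for ->INIT */
    uint32_t rq_psn;   /* used for ->RTR  */
    uint32_t dest_qp;  /* used for ->RTR  */
    uint16_t dlid;     /* used for ->RTR (primary_av.dlid) */
    uint32_t sq_psn;   /* used for ->RTS  */
} qp_conn_host_t;

/* The same parameters in network order, ready for the Windows side. */
typedef struct {
    net32_t qkey;
    net32_t rq_psn;
    net32_t dest_qp;
    net16_t dlid;
    net32_t sq_psn;
} qp_conn_net_t;

static qp_conn_net_t qp_conn_to_net(const qp_conn_host_t *h)
{
    qp_conn_net_t n;
    n.qkey    = to_net32(h->qkey);
    n.rq_psn  = to_net32(h->rq_psn);
    n.dest_qp = to_net32(h->dest_qp);
    n.dlid    = to_net16(h->dlid);
    n.sq_psn  = to_net32(h->sq_psn);
    return n;
}
```

Doing the conversion in a single function makes it harder to forget one
field, for example swapping the DLID but not the destination QP or PSNs.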
> -----Original Message-----
> From: Diego Guella [mailto:diego.guella at sircomtech.com]
> Sent: Wednesday, November 21, 2007 6:11 AM
> To: Fab Tillier
> Cc: ofw at lists.openfabrics.org
> Subject: Re: [ofw] Re: Completion with bad status:
> IBV_WC_EXC_RETRY_EXC_ERROR 
> 
> Hi Fab,
> Thanks for your answer.
> Please see my replies inline.
> 
> 
> ----- Original Message -----
> From: "Fab Tillier" <ftillier at windows.microsoft.com>
>> 
> 
>> When you exchange the rkey, are you keeping track of endianness?  The
>> Windows drivers treat rkeys in network order.  I think the Linux
>> stack does this in host order, and this could cause your problems.  I
>> would have expected a different error than a retry exceeded error,
>> though.
> 
> No, I didn't change the endianness of the rkey.
> So I ran a test that swaps the endianness of the rkey, but the error
> is always the same.
> I too would have expected a different error, say an
> IB_WCS_REM_ACCESS_ERR, instead of this retry exceeded error.
> 
>> For the LIDs, you need to swap it on the Windows side, not the Linux
>> side - this could be the cause for the retry error.
> You said (or perhaps Tzachi said) that Windows treats the LID in
> network order.
> So in my "CM" protocol I am exchanging the LID in network order:
> Windows sends (and receives) the LID _as is_, while Linux sends it
> applying ntohs before the send (and applying htons after receive).
> 
>> Is there any reason you don't use the IB CM or RDMA CM for connection
>> establishment?  On the Windows side, you'll need to deal with the
>> RDMA CM private data format yourself, but at least it will take
>> care of the QP settings for you.
> 
> I have taken the example in WinIB 1.3, and slightly modified it
> (removed some parts and added support for RDMA READ/WRITE tests).
> This program works well in a Windows/Windows test.
> Then I ported this program to Linux, modified again to use verbs
> instead of ib_al. It works well in a Linux/Linux test.
> The problem only arises when I try to use a Windows daemon and a Linux
> client, and vice versa.
> 
> I posted the source code of these programs in older emails; I can
> resend it to you if you wish.
> 
> 
> 
> Thanks,
> Diego
> 
> 


