[ofw] Re: Completion with bad status: IBV_WC_EXC_RETRY_EXC_ERROR
Smith, Stan
stan.smith at intel.com
Wed Nov 21 10:03:26 PST 2007
Fab Tillier wrote:
> Hi Diego,
>
> It sounds like you're still having a QP configuration issue, and that
> you're not yet at the point where RDMA operations would work. Have
> you tried send/receive operations to isolate potential rkey issues?
> I suspect these won't work either.
>
> My current theory is an endianness issue somewhere in your
> application. If you look at the ib_qp_mod_t structure in ib_types.h,
> the structure used as input to the ib_modify_qp function, you will
> see many fields as 'ib_net32'. These are fields that are treated in
> network order by the drivers, and the 'ib_netxx' types (or simply
> 'netxx') are used to identify which fields are network order vs. host
> order.
Yes, the RKEY byte-ordering difference came up recently during DAPL
socket-cm testing.
Windows expects the RKEY in network byte order.
Linux expects the RKEY in host byte order.
S.
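
One way to keep the two sides consistent is to fix the wire format to network order and let each side convert (or not) at the edge. Below is a minimal sketch of that idea, not code from DAPL or WinOF: the wire_rkey_msg struct and all four helper names are made up for illustration, and it assumes the ordering rule above (Windows: network order, Linux: host order) holds for your stack versions.

```c
#include <arpa/inet.h>   /* htonl/ntohl; on Windows use <winsock2.h> instead */
#include <stdint.h>

/* Hypothetical wire message for a hand-rolled connection exchange. */
struct wire_rkey_msg {
    uint32_t rkey_net;   /* always network byte order on the wire */
};

/* Linux sender: assumed to hold the rkey in host order, so convert. */
static struct wire_rkey_msg pack_rkey_from_host(uint32_t rkey_host)
{
    struct wire_rkey_msg m;
    m.rkey_net = htonl(rkey_host);
    return m;
}

/* Windows sender: assumed to hold the rkey in network order already,
 * so it is copied unchanged. */
static struct wire_rkey_msg pack_rkey_from_net(uint32_t rkey_net)
{
    struct wire_rkey_msg m;
    m.rkey_net = rkey_net;
    return m;
}

/* Linux receiver: convert back to host order before using the rkey. */
static uint32_t unpack_rkey_to_host(struct wire_rkey_msg m)
{
    return ntohl(m.rkey_net);
}

/* Windows receiver: use the wire value as is. */
static uint32_t unpack_rkey_to_net(struct wire_rkey_msg m)
{
    return m.rkey_net;
}
```

With this split, neither peer needs to know what OS the other one runs; only the local pack/unpack pair differs per platform.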
>
> Here's the list of fields that you need to treat in network order on
> Windows. I don't know how they're handled in Linux:
> ->INIT: qkey
> ->RTR: rq_psn, dest_qp, primary_av.dlid
> ->RTS: sq_psn
>
> It sounds like you have the DLID issue handled correctly, but do you
> set the destination QP and PSNs properly?
>
> -Fab
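
For the field list above, a rough sketch of converting everything in one place before filling in the Windows-side QP modify input. The structs and typedefs here are stand-ins invented for this example; they are not the real ib_qp_mod_t layout from ib_types.h, and how the 24-bit QPN/PSN values sit inside the 32-bit fields should be checked against that header.

```c
#include <arpa/inet.h>   /* htonl/htons; on Windows use <winsock2.h> instead */
#include <stdint.h>

/* Stand-in typedefs that mimic the 'ib_netxx' naming idea (hypothetical). */
typedef uint32_t net32_t;
typedef uint16_t net16_t;

/* Values as the application holds them, in host order. */
struct qp_params_host {
    uint32_t qkey;      /* used for ->INIT */
    uint32_t rq_psn;    /* used for ->RTR  */
    uint32_t dest_qp;   /* used for ->RTR  */
    uint16_t dlid;      /* used for ->RTR (primary_av.dlid) */
    uint32_t sq_psn;    /* used for ->RTS  */
};

/* Same values in network order, ready to copy into the Windows structure. */
struct qp_params_net {
    net32_t qkey;
    net32_t rq_psn;
    net32_t dest_qp;
    net16_t dlid;
    net32_t sq_psn;
};

/* Convert every listed field once, so no call site forgets a swap. */
static struct qp_params_net qp_params_to_net(const struct qp_params_host *h)
{
    struct qp_params_net n;
    n.qkey    = htonl(h->qkey);
    n.rq_psn  = htonl(h->rq_psn);
    n.dest_qp = htonl(h->dest_qp);
    n.dlid    = htons(h->dlid);    /* 16-bit field: htons, not htonl */
    n.sq_psn  = htonl(h->sq_psn);
    return n;
}
```

Using distinct type names for the network-order fields does not make the compiler catch a missed swap in C, but it does make one easier to spot in review.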
>
> -----Original Message-----
> From: Diego Guella [mailto:diego.guella at sircomtech.com]
> Sent: Wednesday, November 21, 2007 6:11 AM
> To: Fab Tillier
> Cc: ofw at lists.openfabrics.org
> Subject: Re: [ofw] Re: Completion with bad status:
> IBV_WC_EXC_RETRY_EXC_ERROR
>
> Hi Fab,
> Thanks for your answer.
> Please see my replies inline.
>
>
> ----- Original Message -----
> From: "Fab Tillier" <ftillier at windows.microsoft.com>
>
>> When you exchange the rkey, are you keeping track of endianness? The
>> Windows drivers treat rkeys in network order. I think the Linux stack
>> does this in host order, and this could cause your problems. I would
>> have expected a different error than a retry exceeded error, though.
>
> No, I didn't change the endianness of the rkey.
> So I ran a test with the rkey's endianness swapped, but the error is
> always the same.
> I too would have expected a different error, say an
> IB_WCS_REM_ACCESS_ERR, instead of this retry exceeded error.
>
>> For the LIDs, you need to swap it on the Windows side, not the Linux
>> side - this could be the cause for the retry error.
> You said (or perhaps Tzachi said) that Windows treats the LID in
> network order.
> So in my "CM" protocol I am exchanging the LID in network order:
> Windows sends (and receives) the LID _as is_, while Linux sends it
> applying ntohs before the send (and applying htons after receive).
>
>> Is there any reason you don't use the IB CM or RDMA CM for connection
>> establishment? On the Windows side, you'll need to deal with the
>> RDMA CM private data format yourself, but at least it will take
>> care of the QP settings for you.
>
> I have taken the example in WinIB 1.3, and slightly modified it
> (removed some parts and added support for RDMA READ/WRITE tests).
> This program works well in a Windows/Windows test.
> Then I ported this program to Linux, modifying it again to use verbs
> instead of ib_al. It works well in a Linux/Linux test.
> The problem only arises when I try to use a Windows daemon and a Linux
> client, and vice versa.
>
> I posted the source code of these programs in older emails; I can
> resend it to you if you wish.
>
>
>
> Thanks,
> Diego
>
>
> _______________________________________________
> ofw mailing list
> ofw at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw