[ofw] RE: Completion with bad status: IBV_WC_RETRY_EXC_ERROR

Dotan Barak dotanb at mellanox.co.il
Thu Nov 15 06:35:12 PST 2007


I don't have a Linux <--> Windows setup available. Can you please send
me the following data:
* LID values (from ibv_devinfo/vstat)
* the values that you set in AV.LID (the value that you set, after
fixing/handling endianness)
* the QP number on each side
* the remote QP number that you set on each side
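
Once those values are collected, the cross-check can be done mechanically. The following is only a rough sketch: the struct and function names (side_info, count_mismatches, check_pair) are made up for illustration and are not part of the verbs API, and it assumes all values have already been converted to host order.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record of the values requested above for one side.
 * For illustration only; this is not part of any verbs API. */
struct side_info {
    uint16_t port_lid;    /* LID reported by ibv_devinfo/vstat, host order */
    uint16_t av_lid;      /* LID written to the AV, converted back to host order */
    uint32_t qp_num;      /* QP number created on this side */
    uint32_t remote_qpn;  /* remote QP number programmed on this side */
};

/* Count the values on side `a` that do not match what side `b` reports. */
static int count_mismatches(const struct side_info *a, const struct side_info *b)
{
    int bad = 0;
    if (a->av_lid != b->port_lid) {
        printf("AV LID 0x%04x does not match peer port LID 0x%04x\n",
               (unsigned)a->av_lid, (unsigned)b->port_lid);
        bad++;
    }
    if (a->remote_qpn != b->qp_num) {
        printf("remote QPN 0x%06x does not match peer QPN 0x%06x\n",
               (unsigned)a->remote_qpn, (unsigned)b->qp_num);
        bad++;
    }
    return bad;
}

/* Check both directions; 0 means the four values are consistent. */
static int check_pair(const struct side_info *x, const struct side_info *y)
{
    return count_mismatches(x, y) + count_mismatches(y, x);
}
```

A byte-swapped LID (for example 0x0400 where the peer reports 0x0004) shows up here as a single AV LID mismatch.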
 
 
thanks
Dotan
 


________________________________

	From: Diego Guella [mailto:diego.guella at sircomtech.com] 
	Sent: Wednesday, November 14, 2007 12:11 PM
	To: Dotan Barak; Tzachi Dar; Fab Tillier
	Cc: ofw at lists.openfabrics.org
	Subject: Re: Completion with bad status: IBV_WC_RETRY_EXC_ERROR
	
	
	[resend 1 because of attachment problems]
	 
	 
	Hi Dotan,
	 
	About the failure in RDMA Read, do you mean the garbage data
that's printed out?
	I don't mind about that for now, that's expected because the
buffer isn't initialized in the client.
	 
	 
	I already thought about endianness of the LID, and tried to
change endianness but with no success.
	 
	But now I modified the code in both Windows and Linux to send
out the LID in network order.
	The Windows side does nothing (it sends and receives the LID as is).
	The Linux side changes the endianness of the LID before sending,
and changes the endianness of the received LID.
	(Attached new sources and executables)
	 
	 
	But the problem is still there.
	 
	 
	Do you have any other ideas?
	 
	 
	Thanks,
	Diego
	

		----- Original Message ----- 
		From: Dotan Barak <mailto:dotanb at mellanox.co.il>  
		To: Diego Guella <mailto:diego.guella at sircomtech.com>  ;
Tzachi Dar <mailto:tzachid at mellanox.co.il>  ; Fab Tillier
<mailto:ftillier at windows.microsoft.com>  
		Cc: ofw at lists.openfabrics.org 
		Sent: Wednesday, November 14, 2007 9:57 AM
		Subject: RE: Completion with bad status:
IBV_WC_RETRY_EXC_ERROR

		Hi.
		 
		I checked your code in Linux and you have a failure in
RDMA Read...
		 
		 But for your problem: in Windows the LID of the port is
in network order and in Linux the LID of the port is in host order
		(order means endianness).
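
As a minimal sketch of the Linux-side conversion, assuming the LID is exchanged as a 16-bit value in network order: the helper names lid_to_wire and lid_from_wire are hypothetical, while htons/ntohs are the standard socket byte-order functions.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Linux side: the port LID is in host order, so convert it to
 * network order before putting it on the wire. */
static uint16_t lid_to_wire(uint16_t host_lid)
{
    return htons(host_lid);
}

/* Linux side: a LID received in network order must be converted
 * back to host order before it is used as a destination LID. */
static uint16_t lid_from_wire(uint16_t wire_lid)
{
    return ntohs(wire_lid);
}
```

If the Windows side already holds the LID in network order, as described above, it would send and receive the value unchanged.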
		 
		 
		Fixing this should solve the problem ...
		 
		 
		Dotan
		 


________________________________

			From: Diego Guella
[mailto:diego.guella at sircomtech.com] 
			Sent: Wednesday, November 14, 2007 10:29 AM
			To: Dotan Barak; Tzachi Dar; Fab Tillier
			Cc: ofw at lists.openfabrics.org
			Subject: Re: Completion with bad status:
IBV_WC_RETRY_EXC_ERROR
			
			
			Hi Dotan,
			 
			I apologize for that silly mistake, obviously
WR_SEND is different from IBV_WR_SEND, and the same was for
WR_RDMA_READ, etc. etc...
			 
			So, I removed <iba/ib_types.h> from the
includes, to make sure I don't use them.
			 
			Now the Linux program works with send/recv and
RDMA to itself (daemon/client on the same machine), but I still get the
same error when I try communication between Windows/Linux.
			The error applies to SEND, RDMA_WRITE, and
RDMA_READ, whether the daemon runs on Linux or on Windows.
			 
			Attached are the new sources (note that Windows
sources aren't changed).
			 
			 
			 
			Thanks,
			Diego
			 
			 
			 

				----- Original Message ----- 
				From: Dotan Barak
<mailto:dotanb at mellanox.co.il>  
				To: Diego Guella
<mailto:diego.guella at sircomtech.com>  ; Tzachi Dar
<mailto:tzachid at mellanox.co.il>  ; Fab Tillier
<mailto:ftillier at windows.microsoft.com>  
				Cc: ofw at lists.openfabrics.org 
				Sent: Tuesday, November 13, 2007 3:13 PM
				Subject: RE: Completion with bad status:
IBV_WC_RETRY_EXC_ERROR

				o.k., I passed that and the code
compilation stage ....
				 
				 
				Just a small note:
				In modify_qp, qp_access_flags is
only for supported remote operations, so IBV_ACCESS_LOCAL_WRITE should
be removed.
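
A small sketch of that note, under stated assumptions: the MIRROR_* constants below are local copies of the IBV_ACCESS_* bit values from <infiniband/verbs.h> so that the fragment compiles without the verbs headers, and qp_remote_access_only is a hypothetical helper. Real code would use the IBV_ACCESS_* names directly.

```c
/* Local mirrors of the IBV_ACCESS_* bits from <infiniband/verbs.h>.
 * Assumption: copied here only so the sketch builds standalone. */
enum {
    MIRROR_ACCESS_LOCAL_WRITE   = 1,
    MIRROR_ACCESS_REMOTE_WRITE  = 1 << 1,
    MIRROR_ACCESS_REMOTE_READ   = 1 << 2,
    MIRROR_ACCESS_REMOTE_ATOMIC = 1 << 3
};

/* Keep only the remote-operation bits, which are the ones that
 * qp_access_flags is meant to carry; LOCAL_WRITE is dropped. */
static int qp_remote_access_only(int flags)
{
    return flags & (MIRROR_ACCESS_REMOTE_WRITE |
                    MIRROR_ACCESS_REMOTE_READ |
                    MIRROR_ACCESS_REMOTE_ATOMIC);
}
```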
				 
				 
				I understand what the root cause of the
problem is: you took code from Windows and moved only PART of it
to Linux.
				 
				For example: WR_SEND is defined in
iba/ib_types.h and has the value 1 (which is RDMA_WRITE_WITH_IMM in Linux),
				so you actually did an RDMA write with
an rkey and remote address that have undefined values.
				(This is only one example of the
corruption that happens during the test execution because of this issue.)
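
To make the collision concrete, here is a standalone sketch. The MIRROR_IBV_* enumerators copy the first values of enum ibv_wr_opcode from <infiniband/verbs.h>, and MIRROR_WR_SEND takes the value 1 stated above for WR_SEND; the MIRROR_ prefix and the helper linux_opcode_name are illustrative names only, used so the fragment builds without either header.

```c
/* Local mirror of the first values of enum ibv_wr_opcode
 * (Linux <infiniband/verbs.h>); prefixed to avoid name clashes. */
enum mirror_ibv_wr_opcode {
    MIRROR_IBV_WR_RDMA_WRITE          = 0,
    MIRROR_IBV_WR_RDMA_WRITE_WITH_IMM = 1,
    MIRROR_IBV_WR_SEND                = 2,
    MIRROR_IBV_WR_SEND_WITH_IMM       = 3,
    MIRROR_IBV_WR_RDMA_READ           = 4
};

/* Value of WR_SEND from iba/ib_types.h, as stated in this thread. */
enum { MIRROR_WR_SEND = 1 };

/* Name the Linux verbs layer would associate with a raw opcode value. */
const char *linux_opcode_name(int opcode)
{
    switch (opcode) {
    case MIRROR_IBV_WR_RDMA_WRITE:          return "IBV_WR_RDMA_WRITE";
    case MIRROR_IBV_WR_RDMA_WRITE_WITH_IMM: return "IBV_WR_RDMA_WRITE_WITH_IMM";
    case MIRROR_IBV_WR_SEND:                return "IBV_WR_SEND";
    case MIRROR_IBV_WR_SEND_WITH_IMM:       return "IBV_WR_SEND_WITH_IMM";
    case MIRROR_IBV_WR_RDMA_READ:           return "IBV_WR_RDMA_READ";
    default:                                return "unknown";
    }
}
```

Passing MIRROR_WR_SEND to linux_opcode_name selects the RDMA write with immediate case, not the send case, which is the mix-up described above.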
				 
				The code passed compilation because you
included iba/ib_types.h (in types.h) in Linux too.
				This file should not be included in
Linux (unless you really need it ...)
				 
				All of the structures, functions,
enumerations in Linux verbs start with IBV_ or ibv_.
				 
				 
				This should fix the test problems (I
hope ...)
				 
				 
				thanks
				Dotan
				 
