[openib-general] What can be the reason for VAPI_WR_FLUSH_ERR when sending from gen2 to gen1

Jack Morgenstein jackm at dev.mellanox.co.il
Sun Sep 17 08:01:37 PDT 2006


On Friday 15 September 2006 12:37, Bub Thomas wrote:
> I'm now in the situation that I have a gen2 client connected to a gen1
> server via CM.
> Unfortunately the first IBV_WR_SEND causes a:
> (syndrome=0xf9=VAPI_WR_FLUSH_ERR , opcode=6=VAPI_CQE_RQ_SEND_DATA)
> error in the receive completion queue of the server.
> 
It's not at all clear what the error could be.  The Gen1 and Gen2 stacks
are implemented with totally different code.

Some suggestions (together with dotan at mellanox.co.il):
1. Connect a CATC/analyzer to the wire and capture the detailed traffic.
   Examine the CM messages exchanged to see that they are correct.

2. It sounds like the server QP is already in an error state when the first
   send is performed. Query the QP on the server side before performing the
   first server send to verify that it is in the RTS state.

3. Examine /var/log/messages on the server side to see if there were any
   CQ overruns (which would cause the associated QP to enter an error state).

PLEASE NOTE:  The opcode field is NOT valid in a completion-with-error. The only
        valid fields upon error completion are the status and work-request-id
        fields (all other completion fields are undefined).  Therefore, you
        cannot depend on the opcode value!  You need to save work request
        information keyed to the work request ID to know what really happened.
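
A minimal sketch of that bookkeeping, in plain C with no verbs calls, so it
applies equally to a Gen1 or a Gen2 application.  All names here (wr_table,
wr_table_save, wr_table_complete, WR_TABLE_SIZE, the wr_kind values) are
made up for illustration and are not part of VAPI or libibverbs.  It assumes
a small fixed number of outstanding work requests and that the table starts
out zeroed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; none of this is a VAPI or libibverbs API. */
enum wr_kind {
    WR_KIND_NONE = 0,   /* no record found for this work request ID */
    WR_KIND_SEND,
    WR_KIND_RECV,
    WR_KIND_RDMA_WRITE,
    WR_KIND_RDMA_READ
};

#define WR_TABLE_SIZE 64    /* assumed maximum outstanding work requests */

struct wr_record {
    int          in_use;
    uint64_t     wr_id;     /* the ID handed to the post call */
    enum wr_kind kind;      /* what was actually posted under that ID */
};

struct wr_table {
    struct wr_record slot[WR_TABLE_SIZE];   /* must start zero-initialized */
};

/* Record a work request just before posting it.
 * Returns 0 on success, -1 if the table is full. */
static int wr_table_save(struct wr_table *t, uint64_t wr_id, enum wr_kind kind)
{
    assert(t != NULL);
    for (size_t i = 0; i < WR_TABLE_SIZE; i++) {
        if (!t->slot[i].in_use) {
            t->slot[i].in_use = 1;
            t->slot[i].wr_id  = wr_id;
            t->slot[i].kind   = kind;
            return 0;
        }
    }
    return -1;
}

/* Look up and release the record when a completion arrives.
 * Uses only the work request ID, so it is safe for error completions.
 * Returns the kind that was posted, or WR_KIND_NONE if the ID is unknown. */
static enum wr_kind wr_table_complete(struct wr_table *t, uint64_t wr_id)
{
    assert(t != NULL);
    for (size_t i = 0; i < WR_TABLE_SIZE; i++) {
        if (t->slot[i].in_use && t->slot[i].wr_id == wr_id) {
            t->slot[i].in_use = 0;
            return t->slot[i].kind;
        }
    }
    return WR_KIND_NONE;
}
```

Call wr_table_save() with the same ID you put in the work request, and call
wr_table_complete() with the ID from the completion.  On an error completion,
trust the returned kind rather than the opcode in the completion entry.  A
linear scan is enough for a handful of outstanding requests; a larger queue
would probably want the ID to index the table directly.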

Another question:  is the send you are talking about on the client side?
Is it a regular send, or an RDMA operation?

- Jack




