[ofa-general] Bogus Receive Completions

Roman Kononov kononov at dls.net
Wed Jan 23 16:31:03 PST 2008


On 2008-01-23 17:32, Roland Dreier wrote:
> I'd be curious to run it.  It can't hurt to have the test...

This is similar to my previous program. The difference is that this one
creates multiple sets of SQ+RQ+CQ (up to 10, in test_create()) in struct
conn_t, which share a single Completion Channel in struct ctx_t. Every conn_t
has a ring of receive buffers, a ring of send buffers, a send sequence number
and a receive sequence number. Every time a buffer is sent, just before the
ibv_post_send() call, the send sequence number is placed into the buffer,
imm_data and wr_id. Upon Send Completion, wr_id and the sent buffer must
contain the expected send sequence number. Every time a receive buffer is
posted, just before the ibv_post_recv() call, the receive sequence number is
placed into wr_id. Upon Receive Completion, wr_id, the received buffer and
imm_data must contain the expected receive sequence number. These two
requirements are sometimes violated. In my setup, the assertions at lines 303
(receiver) and 287 (sender) fail.
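For readers without the attachment, here is a minimal sketch of the sequence-number bookkeeping described above. It is not the original program: it does not call libibverbs at all, and struct conn, struct send_req, struct completion, stamp_send(), stamp_recv(), check_send(), check_recv() and simulate() are hypothetical stand-ins that only model the wr_id, imm_data and buffer fields involved. It also assumes separate "next to stamp" and "next expected" counters per direction. In real verbs code imm_data is carried in network byte order, so a real program would convert with htonl()/ntohl().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 8   /* hypothetical ring depth */
#define BUF_SIZE  64  /* hypothetical buffer size */

static_assert(BUF_SIZE >= sizeof(uint64_t), "buffer must hold a sequence number");

/* Hypothetical stand-ins for the ibv_send_wr / ibv_wc fields the test uses;
 * these are NOT the libibverbs types. */
struct send_req   { uint64_t wr_id; uint32_t imm_data; const uint8_t *buf; };
struct completion { uint64_t wr_id; uint32_t imm_data; };

struct conn {
    uint8_t  send_ring[RING_SIZE][BUF_SIZE];
    uint8_t  recv_ring[RING_SIZE][BUF_SIZE];
    uint64_t send_seq;     /* stamped on the next posted send */
    uint64_t send_expect;  /* expected in the next send completion */
    uint64_t recv_seq;     /* stamped on the next posted receive */
    uint64_t recv_expect;  /* expected in the next receive completion */
};

/* Before the (real) ibv_post_send(): stamp buffer, imm_data and wr_id. */
static struct send_req stamp_send(struct conn *c)
{
    uint64_t seq = c->send_seq++;
    uint8_t *buf = c->send_ring[seq % RING_SIZE];
    memcpy(buf, &seq, sizeof seq);
    struct send_req r = { seq, (uint32_t)seq, buf };
    return r;
}

/* Before the (real) ibv_post_recv(): only wr_id is stamped. */
static uint64_t stamp_recv(struct conn *c)
{
    return c->recv_seq++;
}

/* Send completion: wr_id and the sent buffer must match. 1 = valid. */
static int check_send(struct conn *c, const struct completion *wc)
{
    uint64_t expect = c->send_expect++;
    uint64_t in_buf;
    memcpy(&in_buf, c->send_ring[expect % RING_SIZE], sizeof in_buf);
    return wc->wr_id == expect && in_buf == expect;
}

/* Receive completion: wr_id, received buffer and imm_data must match. */
static int check_recv(struct conn *c, const struct completion *wc)
{
    uint64_t expect = c->recv_expect++;
    uint64_t in_buf;
    memcpy(&in_buf, c->recv_ring[expect % RING_SIZE], sizeof in_buf);
    return wc->wr_id == expect && in_buf == expect &&
           wc->imm_data == (uint32_t)expect;
}

/* Loopback model: the "wire" copies the payload into the posted receive
 * slot and completions arrive in order. Returns iterations that passed. */
static int simulate(struct conn *tx, struct conn *rx, int iters)
{
    int ok = 0;
    for (int i = 0; i < iters; i++) {
        uint64_t rid = stamp_recv(rx);
        struct send_req s = stamp_send(tx);
        memcpy(rx->recv_ring[rid % RING_SIZE], s.buf, BUF_SIZE);
        struct completion swc = { s.wr_id, 0 };
        struct completion rwc = { rid, s.imm_data };
        int s_ok = check_send(tx, &swc);  /* evaluate both so counters advance */
        int r_ok = check_recv(rx, &rwc);
        ok += s_ok && r_ok;
    }
    return ok;
}
```

With correct, in-order completions every iteration passes; a completion whose wr_id or payload does not match the next expected number (the kind of bogus completion reported here) makes check_send() or check_recv() return 0.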

The program has two threads. The first one reads the completion channel,
validates the Send and Receive Completions, and issues ibv_post_recv() and
ibv_post_send(). The second one only issues ibv_post_send().
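Because both threads post sends on the same connection, the order in which send sequence numbers are assigned has to match the order in which sends enter the send queue, or the in-order completion check would fail on its own. The post does not say how the original program serializes this. Here is a minimal sketch assuming a per-connection pthread mutex held across both the stamp and the post. struct send_side, post_one_send(), poster() and run_two_posters() are hypothetical, and appending to the posted[] array stands in for the real ibv_post_send() call.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define POSTS_PER_THREAD 10000
#define LOG_CAP (2 * POSTS_PER_THREAD)

/* Hypothetical per-connection send state shared by both threads. */
struct send_side {
    pthread_mutex_t lock;
    uint64_t next_seq;         /* sequence number for the next send */
    uint64_t posted[LOG_CAP];  /* stand-in for the send queue's order */
    size_t   n_posted;
};

/* Stamp and "post" under one lock, so sequence order == queue order.
 * Real code would fill buffer/wr_id/imm_data and call ibv_post_send() here. */
static void post_one_send(struct send_side *s)
{
    pthread_mutex_lock(&s->lock);
    uint64_t seq = s->next_seq++;
    s->posted[s->n_posted++] = seq;  /* hypothetical stand-in for the post */
    pthread_mutex_unlock(&s->lock);
}

static void *poster(void *arg)
{
    struct send_side *s = arg;
    for (int i = 0; i < POSTS_PER_THREAD; i++)
        post_one_send(s);
    return NULL;
}

/* Runs two posting threads; returns 1 if the queue holds 0, 1, 2, ... */
static int run_two_posters(struct send_side *s)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, poster, s);
    pthread_create(&t2, NULL, poster, s);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    for (size_t i = 0; i < s->n_posted; i++)
        if (s->posted[i] != i)
            return 0;
    return s->n_posted == LOG_CAP;
}
```

If the lock were taken only around the counter increment and released before the post, the two threads could post in a different order from the one they were stamped in; holding it across both steps rules that out in this model.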

The program creates two QPs. Increasing the number of QPs does not seem to
increase the probability of failure.

The program prints lines of the form
<time: iteration_#_for_QP_#0 iteration_#_for_QP_#1>.

In my setup, the failure seems to occur sooner if I run two pairs of the
program at the same time.

Roman


