[libfabric-users] fi_sockets remote CQ data ordering

Ezra Kissel ezkissel at indiana.edu
Tue Feb 23 17:50:59 PST 2016


On 2/23/2016 4:02 PM, Jose, Jithin wrote:
> Hi Ezra,
>
> Do multiple senders imply multiple EPs or multiple threads?
>
> If TX operations are posted to the same EP, then the completions should be in order. It might be a bug if the completions are out-of-order. It would be great if we could get a reproducer for this.
>

https://github.com/disprosium8/fi_thread_test/blob/master/fi_rma_thread_rcq.c#L124

I modified my previous rcq test to check that the iteration number
encoded in the immediate data matches the expected one.  The test is
currently set to start with 2 sending nodes and 4 KB writes.

Ranks  Senders  Bytes       Sync PUT
2      2        4096        fi_rma_thread_rcq: Expecting iter 1809, got 1810 from RCQ
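
For anyone who doesn't want to dig through the test source, the check
boils down to roughly the sketch below.  This is not the code from
fi_rma_thread_rcq.c: the helper names, the bit layout of the immediate
data (sender rank in the high 32 bits, iteration in the low 32), and the
MAX_SENDERS limit are assumptions made up for illustration.  A plain
array stands in for the immediate-data values that would be read off
the remote CQ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SENDERS 8   /* hypothetical limit, only for this sketch */

/* Assumed layout: sender rank in the high 32 bits, iteration in the low 32. */
uint64_t pack_imm(uint32_t sender, uint32_t iter)
{
    return ((uint64_t)sender << 32) | iter;
}

/*
 * Walk immediate-data values in the order they came off the remote CQ.
 * Returns the index of the first entry whose iteration is not the next
 * expected one for its sender, or -1 if every sender's stream is in order.
 * Iterations are assumed to start at 0 for each sender.
 */
long first_out_of_order(const uint64_t *imm, size_t n)
{
    uint32_t expected[MAX_SENDERS] = {0};

    assert(n == 0 || imm != NULL);
    for (size_t i = 0; i < n; i++) {
        uint32_t sender = (uint32_t)(imm[i] >> 32);
        uint32_t iter = (uint32_t)(imm[i] & 0xffffffffu);

        /* Unknown sender or a skipped/repeated iteration counts as a failure. */
        if (sender >= MAX_SENDERS || iter != expected[sender])
            return (long)i;
        expected[sender]++;
    }
    return -1;
}
```

Completions from different senders may interleave freely here; only the
order within one sender's stream is checked, which is the property the
failure above violates.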


So far I have only been able to reproduce this on 2-core ARMv7 nodes,
specifically these boards: https://www.parallella.org/board/

I'm queuing up a bunch of runs on x86_64 systems to see if it ever fails
there, but so far the identical test has been completing without
problems.  I've tried constraining the core count for each process,
etc.  Any chance it's a 32-bit issue?  I unfortunately don't have any
i386 nodes handy at the moment.

I'm testing against 
https://github.com/jithinjosepkl/libfabric/tree/pr/sockets

- ezra


