[libfabric-users] fi_sockets remote CQ data ordering
Ezra Kissel
ezkissel at indiana.edu
Tue Feb 23 17:50:59 PST 2016
On 2/23/2016 4:02 PM, Jose, Jithin wrote:
> Hi Ezra,
>
> Does multiple senders imply multiple EPs or multiple threads?
>
> If TX operations are posted to the same EP, then the completions should be in order. It might be a bug if the completions are out-of-order. It would be great if we could get a reproducer for this.
>
https://github.com/disprosium8/fi_thread_test/blob/master/fi_rma_thread_rcq.c#L124
I modified my previous RCQ test to check each remote completion against
the expected iteration number, which is encoded in the immediate data.
The test is currently set to start with 2 sending nodes and 4 KB
(4096-byte) writes.
Ranks  Senders  Bytes  Sync PUT
2      2        4096   fi_rma_thread_rcq: Expecting iter 1809, got 1810 from RCQ
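For anyone who wants to picture the check without opening the test, here is a minimal sketch of the idea, not the code from the linked file. The layout of the immediate data (sender rank in the upper 32 bits, iteration in the lower 32 bits) and the names encode_imm, check_imm and MAX_SENDERS are assumptions made for illustration. Expected iterations are tracked per sender, because completions from different senders may legitimately interleave on the target.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_SENDERS 2  /* hypothetical: matches the 2-sender run above */

/* Assumed layout: sender rank in the upper 32 bits, iteration in the lower 32. */
static uint64_t encode_imm(uint32_t sender, uint32_t iter)
{
    return ((uint64_t)sender << 32) | (uint64_t)iter;
}

/*
 * Check one remote completion's immediate data against the next iteration
 * expected from that sender. Returns 0 and advances the per-sender counter
 * when the completion is in order; returns -1 (counter unchanged) otherwise.
 */
static int check_imm(uint64_t data, uint32_t expected[MAX_SENDERS])
{
    assert(expected != NULL);

    uint32_t sender = (uint32_t)(data >> 32);
    uint32_t iter = (uint32_t)(data & UINT64_C(0xffffffff));

    if (sender >= MAX_SENDERS) {
        fprintf(stderr, "unknown sender %" PRIu32 "\n", sender);
        return -1;
    }
    if (iter != expected[sender]) {
        fprintf(stderr, "Expecting iter %" PRIu32 ", got %" PRIu32
                " from RCQ (sender %" PRIu32 ")\n",
                expected[sender], iter, sender);
        return -1;
    }
    expected[sender]++;
    return 0;
}
```

In an actual libfabric test, the value passed to check_imm would presumably come from the data field of a struct fi_cq_data_entry returned by fi_cq_read on the target's CQ. That wiring is left out here so the sketch compiles without libfabric.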
So far I have only been able to reproduce this on 2-core ARMv7 nodes,
specifically these boards: https://www.parallella.org/board/
I'm queuing up a bunch of runs on x86_64 systems to see if it ever fails
there, but so far the identical test has completed without errors. I've
also tried constraining the core count for each process, etc. Any chance
it's a 32-bit issue? Unfortunately, I don't have any i386 nodes handy at
the moment.
I'm testing against
https://github.com/jithinjosepkl/libfabric/tree/pr/sockets
- ezra