[libfabric-users] fi_sockets remote CQ data ordering
Jose, Jithin
jithin.jose at intel.com
Fri Feb 26 15:51:46 PST 2016
Hi Ezra,
I have the patch to fix this in pr/sockets. The benchmark seems to be running fine on the ARM cluster.
Can you verify as well?
Thanks,
- Jithin
-----Original Message-----
From: Ezra Kissel <ezkissel at indiana.edu>
Date: Tuesday, February 23, 2016 at 5:50 PM
To: Jithin Jose <jithin.jose at intel.com>, "libfabric-users at lists.openfabrics.org" <libfabric-users at lists.openfabrics.org>
Subject: Re: [libfabric-users] fi_sockets remote CQ data ordering
>On 2/23/2016 4:02 PM, Jose, Jithin wrote:
>> Hi Ezra,
>>
>> Do multiple senders imply multiple EPs or multiple threads?
>>
>> If TX operations are posted to the same EP, then the completions should be in order. It might be a bug if the completions are out-of-order. It would be great if we could get a reproducer for this.
>>
>
>https://github.com/disprosium8/fi_thread_test/blob/master/fi_rma_thread_rcq.c#L124
>
>I modified my previous rcq test to check the expected iteration encoded
>in the immediate data. The test is currently set to start with 2
>sending nodes, 4k writes.
>
>Ranks  Senders  Bytes  Sync PUT
>2      2        4096   fi_rma_thread_rcq: Expecting iter 1809, got 1810 from RCQ
>
>
>So far I have only been able to reproduce this on 2-core ARMv7 nodes,
>specifically these boards: https://www.parallella.org/board/
>
>I'm queuing up a bunch of runs on x86_64 systems to see if it ever fails
>there, but so far that identical test has been completing without
>complications. I've tried constraining the core count for each process,
>etc. Any chance it's a 32-bit issue? Unfortunately, I don't have any i386
>nodes handy at the moment.
>
>I'm testing against
>https://github.com/jithinjosepkl/libfabric/tree/pr/sockets
>
>- ezra
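A minimal sketch of the receive-side ordering check described in the quoted message (the expected iteration encoded in the remote CQ immediate data). It is not the code from fi_rma_thread_rcq.c: the struct `rcq_entry`, the helpers `encode_imm` and `check_rcq_entry`, and the layout of sender id in the upper 32 bits and iteration in the lower 32 bits are all hypothetical. Only the idea that the 64-bit immediate value arrives in the `data` field of libfabric's `struct fi_cq_data_entry` is taken from the real API, and that struct is modeled here rather than included.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a remote CQ completion. In libfabric the
 * 64-bit immediate value is delivered in fi_cq_data_entry.data; only
 * that field is modeled here. */
struct rcq_entry {
    uint64_t data;
};

/* Hypothetical encoding: sender id in the upper 32 bits,
 * iteration number in the lower 32 bits. */
static uint64_t encode_imm(uint32_t sender, uint32_t iter)
{
    return ((uint64_t)sender << 32) | iter;
}

/* Checks that a completion carries the next expected iteration for its
 * sender. Returns 0 and advances that sender's expectation on success;
 * returns -1 (leaving expectations unchanged) on an unknown sender or
 * an out-of-order iteration. */
static int check_rcq_entry(const struct rcq_entry *e, uint32_t *expected,
                           size_t nsenders)
{
    assert(e != NULL && expected != NULL);

    uint32_t sender = (uint32_t)(e->data >> 32);
    uint32_t iter = (uint32_t)(e->data & 0xffffffffu);

    if (sender >= nsenders)
        return -1;
    if (iter != expected[sender]) {
        fprintf(stderr, "sender %" PRIu32 ": expecting iter %" PRIu32
                ", got %" PRIu32 " from RCQ\n",
                sender, expected[sender], iter);
        return -1;
    }
    expected[sender]++;
    return 0;
}
```

Tracking the expectation per sender reflects the point made earlier in the thread that completions are expected to arrive in order only for operations posted to the same EP. A single global counter would also flag legitimate interleaving between two senders.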