[libfabric-users] fi_sockets remote CQ data ordering

Jose, Jithin jithin.jose at intel.com
Fri Feb 26 15:51:46 PST 2016


Hi Ezra,

I have a patch for this in pr/sockets. The benchmark seems to run fine on the ARM cluster.
Can you verify as well?

Thanks,
- Jithin

-----Original Message-----
From: Ezra Kissel <ezkissel at indiana.edu>
Date: Tuesday, February 23, 2016 at 5:50 PM
To: Jithin Jose <jithin.jose at intel.com>, "libfabric-users at lists.openfabrics.org" <libfabric-users at lists.openfabrics.org>
Subject: Re: [libfabric-users] fi_sockets remote CQ data ordering

>On 2/23/2016 4:02 PM, Jose, Jithin wrote:
>> Hi Ezra,
>>
>> Does "multiple senders" mean multiple EPs or multiple threads?
>>
>> If TX operations are posted to the same EP, the completions should arrive in order. Out-of-order completions might indicate a bug, so it would be great if we could get a reproducer.
>>
>
>https://github.com/disprosium8/fi_thread_test/blob/master/fi_rma_thread_rcq.c#L124
>
>I modified my previous rcq test to check the expected iteration encoded 
>in the immediate data.  The test is currently set to start with 2 
>sending nodes, 4k writes.
>
>Ranks  Senders  Bytes       Sync PUT
>2      2        4096        fi_rma_thread_rcq: Expecting iter 1809, got 1810 from RCQ
>
>
>So far I have only been able to reproduce this on 2-core ARMv7 nodes, 
>specifically these boards: https://www.parallella.org/board/
>
>I'm queuing up a bunch of runs on x86_64 systems to see if it ever fails 
>there, but so far the identical test has been completing without 
>complications. I've tried constraining the core count for each process, 
>etc.  Any chance it's a 32-bit issue?  Unfortunately, I don't have any 
>i386 nodes handy at the moment.
>
>I'm testing against 
>https://github.com/jithinjosepkl/libfabric/tree/pr/sockets
>
>- ezra
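
For readers following the thread, the ordering check Ezra describes (an iteration number carried in the immediate data and compared on the receiving side) might look roughly like the sketch below. This is not the code from fi_rma_thread_rcq.c. The helper names pack_imm and first_out_of_order, the MAX_SENDERS limit, and the choice to put a sender id in the upper 32 bits are all assumptions made for illustration. In a real libfabric program the 64-bit values would presumably come from the data field of struct fi_cq_data_entry for completions flagged FI_REMOTE_CQ_DATA.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SENDERS 16 /* hypothetical limit for this sketch */

/* Hypothetical encoding: sender id in the upper 32 bits,
 * iteration counter in the lower 32 bits of the immediate data. */
uint64_t pack_imm(uint32_t sender, uint32_t iter)
{
    return ((uint64_t)sender << 32) | (uint64_t)iter;
}

/* Walk the immediate-data values in the order they were read from the
 * remote CQ. Each sender is expected to count 0, 1, 2, ... with no gaps.
 * Returns the index of the first value that breaks that order (or names
 * a sender id outside the table), or -1 if everything is in order. */
long first_out_of_order(const uint64_t *imm, size_t n)
{
    uint32_t expected[MAX_SENDERS] = {0};

    for (size_t i = 0; i < n; i++) {
        uint32_t sender = (uint32_t)(imm[i] >> 32);
        uint32_t iter = (uint32_t)(imm[i] & 0xffffffffu);

        if (sender >= MAX_SENDERS || iter != expected[sender])
            return (long)i;
        expected[sender]++;
    }
    return -1;
}
```

Keeping one counter per sender means interleaving between different senders is accepted, and only reordering within a single sender's stream is reported, which matches the guarantee discussed above for operations posted to the same EP.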

