[libfabric-users] sockets provider, number of outstanding reads

Jose, Jithin jithin.jose at intel.com
Wed Feb 10 09:02:43 PST 2016


Yes, it is possible to get -FI_EAGAIN for any TX/RX operation, especially when there are a lot of outstanding TX/RX operations. The app needs to handle this case.
However, the fi_tx/rx_size_left() interfaces can be used to query the number of operations that may be posted to the given endpoint without the operation returning -FI_EAGAIN.

I noticed that you were using the size_left() APIs in the benchmark. Are you still getting -FI_EAGAIN?
After taking a look at the provider code that checks tx/rx_size_left, I think we need to update it to consider the available space in the CQ as well.

- Jithin





-----Original Message-----
From: Ezra Kissel <ezkissel at indiana.edu>
Date: Wednesday, February 10, 2016 at 7:47 AM
To: Jithin Jose <jithin.jose at intel.com>, "Hefty, Sean" <sean.hefty at intel.com>, "libfabric-users at lists.openfabrics.org" <libfabric-users at lists.openfabrics.org>
Subject: Re: [libfabric-users] sockets provider, number of outstanding reads

>Hi Jithin,
>
>Thanks for responding so quickly.  Your patch does seem to eliminate the 
>hang but now I'll occasionally have the RMA read/write return -EAGAIN 
>when flooding.  Is that expected or is there some way to ensure that N 
>number of operations can be handled?
>
>- ezra
>
>On 2/10/2016 1:09 AM, Jose, Jithin wrote:
>> Hi Ezra,
>>
>> Thanks for the reproducer, it helped in identifying the issue.
>> There was an issue in the provider that prevented the RMA-ack control messages from being handled when flooded with RMA ops.
>>
>> This pull-request should fix the issue:
>> https://github.com/ofiwg/libfabric/pull/1734
>>
>>
>> I tried the benchmark with this fix, and it worked fine for me. Can you verify as well?
>>
>> Thanks,
>> - Jithin
>>
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Ezra Kissel <ezkissel at indiana.edu>
>> Date: Tuesday, February 9, 2016 at 2:20 PM
>> To: Jithin Jose <jithin.jose at intel.com>, "Hefty, Sean" <sean.hefty at intel.com>, "libfabric-users at lists.openfabrics.org" <libfabric-users at lists.openfabrics.org>
>> Subject: Re: [libfabric-users] sockets provider, number of outstanding reads
>>
>>> I was able to extract enough code from my larger library to reproduce
>>> the issue I'm facing, and it's hopefully concise enough for others to
>>> debug.
>>>
>>> https://github.com/disprosium8/fi_thread_test
>>>
>>> You'll need MPI and a couple nodes to test on.
>>>
>>> There's no apparent problem when there's a single sender (see the outer for
>>> loop in the test), but introducing multiple senders, all reading the same
>>> buffer from each peer, seems to create some issue in the provider.  I'll
>>> see if I can narrow down what's actually causing it to spin.
>>>
>>> - ezra
>>>
>>> On 2/9/2016 1:04 PM, Jose, Jithin wrote:
>>>> There is no specific limit on the number of outstanding reads in the sockets provider.
>>>> A reproducer will definitely help here.
>>>>
>>>> - Jithin
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: "Hefty, Sean" <sean.hefty at intel.com>
>>>> Date: Tuesday, February 9, 2016 at 9:54 AM
>>>> To: Ezra Kissel <ezkissel at indiana.edu>, "libfabric-users at lists.openfabrics.org" <libfabric-users at lists.openfabrics.org>, Jithin Jose <jithin.jose at intel.com>
>>>> Subject: RE: [libfabric-users] sockets provider, number of outstanding reads
>>>>
>>>>> Copying Jithin, who maintains the sockets provider, in case he's not subscribed to this list.  This sounds like it may be exposing a bug in the provider.  Is the source code of the test available somewhere?
>>>>>
>>>>>
>>>>>> I recently ran into this behavior using the sockets provider in a
>>>>>> threaded environment, using FI_RMA.  My example attempts to post N
>>>>>> number of fi_reads to get a registered source buffer into a registered
>>>>>> destination buffer.  This example is a simple benchmark so I'm not
>>>>>> worried about the integrity of the destination buffer.  Let's say N=256:
>>>>>> for each message size, up to a max of 1MB, I'll do 256 fi_reads,
>>>>>> waiting for N completions after each iteration.  The base
>>>>>> pointers for each of the N reads remain the same.
>>>>>>
>>>>>> This works great up to around 256K size reads.  After that point I will
>>>>>> no longer get local completions and the example hangs.  There are two
>>>>>> things I can do to get it to work reliably:
>>>>>>
>>>>>> 1) Simply increase the destination buffer size to msg_size*N; issuing
>>>>>> reads at the same base offset still works.  Incrementing the
>>>>>> destination offset by msg_size for each N also works in this case, as
>>>>>> expected.
>>>>>>
>>>>>> 2) Limit the posting rate of reads for larger sizes.  I check to make
>>>>>> sure I don't exceed fi_tx_size_left(), but I have to throttle the rate
>>>>>> significantly to ensure progress; I think it was around 32MB "in-flight"
>>>>>> before it was stable on my system.
>>>>>>
>>>>>> I don't see this behavior with psm provider, or with fi_writes.
>>>>>> The question is: is there some read limit I need to observe that I'm
>>>>>> missing, or is there some issue with lots of outstanding socket reads?
>>>>>

