[libfabric-users] sockets provider, number of outstanding reads

Ezra Kissel ezkissel at indiana.edu
Wed Feb 10 09:43:21 PST 2016


Right, that was my understanding of tx_size_left().  My test spins until 
size_left() returns a nonzero value before issuing the TX operation, and 
it still sees -FI_EAGAIN on occasion.
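
A minimal sketch of that pattern, assuming an endpoint "ep" and a TX
completion queue "txcq" opened with FI_CQ_FORMAT_CONTEXT (all names here
are placeholders, not the benchmark's actual variables):

    #include <rdma/fi_endpoint.h>
    #include <rdma/fi_eq.h>
    #include <rdma/fi_rma.h>

    static ssize_t post_read(struct fid_ep *ep, struct fid_cq *txcq,
                             void *buf, size_t len, void *desc,
                             fi_addr_t peer, uint64_t raddr, uint64_t rkey,
                             void *ctx)
    {
        ssize_t rc;

        /* spin until the provider reports room on the TX context */
        while (fi_tx_size_left(ep) <= 0)
            ;

        /* post the RMA read; on -FI_EAGAIN, poll the CQ to drive
         * progress (a real app would record any completion reaped
         * here rather than discard it), then retry */
        do {
            rc = fi_read(ep, buf, len, desc, peer, raddr, rkey, ctx);
            if (rc == -FI_EAGAIN) {
                struct fi_cq_entry comp;
                fi_cq_read(txcq, &comp, 1);
            }
        } while (rc == -FI_EAGAIN);

        return rc;
    }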

On 2/10/2016 12:02 PM, Jose, Jithin wrote:
> Yes, it is possible to get -FI_EAGAIN for any TX/RX operation, especially when there are a lot of outstanding TX/RX operations. The app needs to handle this case.
> However, the fi_tx/rx_size_left() interfaces can be used to get the number of operations that may be posted to the given endpoint without that operation returning -FI_EAGAIN.
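>
> For illustration, gating posts on size_left() might look like this (a
> sketch only; the pending-work helpers are hypothetical):
>
>     ssize_t slots = fi_tx_size_left(ep);   /* lower bound on postable ops */
>     while (slots-- > 0 && have_pending_reads())
>         post_next_read();                  /* hypothetical helpers */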
>
> I had noticed that you were using the size_left() APIs in the benchmark. Are you still getting -FI_EAGAIN?
> After taking a look at the provider code that checks tx/rx size_left, I think we need to update it to consider the available space in the CQ as well.
>
> - Jithin
>
>
>
>
>
> -----Original Message-----
> From: Ezra Kissel <ezkissel at indiana.edu>
> Date: Wednesday, February 10, 2016 at 7:47 AM
> To: Jithin Jose <jithin.jose at intel.com>, "Hefty, Sean" <sean.hefty at intel.com>, "libfabric-users at lists.openfabrics.org" <libfabric-users at lists.openfabrics.org>
> Subject: Re: [libfabric-users] sockets provider, number of outstanding reads
>
>> Hi Jithin,
>>
>> Thanks for responding so quickly.  Your patch does seem to eliminate the
>> hang, but now I'll occasionally have the RMA read/write return -FI_EAGAIN
>> when flooding.  Is that expected, or is there some way to ensure that N
>> operations can be handled?
>>
>> - ezra
>>
>> On 2/10/2016 1:09 AM, Jose, Jithin wrote:
>>> Hi Ezra,
>>>
>>> Thanks for the reproducer, it helped in identifying the issue.
>>> There was an issue in the provider that prevented the RMA-ack control messages from being handled when flooded with RMA ops.
>>>
>>> This pull-request should fix the issue:
>>> https://github.com/ofiwg/libfabric/pull/1734
>>>
>>>
>>> I tried the benchmark with this fix, and it worked fine for me. Can you verify as well?
>>>
>>> Thanks,
>>> - Jithin
>>>
>>>
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Ezra Kissel <ezkissel at indiana.edu>
>>> Date: Tuesday, February 9, 2016 at 2:20 PM
>>> To: Jithin Jose <jithin.jose at intel.com>, "Hefty, Sean" <sean.hefty at intel.com>, "libfabric-users at lists.openfabrics.org" <libfabric-users at lists.openfabrics.org>
>>> Subject: Re: [libfabric-users] sockets provider, number of outstanding reads
>>>
>>>> I was able to extract enough code from my larger library to reproduce
>>>> the issue I'm facing, and it's hopefully concise enough for others to
>>>> debug.
>>>>
>>>> https://github.com/disprosium8/fi_thread_test
>>>>
>>>> You'll need MPI and a couple of nodes to test on.
>>>>
>>>> There's no apparent problem when there's a single sender (see outer for
>>>> loop in test), but introducing multiple senders, all reading the same
>>>> buffer from each peer, seems to create some issue in the provider.  I'll
>>>> see if I can narrow down what's actually causing it to spin.
>>>>
>>>> - ezra
>>>>
>>>> On 2/9/2016 1:04 PM, Jose, Jithin wrote:
>>>>> There is no specific limit on the number of outstanding reads in the sockets provider.
>>>>> A reproducer will definitely help here.
>>>>>
>>>>> - Jithin
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: "Hefty, Sean" <sean.hefty at intel.com>
>>>>> Date: Tuesday, February 9, 2016 at 9:54 AM
>>>>> To: Ezra Kissel <ezkissel at indiana.edu>, "libfabric-users at lists.openfabrics.org" <libfabric-users at lists.openfabrics.org>, Jithin Jose <jithin.jose at intel.com>
>>>>> Subject: RE: [libfabric-users] sockets provider, number of outstanding reads
>>>>>
>>>>>> Copying Jithin, who maintains the sockets provider, in case he's not subscribed to this list.  This sounds like it may be exposing a bug in the provider.  Is the source code of the test available somewhere?
>>>>>>
>>>>>>
>>>>>>> I recently ran into this behavior using the sockets provider in a
>>>>>>> threaded environment, using FI_RMA.  My example attempts to post N
>>>>>>> fi_reads to get a registered source buffer into a registered
>>>>>>> destination buffer.  This example is a simple benchmark, so I'm not
>>>>>>> worried about the integrity of the destination buffer.  Let's say
>>>>>>> N=256: I'll do 256 fi_reads for each message size, up to a max of 1MB,
>>>>>>> waiting for N completions after each iteration.  The base pointers for
>>>>>>> each of the N reads remain the same.
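>>>>>>>
>>>>>>> In outline (a sketch; the names and the CQ helper are placeholders):
>>>>>>>
>>>>>>>     size_t i, done;
>>>>>>>     for (i = 0; i < N; i++)       /* N reads, same base pointers */
>>>>>>>         fi_read(ep, dst, msg_size, desc, peer, raddr, rkey, &ctx[i]);
>>>>>>>     for (done = 0; done < N; )    /* wait for all N completions */
>>>>>>>         done += poll_cq(txcq);    /* hypothetical CQ drain helper */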
>>>>>>>
>>>>>>> This works great up to read sizes of around 256K.  After that point I
>>>>>>> no longer get local completions and the example hangs.  There are two
>>>>>>> things I can do to get it to work reliably:
>>>>>>>
>>>>>>> 1) Simply increasing the destination buffer size to msg_size*N, while
>>>>>>> still issuing reads at the same base offset, works.  Incrementing the
>>>>>>> destination offset by msg_size for each of the N reads also works in
>>>>>>> this case, as expected.
>>>>>>>
>>>>>>> 2) Limiting the posting rate of reads for larger sizes.  I check to
>>>>>>> make sure I don't exceed fi_tx_size_left(), but I have to throttle the
>>>>>>> rate significantly to ensure progress; it was around 32MB "in-flight"
>>>>>>> before it was stable on my system (roughly the sketch below).
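>>>>>>>
>>>>>>> Something like this (a sketch; the cap and the drain helper are
>>>>>>> hypothetical):
>>>>>>>
>>>>>>>     while (inflight + msg_size > MAX_INFLIGHT)    /* e.g. a 32MB cap */
>>>>>>>         inflight -= msg_size * drain_cq(txcq);    /* reaps completions */
>>>>>>>     fi_read(ep, dst, msg_size, desc, peer, raddr, rkey, ctx);
>>>>>>>     inflight += msg_size;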
>>>>>>>
>>>>>>> I don't see this behavior with the psm provider, or with fi_writes.
>>>>>>> The question is: is there some read limit I need to observe that I'm
>>>>>>> missing, or is there some issue with lots of outstanding socket reads?
>>>>>>


