[libfabric-users] sockets provider, number of outstanding reads

Tue Feb 9 10:04:35 PST 2016

The sockets provider imposes no specific limit on the number of outstanding reads.
A reproducer would definitely help here.

- Jithin

-----Original Message-----
From: "Hefty, Sean" <sean.hefty at intel.com>
Date: Tuesday, February 9, 2016 at 9:54 AM
To: Ezra Kissel <ezkissel at indiana.edu>, "libfabric-users at lists.openfabrics.org" <libfabric-users at lists.openfabrics.org>, Jithin Jose <jithin.jose at intel.com>
Subject: RE: [libfabric-users] sockets provider, number of outstanding reads

>Copying Jithin, who maintains the sockets provider, in case he's not subscribed to this list.  This sounds like it may be exposing a bug in the provider.  Is the source code of the test available somewhere?
>
>
>> I recently ran into this behavior using the sockets provider in a
>> threaded environment, using FI_RMA.  My example attempts to post N
>> fi_reads to get a registered source buffer into a registered
>> destination buffer.  This example is a simple benchmark so I'm not
>> worried about the integrity of the destination buffer.  Let's say N=256,
>> so I'll do 256 fi_reads for each message size up to a max of 1MB,
>> waiting for N completions after each iteration.  The base pointers are
>> the same for all N reads.
>> 
>> This works great for reads up to around 256K in size.  Beyond that
>> point I no longer get local completions and the example hangs.  There
>> are two things I can do to get it to work reliably:
>> 
>> 1) Simply increasing the destination buffer size to msg_size*N, while
>> still issuing reads at the same base offset, works.  Incrementing the
>> destination offset by msg_size for each N also works in this case, as
>> expected.
>> 
>> 2) Limit the posting rate of reads for larger sizes.  I already check
>> that I don't exceed fi_tx_size_left(), but I have to throttle the rate
>> significantly to ensure progress.  I think I had to get down to around
>> 32MB "in-flight" before it was stable on my system.
>> 
>> I don't see this behavior with the psm provider, or with fi_writes.
>> The question is: is there some read limit I need to observe that I'm
>> missing, or is there some issue with lots of outstanding socket reads?
>
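
The throttling described in workaround 2 of the quoted message can be sketched as a byte budget on outstanding reads. This is only a minimal simulation, not code from the original test: sim_post_read() and sim_reap_one() are hypothetical stand-ins for fi_read() and for polling the completion queue (for example with fi_cq_read()), and the 32MB budget is the figure reported for one system, not a documented limit of the sockets provider. A real program would presumably also keep the fi_tx_size_left() check mentioned in the message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the fabric: counts reads that have been
 * posted but whose completions have not been reaped yet. */
struct sim_fabric {
    size_t outstanding;
};

/* Stand-in for fi_read(): a real program would post one RMA read here. */
static void sim_post_read(struct sim_fabric *f)
{
    f->outstanding++;
}

/* Stand-in for polling the completion queue: reaps at most one
 * completion and returns how many were reaped (0 or 1). */
static size_t sim_reap_one(struct sim_fabric *f)
{
    if (f->outstanding == 0)
        return 0;
    f->outstanding--;
    return 1;
}

/* Post n_reads reads of msg_size bytes each while keeping at most
 * `budget` bytes in flight. Returns the peak number of in-flight bytes. */
size_t run_throttled(size_t n_reads, size_t msg_size, size_t budget)
{
    struct sim_fabric f = {0};
    size_t inflight = 0, peak = 0, completed = 0;

    /* A single read must fit in the budget, or nothing could be posted. */
    assert(msg_size > 0 && msg_size <= budget);

    for (size_t i = 0; i < n_reads; i++) {
        /* Reap completions until the next read fits in the budget. */
        while (inflight + msg_size > budget) {
            if (sim_reap_one(&f)) {
                inflight -= msg_size;
                completed++;
            }
        }
        sim_post_read(&f);
        inflight += msg_size;
        if (inflight > peak)
            peak = inflight;
    }

    /* Wait for the remaining completions of this iteration. */
    while (completed < n_reads) {
        if (sim_reap_one(&f)) {
            inflight -= msg_size;
            completed++;
        }
    }
    return peak;
}
```

With N=256 and a 32MB budget, 64K reads are never throttled (all 16MB are posted at once), while 256K and 1MB reads are held to 32MB in flight, i.e. 128 and 32 outstanding reads respectively.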