[libfabric-users] sockets provider, number of outstanding reads

Hefty, Sean sean.hefty at intel.com
Tue Feb 9 09:54:03 PST 2016


Copying Jithin, who maintains the sockets provider, in case he's not subscribed to this list.  This sounds like it may be exposing a bug in the provider.  Is the source code of the test available somewhere?


> I recently ran into this behavior using the sockets provider in a
> threaded environment, using FI_RMA.  My example posts N fi_reads to
> copy a registered source buffer into a registered destination buffer.
> It is a simple benchmark, so I'm not worried about the integrity of
> the destination buffer.  Let's say N=256: for each message size up to
> a max of 1MB, I'll do 256 fi_reads, waiting for N completions after
> each iteration.  The base pointers stay the same for each of the N
> reads.
> 
> This works great up to around 256K size reads.  After that point I will
> no longer get local completions and the example hangs.  There are two
> things I can do to get it to work reliably:
> 
> 1) Simply increasing the destination buffer size to msg_size*N, while
> still issuing reads at the same base offset, works.  Incrementing the
> destination offset by msg_size for each of the N reads also works in
> this case, as expected.
> 
> 2) Limit the posting rate of reads for larger sizes.  I check to make
> sure I don't exceed fi_tx_size_left(), but I have to throttle the rate
> significantly to ensure progress.  I think it was around 32MB
> "in-flight" before it was stable on my system.
> 
> I don't see this behavior with the psm provider, or with fi_writes.
> The question is: is there some read limit I need to observe that I'm
> missing, or is there some issue with lots of outstanding socket reads?
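
For reference, the throttling described in workaround 2 could look roughly like the sketch below. It is not the original test code, which has not been posted. It caps the number of bytes in flight rather than the number of operations. The names sim_ep, post_read(), reap_completions() and run_throttled() are hypothetical stand-ins that only simulate an endpoint; in a real benchmark post_read() would presumably wrap fi_read() and reap_completions() would wrap fi_cq_read(). The 32MB budget used in the usage note is the figure quoted above, not a documented provider limit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simulated endpoint; none of these names are libfabric API. */
struct sim_ep {
    size_t posted;      /* reads issued so far */
    size_t completed;   /* reads whose completion has been reaped */
    size_t peak_bytes;  /* highest in-flight byte count observed */
};

/* Stand-in for a wrapper around fi_read(): records one posted read. */
static void post_read(struct sim_ep *ep)
{
    ep->posted++;
}

/* Stand-in for a wrapper around fi_cq_read(): retires at most one
 * outstanding read per call and returns the number reaped. */
static size_t reap_completions(struct sim_ep *ep)
{
    if (ep->completed < ep->posted) {
        ep->completed++;
        return 1;
    }
    return 0;
}

/* Issue n reads of msg_size bytes each while keeping the total number
 * of bytes in flight at or below budget.  Returns the number of reads
 * completed. */
static size_t run_throttled(struct sim_ep *ep, size_t n,
                            size_t msg_size, size_t budget)
{
    size_t in_flight = 0;   /* bytes posted but not yet completed */
    size_t posted = 0;
    size_t done = 0;

    /* A single read must fit in the budget, or no progress is possible. */
    assert(msg_size > 0 && msg_size <= budget);

    while (done < n) {
        /* Post as many reads as the byte budget allows. */
        while (posted < n && in_flight + msg_size <= budget) {
            post_read(ep);
            posted++;
            in_flight += msg_size;
            if (in_flight > ep->peak_bytes)
                ep->peak_bytes = in_flight;
        }

        /* Drain completions to free budget for further reads. */
        size_t got = reap_completions(ep);
        done += got;
        in_flight -= got * msg_size;
    }
    return done;
}
```

Calling run_throttled(&ep, 256, 1024 * 1024, 32 * 1024 * 1024) on a zero-initialized sim_ep corresponds to the N=256, 1MB case with a 32MB cap. A byte budget is used here because the hang was reported to depend on message size, which a check on queue depth alone would not capture.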



