[libfabric-users] sockets provider, number of outstanding reads
Ezra Kissel
ezkissel at indiana.edu
Tue Feb 9 08:44:39 PST 2016
Hi,
I recently ran into the following behavior using the sockets provider in a
threaded environment with FI_RMA. My example posts N fi_reads to copy a
registered source buffer into a registered destination buffer. It is a
simple benchmark, so I'm not worried about the integrity of the
destination buffer. With N=256, for each message size up to a maximum of
1MB I issue 256 fi_reads and then wait for N completions before moving
on to the next size. The base pointers are the same for all N reads.
The benchmark works well up to reads of around 256K. Beyond that size I
no longer get local completions and the example hangs. I have found two
ways to make it work reliably:
1) Simply increasing the destination buffer size to msg_size*N works,
even when every read is still issued at the same base offset.
Incrementing the destination offset by msg_size for each of the N reads
also works in this case, as expected.
2) Limit the posting rate of reads for larger sizes. I already check that
I don't exceed fi_tx_size_left(), but I have to throttle the rate
significantly to ensure progress; I think it was around 32MB "in-flight"
before it was stable on my system.
I don't see the hang with the psm provider, or with fi_writes.
So my question is: is there some limit on outstanding reads that I need
to observe and am missing, or is there an issue with having lots of
outstanding reads in the sockets provider?
thanks,
- ezra