[libfabric-users] Queue size question

Tue Mar 23 16:31:25 PDT 2021

This sort of situation is where I miss something like the (now deprecated) 'sockets' provider. As documented at least, it supported the entire interface. Of course that would be work to maintain (and I'm not volunteering, heh), but a thing doesn't have to perform well to be beneficial.

greg

________________________________
From: Libfabric-users <libfabric-users-bounces at lists.openfabrics.org> on behalf of Hefty, Sean <sean.hefty at intel.com>
Sent: Tuesday, March 23, 2021 4:52 PM
To: Biddiscombe, John A. <john.biddiscombe at cscs.ch>; libfabric-users at lists.openfabrics.org <libfabric-users at lists.openfabrics.org>
Subject: Re: [libfabric-users] Queue size question

That feature is not supported.  It would need to be emulated in SW, which wouldn’t provide any real benefit.

From: Biddiscombe, John A. <john.biddiscombe at cscs.ch>
Sent: Tuesday, March 23, 2021 2:02 PM
To: Hefty, Sean <sean.hefty at intel.com>; libfabric-users at lists.openfabrics.org
Subject: Re: Queue size question

I thought I'd experiment with scalable endpoints as an alternative to the thread-local endpoints, but I'm getting ENOSYS from the tcp;ofi_rxm setup.

Is that something I can work around with different flags, or is it something that just isn't supported? (The feature matrix doesn't mention it.)

Thanks

JB

int fi_no_scalable_ep(struct fid_domain *domain, struct fi_info *info,
                      struct fid_ep **sep, void *context)
{
        return -FI_ENOSYS;
}
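
Since the stub above simply returns -FI_ENOSYS, one way an application might cope is to probe for scalable endpoints once and fall back to per-thread endpoints when they are missing. The following is only a sketch under assumptions: `choose_ep_mode`, `open_fn`, and the two stubs are hypothetical names, the callbacks stand in for whatever code actually opens the endpoints, and plain `ENOSYS` from errno.h is used so the sketch builds without libfabric (libfabric's FI_ENOSYS is, as far as I know, defined to the same value).

```c
#include <errno.h>

/* Hypothetical callback type: wraps the application's own endpoint setup
 * and returns 0 on success or a negative errno-style code on failure. */
typedef int (*open_fn)(void);

enum ep_mode { EP_MODE_SCALABLE, EP_MODE_PER_THREAD, EP_MODE_FAILED };

/* Try scalable endpoints first; fall back to per-thread endpoints only
 * when the provider reports the feature as unimplemented (-ENOSYS).
 * Any other error is treated as a real failure, not as "unsupported". */
static enum ep_mode choose_ep_mode(open_fn open_scalable, open_fn open_regular)
{
    int ret = open_scalable();

    if (ret == 0)
        return EP_MODE_SCALABLE;
    if (ret != -ENOSYS)
        return EP_MODE_FAILED;
    return open_regular() == 0 ? EP_MODE_PER_THREAD : EP_MODE_FAILED;
}

/* Stubs for illustration: one behaves like a provider without scalable
 * endpoint support, the other like a successful open. */
static int stub_unsupported(void) { return -ENOSYS; }
static int stub_ok(void) { return 0; }
```

With these stubs, `choose_ep_mode(stub_unsupported, stub_ok)` selects the per-thread mode, which matches the tcp;ofi_rxm behaviour described above.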

________________________________

From: Hefty, Sean <sean.hefty at intel.com>
Sent: 23 March 2021 00:08:38
To: Biddiscombe, John A.; libfabric-users at lists.openfabrics.org
Subject: RE: Queue size question

> I have a test that seems to run fine on tcp;ofi_rxm - though this test is two ranks on
> the same laptop, so it isn't really a very good test - however, I can throw anything at
> it and it seems to reliably complete.
>
> On GNI, I get lockups and after much head scratching, I am wondering what the
> significance of the tx/rx attribute size may be.
>
> On tcp/ofi_rxm the size reports as "size: 65536" and I can have 16 threads each sending
> up to 128 messages in flight on one thread per endpoint, and a single receive endpoint
> handling all receives - possibly 16*128 messages with posted receives = 2048.
>
> When I run on GNI, using two nodes, each reports tx/rx attr "size: 500" - and I find
> that when many messages are in flight, things can lock up because some posted sends are
> never received. This seems to happen even when I drop down to 16 threads with 8 in
> flight messages which ought to be 128 at a time - and I would have suspected that a
> size of 500 (cq size limitation?) would handle this.
>
> Question 1 - what is the tx/rx attr size really telling me?

Unfortunately, this is provider dependent, and there's very little that can be done to define it more crisply without forcing an implementation.  In some cases it's related to a HW queue size.  I suspect that may be the case with gni.  However, a HW queue size doesn't necessarily equal the number of operations that can be queued.  For example, it's possible for a send that requires 2 iovecs to consume 2 entries in the queue.  But each operation consuming 1 entry is usually a safe assumption.

Someone familiar with gni will need to chime in on how it maps to their HW.
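
To make the iovec remark concrete, here is a small sketch of a conservative budget calculation. It assumes the worst case of one queue entry per iovec, which is an illustration of the point above and not a documented rule for any provider; `max_ops_in_flight` is a hypothetical helper, not a libfabric call.

```c
#include <stddef.h>

/* Hypothetical helper (not part of libfabric).
 *
 * queue_size:      the size a provider reports, e.g. the tx or rx attr size
 * max_iov_per_op:  the largest iovec count the application uses per operation
 *
 * Returns a conservative cap on operations in flight, ASSUMING each iovec
 * may consume one queue entry. */
static size_t max_ops_in_flight(size_t queue_size, size_t max_iov_per_op)
{
    if (max_iov_per_op == 0)
        max_iov_per_op = 1;   /* treat an operation with no iovec as one entry */
    return queue_size / max_iov_per_op;
}
```

For the reported gni size of 500, this gives 500 operations when every operation uses a single iovec and 250 when operations may use two.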

> Question 2 - if I post more than the allowed receives or sends, should I not receive
> some kind of error? (I have enabled resource management, so I might expect a retry code
> when I attempt the send/recv)

Yes, you should see -FI_EAGAIN when trying to post more operations than the queues support.  There are checks like this in some providers -- I think rxm, verbs, and tcp all do, and rxm is actually forgiving about it by allowing queues to overflow.  (Because it's easy to swamp a receiver, even with a reasonably well-written app.)
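
A common way to handle that return code is to drive progress and try again. The sketch below shows the pattern against a fake queue so that it builds without libfabric. Everything in it is an assumption for illustration: `post_with_retry`, `fake_queue`, `fake_post`, and `fake_progress` are hypothetical names, the callbacks stand in for the application's own wrappers around posting and CQ polling, and plain `EAGAIN` from errno.h is used in place of FI_EAGAIN (which, as far as I know, libfabric defines to the same value).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback types: post() wraps a send/recv post and returns 0
 * or a negative errno-style code; progress() wraps reading completions. */
typedef int  (*post_fn)(void *ctx);
typedef void (*progress_fn)(void *ctx);

/* Retry a post while the queue is full, reaping completions between
 * attempts. Gives up after max_attempts and returns the last error. */
static int post_with_retry(post_fn post, progress_fn progress,
                           void *ctx, size_t max_attempts)
{
    int ret = -EAGAIN;

    for (size_t i = 0; i < max_attempts; i++) {
        ret = post(ctx);
        if (ret != -EAGAIN)
            return ret;       /* success, or an error retrying won't fix */
        progress(ctx);        /* free queue entries before the next try */
    }
    return ret;
}

/* Fake fixed-capacity queue, standing in for a provider's tx queue. */
struct fake_queue { size_t capacity; size_t in_use; };

static int fake_post(void *ctx)
{
    struct fake_queue *q = ctx;

    assert(q != NULL);
    if (q->in_use >= q->capacity)
        return -EAGAIN;       /* queue full: caller should retry later */
    q->in_use++;
    return 0;
}

static void fake_progress(void *ctx)
{
    struct fake_queue *q = ctx;

    assert(q != NULL);
    if (q->in_use > 0)
        q->in_use--;          /* pretend one operation completed */
}
```

Bounding the number of attempts is a design choice of this sketch: it turns a lockup like the one described for gni into a visible error instead of an endless spin.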

> Ideally, I'd like to throttle the number of messages in flight according to what the
> hardware reports its capabilities - which vars should I use from the fi_info to do
> this?

Resource management is the correct setting.  Manually limiting your application to the tx/rx sizes and sizing the CQ appropriately should have done the trick.
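
As a rough sketch of what that manual limiting could look like, the code below keeps a per-endpoint credit counter capped at the reported queue size and computes a CQ size large enough for every outstanding operation bound to it. The names `send_credits`, `credits_try_acquire`, `credits_release`, and `cq_size_for` are hypothetical, and the counter assumes one thread per endpoint that both posts and polls, as in the setup described in the question, so it uses no atomics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-endpoint credit counter. limit would typically be set from the
 * reported tx (or rx) attr size, or something smaller. */
struct send_credits { size_t limit; size_t in_flight; };

/* Take one credit before posting; returns false when the cap is reached,
 * in which case the caller should poll the CQ instead of posting. */
static bool credits_try_acquire(struct send_credits *c)
{
    assert(c != NULL);
    if (c->in_flight >= c->limit)
        return false;
    c->in_flight++;
    return true;
}

/* Return credits after reaping `completions` entries from the CQ.
 * Clamped so a miscount cannot underflow the counter. */
static void credits_release(struct send_credits *c, size_t completions)
{
    assert(c != NULL);
    c->in_flight -= (completions < c->in_flight) ? completions : c->in_flight;
}

/* CQ size that can hold a completion for every outstanding operation,
 * ASSUMING all listed endpoints are bound to the same CQ. */
static size_t cq_size_for(size_t tx_size, size_t rx_size,
                          size_t num_tx_eps, size_t num_rx_eps)
{
    return tx_size * num_tx_eps + rx_size * num_rx_eps;
}
```

For 16 sending endpoints and one receiving endpoint that each report a size of 500, `cq_size_for(500, 500, 16, 1)` yields 8500 entries if they all share one CQ.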

It sounds like this is a problem likely restricted to gni.

- Sean