[ofiwg] Thinking about writing a provider...

Heinz, Michael William michael.william.heinz at cornelisnetworks.com
Sun Apr 18 10:20:06 PDT 2021


So, I have a question related to Hugo’s question… other than https://ofiwg.github.io/libfabric/ and fi_provider, are there documents I can read on how to be a better maintainer of the PSM2 provider?

-----
Michael Heinz
michael.william.heinz at cornelisnetworks.com

On Apr 14, 2021, at 3:00 PM, ofiwg-request at lists.openfabrics.org wrote:

Send ofiwg mailing list submissions to
ofiwg at lists.openfabrics.org

To subscribe or unsubscribe via the World Wide Web, visit
https://lists.openfabrics.org/mailman/listinfo/ofiwg
or, via email, send a message with subject or body 'help' to
ofiwg-request at lists.openfabrics.org

You can reach the person managing the list at
ofiwg-owner at lists.openfabrics.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of ofiwg digest..."


Today's Topics:

  1. Resource Management RX CQ and address vectors (BOLLORE, HUGO)
  2. Re: Resource Management RX CQ and address vectors (Hefty, Sean)


----------------------------------------------------------------------

Message: 1
Date: Wed, 14 Apr 2021 16:33:54 +0000
From: "BOLLORE, HUGO" <hugo.bollore at atos.net>
To: "ofiwg at lists.openfabrics.org" <ofiwg at lists.openfabrics.org>
Subject: [ofiwg] Resource Management RX CQ and address vectors
Message-ID: <475d88f7-1b8e-abbb-e054-b9c5b8e5da96 at atos.net>
Content-Type: text/plain; charset="utf-8"

Hello,

I'm working with Mehdi Bendahhou (Atos) on a new provider.
We have a few questions about the API specification.

First, we were wondering how to deal with the MULTI_RECV option when
checking overruns of the RX CQ.
When fi_recv is called with this option we cannot know how many messages
will be received and thus how many completions may result from this operation.
If resource management is enabled, documentation states that the
provider must return -FI_EAGAIN if an operation could result in CQ overruns.
Is there a specific case to apply with the MULTI_RECV option? If not,
what should we do in this situation?

Second, we are unsure about the state of an address vector during the
execution of an application.
Are address vectors static as soon as the endpoint is enabled or can
they change dynamically?

Last, in a connectionless endpoint, is it required for a receiving
application to have the sender address in its address vector if the
fi_recv is posted with src_addr set to FI_ADDR_UNSPEC and/or
FI_DIRECTED_RECV is disabled?

If this helps to answer the last two questions, we are trying to
determine if at any given point an endpoint is able to retrieve a list
of all address/endpoints it is communicating with (both in emission and
reception).

Thanks in advance to those who can enlighten us!
Hugo Bolloré - Atos

------------------------------

Message: 2
Date: Wed, 14 Apr 2021 17:22:05 +0000
From: "Hefty, Sean" <sean.hefty at intel.com>
To: "BOLLORE, HUGO" <hugo.bollore at atos.net>,
"ofiwg at lists.openfabrics.org" <ofiwg at lists.openfabrics.org>
Subject: Re: [ofiwg] Resource Management RX CQ and address vectors
Message-ID:
<DM6PR11MB4609A51ED8C1A6694E0E55559E4E9 at DM6PR11MB4609.namprd11.prod.outlook.com>

Content-Type: text/plain; charset="utf-8"

> I'm working with Mehdi Bendahhou (Atos) on a new provider.
> We have a few questions about the API specification.

Welcome!

> First, we were wondering how to deal with the MULTI_RECV option when
> checking overruns of the RX CQ.
> When fi_recv is called with this option we cannot know how many messages
> will be received and thus how many completions may result from this operation.
> If resource management is enabled, documentation states that the
> provider must return -FI_EAGAIN if an operation could result in CQ overruns.
> Is there a specific case to apply with the MULTI_RECV option? If not,
> what should we do in this situation?

This probably needs to be handled as part of flow control, to back off the sender if handling the receive would result in CQ overflow.  In the current implementations that I'm aware of that handle MULTI_RECV, CQ overflow is handled.  The overflow entries are queued in a separate location.  For example, the util_cq code will handle this.

We can discuss whether the man pages need to be updated to exclude multi-recv from the resource management requirements.
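
[Editor's sketch] To make the idea of queuing overflow entries in a separate location concrete, here is a minimal illustration in plain C. It is not the util_cq implementation and does not use any libfabric API; every name in it (sketch_cq, cq_write, cq_read, CQ_SIZE) is hypothetical. It assumes a fixed-size ring for the completions the application sized the CQ for, plus a linked list that absorbs completions arriving while the ring is full, so that delivery order is preserved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is NOT libfabric's util_cq code. */
#define CQ_SIZE 4                       /* capacity the application asked for */

struct comp { int tag; };               /* stand-in for a completion entry */

struct ovfl_node {
    struct comp entry;
    struct ovfl_node *next;
};

struct sketch_cq {
    struct comp ring[CQ_SIZE];
    size_t head, count;                 /* ring read index and fill level */
    struct ovfl_node *ovfl_head, *ovfl_tail;
};

/* Provider side: record a completion. Returns 0, or -1 if out of memory. */
static int cq_write(struct sketch_cq *cq, struct comp c)
{
    struct ovfl_node *n;

    /* Use the ring only while nothing is waiting in overflow (keeps order). */
    if (cq->count < CQ_SIZE && cq->ovfl_head == NULL) {
        cq->ring[(cq->head + cq->count) % CQ_SIZE] = c;
        cq->count++;
        return 0;
    }

    /* Ring is full: park the entry on the overflow list instead of dropping it. */
    n = malloc(sizeof(*n));
    if (n == NULL)
        return -1;
    n->entry = c;
    n->next = NULL;
    if (cq->ovfl_tail != NULL)
        cq->ovfl_tail->next = n;
    else
        cq->ovfl_head = n;
    cq->ovfl_tail = n;
    return 0;
}

/* Application side: read one completion. Returns 1 if read, 0 if empty. */
static int cq_read(struct sketch_cq *cq, struct comp *out)
{
    assert(out != NULL);
    if (cq->count == 0)
        return 0;

    *out = cq->ring[cq->head];
    cq->head = (cq->head + 1) % CQ_SIZE;
    cq->count--;

    /* A slot just opened: move the oldest overflow entry into the ring. */
    if (cq->ovfl_head != NULL) {
        struct ovfl_node *n = cq->ovfl_head;

        cq->ovfl_head = n->next;
        if (cq->ovfl_head == NULL)
            cq->ovfl_tail = NULL;
        cq->ring[(cq->head + cq->count) % CQ_SIZE] = n->entry;
        cq->count++;
        free(n);
    }
    return 1;
}
```

In this sketch the overflow list is unbounded, which is why the reply above pairs it with flow control: a real provider would presumably still need to back off the sender before the list grows without limit.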

> Second, we are unsure about the state of an address vector during the
> execution of an application.
> Are address vectors static as soon as the endpoint is enabled or can
> they change dynamically?

They can change dynamically, and some applications do this (fabtests, for one).  But ultimately, this comes down to what your hardware can do and its target applications.  If your provider needs a static address vector, you can document that restriction.  I know there are environments where the implementation ends up using a static address vector, possibly pre-loaded before the app starts and shared between processes.

> Last, in a connectionless endpoint, is it required for a receiving
> application to have the sender address in its address vector if the
> fi_recv is posted with src_addr set to FI_ADDR_UNSPEC and/or
> FI_DIRECTED_RECV is disabled?

No, the receiver does not have to have the sender's address in its AV.  Unidirectional transfers are supported.  The use of a separate, out-of-band communication library should not be required.

> If this helps to answer the last two questions, we are trying to
> determine if at any given point an endpoint is able to retrieve a list
> of all address/endpoints it is communicating with (both in emission and
> reception).

It's not required from an API perspective.

FWIW, the util providers (rxm, rxd) handle this by maintaining a map between the addresses in the local AV and the peers they are communicating with.  It's possible for the AV to contain addresses where remote communication has not (yet) been set up.  Likewise, the endpoint may communicate with peers that are not in the local AV.
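
[Editor's sketch] The separation between "addresses in the AV" and "peers actually being communicated with" can be illustrated with two independent tables. This is only a toy model in plain C, not how rxm or rxd are implemented, and it uses no libfabric calls; all names (sketch_av, peer_table, peer_get, NO_AV_INDEX) and the use of a 64-bit integer as a raw address are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; NOT the rxm/rxd data structures. */
#define MAX_AV      8
#define MAX_PEERS   8
#define NO_AV_INDEX (-1)        /* peer is known but not in the local AV */

/* Toy address vector: the array index plays the role of the AV handle. */
struct sketch_av {
    uint64_t addr[MAX_AV];
    size_t count;
};

/* A peer the endpoint has actually exchanged traffic with. */
struct peer {
    uint64_t addr;
    int av_index;               /* index into the AV, or NO_AV_INDEX */
};

struct peer_table {
    struct peer peers[MAX_PEERS];
    size_t count;
};

/* Insert an address into the AV; returns its index, or -1 if the AV is full. */
static int av_insert(struct sketch_av *av, uint64_t addr)
{
    if (av->count == MAX_AV)
        return -1;
    av->addr[av->count] = addr;
    return (int)av->count++;
}

/* Find an address in the AV; returns its index or NO_AV_INDEX. */
static int av_lookup(const struct sketch_av *av, uint64_t addr)
{
    size_t i;

    for (i = 0; i < av->count; i++)
        if (av->addr[i] == addr)
            return (int)i;
    return NO_AV_INDEX;
}

/* Find or create the peer record for addr, called on first send or receive.
 * Returns NULL only when the peer table is full. */
static struct peer *peer_get(struct peer_table *pt, const struct sketch_av *av,
                             uint64_t addr)
{
    size_t i;
    struct peer *p;

    assert(pt != NULL && av != NULL);
    for (i = 0; i < pt->count; i++) {
        p = &pt->peers[i];
        if (p->addr == addr) {
            /* The address may have been added to the AV since we last looked. */
            if (p->av_index == NO_AV_INDEX)
                p->av_index = av_lookup(av, addr);
            return p;
        }
    }
    if (pt->count == MAX_PEERS)
        return NULL;
    p = &pt->peers[pt->count++];
    p->addr = addr;
    p->av_index = av_lookup(av, addr);
    return p;
}
```

With these two tables, an address inserted into the AV but never used has no peer record, and a sender that is not in the AV still gets a peer record with av_index set to NO_AV_INDEX, which mirrors the two cases described above.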

- Sean

------------------------------


End of ofiwg Digest, Vol 93, Issue 4
************************************
