[libfabric-users] fi_cq_sread fails with "Resource temporarily unavailable"
Niyaz Murshed
Niyaz.Murshed at arm.com
Fri May 3 13:55:27 PDT 2024
It should not take long, I think.
Server: fi_rma_pingpong -s 192.168.1.100 -e msg
Client: fi_rma_pingpong 192.168.1.100 -e msg

bytes   iters   total   time    MB/sec      usec/xfer   Mxfers/sec
64      10k     1.2m    0.03s      42.54       1.50        0.66
256     10k     4.8m    0.04s     139.01       1.84        0.54
1k      10k     19m     0.05s     424.88       2.41        0.41
4k      10k     78m     0.05s    1580.58       2.59        0.39
64k     1k      125m    0.01s    8886.24       7.38        0.14
1m      100     200m    0.02s   11581.36      90.54        0.01

real    0m1.129s
user    0m0.225s
sys     0m0.478s
From: Libfabric-users <libfabric-users-bounces at lists.openfabrics.org> on behalf of Alisa Parashchenko <ge24cuc at mytum.de>
Date: Friday, May 3, 2024 at 2:49 PM
To: Zegelstein, Seth <szegel at amazon.com>, libfabric-users at lists.openfabrics.org <libfabric-users at lists.openfabrics.org>
Subject: Re: [libfabric-users] fi_cq_sread fails with "Resource temporarily unavailable"
Hello,
I found my mistake. The fi_cq_sread() fails because I set the wrong
timeout (0 instead of -1). I really should have noticed that sooner.
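
To illustrate the difference between the two values: fi_cq_sread() takes its timeout in milliseconds, where 0 means "return immediately" (with -FI_EAGAIN if nothing has completed) and a negative value means "wait indefinitely". POSIX poll() follows the same convention, so the sketch below uses it as a stand-in that runs without any fabric. wait_readable is a hypothetical helper written for this illustration, not part of libfabric.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: wait for data on fd.
 * timeout_ms follows poll() rules: 0 returns at once,
 * a negative value waits indefinitely.
 * Returns 1 if readable, 0 if the timeout expired, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    assert(fd >= 0);
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);
    if (n < 0)
        return -1;
    return (n > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

On an empty pipe, wait_readable(fd, 0) reports 0 straight away, which is the analogue of the "Resource temporarily unavailable" result from a zero-timeout fi_cq_sread(); with -1 the call only returns once data has arrived.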
However, with the correct timeout set, fi_inject() returns the same
error. It works eventually if I retry often enough:
while ((ret = fi_inject(ep, buf, 6, 1)) == -FI_EAGAIN);
But do I really have to just retry until it works, or is there a better way?
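
For reference, one pattern that is often used instead of a bare spin is to drive progress between attempts, typically with a non-blocking fi_cq_read() on the bound CQ, on the assumption that the provider only frees transmit resources when the application calls into it. The sketch below shows just the loop shape with hypothetical stand-ins so that it runs without libfabric: try_send_fn plays the role of fi_inject(), progress_fn the role of the CQ read, and -EAGAIN stands in for -FI_EAGAIN. Whether this is needed for a given provider is an assumption to check against its documentation.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins, not libfabric API. */
typedef int (*try_send_fn)(void *ctx);   /* role of fi_inject()          */
typedef void (*progress_fn)(void *ctx);  /* role of non-blocking CQ read */

/* Retry a send while it reports -EAGAIN, driving progress between
 * attempts. Returns the first result other than -EAGAIN, or -EAGAIN
 * once max_tries attempts have been used up. */
int send_with_progress(try_send_fn try_send, progress_fn progress,
                       void *ctx, int max_tries)
{
    assert(max_tries > 0);
    int ret = -EAGAIN;
    for (int i = 0; i < max_tries; i++) {
        ret = try_send(ctx);
        if (ret != -EAGAIN)
            break;
        progress(ctx);
    }
    return ret;
}

/* Toy provider used only to exercise the loop: the send succeeds
 * once progress has been driven `needed` times. */
struct toy_provider {
    int needed;
    int progressed;
    int attempts;
};

int toy_send(void *ctx)
{
    struct toy_provider *p = ctx;
    p->attempts++;
    return p->progressed >= p->needed ? 0 : -EAGAIN;
}

void toy_progress(void *ctx)
{
    struct toy_provider *p = ctx;
    p->progressed++;
}
```

Bounding the number of attempts is a design choice of this sketch: it turns a peer that never becomes reachable into an error return rather than an endless loop.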
Running fi_pingpong from utils/ on my setup does work. The
fi_rma_pingpong from fabtests/benchmarks/ keeps running for many
minutes, even if I specify the smallest size and only 10 iterations
(i.e. "fi_rma_pingpong -S l:1 -I 10" for the server, and
"fi_rma_pingpong -S l:1 -I 10 localhost" for the client). Is it supposed
to run this long and I should just wait, or am I doing something wrong?
Regards,
Alisa
01.05.2024 17:33, Zegelstein, Seth wrote:
> Hey Alisa,
>
> Can you start with trying to run fabtests on your setup? Start with
> one of the pingpong tests.
>
> Best,
> Seth
>
> On 5/1/24, 6:29 AM, "Libfabric-users on behalf of Alisa Parashchenko"
> <libfabric-users-bounces at lists.openfabrics.org on behalf of
> ge24cuc at mytum.de> wrote:
>
> Hello,
>
>
> I am new to Libfabric and trying to write some code that does RMAs.
> Currently, however, even reading from the completion queue after doing a
> regular fi_recv() is failing with "Resource temporarily unavailable".
>
>
> Here is a minimal program that gets this error. Could someone tell me
> what I'm doing wrong? Setting FI_LOG_LEVEL=Debug didn't give any helpful
> information. I am on a regular Linux desktop, with Libfabric using its
> TCP provider, if that's relevant.
>
>
> Regards,
> Alisa
>
>
> #include <assert.h>
> #include <errno.h>
> #include <stdlib.h>
> #include <stdio.h>
> #include <string.h>
> #include <unistd.h>
>
>
> #include <rdma/fabric.h>
> #include <rdma/fi_cm.h>
> #include <rdma/fi_domain.h>
> #include <rdma/fi_endpoint.h>
> #include <rdma/fi_rma.h>
>
>
> #define PANIC_NZ(a) if ((ret = a)) panic("" #a "", fi_strerror(ret));
>
>
> static struct fi_info *info;
> static struct fid_fabric *fabric;
> static struct fid_domain *domain;
> static struct fid_ep *ep;
> static struct fi_av_attr av_attr = { 0 };
> static struct fi_cq_attr cq_attr = { 0 };
> static struct fi_eq_attr eq_attr = { 0 };
> static struct fid_av *av;
> static struct fid_cq *cq;
> static struct fid_eq *eq;
> int ret;
>
>
> void panic(char *f, const char *msg) {
>     fprintf(stderr, "%s failed: %s\n", f, msg);
>     exit(1);
> }
>
> void hexdump(int len, void *buf) {
>     for (int i = 0; i < len; i++) printf("%02hhx ", ((char*)buf)[i]);
>     printf("\n");
> }
>
> int main(int argc, char **argv) {
>     char *host = "localhost";
>     int is_server = argc <= 1;
>     char *port = is_server ? "1234" : "4321";
>
>     /* Select fabric */
>     struct fi_info *hints = fi_allocinfo();
>     hints->ep_attr->type = FI_EP_RDM;
>     hints->caps = FI_MSG | FI_RMA;
>     PANIC_NZ(fi_getinfo(FI_VERSION(1,21), host, port, FI_SOURCE, hints,
>                         &info));
>     printf("Selected fabric \"%s\", domain \"%s\"\n",
>            info->fabric_attr->name, info->domain_attr->name);
>     fi_freeinfo(hints);
>
>     /* Set up address vector */
>     PANIC_NZ(fi_fabric(info->fabric_attr, &fabric, NULL));
>     PANIC_NZ(fi_domain(fabric, info, &domain, NULL));
>     av_attr.type = FI_AV_TABLE;
>     av_attr.count = 2;
>     PANIC_NZ(fi_av_open(domain, &av_attr, &av, NULL));
>
>     /* Open the endpoint, bind it to an EQ, CQ, and AV */
>     PANIC_NZ(fi_endpoint(domain, info, &ep, NULL));
>     cq_attr.wait_obj = FI_WAIT_UNSPEC;
>     PANIC_NZ(fi_cq_open(domain, &cq_attr, &cq, NULL));
>     PANIC_NZ(fi_eq_open(fabric, &eq_attr, &eq, NULL));
>     PANIC_NZ(fi_ep_bind(ep, &av->fid, 0));
>     PANIC_NZ(fi_ep_bind(ep, &cq->fid, FI_TRANSMIT|FI_RECV));
>     PANIC_NZ(fi_ep_bind(ep, &eq->fid, 0));
>     PANIC_NZ(fi_enable(ep));
>
>     /* Get the address of the endpoint */
>     char fi_addr[160];
>     size_t fi_addrlen = 160;
>     PANIC_NZ(fi_getname(&ep->fid, fi_addr, &fi_addrlen));
>     printf("Got libfabric EP addr of length %zu:\n", fi_addrlen);
>     hexdump(fi_addrlen, fi_addr);
>
>     /* Insert own address and peer's address into AV */
>     ret = fi_av_insert(av, fi_addr, 1, NULL, 0, NULL);
>     assert(ret == 1);
>     /* Obviously not the right way to do this, but the shortest way */
>     char *peer_port = is_server ? "\x10\xe1" : "\x04\xd2";
>     memcpy(fi_addr + 2, peer_port, 2);
>     ret = fi_av_insert(av, fi_addr, 1, NULL, 0, NULL);
>     assert(ret == 1);
>
>     /* Try to exchange a message */
>     if (is_server) {
>         char buf[6];
>         char cq_buf[128];
>         PANIC_NZ(fi_recv(ep, buf, 5, NULL, 1, NULL));
>         ret = fi_cq_sread(cq, cq_buf, 1, NULL, 0);
>         if (ret < 0) panic("fi_cq_sread", fi_strerror(ret));
>         printf("Got message: %s\n", buf);
>     } else {
>         char buf[6] = "Hello";
>         PANIC_NZ(fi_inject(ep, buf, 6, 1));
>     }
>
>     fi_close((struct fid *) ep);
>     fi_close((struct fid *) av);
>     fi_close((struct fid *) eq);
>     fi_close((struct fid *) cq);
>     fi_close((struct fid *) domain);
>     fi_close((struct fid *) fabric);
>     fi_freeinfo(info);
>     return 0;
> }
>
>
> _______________________________________________
> Libfabric-users mailing list
> Libfabric-users at lists.openfabrics.org
> https://lists.openfabrics.org/mailman/listinfo/libfabric-users