[libfabric-users] fi_cq_sread fails with "Resource temporarily unavailable"
Alisa Parashchenko
ge24cuc at mytum.de
Fri May 3 12:49:23 PDT 2024
Hello,
I found my mistake. The fi_cq_sread() fails because I set the wrong
timeout (0 instead of -1). I really should have noticed that sooner.
However, with the correct timeout set, fi_inject() returns the same
error. It works eventually if I retry often enough:
while ((ret = fi_inject(ep, buf, 6, 1)) == -FI_EAGAIN);
But do I really have to just retry until it works, or is there a better way?
Running fi_pingpong from utils/ on my setup does work. The
fi_rma_pingpong from fabtests/benchmarks/ keeps running for many
minutes, even if I specify the smallest size and only 10 iterations
(i.e. "fi_rma_pingpong -S l:1 -I 10" for the server, and
"fi_rma_pingpong -S l:1 -I 10 localhost" for the client). Is it supposed
to run this long and I should just wait, or am I doing something wrong?
Regards,
Alisa
01.05.2024 17:33, Zegelstein, Seth wrote:
> Hey Alisa,
>
> Can you start with trying to run fabtests on your setup? Start with
one of the pinpong tests.
>
> Best,
> Seth
>
> On 5/1/24, 6:29 AM, "Libfabric-users on behalf of Alisa
Parashchenko" <libfabric-users-bounces at lists.openfabrics.org
<mailto:libfabric-users-bounces at lists.openfabrics.org> on behalf of
ge24cuc at mytum.de <mailto:ge24cuc at mytum.de>> wrote:
>
>
> CAUTION: This email originated from outside of the organization. Do
not click links or open attachments unless you can confirm the sender
and know the content is safe.
>
>
>
>
>
>
> Hello,
>
>
> I am new to Libfabric and trying to write some code that does RMAs.
> Currently, however, even reading from the completion queue after doing a
> regular fi_recv() is failing with "Resource temporarily unavailable".
>
>
> Here is a minimal program that gets this error. Could someone tell me
> what I'm doing wrong? Setting FI_LOG_LEVEL=Debug didn't give any helpful
> information. I am on a regular Linux desktop, with Libfabric using its
> TCP provider, if that's relevant.
>
>
> Regards,
> Alisa
>
>
> #include <assert.h>
> #include <errno.h>
> #include <stdlib.h>
> #include <stdio.h>
> #include <unistd.h>
>
>
> #include <rdma/fabric.h>
> #include <rdma/fi_cm.h>
> #include <rdma/fi_domain.h>
> #include <rdma/fi_endpoint.h>
> #include <rdma/fi_rma.h>
>
>
> #define PANIC_NZ(a) if ((ret = a)) panic("" #a "", fi_strerror(ret));
>
>
> static struct fi_info *info;
> static struct fid_fabric *fabric;
> static struct fid_domain *domain;
> static struct fid_ep *ep;
> static struct fi_av_attr av_attr = { 0 };
> static struct fi_cq_attr cq_attr = { 0 };
> static struct fi_eq_attr eq_attr = { 0 };
> static struct fid_av *av;
> static struct fid_cq *cq;
> static struct fid_eq *eq;
> int ret;
>
>
> void panic(char *f, const char *msg) {
> fprintf(stderr, "%s failed: %s\n", f, msg);
> exit(1);
> }
>
>
> void hexdump(int len, void *buf) {
> for (int i = 0; i < len; i++) printf("%02hhx ", ((char*)buf)[i]);
> printf("\n");
> }
>
>
> int main(int argc, char **argv) {
> char *host = "localhost";
> int is_server = argc <= 1;
> char *port = is_server ? "1234" : "4321" ;
>
>
> /* Select fabric */
> struct fi_info *hints = fi_allocinfo();
> hints->ep_attr->type = FI_EP_RDM;
> hints->caps = FI_MSG | FI_RMA;
> PANIC_NZ(fi_getinfo(FI_VERSION(1,21), host, port, FI_SOURCE, hints,
> &info));
> printf("Selected fabric \"%s\", domain \"%s\"\n",
> info->fabric_attr->name, info->domain_attr->name);
> fi_freeinfo(hints);
>
>
> /* Set up address vector */
> PANIC_NZ(fi_fabric(info->fabric_attr, &fabric, NULL));
> PANIC_NZ(fi_domain(fabric, info, &domain, NULL));
> av_attr.type = FI_AV_TABLE;
> av_attr.count = 2;
> PANIC_NZ(fi_av_open(domain, &av_attr, &av, NULL));
>
>
> /* Open the endpoint, bind it to an EQ, CQ, and AV*/
> PANIC_NZ(fi_endpoint(domain, info, &ep, NULL));
> cq_attr.wait_obj = FI_WAIT_UNSPEC;
> PANIC_NZ(fi_cq_open(domain, &cq_attr, &cq, NULL));
> PANIC_NZ(fi_eq_open(fabric, &eq_attr, &eq, NULL));
> PANIC_NZ(fi_ep_bind(ep, &av->fid, 0));
> PANIC_NZ(fi_ep_bind(ep, &cq->fid, FI_TRANSMIT|FI_RECV));
> PANIC_NZ(fi_ep_bind(ep, &eq->fid, 0));
> PANIC_NZ(fi_enable(ep));
>
>
> /* Get the address of the endpoint */
> char fi_addr[160];
> size_t fi_addrlen = 160;
> PANIC_NZ(fi_getname(&ep->fid, fi_addr, &fi_addrlen));
> printf("Got libfabric EP addr of length %zu:\n", fi_addrlen);
> hexdump(fi_addrlen, fi_addr);
>
>
> /* Insert own address and peer's address into AV */
> ret = fi_av_insert(av, fi_addr, 1, NULL, 0, NULL);
> assert(ret == 1);
> /* Obviously not the right way to do this, but the shortest way */
> char *peer_port = is_server ? "\x10\xe1" : "\x04\xd2";
> memcpy(fi_addr + 2, peer_port, 2);
> ret = fi_av_insert(av, fi_addr, 1, NULL, 0, NULL);
> assert(ret == 1);
>
>
> /* Try to exchange a message */
> if (is_server) {
> char buf[6];
> char cq_buf[128];
> PANIC_NZ(fi_recv(ep, buf, 5, NULL, 1, NULL));
> ret = fi_cq_sread(cq, cq_buf, 1, NULL, 0);
> if (ret < 0) panic("fi_cq_sread", fi_strerror(ret));
> printf("Got message: %s\n", buf);
> } else {
> char buf[6] = "Hello";
> PANIC_NZ(fi_inject(ep, buf, 6, 1));
> }
>
>
> fi_close((struct fid *) ep);
> fi_close((struct fid *) av);
> fi_close((struct fid *) eq);
> fi_close((struct fid *) cq);
> fi_close((struct fid *) domain);
> fi_close((struct fid *) fabric);
> fi_freeinfo(info);
> return 0;
> }
>
>
> _______________________________________________
> Libfabric-users mailing list
> Libfabric-users at lists.openfabrics.org
<mailto:Libfabric-users at lists.openfabrics.org>
> https://lists.openfabrics.org/mailman/listinfo/libfabric-users
<https://lists.openfabrics.org/mailman/listinfo/libfabric-users>
>
>
>
More information about the Libfabric-users
mailing list