[libfabric-users] verbs questions, problem with fi_connect

Arne arnestruck at astruck.de
Mon Oct 26 13:24:25 PDT 2020


Sorry to press the issue, but the date on which I have to present some 
results is coming closer.


By now I have a setup which should result in a libfabric installation 
with debug mode enabled.

Could debug-mode output help us troubleshoot the problem with the 
fi_connect call?
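
A minimal sketch of how the verbose logging could be requested from 
inside the program. FI_LOG_LEVEL and FI_LOG_PROV are the logging 
variables described in the libfabric documentation; the helper name is 
hypothetical, and the sketch assumes that the variables must be set 
before the first libfabric call and that debug-level messages only 
appear with a debug build.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

/* Hypothetical helper (not part of libfabric): asks for debug-level
 * log output restricted to the verbs provider.
 * Assumption: this runs before the first libfabric call (for example
 * fi_getinfo), because the library is expected to read these
 * variables when it initializes.
 * Returns 0 on success, -1 if the environment could not be changed. */
static int enable_fabric_debug_logging(void)
{
    if (setenv("FI_LOG_LEVEL", "debug", 1) != 0)
        return -1;
    if (setenv("FI_LOG_PROV", "verbs", 1) != 0)
        return -1;
    return 0;
}
```

The same effect should be achievable by exporting both variables in the 
shell before starting the application.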


Are there any ideas why fi_connect results in an error entry containing 
Unknown Error -8 being written to the respective eq of the ep?
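
To check the guess that -8 is a negated errno value, a small sketch 
like the following could be used on the integer codes of the error 
entry. The helper name is hypothetical, and treating the provider code 
as an errno is only an assumption; the verbs provider may use a 
different code space for it.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of libfabric): formats a possibly
 * negated errno-style code as "<code> (<strerror text>)" into buf.
 * Assumption: the code really is an errno value; a provider-specific
 * code would get a misleading description here. */
static const char *describe_code(int code, char *buf, size_t len)
{
    int positive = code < 0 ? -code : code;

    snprintf(buf, len, "%d (%s)", code, strerror(positive));
    return buf;
}
```

Called with -8 on Linux, the text in parentheses is the description of 
ENOEXEC, which would match the Exec format error guess but does not 
prove that the provider meant an errno.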


Greetings,

Arne.


On 25.10.20 at 16:50, Arne wrote:
> Hey, I have some additional questions regarding the usage of the
> verbs provider.
>
>
> Are there any details or configuration settings to take into account 
> when sending a fi_connect request to connect an endpoint on the verbs 
> provider to its peer, or when reading the respective eq?
>
>
> Reading the eq afterwards gives Unknown Error -8 from fi_eq_strerror 
> (so maybe Exec format error?).
>
>
> Debug level "info" did not provide any useful information regarding 
> the problem (the last message is verbs checking for suitable endpoint 
> types, and since fi_endpoint did not return an error I assume it found 
> the requested FI_EP_MSG).
>
> eq-attributes used:
>
>     eq_attr.size = eq_size;
>     eq_attr.flags = 0;
>     eq_attr.wait_obj = FI_WAIT_FD;
>     eq_attr.signaling_vector = 0;
>     eq_attr.wait_set = NULL;
>
>
> hints used are:
>
>     msg_hints = fi_allocinfo();
>     msg_hints->caps = FI_RMA | FI_MSG;
>     msg_hints->mode = FI_RX_CQ_DATA;
>     msg_hints->ep_attr->type = FI_EP_MSG;
>     msg_hints->addr_format = FI_SOCKADDR_IN;
>     msg_hints->domain_attr->mr_mode = FI_MR_LOCAL | FI_MR_VIRT_ADDR |
>         FI_MR_ALLOCATED | FI_MR_PROV_KEY;
>     msg_hints->fabric_attr->prov_name = g_strdup("verbs");
>
>
> The address used as the target of fi_connect is the one corresponding 
> to the InfiniBand IP of the server node, formatted as FI_SOCKADDR_IN, 
> if that is of relevance.
>
> The call adds some additional connection data to the request.
>
>
> On another note: I saw that the node parameter can't be NULL when 
> creating fi_infos for active endpoints (I got an ENODATA, and a 
> warning revealed that verbs couldn't resolve the address, so I 
> replaced it with localhost).
>
> Is that true or do I have another error?
>
>
> Greetings, Arne.
>

