[libfabric-users] verbs questions, problem with fi_connect

Hefty, Sean sean.hefty at intel.com
Mon Nov 2 14:34:07 PST 2020


I was on vacation last week, so I did not see this until today.

> Maybe some code can help a more experienced person spot my fault
> (I mostly skipped irrelevant error handling and combined definition and
> initialization for brevity):
> 
> fi_getinfo(FI_VERSION(1, 11),
>            "10.0.10.4",
>            "4711",
>            FI_NUMERICHOST,
>            msg_hints,
>            &info);
> 
> That info is used to create the fabric, domain, and endpoint (ep) on the
> client side; the msg_hints entries are in my prior message.
> 
> From the connection function (con_data is a 38-byte buffer for a UUID;
> ep is bound to event_queue):
> 
> uint32_t* event;
> size_t event_entry_size = sizeof(struct fi_eq_cm_entry) + 128;
> void* event_entry = malloc(event_entry_size);
> struct fi_eq_err_entry* event_queue_err_entry;
> struct sockaddr_in* address = malloc(sizeof(struct sockaddr_in));
> 
> address->sin_family = AF_INET;
> address->sin_port = htons(4711);
> inet_aton("10.0.10.3", &address->sin_addr);

Btw, you should be able to use fi_info->dest_addr from the output of fi_getinfo() instead of building the sockaddr_in by hand.


> fi_connect(ep, address, (void*)con_data, sizeof(struct JConData));

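You can also pass in fi_info->dest_addr here. fi_connect() itself has no address-length parameter; the address is interpreted according to the endpoint's fi_info->addr_format.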


> error = fi_eq_sread(event_queue, event, event_entry, event_entry_size,
>                     -1, 0);
> if (error < 0)
> {
>       if (error == -FI_EAVAIL)
>       {
>          error = fi_eq_readerr(event_queue, event_queue_err_entry, 0);

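Be sure event_queue_err_entry points to allocated memory, and that its err_data and err_data_size fields are set correctly. Zero-initializing the structure is the simplest option: with err_data_size == 0, the provider sets err_data to a buffer it owns.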


Before calling fi_connect(), ensure that the server is listening.


> >> The address used as the target of fi_connect is the server node's
> >> InfiniBand IP address, given as FI_SOCKADDR_IN, if that is of
> >> relevance.

Yes, this is relevant; it is actually required in order to use the verbs provider.


> >> The call adds some additional connection data to the request.
> >>
> >>
> >> On another note: I saw that the node parameter can't be NULL when
> >> creating fi_infos for active endpoints (I got ENODATA, and a warning
> >> revealed that verbs couldn't resolve the address, so I replaced it
> >> with localhost).

I didn't follow this. Once you call fi_getinfo(), you should use the returned fi_info structure to create the endpoint.

- Sean

