[libfabric-users] Verbs provider not permitting FI_EP_MSG

Philip Davis philip.e.davis at rutgers.edu
Thu Jan 16 10:08:11 PST 2020


Hi Sean,

> On Jan 16, 2020, at 11:11 AM, Hefty, Sean <sean.hefty at intel.com> wrote:
>
>> I am working with a user that is running on an older InfiniBand cluster. Using libfabric
>> with the following hints:
>>
>> hints->caps = FI_MSG | FI_SEND | FI_RECV | FI_REMOTE_READ |
>>                  FI_REMOTE_WRITE | FI_RMA | FI_READ | FI_WRITE;
>>    hints->mode = FI_CONTEXT | FI_LOCAL_MR | FI_CONTEXT2 | FI_MSG_PREFIX |
>>                  FI_ASYNC_IOV | FI_RX_CQ_DATA;
>>    hints->domain_attr->mr_mode = FI_MR_BASIC;
>
> You may want to consider updating to the newer mr_mode bits.  This field was changed starting in the 1.5 release.
>
>>    hints->domain_attr->control_progress = FI_PROGRESS_AUTO;
>>    hints->domain_attr->data_progress = FI_PROGRESS_AUTO;
>>    hints->ep_attr->type = FI_EP_RDM;
>
> This is requesting RDM endpoints, not MSG.  Is this the intent for your app, and the issue is that it can't find the verbs support underneath?

That’s right. I am expecting to use RXM for libfabric >= 1.7, and the issue is that no verbs provider supporting FI_EP_MSG is found.
>
>> No verbs providers are found. Looking through the debug output, I suspect this is the
>> crucial line:
>
> If you run fi_info, do you see the verbs provider there?

I see verbs providers that support FI_EP_DGRAM, but not FI_EP_MSG.

>
>> libfabric:verbs:fabric:fi_ibv_get_matching_info():1213<info> hints->ep_attr->rx_ctx_cnt
>> != FI_SHARED_CONTEXT. Skipping XRC FI_EP_MSG endpoints
>
> This is checking for XRC QP support.  Skipping these if the hardware doesn't support it should be fine.

Okay, thanks. I was thinking that’s why the FI_EP_MSG support wasn’t there.

>
>> I take it that the underlying hardware is only compatible with FI_PROTO_RDMA_CM_IB_XRC
>> protocol for MSG endpoints, and it looks like I need to have FI_SHARED_CONTEXT enabled
>> for these endpoints to be supported. I’m having some trouble understanding the
>> implications of using FI_SHARED_CONTEXT. If I only ever use one endpoint, is there any
>> functional or performance impact to setting this? I’d rather not change to using shared
>> contexts unconditionally, so is there a good way for me to detect this situation other
>> than to do a maximally permissive fi_getinfo and iterate through the verbs results?
>
> You don't need to use shared contexts or XRC.  When you mention only using one endpoint, do you mean one MSG endpoint or one RDM endpoint?

One RDM endpoint.

>
> What version of libfabric are you using?  Attaching the full debug output from the startup checks might help isolate the problem.

This output is from 1.7.2. The user reports the same problem with 1.6.x and 1.9.x, but I unfortunately don’t have access to debug logs from those versions. I have attached the debug log from the startup phase. I’m afraid it’s a bit of a mess because it contains output from multiple unlabeled ranks.
>
> - Sean

Thank you for your help.

Philip

-------------- next part --------------
A non-text attachment was scrubbed...
Name: debugstartup.log
Type: application/octet-stream
Size: 420715 bytes
Desc: debugstartup.log
URL: <http://lists.openfabrics.org/pipermail/libfabric-users/attachments/20200116/38f9555b/attachment-0001.obj>

