[libfabric-users] Can only use one NIC port in libfabric 1.6.1

Jörn Schumacher jorn.schumacher at cern.ch
Wed Sep 5 03:12:15 PDT 2018

Hi Arun,

Sorry for the late reply. Our servers were being updated and I had no 
machines to test on for a while.

I put together a minimal test program that demonstrates the issue: 
https://gitlab.cern.ch/joschuma/libfabric-debug (let me know if there 
are issues with the access)

The issue occurs even on a single host. IP configuration:


In one terminal ./listener will listen on and print if a 
CONNREQ occurs.

In the other terminal:

(1) ./connect 12345
(2) ./connect 12345

(1) will generate no event in the listener program. (2) yields a CONNREQ 
event in the listener program. This happens with libfabric 1.6.1 and the 
verbs provider.

With the rdma_server/rdma_client tools I am able to create a connection 
using both IP addresses. So I suspect a bug in libfabric.
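
As a provider-independent cross-check (not part of the test repository 
above), the same wildcard-listen pattern can be tried with plain TCP 
sockets. This is only a rough sketch under assumptions: the helper names 
listen_wildcard and reaches_listener are made up for this example, and it 
says nothing about the verbs provider itself. It binds a listener to 
INADDR_ANY on a kernel-chosen port and checks that a connection addressed 
to a given local IP is accepted on that same address.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a TCP listener bound to the wildcard address (INADDR_ANY) on an
 * ephemeral port. Returns the listening socket, or -1 on error. The port
 * chosen by the kernel is stored in *port (host byte order). */
static int listen_wildcard(unsigned short *port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = 0; /* let the kernel pick a free port */

    socklen_t len = sizeof addr;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 8) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}

/* Connect to ip:port, accept the connection on the listener lfd, and
 * compare the accepted socket's local address with the destination.
 * Returns 1 if the connection arrived on the address it was sent to,
 * 0 on any failure (bad address string, connect or accept error). */
static int reaches_listener(int lfd, const char *ip, unsigned short port)
{
    struct sockaddr_in dst;
    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &dst.sin_addr) != 1)
        return 0;

    int cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0)
        return 0;
    if (connect(cfd, (struct sockaddr *)&dst, sizeof dst) < 0) {
        close(cfd);
        return 0;
    }

    int ok = 0;
    int afd = accept(lfd, NULL, NULL);
    if (afd >= 0) {
        struct sockaddr_in local;
        socklen_t len = sizeof local;
        if (getsockname(afd, (struct sockaddr *)&local, &len) == 0)
            ok = (local.sin_addr.s_addr == dst.sin_addr.s_addr);
        close(afd);
    }
    close(cfd);
    return ok;
}
```

Calling reaches_listener() once per local address of interest (for 
instance, once for the IP of each NIC port) indicates whether a single 
wildcard listener sees both at the plain-socket level. A result of 1 for 
both addresses would only rule out basic addressing or routing problems; 
it does not exercise RDMA CM or libfabric.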

Let me know if you need any more info; I am happy to help in any way 
I can.

Thanks a lot.



On 08/25/2018 12:00 AM, Ilango, Arun wrote:
> Hi Jörn,
> ibv_devinfo shows different NIC ports as different devices as expected.
> To listen on multiple NIC ports, you just need one fabric and a passive endpoint listening on the wildcard address ( That should work. I tried the same on a multi-port iwarp NIC and it was working for me. This is on v1.6.0 and master.
> You can try initiating a connection request only from the second port to check if that works.
> Thanks,
> Arun.
> -----Original Message-----
> From: Jörn Schumacher [mailto:jorn.schumacher at cern.ch]
> Sent: Thursday, August 23, 2018 2:27 AM
> To: Ilango, Arun <arun.ilango at intel.com>; libfabric-users at lists.openfabrics.org
> Subject: Re: [libfabric-users] Can only use one NIC port in libfabric 1.6.1
> Hi Arun,
> Thanks for your reply.
> ibv_devinfo: https://gist.github.com/joerns/cb7d216b0c3a71b5ea327d0292459211
> Looking at my code, I realize the issue actually occurs before even setting up the fi_domain object. I posted my (stripped-down) initialization procedure in the other file in the gist.
> In case I want to listen on multiple ports, do I need multiple fi_fabric objects? Or multiple endpoints? Or should I be able to listen on multiple interfaces with "" like I am doing?
> Thanks,
> Jörn
> On 08/22/2018 07:46 PM, Ilango, Arun wrote:
>> Hi Jörn,
>> The verbs provider assigns a separate domain to each device returned by rdma_get_devices(). So if the NIC ports show up as separate devices, they would belong to separate domains. This was the case even in 1.4.
>> Can you check the output of ibv_devinfo? How do the ports show up there?
>> Thanks,
>> Arun.
>> -----Original Message-----
>> From: Libfabric-users [mailto:libfabric-users-bounces at lists.openfabrics.org] On Behalf Of Jörn Schumacher
>> Sent: Tuesday, August 21, 2018 2:17 AM
>> To: libfabric-users at lists.openfabrics.org
>> Subject: [libfabric-users] Can only use one NIC port in libfabric 1.6.1
>> Dear libfabric developers,
>> I recently updated to libfabric 1.6.1 (from 1.4). It looks like in this release we can only use one port of our NIC (Mellanox ConnectX-5 with RoCE).
>> On the receiving side we listen for a RC. We monitor the event queue with a file descriptor + epoll. On one port of the NIC this works fine, but if the request comes in on the second port (on a different IP subnet) this fails: we get an epoll notification, but then the subsequent fi_eq_sread(...) call yields FI_EAGAIN.
>> I open a single domain. This worked fine in the earlier libfabric.
>> From reading the documentation, I understand that a domain is tied to a port. Does this mean I need to open multiple domains?
>> Thanks and best regards,
>> Jörn
>> _______________________________________________
>> Libfabric-users mailing list
>> Libfabric-users at lists.openfabrics.org
>> https://lists.openfabrics.org/mailman/listinfo/libfabric-users
