[libfabric-users] Can only use one NIC port in libfabric 1.6.1
jorn.schumacher at cern.ch
Wed Sep 5 03:12:15 PDT 2018
Sorry for the late reply. Our servers got updated and I had no machines
to test on for a while.
I put together a minimal test program that demonstrates the issue:
https://gitlab.cern.ch/joschuma/libfabric-debug (let me know if there
are issues with the access)
The issue occurs even on a single host. IP configuration:
In one terminal ./listener will listen on 0.0.0.0:12345 and print if a
In the other terminal:
(1) ./connect 192.168.1.17 12345
(2) ./connect 192.168.2.17 12345
(1) will generate no event in the listener program. (2) yields a CONNREQ
event in the listener program. This happens with libfabric 1.6.1 and the
With the rdma_server/rdma_client tools I am able to create a connection
using both IP addresses. So I suspect a bug in libfabric.
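
The epoll pattern from the original report (quoted further down) can be sketched without libfabric as follows. This is only a minimal illustration under stated assumptions, not the code from the repository linked above: a plain non-blocking file descriptor stands in for the event queue's wait fd, read() stands in for fi_eq_sread(), and the helper names (set_nonblocking, watch_fd, wait_for_wakeup, read_after_wakeup) are made up for this sketch. It shows that a wakeup is only a hint, so the reader has to treat EAGAIN as "nothing to read yet" and go back to waiting.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Put fd into non-blocking mode. Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Create an epoll instance that watches fd for readability.
 * Returns the epoll fd, or -1 on error. */
static int watch_fd(int fd)
{
    struct epoll_event ev;
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;
    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(epfd);
        return -1;
    }
    return epfd;
}

/* Wait up to timeout_ms for a wakeup, retrying if interrupted by a signal.
 * Returns 1 if the fd was reported readable, 0 on timeout, -1 on error. */
static int wait_for_wakeup(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    int n;
    do {
        n = epoll_wait(epfd, &ev, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);
    return n;
}

/* Non-blocking read after a wakeup.
 * Returns the number of bytes read (> 0), 0 if nothing is available
 * (EAGAIN/EWOULDBLOCK), or -1 on end-of-file or a real error. */
static ssize_t read_after_wakeup(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n > 0)
        return n;
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return -1;
}
```

With libfabric the fd would come from fi_control(&eq->fid, FI_GETWAIT, &fd) on an event queue opened with FI_WAIT_FD, and the "nothing available" case corresponds to fi_eq_sread() returning -FI_EAGAIN. The fi_poll man page also asks callers to check fi_trywait() before blocking on the fd. None of this explains why no CONNREQ event ever arrives for the first address. It only covers the benign case of a wakeup with nothing to read.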
Let me know if you need any more info, I am happy to provide any help
you might need.
Thanks a lot.
On 08/25/2018 12:00 AM, Ilango, Arun wrote:
> Hi Jörn,
> ibv_devinfo shows different NIC ports as different devices as expected.
> To listen on multiple NIC ports, you just need one fabric and a passive endpoint listening on the wildcard address (0.0.0.0). That should work. I tried the same on a multi-port iwarp NIC and it was working for me. This is on v1.6.0 and master.
> You can try initiating a connection request only from the second port to check if that works.
> -----Original Message-----
> From: Jörn Schumacher [mailto:jorn.schumacher at cern.ch]
> Sent: Thursday, August 23, 2018 2:27 AM
> To: Ilango, Arun <arun.ilango at intel.com>; libfabric-users at lists.openfabrics.org
> Subject: Re: [libfabric-users] Can only use one NIC port in libfabric 1.6.1
> Hi Arun,
> Thanks for your reply.
> ibv_devinfo: https://gist.github.com/joerns/cb7d216b0c3a71b5ea327d0292459211
> Looking at my code, I realize the issue actually occurs before even setting up the fi_domain object. I posted my (stripped-down) initialization procedure in the other file in the gist.
> In case I want to listen on multiple ports, do I need multiple fi_fabric objects? Or multiple endpoints? Or should I be able to listen on multiple interfaces with "0.0.0.0" like I am doing?
> On 08/22/2018 07:46 PM, Ilango, Arun wrote:
>> Hi Jörn,
>> The verbs provider assigns a separate domain to each device returned by rdma_get_devices(). So if the NIC ports show up as separate devices, they belong to separate domains. This was already the case in 1.4.
>> Can you check the output of ibv_devinfo? How do the ports show up there?
>> -----Original Message-----
>> From: Libfabric-users
>> [mailto:libfabric-users-bounces at lists.openfabrics.org] On Behalf Of
>> Jörn Schumacher
>> Sent: Tuesday, August 21, 2018 2:17 AM
>> To: libfabric-users at lists.openfabrics.org
>> Subject: [libfabric-users] Can only use one NIC port in libfabric
>> Dear libfabric developers,
>> I recently updated to libfabric 1.6.1 (from 1.4). It looks like in this release we can only use one port of our NIC (Mellanox ConnectX-5 with RoCE).
>> On the receiving side we listen for an RC. We monitor the event queue
>> with a file descriptor + epoll. On one port of the NIC this works
>> fine, but if the request comes in on the second port (on a different
>> subnet) this fails: we get an epoll notification, but then the subsequent fi_eq_sread(...) call yields FI_EAGAIN.
>> I open a single domain. This worked fine in the earlier libfabric.
>> Reading the documentation a bit I understand that a domain is tied to a port. Does this mean I need to open multiple domains?
>> Thanks and best regards,