[libfabric-users] Can only use one NIC port in libfabric 1.6.1
arun.ilango at intel.com
Fri Aug 24 15:00:40 PDT 2018
ibv_devinfo shows different NIC ports as different devices as expected.
To listen on multiple NIC ports, you just need one fabric and a passive endpoint listening on the wildcard address (0.0.0.0). That should work. I tried the same on a multi-port iWARP NIC and it worked for me, on both v1.6.0 and master.
You can try initiating a connection request only from the second port to check if that works.
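For reference, the wildcard address mentioned above can be built as an ordinary IPv4 socket address. The following is only a minimal sketch: make_wildcard_addr is a hypothetical helper name, not part of libfabric, and the port is whatever the application chooses.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a libfabric call): build an IPv4 wildcard
 * address (0.0.0.0) for the given port, so that a listener bound to it
 * is not tied to the address of one particular NIC port. */
static struct sockaddr_in make_wildcard_addr(uint16_t port)
{
    struct sockaddr_in sin;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY); /* 0.0.0.0: any local interface */
    sin.sin_port = htons(port);              /* port in network byte order */
    return sin;
}
```

One way this might be used, as an assumption rather than a confirmed recipe from this thread: copy the address into heap memory, set it as hints->src_addr with hints->addr_format = FI_SOCKADDR_IN before calling fi_getinfo(), then open the passive endpoint from the returned info and call fi_listen() on it.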
From: Jörn Schumacher [mailto:jorn.schumacher at cern.ch]
Sent: Thursday, August 23, 2018 2:27 AM
To: Ilango, Arun <arun.ilango at intel.com>; libfabric-users at lists.openfabrics.org
Subject: Re: [libfabric-users] Can only use one NIC port in libfabric 1.6.1
Thanks for your reply.
Looking at my code, I realize the issue actually occurs before even setting up the fi_domain object. I posted my (stripped-down) initialization procedure in the other file in the gist.
In case I want to listen on multiple ports, do I need multiple fi_fabric objects? Or multiple endpoints? Or should I be able to listen on multiple interfaces with "0.0.0.0" like I am doing?
On 08/22/2018 07:46 PM, Ilango, Arun wrote:
> Hi Jörn,
> The verbs provider assigns a separate domain to each device returned by rdma_get_devices(). So if the NIC ports show up as separate devices, they belong to separate domains. This was already the case in 1.4.
> Can you check the output of ibv_devinfo? How do the ports show up there?
> -----Original Message-----
> From: Libfabric-users
> [mailto:libfabric-users-bounces at lists.openfabrics.org] On Behalf Of
> Jörn Schumacher
> Sent: Tuesday, August 21, 2018 2:17 AM
> To: libfabric-users at lists.openfabrics.org
> Subject: [libfabric-users] Can only use one NIC port in libfabric
> Dear libfabric developers,
> I recently updated to libfabric 1.6.1 (from 1.4). It looks like in this release we can only use one port of our NIC (Mellanox ConnectX-5 with RoCE).
> On the receiving side we listen for an RC. We monitor the event queue
> with a file descriptor + epoll. On one port of the NIC this works
> fine, but if the request comes in on the second port (on a different
> subnet) this fails: we get an epoll notification, but then the subsequent fi_eq_sread(...) call yields FI_EAGAIN.
> I open a single domain. This worked fine in the earlier libfabric.
> Reading the documentation a bit I understand that a domain is tied to a port. Does this mean I need to open multiple domains?
> Thanks and best regards,