[libfabric-users] Can only use one NIC port in libfabric 1.6.1

Ilango, Arun arun.ilango at intel.com
Fri Aug 24 15:00:40 PDT 2018


Hi Jörn,

As expected, ibv_devinfo shows the NIC ports as separate devices.

To listen on multiple NIC ports, you need only one fabric and one passive endpoint listening on the wildcard address (0.0.0.0). That should work: I tried the same setup on a multi-port iWARP NIC, with both v1.6.0 and master, and it worked for me.

You can try initiating a connection request only from the second port to check if that works.

Thanks,
Arun.

-----Original Message-----
From: Jörn Schumacher [mailto:jorn.schumacher at cern.ch] 
Sent: Thursday, August 23, 2018 2:27 AM
To: Ilango, Arun <arun.ilango at intel.com>; libfabric-users at lists.openfabrics.org
Subject: Re: [libfabric-users] Can only use one NIC port in libfabric 1.6.1

Hi Arun,

Thanks for your reply.

ibv_devinfo: https://gist.github.com/joerns/cb7d216b0c3a71b5ea327d0292459211

Looking at my code, I realize the issue actually occurs before even setting up the fi_domain object. I posted my (stripped-down) initialization procedure in the other file in the gist.

In case I want to listen on multiple ports, do I need multiple fi_fabric objects? Or multiple endpoints? Or should I be able to listen on multiple interfaces with "0.0.0.0" like I am doing?

Thanks,
Jörn



On 08/22/2018 07:46 PM, Ilango, Arun wrote:
> Hi Jörn,
>
> The verbs provider assigns a separate domain to each device returned by rdma_get_devices(). So if the NIC ports show up as separate devices, they belong to separate domains. This was already the case in 1.4.
>
> Can you check the output of ibv_devinfo? How do the ports show up there?
>
> Thanks,
> Arun.
>
> -----Original Message-----
> From: Libfabric-users [mailto:libfabric-users-bounces at lists.openfabrics.org] On Behalf Of Jörn Schumacher
> Sent: Tuesday, August 21, 2018 2:17 AM
> To: libfabric-users at lists.openfabrics.org
> Subject: [libfabric-users] Can only use one NIC port in libfabric 1.6.1
>
> Dear libfabric developers,
>
> I recently updated to libfabric 1.6.1 (from 1.4). It looks like in this release we can only use one port of our NIC (Mellanox ConnectX-5 with RoCE).
>
> On the receiving side we listen for an RC connection. We monitor the event queue with a file descriptor + epoll. On one port of the NIC this works fine, but if the request comes in on the second port (on a different IP subnet) it fails: we get an epoll notification, but the subsequent fi_eq_sread(...) call yields FI_EAGAIN.
>
> I open a single domain. This worked fine in the earlier libfabric.
> Reading the documentation a bit I understand that a domain is tied to a port. Does this mean I need to open multiple domains?
>
>
> Thanks and best regards,
> Jörn
> _______________________________________________
> Libfabric-users mailing list
> Libfabric-users at lists.openfabrics.org
> https://lists.openfabrics.org/mailman/listinfo/libfabric-users


