[libfabric-users] Can only use one NIC port in libfabric 1.6.1

Jörn Schumacher jorn.schumacher at cern.ch
Wed Sep 5 04:30:31 PDT 2018


Hi,

In addition to my last message:

- I cross-checked with libfabric 1.6.0 since you reported not seeing the 
issue with this version. Unfortunately we still see the same issue.

- I cross-checked on a different system to exclude issues with the NIC, 
same result.

Cheers,
Jörn


On 09/05/2018 12:12 PM, Jörn Schumacher wrote:
> Hi Arun,
>
> Sorry for the late reply. Our servers got updated and I was without 
> PCs to test for a while.
>
> I put together a minimal test program that demonstrates the issue: 
> https://gitlab.cern.ch/joschuma/libfabric-debug (let me know if there 
> are issues with the access)
>
> The issue occurs even on a single host. IP configuration:
>
> eth2: 192.168.1.17/24
> eth3: 192.168.2.17/24
>
> In one terminal ./listener will listen on 0.0.0.0:12345 and print if a 
> CONNREQ occurs.
>
> In the other terminal:
>
> (1) ./connect 192.168.1.17 12345
> (2) ./connect 192.168.2.17 12345
>
> (1) will generate no event in the listener program. (2) yields a 
> CONNREQ event in the listener program. This happens with libfabric 
> 1.6.1 and the verbs provider.
>
> With the rdma_server/rdma_client tools I am able to create a 
> connection using both IP addresses. So I suspect a bug in libfabric.
>
> Let me know if you need any more info, I am happy to provide any help 
> you might need.
>
>
> Thanks a lot.
>
>
> Cheers,
>
> Jörn
>
>
>
> On 08/25/2018 12:00 AM, Ilango, Arun wrote:
>> Hi Jörn,
>>
>> ibv_devinfo shows different NIC ports as different devices as expected.
>>
>> To listen on multiple NIC ports, you just need one fabric and a 
>> passive endpoint listening on the wildcard address (0.0.0.0). That 
>> should work. I tried the same on a multi-port iWARP NIC and it was
>> working for me. This is on v1.6.0 and master.
>>
>> You can try initiating a connection request only from the second port 
>> to check if that works.
>>
>> Thanks,
>> Arun.
>>
>> -----Original Message-----
>> From: Jörn Schumacher [mailto:jorn.schumacher at cern.ch]
>> Sent: Thursday, August 23, 2018 2:27 AM
>> To: Ilango, Arun <arun.ilango at intel.com>; 
>> libfabric-users at lists.openfabrics.org
>> Subject: Re: [libfabric-users] Can only use one NIC port in libfabric 
>> 1.6.1
>>
>> Hi Arun,
>>
>> Thanks for your reply.
>>
>> ibv_devinfo: 
>> https://gist.github.com/joerns/cb7d216b0c3a71b5ea327d0292459211
>>
>> Looking at my code, I realize the issue actually occurs before even 
>> setting up the fi_domain object. I posted my (stripped-down) 
>> initialization procedure in the other file in the gist.
>>
>> In case I want to listen on multiple ports, do I need multiple 
>> fi_fabric objects? Or multiple endpoints? Or should I be able to 
>> listen on multiple interfaces with "0.0.0.0" like I am doing?
>>
>> Thanks,
>> Jörn
>>
>>
>>
>> On 08/22/2018 07:46 PM, Ilango, Arun wrote:
>>> Hi Jörn,
>>>
>>> The verbs provider assigns a separate domain to each device obtained 
>>> from rdma_get_devices(). So if the NIC ports show up as separate 
>>> devices, they belong to separate domains. This was already the case 
>>> in 1.4.
>>>
>>> Can you check the output of ibv_devinfo? How do the ports show up 
>>> there?
>>>
>>> Thanks,
>>> Arun.
>>>
>>> -----Original Message-----
>>> From: Libfabric-users
>>> [mailto:libfabric-users-bounces at lists.openfabrics.org] On Behalf Of
>>> Jörn Schumacher
>>> Sent: Tuesday, August 21, 2018 2:17 AM
>>> To: libfabric-users at lists.openfabrics.org
>>> Subject: [libfabric-users] Can only use one NIC port in libfabric
>>> 1.6.1
>>>
>>> Dear libfabric developers,
>>>
>>> I recently updated to libfabric 1.6.1 (from 1.4). It looks like in 
>>> this release we can only use one port of our NIC (Mellanox ConnectX-5 
>>> with RoCE).
>>>
>>> On the receiving side we listen for an RC connection. We monitor the
>>> event queue with a file descriptor + epoll. On one port of the NIC this
>>> works fine, but if the request comes in on the second port (on a
>>> different IP subnet) this fails: we get an epoll notification, but then
>>> the subsequent fi_eq_sread(...) call yields FI_EAGAIN.
>>>
>>> I open a single domain. This worked fine in the earlier libfabric.
>>> From the documentation, I understand that a domain is tied 
>>> to a port. Does this mean I need to open multiple domains?
>>>
>>>
>>> Thanks and best regards,
>>> Jörn
>>> _______________________________________________
>>> Libfabric-users mailing list
>>> Libfabric-users at lists.openfabrics.org
>>> https://lists.openfabrics.org/mailman/listinfo/libfabric-users