[Users] ibacm?

Weiny, Ira ira.weiny at intel.com
Tue Oct 24 13:38:11 PDT 2017


We probably need to fall back to an AF_INET check.  I did not realize that disabling IPv6 would cause this to fail, because AF_INET6 usually “covers” AF_INET.  My guess is that excluding IPv6 support from the kernel would explain the failure.

Perhaps you could try a patch which falls back to AF_INET?
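
Something along these lines is what I have in mind.  This is only a sketch and not the actual ibacm change: the helper name open_ifconf_socket() is made up for illustration, and it assumes that socket() simply fails for AF_INET6 when the kernel has no IPv6 support.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not taken from the ibacm source: try AF_INET6 first
 * and fall back to AF_INET when the IPv6 socket cannot be created.
 * On success the chosen family is stored in *family and the fd is returned;
 * on failure -1 is returned with errno set by the last socket() call. */
static int open_ifconf_socket(int *family)
{
    int s = socket(AF_INET6, SOCK_DGRAM, 0);

    if (s >= 0) {
        *family = AF_INET6;
        return s;
    }

    fprintf(stderr, "AF_INET6 socket failed (%s), trying AF_INET\n",
            strerror(errno));

    s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s >= 0)
        *family = AF_INET;
    return s;
}
```

The caller would then issue the interface ioctl on the returned descriptor and close() it afterwards.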

Ira

From: Users [mailto:users-bounces at lists.openfabrics.org] On Behalf Of Hal Rosenstock
Sent: Tuesday, October 24, 2017 8:59 AM
To: Michael Di Domenico <mdidomenico4 at gmail.com>
Cc: users at lists.openfabrics.org
Subject: Re: [Users] ibacm?

This makes more sense ;-)

What were the steps used to disable ipv6 ?

On Tue, Oct 24, 2017 at 11:16 AM, Michael Di Domenico <mdidomenico4 at gmail.com> wrote:
it looks like i found the culprit

when running ibacm out of the box on rhel 7.4 the ibacm.log shows

acm_if_iter_sys: ioctl ifconf error -1

if i change the line

s = socket(AF_INET6, SOCK_DGRAM, 0);

to

s = socket(AF_INET, SOCK_DGRAM, 0);

as Hal suggested and start the ibacm daemon, i see acm correctly bind to the
ipoib addresses and interfaces.  my initial report of this change not
being effective was a miscommunication, in that i thought it related
to the client and not the service process

i can then run

ib_acme -f i -s 172.22.64.96 -d 172.22.64.96 -S 172.22.64.96 -v -V

and get back valid data

ib_acme -d <hostname> still doesn't work, but that might be internal:
we don't currently have reverse/forward entries for our ipoib
interfaces.  i'm still looking into it
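
one way to check from the node itself whether a forward or reverse entry exists is the generic resolver sketch below.  it is not what ib_acme does internally; the function names forward_lookup() and reverse_lookup() are made up for the example.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Resolve name to its first IPv4 address and write it to addr as text.
 * Returns 0 on success or a getaddrinfo()/getnameinfo() error code. */
int forward_lookup(const char *name, char *addr, size_t addrlen)
{
    struct addrinfo hints, *res;
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_DGRAM;

    rc = getaddrinfo(name, NULL, &hints, &res);
    if (rc)
        return rc;

    rc = getnameinfo(res->ai_addr, res->ai_addrlen, addr, (socklen_t)addrlen,
                     NULL, 0, NI_NUMERICHOST);
    freeaddrinfo(res);
    return rc;
}

/* Resolve a dotted-quad IPv4 address back to a host name.
 * NI_NAMEREQD makes a missing reverse entry an error (EAI_NONAME)
 * instead of silently returning the numeric address. */
int reverse_lookup(const char *addr, char *name, size_t namelen)
{
    struct sockaddr_in sin;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    if (inet_pton(AF_INET, addr, &sin.sin_addr) != 1)
        return EAI_NONAME;

    return getnameinfo((struct sockaddr *)&sin, sizeof(sin), name,
                       (socklen_t)namelen, NULL, 0, NI_NAMEREQD);
}
```

calling forward_lookup() on the hostname and then reverse_lookup() on the ipoib address, and passing any non-zero result to gai_strerror(), should show which direction is missing.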



On Mon, Oct 16, 2017 at 1:54 PM, Michael Di Domenico
<mdidomenico4 at gmail.com> wrote:
> it's come to my attention that ibacm might not be working correctly on
> my cluster, but i'm unable to determine why ibacm is failing
>
> here's what i did
>
> ib_acme -A -O
> systemctl restart ibacm
>
> in the /var/log/ibacm file i see
>
> acm_if_iter_sys: ioctl ifconf error -1
> acmp_join_group: qib0 1 pkey 0xffff, sl 0x0, rate 0x3, mtu 0x4
> acm_server: started
>
> but when i try to query the node locally
>
> ib_acme -d node001 -v -V
>
> in the log file i see
>
> acm_svr_resolve_dest: notice - unknown local end point address
>
> on the console i see
>
> Service: localhost
> Destination: 172.21.80.1
> ib_acm_resolve_ip failed: cannot assign requested address
> SA verification: failed cannot assign requested address
>
> the ibacm_addr.cfg contains
> node001 qib0 1 default
> node001-1 qib0 1 default
>
> all the nodes in the cluster are configured the exact same way and
> produce the same result when trying to query locally or a remote node
>
> any thoughts?