[ewg] mlx4 and ibv_devinfo discrepancy?

Jack Morgenstein jackm at dev.mellanox.co.il
Wed Jul 8 02:48:59 PDT 2009


Pradeep,
There is no one-to-one connection between an InfiniBand port being active and
an IPoIB interface being up.

An InfiniBand port in the ACTIVE state means that its logical link is up and it can send and receive
packets on the wire.  For example, if you run the ibv_ud_pingpong example application,
you will see that it sends and receives packets.
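As an aside, the numbers ibv_devinfo prints in parentheses (e.g. "PORT_ACTIVE (4)", "2048 (4)") are the raw libibverbs enum values.  A minimal decoder sketch -- the mappings below are from the verbs.h enums as I recall them, so verify against your installed headers:

```python
# Sketch: decode the raw enum values that ibv_devinfo shows in parentheses.
# Mappings assumed from the libibverbs verbs.h enums, not taken from this
# thread -- double-check against your headers.

PORT_STATES = {
    0: "PORT_NOP",
    1: "PORT_DOWN",
    2: "PORT_INIT",
    3: "PORT_ARMED",
    4: "PORT_ACTIVE",
    5: "PORT_ACTIVE_DEFER",
}

def mtu_bytes(enum_val):
    """Translate an ibv_mtu enum value (IBV_MTU_256=1 ... IBV_MTU_4096=5)
    to the MTU in bytes: 128 << enum_val."""
    if not 1 <= enum_val <= 5:
        raise ValueError("unknown ibv_mtu enum: %d" % enum_val)
    return 128 << enum_val

if __name__ == "__main__":
    # The values reported for port 1 below: state 4, mtu enum 4.
    print(PORT_STATES[4], mtu_bytes(4))  # PORT_ACTIVE 2048
```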

On the other hand, ib0 up or down is an IPoIB concept.  ib0 may show "down" even if the InfiniBand
port is active.  For example, if the administrator runs "ifconfig ib0 down", you will see that
the ib0 operational state is "down" -- however, the underlying InfiniBand port remains active -- and,
for example, the various ibv_xx_pingpong example apps will still work.
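The two layers can be checked independently: the verbs port state is visible via ibv_devinfo or /sys/class/infiniband/<hca>/ports/<n>/state, and the IPoIB state via /sys/class/net/ibX/operstate.  A sketch of the decision logic above (the sysfs paths are standard; the helper itself is purely illustrative, not part of any tool):

```python
def diagnose(ib_port_state, ipoib_operstate):
    """Illustrative helper: combine the InfiniBand port state (from
    ibv_devinfo or /sys/class/infiniband/<hca>/ports/<n>/state) with
    the IPoIB operstate (/sys/class/net/ibX/operstate)."""
    active = "ACTIVE" in ib_port_state.upper()
    up = ipoib_operstate.strip().lower() == "up"
    if active and up:
        return "both layers healthy"
    if active and not up:
        # The verbs layer works (ibv_*_pingpong will run); look at the
        # IPoIB layer: admin "ifconfig down", failed multicast join, etc.
        return "IB link is fine; problem is at the IPoIB layer"
    return "IB logical link itself is not up"
```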

There is no problem with what you saw.

-Jack


On Tuesday 07 July 2009 19:48, Pradeep Satyanarayana wrote:
> I was attempting to debug an IPoIB "multicast join failed" issue and in the process
> discovered the discrepancy (was using OFED-1.4.1 on ppc64 blades) as described below.
> 
> My setup consisted of two nodes with dual port ConnectX HCAs with ports 1 on each node 
> connected to a switch say switch1 and ports 2 on each node connected to another switch, say switch2.
> 
> The problem was that ports 1 would not join the multicast group as shown
> 
> ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
> ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
> 
> If the same ports were connected to switch2, using the same cables, everything
> worked fine. The problem was due to an MTU mismatch, so IPoIB did behave as expected.
> 
> However, as shown below, the output of ibv_devinfo was misleading. This was the output when
> port 1 was connected to switch1 with the incorrect MTU.
> 
> [root at cluster-1 ~]# ibv_devinfo
> hca_id: mlx4_0
>         fw_ver:                         2.6.000
>         node_guid:                      0002:c903:0001:2058
>         sys_image_guid:                 0002:c903:0001:205b
>         vendor_id:                      0x02c9
>         vendor_part_id:                 25418
>         hw_ver:                         0xA0
>         board_id:                       IBM08A0000001
>         phys_port_cnt:                  2
>                 port:   1
>                         state:                  PORT_ACTIVE (4)
>                         max_mtu:                2048 (4)
>                         active_mtu:             2048 (4)
>                         sm_lid:                 1
>                         port_lid:               50
>                         port_lmc:               0x00
> 
>                 port:   2
>                         state:                  PORT_ACTIVE (4)
>                         max_mtu:                2048 (4)
>                         active_mtu:             2048 (4)
>                         sm_lid:                 1
>                         port_lid:               11
>                         port_lmc:               0x00
> 
> 
> Same issue with the other HCA too.
> 
> [root at cluster-2 ~]# ibv_devinfo
> hca_id: mlx4_0
>         fw_ver:                         2.6.000
>         node_guid:                      0002:c903:0001:21e4
>         sys_image_guid:                 0002:c903:0001:21e7
>         vendor_id:                      0x02c9
>         vendor_part_id:                 25418
>         hw_ver:                         0xA0
>         board_id:                       IBM08A0000001
>         phys_port_cnt:                  2
>                 port:   1
>                         state:                  PORT_ACTIVE (4)
>                         max_mtu:                2048 (4)
>                         active_mtu:             2048 (4)
>                         sm_lid:                 1
>                         port_lid:               51
>                         port_lmc:               0x00
> 
>                 port:   2
>                         state:                  PORT_ACTIVE (4)
>                         max_mtu:                2048 (4)
>                         active_mtu:             2048 (4)
>                         sm_lid:                 1
>                         port_lid:               66
>                         port_lmc:               0x00
> 
> [root at cluster-2 ~]#
> 
> 
> "cat /sys/class/net/ib0/operstate" showed "down", which clued me in that something was amiss;
> as the output below shows, the link was indeed down.
> 
> [root at cluster-1 ~]# ip link show dev ib0
> 3: ib0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 65520 qdisc pfifo_fast qlen 256
>     link/infiniband 80:00:00:48:fe:80:00:00:00:00:00:00:00:02:c9:03:00:01:20:59 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
> [root at cluster-1 ~]# ip link show dev ib1
> 4: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast qlen 256
>     link/infiniband 80:00:00:49:fe:80:00:00:00:00:00:00:00:02:c9:03:00:01:20:5a brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
> [root at cluster-1 ~]#
> 
> [root at cluster-2 ~]# ip link show dev ib0
> 3: ib0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 65520 qdisc pfifo_fast qlen 256
>     link/infiniband 80:00:00:48:fe:80:00:00:00:00:00:00:00:02:c9:03:00:01:21:e5 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
> [root at cluster-2 ~]# ip link show dev ib1
> 4: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast qlen 256
>     link/infiniband 80:00:00:49:fe:80:00:00:00:00:00:00:00:02:c9:03:00:01:21:e6 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
> [root at cluster-2 ~]#
> 
> 
> 
> Why does ibv_devinfo show port 1 as PORT_ACTIVE? Isn't that incorrect? Is this a known problem?
> 
> Pradeep
> 
> _______________________________________________
> ewg mailing list
> ewg at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
> 


