[Users] Compatibility problems between OFED 1.5.3 and OFED 2.2 ?
Txema Heredia
txema.llistes at gmail.com
Thu Jul 3 04:03:04 PDT 2014
Hi again Sébastien,
>> So, it seems that, on mellanox-ofed-1.5.3, all those 4 parameters (
>> pfctx, pfcrx, log_num_mtt and log_mtts_per_seg ) were on mlx4_core.
>> But in mellanox-ofed-2.2, log_num_mtt and log_mtts_per_seg stayed in
>> mlx4_core while pfctx and pfcrx moved to mlx4_en.
>>
>> Yes, we were told to add those 2 parameters (log_num_mtt and
>> log_mtts_per_seg) to allow GPFS to use up to 6GB of RAM as cache. The
>> other 2 (pfctx and pfcrx) were set by default in the modprobe.d file. It
>> seems that log_num_mtt still exists in mellanox-2.2.
> First thing why do you bother with mlx4_en if you're running an IB cluster?
> Just don't load that module.
I have been reinstalling the nodes and experimenting, but I cannot
find out why mlx4_en is loaded. In fact, the mlx5_* modules are
also loaded, even though /etc/modprobe.d/mlnx.conf blacklists them all.
# lsmod | grep "ib\|ml"
ib_ucm 12120 0
ib_ipoib 117911 0
ib_cm 42214 3 ib_ucm,rdma_cm,ib_ipoib
ib_uverbs 49621 2 rdma_ucm,ib_ucm
ib_umad 12562 0
mlx5_ib 112262 0
mlx5_core 86190 1 mlx5_ib
mlx4_en 132913 0
ptp 9614 1 mlx4_en
mlx4_ib 170553 0
ib_sa 36465 5 rdma_ucm,rdma_cm,ib_ipoib,ib_cm,mlx4_ib
ib_mad 43273 4 ib_cm,ib_umad,mlx4_ib,ib_sa
ib_core 112681 12 rdma_ucm,ib_ucm,rdma_cm,iw_cm,ib_ipoib,ib_cm,ib_uverbs,ib_umad,mlx5_ib,mlx4_ib,ib_sa,ib_mad
ib_addr 7796 3 rdma_cm,ib_uverbs,ib_core
ipv6 317829 83 ib_ipoib,mlx4_ib,ib_addr
mlx4_core 288295 2 mlx4_en,mlx4_ib
compat 21768 17 rdma_ucm,ib_ucm,rdma_cm,iw_cm,ib_ipoib,ib_cm,ib_uverbs,ib_umad,mlx5_ib,mlx5_core,mlx4_en,mlx4_ib,ib_sa,ib_mad,ib_core,ib_addr,mlx4_core
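A rough way to list exactly which loaded modules are also blacklisted is sketched below. As far as I understand modprobe.d(5), a "blacklist" line only stops a module from being autoloaded through its aliases; a module requested by name (for example by an init script) or pulled in as a dependency is still loaded, which may be what is happening here. The function name loaded_but_blacklisted and the file names in the usage note are made up for illustration.

```shell
# Print every module that appears both in saved `lsmod` output and in a
# "blacklist" line of the given modprobe.d files.
#   $1      file holding the output of `lsmod`
#   $2...   modprobe.d configuration files to scan
loaded_but_blacklisted() {
    lsmod_file=$1
    shift
    awk '
        # First file: remember the first column (module name) of each row.
        FNR == NR { if (NF >= 1) loaded[$1] = 1; next }
        # Remaining files: report blacklisted names that are loaded.
        NF >= 2 && $1 == "blacklist" && ($2 in loaded) { print $2 }
    ' "$lsmod_file" "$@" | sort -u
}
```

Usage would be something like "lsmod > /tmp/lsmod.txt" followed by "loaded_but_blacklisted /tmp/lsmod.txt /etc/modprobe.d/*.conf".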
>> Could this explain the mac address issue??
> No, this renaming is needed for kernel after 2.6.32 to avoid spamming
> the logs with:
>
> "Loading kernel module for a network device with CAP_SYS_MODULE (deprecated).
> Use CAP_NET_ADMIN and alias netdev-ib0 instead"
>
> entries, and has nothing to do with the address change.
>
> After looking at the ipoib module sources from MLNX OFED 2.2, it looks like
> flags bit 2 from the HW address means that the ipoib interface supports TSS.
>
> The difference in the QP number part of the HW address possibly means that more
> QPs have been reserved for the driver's use, and the first client (ipoib) QP number
> gets shifted.
>
>
It seems that my problem with the MAC address change (which leaves ib0
uninitialized at startup) is caused by the mellanox installer.
The mellanox ofed-2.2 installer removes the rdma-3.10 rpm. That rpm
contains the ifup-ib and ifdown-ib scripts described here (
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/5.9_Technical_Notes/OFED.html
), which can start and stop the infiniband interface regardless of MAC
address changes.
Before installation:
# rpm -qa | grep -i rdma
librdmacm-1.0.17-1.el6.x86_64
rdma-3.10-3.el6.noarch
After installation:
# rpm -qa | grep -i rdma
librdmacm-1.0.17.2mlnx3-OFED.2.2.117.g81abe68.x86_64
librdmacm-utils-1.0.17.2mlnx3-OFED.2.2.117.g81abe68.x86_64
librdmacm-devel-1.0.17.2mlnx3-OFED.2.2.117.g81abe68.x86_64
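To catch this kind of silent removal on other nodes, the package names can be snapshotted before and after the installer runs and then compared. This is only a sketch: the function name removed_packages and the file names before.txt / after.txt are placeholders, and the lists are assumed to hold one package name per line, e.g. from rpm -qa --qf '%{NAME}\n'.

```shell
# Print package names present in the "before" list but missing from the
# "after" list.
#   $1  file with one package name per line, taken before the install
#   $2  same, taken after the install
removed_packages() {
    before_sorted=$(mktemp)
    after_sorted=$(mktemp)
    # comm needs both inputs sorted; -u also drops duplicate names.
    sort -u "$1" > "$before_sorted"
    sort -u "$2" > "$after_sorted"
    # -23 hides lines unique to the second file and lines common to both.
    comm -23 "$before_sorted" "$after_sorted"
    rm -f "$before_sorted" "$after_sorted"
}
```

With the two listings above reduced to package names, "removed_packages before.txt after.txt" should print only rdma.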
I have modified my kickstart image so all nodes copy these 2 scripts
after installing the infiniband ofed, and it seems to work fine. The mac
address is still misreported, but ib0 is up at startup.
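The copy step in the kickstart post-install looks roughly like the sketch below. The function name restore_ib_scripts is made up, the source directory is wherever the two scripts were saved from the rdma rpm, and the default destination /etc/sysconfig/network-scripts is an assumption about where the network scripts live on these nodes.

```shell
# Copy the ifup-ib and ifdown-ib scripts back into place after the OFED
# install has removed the rdma rpm.
#   $1  directory holding saved copies of the two scripts
#   $2  destination directory (assumed default below)
restore_ib_scripts() {
    src_dir=$1
    dest_dir=${2:-/etc/sysconfig/network-scripts}
    for script in ifup-ib ifdown-ib; do
        # Refuse to continue if a saved copy is missing.
        if [ ! -f "$src_dir/$script" ]; then
            echo "missing $src_dir/$script" >&2
            return 1
        fi
        # install copies the file and sets it executable in one step.
        install -m 0755 "$src_dir/$script" "$dest_dir/$script" || return 1
    done
}
```

It has to run after the OFED installer, since the installer is what removes the rpm that owns the scripts.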
Given all these problems, would you recommend updating
all my infiniband GPFS nodes to mellanox ofed-2.2? Or should I keep a
hybrid mellanox-1.5.3 / mellanox-2.2 environment? Or mellanox-1.5.3
/ community?
Thanks,
Txema