[Users] Compatibility problems between OFED 1.5.3 and OFED 2.2 ?

Txema Heredia txema.llistes at gmail.com
Thu Jul 3 04:03:04 PDT 2014


Hi again Sébastien,

>> So, it seems that, on mellanox-ofed-1.5.3, all those 4 parameters (
>> pfctx, pfcrx, log_num_mtt and log_mtts_per_seg ) were on mlx4_core.
>> But in mellanox-ofed-2.2, log_num_mtt and log_mtts_per_seg stayed in
>> mlx4_core while pfctx and pfcrx moved to mlx4_en.
>>
>> Yes, we were told to add those 2 parameters (log_num_mtt and
>> log_mtts_per_seg) to allow GPFS to use up to 6GB of RAM as cache. The
>> other 2 (pfctx and pfcrx) were set by default in the modprobe.d file. It
>> seems that log_num_mtt still exists in mellanox-2.2.
>    First thing why do you bother with mlx4_en if you're running an IB cluster?
> Just don't load that module.

I have been experimenting with reinstalling the nodes, but I cannot
find out why mlx4_en is loaded. In fact, the mlx5_* modules are
also loaded, even though /etc/modprobe.d/mlnx.conf blacklists them all.

# lsmod | grep "ib\|ml"
ib_ucm                 12120  0
ib_ipoib              117911  0
ib_cm                  42214  3 ib_ucm,rdma_cm,ib_ipoib
ib_uverbs              49621  2 rdma_ucm,ib_ucm
ib_umad                12562  0
mlx5_ib               112262  0
mlx5_core              86190  1 mlx5_ib
mlx4_en               132913  0
ptp                     9614  1 mlx4_en
mlx4_ib               170553  0
ib_sa                  36465  5 rdma_ucm,rdma_cm,ib_ipoib,ib_cm,mlx4_ib
ib_mad                 43273  4 ib_cm,ib_umad,mlx4_ib,ib_sa
ib_core               112681  12 rdma_ucm,ib_ucm,rdma_cm,iw_cm,ib_ipoib,ib_cm,ib_uverbs,ib_umad,mlx5_ib,mlx4_ib,ib_sa,ib_mad
ib_addr                 7796  3 rdma_cm,ib_uverbs,ib_core
ipv6                  317829  83 ib_ipoib,mlx4_ib,ib_addr
mlx4_core             288295  2 mlx4_en,mlx4_ib
compat                 21768  17 rdma_ucm,ib_ucm,rdma_cm,iw_cm,ib_ipoib,ib_cm,ib_uverbs,ib_umad,mlx5_ib,mlx5_core,mlx4_en,mlx4_ib,ib_sa,ib_mad,ib_core,ib_addr,mlx4_core
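
One way to cross-check this is to compare the names on the blacklist
lines with what is actually loaded. Per modprobe.d(5), a blacklist entry
only makes modprobe ignore the module's internal aliases; it does not stop
anything from loading the module explicitly by name, so an init script
could still be pulling it in (that is a guess, not something verified
here). A minimal sketch follows; the helper name blacklisted_but_loaded
and the saved-output file are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list modules that are both blacklisted and loaded.
#   $1: a modprobe.d-style config file
#   $2: a file holding saved `lsmod` output
blacklisted_but_loaded() {
    bl=$(mktemp) || return 1
    loaded=$(mktemp) || return 1
    # Collect module names from "blacklist <name>" lines.
    awk '$1 == "blacklist" { print $2 }' "$1" | sort -u > "$bl"
    # The first column of lsmod output is the module name; drop the header.
    awk '$1 != "Module" { print $1 }' "$2" | sort -u > "$loaded"
    # Print names that appear in both sorted lists.
    comm -12 "$bl" "$loaded"
    rm -f "$bl" "$loaded"
}

# Intended use (the temporary file name is a placeholder):
#   lsmod > /tmp/lsmod.txt
#   blacklisted_but_loaded /etc/modprobe.d/mlnx.conf /tmp/lsmod.txt
```

Any name printed is blacklisted yet loaded, which points at an explicit
load by name or a dependency rather than alias-based autoloading.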

>> Could this explain the mac address issue??
>    No, this renaming is needed for kernel after 2.6.32 to avoid spamming
> the logs with:
>
>     "Loading kernel module for a network device with CAP_SYS_MODULE (deprecated).
>      Use CAP_NET_ADMIN and alias netdev-ib0 instead"
>
> entries, and has nothing to do with the address change.
>
>    After looking at the ipoib module sources from MLNX OFED 2.2, it looks like
> flags bit 2 from the HW address means that the ipoib interface supports TSS.
>
>    The difference in the QP number part of the HW address possibly means that more
> QPs have been reserved for the driver's use, and the first client (ipoib) QP number
> gets shifted.
>
>

My problem with the MAC address change (which leaves the ib0 interface
uninitialized at startup) seems to be caused by the Mellanox installer.
The Mellanox OFED 2.2 installer removes the rdma-3.10 rpm. That rpm
contains the ifup-ib/ifdown-ib scripts described here (
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/5.9_Technical_Notes/OFED.html
), which can start and stop the InfiniBand interface regardless of MAC
address changes.

Before installation:
# rpm -qa | grep -i rdma
librdmacm-1.0.17-1.el6.x86_64
rdma-3.10-3.el6.noarch

After installation:
# rpm -qa | grep -i rdma
librdmacm-1.0.17.2mlnx3-OFED.2.2.117.g81abe68.x86_64
librdmacm-utils-1.0.17.2mlnx3-OFED.2.2.117.g81abe68.x86_64
librdmacm-devel-1.0.17.2mlnx3-OFED.2.2.117.g81abe68.x86_64


I have modified my kickstart image so that all nodes copy these 2 scripts
after installing the InfiniBand OFED, and it seems to work fine. The MAC
address is still misreported, but ib0 is up at startup.
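
As a rough sketch of that copy step: the helper below copies the two
scripts from one directory to another and fails loudly if either is
missing. The helper name copy_ib_scripts and the backup directory are
invented for illustration, and the assumption that the scripts live in
/etc/sysconfig/network-scripts should be checked against the rdma rpm
(for example with rpm -ql rdma) before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: copy ifup-ib and ifdown-ib from directory $1 to
# directory $2, preserving mode and timestamps.
copy_ib_scripts() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    for s in ifup-ib ifdown-ib; do
        if [ ! -f "$src/$s" ]; then
            echo "missing: $src/$s" >&2
            return 1
        fi
        cp -p "$src/$s" "$dst/$s" || return 1
    done
}

# Intended use (paths are assumptions, see above):
#   before the installer:
#     copy_ib_scripts /etc/sysconfig/network-scripts /root/ib-scripts
#   after the installer:
#     copy_ib_scripts /root/ib-scripts /etc/sysconfig/network-scripts
```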


Given all the problems I am having, would you recommend updating
all my InfiniBand GPFS nodes to Mellanox OFED 2.2? Or should I keep a
hybrid Mellanox 1.5.3 / Mellanox 2.2 environment? Or Mellanox 1.5.3
/ community?

Thanks,

Txema



