[ewg] Re: OFED 1.4.2 requires rebuild of kernel modules on each node

Vladimir Sokolovsky vlad at dev.mellanox.co.il
Thu Oct 29 05:02:36 PDT 2009


Bryan wrote:
> On Sun, Oct 25, 2009 at 10:22 AM, Vladimir Sokolovsky
> <vlad at dev.mellanox.co.il> wrote:
>> Woodruff, Robert J wrote:
>>> Sending this to the EWG openfabrics list,
>>> since this seems to be an OFED build/installation issue
>>> rather than a general code problem.
>>>
>>> One thing you might try: rather than copying the
>>> entire build directory and re-running ./install.pl -c ofed.conf
>>> on each system, build on one node, then
>>> copy just the binary RPMS directory and the uninstall script to the other
>>> nodes.
>>> Then run the uninstall script and
>>> install the RPMs manually, e.g.,
>>> ./uninstall.sh
>>> cd ./RPMS/redhat-release-xxxx/x86_64
>>> rpm -i *
>>>
>>> This method has worked for me in the past.
>>> woody
>>>
>>>
>>> -----Original Message-----
>>> From: linux-rdma-owner at vger.kernel.org
>>> [mailto:linux-rdma-owner at vger.kernel.org] On Behalf Of Bryan
>>> Sent: Thursday, October 22, 2009 11:22 AM
>>> To: linux-rdma at vger.kernel.org
>>> Subject: OFED 1.4.2 requires rebuild of kernel modules on each node
>>>
>>> I was referred to this list by the general mailing list on OFED.
>>> Emailing from my personal address since Lotus Notes insists that
>>> anything it sends has to contain some portion of HTML.
>>>
>>> This problem was observed on Red Hat Enterprise Linux 5 update 3.  I
>>> searched the list but did not see anything immediately applicable.
>>> We've seen similar issues in the past, where we were able to modify the
>>> script to handle an RPM that didn't match the expected naming scheme,
>>> but nothing stood out when looking at the scripts for
>>> this version.
>>>
>>> Copied from an internal bug reporting tool:
>>>
>>> On installing OFED 1.4.2, the tarball was extracted; in the directory the code
>>> was extracted to, ./install.pl was run and all components of OFED were
>>> built/installed with the default settings.  Then this directory was
>>> copied to another node, and ./install.pl -c ofed.conf was run.  Previously
>>> this would just do the install of the already built components, but with
>>> OFED 1.4.2, the kernel RPM gets re-built when this is done.
>>>
>>> This means that the build tools have to be on each node, and that
>>> deployment of OFED takes longer.
>>>
>>> Bryan Reese -- breese at us.ibm.com
>>> e1350 Linux Cluster Test Engineer
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>> the body of a message to majordomo at vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>> Hi Bryan,
>> This can happen when some of the selected kernel components (in ofed.conf)
>> are not supported on the current kernel.
>> Have you seen the following message while running install.pl script?
>> "<package> is not available on this platform"
>>
>> Please send me your ofed.conf and the kernel version.
>>
>> Regards,
>> Vladimir
>>
>>
> 
> Hi Vladimir,
> 
> Sorry if a couple of people are getting this twice; I'm not used to using
> gmail with mailing lists.
> 
> We're running Red Hat Enterprise Linux 5 Update 3 with the x86_64 SMP
> kernel 2.6.18-128.el5.
> 
> The below is what we see.  There is no error.  It goes on like this
> afterwards with everything else getting installed with no build.
> 
> Build ofa_kernel RPM
> Running rpmbuild --rebuild  --define '_topdir /var/tmp/OFED_topdir'
> --define 'configure_options   --with-core-mod --with-user_mad-mod
> --with-user_access-mod --with-addr_trans-mod --with-mthca-mod
> --with-mlx4-mod --with-mlx4_en-mod --with-cxgb3-mod --with-nes-mod
> --with-ipath_inf-mod --with-ipoib-mod --with-sdp-mod
> --with-srp-mod --with-srp-target-mod --with-rds-mod --with-qlgc_vnic-mod
> --with-iser-mod --with-nfsrdma-mod' --define 'build_kernel_ib 1'
> --define 'build_kernel_ib_devel 1' --define 'KVERSION 2.6.18-128.el5'
> --define 'K_SRC /lib/modules/2.6.18-128.el5/build' --define
> 'network_dir /etc/sysconfig/network-scripts' --define
> '_prefix /usr' --define '__arch_install_post %{nil}'
> /cluster/software/OFED/OFED-1.4.2/SRPMS/ofa_kernel-1.4.2-ofed1.4.2.src.rpm
> Install kernel-ib RPM:
> Running rpm -iv
> /cluster/software/OFED/OFED-1.4.2/RPMS/redhat-release-5Server-5.3.0.3/x86_64/kernel-ib-1.4.2-2.6.18_128.el5.x86_64.rpm
> 
> kernel-ib=y
> kernel-ib-devel=y
> ib-bonding=y
> ib-bonding-debuginfo=y
> libibverbs=y
> libibverbs-devel=y
> libibverbs-devel-static=y
> libibverbs-utils=y
> libibverbs-debuginfo=y
> libmthca=y
> libmthca-devel-static=y
> libmthca-debuginfo=y
> libmlx4=y
> libmlx4-devel=y
> libmlx4-debuginfo=y
> libcxgb3=y
> libcxgb3-devel=y
> libcxgb3-debuginfo=y
> libnes=y
> libnes-devel-static=y
> libnes-debuginfo=y
> libipathverbs=y
> libipathverbs-devel=y
> libipathverbs-debuginfo=y
> libibcm=y
> libibcm-devel=y
> libibcm-debuginfo=y
> libibcommon=y
> libibcommon-devel=y
> libibcommon-static=y
> libibcommon-debuginfo=y
> libibumad=y
> libibumad-devel=y
> libibumad-static=y
> libibumad-debuginfo=y
> libibmad=y
> libibmad-devel=y
> libibmad-static=y
> libibmad-debuginfo=y
> ibsim=y
> ibsim-debuginfo=y
> librdmacm=y
> librdmacm-utils=y
> librdmacm-devel=y
> librdmacm-debuginfo=y
> libsdp=y
> libsdp-devel=y
> libsdp-debuginfo=y
> opensm=y
> opensm-libs=y
> opensm-devel=y
> opensm-debuginfo=y
> opensm-static=y
> compat-dapl=y
> compat-dapl-devel=y
> dapl=y
> dapl-devel=y
> dapl-devel-static=y
> dapl-utils=y
> dapl-debuginfo=y
> perftest=y
> mstflint=y
> tvflash=y
> qlvnictools=y
> sdpnetstat=y
> srptools=y
> rds-tools=y
> rnfs-utils=y
> ibutils=y
> infiniband-diags=y
> qperf=y
> qperf-debuginfo=y
> ofed-docs=y
> ofed-scripts=y
> tgt-generic=y
> mpi-selector=y
> mvapich_gcc=y
> mvapich2_gcc=y
> openmpi_gcc=y
> mpitests_mvapich_gcc=y
> mpitests_mvapich2_gcc=y
> mpitests_openmpi_gcc=y
> ibvexdmtools=y
> qlgc_vnic_daemon=y
> core=y
> mthca=y
> mlx4=y
> mlx4_en=y
> cxgb3=y
> nes=y
> ipath=y
> ipoib=y
> sdp=y
> srp=y
> srpt=y
> rds=y
> qlgc_vnic=y
> iser=y
> nfsrdma=y
> mvapich2_conf_impl=ofa
> mvapich2_conf_romio=1
> mvapich2_conf_shared_libs=1
> mvapich2_conf_ckpt=0
> mvapich2_conf_vcluster=small
> 

Hi Bryan,
Please try the following patch to the install.pl script:

diff --git a/install.pl b/install.pl
index d12733a..56ceaef 100755
--- a/install.pl
+++ b/install.pl
@@ -2452,6 +2452,10 @@ sub module_in_rpm
          return 1;
      }

+    if ($module eq "nfsrdma") {
+        $module = "xprtrdma";
+    }
+
      open(LIST, "rpm -qlp $package |") or die "Can't run 'rpm -qlp $package': $!\n";
      while (<LIST>) {
          if (/$module[a-z_]*.ko/) {
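
For anyone who wants to see what the check is doing before patching, here is a rough shell sketch of the same test. It assumes, as the patch implies, that the NFS/RDMA module in the kernel-ib RPM is named xprtrdma*.ko rather than nfsrdma*.ko, so the unpatched check finds no match for "nfsrdma" and the RPM is presumably treated as incomplete. The function name module_in_list is made up for illustration and is not part of install.pl.

```shell
#!/bin/sh
# Hypothetical helper, not part of OFED: mimics the pattern test in
# module_in_rpm. It reads a package file list on stdin (for example the
# output of `rpm -qlp <package>`) and succeeds if a matching .ko is listed.
module_in_list() {
    module=$1
    # "nfsrdma" is the component name used in ofed.conf; the patch
    # searches for xprtrdma instead.
    if [ "$module" = "nfsrdma" ]; then
        module="xprtrdma"
    fi
    # Same pattern as install.pl: the module name, an optional [a-z_]*
    # suffix, any one character (the unescaped dot), then "ko".
    grep -Eq -- "${module}[a-z_]*.ko"
}
```

It could be used along the lines of `rpm -qlp kernel-ib-1.4.2-2.6.18_128.el5.x86_64.rpm | module_in_list nfsrdma && echo found`, which should only print "found" if the package lists an xprtrdma module.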

Regards,
Vladimir


