[ewg] Re: OFED 1.4.2 requires rebuild of kernel modules on each node

Bryan bryan.mreese at gmail.com
Wed Oct 28 06:52:30 PDT 2009


On Sun, Oct 25, 2009 at 10:22 AM, Vladimir Sokolovsky
<vlad at dev.mellanox.co.il> wrote:
> Woodruff, Robert J wrote:
>>
>> Sending this to the EWG openfabrics list,
>> since this seems to be an OFED build/installation issue
>> rather than a general code problem.
>>
>> One thing that you might try, instead of copying the
>> entire build directory and re-running ./install.pl -c ofed.conf
>> on each system, is to build on one node and then
>> just copy the binary RPMS directory and the uninstall script to the other
>> nodes.
>> Then just run the uninstall script and
>> install the RPMS manually, e.g.,
>> ./uninstall.sh
>> cd ./RPMS/redhat-release-xxxx/x86_64
>> rpm -i *
>>
>> This method has worked for me in the past.
>> woody
>>
>>
>> -----Original Message-----
>> From: linux-rdma-owner at vger.kernel.org
>> [mailto:linux-rdma-owner at vger.kernel.org] On Behalf Of Bryan
>> Sent: Thursday, October 22, 2009 11:22 AM
>> To: linux-rdma at vger.kernel.org
>> Subject: OFED 1.4.2 requires rebuild of kernel modules on each node
>>
>> I was referred to this list by the general mailing list on OFED.
>> Emailing from my personal address since Lotus Notes insists that
>> anything it sends has to contain some portion of HTML.
>>
>> This problem was observed on Red Hat Enterprise Linux 5 update 3.  I
>> searched the list but did not see anything immediately applicable.
>> We've seen similar issues in the past, where we were able to modify the
>> script to handle an RPM that didn't match the expected naming scheme,
>> but nothing stood out when looking at the scripts for
>> this version.
>>
>> Copied from an internal bug reporting tool:
>>
>> On installing OFED 1.4.2, the tarball was extracted; in the directory the code
>> was extracted to, ./install.pl was run and all components of OFED were
>> built/installed with the default settings.  Then this directory was
>> copied to another node, and ./install.pl -c ofed.conf was run.  Previously
>> this would just install the already-built components, but with
>> OFED 1.4.2, the kernel RPM gets rebuilt when this is done.
>>
>> This means that the build tools have to be on each node, and that
>> deployment of OFED takes longer.
>>
>> Bryan Reese -- breese at us.ibm.com
>> e1350 Linux Cluster Test Engineer
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo at vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
> Hi Bryan,
> This can happen when some of the selected kernel components (in ofed.conf)
> are not supported on the current kernel.
> Have you seen the following message while running install.pl script?
> "<package> is not available on this platform"
>
> Send me please your ofed.conf and the kernel version.
>
> Regards,
> Vladimir
>
>

Hi Vladimir,

Sorry if a couple of people are getting this twice; I'm not used to using
Gmail with mailing lists.

We're running Red Hat Enterprise Linux 5 Update 3 with the x86_64 SMP
kernel 2.6.18-128.el5.

Below is what we see.  There is no error.  After this, everything else
gets installed without being rebuilt.  Our ofed.conf follows the output.

Build ofa_kernel RPM
Running rpmbuild --rebuild  --define '_topdir /var/tmp/OFED_topdir'
--define 'configure_options   --with-core-mod --with-user_mad-mod
--with-user_access-mod --with-addr_trans-mod --with-mthca-mod
--with-mlx4-mod --with-mlx4_en-mod --with-cxgb3-mod --with-nes-mod
--with-ipath_inf-mod --with-ipoib-mod --with-sdp-mod
--with-srp-mod --with-srp-target-mod --with-rds-mod --with-qlgc_vnic-mod
--with-iser-mod --with-nfsrdma-mod' --define 'build_kernel_ib 1'
--define 'build_kernel_ib_devel 1' --define 'KVERSION 2.6.18-128.el5'
--define 'K_SRC /lib/modules/2.6.18-128.el5/build' --define
'network_dir /etc/sysconfig/network-scripts' --define
'_prefix /usr' --define '__arch_install_post %{nil}'
/cluster/software/OFED/OFED-1.4.2/SRPMS/ofa_kernel-1.4.2-ofed1.4.2.src.rpm
Install kernel-ib RPM:
Running rpm -iv
/cluster/software/OFED/OFED-1.4.2/RPMS/redhat-release-5Server-5.3.0.3/x86_64/kernel-ib-1.4.2-2.6.18_128.el5.x86_64.rpm
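
One quick diagnostic is to check, before running install.pl on a second node, whether the copied tree already contains a kernel-ib RPM named for that node's running kernel. The sketch below is only a guess at the naming rule, inferred from the single file name in the output above (kernel version with '-' turned into '_'); it does not claim to reproduce what install.pl itself checks. The function names kernel_ib_rpm_name and have_prebuilt_kernel_ib are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; the naming rule is an assumption inferred from one
# log line (kernel-ib-1.4.2-2.6.18_128.el5.x86_64.rpm), not from install.pl.

# Print the kernel-ib RPM file name expected for a given kernel version.
kernel_ib_rpm_name() {
    ofed_ver="$1"   # e.g. 1.4.2
    kver="$2"       # e.g. 2.6.18-128.el5 (as printed by uname -r)
    arch="$3"       # e.g. x86_64
    # Assumed rule: dashes in the kernel version become underscores.
    kver_us="$(printf '%s\n' "$kver" | sed 's/-/_/g')"
    printf 'kernel-ib-%s-%s.%s.rpm\n' "$ofed_ver" "$kver_us" "$arch"
}

# Succeed if that RPM is already present in the given RPMS directory.
have_prebuilt_kernel_ib() {
    rpms_dir="$1"
    [ -f "$rpms_dir/$(kernel_ib_rpm_name "$2" "$3" "$4")" ]
}
```

On a node like the one in this report, the call would look like: have_prebuilt_kernel_ib /cluster/software/OFED/OFED-1.4.2/RPMS/redhat-release-5Server-5.3.0.3/x86_64 1.4.2 "$(uname -r)" "$(uname -m)". If it fails on the second node before install.pl is run, the copy did not carry an RPM with the expected name for that kernel.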

kernel-ib=y
kernel-ib-devel=y
ib-bonding=y
ib-bonding-debuginfo=y
libibverbs=y
libibverbs-devel=y
libibverbs-devel-static=y
libibverbs-utils=y
libibverbs-debuginfo=y
libmthca=y
libmthca-devel-static=y
libmthca-debuginfo=y
libmlx4=y
libmlx4-devel=y
libmlx4-debuginfo=y
libcxgb3=y
libcxgb3-devel=y
libcxgb3-debuginfo=y
libnes=y
libnes-devel-static=y
libnes-debuginfo=y
libipathverbs=y
libipathverbs-devel=y
libipathverbs-debuginfo=y
libibcm=y
libibcm-devel=y
libibcm-debuginfo=y
libibcommon=y
libibcommon-devel=y
libibcommon-static=y
libibcommon-debuginfo=y
libibumad=y
libibumad-devel=y
libibumad-static=y
libibumad-debuginfo=y
libibmad=y
libibmad-devel=y
libibmad-static=y
libibmad-debuginfo=y
ibsim=y
ibsim-debuginfo=y
librdmacm=y
librdmacm-utils=y
librdmacm-devel=y
librdmacm-debuginfo=y
libsdp=y
libsdp-devel=y
libsdp-debuginfo=y
opensm=y
opensm-libs=y
opensm-devel=y
opensm-debuginfo=y
opensm-static=y
compat-dapl=y
compat-dapl-devel=y
dapl=y
dapl-devel=y
dapl-devel-static=y
dapl-utils=y
dapl-debuginfo=y
perftest=y
mstflint=y
tvflash=y
qlvnictools=y
sdpnetstat=y
srptools=y
rds-tools=y
rnfs-utils=y
ibutils=y
infiniband-diags=y
qperf=y
qperf-debuginfo=y
ofed-docs=y
ofed-scripts=y
tgt-generic=y
mpi-selector=y
mvapich_gcc=y
mvapich2_gcc=y
openmpi_gcc=y
mpitests_mvapich_gcc=y
mpitests_mvapich2_gcc=y
mpitests_openmpi_gcc=y
ibvexdmtools=y
qlgc_vnic_daemon=y
core=y
mthca=y
mlx4=y
mlx4_en=y
cxgb3=y
nes=y
ipath=y
ipoib=y
sdp=y
srp=y
srpt=y
rds=y
qlgc_vnic=y
iser=y
nfsrdma=y
mvapich2_conf_impl=ofa
mvapich2_conf_romio=1
mvapich2_conf_shared_libs=1
mvapich2_conf_ckpt=0
mvapich2_conf_vcluster=small
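
For reference, the RPM-copy approach Woodruff suggested earlier in the thread could be wrapped roughly as follows. This is a minimal sketch, assuming the uninstall script and the RPMS tree from the build node are already present under one directory on the target node; plan_install is a hypothetical name, and the function only prints the commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Sketch only: prints the per-node steps instead of executing them.
# Assumption: $1 holds uninstall.sh and the RPMS/ tree copied from the
# node where OFED was built; plan_install is a made-up helper name.

plan_install() {
    ofed_dir="$1"     # e.g. /cluster/software/OFED/OFED-1.4.2
    rpm_subdir="$2"   # e.g. RPMS/redhat-release-5Server-5.3.0.3/x86_64
    # Step 1: remove any previously installed OFED packages.
    echo "cd '$ofed_dir' && ./uninstall.sh"
    # Step 2: install the prebuilt binary RPMs without rebuilding.
    echo "cd '$ofed_dir/$rpm_subdir' && rpm -i *.rpm"
}
```

Once the printed commands look right for a node, the output can be piped to sh as root on that node. This avoids install.pl entirely, so no build tools are needed on the target.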


