[ofa-general] sminfo report iberror in the first configuration on RHEL5.3
Doug Ledford
dledford at redhat.com
Mon Feb 16 18:40:08 PST 2009
On Tue, 2009-02-17 at 08:31 +0800, Wen Hao Wang wrote:
> OK, Doug:
>
> Thanks a lot for your detailed explanation! So if I do not want to
> reboot the machine, I need to run "chkconfig", "kudzu" and "openibd
> start".
Correct.
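
As a rough sketch, those three steps could be scripted as below. Treat it
as an illustration under assumptions rather than a supported procedure:
DRY_RUN, HWCONF, run and bring_up_openibd are names made up for this
example, and kudzu is only invoked when the hwconf file is missing. By
default the script just prints the commands; run it as root with DRY_RUN=0
to actually execute them.

```shell
#!/bin/sh
# Sketch: get openibd working on a fresh RHEL 5.3 install without a reboot.
# DRY_RUN and HWCONF are hypothetical knobs invented for this example.

run() {
    # Print the command in dry-run mode (the default), otherwise execute it.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

bring_up_openibd() {
    hwconf="${HWCONF:-/etc/sysconfig/hwconf}"

    # 1. Enable the service for future boots. The runlevel db is only
    #    consulted at boot, so this alone does not start anything now.
    run chkconfig --level 2345 openibd on

    # 2. The openibd init script reads the hwconf file written by kudzu;
    #    run kudzu once if that file does not exist yet.
    if [ ! -e "$hwconf" ]; then
        run kudzu
    fi

    # 3. Start the service by hand so the kernel modules get loaded now.
    run /etc/init.d/openibd start
}

bring_up_openibd
```

Running sminfo afterwards is a quick way to check that the modules are
actually loaded.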
> Wen Hao Wang
> Email: wangwhao at cn.ibm.com
>
>
> Doug Ledford <dledford at redhat.com> wrote on 2009-02-17 01:49:33:
>
> > On Mon, 2009-02-16 at 09:29 +0800, Wen Hao Wang wrote:
> > >
> > > Wen Hao Wang
> > >
> > > Software Engineer
> > > IBM China Software Development Laboratory
> > > Email: wangwhao at cn.ibm.com
> > > Tel: 86-10-82451055
> > > Fax: 86-10-82782244 ext. 2312
> > > Address: 1/F, IBM ZGC Campus. Ring Building 28, ZhongGuanCun Software
> > > Park, No.8 Dong Bei Wang West Road, Haidian District Beijing, 100193,
> > > P.R.China
> > >
> > >
> > > Doug Ledford <dledford at redhat.com> wrote on 2009-02-14 00:13:32:
> > >
> > > > On Fri, 2009-02-13 at 08:05 +0800, Wen Hao Wang wrote:
> > > > > Doug Ledford <dledford at redhat.com> wrote on 2009-02-12 21:20:30:
> > > > >
> > > > > > On Thu, 2009-02-12 at 13:20 +0200, Tziporet Koren wrote:
> > > > > > > Wen Hao Wang wrote:
> > > > > > > >
> > > > > > > > Hi all:
> > > > > > > >
> > > > > > > > I changed my blade OS to RHEL5.3 yesterday and installed OFED
> > > > > > > > (shipped in the RHEL5.3 image) with "yum groupinstall". Then I
> > > > > > > > loaded some drivers and wrote the network interface configuration
> > > > > > > > file ifcfg-ib0. "ifup ib0" also succeeded. But the IB utilities
> > > > > > > > report "Connection timed out".
> > > > > > > >
> > > > > > > >
> > > > > > > > [root at xblade06 network-scripts]# sminfo
> > > > > > > > ibwarn: [32593] _do_madrpc: recv failed: Connection timed out
> > > > > > > > ibwarn: [32593] mad_rpc: _do_madrpc failed; dport (Lid 9)
> > > > > > > > sminfo: iberror: failed: query
> > > > > > > >
> > > > > > > > I had to reboot the blade and rerun "openibd start". Then sminfo
> > > > > > > > reported correct contents. I do not suppose this reboot is required.
> > > > > > > > Did I miss any configuration step?
> > > > > >
> > > > > > There was an unintentional bug in the rhel5.2 openibd init script in
> > > > > > that it automatically turned itself on during install (generally, most
> > > > > > init scripts should default to *not* turning themselves on during
> > > > > > install of the package, nor should they start themselves during install
> > > > > > of the package...this is for security reasons, imagine if you installed
> > > > > > the bind name server on your box and it automatically started up before
> > > > > > you had a chance to configure it). In rhel5.3 we fixed that bug. So,
> > > > >
> > > > > Yeah. I heard of this bug.
> > > > >
> > > > > > you may need to 'chkconfig --level 2345 openibd on' to make sure openibd
> > > > > > starts up each time. The error you list above is consistent with not
> > > > > > all of the kernel modules being loaded when you tried to use the sminfo
> > > > > > program.
> > > > >
> > > > > Even after reboot, service openibd is not started automatically.
> > > > > [root at xblade06 ~]# chkconfig --list openibd
> > > > > openibd 0:off 1:off 2:off 3:off 4:off 5:off 6:off
> > > >
> > > > That's because you have to run the command I listed in my first email
> > > > to turn it on.
> > > >
> > >
> > > I totally agree with this. But I am still confused why sminfo gave errors
> > > before reboot, or which steps I should take for the first OFED usage
> > > before reboot. As far as I can see, whether the service is added into
> > > system runlevel DB is not related to the sminfo error. Please correct me
> > > if that is not the case.
> >
> > It is related. The runlevel db is only consulted on boot up. If the
> > openibd service was not enabled at startup, then adding it to the
> > runlevel startup does *not* start it at that time. You have to both add
> > it to the runlevel startup and also start it manually if you want things
> > to work properly prior to reboot. The sminfo errors you first posted
> > are consistent with some of the modules not being loaded, and it went
> > away after you started the openibd service, which is also consistent
> > with the problem.
> >
> > > > > I agree with you that maybe some modules were not loaded. But which
> > > > > ones? Before reboot, I ran "/etc/init.d/openibd start" and
> > > > > "/etc/init.d/network restart". No error was reported. "openibd status"
> > > > > also looked good.
> > > >
> > > > Running start on a service does not enable that service at the next
> > > > reboot. You must specifically enable the service in order for it to
> > > > start automatically.
> > > >
> > > > > >
> > > > > > > > Moreover, "openibd start" reports one warning message about
> > > > > > > > hwconf. Does anyone have comments about this?
> > > > > > > >
> > > > > > > > [root at xblade07 ~]# /etc/init.d/openibd start
> > > > > > > > Loading OpenIB kernel modules:grep: /etc/sysconfig/hwconf: No such file or directory
> > > > > > > > [ OK ]
> > > > > >
> > > > > > Can you see if the kudzu package is installed on your machine? The
> > > > > > openib package uses this config file written by kudzu to determine what
> > > > > > hardware drivers to load. I suppose I should put a specific requires in
> > > > > > the rpm for that.
> > > > >
> > > > > kudzu is installed.
> > > > > [root at xblade06 ~]# rpm -q kudzu
> > > > > kudzu-1.2.57.1.21-1
> > > >
> > > > Make sure kudzu has been run at least once then (it would appear to be
> > > > turned off on your machine or else /etc/sysconfig/hwconf would exist).
> > > > You can run it manually from the command line and that should be
> > > > sufficient for the openibd init script's needs.
> > > >
> > >
> > > Yes. After kudzu created the file on my machine, the openibd script
> > > reported no error this time. I want to know whether, in my scenario,
> > > "openibd restart" is needed/required?
> >
> > It would probably be advisable, but only if you haven't rebooted since
> > running kudzu for the first time. If you've rebooted since then, then
> > it doesn't matter.
> >
> > > Many thanks!
> > >
> > > Wen Hao Wang
> > > Email: wangwhao at cn.ibm.com
> > >
> > > > --
> > > > Doug Ledford <dledford at redhat.com>
> > > > GPG KeyID: CFBFF194
> > > > http://people.redhat.com/dledford
> > > >
> > > > Infiniband specific RPMs available at
> > > > http://people.redhat.com/dledford/Infiniband
> > > >
> > > > [Attachment "signature.asc" deleted by Wen Hao Wang/China/IBM]
> > >
> > --
> > Doug Ledford <dledford at redhat.com>
> > GPG KeyID: CFBFF194
> > http://people.redhat.com/dledford
> >
> > Infiniband specific RPMs available at
> > http://people.redhat.com/dledford/Infiniband
> >
> > [Attachment "signature.asc" deleted by Wen Hao Wang/China/IBM]
>
--
Doug Ledford <dledford at redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband