[ofa-general] sminfo reports iberror on first configuration on RHEL5.3

Doug Ledford dledford at redhat.com
Mon Feb 16 09:49:33 PST 2009


On Mon, 2009-02-16 at 09:29 +0800, Wen Hao Wang wrote:
> 
> Wen Hao Wang (王文昊)
> 
> Software Engineer
> IBM China Software Development Laboratory
> Email: wangwhao at cn.ibm.com
> Tel: 86-10-82451055
> Fax: 86-10-82782244 ext. 2312
> Address: 1/F, IBM ZGC Campus, Ring Building 28, ZhongGuanCun Software
> Park, No. 8 Dong Bei Wang West Road, Haidian District, Beijing 100193,
> P.R. China
> 
> 
> Doug Ledford <dledford at redhat.com> wrote on 2009-02-14 00:13:32:
> 
> > On Fri, 2009-02-13 at 08:05 +0800, Wen Hao Wang wrote:
> > > Doug Ledford <dledford at redhat.com> wrote on 2009-02-12 21:20:30:
> > > 
> > > > On Thu, 2009-02-12 at 13:20 +0200, Tziporet Koren wrote:
> > > > > Wen Hao Wang wrote:
> > > > > >
> > > > > > Hi all:
> > > > > >
> > > > > > I changed my blade OS to RHEL5.3 yesterday and installed OFED (shipped
> > > > > > in the RHEL5.3 image) with "yum groupinstall". Then I loaded some
> > > > > > drivers and wrote the network interface configuration file ifcfg-ib0.
> > > > > > "ifup ib0" also succeeded, but the IB utilities report "Connection
> > > > > > timed out".
> > > > > >
> > > > > >
> > > > > > [root at xblade06 network-scripts]# sminfo
> > > > > > ibwarn: [32593] _do_madrpc: recv failed: Connection timed out
> > > > > > ibwarn: [32593] mad_rpc: _do_madrpc failed; dport (Lid 9)
> > > > > > sminfo: iberror: failed: query
> > > > > >
> > > > > > I had to reboot the blade and rerun "openibd start". Then sminfo
> > > > > > reported correct contents. I don't think this reboot should be
> > > > > > required. Did I miss any configuration step?
> > > > 
> > > > There was an unintentional bug in the rhel5.2 openibd init script in
> > > > that it automatically turned itself on during install (generally, most
> > > > init scripts should default to *not* turning themselves on during
> > > > install of the package, nor should they start themselves during install
> > > > of the package...this is for security reasons, imagine if you installed
> > > > the bind name server on your box and it automatically started up before
> > > > you had a chance to configure it).  In rhel5.3 we fixed that bug.  So,
> > > 
> > > Yeah. I heard of this bug.
> > > 
> > > > you may need to 'chkconfig --level 2345 openibd on' to make sure openibd
> > > > starts up each time.  The error you list above is consistent with not
> > > > all of the kernel modules being loaded when you tried to use the sminfo
> > > > program.
> > > 
> > > Even after reboot, service openibd is not started automatically.
> > > [root at xblade06 ~]# chkconfig --list openibd
> > > openibd         0:off   1:off   2:off   3:off   4:off   5:off   6:off
> > 
> > That's because you have to run the command I listed in my first email to
> > turn it on.
> >
> 
> I totally agree with this. But I am still confused about why sminfo gave
> errors before the reboot, and which steps I should take for first-time OFED
> use before rebooting. As far as I can see, whether the service is added to
> the system runlevel DB is not related to the sminfo error. Please correct me
> if that is not the case.

It is related.  The runlevel db is only consulted on boot up.  If the
openibd service was not enabled at startup, then adding it to the
runlevel startup does *not* start it at that time.  You have to both add
it to the runlevel startup and also start it manually if you want things
to work properly prior to reboot.  The sminfo errors you first posted
are consistent with some of the modules not being loaded, and they went
away after you started the openibd service, which is also consistent
with the problem.
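Since enabling a SysV service and starting it are two separate actions on RHEL5, both can be done in one step. A minimal sketch, assuming the stock RHEL5 `chkconfig` and `service` commands; `enable_and_start` and `on_levels` are hypothetical helpers, not part of the openib package:

```shell
#!/bin/bash
# Hypothetical helper (not shipped with OFED): enable a SysV init service for
# runlevels 2-5 AND start it in the running system. chkconfig only edits the
# runlevel links that are read at boot; it does not start anything by itself.
enable_and_start() {
    local svc="$1"
    chkconfig --level 2345 "$svc" on || return 1   # persists across reboots
    service "$svc" start                            # starts it right now
}

# Hypothetical helper: read `chkconfig --list <svc>` output on stdin and print
# the runlevels marked "on", space-separated (empty if it is off everywhere).
on_levels() {
    awk '{
        for (i = 2; i <= NF; i++) {
            split($i, kv, ":")
            if (kv[2] == "on") s = s (s == "" ? "" : " ") kv[1]
        }
    } END { print s }'
}
```

After `enable_and_start openibd`, running `chkconfig --list openibd | on_levels` should print `2 3 4 5`; against the all-`off` listing quoted earlier in this thread it prints nothing.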

> > > I agree with you that maybe some modules were not loaded. But which ones?
> > > Before the reboot, I ran "/etc/init.d/openibd start" and
> > > "/etc/init.d/network restart". No error was reported. "openibd status"
> > > also looked good.
> > 
> > Running start on a service does not enable that service at the next
> > reboot.  You must specifically enable the service in order for it to
> > start automatically.
> > 
> > > > 
> > > > > > Moreover, "openibd start" reports a warning message about hwconf.
> > > > > > Does anyone have comments about this?
> > > > > >
> > > > > > [root at xblade07 ~]# /etc/init.d/openibd start
> > > > > > Loading OpenIB kernel modules:grep: /etc/sysconfig/hwconf: No such file or directory
> > > > > > [ OK ]
> > > > 
> > > > Can you see if the kudzu package is installed on your machine?  The
> > > > openib package uses this config file written by kudzu to determine what
> > > > hardware drivers to load.  I suppose I should put a specific requires in
> > > > the rpm for that.
> > > 
> > > kudzu is installed.
> > > [root at xblade06 ~]# rpm -q kudzu
> > > kudzu-1.2.57.1.21-1
> > 
> > Make sure kudzu has been run at least once then (it would appear to be
> > turned off on your machine or else /etc/sysconfig/hwconf would exist).
> > You can run it manually from the command line and that should be
> > sufficient for the openibd init script's needs.
> > 
> 
> Yes. After kudzu created the file on my machine, the openibd script reported
> no error this time. In my scenario, is "openibd restart" needed?

It would probably be advisable, but only if you haven't rebooted since
running kudzu for the first time.  If you've rebooted since then, then
it doesn't matter.
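The kudzu advice in this thread reduces to one sequence: run kudzu once if `/etc/sysconfig/hwconf` does not exist yet, then restart openibd so it picks up the file, which only matters if you have not rebooted since. A minimal sketch, assuming kudzu run with no arguments writes `/etc/sysconfig/hwconf` (the file named in the warning above); `refresh_hwconf` and the `HWCONF` override variable are hypothetical, not part of the openib package:

```shell
#!/bin/bash
# Path the openibd init script greps for hardware info (per the warning in this
# thread); overridable so the helper can be exercised without root.
HWCONF="${HWCONF:-/etc/sysconfig/hwconf}"

# Hypothetical helper: if kudzu has never run, run it once so hwconf exists,
# then restart openibd so the drivers are reloaded without waiting for a reboot.
refresh_hwconf() {
    [ -f "$HWCONF" ] && return 0        # kudzu already ran; nothing to do here
    kudzu                               # probe hardware and write hwconf
    [ -f "$HWCONF" ] || return 1        # kudzu did not produce the file
    service openibd restart             # reload drivers using the new hwconf
}
```

It restarts openibd only when it had to create hwconf itself; if you already ran kudzu by hand and have not rebooted since, a plain `service openibd restart` covers that case.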

> Many thanks!
> 
> Wen Hao Wang
> Email: wangwhao at cn.ibm.com
> 
> > -- 
> > Doug Ledford <dledford at redhat.com>
> >               GPG KeyID: CFBFF194
> >               http://people.redhat.com/dledford
> > 
> > Infiniband specific RPMs available at
> >               http://people.redhat.com/dledford/Infiniband
> > 
> > [Attachment "signature.asc" deleted by Wen Hao Wang/China/IBM]
> 
-- 
Doug Ledford <dledford at redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband
