[ofa-general] sminfo report iberror in the first configuration on RHEL5.3

Wen Hao Wang wangwhao at cn.ibm.com
Thu Feb 12 16:10:22 PST 2009



Nicolas Morey Chaisemartin <nicolas.morey-chaisemartin at ext.bull.net> wrote on
2009-02-12 20:20:36:

> Wen Hao Wang wrote:
> >
> > Hi all:
> >
> > I changed my blade OS to RHEL5.3 yesterday and installed OFED (shipped
> > in the RHEL5.3 image) with "yum groupinstall". Then I loaded some drivers
> > and wrote the network interface configuration file ifcfg-ib0. "ifup ib0"
> > also succeeded. But the IB utilities report "Connection timed out".
> >
> >
> > [root@xblade06 network-scripts]# sminfo
> > ibwarn: [32593] _do_madrpc: recv failed: Connection timed out
> > ibwarn: [32593] mad_rpc: _do_madrpc failed; dport (Lid 9)
> > sminfo: iberror: failed: query
> >
> > I had to reboot the blade and rerun "openibd start". After that, sminfo
> > reported correct contents. I would not expect this reboot to be required.
> > Did I miss a configuration step?
> >
> > Moreover, "openibd start" reports a warning message about hwconf.
> > Does anyone have comments on this?
> >
> > [root@xblade07 ~]# /etc/init.d/openibd start
> > Loading OpenIB kernel modules:grep: /etc/sysconfig/hwconf: No such file or directory
> > [ OK ]
> >
> > Thanks a lot!
> >
> > Wen Hao Wang
> > Email: wangwhao at cn.ibm.com
> >
> >
> Sounds to me as if you don't have any Subnet Manager (OpenSM or a managed
> switch) running.
> $ sminfo
> sminfo: sm lid 2 sm guid 0x8f1040041254a, activity count 751941 priority
> 3 state 3 SMINFO_MASTER
> $ sminfo -P 2
> ibwarn: [17975] mad_rpc: _do_madrpc failed; dport (Lid 3945)
> sminfo: iberror: failed: query
>
> (we don't have any SM on the subnet connected to port 2)
>
> Your reboot might have started OpenSM, which would explain why it works now.
>
> Nicolas
>
>

OpenSM is running on another machine, which has LID 9, while this machine
(xblade06) has LID 8. Here is the output after the reboot:

[root@xblade06 ~]# sminfo
sminfo: sm lid 9 sm guid 0x2c90300013101, activity count 618300 priority 0 state 3 SMINFO_MASTER
[root@xblade06 ~]# ps -ef|grep opensm
root      5369  5234  0 00:08 pts/0    00:00:00 grep opensm
[root@xblade06 ~]# ibv_devices
    device                 node GUID
    ------              ----------------
    mlx4_0              0002c903000134b0
[root@xblade06 ~]# ibnetdiscover |grep 2c903000134b0
# Initiated from node 0002c903000134b0 port 0002c903000134b1
[10]    "H-0002c903000134b0"[1](2c903000134b1)          # "xblade06 HCA-1" lid 8 4xSDR
caguid=0x2c903000134b0
Ca      2 "H-0002c903000134b0"          # "xblade06 HCA-1"
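
The comparison I am making by eye above (SM at LID 9, this node at LID 8) could be scripted roughly as below. This is only a minimal sketch: parse_sminfo is a hypothetical helper of mine, not part of the IB utilities, and it assumes the sminfo output format shown in this thread. The sample text and the local LID are copied from the output above rather than queried live.

```shell
#!/bin/sh
# Sketch only: parse_sminfo is a hypothetical helper, not an OFED tool.
# It assumes sminfo prints "sm lid N ... state N SMINFO_xxx" on success,
# as in the output quoted in this thread.

# Print "<sm_lid> <state_name>" for a successful sminfo result.
# Return non-zero if the text does not look like a successful query
# (for example "sminfo: iberror: failed: query").
parse_sminfo() {
    # Join lines first, in case the output was wrapped when pasted.
    joined=$(printf '%s\n' "$1" | tr '\n' ' ')
    printf '%s\n' "$joined" |
        sed -n 's/.*sm lid \([0-9][0-9]*\) .* state [0-9][0-9]* \(SMINFO_[A-Z]*\).*/\1 \2/p' |
        grep .
}

# Sample values taken from this mail (assumption: on a real node these
# would come from "sminfo 2>&1" and from the local port's LID).
sample='sminfo: sm lid 9 sm guid 0x2c90300013101, activity count 618300 priority 0
state 3 SMINFO_MASTER'
local_lid=8

if result=$(parse_sminfo "$sample"); then
    sm_lid=${result%% *}     # text before the first space
    sm_state=${result#* }    # text after the first space
    if [ "$sm_lid" -eq "$local_lid" ]; then
        echo "SM at LID $sm_lid ($sm_state) is on this node"
    else
        echo "SM at LID $sm_lid ($sm_state) is remote; local LID is $local_lid"
    fi
else
    echo "sminfo query failed: no SM answered on this port"
fi
```

Feeding it the failing output from my first mail takes the else branch, since that text contains no "sm lid" line.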

Thanks!

Wen Hao Wang
Email: wangwhao at cn.ibm.com