[openib-general] Re: opensm: new segv on shutdown

Tom Duffy tduffy at sun.com
Thu Jun 2 10:31:35 PDT 2005


On Wed, 2005-06-01 at 20:45 -0400, Hal Rosenstock wrote:
> On Wed, 2005-06-01 at 16:51, Tom Duffy wrote: 
> > I am putting together a network with a dumb IB switch, a couple of Linux
> > OpenIB boxes, a Solaris 10 box, a Solaris Nevada box, etc.  I fired up
> > opensm on one of the Linux nodes, tried to plumb Solaris, no luck.  I
> > then hit control-c on opensm and it crashed.  Here are the messages and
> > then the crash.
> 
> Anything from the Solaris side on what it doesn't like about the OpenIB
> RMPP ?

I am not seeing any errors coming from Solaris; I will have to enable
debug and try again.  Clearly OpenSM is able to find the Solaris nodes
(there are 3 Solaris, 2 Linux):

[root@flopteron2 ~]# ibhosts
Hca     : 0x0002c90109766e40 ports 2 "MT23108 InfiniHost Mellanox Technologies"
Hca     : 0x0002c90109765630 ports 2 "MT23108 InfiniHost Mellanox Technologies"
Hca     : 0x0002c901097624c0 ports 2 "MT23108 InfiniHost Mellanox Technologies"
Hca     : 0x0002c90109765710 ports 2 "MT23108 InfiniHost Mellanox Technologies"
Hca     : 0x0002c9010a99e030 ports 2 "MT25208 InfiniHostEx Mellanox Technologies"

[root@flopteron2 ~]# ibnetdiscover
#
# Topology file: generated on Thu Jun  2 10:20:58 2005
# 
switchguids=0x617000000000d
Switch  8 "S-000617000000000d"          # Agilent and RedSwitch High Performance 8 Port 4x IBA Switch port 0 lid 2
[8]     "H-0002c90109766e40"[2]
[6]     "H-0002c90109766e40"[1]
[5]     "H-0002c90109765630"[1]
[4]     "H-0002c90109765710"[1]
[3]     "H-0002c901097624c0"[1]
[2]     "H-0002c90109765710"[2]
[7]     "H-0002c9010a99e030"[1]

hcaguids=0x2c90109766e40
Hca     2 "H-0002c90109766e40"          # MT23108 InfiniHost Mellanox Technologies
[2]     "S-000617000000000d"[8]         # lid 0 lmc 0
[1]     "S-000617000000000d"[6]         # lid 0 lmc 0

hcaguids=0x2c90109765630
Hca     2 "H-0002c90109765630"          # MT23108 InfiniHost Mellanox Technologies
[1]     "S-000617000000000d"[5]         # lid 4 lmc 0

hcaguids=0x2c901097624c0
Hca     2 "H-0002c901097624c0"          # MT23108 InfiniHost Mellanox Technologies
[1]     "S-000617000000000d"[3]         # lid 5 lmc 0

hcaguids=0x2c90109765710
Hca     2 "H-0002c90109765710"          # MT23108 InfiniHost Mellanox Technologies
[1]     "S-000617000000000d"[4]         # lid 18 lmc 0
[2]     "S-000617000000000d"[2]         # lid 3 lmc 0

hcaguids=0x2c9010a99e030
Hca     2 "H-0002c9010a99e030"          # MT25208 InfiniHostEx Mellanox Technologies
[1]     "S-000617000000000d"[7]         # lid 1 lmc 0
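
For anyone who wants to slice this output in a script rather than by eye,
here is a minimal Python sketch that pulls the per-port LID assignments out
of the Hca sections.  The regular expressions are my own guess at the layout
based only on the text pasted above, not a documented ibnetdiscover format,
and the function name parse_hca_ports is made up for illustration.

```python
import re

# Two Hca sections copied from the ibnetdiscover output above.
SAMPLE = '''\
hcaguids=0x2c90109766e40
Hca     2 "H-0002c90109766e40"          # MT23108 InfiniHost Mellanox Technologies
[2]     "S-000617000000000d"[8]         # lid 0 lmc 0
[1]     "S-000617000000000d"[6]         # lid 0 lmc 0

hcaguids=0x2c90109765710
Hca     2 "H-0002c90109765710"          # MT23108 InfiniHost Mellanox Technologies
[1]     "S-000617000000000d"[4]         # lid 18 lmc 0
[2]     "S-000617000000000d"[2]         # lid 3 lmc 0
'''

# Assumed layout: 'Hca <nports> "H-<guid>"' opens a section ...
NODE_RE = re.compile(r'^Hca\s+\d+\s+"(H-[0-9a-f]+)"')
# ... and '[port] "S-<guid>"[peer port]  # lid N lmc M' describes one link.
PORT_RE = re.compile(
    r'^\[(\d+)\]\s+"(S-[0-9a-f]+)"\[(\d+)\]\s+#\s+lid\s+(\d+)\s+lmc\s+(\d+)'
)


def parse_hca_ports(text):
    """Return one dict per HCA port line found in the text (hypothetical helper)."""
    ports = []
    current_hca = None
    for line in text.splitlines():
        node = NODE_RE.match(line)
        if node:
            current_hca = node.group(1)
            continue
        link = PORT_RE.match(line)
        if link and current_hca is not None:
            port, peer, peer_port, lid, lmc = link.groups()
            ports.append({
                "hca": current_hca,
                "port": int(port),
                "switch": peer,
                "switch_port": int(peer_port),
                "lid": int(lid),
                "lmc": int(lmc),
            })
    return ports


ports = parse_hca_ports(SAMPLE)
# Ports that report lid 0 in the dump.
lid_zero = [(p["hca"], p["port"]) for p in ports if p["lid"] == 0]
```

With the sample above, lid_zero ends up holding both ports of
H-0002c90109766e40, which are the only two ports in the dump showing lid 0.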

---

When I pop open ibsmgui (the graphical IB browser on Solaris), I get an
error:

sa_access_retrieve failed
sa_access_retrieve failed: -18

from the application.  This is the relevant opensm log generated by this
query:

Jun 02 10:22:57 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: [
Jun 02 10:22:57 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: 13802 QP1 MADs received.
Jun 02 10:22:57 [44808960] -> SA MAD dump:
                                base_ver................0x1
                                mgmt_class..............0x3
                                class_ver...............0x2
                                method..................0x12 (SubnAdmGetTable)
                                status..................0x0
                                resv....................0x0
                                trans_id................0x976563100000010
                                attr_id.................0x31 (ServiceRecord)
                                resv1...................0x0
                                attr_mod................0xFFFFFFFF
                                rmpp_version............0x0
                                rmpp_type...............0x0
                                rmpp_flags..............0x0
                                rmpp_status.............0x0
                                seg_num.................0x0
                                payload_len/new_win.....0x0
                                sm_key..................0x0000000000000000
                                attr_offset.............0x0
                                resv2...................0x0
                                comp_mask...............0x0000000000000000


Jun 02 10:22:57 [44808960] -> __osm_sa_mad_ctrl_process: [
Jun 02 10:22:57 [44808960] -> __osm_sa_mad_ctrl_process: Posting Dispatcher message OSM_MSG_MAD_SERVICE_RECORD.
Jun 02 10:22:57 [44808960] -> __osm_sa_mad_ctrl_process: ]
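
In case it helps with comparing dumps between runs, below is a rough Python
sketch that turns the "name....0xvalue (Label)" lines of an SA MAD dump into
a dictionary.  The pattern is inferred from the dump above only; I have not
checked it against the opensm source, and parse_mad_dump is a made-up name.

```python
import re

# A few fields copied from the SA MAD dump above.
DUMP = '''\
    base_ver................0x1
    mgmt_class..............0x3
    method..................0x12 (SubnAdmGetTable)
    attr_id.................0x31 (ServiceRecord)
    attr_mod................0xFFFFFFFF
    rmpp_version............0x0
    rmpp_type...............0x0
    rmpp_flags..............0x0
    payload_len/new_win.....0x0
    comp_mask...............0x0000000000000000
'''

# Assumed layout: field name, a run of dots, a hex value, optional "(Label)".
FIELD_RE = re.compile(r'^\s*([\w/]+)\.+(0x[0-9A-Fa-f]+)(?:\s+\((\w+)\))?\s*$')


def parse_mad_dump(text):
    """Map each field name to (integer value, label or None). Hypothetical helper."""
    fields = {}
    for line in text.splitlines():
        m = FIELD_RE.match(line)
        if m:
            name, value, label = m.groups()
            fields[name] = (int(value, 16), label)
    return fields


fields = parse_mad_dump(DUMP)
# True when every rmpp_* field present in the dump is zero.
rmpp_all_zero = all(v == 0 for k, (v, _) in fields.items() if k.startswith("rmpp_"))
```

For the request above this gives method 0x12 (SubnAdmGetTable) on attr_id
0x31 (ServiceRecord), with all of the rmpp_* fields in the sample zero.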
