[openib-general] Re: opensm: new segv on shutdown
Tom Duffy
tduffy at sun.com
Thu Jun 2 10:31:35 PDT 2005
On Wed, 2005-06-01 at 20:45 -0400, Hal Rosenstock wrote:
> On Wed, 2005-06-01 at 16:51, Tom Duffy wrote:
> > I am putting together a network with a dumb IB switch, a couple of Linux
> > OpenIB boxes, a Solaris 10 box, a Solaris Nevada box, etc. I fired up
> > opensm on one of the Linux nodes, tried to plumb Solaris, no luck. I
> > then hit control-c on opensm and it crashed. Here is the messages and
> > then crash.
>
> Anything from the Solaris side on what it doesn't like about the OpenIB
> RMPP ?
I am not seeing any errors coming from Solaris; I will have to enable
debug and try again. Clearly OpenSM is able to find the Solaris nodes
(there are 3 Solaris, 2 Linux):
[root@flopteron2 ~]# ibhosts
Hca : 0x0002c90109766e40 ports 2 "MT23108 InfiniHost Mellanox Technologies"
Hca : 0x0002c90109765630 ports 2 "MT23108 InfiniHost Mellanox Technologies"
Hca : 0x0002c901097624c0 ports 2 "MT23108 InfiniHost Mellanox Technologies"
Hca : 0x0002c90109765710 ports 2 "MT23108 InfiniHost Mellanox Technologies"
Hca : 0x0002c9010a99e030 ports 2 "MT25208 InfiniHostEx Mellanox Technologies"
[root@flopteron2 ~]# ibnetdiscover
#
# Topology file: generated on Thu Jun 2 10:20:58 2005
#
switchguids=0x617000000000d
Switch 8 "S-000617000000000d" # Agilent and RedSwitch High Performance 8 Port 4x IBA Switch port 0 lid 2
[8] "H-0002c90109766e40"[2]
[6] "H-0002c90109766e40"[1]
[5] "H-0002c90109765630"[1]
[4] "H-0002c90109765710"[1]
[3] "H-0002c901097624c0"[1]
[2] "H-0002c90109765710"[2]
[7] "H-0002c9010a99e030"[1]
hcaguids=0x2c90109766e40
Hca 2 "H-0002c90109766e40" # MT23108 InfiniHost Mellanox Technologies
[2] "S-000617000000000d"[8] # lid 0 lmc 0
[1] "S-000617000000000d"[6] # lid 0 lmc 0
hcaguids=0x2c90109765630
Hca 2 "H-0002c90109765630" # MT23108 InfiniHost Mellanox Technologies
[1] "S-000617000000000d"[5] # lid 4 lmc 0
hcaguids=0x2c901097624c0
Hca 2 "H-0002c901097624c0" # MT23108 InfiniHost Mellanox Technologies
[1] "S-000617000000000d"[3] # lid 5 lmc 0
hcaguids=0x2c90109765710
Hca 2 "H-0002c90109765710" # MT23108 InfiniHost Mellanox Technologies
[1] "S-000617000000000d"[4] # lid 18 lmc 0
[2] "S-000617000000000d"[2] # lid 3 lmc 0
hcaguids=0x2c9010a99e030
Hca 2 "H-0002c9010a99e030" # MT25208 InfiniHostEx Mellanox Technologies
[1] "S-000617000000000d"[7] # lid 1 lmc 0
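For anyone who wants to cross-check a dump like this without eyeballing it, here is a minimal Python sketch (not part of any OpenIB tool) that reads the Hca blocks and lists the ports still reporting lid 0. The function names parse_hca_ports and ports_without_lid are made up for illustration, the regular expressions assume the exact line layout shown above, and reading lid 0 as "no LID assigned yet" is my assumption about what the output means.

```python
import re

# Patterns assume the ibnetdiscover layout pasted above (hypothetical helper,
# not an OpenIB API).
HCA_RE = re.compile(r'^Hca\s+(\d+)\s+"(H-[0-9a-f]+)"')
PORT_RE = re.compile(
    r'^\[(\d+)\]\s+"([SH]-[0-9a-f]+)"\[(\d+)\]\s+#\s+lid\s+(\d+)\s+lmc\s+(\d+)'
)

SAMPLE = '''\
hcaguids=0x2c90109766e40
Hca 2 "H-0002c90109766e40" # MT23108 InfiniHost Mellanox Technologies
[2] "S-000617000000000d"[8] # lid 0 lmc 0
[1] "S-000617000000000d"[6] # lid 0 lmc 0
hcaguids=0x2c90109765630
Hca 2 "H-0002c90109765630" # MT23108 InfiniHost Mellanox Technologies
[1] "S-000617000000000d"[5] # lid 4 lmc 0
'''


def parse_hca_ports(text):
    """Return {hca_name: [(local_port, peer_name, peer_port, lid), ...]}."""
    hcas = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = HCA_RE.match(line)
        if m:
            current = m.group(2)
            hcas[current] = []
            continue
        m = PORT_RE.match(line)
        if m and current is not None:
            hcas[current].append(
                (int(m.group(1)), m.group(2), int(m.group(3)), int(m.group(4)))
            )
            continue
        if line.startswith("Switch"):
            # Switch port lines carry no lid comment; stop attributing to an HCA.
            current = None
    return hcas


def ports_without_lid(hcas):
    """List (hca_name, local_port) pairs whose reported lid is 0."""
    return [
        (name, port)
        for name, ports in hcas.items()
        for (port, _peer, _peer_port, lid) in ports
        if lid == 0
    ]


if __name__ == "__main__":
    for name, port in ports_without_lid(parse_hca_ports(SAMPLE)):
        print(f"{name} port {port}: lid 0")
```

Run against the full output above, this would single out both ports of H-0002c90109766e40, which are the only ones showing lid 0.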
---
When I pop open ibsmgui (the graphical IB browser on Solaris), I get an
error:
sa_access_retrieve failed
sa_access_retrieve failed: -18
from the application. This is the relevant opensm log generated by this
query:
Jun 02 10:22:57 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: [
Jun 02 10:22:57 [44808960] -> __osm_sa_mad_ctrl_rcv_callback: 13802 QP1 MADs received.
Jun 02 10:22:57 [44808960] -> SA MAD dump:
base_ver................0x1
mgmt_class..............0x3
class_ver...............0x2
method..................0x12 (SubnAdmGetTable)
status..................0x0
resv....................0x0
trans_id................0x976563100000010
attr_id.................0x31 (ServiceRecord)
resv1...................0x0
attr_mod................0xFFFFFFFF
rmpp_version............0x0
rmpp_type...............0x0
rmpp_flags..............0x0
rmpp_status.............0x0
seg_num.................0x0
payload_len/new_win.....0x0
sm_key..................0x0000000000000000
attr_offset.............0x0
resv2...................0x0
comp_mask...............0x0000000000000000
Jun 02 10:22:57 [44808960] -> __osm_sa_mad_ctrl_process: [
Jun 02 10:22:57 [44808960] -> __osm_sa_mad_ctrl_process: Posting Dispatcher message OSM_MSG_MAD_SERVICE_RECORD.
Jun 02 10:22:57 [44808960] -> __osm_sa_mad_ctrl_process: ]
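To compare dumps like this across runs (for example once debug is enabled on the Solaris side), the "name....value" lines can be pulled into a dictionary. This is only a rough Python sketch under the assumption that every field line looks like the ones above; parse_mad_dump is a hypothetical helper, not something opensm provides.

```python
import re

# One field per line: name, a run of dots, a hex value, and optionally a
# symbolic name in parentheses (layout assumed from the log above).
FIELD_RE = re.compile(r'^([\w/]+)\.+(0x[0-9A-Fa-f]+)(?:\s+\((\w+)\))?\s*$')

SAMPLE_DUMP = '''\
Jun 02 10:22:57 [44808960] -> SA MAD dump:
base_ver................0x1
mgmt_class..............0x3
method..................0x12 (SubnAdmGetTable)
attr_id.................0x31 (ServiceRecord)
attr_mod................0xFFFFFFFF
rmpp_flags..............0x0
payload_len/new_win.....0x0
comp_mask...............0x0000000000000000
'''


def parse_mad_dump(text):
    """Return {field_name: (integer_value, symbolic_name_or_None)}."""
    fields = {}
    for raw in text.splitlines():
        m = FIELD_RE.match(raw.strip())
        if m:
            # Timestamped log lines do not match the pattern and are skipped.
            fields[m.group(1)] = (int(m.group(2), 16), m.group(3))
    return fields


if __name__ == "__main__":
    fields = parse_mad_dump(SAMPLE_DUMP)
    for name in ("method", "attr_id", "comp_mask", "rmpp_flags"):
        value, label = fields[name]
        print(f"{name}: {value:#x} {label or ''}".rstrip())
```

With the values in a dictionary it is easy to diff two requests field by field, e.g. to see whether comp_mask or the rmpp_* fields differ between a query that works and one that fails.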