[openib-general] OpenSM died a horrible death
Eitan Zahavi
eitan at mellanox.co.il
Thu Jan 6 07:11:01 PST 2005
Hi Shahar
The asserts are in:
osm_lid_mgr.c:968: CL_ASSERT( p_mgr->p_subn->sm_port_guid );
osm_lid_mgr.c:1011: CL_ASSERT( p_mgr->p_subn->sm_port_guid );
osm_mcast_mgr.c:1150: CL_ASSERT( port_guid );
osm_port.c:977: CL_ASSERT( port_guid );
osm_state_mgr.c:806: CL_ASSERT( port_guid );
osm_state_mgr.c:866: CL_ASSERT( port_guid );
osm_vendor_al.c:573: CL_ASSERT( ca_guid );
osm_vendor_al.c:604: CL_ASSERT( p_guids );
osm_vendor_al.c:804: CL_ASSERT( port_guid );
osm_vendor_al.c:864: CL_ASSERT( port_guid );
Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL
-----Original Message-----
From: shaharf [mailto:shaharf at voltaire.com]
Sent: Thursday, January 06, 2005 5:03 PM
To: Tom Duffy
Cc: openib-general at openib.org; Eitan Zahavi; Hal Rosenstock
Subject: RE: [openib-general] OpenSM died a horrible death
Hi Tom,
Can you send me the original mail about the SM's horrible death? Either it
was corrupted by our Exchange server or it was very large (over 5 MB). If you
want to send very large log files, please tar and zip them first. If it is
only a local Exchange problem (praise Bill), then please just resend it.
Anyhow, I missed the exact context in which it happened. From the email below
I got the impression that it occurred after a get path record request with
dest=null. I didn't find any relevant assert in the code, and I also issued a
synthetic path record query with dest gid = 0 and it works (returns status 500).
Eitan - where is this assert that you think it hit?
I know that it is an assert, but does anyone know where it came from? It
should be written in the log and in the panic message, and you can also find
it using gdb. The process should enter a tight loop so that it can be
debugged with gdb. A dump of the gdb backtrace (bt) would be very helpful.
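To illustrate the "report, then wait for gdb" behaviour described above, here is a minimal sketch in C. It is not the CL_ASSERT implementation from complib; the names DBG_ASSERT, dbg_assert_fail, dbg_format_failure and dbg_release are hypothetical, and the loop sleeps between checks instead of spinning tightly.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical names throughout; this is NOT the complib CL_ASSERT code. */

/* Set to 1 from the debugger to let the process leave the wait loop. */
static volatile int dbg_release = 0;

/* Build the failure message: file, line, failed expression and our pid. */
static int dbg_format_failure(char *buf, size_t len, const char *expr,
                              const char *file, int line)
{
    return snprintf(buf, len, "%s:%d: assertion failed: %s (pid %ld)",
                    file, line, expr, (long)getpid());
}

/* Report the failure on stderr, then wait so a debugger can attach. */
static void dbg_assert_fail(const char *expr, const char *file, int line)
{
    char msg[256];

    dbg_format_failure(msg, sizeof msg, expr, file, line);
    fprintf(stderr, "%s\n", msg);

    /* Assumption: sleeping between checks is acceptable here; a true
     * tight loop would omit the sleep. */
    while (!dbg_release)
        sleep(1);
}

/* Evaluates expr once; on failure, reports and waits for a debugger. */
#define DBG_ASSERT(expr) \
    ((expr) ? (void)0 : dbg_assert_fail(#expr, __FILE__, __LINE__))
```

With something like this in place, one could attach with `gdb -p <pid>` using the pid from the message, run `bt` to get the backtrace, and then `set var dbg_release = 1` followed by `continue` to let the process go on.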
Shahar
_____
From: openib-general-bounces at openib.org
[mailto:openib-general-bounces at openib.org] On Behalf Of Eitan Zahavi
Sent: Thursday, January 06, 2005 8:50 AM
To: Hal Rosenstock; Tom Duffy
Cc: openib-general at openib.org
Subject: RE: [openib-general] OpenSM died a horrible death
OpenSM asserts on guid=0x0000000000000000
EZ
[Hal] >It looks like it died shortly after the following error:
[Hal] >[1104994208:000136814][43005960] -> __osm_pr_rcv_get_end_points: No source port with GUID = 0x0000000000000000
_______________________________________________
openib-general mailing list
openib-general at openib.org
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit
http://openib.org/mailman/listinfo/openib-general