[openib-general] OpenSM died a horrible death

Eitan Zahavi eitan at mellanox.co.il
Thu Jan 6 07:11:01 PST 2005


Hi Shahaf
 
The asserts are in:
osm_lid_mgr.c:968:  CL_ASSERT( p_mgr->p_subn->sm_port_guid );
osm_lid_mgr.c:1011:  CL_ASSERT( p_mgr->p_subn->sm_port_guid );
osm_mcast_mgr.c:1150:  CL_ASSERT( port_guid );
osm_port.c:977:  CL_ASSERT( port_guid );
osm_state_mgr.c:806:  CL_ASSERT( port_guid );
osm_state_mgr.c:866:  CL_ASSERT( port_guid );
osm_vendor_al.c:573:  CL_ASSERT( ca_guid );
osm_vendor_al.c:604:  CL_ASSERT( p_guids );
osm_vendor_al.c:804:  CL_ASSERT( port_guid );
osm_vendor_al.c:864:  CL_ASSERT( port_guid );
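
Every assert in this list guards a GUID (or a GUID pointer) against being zero, which fits a request that carries GUID 0x0000000000000000. As a minimal sketch only, assuming a hypothetical helper check_port_guid() and hypothetical status codes (none of this is OpenSM code), a zero GUID could be rejected before it ever reaches an assert:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical status codes for illustration; not OpenSM's real ones. */
enum guid_status { GUID_OK = 0, GUID_INVALID = 1 };

/* Reject a zero GUID up front instead of letting it reach an assert. */
static enum guid_status check_port_guid(uint64_t port_guid)
{
    if (port_guid == 0) {
        fprintf(stderr, "rejecting request: port GUID is 0x%016llx\n",
                (unsigned long long)port_guid);
        return GUID_INVALID;
    }
    return GUID_OK;
}
```

A caller would return an error status to the requester when check_port_guid() reports GUID_INVALID, rather than continuing into code that asserts on the value.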
 
Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL
 
-----Original Message-----
From: shaharf [mailto:shaharf at voltaire.com] 
Sent: Thursday, January 06, 2005 5:03 PM
To: Tom Duffy
Cc: openib-general at openib.org; Eitan Zahavi; Hal Rosenstock
Subject: RE: [openib-general] OpenSM died a horrible death
 
Hi Tom,
 Can you send me the original mail concerning the SM horrible death? Either it
was corrupted in our Exchange or it was very large (over 5 MB). If you want to
send very large log files, please send them tarred and zipped. If it is only a
local Exchange problem (praise Bill), then please just resend it.
 
Anyhow, I missed the exact context in which it happened. From the email below I
got the impression that it occurred after a get path record with dest=null. I
didn't find any relevant assert in the code, and I also issued a synthetic
path record with dest gid = 0 and it works (returns status 500).
 
Eitan - where is the assert that you think it hit?
 
I know that it is an assert, but does anyone know where it came from? It should
be written in the log and in the panic message, and you can get it using gdb.
The process should enter a tight loop so that it can be debugged with gdb. A
dump of the gdb backtrace (bt) would be very helpful.
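
To illustrate the behaviour described above, here is a minimal sketch of an assert that logs its location and then waits for a debugger. MY_ASSERT, assert_failed(), format_assert_msg() and debugger_released are all hypothetical names; this is not the actual CL_ASSERT implementation, and it sleeps in the loop rather than spinning tightly so that the CPU stays free.

```c
#include <stdio.h>
#include <unistd.h>

/* Set to 1 from gdb ("set var debugger_released = 1") to leave the loop. */
static volatile int debugger_released = 0;

/* Build the message that identifies the failing assert. */
static int format_assert_msg(char *buf, size_t len, const char *expr,
                             const char *file, int line)
{
    return snprintf(buf, len, "%s:%d: assertion failed: %s", file, line, expr);
}

/* Log where the assert fired, then loop so a debugger can attach. */
static void assert_failed(const char *expr, const char *file, int line)
{
    char msg[256];

    format_assert_msg(msg, sizeof(msg), expr, file, line);
    fprintf(stderr, "%s (pid %ld)\n", msg, (long)getpid());
    while (!debugger_released)
        sleep(1);
}

/* Hypothetical stand-in for an assert macro that waits for gdb. */
#define MY_ASSERT(e) ((e) ? (void)0 : assert_failed(#e, __FILE__, __LINE__))
```

With the process parked in such a loop, attaching with gdb -p and the printed pid, then typing bt, shows the call chain that led to the failing assert.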
 
Shahar
 
  _____  

From: openib-general-bounces at openib.org
[mailto:openib-general-bounces at openib.org] On Behalf Of Eitan Zahavi
Sent: Thursday, January 06, 2005 8:50 AM
To: Hal Rosenstock; Tom Duffy
Cc: openib-general at openib.org
Subject: RE: [openib-general] OpenSM died a horrible death
 
OpenSM asserts on guid=0x0000000000000000 
EZ 
[Hal] >It looks like it died shortly after the following error: 
[Hal] >[1104994208:000136814][43005960] -> __osm_pr_rcv_get_end_points: No source port with GUID = 0x0000000000000000
 
_______________________________________________ 
openib-general mailing list 
openib-general at openib.org 
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit
http://openib.org/mailman/listinfo/openib-general