[ofw] ASSERT in cl_spinlock_acquire function in OSM

Smith, Stan stan.smith at intel.com
Sun Nov 21 11:09:56 PST 2010


________________________________
From: Uri Habusha [mailto:urih at mellanox.co.il]
Sent: Saturday, November 20, 2010 1:21 PM
To: Hefty, Sean; ofw at lists.openfabrics.org; Smith, Stan
Subject: ASSERT in cl_spinlock_acquire function in OSM

During our IPoIB regression we got an assert. The reason for the assert is that the spin lock wasn't initialized. I took a look at the osm_log object, and it looks corrupted to me (the log_file_name is wrong).

Is it a known issue?

No.

Yes, the log filename is corrupted, as if the invalid memory access fault handler were using the osm log file name buffer/memory as its error log (sprintf) buffer.
Any idea which module contains the offending address 0x08004633`39010038?

The umad_port_id == -1 looks strange.

How are the test systems configured w.r.t. IB fabric?
Single IB switch?
How many other systems are attached to the switch?
Is there just a single IPoIB transfer going on?
Any cable pulls or system(s) shutting down?

What type of IPoIB transfer was going on? How might others reproduce this situation?

Uri

3: kd> kb
RetAddr           : Args to Child                                                           : Call Site
00000000`754954d9 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!DbgBreakPoint
00000000`fff01b72 : 00000000`007cef18 00000000`00000000 00000000`ffec5758 00000000`00222460 : complibd!cl_spinlock_acquire+0x39 [s:\builds\6861\trunk\inc\user\complib\cl_spinlock_osd.h @ 107]
00000000`fff62879 : 00000000`007cef10 00000000`00000010 00000000`ffeb652c 00000000`ffed8db8 : opensm!osm_log+0x1c2 [s:\builds\6861\trunk\ulp\opensm\user\opensm\osm_log.c @ 171]
00000000`fff0247e : 00000000`006127c0 00000000`00000100 00000000`007c6fb0 00000000`0116f9a0 : opensm!osm_vendor_get+0x49 [s:\builds\6861\trunk\ulp\opensm\user\libvendor\osm_vendor_ibumad.c @ 995]
00000000`fff608b3 : 00000000`0024f1b0 00000000`006127c0 00000000`00000100 00000000`0116f9a0 : opensm!osm_mad_pool_get+0xbe [s:\builds\6861\trunk\ulp\opensm\user\opensm\osm_mad_pool.c @ 95]
00000000`754a2d0a : 00000000`00612770 00000000`00000000 00000000`00000000 00000000`00000000 : opensm!umad_receiver+0x3b3 [s:\builds\6861\trunk\ulp\opensm\user\libvendor\osm_vendor_ibumad.c @ 314]
00000000`7712be3d : 00000000`00612770 00000000`00000000 00000000`00000000 00000000`00000000 : complibd!cl_thread_callback+0x1a [s:\builds\6861\trunk\core\complib\user\cl_thread.c @ 49]
00000000`77266a51 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0xd
00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x1d

3: kd> ??p_spinlock
struct _cl_spinlock * 0x00000000`007cef18
   +0x000 crit_sec         : _RTL_CRITICAL_SECTION
   +0x028 initialized      : 0

3: kd> ??p_log
struct osm_log * 0x00000000`007cef10
   +0x000 level            : 0x30 '0'
   +0x008 lock             : _cl_spinlock
   +0x038 count            : 0
   +0x03c max_size         : 0
   +0x040 flush            : 0
   +0x048 out_port         : (null)
   +0x050 accum_log_file   : 0
   +0x054 daemon           : 0
   +0x058 log_file_name    : 0x08004633`39010038  "--- memory read error at address 0x08004633`39010038 ---"
   +0x060 log_prefix       : 0x00000000`00222460  "p???"
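
One quick sanity check on a pointer value such as the log_file_name above is whether it is even a canonical x86-64 address. With 48-bit virtual addressing, bits 63..47 must all be equal. The sketch below assumes 48-bit addressing, and the helper name is_canonical48 is made up for illustration. A non-canonical value cannot belong to any loaded module, which would point at the field having been overwritten with non-pointer data.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if addr is canonical under 48-bit virtual addressing,
 * i.e. bits 63..47 are all zeros or all ones. Hypothetical helper. */
static int is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;          /* the 17 bits 63..47 */
    return top == 0 || top == 0x1FFFF;
}
```

For example, is_canonical48(0x0800463339010038ULL) returns 0, while the p_log address 0x00000000007cef10 passes the check.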

3: kd> ??p_vend
struct _osm_vendor * 0x00000000`007cb340
   +0x000 p_log            : 0x00000000`007cef10 osm_log
   +0x008 ca_count         : 0x7c6590
   +0x010 p_ca_info        : (null)
   +0x018 timeout          : 0xbb8
   +0x01c max_retries      : 3
   +0x020 agents           : [32] 0x00000000`006127c0
   +0x120 ca_names         : [32] [64]  "ibv_device0"
   +0x920 mtbl             : vendor_match_tbl
   +0x930 umad_port        : umad_port
   +0x9f8 cb_mutex         : 0x00000000`00000060
   +0xa00 match_tbl_mutex  : 0x00000000`00000064
   +0xa08 umad_port_id     : -1
   +0xa10 receiver         : (null)
   +0xa18 issmfd           : -1
   +0xa1c issm_path        : [256]  ""