[openib-general] OpenSM realloc error
Hal Rosenstock
halr at voltaire.com
Thu Feb 16 15:18:05 PST 2006
Hi Owen,
On Thu, 2006-02-16 at 16:27, Owen Stampflee wrote:
> So, here is the back trace with no code modifications...
>
> 0x00000080b9719db0 in .__GI_raise () from /lib64/tls/libc.so.6
> (gdb) bt
> #0 0x00000080b9719db0 in .__GI_raise () from /lib64/tls/libc.so.6
> #1 0x00000080b971b89c in .__GI_abort () from /lib64/tls/libc.so.6
> #2 0x00000080b974e860 in .__libc_message () from /lib64/tls/libc.so.6
> #3 0x00000080b97580bc in ._int_realloc () from /lib64/tls/libc.so.6
> #4 0x00000080b9759528 in .__realloc () from /lib64/tls/libc.so.6
> #5 0x00000080b975942c in .__realloc () from /lib64/tls/libc.so.6
> #6 0x00000080b974cd30 in ._IO_mem_finish () from /lib64/tls/libc.so.6
> #7 0x00000080b97426b8 in ._IO_new_fclose () from /lib64/tls/libc.so.6
> #8 0x00000080b97b795c in .__GI_vsyslog () from /lib64/tls/libc.so.6
> #9 0x00000080b97b7ddc in .__GI_syslog () from /lib64/tls/libc.so.6
> #10 0x00000080a362be90 in .cl_log_event ()
> from /usr/lib64/libosmcomp.so.1
> #11 0x00000080a35f5700 in .osm_log () from /usr/lib64/libopensm.so.1
> #12 0x000000001001316c in ?? ()
> #13 0x00000000100059b4 in ?? ()
> #14 0x00000080b970411c in .generic_start_main ()
> from /lib64/tls/libc.so.6
> #15 0x00000080b97042a4 in .__libc_start_main ()
> from /lib64/tls/libc.so.6
> #16 0x0000000000000000 in ?? ()
> (gdb)
>
> Commenting out the cl_log_event in osm_log results in this backtrace:
>
> (gdb) bt
> #0 0x00000080b9719db0 in .__GI_raise () from /lib64/tls/libc.so.6
> #1 0x00000080b971b89c in .__GI_abort () from /lib64/tls/libc.so.6
> #2 0x00000080b974e860 in .__libc_message () from /lib64/tls/libc.so.6
> #3 0x00000080b9756db0 in ._int_malloc () from /lib64/tls/libc.so.6
> #4 0x00000080b9758b50 in .__GI___libc_malloc ()
> from /lib64/tls/libc.so.6
> #5 0x00000400000607bc in __cl_malloc_priv (size=0) at
> cl_memory_osd.c:62
> #6 0x00000400000604d4 in __cl_zalloc_ntrk (size=0) at cl_memory.c:416
> #7 0x00000400000629f4 in cl_ptr_vector_set_capacity
> (p_vector=0x100788d0,
> new_capacity=6349) at cl_ptr_vector.c:216
> #8 0x0000040000062acc in cl_ptr_vector_set_size (p_vector=0x0, size=16)
> at cl_ptr_vector.c:270
> #9 0x0000040000062c08 in cl_ptr_vector_init (p_vector=0x100788d0,
> min_size=6349,
> grow_size=16) at cl_ptr_vector.c:93
> #10 0x000004000005bb00 in cl_disp_init (p_disp=0x100788a0,
> thread_count=0,
> name=0x100464c0 "opensm") at cl_dispatcher.c:214
> #11 0x00000000100133f8 in ?? ()
> #12 0x00000000100059b4 in ?? ()
> #13 0x00000080b970411c in .generic_start_main ()
> from /lib64/tls/libc.so.6
> #14 0x00000080b97042a4 in .__libc_start_main ()
> from /lib64/tls/libc.so.6
> #15 0x0000000000000000 in ?? ()
__cl_malloc_priv is just a wrapper for malloc:
from cl_memory_osd.c:
void*
__cl_malloc_priv(
	IN const size_t size )
{
	return malloc( size );
}
If I believe gdb, this is a malloc of 0 bytes. But new_capacity was 6349
(and that would be multiplied by sizeof(void *)), so I'm not sure the
argument values gdb shows here can be trusted.
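As a rough sanity check on that arithmetic, here is a minimal sketch. It
assumes the pointer array is sized as capacity * sizeof(void *);
ptr_array_bytes is a made-up name for illustration, not a complib function.
For a capacity of 6349 it gives 50792 bytes with 8-byte pointers and 25396
bytes with 4-byte pointers, so a size of 0 would not be expected either way.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of complib): number of bytes needed to
 * hold `capacity` pointer slots. Returns 0 when capacity is 0 or when
 * the multiplication would overflow size_t.
 */
size_t ptr_array_bytes(size_t capacity)
{
    if (capacity == 0 || capacity > SIZE_MAX / sizeof(void *))
        return 0;
    return capacity * sizeof(void *);
}
```

A nonzero result for 6349 is consistent with the doubt about the size=0
that gdb prints in frames #5 and #6.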
Can you send me the compile line from the OpenSM build? Are the include
paths correct for the 64-bit headers?
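One way to see which mode a given set of compile flags actually produces is
a small probe like the one below. This is only a sketch: pointer_bits and
size_t_bits are hypothetical names, and it would need to be built with the
same compiler and flags as OpenSM for the answer to mean anything.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: width in bits of a data pointer in this build. */
int pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}

/* Hypothetical helper: width in bits of size_t in this build. */
int size_t_bits(void)
{
    return (int)(sizeof(size_t) * CHAR_BIT);
}
```

A 64-bit build would be expected to report 64 for both, and a 32-bit build
32 for both. Any other result would suggest the flags are not what was
intended.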
> So now I've compiled it in 32-bit mode (had to fix my chroot) and
> everything runs, but I get the following message...
>
> Feb 16 13:59:28 006732 [0000] -> OpenSM Rev:openib-1.1.0
>
> Feb 16 13:59:28 008210 [F7E8D020] -> osm_report_notice: Reporting
> Generic Notice type:3 num:66 from LID:0x0000
> GID:0xfe80000000000000,0x0000000000000000
> Feb 16 13:59:28 008292 [F7E8D020] -> osm_report_notice: Reporting
> Generic Notice type:3 num:66 from LID:0x0000
> GID:0xfe80000000000000,0x0000000000000000
> Feb 16 13:59:28 015894 [F7E8D020] -> osm_vendor_get_all_port_attr:
> assign CA mthca0 port 1 guid (0x2c90109764831) as the default port
> Feb 16 13:59:28 015977 [F7E8D020] -> osm_vendor_bind: Binding to port
> 0x2c90109764831.
> Feb 16 13:59:28 021293 [F7E8D020] -> osm_vendor_bind: Binding to port
> 0x2c90109764831.
> Feb 16 13:59:28 021692 [F568C4E0] -> umad_receiver: ERR 5413: Failed to
> obtain request madw for received MAD(method=0x81 attr=0x11) -- dropping
For some reason, the response that was received is not being matched in
the transaction table. I thought this was fixed a while ago for PowerPC.
Can you run opensm with -V and see if there is any more output that might
be helpful?
> Other info:
> [root at m2 ~]# ibstat
> CA 'mthca0'
> CA type: MT23108
> Number of ports: 2
> Firmware version: 3.3.2
> Hardware version: a1
> Node GUID: 0x0002c90109764830
> System image GUID: 0x0002c90109764833
> Port 1:
> State: Initializing
> Physical state: LinkUp
> Rate: 10
> Base lid: 0
> LMC: 0
> SM lid: 0
> Capability mask: 0x00510a68
> Port GUID: 0x0002c90109764831
> Port 2:
> State: Down
> Physical state: Polling
> Rate: 2
> Base lid: 0
> LMC: 0
> SM lid: 0
> Capability mask: 0x00510a68
> Port GUID: 0x0002c90109764832
>
>
> [root at m2 ~]# ibstatus
> Infiniband device 'mthca0' port 1 status:
> default gid: fe80:0000:0000:0000:0002:c901:0976:4831
> base lid: 0x0
> sm lid: 0x0
> state: 2: INIT
> phys state: 5: LinkUp
> rate: 10 Gb/sec (4X)
This is goodness and means the physical link has been established on
this port.
> Infiniband device 'mthca0' port 2 status:
> default gid: fe80:0000:0000:0000:0002:c901:0976:4832
> base lid: 0x0
> sm lid: 0x0
> state: 1: DOWN
> phys state: 2: Polling
> rate: 2.5 Gb/sec (1X)
>
>
> My archives suggest a firmware upgrade, but 3.3.3 isn't available from
> SBS as far as I can tell and my contact no longer works there so I'm
> going to have to find the new person to talk about getting newer
> firmware, unless of course another vendor's firmware will work on this
> card.
I think 3.3.2 should be OK. In any case, I doubt it's the source of the
problem above.
-- Hal
> Cheers,
> Owen
>