[openib-general] OpenSM realloc error

Owen Stampflee ostampflee at terrasoftsolutions.com
Thu Feb 16 13:27:47 PST 2006


So, here is the backtrace with no code modifications:

0x00000080b9719db0 in .__GI_raise () from /lib64/tls/libc.so.6
(gdb) bt
#0  0x00000080b9719db0 in .__GI_raise () from /lib64/tls/libc.so.6
#1  0x00000080b971b89c in .__GI_abort () from /lib64/tls/libc.so.6
#2  0x00000080b974e860 in .__libc_message () from /lib64/tls/libc.so.6
#3  0x00000080b97580bc in ._int_realloc () from /lib64/tls/libc.so.6
#4  0x00000080b9759528 in .__realloc () from /lib64/tls/libc.so.6
#5  0x00000080b975942c in .__realloc () from /lib64/tls/libc.so.6
#6  0x00000080b974cd30 in ._IO_mem_finish () from /lib64/tls/libc.so.6
#7  0x00000080b97426b8 in ._IO_new_fclose () from /lib64/tls/libc.so.6
#8  0x00000080b97b795c in .__GI_vsyslog () from /lib64/tls/libc.so.6
#9  0x00000080b97b7ddc in .__GI_syslog () from /lib64/tls/libc.so.6
#10 0x00000080a362be90 in .cl_log_event () from /usr/lib64/libosmcomp.so.1
#11 0x00000080a35f5700 in .osm_log () from /usr/lib64/libopensm.so.1
#12 0x000000001001316c in ?? ()
#13 0x00000000100059b4 in ?? ()
#14 0x00000080b970411c in .generic_start_main () from /lib64/tls/libc.so.6
#15 0x00000080b97042a4 in .__libc_start_main () from /lib64/tls/libc.so.6
#16 0x0000000000000000 in ?? ()
(gdb)

Commenting out the cl_log_event call in osm_log results in this backtrace:

(gdb) bt
#0  0x00000080b9719db0 in .__GI_raise () from /lib64/tls/libc.so.6
#1  0x00000080b971b89c in .__GI_abort () from /lib64/tls/libc.so.6
#2  0x00000080b974e860 in .__libc_message () from /lib64/tls/libc.so.6
#3  0x00000080b9756db0 in ._int_malloc () from /lib64/tls/libc.so.6
#4  0x00000080b9758b50 in .__GI___libc_malloc () from /lib64/tls/libc.so.6
#5  0x00000400000607bc in __cl_malloc_priv (size=0) at cl_memory_osd.c:62
#6  0x00000400000604d4 in __cl_zalloc_ntrk (size=0) at cl_memory.c:416
#7  0x00000400000629f4 in cl_ptr_vector_set_capacity (p_vector=0x100788d0, new_capacity=6349) at cl_ptr_vector.c:216
#8  0x0000040000062acc in cl_ptr_vector_set_size (p_vector=0x0, size=16) at cl_ptr_vector.c:270
#9  0x0000040000062c08 in cl_ptr_vector_init (p_vector=0x100788d0, min_size=6349, grow_size=16) at cl_ptr_vector.c:93
#10 0x000004000005bb00 in cl_disp_init (p_disp=0x100788a0, thread_count=0, name=0x100464c0 "opensm") at cl_dispatcher.c:214
#11 0x00000000100133f8 in ?? ()
#12 0x00000000100059b4 in ?? ()
#13 0x00000080b970411c in .generic_start_main () from /lib64/tls/libc.so.6
#14 0x00000080b97042a4 in .__libc_start_main () from /lib64/tls/libc.so.6
#15 0x0000000000000000 in ?? ()


So now I've compiled it in 32-bit mode (I had to fix my chroot first) and
everything runs, but I get the following messages:

Feb 16 13:59:28 006732 [0000] -> OpenSM Rev:openib-1.1.0
 
Feb 16 13:59:28 008210 [F7E8D020] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000
Feb 16 13:59:28 008292 [F7E8D020] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000
Feb 16 13:59:28 015894 [F7E8D020] -> osm_vendor_get_all_port_attr: assign CA mthca0 port 1 guid (0x2c90109764831) as the default port
Feb 16 13:59:28 015977 [F7E8D020] -> osm_vendor_bind: Binding to port 0x2c90109764831.
Feb 16 13:59:28 021293 [F7E8D020] -> osm_vendor_bind: Binding to port 0x2c90109764831.
Feb 16 13:59:28 021692 [F568C4E0] -> umad_receiver: ERR 5413: Failed to obtain request madw for received MAD(method=0x81 attr=0x11) -- dropping

Other info:
[root@m2 ~]# ibstat
CA 'mthca0'
        CA type: MT23108
        Number of ports: 2
        Firmware version: 3.3.2
        Hardware version: a1
        Node GUID: 0x0002c90109764830
        System image GUID: 0x0002c90109764833
        Port 1:
                State: Initializing
                Physical state: LinkUp
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x00510a68
                Port GUID: 0x0002c90109764831
        Port 2:
                State: Down
                Physical state: Polling
                Rate: 2
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x00510a68
                Port GUID: 0x0002c90109764832


[root@m2 ~]# ibstatus
Infiniband device 'mthca0' port 1 status:
        default gid:     fe80:0000:0000:0000:0002:c901:0976:4831
        base lid:        0x0
        sm lid:          0x0
        state:           2: INIT
        phys state:      5: LinkUp
        rate:            10 Gb/sec (4X)
 
Infiniband device 'mthca0' port 2 status:
        default gid:     fe80:0000:0000:0000:0002:c901:0976:4832
        base lid:        0x0
        sm lid:          0x0
        state:           1: DOWN
        phys state:      2: Polling
        rate:            2.5 Gb/sec (1X)
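
For what it's worth, a port that shows LinkUp but stays in Initializing with
SM lid 0 generally means the link is physically up and no subnet manager has
configured it yet, which fits OpenSM not getting through its sweep here.
Below is a minimal sketch, in Python, of spotting that condition in ibstat
output. The function names and the trimmed sample text are purely
illustrative assumptions and not part of any OpenIB tool.

```python
import re

# Trimmed copy of the ibstat output above, used only as sample input.
SAMPLE = """\
CA 'mthca0'
        CA type: MT23108
        Number of ports: 2
        Port 1:
                State: Initializing
                Physical state: LinkUp
                SM lid: 0
                Port GUID: 0x0002c90109764831
        Port 2:
                State: Down
                Physical state: Polling
                SM lid: 0
                Port GUID: 0x0002c90109764832
"""


def parse_ibstat_ports(text):
    """Map port number -> {field name: value} for each 'Port N:' block."""
    ports = {}
    current = None
    for line in text.splitlines():
        # A block header looks like "Port 1:"; "Port GUID: ..." does not match.
        header = re.match(r"\s*Port (\d+):\s*$", line)
        if header:
            current = int(header.group(1))
            ports[current] = {}
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            ports[current][key.strip()] = value.strip()
    return ports


def ports_waiting_for_sm(text):
    """Return ports whose link is up but which are still in Initializing."""
    return [
        number
        for number, fields in sorted(parse_ibstat_ports(text).items())
        if fields.get("Physical state") == "LinkUp"
        and fields.get("State") == "Initializing"
    ]


print(ports_waiting_for_sm(SAMPLE))  # prints [1]
```

Fields that appear before the first "Port N:" header (CA type and so on) are
ignored on purpose, since only the per-port state matters for this check.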


My archives suggest a firmware upgrade, but 3.3.3 isn't available from
SBS as far as I can tell, and my contact no longer works there. So I'm
going to have to find the new person to talk to about getting newer
firmware, unless of course another vendor's firmware will work on this
card.

Cheers,
Owen



