[openib-general] OpenSM realloc error

Owen Stampflee ostampflee at terrasoftsolutions.com
Thu Feb 16 17:43:16 PST 2006


A 32-bit build of 5411 gets the link to become active and ibv_rc_pingpong
works, but I can't bring up ipoib...

dmesg says this (tried both ib0 and ib1 to ensure ports weren't swapped):
ADDRCONF(NETDEV_UP): ib0: link is not ready
ADDRCONF(NETDEV_UP): ib1: link is not ready

At least we're making progress.

Thanks,
Owen

On Thu, 2006-02-16 at 18:18 -0500, Hal Rosenstock wrote:
> Hi Owen,
> 
> On Thu, 2006-02-16 at 16:27, Owen Stampflee wrote:
> > So, here is the back trace with no code modifications...
> > 
> > 0x00000080b9719db0 in .__GI_raise () from /lib64/tls/libc.so.6
> > (gdb) bt
> > #0  0x00000080b9719db0 in .__GI_raise () from /lib64/tls/libc.so.6
> > #1  0x00000080b971b89c in .__GI_abort () from /lib64/tls/libc.so.6
> > #2  0x00000080b974e860 in .__libc_message () from /lib64/tls/libc.so.6
> > #3  0x00000080b97580bc in ._int_realloc () from /lib64/tls/libc.so.6
> > #4  0x00000080b9759528 in .__realloc () from /lib64/tls/libc.so.6
> > #5  0x00000080b975942c in .__realloc () from /lib64/tls/libc.so.6
> > #6  0x00000080b974cd30 in ._IO_mem_finish () from /lib64/tls/libc.so.6
> > #7  0x00000080b97426b8 in ._IO_new_fclose () from /lib64/tls/libc.so.6
> > #8  0x00000080b97b795c in .__GI_vsyslog () from /lib64/tls/libc.so.6
> > #9  0x00000080b97b7ddc in .__GI_syslog () from /lib64/tls/libc.so.6
> > #10 0x00000080a362be90 in .cl_log_event () from /usr/lib64/libosmcomp.so.1
> > #11 0x00000080a35f5700 in .osm_log () from /usr/lib64/libopensm.so.1
> > #12 0x000000001001316c in ?? ()
> > #13 0x00000000100059b4 in ?? ()
> > #14 0x00000080b970411c in .generic_start_main () from /lib64/tls/libc.so.6
> > #15 0x00000080b97042a4 in .__libc_start_main () from /lib64/tls/libc.so.6
> > #16 0x0000000000000000 in ?? ()
> > (gdb)
> > 
> > Commenting out the cl_log_event in osm_log results in this backtrace:
> > 
> > (gdb) bt
> > #0  0x00000080b9719db0 in .__GI_raise () from /lib64/tls/libc.so.6
> > #1  0x00000080b971b89c in .__GI_abort () from /lib64/tls/libc.so.6
> > #2  0x00000080b974e860 in .__libc_message () from /lib64/tls/libc.so.6
> > #3  0x00000080b9756db0 in ._int_malloc () from /lib64/tls/libc.so.6
> > #4  0x00000080b9758b50 in .__GI___libc_malloc () from /lib64/tls/libc.so.6
> > #5  0x00000400000607bc in __cl_malloc_priv (size=0) at cl_memory_osd.c:62
> > #6  0x00000400000604d4 in __cl_zalloc_ntrk (size=0) at cl_memory.c:416
> > #7  0x00000400000629f4 in cl_ptr_vector_set_capacity (p_vector=0x100788d0, new_capacity=6349) at cl_ptr_vector.c:216
> > #8  0x0000040000062acc in cl_ptr_vector_set_size (p_vector=0x0, size=16) at cl_ptr_vector.c:270
> > #9  0x0000040000062c08 in cl_ptr_vector_init (p_vector=0x100788d0, min_size=6349, grow_size=16) at cl_ptr_vector.c:93
> > #10 0x000004000005bb00 in cl_disp_init (p_disp=0x100788a0, thread_count=0, name=0x100464c0 "opensm") at cl_dispatcher.c:214
> > #11 0x00000000100133f8 in ?? ()
> > #12 0x00000000100059b4 in ?? ()
> > #13 0x00000080b970411c in .generic_start_main () from /lib64/tls/libc.so.6
> > #14 0x00000080b97042a4 in .__libc_start_main () from /lib64/tls/libc.so.6
> > #15 0x0000000000000000 in ?? ()
> 
> __cl_malloc_priv is just a wrapper for malloc:
> 
> from cl_memory_osd.c:
> void*
> __cl_malloc_priv(
>         IN      const size_t    size )
> {
>         return malloc( size );
> }
> 
> If I believe gdb, this appears to be a malloc of 0 bytes, but since the
> new_capacity was 6349 (and this would be multiplied by sizeof(void *)),
> I'm not sure whether to trust this.
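
As a rough check on the arithmetic above, here is a minimal sketch of the size one would expect the pointer vector to request. `ptr_array_bytes` is a hypothetical helper written for illustration; it is not part of complib, and the real cl_ptr_vector_set_capacity may compute its allocation differently.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a complib function): number of bytes needed to
 * hold `capacity` pointers. Returns 0 if the multiplication would overflow
 * size_t, so a caller can tell an impossible request from a valid one.
 */
size_t ptr_array_bytes(size_t capacity)
{
    if (capacity > SIZE_MAX / sizeof(void *))
        return 0;
    return capacity * sizeof(void *);
}
```

With new_capacity=6349 this gives 6349 * sizeof(void *), which is nonzero on any build, so a size=0 reaching malloc would not follow from that capacity alone. That is consistent with doubting the argument values gdb printed.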
> 
> Can you send me the compile line from the OpenSM build? Are the include
> paths correct for 64-bit headers?
> 
> > So now I've compiled it in 32-bit mode (had to fix my chroot) and
> > everything runs, but I get the following message...
> > 
> > Feb 16 13:59:28 006732 [0000] -> OpenSM Rev:openib-1.1.0
> >  
> > Feb 16 13:59:28 008210 [F7E8D020] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000
> > Feb 16 13:59:28 008292 [F7E8D020] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000
> > Feb 16 13:59:28 015894 [F7E8D020] -> osm_vendor_get_all_port_attr: assign CA mthca0 port 1 guid (0x2c90109764831) as the default port
> > Feb 16 13:59:28 015977 [F7E8D020] -> osm_vendor_bind: Binding to port 0x2c90109764831.
> > Feb 16 13:59:28 021293 [F7E8D020] -> osm_vendor_bind: Binding to port 0x2c90109764831.
> > Feb 16 13:59:28 021692 [F568C4E0] -> umad_receiver: ERR 5413: Failed to obtain request madw for received MAD(method=0x81 attr=0x11) -- dropping
> 
> For some reason, when the response is received, it is not finding the
> matching request in the transaction table. I thought this was fixed a
> while ago for PowerPC. Can you run opensm with -V and see if there is
> any more output that might be helpful?
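
For readers unfamiliar with the matching step described above, here is a minimal sketch of a transaction table keyed on the MAD transaction ID. Everything in it (`txn_table`, `txn_add`, `txn_match`, the fixed table size) is hypothetical and only illustrates the idea; it is not the OpenSM vendor-layer code and says nothing about why the lookup failed in this case.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 8   /* arbitrary table size for this sketch */

/* One outstanding request, remembered by its transaction ID. */
struct pending_req {
    uint64_t tid;
    int      in_use;
};

struct txn_table {
    struct pending_req slots[MAX_PENDING];
};

/* Record an outgoing request. Returns 0 on success, -1 if the table is full. */
int txn_add(struct txn_table *t, uint64_t tid)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!t->slots[i].in_use) {
            t->slots[i].tid = tid;
            t->slots[i].in_use = 1;
            return 0;
        }
    }
    return -1;
}

/*
 * Look up the request that a received response belongs to and release it.
 * Returns 1 if a request with the same TID was pending, 0 if none was
 * found, in which case a receiver would have nothing to pair the response
 * with and would drop it.
 */
int txn_match(struct txn_table *t, uint64_t tid)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (t->slots[i].in_use && t->slots[i].tid == tid) {
            t->slots[i].in_use = 0;
            return 1;
        }
    }
    return 0;
}
```

In this sketch a response matches only if its TID is bit-for-bit the value stored when the request was sent. Any difference in how the two sides store or compare that 64-bit value produces a miss.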
> 
> > Other info:
> > [root@m2 ~]# ibstat
> > CA 'mthca0'
> >         CA type: MT23108
> >         Number of ports: 2
> >         Firmware version: 3.3.2
> >         Hardware version: a1
> >         Node GUID: 0x0002c90109764830
> >         System image GUID: 0x0002c90109764833
> >         Port 1:
> >                 State: Initializing
> >                 Physical state: LinkUp
> >                 Rate: 10
> >                 Base lid: 0
> >                 LMC: 0
> >                 SM lid: 0
> >                 Capability mask: 0x00510a68
> >                 Port GUID: 0x0002c90109764831
> >         Port 2:
> >                 State: Down
> >                 Physical state: Polling
> >                 Rate: 2
> >                 Base lid: 0
> >                 LMC: 0
> >                 SM lid: 0
> >                 Capability mask: 0x00510a68
> >                 Port GUID: 0x0002c90109764832
> > 
> > 
> > [root@m2 ~]# ibstatus
> > Infiniband device 'mthca0' port 1 status:
> >         default gid:     fe80:0000:0000:0000:0002:c901:0976:4831
> >         base lid:        0x0
> >         sm lid:          0x0
> >         state:           2: INIT
> >         phys state:      5: LinkUp
> >         rate:            10 Gb/sec (4X)
> 
> This is goodness and means the physical link has been established on
> this port.
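
To make the numeric codes in the ibstatus output easier to read, here is a small sketch that decodes them. The function names are made up for illustration. The code-to-name mapping follows the values shown in the output above plus the standard InfiniBand port state numbering. `waiting_for_sm` encodes one reading of the port 1 output (link trained, port in INIT, no LID assigned yet), not something ibstatus itself reports.

```c
#include <stddef.h>

/* Logical port state, as in "state: 2: INIT". */
const char *port_state_name(int state)
{
    switch (state) {
    case 1:  return "DOWN";
    case 2:  return "INIT";
    case 3:  return "ARMED";
    case 4:  return "ACTIVE";
    default: return "UNKNOWN";
    }
}

/* Physical port state, as in "phys state: 5: LinkUp". */
const char *phys_state_name(int phys_state)
{
    switch (phys_state) {
    case 1:  return "Sleep";
    case 2:  return "Polling";
    case 3:  return "Disabled";
    case 4:  return "PortConfigurationTraining";
    case 5:  return "LinkUp";
    default: return "Unknown";
    }
}

/*
 * Returns 1 when the physical link is up but the port has not yet been
 * brought past INIT and has no LID, i.e. it still looks unconfigured by a
 * subnet manager. Returns 0 otherwise.
 */
int waiting_for_sm(int state, int phys_state, unsigned base_lid)
{
    return phys_state == 5 && state == 2 && base_lid == 0;
}
```

For the values reported for port 1 (state 2, phys state 5, base lid 0x0), `waiting_for_sm` returns 1. For port 2 (state 1, phys state 2) it returns 0, since that port has no physical link at all.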
> 
> > Infiniband device 'mthca0' port 2 status:
> >         default gid:     fe80:0000:0000:0000:0002:c901:0976:4832
> >         base lid:        0x0
> >         sm lid:          0x0
> >         state:           1: DOWN
> >         phys state:      2: Polling
> >         rate:            2.5 Gb/sec (1X)
> > 
> > 
> > My archives suggest a firmware upgrade, but 3.3.3 isn't available from
> > SBS as far as I can tell, and my contact no longer works there, so I'm
> > going to have to find the new person to talk to about getting newer
> > firmware, unless of course another vendor's firmware will work on this
> > card.
> 
> I think 3.3.2 should be OK. In any case, I doubt it's the source of the
> problem above.
> 
> -- Hal
> 
> > Cheers,
> > Owen
> > 
> 
> 