[ofa-general] smpquery regression in 1.3-rc1

Sasha Khapyorsky sashak at voltaire.com
Fri Dec 21 08:23:43 PST 2007


On 09:08 Thu 20 Dec, akepner at sgi.com wrote:
> On Thu, Dec 20, 2007 at 05:13:18PM +0000, Sasha Khapyorsky wrote:
> > ...
> > Yevgeny, Arthur, could you rerun smpquery with -dddd (for a lot of debug
> > stuff)?
> > 
> 
> Well, just about any perturbation changes the behavior - run 
> it under strace, or gdb, link the IB libraries statically, or 
> look at the machine funny and it works fine. 
> 
> But using the debug flags reveals an apparent problem with the 
> debug code itself:
> 
> # ./smpquery_1.3_rc1 -d -G nodeinfo 0x00066a01a000737c
> ibwarn: [19328] smp_query: attr 0x15 mod 0x0 route DR path 0
> ibwarn: [19328] mad_rpc: data offs 64 sz 64
> mad data
> 0000 0000 0000 0000 fe80 0000 0000 0000
> 0002 0002 0251 0a6a 0000 0000 0103 0302
> 3452 0023 4040 0008 0804 ff40 0000 005e
> 0000 2012 1088 0000 0000 0000 0000 0000
> Segmentation fault
> 
> and gdb shows:
> 
> (gdb) bt
> #0  0x00002b0b9222ed0f in _IO_default_xsputn_internal () from /lib64/libc.so.6
> #1  0x00002b0b92207177 in vfprintf () from /lib64/libc.so.6
> #2  0x00002b0b9229577d in __vsprintf_chk () from /lib64/libc.so.6
> #3  0x00002b0b922956c0 in __sprintf_chk () from /lib64/libc.so.6
> #4  0x00002b0b91c71166 in portid2str (portid=0x7fff1905bc00) at src/portid.c:91
> #5  0x00002b0b91c72529 in sa_rpc_call (ibmad_port=0x7fff1905b680,
>     rcvbuf=0x7fff1905bb30, portid=0x7fff1905bc00, sa=0x7fff1905bac0, timeout=0)
>     at src/sa.c:58
> #6  0x00002b0b91c71791 in sa_call (rcvbuf=0x7fff1905bb30,
>     portid=0x7fff1905bc00, sa=0x7fff1905bac0, timeout=0) at src/rpc.c:395
> #7  0x00002b0b91c723bf in ib_path_query (srcgid=0x7fff1905be30 "\200",
>     destgid=0x7fff1905be30 "\200", sm_id=0x7fff1905bc00, buf=0x7fff1905bb30)
>     at ./include/infiniband/mad.h:790
> #8  0x00002b0b91c7144f in ib_resolve_guid (portid=0x7fff1905bde0,
>     guid=0x7fff1905bd20, sm_id=0x7fff1905bc00, timeout=<value optimized out>)
>     at src/resolve.c:83
> #9  0x00002b0b91c71610 in ib_resolve_portid_str (portid=0x7fff1905bde0,
>     addr_str=0x7fff1905d341 "0x00066a01a000737c", dest_type=2, sm_id=0x0)
>     at src/resolve.c:115
> #10 0x0000000000401cd1 in main (argc=2, argv=0x7fff1905bfd0)
>     at smpquery_1.3_rc1.c:522

Thanks for this great debug info. I'm not able to reproduce the segfault,
but looking at your backtrace, I think this patch could fix it:

diff --git a/libibmad/src/resolve.c b/libibmad/src/resolve.c
index 05b443d..d8365b2 100644
--- a/libibmad/src/resolve.c
+++ b/libibmad/src/resolve.c
@@ -56,6 +56,8 @@ ib_resolve_smlid(ib_portid_t *sm_id, int timeout)
 	uint8_t portinfo[64];
 	int lid;
 
+	memset(sm_id, 0, sizeof(*sm_id));
+
 	if (!smp_query(portinfo, &self, IB_ATTR_PORT_INFO, 0, 0))
 		return -1;
 
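For illustration, here is a minimal standalone sketch of the failure mode I
suspect. Everything in it is a hypothetical stand-in: struct fake_portid and
the fake_* functions are not the real ib_portid_t, ib_resolve_smlid() or
portid2str(). The assumption is that the caller passes a stack-allocated port
id and only its lid gets filled in, so the other fields (here a direct-route
hop count) hold stack garbage unless the struct is zeroed first.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_HOPS 64

/* Hypothetical stand-in for ib_portid_t; field names are illustrative only. */
struct fake_portid {
	int lid;
	struct {
		int cnt;		/* number of hops in the direct route */
		uint8_t p[MAX_HOPS];	/* port number for each hop */
	} drpath;
};

/* Stand-in for ib_resolve_smlid(): zero the whole struct, then set the lid. */
static void fake_resolve_smlid(struct fake_portid *sm_id, int lid)
{
	memset(sm_id, 0, sizeof(*sm_id));	/* the line the patch adds */
	sm_id->lid = lid;
}

/*
 * Stand-in for portid2str(), written defensively: every write is bounded
 * by the buffer size and an out-of-range hop count is ignored.
 */
static const char *fake_portid2str(const struct fake_portid *id,
				   char *buf, size_t size)
{
	size_t n;
	int i, cnt = id->drpath.cnt;

	assert(buf != NULL && size > 0);
	if (cnt < 0 || cnt > MAX_HOPS)
		cnt = 0;	/* garbage hop count: print no path */

	snprintf(buf, size, "Lid %d", id->lid);
	n = strlen(buf);
	for (i = 0; i < cnt && n + 1 < size; i++) {
		snprintf(buf + n, size - n, "%s%d",
			 i ? "," : " DR path ", id->drpath.p[i]);
		n = strlen(buf);
	}
	return buf;
}
```

With the memset in place the hop count is 0 after fake_resolve_smlid(), so
the formatter prints only the lid, whatever was on the stack before. An
unbounded sprintf() loop driven by a garbage count would instead run off the
end of the buffer, which would be consistent with the crash inside
__sprintf_chk.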

Sasha