[ofa-general] dapl bug?

Steve Wise swise at opengridcomputing.com
Thu Apr 24 12:57:30 PDT 2008


Hey Arlin,

Have you ever seen this?  I hit this 100% of the time trying the 1.2 
version of dapltest on an ofed-1.3 system.  The debug info below was 
obtained by building the src rpm with debug enabled...

> (gdb) r -T T -d -s vic11-10g -D chelsio -i 10 client SR 256 server SR 
> 256 client SR 256 server SR 256
> Starting program: /usr/bin/dapltest -T T -d -s vic11-10g -D chelsio -i 
> 10 client SR 256 server SR 256 client SR 256 server SR 256
> [Thread debugging using libthread_db enabled]
> [New Thread 46912498371600 (LWP 6654)]
> -------------------------------------
> TransCmd.server_name              : vic11-10g
> TransCmd.num_iterations           : 10
> TransCmd.num_threads              : 1
> TransCmd.eps_per_thread           : 1
> TransCmd.validate                 : 0
> TransCmd.dapl_name                : chelsio
> TransCmd.num_ops                  : 4
> TransCmd.op[0].transfer_type      : SEND_RECV  (client)
> TransCmd.op[0].seg_size           : 256
> TransCmd.op[0].num_segs           : 1
> TransCmd.op[0].reap_send_on_recv  : 0
> TransCmd.op[1].transfer_type      : SEND_RECV  (server)
> TransCmd.op[1].seg_size           : 256
> TransCmd.op[1].num_segs           : 1
> TransCmd.op[1].reap_send_on_recv  : 0
> TransCmd.op[2].transfer_type      : SEND_RECV  (client)
> TransCmd.op[2].seg_size           : 256
> TransCmd.op[2].num_segs           : 1
> TransCmd.op[2].reap_send_on_recv  : 0
> TransCmd.op[3].transfer_type      : SEND_RECV  (server)
> TransCmd.op[3].seg_size           : 256
> TransCmd.op[3].num_segs           : 1
> TransCmd.op[3].reap_send_on_recv  : 0
> Server Name: vic11-10g
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 46912498371600 (LWP 6654)]
> 0x00000032f04760b0 in strlen () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00000032f04760b0 in strlen () from /lib64/libc.so.6
> #1  0x00000032f044602b in vfprintf () from /lib64/libc.so.6
> #2  0x00000032f044bdea in printf () from /lib64/libc.so.6
> #3  0x0000000000403900 in DT_NetAddrLookupHostAddress 
> (to_netaddr=0x7e16f88, hostname=0x7e1658c "vic11-10g") at 
> cmd/dapl_netaddr.c:136
> #4  0x00000000004026cb in DT_Params_Parse (argc=<value optimized out>, 
> argv=<value optimized out>, params_ptr=0x7e16580) at cmd/dapl_params.c:205
> #5  0x000000000040211f in dapltest (argc=22, argv=0x7fff48e9b5f8) at 
> cmd/dapl_main.c:88
> #6  0x00000032f041d8a4 in __libc_start_main () from /lib64/libc.so.6
> #7  0x0000000000401f59 in _start ()
> (gdb) 

It's hurling in DT_Mdep_printf() here:

> 134         /* Pull out IP address and print it as a sanity check */
> 135         DT_Mdep_printf ("Server Name: %s \n", hostname);
> 136         DT_Mdep_printf ("Server Net Address: %s\n",
> 137                         inet_ntoa(((struct sockaddr_in *)target->ai_addr)->sin_addr));

The ai_addr looks ok though:
> (gdb) p/x *((struct sockaddr_in *)target->ai_addr)
> $3 = {sin_family = 0x2, sin_port = 0x0, sin_addr = {s_addr = 
> 0x8846a8c0}, sin_zero = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}}
> (gdb)
>
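
As a cross-check on the dump above: assuming a little-endian x86-64 host, s_addr = 0x8846a8c0 is the bytes c0 a8 46 88 in memory, i.e. 192.168.70.136, so the address itself is sane. One thing that might be worth ruling out (this is a guess, nothing in the trace confirms it) is the string pointer handed to "%s" being bad, for example if inet_ntoa() ends up without a prototype in that file and its return value gets truncated to int on a 64-bit build. The sketch below formats the address with inet_ntop() into a caller-supplied buffer instead. format_ipv4() is a hypothetical helper, not something in dapltest.

```c
#include <assert.h>
#include <arpa/inet.h>   /* inet_ntop(), with a real prototype in scope */
#include <netinet/in.h>  /* struct sockaddr_in, INET_ADDRSTRLEN */
#include <sys/socket.h>  /* AF_INET, socklen_t */
#include <stddef.h>

/*
 * Hypothetical helper (not part of dapltest): write the dotted-quad form
 * of sa->sin_addr into buf.  Returns buf on success, NULL on failure.
 * buf should hold at least INET_ADDRSTRLEN bytes.
 */
static const char *format_ipv4(const struct sockaddr_in *sa,
                               char *buf, size_t len)
{
    assert(sa != NULL && buf != NULL);
    return inet_ntop(AF_INET, &sa->sin_addr, buf, (socklen_t)len);
}
```

Usage would be something like declaring char buf[INET_ADDRSTRLEN], calling format_ipv4((struct sockaddr_in *)target->ai_addr, buf, sizeof buf), and only printing with "%s" when the result is non-NULL.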

Ever seen this?

Steve.


