[ofa-general] dapl bug?
Steve Wise
swise at opengridcomputing.com
Thu Apr 24 12:57:30 PDT 2008
Hey Arlin,
Have you ever seen this? I hit this 100% of the time trying the 1.2
version of dapltest on an ofed-1.3 system. The debug info below was
obtained by building the src rpm with debug enabled...
> (gdb) r -T T -d -s vic11-10g -D chelsio -i 10 client SR 256 server SR
> 256 client SR 256 server SR 256
> Starting program: /usr/bin/dapltest -T T -d -s vic11-10g -D chelsio -i
> 10 client SR 256 server SR 256 client SR 256 server SR 256
> [Thread debugging using libthread_db enabled]
> [New Thread 46912498371600 (LWP 6654)]
> -------------------------------------
> TransCmd.server_name : vic11-10g
> TransCmd.num_iterations : 10
> TransCmd.num_threads : 1
> TransCmd.eps_per_thread : 1
> TransCmd.validate : 0
> TransCmd.dapl_name : chelsio
> TransCmd.num_ops : 4
> TransCmd.op[0].transfer_type : SEND_RECV (client)
> TransCmd.op[0].seg_size : 256
> TransCmd.op[0].num_segs : 1
> TransCmd.op[0].reap_send_on_recv : 0
> TransCmd.op[1].transfer_type : SEND_RECV (server)
> TransCmd.op[1].seg_size : 256
> TransCmd.op[1].num_segs : 1
> TransCmd.op[1].reap_send_on_recv : 0
> TransCmd.op[2].transfer_type : SEND_RECV (client)
> TransCmd.op[2].seg_size : 256
> TransCmd.op[2].num_segs : 1
> TransCmd.op[2].reap_send_on_recv : 0
> TransCmd.op[3].transfer_type : SEND_RECV (server)
> TransCmd.op[3].seg_size : 256
> TransCmd.op[3].num_segs : 1
> TransCmd.op[3].reap_send_on_recv : 0
> Server Name: vic11-10g
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 46912498371600 (LWP 6654)]
> 0x00000032f04760b0 in strlen () from /lib64/libc.so.6
> (gdb) bt
> #0 0x00000032f04760b0 in strlen () from /lib64/libc.so.6
> #1 0x00000032f044602b in vfprintf () from /lib64/libc.so.6
> #2 0x00000032f044bdea in printf () from /lib64/libc.so.6
> #3 0x0000000000403900 in DT_NetAddrLookupHostAddress
> (to_netaddr=0x7e16f88, hostname=0x7e1658c "vic11-10g") at
> cmd/dapl_netaddr.c:136
> #4 0x00000000004026cb in DT_Params_Parse (argc=<value optimized out>,
> argv=<value optimized out>, params_ptr=0x7e16580) at cmd/dapl_params.c:205
> #5 0x000000000040211f in dapltest (argc=22, argv=0x7fff48e9b5f8) at
> cmd/dapl_main.c:88
> #6 0x00000032f041d8a4 in __libc_start_main () from /lib64/libc.so.6
> #7 0x0000000000401f59 in _start ()
> (gdb)
It's hurling in DT_Mdep_printf() here:
> 134 /* Pull out IP address and print it as a sanity check */
> 135 DT_Mdep_printf ("Server Name: %s \n", hostname);
> 136 DT_Mdep_printf ("Server Net Address: %s\n",
> 137 inet_ntoa(((struct sockaddr_in *)target->ai_addr)->sin_addr));
The ai_addr looks ok though:
> (gdb) p/x *((struct sockaddr_in *)target->ai_addr)
> $3 = {sin_family = 0x2, sin_port = 0x0, sin_addr = {s_addr =
> 0x8846a8c0}, sin_zero = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}}
> (gdb)
>
Ever seen this?
Steve.