[ofa-general] Bug with SDP on IA64
Nicolas Morey Chaisemartin
nicolas.morey-chaisemartin at ext.bull.net
Fri Oct 17 06:23:30 PDT 2008
Hi,
I am stuck on a bug in ofa-kernel 1.3.1 on an IA64 machine running a Bull
2.6.18 kernel.
When I run SDP transfers with ttcp from an IA64 host to any other host
(IA64, x86, x86_64), I get this error:
[root@h2 ~]# LD_PRELOAD=/usr/lib/libsdp.so.1 ~/ttcp/ttcp -t -s 192.168.0.10
ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5001 tcp -> 192.168.0.10
ttcp-t: socket
ttcp-t: tcp_maxseg
ttcp-t: connect
ttcp-t: IO: Connection reset by peer
errno=104
[root@h2 ~]#
The same error appears on the other side.
I enabled debug mode for the sdp module and found that on the
receiver side a receive completion fails with status 1:
Oct 16 12:40:43 s_kernel@yack0 kernel: sdp_sock(5001:36814): Recv completion with error. Status 1
Oct 16 12:40:43 s_kernel@yack0 kernel: sdp_sock(5001:36814): sdp_reset state=1
Oct 16 12:40:44 s_kernel@yack0 kernel: sdp_sock(5001:36814): sdp_cma_handler event 10 id 0000010425120600
Oct 16 12:40:44 s_kernel@yack0 kernel: sdp_sock(5001:36814): RDMA_CM_EVENT_DISCONNECTED
The error triggers a socket reset which terminates the connection.
According to the docs I could find, Status 1 is a local length error
(IB_WC_LOC_LEN_ERR), meaning the length of the received message does not
fit the receive buffer that was posted for it.
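
For reference, the numeric status in the debug output can be mapped to a symbolic name with a small script. The following is a minimal sketch in Python, not part of the sdp module. It assumes the module prints the raw work completion status, and that the values follow `enum ib_wc_status` in the Linux kernel header include/rdma/ib_verbs.h. Only the first few values are listed, and the function name `decode_completion_errors` is hypothetical.

```python
import re

# Leading values of `enum ib_wc_status` from the Linux kernel header
# include/rdma/ib_verbs.h. Assumption: the sdp module logs the raw
# wc.status value. Extend the table from the header of the kernel in use.
IB_WC_STATUS = {
    0: "IB_WC_SUCCESS",
    1: "IB_WC_LOC_LEN_ERR",
    2: "IB_WC_LOC_QP_OP_ERR",
    3: "IB_WC_LOC_EEC_OP_ERR",
    4: "IB_WC_LOC_PROT_ERR",
    5: "IB_WC_WR_FLUSH_ERR",
}

# Matches e.g. "sdp_sock(5001:36814): Recv completion with error. Status 1"
COMPLETION_ERROR = re.compile(
    r"sdp_sock\((\d+:\d+)\): (\w+) completion with error\. Status (\d+)"
)


def decode_completion_errors(log_text):
    """Return (socket, direction, status, name) for each completion error."""
    # Log lines pasted into mail are often wrapped, so collapse whitespace first.
    flat = " ".join(log_text.split())
    decoded = []
    for sock, direction, status in COMPLETION_ERROR.findall(flat):
        code = int(status)
        decoded.append((sock, direction, code, IB_WC_STATUS.get(code, "unknown")))
    return decoded


if __name__ == "__main__":
    sample = (
        "Oct 16 12:40:43 s_kernel at yack0 kernel: sdp_sock(5001:36814): Recv\n"
        "completion with error. Status 1\n"
    )
    for entry in decode_completion_errors(sample):
        print(entry)
```

Run against the log excerpt above, the script reports the socket `5001:36814`, the direction `Recv`, and the name `IB_WC_LOC_LEN_ERR` for status 1. Statuses missing from the table are reported as `unknown` rather than guessed.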
I've noticed that transfers seem to work when only a few packets are sent
(<= 100) or when ttcp is slowed down (started through strace).
I've also updated to the latest ofa-kernel (1.4.1 from 10/16/2008)
and the bug is still there.
Has anyone seen this problem before? What can I do to locate where
things go wrong?
Regards
Nicolas