[ofa-general] Bug with SDP on IA64

Nicolas Morey Chaisemartin nicolas.morey-chaisemartin at ext.bull.net
Fri Oct 17 06:23:30 PDT 2008


Hi,

I am stuck on a bug in ofa-kernel 1.3.1 on an IA64 machine running a Bull
2.6.18 kernel.
When doing SDP transfers with ttcp from an IA64 to any other host (IA64,
x86, x86_64), I get this message:

[root@h2 ~]# LD_PRELOAD=/usr/lib/libsdp.so.1 ~/ttcp/ttcp -t -s 192.168.0.10
ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5001  tcp  -> 192.168.0.10
ttcp-t: socket
ttcp-t: tcp_maxseg
ttcp-t: connect
ttcp-t: IO: Connection reset by peer
errno=104
[root@h2 ~]#

The same error appears on the other side.
I enabled debug mode for the sdp module and found that a completion error
with status 1 shows up on the receiver side:
Oct 16 12:40:43 s_kernel@yack0 kernel: sdp_sock(5001:36814): Recv completion with error. Status 1
Oct 16 12:40:43 s_kernel@yack0 kernel: sdp_sock(5001:36814): sdp_reset state=1
Oct 16 12:40:44 s_kernel@yack0 kernel: sdp_sock(5001:36814): sdp_cma_handler event 10 id 0000010425120600
Oct 16 12:40:44 s_kernel@yack0 kernel: sdp_sock(5001:36814): RDMA_CM_EVENT_DISCONNECTED

The error triggers a socket reset which terminates the connection.
According to the documentation I could find, Status 1 is a local length
error (IB_WC_LOC_LEN_ERR). On the receive side, this means an incoming
message was larger than the receive buffer posted for it.

I've noticed that transfers seem to work when only a few packets are sent
(100 or fewer) or when ttcp is slowed down (started under strace).

I have also tried updating to the latest ofa-kernel (1.4.1 from 10/16/2008),
and the bug is still there.

Has anyone seen this problem before? What can I do to locate where 
things go wrong?

Regards

Nicolas
