[ofa-general] Bug with SDP on IA64
Nicolas Morey Chaisemartin
nicolas.morey-chaisemartin at ext.bull.net
Mon Oct 27 03:43:40 PDT 2008
Amir Vadai wrote:
> I asked our IB expert Jack for hints and he told me this:
>
>
> From Section 11.6.2 (COMPLETION RETURN STATUS) of the IB Spec, volume 1, revision 1.2.1:
> * Local Length Error - ... Generated for a
> Work Request posted to the local Receive Queue when the sum of
> the Data Segment lengths is too small to receive a valid incoming
> message or the length of the incoming message is greater than the
> maximum message size supported by the HCA port that received the
> message.
>
>
> There seem to be 2 possibilities:
> 1. The receiver did not post enough/large-enough scatter gather entries in
> the receive queue.
>
>
> or
> 2. The sender sent a 0-length packet, but did so incorrectly.
> (if any of the s/g entries (i.e., data segment entries) have a zero
> byte count, this results in 2 gigabytes of data being sent over the wire).
>
>
> I note that SDP does not check for this (see sdp_post_send() in file sdp_bcopy.c:
> the sge->length field is not checked for zero length).
>
>
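The two possibilities above can be sketched as plain C checks. This is an illustration under assumptions, not SDP code: struct sge_sketch is a hypothetical stand-in for the kernel's struct ib_sge, and has_zero_length_sge(), wire_bytes() and local_length_error() are invented helper names. The 2 GB value for a zero byte count follows the note quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a scatter/gather entry (not the real ib_sge). */
struct sge_sketch {
    uint64_t addr;
    uint32_t length;
    uint32_t lkey;
};

/* Per the quoted note: a zero byte count is treated as 2 GB (2^31 bytes). */
#define ZERO_LEN_MEANS (UINT64_C(1) << 31)

/* Possibility 2: the check the note says sdp_post_send() lacks. */
static bool has_zero_length_sge(const struct sge_sketch *sg, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (sg[i].length == 0)
            return true;
    return false;
}

/* Bytes a send list would put on the wire, applying the zero = 2 GB rule. */
static uint64_t wire_bytes(const struct sge_sketch *sg, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += sg[i].length ? sg[i].length : ZERO_LEN_MEANS;
    return total;
}

/* Possibility 1: the Local Length Error condition from section 11.6.2.
 * posted_recv_bytes is the sum of the receive WR's data segment lengths;
 * port_max_msg_sz is the receiving HCA port's maximum message size. */
static bool local_length_error(uint64_t posted_recv_bytes,
                               uint64_t incoming_len,
                               uint64_t port_max_msg_sz)
{
    return posted_recv_bytes < incoming_len || incoming_len > port_max_msg_sz;
}
```

For example, a send list of {16, 0} would be counted as 16 + 2^31 bytes, which no normally sized receive buffer can absorb.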
I think I got it.
In sdp_cma.c, sdp_response_handler() retrieves the fragment size with:
    sdp_sk(sk)->xmit_size_goal = ntohl(h->actrcvsz) - sizeof(struct sdp_bsdh);
The dmesg output shows:
    sdp_sock(41820:0): sdp_response_handler bufs 64 xmit_size_goal 34816 send trigger 16
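The arithmetic in that assignment can be sketched in user space, assuming the SDP Base Sockets Direct Header (BSDH) is 16 bytes. struct bsdh_sketch and xmit_size_goal_from() are hypothetical names; the field layout is illustrative and not copied from the SDP headers. Under that assumption, the 34816 in the log corresponds to a peer advertising an actrcvsz of 34832.

```c
#include <arpa/inet.h> /* ntohl, htonl */
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-byte header layout (1 + 1 + 2 + 4 + 4 + 4, no padding). */
struct bsdh_sketch {
    uint8_t  mid;
    uint8_t  flags;
    uint16_t bufs;
    uint32_t len;
    uint32_t mseq;
    uint32_t mseq_ack;
};

_Static_assert(sizeof(struct bsdh_sketch) == 16, "BSDH assumed to be 16 bytes");

/* actrcvsz arrives in network byte order; the payload per message is the
 * peer's advertised receive size minus the header carried in each message. */
static uint32_t xmit_size_goal_from(uint32_t actrcvsz_be)
{
    return ntohl(actrcvsz_be) - (uint32_t)sizeof(struct bsdh_sketch);
}
```

Forcing the goal to 2048, as below, is equivalent to the peer having advertised 2064 bytes.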
I forced this value to 2048, and then it works.
On Xeon, this size is 2048 by default.
As I understand it, xmit_size_goal is the size of the receive buffer
used for buffered copies, isn't it?
So it shouldn't really matter, as long as the packet is properly split
at the MTU size before being sent over the network, right?
Could it be that it only works on x86/x86_64 because the buffer size
is smaller than the MTU?
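A rough sketch of the arithmetic behind the MTU question, assuming a hypothetical 2048-byte path MTU. packets_for_message() and fits_receive_buffer() are illustrative helpers, not SDP or verbs functions. Going by the spec text quoted above, the Local Length Error check is made against the whole incoming message rather than individual packets, so the second helper compares the message length with the posted receive space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Packets needed for one message at a given path MTU; a zero-length
 * message still occupies one packet. Written to avoid uint32_t overflow. */
static uint32_t packets_for_message(uint32_t msg_len, uint32_t mtu)
{
    if (msg_len == 0)
        return 1;
    return msg_len / mtu + (msg_len % mtu != 0);
}

/* The receive-side length check applies to the whole message, so splitting
 * into MTU-sized packets does not help if the posted buffer is too small. */
static bool fits_receive_buffer(uint32_t msg_len, uint32_t posted_recv_bytes)
{
    return msg_len <= posted_recv_bytes;
}
```

With these assumptions, a 34816-byte message becomes 17 packets of 2048 bytes, yet would still overrun a 2064-byte receive buffer.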
Nicolas