[ofa-general] Bug with SDP on IA64

Nicolas Morey Chaisemartin nicolas.morey-chaisemartin at ext.bull.net
Mon Oct 27 03:43:40 PDT 2008


Amir Vadai wrote:
> I asked our IB expert Jack for hints and he told me this:
>
>
> From Section 11.6.2 (COMPLETION RETURN STATUS) of the IB Spec volume 1, revision 1.2.1:
> * Local Length Error - ... Generated for a
>   Work Request posted to the local Receive Queue when the sum of
>   the Data Segment lengths is too small to receive a valid incoming
>   message or the length of the incoming message is greater than the
>   maximum message size supported by the HCA port that received the
>   message.
>
>
> There seem to be 2 possibilities:
> 1. The receiver did not post enough/large-enough scatter gather entries in
>    the receive queue.
>
>
> or 
> 2. The sender sent a 0-length packet, but did so incorrectly.
>    (if any of the s/g entries (i.e., data segment entries) have a zero
>    byte count, this results in 2 GigaBytes of data being sent over the wire).
>
>
>    I note that SDP does not check for this (see sdp_post_send() in file sdp_bcopy.c:
>    the sge->length field is not checked for zero length).
>
>   
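The check described in possibility 2 could look roughly like the sketch below. It is a standalone illustration, not code from sdp_bcopy.c: struct demo_sge and demo_first_zero_sge() are hypothetical names, and the struct is only a stand-in for a scatter/gather entry with a length field.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a scatter/gather entry (assumption, not the
 * kernel's own definition). */
struct demo_sge {
    uint64_t addr;    /* buffer address */
    uint32_t length;  /* byte count of this data segment */
    uint32_t lkey;    /* local memory key */
};

/* Return the index of the first entry whose length is zero,
 * or -1 if every entry has a non-zero length. */
static int demo_first_zero_sge(const struct demo_sge *sge, size_t num_sge)
{
    for (size_t i = 0; i < num_sge; i++) {
        if (sge[i].length == 0)
            return (int)i;
    }
    return -1;
}
```

A caller would run such a check on the entry list before posting the send and refuse (or log) the request when the result is not -1.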

I think I got it.
In sdp_cma.c/sdp_response_handler,
the fragment size is retrieved through
        sdp_sk(sk)->xmit_size_goal = ntohl(h->actrcvsz) -
                sizeof(struct sdp_bsdh);
The dmesg output shows:
sdp_sock(41820:0): sdp_response_handler bufs 64 xmit_size_goal 34816 send trigger 16

I forced this value to 2048 and then it works.
On Xeon this size is 2048 by default.
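To make the arithmetic and the workaround concrete, here is a userspace sketch of the same computation with an optional cap. demo_xmit_size_goal(), hdr_size and cap are hypothetical names for illustration; the real code uses sizeof(struct sdp_bsdh), whose value I have not checked here, so the header size is passed in as a parameter.

```c
#include <arpa/inet.h>  /* ntohl() */
#include <stdint.h>

/* Hypothetical helper: take the peer's advertised receive size (network
 * byte order), subtract the header size, and optionally clamp the result.
 * cap == 0 means "no cap". Returns 0 if nothing is left for payload. */
static uint32_t demo_xmit_size_goal(uint32_t actrcvsz_be,
                                    uint32_t hdr_size,
                                    uint32_t cap)
{
    uint32_t advertised = ntohl(actrcvsz_be);
    uint32_t goal;

    if (advertised <= hdr_size)
        return 0;

    goal = advertised - hdr_size;
    if (cap != 0 && goal > cap)
        goal = cap;  /* e.g. force 2048, as in the workaround above */

    return goal;
}
```

Assuming a 16-byte header purely for the example, an advertised size of 34832 gives a goal of 34816 without a cap, and 2048 with cap set to 2048.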

As I understand it, xmit_size_goal is the size of the receiving
buffer for buffered copies, isn't it?
So it shouldn't really matter as long as the packet is properly split
at the MTU size to be sent over the network, right?
Could it be working on x86/x86_64 only because the buffer size
is smaller than the MTU?
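For scale, the split I have in mind is a plain ceiling division. The sketch below uses a hypothetical helper, demo_packets_for(), and the MTU value is an input I am assuming, not something I have read back from the port.

```c
#include <stdint.h>

/* Hypothetical helper: number of MTU-sized packets needed to carry
 * msg_len bytes (ceiling division). Returns 0 if mtu is 0. */
static uint32_t demo_packets_for(uint32_t msg_len, uint32_t mtu)
{
    if (mtu == 0)
        return 0;
    /* Written this way to avoid overflow in msg_len + mtu - 1. */
    return msg_len / mtu + (msg_len % mtu != 0 ? 1 : 0);
}
```

With an assumed MTU of 2048, a 34816-byte message would be 17 packets on the wire, while a 2048-byte message fits in one.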

Nicolas





