[ofa-general] Bug with SDP on IA64

Nicolas Morey Chaisemartin nicolas.morey-chaisemartin at ext.bull.net
Mon Oct 27 05:32:04 PDT 2008


Dotan Barak wrote:
> On Mon, Oct 27, 2008 at 11:09 AM, Nicolas Morey Chaisemartin
> <nicolas.morey-chaisemartin at ext.bull.net> wrote:
>   
>> Amir Vadai wrote:
>>     
>>> I asked our IB expert Jack for hints and he told me this:
>>>
>>>
>>> From Section 11.6.2 (COMPLETION RETURN STATUS) of the IB Spec volume 1,
>>> revision 1.2.1
>>> * Local Length Error - ... Generated for a
>>>  Work Request posted to the local Receive Queue when the sum of
>>>  the Data Segment lengths is too small to receive a valid incoming
>>>  message or the length of the incoming message is greater than the
>>>  maximum message size supported by the HCA port that received the
>>>  message.
>>>
>>>
>>> There seem to be 2 possibilities:
>>> 1. The receiver did not post enough/large-enough scatter gather entries in
>>>   the receive queue.
>>>
>>>
>>> or 2. The sender sent a 0-length packet, but did so incorrectly.
>>>   (if any of the s/g entries (i.e., data segment entries) have a zero
>>>   byte count, this results in 2 GigaBytes of data being sent over the
>>> wire).
>>>
>>>
>>>   I note that SDP does not check for this (see sdp_post_send() in file
>>> sdp_bcopy.c:
>>>   the sge->length field is not checked for zero length).
>>>
>>>
>>> Regarding how to debug this, you need to talk with an sdp expert to see if
>>> sdp may try
>>> to send 0-length packets under stress ([Amir]: I can help you with this).
>>>
>>>
>>>       
>> I've just run a few more tests.
>> I added a test in sdp_post_send to check the sge->length field:
>> if(sge->length == 0){printk(KERN_ERR "SDP sending 0bytes packet\n");}
>>     
>
> Please pay attention: an sge->length of 0 means that you send 2GB, not 0 bytes.
> If you want to send 0 bytes, the sg_list should be empty (0 entries).
>
> This is why you have a length violation ...
>
>
> Dotan
>
>   

This is just a debug message, and I only see a zero sge->length in the
IA64-to-IA64 case.
When transferring from IA64 to x86_64 I don't have this problem.
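
To make the length rule discussed above concrete, here is a minimal
user-space sketch. It is not the SDP code: struct demo_sge is a
hypothetical stand-in for the kernel's s/g entry, and the demo_* helpers
are illustrative names only. It assumes, as stated in this thread, that
a zero byte count in an s/g entry is treated as 2GB, and that a true
0-byte send uses an empty sg_list.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a scatter/gather entry; only the length
 * field matters for this illustration. */
struct demo_sge {
    uint64_t addr;
    uint32_t length;
    uint32_t lkey;
};

/* Assumption taken from this thread: a zero byte count in an s/g entry
 * is interpreted as 2GB (2^31 bytes), not as 0 bytes. */
#define DEMO_ZERO_LEN_BYTES (UINT64_C(1) << 31)

/* Number of bytes the s/g list would describe under that rule.
 * An empty list (num_sge == 0) is the only way to describe 0 bytes. */
static uint64_t demo_wire_length(const struct demo_sge *sg_list, int num_sge)
{
    uint64_t total = 0;

    for (int i = 0; i < num_sge; i++)
        total += sg_list[i].length ? sg_list[i].length
                                   : DEMO_ZERO_LEN_BYTES;
    return total;
}

/* Debug helper in the spirit of the printk test: counts the entries
 * whose length field is zero so the caller can flag them before
 * posting the send. */
static int demo_count_zero_sge(const struct demo_sge *sg_list, int num_sge)
{
    int zeros = 0;

    for (int i = 0; i < num_sge; i++)
        if (sg_list[i].length == 0)
            zeros++;
    return zeros;
}
```

With this model, a single entry of length 0 describes 2GB, while an
sg_list with 0 entries describes 0 bytes, which is why a nonzero result
from demo_count_zero_sge() would be worth logging.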

As I said in my last message, the problem seems to be linked to the 
bcopy buffer size.
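
For the receive side, the Local Length Error condition from the spec
excerpt quoted above can be sketched as follows. This is only a
user-space model under stated assumptions: struct demo_recv_wr,
demo_check_recv() and the max_msg_sz parameter are hypothetical names,
segment lengths are taken literally, and nothing here reflects how the
HCA or SDP actually implements the check.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical receive work request: just the posted data segment
 * lengths. */
struct demo_recv_wr {
    const uint32_t *seg_len;
    int num_seg;
};

enum demo_status {
    DEMO_OK = 0,
    DEMO_LOCAL_LENGTH_ERROR = 1
};

/* Models the two cases from the quoted spec text:
 *  - the incoming message exceeds the maximum message size of the port
 *  - the sum of the posted data segment lengths is too small to hold
 *    the incoming message */
static enum demo_status demo_check_recv(const struct demo_recv_wr *wr,
                                        uint64_t msg_len,
                                        uint64_t max_msg_sz)
{
    uint64_t posted = 0;

    for (int i = 0; i < wr->num_seg; i++)
        posted += wr->seg_len[i];

    if (msg_len > max_msg_sz)
        return DEMO_LOCAL_LENGTH_ERROR;
    if (posted < msg_len)
        return DEMO_LOCAL_LENGTH_ERROR;
    return DEMO_OK;
}
```

Under this model, a receive buffer smaller than the message the peer
sends fails in the same way as an oversized message, so a mismatch in
buffer size between the two ends would be one thing to rule out.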

Nicolas


