[ofa-general] Bug with SDP on IA64
Nicolas Morey Chaisemartin
nicolas.morey-chaisemartin at ext.bull.net
Mon Oct 27 05:32:04 PDT 2008
Dotan Barak wrote:
> On Mon, Oct 27, 2008 at 11:09 AM, Nicolas Morey Chaisemartin
> <nicolas.morey-chaisemartin at ext.bull.net> wrote:
>
>> Amir Vadai wrote:
>>
>>> I asked our IB expert Jack for hints and he told me this:
>>>
>>>
>>> From Section 11.6.2 (COMPLETION RETURN STATUS) of the IB Spec, volume 1,
>>> revision 1.2.1:
>>> * Local Length Error - ... Generated for a
>>> Work Request posted to the local Receive Queue when the sum of
>>> the Data Segment lengths is too small to receive a valid incoming
>>> message or the length of the incoming message is greater than the
>>> maximum message size supported by the HCA port that received the
>>> message.
>>>
>>>
>>> There seem to be 2 possibilities:
>>> 1. The receiver did not post enough/large-enough scatter gather entries in
>>> the receive queue.
>>>
>>>
>>> or 2. The sender sent a 0-length packet, but did so incorrectly.
>>> (If any of the s/g entries (i.e., data segment entries) has a zero
>>> byte count, this results in 2 gigabytes of data being sent over the
>>> wire.)
>>>
>>>
>>> I note that SDP does not check for this: in sdp_post_send() (file
>>> sdp_bcopy.c), the sge->length field is not checked for zero length.
>>>
>>>
>>> Regarding how to debug this, you need to talk with an SDP expert to see
>>> whether SDP may try to send 0-length packets under stress ([Amir]: I can
>>> help you with this).
>>>
>>>
>>>
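The two possibilities above can be modelled in a few lines of plain C. This is a minimal user-space sketch, not SDP or driver code: `struct data_seg`, `effective_len()`, `total_len()` and `local_length_error()` are hypothetical names, and the rule that a byte count of 0 stands for 2^31 bytes is the assumption stated in this thread, not something the sketch verifies against real hardware.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one scatter/gather (data segment) entry. */
struct data_seg {
    uint64_t addr;
    uint32_t length;
};

/* Assumption taken from this thread: a byte count of 0 does not mean
 * "nothing", it is interpreted as 2^31 bytes (2 GB). */
static uint64_t effective_len(uint32_t length)
{
    return length == 0 ? (UINT64_C(1) << 31) : (uint64_t)length;
}

/* Number of bytes a list of data segments really describes. */
static uint64_t total_len(const struct data_seg *segs, size_t num_sge)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < num_sge; i++)
        sum += effective_len(segs[i].length);
    return sum;
}

/* Local Length Error condition from the spec excerpt: the posted receive
 * entries are too small for the incoming message, or the message is larger
 * than the port's maximum message size. */
static bool local_length_error(const struct data_seg *recv_segs, size_t num_sge,
                               uint64_t msg_len, uint64_t max_msg_sz)
{
    return total_len(recv_segs, num_sge) < msg_len || msg_len > max_msg_sz;
}
```

For example, a single send entry with length 0 describes 2147483648 bytes, so a peer that posted one 32 KiB receive buffer hits the first half of the Local Length Error condition, while a 4096-byte send into the same buffer does not.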
>> I've just run a few more tests.
>> I added a test in sdp_post_send to check the sge->length field:
>> if (sge->length == 0)
>>         printk(KERN_ERR "SDP sending 0bytes packet\n");
>>
>
> Please pay attention: an sge->length of 0 means that you send 2 GB, not 0 bytes.
> If you want to send 0 bytes, the sg_list should be empty (0 entries).
>
> This is why you have a length violation ...
>
>
> Dotan
>
>
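Dotan's advice, combined with the zero-length check noted as missing in sdp_post_send(), can be sketched as follows. This is a hypothetical illustration in plain C, not the actual SDP code: `struct sge_entry`, `struct send_req` and `prepare_send()` are invented names, loosely modelled on the `sg_list`/`num_sge` fields of a verbs send work request.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one scatter/gather entry. */
struct sge_entry {
    uint64_t addr;
    uint32_t length;
    uint32_t lkey;
};

/* Hypothetical, simplified send request: only the gather list matters here. */
struct send_req {
    struct sge_entry *sg_list;
    int num_sge;
};

/* Build a send request from 'num' entries, dropping every zero-length entry
 * in place. If nothing is left, the request carries an empty gather list
 * (sg_list == NULL, num_sge == 0), which is how a 0-byte send is expressed;
 * no entry with length 0 is ever handed on. */
static void prepare_send(struct send_req *req, struct sge_entry *sge, int num)
{
    int kept = 0;

    for (int i = 0; i < num; i++) {
        if (sge[i].length == 0)
            continue;           /* would be read as 2 GB, so skip it */
        sge[kept++] = sge[i];   /* compact the list in place */
    }

    req->sg_list = kept ? sge : NULL;
    req->num_sge = kept;
}
```

Silently dropping zero-length entries is only one design choice; a stricter variant would treat such an entry as a bug and refuse to post, whereas the debug printk above merely logs it.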
This is just a debug message, and I only get a 0 sge->length in the
case of IA64-to-IA64 transfers.
When transferring from IA64 to x86_64 I don't have this problem.
As I said in my last message, the problem seems to be linked to the
bcopy buffer size.
Nicolas