[Scst-devel] [ofa-general] WinOF_2_0_5/SRP initiator: slow reads and eventually hangs

Chris Worley worleys at gmail.com
Sat Sep 19 10:29:56 PDT 2009


On Fri, Sep 18, 2009 at 3:33 PM, Chris Worley <worleys at gmail.com> wrote:
> On Fri, Sep 18, 2009 at 3:31 PM, Chris Worley <worleys at gmail.com> wrote:
>> On Mon, Sep 7, 2009 at 5:58 AM, Vladislav Bolkhovitin <vst at vlnb.net> wrote:
>>> Chris Worley, on 09/06/2009 05:41 PM wrote:
>>>>
>>>> On Sun, Sep 6, 2009 at 3:36 PM, Chris Worley<worleys at gmail.com> wrote:
>>>>>
>>>>> On Sun, Sep 6, 2009 at 3:17 PM, Bart Van Assche<bart.vanassche at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> On Fri, Sep 4, 2009 at 1:20 AM, Chris Worley <worleys at gmail.com> wrote:
>>>>>>>
>>>>>>> On Thu, Sep 3, 2009 at 11:38 AM, Chris Worley<worleys at gmail.com> wrote:
>>>>>>>>
>>>>>>>> I've used a couple of initiators (different systems) w/ different
>>>>>>>> OSes, w/ different IB cards (all QDR) and different IB stacks
>>>>>>>> (built-in vs. OFED) and can repeat the problem in all but the
>>>>>>>> RHEL5.2/OFED 1.4.1 target and initiator (but, if the initiator is
>>>>>>>> WinOF and the target is RHEL5.2/OFED1.4.1, then the problem does
>>>>>>>> repeat).
>>>>>>>
>>>>>>> Here's a twist: I used the Ubuntu initiator w/ one of the RHEL
>>>>>>> targets, and the RHEL initiator (same machine as was running WinOF
>>>>>>> from the beginning of this thread) w/ one of the Ubuntu targets: in
>>>>>>> both cases, the problem does not repeat.
>>>>>>>
>>>>>>> That makes it sound like OFED is the cure on either side of the
>>>>>>> connection, but does not explain the issue w/ WinOF (which does fail
>>>>>>> w/ either Ubuntu or RHEL targets).
>>>>>>
>>>>>> These results are strange. Regarding the Linux-only tests, I was
>>>>>> assuming failure of a single component (Ubuntu SRP initiator, OFED SRP
>>>>>> initiator, Ubuntu IB driver, OFED IB driver or SRP target), but for
>>>>>> each of these components there is at least one test that passes and at
>>>>>> least one test that fails. So either my assumption is wrong or one of
>>>>>> the above test results is not repeatable. Do you have the time to
>>>>>> repeat the Linux-only tests ?
>>>>>
>>>>> Last night I was rerunning the RHEL5.2 initiator w/ Ubuntu target, and
>>>>> the problem repeated; now, I can't repeat the case where it didn't
>>>>> fail.  Still, no errors, other than the eventual timeouts previously
>>>>> shown; the target thinks all is fine, the initiator is stuck.
>>>>
>>>> ... and I haven't had any success w/ Ubuntu target and initiator, 8.10 or
>>>> 9.04.
>>>
>>> 1. Try with kernel parameter maxcpus=1. It should make any races you
>>> have less likely to trigger, although it will not eliminate them.
>>
>> I finally got around to this test... 1 CPU works very well, w/o hangs
>> (will test all night to see if this holds true),

This has run through 1KB-8KB blocks for nearly 24 hours w/o error.
The single core case seems to work.
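
A minimal sketch of how such a block-size sweep could be scripted. This is
not the script used for the run above. The fio options are copied from the
command quoted further down in this message; the exact list of block sizes
(powers of two from 1k to 8k) and the helper names build_fio_cmd and
sweep_commands are assumptions made for illustration.

```python
import shlex

# Options copied as-is from the fio command quoted later in this message
# (everything except --bs, which is varied per run).
FIO_OPTIONS = [
    "--rw=randrw", "--rwmixread=100", "--numjobs=64", "--iodepth=64",
    "--sync=0", "--direct=1", "--randrepeat=0", "--ioengine=libaio",
    "--filename=/dev/sdb", "--filename=/dev/sdc", "--name=test",
    "--loops=10000", "--size=32183006002", "--runtime=600",
    "--group_reporting",
]

# Assumption: the thread only says "1KB-8KB"; powers of two are a guess.
BLOCK_SIZES = ["1k", "2k", "4k", "8k"]


def build_fio_cmd(bs):
    """Return the fio argument list for one block size (hypothetical helper)."""
    return ["fio", f"--bs={bs}"] + FIO_OPTIONS


def sweep_commands(block_sizes=BLOCK_SIZES):
    """Return one shell-quoted fio command line per block size."""
    return [shlex.join(build_fio_cmd(bs)) for bs in block_sizes]


if __name__ == "__main__":
    # Only print the commands; running them needs the SRP-attached devices.
    for line in sweep_commands():
        print(line)
```

Each printed line can be pasted into a shell on the initiator, or passed to
subprocess.run with a timeout so that a hung run is noticed rather than
blocking the sweep.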

Chris
> 2 or more don't.
>> This is dual-socket NHM, so I can't specify more than one processor
>> w/o getting more than one socket.
>
> I don't know if this is important, but 1KB block tests didn't have a
> problem w/ 2 or 4 maxcpus... they didn't hang until 2KB blocks:
>
> fio --rw=randrw --bs=2k --rwmixread=100 --numjobs=64 --iodepth=64
> --sync=0 --direct=1 --randrepeat=0 --ioengine=libaio
> --filename=/dev/sdb --filename=/dev/sdc --name=test --loops=10000
> --size=32183006002 --runtime=600 --group_reporting
>
> Chris
>>
>> Chris
>>>
>>> 2. Try with different hardware, including motherboard. You can have something
>>> like http://lkml.org/lkml/2007/7/31/558 (not exactly it, of course)
>>>
>>>> Chris
>>>>>
>>>>> Chris
>>>>>>
>>>>>> Bart.
>>>>>>
>>>
>>>
>>
>


