[Scst-devel] [ofa-general] WinOF_2_0_5/SRP initiator: slow reads and eventually hangs

Vladislav Bolkhovitin vst at vlnb.net
Thu Sep 3 04:32:21 PDT 2009


Chris Worley, on 09/03/2009 08:08 AM wrote:
> On Wed, Sep 2, 2009 at 2:58 PM, Chris Worley<worleys at gmail.com> wrote:
>> On Wed, Sep 2, 2009 at 2:00 PM, Bart Van Assche<bart.vanassche at gmail.com> wrote:
>>> On Wed, Sep 2, 2009 at 9:53 PM, Chris Worley<worleys at gmail.com> wrote:
>>>> On Wed, Sep 2, 2009 at 1:31 PM, Bart Van Assche<bart.vanassche at gmail.com> wrote:
>>>>> On Tue, Sep 1, 2009 at 1:04 AM, Chris Worley<worleys at gmail.com> wrote:
>>>>>> [ ... ]
>>>>>> I've found a good kernel/scst mix to easily repeat this; I can get it
>>>>>> to repeatedly hang w/ 8K block transfers running Ubuntu 9.04 w/ the
>>>>>> 2.6.27-14-server kernel on _both_ target and initiator (i.e. no WinOF
>>>>>> or OFED at all) and SCST rev 1062 on the target using one drive
>>>>>> (performance is >600MB/s, >80K IOPS, on the 8KB block sizes being
>>>>>> used).
>>>>>> [ ... ]
>>>>> Is there a special reason why you are using the 2.6.27-14-server
>>>>> kernel ? AFAIK the latest Ubuntu 9.04 kernel is 2.6.28-15-server.
>>>> No special reason other than it didn't get upgraded w/ the rest of the
>>>> distro... started w/ 8.10.
>> I'm upgrading too, to 9.04.
> 
> I tried the 2.6.28-15-server kernel (along w/ the 9.04 upgrade), and
> it does repeat the issue.
> 
> In trying to build a kernel w/ lockdep support as Vlad requested, my
> lack of Debian knowledge shone through, and, although I believe I
> followed all the instructions correctly, I'm not sure if I have a
> 2.6.28-15 or 2.6.28-10 kernel.  Anyway, the issue is still repeatable.
> 
> Whatever kernel that is, I have SRP hung currently.  What should I
> look for in /proc/lockd*?
> 
> I don't think it's a kernel lock... I think it's a protocol lock, as I
> can rmmod the target kernel modules (scst_vdisk, scst, and ib_srpt)
> when the initiator gets in this state.

Since you can rmmod the SCST modules, this shouldn't be an SCST or 
backstorage SW/HW issue: a successful rmmod means there are no stuck or 
lost SCSI commands. So the problem should be in either the SRP target or 
initiator, OFED on the target or initiator, or your IB hardware on any node.

You should enable lockdep on both the target and the initiator (preferably 
with the other kernel debug facilities enabled as well; see the attached 
file for a sample) and reproduce the issue. There is a good chance that 
those facilities will spot what is going wrong there.
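
To confirm that the kernel you actually booted has lock debugging built in, something like the following rough sketch could be run on each node. The option list is an assumption made for illustration (common mainline lock-debugging options); it is not taken from kern_dbg.diff, and the helper names parse_kconfig and missing_lock_debug are hypothetical. It also assumes the distro installs its build config as /boot/config-<release>, as Ubuntu does.

```python
import os
import re

# Assumed set of lock-debugging options, chosen for illustration only;
# the attached kern_dbg.diff may enable a different set.
LOCK_DEBUG_OPTIONS = (
    "CONFIG_PROVE_LOCKING",
    "CONFIG_DEBUG_LOCK_ALLOC",
    "CONFIG_DEBUG_SPINLOCK",
    "CONFIG_DEBUG_MUTEXES",
    "CONFIG_LOCK_STAT",
)


def parse_kconfig(text):
    """Return {option: value} for every 'CONFIG_X=value' line."""
    options = {}
    for line in text.splitlines():
        match = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        if match:
            options[match.group(1)] = match.group(2)
    return options


def missing_lock_debug(text, wanted=LOCK_DEBUG_OPTIONS):
    """List the wanted options that are not set to 'y' in the config text."""
    enabled = parse_kconfig(text)
    return [name for name in wanted if enabled.get(name) != "y"]


if __name__ == "__main__":
    release = os.uname().release          # same string as `uname -r`
    path = "/boot/config-" + release      # assumed location of the config
    print("running kernel:", release)
    if os.path.exists(path):
        with open(path) as config_file:
            missing = missing_lock_debug(config_file.read())
        print("missing lock debug options:", ", ".join(missing) or "none")
    else:
        print("no config found at", path)
```

The printed release string also settles which kernel build is really running. If lockdep is enabled and finds a problem, the report appears in the kernel log (dmesg), so that is the first place to look after reproducing the hang.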

Vlad

> Thanks,
> 
> Chris
>> Chris
>>>> Do you think that kernel is better?
>>> I noticed this while trying to reproduce this issue. I have no opinion
>>> yet about which of these two kernels is better. I'll downgrade the
>>> Ubuntu kernel in my setup.
>>>
>>> Bart.
>>>
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: kern_dbg.diff
Type: text/x-patch
Size: 4270 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20090903/9f842064/attachment.bin>

