[Scst-devel] [ofa-general] WinOF_2_0_5/SRP initiator: slow reads and eventually hangs

Wed Sep 2 11:01:54 PDT 2009

On Wed, Sep 2, 2009 at 11:27 AM, Chris Worley<worleys at gmail.com> wrote:
> On Wed, Sep 2, 2009 at 4:01 AM, Vladislav Bolkhovitin<vst at vlnb.net> wrote:
>> Chris Worley, on 09/01/2009 11:24 PM wrote:
>>>
>>> On Tue, Sep 1, 2009 at 1:02 PM, Vladislav Bolkhovitin<vst at vlnb.net> wrote:
>>>>
>>>> I'd suggest you to enable lockdep on the target. Google for more details
>>>> how
>>>> to do it.
>>>>
>>>> Also you should additional enable "mgmt_minor" SCST core trace level and
>>>> only it. Don't enable "all", its output useful only in very special
>>>> circumstances.
>>>
>>> Could you be more explicit in how to enable specific trace levels?
>>>
>>> For example, "all" causes the following:
>>>
>>> # cat /proc/scsi_tgt/vdisk/trace_level
>>> out_of_mem | minor | sg | mem | buff | entryexit | pid | line |
>>> function | debug | special | scsi | mgmt | mgmt_minor | mgmt_dbg |
>>> order
>>> # cat /proc/scsi_tgt/trace_level
>>> out_of_mem | minor | sg | mem | buff | entryexit | pid | line |
>>> function | debug | special | scsi | mgmt | mgmt_minor | mgmt_dbg |
>>> retry | scsi_serializing | recv_bot | send_bot | recv_top | send_top
>>>
>>> I tried echoing just some of those flags to cut down on excess
>>> verbosity, but would get errors like:
>>>
>>> # echo "minor | sg | mem" >/proc/scsi_tgt/vdisk/trace_level
>>> bash: echo: write error: Invalid argument
>>> # dmesg | tail -1l
>>> [330010.019198] scst: ***ERROR***: Unknown action "minor | sg | mem"
>>>
>>>> Usually to investigate a problem like yours, the default
>>>> flags in the debug build + "mgmt_minor" are sufficient.
>>>
>>> I tried "default" and didn't get any messages on the hang.
>>
>> See /proc/scsi_tgt/help for help about all SCST proc commands. (The latest
>> commits have some cleanups in this area.)
>
> I found three "trace tokens" that were causing the majority of the
> messages overflow:
>
> echo "all" >/proc/scsi_tgt/vdisk/trace_level
> (echo "del scsi" ;echo "del recv_bot";echo "del send_bot")
>>/proc/scsi_tgt/vdisk/trace_level
> echo "all" >/proc/scsi_tgt/trace_level
> (echo "del scsi" ;echo "del recv_bot";echo "del send_bot")
>>/proc/scsi_tgt/trace_level
>
> Then, the majority of messages were the pair:
>
> Sep  2 17:12:22 nameme kernel: [408676.552666] [0]: scst:
> __scst_init_cmd:3361:Too many pending commands (50) in session,
> returning BUSY to initiator "0x0002c903000260470002c90300026047"
> Sep  2 17:12:22 nameme kernel: [408676.552670] [0]: scst:
> scst_set_busy:366:Sending QUEUE_FULL status to initiator
> 0x0002c903000260470002c90300026047 (cmds count 79, queue_type 1,
> sess->init_phase 3)
>
> # grep -e "returning BUSY to initiator" /var/log/messages | wc -l
> 221834
> # grep -e "Sending QUEUE_FULL status to initiator" /var/log/messages | wc -l
> 167948

I got rid of those two messages, along with: "Xmitting data for cmd"
from scst_targ.c, and I couldn't enable the "scsi" trace token, but
could re-enable the "recv_bot", and the logs were filled with:

[410785.218625] [0]: scst_cmd_init_done:216:Recieving CDB:
[410785.218625]  (h)___0__1__2__3__4__5__6__7__8__9__A__B__C__D__E__F
[410785.218628]    0: 28 00 00 b8 33 90 00 00 10 00 00 00 00 00 00 00
 (...3...........

>
> Maybe the initiator has stopped due to its "busy" tracking... deciding
> the target cannot continue?

Note that once this occurs, I don't need to restart the target, I just
need to add the target drives on the initiator again... adding more
entries to /proc/partitions... I can't use the partitions that are
hung on the initiator, but I can use the newly added partitions.

This makes me think this is an initiator issue.  Is there any
debugging to enable for ib_srp?

But the contradictory results show the problem seems to follow the
target used, and should therefor be a target issue (i.e. the 8KB
problem w/ the Ubuntu target happens w/ Ubuntu or WinOF initiator, and
the <2KB problem w/ the OFED/RHEL stack works fine w/ an RHEL
initiator, but not WinOF).

Thanks,

Chris
>
> Is the queue settable to larger than 50 outstanding commands?
>
> Thanks,
>
> Chris
>>
>>> Thanks,
>>>
>>> Chris
>>>>
>>>> Vlad
>>>>
>>>> Chris Worley, on 09/01/2009 03:04 AM wrote:
>>>>>
>>>>> On Wed, Aug 12, 2009 at 12:15 AM, Bart Van
>>>>> Assche<bart.vanassche at gmail.com> wrote:
>>>>>>
>>>>>> On Tue, Aug 11, 2009 at 11:52 PM, Chris Worley<worleys at gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> I setup my target exactly as you prescribe... but my initiator is
>>>>>>> still Windows (version of WInOF at top): performance as relayed by
>>>>>>> IOMeter starts high and the average slowly decreases.  Watching the
>>>>>>> instantaneous throughput, there seem to be longer and longer lags of
>>>>>>> poor performance. between moments of good performance.  I need to run
>>>>>>> this against a Linux initiator to see if the problems are w/ WinOF.
>>>>>>>
>>>>>>> Using OFED 1.4.1 (w/ the stock RHEL kernel) on the target, the
>>>>>>> performance was steady and getting close to acceptable.  In a 15 hour
>>>>>>> test that cycles through sequential and random LBA's and R/W mixes
>>>>>>> from block sizes from 1MB to 512B, it worked well and got decent
>>>>>>> performance until it hit 1KB sequential reads which hung IOMeter; no
>>>>>>> messages on the Linux side (all looked okay).  IBSRP on the Windows
>>>>>>> side just said "a reset to device was issued" every 15 to 30 seconds
>>>>>>> after the problem started. I reloaded the IB stack on the Linux side,
>>>>>>> and was able to get it restarted.
>>>>>>>
>>>>>>> Still a lot of combinations to test.
>>>>>>
>>>>>> Which trace settings are you using on the target ? Enabling the proper
>>>>>> trace settings via /proc/scsi_tgt/trace_level might reveal whether you
>>>>>> are e.g. hitting the QUEUE_FULL condition. See also scst/README.
>>>>>
>>>>> I've found a good kernel/scst mix to easily repeat this; I can get it
>>>>> to repeatedly hang w/ 8K block transfers running Ubuntu 9.04 w/ the
>>>>> 2.6.27-14-server kernel on _both_ target and initiator (i.e. no WinOF
>>>>> or OFED at all) and SCST rev 1062 on the target using one drive
>>>>> (performance is >600MB/s, >80K IOPS, on the 8KB block sizes being
>>>>> used).
>>>>>
>>>>> Although the problem doesn't occur in Windows until blocks are <2KB
>>>>> and the RHEL5.2/OFED configuration does not repeat the issue using a
>>>>> Linux initiator, it seems like a very similar hang, so I'm hoping it's
>>>>> the same issue.
>>>>>
>>>>> To repeat the issue, I run 8KB block random reads w/ 64 threads,
>>>>> running AIO calls w/ a depth of 64 (using "fio" on the initiator):
>>>>>
>>>>> # fio --rw=randrw --bs=8k --rwmixread=100 --numjobs=64 --iodepth=64
>>>>> --sync=0 --direct=1 --randrepeat=0 --ioengine=libaio
>>>>> --filename=/dev/sdn --name=test --loops=10000 --size=16091503001
>>>>>
>>>>> The "size" represents 10% of the drive.  It doesn't seem to ever
>>>>> happen on writes, but I've seen it happen on mixed reads/writes.
>>>>>
>>>>> With tracing set to "default", there was still nothing in the target
>>>>> logs at the time of the hang.
>>>>>
>>>>> With tracing set thusly on the target:
>>>>>
>>>>> echo "all" >/proc/scsi_tgt/trace_level
>>>>> echo "all" >/proc/scsi_tgt/vdisk/trace_level
>>>>>
>>>>> The last few lines of dmesg look like:
>>>>>
>>>>> [255354.313411]    0: 28 00 01 84 54 90 00 00 10 00 00 00 00 00 00 00
>>>>>  (...T...........
>>>>> [255354.313420] [0]: scst: scst_cmd_init_done:214:tag=62, lun=0, CDB
>>>>> len=16, queue_type=1 (cmd ffff880102b4a568)
>>>>> [255354.313443] [26358]: scst: scst_pre_parse:417:op_name <READ(10)>
>>>>> (cmd ffff880102b4a3a0), direction=2 (expected 2, set yes),
>>>>> transfer_len=16 (expected len 8192), flags=1
>>>>> [255354.313420] [0]: scst_cmd_init_done:216:Recieving CDB:
>>>>> [255354.313452] [8602]: scst: scst_xmit_response:3004:Xmitting data
>>>>> for cmd ffff880102b49e48 (sg_cnt 0, sg ffff880132579f60, sg[0].page
>>>>> ffffe200042b7180)
>>>>> [255354.313457] [8604]: scst: scst_xmit_response:3004:Xmitting data
>>>>> for cmd ffff880102b4a010 (sg_cnt 0, sg ffff8802e9806f60, sg[0].page
>>>>> ffffe2000bc129c0)
>>>>> [255354.313426]  (h)___0__1__2__3__4__5__6__7__8__9__A__B__C__D__E__F
>>>>> [255354.313426]    0: 28 00 01 bc 5d 10 00 00 10 00 00 00 00 00 00 00
>>>>>  (...]...........
>>>>> [255354.313468] [26358]: scst: scst_pre_parse:417:op_name <READ(10)>
>>>>> (cmd ffff880102b4a568), direction=2 (expected 2, set yes),
>>>>> transfer_len=16 (expected len 8192), flags=1
>>>>> [255354.313484] [8602]: scst: scst_xmit_response:3004:Xmitting data
>>>>> for cmd ffff880102b4a1d8 (sg_cnt 0, sg ffff8802e98064c0, sg[0].page
>>>>> ffffe2000bc633c0)
>>>>> [255354.313551] [8604]: scst: scst_xmit_response:3004:Xmitting data
>>>>> for cmd ffff880102b4a3a0 (sg_cnt 0, sg ffff88018a877060, sg[0].page
>>>>> ffffe20004300200)
>>>>> [255354.313556] [8602]: scst: scst_xmit_response:3004:Xmitting data
>>>>> for cmd ffff880102b4a568 (sg_cnt 0, sg ffff880142581100, sg[0].page
>>>>> ffffe20004066d40)
>>>>>
>>>>> ... and there's a section like:
>>>>>
>>>>> [255354.310177]    0: 28 00 01 25 df 50 00 00 10 00 00 00 00 00 00 00
>>>>>  (..%.P..........
>>>>> [255354.310177] [0]: scst: scst_cmd_init_done:214:tag=57, lun=0, CDB
>>>>> len=16, queue_type=1 (cmd ffff8801642e2730)
>>>>> [255354.310177] [0]: scst_cmd_init_done:216:Recieving CDB:
>>>>> [255354.310177]  (h)___0__1__2__3__4__5__6__7__8__9__A__B__C__D__E__F
>>>>> [255354.310177]    0: 28 00 01 5e 22 c0 00 00 10 00 00 00 00 00 00 00
>>>>>  (..^"...........
>>>>> [255354.310966] [26369]: scst: scst_pre_parse:417:op_name <READ(10)>
>>>>> (cmd ffff880168a9e3a0), direction=2 (expected 2, set yes),
>>>>> transfer_len=16 (expected len 8192), flags=1
>>>>> [255354.310973] [26361]: scst: scst_pre_parse:417:op_name <READ(10)>
>>>>> (cmd ffff880168a9e010), direction=2 (expected 2, set yes),
>>>>> transfer_len=16 (expected len 8192), flags=1
>>>>> [255354.310980] [26365]: scst: scst_pre_parse:417:op_name <READ(10)>
>>>>> (cmd ffff880168a9e1d8), direction=2 (expected 2, set yes),
>>>>> transfer_len=16 (expected len 8192), flags=1
>>>>> [255354.310986] [26359]: scst: scst_pre_parse:417:op_name <READ(10)>
>>>>> (cmd ffff880168a9de48), direction=2 (expected 2, set yes),
>>>>> transfer_len=16 (expected len 8192), flags=1
>>>>> ...
>>>>> [255354.311221] [8604]: scst: scst_xmit_response:3004:Xmitting data
>>>>> for cmd ffff880168a9e1d8 (sg_cnt 0, sg ffff880173ca8060, sg[0].page
>>>>> ffffe20004325d00)
>>>>> [255354.311226] [8602]: scst: scst_xmit_response:3004:Xmitting data
>>>>> for cmd ffff880168a9ee50 (sg_cnt 0, sg ffff880173ca8c40, sg[0].page
>>>>> ffffe20005847ec0)
>>>>> [255354.311233] [8604]: scst: scst_xmit_response:3004:Xmitting data
>>>>> for cmd ffff880168a9dc80 (sg_cnt 0, sg ffff8802f0143c40, sg[0].page
>>>>> ffffe2000bc04880)
>>>>> [255354.311238] [8602]: scst: scst_xmit_response:3004:Xmitting data
>>>>> for cmd ffff880168a9e568 (sg_cnt 0, sg ffff8802f08361a0, sg[0].page
>>>>> ffffe2000bbf2400)
>>>>> [255354.311242] [8604]: scst: scst_xmit_response:3004:Xmitting data
>>>>> for cmd ffff880168a9d560 (sg_cnt 0, sg ffff88010acd74c0, sg[0].page
>>>>> ffffe200047e7280)
>>>>>
>>>>> ... but, prior to that, messages are unreadably garbled, as in:
>>>>>
>>>>> Aug 31 22:37:00 nameme kernel: t]9l ft48 r(09 ,83_5p  s20 sg:303
>>>>> _00s3]c_=cs  _00ad0000e_003a6_0031_4(ea5 9arg )_2As_05s_8[7:c8[f3 _178
>>>>> 087gff0 .R nt]9i0tmpd1:ft st06s68 5i9[301602_106)o6 _001e4 0<s0 3>)0
>>>>> .3E3_28a9102 pft0>e_o[.eo[<_2n05 98_0f8_i xpe1f0 D<98s np8one:21_0
>>>>> 30f3006=e_ ax R8gs=h62]= 2.pd_ pad555mlf
>>>>> 1_]f8=.05lf i7gxs_ac3 m_0c0:]5i3087[_ 5e sg,00[dc3e,_ 0[ ( 1<[t]F]
>>>>> ..eb 4t_ ah1,_1_]10.h45_]2,5__12C5o 37 d_.)b_g4f850s, t1e c80.ite.8pE
>>>>> ue2.4f[.ft0 5c5_1effft 5530 f len=16, 5v03,em_cs4e 05fc78.5r5. n
>>>>> ,45ft45ff<if_:4fnd5c<ts54c078f9]_0c0a0efee04f[,1n 0 __5deff588=f82
>>>>> .t)m9.8)9.8077=s  _C 3 i8 .tlsf5_[0s0 (2u fu 4
>>>>> 5fco5fnr.n0a05_34f__4fd_4n Bs60fn4pB.tor7=s
>>>>> _i8s7=0_.tl:c>l3e0.51_654.30350en.m C30 C3 e f.dtm0=2_1e0n]6qe  d.>_
>>>>> 76 d=f _esr_tp 9_50.tnf50[cs.,
>>>>> Aug 31 22:37:00 nameme kernel: e .0 5 B , 45 0<s382 3_
>>>>> Aug 31 22:37:00 nameme kernel:  c2< s0< cm38cf58.[f10 002< c3De
>>>>> _)088m8 9c5299pected__F
>>>>> Aug 31 22:37:00 nameme kernel: tran50 pt48)=8]=s59etl5pe4e6d)0c6
>>>>> ei_2(e_<3cc_ ea51es_0_sras A >cmdtesafe4 3[m 3.rer7:[ 1b00s5
>>>>> Aug 31 22:37:00 nameme kernel: ] 2a015ffs.35fff  B__ a
>>>>> 6cmd9spre3se9_2e3806(3_csA_  1 ns38ge0sre0
>>>>> Aug 31 22:37:00 nameme kernel: <g data  sf9_ _ 6d  0se5245f_26._2
>>>>> .,76.9<g fe t_]t6:(E...:s5D.s0_<Rte46>0330B005]08s3 __ r40r._5x,<Re08
>>>>> :2ec_ :06cs1_0ti1d l:253064enfe7]0 abd5 0f>196.t b 7.(008ni]
>>>>> 0s09.r650t, <24]__ s1=in03 s0p c2>>[4ein.1:ooD..ps210a>[25534_r6,:t
>>>>> n4.]4(8 e2 .r c 2n1g9360]10>(  00 00 00 00[fd[2
>>>>> [2g_re53  le_6c_md8t_ftc883tf03c  m_0 :8r8fmd63m3:0] 25
>>>>> c6>[2n_e:fa2e84_0
>>>>> Aug 31 22:37:00 nameme kernel: c,
>>>>> Aug 31 22:37:00 nameme kernel: .=0>5f=1s5=1d6_(de:d
>>>>> 2l_25:0edg25fm>ff40 l440 e,AFg l)AF0 0o[1088. 1aggB
>>>>> 0n=d9(16a.5oeX6csf00s0: ._, (=10es_(1 7 5c___oR5st_42p3d 7
>>>>> C9d=5_:(3__7mD4_ 0m4_ed
>>>>> 04,5.,[s55.d4c,,25=,c8__q,[(meet9303_mr0ue9m0u_032__fy2se
>>>>> Aug 31 22:37:00 nameme kernel: >  y>i
>>>>>
>>>>> ... so other suggestions on trace settings would be appreciated.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Chris
>>>>>>
>>>>>> Bart.
>>
>>
>