[Scst-devel] [ofa-general] WinOF_2_0_5/SRP initiator: slow reads and eventually hangs
Chris Worley
worleys at gmail.com
Tue Sep 15 13:51:17 PDT 2009
On Tue, Sep 15, 2009 at 11:10 AM, Vladislav Bolkhovitin <vst at vlnb.net> wrote:
> Chris Worley, on 09/15/2009 09:01 PM wrote:
>>
>> On Tue, Sep 15, 2009 at 10:57 AM, Vladislav Bolkhovitin <vst at vlnb.net>
>> wrote:
>>>
>>> Chris Worley, on 09/15/2009 08:53 PM wrote:
>>>>
>>>> On Tue, Sep 15, 2009 at 10:43 AM, Vladislav Bolkhovitin <vst at vlnb.net>
>>>> wrote:
>>>>>
>>>>> Chris Worley, on 09/15/2009 07:50 PM wrote:
>>>>>>
>>>>>> On Tue, Sep 15, 2009 at 12:10 AM, Bart Van Assche
>>>>>> <bart.vanassche at gmail.com> wrote:
>>>>>>>
>>>>>>> On Tue, Sep 15, 2009 at 1:03 AM, Chris Worley <worleys at gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On Mon, Sep 14, 2009 at 12:51 PM, Vladislav Bolkhovitin
>>>>>>>> <vst at vlnb.net>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Chris Worley, on 09/11/2009 11:50 PM wrote:
>>>>>>>>>>
>>>>>>>>>> I've definitely removed the switch/firmware from being the cause.
>>>>>>>>>>
>>>>>>>>>> I'm thinking the reason you can't repeat the test may be latency
>>>>>>>>>> related. We get ~50usecs average latency (on small block sizes),
>>>>>>>>>> which can't be achieved using regular SSD's (and rotating drives
>>>>>>>>>> are
>>>>>>>>>> nowhere close). Maybe a ramdisk would help repeat the issue.
>>>>>>>>>
>>>>>>>>> I think you should try to reproduce the problem with ramdisk or
>>>>>>>>> nullio.
>>>>>>>>> By
>>>>>>>>> so you will eliminate possible influence of the SSD backend.
>>>>>>>>
>>>>>>>> W/ 12GB RAM in the target, I created a 7GB ramdisk:
>>>>>>>>
>>>>>>>> mount -t ramfs -o size=7g ramfs /mnt/
>>>>>>>> dd if=/dev/zero of=/mnt/foo bs=1024k count=7000
>>>>>>>> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
>>>>>>>> echo "add ramdisk 2" >/proc/scsi_tgt/groups/Default/devices
>>>>>>>>
>>>>>>>> Then, on the initiator, I tested it... and it hung during sequential
>>>>>>>> 8KB block reads:
>>>>>>>>
>>>>>>>> fio --rw=read --bs=8k --numjobs=64 --iodepth=64 --sync=0 --direct=1
>>>>>>>> --randrepeat=0 \
>>>>>>>> --group_reporting --ioengine=libaio --filename=/dev/sde --name=test
>>>>>>>> --loops=10000 --runtime=600
>>>>>>>>
>>>>>>>> Note that I was running the SM on the target this time too.
>>>>>>>
>>>>>>> Which Linux distro was installed on the inititiator and on the target
>>>>>>> ? And if applicable, which OFED version ? Which kernel messages were
>>>>>>> logged by SRPT around the time the issue occurred (after having
>>>>>>> enabled SRPT logging first) ?
>>>>>>
>>>>>> As logging hadn't helped this issue previously, I've not been enabling
>>>>>> it. That plus the kernel hacks needed to invoke logging, it's not
>>>>>> worth enabling.
>>>>>>
>>>>>> This was with Ubuntu 8.10, built-in IB on the 2.6.27-14-server kernel.
>>>>>>
>>>>>> I couldn't get ramdisks working w/ SCST in RHEL5.2. When running:
>>>>>>
>>>>>> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
>>>>>>
>>>>>> I get the error:
>>>>>>
>>>>>> dev_vdisk: ***ERROR***: Wrong f_op or FS doesn't have required
>>>>>> capabilities
>>>>>>
>>>>>> ... which doesn't occur in the Ubuntu kernel, so I've been unable to
>>>>>> test RHEL kernels w/ ramdisks. In general, this problem occurs w/ 8KB
>>>>>> and smaller blocks w/ the Ubuntu kernels, and 2KB and smaller blocks
>>>>>> w/ RHEL kernels.
>>>>>
>>>>> Use ramfs instead.
>>>>
>>>> Do you mean:
>>>>
>>>> mount -t ramfs -o size=7g ramfs /mnt/
>>>
>>> You should then create a file on it and use it.
>>
>> That's what I'm doing, I believe. From above:
>>
>>>>>>>> mount -t ramfs -o size=7g ramfs /mnt/
>>>>>>>> dd if=/dev/zero of=/mnt/foo bs=1024k count=7000
>>>>>>>> echo "open ramdisk /mnt/foo" > /proc/scsi_tgt/vdisk/vdisk
>>>>>>>> echo "add ramdisk 2" >/proc/scsi_tgt/groups/Default/devices
>>
>> ... but the "open", on RHEL5.2 kernel 2.6.18-92.el5, generates the
>> following kernel messages:
>>
>> dev_vdisk: Registering virtual FILEIO device ramdisk
>> scst: Processing thread started, PID 9629
>> scst: Processing thread started, PID 9630
>> scst: Processing thread started, PID 9631
>> scst: Processing thread started, PID 9632
>> scst: Processing thread started, PID 9633
>> dev_vdisk: ***ERROR***: Wrong f_op or FS doesn't have required
>> capabilities
>> scst: ***ERROR***: New device handler's vdisk attach() failed: -22
>> scst: Processing thread PID 9629 finished
>> scst: Processing thread PID 9630 finished
>> scst: Processing thread PID 9631 finished
>> scst: Processing thread PID 9632 finished
>> scst: Processing thread PID 9633 finished
>> scst: Failed to attach to virtual device ramdisk
>>
>> Chris
>>>>
>>>> ?
>>>>
>>>> That's what I'm doing.
>
> That's strange. I'm doing it all the time, although with not so old kernels
> as 2.6.18.
In lots of testing today, I've seen this panic twice on the Ubuntu 8.10 targets:
[ 330.155992] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[ 357.207046] ib_srpt: srpt_xmit_response: tag= 17 channel in bad state 2
[ 357.207052] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[ 357.207100] ib_srpt: srpt_xmit_response: tag= 47 channel in bad state 2
[ 357.207104] scst: ***ERROR***: Target driver ib_srpt
xmit_response() returned fatal error
[ 357.241429] scst: ***ERROR***: Target driver ib_srpt
xmit_response() returned fatal error
[ 357.250234] ------------[ cut here ]------------
[ 357.250537] ib_srpt: srpt_xmit_response: tag= 26 channel in bad state 2
[ 357.250539] scst: ***ERROR***: Target driver ib_srpt
xmit_response() returned fatal error
[ 357.250550] ib_srpt: srpt_xmit_response: tag= 38 channel in bad state 2
[ 357.250553] scst: ***ERROR***: Target driver ib_srpt
xmit_response() returned fatal error
[ 357.250560] ib_srpt: srpt_xmit_response: tag= 27 channel in bad state 2
<repeated many times>
[ 357.301253] kernel BUG at /root/scst/scst/src/scst_targ.c:3089!
[ 357.301253] invalid opcode: 0000 [1] SMP
[ 357.301253] CPU 0
...
[ 357.301253] RIP: 0010:[<ffffffffa04759f6>] [<ffffffffa04759f6>]
scst_tgt_cmd_done+0x26/0x30 [scst]
[ 357.301253] RSP: 0018:ffff88039ad27b50 EFLAGS: 00010297
[ 357.301253] RAX: 0000000000000200 RBX: ffff8803ad9c68f8 RCX: 0000000000000000
[ 357.301253] RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffff8803ad9c68f8
[ 357.301253] RBP: ffff88039ad27b50 R08: 0000000000000000 R09: 0000000000000000
[ 357.301253] R10: ffff88039ad277c0 R11: ffff88041ad278cf R12: ffff8803c2972180
[ 357.301253] R13: ffff88039ada0000 R14: 0000000000000001 R15: ffff8803fb00c2b0
[ 357.301253] FS: 0000000000000000(0000) GS:ffffffff807dd000(0000)
knlGS:0000000000000000
[ 357.301253] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 357.301253] CR2: 00007f9281e64000 CR3: 0000000000201000 CR4: 00000000000006e0
[ 357.301253] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 357.301253] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 357.301253] Process ib_cm/0 (pid: 8299, threadinfo
ffff88039ad26000, task ffff88039ad40000)
[ 357.301253] Stack: ffff88039ad27b80 ffffffffa04c0c47
ffff88039a8db900 ffff8803c2972180
[ 357.301253] ffff8803fb00c240 ffff8803fb00c284 ffff88039ad27bc0
ffffffffa04c0d93
[ 357.301253] ffff88042a4959c0 ffff88042a9d7800 ffff88042544da00
ffff88042a9d7898
[ 357.301253] Call Trace:
[ 357.301253] [<ffffffffa04c0c47>] srpt_abort_scst_cmd+0xd7/0x160 [ib_srpt]
[ 357.301253] [<ffffffffa04c0d93>] srpt_release_channel+0xc3/0x190 [ib_srpt]
[ 357.301253] [<ffffffffa04c0e82>]
srpt_find_and_release_channel+0x22/0x30 [ib_srpt]
[ 357.301253] [<ffffffffa04c227d>] srpt_cm_handler+0x6d/0xbb8 [ib_srpt]
[ 357.301253] [<ffffffff80247526>] ? try_to_wake_up+0x126/0x2f0
[ 357.301253] [<ffffffff802476fd>] ? default_wake_function+0xd/0x10
[ 357.301253] [<ffffffff80267106>] ? autoremove_wake_function+0x16/0x40
[ 357.301253] [<ffffffff8023c99a>] ? __wake_up_common+0x5a/0x90
[ 357.301253] [<ffffffff8023dece>] ? __wake_up+0x4e/0x70
[ 357.301253] [<ffffffff80263271>] ? __queue_work+0x41/0x50
[ 357.301253] [<ffffffff8026331d>] ? queue_work_on+0x4d/0x60
[ 357.301253] [<ffffffff8026344f>] ? queue_work+0x1f/0x30
[ 357.301253] [<ffffffff8026350d>] ? queue_delayed_work+0x2d/0x40
[ 357.301253] [<ffffffffa0373255>] ? wait_for_response+0xd5/0xe0 [ib_mad]
[ 357.301253] [<ffffffffa04b4e07>] cm_process_work+0x27/0x130 [ib_cm]
[ 357.301253] [<ffffffffa04b5cf1>] cm_drep_handler+0xf1/0x180 [ib_cm]
[ 357.301253] [<ffffffffa04b7790>] ? cm_work_handler+0x0/0x1b8 [ib_cm]
[ 357.301253] [<ffffffffa04b7895>] cm_work_handler+0x105/0x1b8 [ib_cm]
[ 357.301253] [<ffffffffa04b7790>] ? cm_work_handler+0x0/0x1b8 [ib_cm]
[ 357.301253] [<ffffffff802628c2>] run_workqueue+0xc2/0x1a0
[ 357.301253] [<ffffffff80262bcf>] worker_thread+0xaf/0x130
[ 357.301253] [<ffffffff802670f0>] ? autoremove_wake_function+0x0/0x40
[ 357.301253] [<ffffffff80262b20>] ? worker_thread+0x0/0x130
[ 357.301253] [<ffffffff80266cbe>] kthread+0x4e/0x90
[ 357.301253] [<ffffffff80213c99>] child_rip+0xa/0x11
[ 357.301253] [<ffffffff80266c70>] ? kthread+0x0/0x90
[ 357.301253] [<ffffffff80213c8f>] ? child_rip+0x0/0x11
[ 357.301253]
[ 357.301253]
[ 357.301253] Code: 00 00 00 00 00 55 48 89 e5 e8 a7 cc d9 df 83 7f
28 78 75 17 80 67 2d f7 c7 47 28 0d 00 00 00 ba 01 00 00 00 e8 8c fc
ff ff c9 c3 <0f> 0b eb fe 66 0f 1f 44 00 00 55 48 89 e5 41 54 53 e8 74
cc d9
[ 357.301253] RIP [<ffffffffa04759f6>] scst_tgt_cmd_done+0x26/0x30 [scst]
[ 357.301253] RSP <ffff88039ad27b50>
[ 358.302745] ---[ end trace a7f20725e9471e16 ]---
[ 384.258076] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[ 384.276974] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[ 384.290055] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[ 411.329136] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[ 411.348297] ib_srpt: srpt_xmit_response: tag= 24 channel in bad state 2
[ 411.348301] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[ 411.348323] ib_srpt: srpt_xmit_response: tag= 20 channel in bad state 2
[ 411.348326] scst: ***ERROR***: Target driver ib_srpt
xmit_response() returned fatal error
[ 411.348331] ib_srpt: srpt_xmit_response: tag= 42 channel in bad state 2
<many repeats>
[ 411.349705] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[ 411.621319] scst: ***ERROR***: Target driver ib_srpt
xmit_response() returned fatal error
[ 411.629805] ib_srpt: srpt_xmit_response: tag= 38 channel in bad state 2
[ 411.636690] scst: ***ERROR***: Target driver ib_srpt
xmit_response() returned fatal error
[ 411.636698] ------------[ cut here ]------------
[ 411.636699] WARNING: at /root/scst/srpt/src/ib_srpt.c:924
srpt_abort_scst_cmd+0xac/0x160 [ib_srpt]()
...
[ 411.636735]
[ 411.636736] Call Trace:
[ 411.636742] [<ffffffff8024eb14>] warn_on_slowpath+0x64/0x90
[ 411.636747] [<ffffffff802b3aa0>] ? free_hot_page+0x10/0x20
[ 411.636750] [<ffffffff802e2430>] ? __slab_free+0x10/0x120
[ 411.636753] [<ffffffff80234089>] ? __phys_addr+0x9/0x50
[ 411.636757] [<ffffffff803b3e1c>] ? swiotlb_unmap_sg_attrs+0x9c/0xc0
[ 411.636759] [<ffffffffa04c0c1c>] srpt_abort_scst_cmd+0xac/0x160 [ib_srpt]
[ 411.636762] [<ffffffffa04c0d93>] srpt_release_channel+0xc3/0x190 [ib_srpt]
[ 411.636764] [<ffffffffa04c0e82>]
srpt_find_and_release_channel+0x22/0x30 [ib_srpt]
[ 411.636766] [<ffffffffa04c227d>] srpt_cm_handler+0x6d/0xbb8 [ib_srpt]
[ 411.636769] [<ffffffff80247526>] ? try_to_wake_up+0x126/0x2f0
[ 411.636771] [<ffffffff802476fd>] ? default_wake_function+0xd/0x10
[ 411.636774] [<ffffffff80267106>] ? autoremove_wake_function+0x16/0x40
[ 411.636776] [<ffffffff8023c99a>] ? __wake_up_common+0x5a/0x90
[ 411.636778] [<ffffffff8023dece>] ? __wake_up+0x4e/0x70
[ 411.636781] [<ffffffff80263271>] ? __queue_work+0x41/0x50
[ 411.636783] [<ffffffff8026331d>] ? queue_work_on+0x4d/0x60
[ 411.636784] [<ffffffff8026344f>] ? queue_work+0x1f/0x30
[ 411.636785] [<ffffffff8026350d>] ? queue_delayed_work+0x2d/0x40
[ 411.636790] [<ffffffffa0373255>] ? wait_for_response+0xd5/0xe0 [ib_mad]
[ 411.636794] [<ffffffffa04b4e07>] cm_process_work+0x27/0x130 [ib_cm]
[ 411.636797] [<ffffffffa04b5cf1>] cm_drep_handler+0xf1/0x180 [ib_cm]
[ 411.636799] [<ffffffffa04b7790>] ? cm_work_handler+0x0/0x1b8 [ib_cm]
[ 411.636802] [<ffffffffa04b7895>] cm_work_handler+0x105/0x1b8 [ib_cm]
[ 411.636804] [<ffffffffa04b7790>] ? cm_work_handler+0x0/0x1b8 [ib_cm]
[ 411.636806] [<ffffffff802628c2>] run_workqueue+0xc2/0x1a0
[ 411.636808] [<ffffffff80262bcf>] worker_thread+0xaf/0x130
[ 411.636809] [<ffffffff802670f0>] ? autoremove_wake_function+0x0/0x40
[ 411.636811] [<ffffffff80262b20>] ? worker_thread+0x0/0x130
[ 411.636812] [<ffffffff80266cbe>] kthread+0x4e/0x90
[ 411.636815] [<ffffffff80213c99>] child_rip+0xa/0x11
[ 411.636816] [<ffffffff80266c70>] ? kthread+0x0/0x90
[ 411.636817] [<ffffffff80213c8f>] ? child_rip+0x0/0x11
[ 411.636818]
[ 411.636819] ---[ end trace a7f20725e9471e16 ]---
[ 411.636838] ------------[ cut here ]------------
[ 411.636839] kernel BUG at /root/scst/scst/src/scst_targ.c:3089!
[ 411.636840] invalid opcode: 0000 [2] SMP
[ 411.636841] CPU 3
...
[ 411.636863] RIP: 0010:[<ffffffffa04759f6>] [<ffffffffa04759f6>]
scst_tgt_cmd_done+0x26/0x30 [scst]
[ 411.636874] RSP: 0018:ffff88039ad4fb50 EFLAGS: 00010297
[ 411.636875] RAX: 0000000000000200 RBX: ffff880071d34558 RCX: 00000000000213e2
[ 411.636876] RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffff880071d34558
[ 411.636877] RBP: ffff88039ad4fb50 R08: 0000000000000006 R09: 0000000000000000
[ 411.636878] R10: ffff88039ad4f7c0 R11: ffff88041ad4f8cf R12: ffff88039af49240
[ 411.636879] R13: ffff88039ada0000 R14: 0000000000000001 R15: ffff8803ad8b97f0
[ 411.636881] FS: 0000000000000000(0000) GS:ffff88042e4f7080(0000)
knlGS:0000000000000000
[ 411.636882] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 411.636883] CR2: 00007f9281e64000 CR3: 0000000000201000 CR4: 00000000000006e0
[ 411.636884] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 411.636885] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 411.636887] Process ib_cm/3 (pid: 8302, threadinfo
ffff88039ad4e000, task ffff88039ad44350)
[ 411.636888] Stack: ffff88039ad4fb80 ffffffffa04c0c47
ffff88039a995a80 ffff88039af49240
[ 411.636890] ffff8803ad8b9780 ffff8803ad8b97c4 ffff88039ad4fbc0
ffffffffa04c0d93
[ 411.636891] ffff88042a4959c0 ffff88042896f300 ffff88041801d800
ffff88042896f398
[ 411.636893] Call Trace:
[ 411.636895] [<ffffffffa04c0c47>] srpt_abort_scst_cmd+0xd7/0x160 [ib_srpt]
[ 411.636897] [<ffffffffa04c0d93>] srpt_release_channel+0xc3/0x190 [ib_srpt]
[ 411.636899] [<ffffffffa04c0e82>]
srpt_find_and_release_channel+0x22/0x30 [ib_srpt]
[ 411.636901] [<ffffffffa04c227d>] srpt_cm_handler+0x6d/0xbb8 [ib_srpt]
[ 411.636903] [<ffffffff80247526>] ? try_to_wake_up+0x126/0x2f0
[ 411.636905] [<ffffffff802476fd>] ? default_wake_function+0xd/0x10
[ 411.636906] [<ffffffff80267106>] ? autoremove_wake_function+0x16/0x40
[ 411.636908] [<ffffffff8023c99a>] ? __wake_up_common+0x5a/0x90
[ 411.636909] [<ffffffff8023dece>] ? __wake_up+0x4e/0x70
[ 411.636911] [<ffffffff80263271>] ? __queue_work+0x41/0x50
[ 411.636912] [<ffffffff8026331d>] ? queue_work_on+0x4d/0x60
[ 411.636914] [<ffffffff8026344f>] ? queue_work+0x1f/0x30
[ 411.636915] [<ffffffff8026350d>] ? queue_delayed_work+0x2d/0x40
[ 411.636918] [<ffffffffa0373255>] ? wait_for_response+0xd5/0xe0 [ib_mad]
[ 411.636921] [<ffffffffa04b4e07>] cm_process_work+0x27/0x130 [ib_cm]
[ 411.636923] [<ffffffffa04b5cf1>] cm_drep_handler+0xf1/0x180 [ib_cm]
[ 411.636926] [<ffffffffa04b7790>] ? cm_work_handler+0x0/0x1b8 [ib_cm]
[ 411.636928] [<ffffffffa04b7895>] cm_work_handler+0x105/0x1b8 [ib_cm]
[ 411.636930] [<ffffffffa04b7790>] ? cm_work_handler+0x0/0x1b8 [ib_cm]
[ 411.636932] [<ffffffff802628c2>] run_workqueue+0xc2/0x1a0
[ 411.636934] [<ffffffff80262bcf>] worker_thread+0xaf/0x130
[ 411.636935] [<ffffffff802670f0>] ? autoremove_wake_function+0x0/0x40
[ 411.636937] [<ffffffff80262b20>] ? worker_thread+0x0/0x130
[ 411.636938] [<ffffffff80266cbe>] kthread+0x4e/0x90
[ 411.636939] [<ffffffff80213c99>] child_rip+0xa/0x11
[ 411.636941] [<ffffffff80266c70>] ? kthread+0x0/0x90
[ 411.636942] [<ffffffff80213c8f>] ? child_rip+0x0/0x11
[ 411.636943]
[ 411.636943]
[ 411.636943] Code: 00 00 00 00 00 55 48 89 e5 e8 a7 cc d9 df 83 7f
28 78 75 17 80 67 2d f7 c7 47 28 0d 00 00 00 ba 01 00 00 00 e8 8c fc
ff ff c9 c3 <0f> 0b eb fe 66 0f 1f 44 00 00 55 48 89 e5 41 54 53 e8 74
cc d9
[ 411.636954] RIP [<ffffffffa04759f6>] scst_tgt_cmd_done+0x26/0x30 [scst]
[ 411.636960] RSP <ffff88039ad4fb50>
[ 411.636966] ---[ end trace a7f20725e9471e16 ]---
[ 412.361070] ib_srpt: srpt_xmit_response: tag= 8 channel in bad state 2
[ 412.367835] scst: ***ERROR***: Target driver ib_srpt
xmit_response() returned fatal error
[ 412.376335] ib_srpt: srpt_xmit_response: tag= 16 channel in bad state 2
[ 412.383183] scst: ***ERROR***: Target driver ib_srpt
xmit_response() returned fatal error
[ 412.391677] ib_srpt: srpt_xmit_response: tag= 47 channel in bad state 2
<many repeats>
[ 456.831960] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[ 456.850851] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[ 456.863891] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[ 483.902158] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[ 483.921040] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[ 483.934101] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[ 510.971720] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[ 510.990587] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[ 511.003643] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[ 543.542817] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000055:0x24710000000055, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[ 543.563208] scst: Using security group "Default" for initiator
"0x00247100000000550024710000000055"
[ 544.186067] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[ 544.204962] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[ 544.218003] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[ 571.256063] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[ 571.274934] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[ 571.287930] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[ 598.330955] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[ 598.349836] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[ 598.362854] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[ 625.400865] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[ 625.419753] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[ 625.432786] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[ 652.470722] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[ 652.489625] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[ 652.502814] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[ 679.540621] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[ 679.559503] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[ 679.572514] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[ 706.610483] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[ 706.629376] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[ 706.642369] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[ 733.680405] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[ 733.699243] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[ 733.712237] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[ 760.750170] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[ 760.769019] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[ 760.782038] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[ 800.108971] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[ 800.127840] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[ 800.140847] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[ 827.178873] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[ 827.197731] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[ 827.210717] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[ 854.248829] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[ 854.267662] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[ 854.280645] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[ 881.318906] ib_srpt: received SRP_LOGIN_REQ with i_port_id
0x24710000000046:0x24710000000046, t_port_id
0x24717124000028:0x24717124000028 and length 260 on port 1
(guid=0xfe80000000000000:0x24717124000029)
[ 881.337874] ib_srpt: disconnected session
0x00247100000000460024710000000046 because a new SRP_LOGIN_REQ has
been received.
[ 881.351139] scst: Using security group "Default" for initiator
"0x00247100000000460024710000000046"
[ 893.147951] BUG: unable to handle kernel paging request at ffffe3e00e700cc0
[ 893.155156] IP: [<ffffffff802e2589>] kfree+0x49/0x100
[ 893.157386] PGD 0
[ 893.157386] Oops: 0000 [3] SMP
[ 893.157386] CPU 2
...
[ 893.157386] RIP: 0010:[<ffffffff802e2589>] [<ffffffff802e2589>]
kfree+0x49/0x100
[ 893.157386] RSP: 0018:ffff88042e4d7ce0 EFLAGS: 00010216
[ 893.157386] RAX: 000001e00e700cc0 RBX: ffffe3e00e700cc0 RCX: 0100000000002081
[ 893.157386] RDX: ffffe20000000000 RSI: ffff88039accdf00 RDI: 000000039c033000
[ 893.157386] RBP: ffff88042e4d7d10 R08: 0000000000000000 R09: 0000000000000001
[ 893.157386] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8803c09d7c20
[ 893.157386] R13: 000000039c033000 R14: ffff88039a108240 R15: ffff880071dc4558
[ 893.157386] FS: 0000000000000000(0000) GS:ffff88042fc02d00(0000)
knlGS:0000000000000000
[ 893.157386] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 893.157386] CR2: ffffe3e00e700cc0 CR3: 0000000000201000 CR4: 00000000000006e0
[ 893.157386] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 893.157386] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 893.157386] Process swapper (pid: 0, threadinfo ffff88042e4d2000,
task ffff88042e4cace0)
[ 893.157386] Stack: ffff88042e4d7cf0 0000000000000001
ffff8803c09d7c20 ffff88039accdf00
[ 893.157386] ffff88039a108240 ffff880071dc4558 ffff88042e4d7d80
ffffffffa04bf301
[ 893.157386] 0000000000000000 ffffffff802e2430 ffffe20001c77100
0000000000000202
[ 893.157386] Call Trace:
[ 893.157386] <IRQ> [<ffffffffa04bf301>] srpt_reset_ioctx+0x41/0xf0 [ib_srpt]
[ 893.157386] [<ffffffff802e2430>] ? __slab_free+0x10/0x120
[ 893.157386] [<ffffffffa047b8f7>] ? scst_destroy_put_cmd+0x47/0x80 [scst]
[ 893.157386] [<ffffffffa04bf42d>] srpt_on_free_cmd+0x7d/0xa0 [ib_srpt]
[ 893.157386] [<ffffffffa047bc59>] scst_free_cmd+0x39/0x1a0 [scst]
[ 893.157386] [<ffffffffa0474c7d>]
scst_process_active_cmd+0x144d/0x1bd0 [scst]
[ 893.157386] [<ffffffffa04c180e>] ? srpt_completion+0x4e/0x230 [ib_srpt]
[ 893.157386] [<ffffffffa047546d>] scst_do_job_active+0x6d/0x90 [scst]
[ 893.157386] [<ffffffffa04754b7>] scst_cmd_tasklet+0x27/0x40 [scst]
[ 893.157386] [<ffffffffa0348a7d>] ? mlx4_eq_int+0x7d/0x2a0 [mlx4_core]
[ 893.157386] [<ffffffff80254776>] tasklet_action+0x86/0x110
[ 893.157386] [<ffffffff80254e3c>] __do_softirq+0x8c/0x100
[ 893.157386] [<ffffffff8021417c>] call_softirq+0x1c/0x30
[ 893.157386] [<ffffffff80215875>] do_softirq+0x65/0xa0
[ 893.157386] [<ffffffff80254ba5>] irq_exit+0x95/0xa0
[ 893.157386] [<ffffffff80215b1b>] do_IRQ+0x8b/0x100
[ 893.157386] [<ffffffff80212f0e>] ret_from_intr+0x0/0x29
[ 893.157386] <EOI> [<ffffffff8021aba2>] ? mwait_idle+0x52/0x60
[ 893.157386] [<ffffffff8021ab59>] ? mwait_idle+0x9/0x60
[ 893.157386] [<ffffffff80210e95>] ? cpu_idle+0x75/0x110
[ 893.157386] [<ffffffff804ff405>] ? start_secondary+0x97/0xc2
[ 893.157386]
[ 893.157386]
[ 893.157386] Code: 01 f3 ff 48 83 ff 10 49 89 fd 0f 86 85 00 00 00
e8 0d 1b f5 ff 48 c1 e8 0c 48 ba 00 00 00 00 00 e2 ff ff 48 c1 e0 06
48 8d 1c 10 <48> 8b 03 f6 c4 40 74 07 48 8b 5b 10 48 8b 03 84 c0 0f 89
87 00
[ 893.157386] RIP [<ffffffff802e2589>] kfree+0x49/0x100
[ 893.157386] RSP <ffff88042e4d7ce0>
[ 893.157386] CR2: ffffe3e00e700cc0
[ 893.157386] ---[ end trace a7f20725e9471e16 ]---
[ 893.533612] Kernel panic - not syncing: Aiee, killing interrupt handler!
[ 893.540006] ------------[ cut here ]------------
[ 893.540617] WARNING: at /build/buildd/linux-2.6.27/kernel/smp.c:332
smp_call_function_mask+0x22c/0x240()
...
[ 893.630002]
[ 893.630002] Call Trace:
[ 893.632644] <IRQ> [<ffffffff8024eb14>] warn_on_slowpath+0x64/0x90
[ 893.640002] [<ffffffff8023e8f3>] ? __enqueue_entity+0x93/0xa0
[ 893.643556] [<ffffffff8023fdd9>] ? enqueue_entity+0xd9/0x260
[ 893.650002] [<ffffffff802463f9>] ? enqueue_task_fair+0x59/0x60
[ 893.660002] [<ffffffff8023c460>] ? enqueue_task+0x50/0x60
[ 893.661747] [<ffffffff80245e9d>] ? resched_task+0x2d/0x90
[ 893.670002] [<ffffffff8027876c>] smp_call_function_mask+0x22c/0x240
[ 893.672916] [<ffffffff802256c0>] ? stop_this_cpu+0x0/0x40
[ 893.680003] [<ffffffff80502d69>] ? mutex_unlock+0x9/0x20
[ 893.690003] [<ffffffff80284d44>] ? crash_kexec+0x74/0x100
[ 893.690515] [<ffffffff80267106>] ? autoremove_wake_function+0x16/0x40
[ 893.700002] [<ffffffff8023c99a>] ? __wake_up_common+0x5a/0x90
[ 893.703245] [<ffffffff802787a0>] smp_call_function+0x20/0x30
[ 893.710002] [<ffffffff80225688>] native_smp_send_stop+0x28/0x60
[ 893.720003] [<ffffffff80501b7d>] panic+0xb4/0x177
[ 893.721449] [<ffffffff8026c3a1>] ? __blocking_notifier_call_chain+0x21/0x90
[ 893.730002] [<ffffffff8026c426>] ? blocking_notifier_call_chain+0x16/0x20
[ 893.740002] [<ffffffff80253505>] do_exit+0x3f5/0x410
[ 893.740638] [<ffffffff80504ea0>] oops_begin+0x0/0xb0
[ 893.750002] [<ffffffff80506e60>] do_page_fault+0x270/0x790
[ 893.751374] [<ffffffff802b2816>] ? get_page_from_freelist+0x2a6/0x380
[ 893.760003] [<ffffffff8026b3e1>] ? ktime_get_ts+0x61/0x70
[ 893.764197] [<ffffffff8026d0ac>] ? sched_clock_cpu+0xcc/0x160
[ 893.770002] [<ffffffff805045ea>] error_exit+0x0/0x70
[ 893.780003] [<ffffffff802e2589>] ? kfree+0x49/0x100
[ 893.780898] [<ffffffffa04bf301>] srpt_reset_ioctx+0x41/0xf0 [ib_srpt]
[ 893.790003] [<ffffffff802e2430>] ? __slab_free+0x10/0x120
[ 893.793126] [<ffffffffa047b8f7>] ? scst_destroy_put_cmd+0x47/0x80 [scst]
[ 893.800002] [<ffffffffa04bf42d>] srpt_on_free_cmd+0x7d/0xa0 [ib_srpt]
[ 893.810002] [<ffffffffa047bc59>] scst_free_cmd+0x39/0x1a0 [scst]
[ 893.812078] [<ffffffffa0474c7d>]
scst_process_active_cmd+0x144d/0x1bd0 [scst]
[ 893.820002] [<ffffffffa04c180e>] ? srpt_completion+0x4e/0x230 [ib_srpt]
[ 893.830005] [<ffffffffa047546d>] scst_do_job_active+0x6d/0x90 [scst]
[ 893.833429] [<ffffffffa04754b7>] scst_cmd_tasklet+0x27/0x40 [scst]
[ 893.840003] [<ffffffffa0348a7d>] ? mlx4_eq_int+0x7d/0x2a0 [mlx4_core]
[ 893.850003] [<ffffffff80254776>] tasklet_action+0x86/0x110
[ 893.852828] [<ffffffff80254e3c>] __do_softirq+0x8c/0x100
[ 893.860004] [<ffffffff8021417c>] call_softirq+0x1c/0x30
[ 893.864467] [<ffffffff80215875>] do_softirq+0x65/0xa0
[ 893.870005] [<ffffffff80254ba5>] irq_exit+0x95/0xa0
[ 893.880004] [<ffffffff80215b1b>] do_IRQ+0x8b/0x100
[ 893.880448] [<ffffffff80212f0e>] ret_from_intr+0x0/0x29
[ 893.890003] <EOI> [<ffffffff8021aba2>] ? mwait_idle+0x52/0x60
[ 893.893008] [<ffffffff8021ab59>] ? mwait_idle+0x9/0x60
[ 893.900003] [<ffffffff80210e95>] ? cpu_idle+0x75/0x110
[ 893.902676] [<ffffffff804ff405>] ? start_secondary+0x97/0xc2
[ 893.910004]
[ 893.913956] ---[ end trace a7f20725e9471e16 ]---
This may have been due to low memory, as I was using most target
memory for the ramdisk.
I do seem to be able to push the issue harder with:
fio --rw=randrw --bs=1k --numjobs=64 --iodepth=64 --sync=0 \
--direct=1 --randrepeat=0 --group_reporting --ioengine=libaio \
--filename=/dev/sdp --name=test --loops=10000 --runtime=1600 \
--rwmixread=100
Chris
More information about the general
mailing list