[ofa-general] [Bug 14235] New: SRP initiator lockup

bugzilla-daemon at bugzilla.kernel.org
Sat Sep 26 07:54:37 PDT 2009


http://bugzilla.kernel.org/show_bug.cgi?id=14235

           Summary: SRP initiator lockup
           Product: Drivers
           Version: 2.5
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: Infiniband/RDMA
        AssignedTo: drivers_infiniband-rdma at kernel-bugs.osdl.org
        ReportedBy: bart.vanassche at gmail.com
        Regression: No


If an SRP target processes SRP I/O slowly enough, the SRP initiator locks up.
This issue is 100% reproducible with the following setup:

Target:
* Kernel 2.6.30.4 with SCST patches applied and kernel debugging enabled.
* SCST r1153, with EXTRA_CFLAGS += -DCONFIG_SCST_TRACING -DCONFIG_SCST_DEBUG -g
added to srpt/src/Makefile and EXTRA_CFLAGS += -DCONFIG_SCST_TRACING added
to scst/src/Makefile.
* ib_srpt loaded with kernel module parameters thread=0 and
processing_delay_in_us=500.

Initiator:
* Kernel 2.6.31.1 with kernel debugging enabled.
* SRP login was performed as follows:
  rmmod ib_srp
  modprobe ib_srp
  ibsrpdm -c | while read target_info; do
      echo "${target_info}"
      echo "${target_info}" > /sys/class/infiniband_srp/srp-mlx4_0-1/add_target
  done
* After SRP login succeeded, the following fio command was started:
  fio --rw=rw --bs=64M --rwmixread=100 --numjobs=1 --iodepth=1 --sync=0 \
      --direct=1 --ioengine=sync --filename=/dev/${srp_initiator_device} \
      --name=test --loops=1000 --runtime=600 --size=2G

After a few minutes, fio locked up (the I/O rate dropped from 1500 MB/s to
0 MB/s) and the following kernel message started appearing periodically:

INFO: task fio:6389 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fio           D 0000000000000000     0  6389   6388 0x00000000
 ffff880071dc5bd8 0000000000000046 ffff880071dc5b08 000000018107764d
 0000000000012cc0 000000000000de20 0000000000000001 ffff880070cd8000
 ffff880070cd83b0 0000000100000000 000000010001193e ffff88007fb99050
Call Trace:
 [<ffffffff812ec5e5>] ? _spin_unlock_irqrestore+0x65/0x80
 [<ffffffff812e9b37>] io_schedule+0x37/0x50
 [<ffffffff8110cff2>] __blockdev_direct_IO+0x692/0xd80
 [<ffffffff810e0357>] ? get_super+0x27/0xc0
 [<ffffffff8110b169>] blkdev_direct_IO+0x49/0x50
 [<ffffffff8110a1f0>] ? blkdev_get_blocks+0x0/0xc0
 [<ffffffff810a1799>] generic_file_aio_read+0x679/0x690
 [<ffffffff810dc35a>] ? __dentry_open+0x13a/0x340
 [<ffffffff810de091>] do_sync_read+0xf1/0x140
 [<ffffffff810775ed>] ? trace_hardirqs_on_caller+0x14d/0x1a0
 [<ffffffff810662f0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff810775ed>] ? trace_hardirqs_on_caller+0x14d/0x1a0
 [<ffffffff8107764d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff810ded28>] vfs_read+0xc8/0x180
 [<ffffffff810deed0>] sys_read+0x50/0x90
 [<ffffffff8100be6b>] system_call_fastpath+0x16/0x1b
no locks held by fio/6389.
