[Scst-devel] [ofa-general] WinOF_2_0_5/SRP initiator: slow reads and eventually hangs

Vladislav Bolkhovitin vst at vlnb.net
Wed Sep 9 11:38:30 PDT 2009


Chris Worley, on 09/09/2009 02:29 AM wrote:
> On Mon, Sep 7, 2009 at 1:58 PM, Vladislav Bolkhovitin<vst at vlnb.net> wrote:
>> Chris Worley, on 09/06/2009 05:41 PM wrote:
>>> On Sun, Sep 6, 2009 at 3:36 PM, Chris Worley<worleys at gmail.com> wrote:
>>>> On Sun, Sep 6, 2009 at 3:17 PM, Bart Van Assche<bart.vanassche at gmail.com>
>>>> wrote:
>>>>> On Fri, Sep 4, 2009 at 1:20 AM, Chris Worley <worleys at gmail.com> wrote:
>>>>>> On Thu, Sep 3, 2009 at 11:38 AM, Chris Worley<worleys at gmail.com> wrote:
>>>>>>> I've used a couple of initiators (different systems) w/ different
>>>>>>> OSes, w/ different IB cards (all QDR) and different IB stacks
>>>>>>> (built-in vs. OFED) and can repeat the problem in all but the
>>>>>>> RHEL5.2/OFED 1.4.1 target and initiator (but, if the initiator is
>>>>>>> WinOF and the target is RHEL5.2/OFED1.4.1, then the problem does
>>>>>>> repeat).
>>>>>> Here's a twist: I used the Ubuntu initiator w/ one of the RHEL
>>>>>> targets, and the RHEL initiator (same machine as was running WinOF
>>>>>> from the beginning of this thread) w/ one of the Ubuntu targets: in
>>>>>> both cases, the problem does not repeat.
>>>>>>
>>>>>> That makes it sound like OFED is the cure on either side of the
>>>>>> connection, but does not explain the issue w/ WinOF (which does fail
>>>>>> w/ either Ununtu or RHEL targets).
>>>>> These results are strange. Regarding the Linux-only tests, I was
>>>>> assuming failure of a single component (Ubuntu SRP initiator, OFED SRP
>>>>> initiator, Ubuntu IB driver, OFED IB driver or SRP target), but for
>>>>> each of these components there is at least one test that passes and at
>>>>> least one test that fails. So either my assumption is wrong or one of
>>>>> the above test results is not repeatable. Do you have the time to
>>>>> repeat the Linux-only tests ?
>>>> Last night I was rerunning the RHEL5.2 initiator w/ Ubuntu client, and
>>>> the problem repeated; now, I can't repeat the case where it didn't
>>>> fail.  Still, no errors, other than the eventual timeouts previously
>>>> shown; the target thinks all is fine, the initiator is stuck.
>>> ... and I haven't had any success w/ Ubuntu target and initiator, 8.10 or
>>> 9.04.
>> 1. Try with kernel parameter maxcpus=1. It will somehow relax possible races
>> you have, although not completely.
> 
> I'll give it a try if/when I can use my systems again.
> 
> I've already done the lock testing as suggested, with no problems.
> 
> It really seems like a protocol issue... something is getting dropped
> (maybe by the IB hardware), and either no retries or the target
> rejects the retries.
>> 2. Try with another hardware, including motherboard. You can have something
>> like http://lkml.org/lkml/2007/7/31/558 (not exactly it, of course)
> 
> I'm wondering why it's so easily repeatable by me, and those I work
> with, and nobody else?  I have another completely different
> configuration w/ the same issue...
> 
> Today I was running on a  dual-socket NHM and an eight-socket Opteron
> boxes.  Both were running RHEL 5.3 w/ 2.6.18-128.7.1.0.1.el5, with all
> the SCST recommended modifications (which I haven't been running on my
> systems) and OFED 1.4.1.  The Opteron was the target.
> 
> The apps on the initiator don't need great speed, just very low
> latency.  The app run time is ~4 hours... 2.5hours in, the hang
> occurs... same symptom, although this isn't raw disks running fio
> benchmarks... these are LVM striped (4MB) partitions w/ ext3 fs'es on
> top.
> 
> The average block sizes as seen from the target are between 64KB and
> 128KB... not the 8KB and smaller I've been talking about in other
> tests, although all I can tell is what iostat shows me, and it just
> displays averages.
> 
> But, the same issue occurs... the apps on the initiator hang, and the
> target thinks all is well.   An app will hang in one of the file
> systems... the others seem to be working well (even though they are
> comprised of the same drives as the hung fs/app), for example: you can
> do a "find ." from their root w/o hanging "find", but if you try that
> in the fs where the app is hung, "find" will hang.  Lvscan/pvscan will
> hang too.
> 
> Strangely, restarting the target (removing ib_srpd and scst_vdisk
> modules, then re-registering the disks with scst_vdisk and
> re-modprobing ib_srpt from scratch) causes the apps on the initiator
> to un-hang and make progress again... (but eventually hang again...
> seemingly more readily than before).
> 
> While nothing other than the messages you'd expect (from
> re-registering the drives to the initiator logging in) occur on the
> target, the initiator has much to say during this re-registration
> period, starting w/ the time-out (that has been shown previously):
> 
> Sep  8 22:04:07 nameme kernel: sd 30:0:0:3: timing out command, waited 360s
> Sep  8 22:04:07 nameme kernel: sd 30:0:0:3: SCSI error: return code = 0x06000000
> Sep  8 22:04:07 nameme kernel: end_request: I/O error, dev sdo, sector 45304704
> Sep  8 22:04:16 nameme kernel: sd 29:0:0:8: timing out command, waited 360s
> Sep  8 22:04:16 nameme kernel: sd 29:0:0:8: SCSI error: return code = 0x06000000
> Sep  8 22:04:16 nameme kernel: end_request: I/O error, dev sdj, sector 133535344
> Sep  8 22:04:16 nameme kernel: sd 29:0:0:1: timing out command, waited 360s
> Sep  8 22:04:16 nameme kernel: sd 29:0:0:1: SCSI error: return code = 0x06000000
> Sep  8 22:04:16 nameme kernel: end_request: I/O error, dev sdc, sector 55150000
> Sep  8 22:04:16 nameme kernel: sd 29:0:0:5: timing out command, waited 360s
> Sep  8 22:04:16 nameme kernel: sd 29:0:0:5: SCSI error: return code = 0x06000000
> Sep  8 22:04:16 nameme kernel: end_request: I/O error, dev sdg, sector 110218720
> Sep  8 22:04:16 nameme kernel: sd 29:0:0:6: timing out command, waited 360s
> Sep  8 22:04:16 nameme kernel: sd 29:0:0:6: SCSI error: return code = 0x06000000
> Sep  8 22:04:16 nameme kernel: end_request: I/O error, dev sdh, sector 133521792
> Sep  8 22:04:16 nameme kernel: sd 30:0:0:9: timing out command, waited 360s
> Sep  8 22:04:16 nameme kernel: sd 30:0:0:9: SCSI error: return code = 0x06000000
> Sep  8 22:04:16 nameme kernel: end_request: I/O error, dev sdu, sector 123498408
> Sep  8 22:04:16 nameme kernel: sd 29:0:0:7: timing out command, waited 360s
> Sep  8 22:04:16 nameme kernel: sd 29:0:0:7: SCSI error: return code = 0x06000000
> Sep  8 22:04:16 nameme kernel: end_request: I/O error, dev sdi, sector 61892872
> Sep  8 22:04:26 nameme kernel: sd 29:0:0:0: timing out command, waited 360s
> Sep  8 22:04:26 nameme kernel: sd 29:0:0:0: SCSI error: return code = 0x06000000
> Sep  8 22:04:26 nameme kernel: end_request: I/O error, dev sdb, sector 111165032
> Sep  8 22:04:35 nameme kernel: sd 29:0:0:3: timing out command, waited 360s
> Sep  8 22:04:35 nameme kernel: sd 29:0:0:3: SCSI error: return code = 0x06000000
> Sep  8 22:04:35 nameme kernel: end_request: I/O error, dev sde, sector 110192456
> Sep  8 22:04:35 nameme kernel: sd 29:0:0:4: timing out command, waited 360s
> Sep  8 22:04:35 nameme kernel: sd 29:0:0:4: SCSI error: return code = 0x06000000
> Sep  8 22:04:35 nameme kernel: end_request: I/O error, dev sdf, sector 81712464
> Sep  8 22:04:45 nameme kernel: sd 29:0:0:9: timing out command, waited 360s
> Sep  8 22:04:45 nameme kernel: sd 29:0:0:9: SCSI error: return code = 0x06000000
> Sep  8 22:04:45 nameme kernel: end_request: I/O error, dev sdk, sector 133530608
> Sep  8 22:04:45 nameme kernel: sd 30:0:0:4: timing out command, waited 360s
> Sep  8 22:04:45 nameme kernel: sd 30:0:0:4: SCSI error: return code = 0x06000000
> Sep  8 22:04:45 nameme kernel: end_request: I/O error, dev sdp, sector 133527104
> Sep  8 22:04:45 nameme kernel: sd 30:0:0:8: timing out command, waited 360s
> Sep  8 22:04:45 nameme kernel: sd 30:0:0:8: SCSI error: return code = 0x06000000
> Sep  8 22:04:45 nameme kernel: end_request: I/O error, dev sdt, sector 133534080
> Sep  8 22:04:45 nameme kernel: sd 30:0:0:5: timing out command, waited 360s
> Sep  8 22:04:45 nameme kernel: sd 30:0:0:5: SCSI error: return code = 0x06000000
> Sep  8 22:04:45 nameme kernel: end_request: I/O error, dev sdq, sector 133525384
> Sep  8 22:04:45 nameme kernel: sd 30:0:0:6: timing out command, waited 360s
> Sep  8 22:04:45 nameme kernel: sd 30:0:0:6: SCSI error: return code = 0x06000000
> Sep  8 22:04:45 nameme kernel: end_request: I/O error, dev sdr, sector 136997696
> Sep  8 22:04:50 nameme kernel: sd 30:0:0:0: timing out command, waited 360s
> Sep  8 22:04:50 nameme kernel: sd 30:0:0:0: SCSI error: return code = 0x06000000
> Sep  8 22:04:50 nameme kernel: end_request: I/O error, dev sdl, sector 133540752
> Sep  8 22:05:24 nameme kernel: sd 29:0:0:2: timing out command, waited 360s
> Sep  8 22:05:24 nameme kernel: sd 29:0:0:2: SCSI error: return code = 0x06000000
> Sep  8 22:05:24 nameme kernel: end_request: I/O error, dev sdd, sector 138572160
> Sep  8 22:06:37 nameme kernel:  host27: ib_srp: DREQ received -
> connection closed
> Sep  8 22:06:37 nameme kernel:  host28: ib_srp: DREQ received -
> connection closed
> Sep  8 22:06:37 nameme kernel:  host29: ib_srp: DREQ received -
> connection closed
> Sep  8 22:06:37 nameme kernel:  host30: ib_srp: DREQ received -
> connection closed
> Sep  8 22:06:37 nameme kernel:  host31: ib_srp: DREQ received -
> connection closed
> Sep  8 22:06:37 nameme kernel:  host32: ib_srp: DREQ received -
> connection closed
> Sep  8 22:06:39 nameme kernel:  host27: ib_srp: connection closed
> Sep  8 22:06:39 nameme kernel: ib_srp:  host27: add qp_in_err timer
> Sep  8 22:06:39 nameme kernel:  host28: ib_srp: connection closed
> Sep  8 22:06:39 nameme kernel: ib_srp:  host28: add qp_in_err timer
> Sep  8 22:06:39 nameme kernel:  host29: ib_srp: connection closed
> Sep  8 22:06:39 nameme kernel: ib_srp:  host29: add qp_in_err timer
> Sep  8 22:06:39 nameme kernel:  host30: ib_srp: connection closed
> Sep  8 22:06:39 nameme kernel: ib_srp:  host30: add qp_in_err timer
> Sep  8 22:06:39 nameme kernel:  host31: ib_srp: connection closed
> Sep  8 22:06:39 nameme kernel: ib_srp:  host31: add qp_in_err timer
> Sep  8 22:06:39 nameme kernel:  host32: ib_srp: connection closed
> Sep  8 22:06:39 nameme kernel: ib_srp:  host32: add qp_in_err timer
> Sep  8 22:07:04 nameme kernel:  host27: ib_srp: srp_qp_in_err_timer called
> Sep  8 22:07:04 nameme kernel:  host27: ib_srp: srp_qp_in_err_timer
> flushed reset - done
> Sep  8 22:07:04 nameme kernel:  host28: ib_srp: srp_qp_in_err_timer called
> Sep  8 22:07:04 nameme kernel:  host28: ib_srp: srp_qp_in_err_timer
> flushed reset - done
> Sep  8 22:07:04 nameme kernel:  host29: ib_srp: srp_qp_in_err_timer called
> Sep  8 22:07:04 nameme kernel:  host29: ib_srp: srp_qp_in_err_timer
> flushed reset - done
> Sep  8 22:07:04 nameme kernel:  host30: ib_srp: srp_qp_in_err_timer called
> Sep  8 22:07:04 nameme kernel:  host30: ib_srp: srp_qp_in_err_timer
> flushed reset - done
> Sep  8 22:07:04 nameme kernel:  host31: ib_srp: srp_qp_in_err_timer called
> Sep  8 22:07:04 nameme kernel:  host31: ib_srp: srp_qp_in_err_timer
> flushed reset - done
> Sep  8 22:07:04 nameme kernel:  host32: ib_srp: srp_qp_in_err_timer called
> Sep  8 22:07:04 nameme kernel:  host32: ib_srp: srp_qp_in_err_timer
> flushed reset - done
> Sep  8 22:09:10 nameme kernel:  host27: ib_srp: DREQ received -
> connection closed
> Sep  8 22:09:10 nameme kernel:  host28: ib_srp: DREQ received -
> connection closed
> Sep  8 22:09:10 nameme kernel:  host29: ib_srp: DREQ received -
> connection closed
> Sep  8 22:09:10 nameme kernel:  host30: ib_srp: DREQ received -
> connection closed
> Sep  8 22:09:10 nameme kernel:  host31: ib_srp: DREQ received -
> connection closed
> Sep  8 22:09:10 nameme kernel:  host32: ib_srp: DREQ received -
> connection closed
> Sep  8 22:09:12 nameme kernel:  host27: ib_srp: connection closed
> Sep  8 22:09:12 nameme kernel: ib_srp:  host27: add qp_in_err timer
> Sep  8 22:09:12 nameme kernel:  host28: ib_srp: connection closed
> Sep  8 22:09:12 nameme kernel: ib_srp:  host28: add qp_in_err timer
> Sep  8 22:09:12 nameme kernel:  host29: ib_srp: connection closed
> Sep  8 22:09:12 nameme kernel: ib_srp:  host29: add qp_in_err timer
> Sep  8 22:09:12 nameme kernel:  host30: ib_srp: connection closed
> Sep  8 22:09:12 nameme kernel: ib_srp:  host30: add qp_in_err timer
> Sep  8 22:09:12 nameme kernel:  host31: ib_srp: connection closed
> Sep  8 22:09:12 nameme kernel: ib_srp:  host31: add qp_in_err timer
> Sep  8 22:09:12 nameme kernel:  host32: ib_srp: connection closed
> Sep  8 22:09:12 nameme kernel: ib_srp:  host32: add qp_in_err timer
> Sep  8 22:09:37 nameme kernel:  host27: ib_srp: srp_qp_in_err_timer called
> Sep  8 22:09:37 nameme kernel:  host27: ib_srp: srp_qp_in_err_timer
> flushed reset - done
> Sep  8 22:09:37 nameme kernel:  host28: ib_srp: srp_qp_in_err_timer called
> Sep  8 22:09:37 nameme kernel:  host28: ib_srp: srp_qp_in_err_timer
> flushed reset - done
> Sep  8 22:09:37 nameme kernel:  host29: ib_srp: srp_qp_in_err_timer called
> Sep  8 22:09:37 nameme kernel:  host29: ib_srp: srp_qp_in_err_timer
> flushed reset - done
> Sep  8 22:09:37 nameme kernel:  host30: ib_srp: srp_qp_in_err_timer called
> Sep  8 22:09:37 nameme kernel:  host30: ib_srp: srp_qp_in_err_timer
> flushed reset - done
> Sep  8 22:09:37 nameme kernel:  host31: ib_srp: srp_qp_in_err_timer called
> Sep  8 22:09:37 nameme kernel:  host31: ib_srp: srp_qp_in_err_timer
> flushed reset - done
> Sep  8 22:09:37 nameme kernel:  host32: ib_srp: srp_qp_in_err_timer called
> Sep  8 22:09:37 nameme kernel:  host32: ib_srp: srp_qp_in_err_timer
> flushed reset - done
> Sep  8 22:10:36 nameme kernel:  host27: ib_srp: DREQ received -
> connection closed
> Sep  8 22:10:36 nameme kernel:  host28: ib_srp: DREQ received -
> connection closed
> Sep  8 22:10:36 nameme kernel:  host29: ib_srp: DREQ received -
> connection closed
> Sep  8 22:10:36 nameme kernel:  host30: ib_srp: DREQ received -
> connection closed
> Sep  8 22:10:36 nameme kernel:  host31: ib_srp: DREQ received -
> connection closed
> Sep  8 22:10:36 nameme kernel:  host32: ib_srp: DREQ received -
> connection closed
> Sep  8 22:10:39 nameme kernel:  host27: ib_srp: connection closed
> Sep  8 22:10:39 nameme kernel: ib_srp:  host27: add qp_in_err timer
> Sep  8 22:10:39 nameme kernel:  host28: ib_srp: connection closed
> Sep  8 22:10:39 nameme kernel: ib_srp:  host28: add qp_in_err timer
> Sep  8 22:10:39 nameme kernel:  host29: ib_srp: connection closed
> Sep  8 22:10:39 nameme kernel: ib_srp:  host29: add qp_in_err timer
> Sep  8 22:10:39 nameme kernel:  host30: ib_srp: connection closed
> Sep  8 22:10:39 nameme kernel: ib_srp:  host30: add qp_in_err timer
> Sep  8 22:10:39 nameme kernel:  host31: ib_srp: connection closed
> Sep  8 22:10:39 nameme kernel: ib_srp:  host31: add qp_in_err timer
> Sep  8 22:10:39 nameme kernel:  host32: ib_srp: connection closed
> Sep  8 22:10:39 nameme kernel: ib_srp:  host32: add qp_in_err timer
> Sep  8 22:11:04 nameme kernel:  host27: ib_srp: srp_qp_in_err_timer called
> Sep  8 22:11:04 nameme kernel:  host27: ib_srp: srp_qp_in_err_timer
> flushed reset - done
> Sep  8 22:11:04 nameme kernel:  host28: ib_srp: srp_qp_in_err_timer called
> Sep  8 22:11:04 nameme kernel:  host28: ib_srp: srp_qp_in_err_timer
> flushed reset - done
> Sep  8 22:11:04 nameme kernel:  host29: ib_srp: srp_qp_in_err_timer called
> Sep  8 22:11:04 nameme kernel:  host29: ib_srp: srp_qp_in_err_timer
> flushed reset - done
> Sep  8 22:11:04 nameme kernel:  host30: ib_srp: srp_qp_in_err_timer called
> Sep  8 22:11:04 nameme kernel:  host30: ib_srp: srp_qp_in_err_timer
> flushed reset - done
> Sep  8 22:11:04 nameme kernel:  host31: ib_srp: srp_qp_in_err_timer called
> Sep  8 22:11:04 nameme kernel:  host31: ib_srp: srp_qp_in_err_timer
> flushed reset - done
> Sep  8 22:11:04 nameme kernel:  host32: ib_srp: srp_qp_in_err_timer called
> Sep  8 22:11:04 nameme kernel:  host32: ib_srp: srp_qp_in_err_timer
> flushed reset - done
> 
> 
> Here is the start of another hang, un-hung by restarting the target,
> 
> Sep  8 22:32:33 nameme kernel: sd 29:0:0:7: timing out command, waited 360s
> Sep  8 22:32:33 nameme kernel: sd 29:0:0:7: SCSI error: return code = 0x06000000
> Sep  8 22:32:33 nameme kernel: end_request: I/O error, dev sdi, sector 123549288
> Sep  8 22:32:33 nameme kernel: device-mapper: multipath: Failing path 8:128.
> Sep  8 22:32:33 nameme kernel: sd 29:0:0:8: timing out command, waited 360s
> Sep  8 22:32:33 nameme kernel: sd 29:0:0:8: SCSI error: return code = 0x06000000
> Sep  8 22:32:33 nameme kernel: end_request: I/O error, dev sdj, sector 123538312
> Sep  8 22:32:33 nameme kernel: device-mapper: multipath: Failing path 8:144.
> Sep  8 22:38:33 nameme kernel: sd 29:0:0:8: timing out command, waited 360s
> Sep  8 22:38:33 nameme kernel: sd 29:0:0:8: SCSI error: return code = 0x06000000
> Sep  8 22:38:33 nameme kernel: end_request: I/O error, dev sdj, sector 140607872
> Sep  8 22:38:33 nameme kernel: sd 29:0:0:7: timing out command, waited 360s
> Sep  8 22:38:33 nameme kernel: sd 29:0:0:7: SCSI error: return code = 0x06000000
> Sep  8 22:38:33 nameme kernel: end_request: I/O error, dev sdi, sector 123549416
> Sep  8 22:38:33 nameme kernel: sd 29:0:0:0: timing out command, waited 360s
> Sep  8 22:38:33 nameme kernel: sd 29:0:0:0: SCSI error: return code = 0x06000000
> Sep  8 22:38:33 nameme kernel: end_request: I/O error, dev sdb, sector 123535744
> Sep  8 22:38:33 nameme kernel: device-mapper: multipath: Failing path 8:16.
> Sep  8 22:38:33 nameme kernel: sd 29:0:0:2: timing out command, waited 360s
> Sep  8 22:38:33 nameme kernel: sd 29:0:0:2: SCSI error: return code = 0x06000000
> Sep  8 22:38:33 nameme kernel: end_request: I/O error, dev sdd, sector 123535744
> Sep  8 22:38:33 nameme kernel: device-mapper: multipath: Failing path 8:48.
> Sep  8 22:38:33 nameme kernel: sd 29:0:0:6: timing out command, waited 360s
> Sep  8 22:38:33 nameme kernel: sd 29:0:0:6: SCSI error: return code = 0x06000000
> Sep  8 22:38:33 nameme kernel: end_request: I/O error, dev sdh, sector 123534960
> Sep  8 22:38:33 nameme kernel: device-mapper: multipath: Failing path 8:112.
> Sep  8 22:40:39 nameme kernel: sd 29:0:0:3: timing out command, waited 360s
> Sep  8 22:40:39 nameme kernel: sd 29:0:0:3: SCSI error: return code = 0x06000000

It means that the ini driver stopped accepting commands and not 
recovered after full line of resets.

> Sep  8 22:40:39 nameme kernel: end_request: I/O error, dev sde, sector 123559464
> Sep  8 22:40:39 nameme kernel: device-mapper: multipath: Failing path 8:64.
> Sep  8 22:41:32 nameme kernel:  host27: ib_srp: DREQ received -
> connection closed
> Sep  8 22:41:32 nameme kernel:  host28: ib_srp: DREQ received -
> connection closed
> Sep  8 22:41:32 nameme kernel:  host29: ib_srp: DREQ received -
> connection closed
> Sep  8 22:41:32 nameme kernel:  host30: ib_srp: DREQ received -
> connection closed
> Sep  8 22:41:32 nameme kernel:  host31: ib_srp: DREQ received -
> connection closed
> Sep  8 22:41:32 nameme kernel:  host32: ib_srp: DREQ received -
> connection closed
> Sep  8 22:41:34 nameme kernel:  host27: ib_srp: connection closed
> Sep  8 22:41:34 nameme kernel: ib_srp:  host27: add qp_in_err timer
> Sep  8 22:41:34 nameme kernel:  host28: ib_srp: connection closed
> Sep  8 22:41:34 nameme kernel: ib_srp:  host28: add qp_in_err timer
> Sep  8 22:41:34 nameme kernel:  host29: ib_srp: connection closed
> Sep  8 22:41:34 nameme kernel: ib_srp:  host29: add qp_in_err timer
> Sep  8 22:41:34 nameme kernel:  host30: ib_srp: connection closed
> Sep  8 22:41:34 nameme kernel: ib_srp:  host30: add qp_in_err timer
> Sep  8 22:41:34 nameme kernel:  host31: ib_srp: connection closed
> Sep  8 22:41:34 nameme kernel: ib_srp:  host31: add qp_in_err timer
> Sep  8 22:41:34 nameme kernel:  host32: ib_srp: connection closed
> Sep  8 22:41:34 nameme kernel: ib_srp:  host32: add qp_in_err timer
> Sep  8 22:41:59 nameme kernel:  host27: ib_srp: srp_qp_in_err_timer called
> Sep  8 22:41:59 nameme kernel:  host27: ib_srp: srp_qp_in_err_timer
> flushed reset - done
> Sep  8 22:41:59 nameme kernel:  host28: ib_srp: srp_qp_in_err_timer called
> Sep  8 22:41:59 nameme kernel:  host28: ib_srp: srp_qp_in_err_timer
> flushed reset - done
> Sep  8 22:41:59 nameme kernel:  host29: ib_srp: srp_qp_in_err_timer called
> Sep  8 22:41:59 nameme kernel:  host29: ib_srp: srp_qp_in_err_timer
> flushed reset - done
> Sep  8 22:41:59 nameme kernel:  host30: ib_srp: srp_qp_in_err_timer called
> Sep  8 22:41:59 nameme kernel:  host30: ib_srp: srp_qp_in_err_timer
> flushed reset - done
> Sep  8 22:41:59 nameme kernel:  host31: ib_srp: srp_qp_in_err_timer called
> Sep  8 22:41:59 nameme kernel:  host31: ib_srp: srp_qp_in_err_timer
> flushed reset - done
> Sep  8 22:41:59 nameme kernel:  host32: ib_srp: srp_qp_in_err_timer called
> Sep  8 22:41:59 nameme kernel:  host32: ib_srp: srp_qp_in_err_timer

Those messages should be analyzed and can be a key.

> flushed reset - done
> 
> If I get a chance to work w/ my systems again, I'll see if this
> re-registration helps there too.
> 
> Chris
>>> Chris
>>>> Chris
>>>>> Bart.




More information about the general mailing list