[ofa-general] [SRPT]multiple initiators supported?

Ian Jiang ianjiang.ict at gmail.com
Sun May 20 17:49:18 PDT 2007


(1) I found where my problem was.
I should not define DEBUG_TM in the Makefile of Generic SCSI target
mid-level for Linux (SCST).
Reference from README of scst-0.9.5.2:
 - DEBUG_TM - turns on task management functions debugging, when on
   LUN 0 in the "Default" group some of the commands will be delayed for
   about 60 sec., so making the remote initiator send TM functions, eg
   ABORT TASK and TARGET RESET. Also set TM_DBG_GO_OFFLINE symbol in the
   Makefile to 1 if you want that the device eventually become
   completely unresponsive, or to 0 otherwise to circle around ABORTs
   and RESETs code. Needs DEBUG turned on.

(2) Likely a bug in SRPT:
The CM ID cannot be destroyed in srpt_release_channel(). If so and if
a initiator disconnect from the target, the following connection
request will fail, because no CM ID exists at that moment. So the line
  ib_destroy_cm_id(ch->cm_id);
in srpt_release_channel() of ib_srpt.c should be removed. And the CM
ID would be destroyed only when the entire SRP target is removed.


On 5/15/07, Ian Jiang <ianjiang.ict at gmail.com> wrote:
> Hi Vu,
> Thanks for your replay. But I have got something wrong when using two
> initiators.
>
> Two initiators and one target are at three separated nodes.  The first
> initiator connected to the target correctly. However, the second one
> was aborted 1 minute after its login, and then required to
> *reset_host*, but it failed to send the CM Connect Request when trying
> to reconnect to the target.
>
> Here are the logs of the second initiator:
>
> May 15 13:58:59 cluster5 kernel: ib_srp: new target: id_ext
> 0002c90200206bd8 ioc_guid 0002c90200206bd8 pkey ffff service_id
> 0002c90200206bd8 dgid fe80:0000:0000:0000:0002:c902:0020:6bd9
> May 15 13:58:59 cluster5 kernel: scsi2 : SRP.T10:0002C90200206BD8
> May 15 13:58:59 cluster5 kernel:   Vendor: SCST_FIO  Model: fdisk_128M
>        Rev:  095
> May 15 13:58:59 cluster5 kernel:   Type:   Direct-Access
>        ANSI SCSI revision: 04
> May 15 13:58:59 cluster5 kernel: SCSI device sdb: 262144 512-byte hdwr
> sectors (134 MB)
> May 15 13:58:59 cluster5 kernel: sdb: Write Protect is off
> May 15 13:58:59 cluster5 kernel: sdb: Mode Sense: 6b 00 10 08
> May 15 13:58:59 cluster5 kernel: SCSI device sdb: drive cache: write back w/ FUA
> May 15 13:58:59 cluster5 kernel: SCSI device sdb: 262144 512-byte hdwr
> sectors (134 MB)
> May 15 13:58:59 cluster5 kernel: sdb: Write Protect is off
> May 15 13:58:59 cluster5 kernel: sdb: Mode Sense: 6b 00 10 08
> May 15 13:58:59 cluster5 kernel: SCSI device sdb: drive cache: write back w/ FUA
> May 15 13:58:59 cluster5 kernel:  sdb: unknown partition table
> May 15 13:58:59 cluster5 kernel: sd 2:0:0:0: Attached scsi disk sdb
> May 15 13:58:59 cluster5 kernel: sd 2:0:0:0: Attached scsi generic sg1 type 0
> May 15 13:59:59 cluster5 kernel: SRP abort called
> May 15 13:59:59 cluster5 kernel: SRP reset_device called
> May 15 14:00:29 cluster5 kernel: SRP abort called
> May 15 14:00:34 cluster5 kernel: ib_srp: SRP reset_host called
> May 15 14:00:36 cluster5 kernel: ib_srp: connection closed
> May 15 14:02:15 cluster5 kernel: ib_srp: Sending CM REQ failed
> May 15 14:02:15 cluster5 kernel: ib_srp: reconnect failed (-104),
> removing target port.
> May 15 14:02:15 cluster5 kernel: sd 2:0:0:0: scsi: Device offlined -
> not ready after error recovery
> May 15 14:02:15 cluster5 kernel: sd 2:0:0:0: rejecting I/O to offline device
> May 15 14:02:15 cluster5 kernel: Buffer I/O error on device sdb,
> logical block 32760
> May 15 14:02:15 cluster5 kernel:  2:0:0:0: rejecting I/O to dead device
> May 15 14:02:15 cluster5 kernel: Buffer I/O error on device sdb,
> logical block 32760
>
> Here are the logs of the target during the second initiator's
> connection.  It seemed that it did not receive the reconnect request.
>
> May 15 13:57:27 cluster4 kernel: ib_srpt: Host
> i_port_id=0x100000000000000:0xcc6b200002c90200 login with
> t_port_id=0xd86b200002c90200:0xd86b200002c90200 it_iu_len=260
> May 15 13:57:27 cluster4 kernel: ib_srpt: srpt_create_ch_ib[1105]
> max_cqe= 255 max_sge= 29 cm_id= da9b7200
> May 15 13:57:27 cluster4 kernel: [3966]: scst_init_session:scst: Name
> 0x00000000000000010002c90200206bcc not found, using default group
> May 15 13:57:27 cluster4 kernel: [3966]:
> scst_alloc_add_tgt_dev:Virtual device SCST lun=0
> May 15 13:57:27 cluster4 kernel: [3966]: tm_dbg_init_tgt_dev:LUN 0
> connected from initiator ib_srpt is under TM debugging
> May 15 13:57:27 cluster4 kernel: ib_srpt: Establish connection sess=
> c9a677a8 name= 0x00000000000000010002c90200206bcc cm_id= da9b7200
> May 15 13:57:27 cluster4 kernel: [3964]: scst_set_pending_UA:Setting
> pending UA cmd dabb3ec0
> May 15 13:57:27 cluster4 kernel: [3964]:
> tm_dbg_delay_cmd:tm_dbg_delay_cmd: delaying timed cmd dabb3ec0 (tag
> 35) for 60.96 seconds (15241 HZ)
>
> May 15 13:58:27 cluster4 kernel: ib_srpt: recv_tsk_mgmt= 1 for
> task_tag= 35 using tag= 163 cm_id= da9b7200 sess= c9a677a8
> May 15 13:58:27 cluster4 kernel: [0]: scst_rx_mgmt_fn_tag:sess=c9a677a8, tag=35
> May 15 13:58:27 cluster4 kernel: [0]: scst_post_rx_mgmt_cmd:Adding
> mgmt cmd c70486a0 to active mgmt cmd list
> May 15 13:58:27 cluster4 kernel: [3965]: scst_mgmt_cmd_thread:Moving
> mgmt cmd c70486a0 to mgmt cmd list
> May 15 13:58:27 cluster4 kernel: [3965]: scst_mgmt_cmd_init:Cmd
> dabb3ec0 for tag 35 (sn 35) found, aborting it
> May 15 13:58:27 cluster4 kernel: [3965]: scst_abort_cmd:Aborting cmd
> dabb3ec0 (tag 35)
> May 15 13:58:27 cluster4 kernel: [3965]:
> scst_call_dev_task_mgmt_fn:Calling dev handler disk_fileio
> task_mgmt_fn(fn=0)
> May 15 13:58:27 cluster4 kernel: [3965]:
> scst_call_dev_task_mgmt_fn:Dev handler disk_fileio task_mgmt_fn()
> returned 0
> May 15 13:58:27 cluster4 kernel: [3965]: tm_dbg_release_cmd:Abort
> request for delayed cmd dabb3ec0 (tag=35), moving it to active cmd
> list (delayed_cmds_count=1)
> May 15 13:58:27 cluster4 kernel: ib_srpt: srpt_tsk_mgmt_done[1972]
> tsk_mgmt_done for tag= 163 status=0
> May 15 13:58:27 cluster4 kernel: [3965]: scst_mgmt_cmd_send_done:Dev
> handler ib_srpt task_mgmt_fn_done() returned
> May 15 13:58:27 cluster4 kernel: ib_srpt: recv_tsk_mgmt= 8 for
> task_tag= 35 using tag= 163 cm_id= da9b7200 sess= c9a677a8
> May 15 13:58:27 cluster4 kernel: [3964]: tm_dbg_check_cmd:Processing
> delayed cmd dabb3ec0 (tag 35), delayed_cmds_count=1
> May 15 13:58:27 cluster4 kernel: [3964]: tm_dbg_change_state:Deleting timer
> May 15 13:58:27 cluster4 kernel: ib_srpt: srpt_xmit_response[1898]
> tag= 35 already get aborted
> May 15 13:58:57 cluster4 kernel: ib_srpt: recv_tsk_mgmt= 1 for
> task_tag= 36 using tag= 164 cm_id= da9b7200 sess= c9a677a8
> May 15 13:58:57 cluster4 kernel: [0]: scst_rx_mgmt_fn_tag:sess=c9a677a8, tag=36
> May 15 13:58:57 cluster4 kernel: [0]: scst_post_rx_mgmt_cmd:Adding
> mgmt cmd c7048240 to active mgmt cmd list
> May 15 13:58:57 cluster4 kernel: [3965]: scst_mgmt_cmd_thread:Moving
> mgmt cmd c7048240 to mgmt cmd list
> May 15 13:58:57 cluster4 kernel: [3965]: scst_mgmt_cmd_init:Cmd
> dabb3050 for tag 36 (sn 36) found, aborting it
> May 15 13:58:57 cluster4 kernel: [3965]: scst_abort_cmd:Aborting cmd
> dabb3050 (tag 36)
> May 15 13:58:57 cluster4 kernel: [3965]:
> scst_call_dev_task_mgmt_fn:Calling dev handler disk_fileio
> task_mgmt_fn(fn=0)
> May 15 13:58:57 cluster4 kernel: [3965]:
> scst_call_dev_task_mgmt_fn:Dev handler disk_fileio task_mgmt_fn()
> returned 0
> May 15 13:58:57 cluster4 kernel: [3965]: scst_abort_cmd:cmd dabb3050
> (tag 36) being executed/xmitted (state 12), deferring ABORT...
> May 15 13:58:57 cluster4 kernel: [3965]:
> scst_set_mcmd_next_state:cmd_wait_count(1) not 0, preparing to wait
> May 15 13:59:02 cluster4 kernel: ib_srpt: srpt_cm_dreq_recv[1523]
> cm_id= da9b7200 ch->state= 1
> May 15 13:59:04 cluster4 kernel: ib_srpt: srpt_cm_timewait_exit[1502]
> cm_id= da9b7200
> May 15 13:59:04 cluster4 kernel: ib_srpt: srpt_release_channel[1154]
> Release channel cm_id= da9b7200
> May 15 13:59:04 cluster4 kernel: ib_srpt: srpt_release_channel[1159]
> Release sess= c9a677a8 sess_name= 0x00000000000000010002c90200206bcc
> May 15 13:59:21 cluster4 kernel: ib_srpt: failed send status= 12
> May 15 13:59:21 cluster4 kernel: [0]: scst_complete_cmd_mgmt:cmd
> dabb3050 completed (tag 36, mcmd c7048240, mcmd->cmd_wait_count 1)
> May 15 13:59:21 cluster4 kernel: [0]: scst_complete_cmd_mgmt:Moving
> mgmt cmd c7048240 to active mgmt cmd list
> May 15 13:59:21 cluster4 kernel: [3965]: scst_mgmt_cmd_thread:Moving
> mgmt cmd c7048240 to mgmt cmd list
> May 15 13:59:21 cluster4 kernel: ib_srpt: srpt_tsk_mgmt_done[1972]
> tsk_mgmt_done for tag= 164 status=-1
> May 15 13:59:21 cluster4 kernel: [3965]: scst_mgmt_cmd_send_done:Dev
> handler ib_srpt task_mgmt_fn_done() returned
> May 15 13:59:21 cluster4 kernel: ib_srpt:
> srpt_unregister_session_done[1143]  sess= c9a677a8
> May 15 13:59:21 cluster4 kernel: [3966]: scst_free_all_UA:Clearing UA
> for tgt_dev lun 0
> May 15 13:59:21 cluster4 kernel: ib_srpt: failed send status= 5
> May 15 13:59:21 cluster4 kernel: ib_srpt: QP event 16 on cm_id=
> da9b7200 sess_name= 0x00000000000000010002c90200206bcc state= 2
>
>
> I have no idea why the *abort* was called at the second initiator.
> Could you please give some suggestion? Thanks a lot!
>
>
> On 5/15/07, Vu Pham <vuhuong at mellanox.com> wrote:
> > Ian Jiang wrote:
> > > Does the SRP target support multiple initiators?
> >
> > Yes, it does.
> >
> >
> > > I am using the SRR initiator and IB drivers in linux-2.6.20.
> > > The SRP target is at
> > > http://www.openfabrics.org/git/?p=~vu/srpt.git;a=summary
> > > and the IB driver is OFED-1.1 with linux-2.6.16.13-4-default of Suse-10.1.
> > >
>
>
>
> --
> Ian Jiang
>


-- 
Ian Jiang



More information about the general mailing list