[openib-general] [PATCH] IB/srp: don't schedule reconnect from srp, scsi does it for us (was Re: FW: OFED 1.1 rc3 srp driver panic)
Michael S. Tsirkin
mst at mellanox.co.il
Thu Sep 7 05:00:01 PDT 2006
Quoting r. ishai at dev.mellanox.co.il <ishai at dev.mellanox.co.il>:
> Subject: Re: FW: OFED 1.1 rc3 srp driver panic
>
> I think I found the race that causes this NULL dereference.
>
> 1) There is a connection error.
>
> 2) srp_completion gets a bad status and schedules a call to srp_reconnect_work.
>
> 3) srp_reconnect_work runs and calls srp_reconnect_target.
>
> 4) srp_reconnect_target starts to run and changes the target state to
> SRP_TARGET_CONNECTING, but there is a context switch before it gets to
> call srp_reset_req.
>
> 5) The SCSI error handling calls srp_reset_host.
>
> 6) srp_reset_host calls srp_reconnect_target, which returns -EAGAIN
> (because the target state is not SRP_TARGET_LIVE).
>
> 7) srp_reset_host returns FAILED and therefore the device goes offline.
>
> 8) Because the device goes offline, its commands are freed (in the
> SCSI mid-layer).
>
> 9) The first invocation of srp_reconnect_target resumes and calls
> srp_reset_req, which tries to access the commands that were already freed.
>
> 10) NULL deref.
>
> Ishai
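For reference, the two paths that race in steps 4)-7) look roughly like
this (a simplified sketch of ib_srp.c, not a verbatim copy):

static int srp_reconnect_target(struct srp_target_port *target)
{
	/* ... */
	spin_lock_irq(target->scsi_host->host_lock);
	if (target->state != SRP_TARGET_LIVE) {
		/* step 6): the work-queue invocation already moved the
		 * state to SRP_TARGET_CONNECTING, so this call fails */
		spin_unlock_irq(target->scsi_host->host_lock);
		return -EAGAIN;
	}
	target->state = SRP_TARGET_CONNECTING;
	spin_unlock_irq(target->scsi_host->host_lock);

	/* ... tear down the connection, srp_reset_req() each queued
	 * command, then reconnect ... */
	return 0;
}

static int srp_reset_host(struct scsi_cmnd *scmnd)
{
	struct srp_target_port *target = host_to_target(scmnd->device->host);

	/* step 7): -EAGAIN turns into FAILED here, and the mid-layer
	 * then takes the device offline and frees its commands */
	return srp_reconnect_target(target) ? FAILED : SUCCESS;
}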
It seems that we don't really need to schedule srp_reconnect_work on
error, since the SCSI mid-layer's error handling will get us into
srp_reconnect_target later anyway (see the sketch below).
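The hookup is the host template we already register; roughly (again
simplified, not verbatim):

static struct scsi_host_template srp_template = {
	/* ... */
	.eh_abort_handler		= srp_abort,
	.eh_device_reset_handler	= srp_reset_device,
	/* host reset is where the reconnect happens for us */
	.eh_host_reset_handler		= srp_reset_host,
	/* ... */
};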
So it seems we can address these crashes and simplify srp
in the following way:
---
IB/srp: don't schedule reconnect from srp, scsi does it for us
If there is a problem with the connection, the SCSI mid-layer will
eventually call srp_reset_host, which calls srp_reconnect_target, so we
do not need to schedule srp_reconnect_work from srp_completion.
Removing this prevents srp_reset_host from failing when a reconnect
scheduled from srp_completion is already in progress, which in turn was
causing crashes as both the SCSI mid-layer and srp_reconnect_target
were cancelling commands.
Signed-off-by: Ishai Rabinovitz <ishai at mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>
Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c
===================================================================
--- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c	2006-09-06 15:37:50.000000000 +0300
+++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c	2006-09-07 11:16:28.000000000 +0300
@@ -799,13 +799,6 @@ static void srp_process_rsp(struct srp_t
 	spin_unlock_irqrestore(target->scsi_host->host_lock, flags);
 }
 
-static void srp_reconnect_work(void *target_ptr)
-{
-	struct srp_target_port *target = target_ptr;
-
-	srp_reconnect_target(target);
-}
-
 static void srp_handle_recv(struct srp_target_port *target, struct ib_wc *wc)
 {
 	struct srp_iu *iu;
@@ -858,7 +851,6 @@ static void srp_completion(struct ib_cq
 {
 	struct srp_target_port *target = target_ptr;
 	struct ib_wc wc;
-	unsigned long flags;
 
 	ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
 	while (ib_poll_cq(cq, 1, &wc) > 0) {
@@ -866,10 +858,6 @@ static void srp_completion(struct ib_cq
 			printk(KERN_ERR PFX "failed %s status %d\n",
 			       wc.wr_id & SRP_OP_RECV ? "receive" : "send",
 			       wc.status);
-			spin_lock_irqsave(target->scsi_host->host_lock, flags);
-			if (target->state == SRP_TARGET_LIVE)
-				schedule_work(&target->work);
-			spin_unlock_irqrestore(target->scsi_host->host_lock, flags);
 			break;
 		}
 
@@ -1724,8 +1712,6 @@ static ssize_t srp_create_target(struct
 	target->scsi_host = target_host;
 	target->srp_host = host;
 
-	INIT_WORK(&target->work, srp_reconnect_work, target);
-
 	INIT_LIST_HEAD(&target->free_reqs);
 	INIT_LIST_HEAD(&target->req_queue);
 	for (i = 0; i < SRP_SQ_SIZE; ++i) {
--
MST