[openfabrics-ewg] Current OFED kernel snapshot - problems in backporting SRP to RH4

Ishai Rabinovitz ishai at mellanox.co.il
Thu May 4 08:39:18 PDT 2006


On Wed, May 03, 2006 at 12:07:19AM +0300, Doug Ledford wrote:
> On Tue, May 02, 2006 at 11:30:13AM +0300, Ishai Rabinovitz wrote:
> > We have a problem backporting SRP to RH4 U2 and U3 (in fact, to any
> > kernel earlier than 2.6.13).
> > The problem occurs when the SCSI driver calls eh_abort_handler
> > or eh_device_reset_handler.
> > In the current kernel (starting from 2.6.13) this call is made without
> > the host_lock spinlock held.
> > In the SRP code that performs the abort and the reset (srp_send_tsk_mgmt) we
> > send a message to the target and wait for its response.
> > 
> > In earlier kernel versions the SCSI driver takes host_lock with
> > spin_lock_irqsave before calling the abort or reset handlers.
> > This creates a problem: the SRP driver cannot sleep with the spinlock
> > held while it waits for the target to answer.
> 
> 
> static int srp_abort(struct scsi_cmnd *scmnd)
> {
> 	int rv;
> 	printk(KERN_ERR "SRP abort called\n");
> 
> 	spin_unlock_irq(scmnd->device->host->host_lock);
> 	rv = srp_send_tsk_mgmt(scmnd, SRP_TSK_ABORT_TASK);
> 	spin_lock_irq(scmnd->device->host->host_lock);
> 	return rv;
> }
> 
> Repeat the same change for srp_reset and srp_reset_host.
> 
> -- 
>   Doug Ledford <dledford at redhat.com>
>          Red Hat, Inc. 
>          1801 Varsity Dr.
>          Raleigh, NC 27606
>   

Hi Doug,

Thank you for your suggestion. I tried this but it did not work.

Possible reasons why it did not work:
1) The patch on 2.6.9 may only expose a bug that also exists in the current kernel.
2) The SCSI driver may be holding host_lock to avoid a race with another
operation it performs (for example, freeing this target's memory).
3) Since the SCSI driver takes the lock with spin_lock_irqsave, srp_abort may be
called in a context where interrupts must stay disabled. The spin_unlock_irq in the
patch enables interrupts unconditionally, and this may cause the problems.
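
To make reason 3 concrete, here is a small user-space toy model of the
situation. It is not kernel code and it does not prove that this is what goes
wrong; it only shows the state change I am worried about. All model_* names
are made up for illustration, and a plain bool stands in for the CPU's
interrupt-enable state.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one CPU's interrupt-enable state and one spinlock. */
static bool irqs_enabled = true;
static bool lock_held = false;

/* Models spin_lock_irqsave(): save the IRQ state, disable IRQs, lock. */
static void model_lock_irqsave(bool *flags)
{
	assert(!lock_held);
	*flags = irqs_enabled;
	irqs_enabled = false;
	lock_held = true;
}

/* Models spin_unlock_irqrestore(): unlock, put the saved IRQ state back. */
static void model_unlock_irqrestore(bool flags)
{
	assert(lock_held);
	lock_held = false;
	irqs_enabled = flags;
}

/* Models spin_unlock_irq(): unlock and enable IRQs unconditionally. */
static void model_unlock_irq(void)
{
	assert(lock_held);
	lock_held = false;
	irqs_enabled = true;
}

/* Models spin_lock_irq(): disable IRQs and lock. */
static void model_lock_irq(void)
{
	assert(!lock_held);
	irqs_enabled = false;
	lock_held = true;
}

/*
 * Stand-in for the patched srp_abort(). Returns the IRQ state that is in
 * effect at the point where the real handler would wait for the target.
 */
static bool model_abort_handler(void)
{
	bool irqs_during_wait;

	model_unlock_irq();
	irqs_during_wait = irqs_enabled;	/* real code would sleep here */
	model_lock_irq();
	return irqs_during_wait;
}

/*
 * Stand-in for the pre-2.6.13 caller: takes the lock with the irqsave
 * variant, calls the handler, then restores the caller's IRQ state.
 */
static bool model_midlayer_call(bool caller_irqs_enabled)
{
	bool flags, seen;

	irqs_enabled = caller_irqs_enabled;
	model_lock_irqsave(&flags);
	seen = model_abort_handler();
	model_unlock_irqrestore(flags);
	return seen;
}
```

In this model, model_midlayer_call(false) returns true: the caller entered
with interrupts disabled, yet the handler runs with them enabled. A handler
cannot fix this with the irqrestore variant, because the saved flags live
in the caller and are not passed down to it.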

Any suggestions?
-- 
Ishai Rabinovitz


