[ofw] SRP bug?

Randy Kreiser rkreiser at datadirectnet.com
Fri Dec 21 06:53:04 PST 2007


Leonid, I set the registry value you suggested to "1". It fails much more quickly,
but that was the only change I saw: the format still fails.

Randy

-----Original Message-----
From: Leonid Keller [mailto:leonid at mellanox.co.il] 
Sent: Thursday, December 20, 2007 4:50 AM
To: rob at systemfabricworks.com; ofw at lists.openfabrics.org
Cc: Randy Kreiser
Subject: RE: [ofw] SRP bug?

Hi Rob,

Thank you for the elaborate analysis. It seems right.
I'd like to get some more information; maybe you or someone else can
help.

Did this trace come from an IB sniffer?
(Otherwise we can't be sure that the corruption happens on the initiator's
side.)

How often does it happen?

How can one reproduce it?

What SRP target is being used?

Could we ask someone (and if so, whom) to perform experiments?
For example, I'd suggest setting ModeFlags to 1 in
HKLM\SYSTEM\CurrentControlSet\Services\ibsrp\parameters, restarting the SRP
driver, and rerunning the test.

Leonid



> -----Original Message-----
> From: ofw-bounces at lists.openfabrics.org 
> [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of 
> Robert H.B. Netzer
> Sent: Tuesday, December 18, 2007 8:38 PM
> To: ofw at lists.openfabrics.org
> Cc: 'Randy Kreiser'
> Subject: [ofw] SRP bug?
> 
> I have recently been shown a trace of an SRP session between 
> the WinOF 1.0.1 SRP initiator and a DDN S2A9550 storage 
> appliance that has the following suspicious SRP_CMD.  It 
> seems to contain a bad virtual address.  Here is the payload 
> of the send from the initiator to the appliance (this is a 
> few hundred cmds into the stream):
> 
> 02000000 00200100 EF010000 00000000
> 00000000 00000000 00000000 00000000
> 2A000064 00220000 20000000 00000000
> 00000000 05A83364 A8002201 00000010
> 00004000 03006209 04006309 AA002301
> 00004000
> 
> Consulting the SRP and SCSI specs and decoding this:
> 
> The first row indicates that it's an SRP_CMD, that there is 
> one data-out buffer descriptor, and that it's an "indirect 
> data buffer descriptor" (type 2h, encoded in the high nibble 
> of the sixth byte above).
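A minimal Python sketch of the header decoding described above, assuming the SRP_CMD layout in which the sixth byte carries the data-out (high nibble) and data-in (low nibble) descriptor formats and the next two bytes carry the descriptor counts. The names DUMP, FORMAT_NAMES and decode_srp_cmd_header are illustrative only; they are not taken from WinOF or any SRP library.

```python
# Payload copied from the trace above; SRP fields are big-endian.
DUMP = """
02000000 00200100 EF010000 00000000
00000000 00000000 00000000 00000000
2A000064 00220000 20000000 00000000
00000000 05A83364 A8002201 00000010
00004000 03006209 04006309 AA002301
00004000
"""

payload = bytes.fromhex("".join(DUMP.split()))

SRP_CMD = 0x02
# Data buffer descriptor format codes (2 = indirect, as in the analysis above).
FORMAT_NAMES = {0: "none", 1: "direct", 2: "indirect"}


def decode_srp_cmd_header(iu: bytes) -> dict:
    """Decode the first 16 bytes (header fields and tag) of an SRP_CMD IU."""
    fmt = iu[5]
    return {
        "iu_type": iu[0],
        "data_out_format": fmt >> 4,   # high nibble of the sixth byte
        "data_in_format": fmt & 0x0F,  # low nibble of the sixth byte
        "data_out_count": iu[6],       # seventh byte
        "data_in_count": iu[7],        # eighth byte
        "tag": iu[8:16].hex(),
    }


hdr = decode_srp_cmd_header(payload)
print(hdr["iu_type"] == SRP_CMD,
      FORMAT_NAMES[hdr["data_out_format"]],
      hdr["data_out_count"])  # prints: True indirect 1
```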
> 
> The SCSI CDB starts in the third row and is a WRITE(10) (opcode 
> 2Ah, a 10-byte CDB). The transfer length is 20h blocks (16k bytes).
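The CDB can be checked the same way. This sketch assumes 512-byte logical blocks, which is consistent with the 00004000 total length in the data-out descriptor but is not stated in the trace; decode_write10 is a hypothetical helper.

```python
import struct

# Third row of the dump: the 16-byte CDB field of the SRP_CMD.
cdb = bytes.fromhex("2A000064" "00220000" "20000000" "00000000")

BLOCK_SIZE = 512  # assumption: 512-byte logical blocks on the target LUN


def decode_write10(cdb: bytes) -> tuple:
    """Return (LBA, transfer length in blocks) from a WRITE(10) CDB."""
    if cdb[0] != 0x2A:
        raise ValueError(f"not a WRITE(10) CDB: opcode {cdb[0]:#04x}")
    (lba,) = struct.unpack_from(">I", cdb, 2)     # bytes 2-5, big-endian LBA
    (blocks,) = struct.unpack_from(">H", cdb, 7)  # bytes 7-8, transfer length
    return lba, blocks


lba, blocks = decode_write10(cdb)
print(hex(lba), blocks, blocks * BLOCK_SIZE)  # prints: 0x640022 32 16384
```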
> 
> The data-out buffer descriptor starts at byte 48 (fourth row) 
> and consists of a 16-byte "indirect table memory descriptor", 
> a four-byte total length (00004000), and one 16-byte "partial 
> memory descriptor" (there is one of these because the 
> data-out buffer descriptor count, the 7th byte in the SRP_CMD, is 1).
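The data-out buffer descriptor can be unpacked with Python's struct module, assuming the 16-byte SRP memory descriptor layout described above (8-byte virtual address, 4-byte memory handle, 4-byte length, all big-endian). parse_mem_desc and parse_indirect are illustrative names, and the two consistency checks are ones that happen to hold for this IU, not checks the target is known to perform.

```python
import struct

# Payload copied from the trace above; SRP fields are big-endian.
DUMP = """
02000000 00200100 EF010000 00000000
00000000 00000000 00000000 00000000
2A000064 00220000 20000000 00000000
00000000 05A83364 A8002201 00000010
00004000 03006209 04006309 AA002301
00004000
"""
payload = bytes.fromhex("".join(DUMP.split()))

DESC_LEN = 16  # 8-byte VA + 4-byte memory handle + 4-byte length


def parse_mem_desc(buf: bytes, off: int) -> tuple:
    """Unpack one SRP memory descriptor as (virtual address, handle, length)."""
    return struct.unpack_from(">QII", buf, off)


def parse_indirect(buf: bytes, off: int, count: int):
    """Unpack an indirect data buffer descriptor starting at byte `off`."""
    table = parse_mem_desc(buf, off)  # indirect table memory descriptor
    (total_len,) = struct.unpack_from(">I", buf, off + DESC_LEN)
    first = off + DESC_LEN + 4        # partial memory descriptors follow
    partials = [parse_mem_desc(buf, first + DESC_LEN * i) for i in range(count)]
    return table, total_len, partials


# Descriptor starts at byte 48; the data-out count is the seventh byte.
table, total_len, partials = parse_indirect(payload, 48, payload[6])

# In this IU, the 16-byte table holds exactly the one included descriptor,
# and the partial lengths add up to the total length.
assert table[2] == DESC_LEN * len(partials)
assert sum(length for _, _, length in partials) == total_len

for va, handle, length in partials:
    print(f"va={va:#018x} handle={handle:#010x} len={length:#x}")
# prints: va=0x0300620904006309 handle=0xaa002301 len=0x4000
```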
> 
> The suspicious part is the partial memory descriptor, which 
> is this (copying the last four words from above): 03006209 
> 04006309 AA002301 00004000.  This is a virtual address of 
> 03006209 04006309, a memory handle (AA002301) that looks like 
> the other ones earlier in the trace, and a data length of 16k.
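The trace does not say what kind of address the initiator places in these descriptors. If they are host virtual addresses from an x86-64 machine with 48-bit virtual addressing (an assumption), a quick canonical-address test separates the two addresses in this IU: the table address passes and the partial descriptor's address does not. is_canonical_48 is a hypothetical heuristic, not part of SRP.

```python
# Addresses from this SRP_CMD: the indirect table descriptor and the
# partial memory descriptor.
TABLE_VA = 0x0000000005A83364
PARTIAL_VA = 0x0300620904006309


def is_canonical_48(va: int) -> bool:
    """True if va is canonical under 48-bit x86-64 virtual addressing
    (assumed host model): bits 63..47 are all zeros or all ones."""
    top = va >> 47  # the 17 bits 63..47 of a 64-bit address
    return top == 0 or top == (1 << 17) - 1


for name, va in (("table", TABLE_VA), ("partial", PARTIAL_VA)):
    print(name, hex(va), is_canonical_48(va))
```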
> 
> The SRP stream gets into trouble when the target issues an RDMA 
> Read Request using this virtual address, which looks bogus.
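On InfiniBand the SRP memory handle serves as the R_Key, and an RDMA Read Request carries an RDMA Extended Transport Header (RETH) holding a 64-bit virtual address, a 32-bit R_Key and a 32-bit DMA length. The sketch below, using a hypothetical helper reth_from_descriptor, shows how a target would copy the descriptor fields into that header, which is why a bad address in the SRP_CMD shows up directly in the target's RDMA Read. The optional offset and length arguments model a target splitting the transfer into several reads; whether the DDN target does so is not known from the trace.

```python
import struct

# Partial memory descriptor from the SRP_CMD (last four words of the dump).
DESC = bytes.fromhex("03006209" "04006309" "AA002301" "00004000")


def reth_from_descriptor(desc: bytes, offset: int = 0, length=None) -> bytes:
    """Build a 16-byte RETH (VA, R_Key, DMA length), big-endian, for reading
    `length` bytes at `offset` within the buffer the descriptor describes."""
    va, rkey, buf_len = struct.unpack(">QII", desc)
    if length is None:
        length = buf_len - offset
    if offset < 0 or length < 0 or offset + length > buf_len:
        raise ValueError("RDMA read would fall outside the described buffer")
    return struct.pack(">QII", va + offset, rkey, length)


# A single read of the whole 16k buffer reuses the descriptor fields verbatim.
print(reth_from_descriptor(DESC).hex())
```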
> 
> I'm hoping that someone can double-check my decoding of this 
> packet, and perhaps Tzachi could take a look.
> 
> Rob Netzer
> System Fabric Works, Inc.
> 