[ofw] SRP bug?

Leonid Keller leonid at mellanox.co.il
Wed Dec 26 01:50:42 PST 2007


Hi Bill, see the answers below.

> -----Original Message-----
> From: Bill Boas [mailto:bboas at systemfabricworks.com] 
> Sent: Tuesday, December 25, 2007 12:06 AM
> To: 'Randy Kreiser'; Leonid Keller; 
> rob at systemfabricworks.com; ofw at lists.openfabrics.org
> Subject: RE: [ofw] SRP bug?
> 
> Randy, thanks for responding to Leonid's questions.
> 
> Leonid,
> 
> The US Gov customers are anxious to learn if we are making 
> progress understanding the cause(s) of this bug.
> 
> Do you have access to suitable hardware and software in the 
> Mellanox facilities where you are (Yokneam) to duplicate 
> this bug and run further tests to diagnose the root causes?

No. We do not have the same HW and, perhaps more importantly, the
same SW.
The SRP target driver you are working with is a home-grown
development
based on a two-year-old 1.X.0 IB Gold Mellanox SRP driver, which, as far as I
know, had some bugs in it.
Possibly not all of them have been fixed on your side.
On the setups we have here, all tests pass.
I believe it would be worthwhile to have remote access to a similar
setup in order to continue investigating the problem.

> 
> Bill.
> 
> Bill Boas
> VP, Business  Development
> System Fabric Works
> 510-375-8840
> bboas at systemfabricworks.com
> www.systemfabricworks.com
> 
> 
> -----Original Message-----
> From: ofw-bounces at lists.openfabrics.org
> [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Randy Kreiser
> Sent: Monday, December 24, 2007 10:39 AM
> To: 'Leonid Keller'; rob at systemfabricworks.com; 
> ofw at lists.openfabrics.org
> Subject: RE: [ofw] SRP bug?
> 
> Hi Leonid, answers are below!
> 
> Randy
> 
> -----Original Message-----
> From: Leonid Keller [mailto:leonid at mellanox.co.il]
> Sent: Sunday, December 23, 2007 3:53 AM
> To: Randy Kreiser; rob at systemfabricworks.com; 
> ofw at lists.openfabrics.org
> Subject: RE: [ofw] SRP bug?
> 
> Hi Randy,
> 
> Thank you for the reply. 
> A few more questions:
> Does it fail with ModeFlags=0 (the default value)?
> 
> 	A) Yes, it fails with settings of 0,1 and 3
> 
> What SW runs on the target side?
> 
> 	A) We are running a target driver DDN version 3.08
> 
> What kind of device is that appliance (from the SRP point of view)?
> 
> 	A) Dumb block device (RAID controller with 8 luns).
> 
> Does data transfer work in raw mode (without formatting) ?
> 
> 	A) Yes, we set up a CXFS client running Windows and it 
> reads and writes to your heart's content!
> 
> (you can check that with Iometer)
> TIA
> 
> Leonid 
> 
> > -----Original Message-----
> > From: Randy Kreiser [mailto:rkreiser at datadirectnet.com]
> > Sent: Friday, December 21, 2007 4:53 PM
> > To: Leonid Keller; rob at systemfabricworks.com; 
> > ofw at lists.openfabrics.org
> > Subject: RE: [ofw] SRP bug?
> > 
> > Leonid, I set the registry value you wanted to "1" and it fails much
> > quicker, but that was the only change I saw; it still fails the format.
> > 
> > Randy
> > 
> > -----Original Message-----
> > From: Leonid Keller [mailto:leonid at mellanox.co.il]
> > Sent: Thursday, December 20, 2007 4:50 AM
> > To: rob at systemfabricworks.com; ofw at lists.openfabrics.org
> > Cc: Randy Kreiser
> > Subject: RE: [ofw] SRP bug?
> > 
> > Hi Rob,
> > 
> > Thank you for the elaborate analysis. It seems right.
> > I'd like to get some more information, maybe you or someone else can
> > help.
> > 
> > Did this trace come from an IB sniffer?
> > (Otherwise we can't be sure that the corruption happens on the
> > Initiator's side.)
> > 
> > How often does it happen?
> > 
> > How can one reproduce it ?
> > 
> > What SRP target is being used ?
> > 
> > Could we ask someone (and whom) to perform experiments?
> > For example, I'd suggest setting ModeFlags to 1 in 
> > HKLM\SYSTEM\CurrentControlSet\Services\ibsrp\parameters,
> > restarting the SRP driver and rerunning the test.
> > 
> > Leonid
> > 
> > 
> > 
> > > -----Original Message-----
> > > From: ofw-bounces at lists.openfabrics.org 
> > > [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Robert H.B. Netzer
> > > Sent: Tuesday, December 18, 2007 8:38 PM
> > > To: ofw at lists.openfabrics.org
> > > Cc: 'Randy Kreiser'
> > > Subject: [ofw] SRP bug?
> > > 
> > > I have recently been shown a trace of an SRP session between the WinOF
> > > 1.0.1 SRP initiator and a DDN S2A9550 storage appliance that has the
> > > following suspicious SRP_CMD.  It seems to contain a bad virtual
> > > address.  Here is the payload of the send from the initiator to the
> > > appliance (this is a few hundred cmds into the stream):
> > > 
> > > 02000000 00200100 EF010000 00000000
> > > 00000000 00000000 00000000 00000000
> > > 2A000064 00220000 20000000 00000000
> > > 00000000 05A83364 A8002201 00000010
> > > 00004000 03006209 04006309 AA002301
> > > 00004000
> > > 
> > > Consulting the SRP and SCSI specs and decoding this:
> > > 
> > > The first row indicates that it's an SRP_CMD, that there is one
> > > data-out buffer descriptor, and that it's an "indirect data buffer
> > > descriptor" (type 2h, encoded in the high nibble of the sixth byte
> > > above).
> > > 
> > > The SCSI CDB starts in the third row and is a write (10-byte CDB). The
> > > length is 20h blocks (16k bytes).
> > > 
> > > The data-out buffer descriptor starts at byte 48 (fourth row) and
> > > consists of a 16-byte "indirect table memory descriptor", a four-byte
> > > total length (00004000), and one 16-byte "partial memory descriptor"
> > > (there is one of these because the data-out buffer descriptor count,
> > > the 7th byte in the SRP_CMD, is 1).
> > > 
> > > The suspicious part is the partial memory descriptor, which is this
> > > (copying the last four words from above): 03006209
> > > 04006309 AA002301 00004000.  This is a virtual address of
> > > 03006209 04006309, a memory handle (AA002301) that looks like the
> > > other ones earlier in the trace, and a data length of 16k.
> > > 
> > > The SRP stream gets into trouble when the target does an RDMA Read
> > > Request using this virtual address -- it looks bogus.
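
For anyone who wants to double-check the decoding above, here is a minimal
Python sketch that unpacks the quoted payload. It assumes the trace shows the
bytes in wire order (big-endian), that the field offsets follow the SRP_CMD
layout as described in the message and in the SRP spec (buffer descriptor
formats in byte 5, counts in bytes 6 and 7, CDB at byte 32, additional CDB
length in the upper six bits of byte 31), and that blocks are 512 bytes. The
function and key names are made up for illustration; this is not code from the
WinOF or DDN drivers.

```python
import struct

# Payload of the suspicious SRP_CMD, copied from the trace quoted in this thread.
PAYLOAD_HEX = """
02000000 00200100 EF010000 00000000
00000000 00000000 00000000 00000000
2A000064 00220000 20000000 00000000
00000000 05A83364 A8002201 00000010
00004000 03006209 04006309 AA002301
00004000
"""

BLOCK_SIZE = 512  # assumption: "20h blocks (16k bytes)" implies 512-byte blocks


def decode_srp_cmd(payload):
    """Decode an SRP_CMD with a 10-byte CDB and an indirect data-out descriptor."""
    info = {
        "opcode": payload[0],                # 02h = SRP_CMD
        "data_out_format": payload[5] >> 4,  # high nibble of the sixth byte
        "data_in_format": payload[5] & 0x0F,
        "data_out_count": payload[6],        # the 7th byte
        "data_in_count": payload[7],
        "tag": payload[8:16].hex(),
        "lun": payload[20:28].hex(),
    }
    # Additional CDB length is counted in 4-byte words (assumed layout).
    add_cdb_len = (payload[31] >> 2) * 4

    # 10-byte CDB: opcode, flags, 4-byte LBA, group, 2-byte length, control.
    cdb_op, _flags, lba, _group, blocks, _control = struct.unpack_from(
        ">BBIBHB", payload, 32)
    info.update(cdb_opcode=cdb_op, lba=lba, blocks=blocks,
                cdb_bytes=blocks * BLOCK_SIZE)

    # Indirect descriptor: 16-byte table descriptor, 4-byte total length,
    # then one 16-byte partial memory descriptor per data-out count.
    off = 48 + add_cdb_len
    info["table"] = struct.unpack_from(">QII", payload, off)
    (info["total_len"],) = struct.unpack_from(">I", payload, off + 16)
    info["descriptors"] = [
        struct.unpack_from(">QII", payload, off + 20 + 16 * i)
        for i in range(info["data_out_count"])
    ]
    return info


payload = bytes.fromhex("".join(PAYLOAD_HEX.split()))
decoded = decode_srp_cmd(payload)
for va, key, length in decoded["descriptors"]:
    print(f"VA=0x{va:016X} key=0x{key:08X} len=0x{length:X}")
```

Under these assumptions the transfer size from the CDB, the total length, and
the partial descriptor length all agree, so the virtual address is the only
field in this command that stands out.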
> > > 
> > > I'm hoping that someone can double-check my decoding of this packet,
> > > and perhaps Tzachi could take a look.
> > > 
> > > Rob Netzer
> > > System Fabric Works, Inc.
> > > 
> > > 
> > > _______________________________________________
> > > ofw mailing list
> > > ofw at lists.openfabrics.org
> > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
> > > 
> > 
> > 
> > 
> > 
> 
> 
> 
> _______________________________________________
> ofw mailing list
> ofw at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
> 


