[ofa-general] iSER data corruption issues

Tom Tucker tom at opengridcomputing.com
Wed Oct 3 11:02:22 PDT 2007


On Wed, 2007-10-03 at 13:42 -0400, Pete Wyckoff wrote: 
> How does the requester (in IB speak) know that an RDMA Write
> operation has completed on the responder?
> 
> We have a software iSER target, available at git.osc.edu/tgt or
> browse at http://git.osc.edu/?p=tgt.git .  Using the existing
> in-kernel iSER initiator code, very rarely data corruption occurs,
> in that the received data from SCSI read operations does not match
> what was expected.  Sometimes it appears as if random kernel memory
> has been scribbled on by an errant RDMA write from the target.  My
> current working theory is that the RDMA write has not completed by the
> time the initiator looks at its incoming data buffer.
> 
> Single RC QP, single CQ, no SRQ.  Only Send, Receive, and RDMA Write
> work requests are used.  After everything is connected up, a SCSI
> read sequence looks like:
> 
>     initiator: register pages with FMR, write test pattern
>     initiator: Send request to target
>     target:    Recv request
>     target:    RDMA Write response to initiator
>     target:    Wait for CQ entry for local RDMA Write completion
Pete:

I don't think this should be necessary...

>     target:    Send response to initiator

...as long as the send is posted on the same SQ as the write.

>     initiator: Recv response, access buffer
> 
> On very rare occasions, this buffer will have the test pattern, not
> the data that the target just sent.
> 
> Machines are opteron, fedora 7 up-to-date with its openfab libs,
> kernel 2.6.23-rc6 on target.  Either 2.6.23-rc6 or 2.6.22 or
> 2.6.18-rhel5 on initiator.  For some reason, it is much easier to
> produce with the rhel5 kernel.  One site with fast disks can see
> similar corruption with 2.6.23-rc6, however.  Target is pure
> userspace.  Initiator is in kernel and is poked by "lmdd" (like
> normal dd) through an iSCSI block device (/dev/sdb).
> 
> The IB spec seems to indicate that the contents of the RDMA Write
> buffer should be stable after completion of a subsequent send
> message (o9-20).  In fact, the "Wait for CQ entry" step on the
> target should be unnecessary, no?

I think so too.

> 
> Could there be some caching issues that the initiator is missing?
> I've added print[fk]s to the initiator and target to verify that the
> sequence of events is truly as above, and that the virtual addresses
> are as expected on both sides.
> 
> Any suggestions or advice would help.  Thanks,
> 

If your theory is correct, the data should eventually show up. Does it?

Does your code check for errors from dma_map_single()/dma_map_page()?

> 		-- Pete
> 
> 
> P.S.  Here are some debugging printfs not in the git.
> 
> Userspace code does 200 read()s of length 8000, but complains about
> the result somewhere in the 14th read, from 112000 to 120000, and
> exits early.  Expected pattern is a series of 400000 4-byte words,
> incrementing by 4, starting from 0.  So 0x00000000, 0x00000004, ...,
> 0x001869fc:
> 
> % lmdd of=internal ipat=1 if=/dev/sdb bs=8000 count=200 mismatch=10
> off=112000 want=1c000 got=3b3b3b3b
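
For anyone trying to reproduce the check outside lmdd, here is a minimal sketch of the pattern as described above: each 4-byte word holds its own byte offset. The function names are mine, and I am assuming little-endian 32-bit words (the machines are Opterons); this is not lmdd's actual code.

```python
import struct

def expected_word(byte_offset):
    # Pattern described in the post: the word at byte offset N holds N.
    return byte_offset & 0xFFFFFFFF

def make_pattern(start_offset, length):
    # Build `length` bytes of the pattern starting at `start_offset`.
    # Assumes little-endian 32-bit words and length a multiple of 4.
    return b"".join(
        struct.pack("<I", expected_word(start_offset + i))
        for i in range(0, length, 4)
    )

def first_mismatch(buf, start_offset):
    # Return (absolute_offset, want, got) for the first bad word, or None.
    usable = len(buf) - (len(buf) % 4)
    for i in range(0, usable, 4):
        (got,) = struct.unpack_from("<I", buf, i)
        want = expected_word(start_offset + i)
        if got != want:
            return (start_offset + i, want, got)
    return None

if __name__ == "__main__":
    # Simulate the 14th 8000-byte read (bytes 112000..120000) with the
    # 4096-byte page at 0x1c000 still holding the 0x3b test pattern.
    buf = bytearray(make_pattern(112000, 8000))
    page = 0x1c000 - 112000
    buf[page:page + 4096] = b"\x3b" * 4096
    off, want, got = first_mismatch(bytes(buf), 112000)
    print(f"off={off} want={want:x} got={got:x}")
```

Note that 0x1c000 is byte 114688, which falls inside the 112000 to 120000 read, so the reported "want" value is the offset of the first bad word rather than the start of the read.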
> 
> Initiator generates a series of SCSI operations, as driven by
> readahead and the block queue scheduler.  You can see that it starts
> reading 4 pages, then 1 page, then 23 pages, then 1 page and so on,
> in order.  These sizes and offsets vary from run to run.  Each line
> here is printed after the SCSI read response has been received.  It
> prints the first word in the buffer, and you can see the test
> pattern where data should be:
> 
> tag 02 va 36061000 len  4000 word0 00000000 ref 1
> tag 03 va 36065000 len  1000 word0 00004000 ref 1
> tag 04 va 36066000 len 17000 word0 00005000 ref 1
> tag 05 va 7b6bc000 len  1000 word0 3b3b3b3b ref 1

Is it interesting that the bad word occurs on the first page of the new
map?

> tag 06 va 7b6bd000 len 1f000 word0 0001d000 ref 1
> tag 07 va 7bdc2000 len 20000 word0 0003c000 ref 1
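
To make the "first page of the new map" question easier to check across runs, here is a rough sketch that walks the initiator log lines quoted in this message. It flags any request whose first word differs from the running byte offset, and any request whose va does not continue from the end of the previous one. Treating a non-contiguous va as "a new map" is my assumption, as are the function and field names.

```python
import re

# The six initiator lines quoted in this message.
LOG = """\
tag 02 va 36061000 len  4000 word0 00000000 ref 1
tag 03 va 36065000 len  1000 word0 00004000 ref 1
tag 04 va 36066000 len 17000 word0 00005000 ref 1
tag 05 va 7b6bc000 len  1000 word0 3b3b3b3b ref 1
tag 06 va 7b6bd000 len 1f000 word0 0001d000 ref 1
tag 07 va 7bdc2000 len 20000 word0 0003c000 ref 1
"""

LINE = re.compile(
    r"tag (\w+) va ([0-9a-f]+) len\s+([0-9a-f]+) word0 ([0-9a-f]+)"
)

def analyze(log):
    # Reads are issued in order, so the expected first word of each
    # request is the sum of the lengths of all earlier requests.
    results = []
    offset = 0
    prev_end = None
    for line in log.splitlines():
        m = LINE.search(line)
        if not m:
            continue
        tag = m.group(1)
        va, length, word0 = (int(x, 16) for x in m.group(2, 3, 4))
        results.append({
            "tag": tag,
            "want": offset,
            "got": word0,
            "ok": word0 == offset,
            # Assumption: a va that does not follow the previous
            # request's end address marks the start of a new mapping.
            "new_map": prev_end is not None and va != prev_end,
        })
        offset += length
        prev_end = va + length
    return results

if __name__ == "__main__":
    for r in analyze(LOG):
        status = "ok " if r["ok"] else "BAD"
        note = " (new map)" if r["new_map"] else ""
        print(f"tag {r['tag']} {status} want={r['want']:08x} "
              f"got={r['got']:08x}{note}")
```

On this one log, tag 05 is both the bad request and the first one at a non-contiguous va. Tag 07 also starts at a non-contiguous va and is fine, so a single run does not settle it; running the same check over several failing runs would show whether the correlation holds.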
> 
> The userspace target code prints a line when it starts the RDMA
> write, then a line when the RDMA write completes locally, then a
> line when it sends the response.  The tags are what the initiator
> assigned to each request.  The target thinks it is sending a
> 4096-byte buffer that has 0x1c000 as its first word, but the
> initiator did not see it:
> 
> tag 02 va 36061000 len  4000 word0 00000000 rdmaw
> tag 02 rdmaw completion
> tag 02 resp
> tag 03 va 36065000 len  1000 word0 00004000 rdmaw
> tag 03 rdmaw completion
> tag 03 resp
> tag 04 va 36066000 len 17000 word0 00005000 rdmaw
> tag 04 rdmaw completion
> tag 04 resp
> tag 05 va 7b6bc000 len  1000 word0 0001c000 rdmaw
> tag 05 rdmaw completion
> tag 05 resp
> tag 06 va 7b6bd000 len 1f000 word0 0001d000 rdmaw
> tag 06 rdmaw completion
> tag 07 va 7bdc2000 len 20000 word0 0003c000 rdmaw
> tag 07 rdmaw completion
> tag 06 resp
> tag 07 resp
> 