[openib-general] nfsrdma release 7 issues

Tom Tucker tom at opengridcomputing.com
Tue Dec 12 15:36:14 PST 2006


Vu:

See below...

On Tue, 2006-12-12 at 15:01 -0800, Vu Pham wrote:
> James,
>   Besides the double page free issue that Tom already fixed, I see the
> following issues:
> 1. Simultaneous nfsrdmamount from multiple hosts. I see the
> following error messages:
> ...
> Dec 12 13:31:40 ibd202 kernel: svcrdma: QP event 4 received for 
> QP=ffff810240f5fa00
> Dec 12 13:34:17 ibd202 kernel: svcrdma: QP event 4 received for 
> QP=ffff810240f5f000
> Dec 12 13:34:17 ibd202 kernel: svcrdma: QP event 4 received for 
> QP=ffff810242cfa400

This is the known race in the IB CM that led to the addition of the
rdma_establish interface. For RNFS it is a benign message, but I do need
to add the call. I'm not fond of the rdma_establish solution, so I've
dragged my feet. Thanks for reminding me ;-)
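
For reference, the event number in those log lines can be decoded against
the kernel's enum ib_event_type. The following is only a userspace sketch,
not svcrdma code: the table copies the first entries of that enum as
declared in include/rdma/ib_verbs.h (check it against your own tree), and
the helper name qp_event_name is hypothetical.

```c
#include <stddef.h>

/* First entries of enum ib_event_type, copied from include/rdma/ib_verbs.h
 * so that this builds outside the kernel. Assumption: the ordering matches
 * the tree you are running. */
static const char *const ib_event_names[] = {
    "IB_EVENT_CQ_ERR",        /* 0 */
    "IB_EVENT_QP_FATAL",      /* 1 */
    "IB_EVENT_QP_REQ_ERR",    /* 2 */
    "IB_EVENT_QP_ACCESS_ERR", /* 3 */
    "IB_EVENT_COMM_EST",      /* 4 */
};

/* Hypothetical helper: map a logged QP event number to its enum name. */
const char *qp_event_name(int event)
{
    size_t n = sizeof(ib_event_names) / sizeof(ib_event_names[0]);

    if (event < 0 || (size_t)event >= n)
        return "unknown";
    return ib_event_names[event];
}
```

With this table, qp_event_name(4) gives IB_EVENT_COMM_EST, the
"communication established" event a QP raises when it sees traffic before
the CM has finished the handshake. That is consistent with the race
described above.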

> 
> 2.  While some clients run I/O, one idle client tries to access the mount
> point (e.g. with *ls*) and gets an I/O error. I see these error messages in
> the server log
> 
> Dec 12 13:58:29 ibd202 kernel: nfsd: terminating on error 22
> Dec 12 13:58:29 ibd202 kernel: svcrdma: bad WR completion
> Dec 12 13:58:29 ibd202 kernel:  ctxt=ffff810242130800, count=1 on 
> xprt=ffff8102431c0400, rqstp=ffff8102414cdc00, status=5
> ...
> Dec 12 14:04:29 ibd202 kernel: ib_mthca 0000:08:00.0: CQ entry for 
> unknown QP 2e0408
> 
> Then the mount point is inaccessible from all clients

Ooh. This looks bad. This isn't concurrent with issue 1 above, is it?
Was the "idle" client idle for more than 6 minutes?
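
To help read the "bad WR completion" line, the status value can be looked
up in the kernel's enum ib_wc_status. Again a userspace sketch under
assumptions: the table copies the first entries of that enum from
include/rdma/ib_verbs.h, and wc_status_name is a hypothetical helper, not
something in svcrdma.

```c
#include <stddef.h>

/* First entries of enum ib_wc_status, copied from include/rdma/ib_verbs.h
 * so that this builds outside the kernel. Assumption: the ordering matches
 * the tree you are running. */
static const char *const ib_wc_status_names[] = {
    "IB_WC_SUCCESS",        /* 0 */
    "IB_WC_LOC_LEN_ERR",    /* 1 */
    "IB_WC_LOC_QP_OP_ERR",  /* 2 */
    "IB_WC_LOC_EEC_OP_ERR", /* 3 */
    "IB_WC_LOC_PROT_ERR",   /* 4 */
    "IB_WC_WR_FLUSH_ERR",   /* 5 */
};

/* Hypothetical helper: map a logged completion status to its enum name. */
const char *wc_status_name(int status)
{
    size_t n = sizeof(ib_wc_status_names) / sizeof(ib_wc_status_names[0]);

    if (status < 0 || (size_t)status >= n)
        return "unknown";
    return ib_wc_status_names[status];
}
```

Here wc_status_name(5) gives IB_WC_WR_FLUSH_ERR, which is what outstanding
work requests complete with once their QP has gone into the error state.
The "error 22" in the nfsd line is presumably errno 22, EINVAL on Linux.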

> 
> 3. Performance issue: I got at most 450 MB/s reading from the server cache
> (compared to 800 MB/s with release 6, using the same hw configuration
> for both client/server)
> 

Oof... 

1. I get much better than this on my MTD1000 hardware with SDR. Can you
send me your .config?

2. Can you please send me the iozone test parameters you're using?

Thanks,
Tom
> thanks,
> -vu




