[openib-general] Lustre over OpenIB Gen2

Eric Barton eeb at bartonsoftware.com
Thu Nov 10 10:32:30 PST 2005


Hi,

I'm working with Cluster File Systems on Lustre network drivers, including IB
drivers for the Voltaire, Infinicon and Topspin stacks.  These are kernel
drivers which use RC QPs with VERBS for small message queueing and RDMA for
bulk transfers.

We're obviously looking at OpenIB Gen2, and I wonder if people could be so kind
as to answer some questions for me.

1. How stable is the CM API and is it supported by all OpenIB affiliated
   vendors?

2. I'd like to scale to >= 10,000 peer nodes; 1 RC QP per peer.  Is this going
   to get me into trouble?

   For example, I currently create a single PD and CQ for everything, however
   the example I've seen (cmatose.c) appears to create these separately for
   each peer.  Is that what I should be doing too?

3. Is contiguous memory allocation an issue in Gen2?  Since this is such a
   scarce resource in the kernel (and CQ usage with one vendor's stack in
   particular relied heavily on it), what red flags should I be aware of?

4. Are RDMA reads still deprecated?  Which resources hit the spotlight if I
   choose to use them?

5. Should I pre-map all physical memory and do RDMA in page-sized fragments?
   This avoids any mapping overhead at the expense of having much larger
   numbers of queued RDMAs.  Since I try to keep up to 8 (by default) 1MByte
   RDMAs active concurrently to any individual peer, with 4k pages I can have
   up to 2048 RDMA work items queued at a time per peer.

   And if I pre-map, can I be guaranteed that if I put the QP into the error
   state, all remote access to my memory is revoked (e.g. could a QP I create
   after I destroy the one I just shut down somehow alias with it such that a
   pathologically delayed RDMA could write my memory)?

   Or is it better to use FMR pools and take the map/unmap overhead?  If so, is
   there a way to know when the unmap actually hits the hardware and my memory
   is safe?

6. Does Gen2 present substantially the same APIs as the kernel in userspace?
   So if I wrote a userspace equivalent of my kernel driver, could I have pure
   userspace clients talk to kernel servers?
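
To make the arithmetic behind question 5 concrete, here is a minimal sketch in
C of how the per-peer work item count falls out of the numbers above.  The
function names are hypothetical and not part of any stack's API, and the sketch
assumes every transfer starts on a page boundary (an unaligned buffer could
span one extra page).  The all-peers figure is only a worst case, assuming
every peer is saturated at once.

```c
#include <assert.h>
#include <stdint.h>

/* Page-sized fragments needed to cover one transfer, rounding up.
 * Assumes the transfer starts on a page boundary. */
static uint64_t rdma_fragments(uint64_t transfer_bytes, uint64_t page_bytes)
{
    assert(page_bytes > 0);
    return (transfer_bytes + page_bytes - 1) / page_bytes;
}

/* RDMA work items queued to one peer when `concurrent` transfers are
 * active and each is split into page-sized fragments. */
static uint64_t work_items_per_peer(unsigned concurrent,
                                    uint64_t transfer_bytes,
                                    uint64_t page_bytes)
{
    return (uint64_t)concurrent * rdma_fragments(transfer_bytes, page_bytes);
}

/* Worst-case total across all peers (assumption: all peers busy at once). */
static uint64_t work_items_total(uint64_t peers, unsigned concurrent,
                                 uint64_t transfer_bytes, uint64_t page_bytes)
{
    return peers * work_items_per_peer(concurrent, transfer_bytes, page_bytes);
}
```

With 8 concurrent 1MByte transfers and 4k pages, work_items_per_peer(8,
1 << 20, 4096) gives the 2048 quoted above; across 10,000 peers the worst case
would be 20,480,000 queued work items.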

Thanks in advance...

-- 

                Cheers,
                        Eric

---------------------------------------------------
|Eric Barton        Barton Software               |
|9 York Gardens     Tel:    +44 (117) 330 1575    |
|Clifton            Mobile: +44 (7909) 680 356    |
|Bristol BS8 4LL    Fax:    call first            |
|United Kingdom     E-Mail: ----------------------|
---------------------------------------------------


