[openib-general] Performance Degradation with OFED v. Voltaire

Bernadat, Philippe philippe_bernadat at hp.com
Thu Dec 14 08:11:37 PST 2006


> I guess the difference must be in the Lustre NAL, since you say other
> userspace code gets comparable performance.  Is there any difference
> in the architecture of the NAL for the Voltaire stack and the standard
> Linux stack?

I think Eric described the major differences earlier on. Here is his
message; see the second half:

On Tue, 2006-12-05 at 12:22 +0000, Eric Barton wrote:
> Hi,
> 
> We'd dearly like some help to understand why we seem to be having
> performance issues with OFED.  When we run a lustre network bandwidth
> benchmark, we find significant performance degradation on OFED versus
> Voltaire...
> 
>              Premap (256 RDMA frags)     Map on demand (1 RDMA frag)
>              Voltaire  OFED  Ratio       Voltaire  OFED  Ratio 
> Writes MB/s  682       567   83 %        577       436   75 %
> Reads MB/s   658       554   84 %        555       432   77 %
> 
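The Ratio columns can be reproduced from the bandwidth figures. Below is
a minimal Python sketch; the dictionary layout and the function name are
illustrative and not part of the benchmark. Note that the quoted
percentages match truncation rather than rounding (436/577 is about
75.6 %, shown as 75 %).

```python
# Bandwidth figures (MB/s) copied from the table above.
# Keys are (operation, mode); values are (Voltaire, OFED).
RESULTS = {
    ("write", "premap"): (682, 567),
    ("read", "premap"): (658, 554),
    ("write", "map_on_demand"): (577, 436),
    ("read", "map_on_demand"): (555, 432),
}


def ofed_ratio_percent(voltaire_mbps, ofed_mbps):
    """OFED bandwidth as a whole-number percentage of Voltaire's (truncated)."""
    return ofed_mbps * 100 // voltaire_mbps


if __name__ == "__main__":
    for (op, mode), (voltaire, ofed) in RESULTS.items():
        print(f"{op:5s} {mode:14s} {ofed_ratio_percent(voltaire, ofed)} %")
```
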
> These tests measure the bandwidth of 1MByte transfers pipelined 8 deep.
> All hardware/software was the same, apart from the IB stack and the
> lustre network driver.
> 
> The architecture of the lustre network drivers for OFED and Voltaire
> is almost identical.  Both use RC QPs with the same control message
> protocol to set up bulk data transfers using RDMA WRITE.  Control
> messages use a credit flow protocol to ensure that they are only sent
> when buffers are posted to receive them.  Concurrent transfers over
> the same QP are supported so that lustre can pipeline bulk I/O.
> 
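The credit flow idea described above can be illustrated with a small
model. This is only a sketch of the general technique, not the Lustre
driver code: the class name, its methods, and the initial credit count
are all hypothetical, and real drivers carry credits inside the control
messages themselves.

```python
from collections import deque


class CreditedSender:
    """Sends control messages only while it holds credits from the peer."""

    def __init__(self, initial_credits):
        self.credits = initial_credits  # receive buffers the peer has posted
        self.pending = deque()          # messages waiting for a credit
        self.sent = []                  # messages actually put on the wire

    def send(self, msg):
        """Queue a message; it goes out at once only if a credit is free."""
        self.pending.append(msg)
        self._drain()

    def return_credits(self, n):
        """The peer reposted n receive buffers and reported them back."""
        self.credits += n
        self._drain()

    def _drain(self):
        # Each message sent consumes one posted receive buffer at the peer.
        while self.credits > 0 and self.pending:
            self.credits -= 1
            self.sent.append(self.pending.popleft())


if __name__ == "__main__":
    sender = CreditedSender(initial_credits=2)
    for i in range(3):
        sender.send(f"msg{i}")
    print(sender.sent)        # third message is held back
    sender.return_credits(1)
    print(sender.sent)        # now all three have been sent
```

The point of the scheme is that a message is never sent unless the peer
is known to have a buffer posted for it, so the receiver cannot be
overrun.
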
> The only difference between the lustre network drivers is that the
> Voltaire driver has a single global CQ and the OFED driver has 1 CQ
> per QP.  However, the measurements above are for a single pair of
> nodes - in this case both implementations use a single CQ.
> 
> By default, the drivers pre-map all of physical memory so each RDMA
> consists of page fragments.  However, we can also compile both
> drivers to map on demand using FMR so that RDMA is not fragmented.
> The results above compare both methods and although both drivers
> perform worse when mapping, the OFED driver takes the bigger hit.
> 
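Two pieces of arithmetic sit behind this paragraph, sketched below in
Python. The 4 KiB page size is an assumption (it is consistent with the
256 fragments in the table header if a 1 MByte transfer is taken as 2^20
bytes and the buffer is page aligned); the function names are
illustrative.

```python
PAGE_SIZE = 4096         # assumed page size in bytes
TRANSFER_SIZE = 1 << 20  # 1 MByte transfer, taken as 2**20 bytes


def page_fragments(transfer_size, page_size=PAGE_SIZE):
    """Page-sized fragments needed for a page-aligned buffer (ceiling)."""
    return -(-transfer_size // page_size)


def mapping_hit_percent(premap_mbps, mapped_mbps):
    """Bandwidth lost going from premap to map on demand, in percent."""
    return 100.0 * (premap_mbps - mapped_mbps) / premap_mbps


# Write bandwidth (MB/s) from the table: (premap, map on demand).
WRITES = {"Voltaire": (682, 577), "OFED": (567, 436)}

if __name__ == "__main__":
    print("fragments per transfer:", page_fragments(TRANSFER_SIZE))
    for stack, (premap, mapped) in WRITES.items():
        print(f"{stack}: {mapping_hit_percent(premap, mapped):.1f} % slower")
```

For writes this gives a drop of roughly 15 % on Voltaire against roughly
23 % on OFED, which is the "bigger hit" referred to above.
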
> We'd be delighted if anyone can shed any light or can suggest any
> steps we should take to discover the reason.  We're also very willing
> to provide assistance if any of the OpenFabrics developers wants to
> duplicate the setup.
> 

 

> -----Original Message-----
> From: Roland Dreier [mailto:rdreier at cisco.com] 
> Sent: Thursday, December 14, 2006 5:06 PM
> To: Bernadat, Philippe
> Cc: Hal Rosenstock; Tziporet Koren; openib-general at openib.org
> Subject: Re: [openib-general] Performance Degradation with OFED v. Voltaire
> 
> OK, it looks like the PCI config is OK.
> 
> I guess the difference must be in the Lustre NAL, since you say other
> userspace code gets comparable performance.  Is there any difference
> in the architecture of the NAL for the Voltaire stack and the standard
> Linux stack?
> 
> You may have to rely on Voltaire and/or the Lustre people to fix this,
> since they're the only ones with the complete picture about the
> Voltaire stack.
> 
>  - R.
> 



