[openib-general] Performance Degradation with OFED v. Voltaire
Bernadat, Philippe
philippe_bernadat at hp.com
Thu Dec 14 08:11:37 PST 2006
> I guess the difference must be in the Lustre NAL, since you say other
> userspace code gets comparable performance. Is there any difference
> in the architecture of the NAL for the Voltaire stack and the standard
> Linux stack?
I think Eric described the major differences earlier on. Here is his
message; see the second half:
On Tue, 2006-12-05 at 12:22 +0000, Eric Barton wrote:
> Hi,
>
> We'd dearly like some help to understand why we seem to be having
> performance issues with OFED. When we run a lustre network bandwidth
> benchmark, we find significant performance degradation on OFED versus
> Voltaire...
>
>                 Premap (256 RDMA frags)    Map on demand (1 RDMA frag)
>                 Voltaire  OFED  Ratio      Voltaire  OFED  Ratio
> Writes (MB/s)   682       567   83 %       577       436   75 %
> Reads  (MB/s)   658       554   84 %       555       432   77 %
>
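As a quick check on the Ratio columns in the table above, here is a minimal Python sketch that recomputes them from the quoted bandwidth figures. The names `RESULTS` and `ratio_percent` are mine, not from the post, and the truncating division is an assumption chosen because it reproduces the percentages as reported.

```python
# Bandwidth figures (MB/s) copied from the table above: (Voltaire, OFED).
RESULTS = {
    ("premap", "writes"): (682, 567),
    ("premap", "reads"): (658, 554),
    ("map on demand", "writes"): (577, 436),
    ("map on demand", "reads"): (555, 432),
}


def ratio_percent(voltaire, ofed):
    """Return OFED bandwidth as a whole percentage of Voltaire bandwidth.

    Integer division truncates; this is an assumption about how the
    original percentages were produced, not something the post states.
    """
    return ofed * 100 // voltaire


for (mode, op), (voltaire, ofed) in RESULTS.items():
    print(f"{mode:14s} {op:6s} {ratio_percent(voltaire, ofed)} %")
```

With truncation, all four values agree with the table; rounding to nearest would give 76 % and 78 % for the map-on-demand case.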
> These tests measure the bandwidth of 1MByte transfers pipelined 8 deep.
> All hardware/software was the same, apart from the IB stack and the
> lustre network driver.
>
> The architecture of the lustre network drivers for OFED and Voltaire
> are almost identical. Both use RC QPs with the same control message
> protocol to set up bulk data transfers using RDMA WRITE. Control
> messages use a credit flow protocol to ensure that they are only sent
> when buffers are posted to receive them. Concurrent transfers over the
> same QP are supported so that lustre can pipeline bulk I/O.
>
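For readers unfamiliar with credit-based flow control, the idea described above can be sketched as follows. This is only an illustration of the general technique, not the Lustre driver's code: the class `CreditedSender`, its methods, and the `transmit` callback are all hypothetical names, and real drivers exchange credits inside the control messages themselves.

```python
from collections import deque


class CreditedSender:
    """Hypothetical sender that transmits only while it holds credits.

    One credit stands for one receive buffer the peer has posted.
    """

    def __init__(self, initial_credits, transmit):
        self.credits = initial_credits  # buffers the peer has posted for us
        self.transmit = transmit        # callable that puts a message on the wire
        self.pending = deque()          # messages waiting for a credit

    def send(self, msg):
        """Queue a control message and send it if a credit is available."""
        self.pending.append(msg)
        self._flush()

    def return_credits(self, count):
        """The peer reposted `count` receive buffers; resume queued sends."""
        self.credits += count
        self._flush()

    def _flush(self):
        # Spend one credit per message, preserving submission order.
        while self.credits > 0 and self.pending:
            self.credits -= 1
            self.transmit(self.pending.popleft())


# Usage: with two credits, the third message waits until a credit returns.
wire = []
sender = CreditedSender(initial_credits=2, transmit=wire.append)
for i in range(3):
    sender.send(f"ctl-{i}")
sender.return_credits(1)
```

The point of the scheme is that a control message never arrives at a peer that has no receive buffer posted for it.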
> The only difference between the lustre network drivers is that the
> Voltaire driver has a single global CQ and the OFED driver has 1 CQ
> per QP. However the measurements above are for a single pair of nodes -
> in this case both implementations use a single CQ.
>
> By default, the drivers pre-map all of physical memory so each RDMA
> consists of page fragments. However, we can also compile both drivers
> to map on demand using FMR so that RDMA is not fragmented. The results
> above compare both methods and although both drivers perform worse
> when mapping, the OFED driver takes the bigger hit.
>
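The arithmetic behind the paragraph above can be sketched in a few lines of Python. Two things here are assumptions rather than statements from the post: a 4 KiB page size (which is what makes a 1 MByte transfer come out at the quoted 256 fragments) and the helper names `fragments_per_transfer` and `retained_percent`.

```python
PAGE_SIZE = 4096          # assumption: 4 KiB pages, not stated in the post
TRANSFER_SIZE = 1 << 20   # 1 MByte transfers, as in the benchmark

# Bandwidth (MB/s) from the results table: (premap, map on demand).
WRITES = {"Voltaire": (682, 577), "OFED": (567, 436)}
READS = {"Voltaire": (658, 555), "OFED": (554, 432)}


def fragments_per_transfer(transfer_size, page_size):
    """Number of page-sized RDMA fragments needed, rounding up."""
    return -(-transfer_size // page_size)


def retained_percent(premap, on_demand):
    """Map-on-demand bandwidth as a truncated percentage of premap bandwidth."""
    return on_demand * 100 // premap


print("fragments:", fragments_per_transfer(TRANSFER_SIZE, PAGE_SIZE))
for label, table in (("writes", WRITES), ("reads", READS)):
    for stack, (premap, on_demand) in table.items():
        print(f"{stack:8s} {label:6s} keeps {retained_percent(premap, on_demand)} %")
```

On these figures Voltaire keeps about 84 % of its premap bandwidth when mapping on demand, while OFED keeps about 76 to 77 %, which is the "bigger hit" referred to above.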
> We'd be delighted if anyone can shed any light or can suggest any
> steps we should take to discover the reason. We're also very willing
> to provide assistance if any of the OpenFabrics developers wants to
> duplicate the setup.
>
> -----Original Message-----
> From: Roland Dreier [mailto:rdreier at cisco.com]
> Sent: Thursday, December 14, 2006 5:06 PM
> To: Bernadat, Philippe
> Cc: Hal Rosenstock; Tziporet Koren; openib-general at openib.org
> Subject: Re: [openib-general] Performance Degradation with
> OFED v. Voltaire
>
> OK, it looks like the PCI config is OK.
>
> I guess the difference must be in the Lustre NAL, since you say other
> userspace code gets comparable performance. Is there any difference
> in the architecture of the NAL for the Voltaire stack and the standard
> Linux stack?
>
> You may have to rely on Voltaire and/or the Lustre people to fix this,
> since they're the only ones with the complete picture about the
> Voltaire stack.
>
> - R.
>