[openib-general] Performance Degradation with OFED v. Voltaire

Mon Dec 11 15:20:56 PST 2006

On Tue, 2006-12-05 at 12:22 +0000, Eric Barton wrote:
> Hi,
> 
> We'd dearly like some help to understand why we seem to be having
> performance issues with OFED.  When we run a lustre network bandwidth
> benchmark, we find significant performance degradation on OFED versus
> Voltaire...
> 
>              Premap (256 RDMA frags)     Map on demand (1 RDMA frag)
>              Voltaire  OFED  Ratio       Voltaire  OFED  Ratio 
> Writes MB/s  682       567   83 %        577       436   75 %
> Reads MB/s   658       554   84 %        555       432   77 %
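
  As a quick sanity check on the Ratio columns, here is a minimal Python
sketch that recomputes them from the figures quoted above.  The dictionary
layout and the function name are mine, purely for illustration.  The table
values appear to be truncated rather than rounded.

```python
# Bandwidth figures (MB/s) copied from the quoted table: (Voltaire, OFED).
results = {
    ("premap", "writes"): (682, 567),
    ("premap", "reads"): (658, 554),
    ("map_on_demand", "writes"): (577, 436),
    ("map_on_demand", "reads"): (555, 432),
}


def ratio_percent(voltaire, ofed):
    """OFED bandwidth expressed as a percentage of Voltaire bandwidth."""
    return 100.0 * ofed / voltaire


for (mode, op), (voltaire, ofed) in results.items():
    print(f"{mode:14s} {op:6s} {ratio_percent(voltaire, ofed):.1f} %")
```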

  Were these tests run on the same hardware setup?  If so, was it PCI-X
or PCIe?  The HCA firmware version would also be useful.

  Roland may be able to comment on whether there are performance
differences for interrupt-driven CQs between the old VAPI stacks and OFED.

  At face value these results are troubling, since we are starting to
move all of our IB clusters that use Lustre over to OFED.

  Thanks,

	- Matt

> 
> These tests measure the bandwidth of 1MByte transfers pipelined 8 deep.
> All hardware/software was the same, apart from the IB stack and the lustre
> network driver.
> 
> The architectures of the lustre network drivers for OFED and Voltaire are
> almost identical.  Both use RC QPs with the same control message protocol
> to set up bulk data transfers using RDMA WRITE.  Control messages use a
> credit flow protocol to ensure that they are only sent when buffers are
> posted to receive them.  Concurrent transfers over the same QP are
> supported so that lustre can pipeline bulk I/O.
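
  To make the credit flow idea concrete, here is a toy Python model of a
sender that only transmits a control message when it holds a credit, i.e.
when the peer is known to have a receive buffer posted.  This is only a
sketch of the general technique, not the lustre driver code; the class and
method names are hypothetical.

```python
from collections import deque


class CreditedSender:
    """Toy model: one credit corresponds to one receive buffer on the peer."""

    def __init__(self, initial_credits):
        self.credits = initial_credits  # buffers the peer has posted
        self.pending = deque()          # messages waiting for a credit
        self.sent = []                  # messages handed to the "wire"

    def send(self, msg):
        # Queue first, then transmit as far as the credits allow.
        self.pending.append(msg)
        self._drain()

    def return_credits(self, n):
        # The peer reposted n receive buffers and told us about it.
        self.credits += n
        self._drain()

    def _drain(self):
        # Never send without a credit, so the peer always has a buffer ready.
        while self.credits > 0 and self.pending:
            self.credits -= 1
            self.sent.append(self.pending.popleft())
```

  With two initial credits, a third send() stays queued until
return_credits() is called.
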
> 
> The only difference between the lustre network drivers is that the Voltaire
> driver has a single global CQ and the OFED driver has 1 CQ per QP.  However
> the measurements above are for a single pair of nodes - in this case both
> implementations use a single CQ.
> 
> By default, the drivers pre-map all of physical memory so each RDMA
> consists of page fragments.  However, we can also compile both drivers to
> map on demand using FMR so that RDMA is not fragmented.  The results above
> compare both methods and although both drivers perform worse when mapping,
> the OFED driver takes the bigger hit.
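
  The "256 RDMA frags" figure in the table is consistent with a 1MByte
transfer split into page-sized pieces.  A small Python sketch of that
arithmetic follows; the 4096-byte page size is my assumption (it is not
stated in the thread) and the function name is illustrative.

```python
PAGE_SIZE = 4096         # assumed page size; not stated in the thread
TRANSFER_SIZE = 1 << 20  # 1 MByte transfers, per the benchmark description


def premap_fragments(nbytes, page_size=PAGE_SIZE):
    """Number of page-sized fragments for a page-aligned buffer."""
    return (nbytes + page_size - 1) // page_size


print(premap_fragments(TRANSFER_SIZE))  # 256
```

  With map on demand the same transfer is described by one fragment, at
the cost of doing the FMR mapping per transfer.
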
> 
> We'd be delighted if anyone can shed any light or can suggest any steps we
> should take to discover the reason.  We're also very willing to provide
> assistance if any of the OpenFabrics developers wants to duplicate the
> setup.
>