[ewg] Tools to diagnose rdma performance issue?

Elken, Tom tom.elken at intel.com
Thu Aug 15 15:20:05 PDT 2013


> From: Chen, Mei-Jen [mailto:MChen at tmriusa.com]
> Our applications do not have requirements to use rdma/verbs; the only
> requirement is to achieve the best performance and stability. The reason I picked
> rdma/verbs for evaluation is mainly that it seemed easier to set up, and I was
> able to integrate the required layers with the OS easily. If MPI over PSM can
> work better with the QLE7340, then I will try it out that way.

[Tom]   Hi Mei-Jen,

MPI runs over PSM with these performance advantages:
*	Lightweight PSM implementation including kernel bypass on performance critical paths
*	Hardware DMA on send and receive sides for large messages with no CPU copy
*	No interrupts in the performance path
*	One CPU core running PSM can deliver the full achievable bandwidth of the Intel QDR HCA
*	MPI message rate scales with number of CPU cores running PSM
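
If you decide to go that route, here is a rough sketch of building Open MPI with PSM support and running an OSU bandwidth test over it (the prefix, hostnames, and osu_bw location are only placeholders; point --with-psm at wherever your OFED install put the PSM headers and library):

  # from an Open MPI source tree
  ./configure --prefix=/opt/openmpi-psm --with-psm=/usr
  make -j8 && make install

  # run the OSU bandwidth benchmark over the PSM MTL between two nodes
  /opt/openmpi-psm/bin/mpirun -np 2 -H node1,node2 \
      --mca pml cm --mca mtl psm osu_bw

MVAPICH2 can be built against PSM in a similar way (its configure option is --with-device=ch3:psm, if I recall correctly).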

> 
> Is it typical that rdma/verbs have much more overhead than MPI (because verbs
> processing is done on the CPUs)? 
[Tom] 
Other vendors have chosen to build their MPI implementations on top of rdma/verbs for their HCAs, so in that case running verbs directly has no more overhead than running MPI over verbs.

Our verbs implementation has been optimized over the years, and we recommend configuring it so that multiple CPU cores help with verbs processing when the verbs/rdma load is heavy, as might be the case on a storage server.
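
As a rough illustration (the core number and peer address are only examples), you can check how the qib interrupts are spread across cores and pin the bandwidth test to a single core to rule out CPU contention:

  # which cores are servicing the qib interrupts?
  grep -i qib /proc/interrupts
  # pin the verbs bandwidth test to one core while testing
  taskset -c 2 ib_read_bw 192.168.200.2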

> Since my current tested throughput is quite low,
> I wonder if there is something else...
> The CPU I am using is Sandy Bridge, which I think is relatively new.
[Tom] 
Yes, they are relatively new.  What clock rate are your CPUs?
I did your RDMA test on two nodes with 2.7 GHz Sandy Bridge CPUs and got this result using the same QLE7340 HCA as you used:

ib_read_bw
------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]
65536      1000           3251.65            3250.70
------------------------------------------------------------------
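
If you are not sure of the clock rate, something like the following sketch (standard Linux tools; the last path assumes cpufreq is enabled in your kernel) shows the CPU model, current frequency, and frequency governor, since a "powersave" or "ondemand" governor can also hold verbs bandwidth down:

  grep "model name" /proc/cpuinfo | head -1
  lscpu | grep MHz
  cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor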

From this point, we can take the details of your issue to a separate e-mail thread, since this is getting more into an Intel support and system configuration issue than an OFED/EWG issue.  I'll also forward you a presentation I gave at Open Fabrics User Day 2012, "Introduction to PSM," which includes the basics of building MPIs with PSM support.
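
In the meantime, a few quick checks on your nodes may help narrow things down; the commands below are only a rough sketch using standard OFED tools (adjust the device name, LID, and port for your fabric):

  # confirm the negotiated link width and speed on each node
  ibv_devinfo -v | grep -E "active_width|active_speed"
  cat /sys/class/infiniband/qib0/ports/1/rate
  # read the port error counters (LID 1, port 1 here as an example) and watch
  # for SymbolErrorCounter or PortRcvErrors growing between test runs
  perfquery 1 1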

Cheers,
-Tom Elken

> 
> Mei-Jen
> 
> 
> 
> -----Original Message-----
> From: Elken, Tom [mailto:tom.elken at intel.com]
> Sent: Wednesday, August 14, 2013 5:20 PM
> To: Chen, Mei-Jen; ewg at lists.openfabrics.org
> Subject: RE: Tools to diagnose rdma performance issue?
> 
> Hi Mei-Jen,
> 
> Do your applications need to use rdma/verbs?
> 
> If you are mainly interested in MPI, you can use MPI over PSM with your
> QLE7340 (True Scale) HCAs for better performance and stability.  PSM is a
> library that comes with OFED.
> If so, basic MPI benchmarks such as IMB and the OSU MPI Benchmarks will likely
> show much higher bandwidth than you are getting with the rdma tests you show below.
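
(One way to confirm that an MPI build actually has PSM support, assuming Open MPI, is to list its components; this is just an illustrative check:

  ompi_info | grep -i psm

A line showing the "psm" MTL component means the build can use PSM.)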
> 
> Also what CPU model are you using?  That has an influence on rdma/verbs
> performance with the qib driver and these HCAs, which do verbs processing on
> the CPUs.  I typically see a lot better performance than you show below on
> recent Intel CPUs.
> 
> -Tom
> 
> > -----Original Message-----
> > From: ewg-bounces at lists.openfabrics.org [mailto:ewg-
> > bounces at lists.openfabrics.org] On Behalf Of Chen, Mei-Jen
> > Sent: Wednesday, August 14, 2013 1:20 PM
> > To: ewg at lists.openfabrics.org
> > Subject: [ewg] Tools to diagnose rdma performance issue?
> >
> > Hi,
> >
> > I am using OFED-3.5 and perftest-2.0 to check rdma performance between
> > two 4x QDR HCAs which are connected through a switch.
> > The tested performance results are much slower than expected
> > (please see the attached results below; I expected more than 3000 MB/s).
> > There is probably something very wrong, and I am struggling to find the
> > proper tools to diagnose/narrow down the problem.  Any suggestion is
> > appreciated.
> >
> > The selected packages from OFED-3.5 were recompiled and installed on
> > top of Linux 3.4 (with the PREEMPT_RT patch). The HCA driver comes from
> > kernel.org.
> > So far I have tried perftest, qperf, and ibv_rc_pingpong. They all give
> > similar results.
> > iblinkinfo shows all links are up at the correct speed (4X 10.0 Gbps).
> >
> > Thanks.
> >
> >
> > # ib_read_bw 192.168.200.2
> > ---------------------------------------------------------------------------------------
> >  Device not recognized to implement inline feature. Disabling it
> > ---------------------------------------------------------------------------------------
> >                     RDMA_Read BW Test
> >  Dual-port       : OFF		Device : qib0
> >  Number of qps   : 1
> >  Connection type : RC
> >  TX depth        : 128
> >  CQ Moderation   : 100
> >  Mtu             : 2048B
> >  Link type       : IB
> >  Outstand reads  : 16
> >  rdma_cm QPs	 : OFF
> >  Data ex. method : Ethernet
> > ---------------------------------------------------------------------------------------
> >  local address: LID 0x02 QPN 0x00a1 PSN 0x72c87b OUT 0x10 RKey 0x4f5000 VAddr 0x007f3190af2000
> >  remote address: LID 0x01 QPN 0x0095 PSN 0xca533a OUT 0x10 RKey 0x494a00 VAddr 0x007fe0f1f45000
> > ---------------------------------------------------------------------------------------
> >  #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]    MsgRate[Mpps]
> >  65536      1000           1709.24            1709.23               0.027348
> > ---------------------------------------------------------------------------------------
> >
> > # ibstat
> > CA 'qib0'
> > 	CA type: InfiniPath_QLE7340
> > 	Number of ports: 1
> > 	Firmware version:
> > 	Hardware version: 2
> > 	Node GUID: 0x001175000070aafe
> > 	System image GUID: 0x001175000070aafe
> > 	Port 1:
> > 		State: Active
> > 		Physical state: LinkUp
> > 		Rate: 40
> > 		Base lid: 1
> > 		LMC: 0
> > 		SM lid: 1
> > 		Capability mask: 0x0761086a
> > 		Port GUID: 0x001175000070aafe
> > 		Link layer: InfiniBand
> >
> >
> > _______________________________________________
> > ewg mailing list
> > ewg at lists.openfabrics.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
> 


