[ewg] Tools to diagnose rdma performance issue?

Chen, Mei-Jen MChen at tmriusa.com
Thu Aug 15 08:21:33 PDT 2013


Hi Tom,

Our applications do not have a requirement to use rdma/verbs; the only requirement is to achieve the best performance and stability. The reasons I picked rdma/verbs for evaluation are mainly that it seemed easier to set up, and I was able to integrate the required layers with the OS easily. If MPI over PSM works better with the QLE7340, then I will try it that way.

Is it typical for rdma/verbs to have much more overhead than MPI (because verbs processing is done on the CPUs)? Since my current tested throughput is quite low, I wonder if there is something else...

The CPU I am using is Sandy Bridge, which I think is relatively new.

Mei-Jen



-----Original Message-----
From: Elken, Tom [mailto:tom.elken at intel.com] 
Sent: Wednesday, August 14, 2013 5:20 PM
To: Chen, Mei-Jen; ewg at lists.openfabrics.org
Subject: RE: Tools to diagnose rdma performance issue?

Hi Mei-Jen,

Do your applications need to use rdma/verbs?

If you are mainly interested in MPI, you can use MPI over PSM with your QLE7340 (True Scale) HCAs for better performance and stability.  PSM is a library that comes with OFED.
If you do, basic MPI benchmarks such as IMB and the OSU MPI Benchmarks will likely show much higher bandwidth than you are getting with the rdma tests you show below.
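For illustration (only a sketch, assuming an Open MPI build with PSM support, the OSU Micro-Benchmarks compiled against that MPI, and node1/node2 as placeholder hostnames), an osu_bw run over PSM could be launched as:

    mpirun -np 2 -H node1,node2 --mca mtl psm ./osu_bw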

Also, what CPU model are you using?  That has an influence on rdma/verbs performance with the qib driver and these HCAs, which do verbs processing on the CPUs.  I typically see much better performance than you show below on recent Intel CPUs.
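One quick way to confirm the CPU model is:

    grep "model name" /proc/cpuinfo | head -1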

-Tom

> -----Original Message-----
> From: ewg-bounces at lists.openfabrics.org [mailto:ewg- 
> bounces at lists.openfabrics.org] On Behalf Of Chen, Mei-Jen
> Sent: Wednesday, August 14, 2013 1:20 PM
> To: ewg at lists.openfabrics.org
> Subject: [ewg] Tools to diagnose rdma performance issue?
> 
> Hi,
> 
> I am using OFED-3.5 and perftest-2.0 to check rdma performance between
> two 4x QDR HCAs connected through a switch.
> The tested performance results are much slower than expected
> (please see the attached results below; I expected more than 3000 MB/s),
> so there is probably something very wrong, and I am struggling to find
> proper tools to diagnose/narrow down the problem.  Any suggestions are appreciated.
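> 
> (For reference, the 3000 MB/s figure comes from the 4X QDR link rate: 4 lanes x 10 Gbps =
> 40 Gbps of signalling, which after 8b/10b encoding is about 32 Gbps, i.e. roughly 4000 MB/s
> of raw data rate.)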
> 
> The selected packages from OFED-3.5 were recompiled and installed on
> top of Linux 3.4 (with the PREEMPT RT patch). The HCA driver comes from kernel.org.
> So far I have tried perftest, qperf, and ibv_rc_pingpong; they all show similar results.
> iblinkinfo shows all links are up at the correct speed (4X 10.0 Gbps).
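> A typical qperf invocation for these RDMA tests (assuming a plain "qperf" server is started
> on the 192.168.200.2 node first) would be along the lines of:
> 
> # qperf 192.168.200.2 rc_rdma_read_bw rc_rdma_write_bw rc_bw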
> 
> Thanks.
> 
> 
> # ib_read_bw 192.168.200.2
> ---------------------------------------------------------------------------------------
>  Device not recognized to implement inline feature. Disabling it
> ---------------------------------------------------------------------------------------
>                     RDMA_Read BW Test
>  Dual-port       : OFF		Device : qib0
>  Number of qps   : 1
>  Connection type : RC
>  TX depth        : 128
>  CQ Moderation   : 100
>  Mtu             : 2048B
>  Link type       : IB
>  Outstand reads  : 16
>  rdma_cm QPs	 : OFF
>  Data ex. method : Ethernet
> ---------------------------------------------------------------------------------------
>  local address: LID 0x02 QPN 0x00a1 PSN 0x72c87b OUT 0x10 RKey 0x4f5000 VAddr 0x007f3190af2000
>  remote address: LID 0x01 QPN 0x0095 PSN 0xca533a OUT 0x10 RKey 0x494a00 VAddr 0x007fe0f1f45000
> ---------------------------------------------------------------------------------------
>  #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
>  65536      1000           1709.24            1709.23              0.027348
> ---------------------------------------------------------------------------------------
> 
> # ibstat
> CA 'qib0'
> 	CA type: InfiniPath_QLE7340
> 	Number of ports: 1
> 	Firmware version:
> 	Hardware version: 2
> 	Node GUID: 0x001175000070aafe
> 	System image GUID: 0x001175000070aafe
> 	Port 1:
> 		State: Active
> 		Physical state: LinkUp
> 		Rate: 40
> 		Base lid: 1
> 		LMC: 0
> 		SM lid: 1
> 		Capability mask: 0x0761086a
> 		Port GUID: 0x001175000070aafe
> 		Link layer: InfiniBand
> 
> _______________________________________________
> ewg mailing list
> ewg at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg



