[Users] Low performance of some Apps over IB

vithanousek vithanousek at seznam.cz
Thu Sep 4 07:54:25 PDT 2014


Hello,

I'm trying to set up an InfiniBand network on our HPC cluster. Each node has an Intel True Scale QLE7340 HCA, and they are connected through an externally managed Intel True Scale 12200 switch; OpenSM is used as the subnet manager.
The nodes run CentOS Linux 6.5 with the latest updates installed.
The drivers were installed from IntelIB-Basic.RHEL6-x86_64.7.3.0.0.26.tgz.

Most things seem to work: an OpenMPI latency test running over PSM shows about 3 us, sometimes 9 us.
The same test running over the openib BTL shows about 12 us.
However, an ibping test reports an average of 0.242 ms, and ping over IPoIB is slower than over Ethernet (IPoIB: 0.159 ms, Eth: 0.146 ms).
I don't know whether these numbers are okay or not; I have nothing to compare them against.
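
For reference, the PSM vs. openib comparison was run roughly along the lines below; osu_latency just stands in for the point-to-point latency benchmark, and the host names and flags are the general form, not verbatim:

    # latency over PSM (the native True Scale path)
    mpirun -np 2 --host node4,node16 --mca pml cm --mca mtl psm ./osu_latency

    # the same benchmark over the openib BTL (verbs path)
    mpirun -np 2 --host node4,node16 --mca pml ob1 --mca btl openib,self ./osu_latency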

The main problem is that the OrangeFS filesystem is terribly slow over IB (write speed is about 5 MB/s), and it seems the problem is not in OrangeFS but in InfiniBand. OrangeFS uses IB verbs send/receive and rdma_write.
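
To see whether the raw RDMA-write path is slow independently of OrangeFS, I suppose a bandwidth test between two nodes would help, for example ib_write_bw from the perftest package (assuming it is installed; the node names are just examples):

    # on node16 (server side), wait for a connection
    ib_write_bw

    # on node4 (client side), connect to the server and measure RDMA write bandwidth
    ib_write_bw node16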

I know our switch has firmware version 7.1.0.0.58, but I'm not sure it is a good idea to update the firmware over a problematic network. Is the update to 7.3 needed, and could it solve our problems?

I sometimes get the following error message from OpenMPI when trying to use the openib BTL:

===============================================================================================
[[24318,1],18][btl_openib_component.c:3369:handle_wc] from node4 to: node16 error polling LP CQ with status RETRY EXCEEDED ERROR status number 12 for wr_id 2802a68 opcode 2  vendor error 0 qp_idx 3
--------------------------------------------------------------------------
The InfiniBand retry count between two MPI processes has been
exceeded.  "Retry count" is defined in the InfiniBand spec 1.2
(section 12.7.38):

    The total number of times that the sender wishes the receiver to
    retry timeout, packet sequence, etc. errors before posting a
    completion error.

This error typically means that there is something awry within the
InfiniBand fabric itself.  You should note the hosts on which this
error has occurred; it has been observed that rebooting or removing a
particular host from the job can sometimes resolve this issue.

Two MCA parameters can be used to control Open MPI's behavior with
respect to the retry count:

* btl_openib_ib_retry_count - The number of times the sender will
  attempt to retry (defaulted to 7, the maximum value).
* btl_openib_ib_timeout - The local ACK timeout parameter (defaulted
  to 20).  The actual timeout value used is calculated as:

     4.096 microseconds * (2^btl_openib_ib_timeout)

  See the InfiniBand spec 1.2 (section 12.7.34) for more details.

Below is some information about the host that raised the error and the
peer to which it was connected:

  Local host:   node4
  Local device: qib0
  Peer host:    node16

You may need to consult with your system administrator to get this
problem fixed.
====================================================================================
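
If it is relevant: I have not changed the two MCA parameters mentioned in that message, so they are still at their defaults. As I understand the text, they would be raised on the mpirun command line roughly like this (the values are only an example):

    # default btl_openib_ib_timeout is 20, i.e. 4.096 us * 2^20 = about 4.3 s per retry;
    # retry count 7 is already the maximum
    mpirun --mca btl openib,self \
           --mca btl_openib_ib_timeout 23 \
           --mca btl_openib_ib_retry_count 7 \
           -np 2 --host node4,node16 ./osu_latency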

Other useful information may come from the output of ipathstats, but I don't know what the numbers mean:


=====================================================================================
    KernIntr        404717    ErrorIntr           381      Tx_Errs             0
    Rcv_Errs             0     H/W_Errs             0    NoPIOBufs             3
   CtxtsOpen             0  RcvLen_Errs             0   EgrBufFull           381
  EgrHdrFull             0 
Unit0:
  Interrupts        291167 HostBusStall       1268599    RxTIDFull             0
RxTIDInvalid             0 RxTIDFloDrop             0 Ctxt0EgrOvfl             0
Ctxt1EgrOvfl             0 Ctxt2EgrOvfl            75 Ctxt3EgrOvfl           289
Ctxt4EgrOvfl            78 Ctxt5EgrOvfl           172 Ctxt6EgrOvfl           390
Ctxt7EgrOvfl             0 Ctxt8EgrOvfl           439 Ctxt9EgrOvfl           861
Ctx10EgrOvfl           552 Ctx11EgrOvfl             0 Ctx12EgrOvfl            34
Ctx13EgrOvfl           252 Ctx14EgrOvfl             0 Ctx15EgrOvfl             0
Ctx16EgrOvfl            14 Ctx17EgrOvfl             0 
Port0,1:
       TxPkt      17246079      TxWords     843008697        RxPkt     187776772
     RxWords     90842065K  TxFlowStall        162450    TxDmaDesc        351343
  RxDlidFltr             0 IBStatusChng            19   IBLinkDown             0
  IBLnkRecov             0  IBRxLinkErr             0  IBSymbolErr             0
    RxLLIErr             0  RxBadFormat             0     RxBadLen             0
  RxBufOvrfl             0        RxEBP             0 RxFlowCtlErr             0
   RxICRCerr             0   RxLPCRCerr             0    RxVCRCerr             0
  RxInvalLen             0  RxInvalPKey             0 RxPktDropped             0
 TxBadLength             0    TxDropped             0   TxInvalLen             0
  TxUnderrun             0    TxUnsupVL             0  RxLclPhyErr             0
  RxVL15Drop             0      RxVlErr             0 XcessBufOvfl             0
 RxQPBadCtxt             0  TXBadHeader             0 

==================================================================================

Please excuse my poor level of English.

Thanks for your replies,
Hanousek Vít

