[Users] Low performance of some Apps over IB
vithanousek
vithanousek at seznam.cz
Thu Sep 4 07:54:25 PDT 2014
Hello,
I'm trying to set up an InfiniBand network on our HPC cluster. Each node has an Intel True Scale QLE7340 HCA, and they are connected through an externally managed Intel True Scale 12200 switch; OpenSM is used as the subnet manager.
The nodes run CentOS Linux 6.5 with the latest updates.
Drivers are installed from IntelIB-Basic.RHEL6-x86_64.7.3.0.0.26.tgz.
Most things seem to work: an Open MPI test running over PSM shows a latency of about 3 us, sometimes 9 us.
The same test running over the openib BTL shows a latency of about 12 us.
But an ibping test shows an average of 0.242 ms, and ping over IPoIB is slower than over Ethernet (IPoIB: 0.159 ms, Eth: 0.146 ms).
I don't know whether these numbers are reasonable or not; I have nothing to compare them with.
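For reference, the latency numbers come from a simple ping-pong measurement, roughly of the shape sketched below (a simplified C sketch, not the exact benchmark; the mpirun switches in the comment are only an example of how PSM or the openib BTL can be selected):
=====================================================================================
/* Minimal ping-pong latency sketch (simplified; not the exact benchmark).
 * Build:            mpicc pingpong.c -o pingpong
 * Over PSM:         mpirun -np 2 -mca pml cm  -mca mtl psm         ./pingpong
 * Over openib BTL:  mpirun -np 2 -mca pml ob1 -mca btl openib,self ./pingpong
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, i;
    const int iters = 10000;
    char byte = 0;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)  /* one-way latency is half of the round-trip time */
        printf("avg one-way latency: %.2f us\n",
               (t1 - t0) / iters / 2.0 * 1e6);

    MPI_Finalize();
    return 0;
}
=====================================================================================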
The main problem is that the OrangeFS filesystem is terribly slow over IB (write speed is about 5 MB/s), and it seems that the problem is not in OrangeFS but in InfiniBand. OrangeFS uses IB verbs send/receive and RDMA write.
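For reference, the verbs path it exercises looks roughly like the sketch below (my own simplified illustration of posting an RDMA write and polling its completion; queue-pair setup and the exchange of the remote address and rkey are omitted, and the helper names are mine):
=====================================================================================
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* Post one IBV_WR_RDMA_WRITE of local_buf (registered as mr) into the
 * peer's memory at remote_addr, which the peer exposed with rkey.
 * The RC queue pair qp is assumed to already be connected (RTS state). */
static int post_rdma_write(struct ibv_qp *qp, struct ibv_mr *mr,
                           void *local_buf, uint32_t len,
                           uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge;
    struct ibv_send_wr wr, *bad_wr = NULL;

    memset(&sge, 0, sizeof(sge));
    sge.addr   = (uintptr_t)local_buf;
    sge.length = len;
    sge.lkey   = mr->lkey;

    memset(&wr, 0, sizeof(wr));
    wr.wr_id               = 1;
    wr.opcode              = IBV_WR_RDMA_WRITE;
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;   /* generate a completion */
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    return ibv_post_send(qp, &wr, &bad_wr);
}

/* Busy-poll the send completion queue for that work request.
 * IBV_WC_RETRY_EXC_ERR here is the same condition Open MPI reports below
 * as "RETRY EXCEEDED ERROR" (status number 12). */
static int wait_for_completion(struct ibv_cq *cq)
{
    struct ibv_wc wc;
    int n;

    do {
        n = ibv_poll_cq(cq, 1, &wc);
    } while (n == 0);

    if (n < 0)
        return -1;
    return (wc.status == IBV_WC_SUCCESS) ? 0 : -(int)wc.status;
}
=====================================================================================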
I know that our switch has firmware version 7.1.0.0.58, but I'm not sure it is a good idea to update the firmware over a problematic network. Is the update to 7.3 needed, and could it solve our problems?
I sometimes get the following error message from Open MPI when I try to use the openib BTL:
===============================================================================================
[[24318,1],18][btl_openib_component.c:3369:handle_wc] from node4 to: node16 error polling LP CQ with status RETRY EXCEEDED ERROR status number 12 for wr_id 2802a68 opcode 2 vendor error 0 qp_idx 3
--------------------------------------------------------------------------
The InfiniBand retry count between two MPI processes has been
exceeded. "Retry count" is defined in the InfiniBand spec 1.2
(section 12.7.38):
The total number of times that the sender wishes the receiver to
retry timeout, packet sequence, etc. errors before posting a
completion error.
This error typically means that there is something awry within the
InfiniBand fabric itself. You should note the hosts on which this
error has occurred; it has been observed that rebooting or removing a
particular host from the job can sometimes resolve this issue.
Two MCA parameters can be used to control Open MPI's behavior with
respect to the retry count:
* btl_openib_ib_retry_count - The number of times the sender will
attempt to retry (defaulted to 7, the maximum value).
* btl_openib_ib_timeout - The local ACK timeout parameter (defaulted
to 20). The actual timeout value used is calculated as:
4.096 microseconds * (2^btl_openib_ib_timeout)
See the InfiniBand spec 1.2 (section 12.7.34) for more details.
Below is some information about the host that raised the error and the
peer to which it was connected:
Local host: node4
Local device: qib0
Peer host: node16
You may need to consult with your system administrator to get this
problem fixed.
====================================================================================
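If I read the formula above correctly, the default btl_openib_ib_timeout of 20 already means a local ACK timeout of 4.096 us * 2^20, which is about 4.3 s per attempt; with 7 retries a packet would have to stay unacknowledged for roughly half a minute before this error appears, so it does not look like a simple tuning problem.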
Other useful information might come from the output of ipathstats, but I don't know what the numbers mean:
=====================================================================================
KernIntr 404717 ErrorIntr 381 Tx_Errs 0
Rcv_Errs 0 H/W_Errs 0 NoPIOBufs 3
CtxtsOpen 0 RcvLen_Errs 0 EgrBufFull 381
EgrHdrFull 0
Unit0:
Interrupts 291167 HostBusStall 1268599 RxTIDFull 0
RxTIDInvalid 0 RxTIDFloDrop 0 Ctxt0EgrOvfl 0
Ctxt1EgrOvfl 0 Ctxt2EgrOvfl 75 Ctxt3EgrOvfl 289
Ctxt4EgrOvfl 78 Ctxt5EgrOvfl 172 Ctxt6EgrOvfl 390
Ctxt7EgrOvfl 0 Ctxt8EgrOvfl 439 Ctxt9EgrOvfl 861
Ctx10EgrOvfl 552 Ctx11EgrOvfl 0 Ctx12EgrOvfl 34
Ctx13EgrOvfl 252 Ctx14EgrOvfl 0 Ctx15EgrOvfl 0
Ctx16EgrOvfl 14 Ctx17EgrOvfl 0
Port0,1:
TxPkt 17246079 TxWords 843008697 RxPkt 187776772
RxWords 90842065K TxFlowStall 162450 TxDmaDesc 351343
RxDlidFltr 0 IBStatusChng 19 IBLinkDown 0
IBLnkRecov 0 IBRxLinkErr 0 IBSymbolErr 0
RxLLIErr 0 RxBadFormat 0 RxBadLen 0
RxBufOvrfl 0 RxEBP 0 RxFlowCtlErr 0
RxICRCerr 0 RxLPCRCerr 0 RxVCRCerr 0
RxInvalLen 0 RxInvalPKey 0 RxPktDropped 0
TxBadLength 0 TxDropped 0 TxInvalLen 0
TxUnderrun 0 TxUnsupVL 0 RxLclPhyErr 0
RxVL15Drop 0 RxVlErr 0 XcessBufOvfl 0
RxQPBadCtxt 0 TXBadHeader 0
==================================================================================
Please excuse my poor level of English.
Thanks for your replies,
Hanousek Vít