[ofa-general] RNR NAK issues

Mon Apr 9 18:23:48 PDT 2007

I am wrestling with some RNR NAK issues with the IPOIB CM (NOSRQ) work. I 
find that in ipoib_cm_handle_tx_wc() wc->status is 13 
(IB_WC_RNR_RETRY_EXC_ERR). The sequence of events appears to be the 
following:

1. When the receiver has received ipoib_recvq_size messages, the sender
receives an RNR NAK retry-exceeded error (IB_WC_RNR_RETRY_EXC_ERR).
This results in the sender destroying its QP and sending a DREQ message to
the other end. I find it a little strange that this error occurs even after
the receive buffers are successfully posted to the QP.
2. The application (netperf) continues to send messages, and setup happens
all over again, i.e. the QPs are recreated.
3. This does not stop the application (in fact, netperf completes
successfully), but this behaviour hammers performance and the
throughput drops like a stone.
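
For reference, a minimal standalone sketch (not kernel code) of how the
status value 13 maps to IB_WC_RNR_RETRY_EXC_ERR. It mirrors the first
entries of enum ib_wc_status, assuming the ordering in
include/rdma/ib_verbs.h. The helper wc_status_name() is hypothetical and
exists only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Local mirror of the first 14 entries of enum ib_wc_status.
 * Assumption: the order matches include/rdma/ib_verbs.h.
 */
enum ib_wc_status {
	IB_WC_SUCCESS,
	IB_WC_LOC_LEN_ERR,
	IB_WC_LOC_QP_OP_ERR,
	IB_WC_LOC_EEC_OP_ERR,
	IB_WC_LOC_PROT_ERR,
	IB_WC_WR_FLUSH_ERR,
	IB_WC_MW_BIND_ERR,
	IB_WC_BAD_RESP_ERR,
	IB_WC_LOC_ACCESS_ERR,
	IB_WC_REM_INV_REQ_ERR,
	IB_WC_REM_ACCESS_ERR,
	IB_WC_REM_OP_ERR,
	IB_WC_RETRY_EXC_ERR,
	IB_WC_RNR_RETRY_EXC_ERR
};

/* Names in the same order as the enum above. */
static const char *const wc_status_names[] = {
	"IB_WC_SUCCESS",
	"IB_WC_LOC_LEN_ERR",
	"IB_WC_LOC_QP_OP_ERR",
	"IB_WC_LOC_EEC_OP_ERR",
	"IB_WC_LOC_PROT_ERR",
	"IB_WC_WR_FLUSH_ERR",
	"IB_WC_MW_BIND_ERR",
	"IB_WC_BAD_RESP_ERR",
	"IB_WC_LOC_ACCESS_ERR",
	"IB_WC_REM_INV_REQ_ERR",
	"IB_WC_REM_ACCESS_ERR",
	"IB_WC_REM_OP_ERR",
	"IB_WC_RETRY_EXC_ERR",
	"IB_WC_RNR_RETRY_EXC_ERR"
};

/* Hypothetical helper: name for a status code, or "unknown". */
const char *wc_status_name(int status)
{
	size_t n = sizeof(wc_status_names) / sizeof(wc_status_names[0]);

	if (status < 0 || (size_t)status >= n)
		return "unknown";
	return wc_status_names[status];
}
```

Under that assumption, wc_status_name(13) returns
"IB_WC_RNR_RETRY_EXC_ERR", and 12 is the plain transport retry error
(IB_WC_RETRY_EXC_ERR), so the two are easy to confuse when reading raw
numbers in a log.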

One of the things that I discovered was that in cm.c
qp_attr->min_rnr_timer is set to 0. What is the purpose of setting this to
0? How are drivers expected to use this? I see that mthca does some
computation on it. Probably because of this (min_rnr_timer = 0), ehca
appears to use the value as is and sets it to 0 too.
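
For reference, a minimal standalone sketch (not kernel code) of the
min_rnr_timer encoding, assuming the 5-bit table behind enum
ib_rnr_timeout in include/rdma/ib_verbs.h (IB_RNR_TIMER_655_36 = 0
through IB_RNR_TIMER_491_52 = 31). If that reading is right, 0 does not
mean "no delay" but selects the longest delay, 655.36 ms. The helper
rnr_timer_usec() and the table name are hypothetical.

```c
#include <assert.h>

/*
 * Delay selected by each 5-bit min_rnr_timer code, in units of 10 us
 * (0.01 ms). Assumption: values follow enum ib_rnr_timeout, where
 * code 0 is 655.36 ms and code 31 is 491.52 ms.
 */
static const unsigned int rnr_timer_10us[32] = {
	65536,     1,     2,     3,     4,     6,     8,    12,
	   16,    24,    32,    48,    64,    96,   128,   192,
	  256,   384,   512,   768,  1024,  1536,  2048,  3072,
	 4096,  6144,  8192, 12288, 16384, 24576, 32768, 49152
};

/*
 * Hypothetical helper: RNR NAK delay in microseconds for a
 * min_rnr_timer code. Only the low 5 bits of the code are used.
 */
unsigned int rnr_timer_usec(unsigned int code)
{
	return rnr_timer_10us[code & 0x1f] * 10;
}
```

With this table, rnr_timer_usec(0) is 655360 (655.36 ms) while
rnr_timer_usec(1) is 10 (0.01 ms), so a non-zero code always asks the
sender to back off for less time than code 0 does.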

I hacked cm.c to change this to a non-zero value. This improved
performance; however, I still see the previously mentioned RNR NAK issue. I
have tried setting .cap.max_recv_wr to values from
ipoib_recvq_size - 2 to ipoib_recvq_size + 1. This seems to make no
difference.

I tried this with 2.6.21-rc5 as the base. Any suggestions as to what I
may be missing? I reworked my earlier patch, eliminated the #ifdefs
and incorporated other comments. Other than that, there is no difference.

Pradeep
pradeep at us.ibm.com