[openib-general] APM support in openib stack

Venkatesh Babu venkatesh.babu at 3leafnetworks.com
Thu Oct 26 15:12:56 PDT 2006


Any comments on the issue described in the following email?

 It doesn't look like a firmware problem. I had APM working on the
same Mellanox HCA cards with the IBGD 1.8.2 stack. With the OFED 1.0
stack I am getting the following problem. I guess the problem is in how
the timer values are programmed into the firmware.

VBabu

Venkatesh Babu wrote:

 > I have added a couple of patches to the OFED stack, as described in
 > bug#160, bug#172, and bug#159, and with these I successfully tested
 > the APM functionality, except for one issue.
 >
 > ISSUE:
 >  If I pull both cables, there are no paths left to the
 > destination, so the RC QP connection is supposed to be torn down.
 > But that is not happening.
 >
 > 1. Create an RC QP and load both primary and alternate paths
 >    (I was setting rnr_retry_count = 6, retry_count = 6,
 > packet_life_time field of struct ib_sa_path_rec to 15 and also tried
 > with 12)
 > 2. Send some traffic over RC QP
 > 3. Disconnect the cable belonging to the primary path
 > 4. It smoothly fails over to the alternate path, which becomes the
 >    primary path. No effect on the traffic on that RC QP.
 > 5. Remove the second cable, belonging to the new primary path.
 > 6. Obviously traffic stops since there are no paths to the
 > destination. But for the outstanding WRs in the RC QP I don't get any
 > callback from the verbs layer describing whether it succeeded or
 > failed due to some error like IB_WC_RETRY_EXC_ERR.
 > When I query the RC QP properties it still shows that it is in 
 > IB_QPS_RTS state.
 >
 >
 > Without APM functionality it behaves correctly -
 > 1. Create an RC QP and load only the primary path
 >    (I was setting rnr_retry_count = 6, retry_count = 6,
 > packet_life_time field of struct ib_sa_path_rec to 15 and also tried
 > with 12)
 > 2. Send some traffic over RC QP
 > 3. Disconnect the cable belonging to the primary path
 > 4. Obviously traffic stops since there are no paths to the
 > destination. For the outstanding WRs in the RC QP I do get a callback
 > from the verbs layer reporting that the first WR failed with error
 > IB_WC_RETRY_EXC_ERR, and for all other WRs I get IB_WC_WR_FLUSH_ERR.
 > I will then close this RC QP.
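
For reference, here is a rough estimate of how long the QP should keep
retrying before the verbs layer reports IB_WC_RETRY_EXC_ERR. It is only a
sketch. InfiniBand timeout fields encode a duration of 4.096 us * 2^value.
The function names are made up for illustration. Two things are assumptions,
not taken from the stack: that the local ACK timeout exponent is
packet_life_time + 1 (twice the packet lifetime, ignoring the CA ACK delay),
and that the QP makes one initial attempt plus retry_count retries. Real
HCAs are allowed to wait longer than the nominal timeout, so treat the
result as a lower bound.

```python
# Sketch only: estimate the minimum time an RC QP spends retrying
# before it should give up. Function names are hypothetical.

BASE_US = 4.096  # InfiniBand timeout unit, in microseconds


def ib_timeout_us(exponent: int) -> float:
    """Duration encoded by a 5-bit IB timeout exponent: 4.096 us * 2**exponent."""
    if not 0 <= exponent <= 31:
        raise ValueError("IB timeout exponent must be in the range 0..31")
    return BASE_US * (1 << exponent)


def min_retry_window_us(ack_timeout_exp: int, retry_count: int) -> float:
    """Lower bound on the total wait before retries are exhausted.

    Assumption: one initial attempt plus retry_count retries, each
    waiting one full local ACK timeout.
    """
    if retry_count < 0:
        raise ValueError("retry_count must be non-negative")
    return (retry_count + 1) * ib_timeout_us(ack_timeout_exp)


if __name__ == "__main__":
    retry_count = 6
    for packet_life_time in (12, 15):
        # Assumption: ACK timeout exponent = packet_life_time + 1.
        ack_exp = packet_life_time + 1
        window_s = min_retry_window_us(ack_exp, retry_count) / 1e6
        print(f"packet_life_time={packet_life_time}: "
              f"retry window >= {window_s:.3f} s")
```

Under these assumptions both settings give a window of no more than a few
seconds, so long timeouts alone would not explain a QP that stays in
IB_QPS_RTS indefinitely.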



