[openib-general] APM support in openib stack
Venkatesh Babu
venkatesh.babu at 3leafnetworks.com
Thu Oct 26 15:12:56 PDT 2006
Any comments on the issue described in the following email?
It doesn't look like a firmware problem. I had APM working on the
same Mellanox HCA cards with the IBGD 1.8.2 stack. With the OFED 1.0
stack I am getting the following problem. I guess it is some problem in
how the timers are initialized for the firmware.
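As a rough sanity check on the timer values involved, here is a minimal
sketch of the arithmetic. IB encodes these time fields as
4.096 us * 2^value. The helper names ib_time_ns() and retry_budget_ns()
are made up for illustration, and the sketch assumes the QP's local ACK
timeout uses the same exponent as packet_life_time, which may not hold
(the CM can derive a larger value, and an HCA may wait longer than the
nominal timeout).

```c
#include <stdint.h>

/*
 * IB time fields are encoded as 4.096 us * 2^exp.
 * Returns the decoded value in nanoseconds (4.096 us == 4096 ns).
 */
static uint64_t ib_time_ns(unsigned exp)
{
    return 4096ULL << exp;
}

/*
 * Hypothetical helper: nominal time before the requester gives up,
 * assuming one timeout for the initial attempt plus one per retry.
 * This is an approximation, not what any particular HCA guarantees.
 */
static uint64_t retry_budget_ns(unsigned timeout_exp, unsigned retry_count)
{
    return (uint64_t)(retry_count + 1) * ib_time_ns(timeout_exp);
}
```

Under these assumptions, an exponent of 15 gives about 134 ms per
timeout, so with retry_count = 6 the error completion would be expected
after roughly 0.94 s; with an exponent of 12 it would be roughly 0.12 s.
Either way it should show up within about a second, not never.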
VBabu
Venkatesh Babu wrote:
> I have added a couple of patches to the OFED stack, as described in
> bug#160, bug#172, and bug#159, and with these successfully tested the
> APM functionality, except for one issue.
>
> ISSUE:
> If I pull both cables, there are no paths left to the destination,
> so the RC QP connection is supposed to be torn down. But that does
> not happen.
>
> 1. Create an RC QP and load both the primary and alternate paths
> (I was setting rnr_retry_count = 6, retry_count = 6, and the
> packet_life_time field of struct ib_sa_path_rec to 15; I also tried
> 12)
> 2. Send some traffic over the RC QP
> 3. Disconnect the cable belonging to the primary path
> 4. It smoothly fails over to the alternate path, which becomes the
> primary path. There is no effect on the traffic on that RC QP.
> 5. Remove the second cable, belonging to the new primary path.
> 6. Obviously traffic stops since there are no paths to the
> destination. But for the outstanding WRs in the RC QP I don't get any
> callback from the verbs layer reporting whether they succeeded or
> failed with some error like IB_WC_RETRY_EXC_ERR.
> When I query the RC QP properties, it still shows that it is in the
> IB_QPS_RTS state.
>
>
> Without APM functionality it behaves correctly:
> 1. Create an RC QP and load only the primary path
> (I was setting rnr_retry_count = 6, retry_count = 6, and the
> packet_life_time field of struct ib_sa_path_rec to 15; I also tried
> 12)
> 2. Send some traffic over the RC QP
> 3. Disconnect the cable belonging to the primary path
> 4. Obviously traffic stops since there are no paths to the
> destination. For the outstanding WRs in the RC QP I do get a callback
> from the verbs layer reporting that the first WR failed with
> IB_WC_RETRY_EXC_ERR, and for all other WRs I get IB_WC_WR_FLUSH_ERR.
> I then close this RC QP.
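To make the expected (non-APM) behavior concrete, here is a minimal
sketch of how the completions could be tallied to decide when to close
the QP. It is self-contained and uses stand-in status codes
(enum sketch_wc_status) in place of the real enum ib_wc_status; the
struct and both function names are hypothetical and exist only for this
illustration.

```c
#include <stddef.h>

/* Stand-in codes for this sketch only; real code would use the
 * IB_WC_* values of enum ib_wc_status from the verbs headers. */
enum sketch_wc_status {
    SK_WC_SUCCESS,
    SK_WC_RETRY_EXC_ERR,
    SK_WC_WR_FLUSH_ERR
};

/* Counts of each completion kind seen on one QP. */
struct sketch_summary {
    size_t ok;
    size_t retry_exceeded;
    size_t flushed;
};

/* Tally the statuses of n completions. */
static struct sketch_summary
sketch_summarize(const enum sketch_wc_status *st, size_t n)
{
    struct sketch_summary s = { 0, 0, 0 };
    size_t i;

    for (i = 0; i < n; i++) {
        switch (st[i]) {
        case SK_WC_SUCCESS:       s.ok++;             break;
        case SK_WC_RETRY_EXC_ERR: s.retry_exceeded++; break;
        case SK_WC_WR_FLUSH_ERR:  s.flushed++;        break;
        }
    }
    return s;
}

/* A retry-exceeded completion means the path is dead: close the QP. */
static int sketch_should_close_qp(const struct sketch_summary *s)
{
    return s->retry_exceeded > 0;
}
```

In the working case above, the tally would show one retry-exceeded
completion followed by flush errors for the remaining WRs. In the APM
case no completions arrive at all, so a check like this never fires.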