[ofa-general] Missing IB_EVENT_PATH_MIG events

Dotan Barak dotanb at dev.mellanox.co.il
Mon Oct 15 23:15:06 PDT 2007


Hi.

lbt wrote:
> Hi,
>
> I'm trying out APM with OFED 1.2, using a Mellanox dual-port HCA 
> (ib_mthca driver).  When I have several RC QPs that I am trying to 
> migrate (software-triggered migration using ib_modify_qp), I've 
> noticed that sometimes 1 or 2 of the remote QPs never generate an 
> IB_EVENT_PATH_MIG or even an IB_EVENT_PATH_MIG_ERR ... it seems that 
> it just gets lost. I looked through some of the ib_mthca patches in 
> http://git.kernel.org/?p=linux/kernel/git/roland/infiniband.git and 
> incorporated the mmiowb patch for ib_mthca commands 
> (http://git.kernel.org/?p=linux/kernel/git/roland/infiniband.git;a=commit;h=76d7cc0345a037e8eea426f8abc710abd22946dd).
> But I am still seeing the same issue. I have a test case that repeats 
> software-triggered migrations + rearming in a loop, and this problem 
> usually occurs in the first few cycles, but is not too frequent. If 
> anyone has any ideas on what might be wrong, or tips on where I can 
> look or what I can do to debug this, that would be very much appreciated!
>
> For example, this is the console output I will see (printed out by our 
> rcqp event handler):
> On the local end - initiates software triggered migration, using 
> ib_modify_qp:
> Event IB_EVENT_PATH_MIG occurred on QP#1043
> Event IB_EVENT_PATH_MIG occurred on QP#1040
> Event IB_EVENT_PATH_MIG occurred on QP#1033
>
> On the remote end:
> Event IB_EVENT_PATH_MIG occurred on QP#1040
> Event IB_EVENT_PATH_MIG occurred on QP#1043

Is the timeout value (in the QP attributes) 0?
If the answer is no, can you please supply some more details on this?
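
Some context on why the value 0 matters: as far as I know, the InfiniBand
spec encodes the local ACK timeout in the QP attributes as a 5-bit
exponent. The wait time is 4.096 usec * 2^timeout, and 0 means the QP
waits for an ACK indefinitely, so a lost packet may never show up as an
error. A minimal sketch of that decoding follows. The helper name
ack_timeout_usec is hypothetical and is not part of the verbs API.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of any verbs API): decode the 5-bit
 * local ACK timeout field of the QP attributes into microseconds.
 * Returns a negative value when the field is 0, which is assumed here
 * to mean "no timeout, wait forever".
 */
static double ack_timeout_usec(uint8_t timeout)
{
    timeout &= 0x1f;            /* only the low 5 bits are meaningful */
    if (timeout == 0)
        return -1.0;            /* infinite wait */
    /* 4.096 usec * 2^timeout */
    return 4.096 * (double)(1ULL << timeout);
}
```

For example, a timeout field of 14 decodes to roughly 67 msec, while 0
gives the "wait forever" result.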


thanks
Dotan


