[ewg] Re: Possible process deadlock in RMPP flow

Eli Cohen eli at dev.mellanox.co.il
Tue Oct 20 00:48:59 PDT 2009


On Mon, Oct 19, 2009 at 01:30:47PM -0700, Sean Hefty wrote:
> 
> I can't find anything off in the code for this.  It's odd, since
> unregister_mad_agent() does:
> 
>         flush_workqueue(port_priv->wq);
>         ib_cancel_rmpp_recvs(mad_agent_priv);
> 
> and ib_cancel_rmpp_recvs() does:
> 
>         spin_lock_irqsave(&agent->lock, flags);
>         list_for_each_entry(rmpp_recv, &agent->rmpp_list, list) {
>                 cancel_delayed_work(&rmpp_recv->timeout_work);
>                 cancel_delayed_work(&rmpp_recv->cleanup_work);
>         }
>         spin_unlock_irqrestore(&agent->lock, flags);
> 
>         flush_workqueue(agent->qp_info->port_priv->wq);
> 
> which basically just flushes the same work queue.
> 
> I haven't been able to reproduce the problem, but I'm running the latest kernel
> - not sure that matters in this case.  Does ibnetdiscover just hang forever at
> the end of the test when this occurs?  Is there any more information available?
> 

We are checking whether the problem is a firmware bug; it looks like it is.
Once we have verified this, I will send an update.


