[ofw] issue with checkin# 3122

Smith, Stan stan.smith at intel.com
Fri Aug 19 15:09:24 PDT 2011


Uri,
  I was unable to reproduce the DEBUG ASSERT() firing with ConnectX hardware [enable/disable, shutdown cycles] or Mthca hardware.
The offending ASSERT() was removed to ensure forward progress for all; the PNP port_remove() race condition is still not fully understood.

Stan.

Revision: 3267
Author: stansmith
Date: 3:00:15 PM, Friday, August 19, 2011
Message:
[BUS] removed DBG ASSERT on Uri's request as it fires during DBG version WHQL testing; I was unable to get the ASSERT to fire on ConnectX disable or shutdown. There is a port destruction race condition here.
Call to port_remove() skipped if !p_ctx as port_remove itself checks for null p_ctx.
----
Modified : /gen1/trunk/core/bus/kernel/bus_port_mgr.c


From: Uri Habusha [mailto:urih at mellanox.co.il]
Sent: Thursday, July 21, 2011 4:49 AM
To: Smith, Stan
Cc: ofw at lists.openfabrics.org; Leonid Keller; Tzachi Dar; Gilad Margalit; Benyahu Mizrahi
Subject: issue with checkin# 3122

Hi Stan,

I adopted your checkin# 3122 - IOC poll on demand.

When disabling the driver, an ASSERT pops up. The ASSERT is in the following code in the port_mgr_pnp_cb function:

    CL_ASSERT( p_ctx );    <== The problematic assert
    if (p_ctx)
    {
        p_bfi = p_ctx->p_bus_filter;
        CL_ASSERT( p_bfi );
        if (p_bfi->p_port_mgr->active_ports > 0)
            cl_atomic_dec( &p_bfi->p_port_mgr->active_ports );
    }
    port_mgr_port_remove( (ib_pnp_port_rec_t*)p_pnp_rec );
    break;

I noticed that port_mgr_port_remove itself checks whether the ctx is valid, so I guess a null ctx is a known case that can happen. For now I removed the assert in our code.
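
The null-tolerant handling described above can be sketched roughly as follows. This is only an illustration: the types and the handle_port_remove() name are hypothetical stand-ins, not the actual ofw structures, and a plain decrement stands in for cl_atomic_dec().

```c
#include <stddef.h>

/* Hypothetical stand-ins for the bus driver's structures. */
typedef struct port_mgr {
    int active_ports;
} port_mgr_t;

typedef struct bus_filter {
    port_mgr_t *p_port_mgr;
} bus_filter_t;

typedef struct port_ctx {
    bus_filter_t *p_bus_filter;
} port_ctx_t;

/*
 * Handle a port-remove event whose context may already be gone.
 * Returns 1 if the active-port count was decremented, 0 otherwise.
 */
static int handle_port_remove(port_ctx_t *p_ctx)
{
    bus_filter_t *p_bfi;

    if (p_ctx == NULL)
        return 0;  /* context already torn down; nothing to account for */

    p_bfi = p_ctx->p_bus_filter;
    if (p_bfi == NULL || p_bfi->p_port_mgr == NULL)
        return 0;  /* partially torn-down context; skip the accounting */

    if (p_bfi->p_port_mgr->active_ports > 0) {
        /* Assumption: the real driver would use cl_atomic_dec() here. */
        p_bfi->p_port_mgr->active_ports--;
        return 1;
    }
    return 0;
}
```

With this shape, a remove event that arrives with no context is simply ignored rather than treated as a fatal condition. It does not explain why the context is null in the first place.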

Please take a look at the code and see whether this is a valid fix (if so, please change the ofw code accordingly), or debug the issue. It happens when disabling/enabling the low-level driver.

Uri


Uri Habusha
Windows SW Development Lead

Mellanox Technologies
P.O. Box 586, Yokneam 20692
Israel


