[ofw] issue with checkin# 3122

Uri Habusha urih at mellanox.co.il
Sun Jul 24 04:31:58 PDT 2011


Leonid, can you answer the questions below?

In general, the assert happens when disabling the mlx4_bus driver.

From: Smith, Stan [mailto:stan.smith at intel.com]
Sent: Friday, July 22, 2011 1:23 AM
To: Uri Habusha
Cc: ofw at lists.openfabrics.org; Leonid Keller; Tzachi Dar; Gilad Margalit; Benyahu Mizrahi
Subject: RE: issue with checkin# 3122

Thanks for the note, I'll look into it ASAP.
Some questions:
When you mention the low-level driver, are you speaking of the HCA driver?

Does the CL_ASSERT() fire if the registry entry ioc_poll_interval == 30000?

Is the system which CL_ASSERT()'ed attached to a fabric which has an IB target device (IOC/IOU)?

What user-level events occurred before the HCA was disabled? Anything interesting? Perhaps a 'devcon rescan'?

If p_ctx is NULL then the port object has been destroyed. The issue, I'm guessing at this point, might be a race between PNP and port_destroy()?
A missing lock call perhaps?
Will let you know.

Stan.

From: Uri Habusha [mailto:urih at mellanox.co.il]
Sent: Thursday, July 21, 2011 4:49 AM
To: Smith, Stan
Cc: ofw at lists.openfabrics.org; Leonid Keller; Tzachi Dar; Gilad Margalit; Benyahu Mizrahi
Subject: issue with checkin# 3122

Hi Stan,

I adopted your checkin# 3122 - IOC poll on demand.

When disabling the driver, an ASSERT pops up. The ASSERT is in the following code in the port_mgr_pnp_cb function:

    CL_ASSERT( p_ctx );    <== The problematic assert
    if (p_ctx)
    {
        p_bfi = p_ctx->p_bus_filter;
        CL_ASSERT( p_bfi );
        if (p_bfi->p_port_mgr->active_ports > 0)
            cl_atomic_dec( &p_bfi->p_port_mgr->active_ports );
    }
    port_mgr_port_remove( (ib_pnp_port_rec_t*)p_pnp_rec );
    break;

I noticed that port_mgr_port_remove also checks whether the ctx is valid, so I guess a NULL ctx is a known case that can happen. For now I removed the assert in our code.
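
To make the workaround concrete, here is a minimal, self-contained sketch of the accounting step with the assert dropped and the NULL checks kept. It is not the actual OFW code: the struct types, the function name port_remove_account, and the demo helper are hypothetical stand-ins, and a plain decrement replaces cl_atomic_dec() so the example builds outside the driver tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver types; only the fields used here. */
typedef struct port_mgr { int active_ports; } port_mgr_t;
typedef struct bus_filter { port_mgr_t *p_port_mgr; } bus_filter_t;
typedef struct port_pnp_ctx { bus_filter_t *p_bus_filter; } port_pnp_ctx_t;

/*
 * Accounting done before a port is removed, tolerating a NULL context.
 * Returns 1 if active_ports was decremented, 0 if the step was skipped.
 */
int port_remove_account(port_pnp_ctx_t *p_ctx)
{
    bus_filter_t *p_bfi;

    if (p_ctx == NULL)
        return 0;   /* context already gone: skip instead of asserting */

    p_bfi = p_ctx->p_bus_filter;
    if (p_bfi == NULL || p_bfi->p_port_mgr == NULL)
        return 0;   /* nothing to account against */

    if (p_bfi->p_port_mgr->active_ports > 0) {
        /* The driver uses cl_atomic_dec(); a plain decrement stands in here. */
        p_bfi->p_port_mgr->active_ports--;
        return 1;
    }
    return 0;
}

/* Exercises the NULL path and the normal path; returns the final count. */
int port_remove_account_demo(void)
{
    port_mgr_t mgr = { 1 };
    bus_filter_t bfi = { &mgr };
    port_pnp_ctx_t ctx = { &bfi };

    assert(port_remove_account(NULL) == 0);  /* NULL ctx is skipped */
    assert(port_remove_account(&ctx) == 1);  /* 1 -> 0 */
    assert(port_remove_account(&ctx) == 0);  /* never goes below zero */
    return mgr.active_ports;
}
```

This only shows that skipping the accounting on a NULL ctx is safe for the counter itself. It does not address why the ctx is NULL in the first place.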

Please take a look at the code and see whether this is a valid fix (if so, please change the ofw code accordingly) or debug the issue. It happens when disabling/enabling the low-level driver.

Uri


Uri Habusha
Windows SW Development Lead

Mellanox Technologies
P.O. Box 586, Yokneam 20692
Israel


