[Openib-windows] A crash handling IB_PNP_CA_ADD event

Fabian Tillier ftillier at silverstorm.com
Sun Jun 11 23:03:19 PDT 2006


Hi Leo,

On 6/11/06, Leonid Keller <leonid at mellanox.co.il> wrote:
> Hi Fab,
>
> The crash happens while handling IB_PNP_CA_ADD event. IBAL propagates
> this event to a registrant - not clear, which - but the latter returns
> for some reason an error, and IBAL fails upon handling that: context_map
> and map_item seemed to be damaged.
> Do you have some idea ?

It looks like the error is happening while generating the port events
(PORT_ADD, followed by any necessary state changes).

The code does not handle a duplicate insertion into the context map -
for example if two ports of a device have the same GUID.  In that
case, the item is *not* in the map, which could explain the crash when
the item is subsequently removed.

Note that your line numbers in the stack trace seem to be 5 lines off
(early) - are you using the latest code?  Any idea where the
discrepancy might be coming from?

You can check to see if the item was already in the map by putting a
breakpoint in pnp_create_context, right on the line that inserts the
context in the map, and compare the returned p_item (should be in rax
if you're using a free build) to &p_context->map_item.  If they're not
the same, we have a problem.
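
The check described above can be sketched in isolation.  This is a
minimal, self-contained toy, not the IBAL code: toy_map_t,
toy_map_insert and toy_create_context are hypothetical names, and the
map is just a singly linked list.  It assumes only the behavior
described above, namely that an insert with a duplicate key leaves the
map unchanged and returns the item already stored under that key, so
the caller can compare the returned pointer to its own map item.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the map types; not the real definitions. */
typedef struct toy_map_item {
    struct toy_map_item *p_next;
    uint64_t key;
} toy_map_item_t;

typedef struct toy_map {
    toy_map_item_t *p_head;
} toy_map_t;

typedef struct toy_context {
    toy_map_item_t map_item;
    uint64_t guid;
} toy_context_t;

/*
 * Insert p_item under key.  If the key is already present, the map is
 * left unchanged and the existing item is returned instead of p_item.
 */
toy_map_item_t *toy_map_insert(toy_map_t *p_map, uint64_t key,
                               toy_map_item_t *p_item)
{
    toy_map_item_t *p_cur;

    assert(p_map != NULL && p_item != NULL);

    for (p_cur = p_map->p_head; p_cur != NULL; p_cur = p_cur->p_next) {
        if (p_cur->key == key)
            return p_cur;           /* duplicate: p_item was NOT inserted */
    }

    p_item->key = key;
    p_item->p_next = p_map->p_head;
    p_map->p_head = p_item;
    return p_item;
}

/* Returns 1 if the context was inserted, 0 if its GUID was a duplicate. */
int toy_create_context(toy_map_t *p_map, toy_context_t *p_context,
                       uint64_t guid)
{
    toy_map_item_t *p_item;

    p_context->guid = guid;
    p_item = toy_map_insert(p_map, guid, &p_context->map_item);

    /* Same comparison as the breakpoint check: returned item vs. ours. */
    return p_item == &p_context->map_item;
}
```

When toy_create_context returns 0, the context's map item was never
linked into the map, so a later removal of that item would operate on
an item the map does not own.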

> Could it have happened if an HCA made registration the second time ?

Yes, that's also a possibility - any path by which duplicate GUIDs end
up being added could cause this problem.  There is no code that traps
for duplicate CAs being registered - they're maintained in a linked
list that doesn't care about duplicates.

I don't think we need to handle duplicate CA additions beyond just
assertions.  It's easy enough to add that - in the add_ci_ca call, add
an assert that find_ci_ca returns NULL.
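
A rough sketch of what such an assertion could look like, using a toy
linked list rather than the real code.  The names toy_ci_ca_t,
toy_find_ci_ca and toy_add_ci_ca are hypothetical, and the signatures
are assumptions made for illustration only; the actual add_ci_ca and
find_ci_ca are not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CA record kept in a singly linked list. */
typedef struct toy_ci_ca {
    struct toy_ci_ca *p_next;
    uint64_t guid;
} toy_ci_ca_t;

/* Return the CA with the given GUID, or NULL if none is registered. */
toy_ci_ca_t *toy_find_ci_ca(toy_ci_ca_t *p_head, uint64_t guid)
{
    toy_ci_ca_t *p_cur;

    for (p_cur = p_head; p_cur != NULL; p_cur = p_cur->p_next) {
        if (p_cur->guid == guid)
            return p_cur;
    }
    return NULL;
}

/* Add a CA to the list; in debug builds, trap a duplicate GUID. */
void toy_add_ci_ca(toy_ci_ca_t **pp_head, toy_ci_ca_t *p_ca)
{
    /* The list itself does not care about duplicates, so assert here. */
    assert(toy_find_ci_ca(*pp_head, p_ca->guid) == NULL);

    p_ca->p_next = *pp_head;
    *pp_head = p_ca;
}
```

The assertion costs nothing in a build that compiles assertions out,
which fits the intent of not handling duplicates beyond flagging them
during development.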

- Fab



