[Openib-windows] A crash handling IB_PNP_CA_ADD event
Fabian Tillier
ftillier at silverstorm.com
Sun Jun 11 23:03:19 PDT 2006
Hi Leo,
On 6/11/06, Leonid Keller <leonid at mellanox.co.il> wrote:
> Hi Fab,
>
> The crash happens while handling the IB_PNP_CA_ADD event. IBAL propagates
> this event to a registrant (it's not clear which one), the registrant
> returns an error for some reason, and IBAL crashes while handling that
> error: context_map and map_item appear to be damaged.
> Do you have any idea?
It looks like the error is happening while generating the port events
(PORT_ADD, followed by any necessary state changes).
The code does not handle a duplicate insertion into the context map -
for example, if two ports of a device have the same GUID. In that
case, the new item is *not* inserted into the map, which could explain
the crash when the code later tries to remove it.
Note that the line numbers in your stack trace seem to be 5 lines off
(too early) - are you using the latest code? Any idea where the
discrepancy might come from?
You can check whether the item was already in the map by putting a
breakpoint in pnp_create_context on the line that inserts the context
into the map, and comparing the returned p_item (it should be in rax
if you're using a free build) to &p_context->map_item. If they're not
the same, we have a problem.
> Could it have happened if an HCA registered a second time?
Yes, that's also a possibility - anything that results in duplicate
GUIDs being added could cause this problem. There is no code that
traps duplicate CA registrations - CAs are maintained in a linked
list that doesn't check for duplicates.
I don't think we need to handle duplicate CA additions beyond adding
assertions. That's easy enough - in add_ci_ca, assert that find_ci_ca
returns NULL.
- Fab