[openib-general] Fwd: issues in ipoib

Michael S. Tsirkin mst at mellanox.co.il
Thu Aug 3 16:52:54 PDT 2006


Quoting r. Sean Hefty <sean.hefty at intel.com>:
> Subject: RE: Fwd: issues in ipoib
> 
> >> We don't know that sa_query->client is still valid here unless a
> >> reference was taken earlier.
> >
> >Yes we do.
> >Client must wait till all queries complete before unregistering.
> >
> >> I like this approach, but would like to see it expanded to track the
> >requests,
> >> to avoid duplicating this work in every client.  Unregistration would then
> >> cancel all outstanding queries issued by the user.
> >
> >I would not object to this on principle, but let's go there by small
> >steps - for now, let's get the API fixed and solve the race with module
> >unloading that we have for 2.6.18. OK?
> 
> I don't see the need to rush for 2.6.18.  No one is hitting this problem,

Well, I know OFED 1.0 shipped with a different fix for this race
so that's one reason people do not complain :)
I don't think we should leave races that we know about and that
are easy to fix in development kernels - these are not -stable rules.

When there's a module unload crash at a customer's site, I
do not want to spend time trying to puzzle out whether
this could or could not be related to this window - I want
to have it covered, and think about something else.

> and it
> still applies to the ib_cm, ib_addr, and rdma_cm.
> 

This race only applies to ib_addr - the CM cleans up after itself,
flushing out callbacks when the CM ID is destroyed.
As you can see, the patch is a trivial fix - it's just too
late at night here to code more, but I will do this on Sunday.
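
To illustrate the rule that a client must wait until all its queries
complete before unregistering, here is a rough userspace sketch using
pthreads. It is not the patch and not kernel code: struct sa_client and
the sa_client_* helpers are hypothetical names made up for this example,
standing in for whatever ib_sa_register_client ends up using.

```c
#include <pthread.h>

/* Hypothetical stand-in for a registered SA client (illustration only). */
struct sa_client {
    pthread_mutex_t lock;
    pthread_cond_t  idle;     /* signalled when pending drops to zero */
    int             pending;  /* queries issued but not yet completed */
};

void sa_client_init(struct sa_client *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->idle, NULL);
    c->pending = 0;
}

/* Take a reference before issuing a query on behalf of the client. */
void sa_client_get(struct sa_client *c)
{
    pthread_mutex_lock(&c->lock);
    c->pending++;
    pthread_mutex_unlock(&c->lock);
}

/* Drop the reference as the last step of the query's completion callback. */
void sa_client_put(struct sa_client *c)
{
    pthread_mutex_lock(&c->lock);
    if (--c->pending == 0)
        pthread_cond_broadcast(&c->idle);
    pthread_mutex_unlock(&c->lock);
}

/* Unregister: block until every outstanding callback has finished, so
 * no callback can run after the caller (e.g. a module) goes away. */
void sa_client_unregister(struct sa_client *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->pending > 0)
        pthread_cond_wait(&c->idle, &c->lock);
    pthread_mutex_unlock(&c->lock);
}
```

In a kernel version one would presumably use an atomic reference count
plus a completion instead of a mutex and condition variable; the point is
only that unregister returns after the last put.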

Roland, what is your stance? Can this fix be merged for 2.6.18?

> re-working the patch in a week.  Tracking queries will require changes to the
> structure that are best hidden from the users.  Hiding those changes requires
> reworking the proposed ib_sa_register_client API to create and return struct
> ib_sa_client, rather than it being provided by the caller.

In-kernel APIs need not be stable.
Since the code is not even yet written, why try to anticipate
its needs now?

-- 
MST
