[ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen

Thu Apr 17 10:14:43 PDT 2008

On Thu, Apr 17, 2008 at 11:36:42AM -0500, Robin Holt wrote:
> In this case, we are not making the call to unregister, we are waiting
> for the _release callout which has already removed it from the list.
> 
> In the event that the user has removed all the grants, we use unregister.
> That typically does not occur.  We merely wait for exit processing to
> clean up the structures.

Then it's very strange. LIST_POISON1 is set in n->next. If it was a
second hlist_del triggering the bug in theory list_poison2 should
trigger first, so perhaps it's really a notifier running despite a
mm_lock is taken? Could you post a full stack trace so I can see who's
running into LIST_POISON1? If it's really a notifier running outside
of some mm_lock that will be _immediately_ visible from the stack
trace that triggered the LIST_POISON1!

Also note, EMM isn't using the clean hlist_del, it's implementing list
by hand (with zero runtime gain) so all the debugging may not be
existent in EMM, so if it's really a mm_lock race, and it only
triggers with mmu notifiers and not with EMM, it doesn't necessarily
mean EMM is bug free. If you've a full stack trace it would greatly
help to verify what is mangling over the list when the oops triggers.

Thanks!
Andrea