[openib-general] crash in mthca soon after loading drivers
Sean Hefty
mshefty at ichips.intel.com
Wed Dec 8 17:12:16 PST 2004
Sean Hefty wrote:
> I'm getting the following bug in mthca when loading the drivers (core,
> mad, and mthca). The system is attached to a fabric with opensm running
> on top of the Mellanox gold software stack. I hit this when running
> with the tip of openib. Any help would be, well, helpful.
>
> - Sean
>
>
> Dec 8 14:53:47 mshefty-linux2 kernel: kernel BUG at
> drivers/infiniband/hw/mthca/mthca_cmd.c:328!
I still need to spend more time investigating this, but looking at
mthca_cmd_wait():
    if (down_interruptible(&dev->cmd.event_sem))
        return -EINTR;

    spin_lock(&dev->cmd.context_lock);
    BUG_ON(dev->cmd.free_head < 0);
    context = &dev->cmd.context[dev->cmd.free_head];
    dev->cmd.free_head = context->next;
    spin_unlock(&dev->cmd.context_lock);

    ...snip...

    wait_for_completion(&context->done);

    ***** possible race here *****

    ...snip...

out:
    spin_lock(&dev->cmd.context_lock);
    context->next = dev->cmd.free_head;
    dev->cmd.free_head = context - dev->cmd.context;
    spin_unlock(&dev->cmd.context_lock);
There appears to be a race at the point marked above: event_sem can be
incremented (in mthca_cmd_complete()) before free_head has been updated.
A second call to mthca_cmd_wait() could then acquire the semaphore but
find the free list empty, which would trip the BUG_ON() above. In my
case, max_cmd is set to 1.
I need to verify whether this is indeed what is happening and, if so,
how to fix it.
- Sean