[ofa-general] Re: Kernel panic in IPoIB stability testing

Jack Morgenstein jackm at dev.mellanox.co.il
Tue Feb 3 23:20:25 PST 2009


On Wednesday 04 February 2009 08:46, Jack Morgenstein wrote:
> On Tuesday 03 February 2009 19:56, Yossi Etigin wrote:
> > I think it comes from unicast_arp_send.
> > Consider this scenario:
> > - paths are flushed (opensm up/down).
> > - unicast_arp_send() is called with a path in priv->path_list. path->valid is 0.
> > - path_rec_start() fails with -EAGAIN (-11) because alloc_mad() fails - no sm ah (yet)
> >   (see the prints just before the panic).
> > - unicast_arp_send calls() path_free().
> > - path memory is overwritten.
> > - __ipoib_dev_flush() is called again.
> > - mark_paths_invalid() tries to iterate over priv->path_list and gets kernel panic
> >   because path->list became invalid.
> 
> I think you are right.
How about this:
	path = __path_find(dev, phdr->hwaddr + 4);
	if (!path || !path->valid) {
		int had_path = 0;
		if (!path)
			path = path_rec_create(dev, phdr->hwaddr + 4);
		else
			had_path = 1;
		if (path) {
			/* put pseudoheader back on for next time */
			skb_push(skb, sizeof *phdr);
			__skb_queue_tail(&path->queue, skb);

			if (path_rec_start(dev, path)) {
				if (had_path) {
					list_del(&path->list);
					rb_erase(&path->rb_node,
						 &priv->path_tree);
				}
				spin_unlock(&priv->lock);
				path_free(dev, path);
				return;
			} else if (!had_path)
				__path_add(dev, path);
		} else {
			++dev->stats.tx_dropped;
			dev_kfree_skb_any(skb);
		}

		spin_unlock(&priv->lock);
		return;
	}

- Jack



More information about the general mailing list