[ofa-general] Possible bug of inkernel OFED of RHEL5.3?

Liang Zhen Zhen.Liang at Sun.COM
Thu Apr 23 20:06:28 PDT 2009


Hi there,
I've already posted this to rhel5-list, but I'm not sure that was the right
place, so I'm posting it here as well.

We hit these two assertions while running the in-kernel OFED of RHEL 5.3:

Apr 15 08:06:24 cl8-0 kernel: RTNL: assertion failed at net/core/fib_rules.c (388)
Apr 15 08:06:24 cl8-0 kernel:
Apr 15 08:06:24 cl8-0 kernel: Call Trace:
Apr 15 08:06:24 cl8-0 kernel: [<ffffffff802288ee>] fib_rules_event+0x3d/0xff
Apr 15 08:06:24 cl8-0 kernel: [<ffffffff80066f1f>] notifier_call_chain+0x20/0x32
Apr 15 08:06:24 cl8-0 kernel: [<ffffffff8021ba64>] dev_set_mtu+0x5a/0x60
Apr 15 08:06:24 cl8-0 kernel: [<ffffffff88446bb5>] :ib_ipoib:set_mode+0x94/0x134
Apr 15 08:06:24 cl8-0 kernel: [<ffffffff80106b0a>] sysfs_write_file+0xb9/0xe8
Apr 15 08:06:24 cl8-0 kernel: [<ffffffff8001659e>] vfs_write+0xce/0x174
Apr 15 08:06:24 cl8-0 kernel: [<ffffffff80016e6b>] sys_write+0x45/0x6e
Apr 15 08:06:24 cl8-0 kernel: [<ffffffff8005d116>] system_call+0x7e/0x83
Apr 15 08:06:24 cl8-0 kernel:
Apr 15 08:06:24 cl8-0 kernel: RTNL: assertion failed at net/ipv4/devinet.c (986)
Apr 15 08:06:24 cl8-0 kernel:
Apr 15 08:06:24 cl8-0 kernel: Call Trace:
Apr 15 08:06:24 cl8-0 kernel: [<ffffffff8024e80c>] inetdev_event+0x48/0x282
Apr 15 08:06:24 cl8-0 kernel: [<ffffffff80066f1f>] notifier_call_chain+0x20/0x32
Apr 15 08:06:24 cl8-0 kernel: [<ffffffff8021ba64>] dev_set_mtu+0x5a/0x60
Apr 15 08:06:24 cl8-0 kernel: [<ffffffff88446bb5>] :ib_ipoib:set_mode+0x94/0x134
Apr 15 08:06:24 cl8-0 kernel: [<ffffffff80106b0a>] sysfs_write_file+0xb9/0xe8
Apr 15 08:06:24 cl8-0 kernel: [<ffffffff8001659e>] vfs_write+0xce/0x174
Apr 15 08:06:24 cl8-0 kernel: [<ffffffff80016e6b>] sys_write+0x45/0x6e
Apr 15 08:06:24 cl8-0 kernel: [<ffffffff8005d116>] system_call+0x7e/0x83
Apr 15 08:06:24 cl8-0 kernel:

Looking into the code, I found this call chain:

sysfs_write_file()
  -> flush_write_buffer()
  -> store()
  -> ipoib_cm.c::set_mode()
  -> dev_set_mtu()
  -> raw_notifier_call_chain()
  -> notifier_call_chain()
  -> fib_rules_event()
  -> ASSERT_RTNL()

So ipoib_cm calls dev_set_mtu() without holding rtnl_lock, but the notifier
callbacks reached through dev_set_mtu() (fib_rules_event() and
inetdev_event() in the traces above) assert that the caller already holds it.
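
To make the locking contract concrete, here is a minimal userspace sketch of
the situation. It is not kernel code: every model_* name is hypothetical, the
RTNL mutex is reduced to a single-threaded flag, and the notifier chain is
assumed to contain only the two callbacks seen in the traces.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the RTNL mutex; a flag suffices single-threaded. */
static bool model_rtnl_held;
static int model_assert_failures;

static void model_rtnl_lock(void)   { model_rtnl_held = true; }
static void model_rtnl_unlock(void) { model_rtnl_held = false; }

/* Like the ASSERT_RTNL() seen in the log, this reports and carries on. */
static void model_assert_rtnl(const char *where)
{
	if (!model_rtnl_held) {
		model_assert_failures++;
		fprintf(stderr, "RTNL: assertion failed at %s\n", where);
	}
}

/* Notifier callbacks: each one expects the caller to hold the lock. */
static void model_fib_rules_event(void) { model_assert_rtnl("fib_rules_event"); }
static void model_inetdev_event(void)   { model_assert_rtnl("inetdev_event"); }

static void (*const model_netdev_chain[])(void) = {
	model_fib_rules_event,
	model_inetdev_event,
};

/* Changes the MTU, then walks the notifier chain, as dev_set_mtu() does. */
static void model_dev_set_mtu(int *mtu, int new_mtu)
{
	size_t i;
	size_t n = sizeof(model_netdev_chain) / sizeof(model_netdev_chain[0]);

	*mtu = new_mtu;
	for (i = 0; i < n; i++)
		model_netdev_chain[i]();
}

/* Models set_mode() before the patch; returns the assertion failures hit. */
int model_set_mode_unlocked(int *mtu, int new_mtu)
{
	int before = model_assert_failures;

	model_dev_set_mtu(mtu, new_mtu);
	return model_assert_failures - before;
}

/* Models set_mode() with the lock taken around the MTU change. */
int model_set_mode_locked(int *mtu, int new_mtu)
{
	int before = model_assert_failures;

	model_rtnl_lock();
	model_dev_set_mtu(mtu, new_mtu);
	model_rtnl_unlock();
	return model_assert_failures - before;
}
```

In this model, model_set_mode_unlocked() trips one assertion per notifier
(two in total, matching the two messages in the log), while
model_set_mode_locked() trips none. The MTU is updated in both cases, which
is why the problem shows up only as warnings.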

I think we may need the patch below. Could somebody confirm?

Thanks
Liang

--- drivers/infiniband/ulp/ipoib/ipoib_cm.c 2009-04-16 12:49:04.000000000 -0400
+++ drivers/infiniband/ulp/ipoib/ipoib_cm.c 2009-04-16 12:48:52.000000000 -0400
@@ -1481,7 +1481,9 @@ static ssize_t set_mode(struct class_dev
 		if (ipoib_cm_max_mtu(dev) > priv->mcast_mtu)
 			ipoib_warn(priv, "mtu > %d will cause multicast packet drops.\n",
 				   priv->mcast_mtu);
+		rtnl_lock();
 		dev_set_mtu(dev, ipoib_cm_max_mtu(dev));
+		rtnl_unlock();
 
 		ipoib_flush_paths(dev);
 		return count;


