[openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport - IWCM workaround for ip_dev_find() bug.

Steve Wise swise at opengridcomputing.com
Tue Feb 6 12:24:43 PST 2007


> > > 
> > > Can we just backport our own version of ip_dev_find()?  We had this once before 
> > > in svn when they removed it from being exported from the kernel.
> > 
> > Yes, this is in kernel_addons for 2.6.19 or something like that.
> > Just copy from there, much cleaner than the patch.
> > 	
> 
> I just realized that ip_dev_find() is being redefined to xxx_ip_dev_find
> for sles9sp3.  So maybe this function is causing the error.  Stay tuned.

xxx_ip_dev_find() is returning the wrong interface (sometimes).  I added
printks to xxx_ip_dev_find() and then ran rping -s -a <local ip addr>.
It failed because xxx_ip_dev_find() returned the loopback device instead
of my eth device.

Here is the function with printks:

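static inline struct net_device *xxx_ip_dev_find(u32 addr)
{
        struct net_device *dev;
        u32 ip;

        read_lock(&dev_base_lock);
        printk("%s looking for dev with addr %x\n", __FUNCTION__, addr);
        /* Walk every registered net_device in dev_base order. */
        for (dev = dev_base; dev; dev = dev->next) {
                /* Address the kernel would select for this device at link scope. */
                ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
                printk("%s dev %p name %s ipaddr %x\n", __FUNCTION__,
                        dev, dev->name, ip);
                if (ip == addr) {
                        /* Take a reference; the caller must dev_put() it. */
                        dev_hold(dev);
                        break;
                }
        }
        read_unlock(&dev_base_lock);

        /* dev is NULL here if no device matched. */
        return dev;
}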


Here is the printk log showing loopback being returned:

xxx_ip_dev_find looking for dev with addr 8846a8c0
xxx_ip_dev_find dev ffffffff804000e0 name lo ipaddr 8846a8c0

The address bound to eth3 is 192.168.70.136 (0xc0a84688; the printks
show it as 8846a8c0 because it is in network byte order).  For some
reason, this line:

                ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);

returns the 192.168.70.136 address for device->name == "lo".

Riddle me that!

Also, sometimes it works, because inet_select_addr() returns some other
address assigned to the local system for loopback instead of my RDMA
address.  For example, I booted the SLES9 SP3 system with a rebuilt
kernel and no OFED modules installed.  The system got 10.10.0.136 via
DHCP on its "public" interface.  I then built, installed, and loaded
the OFED modules and configured my RNIC interface with 192.168.70.136.
I ran rping bound to that local IP address and it worked.  The log
showed that inet_select_addr() returned 10.10.0.136 for loopback, so
xxx_ip_dev_find() kept walking the list and found the correct Ethernet
interface.  I then rebooted, ran the test again, and it failed.  So I
think the module load order somehow affects this.

grrrr.


Steve.





