[openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -IWCM workaround for ip_dev_find() bug.

Michael S. Tsirkin mst at mellanox.co.il
Tue Feb 6 12:32:54 PST 2007


> Quoting Steve Wise <swise at opengridcomputing.com>:
> Subject: Re: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -IWCM workaround for ip_dev_find() bug.
> 
> > > > 
> > > > Can we just backport our own version of ip_dev_find()?  We had this once before 
> > > > in svn when they removed it from being exported from the kernel.
> > > 
> > > Yes, this is in kernel_addons for 2.6.19 or something like that.
> > > Just copy from there, much cleaner than the patch.
> > >
> > 
> > I just realized that ip_dev_find() is being redefined to xxx_ip_dev_find
> > for sles9sp3.  So maybe this function is causing the error.  Stay tuned.
> 
> xxx_ip_dev_find() is returning the wrong interface (sometimes).  I added
> printks to xxx_ip_dev_find().  Then I ran rping -s -a <local ip addr>
> and it failed because xxx_ip_dev_find() returned loopback instead of my
> eth device.  
> 
> Here is the function with printks:
> 
> static inline struct net_device *xxx_ip_dev_find(u32 addr)
> {
>         struct net_device *dev;
>         u32 ip;
> 
>         read_lock(&dev_base_lock);
>         printk("%s looking for dev with addr %x\n", __FUNCTION__, addr);
>         for (dev = dev_base; dev; dev = dev->next) {
>                 ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
>                 printk("%s dev %p name %s ipaddr %x\n", __FUNCTION__,
>                         dev, dev->name, ip);
>                 if (ip == addr) {
>                         dev_hold(dev);
>                         break;
>                 }
>         }
>         read_unlock(&dev_base_lock);
> 
>         return dev;
> }
> 
> 
> Here is the printk log showing loopback being returned:
> 
> xxx_ip_dev_find looking for dev with addr 8846a8c0
> xxx_ip_dev_find dev ffffffff804000e0 name lo ipaddr 8846a8c0
> 
> The address bound to eth3 is 192.168.70.136 (0xc0a84688).  For some
> reason, this line:
> 
>                 ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
> 
> Returns the 192.168.70.136 address for device->name == "lo".
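
Side note on the numbers: the value in the log (8846a8c0) and the value
quoted for eth3 (0xc0a84688) are the same four bytes in opposite order.
That is what printing a network-order address with %x gives on a
little-endian host (assumed here; the pointer in the log looks like
x86_64). So the log really does show 192.168.70.136 being returned for
lo. A small userspace check, with made-up demo_* names:

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t demo_swap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Return octet i (0 = leftmost) of an IPv4 address held in host order. */
static unsigned int demo_octet(uint32_t host_order, int i)
{
	return (host_order >> (24 - 8 * i)) & 0xffu;
}

/* Check that the logged value is 192.168.70.136 once byte-swapped. */
static void demo_check_log_value(void)
{
	uint32_t logged = 0x8846a8c0u;       /* as printed by the printk */
	uint32_t host = demo_swap32(logged); /* little-endian host assumed */

	assert(host == 0xc0a84688u);
	assert(demo_octet(host, 0) == 192);
	assert(demo_octet(host, 1) == 168);
	assert(demo_octet(host, 2) == 70);
	assert(demo_octet(host, 3) == 136);
}
```
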
> 
> Riddle me that!
> 
> Also, sometimes it works ok because the loopback interface gets some
> other ip address that is assigned to the local system as opposed to my
> rdma address.  For example, I booted up the sles9sp3 system with a
> rebuilt kernel and no ofed modules installed.  The system gets
> 10.10.0.136 via DHCP for its "public" interface.  I then built the ofed
> modules and installed them.  I then loaded them and configured my rnic
> interface with 192.168.70.136.  I ran rping and bound to the local
> ipaddr and it worked.  The log showed that inet_select_addr() returned
> 10.10.0.136 for loopback and thus xxx_ip_dev_find() continued walking
> the list and found the correct ethernet interface.  I then rebooted and
> ran the test again and it failed.  So somehow module load order affects
> this, I think.
> 
> grrrr.


Try copying the inet_select_addr() source in from some upstream kernel,
and take a look at that.
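
A possible explanation, sketched as a userspace model rather than real
kernel code: upstream inet_select_addr() first looks at the device's own
primary addresses and skips any whose scope is narrower than the one
requested. 127.0.0.1 has host scope, so with RT_SCOPE_LINK nothing on lo
qualifies. It then falls back to walking the whole device list and
returns the first non-link-scope address it finds. If the sles9sp3
kernel behaves the same way (an assumption, not verified here), the
answer for lo depends on device list order, which would fit the
"sometimes it works" symptom. All model_* names below are made up for
illustration, each device carries a single address for simplicity, and
matching on the device's own address is only one possible workaround,
not necessarily what kernel_addons does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scope values mirror RT_SCOPE_* in <linux/rtnetlink.h>; larger = narrower. */
enum { MODEL_SCOPE_UNIVERSE = 0, MODEL_SCOPE_LINK = 253, MODEL_SCOPE_HOST = 254 };

#define MODEL_LO_ADDR   0x7f000001u /* 127.0.0.1 */
#define MODEL_DHCP_ADDR 0x0a0a0088u /* 10.10.0.136 */
#define MODEL_RNIC_ADDR 0xc0a84688u /* 192.168.70.136 */

/* Hypothetical stand-in for struct net_device with one primary address. */
struct model_dev {
	const char *name;
	uint32_t addr;
	int scope;
	struct model_dev *next;
};

static struct model_dev model_lo   = { "lo",   MODEL_LO_ADDR,   MODEL_SCOPE_HOST,     NULL };
static struct model_dev model_eth0 = { "eth0", MODEL_DHCP_ADDR, MODEL_SCOPE_UNIVERSE, NULL };
static struct model_dev model_eth3 = { "eth3", MODEL_RNIC_ADDR, MODEL_SCOPE_UNIVERSE, NULL };

/* Link the three devices with lo first; rnic_first puts eth3 ahead of eth0. */
static struct model_dev *model_link(int rnic_first)
{
	struct model_dev *second = rnic_first ? &model_eth3 : &model_eth0;
	struct model_dev *third  = rnic_first ? &model_eth0 : &model_eth3;

	model_lo.next = second;
	second->next = third;
	third->next = NULL;
	return &model_lo;
}

/* Assumed behavior of inet_select_addr(dev, 0, scope). */
static uint32_t model_select_addr(const struct model_dev *base,
				  const struct model_dev *dev, int scope)
{
	/* Pass 1: the device's own address, unless its scope is too narrow. */
	if (dev->scope <= scope)
		return dev->addr;

	/* Pass 2: first non-link-scope address on any device in the list. */
	for (dev = base; dev; dev = dev->next)
		if (dev->scope != MODEL_SCOPE_LINK && dev->scope <= scope)
			return dev->addr;
	return 0;
}

/* Same loop shape as xxx_ip_dev_find() in the quoted mail. */
static const struct model_dev *model_buggy_find(const struct model_dev *base,
						uint32_t addr)
{
	const struct model_dev *dev;

	for (dev = base; dev; dev = dev->next)
		if (model_select_addr(base, dev, MODEL_SCOPE_LINK) == addr)
			break;
	return dev;
}

/* Possible workaround: compare against the device's own address only. */
static const struct model_dev *model_direct_find(const struct model_dev *base,
						 uint32_t addr)
{
	const struct model_dev *dev;

	for (dev = base; dev; dev = dev->next)
		if (dev->addr == addr)
			break;
	return dev;
}

/* Usage: reproduces both outcomes described in the quoted mail. */
static void model_selfcheck(void)
{
	/* eth0 listed before eth3: lo resolves to 10.10.0.136, lookup works. */
	assert(model_buggy_find(model_link(0), MODEL_RNIC_ADDR) == &model_eth3);
	/* eth3 listed before eth0: lo resolves to the rnic address, lo wins. */
	assert(model_buggy_find(model_link(1), MODEL_RNIC_ADDR) == &model_lo);
	/* Matching on the device's own address is order-independent. */
	assert(model_direct_find(model_link(1), MODEL_RNIC_ADDR) == &model_eth3);
}
```

In the real kernel the direct match would presumably walk the device's
in_device address list under the appropriate locking; the model leaves
all of that out.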

-- 
MST



