[openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -IWCM workaround for ip_dev_find() bug.
Michael S. Tsirkin
mst at mellanox.co.il
Tue Feb 6 12:32:54 PST 2007
> Quoting Steve Wise <swise at opengridcomputing.com>:
> Subject: Re: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -IWCM workaround for ip_dev_find() bug.
>
> > > >
> > > > Can we just backport our own version of ip_dev_find()? We had this once before
> > > > in svn when they removed it from being exported from the kernel.
> > >
> > > Yes, this is in kernel_addons for 2.6.19 or something like that.
> > > Just copy from there, much cleaner than the patch.
> > >
> >
> > I just realized that ip_dev_find() is being redefined to xxx_ip_dev_find
> > for sles9sp3. So maybe this function is causing the error. Stay tuned.
>
> xxx_ip_dev_find() is returning the wrong interface (sometimes). I added
> printks to xxx_ip_dev_find(). Then I ran rping -s -a <local ip addr>
> and it failed because xxx_ip_dev_find() returned loopback instead of my
> eth device.
>
> Here is the function with printks:
>
> static inline struct net_device *xxx_ip_dev_find(u32 addr)
> {
>         struct net_device *dev;
>         u32 ip;
>
>         read_lock(&dev_base_lock);
>         printk("%s looking for dev with addr %x\n", __FUNCTION__, addr);
>         for (dev = dev_base; dev; dev = dev->next) {
>                 ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
>                 printk("%s dev %p name %s ipaddr %x\n", __FUNCTION__,
>                        dev, dev->name, ip);
>                 if (ip == addr) {
>                         dev_hold(dev);
>                         break;
>                 }
>         }
>         read_unlock(&dev_base_lock);
>
>         return dev;
> }
>
>
> Here is the printk log showing loopback being returned:
>
> xxx_ip_dev_find looking for dev with addr 8846a8c0
> xxx_ip_dev_find dev ffffffff804000e0 name lo ipaddr 8846a8c0
>
> The address bound to eth3 is 192.168.70.136 (0xc0a84688). For some
> reason, this line:
>
>         ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
>
> returns the 192.168.70.136 address for dev->name == "lo".
>
> Riddle me that!
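
A note on reading the hex values in the log: 0xc0a84688 and 8846a8c0 are the same four bytes in opposite order, which is what one would expect when a network-order u32 is printed with %x on a little-endian CPU. Below is a minimal userspace sketch to check that arithmetic. The helper names ip_host_order() and ip_as_logged_le() are made up for illustration and do not exist in the kernel or in OFED.

```c
#include <stdint.h>
#include <arpa/inet.h>

/* Hypothetical helper: dotted quad -> value as normally written,
 * e.g. "192.168.70.136" -> 0xc0a84688. Returns 0 on a parse error. */
static uint32_t ip_host_order(const char *dotted)
{
        struct in_addr a;

        if (inet_pton(AF_INET, dotted, &a) != 1)
                return 0;
        return ntohl(a.s_addr);
}

/* Hypothetical helper: the number a "%x" would show if the
 * network-order bytes were loaded as a u32 on a little-endian CPU.
 * This is an explicit byte swap, so it gives the same result on
 * any host. */
static uint32_t ip_as_logged_le(const char *dotted)
{
        uint32_t h = ip_host_order(dotted);

        return ((h & 0x000000ffu) << 24) |
               ((h & 0x0000ff00u) << 8)  |
               ((h >> 8) & 0x0000ff00u)  |
               (h >> 24);
}
```

Calling ip_as_logged_le("192.168.70.136") gives 0x8846a8c0, matching the "ipaddr 8846a8c0" line in the log, so the address printed for lo really is the eth3 address.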
>
> Also, sometimes it works ok because the loopback interface gets some
> other ip address that is assigned to the local system as opposed to my
> rdma address. For example, I booted up the sles9sp3 system with a
> rebuilt kernel and no ofed modules installed. The system gets
> 10.10.0.136 via DHCP for its "public" interface. I then built the ofed
> modules and installed them. I then loaded them and configured my rnic
> interface with 192.168.70.136. I ran rping and bound to the local
> ipaddr and it worked. The log showed that inet_select_addr() returned
> 10.10.0.136 for loopback and thus xxx_ip_dev_find() continued walking
> the list and found the correct ethernet interface. I then rebooted and
> ran the test again and it failed. So somehow module load order affects
> this, I think.
>
> grrrr.
Try copying the inet_select_addr() source in from some upstream kernel
and look at what it does.
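
For what it is worth, my recollection of upstream inet_select_addr() (please verify against the actual source, this is an assumption) is that when the device passed in has no address within the requested scope, it falls back to scanning every device and returns the first suitable address it finds. The 127.0.0.1 address on lo has host scope, which is outside RT_SCOPE_LINK, so lo would take that fallback path. Here is a simplified userspace model of that behavior. The model_* types and functions are invented for illustration and are not kernel APIs; only the scope values mirror RT_SCOPE_*.

```c
#include <stddef.h>
#include <stdint.h>

/* Scope values mirror the kernel's RT_SCOPE_* constants. */
enum { SCOPE_UNIVERSE = 0, SCOPE_LINK = 253, SCOPE_HOST = 254 };

/* Hypothetical stand-ins for the kernel's address and device structs. */
struct model_addr {
        uint32_t local;         /* address, any consistent byte order */
        int scope;
};

struct model_dev {
        const char *name;
        const struct model_addr *addrs;
        size_t naddrs;
        const struct model_dev *next;
};

/* ASSUMED behavior of inet_select_addr(dev, 0, scope): prefer an
 * address on dev itself, otherwise fall back to any device. */
static uint32_t model_select_addr(const struct model_dev *base,
                                  const struct model_dev *dev, int scope)
{
        const struct model_dev *d;
        size_t i;

        for (i = 0; i < dev->naddrs; i++)
                if (dev->addrs[i].scope <= scope)
                        return dev->addrs[i].local;

        /* Fallback: first non-link-scope address on any device. */
        for (d = base; d; d = d->next)
                for (i = 0; i < d->naddrs; i++)
                        if (d->addrs[i].scope != SCOPE_LINK &&
                            d->addrs[i].scope <= scope)
                                return d->addrs[i].local;
        return 0;
}

/* Same loop shape as xxx_ip_dev_find() in the quoted mail. */
static const struct model_dev *
model_xxx_dev_find(const struct model_dev *base, uint32_t addr)
{
        const struct model_dev *d;

        for (d = base; d; d = d->next)
                if (model_select_addr(base, d, SCOPE_LINK) == addr)
                        return d;
        return NULL;
}

/* Alternative: match only addresses actually assigned to a device. */
static const struct model_dev *
model_strict_dev_find(const struct model_dev *base, uint32_t addr)
{
        const struct model_dev *d;
        size_t i;

        for (d = base; d; d = d->next)
                for (i = 0; i < d->naddrs; i++)
                        if (d->addrs[i].local == addr)
                                return d;
        return NULL;
}
```

With a device list of lo followed by eth3, model_xxx_dev_find() returns lo for the eth3 address, as in the failing run. With lo, then a device holding 10.10.0.136, then eth3, lo picks up 10.10.0.136 and the walk reaches eth3, as in the working run. If the real function behaves this way, a backport that compares against each device's own address list, along the lines of model_strict_dev_find(), would not depend on device order.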
--
MST