[openib-general] [PATCH RFC ] ofed_1_2 simulate neighbour update events by snooping ARP packets

Steve Wise swise at opengridcomputing.com
Wed Jan 24 06:37:23 PST 2007


On Wed, 2007-01-24 at 12:14 +0200, Michael S. Tsirkin wrote:
> > Quoting Steve Wise <swise at opengridcomputing.com>:
> > Subject: [PATCH RFC ] ofed_1_2 simulate neighbour update events by snooping ARP packets
> > 
> > OFED/iWARP Developers,
> > 
> > Here is a proposal for supporting the minimum required neighbour update
> > event notifications needed for iwarp devices on the older kernels
> > supported by ofed.  
> > 
> > This patch is a request for comments.  Please review.  If you think it
> > looks ok, then I'll provide patches to all the various backports.
> > 
> > Steve
> 
> I am generally very positive about this, let's try to do this for OFED 1.2.
> Some comments on code:
> 
> > 2.6.17 backport: simulate neighbour update events by snooping ARP packets
> > 
> > Needed to support iWARP devices on backported kernels.  This also allows
> > using the current drivers/infiniband/core/addr.c which requires netevents
> > as well.
> > 
> > This patch rearranges things a bit:
> > 
> > - add the new file in the kernel_addons/backport dir for the ARP 
> >   snooping / netevent callout code.  This file is called 
> >   rdma_netevents.c.
> > 
> > - modify the kernel_patches/backports/2.6.17/linux_stuff* patch to 
> >   include rdma_netevents.c _and_ the netevent.c file into its own 
> >   module called rdma_ne
> 
> Maybe roll these two into a common netevent.c? Is there a reason not to?
> Are there kernels where you will want one of these but not the other?
> And the name is a bit confusing - nothing here is actually related to rdma in any way ...
> 

I kept them seperate because the netevent.c is pulled as-is from 2.6.20.
It does make sense to just add the stuff I put in rdma_netevent.c into
netevent.c and just have that one file.


> > - remove the backport patch to revert addr.c to snoop ARP packets. 
> > 
> > Signed-off-by: Steve Wise <swise at opengridcomputing.com>
> >
> >  .../backport/2.6.17/include/src/rdma_netevents.c   |   91 +++++++++++++++++++++++
> >  .../2.6.17/addr_1_netevents_revert_to_2_6_17.patch |   76 -------------------
> >  .../backport/2.6.17/linux_stuff_to_2_6_17.patch    |   13 ++-
> >  3 files changed, 99 insertions(+), 81 deletions(-)
> > 
> > diff --git a/kernel_addons/backport/2.6.17/include/src/rdma_netevents.c b/kernel_addons/backport/2.6.17/include/src/rdma_netevents.c
> > new file mode 100644
> > index 0000000..1e9422f
> > --- /dev/null
> > +++ b/kernel_addons/backport/2.6.17/include/src/rdma_netevents.c
> > @@ -0,0 +1,91 @@
> > +/*
> > + * Copyright (c) 2007 Open Grid Computing, Inc.  All rights reserved.
> > + * Copyright (c) 2007 Chelsio Communications, Inc.  All rights reserved.
> > + *
> > + * This Software is licensed under one of the following licenses:
> > + *
> > + * 1) under the terms of the "Common Public License 1.0" a copy of which is
> > + *    available from the Open Source Initiative, see
> > + *    http://www.opensource.org/licenses/cpl.php.
> > + *
> > + * 2) under the terms of the "The BSD License" a copy of which is
> > + *    available from the Open Source Initiative, see
> > + *    http://www.opensource.org/licenses/bsd-license.php.
> > + *
> > + * 3) under the terms of the "GNU General Public License (GPL) Version 2" a
> > + *    copy of which is available from the Open Source Initiative, see
> > + *    http://www.opensource.org/licenses/gpl-license.php.
> > + *
> > + * Licensee has the right to choose one of the above licenses.
> > + *
> > + * Redistributions of source code must retain the above copyright
> > + * notice and one of the license notices.
> > + *
> > + * Redistributions in binary form must reproduce both the above copyright
> > + * notice, one of the license notices in the documentation
> > + * and/or other materials provided with the distribution.
> > + *
> > + */
> > +
> > +/*
> > + * Simulate neighbour update netevents by snooping ARP packets.
> > + */
> > +
> > +#include <linux/if.h>
> > +#include <linux/netdevice.h>
> > +#include <linux/if_arp.h>
> > +
> > +#include <net/arp.h>
> > +#include <net/neighbour.h>
> > +#include <net/route.h>
> > +#include <net/netevent.h>
> > +
> > +MODULE_AUTHOR("Steve Wise");
> > +MODULE_DESCRIPTION("Netevent Notification Module");
> > +MODULE_LICENSE("Dual BSD/GPL");
> > +
> > +static int arp_recv(struct sk_buff *skb, struct net_device *dev,
> > +			 struct packet_type *pkt, struct net_device *dev2)
> > +{
> > +	struct arphdr *arp_hdr;
> > +	struct neighbour *n;
> > +	u8 *arp_ptr;
> > +	__be32 gw;
> > +	u16 op;
> > +
> > +	arp_hdr = (struct arphdr *) skb->nh.raw;
> > +	op = ntohs(arp_hdr->ar_op);
> > +
> > +	if (op == ARPOP_REQUEST || op == ARPOP_REPLY) {
> > +		arp_ptr = (u8 *)(arp_hdr + 1);	/* skip fixed-size arp header */
> 
> I think this is correct, but this looks weird because arp_hdr + 1
> is a pointer to an *invalid* arp header.

This is common practice for bumping past a fixed size header.  

> 
> I know arp_hdr + 1 does math in units of sizeof *arp_hdr, but just
> arp_ptr = skb->nh.raw + sizeof (struct arphdr) would much clearer -
> leave the pointer math for when there is an array.
> 
> And then you will not need a cast.
> 

Ok, if you think that is clearer, then I'll do it.

> 
> > +		arp_ptr += skb->dev->addr_len;	/* skip src ha */
> > +		memcpy(&gw, arp_ptr, 4);	/* pull the SPA */
> > +		n = neigh_lookup(&arp_tbl, &gw, skb->dev);
> > +		if (n) {
> > +			call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
> > +		}
> > +	}
> > +
> > +	kfree_skb(skb);
> > +	return 0;
> > +}
> > +
> > +static struct packet_type arp = {
> > +	.type = __constant_htons(ETH_P_ARP),
> > +	.func = arp_recv,
> > +	.af_packet_priv = (void *)1,
> > +};
> > +
> > +static int init(void)
> > +{
> > +	dev_add_pack(&arp);
> > +	return 0;
> > +}
> > +
> > +static void cleanup(void)
> > +{
> > +	dev_remove_pack(&arp);
> > +}
> > +
> > +module_init(init);
> > +module_exit(cleanup);
> > diff --git a/kernel_patches/backport/2.6.17/addr_1_netevents_revert_to_2_6_17.patch b/kernel_patches/backport/2.6.17/addr_1_netevents_revert_to_2_6_17.patch
> > deleted file mode 100644
> > index 316d8d2..0000000
> > --- a/kernel_patches/backport/2.6.17/addr_1_netevents_revert_to_2_6_17.patch
> > +++ /dev/null
> > @@ -1,76 +0,0 @@
> > -commit e795d092507d571d66f2ec98d3efdc7dd284bf80
> > -Author: Tom Tucker <tom at opengridcomputing.com>
> > -Date:   Sun Jul 30 20:44:19 2006 -0700
> > -
> > -    [NET] infiniband: Cleanup ib_addr module to use the netevents
> > -    
> > -    Signed-off-by: Tom Tucker <tom at opengridcomputing.com>
> > -    Signed-off-by: Steve Wise <swise at opengridcomputing.com>
> > -    Signed-off-by: David S. Miller <davem at davemloft.net>
> > -
> > -diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
> > -index 1205e80..d294bbc 100644
> > ---- a/drivers/infiniband/core/addr.c
> > -+++ b/drivers/infiniband/core/addr.c
> > -@@ -35,7 +35,6 @@ #include <linux/if_arp.h>
> > - #include <net/arp.h>
> > - #include <net/neighbour.h>
> > - #include <net/route.h>
> > --#include <net/netevent.h>
> > - #include <rdma/ib_addr.h>
> > - 
> > - MODULE_AUTHOR("Sean Hefty");
> > -@@ -327,22 +326,25 @@ void rdma_addr_cancel(struct rdma_dev_ad
> > - }
> > - EXPORT_SYMBOL(rdma_addr_cancel);
> > - 
> > --static int netevent_callback(struct notifier_block *self, unsigned long event, 
> > --	void *ctx)
> > -+static int addr_arp_recv(struct sk_buff *skb, struct net_device *dev,
> > -+			 struct packet_type *pkt, struct net_device *orig_dev)
> > - {
> > --	if (event == NETEVENT_NEIGH_UPDATE) {  
> > --		struct neighbour *neigh = ctx;
> > -+	struct arphdr *arp_hdr;
> > - 
> > --		if (neigh->dev->type == ARPHRD_INFINIBAND &&
> > --		    (neigh->nud_state & NUD_VALID)) {
> > --			set_timeout(jiffies);
> > --		}
> > --	}
> > -+	arp_hdr = (struct arphdr *) skb->nh.raw;
> > -+
> > -+	if (arp_hdr->ar_op == htons(ARPOP_REQUEST) ||
> > -+	    arp_hdr->ar_op == htons(ARPOP_REPLY))
> > -+		set_timeout(jiffies);
> > -+
> > -+	kfree_skb(skb);
> > - 	return 0;
> > - }
> > - 
> > --static struct notifier_block nb = {
> > --	.notifier_call = netevent_callback
> > -+static struct packet_type addr_arp = {
> > -+	.type           = __constant_htons(ETH_P_ARP),
> > -+	.func           = addr_arp_recv,
> > -+	.af_packet_priv = (void*) 1,
> > - };
> > - 
> > - static int addr_init(void)
> > -@@ -351,13 +353,13 @@ static int addr_init(void)
> > - 	if (!addr_wq)
> > - 		return -ENOMEM;
> > - 
> > --	register_netevent_notifier(&nb);
> > -+	dev_add_pack(&addr_arp);
> > - 	return 0;
> > - }
> > - 
> > - static void addr_cleanup(void)
> > - {
> > --	unregister_netevent_notifier(&nb);
> > -+	dev_remove_pack(&addr_arp);
> > - 	destroy_workqueue(addr_wq);
> > - }
> > - 
> > -
> > diff --git a/kernel_patches/backport/2.6.17/linux_stuff_to_2_6_17.patch b/kernel_patches/backport/2.6.17/linux_stuff_to_2_6_17.patch
> > index eb2285f..af7e814 100644
> > --- a/kernel_patches/backport/2.6.17/linux_stuff_to_2_6_17.patch
> > +++ b/kernel_patches/backport/2.6.17/linux_stuff_to_2_6_17.patch
> > @@ -5,20 +5,23 @@ index 0000000..58cf933
> >  +++ b/drivers/infiniband/core/genalloc.c
> >  @@ -0,0 +1 @@
> >  +#include "src/genalloc.c"
> > -diff --git a/drivers/infiniband/core/netevent.c b/drivers/infiniband/core/netevent.c
> > +diff --git a/drivers/infiniband/core/rdma_netevents.c b/drivers/infiniband/core/rdma_netevents.c
> >  new file mode 100644
> >  index 0000000..58cf933
> >  --- /dev/null
> > -+++ b/drivers/infiniband/core/netevent.c
> > -@@ -0,0 +1 @@
> > ++++ b/drivers/infiniband/core/rdma_netevents.c
> > +@@ -0,0 +1,2 @@
> >  +#include "src/netevent.c"
> > ++#include "src/rdma_netevents.c"
> 
> This is slightly ugly. Let's have an object file per .c file.
> Or just merge the two .c files together?
> 

ok.


> >  diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
> >  index 50fb1cd..456bfd0 100644
> >  --- a/drivers/infiniband/core/Makefile
> >  +++ b/drivers/infiniband/core/Makefile
> > -@@ -30,3 +30,5 @@ ib_ucm-y :=			ucm.o
> > +@@ -30,3 +30,7 @@ ib_ucm-y :=			ucm.o
> >   
> >   ib_uverbs-y :=			uverbs_main.o uverbs_cmd.o uverbs_mem.o \
> >   				uverbs_marshall.o
> >  +
> > -+ib_core-y +=			genalloc.o netevent.o
> > ++infiniband-$(CONFIG_INFINIBAND_ADDR_TRANS) += rdma_ne.o
> > ++rdma_ne-y :=			rdma_netevents.o
> > ++ib_core-y +=			genalloc.o 
> 
> I'd prefer not to have a new module rdma_ne. Scripts need to be written to install it,
> and making these kernel dependent is a big pain.
> Can we continue keeping it in ib_core?
> Or move to ib_addr you see a problem with this.
> 

The nice thing about making it a stand-alone module is that its init
function gets called when loaded.  If we add it to ib_core or ib_addr,
then we'll have to modify one of those init functions.

I'm all for keeping this simple, so do you have a suggestion for how to
call the init function?

One last (major) issue:  

After testing, I've discovered that this method (arp pkt snooping) has a
problem in that our recv function is getting called with the incoming
ARP packet __before__ the IPv4 ARP code.  Thus the neigh entries are not
yet updated with the results of the incoming ARP packet.  This
effectively makes notifications useless since the neigh entry hasn't yet
been updated! :-(

I have coded up a solution for this:  The arp_recv function will simply
install a destructor method on the skb (skb->destructor).  When the IPv4
code frees the skb, the destructor function will get called and I can
then look up the neigh entry and call the notifier chain.  This solution
works nicely, but it is a bit of HACK.    

Thoughts?









More information about the general mailing list