From bugzilla-daemon at lists.openfabrics.org  Thu Feb  1 00:40:18 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Thu,  1 Feb 2007 00:40:18 -0800 (PST)
Subject: [openib-general] [Bug 334] Problems with build
	OFED-1.1.1-ib_local_sa
In-Reply-To: <bug-334-1@https.bugs.openfabrics.org/>
Message-ID: <20070201084018.6FDD3E607F7@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=334


------- Comment #2 from erezz at voltaire.com  2007-02-01 00:40 -------
Created an attachment (id=71)
 --> (https://bugs.openfabrics.org/attachment.cgi?id=71&action=view)
ofed.conf


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at lists.openfabrics.org  Thu Feb  1 00:40:39 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Thu,  1 Feb 2007 00:40:39 -0800 (PST)
Subject: [openib-general] [Bug 334] Problems with build
	OFED-1.1.1-ib_local_sa
In-Reply-To: <bug-334-1@https.bugs.openfabrics.org/>
Message-ID: <20070201084039.988DBE607F8@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=334


------- Comment #3 from erezz at voltaire.com  2007-02-01 00:40 -------
Created an attachment (id=72)
 --> (https://bugs.openfabrics.org/attachment.cgi?id=72&action=view)
ofed_net.conf




From bugzilla-daemon at lists.openfabrics.org  Thu Feb  1 00:50:19 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Thu,  1 Feb 2007 00:50:19 -0800 (PST)
Subject: [openib-general] [Bug 334] Problems with build
	OFED-1.1.1-ib_local_sa
In-Reply-To: <bug-334-1@https.bugs.openfabrics.org/>
Message-ID: <20070201085019.D3E05E607F7@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=334


erezz at voltaire.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |erezz at voltaire.com
          Component|IB Core                     |iSER


------- Comment #4 from erezz at voltaire.com  2007-02-01 00:50 -------
I wasn't able to reproduce this behavior. I made the cma fix:

diff -ru openib-1.1/drivers/infiniband/core/cma.c
openib-1.1-cma-fix/drivers/infiniband/core/cma.c
--- openib-1.1/drivers/infiniband/core/cma.c    2006-12-13 00:36:17.000000000
+0200
+++ openib-1.1-cma-fix/drivers/infiniband/core/cma.c    2007-02-01
09:57:47.000000000 +0200
@@ -43,6 +43,7 @@
 #include <rdma/ib_cache.h>
 #include <rdma/ib_cm.h>
 #include <rdma/ib_sa.h>
+#include <rdma/ib_local_sa.h>

 MODULE_AUTHOR("Sean Hefty");
 MODULE_DESCRIPTION("Generic RDMA CM Agent");

Before installing OFED, I installed the open-iscsi package that was shipped
with SLES 10 (open-iscsi-0.5.545-9.12). Then, the installation was successful:

thyme:/tmp/OFED-1.1.1-ib_local_sa # ./install.sh -c ofed.conf -net
ofed_net.conf

Removing previous InfiniBand Software installations


Installing OFED software into /usr/local/ofed

Running /bin/rpm -ihv --force --nodeps
/tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/kernel-ib-1.1-2.6.16.21_0.8_smp.x86_64.rpm
/tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/kernel-ib-devel-1.1-2.6.16.21_0.8_smp.x86_64.rpm

Running /bin/rpm -ihv
/tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libibcm-0.9.0-0.x86_64.rpm
/tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libibcm-devel-0.9.0-0.x86_64.rpm
/tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libibcommon-1.0-0.x86_64.rpm
/tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libibcommon-devel-1.0-0.x86_64.rpm
/tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libibmad-1.0-0.x86_64.rpm
/tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libibmad-devel-1.0-0.x86_64.rpm
/tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libibumad-1.0-0.x86_64.rpm
/tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libibumad-devel-1.0-0.x86_64.rpm
/tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libibverbs-1.0.4-0.x86_64.rpm
/tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libibverbs-devel-1.0.4-0.x86_64.rpm
/tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libibverbs-utils-1.0.4-0.x86_64.rpm
/tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libmthca-1.0.3-0.x86_64.rpm
/tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libmthca-devel-1.0.3-0.x86_64.rpm
/tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libopensm-2.0.0-0.x86_64.rpm
/tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libosmcomp-2.0.0-0.x86_64.rpm
/tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/libosmvendor-2.0.0-0.x86_64.rpm
/tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/librdmacm-0.9.0-0.x86_64.rpm
/tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/librdmacm-devel-0.9.0-0.x86_64.rpm
/tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/librdmacm-utils-0.9.0-0.x86_64.rpm
/tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/openib-diags-1.1.0-0.x86_64.rpm

Running /bin/rpm -Uhv
/tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/oiscsi-iser-support-1-1.x86_64.rpm

Running /bin/rpm -Uhv
/tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/ofed-docs-1.1.1-0.noarch.rpm

Running /bin/rpm -Uhv
/tmp/OFED-1.1.1-ib_local_sa/RPMS/sles-release-10-15.2/ofed-scripts-1.1.1-0.noarch.rpm


IPoIB configuration for ib0:

IPADDR=192.168.10.58
NETMASK=255.255.255.0
NETWORK=192.168.10.0
BROADCAST=192.168.10.255
ONBOOT=yes

IPoIB configuration for ib1:

IPADDR=195.168.10.58
NETMASK=255.255.10.0
NETWORK=195.168.10.0
BROADCAST=195.168.10.255
ONBOOT=no
Installation finished successfully...
thyme:/tmp/OFED-1.1.1-ib_local_sa # rpm -qa|grep kernel-ib
kernel-ib-1.1-2.6.16.21_0.8_smp
kernel-ib-devel-1.1-2.6.16.21_0.8_smp
thyme:/tmp/OFED-1.1.1-ib_local_sa # rpm -ql
kernel-ib-1.1-2.6.16.21_0.8_smp|grep iser
/lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/ulp/iser
/lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/ulp/iser/ib_iser.ko

For some reason, on your machine scsi/libiscsi.h was missing. On my machine it
is located here (this is where SLES 10 puts it):

thyme:/tmp/OFED-1.1.1-ib_local_sa # find /usr/src/linux-2.6.16.21-0.8 -name
libiscsi.h
/usr/src/linux-2.6.16.21-0.8/drivers/scsi/libiscsi.h

If you take a look at
openib-1.1/kernel_patches/backport/2.6.16_sles10/include_libiscsi.patch, you
will see that iSER will look for it in the right place. Therefore, I don't
understand what happened on your machine. Please check the following:
1. rpm -q open-iscsi
2. find /usr/src/linux-2.6.16.21-0.8 -name libiscsi.h
3. Check that kernel_patches/backport/2.6.16_sles10/include_libiscsi.patch was
applied successfully.




From ogerlitz at voltaire.com  Thu Feb  1 00:58:56 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 01 Feb 2007 10:58:56 +0200
Subject: [openib-general] ip_ib_mc_map?
In-Reply-To: <1170275331.14294.1.camel@stevo-desktop>
References: <1170275331.14294.1.camel@stevo-desktop>
Message-ID: <45C1ABD0.5090404@voltaire.com>

Steve Wise wrote:
> where can I find this symbol?  I can't load rdma_cm on rhel4u4...
> rdma_cm: Unknown symbol ip_ib_mc_map

Sean, sorry for not mentioning the rh4u4 issue when you did the push to 
OFED 1.2 ...

For a reason that no one at RH can trace, someone removed all the 
support for ARPHRD_INFINIBAND multicast from u4, although it exists 
perfectly fine in u3 and hopefully in u5 as well (Doug, can you update?); 
see https://bugs.openfabrics.org/show_bug.cgi?id=2661

Specifically, without the snippet below from the patch, all IPv4 
ARPHRD_INFINIBAND multicast on rh4 u4 goes to the broadcast group!

> Index: linux-2.6.9/net/ipv4/arp.c
> ===================================================================
> --- linux-2.6.9.orig/net/ipv4/arp.c	2004-10-18 23:55:06.000000000 +0200
> +++ linux-2.6.9/net/ipv4/arp.c	2006-09-20 14:43:59.000000000 +0300
> @@ -213,6 +213,9 @@
>  	case ARPHRD_IEEE802_TR:
>  		ip_tr_mc_map(addr, haddr);
>  		return 0;
> +	case ARPHRD_INFINIBAND:
> +		ip_ib_mc_map(addr, haddr);
> +		return 0;
>  	default:
>  		if (dir) {
>  			memcpy(haddr, dev->broadcast, dev->addr_len);

Anyway, OFED-wise, I see two ways to solve this:

1) adding a backport to the rdma_cm containing ip_ib_mc_map, period.

This means that apps offloading multicast traffic through the rdma cm 
would use the correct group, whereas apps working through the net stack
would use the broadcast group.

2) having the rdma cm follow the net stack and make its consumers use the 
broadcast group.
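
The two options above can be sketched in user-space C. This is only an
illustration under stated assumptions: the demo_* names are hypothetical,
the byte layout follows the IPv4-over-InfiniBand multicast mapping (0xff12
prefix, 0x401b signature, low 28 bits of the group address) that
ip_ib_mc_map() is understood to implement, the P_Key bytes are left zero,
and the address is taken in host byte order to keep the sketch portable.
It should be checked against the kernel tree being backported to.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* IPoIB hardware address: 4 bytes of flags/QPN followed by a 16-byte GID. */
#define DEMO_IPOIB_HWADDR_LEN 20

/*
 * Option 1 (hypothetical stand-in for ip_ib_mc_map): map an IPv4
 * multicast address, given in host byte order, to an IPoIB multicast
 * hardware address.
 */
static void demo_ip_ib_mc_map(uint32_t addr, uint8_t *buf)
{
	assert(buf != NULL);
	memset(buf, 0, DEMO_IPOIB_HWADDR_LEN);
	buf[1] = 0xff;			/* multicast QPN */
	buf[2] = 0xff;
	buf[3] = 0xff;
	buf[4] = 0xff;			/* multicast GID prefix */
	buf[5] = 0x12;			/* link-local scope */
	buf[6] = 0x40;			/* IPv4 signature, high byte */
	buf[7] = 0x1b;			/* IPv4 signature, low byte */
	/* bytes 8..15 stay zero; the P_Key is not filled in by this sketch */
	buf[16] = (addr >> 24) & 0x0f;	/* only the low 28 bits are used */
	buf[17] = (addr >> 16) & 0xff;
	buf[18] = (addr >> 8) & 0xff;
	buf[19] = addr & 0xff;
}

/*
 * Option 2: follow the U4 net stack and hand back the interface
 * broadcast address for every multicast group.
 */
static void demo_bcast_map(const uint8_t *broadcast, uint8_t *buf)
{
	assert(broadcast != NULL && buf != NULL);
	memcpy(buf, broadcast, DEMO_IPOIB_HWADDR_LEN);
}
```

With option 1, 224.0.0.1 ends up with 0x01 in the last byte and zeros in
the three bytes before it. With option 2, every group yields the same
address, which matches what the U4 net stack does.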

Or.


From swise at opengridcomputing.com  Thu Feb  1 01:01:24 2007
From: swise at opengridcomputing.com (Steve WIse)
Date: Thu, 01 Feb 2007 03:01:24 -0600
Subject: [openib-general] ip_ib_mc_map?
In-Reply-To: <45C1480C.1020600@ichips.intel.com>
References: <000101c74576$fedc81f0$8698070a@amr.corp.intel.com>
	<1170275680.14294.5.camel@stevo-desktop>
	<45C1480C.1020600@ichips.intel.com>
Message-ID: <1170320484.654.6.camel@linux-q667.site>

On Wed, 2007-01-31 at 17:53 -0800, Sean Hefty wrote:
> Steve Wise wrote:
> > Perhaps there's no backport for this to rhel4u4?
> 
> I would have thought so, but I really don't know.  The function is called from 
> net/ipv4/arp.c, and not directly by ipoib.  So, I don't know how the backport 
> patches typically handle this.
> 
> - Sean

Here's what I see:

ip_ib_mc_map() is called directly from cma_join_ib_multicast(), which is
added to the ofed_1_2 cma.c via patch file:
kernel_patches/fixes/sean_multicast_1.patch

So when I compiled ofed_1_2 on rhel4u4, the cma wouldn't load because
there is no ip_ib_mc_map() in rhel4u4.  

So you need a backport patch for this to work on rhel4u4, and probably
on many of the older kernels too.

Steve.


From mst at mellanox.co.il  Thu Feb  1 01:06:28 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 1 Feb 2007 11:06:28 +0200
Subject: [openib-general] ip_ib_mc_map?
In-Reply-To: <45C1ABD0.5090404@voltaire.com>
References: <1170275331.14294.1.camel@stevo-desktop>
	<45C1ABD0.5090404@voltaire.com>
Message-ID: <20070201090628.GC14189@mellanox.co.il>

>  From a reason that no one at RH can trace... someone went and removed 
> all the support for ARPHRD_INFINIBAND multicast from u4 where it exists 
> perfectly fine in u3 and hopefully on u5 as well (Doug can you update?), 
> see https://bugs.openfabrics.org/show_bug.cgi?id=2661
> 
> Specifically, the below snip from the patch means that on rh4 u4 all 
> IPv4 ARPHRD_INFINIBAND multicast goes on the broadcast group !!!
> 
> > Index: linux-2.6.9/net/ipv4/arp.c
> > ===================================================================
> > --- linux-2.6.9.orig/net/ipv4/arp.c	2004-10-18 23:55:06.000000000 +0200
> > +++ linux-2.6.9/net/ipv4/arp.c	2006-09-20 14:43:59.000000000 +0300
> > @@ -213,6 +213,9 @@
> >  	case ARPHRD_IEEE802_TR:
> >  		ip_tr_mc_map(addr, haddr);
> >  		return 0;
> > +	case ARPHRD_INFINIBAND:
> > +		ip_ib_mc_map(addr, haddr);
> > +		return 0;
> >  	default:
> >  		if (dir) {
> >  			memcpy(haddr, dev->broadcast, dev->addr_len);
> 
> anyway, OFED wise, i see two ways to solve this:
> 
> 1) adding a backport to the rdma_cm containing ip_ib_mc_map, period.
> 
> This means that apps offloading multicast traffic through the rdma cm 
> would use the correct group where apps working through the net stack
> use the broadcast group.
> 
> 2) having the rdma cm follow the net stack and make its consumer use the 
> broadcast group.

Correct. Since multicast is broken in other respects on U4
(sockets can't join multicast groups), I think 2 is the simplest approach.

Anyone who wants IPoIB multicast should just stay away from U4.

-- 
MST


From mst at mellanox.co.il  Thu Feb  1 01:09:58 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 1 Feb 2007 11:09:58 +0200
Subject: [openib-general] ip_ib_mc_map?
In-Reply-To: <1170320484.654.6.camel@linux-q667.site>
References: <000101c74576$fedc81f0$8698070a@amr.corp.intel.com>
	<1170275680.14294.5.camel@stevo-desktop>
	<45C1480C.1020600@ichips.intel.com>
	<1170320484.654.6.camel@linux-q667.site>
Message-ID: <20070201090958.GD14189@mellanox.co.il>

> Quoting Steve WIse <swise at opengridcomputing.com>:
> Subject: Re: ip_ib_mc_map?
> 
> On Wed, 2007-01-31 at 17:53 -0800, Sean Hefty wrote:
> > Steve Wise wrote:
> > > Perhaps there's no backport for this to rhel4u4?
> > 
> > I would have thought so, but I really don't know.  The function is called from 
> > net/ipv4/arp.c, and not directly by ipoib.  So, I don't know how the backport 
> > patches typically handle this.
> > 
> > - Sean
> 
> Here's what I see:
> 
> ip_ib_mc_map() is called directly from cma_join_ib_multicast(), which is
> added to the ofed_1_2 cma.c via patch file:
> kernel_patches/fixes/sean_multicast_1.patch
> 
> So when I compiled ofed_1_2 on rhel4u4, the cma wouldn't load because
> there is no ip_ib_mc_map() in rhel4u4.  
> 
> So you need a backport patch for this to work on rhel4u4.  Probably many
> of the older kernels.

I think this breakage is U4-specific. Someone at RH went to the trouble of
ripping all of the IB-related stuff out of the U4 kernel.

I think just calling ip_tr_mc_map on U4 instead will be enough.

-- 
MST


From ogerlitz at voltaire.com  Thu Feb  1 01:17:53 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 01 Feb 2007 11:17:53 +0200
Subject: [openib-general] [mthca] Creation of a SRQ with many WR (> 16K)
 in kernel level fails
In-Reply-To: <45C0662A.7050203@dev.mellanox.co.il>
References: <45BF0575.9020507@dev.mellanox.co.il>
	<45BF1866.3010807@voltaire.com> <adatzy8nm9d.fsf@cisco.com>
	<45C0662A.7050203@dev.mellanox.co.il>
Message-ID: <45C1B041.4000000@voltaire.com>

Dotan Barak wrote:
> I think that now, when implementation of IPoIB CM is available and SRQ 
> is being used, one may
> need to use a SRQ with more than 16K WRs.

IPoIB UD uses an SRQ by nature (since RX from all peers consumes buffers 
from the --only-- RQ) and lives fine with 32 buffers (or 64; you can look 
in the code). Moreover, my assumption is that

	pps(RC) <= pps(UC) <= pps(UD)

This means that whatever number of RX buffers for UD/2K MTU is 
"enough" to have no (or close to zero) packet loss under some traffic 
pattern, the same pattern can be served with IPoIB CM using an SRQ of the 
same size.

Or.


From swise at opengridcomputing.com  Thu Feb  1 01:37:50 2007
From: swise at opengridcomputing.com (Steve WIse)
Date: Thu, 01 Feb 2007 03:37:50 -0600
Subject: [openib-general] [PATCH] RE:  regression in ofed 1.2
In-Reply-To: <000401c7458b$9bff77d0$8698070a@amr.corp.intel.com>
References: <000401c7458b$9bff77d0$8698070a@amr.corp.intel.com>
Message-ID: <1170322670.654.23.camel@linux-q667.site>

> Okay - I _think_ the problem is that OFED 1.2 pulled code from my git tree
> before I created an ofed_1_2 branch (which contains the fix), and didn't update
> to match my ofed_1_2 branch.  The crash that you reported occurring over iWarp
> should also happen over IB for the same reason, so both are likely broken atm...
> 
> Vlad, can you please update the ofed build by pulling from the ofed_1_2 branches
> of my rdma-dev.git and librdmacm.git trees?

I looked at your rdma-dev ofed_1_2 branch and see that the cma.c changes
you made there will resolve this issue.  It just needs to be pulled into
ofed_1_2.

Thanks!

Steve.


From ogerlitz at voltaire.com  Thu Feb  1 01:38:46 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 01 Feb 2007 11:38:46 +0200
Subject: [openib-general] [mthca] Creation of a SRQ with many WR (> 16K)
 in kernel level fails
In-Reply-To: <adatzy8nm9d.fsf@cisco.com>
References: <45BF0575.9020507@dev.mellanox.co.il>
	<45BF1866.3010807@voltaire.com> <adatzy8nm9d.fsf@cisco.com>
Message-ID: <45C1B526.30101@voltaire.com>

Roland Dreier wrote:
>  > anyway, the solution that comes into my mind is to disable creating a
>  > QP/SRQ for which > 128KB allocations are needed. So
>  > mthca_query_device() will set the max_qp_wr and max_srq_wr attributes
>  > to values whose derived size still allows to use kmalloc.
> 
> But that will limit the size of the queues that userspace can create
> too.  I guess we could allocate kernel wrid arrays with vmalloc(), but
> I wonder if anyone actually cares about this limit...

Hmm, I would avoid vmalloc if possible. Allocating up to 128K bytes for a 
kernel resource sounds fine.

As for user space sharing the same limitation, how about adding to the 
--kernel-- struct ib_device_attr "for user space" buddy fields to 
max_qp_wr, max_srq_wr and max_cqe, such that each hw driver sets both 
values: for the "user space" field the actual hw limitation, and for the 
"kernel space" field a value which would pass kmalloc.

Kernel ULPs calling ibv_device_query would use the original fields, so 
there is no need to patch them. The same goes for user space ULPs: no 
need to patch them.

However, when the call is made from user space, uverbs_query_device 
copies the "user space" attr to the resp struct.
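
The arithmetic behind the 16K figure, and the paired-field idea, can be
sketched as below. This is a user-space illustration, not driver code: the
demo_* names and the max_srq_wr_user field are hypothetical, the 128 KB
ceiling is the kmalloc limit mentioned in this thread, and the 8-byte
element size assumes one 64-bit work request ID per entry in the wrid
array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed kmalloc ceiling, taken from the 128 KB figure in the thread. */
#define DEMO_KMALLOC_MAX (128 * 1024)

/* Hypothetical pair of limits; max_srq_wr_user is not a real field. */
struct demo_device_attr {
	int max_srq_wr;		/* limit reported to kernel consumers */
	int max_srq_wr_user;	/* limit uverbs would report to user space */
};

/* Largest WR count whose wrid array still fits in one kmalloc. */
static int demo_kernel_wr_limit(int hw_max_wr, size_t wrid_size)
{
	size_t fit;

	assert(hw_max_wr > 0 && wrid_size > 0);
	fit = DEMO_KMALLOC_MAX / wrid_size;
	return fit < (size_t)hw_max_wr ? (int)fit : hw_max_wr;
}

/* What a hw driver's query routine might fill in under this proposal. */
static void demo_fill_attr(struct demo_device_attr *attr, int hw_max_srq_wr)
{
	assert(attr != NULL);
	attr->max_srq_wr_user = hw_max_srq_wr;
	attr->max_srq_wr = demo_kernel_wr_limit(hw_max_srq_wr,
						sizeof(uint64_t));
}
```

Under these assumptions 128 KB / 8 bytes gives 16384 entries, which is
where the 16K boundary in the subject line comes from. A device whose
hardware limit is already below that keeps its hardware limit in both
fields.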

Or.


From mst at mellanox.co.il  Thu Feb  1 01:50:03 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 1 Feb 2007 11:50:03 +0200
Subject: [openib-general] [mthca] Creation of a SRQ with many WR (> 16K)
 in kernel level fails
In-Reply-To: <45C1B526.30101@voltaire.com>
References: <45BF0575.9020507@dev.mellanox.co.il>
	<45BF1866.3010807@voltaire.com> <adatzy8nm9d.fsf@cisco.com>
	<45C1B526.30101@voltaire.com>
Message-ID: <20070201095003.GA15505@mellanox.co.il>

> As for the user space sharing of the same limitation, how about adding 
> to the --kernel-- struct ib_device_attr "for user space" buddy fields to 
> max_qp_wr max_srq_wr and max_cqe such that each hw driver set both 
> values: for the "user space" field the actual hw limitation and for 
> "kernel space" field a value which would pass kmalloc.

We could do that, I guess, but no one so far has used query in the kernel,
and the userspace values are currently good.

-- 
MST


From dledford at redhat.com  Thu Feb  1 02:17:32 2007
From: dledford at redhat.com (Doug Ledford)
Date: Thu, 01 Feb 2007 05:17:32 -0500
Subject: [openib-general] ip_ib_mc_map?
In-Reply-To: <45C1ABD0.5090404@voltaire.com>
References: <1170275331.14294.1.camel@stevo-desktop>
	<45C1ABD0.5090404@voltaire.com>
Message-ID: <1170325052.2716.229.camel@fc6.xsintricity.com>

On Thu, 2007-02-01 at 10:58 +0200, Or Gerlitz wrote:
> Steve Wise wrote:
> > where can I find this symbol?  I can't load rdma_cm on rhel4u4...
> > rdma_cm: Unknown symbol ip_ib_mc_map
> 
> Sean, OK, sorry not to mention the rh4u4 issue once you did the push to 
> OFED 1.2 ...
> 
>  From a reason that no one at RH can trace... someone went and removed 
> all the support for ARPHRD_INFINIBAND multicast from u4 where it exists 
> perfectly fine in u3 and hopefully on u5 as well (Doug can you update?), 
> see https://bugs.openfabrics.org/show_bug.cgi?id=2661

Yes.  It's been fixed for U5.  It wasn't that the patch got removed,
it's that between U3 and U4 I did a complete rebase, which means that
all the patches from U3 were tossed out the window and a complete new
set made for U4.  I just missed re-adding this one in U4.

> Specifically, the below snip from the patch means that on rh4 u4 all 
> IPv4 ARPHRD_INFINIBAND multicast goes on the broadcast group !!!
> 
> > Index: linux-2.6.9/net/ipv4/arp.c
> > ===================================================================
> > --- linux-2.6.9.orig/net/ipv4/arp.c	2004-10-18 23:55:06.000000000 +0200
> > +++ linux-2.6.9/net/ipv4/arp.c	2006-09-20 14:43:59.000000000 +0300
> > @@ -213,6 +213,9 @@
> >  	case ARPHRD_IEEE802_TR:
> >  		ip_tr_mc_map(addr, haddr);
> >  		return 0;
> > +	case ARPHRD_INFINIBAND:
> > +		ip_ib_mc_map(addr, haddr);
> > +		return 0;
> >  	default:
> >  		if (dir) {
> >  			memcpy(haddr, dev->broadcast, dev->addr_len);
> 
> anyway, OFED wise, i see two ways to solve this:
> 
> 1) adding a backport to the rdma_cm containing ip_ib_mc_map, period.
> 
> This means that apps offloading multicast traffic through the rdma cm 
> would use the correct group where apps working through the net stack
> use the broadcast group.
> 
> 2) having the rdma cm follow the net stack and make its consumer use the 
> broadcast group.
> 
> Or.
-- 
Doug Ledford <dledford at redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband

From bugzilla-daemon at lists.openfabrics.org  Thu Feb  1 02:22:40 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Thu,  1 Feb 2007 02:22:40 -0800 (PST)
Subject: [openib-general] [Bug 334] Problems with build
	OFED-1.1.1-ib_local_sa
In-Reply-To: <bug-334-1@https.bugs.openfabrics.org/>
Message-ID: <20070201102241.38A69E607F8@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=334


------- Comment #5 from dmitry.yulov at intel.com  2007-02-01 02:22 -------
Created an attachment (id=73)
 --> (https://bugs.openfabrics.org/attachment.cgi?id=73&action=view)
The file configuration for OFED




From vlad at lists.openfabrics.org  Thu Feb  1 02:23:03 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Thu,  1 Feb 2007 02:23:03 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070201-0200 daily build status
Message-ID: <20070201102303.B082FE607FA@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod 

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.13
Passed on x86_64 with linux-2.6.18
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.19
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.16
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.14
Passed on ia64 with linux-2.6.19
Passed on ppc64 with linux-2.6.12
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.19
Passed on powerpc with linux-2.6.12
Passed on powerpc with linux-2.6.13
Passed on powerpc with linux-2.6.15
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.16
Passed on ia64 with linux-2.6.18
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.14
Passed on ia64 with linux-2.6.16
Passed on ppc64 with linux-2.6.13
Passed on ppc64 with linux-2.6.17
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.12

Failed:
Build failed on ia64 with linux-2.6.16.21-0.8-default
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070201-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:380: error: implicit declaration of function ‘register_netevent_notifier’
/home/vlad/tmp/ofa_1_2_kernel-20070201-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: In function ‘addr_cleanup’:
/home/vlad/tmp/ofa_1_2_kernel-20070201-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:386: error: implicit declaration of function ‘unregister_netevent_notifier’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070201-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070201-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070201-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070201-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------


From bugzilla-daemon at lists.openfabrics.org  Thu Feb  1 02:30:18 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Thu,  1 Feb 2007 02:30:18 -0800 (PST)
Subject: [openib-general] [Bug 334] Problems with build
	OFED-1.1.1-ib_local_sa
In-Reply-To: <bug-334-1@https.bugs.openfabrics.org/>
Message-ID: <20070201103018.4DA5EE607F7@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=334


dmitry.yulov at intel.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID


------- Comment #6 from dmitry.yulov at intel.com  2007-02-01 02:30 -------
Hi,
Thanks a lot for the explanation.
I have some comments for you:
First of all, I need to run the build script to make the RPMs; I use the
build.sh script to do this. I also need to build all packages from sources. I
have attached my configuration file for building the RPMs, and I see some
differences from your file.

> rpm -q open-iscsi 
It was present before I built the RPMs.
> find /usr/src/linux-2.6.16.21-0.8 -name libiscsi.h 
The file is present on my machine.
> Check that kernel_patches/backport/2.6.16_sles10/include_libiscsi.patch was
> applied successfully.
I checked it, and the patch applied successfully.

Could you please try to build OFED-1.1.1-ib_local_sa from source using my
configuration file rather than yours? I got OFED-1.1.1-ib_local_sa from
https://svn.openfabrics.org/svn/openib/gen2/branches/1.1/ofed/releases/OFED-1.1.1-ib_local_sa.tgz.




From kliteyn at dev.mellanox.co.il  Thu Feb  1 02:35:01 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Thu, 01 Feb 2007 12:35:01 +0200
Subject: [openib-general] [PATCH] osm: trivial casting for compilation on
	windows
Message-ID: <45C1C255.4060405@dev.mellanox.co.il>

Trivial casting for compilation on windows

Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
---
 osm/opensm/osm_subnet.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/osm/opensm/osm_subnet.c b/osm/opensm/osm_subnet.c
index f2e909b..e4e69c0 100644
--- a/osm/opensm/osm_subnet.c
+++ b/osm/opensm/osm_subnet.c
@@ -562,7 +562,7 @@ __osm_subn_opts_unpack_uint16(
 
   if (!strcmp(p_req_key, p_key))
   {
-    val = strtoul(p_val_str, NULL, 0);
+    val = (uint16_t)strtoul(p_val_str, NULL, 0);
     if (val != *p_val)
     {
       char buff[128];
-- 
1.4.4.1.GIT
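
For context on what the cast does and does not change: assuming val is a
uint16_t in that function, the explicit cast only silences the compiler's
conversion warning, and a value above 65535 is truncated either way. A
range-checked conversion could look like the sketch below. The helper name
demo_parse_uint16 is hypothetical and is not part of osm_subnet.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a decimal, octal or hex string into a
 * uint16_t. Returns 0 on success, -1 if the string is empty, has
 * trailing characters, or does not fit in 16 bits.
 */
static int demo_parse_uint16(const char *str, uint16_t *out)
{
	char *end;
	unsigned long val;

	assert(str != NULL && out != NULL);
	errno = 0;
	val = strtoul(str, &end, 0);
	if (errno != 0 || end == str || *end != '\0' || val > UINT16_MAX)
		return -1;
	*out = (uint16_t)val;	/* safe: range was checked above */
	return 0;
}
```

Whether OpenSM should reject out-of-range option values instead of
truncating them is a decision for the maintainers; the sketch only shows
the shape of a checked conversion.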

 
From dotanb at dev.mellanox.co.il  Thu Feb  1 02:41:25 2007
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Thu, 01 Feb 2007 12:41:25 +0200
Subject: [openib-general] IB/mthca: question about HCA profile module
	parameters
Message-ID: <45C1C3D5.1050301@dev.mellanox.co.il>

Hi Moni.

I tried to use the mthca module parameters; for example, I tried to change 
the number of QPs.

I got several failures when I used the HCA 25204:
* sometimes I got the following error message (when using big values, 
for example 512K QPs):
ib_mthca: 0000:0c: INIT_HCA command failed aborting.
ib_mthca: probe of 0000:0c: failed with error -16
* when I tried to use a small number of QPs (1024), the machine just hung 
and I noticed a kernel oops message on the console


Did you verify the HCA profile module parameter feature?
Is there any known limitation on the values that should be used? 
(for example: only values which are powers of two)


thanks
Dotan


From swise at opengridcomputing.com  Thu Feb  1 02:53:25 2007
From: swise at opengridcomputing.com (Steve WIse)
Date: Thu, 01 Feb 2007 04:53:25 -0600
Subject: [openib-general] [PATCH] RE:  regression in ofed 1.2
In-Reply-To: <1170322670.654.23.camel@linux-q667.site>
References: <000401c7458b$9bff77d0$8698070a@amr.corp.intel.com>
	<1170322670.654.23.camel@linux-q667.site>
Message-ID: <1170327205.654.34.camel@linux-q667.site>

On Thu, 2007-02-01 at 03:37 -0600, Steve WIse wrote:
> > Okay - I _think_ the problem is that OFED 1.2 pulled code from my git tree
> > before I created an ofed_1_2 branch (which contains the fix), and didn't update
> > to match my ofed_1_2 branch.  The crash that you reported occurring over iWarp
> > should also happen over IB for the same reason, so both are likely broken atm...
> > 
> > Vlad, can you please update the ofed build by pulling from the ofed_1_2 branches
> > of my rdma-dev.git and librdmacm.git trees?
> 
> I looked at your rdma-dev ofed_1_2 branch and see that the cma.c changes
> you made there will resolve this issue.  It just needs to be pulled into
> ofed_1_2.
> 

Also, I just pulled down and built the latest ofed_1_2 kernel and user
code against 2.6.20-rc7, and the ucma abi is 4.  So rdma_create_qp()
will still crash even with the librdmacm code to avoid the call to
rdma_init_qp_attr for ABI 3 kernels.


Steve.


From bugzilla-daemon at lists.openfabrics.org  Thu Feb  1 03:04:17 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Thu,  1 Feb 2007 03:04:17 -0800 (PST)
Subject: [openib-general] [Bug 334] Problems with build
	OFED-1.1.1-ib_local_sa
In-Reply-To: <bug-334-1@https.bugs.openfabrics.org/>
Message-ID: <20070201110417.93A48E607F7@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=334


dmitry.yulov at intel.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |


------- Comment #7 from dmitry.yulov at intel.com  2007-02-01 03:04 -------
I tried to build the product again and saw that the patches from the
kernel_patches/backport/2.6.16_sles10/ directory were not applied
automatically. When I applied these patches manually, everything built. How
can I run the build process so that the patches are applied automatically?




From ogerlitz at voltaire.com  Thu Feb  1 03:10:48 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 01 Feb 2007 13:10:48 +0200
Subject: [openib-general] [mthca] Creation of a SRQ with many WR (> 16K)
 in kernel level fails
In-Reply-To: <20070201095003.GA15505@mellanox.co.il>
References: <45BF0575.9020507@dev.mellanox.co.il>
	<45BF1866.3010807@voltaire.com> <adatzy8nm9d.fsf@cisco.com>
	<45C1B526.30101@voltaire.com> <20070201095003.GA15505@mellanox.co.il>
Message-ID: <45C1CAB8.2080806@voltaire.com>

Michael S. Tsirkin wrote:
>> As for the user space sharing of the same limitation, how about adding 
>> to the --kernel-- struct ib_device_attr "for user space" buddy fields to 
>> max_qp_wr max_srq_wr and max_cqe such that each hw driver set both 
>> values: for the "user space" field the actual hw limitation and for 
>> "kernel space" field a value which would pass kmalloc.

> We could do that I guess but no one so far used query in kernel,
> and userspace values are currently good.

srp calls ibv_device_query but does not care about these fields. As for
IPoIB CM, if you see things as in my other email, I guess you don't need
to query either.

However, this is a fairly easy change to implement, it does not break
the user/kernel ABI, and it allows kernel consumers to count on the query
results they get from the hw driver, so longer term I think we do
want to have it done.

Or.


From kliteyn at dev.mellanox.co.il  Thu Feb  1 03:48:48 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Thu, 01 Feb 2007 13:48:48 +0200
Subject: [openib-general] [PATCH] osm: some trivial chages in the
 osm_ucast_lash for compilation on windows
Message-ID: <45C1D3A0.7060201@dev.mellanox.co.il>

Hi Hal,

This patch has some trivial changes in the osm_ucast_lash.c 
for compilation on windows.

In general, this file needs a major cleanup patch (cosmetic and
otherwise) to fit better into the OSM code. I will get back to it
at some point in the future.

-- Yevgeny

Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
---
 osm/opensm/osm_ucast_lash.c |   80 ++++++++++++++++++++++--------------------
 1 files changed, 42 insertions(+), 38 deletions(-)

diff --git a/osm/opensm/osm_ucast_lash.c b/osm/opensm/osm_ucast_lash.c
index 70e5cbe..95f3ec9 100644
--- a/osm/opensm/osm_ucast_lash.c
+++ b/osm/opensm/osm_ucast_lash.c
@@ -217,6 +217,8 @@ static uint8_t find_port_from_lid(IN con
   uint8_t port_count = 0;
   uint8_t i=0;
   osm_physp_t *p_current_physp, *p_remote_physp = NULL;
+  ib_port_info_t *port_info;
+  ib_net16_t port_lid;
 
   uint8_t egress_port = 255;
 
@@ -227,8 +229,8 @@ static uint8_t find_port_from_lid(IN con
   // process management port first
   p_current_physp = osm_node_get_physp_ptr(p_sw->p_node, 0);
 
-  ib_port_info_t *port_info = &p_current_physp->port_info;
-  ib_net16_t port_lid =  port_info->base_lid;
+  port_info = &p_current_physp->port_info;
+  port_lid =  port_info->base_lid;
   if (port_lid == lid_no) {
     egress_port = 0;
     goto Exit;
@@ -294,15 +296,15 @@ static int cycle_exists(cdg_vertex_t * s
   } else {
     if(current == NULL) {
       current = start;
-      assert(prev == NULL);
+      CL_ASSERT(prev == NULL);
     }
 
     current->visiting_number = visit_num;
 
     if(prev != NULL) {
       prev->next = current;
-      assert(prev->to == current->from);
-      assert(prev->visiting_number > 0);
+      CL_ASSERT(prev->to == current->from);
+      CL_ASSERT(prev->visiting_number > 0);
     }
 
     new_visit_num = visit_num + 1;
@@ -346,7 +348,7 @@ static void remove_semipermanent_depend_
 
   while(sw != dest_switch){
     v = cdg_vertex_matrix[lane][sw][i_next_switch];
-    assert(v != NULL);
+    CL_ASSERT(v != NULL);
 
     if(v->num_using_vertex == 1) {
 
@@ -366,7 +368,7 @@ static void remove_semipermanent_depend_
 	    depend = i;
 	  }
 
-	assert(found);
+	CL_ASSERT(found);
 
 	if(v->num_using_this_depend[depend] == 1) {
 	  for(i=depend; i<v->num_dependencies-1; i++) {
@@ -403,7 +405,7 @@ static void enqueue(lash_t *p_lash, int
   switch_t **switches = p_lash->switches;
   q_item_t *q_head;
 
-  assert(switches[sw]->q_member == 0);
+  CL_ASSERT(switches[sw]->q_member == 0);
   switches[sw]->q_member = 1;
   switches[sw]->dist = dist;
   switches[sw]->prev = prev;
@@ -454,7 +456,7 @@ static void dequeue(lash_t *p_lash, int
   *dist = switches[q_min->sw]->dist;
   *prev = switches[q_min->sw]->prev;
 
-  assert(switches[q_min->sw]->q_member == 1 && !switches[q_min->sw]->mst_member);
+  CL_ASSERT(switches[q_min->sw]->q_member == 1 && !switches[q_min->sw]->mst_member);
   switches[q_min->sw]->q_member = 0;
   free(q_min);
 }
@@ -468,12 +470,11 @@ static void dequeue(lash_t *p_lash, int
 
 static int get_phys_connection(switch_t **switches, int switch_from, int switch_to)
 {
-  int i = 0;
+  unsigned int i = 0;
 
   for (i = 0; i < switches[switch_from]->num_connections; i++)
     if(switches[switch_from]->phys_connections[i] == switch_to)
       return i;
-  assert(1==1);
   return i;
 }
 
@@ -557,7 +558,7 @@ static void generate_routing_func_for_ms
       i_dest = i_dest->next;
     }
 
-    assert(prev->next == NULL);
+    CL_ASSERT(prev->next == NULL);
     prev->next = concat_dest;
     concat_dest = dest;
   }
@@ -590,10 +591,9 @@ static void generate_cdg_for_sp(lash_t*p
   while(sw != dest_switch) {
 
     if(cdg_vertex_matrix[lane][sw][next_switch] == NULL) {
+      unsigned i;
       v = create_cdg_vertex(num_switches);
 
-      int i;
-
       for(i=0; i<num_switches-1; i++) {
 	v->dependency[i] = NULL;
 	v->num_using_this_depend[i] = 0;
@@ -630,7 +630,7 @@ static void generate_cdg_for_sp(lash_t*p
 	prev->num_using_this_depend[prev->num_dependencies]++;
 	prev->num_dependencies++;
 
-	assert(prev->num_dependencies < num_switches);
+	CL_ASSERT(prev->num_dependencies < (int)num_switches);
 
 	if(prev->temp==0)
 	  prev->num_temp_depend++;
@@ -642,7 +642,7 @@ static void generate_cdg_for_sp(lash_t*p
     output_link = switches[sw]->routing_table[dest_switch].out_link;
 
     if(sw != dest_switch) {
-      assert(output_link != NONE);
+      CL_ASSERT(output_link != NONE);
       next_switch = switches[sw]->phys_connections[output_link];
     }
 
@@ -670,7 +670,7 @@ static void set_temp_depend_to_permanent
 
   while(sw != dest_switch) {
     v = cdg_vertex_matrix[lane][sw][next_switch];
-    assert(v != NULL);
+    CL_ASSERT(v != NULL);
 
     if(v->temp == 1) {
       v->temp = 0;
@@ -706,13 +706,13 @@ static void remove_temp_depend_for_sp(la
 
   while(sw != dest_switch) {
     v = cdg_vertex_matrix[lane][sw][next_switch];
-    assert(v != NULL);
+    CL_ASSERT(v != NULL);
 
     if(v->temp==1) {
       cdg_vertex_matrix[lane][sw][next_switch] = NULL;
       free(v);
     } else {
-      assert(v->num_temp_depend <= v->num_dependencies);
+      CL_ASSERT(v->num_temp_depend <= v->num_dependencies);
       v->num_dependencies = v->num_dependencies - v->num_temp_depend;
       v->num_temp_depend = 0;
       v->num_using_vertex--;
@@ -744,7 +744,8 @@ static void balance_virtual_lanes(lash_t
   int *num_mst_in_lane = p_lash->num_mst_in_lane;
   int ***virtual_location = p_lash->virtual_location;
   int min_filled_lane, max_filled_lane, medium_filled_lane, trials;
-  int old_min_filled_lane, old_max_filled_lane, i, j, new_num_min_lane, new_num_max_lane;
+  int old_min_filled_lane, old_max_filled_lane, new_num_min_lane, new_num_max_lane;
+  unsigned int i, j;
   int src, dest, start, next_switch, output_link;
   int stop = 0, cycle_found;
 
@@ -788,7 +789,7 @@ static void balance_virtual_lanes(lash_t
     output_link = p_lash->switches[src]->routing_table[dest].out_link;
     next_switch = p_lash->switches[src]->phys_connections[output_link];
 
-    assert(cdg_vertex_matrix[min_filled_lane][src][next_switch] != NULL);
+    CL_ASSERT(cdg_vertex_matrix[min_filled_lane][src][next_switch] != NULL);
     cycle_found = cycle_exists(cdg_vertex_matrix[min_filled_lane][src][next_switch], NULL, NULL, 1);
 
     for(i=0; i<num_switches; i++)
@@ -863,7 +864,7 @@ static switch_t *switch_create(lash_t *p
 {
 	unsigned num_switches = p_lash->num_switches;
 	switch_t *sw;
-	int i;
+	unsigned int i;
 
 	sw = malloc(sizeof(*sw));
 	if (!sw)
@@ -926,7 +927,7 @@ static void switch_delete(switch_t *sw)
 
 static void free_lash_structures(lash_t *p_lash)
 {
-  int i,j,k;
+  unsigned int i,j,k;
   unsigned num_switches = p_lash->num_switches;
   osm_log_t *p_log = &p_lash->p_osm->log;
 
@@ -988,12 +989,11 @@ static int init_lash_structures(lash_t *
   unsigned vl_min = p_lash->vl_min;
   unsigned num_switches = p_lash->num_switches;
   osm_log_t *p_log = &p_lash->p_osm->log;
+  int status = IB_SUCCESS;
+  unsigned int  i, j, k;
 
   OSM_LOG_ENTER( p_log, init_lash_structures);
 
-  int status = IB_SUCCESS;
-  int  i, j, k;
-
   // initialise cdg_vertex_matrix[num_switches][num_switches][num_switches]
   p_lash->cdg_vertex_matrix = (cdg_vertex_t****)malloc(vl_min * sizeof(cdg_vertex_t ****));
   for (i = 0; i < vl_min; i++) {
@@ -1084,10 +1084,11 @@ static int lash_core(lash_t *p_lash)
   unsigned num_switches = p_lash->num_switches;
   switch_t **switches = p_lash->switches;
   unsigned lanes_needed = 1;
-  int i, j, k, dest_switch = 0;
+  unsigned int i, j, k, dest_switch = 0;
   reachable_dest_t * dests, * idest;
   int cycle_found = 0;
-  int v_lane, stop = 0, output_link, i_next_switch;
+  unsigned v_lane;
+  int stop = 0, output_link, i_next_switch;
   int status = IB_SUCCESS;
 
   OSM_LOG_ENTER( p_log, lash_core);
@@ -1113,7 +1114,7 @@ static int lash_core(lash_t *p_lash)
 	    output_link = switches[i]->routing_table[dest_switch].out_link;
 	    i_next_switch = switches[i]->phys_connections[output_link];
 
-	    assert(p_lash->cdg_vertex_matrix[v_lane][i][i_next_switch] != NULL);
+	    CL_ASSERT(p_lash->cdg_vertex_matrix[v_lane][i][i_next_switch] != NULL);
 	    cycle_found = cycle_exists(p_lash->cdg_vertex_matrix[v_lane][i][i_next_switch], NULL, NULL, 1);
 
 	    for(j=0; j<num_switches; j++)
@@ -1214,12 +1215,14 @@ static void populate_fwd_tbls(lash_t *p_
 
   // Go through each swtich individually
   while(p_next_sw != (osm_switch_t*)cl_qmap_end( &p_subn->sw_guid_tbl )) {
+      uint64_t current_guid;
+      switch_t *sw;
       p_sw = p_next_sw;
       p_next_sw = (osm_switch_t*)cl_qmap_next( &p_sw->map_item );
 
       max_lid_ho = osm_switch_get_max_lid_ho(p_sw);
-      uint64_t current_guid = p_sw->p_node->node_info.port_guid;
-      switch_t *sw = p_sw->priv;
+      current_guid = p_sw->p_node->node_info.port_guid;
+      sw = p_sw->priv;
 
       memset(p_osm->sm.ucast_mgr.lft_buf, 0xff, IB_LID_UCAST_END_HO + 1);
 
@@ -1244,8 +1247,8 @@ static void populate_fwd_tbls(lash_t *p_
 		  cl_ntoh64(current_guid), -1, egress_port);
         } else {
 	  unsigned dst_lash_switch_id = get_lash_id(p_dst_sw);
-	  uint8_t lash_egress_port = sw->routing_table[dst_lash_switch_id].out_link;
-	  uint8_t physical_egress_port = sw->virtual_physical_port_table[lash_egress_port];
+	  uint8_t lash_egress_port = (uint8_t)sw->routing_table[dst_lash_switch_id].out_link;
+	  uint8_t physical_egress_port = (uint8_t)sw->virtual_physical_port_table[lash_egress_port];
 
 	  p_osm->sm.ucast_mgr.lft_buf[lid] = physical_egress_port;
 	  osm_log(p_log, OSM_LOG_DEBUG,
@@ -1366,7 +1369,7 @@ static void lash_cleanup(lash_t *p_lash)
 
 	if (p_lash->switches) {
 		unsigned id;
-		for (id = 0; id < p_lash->num_switches ; id++)
+		for (id = 0; ((int)id) < p_lash->num_switches ; id++)
 			if (p_lash->switches[id])
 				switch_delete(p_lash->switches[id]);
 		free(p_lash->switches);
@@ -1400,6 +1403,7 @@ static int discover_network_properties(l
 
   p_next_sw = (osm_switch_t*)cl_qmap_head( &p_subn->sw_guid_tbl );
   while(p_next_sw != (osm_switch_t*)cl_qmap_end( &p_subn->sw_guid_tbl ) ) {
+      uint16_t port_count;
       p_sw = p_next_sw;
       p_next_sw = (osm_switch_t*)cl_qmap_next( &p_sw->map_item );
 
@@ -1408,7 +1412,7 @@ static int discover_network_properties(l
         return -1;
       id++;
 
-      uint16_t port_count = osm_node_get_num_physp (p_sw->p_node);
+      port_count = osm_node_get_num_physp (p_sw->p_node);
 
       // Note, ignoring port 0. management port
       for (i=1; i<port_count; i++) {
@@ -1418,7 +1422,7 @@ static int discover_network_properties(l
 	      p_current_physp->p_remote_physp) {
 
 	    ib_port_info_t *p_port_info = &p_current_physp->port_info;
-	    int port_vl_min = ib_port_info_get_op_vls(p_port_info);
+	    uint8_t port_vl_min = ib_port_info_get_op_vls(p_port_info);
 	    if (port_vl_min && port_vl_min < vl_min)
 	      vl_min = port_vl_min;
 	  }
@@ -1508,7 +1512,7 @@ static void lash_delete(void *context)
 	lash_t *p_lash = context;
 	if (p_lash->switches) {
 		unsigned id;
-		for (id = 0; id < p_lash->num_switches ; id++)
+		for (id = 0; ((int)id) < p_lash->num_switches ; id++)
 			if (p_lash->switches[id])
 				switch_delete(p_lash->switches[id]);
 		free(p_lash->switches);
@@ -1534,7 +1538,7 @@ uint8_t osm_get_lash_sl(osm_opensm_t *p_
 	if (!p_sw || !p_sw->priv)
 		return OSM_DEFAULT_SL;
 
-	return ((switch_t *)p_sw->priv)->routing_table[dst_id].lane;
+	return (uint8_t)((switch_t *)p_sw->priv)->routing_table[dst_id].lane;
 }
 
 int osm_ucast_lash_setup(osm_opensm_t *p_osm)
-- 
1.4.4.1.GIT

 
From vlad at dev.mellanox.co.il  Thu Feb  1 03:58:16 2007
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Thu, 01 Feb 2007 13:58:16 +0200
Subject: [openib-general] MVAPICH2 SRPM and install file patches
In-Reply-To: <45C14344.9010602@cse.ohio-state.edu>
References: <45C14344.9010602@cse.ohio-state.edu>
Message-ID: <1170331096.6114.4.camel@vladsk-laptop>

On Wed, 2007-01-31 at 20:32 -0500, Shaun Rowland wrote:
> I've placed the MVAPICH2 SRPM on the OFA server in ~rowland/ofed_1_2,
> and it is linked to here:
> 
> http://www.openfabrics.org/~rowland/ofed_1_2/

ofed_1_2_scripts.patch applied.

Thanks,


-- 
Vladimir Sokolovsky <vlad at dev.mellanox.co.il>
Mellanox Technologies Ltd.


From ogerlitz at voltaire.com  Thu Feb  1 04:09:11 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 01 Feb 2007 14:09:11 +0200
Subject: [openib-general] ip_ib_mc_map?
In-Reply-To: <20070201090628.GC14189@mellanox.co.il>
References: <1170275331.14294.1.camel@stevo-desktop>
	<45C1ABD0.5090404@voltaire.com> <20070201090628.GC14189@mellanox.co.il>
Message-ID: <45C1D867.4030208@voltaire.com>

Michael S. Tsirkin wrote:
>> 1) adding a backport to the rdma_cm containing ip_ib_mc_map, period.

>> 2) having the rdma cm follow the net stack and make its consumer use the 
>> broadcast group.

> Correct. Since multicast is broken in other respects on U4
> (sockets can't join multicast groups), I think 2 is the simplest approach.

The situation in U4 is somewhat more involved: sockets doing
IP_ADD_MEMBERSHIP to some multicast group actually send and
receive traffic over the IPoIB broadcast group, which makes IPoIB
on such a cluster kind of hell.

> Anyone who wants IPoIB multicast should just stay away from U4.

We are still interested in being able to run our multicast app over the
RDMA CM, and we want it to be done over the correct multicast group and
not over a broadcast group. So option 2 is a real problem for us.

Or.


From mst at mellanox.co.il  Thu Feb  1 04:10:08 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 1 Feb 2007 14:10:08 +0200
Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update
	support
In-Reply-To: <20070125191321.30934.74542.stgit@dell3.ogc.int>
References: <20070125191321.30934.74542.stgit@dell3.ogc.int>
Message-ID: <20070201121008.GA20789@mellanox.co.il>

> Quoting Steve Wise <swise at opengridcomputing.com>:
> Subject: [PATCH 00/12] ofed_1_2 - Neighbour update support
> 
> 
> Michael/Vlad:
> 
> Here are the backports for snooping arp packets to generate neighbour
> update netevents.  Also included is the addr.c patch to act on all valid
> neigh update events.  If this series looks good to you then I'll push
> this up and you all can pull it from my git tree.

These patches seem to have created a reference leak on each neighbour;
as a result, the ipoib interface could not be brought down.
It also seems that the RHASU2 backport was missing code.
I pushed out the following:


commit d140398db0da0beb3172e0ccf14ef3023cafec9c
Author: Michael S. Tsirkin <mst at mellanox.co.il>
Date:   Thu Feb 1 12:21:34 2007 +0200

    Fix neighbour reference leak in netevent.c
    
    Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

diff --git a/kernel_addons/backport/2.6.11/include/src/netevent.c b/kernel_addons/backport/2.6.11/include/src/netevent.c
index 6a8df29..0d26662 100644
--- a/kernel_addons/backport/2.6.11/include/src/netevent.c
+++ b/kernel_addons/backport/2.6.11/include/src/netevent.c
@@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb)
 	arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len;
 	memcpy(&gw, arp_ptr, 4);
 	n = neigh_lookup(&arp_tbl, &gw, skb->dev);
-	if (n)
+	if (n) {
 		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
+		neigh_release(n);
+	}
 	return;
 }
 
diff --git a/kernel_addons/backport/2.6.12/include/src/netevent.c b/kernel_addons/backport/2.6.12/include/src/netevent.c
index 6a8df29..0d26662 100644
--- a/kernel_addons/backport/2.6.12/include/src/netevent.c
+++ b/kernel_addons/backport/2.6.12/include/src/netevent.c
@@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb)
 	arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len;
 	memcpy(&gw, arp_ptr, 4);
 	n = neigh_lookup(&arp_tbl, &gw, skb->dev);
-	if (n)
+	if (n) {
 		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
+		neigh_release(n);
+	}
 	return;
 }
 
diff --git a/kernel_addons/backport/2.6.13/include/src/netevent.c b/kernel_addons/backport/2.6.13/include/src/netevent.c
index 6a8df29..0d26662 100644
--- a/kernel_addons/backport/2.6.13/include/src/netevent.c
+++ b/kernel_addons/backport/2.6.13/include/src/netevent.c
@@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb)
 	arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len;
 	memcpy(&gw, arp_ptr, 4);
 	n = neigh_lookup(&arp_tbl, &gw, skb->dev);
-	if (n)
+	if (n) {
 		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
+		neigh_release(n);
+	}
 	return;
 }
 
diff --git a/kernel_addons/backport/2.6.14/include/src/netevent.c b/kernel_addons/backport/2.6.14/include/src/netevent.c
index 188283c..17a12ff 100644
--- a/kernel_addons/backport/2.6.14/include/src/netevent.c
+++ b/kernel_addons/backport/2.6.14/include/src/netevent.c
@@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb)
 	arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len;
 	memcpy(&gw, arp_ptr, 4);
 	n = neigh_lookup(&arp_tbl, &gw, skb->dev);
-	if (n)
+	if (n) {
 		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
+		neigh_release(n);
+	}
 	return;
 }
 
diff --git a/kernel_addons/backport/2.6.15/include/src/netevent.c b/kernel_addons/backport/2.6.15/include/src/netevent.c
index 188283c..17a12ff 100644
--- a/kernel_addons/backport/2.6.15/include/src/netevent.c
+++ b/kernel_addons/backport/2.6.15/include/src/netevent.c
@@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb)
 	arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len;
 	memcpy(&gw, arp_ptr, 4);
 	n = neigh_lookup(&arp_tbl, &gw, skb->dev);
-	if (n)
+	if (n) {
 		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
+		neigh_release(n);
+	}
 	return;
 }
 
diff --git a/kernel_addons/backport/2.6.15_ubuntu606/include/src/netevent.c b/kernel_addons/backport/2.6.15_ubuntu606/include/src/netevent.c
index 188283c..17a12ff 100644
--- a/kernel_addons/backport/2.6.15_ubuntu606/include/src/netevent.c
+++ b/kernel_addons/backport/2.6.15_ubuntu606/include/src/netevent.c
@@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb)
 	arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len;
 	memcpy(&gw, arp_ptr, 4);
 	n = neigh_lookup(&arp_tbl, &gw, skb->dev);
-	if (n)
+	if (n) {
 		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
+		neigh_release(n);
+	}
 	return;
 }
 
diff --git a/kernel_addons/backport/2.6.16/include/src/netevent.c b/kernel_addons/backport/2.6.16/include/src/netevent.c
index 188283c..17a12ff 100644
--- a/kernel_addons/backport/2.6.16/include/src/netevent.c
+++ b/kernel_addons/backport/2.6.16/include/src/netevent.c
@@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb)
 	arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len;
 	memcpy(&gw, arp_ptr, 4);
 	n = neigh_lookup(&arp_tbl, &gw, skb->dev);
-	if (n)
+	if (n) {
 		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
+		neigh_release(n);
+	}
 	return;
 }
 
diff --git a/kernel_addons/backport/2.6.16_sles10/include/src/netevent.c b/kernel_addons/backport/2.6.16_sles10/include/src/netevent.c
index 188283c..17a12ff 100644
--- a/kernel_addons/backport/2.6.16_sles10/include/src/netevent.c
+++ b/kernel_addons/backport/2.6.16_sles10/include/src/netevent.c
@@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb)
 	arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len;
 	memcpy(&gw, arp_ptr, 4);
 	n = neigh_lookup(&arp_tbl, &gw, skb->dev);
-	if (n)
+	if (n) {
 		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
+		neigh_release(n);
+	}
 	return;
 }
 
diff --git a/kernel_addons/backport/2.6.17/include/src/netevent.c b/kernel_addons/backport/2.6.17/include/src/netevent.c
index 26a0920..4c67de1 100644
--- a/kernel_addons/backport/2.6.17/include/src/netevent.c
+++ b/kernel_addons/backport/2.6.17/include/src/netevent.c
@@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb)
 	arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len;
 	memcpy(&gw, arp_ptr, 4);
 	n = neigh_lookup(&arp_tbl, &gw, skb->dev);
-	if (n)
+	if (n) {
 		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
+		neigh_release(n);
+	}
 	return;
 }
 
diff --git a/kernel_addons/backport/2.6.5_sles9_sp3/include/src/netevent.c b/kernel_addons/backport/2.6.5_sles9_sp3/include/src/netevent.c
index 57a23ab..90fce0c 100644
--- a/kernel_addons/backport/2.6.5_sles9_sp3/include/src/netevent.c
+++ b/kernel_addons/backport/2.6.5_sles9_sp3/include/src/netevent.c
@@ -39,8 +39,10 @@ static void destructor(struct sk_buff *skb)
 	arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len;
 	memcpy(&gw, arp_ptr, 4);
 	n = neigh_lookup(&arp_tbl, &gw, skb->dev);
-	if (n)
+	if (n) {
 		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
+		neigh_release(n);
+	}
 	return;
 }
 
diff --git a/kernel_addons/backport/2.6.9_U2/include/src/netevent.c b/kernel_addons/backport/2.6.9_U2/include/src/netevent.c
index 5ffadd1..1589300 100644
--- a/kernel_addons/backport/2.6.9_U2/include/src/netevent.c
+++ b/kernel_addons/backport/2.6.9_U2/include/src/netevent.c
@@ -13,10 +13,59 @@
  *	Fixes:
  */
 
-#include <linux/module.h>
-#include <linux/skbuff.h>
 #include <linux/rtnetlink.h>
 #include <linux/notifier.h>
+#include <linux/mutex.h>
+#include <linux/if.h>
+#include <linux/netdevice.h>
+#include <linux/if_arp.h>
+
+#include <net/arp.h>
+#include <net/neighbour.h>
+#include <net/route.h>
+#include <net/netevent.h>
+
+static DEFINE_MUTEX(lock);
+static int count;
+
+static void destructor(struct sk_buff *skb)
+{
+	struct neighbour *n;
+	u8 *arp_ptr;
+	__be32 gw;
+
+	/* Pull the SPA */
+	arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len;
+	memcpy(&gw, arp_ptr, 4);
+	n = neigh_lookup(&arp_tbl, &gw, skb->dev);
+	if (n) {
+		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
+		neigh_release(n);
+	}
+	return;
+}
+
+static int arp_recv(struct sk_buff *skb, struct net_device *dev,
+			 struct packet_type *pkt)
+{
+	struct arphdr *arp_hdr;
+	u16 op;
+
+	arp_hdr = (struct arphdr *) skb->nh.raw;
+	op = ntohs(arp_hdr->ar_op);
+
+	if ((op == ARPOP_REQUEST || op == ARPOP_REPLY) && !skb->destructor)
+		skb->destructor = destructor;
+
+	kfree_skb(skb);
+	return 0;
+}
+
+static struct packet_type arp = {
+	.type = __constant_htons(ETH_P_ARP),
+	.func = arp_recv,
+	.af_packet_priv = (void *)1,
+};
 
 static struct notifier_block *netevent_notif_chain;
 
@@ -34,6 +83,12 @@ int register_netevent_notifier(struct notifier_block *nb)
 	int err;
 
 	err = notifier_chain_register(&netevent_notif_chain, nb);
+	if (!err) {
+		mutex_lock(&lock);
+		if (count++ == 0)
+			dev_add_pack(&arp);
+		mutex_unlock(&lock);
+	}
 	return err;
 }
 
@@ -49,7 +104,16 @@ int register_netevent_notifier(struct notifier_block *nb)
 
 int unregister_netevent_notifier(struct notifier_block *nb)
 {
-	return notifier_chain_unregister(&netevent_notif_chain, nb);
+	int err;
+
+	err = notifier_chain_unregister(&netevent_notif_chain, nb);
+	if (!err) {
+		mutex_lock(&lock);
+		if (--count == 0)
+			dev_remove_pack(&arp);
+		mutex_unlock(&lock);
+	}
+	return err;
 }
 
 /**
diff --git a/kernel_addons/backport/2.6.9_U3/include/src/netevent.c b/kernel_addons/backport/2.6.9_U3/include/src/netevent.c
index 5ffadd1..1589300 100644
--- a/kernel_addons/backport/2.6.9_U3/include/src/netevent.c
+++ b/kernel_addons/backport/2.6.9_U3/include/src/netevent.c
@@ -13,10 +13,59 @@
  *	Fixes:
  */
 
-#include <linux/module.h>
-#include <linux/skbuff.h>
 #include <linux/rtnetlink.h>
 #include <linux/notifier.h>
+#include <linux/mutex.h>
+#include <linux/if.h>
+#include <linux/netdevice.h>
+#include <linux/if_arp.h>
+
+#include <net/arp.h>
+#include <net/neighbour.h>
+#include <net/route.h>
+#include <net/netevent.h>
+
+static DEFINE_MUTEX(lock);
+static int count;
+
+static void destructor(struct sk_buff *skb)
+{
+	struct neighbour *n;
+	u8 *arp_ptr;
+	__be32 gw;
+
+	/* Pull the SPA */
+	arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len;
+	memcpy(&gw, arp_ptr, 4);
+	n = neigh_lookup(&arp_tbl, &gw, skb->dev);
+	if (n) {
+		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
+		neigh_release(n);
+	}
+	return;
+}
+
+static int arp_recv(struct sk_buff *skb, struct net_device *dev,
+			 struct packet_type *pkt)
+{
+	struct arphdr *arp_hdr;
+	u16 op;
+
+	arp_hdr = (struct arphdr *) skb->nh.raw;
+	op = ntohs(arp_hdr->ar_op);
+
+	if ((op == ARPOP_REQUEST || op == ARPOP_REPLY) && !skb->destructor)
+		skb->destructor = destructor;
+
+	kfree_skb(skb);
+	return 0;
+}
+
+static struct packet_type arp = {
+	.type = __constant_htons(ETH_P_ARP),
+	.func = arp_recv,
+	.af_packet_priv = (void *)1,
+};
 
 static struct notifier_block *netevent_notif_chain;
 
@@ -34,6 +83,12 @@ int register_netevent_notifier(struct notifier_block *nb)
 	int err;
 
 	err = notifier_chain_register(&netevent_notif_chain, nb);
+	if (!err) {
+		mutex_lock(&lock);
+		if (count++ == 0)
+			dev_add_pack(&arp);
+		mutex_unlock(&lock);
+	}
 	return err;
 }
 
@@ -49,7 +104,16 @@ int register_netevent_notifier(struct notifier_block *nb)
 
 int unregister_netevent_notifier(struct notifier_block *nb)
 {
-	return notifier_chain_unregister(&netevent_notif_chain, nb);
+	int err;
+
+	err = notifier_chain_unregister(&netevent_notif_chain, nb);
+	if (!err) {
+		mutex_lock(&lock);
+		if (--count == 0)
+			dev_remove_pack(&arp);
+		mutex_unlock(&lock);
+	}
+	return err;
 }
 
 /**
diff --git a/kernel_addons/backport/2.6.9_U4/include/src/netevent.c b/kernel_addons/backport/2.6.9_U4/include/src/netevent.c
index 6a8df29..0d26662 100644
--- a/kernel_addons/backport/2.6.9_U4/include/src/netevent.c
+++ b/kernel_addons/backport/2.6.9_U4/include/src/netevent.c
@@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb)
 	arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len;
 	memcpy(&gw, arp_ptr, 4);
 	n = neigh_lookup(&arp_tbl, &gw, skb->dev);
-	if (n)
+	if (n) {
 		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
+		neigh_release(n);
+	}
 	return;
 }
 

-- 
MST


From mst at mellanox.co.il  Thu Feb  1 04:19:30 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 1 Feb 2007 14:19:30 +0200
Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update
	support
In-Reply-To: <20070125191321.30934.74542.stgit@dell3.ogc.int>
References: <20070125191321.30934.74542.stgit@dell3.ogc.int>
Message-ID: <20070201121930.GB20789@mellanox.co.il>

> Here are the backports for snooping arp packets to generate neighbour
> update netevents.

OK, I went (somewhat belatedly) over this code in more depth and I see
a couple of issues that I'd like you to address:

- There's some trailing whitespace in some netevent.c files.
  Could you clean these up, please?

- I see:
	$ diff ./kernel_addons/backport/2.6.9_U4/include/src/netevent.c
	kernel_addons/backport/2.6.5_sles9_sp3/include/src/netevent.c
	> #include <linux/skbuff.h>

Shouldn't the redhat backports include skbuff.h too?
They do use struct sk_buff, so it seems cleaner to include it
directly, and we would get identical code for redhat and suse.
 
- What is the reason for:
        if ((op == ARPOP_REQUEST || op == ARPOP_REPLY) && !skb->destructor)
	                skb->destructor = destructor;

	kfree_skb(skb);

Could we miss events because the skb already has a destructor?
Can we just call the destructor function directly? (This is what addr.c
did previously, and it apparently worked fine.)

Steve, could you please clone ofed git and address these?


-- 
MST


From glebn at voltaire.com  Thu Feb  1 04:42:30 2007
From: glebn at voltaire.com (glebn at voltaire.com)
Date: Thu, 1 Feb 2007 14:42:30 +0200
Subject: [openib-general] [RFC/BUG] libibverbs: DMA vs. CQ race
In-Reply-To: <adabqkhy04v.fsf@cisco.com>
References: <Pine.LNX.4.61.0612131626250.24974@localhost.localdomain>
	<ada8xhaq5ze.fsf@cisco.com>
	<Pine.LNX.4.61.0701281444230.32767@localhost.localdomain>
	<adabqkhy04v.fsf@cisco.com>
Message-ID: <20070201124230.GA23354@minantech.com>

On Mon, Jan 29, 2007 at 01:49:04PM -0800, Roland Dreier wrote:
> Even with that resolved this all seems rather unfortunate to me.  I
> don't like the idea of having the kernel keep all these buffers around
> and then have the userspace library have to map the right buffer.  It
> leads to awkwardness like the fact that mthca_resize_cq() seems to be
> totally screwed if ibv_cmd_resize_cq() fails for some reason -- it
> already munmap'ed the original buffer, and it can't map the new
> buffer, and so the CQ is dead with no chance to recover.
I looked through the ehca driver, and it looks like it is doing exactly this:
"keep all these buffers around and then have the userspace library have
to map the right buffer". ehca doesn't support resize_cq, though. But
let's say this issue is also resolved: would this approach be
acceptable? This is how ehca works after all, so we are not inventing
something new here.

> 
> The really strange thing about this is that this Altix
> coherent/consistent memory really isn't about the memory itself, but
> about the relationship of that memory with DMA elsewhere -- as I
> understand the code, doing dma_alloc_coherent() returns normal memory
> with a special DMA address that tells the system to flush other DMAs
> before doing DMA to the coherent region.  Which isn't really what most
> people understand coherent memory to be, but it has the magic property
> of making most drivers work.
Yes. It seems Altix abuses dma_alloc_coherent() for this.

> 
> So I'd really like a better solution, but I don't have one in mind
> unfortunately.  Maybe we can all meditate on this and try to come up
> with something cleaner -- I really hope there is a better way to
> handle this.
>
Another approach may be to add another verb (or we can make ibv_reg_mr
do this with a special flag) for coherent memory allocation. This verb
will allocate coherent memory in the kernel and mmap it from user space.
Then the cq will be created as usual by providing the lkey to the create_cq
verb. The resize will work exactly like it works now, i.e. allocate a new cq
buffer, call resize_cq with the new buffer's lkey, copy cqes, unregister the
old buffer.

--
			Gleb.


From mst at mellanox.co.il  Thu Feb  1 04:42:11 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 1 Feb 2007 14:42:11 +0200
Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update
	support
In-Reply-To: <20070201121930.GB20789@mellanox.co.il>
References: <20070125191321.30934.74542.stgit@dell3.ogc.int>
	<20070201121930.GB20789@mellanox.co.il>
Message-ID: <20070201124211.GD20789@mellanox.co.il>

> - There's some trailing whitespace in some netevent.c files.
>   Could you clean these please?

OK, fixed the trailing whitespace and pushed out.

-- 
MST


From bugzilla-daemon at lists.openfabrics.org  Thu Feb  1 05:02:09 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Thu,  1 Feb 2007 05:02:09 -0800 (PST)
Subject: [openib-general] [Bug 334] Problems with build
	OFED-1.1.1-ib_local_sa
In-Reply-To: <bug-334-1@https.bugs.openfabrics.org/>
Message-ID: <20070201130209.CF235E607F7@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=334


------- Comment #8 from erezz at voltaire.com  2007-02-01 05:02 -------
(In reply to comment #7)
> I tried to build the product again and saw that the patches from the
> kernel_patches/backport/2.6.16_sles10/ directory were not applied automatically.
> When I applied these patches manually, everything built. How can I run the build
> process with the patches applied automatically?
> 

What is the output of uname -a ?

on my machine:

thyme:/tmp/ofed_sa_test/OFED-1.1.1-ib_local_sa # uname -a
Linux thyme 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 x86_64 x86_64
x86_64 GNU/Linux

Try the following:
Edit ofed_scripts/configure and add the line: "echo ${KVERSION}" where the
switch starts in line 214. See what happens in case 2.6.16*.
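
To illustrate why the value of KVERSION matters: a switch of this kind takes the first pattern that matches, so a generic 2.6.16* case listed before a distribution-specific one would win. The sketch below mimics that with fnmatch(3). The patterns and their order are assumptions made up for illustration; the real ones are in ofed_scripts/configure and may differ.

```c
#include <fnmatch.h>
#include <stddef.h>

struct backport_rule {
    const char *pattern;  /* shell-style glob, as in a "case" statement */
    const char *dir;      /* backport patch directory to use */
};

/* Hypothetical rules; the more specific pattern must come first. */
static const struct backport_rule rules[] = {
    { "2.6.16.*-*-*", "2.6.16_sles10" },
    { "2.6.16*",      "2.6.16" },
    { "2.6.17*",      "2.6.17" },
};

/* Returns the directory of the first matching rule, or NULL if none match. */
static const char *pick_backport_dir(const char *kversion)
{
    for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++)
        if (fnmatch(rules[i].pattern, kversion, 0) == 0)
            return rules[i].dir;
    return NULL;
}
```

With these made-up rules, "2.6.16.21-0.8-smp" selects 2.6.16_sles10, while swapping the first two entries would send it to 2.6.16 instead.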


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From wombat2 at us.ibm.com  Thu Feb  1 05:21:36 2007
From: wombat2 at us.ibm.com (Bernard King-Smith)
Date: Thu, 1 Feb 2007 08:21:36 -0500
Subject: [openib-general] [mthca] Creation of a SRQ with many WR (> 16K)
 in kernel level fails
In-Reply-To: <mailman.594.1170323663.8957.openib-general@openib.org>
Message-ID: <OFFC8A346E.476C90F2-ON85257275.00484F2F-85257275.0049639B@us.ibm.com>

> ----- Message from "Or Gerlitz" <ogerlitz at voltaire.com> on Thu, 01 Feb 2007 11:17:53 +0200 -----
> 
> Dotan Barak wrote:
> > I think that now, when implementation of IPoIB CM is available and SRQ
> > is being used, one may
> > need to use a SRQ with more than 16K WRs.
> 
> IPoIB UD uses SRQ by nature (since RX from all peers consume buffers 
> from the --only-- RQ) and lives fine with 32 buffers (or 64 you can look
> in the code). Moreover, my assumption is that
> 
>    pps(RC) <= pps(UC) <= pps(UD)
> 
> this means that what ever number of RX buffer for UD/2K MTU which is 
> "enough" to have no (or close to zero) packet loss under some traffic 
> pattern, the same pattern can be served with IPoIB CM using SRQ of the 
> same size.

I would expect that you will need more than 32 or 64 buffers using RC and 
SRQ. With larger packets it takes longer to do receive processing on each 
packet under RC. Larger packets mean it takes more time to do the checksum 
and the copy to the socket, because there is up to 60K of data vs. 2K. The 
residency time on the receive queue will be longer. In the traffic pattern 
where one adapter is receiving from many adapters over the fabric, there 
will be a larger imbalance between the sending rate and the rate at which 
the receiver drains the queue. Given a large enough TCP send and receive 
window for a single socket to get peak bandwidth, multiple sockets will 
have more packets in flight for a single destination at the same time in 
this pattern.
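
One way to put rough numbers on the residency argument is Little's law: the number of buffers occupied is about the arrival rate times the time each packet stays on the receive queue. The helper below is only that arithmetic; framing it as Little's law is an assumption added here, and the function name and any inputs are illustrative, not measurements.

```c
/*
 * Little's law: items in the queue = arrival rate x time each item stays.
 * pkts_per_sec : aggregate arrival rate at the receiver
 * residency_us : time one packet occupies a receive buffer, in microseconds
 */
static unsigned long long rx_buffers_needed(unsigned long long pkts_per_sec,
                                            unsigned long long residency_us)
{
    /* Round up so a fractional buffer still counts as a whole one. */
    return (pkts_per_sec * residency_us + 999999ULL) / 1000000ULL;
}
```

For example, at 100000 packets per second a residency of 100 us needs 10 buffers, and a tenfold longer residency at the same rate needs 100.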

> 
> Or.
> 
> 
> 

Bernie King-Smith 
IBM Corporation
Server Group
Cluster System Performance 
wombat2 at us.ibm.com    (845)433-8483
Tie. 293-8483 or wombat2 on NOTES 

"We are not responsible for the world we are born into, only for the world 
we leave when we die.
So we have to accept what has gone before us and work to change the only 
thing we can,
-- The Future." William Shatner

From mst at mellanox.co.il  Thu Feb  1 05:55:22 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 1 Feb 2007 15:55:22 +0200
Subject: [openib-general] [PATCH] RE:  regression in ofed 1.2
In-Reply-To: <1170322670.654.23.camel@linux-q667.site>
References: <000401c7458b$9bff77d0$8698070a@amr.corp.intel.com>
	<1170322670.654.23.camel@linux-q667.site>
Message-ID: <20070201135522.GA27688@mellanox.co.il>

> Quoting Steve WIse <swise at opengridcomputing.com>:
> Subject: Re: [PATCH] RE:  regression in ofed 1.2
> 
> > Okay - I _think_ the problem is that OFED 1.2 pulled code from my git tree
> > before I created an ofed_1_2 branch (which contains the fix), and didn't update
> > to match my ofed_1_2 branch.  The crash that you reported occurring over iWarp
> > should also happen over IB for the same reason, so both are likely broken atm...
> > 
> > Vlad, can you please update the ofed build by pulling from the ofed_1_2 branches
> > of my rdma-dev.git and librdmacm.git trees?
> 
> I looked at your rdma-dev ofed_1_2 branch and see that the cma.c changes
> you made there will resolve this issue.  It just needs to be pulled into
> ofed_1_2.

OK, I've updated ofed to code from rdma-dev ofed_1_2 branch. Some notes: 

- Sean, please base your branches on specific -rc from linus
  (OFED 1.2 is now -rc7).
- Now that we are entering feature freeze, we should not do full replaces anymore.
  So Sean, please post incremental patches, labeled ofed-1.2 clearly.

-- 
MST


From bugzilla-daemon at lists.openfabrics.org  Thu Feb  1 05:57:29 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Thu,  1 Feb 2007 05:57:29 -0800 (PST)
Subject: [openib-general] [Bug 334] Problems with build
	OFED-1.1.1-ib_local_sa
In-Reply-To: <bug-334-1@https.bugs.openfabrics.org/>
Message-ID: <20070201135729.C10E3E607F7@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=334


------- Comment #9 from dmitry.yulov at intel.com  2007-02-01 05:57 -------
> Edit ofed_scripts/configure and add the line: "echo ${KVERSION}" where the
> switch starts in line 214. See what happens in case 2.6.16*.
When I run build.sh, I see this in the log file:
Applying patches for 2.6.16.21-0.8-smp kernel:
       
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.16/addr_1_netevents_revert_to_2_6_17.patch
patching file drivers/infiniband/core/addr.c
       
/var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.16/ipath-backport.patch
patching file drivers/infiniband/hw/ipath/iowrite32_copy_x86_64.S
patching file drivers/infiniband/hw/ipath/ipath_backport.h
patching file drivers/infiniband/hw/ipath/ipath_diag.c
patching file drivers/infiniband/hw/ipath/ipath_driver.c

As I understand it, in this case the 2.6.16 directory is used, not 2.6.16_sles10.
I tried adding this option to the build.sh script:
configure_options="$configure_options
--with-patchdir=/root/install/OFED-1.1.1-ib_local_sa/2.6.16_sles10"
But in that case the build process breaks. I don't know how to add a patching
step to build.sh that patches the cma.c file and the kernel. Do you have any ideas?


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From swise at opengridcomputing.com  Thu Feb  1 05:57:28 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 1 Feb 2007 07:57:28 -0600
Subject: [openib-general] [PATCH] RE:  regression in ofed 1.2
References: <000401c7458b$9bff77d0$8698070a@amr.corp.intel.com>
	<1170322670.654.23.camel@linux-q667.site>
	<1170327205.654.34.camel@linux-q667.site>
	<20070201135619.GB27688@mellanox.co.il>
Message-ID: <000e01c74608$e9b4a040$020010ac@haggard>

>> >
>>
>> Also, I just pulled down and built the latest ofed_1_2 kernel and user
>> code against 2.6.20-rc7, and the ucma abi is 4.  So rdma_create_qp()
>> will still crash even with the librdmacm code to avoid the call to
>> rdma_init_qp_attr for ABI 3 kernels.
>>
>>
>> Steve.
>
> I'm a bit confused. Can you please try with latest code I've just pushed out?
>

Will do.  This was before you pulled in Sean's code.


From mst at mellanox.co.il  Thu Feb  1 05:56:19 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 1 Feb 2007 15:56:19 +0200
Subject: [openib-general] [PATCH] RE:  regression in ofed 1.2
In-Reply-To: <1170327205.654.34.camel@linux-q667.site>
References: <000401c7458b$9bff77d0$8698070a@amr.corp.intel.com>
	<1170322670.654.23.camel@linux-q667.site>
	<1170327205.654.34.camel@linux-q667.site>
Message-ID: <20070201135619.GB27688@mellanox.co.il>

> Quoting Steve WIse <swise at opengridcomputing.com>:
> Subject: Re: [PATCH] RE:  regression in ofed 1.2
> 
> On Thu, 2007-02-01 at 03:37 -0600, Steve WIse wrote:
> > > Okay - I _think_ the problem is that OFED 1.2 pulled code from my git tree
> > > before I created an ofed_1_2 branch (which contains the fix), and didn't update
> > > to match my ofed_1_2 branch.  The crash that you reported occurring over iWarp
> > > should also happen over IB for the same reason, so both are likely broken atm...
> > > 
> > > Vlad, can you please update the ofed build by pulling from the ofed_1_2 branches
> > > of my rdma-dev.git and librdmacm.git trees?
> > 
> > I looked at your rdma-dev ofed_1_2 branch and see that the cma.c changes
> > you made there will resolve this issue.  It just needs to be pulled into
> > ofed_1_2.
> > 
> 
> Also, I just pulled down and built the latest ofed_1_2 kernel and user
> code against 2.6.20-rc7, and the ucma abi is 4.  So rdma_create_qp()
> will still crash even with the librdmacm code to avoid the call to
> rdma_init_qp_attr for ABI 3 kernels.
> 
> 
> Steve.

I'm a bit confused. Can you please try with latest code I've just pushed out?

-- 
MST


From bugzilla-daemon at lists.openfabrics.org  Thu Feb  1 06:15:18 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Thu,  1 Feb 2007 06:15:18 -0800 (PST)
Subject: [openib-general] [Bug 334] Problems with build
	OFED-1.1.1-ib_local_sa
In-Reply-To: <bug-334-1@https.bugs.openfabrics.org/>
Message-ID: <20070201141518.A7561E607F7@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=334


------- Comment #10 from erezz at voltaire.com  2007-02-01 06:15 -------
(In reply to comment #9)
> > Edit ofed_scripts/configure and add the line: "echo ${KVERSION}" where the
> > switch starts in line 214. See what happens in case 2.6.16*.
> When I try to run build.sh I see in log file:
> Applying patches for 2.6.16.21-0.8-smp kernel:

What is the output of uname -r ? This is VERY important. Also, can you run
`cat /etc/issue` and send the results?

>        
> /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.16/addr_1_netevents_revert_to_2_6_17.patch
> patching file drivers/infiniband/core/addr.c
>        
> /var/tmp/OFEDRPM/BUILD/openib-1.1/kernel_patches/backport/2.6.16/ipath-backport.patch
> patching file drivers/infiniband/hw/ipath/iowrite32_copy_x86_64.S
> patching file drivers/infiniband/hw/ipath/ipath_backport.h
> patching file drivers/infiniband/hw/ipath/ipath_diag.c
> patching file drivers/infiniband/hw/ipath/ipath_driver.c
> 
> As I understand in this case used directory 2.6.16 not 2.6.16_suse10.

This is not good. Try to debug ofed_scripts/configure and see what happens in
the switch in apply_backport_patches.

> I try to add in build.sh script the option 
> configure_options="$configure_options
> --with-patchdir=/root/install/OFED-1.1.1-ib_local_sa/2.6.16_sles10"

Don't do that.


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From halr at voltaire.com  Thu Feb  1 06:16:59 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 01 Feb 2007 09:16:59 -0500
Subject: [openib-general] [PATCH] osm: some trivial chages in the
 osm_ucast_lash for compilation on windows
In-Reply-To: <45C1D3A0.7060201@dev.mellanox.co.il>
References: <45C1D3A0.7060201@dev.mellanox.co.il>
Message-ID: <1170339359.15660.265762.camel@hal.voltaire.com>

Hi Yevgeny,

On Thu, 2007-02-01 at 06:48, Yevgeny Kliteynik wrote:
> Hi Hal,
> 
> This patch has some trivial changes in the osm_ucast_lash.c 
> for compilation on windows.
> 
> In general, this file needs a major cosmetic (and not only)
> patch to fit better into the OSM code. 

There will shortly be some work to improve this. This is one of the next
items on the list for this.

> Will get back to it at some point in the future.

Sure; this is not your problem but if you get to it first that will
help.

> -- Yevgeny
> 
> Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>

Thanks. Applied.

-- Hal


From halr at voltaire.com  Thu Feb  1 06:32:35 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 01 Feb 2007 09:32:35 -0500
Subject: [openib-general] [PATCH] osm: trivial casting for compilation
	on windows
In-Reply-To: <45C1C255.4060405@dev.mellanox.co.il>
References: <45C1C255.4060405@dev.mellanox.co.il>
Message-ID: <1170339465.15660.265845.camel@hal.voltaire.com>

On Thu, 2007-02-01 at 05:35, Yevgeny Kliteynik wrote:
> Trivial casting for compilation on windows
> 
> Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>

Thanks. Applied.

-- Hal


From steakdbini at yahoo.co.jp  Thu Feb  1 07:20:31 2007
From: steakdbini at yahoo.co.jp (�)
Date: Fri,  2 Feb 2007 00:20:31 +0900 (JST)
Subject: [openib-general]
	=?ISO-2022-JP?B?g4GBW4OLgqCC6IKqgsaCpIKygrSCooLcgrWCvYH0?=
Message-ID: 20070202002015

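It's been a while. This is Mizuna.
Thank you for your email the other day.
I'm sorry my reply is so late.

About the question on work you asked in your last email...
I'm a full-time housewife.
I've been taking care of the house since last December, and that kept me busy.
I enjoy housework, but it does wear me out... (><)
With this kind of life I don't get to meet anyone, and I often find myself wanting someone to lean on.
So, you may think it strange for me to say this so suddenly, but
I'd like to meet and talk with you once; would that be a bother?
I'm 31 and live in Setagaya Ward.
I'd like to have a meal together and talk a lot.
If possible, this weekend around Shinjuku or Shibuya would suit me;
how does that sound?

http://mic.chu.jp/mizuna/

I've been using this site lately, so
could you email me from here?
I'm also on mixi, but I find this one more comfortable, so
it's the only site I use (^^;

I'll be waiting for your reply.

Mizuna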


From tziporet at mellanox.co.il  Thu Feb  1 07:40:26 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Thu, 01 Feb 2007 17:40:26 +0200
Subject: [openib-general] components that have not opend the ofed_1_2 branch
Message-ID: <45C209EA.1040207@mellanox.co.il>

The following components have not opened ofed_1_2 branch:

    * libibverbs - Roland
    * libmthca - Roland
    * libipathverbs - Bryan
    * tvflash - Roland
    * srptools - Ishai
    * management - Hal


Please open the branch today or tomorrow at the latest.

Thanks,
Tziporet


From kliteyn at dev.mellanox.co.il  Thu Feb  1 07:57:42 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Thu, 01 Feb 2007 17:57:42 +0200
Subject: [openib-general] [PATCH 10/10] osm: QoS in OpenSM
In-Reply-To: <1170344724.15660.271079.camel@hal.voltaire.com>
References: <45BF6548.80104@dev.mellanox.co.il>
	<1170264561.15660.189494.camel@hal.voltaire.com>
	<45C115D8.6070504@dev.mellanox.co.il>
	<1170344724.15660.271079.camel@hal.voltaire.com>
Message-ID: <45C20DF6.6060809@dev.mellanox.co.il>

Hi Hal,

Hal Rosenstock wrote:
> Hi again Yevgeny,
> 
> On Wed, 2007-01-31 at 17:19, Yevgeny Kliteynik wrote:
> 
> [snip...]
> 
>>>> +   for (i = 0; i < IB_MAX_NUM_VLS; i++)
>>>> +   {
>>>> +      if (valid_sls[i])
>>>> +      {
>>>> +         vl = ib_slvl_table_get(p_slvl_tbl,i);
>>>> +         if (vl == IB_DROP_VL)
>>> Does vl > Operational VLs need checking here or is it never set this way
>>> ?
>> I think that it would be better if the "setup" part would check it when
>> configuring sl2vl tables, and when VL > Operational VL it should set
>> some default value instead (VL15 looks as a good option).
> 
> OK; but why scan all VLs if they are not supported ?

Agree, adding it to my ToDo list of improvements in QoS.
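
A minimal sketch of the check being discussed, assuming the SL-to-VL table is a plain 16-entry array and that op_vls is simply the count of operational data VLs (in the real port info this is an encoded field, so that is a simplification). The names filter_valid_sls, SKETCH_NUM_SLS and SKETCH_DROP_VL are hypothetical and not part of OpenSM.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NUM_SLS 16   /* SLs 0..15 */
#define SKETCH_DROP_VL 15   /* mapping an SL to VL15 means "drop" */

/*
 * Clear valid_sls[i] for every SL whose VL is the drop VL or is not an
 * operational data VL. op_vls is taken as a plain count (assumption):
 * op_vls == 4 means VL0..VL3 are usable.
 */
static void filter_valid_sls(bool valid_sls[SKETCH_NUM_SLS],
                             const uint8_t sl2vl[SKETCH_NUM_SLS],
                             uint8_t op_vls)
{
    for (int i = 0; i < SKETCH_NUM_SLS; i++) {
        uint8_t vl;

        if (!valid_sls[i])
            continue;
        vl = sl2vl[i];
        if (vl == SKETCH_DROP_VL || vl >= op_vls)
            valid_sls[i] = false;
    }
}
```

Doing the range check here is an alternative to sanitizing the tables at setup time; either way an SL that lands on an unsupported VL never ends up in a returned path.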
 
>>>> +            valid_sls[i] = FALSE;
>>>> +      }
>>>> +   }
>>>> +
>>>> +   /*
>>>> +    * now get pointer to the destination port (same as above)
>>>> +    */
>>>> +   p_node = osm_physp_get_node_ptr( p_dest_physp );
>>>> +
>>>> +   if( p_node->sw )
>>>> +   {
>>>> +      p_dest_physp = osm_switch_get_route_by_lid( p_node->sw, cl_ntoh16( dest_lid_ho ) );
>>>> +      if ( p_dest_physp == 0 )
>>>> +      {
>>>> +         osm_log( p_rcv->p_log, OSM_LOG_ERROR,
>>>> +                  "__osm_pr_rcv_get_path_parms_qos: ERR 1F03: "
>>>> +                  "Cannot find routing to LID 0x%X from switch for GUID 0x%016" PRIx64 "\n",
>>>> +                  dest_lid_ho,
>>>> +                  cl_ntoh64( osm_node_get_node_guid( p_node ) ) );
>>>> +         status = IB_ERROR;
>>>> +         goto Exit;
>>>> +      }
>>>> +   }
>>>> +
>>>> +   /*
>>>> +    * Now go through the path step by step
>>>> +    */
>>>> +
>>>> +   while( p_physp != p_dest_physp )
>>>> +   {
>>>> +      p_physp = osm_physp_get_remote( p_physp );
>>>> +      if ( p_physp == 0 )
>>>> +      {
>>>> +         osm_log( p_rcv->p_log, OSM_LOG_ERROR,
>>>> +                  "__osm_pr_rcv_get_path_parms_qos: ERR 1F04: "
>>>> +                  "Cannot find remote phys port when routing to LID 0x%X from node GUID 0x%016" PRIx64 "\n",
>>>> +                  dest_lid_ho,
>>>> +                  cl_ntoh64( osm_node_get_node_guid( p_node ) ) );
>>>> +         status = IB_ERROR;
>>>> +         goto Exit;
>>>> +      }
>>>> +      
>>>> +      in_port_num = osm_physp_get_port_num(p_physp);
>>>> +
>>>> +      /* this is point to point case (no switch in between) */
>>>> +      if( p_physp == p_dest_physp )
>>>> +         break;
>>>
>>> Ordering of check for switch and point to point case are different here
>>> and original routine. Should they be the same ? If so, which should
>>> change ? (Any reason why this was moved in this routine ?)
>> Not sure I'm following.
>> The order of check for switch and point to point case looks the same
>> to me (am I missing something?). The difference that I see is that 
>> the mtu and rate in the original function are adjusted after the 
>> check for switch, and in the new function they are adjusted before the
>> check, which I think is the same.
> 
> That could have been what I was seeing. Shouldn't the two functions be
> identical in order (assuming these are to be separated) ? I wouldn't
> want to see them diverge further.

The order in the new function can be changed to match the order in the 
old one - I have no problem with that.

> [snip...]
> 
>>>> +/**********************************************************************
>>>> + **********************************************************************/
>>>>  static void
>>>>  __osm_pr_rcv_build_pr(
>>>>    IN osm_pr_rcv_t*         const p_rcv,
>>>> @@ -774,7 +1569,8 @@ __osm_pr_rcv_build_pr(
>>>>  #endif
>>>>  
>>>>    p_pr->pkey = p_parms->pkey;
>>>> -  p_pr->sl = cl_hton16(p_parms->sl);
>>>> +  ib_path_rec_set_qos_class(p_pr,p_parms->class);
>>>> +  ib_path_rec_set_sl(p_pr,p_parms->sl);
>>>>    p_pr->mtu = (uint8_t)(p_parms->mtu | 0x80);
>>>>    p_pr->rate = (uint8_t)(p_parms->rate | 0x80);
>>>>  
>>>> @@ -832,10 +1628,14 @@ __osm_pr_rcv_get_lid_pair_path(
>>>>      goto Exit;
>>>>    }
>>>>  
>>>> -  status = __osm_pr_rcv_get_path_parms( p_rcv, p_pr, p_src_port,
>>>> -                                        p_dest_port, dest_lid_ho,
>>>> -                                        comp_mask, &path_parms );
>>>> -
>>>> +  if (p_rcv->p_subn->opt.no_qos)
>>> Shouldn't this be based on p_rcv->p_subn.opt.qos_policy_file rather than
>>> no_qos ? I think there are cases where the QoS will be used without the
>>> QoS policy (higher level QoS support).
>> By totally ignoring sl2vl tables the original function may return
>> path that isn't a "real" path - it may lead to VL15 at some point.
>> So the new function takes care of this problem.
> 
> So it's a bug fix (missing functionality) in the existing QoS support.

Right. Hopefully, the new function will replace the old one, and there
won't be a need to add this functionality to the old function as a separate
task.
 
>> When there's no policy file, the policy parse tree is empty, and then 
>> the ports would not have any qos-level to be applied on the examined path.
>> In that case the new function does whatever the old one did, plus checking
>> the path for sl2vl "consistency".
> 
> Got it. Thanks.
> 
> -- Hal
> 
>> -- Yevgeny
> 
> 


From halr at voltaire.com  Thu Feb  1 07:45:34 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 01 Feb 2007 10:45:34 -0500
Subject: [openib-general] [PATCH 10/10] osm: QoS in OpenSM
In-Reply-To: <45C115D8.6070504@dev.mellanox.co.il>
References: <45BF6548.80104@dev.mellanox.co.il>
	<1170264561.15660.189494.camel@hal.voltaire.com>
	<45C115D8.6070504@dev.mellanox.co.il>
Message-ID: <1170344724.15660.271079.camel@hal.voltaire.com>

Hi again Yevgeny,

On Wed, 2007-01-31 at 17:19, Yevgeny Kliteynik wrote:

[snip...]

> >> +   for (i = 0; i < IB_MAX_NUM_VLS; i++)
> >> +   {
> >> +      if (valid_sls[i])
> >> +      {
> >> +         vl = ib_slvl_table_get(p_slvl_tbl,i);
> >> +         if (vl == IB_DROP_VL)
> > 
> > Does vl > Operational VLs need checking here or is it never set this way
> > ?
> I think that it would be better if the "setup" part would check it when
> configuring sl2vl tables, and when VL > Operational VL it should set
> some default value instead (VL15 looks as a good option).

OK; but why scan all VLs if they are not supported ?

> >> +            valid_sls[i] = FALSE;
> >> +      }
> >> +   }
> >> +
> >> +   /*
> >> +    * now get pointer to the destination port (same as above)
> >> +    */
> >> +   p_node = osm_physp_get_node_ptr( p_dest_physp );
> >> +
> >> +   if( p_node->sw )
> >> +   {
> >> +      p_dest_physp = osm_switch_get_route_by_lid( p_node->sw, cl_ntoh16( dest_lid_ho ) );
> >> +      if ( p_dest_physp == 0 )
> >> +      {
> >> +         osm_log( p_rcv->p_log, OSM_LOG_ERROR,
> >> +                  "__osm_pr_rcv_get_path_parms_qos: ERR 1F03: "
> >> +                  "Cannot find routing to LID 0x%X from switch for GUID 0x%016" PRIx64 "\n",
> >> +                  dest_lid_ho,
> >> +                  cl_ntoh64( osm_node_get_node_guid( p_node ) ) );
> >> +         status = IB_ERROR;
> >> +         goto Exit;
> >> +      }
> >> +   }
> >> +
> >> +   /*
> >> +    * Now go through the path step by step
> >> +    */
> >> +
> >> +   while( p_physp != p_dest_physp )
> >> +   {
> >> +      p_physp = osm_physp_get_remote( p_physp );
> >> +      if ( p_physp == 0 )
> >> +      {
> >> +         osm_log( p_rcv->p_log, OSM_LOG_ERROR,
> >> +                  "__osm_pr_rcv_get_path_parms_qos: ERR 1F04: "
> >> +                  "Cannot find remote phys port when routing to LID 0x%X from node GUID 0x%016" PRIx64 "\n",
> >> +                  dest_lid_ho,
> >> +                  cl_ntoh64( osm_node_get_node_guid( p_node ) ) );
> >> +         status = IB_ERROR;
> >> +         goto Exit;
> >> +      }
> >> +      
> >> +      in_port_num = osm_physp_get_port_num(p_physp);
> >> +
> >> +      /* this is point to point case (no switch in between) */
> >> +      if( p_physp == p_dest_physp )
> >> +         break;
> > 
> > 
> > Ordering of check for switch and point to point case are different here
> > and original routine. Should they be the same ? If so, which should
> > change ? (Any reason why this was moved in this routine ?)
> Not sure I'm following.
> The order of check for switch and point to point case looks the same
> to me (am I missing something?). The difference that I see is that 
> the mtu and rate in the original function are adjusted after the 
> check for switch, and in the new function they are adjusted before the
> check, which I think is the same.

That could have been what I was seeing. Shouldn't the two functions be
identical in order (assuming these are to be separated) ? I wouldn't
want to see them diverge further.

[snip...]

> >> +/**********************************************************************
> >> + **********************************************************************/
> >>  static void
> >>  __osm_pr_rcv_build_pr(
> >>    IN osm_pr_rcv_t*         const p_rcv,
> >> @@ -774,7 +1569,8 @@ __osm_pr_rcv_build_pr(
> >>  #endif
> >>  
> >>    p_pr->pkey = p_parms->pkey;
> >> -  p_pr->sl = cl_hton16(p_parms->sl);
> >> +  ib_path_rec_set_qos_class(p_pr,p_parms->class);
> >> +  ib_path_rec_set_sl(p_pr,p_parms->sl);
> >>    p_pr->mtu = (uint8_t)(p_parms->mtu | 0x80);
> >>    p_pr->rate = (uint8_t)(p_parms->rate | 0x80);
> >>  
> >> @@ -832,10 +1628,14 @@ __osm_pr_rcv_get_lid_pair_path(
> >>      goto Exit;
> >>    }
> >>  
> >> -  status = __osm_pr_rcv_get_path_parms( p_rcv, p_pr, p_src_port,
> >> -                                        p_dest_port, dest_lid_ho,
> >> -                                        comp_mask, &path_parms );
> >> -
> >> +  if (p_rcv->p_subn->opt.no_qos)
> > 
> > Shouldn't this be based on p_rcv->p_subn.opt.qos_policy_file rather than
> > no_qos ? I think there are cases where the QoS will be used without the
> > QoS policy (higher level QoS support).
> 
> By totally ignoring sl2vl tables the original function may return
> path that isn't a "real" path - it may lead to VL15 at some point.
> So the new function takes care of this problem.

So it's a bug fix (missing functionality) in the existing QoS support.

> When there's no policy file, the policy parse tree is empty, and then 
> the ports would not have any qos-level to be applied on the examined path.
> In that case the new function does whatever the old one did, plus checking
> the path for sl2vl "consistency".

Got it. Thanks.

-- Hal

> -- Yevgeny


From monil at voltaire.com  Thu Feb  1 08:17:54 2007
From: monil at voltaire.com (Moni Levy)
Date: Thu, 1 Feb 2007 18:17:54 +0200
Subject: [openib-general] OFED 1.2 release - to be reviewed in the
 meeting today
In-Reply-To: <45C08E47.2040506@mellanox.co.il>
References: <45BDFF11.9080901@mellanox.co.il>
	<45BFF296.8000908@cse.ohio-state.edu> <45C08E47.2040506@mellanox.co.il>
Message-ID: <6a122cc00702010817j52958d85n1d141316e29a7ebf@mail.gmail.com>

Tziporet,
On 1/31/07, Tziporet Koren <tziporet at mellanox.co.il> wrote:
> Shaun Rowland wrote:
> >
> > Hi. I am not exactly sure where the ofed_1_2 directory for MPI SRPMs is
> > supposed to go. I assume from previous meetings this is just a
> > filesystem directory. Should it be a directory in my home directory on
> > staging.openfabrics.org, in ~/public_html, or is there something else I
> > need to do to put this into place? I think from the previous MPI
> > specific meeting, this was supposed to be done in a web directory. Since
> > I am unclear, I wanted to ask here.
>
> Please place your SRPM under your home directory at ofed_1_2 directory.
> Then you can make this directory accessible to the web in this way:
> 1. mkdir public_html
> 2. chmod 755 public_html
>
> Now you can put any stuff under public_html (also symbolic links) and it
> will be available via web
> www.openfabrics.org/~<user name>/

I have put the ib-bonding SRPM in ~monis/ofed_1_2

--Moni

>
> Tziporet


From swise at opengridcomputing.com  Thu Feb  1 09:12:01 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 01 Feb 2007 11:12:01 -0600
Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update
	support
In-Reply-To: <20070201121008.GA20789@mellanox.co.il>
References: <20070125191321.30934.74542.stgit@dell3.ogc.int>
	<20070201121008.GA20789@mellanox.co.il>
Message-ID: <1170349921.16637.1.camel@stevo-desktop>

Looks good.

Thanks,

Steve.


On Thu, 2007-02-01 at 14:10 +0200, Michael S. Tsirkin wrote:
> > Quoting Steve Wise <swise at opengridcomputing.com>:
> > Subject: [PATCH 00/12] ofed_1_2 - Neighbour update support
> > 
> > 
> > Michael/Vlad:
> > 
> > Here are the backports for snooping arp packets to generate neighbour
> > update netevents.  Also included is the addr.c patch to act on all valid
> > neigh update events.  If this series looks good to you then I'll push
> > this up and you all can pull it from my git tree.
> 
> These patches seem to have created a reference leak on each neighbour;
> as a result, the ipoib interface could not be brought down.
> It also seems that the RHASU2 backport was missing code.
> I pushed out the following:
> 
> 
> commit d140398db0da0beb3172e0ccf14ef3023cafec9c
> Author: Michael S. Tsirkin <mst at mellanox.co.il>
> Date:   Thu Feb 1 12:21:34 2007 +0200
> 
>     Fix neighbour reference leak in netevent.c
>     
>     Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>
> 
> diff --git a/kernel_addons/backport/2.6.11/include/src/netevent.c b/kernel_addons/backport/2.6.11/include/src/netevent.c
> index 6a8df29..0d26662 100644
> --- a/kernel_addons/backport/2.6.11/include/src/netevent.c
> +++ b/kernel_addons/backport/2.6.11/include/src/netevent.c
> @@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb)
>  	arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len;
>  	memcpy(&gw, arp_ptr, 4);
>  	n = neigh_lookup(&arp_tbl, &gw, skb->dev);
> -	if (n)
> +	if (n) {
>  		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
> +		neigh_release(n);
> +	}
>  	return;
>  }
>  
> diff --git a/kernel_addons/backport/2.6.12/include/src/netevent.c b/kernel_addons/backport/2.6.12/include/src/netevent.c
> index 6a8df29..0d26662 100644
> --- a/kernel_addons/backport/2.6.12/include/src/netevent.c
> +++ b/kernel_addons/backport/2.6.12/include/src/netevent.c
> @@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb)
>  	arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len;
>  	memcpy(&gw, arp_ptr, 4);
>  	n = neigh_lookup(&arp_tbl, &gw, skb->dev);
> -	if (n)
> +	if (n) {
>  		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
> +		neigh_release(n);
> +	}
>  	return;
>  }
>  
> diff --git a/kernel_addons/backport/2.6.13/include/src/netevent.c b/kernel_addons/backport/2.6.13/include/src/netevent.c
> index 6a8df29..0d26662 100644
> --- a/kernel_addons/backport/2.6.13/include/src/netevent.c
> +++ b/kernel_addons/backport/2.6.13/include/src/netevent.c
> @@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb)
>  	arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len;
>  	memcpy(&gw, arp_ptr, 4);
>  	n = neigh_lookup(&arp_tbl, &gw, skb->dev);
> -	if (n)
> +	if (n) {
>  		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
> +		neigh_release(n);
> +	}
>  	return;
>  }
>  
> diff --git a/kernel_addons/backport/2.6.14/include/src/netevent.c b/kernel_addons/backport/2.6.14/include/src/netevent.c
> index 188283c..17a12ff 100644
> --- a/kernel_addons/backport/2.6.14/include/src/netevent.c
> +++ b/kernel_addons/backport/2.6.14/include/src/netevent.c
> @@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb)
>  	arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len;
>  	memcpy(&gw, arp_ptr, 4);
>  	n = neigh_lookup(&arp_tbl, &gw, skb->dev);
> -	if (n)
> +	if (n) {
>  		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
> +		neigh_release(n);
> +	}
>  	return;
>  }
>  
> diff --git a/kernel_addons/backport/2.6.15/include/src/netevent.c b/kernel_addons/backport/2.6.15/include/src/netevent.c
> index 188283c..17a12ff 100644
> --- a/kernel_addons/backport/2.6.15/include/src/netevent.c
> +++ b/kernel_addons/backport/2.6.15/include/src/netevent.c
> @@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb)
>  	arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len;
>  	memcpy(&gw, arp_ptr, 4);
>  	n = neigh_lookup(&arp_tbl, &gw, skb->dev);
> -	if (n)
> +	if (n) {
>  		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
> +		neigh_release(n);
> +	}
>  	return;
>  }
>  
> diff --git a/kernel_addons/backport/2.6.15_ubuntu606/include/src/netevent.c b/kernel_addons/backport/2.6.15_ubuntu606/include/src/netevent.c
> index 188283c..17a12ff 100644
> --- a/kernel_addons/backport/2.6.15_ubuntu606/include/src/netevent.c
> +++ b/kernel_addons/backport/2.6.15_ubuntu606/include/src/netevent.c
> @@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb)
>  	arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len;
>  	memcpy(&gw, arp_ptr, 4);
>  	n = neigh_lookup(&arp_tbl, &gw, skb->dev);
> -	if (n)
> +	if (n) {
>  		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
> +		neigh_release(n);
> +	}
>  	return;
>  }
>  
> diff --git a/kernel_addons/backport/2.6.16/include/src/netevent.c b/kernel_addons/backport/2.6.16/include/src/netevent.c
> index 188283c..17a12ff 100644
> --- a/kernel_addons/backport/2.6.16/include/src/netevent.c
> +++ b/kernel_addons/backport/2.6.16/include/src/netevent.c
> @@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb)
>  	arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len;
>  	memcpy(&gw, arp_ptr, 4);
>  	n = neigh_lookup(&arp_tbl, &gw, skb->dev);
> -	if (n)
> +	if (n) {
>  		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
> +		neigh_release(n);
> +	}
>  	return;
>  }
>  
> diff --git a/kernel_addons/backport/2.6.16_sles10/include/src/netevent.c b/kernel_addons/backport/2.6.16_sles10/include/src/netevent.c
> index 188283c..17a12ff 100644
> --- a/kernel_addons/backport/2.6.16_sles10/include/src/netevent.c
> +++ b/kernel_addons/backport/2.6.16_sles10/include/src/netevent.c
> @@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb)
>  	arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len;
>  	memcpy(&gw, arp_ptr, 4);
>  	n = neigh_lookup(&arp_tbl, &gw, skb->dev);
> -	if (n)
> +	if (n) {
>  		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
> +		neigh_release(n);
> +	}
>  	return;
>  }
>  
> diff --git a/kernel_addons/backport/2.6.17/include/src/netevent.c b/kernel_addons/backport/2.6.17/include/src/netevent.c
> index 26a0920..4c67de1 100644
> --- a/kernel_addons/backport/2.6.17/include/src/netevent.c
> +++ b/kernel_addons/backport/2.6.17/include/src/netevent.c
> @@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb)
>  	arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len;
>  	memcpy(&gw, arp_ptr, 4);
>  	n = neigh_lookup(&arp_tbl, &gw, skb->dev);
> -	if (n)
> +	if (n) {
>  		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
> +		neigh_release(n);
> +	}
>  	return;
>  }
>  
> diff --git a/kernel_addons/backport/2.6.5_sles9_sp3/include/src/netevent.c b/kernel_addons/backport/2.6.5_sles9_sp3/include/src/netevent.c
> index 57a23ab..90fce0c 100644
> --- a/kernel_addons/backport/2.6.5_sles9_sp3/include/src/netevent.c
> +++ b/kernel_addons/backport/2.6.5_sles9_sp3/include/src/netevent.c
> @@ -39,8 +39,10 @@ static void destructor(struct sk_buff *skb)
>  	arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len;
>  	memcpy(&gw, arp_ptr, 4);
>  	n = neigh_lookup(&arp_tbl, &gw, skb->dev);
> -	if (n)
> +	if (n) {
>  		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
> +		neigh_release(n);
> +	}
>  	return;
>  }
>  
> diff --git a/kernel_addons/backport/2.6.9_U2/include/src/netevent.c b/kernel_addons/backport/2.6.9_U2/include/src/netevent.c
> index 5ffadd1..1589300 100644
> --- a/kernel_addons/backport/2.6.9_U2/include/src/netevent.c
> +++ b/kernel_addons/backport/2.6.9_U2/include/src/netevent.c
> @@ -13,10 +13,59 @@
>   *	Fixes:
>   */
>  
> -#include <linux/module.h>
> -#include <linux/skbuff.h>
>  #include <linux/rtnetlink.h>
>  #include <linux/notifier.h>
> +#include <linux/mutex.h>
> +#include <linux/if.h>
> +#include <linux/netdevice.h>
> +#include <linux/if_arp.h>
> +
> +#include <net/arp.h>
> +#include <net/neighbour.h>
> +#include <net/route.h>
> +#include <net/netevent.h>
> +
> +static DEFINE_MUTEX(lock);
> +static int count;
> +
> +static void destructor(struct sk_buff *skb)
> +{
> +	struct neighbour *n;
> +	u8 *arp_ptr;
> +	__be32 gw;
> +
> +	/* Pull the SPA */
> +	arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len;
> +	memcpy(&gw, arp_ptr, 4);
> +	n = neigh_lookup(&arp_tbl, &gw, skb->dev);
> +	if (n) {
> +		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
> +		neigh_release(n);
> +	}
> +	return;
> +}
> +
> +static int arp_recv(struct sk_buff *skb, struct net_device *dev,
> +			 struct packet_type *pkt)
> +{
> +	struct arphdr *arp_hdr;
> +	u16 op;
> +
> +	arp_hdr = (struct arphdr *) skb->nh.raw;
> +	op = ntohs(arp_hdr->ar_op);
> +
> +	if ((op == ARPOP_REQUEST || op == ARPOP_REPLY) && !skb->destructor)
> +		skb->destructor = destructor;
> +
> +	kfree_skb(skb);
> +	return 0;
> +}
> +
> +static struct packet_type arp = {
> +	.type = __constant_htons(ETH_P_ARP),
> +	.func = arp_recv,
> +	.af_packet_priv = (void *)1,
> +};
>  
>  static struct notifier_block *netevent_notif_chain;
>  
> @@ -34,6 +83,12 @@ int register_netevent_notifier(struct notifier_block *nb)
>  	int err;
>  
>  	err = notifier_chain_register(&netevent_notif_chain, nb);
> +	if (!err) {
> +		mutex_lock(&lock);
> +		if (count++ == 0)
> +			dev_add_pack(&arp);
> +		mutex_unlock(&lock);
> +	}
>  	return err;
>  }
>  
> @@ -49,7 +104,16 @@ int register_netevent_notifier(struct notifier_block *nb)
>  
>  int unregister_netevent_notifier(struct notifier_block *nb)
>  {
> -	return notifier_chain_unregister(&netevent_notif_chain, nb);
> +	int err;
> +
> +	err = notifier_chain_unregister(&netevent_notif_chain, nb);
> +	if (!err) {
> +		mutex_lock(&lock);
> +		if (--count == 0)
> +			dev_remove_pack(&arp);
> +		mutex_unlock(&lock);
> +	}
> +	return err;
>  }
>  
>  /**
> diff --git a/kernel_addons/backport/2.6.9_U3/include/src/netevent.c b/kernel_addons/backport/2.6.9_U3/include/src/netevent.c
> index 5ffadd1..1589300 100644
> --- a/kernel_addons/backport/2.6.9_U3/include/src/netevent.c
> +++ b/kernel_addons/backport/2.6.9_U3/include/src/netevent.c
> @@ -13,10 +13,59 @@
>   *	Fixes:
>   */
>  
> -#include <linux/module.h>
> -#include <linux/skbuff.h>
>  #include <linux/rtnetlink.h>
>  #include <linux/notifier.h>
> +#include <linux/mutex.h>
> +#include <linux/if.h>
> +#include <linux/netdevice.h>
> +#include <linux/if_arp.h>
> +
> +#include <net/arp.h>
> +#include <net/neighbour.h>
> +#include <net/route.h>
> +#include <net/netevent.h>
> +
> +static DEFINE_MUTEX(lock);
> +static int count;
> +
> +static void destructor(struct sk_buff *skb)
> +{
> +	struct neighbour *n;
> +	u8 *arp_ptr;
> +	__be32 gw;
> +
> +	/* Pull the SPA */
> +	arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len;
> +	memcpy(&gw, arp_ptr, 4);
> +	n = neigh_lookup(&arp_tbl, &gw, skb->dev);
> +	if (n) {
> +		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
> +		neigh_release(n);
> +	}
> +	return;
> +}
> +
> +static int arp_recv(struct sk_buff *skb, struct net_device *dev,
> +			 struct packet_type *pkt)
> +{
> +	struct arphdr *arp_hdr;
> +	u16 op;
> +
> +	arp_hdr = (struct arphdr *) skb->nh.raw;
> +	op = ntohs(arp_hdr->ar_op);
> +
> +	if ((op == ARPOP_REQUEST || op == ARPOP_REPLY) && !skb->destructor)
> +		skb->destructor = destructor;
> +
> +	kfree_skb(skb);
> +	return 0;
> +}
> +
> +static struct packet_type arp = {
> +	.type = __constant_htons(ETH_P_ARP),
> +	.func = arp_recv,
> +	.af_packet_priv = (void *)1,
> +};
>  
>  static struct notifier_block *netevent_notif_chain;
>  
> @@ -34,6 +83,12 @@ int register_netevent_notifier(struct notifier_block *nb)
>  	int err;
>  
>  	err = notifier_chain_register(&netevent_notif_chain, nb);
> +	if (!err) {
> +		mutex_lock(&lock);
> +		if (count++ == 0)
> +			dev_add_pack(&arp);
> +		mutex_unlock(&lock);
> +	}
>  	return err;
>  }
>  
> @@ -49,7 +104,16 @@ int register_netevent_notifier(struct notifier_block *nb)
>  
>  int unregister_netevent_notifier(struct notifier_block *nb)
>  {
> -	return notifier_chain_unregister(&netevent_notif_chain, nb);
> +	int err;
> +
> +	err = notifier_chain_unregister(&netevent_notif_chain, nb);
> +	if (!err) {
> +		mutex_lock(&lock);
> +		if (--count == 0)
> +			dev_remove_pack(&arp);
> +		mutex_unlock(&lock);
> +	}
> +	return err;
>  }
>  
>  /**
> diff --git a/kernel_addons/backport/2.6.9_U4/include/src/netevent.c b/kernel_addons/backport/2.6.9_U4/include/src/netevent.c
> index 6a8df29..0d26662 100644
> --- a/kernel_addons/backport/2.6.9_U4/include/src/netevent.c
> +++ b/kernel_addons/backport/2.6.9_U4/include/src/netevent.c
> @@ -38,8 +38,10 @@ static void destructor(struct sk_buff *skb)
>  	arp_ptr = skb->nh.raw + sizeof(struct arphdr) + skb->dev->addr_len;
>  	memcpy(&gw, arp_ptr, 4);
>  	n = neigh_lookup(&arp_tbl, &gw, skb->dev);
> -	if (n)
> +	if (n) {
>  		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, n);
> +		neigh_release(n);
> +	}
>  	return;
>  }
>  
> 


From swise at opengridcomputing.com  Thu Feb  1 09:29:24 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 01 Feb 2007 11:29:24 -0600
Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update
	support
In-Reply-To: <20070201121930.GB20789@mellanox.co.il>
References: <20070125191321.30934.74542.stgit@dell3.ogc.int>
	<20070201121930.GB20789@mellanox.co.il>
Message-ID: <1170350964.16637.18.camel@stevo-desktop>

On Thu, 2007-02-01 at 14:19 +0200, Michael S. Tsirkin wrote:
> > Here are the backports for snooping arp packets to generate neighbour
> > update netevents.
> 
> OK, I went (somewhat belatedly) over this code in more depth and I see
> a couple of issues that I'd like you to address:
> 
> - There's some trailing whitespace in some netevent.c files.
>   Could you clean these please?
> 

You took care of these I assume based on your followup email.

> - I see:
> 	$ diff ./kernel_addons/backport/2.6.9_U4/include/src/netevent.c
> 	kernel_addons/backport/2.6.5_sles9_sp3/include/src/netevent.c
> 	> #include <linux/skbuff.h>
> 
> Should not redhat backports include skbuff.h too?
> They do use skbuff struct so it seems it is cleaner to include
> directly, and we would get identical code for redhat and suse.
>  

Yup.

> - What is the reason for:
>         if ((op == ARPOP_REQUEST || op == ARPOP_REPLY) && !skb->destructor)
> 	                skb->destructor = destructor;
> 
> 	kfree_skb(skb);
> 
> Could we miss events because skb has a destructor?

Yes.  I looked through the ethernet drivers and didn't see anyone using
destructors.  I thought perhaps this is ok for backports.  There are
ways to address this issue:

1) Enhance the current code to save off the original destructor function
if it exists and put in ours.  Then when our function is called, we do
our processing, then call the original destructor function.  We would
need to save the original function ptr somewhere. 

2) schedule the function to happen at a later time and hope the ARP
subsystem has already updated the neigh table.  I opted against this
approach because it doesn't ensure that the neigh entry was updated
before we act on it.

> Can we just call the destructor function directly (this is what addr.c
> did previously, and this apparently worked fine)?

The original addr.c snoop code worked fine for IB address resolution and
for the initial ARP resolution for iWARP devices, but not for notifying
iWARP devices when a neighbour changes.  For instance, if the neighbour
mac address changes, then the iWARP device needs to be notified so it
can update its L2 table maintained in the device. 

We need to defer calling the destructor function until the ARP subsystem
has processed this ARP packet.  Through testing, I saw that our snoop
function gets called _before_ the ARP subsystem processes the ARP
packet.  So the neighbour entry hasn't been updated yet.  Hooking via
destructor calls our function _after_ the ARP subsystem has updated the
neighbour.  So we can then lookup the neigh entry and do the callouts.
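
For option 1, here is a rough userspace sketch of the chaining idea. It is
not the backport code: struct fake_skb, its saved_destructor slot and all
function names are made up for illustration, and where to actually store the
original pointer in a real sk_buff is still the open question. The refcount
is there only to mimic why the hook fires after the last holder (assumed to
be the ARP code) drops the skb.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the few sk_buff fields that matter here. */
struct fake_skb {
	int users;                                      /* reference count */
	void (*destructor)(struct fake_skb *skb);
	void (*saved_destructor)(struct fake_skb *skb); /* hypothetical slot */
	int snoop_calls;
	int orig_calls;
};

/* Our hook: do the snoop work, then hand off to whoever was there first. */
static void snoop_destructor(struct fake_skb *skb)
{
	skb->snoop_calls++;	/* neigh lookup + notifier call would go here */
	if (skb->saved_destructor)
		skb->saved_destructor(skb);
}

/* Install our hook, remembering any existing destructor instead of skipping. */
static void hook_destructor(struct fake_skb *skb)
{
	assert(skb != NULL);
	if (skb->destructor == snoop_destructor)
		return;		/* already hooked; never chain to ourselves */
	skb->saved_destructor = skb->destructor;
	skb->destructor = snoop_destructor;
}

/* Drop one reference; the destructor runs only when the last one goes. */
static void fake_kfree_skb(struct fake_skb *skb)
{
	if (--skb->users == 0 && skb->destructor)
		skb->destructor(skb);
}

/* Example of a pre-existing destructor owned by someone else. */
static void orig_destructor(struct fake_skb *skb)
{
	skb->orig_calls++;
}
```

With two references held, the first fake_kfree_skb() (ours) does nothing
visible, and the second one (standing in for the ARP code) runs our hook and
then the original destructor, so no event is missed.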


From mshefty at ichips.intel.com  Thu Feb  1 09:55:10 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 01 Feb 2007 09:55:10 -0800
Subject: [openib-general] new IB CM reject reason
In-Reply-To: <20070201062431.GB4499@mellanox.co.il>
References: <000201c74585$a0bc7260$8698070a@amr.corp.intel.com>
	<20070201062431.GB4499@mellanox.co.il>
Message-ID: <45C2297E.9050306@ichips.intel.com>

> No, I don't think "application crashed" makes sense as an element of wire protocol.
> I think an optional logging of errors in kernel CM would be a much better
> solution. I know I had to add some printks each time I was debugging SDP.

The "application crashed" scenario is what highlighted the issue.  The problem 
is that the CM must provide a reject reason.  Which reject reason do you use? 
My suggestion was for a reject reason of other/unknown/none given (pick one).

> 2. Another objection is that this feature seems to invite misuse where applications
> will use REJ reason as a hint on whether remote side crashed. But REJ could be
> lost. Wouldn't this confuse the remote side?

Currently, the CM issues the reject using "consumer defined", since nothing else 
maps any better under this condition.  But the reject isn't consumer defined... 
By doing this, an application that expects specific private data in the reject 
message won't find it, which is just as likely to confuse the remote side.  This 
is why I think an unknown/unspecified reject reason is needed.

How an application interprets a reject with 'unknown' reason is up to the 
application, but I do think this is better than the application trying to guess 
whether 'consumer defined' really does mean consumer defined.

- Sean


From mshefty at ichips.intel.com  Thu Feb  1 10:01:01 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 01 Feb 2007 10:01:01 -0800
Subject: [openib-general] [PATCH] RE:  regression in ofed 1.2
In-Reply-To: <20070201135522.GA27688@mellanox.co.il>
References: <000401c7458b$9bff77d0$8698070a@amr.corp.intel.com>
	<1170322670.654.23.camel@linux-q667.site>
	<20070201135522.GA27688@mellanox.co.il>
Message-ID: <45C22ADD.40108@ichips.intel.com>

> - Sean, please base your branches on specific -rc from linus
>   (OFED 1.2 is now -rc7).

My branches should be in sync with rc6.  The original branches were built from 
an earlier rc version, and updated by pulling in the latest rc from Linus 
through my master branch.  Are you wanting the history of the branches reworked 
so that they get completely rebuilt off of the latest kernel?

> - Now that we are entering feature freeze, we should not do full replaces anymore.
>   So Sean, please post incremental patches, labeled ofed-1.2 clearly.

Additional patches will be posted to my ofed_1_2 branch, which you should be 
able to pull.  Do you see a problem with this process?  I don't understand why 
you would need to do a full replace.

- Sean


From mst at mellanox.co.il  Thu Feb  1 10:39:22 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 1 Feb 2007 20:39:22 +0200
Subject: [openib-general] new IB CM reject reason
In-Reply-To: <45C2297E.9050306@ichips.intel.com>
References: <45C2297E.9050306@ichips.intel.com>
Message-ID: <20070201183922.GB15115@mellanox.co.il>

> Quoting Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [openib-general] new IB CM reject reason
> 
> > No, I don't think "application crashed" makes sense as an element of wire protocol.
> > I think an optional logging of errors in kernel CM would be a much better
> > solution. I know I had to add some printks each time I was debugging SDP.
> 
> The "application crashed" scenario is what highlighted the issue.  The problem 
> is that the CM must provide a reject reason.  Which reject reason do you use? 
> My suggestion was for a reject reason of other/unknown/none given (pick one).

I'm actually happy with what existing code does (consumer reject).
I would like to highlight the inability to send CM errors
to the system log as a weakness in the current CM code; it hinders debugging.

Would you be interested in a patch making it possible to enable logging CM errors
and/or all CM events?

> > 2. Another objection is that this feature seems to invite misuse where applications
> > will use REJ reason as a hint on whether remote side crashed. But REJ could be
> > lost. Wouldn't this confuse the remote side?
> 
> Currently, the CM issues the reject using "consumer defined", since nothing else 
> maps any better under this condition.  But the reject isn't consumer defined... 
>   By doing this, an application that expects specific private data in the reject 
> message won't find it, which is just as likely to confuse the remote side.  This 
> is why I think an unknown/unspecified reject reason is needed.
> 
> How an application interprets a reject with 'unknown' reason is up to the 
> application, but I do think this is better than the application trying to guess 
> whether 'consumer defined' really does mean consumer defined.

Are we talking about code 28? My spec lists it as "consumer reject".
The meaning of *private data* is consumer defined.

                   The consumer decided to reject the communica- 
                   tion or EE context setup establishment attempt for
                   reasons other than those listed in the other REJ
                   codes. Typically this happens based upon infor-
                   mation being conveyed in the PrivateData field of
                   a message. It can also happen because the Con-
                   sumer decided for reasons unrelated to any CM
                   message it received to terminate the communica-
                   tion or EE context setup establishment attempt.
                   This would therefore be the appropriate Reason
                   code to use if the Consumer decided to destroy
                   the QP or EEC in the midst of the communication
                   or EE context setup establishment attempt.

So this really *does* seem to be what spec intended for exactly our case.

Now, I do not really object to inventing new rejection reasons: for example,
maybe we can invent one that lets us stick the errno value in private data
somehow - but it's not like there's no solution inside the spec,
and inventing a whole new reject reason just for userspace consumers
seems like a narrow approach to me.


-- 
MST


From swise at opengridcomputing.com  Thu Feb  1 10:42:11 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 01 Feb 2007 12:42:11 -0600
Subject: [openib-general] [PATCH] RE:  regression in ofed 1.2
In-Reply-To: <20070201135522.GA27688@mellanox.co.il>
References: <000401c7458b$9bff77d0$8698070a@amr.corp.intel.com>
	<1170322670.654.23.camel@linux-q667.site>
	<20070201135522.GA27688@mellanox.co.il>
Message-ID: <1170355331.16637.25.camel@stevo-desktop>

Um, now on rhel4u4 we crash creating the mcast workqueue.

The name is "ib_mcast_wq" which is too long for older kernels.

Did we lose a backport patch?
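
For reference, a tiny userspace check of the two names. The 10-character
limit is my assumption about what the older workqueue code enforces (I
have not re-read the rhel4u4 source for this), and wq_name_fits() and
OLD_WQ_NAME_MAX are made-up names, not kernel API.

```c
#include <assert.h>
#include <string.h>

/* Assumed limit on workqueue names in older 2.6 kernels; not verified here. */
#define OLD_WQ_NAME_MAX 10

/* Returns 1 if the name would fit under the assumed limit, 0 otherwise. */
static int wq_name_fits(const char *name)
{
	assert(name != NULL);
	return strlen(name) <= OLD_WQ_NAME_MAX;
}
```

Under that assumption "ib_mcast_wq" (11 characters) fails the check while a
shorter name such as "ib_mcast" (8 characters) passes.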


On Thu, 2007-02-01 at 15:55 +0200, Michael S. Tsirkin wrote:
> > Quoting Steve Wise <swise at opengridcomputing.com>:
> > Subject: Re: [PATCH] RE:  regression in ofed 1.2
> > 
> > > Okay - I _think_ the problem is that OFED 1.2 pulled code from my git tree
> > > before I created an ofed_1_2 branch (which contains the fix), and didn't update
> > > to match my ofed_1_2 branch.  The crash that you reported occurring over iWarp
> > > should also happen over IB for the same reason, so both are likely broken atm...
> > > 
> > > Vlad, can you please update the ofed build by pulling from the ofed_1_2 branches
> > > of my rdma-dev.git and librdmacm.git trees?
> > 
> > I looked at your rdma-dev ofed_1_2 branch and see that the cma.c changes
> > you made there will resolve this issue.  It just needs to be pulled into
> > ofed_1_2.
> 
> OK, I've updated ofed to code from rdma-dev ofed_1_2 branch. Some notes: 
> 
> - Sean, please base your branches on specific -rc from linus
>   (OFED 1.2 is now -rc7).
> - Now that we are entering feature freeze, we should not do full replaces anymore.
>   So Sean, please post incremental patches, labeled ofed-1.2 clearly.
> 


From sean.hefty at intel.com  Thu Feb  1 10:55:20 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Thu, 1 Feb 2007 10:55:20 -0800
Subject: [openib-general] new IB CM reject reason
In-Reply-To: <20070201183922.GB15115@mellanox.co.il>
Message-ID: <000101c74632$85b37bf0$8698070a@amr.corp.intel.com>

>Would you be interested in a patch making it possible to enable logging CM
>errors
>and/or all CM events?

A patch for this would be fine with me.

>Are we talking about code 28? My spec lists it as "consumer reject".
>The meaning of *private data* is consumer defined.
>
>                   The consumer decided to reject the communica-
>                   tion or EE context setup establishment attempt for
>                   reasons other than those listed in the other REJ
>                   codes. Typically this happens based upon infor-
>                   mation being conveyed in the PrivateData field of
>                   a message. It can also happen because the Con-
>                   sumer decided for reasons unrelated to any CM
>                   message it received to terminate the communica-
>                   tion or EE context setup establishment attempt.
>                   This would therefore be the appropriate Reason
>                   code to use if the Consumer decided to destroy
>                   the QP or EEC in the midst of the communication
>                   or EE context setup establishment attempt.
>
>So this really *does* seem to be what spec intended for exactly our case.

I disagree.  This is for the CM consumer, not the CM itself.  In this case, the
CM must issue a reject that will be delivered to the remote application.  The CM
has no idea what private data format the remote application expects.

>Now, I do not really object to inventing new rejection reasons: for example,
>maybe we can invent one that lets us stick the errno value in private data
>somehow - but it's not like there's no solution inside the spec,
>and inventing a whole new reject reason just for userspace consumers
>seems like a narrow approach to me.

Unless we start enforcing a policy that kernel consumers must issue a reject
before destroying a cm_id (while in the connecting phase), they have this
problem.

My claim is that the reject reasons are insufficient to cover all possible
conditions, and adding a generic 'other' reject reason solves this.  Using
consumer defined, which is what is done today, is incorrect.  As an alternate
solution, we could also not send any reject and just let the connection time out
on the remote side.
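
To make the receiving side concrete, a small sketch of how an application
might decide whether the private data is in its own format. REJ_CONSUMER = 28
comes from the code discussed in this thread; REJ_UNSPECIFIED and its value
are purely hypothetical placeholders, as is the helper name.

```c
#include <assert.h>
#include <stddef.h>

enum rej_reason {
	REJ_CONSUMER    = 28,     /* "consumer reject", as quoted in this thread */
	REJ_UNSPECIFIED = 0xFFFF  /* hypothetical placeholder, not a spec value */
};

/*
 * Returns 1 if the receiver should try to parse the REJ private data in
 * the application's own format, 0 if it should not expect any such data.
 */
static int rej_private_data_is_app_format(int reason, size_t private_data_len)
{
	return reason == REJ_CONSUMER && private_data_len > 0;
}
```

With a separate reason, a reject generated by the CM itself never reaches the
application's private-data parser; with only code 28 available, the receiver
cannot tell the two cases apart.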

- Sean


From mst at mellanox.co.il  Thu Feb  1 11:00:49 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 1 Feb 2007 21:00:49 +0200
Subject: [openib-general] [PATCH] RE:  regression in ofed 1.2
In-Reply-To: <45C22ADD.40108@ichips.intel.com>
References: <45C22ADD.40108@ichips.intel.com>
Message-ID: <20070201190049.GC15115@mellanox.co.il>

> Quoting Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [openib-general] [PATCH] RE:  regression in ofed 1.2
> 
> > - Sean, please base your branches on specific -rc from linus
> >   (OFED 1.2 is now -rc7).
> 
> My branches should be in sync with rc6.

If you check, they are not. The ofed_1_2 branch has an extra
commit on top of -rc6. But I figured it out already.

> so that they get completely rebuilt off of the latest kernel?

No need to do anything at this point.

> > - Now that we are entering feature freeze, we should not do full replaces anymore.
> >   So Sean, please post incremental patches, labeled ofed-1.2 clearly.
> 
> Additional patches will be posted to my ofed_1_2 branch, which you should be 
> able to pull.

First, please post patches on list as well.
We can then just take the patch from git or from mail and add it under fixes.

> Do you see a problem with this process?

Yes. I had to jump through some hoops to first get a patch I can put in OFED due
to the issue outlined above, and then get the diff I got to apply without
conflicts, since port randomization code conflicted with the QoS patches. All
solved now - just put your patch before the QoS one - but these conflicts should
be figured out by whoever submits patches.

> I don't understand why you would need to do a full replace.

We won't do a full replace, just add patches in fixes directory.

What I do expect from everyone who wants patches in OFED, however,
is to test that the patches they post work in the OFED git tree, not just against
upstream-based git trees.

This currently includes testing the build against older kernels on various
architectures (Vlad and I put a cross-build setup for this on staging;
it now has kernel.org kernels but we will be adding distro kernels)
and testing on at least one of the main supported enterprise distros (RHEL/SLES).

I simply can't take untested patches - I have nightly tests but no time to test
all ULPs before I apply.

-- 
MST


From mst at mellanox.co.il  Thu Feb  1 11:06:24 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 1 Feb 2007 21:06:24 +0200
Subject: [openib-general] new IB CM reject reason
In-Reply-To: <000101c74632$85b37bf0$8698070a@amr.corp.intel.com>
References: <20070201183922.GB15115@mellanox.co.il>
	<000101c74632$85b37bf0$8698070a@amr.corp.intel.com>
Message-ID: <20070201190624.GB6473@mellanox.co.il>

> >Are we talking about code 28? My spec lists it as "consumer reject".
> >The meaning of *private data* is consumer defined.
> >
> >                   The consumer decided to reject the communica-
> >                   tion or EE context setup establishment attempt for
> >                   reasons other than those listed in the other REJ
> >                   codes. Typically this happens based upon infor-
> >                   mation being conveyed in the PrivateData field of
> >                   a message. It can also happen because the Con-
> >                   sumer decided for reasons unrelated to any CM
> >                   message it received to terminate the communica-
> >                   tion or EE context setup establishment attempt.
> >                   This would therefore be the appropriate Reason
> >                   code to use if the Consumer decided to destroy
> >                   the QP or EEC in the midst of the communication
> >                   or EE context setup establishment attempt.
> >
> >So this really *does* seem to be what spec intended for exactly our case.
> 
> I disagree.  This is for the CM consumer, not the CM itself.  In this case, the
> CM must issue a reject that will be delivered to the remote application.  The CM
> has no idea what private data format the remote application expects.

Since we disagree about spec reading, would you raise this in the
relevant WG?

> >Now, I do not really object to inventing new rejection reasons: for example,
> >maybe we can invent one that lets us stick the errno value in private data
> >somehow - but it's not like there's no solution inside the spec,
> >and inventing a whole new reject reason just for userspace consumers
> >seems like a narrow approach to me.
> 
> Unless we start enforcing a policy that kernel consumers must issue a reject
> before destroying a cm_id (while in the connecting phase), they have this
> problem.
> 
> My claim is that the reject reasons are insufficient to cover all possible
> conditions, and adding a generic 'other' reject reason solves this.  Using
> consumer defined, which is what is done today, is incorrect.  As an alternate
> solution, we could also not send any reject and just let the connection time out
> on the remote side.

And my claim is that you should define private data format to go with this
other reason otherwise you are not really solving the problem.

-- 
MST


From mst at mellanox.co.il  Thu Feb  1 11:11:28 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 1 Feb 2007 21:11:28 +0200
Subject: [openib-general] [PATCH] RE:  regression in ofed 1.2
In-Reply-To: <1170355331.16637.25.camel@stevo-desktop>
References: <1170355331.16637.25.camel@stevo-desktop>
Message-ID: <20070201191128.GB17617@mellanox.co.il>

> Quoting Steve Wise <swise at opengridcomputing.com>:
> Subject: Re: [PATCH] RE:  regression in ofed 1.2
> 
> Um, now on rhel4u4 we crash creating the mcast workqueue.
> 
> The name is "ib_mcast_wq" which is too long for older kernels.
> 
> Did we lose a backport patch?

Sean, please rename the multicast wq to ib_mcast as we agreed.

I just pushed the following out:

commit efedfe57a21a134a65d951bcca73af46da609c5e
Author: Michael S. Tsirkin <mst at mellanox.co.il>
Date:   Thu Feb 1 21:09:16 2007 +0200

    Make multicast WQ name shorter.
    
    Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

diff --git a/kernel_patches/fixes/merged_sean_rdma_dev_ofed_1_2.patch b/kernel_patches/fixes/merged_sean_rdma_dev_ofed_1_2.patch
index e70d4da..4b968db 100644
--- a/kernel_patches/fixes/merged_sean_rdma_dev_ofed_1_2.patch
+++ b/kernel_patches/fixes/merged_sean_rdma_dev_ofed_1_2.patch
@@ -2225,7 +2225,7 @@ index 0000000..039f1eb
 +{
 +	int ret;
 +
-+	mcast_wq = create_singlethread_workqueue("ib_mcast_wq");
++	mcast_wq = create_singlethread_workqueue("ib_mcast");
 +	if (!mcast_wq)
 +		return -ENOMEM;
 +

-- 
MST


From mst at mellanox.co.il  Thu Feb  1 11:22:21 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 1 Feb 2007 21:22:21 +0200
Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update
	support
In-Reply-To: <1170350964.16637.18.camel@stevo-desktop>
References: <1170350964.16637.18.camel@stevo-desktop>
Message-ID: <20070201192221.GD17617@mellanox.co.il>

> > Could we miss events because skb has a destructor?
> 
> Yes.  I looked through the ethernet drivers and didn't see anyone using
> destructors.  I thought perhaps this is ok for backports.  There are
> ways to address this issue:
> 
> 1) Enhance the current code to save off the original destructor function
> if it exists and put in ours.  Then when our function is called, we do
> our processing, then call the original destructor function.  We would
> need to save the original function ptr somewhere. 
> 
> 2) schedule the function to happen at a later time and hope the ARP
> subsystem has already updated the neigh table.  I opted against this
> approach because it doesn't ensure that the neigh entry was updated
> before we act on it.
> 
> > Can we just call the destructor function directly (this is what addr.c
> > did previously, and this apparently worked fine)?
> 
> The original addr.c snoop code worked fine for IB address resolution and
> for the initial ARP resolution for iWARP devices, but not for notifying
> iWARP devices when a neighbour changes.  For instance, if the neighbour
> mac address changes, then the iWARP device needs to be notified so it
> can update its L2 table maintained in the device. 
> 
> We need to defer calling the destructor function until the ARP subsystem
> has processed this ARP packet.  Through testing, I saw that our snoop
> function gets called _before_ the ARP subsystem processes the ARP
> packet.  So the neighbour entry hasn't been updated yet.  Hooking via
> destructor calls our function _after_ the ARP subsystem has updated the
> neighbour.  So we can then lookup the neigh entry and do the callouts.

Not sure what you mean by all this. You call kfree_skb immediately in the
arp processing function. Won't this call the destructor directly?

Anyway, it seems too risky to change the code a lot now.
What concerns me is that this could have broken working code.

To reduce the risk of problems for existing code,
I'd like to see something like the following:

	if (someone asked for notification on neighbour changes)
		do the destructor trick

	if (someone asked for notification on address resolution)
		call the destructor directly
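
In C terms the decision could look roughly like the sketch below. This is
userspace code with made-up names: the posted backport keeps only a single
count, so struct snoop_users, the two counters and the SNOOP_* flags are all
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SNOOP_DEFER  0x1 /* hook skb->destructor, run after ARP processing */
#define SNOOP_DIRECT 0x2 /* call the handler immediately, as addr.c used to */

/* Hypothetical per-kind registration counters. */
struct snoop_users {
	int neigh_update_users;  /* asked for neighbour change notifications */
	int addr_resolve_users;  /* asked for address resolution notifications */
};

/* Returns the set of actions to take for one snooped ARP packet. */
static unsigned int snoop_actions(const struct snoop_users *u)
{
	unsigned int actions = 0;

	assert(u != NULL);
	if (u->neigh_update_users > 0)
		actions |= SNOOP_DEFER;
	if (u->addr_resolve_users > 0)
		actions |= SNOOP_DIRECT;
	return actions;
}
```

The two checks are independent, so with both kinds of users registered both
paths run, and with only address resolution users the old direct behaviour is
kept unchanged.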

Could you code this up please?

-- 
MST


From mst at mellanox.co.il  Thu Feb  1 11:29:24 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 1 Feb 2007 21:29:24 +0200
Subject: [openib-general] IPoIB CM for merge?
Message-ID: <20070201192924.GE17617@mellanox.co.il>

Roland, 2.6.20 is nearly done.
Could you please spend some time reviewing IPoIB CM code?
I am concerned about missing the 2.6.21 merge window.

-- 
MST


From swise at opengridcomputing.com  Thu Feb  1 12:01:11 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 01 Feb 2007 14:01:11 -0600
Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update
	support
In-Reply-To: <20070201192221.GD17617@mellanox.co.il>
References: <1170350964.16637.18.camel@stevo-desktop>
	<20070201192221.GD17617@mellanox.co.il>
Message-ID: <1170360071.16637.39.camel@stevo-desktop>

On Thu, 2007-02-01 at 21:22 +0200, Michael S. Tsirkin wrote:
> > > Could we miss events because skb has a destructor?
> > 
> > Yes.  I looked through the ethernet drivers and didn't see anyone using
> > destructors.  I thought perhaps this is ok for backports.  There are
> > ways to address this issue:
> > 
> > 1) Enhance the current code to save off the original destructor function
> > if it exists and put in ours.  Then when our function is called, we do
> > our processing, then call the original destructor function.  We would
> > need to save the original function ptr somewhere. 
> > 
> > 2) schedule the function to happen at a later time and hope the ARP
> > subsystem has already updated the neigh table.  I opted against this
> > approach because it doesn't ensure that the neigh entry was updated
> > before we act on it.
> > 
> > > Can we just call the destructor function directly (this is what addr.c
> > > did previously, and this apparently worked fine).
> > 
> > The original addr.c snoop code worked fine for IB address resolution and
> > for the initial ARP resolution for iWARP devices, but not for notifying
> > iWARP devices when a neighbour changes.  For instance, if the neighbour
> > mac address changes, then the iWARP device needs to be notified so it
> > can update its L2 table maintained in the device. 
> > 
> > We need to defer calling the destructor function until the ARP subsystem
> > has processed this ARP packet.  Through testing, I saw that our snoop
> > function gets called _before_ the ARP subsystem processes the ARP
> > packet.  So the neighbour entry hasn't been updated yet.  Hooking via
> > destructor calls our function _after_ the ARP subsystem has updated the
> > neighbour.  So we can then lookup the neigh entry and do the callouts.
> 
> Not sure what you mean by all this. You do kfree_skb immediately in the
> arp processing function. Will this not call the destructor directly?
> 

No because the skb refcnt gets bumped by the dev packet code before
passing it up to each snoop function.  So the destructor fn will get
called only when the _last_ user of this skbuf frees it.  If for some
reason we are the last ref, then yes, we'd get called immediately.  But
that's not what happens because the snoopers get added to the end of the
list of users who want any given ethertype packet.  Hope that makes
sense.
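
To illustrate the ordering, here is a minimal user-space sketch (not kernel
code).  struct fake_skb, fake_kfree_skb() and the handler names are
hypothetical stand-ins, and the real dev code takes its references
per-handler rather than up front, so treat this only as a model of the
"last user runs the destructor" behaviour:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: a user count plus a destructor. */
struct fake_skb {
	int users;                              /* models skb->users */
	void (*destructor)(struct fake_skb *);  /* models skb->destructor */
	int neigh_updated;                      /* set once the "ARP" handler ran */
	int seen_updated;                       /* what the hook saw; -1 = never called */
};

/* Drop one reference; the destructor runs only when the last one goes away. */
static void fake_kfree_skb(struct fake_skb *skb)
{
	assert(skb->users > 0);
	if (--skb->users == 0 && skb->destructor)
		skb->destructor(skb);
}

/* Our hook: record whether the neighbour entry was already updated. */
static void snoop_destructor(struct fake_skb *skb)
{
	skb->seen_updated = skb->neigh_updated;
}

/* Snoop handler: install the hook, then drop our reference right away. */
static void snoop_rcv(struct fake_skb *skb)
{
	skb->destructor = snoop_destructor;
	fake_kfree_skb(skb);
}

/* ARP handler: update the neighbour entry, then drop its reference. */
static void arp_rcv(struct fake_skb *skb)
{
	skb->neigh_updated = 1;
	fake_kfree_skb(skb);
}

/* Deliver one packet to both handlers, snoop first, one reference each. */
static int deliver(struct fake_skb *skb)
{
	skb->users = 2;
	skb->destructor = NULL;
	skb->neigh_updated = 0;
	skb->seen_updated = -1;
	snoop_rcv(skb);   /* users 2 -> 1, destructor not called yet */
	arp_rcv(skb);     /* users 1 -> 0, destructor runs after the update */
	return skb->seen_updated;
}
```

Even though the snoop handler frees first, the hook only fires on the
final free, by which time the neighbour update in this model is done.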

> Anyway, it seems too risky to change the code a lot now.
> What I am concerned about is that this could have broken working code.
> 

I tested it with IB and iWARP.

> To reduce the risk of problems for existing code,
> I'd like to see something like the following:
> 
> 	if (someone asked for notification on neighbour changes)
> 		do the destructor trick
> 
> 	if (someone asked for notification on address resolution)
> 		call the destructor directly
> 
> Could you code this up please?

There's no easy way to tell who asked for notifications. And
particularly why they asked for notification.

I think we should leave it as-is.  If we have problems, we'll fix it.

Or you could put your arp snoop code back in addr.c and address
translation will not use netevents.  But I still think we should leave
it...


From mshefty at ichips.intel.com  Thu Feb  1 12:05:34 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 01 Feb 2007 12:05:34 -0800
Subject: [openib-general] new IB CM reject reason
In-Reply-To: <20070201190624.GB6473@mellanox.co.il>
References: <20070201183922.GB15115@mellanox.co.il>
	<000101c74632$85b37bf0$8698070a@amr.corp.intel.com>
	<20070201190624.GB6473@mellanox.co.il>
Message-ID: <45C2480E.2000904@ichips.intel.com>

> And my claim is that you should define private data format to go with this
> other reason otherwise you are not really solving the problem.

This is not a consumer issued reject.  It is a CM issued reject, so the private 
data is ignored.  This is no different than several other reject reasons (like 
invalid service ID).  At best we could define the ARI, but if we knew what the 
contents of the ARI should be, then we should use a more specific reject reason 
than 'other'.

- Sean


From swise at opengridcomputing.com  Thu Feb  1 12:07:21 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 01 Feb 2007 14:07:21 -0600
Subject: [openib-general] [PATCH] ofed_1_2 Cleanup RHEL4U4 netevent backport]
Message-ID: <1170360441.16637.41.camel@stevo-desktop>

From: Steve Wise <swise at opengridcomputing.com>

Add skbuff.h to include list for RHEL4U4 netevent.c file.  This makes
it identical to the SLES9SP3 file.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 .../backport/2.6.9_U4/include/src/netevent.c       |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel_addons/backport/2.6.9_U4/include/src/netevent.c b/kernel_addons/backport/2.6.9_U4/include/src/netevent.c
index 1589300..87fb55c 100644
--- a/kernel_addons/backport/2.6.9_U4/include/src/netevent.c
+++ b/kernel_addons/backport/2.6.9_U4/include/src/netevent.c
@@ -13,6 +13,7 @@
  *	Fixes:
  */
 
+#include <linux/skbuff.h>
 #include <linux/rtnetlink.h>
 #include <linux/notifier.h>
 #include <linux/mutex.h>


From swise at opengridcomputing.com  Thu Feb  1 12:09:03 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 01 Feb 2007 14:09:03 -0600
Subject: [openib-general] [PATCH] ofed_1_2 Chelsio ethernet driver updates.
Message-ID: <1170360543.16637.45.camel@stevo-desktop>

From: Steve Wise <swise at opengridcomputing.com>

This patch updates the ofed_1_2 cxgb3 module to the latest queued
for 2.6.21.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 drivers/net/cxgb3/firmware_exports.h |    2 +-
 drivers/net/cxgb3/sge.c              |   21 +++++++++------------
 drivers/net/cxgb3/t3_cpl.h           |    3 ---
 3 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/drivers/net/cxgb3/firmware_exports.h b/drivers/net/cxgb3/firmware_exports.h
index 4538377..6a835f6 100755
--- a/drivers/net/cxgb3/firmware_exports.h
+++ b/drivers/net/cxgb3/firmware_exports.h
@@ -129,7 +129,7 @@ #define FW_OFLD_NUM			8
 #define FW_OFLD_SGEEC_START		0
 
 /*
- *
+ * 
  */
 #define FW_RI_NUM			1
 #define FW_RI_SGEEC_START		65527
diff --git a/drivers/net/cxgb3/sge.c b/drivers/net/cxgb3/sge.c
index 6b053bf..3f2cf8a 100755
--- a/drivers/net/cxgb3/sge.c
+++ b/drivers/net/cxgb3/sge.c
@@ -601,17 +601,16 @@ static struct sk_buff *get_packet(struct
 	if (len <= SGE_RX_COPY_THRES) {
 		skb = alloc_skb(len, GFP_ATOMIC);
 		if (likely(skb != NULL)) {
-			struct rx_desc *d = &fl->desc[fl->cidx];
-			dma_addr_t mapping =
-			    (dma_addr_t)((u64) be32_to_cpu(d->addr_hi) << 32 |
-					 be32_to_cpu(d->addr_lo));
-
 			__skb_put(skb, len);
-			pci_dma_sync_single_for_cpu(adap->pdev, mapping, len,
-						    PCI_DMA_FROMDEVICE);
+			pci_dma_sync_single_for_cpu(adap->pdev,
+						    pci_unmap_addr(sd,
+								   dma_addr),
+						    len, PCI_DMA_FROMDEVICE);
 			memcpy(skb->data, sd->skb->data, len);
-			pci_dma_sync_single_for_device(adap->pdev, mapping, len,
-						       PCI_DMA_FROMDEVICE);
+			pci_dma_sync_single_for_device(adap->pdev,
+						       pci_unmap_addr(sd,
+								      dma_addr),
+						       len, PCI_DMA_FROMDEVICE);
 		} else if (!drop_thres)
 			goto use_orig_buf;
 	      recycle:
@@ -1667,7 +1666,7 @@ #endif
 	credits = G_RSPD_TXQ0_CR(flags);
 	if (credits)
 		qs->txq[TXQ_ETH].processed += credits;
-	
+
 	credits = G_RSPD_TXQ2_CR(flags);
 	if (credits)
 		qs->txq[TXQ_CTRL].processed += credits;
@@ -2220,14 +2219,12 @@ static irqreturn_t t3b_intr_napi(int irq
 	if (likely(map & 1)) {
 		dev = adap->sge.qs[0].netdev;
 
-		BUG_ON(napi_is_scheduled(dev));
 		if (likely(__netif_rx_schedule_prep(dev)))
 			__netif_rx_schedule(dev);
 	}
 	if (map & 2) {
 		dev = adap->sge.qs[1].netdev;
 
-		BUG_ON(napi_is_scheduled(dev));
 		if (likely(__netif_rx_schedule_prep(dev)))
 			__netif_rx_schedule(dev);
 	}
diff --git a/drivers/net/cxgb3/t3_cpl.h b/drivers/net/cxgb3/t3_cpl.h
index 96b2f36..b7a1a31 100755
--- a/drivers/net/cxgb3/t3_cpl.h
+++ b/drivers/net/cxgb3/t3_cpl.h
@@ -184,9 +184,6 @@ #define V_OPCODE(x) ((x) << S_OPCODE)
 #define G_OPCODE(x) (((x) >> S_OPCODE) & 0xFF)
 #define G_TID(x)    ((x) & 0xFFFFFF)
 
-#define S_QNUM 0
-#define G_QNUM(x) (((x) >> S_QNUM) & 0xFFFF)
-
 /* tid is assumed to be 24-bits */
 #define MK_OPCODE_TID(opcode, tid) (V_OPCODE(opcode) | (tid))
 

From swise at opengridcomputing.com  Thu Feb  1 12:19:43 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 01 Feb 2007 14:19:43 -0600
Subject: [openib-general] ip_ib_mc_map?
In-Reply-To: <20070201090958.GD14189@mellanox.co.il>
References: <000101c74576$fedc81f0$8698070a@amr.corp.intel.com>
	<1170275680.14294.5.camel@stevo-desktop>
	<45C1480C.1020600@ichips.intel.com>
	<1170320484.654.6.camel@linux-q667.site>
	<20070201090958.GD14189@mellanox.co.il>
Message-ID: <1170361183.16637.47.camel@stevo-desktop>

On Thu, 2007-02-01 at 11:09 +0200, Michael S. Tsirkin wrote:
> > Quoting Steve WIse <swise at opengridcomputing.com>:
> > Subject: Re: ip_ib_mc_map?
> > 
> > On Wed, 2007-01-31 at 17:53 -0800, Sean Hefty wrote:
> > > Steve Wise wrote:
> > > > Perhaps there's no backport for this to rhel4u4?
> > > 
> > > I would have thought so, but I really don't know.  The function is called from 
> > > net/ipv4/arp.c, and not directly by ipoib.  So, I don't know how the backport 
> > > patches typically handle this.
> > > 
> > > - Sean
> > 
> > Here's what I see:
> > 
> > ip_ib_mc_map() is called directly from cma_join_ib_multicast(), which is
> > added to the ofed_1_2 cma.c via patch file:
> > kernel_patches/fixes/sean_multicast_1.patch
> > 
> > So when I compiled ofed_1_2 on rhel4u4, the cma wouldn't load because
> > there is no ip_ib_mc_map() in rhel4u4.  
> > 
> > So you need a backport patch for this to work on rhel4u4.  Probably many
> > of the older kernels.
> 
> I think this breakage is U4 specific. Someone at RH went to the trouble to
> rip all of IB related stuff out of the U4 kernel.
> 
> I think just calling ip_tr_mc_map on U4 instead will be enough.
> 

I changed cma.c to call ip_tr_mc_map() and I can at least load rdma_cm
now.  I didn't test any mcast, but the rdma-cm is working over iwarp...

Steve.


From jlentini at netapp.com  Thu Feb  1 12:29:00 2007
From: jlentini at netapp.com (James Lentini)
Date: Thu, 1 Feb 2007 15:29:00 -0500 (EST)
Subject: [openib-general] new IB CM reject reason
In-Reply-To: <45C2480E.2000904@ichips.intel.com>
References: <20070201183922.GB15115@mellanox.co.il>
	<000101c74632$85b37bf0$8698070a@amr.corp.intel.com>
	<20070201190624.GB6473@mellanox.co.il>
	<45C2480E.2000904@ichips.intel.com>
Message-ID: <Pine.LNX.4.64.0702011522560.2536@jlentini-linux.nane.netapp.com>


On Thu, 1 Feb 2007, Sean Hefty wrote:

> > And my claim is that you should define private data format to go with this
> > other reason otherwise you are not really solving the problem.
> 
> This is not a consumer issued reject.  It is a CM issued reject, so 
> the private data is ignored.  This is no different than several 
> other reject reasons (like invalid service ID).  At best we could 
> define the ARI, but if we knew what the contents of the ARI should 
> be, then we should use a more specific reject reason than 'other'.

Invalid Service ID (8) appears to be an appropriate Reason value for 
the case when a REQ is received for a service ID that is not 
registered with the CM (either because the application crashed or 
exited on its own accord).

I agree that if the reason codes are insufficient we should take this 
up in the IBTA.


From or.gerlitz at gmail.com  Thu Feb  1 12:40:57 2007
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Thu, 1 Feb 2007 22:40:57 +0200
Subject: [openib-general] ip_ib_mc_map?
In-Reply-To: <1170325052.2716.229.camel@fc6.xsintricity.com>
References: <1170275331.14294.1.camel@stevo-desktop>
	<45C1ABD0.5090404@voltaire.com>
	<1170325052.2716.229.camel@fc6.xsintricity.com>
Message-ID: <15ddcffd0702011240l3c427bfcx6fcc7f7968fcf8b9@mail.gmail.com>

On 2/1/07, Doug Ledford <dledford at redhat.com> wrote:
> On Thu, 2007-02-01 at 10:58 +0200, Or Gerlitz wrote:

> >  From a reason that no one at RH can trace... someone went and removed
> > all the support for ARPHRD_INFINIBAND multicast from u4 where it exists
> > perfectly fine in u3 and hopefully on u5 as well (Doug can you update?),
> > see https://bugs.openfabrics.org/show_bug.cgi?id=2661

> Yes.  It's been fixed for U5.  It wasn't that the patch got removed,
> it's that between U3 and U4 I did a complete rebase, which means that
> all the patches from U3 were tossed out the window and a complete new
> set made for U4.  I just missed re-adding this one in U4.

Thanks for fixing this for U5 (which I understand is not out yet, correct?).

As for the importance for us to have IP multicast working fine with
IPoIB over RH4...
do you have an IB setup to test that?

Or.


From swise at opengridcomputing.com  Thu Feb  1 13:05:34 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 01 Feb 2007 15:05:34 -0600
Subject: [openib-general] [Fwd: Re: [PATCH 1/10] cxgb3 - main header
	files]
In-Reply-To: <adamz4f0wsy.fsf@cisco.com>
References: <1169216896.15842.6.camel@stevo-desktop>
	<adamz4f0wsy.fsf@cisco.com>
Message-ID: <1170363934.16637.58.camel@stevo-desktop>

On Fri, 2007-01-19 at 09:07 -0800, Roland Dreier wrote:
>  > Jeff has pulled in the Chelsio Ethernet driver.  If you are ready to
>  > merge in the RDMA driver, you can pull it from 
> 
> Yes, I saw that... OK, I'll get serious about reviewing the RDMA stuff.

Hey Roland,

Have you had a chance to review this?

Thanks,

Steve.


From dledford at redhat.com  Thu Feb  1 14:19:21 2007
From: dledford at redhat.com (Doug Ledford)
Date: Thu, 01 Feb 2007 17:19:21 -0500
Subject: [openib-general] ip_ib_mc_map?
In-Reply-To: <15ddcffd0702011240l3c427bfcx6fcc7f7968fcf8b9@mail.gmail.com>
References: <1170275331.14294.1.camel@stevo-desktop>
	<45C1ABD0.5090404@voltaire.com>
	<1170325052.2716.229.camel@fc6.xsintricity.com>
	<15ddcffd0702011240l3c427bfcx6fcc7f7968fcf8b9@mail.gmail.com>
Message-ID: <1170368361.2716.239.camel@fc6.xsintricity.com>

On Thu, 2007-02-01 at 22:40 +0200, Or Gerlitz wrote:
> On 2/1/07, Doug Ledford <dledford at redhat.com> wrote:
> > On Thu, 2007-02-01 at 10:58 +0200, Or Gerlitz wrote:
> 
> > >  From a reason that no one at RH can trace... someone went and removed
> > > all the support for ARPHRD_INFINIBAND multicast from u4 where it exists
> > > perfectly fine in u3 and hopefully on u5 as well (Doug can you update?),
> > > see https://bugs.openfabrics.org/show_bug.cgi?id=2661
> 
> > Yes.  It's been fixed for U5.  It wasn't that the patch got removed,
> > it's that between U3 and U4 I did a complete rebase, which means that
> > all the patches from U3 were tossed out the window and a complete new
> > set made for U4.  I just missed re-adding this one in U4.
> 
> Thanks for fixing this for U5 (which I understand is not out yet, correct?).

Correct.  Although I can get people the packages slated for U5 if they
want to test/check them out.

> As for the importance for us to have IP multicast working fine with
> IPoIB over RH4...
> do you have an IB setup to test that?

Yeah, I've got a setup, I just don't have any multicast tests that I
run.  Any test programs you have for multicast in particular would be
helpful.

-- 
Doug Ledford <dledford at redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband

From mst at mellanox.co.il  Thu Feb  1 14:24:05 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 2 Feb 2007 00:24:05 +0200
Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update
	support
In-Reply-To: <1170360071.16637.39.camel@stevo-desktop>
References: <1170360071.16637.39.camel@stevo-desktop>
Message-ID: <20070201222405.GG17617@mellanox.co.il>

> There's no easy way to tell who asked for notifications. And
> particularly why they asked for notification.
> 
> I think we should leave it as-is.  If we have problems, we'll fix it.
> 
> Or you could put your arp snoop code back in addr.c and address
> translation will not use netevents.  But I still think we should leave
> it...

I think the issues need to be addressed in some way.

I think I see another issue with the destructor approach: ib_core could
be unloaded while skb with destructor pointing to our code is still around.
This will lead to nasty crashes without clear backtrace on screen if text
segment memory gets over-written and the destructor gets called afterwards.

It currently seems that invoking the callback function directly rather than
sticking it in skb->destructor is the lesser of evils at this point.
But I'll think all this over, and I'd like to ask you to do this too,
and post some suggestions.

I can think of some more complicated approaches that might work better
for iwarp. Off the top of my head, our netevents implementation could
keep a reference on the skb, start a timer, check the users counter on skb and
call the notifier chain when it drops to 1. Let's sleep on it.

-- 
MST


From mst at mellanox.co.il  Thu Feb  1 14:25:57 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 2 Feb 2007 00:25:57 +0200
Subject: [openib-general] ip_ib_mc_map?
In-Reply-To: <1170361183.16637.47.camel@stevo-desktop>
References: <1170361183.16637.47.camel@stevo-desktop>
Message-ID: <20070201222557.GH17617@mellanox.co.il>

> Quoting Steve Wise <swise at opengridcomputing.com>:
> Subject: Re: ip_ib_mc_map?
> 
> On Thu, 2007-02-01 at 11:09 +0200, Michael S. Tsirkin wrote:
> > > Quoting Steve WIse <swise at opengridcomputing.com>:
> > > Subject: Re: ip_ib_mc_map?
> > > 
> > > On Wed, 2007-01-31 at 17:53 -0800, Sean Hefty wrote:
> > > > Steve Wise wrote:
> > > > > Perhaps there's no backport for this to rhel4u4?
> > > > 
> > > > I would have thought so, but I really don't know.  The function is called from 
> > > > net/ipv4/arp.c, and not directly by ipoib.  So, I don't know how the backport 
> > > > patches typically handle this.
> > > > 
> > > > - Sean
> > > 
> > > Here's what I see:
> > > 
> > > ip_ib_mc_map() is called directly from cma_join_ib_multicast(), which is
> > > added to the ofed_1_2 cma.c via patch file:
> > > kernel_patches/fixes/sean_multicast_1.patch
> > > 
> > > So when I compiled ofed_1_2 on rhel4u4, the cma wouldn't load because
> > > there is no ip_ib_mc_map() in rhel4u4.  
> > > 
> > > So you need a backport patch for this to work on rhel4u4.  Probably many
> > > of the older kernels.
> > 
> > I think this breakage is U4 specific. Someone at RH went to the trouble to
> > rip all of IB related stuff out of the U4 kernel.
> > 
> > I think just calling ip_tr_mc_map on U4 instead will be enough.
> > 
> 
> I changed cma.c to call ip_tr_mc_map() and I can at least load rdma_cm
> now.  I didn't test any mcast, but the rdma-cm is working over iwarp...

So this could be a macro in kernel_addons, unless someone from
Voltaire is willing to step up with a more elaborate implementation.
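
A rough sketch of what such a macro could look like.  The guard name
BACKPORT_NO_IP_IB_MC_MAP is made up for illustration, and the
ip_tr_mc_map() below is only a placeholder stub so the fragment compiles
outside the kernel; a real backport would use the kernel's own function
and its argument types.  As noted above, this only gets rdma_cm to load,
multicast itself was not tested:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder stub so the sketch builds in user space.  It is NOT the
 * kernel's mapping; it just zero-fills and counts calls. */
static int tr_map_calls;

static void ip_tr_mc_map(unsigned int addr, char *buf)
{
	(void)addr;
	assert(buf != NULL);
	memset(buf, 0, 6);
	tr_map_calls++;
}

/* Hypothetical guard; a real backport header would be selected per kernel. */
#define BACKPORT_NO_IP_IB_MC_MAP 1

#ifdef BACKPORT_NO_IP_IB_MC_MAP
/* Forward the missing symbol to the one that does exist on U4. */
#define ip_ib_mc_map(addr, buf) ip_tr_mc_map((addr), (buf))
#endif
```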

-- 
MST


From swise at opengridcomputing.com  Thu Feb  1 14:41:56 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 01 Feb 2007 16:41:56 -0600
Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update
	support
In-Reply-To: <20070201222405.GG17617@mellanox.co.il>
References: <1170360071.16637.39.camel@stevo-desktop>
	<20070201222405.GG17617@mellanox.co.il>
Message-ID: <1170369716.16637.69.camel@stevo-desktop>

On Fri, 2007-02-02 at 00:24 +0200, Michael S. Tsirkin wrote:
> > There's no easy way to tell who asked for notifications. And
> > particularly why they asked for notification.
> > 
> > I think we should leave it as-is.  If we have problems, we'll fix it.
> > 
> > Or you could put your arp snoop code back in addr.c and address
> > translation will not use netevents.  But I still think we should leave
> > it...
> 
> I think the issues need to be addressed in some way.
> 
> I think I see another issue with the destructor approach: ib_core could
> be unloaded while skb with destructor pointing to our code is still around.
> This will lead to nasty crashes without clear backtrace on screen if text
> segment memory gets over-written and the destructor gets called afterwards.
> 

Yes...hmm...  We could reference the module in the snoop function and
deref it in the destructor function.

> It currently seems that invoking the callback function directly rather than
> sticking it in skb->destructor is the lesser of evils at this point.
> But I'll think all this over, and I'd like to ask you to do this too,
> and post some suggestions.
> 

Ok.

> I can think of some more complicated approaches that might work better
> for iwarp. Off the top of my head, our netevents implementation could
> keep a reference on the skb, start a timer, check the users counter on skb and
> call the notifier chain when it drops to 1. Let's sleep on it.
> 

Ok.  I'll ponder it some more.  But we could solve the module unload
issue via module refs methinks.


Steve.
 

From mst at mellanox.co.il  Thu Feb  1 14:43:04 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 2 Feb 2007 00:43:04 +0200
Subject: [openib-general] new IB CM reject reason
In-Reply-To: <45C2480E.2000904@ichips.intel.com>
References: <20070201183922.GB15115@mellanox.co.il>
	<000101c74632$85b37bf0$8698070a@amr.corp.intel.com>
	<20070201190624.GB6473@mellanox.co.il>
	<45C2480E.2000904@ichips.intel.com>
Message-ID: <20070201224304.GI17617@mellanox.co.il>

> Quoting Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: new IB CM reject reason
> 
> > And my claim is that you should define private data format to go with this
> > other reason otherwise you are not really solving the problem.
> 
> This is not a consumer issued reject.  It is a CM issued reject, so the private 
> data is ignored.  This is no different than several other reject reasons (like 
> invalid service ID).  At best we could define the ARI, but if we knew what the 
> contents of the ARI should be, then we should use a more specific reject reason 
> than 'other'.

I still don't really buy this, and I think you don't see my point.

The difference between ib_cm module and consumer is an artificial one -
the consumer just uses ib_cm as a convenience module. In particular, as a
feature, he gets automatic REJ generation when CM ID is destroyed.
In this case private data is all 0s.

So a custom protocol on top of ib_cm module that has its own consumer rejects for
some reason, would be wise to put something other than all 0s in its private
data if it wants to differentiate between the two kinds of consumer reject.
Most likely no one cares much about reject reasons so all this is
unnecessary.

But adding "other" reason just moves the problem up one level -
what if the actual consumer is using some library on top of CM?
Consider for example cma. It might generate rejects on its own too.
So now, there is cm, cma as a cm consumer, and the cma consumer.
So do we need yet another reject reason for cma generated rejects?

Do you see my point now?

-- 
MST


From mst at mellanox.co.il  Thu Feb  1 14:48:41 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 2 Feb 2007 00:48:41 +0200
Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update
	support
In-Reply-To: <1170369716.16637.69.camel@stevo-desktop>
References: <1170369716.16637.69.camel@stevo-desktop>
Message-ID: <20070201224841.GJ17617@mellanox.co.il>

> > I can think of some more complicated approaches that might work better
> > for iwarp. Off the top of my head, our netevents implementation could
> > keep a reference on the skb, start a timer, check the users counter on skb and
> > call the notifier chain when it drops to 1. Let's sleep on it.
> > 
> 
> Ok.  I'll ponder it some more.  But we could solve the module unload
> issue via module refs methinks.

This almost never works cleanly - module can't reference itself
without races: module can get unloaded after it drops the reference
to itself and before the function exits.
But I agree such a race is mostly theoretical.
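
To make the window concrete, a small user-space model of the "take a
module ref in the snoop function, drop it in the destructor" idea.  All
names here (fake_module, fake_module_get/put, fake_can_unload) are
hypothetical stand-ins, not the kernel API:

```c
#include <assert.h>
#include <stddef.h>

struct fake_module { int refcnt; };   /* models the module use count */

struct fake_skb {
	int users;
	void (*destructor)(struct fake_skb *);
};

static struct fake_module this_module;
static int callouts;                  /* number of netevent callouts made */

static void fake_module_get(struct fake_module *m) { m->refcnt++; }

static void fake_module_put(struct fake_module *m)
{
	assert(m->refcnt > 0);
	m->refcnt--;
}

/* Unload is allowed only while nobody holds a reference. */
static int fake_can_unload(const struct fake_module *m)
{
	return m->refcnt == 0;
}

static void fake_kfree_skb(struct fake_skb *skb)
{
	assert(skb->users > 0);
	if (--skb->users == 0 && skb->destructor)
		skb->destructor(skb);
}

static void snoop_destructor(struct fake_skb *skb)
{
	(void)skb;
	callouts++;
	fake_module_put(&this_module);
	/* Race window: from here until this function returns, nothing pins
	 * the module, yet we are still executing its code. */
}

static void snoop_rcv(struct fake_skb *skb)
{
	fake_module_get(&this_module);    /* pin the module while the hook is armed */
	skb->destructor = snoop_destructor;
	fake_kfree_skb(skb);
}
```

While the hook is armed the model refuses to unload; the only exposed
stretch is the tail of the destructor after the put.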

And we still have the case where destructor != NULL.

Certainly something to think about.

-- 
MST


From mst at mellanox.co.il  Thu Feb  1 14:57:54 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 2 Feb 2007 00:57:54 +0200
Subject: [openib-general] new IB CM reject reason
In-Reply-To: <Pine.LNX.4.64.0702011522560.2536@jlentini-linux.nane.netapp.com>
References: <Pine.LNX.4.64.0702011522560.2536@jlentini-linux.nane.netapp.com>
Message-ID: <20070201225754.GK17617@mellanox.co.il>

> Invalid Service ID (8) appears to be an appropriate Reason value for 
> the case when a REQ is received for a service ID that is not 
> registered with the CM (either because the application crashed or 
> exited on its own accord).

No, we are actually talking about the reject to generate when the application
cancels the communication establishment (e.g. by exiting), not as a response
to any CM message.

-- 
MST


From or.gerlitz at gmail.com  Thu Feb  1 15:18:26 2007
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Fri, 2 Feb 2007 01:18:26 +0200
Subject: [openib-general] ip_ib_mc_map?
In-Reply-To: <1170368361.2716.239.camel@fc6.xsintricity.com>
References: <1170275331.14294.1.camel@stevo-desktop>
	<45C1ABD0.5090404@voltaire.com>
	<1170325052.2716.229.camel@fc6.xsintricity.com>
	<15ddcffd0702011240l3c427bfcx6fcc7f7968fcf8b9@mail.gmail.com>
	<1170368361.2716.239.camel@fc6.xsintricity.com>
Message-ID: <15ddcffd0702011518qf115aaey862ef168784e81ca@mail.gmail.com>

On 2/2/07, Doug Ledford <dledford at redhat.com> wrote:
> > As for the importance for us to have IP multicast working fine with
> > IPoIB over RH4...
> > do you have an IB setup to test that?
>
> Yeah, I've got a setup, I just don't have any multicast tests that I
> run.  Any test programs you have for multicast in particular would be
> helpful.

This is fairly simple to do: have some multicast traffic routed over
an IPoIB subnet on two nodes, e.g. using

$ route add -net 224.0.0.0 netmask 255.0.0.0 dev ib0

and then

server

$ iperf -usB 224.5.5.5 -i 1

client

$ iperf -uc 224.5.5.5 -l 100 -b 50M -t 30 -i 1

Or.


From swise at opengridcomputing.com  Thu Feb  1 15:23:37 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 01 Feb 2007 17:23:37 -0600
Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update
	support
In-Reply-To: <20070201224841.GJ17617@mellanox.co.il>
References: <1170369716.16637.69.camel@stevo-desktop>
	<20070201224841.GJ17617@mellanox.co.il>
Message-ID: <1170372217.16637.87.camel@stevo-desktop>

On Fri, 2007-02-02 at 00:48 +0200, Michael S. Tsirkin wrote:
> > > I can think of some more complicated approaches that might work better
> > > for iwarp. Off the top of my head, our netevents implementation could
> > > keep a reference on the skb, start a timer, check the users counter on skb and
> > > call the notifier chain when it drops to 1. Let's sleep on it.
> > > 

Remembering which skbs to check later requires more complication.  Here
is one method to handle this and do what you suggest above.

In the snoop function:

Clone the skb and save the original skb ptr in the new skb->cb area.
This area is ours to use on a freshly cloned skbuff.  Add this new skb
ptr to a linked list of outstanding netevents to be processed later.
Don't free the original skb passed in.  This keeps the reference on it
like you proposed above.  Schedule a delayed work handler for a few
ticks in the future.

In the delayed work handler:

Walk the pending netevents skb list.  For each pending skb, get the
original skb ptr from the cloned skb->cb area, and if the user count is
now 1 then do the current destructor() logic, remove the skb from the
pending list, and free both skbs.  If the list is not empty reschedule
the delayed work handler for a few ticks later.

In the module unload function:

cancel any delayed work handling
walk the pending list and free the skbs and the original snooped skbs.

This solves the destructor issue and the rmmod issue, but is more
complicated.  If you're worried about regressing straight rdma address
translation, then you can call the address translation timer function
synchronously in the snoop function like before and change the
addr_trans module to not use netevents...
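
Roughly, in user-space model form (not kernel code).  For brevity this
uses a small pending struct in place of the cloned skb and its cb area,
and a plain counter in place of the notifier chain; struct fake_skb and
all function names are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>

struct fake_skb { int users; };       /* models skb->users only */

struct pending {
	struct fake_skb *orig;            /* the snooped skb we still hold a ref on */
	struct pending *next;
};

static struct pending *pending_head;
static int notified;                  /* number of netevent callouts made */

/* Snoop: keep our reference (no free) and queue the skb for later. */
static int snoop_rcv(struct fake_skb *skb)
{
	struct pending *p = malloc(sizeof(*p));

	if (!p)
		return -1;
	p->orig = skb;
	p->next = pending_head;
	pending_head = p;
	return 0;
}

/* Delayed handler: process entries where we are the last user.
 * Returns nonzero if the handler should be rescheduled. */
static int process_pending(void)
{
	struct pending **pp = &pending_head;

	while (*pp) {
		struct pending *p = *pp;

		if (p->orig->users == 1) {
			notified++;               /* the current destructor() logic */
			p->orig->users--;         /* drop our reference */
			*pp = p->next;
			free(p);
		} else {
			pp = &p->next;            /* someone else still holds it */
		}
	}
	return pending_head != NULL;
}

/* Module unload: drop everything still pending, without callouts. */
static void flush_pending(void)
{
	while (pending_head) {
		struct pending *p = pending_head;

		pending_head = p->next;
		p->orig->users--;
		free(p);
	}
}
```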


Steve.
 

From mst at mellanox.co.il  Thu Feb  1 15:33:18 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 2 Feb 2007 01:33:18 +0200
Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update
	support
In-Reply-To: <1170372217.16637.87.camel@stevo-desktop>
References: <1170372217.16637.87.camel@stevo-desktop>
Message-ID: <20070201233318.GO17617@mellanox.co.il>

> Quoting Steve Wise <swise at opengridcomputing.com>:
> Subject: Re: [PATCH 00/12] ofed_1_2 - Neighbour update support
> 
> On Fri, 2007-02-02 at 00:48 +0200, Michael S. Tsirkin wrote:
> > > > I can think of some more complicated approaches that might work better
> > > > for iwarp. Off the top of my head, our netevents implementation could
> > > > keep a reference on the skb, start a timer, check the users counter on skb and
> > > > call the notifier chain when it drops to 1. Let's sleep on it.
> > > > 
> 
> Remembering which skbs to check later requires more complication.  Here
> is one method to handle this and do what you suggest above.
> 
> In the snoop function:
> 
> Clone the skb and save the original skb ptr in the new skb->cb area.
> This area is ours to use on a freshly cloned skbuff.  Add this new skb
> ptr to a linked list of outstanding netevents to be processed later.
> Don't free the original skb passed in.  This keeps the reference on it
> like you proposed above.  Schedule a delayed work handler for a few
> ticks in the future.
> 
> In the delayed work handler:
> 
> Walk the pending netevents skb list.  For each pending skb, get the
> original skb ptr from the cloned skb->cb area, and if the user count is
> now 1 then do the current destructor() logic, remove the skb from the
> pending list, and free both skbs.  If the list is not empty reschedule
> the delayed work handler for a few ticks later.
> 
> In the module unload function:
> 
> cancel any delayed work handling
> walk the pending list and free the skbs and the original snooped skbs.
> 
> This solves the destructor issue and the rmmod issue, but is more
> complicated.  If you're worried about regressing straight rdma address
> translation, then you can call the address translation timer function
> synchronously in the snoop function like before and change the
> addr_trans module to not use netevents...


Yes, this is what I proposed above. It does all sound quite complicated.
Some notes:
	- you don't need an skb just to keep a void*. create your own
	  structure for this.
	- better use a timer than a workqueue - you are calling netevents
	  from atomic context on new kernels anyway.

So maybe destructor with module ref counting is better.
Dunno.

-- 
MST


From swise at opengridcomputing.com  Thu Feb  1 15:50:27 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 01 Feb 2007 17:50:27 -0600
Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update
	support
In-Reply-To: <20070201233318.GO17617@mellanox.co.il>
References: <1170372217.16637.87.camel@stevo-desktop>
	<20070201233318.GO17617@mellanox.co.il>
Message-ID: <1170373827.16637.92.camel@stevo-desktop>

On Fri, 2007-02-02 at 01:33 +0200, Michael S. Tsirkin wrote:
> > Quoting Steve Wise <swise at opengridcomputing.com>:
> > Subject: Re: [PATCH 00/12] ofed_1_2 - Neighbour update support
> > 
> > On Fri, 2007-02-02 at 00:48 +0200, Michael S. Tsirkin wrote:
> > > > > I can think of some more complicated approaches that might work better
> > > > > for iwarp. Off the top of my head, our netevents implementation could
> > > > > keep a reference on the skb, start a timer, check the users counter on skb and
> > > > > call the notifier chain when it drops to 1. Let's sleep on it.
> > > > > 
> > 
> > Remembering which skbs to check later requires more complication.  Here
> > is one method to handle this and do what you suggest above.
> > 
> > In the snoop function:
> > 
> > Clone the skb and save the original skb ptr in the new skb->cb area.
> > This area is ours to use on a freshly cloned skbuff.  Add this new skb
> > ptr to a linked list of outstanding netevents to be processed later.
> > Don't free the original skb passed in.  This keeps the reference on it
> > like you proposed above.  Schedule a delayed work handler for a few
> > ticks in the future.
> > 
> > In the delayed work handler:
> > 
> > Walk the pending netevents skb list.  For each pending skb, get the
> > original skb ptr from the cloned skb->cb area, and if the user count is
> > now 1 then do the current destructor() logic, remove the skb from the
> > pending list, and free both skbs.  If the list is not empty reschedule
> > the delayed work handler for a few ticks later.
> > 
> > In the module unload function:
> > 
> > cancel any delayed work handling
> > walk the pending list and free the skbs and the original snooped skbs.
> > 
> > This solves the destructor issue and the rmmod issue, but is more
> > complicated.  If you're worried about regressing straight rdma address
> > translation, then you can call the address translation timer function
> > synchronously in the snoop function like before and change the
> > addr_trans module to not use netevents...
> 
> 
> Yes, this is what I proposed above. It does all sound quite complicated.
> Some notes:
> 	- you don't need an skb just to keep a void*. Create your own
> 	  structure for this.
> 	- better to use a timer than a workqueue - you are calling netevents
> 	  from atomic context on new kernels anyway.
> 
> So maybe destructor with module ref counting is better.
> Donnu.

We could use a global refcnt to count the number of pending destructions
and use a completion object to block unload until all the destructors
fire and the refcnt goes to zero.


From rdreier at cisco.com  Thu Feb  1 20:45:11 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 01 Feb 2007 20:45:11 -0800
Subject: [openib-general] ipath and current git woes
In-Reply-To: <20070201002202.GA12386@obsidianresearch.com> (Jason
	Gunthorpe's message of "Wed, 31 Jan 2007 17:22:02 -0700")
References: <20070201002202.GA12386@obsidianresearch.com>
Message-ID: <ada8xfh6uco.fsf@cisco.com>

 > After applying that patch the user space consumers load but we got a
 > kernel oops when we tried to run a test here :<
 > 
 > Unable to handle kernel NULL pointer dereference at 0000000000000918 RIP: 
 >  [<ffffffff88074c76>] :ib_ipath:ipath_mmap+0x37/0x95

So I had a look at this, and it seems that there are two bugs that
lead to this.

First of all, libipathverbs gets a response from the kernel that has a
64-bit kernel address in it, and passes that back into a call to
mmap(), where it uses that address as the offset.  On 32-bit
userspace, that chops off the high bits of the address and so the
ipath kernel driver can't find the address in its list.

So that explains why things don't work.  And unfortunately the obvious
fix for libipathverbs to use mmap64() instead of mmap() doesn't work,
because on Linux, mmap64() is implemented with the mmap2 system call,
which just allows the offset to be 12 bits bigger -- so it only gets
you to 44 bits, which is not enough to reach a 64-bit kernel address
(which is typically something like 0xffffc20000072000).  So you
probably want to use something like a 32-bit serial number to point at
your buffers or something like that.
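
The arithmetic can be checked with a small sketch.  The helper names below
are made up for illustration, and the 4096-byte unit for mmap2 is an
assumption that holds on most, but not all, architectures.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Offset as a 32-bit userspace hands it to plain mmap(): off_t is
 * 32 bits wide there, so the high half of the kernel address is lost.
 */
static uint64_t offset_seen_via_mmap32(uint64_t kaddr)
{
    return (uint64_t)(uint32_t)kaddr;
}

/*
 * Offset as it arrives through mmap2(): the syscall takes a 32-bit
 * count of 4096-byte units, so at most 32 + 12 = 44 bits survive.
 */
static uint64_t offset_seen_via_mmap2(uint64_t kaddr)
{
    uint32_t pgoff = (uint32_t)(kaddr >> 12);

    return (uint64_t)pgoff << 12;
}
```

For the example address 0xffffc20000072000, the first helper yields
0x72000 and the second 0x20000072000; neither matches the address the
driver stored, so the lookup in the pending list fails either way.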

The oops is caused by another more serious problem.  Obviously a buggy
libipathverbs shouldn't be able to crash the kernel, because even if
libipathverbs is fixed then malicious userspace could do the same
thing too.

It turns out that all the handling of pending_mmaps in the ipath
driver is not really careful about userspace screwing it up.  When
userspace creates a CQ, the CQ buffer is added to the device-wide list
of pending mmaps.  Of course 32-bit userspace never succeeds in
mapping that CQ, so it stays on the list (the only way it gets removed
is if it is successfully mmapped).  But then the destroy CQ operation
sees that the mmap is pending, and frees the structure holding the
information (without removing it from the list).  And of course when
that memory gets reused, then the pending mmap list gets corrupted,
etc etc.

Of course this is ugly to fix with the current data structure -- the
list of pending mmaps is singly-linked, which means I have to walk the
whole list to delete an entry.  It also makes the list walking in
ipath_mmap() unnecessarily obfuscated.  I think it's much better to
just use the standard kernel list_head stuff if you're going to delete
things from the middle of the list, rather than implementing your own
singly-linked list.  Sure it costs an extra pointer in each entry but
no one ever has to worry about whether you're deleting things
correctly, etc.
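
To illustrate the point, here is a userspace sketch of a list_head-style
doubly-linked list.  This is not the kernel's <linux/list.h> and not the
ipath code; list_node, pending_mmap, the key field and the helpers are
hypothetical names.  The thing to notice is that deleting an entry needs
no walk of the list at all.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly-linked list, modelled on the list_head idea. */
struct list_node {
    struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
    head->next = head->prev = head;
}

static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* O(1) removal from anywhere in the list; no walk needed. */
static void list_del_node(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n->prev = n;
}

static int list_is_empty(const struct list_node *head)
{
    return head->next == head;
}

/* Hypothetical pending-mmap record with the list node embedded in it. */
struct pending_mmap {
    struct list_node node;
    unsigned long key;          /* what mmap() would look the entry up by */
};

#define node_to_pending(p) \
    ((struct pending_mmap *)((char *)(p) - offsetof(struct pending_mmap, node)))

/* The lookup in the mmap path stays a plain loop. */
static struct pending_mmap *find_pending(struct list_node *head,
                                         unsigned long key)
{
    for (struct list_node *p = head->next; p != head; p = p->next) {
        struct pending_mmap *pm = node_to_pending(p);

        if (pm->key == key)
            return pm;
    }
    return NULL;
}
```

With this layout the destroy path would call list_del_node() on its own
entry (under whatever lock protects the list) before freeing it.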

There's some other silly stuff I noticed too, like:

    grep -n mmap_cnt *.[ch] /dev/null
    ipath_cq.c:232:		ip->mmap_cnt = 0;
    ipath_mmap.c:63:	ip->mmap_cnt++;
    ipath_mmap.c:70:	ip->mmap_cnt--;
    ipath_qp.c:837:			ip->mmap_cnt = 0;
    ipath_srq.c:162:		ip->mmap_cnt = 0;
    ipath_verbs.h:178:	unsigned mmap_cnt;

umm -- no one ever looks at mmap_cnt (there's a kref too), so why keep
it at all?

So Qlogic guys -- please fix this up!

 - R.


From rdreier at cisco.com  Thu Feb  1 20:47:10 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 01 Feb 2007 20:47:10 -0800
Subject: [openib-general] IPoIB CM for merge?
In-Reply-To: <20070201192924.GE17617@mellanox.co.il> (Michael S.
	Tsirkin's message of "Thu, 1 Feb 2007 21:29:24 +0200")
References: <20070201192924.GE17617@mellanox.co.il>
Message-ID: <ada4pq56u9d.fsf@cisco.com>

 > Could you please spend some time reviewing IPoIB CM code?
 > I am concerned about missing the 2.6.21 merge window.

Thanks for the reminder.

Can we trade?  Have you looked at the cxgb3 iwarp driver?  Any comments?

 - R.


From rdreier at cisco.com  Thu Feb  1 20:48:13 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 01 Feb 2007 20:48:13 -0800
Subject: [openib-general] [Fwd: Re: [PATCH 1/10] cxgb3 - main header
	files]
In-Reply-To: <1170363934.16637.58.camel@stevo-desktop> (Steve Wise's
	message of "Thu, 01 Feb 2007 15:05:34 -0600")
References: <1169216896.15842.6.camel@stevo-desktop>
	<adamz4f0wsy.fsf@cisco.com> <1170363934.16637.58.camel@stevo-desktop>
Message-ID: <adazm7x5fn6.fsf@cisco.com>

 > Have you had a chance to review this?

Still on my list.

Can we trade?  Can you look at the IPoIB connected mode stuff in the
ipoib-cm branch in

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git

and let me know if you see anything you don't like?

 - R.


From mike.heffner at evergrid.com  Thu Feb  1 21:10:09 2007
From: mike.heffner at evergrid.com (Mike Heffner)
Date: Fri, 02 Feb 2007 00:10:09 -0500
Subject: [openib-general] Detecting when an RDMA writer process disappears
Message-ID: <45C2C7B1.7090204@evergrid.com>


Is there any method by which a receiving process that is polling in 
preregistered memory regions for data from a sender performing RDMA 
writes, can detect if the sender is killed? Say by a SIGKILL signal? The 
RC connection is setup using the RDMA CM and there do not appear to be 
any CM events created on the event channel, nor does there appear to be 
any async. events created. Occasionally I will get a CQE failure on the 
QP, depending on where the communication flow is, that I can use to mark 
the connection failed, but this happens only about 50% of the time.

An alternative solution would be periodically sending "keep-alives" and 
detecting the CQE failure, but I'd be interested to know if there are 
any other options that don't require sending keep-alives.


Thanks,

Mike

-- 

   Mike Heffner <mike.heffner at evergrid.com>
   EverGrid Software
   Blacksburg, VA USA

   Voice: (540) 443-3500 #603


From jgunthorpe at obsidianresearch.com  Thu Feb  1 21:25:03 2007
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Thu, 1 Feb 2007 22:25:03 -0700
Subject: [openib-general] ipath and current git woes
In-Reply-To: <45C13771.2070406@qlogic.com>
References: <20070201002202.GA12386@obsidianresearch.com>
	<45C13771.2070406@qlogic.com>
Message-ID: <20070202052503.GA19654@obsidianresearch.com>

On Wed, Jan 31, 2007 at 04:42:25PM -0800, Robert Walsh wrote:
> Jason Gunthorpe wrote:
> >Has anyone been able to use ipath with the current latest git
> >everything?
> 
> We're working on getting this up to date right now.  Give us a couple of 
> days and we'll have some new patches ready.

OK. Things are working ok here using the same kernel and a 64 bit OFED
1.1 user space built in a chroot. That makes sense after reading
Roland's analysis...

Thanks,
Jason


From eitan at sw053.yok.mtl.com  Thu Feb  1 21:40:43 2007
From: eitan at sw053.yok.mtl.com (Eitan Zahavi)
Date: Fri, 2 Feb 2007 07:40:43 +0200
Subject: [openib-general] nightly osm_sim report 2007-02-02:normal completion
Message-ID: <200702020540.l125ehia022501@sw053.yok.mtl.com>

OSM Simulation Regression Summary
OpenSM rev = Thu_Feb_1_10:25:31_2007 b8cdb7 
ibutils rev = Wed_Jan_3_11:42:12_2007 913448 
Total=410 Pass=409 Fail=1

Pass:
30 Stability IS1-16.topo
30 Pkey IS1-16.topo
30 OsmTest IS1-16.topo
30 Multicast IS1-16.topo
30 LidMgr IS1-16.topo
29 OsmStress IS1-16.topo
10 Stability IS3-loop.topo
10 Stability IS3-128.topo
10 Pkey IS3-128.topo
10 OsmTest IS3-loop.topo
10 OsmTest IS3-128.topo
10 OsmStress IS3-128.topo
10 Multicast IS3-loop.topo
10 Multicast IS3-128.topo
10 LidMgr IS3-128.topo
10 FatTree part-4-ary-3-tree.topo
10 FatTree merge-roots-reorder-4-ary-2-tree.topo
10 FatTree merge-roots-4-ary-2-tree.topo
10 FatTree merge-root-4-ary-3-tree.topo
10 FatTree merge-root-12-ary-2-tree.topo
10 FatTree merge-2-ary-4-tree.topo
10 FatTree half-4-ary-3-tree.topo
10 FatTree blend-4-ary-2-tree.topo
10 FatTree 4-ary-4-tree.topo
10 FatTree 4-ary-3-tree.topo
10 FatTree 32nodes-3lvl-is1.topo
10 FatTree 2-ary-4-tree.topo
10 FatTree 12-node-spaced.topo
10 FatTree 12-ary-2-tree.topo

Failures:
1 OsmStress IS1-16.topo


From mst at mellanox.co.il  Thu Feb  1 22:03:22 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 2 Feb 2007 08:03:22 +0200
Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update
	support
In-Reply-To: <1170373827.16637.92.camel@stevo-desktop>
References: <1170373827.16637.92.camel@stevo-desktop>
Message-ID: <20070202060228.GQ17617@mellanox.co.il>

> We could use a global refcnt to count the number of pending destructions
> and use a completion object to block unload until all the destructors
> fire and the refcnt goes to zero.

It has the same race as module refcnt. So just use that.

-- 
MST


From bugzilla-daemon at lists.openfabrics.org  Thu Feb  1 22:16:04 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Thu,  1 Feb 2007 22:16:04 -0800 (PST)
Subject: [openib-general] [Bug 334] Problems with build
	OFED-1.1.1-ib_local_sa
In-Reply-To: <bug-334-1@https.bugs.openfabrics.org/>
Message-ID: <20070202061604.ECEC7E607F9@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=334


------- Comment #11 from dmitry.yulov at intel.com  2007-02-01 22:16 -------
(In reply to comment #10)
> What is the output of uname -r ? This is VERY important. Also, can you run
> `cat /etc/issue` and send the results?

As you can see in my first message, I already described my machine
configuration:
>The machine configuration:
>Kernel: Linux 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 x86_64 x86_64 x86_64 GNU/Linux
>OS: SUSE Linux Enterprise Server 10 (x86_64)
>gcc version: gcc (GCC) 4.1.0 (SUSE Linux)

Unfortunately, /etc/issue on my machine does not contain the Linux version,
because our IT requirements do not allow it. I looked at the
ofed_scripts/configure file and saw that configure needs /etc/issue to
choose the right patches. I don't think that is a good idea: configure
should first run
command:  cat /etc/*release* and look for the Linux version there, and only
after that check /etc/issue (if necessary).




From mst at mellanox.co.il  Thu Feb  1 22:56:14 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 2 Feb 2007 08:56:14 +0200
Subject: [openib-general] IPoIB CM for merge?
In-Reply-To: <ada4pq56u9d.fsf@cisco.com>
References: <ada4pq56u9d.fsf@cisco.com>
Message-ID: <20070202065547.GS17617@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: IPoIB CM for merge?
> 
>  > Could you please spend some time reviewing IPoIB CM code?
>  > I am concerned about missing the 2.6.21 merge window.
> 
> Thanks for the reminder.
> 
> Can we trade?  Have you looked at the cxgb3 iwarp driver?  Any comments?

I haven't yet, sorry. OK.
I am not sure I have the last version posted so I am going to go by what
is there in OFED git tree.

And I also only looked under drivers/infiniband/.

So, here are some questions: I looked in the archives and have not seen
these addressed. Maybe these can be answered and then I'll go from there?
Does this sound OK?

Files with names like
./core/cxio_hal.c
./core/cxio_hal.h
normally generate a fair bit of discussion which wasn't present here,
I did not guess everyone was just busy.
For example, why is there both struct iwch_cq and struct t3_cq?

File tcb.h comment says:
/* This file is automatically generated --- do not edit */
This looks like a GPL violation, does it not?

What's the deal with the naming convention?
Is there a reason in cxgb3, some files start with iwch and some with cxio?
How about using cxgb3 prefix all over?

-- 
MST


From philippe.gregoire at cea.fr  Fri Feb  2 02:10:16 2007
From: philippe.gregoire at cea.fr (Philippe Gregoire)
Date: Fri, 02 Feb 2007 11:10:16 +0100
Subject: [openib-general] dry-run mode for opensm ?
Message-ID: <45C30E08.1030502@cea.fr>

Hal
Is there any way to run opensm in a dry-run mode
just to make it dump the route tables it will generate ?
We already have an embedded SM and I would like to compare the
current route tables with those that OpenSM would generate.
Thanks
Philippe


From vlad at lists.openfabrics.org  Fri Feb  2 02:20:43 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Fri,  2 Feb 2007 02:20:43 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070202-0200 daily build status
Message-ID: <20070202102043.4FA07E607F9@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod 

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.14
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.18
Passed on powerpc with linux-2.6.18
Passed on ia64 with linux-2.6.19
Passed on x86_64 with linux-2.6.13
Passed on powerpc with linux-2.6.17
Passed on ppc64 with linux-2.6.12
Passed on powerpc with linux-2.6.15
Passed on ppc64 with linux-2.6.19
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.17
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.16
Passed on ia64 with linux-2.6.18
Passed on powerpc with linux-2.6.12
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.13
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.14
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.15

Failed:
Build failed on ia64 with linux-2.6.16.21-0.8-default
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:380: error: implicit declaration of function ‘register_netevent_notifier’
/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: In function ‘addr_cleanup’:
/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:386: error: implicit declaration of function ‘unregister_netevent_notifier’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------


From mst at mellanox.co.il  Fri Feb  2 03:15:32 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 2 Feb 2007 13:15:32 +0200
Subject: [openib-general] IPoIB CM for merge?
In-Reply-To: <ada4pq56u9d.fsf@cisco.com>
References: <ada4pq56u9d.fsf@cisco.com>
Message-ID: <20070202111532.GT17617@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: IPoIB CM for merge?
> 
>  > Could you please spend some time reviewing IPoIB CM code?
>  > I am concerned about missing the 2.6.21 merge window.
> 
> Thanks for the reminder.
> 
> Can we trade?  Have you looked at the cxgb3 iwarp driver?  Any comments?

OK.
I am not sure I have the last version posted so I am going to go by what
is there in OFED git tree.

And I also only looked under drivers/infiniband/.

So, here are some questions: I looked in the archives and have not seen
these addressed. Maybe these can be answered and then I'll go from there?
Does this sound OK?

Files with names like
./core/cxio_hal.c
./core/cxio_hal.h
normally generate a fair bit of discussion which wasn't present here,
I did not guess everyone was just busy.
For example, why is there both struct iwch_cq and struct t3_cq?

File tcb.h comment says:
/* This file is automatically generated --- do not edit */
This looks like a GPL violation, does it not?

What's the deal with the naming convention?
Is there a reason in cxgb3, some files start with iwch and some with cxio?
How about using cxgb3 prefix all over?

-- 
MST


From bugzilla-daemon at lists.openfabrics.org  Fri Feb  2 03:42:54 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Fri,  2 Feb 2007 03:42:54 -0800 (PST)
Subject: [openib-general] [Bug 334] Problems with build
	OFED-1.1.1-ib_local_sa
In-Reply-To: <bug-334-1@https.bugs.openfabrics.org/>
Message-ID: <20070202114254.39BAAE607F9@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=334


------- Comment #12 from dmitry.yulov at intel.com  2007-02-02 03:42 -------
Created an attachment (id=74)
 --> (https://bugs.openfabrics.org/attachment.cgi?id=74&action=view)
Patch for ofed_scripts/configure

I have attached a patch for configure that handles my case.




From bugzilla-daemon at lists.openfabrics.org  Fri Feb  2 03:56:43 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Fri,  2 Feb 2007 03:56:43 -0800 (PST)
Subject: [openib-general] [Bug 334] Problems with build
	OFED-1.1.1-ib_local_sa
In-Reply-To: <bug-334-1@https.bugs.openfabrics.org/>
Message-ID: <20070202115643.76DD4E607F9@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=334


------- Comment #13 from dmitry.yulov at intel.com  2007-02-02 03:56 -------
Can someone tell me how to apply the patch when running the build.sh script?
As far as I know, when I run build.sh my patched files are always overwritten,
because it runs rpm -i openib-1.1.src.rpm. How can I apply my patches, or do
I need to wait for a new release?




From halr at voltaire.com  Fri Feb  2 06:31:36 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 02 Feb 2007 09:31:36 -0500
Subject: [openib-general] dry-run mode for opensm ?
In-Reply-To: <45C30E08.1030502@cea.fr>
References: <45C30E08.1030502@cea.fr>
Message-ID: <1170426648.15660.351722.camel@hal.voltaire.com>

Hi Philippe,

On Fri, 2007-02-02 at 05:10, Philippe Gregoire wrote:
> Hal
> Is there any way to run opensm in a dry-run mode
> just to make it dump the route tables it will generate ?

Not that I'm aware of.

> We already have an embedded SM and I would like to compare the
> current route tables with those that OpenSM would generate.

There are two options here from what I know:
1. Turn off the embedded SM temporarily and run OpenSM (in one of its
various routing modes)
2. Get your topology into a simulator and run OpenSM on it

BTW, there are scripts which will work with any SM to dump the routing
tables (dump_lfts/mgfts.sh) if that is how you are doing the comparison.

-- Hal

> Thanks
> Philippe


From swise at opengridcomputing.com  Fri Feb  2 07:18:24 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 02 Feb 2007 09:18:24 -0600
Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update
	support
In-Reply-To: <20070202060228.GQ17617@mellanox.co.il>
References: <1170373827.16637.92.camel@stevo-desktop>
	<20070202060228.GQ17617@mellanox.co.il>
Message-ID: <1170429504.26115.1.camel@stevo-desktop>

On Fri, 2007-02-02 at 08:03 +0200, Michael S. Tsirkin wrote:
> > We could use a global refcnt to count the number of pending destructions
> > and use a completion object to block unload until all the destructors
> > fire and the refcnt goes to zero.
> 
> It has the same race as module refcnt. So just use that.
> 

I don't understand the race.  Can you explain please?  This should be
able to be done without a race with a refcnt, a spinlock, a bit saying
we're unloading, and a completion object.

But maybe I'm confused ;-)
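
A userspace model of the bookkeeping I have in mind, as a sketch only: a
pthread mutex stands in for the spinlock and a condition variable for the
completion object, and all names (pending_tracker, tracker_get and so on)
are hypothetical.  It shows the counting only and does not settle whether
a race remains once the last destructor has signalled.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct pending_tracker {
    pthread_mutex_t lock;       /* stands in for the spinlock */
    pthread_cond_t drained;     /* stands in for the completion object */
    int refcnt;                 /* destructors still outstanding */
    bool unloading;             /* the "we're unloading" bit */
};

static void tracker_init(struct pending_tracker *t)
{
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->drained, NULL);
    t->refcnt = 0;
    t->unloading = false;
}

/* Snoop path: count one more pending destructor unless we are unloading. */
static bool tracker_get(struct pending_tracker *t)
{
    bool ok;

    pthread_mutex_lock(&t->lock);
    ok = !t->unloading;
    if (ok)
        t->refcnt++;
    pthread_mutex_unlock(&t->lock);
    return ok;
}

/* Destructor path: drop the count and wake the unloader on the last one. */
static void tracker_put(struct pending_tracker *t)
{
    pthread_mutex_lock(&t->lock);
    t->refcnt--;
    if (t->refcnt == 0 && t->unloading)
        pthread_cond_broadcast(&t->drained);
    pthread_mutex_unlock(&t->lock);
}

/* Unload path: refuse new references, then block until the count is zero. */
static void tracker_unload_and_wait(struct pending_tracker *t)
{
    pthread_mutex_lock(&t->lock);
    t->unloading = true;
    while (t->refcnt > 0)
        pthread_cond_wait(&t->drained, &t->lock);
    pthread_mutex_unlock(&t->lock);
}
```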


From swise at opengridcomputing.com  Fri Feb  2 07:28:59 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 02 Feb 2007 09:28:59 -0600
Subject: [openib-general] [Fwd: Re: [PATCH 1/10] cxgb3 - main header
	files]
In-Reply-To: <adazm7x5fn6.fsf@cisco.com>
References: <1169216896.15842.6.camel@stevo-desktop>
	<adamz4f0wsy.fsf@cisco.com> <1170363934.16637.58.camel@stevo-desktop>
	<adazm7x5fn6.fsf@cisco.com>
Message-ID: <1170430139.26115.9.camel@stevo-desktop>

On Thu, 2007-02-01 at 20:48 -0800, Roland Dreier wrote:
>  > Have you had a chance to review this?
> 
> Still on my list.
> 
> Can we trade?  Can you look at the IPoIB connected mode stuff in the
> ipoib-cm branch in
> 
>     git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git
> 
> and let me know if you see anything you don't like?
> 
>  - R.

Ok.  I'll review the IPoIB connected mode code.


Steve.


From halr at voltaire.com  Fri Feb  2 07:28:06 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 02 Feb 2007 10:28:06 -0500
Subject: [openib-general] components that have not opend the ofed_1_2
	branch
In-Reply-To: <45C209EA.1040207@mellanox.co.il>
References: <45C209EA.1040207@mellanox.co.il>
Message-ID: <1170430064.15660.354336.camel@hal.voltaire.com>

On Thu, 2007-02-01 at 10:40, Tziporet Koren wrote:
> The following components have not opened ofed_1_2 branch:
> 
>     * libibverbs - Roland
>     * libmthca - Roland
>     * libipathverbs - Bryan
>     * tvflash - Roland
>     * srptools - Ishai
>     * management - Hal
> 
> 
> Please open the branch today or tomorrow at the latest .

Done; just created the ofed_1_2 branch for management.

-- Hal

> Thanks,
> Tziporet


From swise at opengridcomputing.com  Fri Feb  2 07:41:09 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 02 Feb 2007 09:41:09 -0600
Subject: [openib-general] ofa_1_2_kernel 20070202-0200 daily build status
In-Reply-To: <20070202102043.4FA07E607F9@openfabrics.org>
References: <20070202102043.4FA07E607F9@openfabrics.org>
Message-ID: <1170430869.26115.12.camel@stevo-desktop>

On Fri, 2007-02-02 at 02:20 -0800, vlad at lists.openfabrics.org wrote:
> This email was generated automatically, please do not reply

Which distro is 2.6.16.21-0.8-default?  I'm sure I didn't do a netevent
backport for that.


> Failed:
> Build failed on ia64 with linux-2.6.16.21-0.8-default
> Log:
> /home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:380: error: implicit declaration of function ‘register_netevent_notifier’
> /home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: In function ‘addr_cleanup’:
> /home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:386: error: implicit declaration of function ‘unregister_netevent_notifier’
> make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1
> make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2
> make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2
> make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2
> make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default'
> make: *** [kernel] Error 2
> ----------------------------------------------------------------------------------


From swise at opengridcomputing.com  Fri Feb  2 07:54:31 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 02 Feb 2007 09:54:31 -0600
Subject: [openib-general] IPoIB CM for merge?
In-Reply-To: <20070202111532.GT17617@mellanox.co.il>
References: <ada4pq56u9d.fsf@cisco.com> <20070202111532.GT17617@mellanox.co.il>
Message-ID: <1170431671.26115.25.camel@stevo-desktop>

On Fri, 2007-02-02 at 13:15 +0200, Michael S. Tsirkin wrote:
> > Quoting Roland Dreier <rdreier at cisco.com>:
> > Subject: Re: IPoIB CM for merge?
> > 
> >  > Could you please spend some time reviewing IPoIB CM code?
> >  > I am concerned about missing the 2.6.21 merge window.
> > 
> > Thanks for the reminder.
> > 
> > Can we trade?  Have you looked at the cxgb3 iwarp driver?  Any comments?
> 
> OK.
> I am not sure I have the last version posted so I am going to go by what
> is there in OFED git tree.
> 
> And I also only looked under drivers/infiniband/.
> 
> So, here are some questions: I looked in the archives and have not seen
> these addressed. Maybe these can be answered and then I'll go from there?
> Does this sound OK?
> 
> Files with names like
> ./core/cxio_hal.c
> ./core/cxio_hal.h
> normally generate a fair bit of discussion which wasn't present here,
> I did not guess everyone was just busy.
> For example, why is there both struct iwch_cq and struct t3_cq?
> 

The cxgb3/core code defines a low level interface to the RDMA bits of
the T3 device. 

This code was originally a separate module (named cxio) that allowed
other RDMA middleware layers to sit on top of this core rdma module.
At the time, there was RNIC-PI and OFA being developed.  So that is the
history of this.  As per the first openib review (about a year ago) of
this code I merged this core module into the cxgb3 module.  I left the
file structure and names as-is because it was low priority IMO.

The t3_cq struct is the low level CQ structure used to manage both a HW
accessed CQ and a SW CQ (needed to handle error cases and out of order
completions). The iwch_cq struct contains the stuff needed to integrate
with the OFA core and uverbs code. It contains a t3_cq inline.
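
Schematically, the relationship looks like the sketch below.  The struct and
field names are placeholders rather than the real t3_cq/iwch_cq definitions;
only the embedding pattern is the point.

```c
#include <assert.h>
#include <stddef.h>

/* Plays the role of t3_cq: low-level CQ state (fields are placeholders). */
struct lowlevel_cq {
    unsigned int cqid;
    unsigned int size;
};

/* Plays the role of iwch_cq: provider-level state plus the inline CQ. */
struct provider_cq {
    int core_state;             /* placeholder for the OFA-facing members */
    struct lowlevel_cq cq;      /* embedded by value, not a pointer */
};

/* Recover the wrapper from a pointer to the embedded low-level CQ. */
static struct provider_cq *to_provider_cq(struct lowlevel_cq *cq)
{
    return (struct provider_cq *)((char *)cq -
                                  offsetof(struct provider_cq, cq));
}
```

The low-level functions only ever see the embedded struct, while the
provider functions work on the wrapper and pass &wrapper->cq down.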

> File tcb.h comment says:
> /* This file is automatically generated --- do not edit */
> This looks like a GPL violation, does it not?
> 

I can add the license if that's what you mean.

> What's the deal with the naming convention?
> Is there a reason in cxgb3, some files start with iwch and some with cxio?
> How about using cxgb3 prefix all over?

The cxio_ prefix is used for the low-level functions/types that talk
directly with the HW.  iwch_ is the provider driver functions that
interface with the OFA stack.  I'd rather not change the names.
Especially since this has already gone through several review cycles.
I'm hoping we can get this in and improve it with subsequent
submissions.  Is that reasonable?

Steve.
 

From mshefty at ichips.intel.com  Fri Feb  2 09:59:05 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 02 Feb 2007 09:59:05 -0800
Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast
 support
In-Reply-To: <45BF8E17.2010805@ichips.intel.com>
References: <000701c741a6$16dc4760$ff0da8c0@amr.corp.intel.com>
	<45BF8E17.2010805@ichips.intel.com>
Message-ID: <45C37BE9.5040105@ichips.intel.com>

> Sean Hefty (3):
>        rdma_cm: Increment port number after close to avoid re-use.
>        ib_sa: track multicast join/leave requests
>        rdma_cm: add multicast communication support

Assuming that you haven't looked at this yet, I updated the ib_sa patch above to 
shorten the workqueue name, plus added a fourth patch to shorten the workqueue 
names for ib_addr and rdma_cm.  E.g. "ib_mcast_wq" became "ib_mcast".

Let me know if you need any assistance.

- Sean


From swise at opengridcomputing.com  Fri Feb  2 11:18:13 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 02 Feb 2007 13:18:13 -0600
Subject: [openib-general] IPoIB connected mode review comments
In-Reply-To: <adazm7x5fn6.fsf@cisco.com>
References: <1169216896.15842.6.camel@stevo-desktop>
	<adamz4f0wsy.fsf@cisco.com> <1170363934.16637.58.camel@stevo-desktop>
	<adazm7x5fn6.fsf@cisco.com>
Message-ID: <1170443893.26115.59.camel@stevo-desktop>

On Thu, 2007-02-01 at 20:48 -0800, Roland Dreier wrote:
>  > Have you had a chance to review this?
> 
> Still on my list.
> 
> Can we trade?  Can you look at the IPoIB connected mode stuff in the
> ipoib-cm branch in
> 
>     git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git
> 
> and let me know if you see anything you don't like?
> 
>  - R.

Here are my comments.  I'm not an ib cm expert though.  These are mostly
questions:


Since IPoIB is using IP addresses already, wouldn't it be simpler to use
the rdma cm to set up connections?

Could you optimize this design and only signal some of the tx wrs?

In ipoib_cm_send() you call ipoib_cm_skb_too_long() if the packet is too
large for the interface mtu.  And you print a warning.  But
ipoib_cm_skb_too_long() actually queues the packet for the cm case.  For
ud it just drops the packet.  The skb task for cm then will send an
ICMP_DEST_UNREACH for these packets.  Why the difference?  Also if this
packet came from the local stack via a local application, you don't want
to send DEST_UNREACH, right?  (I'm probably just confused about the
purpose of this).

In ipoib_cm_tx_completion() you rearm, then drain the cq.  I thought
there was some reason that it was better to do drain/rearm/drain?
Something about if you rearm and there's a cq entry mthca does another
immediate interrupt?  
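
For reference, the drain/rearm/drain ordering asked about above can be
sketched against a mock completion queue.  This is only an illustration
under stated assumptions: struct mock_cq, mock_poll() and mock_rearm() are
hypothetical stand-ins, not the verbs API and not the ipoib-cm code.  The
first drain empties the queue so the rearm happens on an empty CQ, and the
second drain picks up anything that raced in before the rearm took effect.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a completion queue; not the verbs API. */
struct mock_cq {
	int pending;	/* completions waiting to be polled */
	int armed;	/* nonzero once a notification has been requested */
	int late;	/* completions that arrive just before the rearm */
};

/* Poll one completion; returns 1 if one was consumed, 0 if empty. */
static int mock_poll(struct mock_cq *cq)
{
	assert(cq != NULL);
	if (cq->pending == 0)
		return 0;
	cq->pending--;
	return 1;
}

/* Request a notification; models completions racing with the rearm. */
static void mock_rearm(struct mock_cq *cq)
{
	cq->pending += cq->late;
	cq->late = 0;
	cq->armed = 1;
}

/* Consume everything currently in the queue. */
static int drain(struct mock_cq *cq)
{
	int n = 0;

	while (mock_poll(cq))
		n++;
	return n;
}

/* drain / rearm / drain: returns the number of completions handled. */
static int handle_completions(struct mock_cq *cq)
{
	int handled = drain(cq);	/* empty the CQ before arming */

	mock_rearm(cq);			/* ask for the next notification */
	handled += drain(cq);		/* catch entries that raced in */
	return handled;
}
```

With three entries queued and two more arriving around the rearm, the
handler consumes all five and leaves the queue armed and empty.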

In ipoib_cm_handle_tx_wc():

When can a tx completion happen with a wr_id that isn't within the
ipoib_sendq_size range?  This looks like it is really a bug condition
that should never happen.  I see the same code in the rx completion path
too.  

Also, what's up with the /* FIXME */ comment?

You lock the priv->lock inside of the priv->tx_lock.  Is this ordering
correct and consistent across all the code?


ipoib_cm_handle_rx_wc() - what's up with the XXX comment?

What's the algorithm to keep enough buffers posted in the SRQ?


From akepner at sgi.com  Fri Feb  2 13:34:15 2007
From: akepner at sgi.com (akepner at sgi.com)
Date: Fri, 2 Feb 2007 13:34:15 -0800 (PST)
Subject: [openib-general] [RFC/BUG] libibverbs: DMA vs. CQ race
In-Reply-To: <adabqkhy04v.fsf@cisco.com>
References: <Pine.LNX.4.61.0612131626250.24974@localhost.localdomain>
	<ada8xhaq5ze.fsf@cisco.com>
	<Pine.LNX.4.61.0701281444230.32767@localhost.localdomain>
	<adabqkhy04v.fsf@cisco.com>
Message-ID: <Pine.LNX.4.61.0702021326050.26058@localhost.localdomain>


Thanks for having a look at this.

On Mon, 29 Jan 2007, Roland Dreier wrote:

> ....
> Well, first the changes to the userspace libmthca need to be such that
> new libmthca continues to work with old kernels....

Absolutely.

> .....
> The really strange thing about this is that this Altix
> coherent/consistent memory really isn't about the memory itself, but
> about the relationship of that memory with DMA elsewhere -- as I
> understand the code, doing dma_alloc_coherent() returns normal memory
> with a special DMA address that tells the system to flush other DMAs
> before doing DMA to the coherent region.  Which isn't really what most
> people understand coherent memory to be, but it has the magic property
> of making most drivers work.
> ....

I agree that this isn't a very elegant solution, but I don't
know of a better one.

Assuming that something along the lines of the previous patch
is used, we need to address userspace/kernel compatibility.

The existing abi versioning doesn't seem to be exactly what
we want to use, though, because we want to change a verb's
semantics to work around a bug. (Changing the abi_version
may be an inevitable result, though.)

How about adding "semantic flags" to the mthca_* commands
(mthca_create_cq, etc.)? Userspace could read the contents of
a new sysfs file which, if found, would indicate the flags
that the kernel understands. Then it could pass the flags, if
it chooses, to get the kernel to use the desired semantics.

Something like:

# cat /sys/class/infiniband_verbs/uverbs0/abi_flags
0000000000000001 [64 bits of flags]

where:

enum abi_flags {
         COHERENT_USER_CQ        = (1<<0),
         .....
};

Better/different ideas?
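
A minimal userspace sketch of the check proposed above, assuming the
abi_flags file holds a single hexadecimal value as in the example output.
The file itself is only a proposal in this message, and the helper names
(parse_abi_flags, read_abi_flags, kernel_understands) are hypothetical.  A
missing file is treated as "no flags", so a new library would keep working
with an old kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

enum abi_flags {
	COHERENT_USER_CQ = (1 << 0),
};

/* Parse the hex text read from the file; 0 on success, -1 on error. */
static int parse_abi_flags(const char *text, uint64_t *flags)
{
	char *end;
	unsigned long long v;

	assert(text != NULL && flags != NULL);
	errno = 0;
	v = strtoull(text, &end, 16);
	if (errno || end == text)
		return -1;
	*flags = (uint64_t) v;
	return 0;
}

/* Read the flags file; an absent file means an old kernel (flags = 0). */
static int read_abi_flags(const char *path, uint64_t *flags)
{
	char buf[32];
	FILE *f = fopen(path, "r");

	*flags = 0;
	if (!f)
		return 0;
	if (!fgets(buf, sizeof buf, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_abi_flags(buf, flags);
}

/* Nonzero if the kernel advertised the given flag. */
static int kernel_understands(uint64_t flags, uint64_t flag)
{
	return (flags & flag) != 0;
}
```

The library would call read_abi_flags() once at device open and pass
COHERENT_USER_CQ to the create-CQ command only when kernel_understands()
reports it.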

-- 
Arthur


From pasquale.davide at gmail.com  Fri Feb  2 15:17:45 2007
From: pasquale.davide at gmail.com (Davide Pasquale)
Date: Sat, 3 Feb 2007 00:17:45 +0100
Subject: [openib-general] OFED 1.1 build issue
In-Reply-To: <1169128895.31746.73017.camel@hal.voltaire.com>
References: <ef24263c0701120228j780ed2cfp9ec549e9d73acec1@mail.gmail.com>
	<20070112112201.GB2802@mellanox.co.il>
	<ef24263c0701180319k6d5ac590x92991858d9b7f487@mail.gmail.com>
	<1169123080.31746.67663.camel@hal.voltaire.com>
	<ef24263c0701180502l46bf26b2u5de42337dcdb2325@mail.gmail.com>
	<1169126162.31746.70598.camel@hal.voltaire.com>
	<ef24263c0701180552t4d84e84dkfd7279bc2f94060f@mail.gmail.com>
	<1169128895.31746.73017.camel@hal.voltaire.com>
Message-ID: <ef24263c0702021517l5b28b43eyee68989c84f47d53@mail.gmail.com>

Solved by upgrading the blade enclosure firmware to version 1.20!

Thanks.

On 18 Jan 2007 09:01:45 -0500, Hal Rosenstock <halr at voltaire.com> wrote:
>
> On Thu, 2007-01-18 at 08:52, Davide Pasquale wrote:
> > On 18 Jan 2007 08:19:34 -0500, Hal Rosenstock <halr at voltaire.com>
> > wrote:
> >         On Thu, 2007-01-18 at 08:02, Davide Pasquale wrote:
> >         >
> >         > On 18 Jan 2007 07:34:43 -0500, Hal Rosenstock
> >         <halr at voltaire.com>
> >         > wrote:
> >         >         On Thu, 2007-01-18 at 06:19, Davide Pasquale wrote:
> >         >         > Starting opensm I see this error in
> >         /var/log/osm.log:
> >         >         >
> >         >         > OpenSM Rev:openib-2.0.5 OpenIB svn Exported
> >         revision
> >         >         > Jan 18 12:11:39 628147 [95AA8160] ->
> >         osm_vendor_bind:
> >         >         Binding to port
> >         >         > 0x18feffff8c7a8d
> >         >         > Jan 18 12:11:39 629557 [95AA8160] ->
> >         osm_vendor_bind:
> >         >         Binding to port
> >         >         > 0x18feffff8c7a8d
> >         >         > Jan 18 12:11:39 630605 [41401960] -> SM port is
> >         down
> >         >         > Jan 18 12:11:39 630693 [41401960] ->
> >         >         __osm_sm_state_mgr_signal_error:
> >         >         > ERR 3207: Invalid signal OSM_SM_SIGNAL_DISCOVER in
> >         state
> >         >         > IB_SMINFO_STATE_DISCOVERING
> >         >         > Jan 18 12:11:49 631170 [41E02960] -> SM port is
> >         down
> >         >         > Jan 18 12:11:49 631238 [41E02960] ->
> >         >         __osm_sm_state_mgr_signal_error:
> >         >         > ERR 3207: Invalid signal OSM_SM_SIGNAL_DISCOVER in
> >         state
> >         >         > IB_SMINFO_STATE_DISCOVERING
> >         >         >
> >         >         > and the SM port is always down.
> >         >
> >         >         The error message is benign.
> >         >
> >         >         Is the SM port plugged into any other IB device ?
> >         >
> >         >         -- Hal
> >         >
> >         > Hi Hal,
> >         >
> >         > we are using HP Blade System and each blade has an
> >         infiniband card
> >         > onboard.
> >         > The SM port is plugged in the Infiniband switch internal to
> >         the blade
> >         > enclosure.
> >         > Is this information helpful for you ?
> >
> >         The port being down has nothing to do with SM operation. For
> >         some
> >         reason, there is no connectivity or negotiation between the
> >         blades and
> >         the switch.
> >
> >         -- Hal
> >
> >         >
> >         >
> >         >
> >         >
> >         >
> >
> >
> > Thanks!
> > What can I look to in order to solve this problem ?
>
> I don't know the HP blade system so the only thing I can say to try is
> to unseat and reseat all the blades (HCAs and switch(es)) to see if this
> resolves the problem. If it doesn't, I have no clue.
>
> -- Hal
>
> >
> > Regards,
> > Davide.
> >
> >
>
>

From sean.hefty at intel.com  Fri Feb  2 16:02:23 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Fri, 2 Feb 2007 16:02:23 -0800
Subject: [openib-general] [RFC] [PATCH] ib_usa: export multicast and
 informinfo registration to userspace
Message-ID: <000001c74726$94d0f500$e598070a@amr.corp.intel.com>

Export SA client capabilities for multicast and SA event registration
to userspace.  Multicast and event registration are tracked on a per
port basis, with tracking done by the ib_sa kernel module.

Based on feedback from the list, a new userspace SA module was added,
rather than trying to rework the usermad interface.  The user to kernel
interface is minimal, but was designed to be flexible enough to add
additional SA client support if needed.  (E.g. local SA cache lookup,
SA queries, service registration, etc.)

Signed-off-by: Sean Hefty <sean.hefty at intel.com>
---
The following patch is also available from the user_sa branch of my
rdma-dev.git tree, and is dependent on the informinfo branch/patch
posted earlier to the list.  (A couple of small fixes to the informinfo
code have been added since the original patches.)  A userspace sa library
is also available.

The informinfo and userspace support was completed as part of the
PathForward project at the request of the US National Laboratories.
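
As a rough illustration of the user-to-kernel framing used by the patch
below, here is a sketch of how a caller might build one command buffer.
The header layout mirrors struct ib_usa_cmd_hdr and the checks mirror those
in usa_write(); the names cmd_hdr, frame_cmd and check_frame are
hypothetical, and the payload is treated as opaque bytes because the
request structures are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as struct ib_usa_cmd_hdr in the patch. */
struct cmd_hdr {
	uint32_t cmd;
	uint16_t in;	/* bytes of command payload following the header */
	uint16_t out;	/* bytes available for the response */
};

/* Same order as the IB_USA_CMD_* enum in the patch; CMD_MAX is added here. */
enum { CMD_REQUEST, CMD_GET_EVENT, CMD_FREE, CMD_MAX };

/* Write header + payload into buf; returns total length, 0 if it won't fit. */
static size_t frame_cmd(void *buf, size_t buf_len, uint32_t cmd,
			const void *payload, uint16_t in_len, uint16_t out_len)
{
	struct cmd_hdr hdr = { cmd, in_len, out_len };
	size_t total = sizeof hdr + in_len;

	assert(buf != NULL);
	if (total > buf_len)
		return 0;
	memcpy(buf, &hdr, sizeof hdr);
	if (in_len)
		memcpy((char *) buf + sizeof hdr, payload, in_len);
	return total;
}

/* The validation usa_write() applies before dispatch; 0 = accepted. */
static int check_frame(const void *buf, size_t len)
{
	struct cmd_hdr hdr;

	if (len < sizeof hdr)
		return -1;
	memcpy(&hdr, buf, sizeof hdr);
	if (hdr.cmd >= CMD_MAX)
		return -1;
	if ((size_t) hdr.in + sizeof hdr > len)
		return -1;
	return 0;
}
```

A caller would pass the framed buffer to write() on the ib_usa misc device
in a single call; a short write length or an unknown command number is
rejected before any handler runs.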

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index 9edface..b5ffc78 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -18,15 +18,15 @@ config INFINIBAND_USER_MAD
 	  need libibumad from <http://www.openib.org>.
 
 config INFINIBAND_USER_ACCESS
-	tristate "InfiniBand userspace access (verbs and CM)"
+	tristate "InfiniBand userspace access (verbs, CM, SA client)"
 	depends on INFINIBAND
 	---help---
 	  Userspace InfiniBand access support.  This enables the
-	  kernel side of userspace verbs and the userspace
-	  communication manager (CM).  This allows userspace processes
-	  to set up connections and directly access InfiniBand
+	  kernel side of userspace verbs, the userspace communication
+	  manager (CM), and userspace SA client.  This allows userspace
+	  processes to set up connections and directly access InfiniBand
 	  hardware for fast-path operations.  You will also need
-	  libibverbs, libibcm and a hardware driver library from
+	  libibverbs, libibcm, libibsa, and a hardware driver library from
 	  <http://www.openib.org>.
 
 config INFINIBAND_ADDR_TRANS
diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 2e9c4b2..e89cf2e 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -4,7 +4,7 @@ user_access-$(CONFIG_INFINIBAND_ADDR_TRANS)	:= rdma_ucm.o
 obj-$(CONFIG_INFINIBAND) +=		ib_core.o ib_mad.o ib_sa.o \
 					ib_cm.o iw_cm.o $(infiniband-y)
 obj-$(CONFIG_INFINIBAND_USER_MAD) +=	ib_umad.o
-obj-$(CONFIG_INFINIBAND_USER_ACCESS) +=	ib_uverbs.o ib_ucm.o \
+obj-$(CONFIG_INFINIBAND_USER_ACCESS) +=	ib_uverbs.o ib_ucm.o ib_usa.o \
 					$(user_access-y)
 
 ib_core-y :=			packer.o ud_header.o verbs.o sysfs.o \
@@ -28,5 +28,7 @@ ib_umad-y :=			user_mad.o
 
 ib_ucm-y :=			ucm.o
 
+ib_usa-y :=			usa.o
+
 ib_uverbs-y :=			uverbs_main.o uverbs_cmd.o uverbs_mem.o \
 				uverbs_marshall.o
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 172a450..771f52a 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -464,6 +464,46 @@ static const struct ib_field notice_table[] = {
 	  .size_bits    = 128 },
 };
 
+int ib_sa_pack_attr(void *dst, void *src, int attr_id)
+{
+	switch (attr_id) {
+	case IB_SA_ATTR_MC_MEMBER_REC:
+		ib_pack(mcmember_rec_table, ARRAY_SIZE(mcmember_rec_table),
+			src, dst);
+		break;
+	case IB_SA_ATTR_INFORM_INFO:
+		ib_pack(inform_table, ARRAY_SIZE(inform_table), src, dst);
+		break;
+	case IB_SA_ATTR_NOTICE:
+		ib_pack(notice_table, ARRAY_SIZE(notice_table), src, dst);
+		break;
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+EXPORT_SYMBOL(ib_sa_pack_attr);
+
+int ib_sa_unpack_attr(void *dst, void *src, int attr_id)
+{
+	switch (attr_id) {
+	case IB_SA_ATTR_MC_MEMBER_REC:
+		ib_unpack(mcmember_rec_table, ARRAY_SIZE(mcmember_rec_table),
+			  src, dst);
+		break;
+	case IB_SA_ATTR_INFORM_INFO:
+		ib_unpack(inform_table, ARRAY_SIZE(inform_table), src, dst);
+		break;
+	case IB_SA_ATTR_NOTICE:
+		ib_unpack(notice_table, ARRAY_SIZE(notice_table), src, dst);
+		break;
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+EXPORT_SYMBOL(ib_sa_unpack_attr);
+
 static void free_sm_ah(struct kref *kref)
 {
 	struct ib_sa_sm_ah *sm_ah = container_of(kref, struct ib_sa_sm_ah, ref);
diff --git a/drivers/infiniband/core/usa.c b/drivers/infiniband/core/usa.c
new file mode 100644
index 0000000..ae05091
--- /dev/null
+++ b/drivers/infiniband/core/usa.c
@@ -0,0 +1,792 @@
+/*
+ * Copyright (c) 2006-2007 Intel Corporation.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *	copyright notice, this list of conditions and the following
+ *	disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *	copyright notice, this list of conditions and the following
+ *	disclaimer in the documentation and/or other materials
+ *	provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/completion.h>
+#include <linux/mutex.h>
+#include <linux/poll.h>
+#include <linux/idr.h>
+#include <linux/miscdevice.h>
+
+#include <rdma/ib_usa.h>
+
+MODULE_AUTHOR("Sean Hefty");
+MODULE_DESCRIPTION("IB userspace SA");
+MODULE_LICENSE("Dual BSD/GPL");
+
+static void usa_add_one(struct ib_device *device);
+static void usa_remove_one(struct ib_device *device);
+
+static struct ib_client usa_client = {
+	.name   = "ib_usa",
+	.add    = usa_add_one,
+	.remove = usa_remove_one
+};
+
+struct usa_device {
+	struct list_head list;
+	struct ib_device *device;
+	struct completion comp;
+	atomic_t refcount;
+	int start_port;
+	int end_port;
+};
+
+struct usa_file {
+	struct mutex		file_mutex;
+	struct file		*filp;
+	struct ib_sa_client	sa_client;
+	struct list_head	event_list;
+	struct list_head	id_list;
+	wait_queue_head_t	poll_wait;
+	int			event_id;
+};
+
+struct usa_id {
+	struct usa_file *file;
+	struct usa_device *dev;
+	struct list_head list;
+	u64 uid;
+	int num;
+	int events_reported;
+	u16 attr_id;
+};
+
+struct usa_event {
+	struct usa_id *id;
+	struct list_head list;
+	struct ib_usa_event_resp resp;
+};
+
+struct usa_multicast {
+	struct usa_id id;
+	struct usa_event event;
+	struct ib_sa_multicast *multicast;
+};
+
+struct usa_inform_info {
+	struct usa_id id;
+	struct ib_inform_info *inform_info;
+};
+
+static DEFINE_MUTEX(usa_mutex);
+static LIST_HEAD(dev_list);
+static DEFINE_IDR(usa_idr);
+
+static struct usa_device *get_dev(__be64 guid, __u8 port_num)
+{
+	struct usa_device *dev;
+
+	mutex_lock(&usa_mutex);
+	list_for_each_entry(dev, &dev_list, list) {
+		if (dev->device->node_guid == guid) {
+			if (port_num < dev->start_port ||
+			    port_num > dev->end_port)
+				break;
+			atomic_inc(&dev->refcount);
+			mutex_unlock(&usa_mutex);
+			return dev;
+		}
+	}
+	mutex_unlock(&usa_mutex);
+	return NULL;
+}
+
+static void put_dev(struct usa_device *dev)
+{
+	if (atomic_dec_and_test(&dev->refcount))
+		complete(&dev->comp);
+}
+
+static int insert_id(struct usa_id *id)
+{
+	int ret;
+
+	do {
+		ret = idr_pre_get(&usa_idr, GFP_KERNEL);
+		if (!ret)
+			break;
+
+		mutex_lock(&usa_mutex);
+		ret = idr_get_new(&usa_idr, id, &id->num);
+		mutex_unlock(&usa_mutex);
+	} while (ret == -EAGAIN);
+
+	return ret;
+}
+
+static void remove_id(struct usa_id *id)
+{
+	mutex_lock(&usa_mutex);
+	idr_remove(&usa_idr, id->num);
+	mutex_unlock(&usa_mutex);
+}
+
+static struct usa_id *get_id(int num, struct usa_file *file, u16 attr_id)
+{
+	struct usa_id *id;
+
+	id = idr_find(&usa_idr, num);
+	if (!id)
+		return ERR_PTR(-ENOENT);
+
+	if ((id->file != file) || (id->attr_id != attr_id))
+		return ERR_PTR(-EINVAL);
+
+	return id;
+}
+
+static void insert_file_id(struct usa_file *file, struct usa_id *id)
+{
+	mutex_lock(&file->file_mutex);
+	list_add_tail(&id->list, &file->id_list);
+	mutex_unlock(&file->file_mutex);
+}
+
+static void remove_file_id(struct usa_file *file, struct usa_id *id)
+{
+	mutex_lock(&file->file_mutex);
+	list_del(&id->list);
+	mutex_unlock(&file->file_mutex);
+}
+
+static void finish_event(struct usa_event *event)
+{
+	switch (be16_to_cpu(event->resp.attr_id)) {
+	case IB_SA_ATTR_MC_MEMBER_REC:
+		list_del_init(&event->list);
+		event->id->events_reported++;
+		break;
+	default:
+		list_del(&event->list);
+		if (event->id)
+			event->id->events_reported++;
+		kfree(event);
+		break;
+	}
+}
+
+static ssize_t usa_get_event(struct usa_file *file, const char __user *inbuf,
+			      int in_len, int out_len)
+{
+	struct ib_usa_get_event cmd;
+	struct usa_event *event;
+	int ret = 0;
+	DEFINE_WAIT(wait);
+
+	if (out_len < sizeof(event->resp))
+		return -ENOSPC;
+
+	if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+		return -EFAULT;
+
+	mutex_lock(&file->file_mutex);
+	while (list_empty(&file->event_list)) {
+		mutex_unlock(&file->file_mutex);
+
+		if (file->filp->f_flags & O_NONBLOCK)
+			return -EAGAIN;
+
+		if (wait_event_interruptible(file->poll_wait,
+					     !list_empty(&file->event_list)))
+			return -ERESTARTSYS;
+
+		mutex_lock(&file->file_mutex);
+	}
+
+	event = list_entry(file->event_list.next, struct usa_event, list);
+
+	if (copy_to_user((void __user *)(unsigned long)cmd.response,
+			 &event->resp, sizeof(event->resp))) {
+		ret = -EFAULT;
+		goto done;
+	}
+
+	finish_event(event);
+done:
+	mutex_unlock(&file->file_mutex);
+	return ret;
+}
+
+static void queue_event(struct usa_file *file, struct usa_event *event)
+{
+	mutex_lock(&file->file_mutex);
+	list_move_tail(&event->list, &file->event_list);
+	wake_up_interruptible(&file->poll_wait);
+	mutex_unlock(&file->file_mutex);
+}
+
+/*
+ * We can get up to two events for a single multicast member.  A second event
+ * only occurs if there's an error on an existing multicast membership.
+ * Report only the last event.
+ */
+static int multicast_handler(int status, struct ib_sa_multicast *multicast)
+{
+	struct usa_multicast *mcast = multicast->context;
+	struct usa_file *file = mcast->id.file;
+
+	mcast->event.resp.status = status;
+	if (!status) {
+		mcast->event.resp.data_len = IB_SA_ATTR_MC_MEMBER_REC_LEN;
+		ib_sa_pack_attr(mcast->event.resp.data, &multicast->rec,
+				IB_SA_ATTR_MC_MEMBER_REC);
+	}
+
+	queue_event(file, &mcast->event);
+	return 0;
+}
+
+static int join_mcast(struct usa_file *file, struct ib_usa_request *req,
+		      int out_len)
+{
+	struct usa_multicast *mcast;
+	struct ib_sa_mcmember_rec rec;
+	int ret;
+
+	if (out_len < sizeof(u32))
+		return -ENOSPC;
+
+	mcast = kzalloc(sizeof *mcast, GFP_KERNEL);
+	if (!mcast)
+		return -ENOMEM;
+
+	mcast->id.dev = get_dev(req->node_guid, req->port_num);
+	if (!mcast->id.dev) {
+		ret = -ENODEV;
+		goto err1;
+	}
+
+	if (copy_from_user(mcast->event.resp.data,
+			   (void __user *) (unsigned long) req->attr,
+			   IB_SA_ATTR_MC_MEMBER_REC_LEN)) {
+		ret = -EFAULT;
+		goto err2;
+	}
+
+	INIT_LIST_HEAD(&mcast->event.list);
+	mcast->event.id = &mcast->id;
+	mcast->event.resp.attr_id = cpu_to_be16(IB_SA_ATTR_MC_MEMBER_REC);
+	mcast->event.resp.uid = req->uid;
+	mcast->id.attr_id = IB_SA_ATTR_MC_MEMBER_REC;
+	mcast->id.uid = req->uid;
+
+	ret = insert_id(&mcast->id);
+	if (ret)
+		goto err2;
+
+	mcast->event.resp.id = mcast->id.num;
+	if (copy_to_user((void __user *) (unsigned long) req->response,
+			 &mcast->id.num, sizeof(u32))) {
+		ret = -EFAULT;
+		goto err3;
+	}
+
+	mcast->id.file = file;
+	insert_file_id(file, &mcast->id);
+
+	ib_sa_unpack_attr(&rec, mcast->event.resp.data,
+			  IB_SA_ATTR_MC_MEMBER_REC);
+	mcast->multicast = ib_sa_join_multicast(&file->sa_client,
+						mcast->id.dev->device,
+						req->port_num, &rec,
+						(ib_sa_comp_mask) req->comp_mask,
+						GFP_KERNEL, multicast_handler,
+						mcast);
+	if (IS_ERR(mcast->multicast)) {
+		ret = PTR_ERR(mcast->multicast);
+		goto err4;
+	}
+
+	return 0;
+
+err4:
+	remove_file_id(file, &mcast->id);
+err3:
+	remove_id(&mcast->id);
+err2:
+	put_dev(mcast->id.dev);
+err1:
+	kfree(mcast);
+	return ret;
+}
+
+static int get_mcast(struct usa_file *file, struct ib_usa_request *req,
+		     int out_len)
+{
+	struct usa_device *dev;
+	struct ib_sa_mcmember_rec rec;
+	u8 mcmember_rec[IB_SA_ATTR_MC_MEMBER_REC_LEN];
+	int ret;
+
+	if (out_len < IB_SA_ATTR_MC_MEMBER_REC_LEN)
+		return -ENOSPC;
+
+	if (req->comp_mask != IB_SA_MCMEMBER_REC_MGID)
+		return -ENOSYS;
+
+	if (copy_from_user(mcmember_rec,
+			   (void __user *) (unsigned long) req->attr,
+			   IB_SA_ATTR_MC_MEMBER_REC_LEN))
+		return -EFAULT;
+
+	dev = get_dev(req->node_guid, req->port_num);
+	if (!dev)
+		return -ENODEV;
+
+	ib_sa_unpack_attr(&rec, mcmember_rec, IB_SA_ATTR_MC_MEMBER_REC);
+	ret = ib_sa_get_mcmember_rec(dev->device, req->port_num,
+				     &rec.mgid, &rec);
+	if (!ret) {
+		ib_sa_pack_attr(mcmember_rec, &rec, IB_SA_ATTR_MC_MEMBER_REC);
+		if (copy_to_user((void __user *) (unsigned long) req->response,
+				 mcmember_rec, IB_SA_ATTR_MC_MEMBER_REC_LEN))
+			ret = -EFAULT;
+	}
+
+	put_dev(dev);
+	return ret;
+}
+
+static int process_mcast(struct usa_file *file, struct ib_usa_request *req,
+			 int out_len)
+{
+	/* Only indirect requests are currently supported. */
+	if (!req->local)
+		return -ENOSYS;
+
+	switch (req->method) {
+	case IB_MGMT_METHOD_GET:
+		return get_mcast(file, req, out_len);
+	case IB_MGMT_METHOD_SET:
+		return join_mcast(file, req, out_len);
+	default:
+		return -EINVAL;
+	}
+}
+
+static int notice_handler(int status, struct ib_inform_info *info,
+			  struct ib_sa_notice *notice)
+{
+	struct usa_inform_info *inform = info->context;
+	struct usa_file *file = inform->id.file;
+	struct usa_event *event;
+
+	event = kzalloc(sizeof *event, GFP_KERNEL);
+	if (!event)
+		return 0;
+
+	event->resp.uid = inform->id.uid;
+	event->id = &inform->id;
+	event->resp.status = status;
+	INIT_LIST_HEAD(&event->list);
+
+	if (notice) {
+		event->resp.attr_id = cpu_to_be16(IB_SA_ATTR_NOTICE);
+		event->resp.data_len = IB_SA_ATTR_NOTICE_LEN;
+		ib_sa_pack_attr(event->resp.data, notice, IB_SA_ATTR_NOTICE);
+	} else
+		event->resp.attr_id = cpu_to_be16(IB_SA_ATTR_INFORM_INFO);
+
+	queue_event(file, event);
+	return 0;
+}
+
+static int reg_inform(struct usa_file *file, struct ib_usa_request *req,
+		      int out_len)
+{
+	struct usa_inform_info *inform;
+	struct ib_sa_inform sa_inform_info;
+	u8 net_inform_info[IB_SA_ATTR_INFORM_INFO_LEN];
+	u16 trap_number;
+	int ret;
+
+	if (out_len < sizeof(u32))
+		return -ENOSPC;
+
+	if (copy_from_user(&net_inform_info,
+			   (void __user *) (unsigned long) req->attr,
+			   IB_SA_ATTR_INFORM_INFO_LEN))
+		return -EFAULT;
+
+	inform = kzalloc(sizeof *inform, GFP_KERNEL);
+	if (!inform)
+		return -ENOMEM;
+
+	inform->id.dev = get_dev(req->node_guid, req->port_num);
+	if (!inform->id.dev) {
+		ret = -ENODEV;
+		goto err1;
+	}
+
+	inform->id.attr_id = IB_SA_ATTR_INFORM_INFO;
+	inform->id.uid = req->uid;
+
+	ret = insert_id(&inform->id);
+	if (ret)
+		goto err2;
+
+	if (copy_to_user((void __user *) (unsigned long) req->response,
+			 &inform->id.num, sizeof(u32))) {
+		ret = -EFAULT;
+		goto err3;
+	}
+
+	inform->id.file = file;
+	insert_file_id(file, &inform->id);
+
+	ib_sa_unpack_attr(&sa_inform_info, &net_inform_info,
+			  IB_SA_ATTR_INFORM_INFO);
+	trap_number = be16_to_cpu(sa_inform_info.trap.generic.trap_num);
+	inform->inform_info =
+		ib_sa_register_inform_info(&file->sa_client,
+					   inform->id.dev->device,
+					   req->port_num, trap_number,
+					   GFP_KERNEL, notice_handler,
+					   inform);
+	if (IS_ERR(inform->inform_info)) {
+		ret = PTR_ERR(inform->inform_info);
+		goto err4;
+	}
+
+	return 0;
+
+err4:
+	remove_file_id(file, &inform->id);
+err3:
+	remove_id(&inform->id);
+err2:
+	put_dev(inform->id.dev);
+err1:
+	kfree(inform);
+	return ret;
+}
+
+static int process_inform(struct usa_file *file, struct ib_usa_request *req,
+			  int out_len)
+{
+	/* Only indirect requests are currently supported. */
+	if (!req->local)
+		return -ENOSYS;
+
+	if (req->method != IB_MGMT_METHOD_SET)
+		return -EINVAL;
+
+	return reg_inform(file, req, out_len);
+}
+
+static ssize_t usa_request(struct usa_file *file, const char __user *inbuf,
+			   int in_len, int out_len)
+{
+	struct ib_usa_request req;
+
+	if (copy_from_user(&req, inbuf, sizeof(req)))
+		return -EFAULT;
+
+	switch (be16_to_cpu(req.attr_id)) {
+	case IB_SA_ATTR_MC_MEMBER_REC:
+		return process_mcast(file, &req, out_len);
+	case IB_SA_ATTR_INFORM_INFO:
+		return process_inform(file, &req, out_len);
+	default:
+		return -EINVAL;
+	}
+}
+
+static void *cleanup_mcast(struct usa_id *id)
+{
+	struct usa_multicast *mcast;
+
+	mcast = container_of(id, struct usa_multicast, id);
+	ib_sa_free_multicast(mcast->multicast);
+
+	mutex_lock(&id->file->file_mutex);
+	list_del(&id->list);
+	list_del(&mcast->event.list);
+	mutex_unlock(&id->file->file_mutex);
+
+	return mcast;
+}
+
+static void *cleanup_inform(struct usa_id *id)
+{
+	struct usa_inform_info *inform;
+
+	inform = container_of(id, struct usa_inform_info, id);
+	ib_sa_unregister_inform_info(inform->inform_info);
+
+	mutex_lock(&id->file->file_mutex);
+	list_del(&id->list);
+	/* TODO cleanup events */
+	mutex_unlock(&id->file->file_mutex);
+
+	return inform;
+}
+
+static int free_id(struct usa_id *id)
+{
+	void *free_obj;
+	int events_reported;
+
+	switch (id->attr_id) {
+	case IB_SA_ATTR_MC_MEMBER_REC:
+		free_obj = cleanup_mcast(id);
+		break;
+	case IB_SA_ATTR_INFORM_INFO:
+		free_obj = cleanup_inform(id);
+		break;
+	default:
+		free_obj = NULL;
+		break;
+	}
+
+	events_reported = id->events_reported;
+	put_dev(id->dev);
+	kfree(free_obj);
+
+	return events_reported;
+}
+
+static ssize_t usa_free(struct usa_file *file, const char __user *inbuf,
+			int in_len, int out_len)
+{
+	struct ib_usa_free cmd;
+	struct ib_usa_free_resp resp;
+	struct usa_id *id;
+	int ret = 0;
+
+	if (out_len < sizeof(resp))
+		return -ENOSPC;
+
+	if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+		return -EFAULT;
+
+	mutex_lock(&usa_mutex);
+	id = get_id(cmd.id, file, be16_to_cpu(cmd.attr_id));
+	if (!IS_ERR(id))
+		idr_remove(&usa_idr, id->num);
+	mutex_unlock(&usa_mutex);
+	if (IS_ERR(id))
+		return PTR_ERR(id);
+	resp.events_reported = free_id(id);
+	if (copy_to_user((void __user *) (unsigned long) cmd.response,
+			 &resp, sizeof resp))
+		ret = -EFAULT;
+
+	return ret;
+}
+
+static ssize_t (*usa_cmd_table[])(struct usa_file *file,
+				   const char __user *inbuf,
+				   int in_len, int out_len) = {
+	[IB_USA_CMD_REQUEST]	= usa_request,
+	[IB_USA_CMD_GET_EVENT]	= usa_get_event,
+	[IB_USA_CMD_FREE]	= usa_free,
+};
+
+static ssize_t usa_write(struct file *filp, const char __user *buf,
+			 size_t len, loff_t *pos)
+{
+	struct usa_file *file = filp->private_data;
+	struct ib_usa_cmd_hdr hdr;
+	ssize_t ret;
+
+	if (len < sizeof(hdr))
+		return -EINVAL;
+
+	if (copy_from_user(&hdr, buf, sizeof(hdr)))
+		return -EFAULT;
+
+	if (hdr.cmd < 0 || hdr.cmd >= ARRAY_SIZE(usa_cmd_table))
+		return -EINVAL;
+
+	if (hdr.in + sizeof(hdr) > len)
+		return -EINVAL;
+
+	ret = usa_cmd_table[hdr.cmd](file, buf + sizeof(hdr), hdr.in, hdr.out);
+	if (!ret)
+		ret = len;
+
+	return ret;
+}
+
+static unsigned int usa_poll(struct file *filp, struct poll_table_struct *wait)
+{
+	struct usa_file *file = filp->private_data;
+	unsigned int mask = 0;
+
+	poll_wait(filp, &file->poll_wait, wait);
+
+	if (!list_empty(&file->event_list))
+		mask = POLLIN | POLLRDNORM;
+
+	return mask;
+}
+
+static int usa_open(struct inode *inode, struct file *filp)
+{
+	struct usa_file *file;
+
+	file = kmalloc(sizeof *file, GFP_KERNEL);
+	if (!file)
+		return -ENOMEM;
+
+	ib_sa_register_client(&file->sa_client);
+
+	INIT_LIST_HEAD(&file->event_list);
+	INIT_LIST_HEAD(&file->id_list);
+	init_waitqueue_head(&file->poll_wait);
+	mutex_init(&file->file_mutex);
+
+	filp->private_data = file;
+	file->filp = filp;
+	return 0;
+}
+
+static int usa_close(struct inode *inode, struct file *filp)
+{
+	struct usa_file *file = filp->private_data;
+	struct usa_id *id;
+
+	while (!list_empty(&file->id_list)) {
+		id = list_entry(file->id_list.next, struct usa_id, list);
+		remove_id(id);
+		free_id(id);
+	}
+	ib_sa_unregister_client(&file->sa_client);
+
+	kfree(file);
+	return 0;
+}
+
+static void usa_add_one(struct ib_device *device)
+{
+	struct usa_device *dev;
+
+	if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
+		return;
+
+	dev = kmalloc(sizeof *dev, GFP_KERNEL);
+	if (!dev)
+		return;
+
+	dev->device = device;
+	if (device->node_type == RDMA_NODE_IB_SWITCH)
+		dev->start_port = dev->end_port = 0;
+	else {
+		dev->start_port = 1;
+		dev->end_port = device->phys_port_cnt;
+	}
+
+	init_completion(&dev->comp);
+	atomic_set(&dev->refcount, 1);
+	ib_set_client_data(device, &usa_client, dev);
+
+	mutex_lock(&usa_mutex);
+	list_add_tail(&dev->list, &dev_list);
+	mutex_unlock(&usa_mutex);
+}
+
+static void usa_remove_one(struct ib_device *device)
+{
+	struct usa_device *dev;
+
+	dev = ib_get_client_data(device, &usa_client);
+	if (!dev)
+		return;
+
+	mutex_lock(&usa_mutex);
+	list_del(&dev->list);
+	mutex_unlock(&usa_mutex);
+
+	/* TODO: force immediate device removal */
+	put_dev(dev);
+	wait_for_completion(&dev->comp);
+	kfree(dev);
+}
+
+static struct file_operations usa_fops = {
+	.owner 	 = THIS_MODULE,
+	.open 	 = usa_open,
+	.release = usa_close,
+	.write	 = usa_write,
+	.poll    = usa_poll,
+};
+
+static struct miscdevice usa_misc = {
+	.minor	= MISC_DYNAMIC_MINOR,
+	.name	= "ib_usa",
+	.fops	= &usa_fops,
+};
+
+static ssize_t show_abi_version(struct device *dev,
+				struct device_attribute *attr,
+				char *buf)
+{
+	return sprintf(buf, "%d\n", IB_USA_ABI_VERSION);
+}
+static DEVICE_ATTR(abi_version, S_IRUGO, show_abi_version, NULL);
+
+static int __init usa_init(void)
+{
+	int ret;
+
+	ret = misc_register(&usa_misc);
+	if (ret)
+		return ret;
+
+	ret = device_create_file(usa_misc.this_device, &dev_attr_abi_version);
+	if (ret)
+		goto err1;
+
+	ret = ib_register_client(&usa_client);
+	if (ret)
+		goto err2;
+
+	return 0;
+
+err2:
+	device_remove_file(usa_misc.this_device, &dev_attr_abi_version);
+err1:
+	misc_deregister(&usa_misc);
+	return ret;
+}
+
+static void __exit usa_cleanup(void)
+{
+	ib_unregister_client(&usa_client);
+	device_remove_file(usa_misc.this_device, &dev_attr_abi_version);
+	misc_deregister(&usa_misc);
+	idr_destroy(&usa_idr);
+}
+
+module_init(usa_init);
+module_exit(usa_cleanup);
diff --git a/include/rdma/ib_sa.h b/include/rdma/ib_sa.h
index a8e5221..f36be98 100644
--- a/include/rdma/ib_sa.h
+++ b/include/rdma/ib_sa.h
@@ -557,4 +557,7 @@ ib_sa_register_inform_info(struct ib_sa_client *client,
  */
 void ib_sa_unregister_inform_info(struct ib_inform_info *info);
 
+int ib_sa_pack_attr(void *dst, void *src, int attr_id);
+int ib_sa_unpack_attr(void *dst, void *src, int attr_id);
+
 #endif /* IB_SA_H */
diff --git a/include/rdma/ib_usa.h b/include/rdma/ib_usa.h
new file mode 100644
index 0000000..0180cab
--- /dev/null
+++ b/include/rdma/ib_usa.h
@@ -0,0 +1,97 @@
+/*
+ * Copyright (c) 2006-2007 Intel Corporation.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef IB_USA_H
+#define IB_USA_H
+
+#include <linux/types.h>
+#include <rdma/ib_sa.h>
+
+#define IB_USA_ABI_VERSION	1
+
+#define IB_USA_EVENT_DATA	256
+
+enum {
+	IB_USA_CMD_REQUEST,
+	IB_USA_CMD_GET_EVENT,
+	IB_USA_CMD_FREE
+};
+
+enum {
+	IB_SA_ATTR_NOTICE_LEN = 80,
+	IB_SA_ATTR_INFORM_INFO_LEN = 36,
+	IB_SA_ATTR_MC_MEMBER_REC_LEN = 52
+};
+
+struct ib_usa_cmd_hdr {
+	__u32 cmd;
+	__u16 in;
+	__u16 out;
+};
+
+struct ib_usa_request {
+	__u64 response;
+	__u64 uid;
+	__u64 node_guid;
+	__u64 comp_mask;
+	__u64 attr;
+	__be16 attr_id;
+	__u8  method;
+	__u8  port_num;
+	__u8  local;
+};
+
+struct ib_usa_free {
+	__u64 response;
+	__u32 id;
+	__be16 attr_id;
+};
+
+struct ib_usa_free_resp {
+	__u32 events_reported;
+};
+
+struct ib_usa_get_event {
+	__u64 response;
+};
+
+struct ib_usa_event_resp {
+	__u64 uid;
+	__u32 id;
+	__u32 status;
+	__u32 data_len;
+	__be16 attr_id;
+	__u16 reserved;
+	__u8  data[IB_USA_EVENT_DATA];
+};
+
+#endif /* IB_USA_H */


From pradeep at us.ibm.com  Fri Feb  2 16:31:32 2007
From: pradeep at us.ibm.com (Pradeep Satyanarayana)
Date: Fri, 2 Feb 2007 16:31:32 -0800
Subject: [openib-general] IPoIB CM for merge?
In-Reply-To: <1170431671.26115.25.camel@stevo-desktop>
Message-ID: <OFFB2434C6.DFAD6AC8-ON88257277.000217E9-88257277.0002E42F@us.ibm.com>

Hello Michael,

Here are a few more observations:

1. For the SRQ case, the skbs and receive buffers are posted during init,
even before the rx_qp is created. This causes a problem (at least for
non-SRQs) on the ehca. We need to call ipoib_cm_alloc_skb() and
ipoib_cm_post_receive() after the rx_qp is in the RTR state.

2. I also found that ipoib_cm_create_rx_qp() needs to initialize
.cap.max_recv_wr and .cap.max_recv_sge. Otherwise this leads to
problems such as RQ overflows, which cause communication failures.
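
To illustrate point 2, here is a minimal userspace model, not the driver
code. The field names max_recv_wr and max_recv_sge come from the message
above; the types rx_cap and rx_queue and the function rx_post() are
hypothetical stand-ins. It only shows why a receive queue whose limits were
left at zero rejects every post, while one with explicit limits accepts
requests until it is full.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the receive-side capability fields of a QP.
 * Only the two field names are taken from the message above. */
struct rx_cap {
	unsigned int max_recv_wr;   /* receive work requests the queue can hold */
	unsigned int max_recv_sge;  /* scatter/gather entries per request */
};

/* Hypothetical receive queue: limits plus a count of outstanding requests. */
struct rx_queue {
	struct rx_cap cap;
	unsigned int posted;
};

/* Post one receive request with num_sge scatter/gather entries.
 * Returns 0 on success, -1 if the request does not fit the queue. */
int rx_post(struct rx_queue *q, unsigned int num_sge)
{
	assert(q != NULL);
	if (num_sge == 0 || num_sge > q->cap.max_recv_sge)
		return -1;              /* SGE count outside the configured limit */
	if (q->posted >= q->cap.max_recv_wr)
		return -1;              /* receive queue overflow */
	q->posted++;
	return 0;
}
```

With both limits left at zero, rx_post() fails on the first call. With
max_recv_wr = 2 and max_recv_sge = 1, two posts succeed and the third
reports an overflow.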

Pradeep
pradeep at us.ibm.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070202/fa34555d/attachment.html>

From chrisw at sous-sol.org  Fri Feb  2 18:35:15 2007
From: chrisw at sous-sol.org (Chris Wright)
Date: Fri, 02 Feb 2007 18:35:15 -0800
Subject: [openib-general] [patch 11/59] [stable] [PATCH] IB/mthca: Fix
 off-by-one in FMR handling on memfree
References: <20070203023504.435051000@sous-sol.org>
Message-ID: <20070203023916.739906000@sous-sol.org>

An embedded and charset-unspecified text was scrubbed...
Name: ib-mthca-fix-off-by-one-in-fmr-handling-on-memfree.patch
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070202/f42abc25/attachment.ksh>

From xma at us.ibm.com  Fri Feb  2 20:58:37 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Fri, 2 Feb 2007 20:58:37 -0800
Subject: [openib-general] Multicast join group failure prevents IPoIB
	performing
Message-ID: <OFBE0190D5.DADA4E85-ON87257277.0019E0B7-88257276.00733D42@us.ibm.com>


When bringing the IPoIB interface up, I hit a default group multicast join
failure. (Could this be fixed in the SM setup?)
ib0: multicast join failed for xxxx, status -22
The interface was then UP but not RUNNING, so the nodes couldn't ping each
other. I think the interface should be UP and RUNNING even if some
multicast joins fail. I would like to provide a patch if
there is no objection. Please advise.

Thanks
Shirley Ma
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070202/57bedbbb/attachment.html>

From eitan at sw053.yok.mtl.com  Fri Feb  2 21:28:01 2007
From: eitan at sw053.yok.mtl.com (Eitan Zahavi)
Date: Sat, 3 Feb 2007 07:28:01 +0200
Subject: [openib-general] nightly osm_sim report 2007-02-03:normal completion
Message-ID: <200702030528.l135S13O000650@sw053.yok.mtl.com>

OSM Simulation Regression Summary
OpenSM rev = Fri_Feb_2_09:16:30_2007 db386c 
ibutils rev = Wed_Jan_3_11:42:12_2007 913448 
Total=410 Pass=410 Fail=0

Pass:
30 Stability IS1-16.topo
30 Pkey IS1-16.topo
30 OsmTest IS1-16.topo
30 OsmStress IS1-16.topo
30 Multicast IS1-16.topo
30 LidMgr IS1-16.topo
10 Stability IS3-loop.topo
10 Stability IS3-128.topo
10 Pkey IS3-128.topo
10 OsmTest IS3-loop.topo
10 OsmTest IS3-128.topo
10 OsmStress IS3-128.topo
10 Multicast IS3-loop.topo
10 Multicast IS3-128.topo
10 LidMgr IS3-128.topo
10 FatTree part-4-ary-3-tree.topo
10 FatTree merge-roots-reorder-4-ary-2-tree.topo
10 FatTree merge-roots-4-ary-2-tree.topo
10 FatTree merge-root-4-ary-3-tree.topo
10 FatTree merge-root-12-ary-2-tree.topo
10 FatTree merge-2-ary-4-tree.topo
10 FatTree half-4-ary-3-tree.topo
10 FatTree blend-4-ary-2-tree.topo
10 FatTree 4-ary-4-tree.topo
10 FatTree 4-ary-3-tree.topo
10 FatTree 32nodes-3lvl-is1.topo
10 FatTree 2-ary-4-tree.topo
10 FatTree 12-node-spaced.topo
10 FatTree 12-ary-2-tree.topo

Failures:


From vlad at lists.openfabrics.org  Sat Feb  3 02:21:53 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Sat,  3 Feb 2007 02:21:53 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070203-0200 daily build status
Message-ID: <20070203102154.36F92E607F9@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod 

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.13
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.17
Passed on ia64 with linux-2.6.18
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.15
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.13
Passed on ppc64 with linux-2.6.19
Passed on powerpc with linux-2.6.18
Passed on ia64 with linux-2.6.19
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.12
Passed on ppc64 with linux-2.6.14
Passed on powerpc with linux-2.6.16
Passed on powerpc with linux-2.6.15
Passed on powerpc with linux-2.6.12
Passed on ia64 with linux-2.6.16
Passed on ppc64 with linux-2.6.16
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.13
Passed on ia64 with linux-2.6.13
Passed on ppc64 with linux-2.6.18
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.15

Failed:
Build failed on ia64 with linux-2.6.16.21-0.8-default
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070203-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:380: error: implicit declaration of function ‘register_netevent_notifier’
/home/vlad/tmp/ofa_1_2_kernel-20070203-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: In function ‘addr_cleanup’:
/home/vlad/tmp/ofa_1_2_kernel-20070203-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:386: error: implicit declaration of function ‘unregister_netevent_notifier’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070203-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070203-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070203-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070203-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------


From halr at voltaire.com  Sat Feb  3 06:30:36 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 03 Feb 2007 09:30:36 -0500
Subject: [openib-general] OpenIB management libraries release 1.0.2
Message-ID: <1170513034.4525.15093.camel@hal.voltaire.com>

http://www.openfabrics.org/~halr/

md5sum
b9b4bdf899f1d0ff15e06915cd846a3a  libibcommon-1.0.2.tar.gz
2af3ff7e38a1f49fb7514660a9991c89  libibmad-1.0.2.tar.gz
7d7690abfe9b08c8240fbf0157653b90  libibumad-1.0.2.tar.gz


From xma at us.ibm.com  Sat Feb  3 08:54:41 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Sat, 3 Feb 2007 09:54:41 -0700
Subject: [openib-general] Multicast join group failure prevents IPoIB
 performing
In-Reply-To: <OFBE0190D5.DADA4E85-ON87257277.0019E0B7-88257276.00733D42@us.ibm.com>
Message-ID: <OFC5DE2831.45D42293-ON87257277.005B8016-88257277.0030F679@us.ibm.com>


According to IPoIB RFC 4391, section 5, once the IPoIB broadcast group has
been joined, the IPoIB link should be UP. It is ready for data transfer, so
the interface should be able to run for broadcast and unicast without
waiting for every multicast join to succeed (for example, the all-hosts
group join). Here is a patch that lets the IPoIB interface run without
waiting for all multicast joins to succeed:

diff -urpN ipoib/ipoib_multicast.c ipoib-patch/ipoib_multicast.c
--- ipoib/ipoib_multicast.c   2006-11-29 13:57:37.000000000 -0800
+++ ipoib-patch/ipoib_multicast.c   2007-02-03 00:52:23.000000000 -0800
@@ -566,6 +566,7 @@ void ipoib_mcast_join_task(void *dev_ptr

      if (!test_bit(IPOIB_MCAST_FLAG_ATTACHED, &priv->broadcast->flags)) {
            ipoib_mcast_join(dev, priv->broadcast, 0);
+           netif_carrier_on(dev);
            return;
      }

@@ -599,7 +600,6 @@ void ipoib_mcast_join_task(void *dev_ptr
      ipoib_dbg_mcast(priv, "successfully joined all multicast groups\n");

      clear_bit(IPOIB_MCAST_RUN, &priv->flags);
-     netif_carrier_on(dev);
 }

 int ipoib_mcast_start_thread(struct net_device *dev)

(See attached file: multicast.patch)

http://www.rfc-editor.org/rfc/rfc4391.txt

5.  Setting Up an IPoIB Link

   The broadcast-GID, as defined in the previous section, MUST be set up
   for an IPoIB subnet to be formed.  Every IPoIB interface MUST
   "FullMember" join the IB multicast group defined by the broadcast-
   GID.  This multicast group will henceforth be referred to as the
   broadcast group.  The join operation returns the MTU, the Q_Key, and
   other parameters associated with the broadcast group.  The node then
   associates the parameters received as a result of the join operation
   with its IPoIB interface.  The broadcast group also serves to provide
   a link-layer broadcast service for protocols like ARP, net-directed,
   subnet-directed, and all-subnets-directed broadcasts in IPv4 over IB
   networks.

   The join operation is successful only if the Subnet Manager (SM)
   determines that the joining node can support the MTU registered with
   the broadcast group [RFC4392] ensuring support for a common link MTU.
   The SM also ensures that all the nodes joining the broadcast-GID have
   paths to one another and can therefore send and receive unicast
   packets.  It further ensures that all the nodes do indeed form a
   multicast tree that allows packets sent from any member to be
   replicated to every other member.   Thus, the IPoIB link is formed by
   the IPoIB nodes joining the broadcast group.  There is no physical
   demarcation of the IPoIB link other than that determined by the
   broadcast group membership.


Shirley Ma


  From:    Shirley Ma/Beaverton/IBM@IBMUS
  Sent by: openib-general-bounces at openib.org
  To:      openib-general at openib.org
  Date:    02/02/07 08:58 PM
  Subject: [openib-general] Multicast join group failure prevents IPoIB
           performing

When bringing IPoIB interface up, I hit default group multicast join
failure. (This could be fixed in SM set up?)
ib0: multicast join failed for xxxx, status -22
Then the interface was UP but not RUNNING. So the nodes couldn't ping each
other. I think the right behavior of the interface should be UP and RUNNING
even with some multicast join failure. I would like to provide a patch if
there is no problem. Please advise.

Thanks
Shirley Ma
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070203/dcec05be/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: multicast.patch
Type: application/octet-stream
Size: 684 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070203/dcec05be/attachment.obj>

From bugzilla-daemon at lists.openfabrics.org  Sat Feb  3 23:07:21 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Sat,  3 Feb 2007 23:07:21 -0800 (PST)
Subject: [openib-general] [Bug 334] Problems with build
	OFED-1.1.1-ib_local_sa
In-Reply-To: <bug-334-1@https.bugs.openfabrics.org/>
Message-ID: <20070204070721.CAE32E607F9@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=334


------- Comment #14 from erezz at voltaire.com  2007-02-03 23:07 -------
(In reply to comment #13)
> I want to ask how I can apply the patch while the build.sh script runs.
> As far as I know, when I run build.sh my patched files are always overwritten
> by running rpm -i openib-1.1.src.rpm. How can I apply my patches, or do I
> need to wait for a new release?
> 

(In reply to comment #11)
> (In reply to comment #10)
> > What is the output of uname -r ? This is VERY important. Also, can you run 
> `cat /etc/issue` and send the results?
> > >        
> 
> As you can see in my first message, I gave my machine configuration:
> >The machine configuration:
> >Kernel: Linux 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 x86_64
> x86_64 x86_64 GNU/Linux
> >OS: SUSE Linux Enterprise Server 10 (x86_64)
> >gcc version: gcc (GCC) 4.1.0 (SUSE Linux)
> 
> Unfortunately my machine doesn't have the Linux version in /etc/issue because
> IT requirements do not permit it.

Why? OFED 1.1 expects that you don't change this file. This is how SuSE ships
it with SLES 10.

> I looked at the ofed_scripts/configure file
> and saw that configure needs the file /etc/issue to choose the right
> patches. I think that is not a good idea: it should first run
> cat /etc/*release* and look for the Linux version there, and only then
> (if necessary) check /etc/issue.
> 

I don't understand the problem.


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at lists.openfabrics.org  Sat Feb  3 23:14:59 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Sat,  3 Feb 2007 23:14:59 -0800 (PST)
Subject: [openib-general] [Bug 334] Problems with build
	OFED-1.1.1-ib_local_sa
In-Reply-To: <bug-334-1@https.bugs.openfabrics.org/>
Message-ID: <20070204071459.64292E607F9@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=334


erezz at voltaire.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |INVALID


------- Comment #15 from erezz at voltaire.com  2007-02-03 23:14 -------
(In reply to comment #13)
> I want to ask how I can apply the patch while the build.sh script runs.

I don't agree with your patch. It assumes that SLES 10 may be corrupted. OFED
should not try to support this. If you want to use this patch for your own
purposes, just apply it (manually) before running OFED build scripts. OFED's
backport patches mechanism is not suitable for such patches.

> As far as I know, when I run build.sh my patched files are always overwritten
> by running rpm -i openib-1.1.src.rpm. How can I apply my patches, or do I
> need to wait for a new release?
> 


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From ogerlitz at voltaire.com  Sun Feb  4 00:13:40 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Sun, 04 Feb 2007 10:13:40 +0200
Subject: [openib-general] Detecting when an RDMA writer process
 disappears
In-Reply-To: <45C2C7B1.7090204@evergrid.com>
References: <45C2C7B1.7090204@evergrid.com>
Message-ID: <45C595B4.3000700@voltaire.com>

Mike Heffner wrote:
> Is there any method by which a receiving process that is polling in 
> preregistered memory regions for data from a sender performing RDMA 
> writes, can detect if the sender is killed? Say by a SIGKILL signal? The 
> RC connection is setup using the RDMA CM and there do not appear to be 
> any CM events created on the event channel

If you have a process with a connected RDMA CM ID whose associated peer
process died, you should get a DISCONNECTED event. How do you verify that
there is no RDMA CM event present at the polling side?
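
One way for a busy-polling receiver to check for pending CM events without
blocking is to poll the event channel's file descriptor with a zero timeout.
The sketch below is generic POSIX code and deliberately does not link against
librdmacm; fd_has_pending_event() is a hypothetical helper name. The
assumption is that the descriptor passed in is the fd member of the
struct rdma_event_channel, and that when this returns 1 the caller goes on to
rdma_get_cm_event() and checks for RDMA_CM_EVENT_DISCONNECTED.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>   /* pipe(), write(): only used by the usage example */

/* Returns 1 if fd is readable right now, 0 if not, -1 on poll() error.
 * A timeout of 0 makes poll() return immediately, so this can be called
 * from inside a loop that is spinning on RDMA-written memory. */
int fd_has_pending_event(int fd)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
	int n;

	assert(fd >= 0);
	n = poll(&pfd, 1, 0);
	if (n < 0)
		return -1;
	return (n > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

The helper can be exercised with an ordinary pipe: it reports 0 while the
pipe is empty and 1 after a byte has been written to the other end.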

Or.


From ogerlitz at voltaire.com  Sun Feb  4 00:32:02 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Sun, 04 Feb 2007 10:32:02 +0200
Subject: [openib-general] ip_ib_mc_map?
In-Reply-To: <15ddcffd0702011518qf115aaey862ef168784e81ca@mail.gmail.com>
References: <1170275331.14294.1.camel@stevo-desktop>
	<45C1ABD0.5090404@voltaire.com>
	<1170325052.2716.229.camel@fc6.xsintricity.com>
	<15ddcffd0702011240l3c427bfcx6fcc7f7968fcf8b9@mail.gmail.com>
	<1170368361.2716.239.camel@fc6.xsintricity.com>
	<15ddcffd0702011518qf115aaey862ef168784e81ca@mail.gmail.com>
Message-ID: <45C59A02.6080900@voltaire.com>

Or Gerlitz wrote:
> On 2/2/07, Doug Ledford <dledford at redhat.com> wrote:
>> Yeah, I've got a setup, I just don't have any multicast tests that I
>> run.  Any test programs you have for multicast in particular would be
>> helpful.

> This is fairly simple to do: have some multicast traffic routed over
> an IPoIB subnet on two nodes, e.g. using
> 
> $ route add -net 224.0.0.0 netmask 255.0.0.0 dev ib0
> $ iperf -usB 224.5.5.5 -i 1

OK, verifying that the problem is gone by running a client/server test is
actually harder, since while the problem persists the data is simply moved on
the broadcast group... So basically, the first thing you want to do is set up
routing, then start an iperf server and see whether the netstack has computed
a correct IPoIB multicast hw address and instructed the device to use it.

> # iperf -usB 224.5.5.5 &

This is on U3: the stack correctly computed the hw addresses for 224.5.5.5
and 224.0.0.1.

> # ip maddr show ib0
> 5:      ib0
>         link  00:ff:ff:ff:ff:12:40:1b:00:00:00:00:00:00:00:00:00:05:05:05
>         link  00:ff:ff:ff:ff:12:40:1b:00:00:00:00:00:00:00:00:00:00:00:01
>         inet  224.5.5.5
>         inet  224.0.0.1
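
The two link addresses in that output follow a fixed layout, which can be
reproduced with a short userspace sketch. The function names below are
hypothetical and this is not the kernel routine; the byte layout is taken
from the output above (the P_Key bytes are left at zero, as they appear
there), with the low 28 bits of the IPv4 group address placed in the last
four bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define IPOIB_HW_ADDR_LEN 20
#define IPOIB_HW_STR_LEN  (IPOIB_HW_ADDR_LEN * 3)  /* "xx" + 19 * ":xx" + NUL */

/* Fill buf with the 20-byte link-layer address for an IPv4 multicast
 * group given in host byte order (e.g. 0xE0050505 for 224.5.5.5). */
void ipv4_mcast_to_ipoib_hw(uint32_t group, uint8_t *buf)
{
	assert(buf != NULL);
	memset(buf, 0, IPOIB_HW_ADDR_LEN);
	buf[1] = 0xff;                   /* bytes 1-3: multicast QPN */
	buf[2] = 0xff;
	buf[3] = 0xff;
	buf[4] = 0xff;                   /* start of the multicast GID */
	buf[5] = 0x12;                   /* flags and link-local scope */
	buf[6] = 0x40;                   /* IPv4 signature 0x401b */
	buf[7] = 0x1b;
	/* bytes 8-15 stay zero, as in the output above */
	buf[16] = (group >> 24) & 0x0f;  /* only the low 28 bits are kept */
	buf[17] = (group >> 16) & 0xff;
	buf[18] = (group >> 8) & 0xff;
	buf[19] = group & 0xff;
}

/* Format as colon-separated hex, the way `ip maddr show` prints it.
 * out must have room for IPOIB_HW_STR_LEN bytes. */
void ipoib_hw_to_str(const uint8_t *buf, char *out)
{
	size_t pos = 0;
	int i;

	assert(buf != NULL && out != NULL);
	for (i = 0; i < IPOIB_HW_ADDR_LEN; i++)
		pos += (size_t)snprintf(out + pos, IPOIB_HW_STR_LEN - pos,
					i ? ":%02x" : "%02x", buf[i]);
}
```

Comparing the string produced for 224.5.5.5 and 224.0.0.1 with the two
"link" lines above gives a quick check that a node's stack computed the
expected addresses.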

This is on U4: the stack did not compute any hw addresses for 224.5.5.5
and 224.0.0.1. The inet addresses come from /proc/net/igmp, which
means the stack is aware that this node joined these groups, but as we know
the ARPHRD_INFINIBAND case was removed from the code that computes a
multicast link-layer address...

> # ip maddr show ib0
> 8:      ib0
>         inet  224.5.5.5
>         inet  224.0.0.1

So basically, if your U5-staged node shows the same
# ip maddr show output as U3, we have made progress. Really verifying
that this traffic does not go over the broadcast group is a little bit
harder: you would need a third active IPoIB device (that is, another node
or a second running IPoIB device on the rx machine, e.g. ib1), run the
iperf multicast test and make sure the --rx counters-- of the third
device do not advance, whereas on U3 they would advance since all
mcast traffic goes on the broadcast channel.

Please let me know if you need any further clarifications on how to test 
this, and... thanks! for taking care of it.

Or.


From vlad at lists.openfabrics.org  Sun Feb  4 02:22:23 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Sun,  4 Feb 2007 02:22:23 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070204-0200 daily build status
Message-ID: <20070204102223.9F1DDE607F9@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod 

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.12
Passed on powerpc with linux-2.6.17
Passed on powerpc with linux-2.6.19
Passed on ia64 with linux-2.6.19
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.19
Passed on ppc64 with linux-2.6.18
Passed on x86_64 with linux-2.6.18
Passed on ia64 with linux-2.6.18
Passed on x86_64 with linux-2.6.12
Passed on ppc64 with linux-2.6.19
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.17
Passed on powerpc with linux-2.6.14
Passed on powerpc with linux-2.6.12
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.13
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.12
Passed on powerpc with linux-2.6.15
Passed on ia64 with linux-2.6.16
Passed on ppc64 with linux-2.6.14
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.16
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.14

Failed:
Build failed on ia64 with linux-2.6.16.21-0.8-default
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070204-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:380: error: implicit declaration of function ‘register_netevent_notifier’
/home/vlad/tmp/ofa_1_2_kernel-20070204-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: In function ‘addr_cleanup’:
/home/vlad/tmp/ofa_1_2_kernel-20070204-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:386: error: implicit declaration of function ‘unregister_netevent_notifier’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070204-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070204-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070204-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070204-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------


From mst at mellanox.co.il  Sun Feb  4 02:57:57 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 4 Feb 2007 12:57:57 +0200
Subject: [openib-general] IPoIB CM for merge?
In-Reply-To: <1170431671.26115.25.camel@stevo-desktop>
References: <1170431671.26115.25.camel@stevo-desktop>
Message-ID: <20070204105757.GC8630@mellanox.co.il>

> Quoting Steve Wise <swise at opengridcomputing.com>:
> Subject: Re: [openib-general] IPoIB CM for merge?
> 
> On Fri, 2007-02-02 at 13:15 +0200, Michael S. Tsirkin wrote:
> > > Quoting Roland Dreier <rdreier at cisco.com>:
> > > Subject: Re: IPoIB CM for merge?
> > > 
> > >  > Could you please spend some time reviewing IPoIB CM code?
> > >  > I am concerned about missing the 2.6.21 merge window.
> > > 
> > > Thanks for the reminder.
> > > 
> > > Can we trade?  Have you looked at the cxgb3 iwarp driver?  Any comments?
> > 
> > OK.
> > I am not sure I have the last version posted so I am going to go by what
> > is there in OFED git tree.
> > 
> > And I also only looked under drivers/infiniband/.
> > 
> > So, here are some questions: I looked in the archives and have not seen
> > these addressed. Maybe these can be answered and then I'll go from there?
> > Does this sound OK?
> > 
> > Files with names like
> > ./core/cxio_hal.c
> > ./core/cxio_hal.h
> > normally generate a fair bit of discussion which wasn't present here,
> > I did not guess everyone was just busy.
> > For example, why is there both struct iwch_cq and struct t3_cq?
> > 
> 
> The cxgb3/core code defines a low level interface to the RDMA bits of
> the T3 device. 
> 
> This code was originally a separate module (named cxio) that allowed
> other RDMA middleware layers to sit on top of the this core rdma module.
> At the time, there was RNIC-PI and OFA being developed.  So that is the
> history of this.  As per the first openib review (about a year ago) of
> this code I merged this core module into the cxgb3 module.  I left the
> file structure and names as-is because it was low priority IMO.
> 
> The t3_cq struct is the low level CQ structure used to manage both a HW
> accessed CQ and a SW CQ (needed to handle error cases and out of order
> completions). The iwch_cq struct contains the stuff needed to integrate
> with the OFA core and uverbs code. It contains a t3_cq inline.
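
The "contains a t3_cq inline" arrangement described above is the usual
embedded-struct pattern. A minimal sketch follows, with hypothetical
stand-in types (hw_cq for the low-level structure, verbs_cq for the wrapper)
and made-up fields; it only shows how code holding a pointer to the inner
structure can get back to the outer one.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the low-level, hardware-facing CQ state. */
struct hw_cq {
	unsigned int cqid;
	unsigned int rptr;
};

/* Stand-in for the provider-level CQ: it embeds the low-level
 * structure directly instead of pointing to it. */
struct verbs_cq {
	int cqe;            /* number of entries exposed to the upper layer */
	struct hw_cq hw;    /* embedded low-level CQ */
};

/* Recover the wrapper from a pointer to its embedded member,
 * the same idea as the kernel's container_of(). */
struct verbs_cq *to_verbs_cq(struct hw_cq *hw)
{
	assert(hw != NULL);
	return (struct verbs_cq *)((char *)hw - offsetof(struct verbs_cq, hw));
}
```

Because the inner structure lives inside the outer one, the conversion is
pure pointer arithmetic and needs no extra allocation or lookup.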

So now that there's a common module, there's no technical reason for
the two-level structure to exist? I would say you want to at least
move the files into a common directory.

I think you will also find that for datapath operations such as poll cq,
converting a completion from the hardware format to struct t3_cqe, and from
that to ib_wc, adds a non-trivial amount of overhead.


> > File tcb.h comment says:
> > /* This file is automatically generated --- do not edit */
> > This looks like a GPL violation, does it not?
> > 
> 
> I can add the license if that's what you mean.

I mean that this file does not seem to be the source, in the GPL sense.
The following comes from COPYING under linux source directory:

	The source code for a work means the preferred form of the work for
	making modifications to it.  For an executable work, complete source
	code means all the source code for all modules it contains, plus any
	associated interface definition files, plus the scripts used to
	control compilation and installation of the executable.

So I think you must make the actual source available under the terms of GPL.

> > What's the deal with the naming convention?
> > Is there a reason in cxgb3, some files start with iwch and some with cxio?
> > How about using cxgb3 prefix all over?
> 
> The cxio_ prefix is used for the low-level functions/types that talk
> directly with the HW.  iwch_ is the provider driver functions that
> interface with the OFA stack.  I'd rather not change the names.
> Especially since this has already gone through several review cycles.
> I'm hoping we can get this in and improve it with subsequent
> submissions.  Is that reasonable?


-- 
MST


From monis at voltaire.com  Sun Feb  4 04:21:02 2007
From: monis at voltaire.com (Moni Shoua)
Date: Sun, 04 Feb 2007 14:21:02 +0200
Subject: [openib-general] IB/mthca: question about HCA profile module
	parameters
In-Reply-To: <45C1C3D5.1050301@dev.mellanox.co.il>
References: <45C1C3D5.1050301@dev.mellanox.co.il>
Message-ID: <45C5CFAE.9000302@voltaire.com>

Dotan Barak wrote:
> Hi Moni.
> 
> I tried to use the mthca module parameter: for example i tried to change
> the number of QPs.
> 
> I got several failures when i used the HCA 25204:
> * sometimes i got the following error message (when using big values,
> for example 512K QPs):
> ib_mthca: 0000:0c: INIT_HCA command failed aborting.
> ib_mthca: probe of 0000:0c: failed with error -16
> * when i tried to use small amount of QPs (1024) the machine just hanged
> and i noticed a kernel oops message on the console
> 
> 
> Did you verify the HCA profile module parameter feature?
> Is there any known limitation on the values that should be used?
> (for example: only values which are power of two)
> 
> 
> thanks
> Dotan
> 

Hi Dotan,
I verified the profile feature up to the level of successful modprobe.
I am working now to look into your report.
thanks


From mst at mellanox.co.il  Sun Feb  4 04:58:20 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 4 Feb 2007 14:58:20 +0200
Subject: [openib-general] IPoIB connected mode review comments
In-Reply-To: <1170443893.26115.59.camel@stevo-desktop>
References: <1170443893.26115.59.camel@stevo-desktop>
Message-ID: <20070204125820.GA14288@mellanox.co.il>

> Quoting Steve Wise <swise at opengridcomputing.com>:
> Subject: IPoIB connected mode review comments
> 
> On Thu, 2007-02-01 at 20:48 -0800, Roland Dreier wrote:
> >  > Have you had a chance to review this?
> > 
> > Still on my list.
> > 
> > Can we trade?  Can you look at the IPoIB connected mode stuff in the
> > ipoib-cm branch in
> > 
> >     git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git
> > 
> > and let me know if you see anything you don't like?
> > 
> >  - R.
> 
> Here are my comments.  I'm not an ib cm expert though.  These are mostly
> questions:

Steve, thanks for looking at the code!
I hope the following answers your questions.

> 
> Since IPoIB is using IP addresses already, wouldn't it be simpler to use
> the rdma cm to setup connections?  

IPoIB is not using IP addresses. It uses hardware addresses as any network
device would. So using rdma cm does not make sense.

> Could you optimize this design and only signal some of the tx wrs?

This optimization would apply to UD mode too.
No one has come up with a way to do this cleanly so far.

> In ipoib_cm_send() you call ipoib_cm_skb_too_long() if the packet is too
> large for the interface mtu.  And you print a warning.  But
> ipoib_cm_skb_too_long() actually queues the packet for the cm case.  For
> ud it just drops the packet.  The skb task for cm then will send a
> ICMP_DEST_UNREACH for these packets.  Why the difference?

For UD I just kept the current behaviour - I think
this can actually only happen in case of a race when packet was queued
before MTU was changed, so the originator was already notified of
the MTU change by the stack above us.

For CM the local MTU may exceed the size of a buffer that was posted on
the remote QP. So we need to send ICMP_DEST_UNREACH to reduce the
originator's dest MTU to whatever this QP actually can support.
Since this needs the original skb, and must be done from task or bh context,
we queue the skb and handle it in task context.

> Also if this
> packet came from the local stack via a local application, you don't want
> to send  DEST_UNREACH, right?  (I'm probably just confused about the
> purpose of this).

Yes, sending DEST_UNREACH does not seem to affect local interface.  That's why
I call update_pmtu too.  It is also good to update the MTU ASAP to reduce the
number of packets that are dropped - and update_pmtu can be called from
atomic context. I do not know how to tell the packet is from local
stack and it does not seem to do any harm to handle all packets in a uniform
manner.

net/ipv4/ip_gre.c and net/ipv4/ipip.c are examples of code that do something
similar.

> In ipoib_cm_tx_completion() you rearm, then drain the cq.  I thought
> there was some reason that it was better to do drain/rearm/drain?
> Something about if you rearm and there's a cq entry mthca does another
> immediate interrupt?  

Again, this comment applies to UD mode as well.
AFAIK so far this worked best.

> In ipoib_cm_handle_tx_wc():
> 
> When can a tx completion happen with a wr_id that isn't within the
> ipoib_sendq_size range?  This looks like it is really a bug condition
> that should never happen.

Because of this:
	post_send(priv, tx, tx->tx_head & (ipoib_sendq_size - 1))
so wr_id is always within range.
Again, this is exactly the same logic as in UD case.
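
To illustrate the masking above, here is a minimal user-space sketch, not the
IPoIB code itself. It assumes the queue size is a power of two (this message
does not state that explicitly), and ring_slot() and is_power_of_two() are
hypothetical helper names introduced only for this example. Under that
assumption, masking a free-running head counter with (size - 1) can only
produce values in 0..size-1:

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero if n is a power of two (n > 0 with exactly one bit set). */
static int is_power_of_two(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Map a free-running head counter to a ring slot.
 * Hypothetical helper mirroring "tx_head & (ipoib_sendq_size - 1)".
 * The mask trick is only valid for power-of-two sizes, hence the assert.
 */
static uint32_t ring_slot(uint32_t head, uint32_t size)
{
	assert(is_power_of_two(size));
	return head & (size - 1);
}
```

The head counter may grow and wrap freely; only its low bits select the slot,
so a value derived this way cannot fall outside the queue.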

> I see the same code in the rx completion path too.  

It's even simpler there:
+       for (i = 0; i < ipoib_recvq_size; ++i) {

...

+               if (ipoib_cm_post_receive(dev, i)) {

...

+               }
+       }

So i is always within RX size range.

> Also, what's up with the /* FIXME */ comment?

Since I have QPs which I never post send WRs on, I should be able to set
.cap.max_send_wr to 0 and .cap.max_send_sge should not matter.

However, low level drivers do not seem to support this at the moment, so
I set these to 1 for now - this is also correct but has a small memory cost. 

> You lock the priv->lock inside of the priv->tx_lock.  Is this ordering
> correct and consistent across all the code?

Yes, that's the nesting rule.

> ipoib_cm_handle_rx_wc() - what's up with the XXX comment?

We have the same comment in UD code - that's where this comes from.
Basically we don't have an easy way to know the correct packet type,
and always setting it to PACKET_HOST seems to work.

> What's the algorithm to keep enough buffers posted in the SRQ?

Same as with UD really - if I can't allocate a new skb I repost
the old one and increment the dropped packet counter.
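
A rough user-space sketch of that replenish rule follows. It is only an
illustration under stated assumptions: struct rx_slot, struct rx_stats,
rx_complete() and alloc_fail() are hypothetical names, plain memory buffers
stand in for skbs, and "posting" is modelled as storing the pointer in the slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct rx_slot {
	void *buf;			/* buffer currently posted for this slot */
};

struct rx_stats {
	unsigned long delivered;	/* packets passed up the stack */
	unsigned long dropped;		/* packets dropped for lack of memory */
};

/* Allocator hook so the out-of-memory path can be exercised. */
typedef void *(*alloc_fn)(size_t len);

/* Hypothetical allocator that always fails, to simulate memory pressure. */
static void *alloc_fail(size_t len)
{
	(void)len;
	return NULL;
}

/*
 * Handle one receive completion for slot.
 * Returns the buffer to pass up the stack, or NULL if the packet was dropped.
 */
static void *rx_complete(struct rx_slot *slot, struct rx_stats *stats,
			 alloc_fn alloc, size_t len)
{
	void *fresh;
	void *done;

	assert(slot && stats && alloc);

	fresh = alloc(len);
	if (!fresh) {
		/* No replacement: keep the old buffer posted, drop the packet. */
		stats->dropped++;
		return NULL;
	}

	done = slot->buf;	/* received data goes up the stack */
	slot->buf = fresh;	/* replacement is posted in its place */
	stats->delivered++;
	return done;
}
```

The point of the rule is that the number of posted buffers never shrinks: an
allocation failure costs one packet, not one receive slot.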


-- 
MST


From mst at mellanox.co.il  Sun Feb  4 05:06:06 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 4 Feb 2007 15:06:06 +0200
Subject: [openib-general] IPoIB CM for merge?
In-Reply-To: <OFFB2434C6.DFAD6AC8-ON88257277.000217E9-88257277.0002E42F@us.ibm.com>
References: <OFFB2434C6.DFAD6AC8-ON88257277.000217E9-88257277.0002E42F@us.ibm.com>
Message-ID: <20070204130606.GB14288@mellanox.co.il>

> Quoting Pradeep Satyanarayana <pradeep at us.ibm.com>:
> Subject: Re: [openib-general] IPoIB CM for merge?
> 
> 
> Hello Michael, 
> 
> Here are a few more observations : 

Pradeep, I think you are posting in the wrong thread: it seems you are not
talking about my code, but rather about the project you mentioned of
implementing IPoIB CM without SRQ.

IPoIB CM currently falls back on UD mode for HCAs that do not support SRQ,
so there should be no problem for the ehca - as new code won't be activated.
As I said already, I do not see a clean way to address this limitation,
so I would rather have current IPoIB CM code merged upstream first, and think
about enhancements later.

> 
> 1. For the SRQ case, the skbs and receive buffers are posted during init and even before the rx_qp is created. This causes a problem (at least for non SRQs) for the ehca. We need to call the ipoib_cm_alloc_skb() and ipoib_cm_post_receive() after the rx_qp
> is in the RTR state. 
> 
> 2. Also found that in ipoib_cm_create_rx_qp() one needs to initialize .cap.max_recv_wr and .cap.max_recv_sge. Otherwise this leads to some problems like rq overflows and causing communication failures. 

Yes, I think these are some of the things that would need to be done to make IPoIB CM
work without SRQ. It is clearly not something we want to do for SRQ case however:
for example, posting WRs to SRQ during connection setup would race
against completion events for other connections. And assigning .cap.max_recv_wr > 0
for a QP connected to SRQ does not make sense, and might conceivably confuse
low level drivers.

-- 
MST


From mst at mellanox.co.il  Sun Feb  4 05:07:18 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 4 Feb 2007 15:07:18 +0200
Subject: [openib-general] ofa_1_2_kernel 20070202-0200 daily buildstatus
In-Reply-To: <1170430869.26115.12.camel@stevo-desktop>
References: <1170430869.26115.12.camel@stevo-desktop>
Message-ID: <20070204130718.GC14288@mellanox.co.il>

> Quoting Steve Wise <swise at opengridcomputing.com>:
> Subject: Re: [openib-general] ofa_1_2_kernel 20070202-0200 daily buildstatus
> 
> On Fri, 2007-02-02 at 02:20 -0800, vlad at lists.openfabrics.org wrote:
> > This email was generated automatically, please do not reply
> 
> Which distro is 2.6.16.21-0.8-default?  I'm sure I didn't do a netevent
> backport that.  

That's SLES10 actually.

> Failed:
> Build failed on ia64 with linux-2.6.16.21-0.8-default
> Log:
> /home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:380: error: implicit declaration of function ‘register_netevent_notifier’
> /home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: In function ‘addr_cleanup’:
> /home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:386: error: implicit declaration of function ‘unregister_netevent_notifier’
> make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1
> make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2
> make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2
> make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2
> make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default'
> make: *** [kernel] Error 2


-- 
MST


From mst at mellanox.co.il  Sun Feb  4 05:14:14 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 4 Feb 2007 15:14:14 +0200
Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update
	support
In-Reply-To: <1170429504.26115.1.camel@stevo-desktop>
References: <1170429504.26115.1.camel@stevo-desktop>
Message-ID: <20070204131414.GD14288@mellanox.co.il>

> Quoting Steve Wise <swise at opengridcomputing.com>:
> Subject: Re: [PATCH 00/12] ofed_1_2 - Neighbour update support
> 
> On Fri, 2007-02-02 at 08:03 +0200, Michael S. Tsirkin wrote:
> > > We could use a global refcnt to count the number of pending destructions
> > > and use a completion object to block unload until all the destructors
> > > fire and the refcnt goes to zero.
> > 
> > It has the same race as module refcnt. So just use that.
> > 
> 
> I don't understand the race.  Can you explain please?  This should be
> able to be done without a race with a refcnt, a spinlock, a bit saying
> we're unloading, and a completion object.
> 
> But maybe I'm confused ;-)

In short, the rule is that you can't pass a pointer to your function
to another module, and then unload the module safely without synchronizing
with that other module.

Simplified example:

destructor
{
	complete(&foo);
A:
	return;
}

module_cleanup
{
	wait_for_completion(&foo);
	return;
}

Now, assume destructor runs up to point A, then your module unloads,
and the memory its text occupied is overwritten by something else.
An attempt to execute code from point A will now crash.
So completion is not better than just module refcount here.

That said, I think the race is unlikely and just using module
refcount should be sufficient, and it's certainly simple.

-- 
MST


From mst at mellanox.co.il  Sun Feb  4 05:15:00 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 4 Feb 2007 15:15:00 +0200
Subject: [openib-general] [PATCH] RE:  regression in ofed 1.2
In-Reply-To: <1170355331.16637.25.camel@stevo-desktop>
References: <1170355331.16637.25.camel@stevo-desktop>
Message-ID: <20070204131500.GE14288@mellanox.co.il>

> Quoting Steve Wise <swise at opengridcomputing.com>:
> Subject: Re: [PATCH] RE:  regression in ofed 1.2
> 
> Um, now on rhel4u4 we crash creating the mcast workqueue.
> 
> The name is "ib_mcast_wq" which is too long for older kernels.
> 
> Did we lose a backport patch?

Not sure what happened here.
Sean, could you rename ib_mcast_wq to ib_mcast please?


-- 
MST


From mst at mellanox.co.il  Sun Feb  4 06:00:19 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 4 Feb 2007 16:00:19 +0200
Subject: [openib-general] [RFC] [PATCH] ib_usa: export multicast and
 informinfo registration to userspace
In-Reply-To: <000001c74726$94d0f500$e598070a@amr.corp.intel.com>
References: <000001c74726$94d0f500$e598070a@amr.corp.intel.com>
Message-ID: <20070204140019.GC18543@mellanox.co.il>

+static void usa_remove_one(struct ib_device *device)
+{
+	struct usa_device *dev;
+
+	dev = ib_get_client_data(device, &usa_client);
+	if (!dev)
+		return;
+
+	mutex_lock(&usa_mutex);
+	list_del(&dev->list);
+	mutex_unlock(&usa_mutex);
+
+	/* TODO: force immediate device removal */
+	put_dev(dev);
+	wait_for_completion(&dev->comp);
+	kfree(dev);
+}

I think we really need to address this TODO.
An application waiting for data from SA needs to get woken up and get
an error code indicating that the device was removed.

This is currently broken in umad, but let's do it correctly here.

-- 
MST


From mst at mellanox.co.il  Sun Feb  4 06:02:49 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 4 Feb 2007 16:02:49 +0200
Subject: [openib-general] Detecting when an RDMA writer process
	disappears
In-Reply-To: <45C595B4.3000700@voltaire.com>
References: <45C2C7B1.7090204@evergrid.com>
 <45C595B4.3000700@voltaire.com>
Message-ID: <20070204140249.GD18543@mellanox.co.il>

> Quoting Or Gerlitz <ogerlitz at voltaire.com>:
> Subject: Re: Detecting when an RDMA writer process disappears
> 
> Mike Heffner wrote:
> > Is there any method by which a receiving process that is polling in 
> > preregistered memory regions for data from a sender performing RDMA 
> > writes, can detect if the sender is killed? Say by a SIGKILL signal? The 
> > RC connection is setup using the RDMA CM and there do not appear to be 
> > any CM events created on the event channel
> 
> If you have a process with connected RDMA CM ID whose associated peer 
> process died you should get DISCONNECTED event. how do you verify that 
> there is no rdma cm event present at the polling side?

You may or may not get this event in case of packet loss - same as with sockets.
Sending keepalives is really the only way if you want to handle all
cases such as remote node crash.
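
One way the keepalive idea could look on the polling side is sketched below.
This is an assumption-laden illustration, not an existing API: it supposes the
sender periodically RDMA-writes an incrementing heartbeat counter into the
registered region, and struct peer_monitor, monitor_init() and monitor_poll()
are hypothetical names. Time is passed in by the caller in milliseconds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct peer_monitor {
	uint64_t last_seen;	/* last heartbeat value observed */
	uint64_t last_change;	/* local time (ms) when it last changed */
	uint64_t timeout_ms;	/* silence longer than this means "dead" */
};

static void monitor_init(struct peer_monitor *m, uint64_t now_ms,
			 uint64_t timeout_ms)
{
	assert(m != NULL);
	m->last_seen = 0;
	m->last_change = now_ms;
	m->timeout_ms = timeout_ms;
}

/*
 * Feed the heartbeat value currently visible in the polled region.
 * Returns 1 while the peer looks alive, 0 once it has been silent for
 * longer than timeout_ms. now_ms must not go backwards.
 */
static int monitor_poll(struct peer_monitor *m, uint64_t heartbeat,
			uint64_t now_ms)
{
	if (heartbeat != m->last_seen) {
		m->last_seen = heartbeat;
		m->last_change = now_ms;
		return 1;
	}
	return now_ms - m->last_change <= m->timeout_ms;
}
```

In a real receiver the heartbeat would be read from the registered memory
region in the same loop that polls for data, so a killed or crashed sender is
noticed even when no CM event ever arrives.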

-- 
MST


From vlad at mellanox.co.il  Sun Feb  4 06:34:25 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Sun, 04 Feb 2007 16:34:25 +0200
Subject: [openib-general] openib diags installation issue
Message-ID: <1170599665.5887.14.camel@vladsk-laptop>

Hi Hal,
I have the following issue while executing 'make DESTDIR=/var/tmp/OFED install':
See the patch below for fixing this issue.


 /usr/bin/install -c -m 644 './man/ibprintca.8' '/var/tmp/OFED/usr/local/ofed/share/man/man8/ibprintca.8'
 /usr/bin/install -c -m 644 './man/ibfindnodesusing.8' '/var/tmp/OFED/usr/local/ofed/share/man/man8/ibfindnodesusing.8'
make  install-data-hook
make[3]: Entering directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/management/diags'
for script in scripts/ibqueryerrors.pl scripts/ibswportwatch.pl scripts/iblinkinfo.pl scripts/ibprintswitch.pl scripts/ibprintca.pl scripts/ibfindnodesusing.pl; do \
                binname=`echo $script | sed -e "s/scripts\/\(.*\)/\1/"`; \
                cat $script | sed -e "s,use lib \"<prefix>\(/lib/perl\)\";,use lib \"/usr/local/ofed\1\";," > /usr/local/ofed/bin/$binname; \
                chmod 755 /usr/local/ofed/bin/$binname; \
        done
/bin/bash: line 2: /usr/local/ofed/bin/ibqueryerrors.pl: No such file or directory
chmod: cannot access `/usr/local/ofed/bin/ibqueryerrors.pl': No such file or directory
/bin/bash: line 2: /usr/local/ofed/bin/ibswportwatch.pl: No such file or directory
chmod: cannot access `/usr/local/ofed/bin/ibswportwatch.pl': No such file or directory
/bin/bash: line 2: /usr/local/ofed/bin/iblinkinfo.pl: No such file or directory
chmod: cannot access `/usr/local/ofed/bin/iblinkinfo.pl': No such file or directory
/bin/bash: line 2: /usr/local/ofed/bin/ibprintswitch.pl: No such file or directory
chmod: cannot access `/usr/local/ofed/bin/ibprintswitch.pl': No such file or directory
/bin/bash: line 2: /usr/local/ofed/bin/ibprintca.pl: No such file or directory
chmod: cannot access `/usr/local/ofed/bin/ibprintca.pl': No such file or directory
/bin/bash: line 2: /usr/local/ofed/bin/ibfindnodesusing.pl: No such file or directory
chmod: cannot access `/usr/local/ofed/bin/ibfindnodesusing.pl': No such file or directory
make[3]: *** [install-data-hook] Error 1
make[3]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/management/diags'
make[2]: *** [install-data-am] Error 2
make[2]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/management/diags'
make[1]: *** [install-am] Error 2
make[1]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/management/diags'
make: *** [install_diags] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.37589 (%install)


Patch for fixing the issue above:

diff --git a/diags/Makefile.am b/diags/Makefile.am
index 06b21fc..81ece28 100644
--- a/diags/Makefile.am
+++ b/diags/Makefile.am
@@ -150,9 +150,9 @@ dist-hook: diags.spec
 install-data-hook:
        for script in $(IB_SW_COUNT_DEPENDANT); do \
                binname=`echo $$script | sed -e "s/scripts\/\(.*\)/\1/"`; \
-               cat $$script | sed -e "s,use lib \"<prefix>\(/lib/perl\)\";,use lib \"$(prefix)\1\";," > $(bindir)/$$binname; \
-               chmod 755 $(bindir)/$$binname; \
+               cat $$script | sed -e "s,use lib \"<prefix>\(/lib/perl\)\";,use lib \"$(prefix)\1\";," > $(DESTDIR)$(bindir)/$$binname; \
+               chmod 755 $(DESTDIR)$(bindir)/$$binname; \
        done
-       $(top_srcdir)/config/install-sh -m 755 -d $(prefix)/lib/perl
-       $(top_srcdir)/config/install-sh -m 755 scripts/IBswcountlimits.pm $(prefix)/lib/perl
+       $(top_srcdir)/config/install-sh -m 755 -d $(DESTDIR)$(prefix)/lib/perl
+       $(top_srcdir)/config/install-sh -m 755 scripts/IBswcountlimits.pm $(DESTDIR)$(prefix)/lib/perl


From monis at voltaire.com  Sun Feb  4 06:57:14 2007
From: monis at voltaire.com (Moni Shoua)
Date: Sun, 04 Feb 2007 16:57:14 +0200
Subject: [openib-general] IB/mthca: question about HCA profile module
	parameters
In-Reply-To: <45C1C3D5.1050301@dev.mellanox.co.il>
References: <45C1C3D5.1050301@dev.mellanox.co.il>
Message-ID: <45C5F44A.9020802@voltaire.com>

Dotan Barak wrote:
> Hi Moni.
> 
> I tried to use the mthca module parameter: for example i tried to change
> the number of QPs.
> 
> I got several failures when i used the HCA 25204:
> * sometimes i got the following error message (when using big values,
> for example 512K QPs):
> ib_mthca: 0000:0c: INIT_HCA command failed aborting.
> ib_mthca: probe of 0000:0c: failed with error -16
> * when i tried to use small amount of QPs (1024) the machine just hanged
> and i noticed a kernel oops message on the console
> 
OK. So I ran more tests on my setup which now include
- Dual x86_64 processor (Intel Xeon)
- 1GB RAM
- 25204 HCA - fw_ver=1.1.0

In the range of 16K - to 256K of value for num_qp I got no errors.
For lower and higher values I got errors from INIT_HCA and (not always, and only for very low values) a machine hang.
Do you have the Oops saved somewhere? Can you put it here please?


> 
> Did you verify the HCA profile module parameter feature?
As I mentioned earlier, I verified that non default values can be assigned 
and that the HCA works for some selected values. 
I also noticed that illegal values cause the driver to write a message to the kernel log.
However, I didn't test the exact behaviour of all possible values for each profile variable.
> Is there any known limitation on the values that should be used?
> (for example: only values which are power of two)
> 
> 
I guess it is clear that there are hardware limitations that don't allow setting just any value.
Unfortunately, even after looking for them in the PRM, I couldn't figure out what they are.
The software limits the value to be a power of 2 and corrects the users if they try to set a wrong value (to the nearest power of 2). In that case a warning message is written to the kernel log.
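
As a rough illustration of that correction step, here is a user-space sketch.
It is not the mthca code: round_up_pow2() and fix_profile_val() are
hypothetical names, and the sketch rounds up to the next power of two, which
is only one possible reading of "nearest" (the driver's exact rule is not
shown in this thread).

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to the next power of two; v must be in 1..2^31. */
static uint32_t round_up_pow2(uint32_t v)
{
	assert(v != 0 && v <= 0x80000000u);
	v--;
	v |= v >> 1;	/* smear the highest set bit into all lower bits */
	v |= v >> 2;
	v |= v >> 4;
	v |= v >> 8;
	v |= v >> 16;
	return v + 1;
}

/*
 * Hypothetical validator: returns the corrected value and sets *changed
 * to 1 when the caller should log a warning, 0 otherwise.
 */
static uint32_t fix_profile_val(uint32_t requested, int *changed)
{
	uint32_t fixed = round_up_pow2(requested);

	*changed = (fixed != requested);
	return fixed;
}
```

With this rule a request for 1000 QPs would be corrected to 1024 and flagged
for a warning, while 65536 would pass through unchanged.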
> thanks
> Dotan
> 


From mst at mellanox.co.il  Sun Feb  4 06:59:58 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 4 Feb 2007 16:59:58 +0200
Subject: [openib-general] [PATCH 00/12] ofed_1_2 - Neighbour update
	support
In-Reply-To: <1170372217.16637.87.camel@stevo-desktop>
References: <1170372217.16637.87.camel@stevo-desktop>
Message-ID: <20070204145958.GA20087@mellanox.co.il>

> If you're worried about regressing straight rdma address
> translation, then you can call the address translation timer function
> synchronously in the snoop function like before and change the
> addr_trans module to not use netevents...

This seems the prudent thing to do.
OK, I'll do that.

-- 
MST


From swise at opengridcomputing.com  Sun Feb  4 07:48:57 2007
From: swise at opengridcomputing.com (Steve WIse)
Date: Sun, 04 Feb 2007 09:48:57 -0600
Subject: [openib-general] [PATCH] ofed_1_2 Cleanup RHEL4U4 netevent
 backport]
In-Reply-To: <1170360441.16637.41.camel@stevo-desktop>
References: <1170360441.16637.41.camel@stevo-desktop>
Message-ID: <1170604137.4129.13.camel@linux-q667.site>

Vlad/Michael,

I'm still tracking this as an outstanding patch.  Have you pulled this
in yet?

Thanks,

Steve.


On Thu, 2007-02-01 at 14:07 -0600, Steve Wise wrote:
> From: Steve Wise <swise at opengridcomputing.com>
> 
> Add skbuff.h to include list for RHEL4U4 netevent.c file.  This makes
> it identical to the SLES9SP3 file.
> 
> Signed-off-by: Steve Wise <swise at opengridcomputing.com>
> ---
> 
>  .../backport/2.6.9_U4/include/src/netevent.c       |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel_addons/backport/2.6.9_U4/include/src/netevent.c b/kernel_addons/backport/2.6.9_U4/include/src/netevent.c
> index 1589300..87fb55c 100644
> --- a/kernel_addons/backport/2.6.9_U4/include/src/netevent.c
> +++ b/kernel_addons/backport/2.6.9_U4/include/src/netevent.c
> @@ -13,6 +13,7 @@
>   *	Fixes:
>   */
>  
> +#include <linux/skbuff.h>
>  #include <linux/rtnetlink.h>
>  #include <linux/notifier.h>
>  #include <linux/mutex.h>
> 
> 
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> 


From swise at opengridcomputing.com  Sun Feb  4 07:49:41 2007
From: swise at opengridcomputing.com (Steve WIse)
Date: Sun, 04 Feb 2007 09:49:41 -0600
Subject: [openib-general] [PATCH] ofed_1_2 Chelsio ethernet driver
 updates.
In-Reply-To: <1170360543.16637.45.camel@stevo-desktop>
References: <1170360543.16637.45.camel@stevo-desktop>
Message-ID: <1170604181.4129.15.camel@linux-q667.site>

Vlad/Michael,

I'm still tracking this as an outstanding patch.  Can you pull it in
please?

Thanks,

Steve.


On Thu, 2007-02-01 at 14:09 -0600, Steve Wise wrote:
> From: Steve Wise <swise at opengridcomputing.com>
> 
> This patch updates the ofed_1_2 cxgb3 module to the latest queued
> for 2.6.21.
> 
> Signed-off-by: Steve Wise <swise at opengridcomputing.com>
> ---
> 
>  drivers/net/cxgb3/firmware_exports.h |    2 +-
>  drivers/net/cxgb3/sge.c              |   21 +++++++++------------
>  drivers/net/cxgb3/t3_cpl.h           |    3 ---
>  3 files changed, 10 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/net/cxgb3/firmware_exports.h b/drivers/net/cxgb3/firmware_exports.h
> index 4538377..6a835f6 100755
> --- a/drivers/net/cxgb3/firmware_exports.h
> +++ b/drivers/net/cxgb3/firmware_exports.h
> @@ -129,7 +129,7 @@ #define FW_OFLD_NUM			8
>  #define FW_OFLD_SGEEC_START		0
>  
>  /*
> - *
> + * 
>   */
>  #define FW_RI_NUM			1
>  #define FW_RI_SGEEC_START		65527
> diff --git a/drivers/net/cxgb3/sge.c b/drivers/net/cxgb3/sge.c
> index 6b053bf..3f2cf8a 100755
> --- a/drivers/net/cxgb3/sge.c
> +++ b/drivers/net/cxgb3/sge.c
> @@ -601,17 +601,16 @@ static struct sk_buff *get_packet(struct
>  	if (len <= SGE_RX_COPY_THRES) {
>  		skb = alloc_skb(len, GFP_ATOMIC);
>  		if (likely(skb != NULL)) {
> -			struct rx_desc *d = &fl->desc[fl->cidx];
> -			dma_addr_t mapping =
> -			    (dma_addr_t)((u64) be32_to_cpu(d->addr_hi) << 32 |
> -					 be32_to_cpu(d->addr_lo));
> -
>  			__skb_put(skb, len);
> -			pci_dma_sync_single_for_cpu(adap->pdev, mapping, len,
> -						    PCI_DMA_FROMDEVICE);
> +			pci_dma_sync_single_for_cpu(adap->pdev,
> +						    pci_unmap_addr(sd,
> +								   dma_addr),
> +						    len, PCI_DMA_FROMDEVICE);
>  			memcpy(skb->data, sd->skb->data, len);
> -			pci_dma_sync_single_for_device(adap->pdev, mapping, len,
> -						       PCI_DMA_FROMDEVICE);
> +			pci_dma_sync_single_for_device(adap->pdev,
> +						       pci_unmap_addr(sd,
> +								      dma_addr),
> +						       len, PCI_DMA_FROMDEVICE);
>  		} else if (!drop_thres)
>  			goto use_orig_buf;
>  	      recycle:
> @@ -1667,7 +1666,7 @@ #endif
>  	credits = G_RSPD_TXQ0_CR(flags);
>  	if (credits)
>  		qs->txq[TXQ_ETH].processed += credits;
> -	
> +
>  	credits = G_RSPD_TXQ2_CR(flags);
>  	if (credits)
>  		qs->txq[TXQ_CTRL].processed += credits;
> @@ -2220,14 +2219,12 @@ static irqreturn_t t3b_intr_napi(int irq
>  	if (likely(map & 1)) {
>  		dev = adap->sge.qs[0].netdev;
>  
> -		BUG_ON(napi_is_scheduled(dev));
>  		if (likely(__netif_rx_schedule_prep(dev)))
>  			__netif_rx_schedule(dev);
>  	}
>  	if (map & 2) {
>  		dev = adap->sge.qs[1].netdev;
>  
> -		BUG_ON(napi_is_scheduled(dev));
>  		if (likely(__netif_rx_schedule_prep(dev)))
>  			__netif_rx_schedule(dev);
>  	}
> diff --git a/drivers/net/cxgb3/t3_cpl.h b/drivers/net/cxgb3/t3_cpl.h
> index 96b2f36..b7a1a31 100755
> --- a/drivers/net/cxgb3/t3_cpl.h
> +++ b/drivers/net/cxgb3/t3_cpl.h
> @@ -184,9 +184,6 @@ #define V_OPCODE(x) ((x) << S_OPCODE)
>  #define G_OPCODE(x) (((x) >> S_OPCODE) & 0xFF)
>  #define G_TID(x)    ((x) & 0xFFFFFF)
>  
> -#define S_QNUM 0
> -#define G_QNUM(x) (((x) >> S_QNUM) & 0xFFFF)
> -
>  /* tid is assumed to be 24-bits */
>  #define MK_OPCODE_TID(opcode, tid) (V_OPCODE(opcode) | (tid))
>  
> 


From mst at mellanox.co.il  Sun Feb  4 07:52:44 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 4 Feb 2007 17:52:44 +0200
Subject: [openib-general] [PATCH] ofed_1_2 Cleanup RHEL4U4
 neteventbackport]
In-Reply-To: <1170604137.4129.13.camel@linux-q667.site>
References: <1170604137.4129.13.camel@linux-q667.site>
Message-ID: <20070204155244.GC20087@mellanox.co.il>

No, but it really makes sense. Vlad?

Quoting Steve WIse <swise at opengridcomputing.com>:
Subject: Re: [openib-general] [PATCH] ofed_1_2 Cleanup RHEL4U4 neteventbackport]

Vlad/Michael,

I'm still tracking this as an outstanding patch.  Have you pulled this
in yet?

Thanks,

Steve.


On Thu, 2007-02-01 at 14:07 -0600, Steve Wise wrote:
> From: Steve Wise <swise at opengridcomputing.com>
> 
> Add skbuff.h to include list for RHEL4U4 netevent.c file.  This makes
> it identical to the SLES9SP3 file.
> 
> Signed-off-by: Steve Wise <swise at opengridcomputing.com>
> ---
> 
>  .../backport/2.6.9_U4/include/src/netevent.c       |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel_addons/backport/2.6.9_U4/include/src/netevent.c b/kernel_addons/backport/2.6.9_U4/include/src/netevent.c
> index 1589300..87fb55c 100644
> --- a/kernel_addons/backport/2.6.9_U4/include/src/netevent.c
> +++ b/kernel_addons/backport/2.6.9_U4/include/src/netevent.c
> @@ -13,6 +13,7 @@
>   *	Fixes:
>   */
>  
> +#include <linux/skbuff.h>
>  #include <linux/rtnetlink.h>
>  #include <linux/notifier.h>
>  #include <linux/mutex.h>
> 

-- 
MST


From mst at mellanox.co.il  Sun Feb  4 07:54:47 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 4 Feb 2007 17:54:47 +0200
Subject: [openib-general] [PATCH] ofed_1_2 Chelsio ethernet driver
	updates.
In-Reply-To: <1170604181.4129.15.camel@linux-q667.site>
References: <1170360543.16637.45.camel@stevo-desktop>
	<1170604181.4129.15.camel@linux-q667.site>
Message-ID: <20070204155447.GD20087@mellanox.co.il>

Vlad?

Quoting Steve WIse <swise at opengridcomputing.com>:
Subject: Re: [PATCH] ofed_1_2 Chelsio ethernet driver updates.

Vlad/Michael,

I'm still tracking this as an outstanding patch.  Can you pull it in
please?

Thanks,

Steve.


On Thu, 2007-02-01 at 14:09 -0600, Steve Wise wrote:
> From: Steve Wise <swise at opengridcomputing.com>
> 
> This patch updates the ofed_1_2 cxgb3 module to the latest queued
> for 2.6.21.
> 
> Signed-off-by: Steve Wise <swise at opengridcomputing.com>

-- 
MST


From swise at opengridcomputing.com  Sun Feb  4 07:57:57 2007
From: swise at opengridcomputing.com (Steve WIse)
Date: Sun, 04 Feb 2007 09:57:57 -0600
Subject: [openib-general] ofa_1_2_kernel 20070202-0200 daily buildstatus
In-Reply-To: <20070204130718.GC14288@mellanox.co.il>
References: <1170430869.26115.12.camel@stevo-desktop>
	<20070204130718.GC14288@mellanox.co.il>
Message-ID: <1170604677.4129.20.camel@linux-q667.site>

So it's building sles10 ok on all other platforms but ia64?  It seems
like it's not including the netevent.c file.  But that backport does
exist.  


On Sun, 2007-02-04 at 15:07 +0200, Michael S. Tsirkin wrote:
> > Quoting Steve Wise <swise at opengridcomputing.com>:
> > Subject: Re: [openib-general] ofa_1_2_kernel 20070202-0200 daily buildstatus
> > 
> > On Fri, 2007-02-02 at 02:20 -0800, vlad at lists.openfabrics.org wrote:
> > > This email was generated automatically, please do not reply
> > 
> > Which distro is 2.6.16.21-0.8-default?  I'm sure I didn't do a netevent
> > backport that.  
> 
> That's SLES10 actually.
> 
> > Failed:
> > Build failed on ia64 with linux-2.6.16.21-0.8-default
> > Log:
> > /home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:380: error: implicit declaration of function ‘register_netevent_notifier’
> > /home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: In function ‘addr_cleanup’:
> > /home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:386: error: implicit declaration of function ‘unregister_netevent_notifier’
> > make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1
> > make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2
> > make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2
> > make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2
> > make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default'
> > make: *** [kernel] Error 2
> 
> 


From swise at opengridcomputing.com  Sun Feb  4 08:14:33 2007
From: swise at opengridcomputing.com (Steve WIse)
Date: Sun, 04 Feb 2007 10:14:33 -0600
Subject: [openib-general] ofa_1_2_kernel 20070202-0200 daily buildstatus
In-Reply-To: <20070204130718.GC14288@mellanox.co.il>
References: <1170430869.26115.12.camel@stevo-desktop>
	<20070204130718.GC14288@mellanox.co.il>
Message-ID: <1170605673.4129.43.camel@linux-q667.site>

Michael,

You've set up a cross-compile environment on staging.openfabrics.org, eh?
How can I utilize that to resolve this issue?  Or is someone else
handling it?


Steve.


On Sun, 2007-02-04 at 15:07 +0200, Michael S. Tsirkin wrote:
> > Quoting Steve Wise <swise at opengridcomputing.com>:
> > Subject: Re: [openib-general] ofa_1_2_kernel 20070202-0200 daily buildstatus
> > 
> > On Fri, 2007-02-02 at 02:20 -0800, vlad at lists.openfabrics.org wrote:
> > > This email was generated automatically, please do not reply
> > 
> > Which distro is 2.6.16.21-0.8-default?  I'm sure I didn't do a netevent
> > backport that.  
> 
> That's SLES10 actually.
> 
> > Failed:
> > Build failed on ia64 with linux-2.6.16.21-0.8-default
> > Log:
> > /home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:380: error: implicit declaration of function ‘register_netevent_notifier’
> > /home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: In function ‘addr_cleanup’:
> > /home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:386: error: implicit declaration of function ‘unregister_netevent_notifier’
> > make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1
> > make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2
> > make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2
> > make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070202-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2
> > make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default'
> > make: *** [kernel] Error 2
> 
> 


From vlad at mellanox.co.il  Sun Feb  4 08:54:30 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Sun, 04 Feb 2007 18:54:30 +0200
Subject: [openib-general] [PATCH] ofed_1_2 Chelsio ethernet driver
 updates.
In-Reply-To: <1170360543.16637.45.camel@stevo-desktop>
References: <1170360543.16637.45.camel@stevo-desktop>
Message-ID: <1170608070.5887.15.camel@vladsk-laptop>

On Thu, 2007-02-01 at 14:09 -0600, Steve Wise wrote:
> From: Steve Wise <swise at opengridcomputing.com>
> 
> This patch updates the ofed_1_2 cxgb3 module to the latest queued
> for 2.6.21.
> 
> Signed-off-by: Steve Wise <swise at opengridcomputing.com>
> ---
> 
>  drivers/net/cxgb3/firmware_exports.h |    2 +-
>  drivers/net/cxgb3/sge.c              |   21 +++++++++------------
>  drivers/net/cxgb3/t3_cpl.h           |    3 ---
>  3 files changed, 10 insertions(+), 16 deletions(-)

Applied.

-- 
Vladimir Sokolovsky <vlad at mellanox.co.il>
Mellanox Technologies Ltd.


From mst at mellanox.co.il  Sun Feb  4 09:58:33 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 4 Feb 2007 19:58:33 +0200
Subject: [openib-general] idea for ofed 1 2 kernel file structure
Message-ID: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>

Hi!
I looked at the current ofed 1.2 kernel tree and there is 1 thing I dislike:
It is hard to see changes that are specific to OFED since we have whole
kernel history mixed in.
 
It would be easy to split OFED-specific files into a separate directory and
have OFED scripts combine that with upstream kernel.
 
All out of tree modules we distribute would go there too.
What do others think about this?
 
 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070204/3370276a/attachment.html>

From swise at opengridcomputing.com  Sun Feb  4 10:19:20 2007
From: swise at opengridcomputing.com (Steve WIse)
Date: Sun, 04 Feb 2007 12:19:20 -0600
Subject: [openib-general] idea for ofed 1 2 kernel file structure
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
Message-ID: <1170613160.4129.110.camel@linux-q667.site>

On Sun, 2007-02-04 at 19:58 +0200, Michael S. Tsirkin wrote:
> Hi!
> 
> I looked at the current ofed 1.2 kernel tree and there is 1 thing I
> dislike:
> 
> It is hard to see changes that are specific to OFED since we have
> whole kernel history mixed in.
> 
>  
> 
> It would be easy to split OFED-specific files into a separate directory and
> have OFED scripts combine that with upstream kernel.
> 
>  
> 
> All out of tree modules we distribute would go there too.
> 
> What do others think about this?
> 
>  

I'm not exactly clear what you mean.  Could you expand a little on your
idea?


From mst at mellanox.co.il  Sun Feb  4 10:27:59 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 4 Feb 2007 20:27:59 +0200
Subject: [openib-general] idea for ofed 1 2 kernel file structure
In-Reply-To: <1170613160.4129.110.camel@linux-q667.site>
References: <1170613160.4129.110.camel@linux-q667.site>
Message-ID: <20070204182759.GA28729@mellanox.co.il>

> Quoting Steve WIse <swise at opengridcomputing.com>:
> Subject: Re: idea for ofed 1 2 kernel file structure
> 
> On Sun, 2007-02-04 at 19:58 +0200, Michael S. Tsirkin wrote:
> > Hi!
> > 
> > I looked at the current ofed 1.2 kernel tree and there is 1 thing I
> > dislike:
> > 
> > It is hard to see changes that are specific to OFED since we have
> > whole kernel history mixed in.
> > 
> >  
> > 
> > It would be easy to split OFED-specific files into a separate directory and
> > have OFED scripts combine that with upstream kernel.
> > 
> >  
> > 
> > All out of tree modules we distribute would go there too.
> > 
> > What do others think about this?
> > 
> >  
> 
> I'm not exactly clear what you mean.  Could you expand a little on your
> idea?

Well, OFED kernel tree is currently kernel.org files + OFED files.
We could have OFED files in a separate tree and build script
would put them together.

-- 
MST


From swise at opengridcomputing.com  Sun Feb  4 11:43:27 2007
From: swise at opengridcomputing.com (Steve WIse)
Date: Sun, 04 Feb 2007 13:43:27 -0600
Subject: [openib-general] idea for ofed 1 2 kernel file structure
In-Reply-To: <20070204182759.GA28729@mellanox.co.il>
References: <1170613160.4129.110.camel@linux-q667.site>
	<20070204182759.GA28729@mellanox.co.il>
Message-ID: <1170618207.4129.118.camel@linux-q667.site>


> 
> Well, OFED kernel tree is currently kernel.org files + OFED files.
> We could have OFED files in a separate tree and build script
> would put them together.
> 

So the ofed_1_2 tree would become just new drivers/ulps that are not in
the kernel it's based on (2.6.20), kernel_patches/, kernel_addons/, and
ofed_scripts/.  Right?

I think that's a reasonable approach, and it keeps the kernel tree clean
and makes it very clear which files are added to any given kernel
release that ofed bases on.


my 2 cents.


Steve.


From mst at mellanox.co.il  Sun Feb  4 12:58:55 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 4 Feb 2007 22:58:55 +0200
Subject: [openib-general] idea for ofed 1 2 kernel file structure
In-Reply-To: <1170618207.4129.118.camel@linux-q667.site>
References: <1170613160.4129.110.camel@linux-q667.site>
	<20070204182759.GA28729@mellanox.co.il>
	<1170618207.4129.118.camel@linux-q667.site>
Message-ID: <20070204205855.GF29029@mellanox.co.il>

> Quoting Steve WIse <swise at opengridcomputing.com>:
> Subject: Re: idea for ofed 1 2 kernel file structure
> 
> 
> > 
> > Well, OFED kernel tree is currently kernel.org files + OFED files.
> > We could have OFED files in a separate tree and build script
> > would put them together.
> > 
> 
> So the ofed_1_2 tree would become just new drivers/ulps that are not in
> the kernel it's based on (2.6.20), kernel_patches/, kernel_addons/, and
> ofed_scripts/.  Right?

Yes.

> I think that's a reasonable approach, and it keeps the kernel tree clean
> and makes it very clear which files are added to any given kernel
> release that ofed bases on.

On the other hand, we are at feature freeze, so this is
acceptable only if this can be done with only minor changes
in Vlad's build scripts.

So I'll check with him.

-- 
MST


From dotanb at dev.mellanox.co.il  Mon Feb  5 01:31:34 2007
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Mon, 05 Feb 2007 11:31:34 +0200
Subject: [openib-general] IB/mthca: question about HCA profile module
	parameters
In-Reply-To: <45C5F44A.9020802@voltaire.com>
References: <45C1C3D5.1050301@dev.mellanox.co.il>
	<45C5F44A.9020802@voltaire.com>
Message-ID: <45C6F976.3000802@dev.mellanox.co.il>

Hi Moni, and thanks for the quick response.

Moni Shoua wrote:
> OK. So I ran more tests on my setup which now include
> - Dual x86_64 processor (Intel Xeon)
> - 1GB RAM
> - 25204 HCA - fw_ver=1.1.0
>
> In the range of 16K - to 256K of value for num_qp I got no errors.
> For lower and higher values I got errors from INIT_HCA and (not always and just for very low values) a machine hung.
> Do you have the Oops saved somewhere? Can you put it here please?
>
>   
Sorry, but I don't have a dump of the kernel oops, though I strongly
believe that we saw the same kernel oops ...
If it is needed, I will try to reproduce it one more time.
>   
>> Did you verify the HCA profile module parameter feature?
>>     
> As I mentioned earlier, I verified that non default values can be assigned 
> and that the HCA works for some selected values. 
> I also noticed that illegal values cause the driver to throw a message to the kernel log.
> However, I didn't test the exact behaviour of all possible values for each profile variable.
>   
I guess that this is something that needs to be done. I will add this to 
our regression in the future ....
>> Is there is any known limitation for the values that should be used?
>> (for example: only values which are power of two)
>>
>>
>>     
> I guess that it is clear that there are hardware limitations that don't allow setting of any value.
> Unfortunately, even after looking for them in the PRM, I couldn't figure out what they are.
> The software limits the value to be a power of 2 and corrects the users if they try to set a wrong value (to the nearest power of 2). In that case a warning message is thrown to the kernel log.
>   
As far as I know, the minimum number of any resource (for example, QPs) 
is the number of resources that
the HCA reports as reserved.
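
To make the power-of-2 correction Moni describes concrete, here is a
rough sketch in Python. It is only an illustration and not the mthca
code: the function name is made up, and rounding ties upward is my
assumption, since the thread only says "nearest power of 2".

```python
def nearest_power_of_two(value):
    """Round a positive integer to the nearest power of two.

    Hypothetical helper for illustration; ties (e.g. 3, 6) round up,
    which is an assumption and not taken from the driver.
    """
    if value < 1:
        raise ValueError("value must be a positive integer")
    # Largest power of two that is <= value.
    lower = 1 << (value.bit_length() - 1)
    # Smallest power of two that is > value.
    upper = lower << 1
    # Pick whichever neighbour is closer; on a tie, take the upper one.
    return lower if (value - lower) < (upper - value) else upper


for requested in (1, 5, 7, 16, 100000):
    corrected = nearest_power_of_two(requested)
    if corrected != requested:
        print("warning: %d corrected to %d" % (requested, corrected))
```

A real check would also have to enforce the lower bound discussed
above (the number of resources the HCA reports as reserved), which
this sketch leaves out.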

I will open a bug in the Bugzilla, so we will know that there are 
problems in this feature.

thanks
Dotan


From vlad at dev.mellanox.co.il  Mon Feb  5 01:50:47 2007
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Mon, 05 Feb 2007 11:50:47 +0200
Subject: [openib-general] MVAPICH2 SRPM and install file patches
In-Reply-To: <45C14344.9010602@cse.ohio-state.edu>
References: <45C14344.9010602@cse.ohio-state.edu>
Message-ID: <1170669047.6049.4.camel@vladsk-laptop>

On Wed, 2007-01-31 at 20:32 -0500, Shaun Rowland wrote:
> I've placed the MVAPICH2 SRPM on the OFA server in ~rowland/ofed_1_2,
> and it is linked to here:
> 
> http://www.openfabrics.org/~rowland/ofed_1_2/
> 

Hi Shaun,
Please change mvapich2.spec to avoid using the %build macro.
It removes RPM_BUILD_ROOT on SuSE distros:

Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.9418
+ umask 022
+ cd /var/tmp/OFEDRPM/BUILD
+ /bin/rm -rf /var/tmp/OFED
++ dirname /var/tmp/OFED
+ /bin/mkdir -p /var/tmp
+ /bin/mkdir /var/tmp/OFED
+ cd mvapich2-0.9.8
+ export OPEN_IB_HOME=/var/tmp/OFED/usr/local/ofed
+ OPEN_IB_HOME=/var/tmp/OFED/usr/local/ofed

-- 
Vladimir Sokolovsky <vlad at dev.mellanox.co.il>
Mellanox Technologies Ltd.


From rdreier at cisco.com  Mon Feb  5 02:15:25 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 05 Feb 2007 02:15:25 -0800
Subject: [openib-general] [GIT PULL] please pull infiniband.git
Message-ID: <ada1wl4x64i.fsf@cisco.com>

Linus, please pull from

    master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This is my first merge for 2.6.21:

Hoang-Nam Nguyen (2):
      IB/ehca: Remove use of do_mmap()
      IB/ehca: Remove obsolete prototypes

Ishai Rabinovitz (1):
      IB/srp: Don't wait for response when QP is in error state.

Jason Gunthorpe (1):
      IB: Make sure struct ib_user_mad.data is aligned

Michael S. Tsirkin (2):
      IB: Include <linux/kref.h> explicitly in <rdma/ib_verbs.h>
      IB: Return qp pointer as part of ib_wc

Steve Wise (1):
      RDMA/addr: Handle ethernet neighbour updates during route resolution

 drivers/infiniband/core/addr.c            |    3 +-
 drivers/infiniband/core/mad.c             |   11 +-
 drivers/infiniband/core/uverbs_cmd.c      |    2 +-
 drivers/infiniband/hw/amso1100/c2_cq.c    |    2 +-
 drivers/infiniband/hw/ehca/ehca_classes.h |   29 +--
 drivers/infiniband/hw/ehca/ehca_cq.c      |   65 ++----
 drivers/infiniband/hw/ehca/ehca_iverbs.h  |    8 -
 drivers/infiniband/hw/ehca/ehca_main.c    |    6 +-
 drivers/infiniband/hw/ehca/ehca_qp.c      |   78 +-----
 drivers/infiniband/hw/ehca/ehca_reqs.c    |    2 +-
 drivers/infiniband/hw/ehca/ehca_uverbs.c  |  395 ++++++++++++-----------------
 drivers/infiniband/hw/ipath/ipath_qp.c    |    2 +-
 drivers/infiniband/hw/ipath/ipath_rc.c    |    8 +-
 drivers/infiniband/hw/ipath/ipath_ruc.c   |    8 +-
 drivers/infiniband/hw/ipath/ipath_uc.c    |    4 +-
 drivers/infiniband/hw/ipath/ipath_ud.c    |    8 +-
 drivers/infiniband/hw/mthca/mthca_cmd.c   |    2 +-
 drivers/infiniband/hw/mthca/mthca_cq.c    |    2 +-
 drivers/infiniband/ulp/srp/ib_srp.c       |    7 +
 drivers/infiniband/ulp/srp/ib_srp.h       |    1 +
 include/rdma/ib_user_mad.h                |    2 +-
 include/rdma/ib_verbs.h                   |    3 +-
 22 files changed, 243 insertions(+), 405 deletions(-)


From vlad at lists.openfabrics.org  Mon Feb  5 02:22:18 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Mon,  5 Feb 2007 02:22:18 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070205-0200 daily build status
Message-ID: <20070205102221.765A7E607FE@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod 

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.18
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.13
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.16
Passed on powerpc with linux-2.6.17
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.17
Passed on ia64 with linux-2.6.19
Passed on powerpc with linux-2.6.14
Passed on powerpc with linux-2.6.16
Passed on powerpc with linux-2.6.12
Passed on ppc64 with linux-2.6.12
Passed on powerpc with linux-2.6.15
Passed on ppc64 with linux-2.6.19
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.14
Passed on ppc64 with linux-2.6.17
Passed on ia64 with linux-2.6.18
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.13
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.16
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.16

Failed:
Build failed on ia64 with linux-2.6.16.21-0.8-default
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070205-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:380: error: implicit declaration of function ‘register_netevent_notifier’
/home/vlad/tmp/ofa_1_2_kernel-20070205-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: In function ‘addr_cleanup’:
/home/vlad/tmp/ofa_1_2_kernel-20070205-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:386: error: implicit declaration of function ‘unregister_netevent_notifier’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070205-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070205-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070205-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070205-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------


From bugzilla-daemon at lists.openfabrics.org  Mon Feb  5 03:17:05 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Mon,  5 Feb 2007 03:17:05 -0800 (PST)
Subject: [openib-general] [Bug 334] Problems with build
	OFED-1.1.1-ib_local_sa
In-Reply-To: <bug-334-1@https.bugs.openfabrics.org/>
Message-ID: <20070205111705.7372CE607FE@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=334


------- Comment #16 from dmitry.yulov at intel.com  2007-02-05 03:17 -------
> I don't agree with your patch. It assumes that SLES 10 may be corrupted. OFED
> should not try to support this. If you want to use this patch for your own
> purposes, just apply it (manually) before running OFED build scripts. OFED's
> backport patches mechanism is not suitable for such patches.

I don't agree with you, because my patch does not make any changes to system
files. It only looks up the SUSE version. But if you think that OFED should not
try to support this, I think that many Intel people who install OFED on the
SLES10 platform will be unhappy. Thanks a lot for your help.

-- Dmitry.


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From vlad at dev.mellanox.co.il  Mon Feb  5 03:44:23 2007
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Mon, 05 Feb 2007 13:44:23 +0200
Subject: [openib-general] MVAPICH2 rpmbuild issue
In-Reply-To: <45C14344.9010602@cse.ohio-state.edu>
References: <45C14344.9010602@cse.ohio-state.edu>
Message-ID: <1170675863.6049.11.camel@vladsk-laptop>

Hi Shaun,
Please check the following issue:

Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.84872
+ umask 022
+ cd /var/tmp/OFEDRPM/BUILD
+ cd mvapich2-0.9.8
+ export OPEN_IB_HOME=/var/tmp/OFED/usr/local/ofed
+ OPEN_IB_HOME=/var/tmp/OFED/usr/local/ofed
+ '[' -d /var/tmp/OFED/usr/local/ofed/lib ']'
+ '[' -d /var/tmp/OFED/usr/local/ofed/lib64 ']'
+ export PREFIX=/var/tmp/OFED/usr/local/ofed/mpi/gcc/mvapich2-0.9.8-1
+ PREFIX=/var/tmp/OFED/usr/local/ofed/mpi/gcc/mvapich2-0.9.8-1
+ export CC=gcc CXX=g++ F77=gfortran
+ CC=gcc
+ CXX=g++
+ F77=gfortran
+ export ROMIO=yes
+ ROMIO=yes
+ export SHARED_LIBS=yes
+ SHARED_LIBS=yes
+ ./make.mvapich2.gen2
Could not find the OPEN_IB_HOME/lib64 or OPEN_IB_HOME/lib directory.
Exiting.
error: Bad exit status from /var/tmp/rpm-tmp.84872 (%install)


RPM build errors:
    Bad exit status from /var/tmp/rpm-tmp.84872 (%install)
ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_name mvapich2_gcc' --define '_prefix /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-1' --define 'build_root /var/tmp/OFED' --define 'open_ib_home /usr/local/ofed' --define 'ofed_build_root /var/tmp/OFED' --define 'comp_env CC=gcc CXX=g++ F77=gfortran' --define 'iwarp 0' --define 'romio 1' --define 'shared_libs 1' --define 'auto_req 1' /mswg2/work/vlad/ofed/test/OFED-1.2-alpha1/SRPMS/mvapich2-0.9.8-1.src.rpm"

-- 
Vladimir Sokolovsky <vlad at dev.mellanox.co.il>
Mellanox Technologies Ltd.


From bugzilla-daemon at lists.openfabrics.org  Mon Feb  5 03:52:57 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Mon,  5 Feb 2007 03:52:57 -0800 (PST)
Subject: [openib-general] [Bug 334] Problems with build
	OFED-1.1.1-ib_local_sa
In-Reply-To: <bug-334-1@https.bugs.openfabrics.org/>
Message-ID: <20070205115257.F1917E607FE@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=334


erezz at voltaire.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vlad at mellanox.co.il


------- Comment #17 from erezz at voltaire.com  2007-02-05 03:52 -------
(In reply to comment #16)
> > I don't agree with your patch. It assumes that SLES 10 may be corrupted. OFED
> > should not try to support this. If you want to use this patch for your own
> > purposes, just apply it (manually) before running OFED build scripts. OFED's
> > backport patches mechanism is not suitable for such patches.
> 
> I don't agree with you, because my patch does not make any changes to system
> files. It only looks up the SUSE version. But if you think that OFED should not
> try to support this, I think that many Intel people who install OFED on the
> SLES10 platform will be unhappy. Thanks a lot for your help.
> 
> -- Dmitry.
> 

Note that /etc/issue belongs to a SLES package:
rpm thyme:~ # rpm -qf /etc/issue
sles-release-10-15.2

Deleting it means that you corrupt your system. One can also delete
/etc/SuSE-release and expect that OFED will work. If you decide to delete
/etc/issue (or any other file that comes with SLES 10), you'll need to change
OFED scripts for your special needs. Anyway, I maintain iSER in OFED. You may
want to ask Vlad (vlad at mellanox.co.il) what he thinks about it. He maintains
OFED's build scripts.


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at lists.openfabrics.org  Mon Feb  5 04:02:29 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Mon,  5 Feb 2007 04:02:29 -0800 (PST)
Subject: [openib-general] [Bug 334] Problems with build
	OFED-1.1.1-ib_local_sa
In-Reply-To: <bug-334-1@https.bugs.openfabrics.org/>
Message-ID: <20070205120229.2A624E607FE@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=334


------- Comment #18 from dmitry.yulov at intel.com  2007-02-05 04:02 -------
> Note that /etc/issue belongs to a SLES package:
> rpm thyme:~ # rpm -qf /etc/issue
> sles-release-10-15.2
> Deleting it means that you corrupt your system. One can also delete
> /etc/SuSE-release and expect that OFED will work. If you decide to delete
> /etc/issue (or any other file that comes with SLES 10), you'll need to change
> OFED scripts for your special needs. Anyway, I maintain iSER in OFED. You may
> want to ask Vlad (vlad at mellanox.co.il) what he thinks about it. He maintains
> OFED's build scripts.

Thank you. I did not delete the /etc/issue file. I have the file, but it
contains the following information:

: cat /etc/issue
************************************************
Use of this system by unauthorized persons or   
in an unauthorized manner is strictly prohibited
************************************************

That is all. 


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From tziporet at mellanox.co.il  Mon Feb  5 04:04:29 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Mon, 05 Feb 2007 14:04:29 +0200
Subject: [openib-general] QoS in opensm will not be part of OFED 1.2
Message-ID: <45C71D4D.4060503@mellanox.co.il>

Hi Hal,

I had an AI to check the QoS status with OSM.
Conclusions are that QoS support in OpenSM will not be part of OFED 1.2 
(I updated the plan on the Wiki)

The reasons for this are:
1. Code not ready at code freeze.
2. There are technical discussions on the list regarding some 
implementation details (e.g. XML or text syntax).
3. SPEC is not published by IBTA yet.

Hal & Yevgeny - please work on a plan that will enable QoS to be merged 
on the main trunk once it's ready.

Tziporet


From kliteyn at dev.mellanox.co.il  Mon Feb  5 04:37:41 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Mon, 05 Feb 2007 14:37:41 +0200
Subject: [openib-general] OSM QoS policy file
Message-ID: <45C72515.8090100@dev.mellanox.co.il>

Hi Hal.

I added osm/doc/qos-policy.txt file with the description of the QoS
policy file, and an example of such file (with more comments inside).
I'm sure you'll have questions and corrections regarding this file,
so for now, to make our work easier, I'm not sending it as patch, 
but just as text. Please review the file.

Thanks

-- Yevgeny

=============================================================

QoS Policy File
===============

The QoS policy file is divided into 4 sub sections:

 - Port Group: a set of CAs, Routers or Switches that share 
   the same settings. A port group might be a partition 
   defined by the partition manager policy in terms of 
   GUIDs. Future implementations might provide support 
   for NodeDescription based definition of port groups.

 - Fabric Setup: 
   Defines how the SL2VL and VLArb tables should be setup.
   This policy definition assumes the computation of target 
   behavior should be performed outside of OpenSM.

 - QoS-Levels Definition:
   This section defines the possible sets of parameters for 
   QoS that a client might be mapped to. Each set holds: SL
   and optionally: Max MTU, Max Rate, Packet Lifetime and 
   QoS Class.

 - Matching Rules:
   A list of rules that match an incoming PathRecord request
   to a QoS-Level. The rules are processed in order such that 
   the first match is applied. Each rule is built out of a set
   of match expressions which should all match for the rule
   to apply. The matching expressions are defined for the 
   following fields:
     - SRC and DST to lists of port groups
     - Service-ID to a list of Service-ID or Service-ID ranges
     - QoS Class to a list of QoS Class values or ranges


Example of the QoS policy file
==============================

<?xml version="1.0" encoding="ISO-8859-1"?>
<qos-policy>
    <!-- Port Groups define sets of ports to be used later in the settings -->
    <port-groups>
        <!-- using port GUIDs -->
        <port-group> 
            <name>Storage</name> 
            <!-- <use> is just a description that is used for logging.
                 Other than that, it is just a commentary -->
            <use>our SRP storage targets</use>
            <port-guid>0x1000000000000001</port-guid>
            <port-guid>0x1000000000000002</port-guid>
        </port-group>
        <port-group> 
            <name>Virtual Servers</name> 
            <use>node desc and IB port #</use>
            <!-- The syntax of the port name is as follows: "hostname/CA-num/Pnum".
                 "hostname" and "CA-num" are compared to the first 2 words of 
                 NodeDescription, and "Pnum" is a port number on that node. -->
            <port-name>vs1/HCA-1/P1</port-name>
            <port-name>vs3/HCA-1/P1</port-name>
            <port-name>vs3/HCA-2/P1</port-name>
        </port-group>
        <!-- using partitions defined in the partition policy -->
        <port-group> 
            <name>Partition 1</name> 
            <use>default settings</use>
            <partition>Part1</partition> 
        </port-group>
        <!-- using node types CA|ROUTER|SWITCH -->
        <port-group> 
            <name>Routers</name> 
            <use>all routers</use>
            <node-type>ROUTER</node-type> 
        </port-group>  
    </port-groups>
    
    <qos-setup>
        <sl2vl-tables>
            <!-- scope defines the exact devices and in/out ports the tables apply to
                 if the same port is matching several rules the last one applies -->
            <sl2vl-scope> 
                <group>Part1</group> 
                <!-- *see explanation below the policy file example* -->
                <from>*</from> 
                <!-- *see explanation below the policy file example* -->
                <to>*</to> 
                <!-- SL2VL table has to have exactly 16 values (one for each SL) -->
                <sl2vl-table>0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7</sl2vl-table>
            </sl2vl-scope>
            <sl2vl-scope>
                <!-- *see explanation below the policy file example* -->
                <across-from>Storage1</across-from>
                <!-- *see explanation below the policy file example* -->
                <across-to>Storage2</across-to>
                <sl2vl-table>0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0</sl2vl-table>
            </sl2vl-scope>
        </sl2vl-tables>

        <!-- define all types of VLArb tables. The length of the tables should 
             match the physically supported tables by their target ports -->
        <vlarb-tables>
            <!-- scope defines the exact ports the VLArb tables apply to -->
            <vlarb-scope> 
                <!-- defining VLArb tables on all the ports that belong to 
                     port group 'Storage', and on all the ports that are
                     connected to ports of port group 'Storage' -->
                <group>Storage</group>
                <!-- "across" means all the ports that are connected to ports 
                     that belong to the specified port group -->
                <across>Storage</across>
                <!-- VLArb table holds VL and weight pairs -->
                <vlarb-high>0:255,1:127,2:63,3:31,4:15,5:7,6:3,7:1</vlarb-high>
                <vlarb-low>8:255,9:127,10:63,11:31,12:15,13:7,14:3</vlarb-low>
                <vl-high-limit>10</vl-high-limit>
            </vlarb-scope>
        </vlarb-tables>

    </qos-setup>

    <qos-levels>
        <!-- the first one is just setting SL -->
        <qos-level> 
            <!-- Serial number (unique ID) of the QoS level -->
            <sn>1</sn> 
            <use>for the lowest priority comm</use>
            <sl>15</sl>
            <pkey>123</pkey>
            <packet-life>16</packet-life>
        </qos-level>
        <!-- the second sets SL and QoS Class -->
        <qos-level> 
            <sn>2</sn> 
            <use>low latency best bandwidth</use>
            <sl>0</sl> 
            <qos-class>7</qos-class>
        </qos-level>
        <!-- the whole set: SL, QoS Class, MTU-Limit, Rate-Limit, Packet Lifetime -->
        <qos-level> 
            <sn>3</sn> 
            <use>just an example</use>
            <sl>0</sl> 
            <qos-class>32</qos-class> 
            <mtu-limit>1</mtu-limit> 
            <rate-limit>1</rate-limit>
            <packet-life>12</packet-life>
        </qos-level>
    </qos-levels>

    <!-- Match rules are scanned in a first-fit manner (like firewall rules table) -->
    <qos-match-rules>
        <!-- matching by single criteria: class (list of values and ranges) -->
        <qos-match-rule> 
            <qos-level-sn>1</qos-level-sn> <!-- defined in <sn> of <qos-level> -->
            <use>low latency by class 7-9 or 11</use> <!-- just a description -->
            <qos-class>7-9,11</qos-class>
        </qos-match-rule>
        <!-- show matching by destination group AND service-ids -->
        <qos-match-rule> 
            <qos-level-sn>2</qos-level-sn> 
            <use>Storage targets connection</use>
            <destination>Storage</destination>
            <service>22,4719-5000</service>
        </qos-match-rule>
        <!-- show matching by source group only -->
        <qos-match-rule> 
            <qos-level-sn>3</qos-level-sn> 
            <use>bla bla</use>
            <source>Storage</source>
        </qos-match-rule>
    </qos-match-rules>

</qos-policy>
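
The first-fit scan of the match rules can be pictured with a small sketch.
This is only an illustration under simplifying assumptions: the struct
match_rule type and the first_fit_level() helper are hypothetical, and a
rule here matches on a single QoS class range rather than on the full set
of criteria the policy file allows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified rule: match on one QoS class range only. */
struct match_rule {
    int level_sn;   /* <qos-level-sn> the rule points at */
    int class_lo;   /* inclusive lower bound of <qos-class> */
    int class_hi;   /* inclusive upper bound of <qos-class> */
};

/* Scan rules in file order; return the level of the first match, or -1. */
static int first_fit_level(const struct match_rule *rules, size_t n,
                           int qos_class)
{
    for (size_t i = 0; i < n; i++)
        if (qos_class >= rules[i].class_lo && qos_class <= rules[i].class_hi)
            return rules[i].level_sn;
    return -1;
}
```

As with a firewall table, an earlier rule shadows any later rule that
would also match, so rule order in the file matters.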


Explanation of some fields
==========================

The meaning of most of the tags is either intuitive or explained by the 
comments in the file. One section that deserves special
explanation is the SL2VL table definition - <sl2vl-scope>.

In general, the VL is a function of the in-port (the port through which
the packet entered), the out-port (the port through which the packet is
supposed to leave) and the SL.
In OpenSM, an SL2VL table is defined on every port, where this port is 
the out-port. Hence, on every port, the SL2VL table is defined as a function 
of in-port and SL.
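
A minimal sketch of that lookup, assuming a hypothetical in-memory model
(struct out_port, lookup_vl() and set_sl2vl_from_all() are illustrative
names, not OpenSM code, and the in-port count of 4 is arbitrary):

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SLS      16  /* SL is a 4-bit field: 0..15 */
#define MAX_IN_PORTS 4   /* illustrative port count, not from the RFC */

/* Hypothetical model: each out-port holds one SL2VL row per in-port. */
struct out_port {
    uint8_t sl2vl[MAX_IN_PORTS][NUM_SLS];
};

/* Return the VL for a packet that arrived on in_port with the given SL. */
static uint8_t lookup_vl(const struct out_port *out, unsigned in_port,
                         unsigned sl)
{
    assert(in_port < MAX_IN_PORTS && sl < NUM_SLS);
    return out->sl2vl[in_port][sl];
}

/* Apply one 16-entry table to every in-port, as <from>*</from> would. */
static void set_sl2vl_from_all(struct out_port *out,
                               const uint8_t table[NUM_SLS])
{
    for (unsigned p = 0; p < MAX_IN_PORTS; p++)
        for (unsigned sl = 0; sl < NUM_SLS; sl++)
            out->sl2vl[p][sl] = table[sl];
}
```

Loading the first <sl2vl-table> from the policy example with
set_sl2vl_from_all() would make lookup_vl() return VL 3 for SL 3 on any
in-port.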

<to>n,m</to>

  This means that of all the ports of the specified port group, define
  SL2VL tables where the to-ports are ports number n and m. Since the SL2VL 
  table is defined per out-port, using <to> effectively means defining
  the SL2VL table on ports n and m.
  To specify that the SL2VL table should be defined on all the 
  ports, an asterisk (*) can be used.

<from>i,j</from>

  This means that of all the ports of the specified port group that were
  not filtered out by the <to> value, define the SL2VL table only for entries
  where the from-ports are ports number i and j.
  To specify that the SL2VL table should be defined for all the in-ports, 
  an asterisk (*) can be used.

To specify that all the SL2VL table entries should be defined for all 
the ports of a certain group, use the following:
    <group>port_group</group> 
    <from>*</from>
    <to>*</to>

<across-to>PortGroupName</across-to>
  
  This is a combination of the <across> keyword (which can be found in the
  VLArb tables definition) and the <to> keyword. 
  <across>PortGroupName</across> means that the ports we're talking about
  are all the ports that are connected to ports that belong to PortGroupName.
  Essentially, <across-to>PortGroupName</across-to> means the following:
  <to>list_of_all_the_ports_that_are_connected_to_group_PortGroupName</to>
  
  Example of usage of <across-to>:
  A user has a set of 'special' nodes (e.g. storage nodes), and all the
  traffic to these nodes has to get a specific VL. The solution is to define
  a port group (e.g. "Storage") that includes all the ports of these nodes,
  and then to configure SL2VL tables on all the switch ports that are
  connected to the Storage port group by specifying
  <across-to>Storage</across-to>
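
  The port selection that <across> implies can be sketched as follows. This
  is only an illustration: struct port and select_across() are hypothetical
  names, and the fabric is reduced to an array in which every port records
  the index of its link partner.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fabric model: each port knows its peer and its group. */
struct port {
    int  peer;      /* index of the port at the other end, -1 if none */
    bool in_group;  /* true if the port belongs to the named port group */
};

/* Mark every port whose link partner is in the group; return how many. */
static size_t select_across(const struct port *ports, size_t n,
                            bool *selected)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        int peer = ports[i].peer;

        selected[i] = peer >= 0 && (size_t)peer < n && ports[peer].in_group;
        if (selected[i])
            count++;
    }
    return count;
}
```

  Note that the group members themselves are not selected unless they
  happen to be linked to another member; only their link partners are.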
  
<across-from>PortGroupName</across-from>

  Similar to <across-to>, <across-from> is a combination of the <across> and
  <from> keywords.
  

From rdreier at cisco.com  Mon Feb  5 06:20:25 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 05 Feb 2007 06:20:25 -0800
Subject: [openib-general] idea for ofed 1 2 kernel file structure
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
Message-ID: <adaveigvg7q.fsf@cisco.com>

 > I looked a current ofed 1.2 kernel tree and there is 1 thing I dislike:
 > It is hard to see changes that are specific to OFED since we have whole
 > kernel history mixed in.

I'm not sure how you have your branches set up, but if you have
something like a "linus" branch that tracks the upstream kernel, it's
easy to do stuff like "git log linus.." or "git diff linus.. drivers/infiniband"
and see the differences that way.

Using git that way (which is what it's designed for, after all) seems
better than some scripts to munge together two trees.

 - R.


From halr at voltaire.com  Mon Feb  5 06:00:50 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 05 Feb 2007 09:00:50 -0500
Subject: [openib-general] QoS in opensm will not be part of OFED 1.2
In-Reply-To: <45C71D4D.4060503@mellanox.co.il>
References: <45C71D4D.4060503@mellanox.co.il>
Message-ID: <1170684049.4525.195527.camel@hal.voltaire.com>

Hi Tziporet,

On Mon, 2007-02-05 at 07:04, Tziporet Koren wrote:
> Hi Hal,
> 
> I had an AI to check the QoS status with OSM.
> Conclusions are that QoS support in OpenSM will not be part of OFED 1.2 
> (I updated the plan on the Wiki)
> 
> The reasons for this are:
> 1. Code not ready at code freeze.
> 2. There are technical discussion in the list regarding some 
> implementation details (e.g. XML or text syntax).
> 3. SPEC is not published by IBTA yet.

I think this last reason also applies to the end client QoS changes as
well.

-- Hal

> Hal & Yevgeny - please work on a plan that will enable QoS to be merged 
> on the main trunk once its ready.

> Tziporet
> 
> 
>  


From xma at us.ibm.com  Mon Feb  5 06:50:55 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Mon, 5 Feb 2007 07:50:55 -0700
Subject: [openib-general] [PATCH] enable IPoIB only if broadcast join finish
In-Reply-To: <adaveigvg7q.fsf@cisco.com>
Message-ID: <OF15ED804B.C1AB3DA2-ON87257279.0050CC75-88257279.00259EE9@us.ibm.com>


Hi, Roland,

Please review this patch. According to IPoIB RFC 4391, section 5, once the
IPoIB broadcast group has been joined, the interface should be ready for data
transfer. In the current IPoIB implementation, the interface is UP and RUNNING
only when all default multicast joins are successful. We hit a problem where
the broadcast join finished successfully but the all-hosts multicast join
failed.

Here is the patch. If possible, please give your input ASAP, as we have an
urgent customer issue that needs to be resolved:

diff -urpN ipoib/ipoib_multicast.c ipoib-multicast/ipoib_multicast.c
--- ipoib/ipoib_multicast.c   2006-11-29 13:57:37.000000000 -0800
+++ ipoib-multicast/ipoib_multicast.c     2007-02-04 22:34:16.000000000 -0800
@@ -402,6 +402,11 @@ static void ipoib_mcast_join_complete(in
                  queue_work(ipoib_workqueue, &priv->mcast_task);
            mutex_unlock(&mcast_mutex);
            complete(&mcast->done);
+           /*
+            * broadcast join finished, enable carrier
+            */
+           if (mcast == priv->broadcast)
+                 netif_carrier_on(dev);
            return;
      }

@@ -599,7 +604,6 @@ void ipoib_mcast_join_task(void *dev_ptr
      ipoib_dbg_mcast(priv, "successfully joined all multicast groups\n");

      clear_bit(IPOIB_MCAST_RUN, &priv->flags);
-     netif_carrier_on(dev);
 }

 int ipoib_mcast_start_thread(struct net_device *dev)

(See attached file: ipoib-multicast.patch)

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070205/4821ac96/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ipoib-multicast.patch
Type: application/octet-stream
Size: 777 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070205/4821ac96/attachment.obj>

From michael.arndt at informatik.tu-chemnitz.de  Mon Feb  5 07:18:24 2007
From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt)
Date: Mon, 5 Feb 2007 16:18:24 +0100
Subject: [openib-general] Unknown SMP Recv
Message-ID: <000901c74938$e10b2a30$21606d86@one7>

Hi,

I have changed the driver (smi) a little and have written a tool that acts 
like a router or a bridge. It receives directed route SMPs on one port and 
sends them to another port. I use 3 nodes (sender on node 1, the router on 
node 2, normal node on 3) and send a subnGet SMP with [0][1][1] as the 
initial path. It works fine, but on the way back the router also receives a 
second subnGetResp packet with no data. The header is almost the same as 
that of the real subnGetResp packet, except that DrSLID, DrDLID, initial 
path and return path are 0. Are there any ideas where this packet comes 
from? Ack?

Thanks Michael
 

From mst at mellanox.co.il  Mon Feb  5 07:25:08 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 5 Feb 2007 17:25:08 +0200
Subject: [openib-general] idea for ofed 1 2 kernel file structure
In-Reply-To: <adaveigvg7q.fsf@cisco.com>
References: <adaveigvg7q.fsf@cisco.com>
Message-ID: <20070205152507.GA4246@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [openib-general] idea for ofed 1 2 kernel file structure
> 
>  > I looked a current ofed 1.2 kernel tree and there is 1 thing I dislike:
>  > It is hard to see changes that are specific to OFED since we have whole
>  > kernel history mixed in.
> 
> I'm not sure how you have your branches set up, but if you have
> something like a "linus" branch that tracks the upstream kernel, it's
> easy to do stuff like "git log linus.." or "git diff linus.. drivers/infiniband"
> and see the differences that way.

Limiting to drivers/infiniband is no longer sufficient, as we have components
under drivers/net etc.
Another problem is that history-rewriting tools such as git rebase
seem to get confused easily by the complicated linux history.

> Using git that way (which is what it's designed for, after all) seems
> better than some scripts to munge together two trees.

Problem is, OFED kernel code actually consists of 2 parts:
upstream kernel developed separately at lkml and out of kernel components,
developed separately. OFED does not really track linux all the time: we
only update at -RC time.

Mixing 2 such projects together does not seem to be what git was designed for.
For example, when a patch is applied upstream we need to remove it from
fixes. So after I do git pull from upstream I get a broken tree that won't
even build. Not good.

Another problem I'm trying to address is the confusion around what gets
applied as patch and what directly. This way, a bad patch won't even apply.


-- 
MST


From halr at voltaire.com  Mon Feb  5 07:34:15 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 05 Feb 2007 10:34:15 -0500
Subject: [openib-general] Unknown SMP Recv
In-Reply-To: <000901c74938$e10b2a30$21606d86@one7>
References: <000901c74938$e10b2a30$21606d86@one7>
Message-ID: <1170689654.4525.201415.camel@hal.voltaire.com>

On Mon, 2007-02-05 at 10:18, Michael Arndt wrote:
> Hi,
> 
> I have change the driver (smi) a little and have written a tool like a 
> router or a bridge. It receives directed route smp's on one port and sends 
> it to another port. I use 3 nodes (sender on node 1, the router on node 2, 
> normal node on 3) and send a subnGet SMP with [0][1][1] as initial path. And 
> it works fine, but on way back the router also receives a second subnGetResp 
> packet with no data. The header is almost the same as the real subnGetResp 
> packet, just the DrSLID,DrDLID, initial path, return path are 0. Are there 
> any ideas where this packet come from? Ack?

A router should not allow a SMP to cross a subnet boundary. SMPs are
restricted to the local subnet.

-- Hal

> Thanks Michael
>  
> 
> 


From mst at mellanox.co.il  Mon Feb  5 07:38:26 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 5 Feb 2007 17:38:26 +0200
Subject: [openib-general] QoS in opensm will not be part of OFED 1.2
In-Reply-To: <1170684049.4525.195527.camel@hal.voltaire.com>
References: <45C71D4D.4060503@mellanox.co.il>
	<1170684049.4525.195527.camel@hal.voltaire.com>
Message-ID: <20070205153826.GB4246@mellanox.co.il>

> > I had an AI to check the QoS status with OSM.
> > Conclusions are that QoS support in OpenSM will not be part of OFED 1.2 
> > (I updated the plan on the Wiki)
> > 
> > The reasons for this are:
> > 1. Code not ready at code freeze.
> > 2. There are technical discussion in the list regarding some 
> > implementation details (e.g. XML or text syntax).
> > 3. SPEC is not published by IBTA yet.
> 
> I think this last reason also applies to the end client QoS changes as
> well.

Yes. But the other 2 don't.

-- 
MST


From changquing.tang at hp.com  Mon Feb  5 07:48:29 2007
From: changquing.tang at hp.com (Tang, Changqing)
Date: Mon, 5 Feb 2007 15:48:29 -0000
Subject: [openib-general] Immediate data question
In-Reply-To: <adaveigvg7q.fsf@cisco.com>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
	<adaveigvg7q.fsf@cisco.com>
Message-ID: <349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net>


Roland:
	If I only want to send/recv 4 bytes with immediate data:

On sender side:
	opcode = IBV_WR_SEND_WITH_IMM;
	imm_data = my_4_bytes_data;

	Do I still need to specify sg_list and num_sge ?

On receiver side, because the immediate data is inside the completion
structure, do I need to post a receive for above message ?
If I need to post a receive, do I need to specify sg_list and num_sge
for the receive ?

I looked at the spec but did not find useful information.

The reason I ask is that at some point, I cannot (or it is hard to) provide
registered memory for only 4 bytes of data.

Thank you.

--CQ


> -----Original Message-----
> From: openib-general-bounces at openib.org 
> [mailto:openib-general-bounces at openib.org] On Behalf Of Roland Dreier
> Sent: Monday, February 05, 2007 8:20 AM
> To: Michael S. Tsirkin
> Cc: openib-general at openib.org
> Subject: Re: [openib-general] idea for ofed 1 2 kernel file structure
> 
>  > I looked a current ofed 1.2 kernel tree and there is 1 thing I dislike:
>  > It is hard to see changes that are specific to OFED since we have whole
>  > kernel history mixed in.
> 
> I'm not sure how you have your branches set up, but if you 
> have something like a "linus" branch that tracks the upstream 
> kernel, it's easy to do stuff like "git log linus.." or "git 
> diff linus.. drivers/infiniband"
> and see the differences that way.
> 
> Using git that way (which is what it's designed for, after 
> all) seems better than some scripts to munge together two trees.
> 
>  - R.
> 


From mst at mellanox.co.il  Mon Feb  5 07:49:22 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 5 Feb 2007 17:49:22 +0200
Subject: [openib-general] QoS in opensm will not be part of OFED 1.2
In-Reply-To: <1170690105.4525.201879.camel@hal.voltaire.com>
References: <1170690105.4525.201879.camel@hal.voltaire.com>
Message-ID: <20070205154922.GC4246@mellanox.co.il>

> > > > I had an AI to check the QoS status with OSM.
> > > > Conclusions are that QoS support in OpenSM will not be part of OFED 1.2 
> > > > (I updated the plan on the Wiki)
> > > > 
> > > > The reasons for this are:
> > > > 1. Code not ready at code freeze.
> > > > 2. There are technical discussion in the list regarding some 
> > > >    implementation details (e.g. XML or text syntax).
> > > > 3. SPEC is not published by IBTA yet.
> > > 
> > > I think this last reason also applies to the end client QoS changes as
> > > well.
> > 
> > Yes. But the other 2 don't.
> 
> Right but I think that precludes it from being included in OFED right
> now.

Since the code is already included in OFED, moving it out would violate the feature
freeze rules, unless there's an actual bug this would fix.

-- 
MST


From halr at voltaire.com  Mon Feb  5 07:41:48 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 05 Feb 2007 10:41:48 -0500
Subject: [openib-general] QoS in opensm will not be part of OFED 1.2
In-Reply-To: <20070205153826.GB4246@mellanox.co.il>
References: <45C71D4D.4060503@mellanox.co.il>
	<1170684049.4525.195527.camel@hal.voltaire.com>
	<20070205153826.GB4246@mellanox.co.il>
Message-ID: <1170690105.4525.201879.camel@hal.voltaire.com>

On Mon, 2007-02-05 at 10:38, Michael S. Tsirkin wrote:
> > > I had an AI to check the QoS status with OSM.
> > > Conclusions are that QoS support in OpenSM will not be part of OFED 1.2 
> > > (I updated the plan on the Wiki)
> > > 
> > > The reasons for this are:
> > > 1. Code not ready at code freeze.
> > > 2. There are technical discussion in the list regarding some 
> > > implementation details (e.g. XML or text syntax).
> > > 3. SPEC is not published by IBTA yet.
> > 
> > I think this last reason also applies to the end client QoS changes as
> > well.
> 
> Yes. But the other 2 don't.

Right but I think that precludes it from being included in OFED right
now.

-- Hal


From guyg at Voltaire.COM  Mon Feb  5 08:43:14 2007
From: guyg at Voltaire.COM (guyg)
Date: Mon, 05 Feb 2007 18:43:14 +0200
Subject: [openib-general]  [libmthca] deadlock while trying to destroy QP
Message-ID: <45C75EA2.6000905@Voltaire.COM>

Hi Roland,

I am running a proprietary test over ofed1.1 (userspace).

I have one context where I poll my cq and another (signal handler 
context) where I try to destroy my QP.

It looks like mthca_destroy_qp is trying to take a lock that 
mthca_poll_cq is holding.

The deadlock occurs at the end of the test run, when there 
are no more completions, so the test never exits.

Here is a core dump:

#0  0x0000003a6ce09172 in pthread_spin_lock () from /lib64/tls/libpthread.so.0
#1  0x0000002a959cf449 in mthca_cq_clean (cq=0x607240, qpn=3277830, srq=0x0) at src/cq.c:554
#2  0x0000002a959d28b9 in mthca_destroy_qp (qp=0x607400) at src/mthca.h:246
#3  0x000000000040117b in client_sig_handler ()
#4  <signal handler called>
#5  0x0000003a6ce09165 in pthread_spin_lock () from /lib64/tls/libpthread.so.0
#6  0x0000002a959cec91 in mthca_poll_cq (ibcq=0x607240, ne=1, wc=0x7fbffff590) at src/cq.c:467
#7  0x0000002a9557bf73 in ibv_poll_cq (cq=0x607240, num_entries=1, wc=0x7fbffff590) at /usr/local/ofed/include/infiniband/verbs.h:824


Does destroy_qp need to be dependent on the CQ?

Do you have any suggestions?
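
One common way to avoid taking a library lock from signal context is to have
the handler only set a flag and to do the teardown from the polling context.
The sketch below is a generic illustration under that assumption, not
libmthca code: on_signal(), run_until_signalled(), demo_poll() and
demo_cleanup() are hypothetical names, with demo_poll() standing in for
ibv_poll_cq() and demo_cleanup() for ibv_destroy_qp().

```c
#include <assert.h>
#include <signal.h>

/* Set by the handler, read by the polling loop. */
static volatile sig_atomic_t stop_requested = 0;

/* Async-signal-safe: only stores to a sig_atomic_t flag, takes no locks. */
static void on_signal(int signo)
{
    (void)signo;
    stop_requested = 1;
}

/* Poll until the flag is set, then clean up in normal (non-signal) context.
 * Returns the number of poll iterations performed. */
static int run_until_signalled(void (*poll_once)(void), void (*cleanup)(void))
{
    int iterations = 0;

    while (!stop_requested) {
        poll_once();
        iterations++;
    }
    cleanup();  /* no lock can be held by an interrupted poll here */
    return iterations;
}

/* Demo stand-ins: the third poll raises SIGINT to simulate the signal. */
static int polls_done;
static int cleaned_up;

static void demo_poll(void)
{
    if (++polls_done == 3)
        raise(SIGINT);
}

static void demo_cleanup(void)
{
    cleaned_up = 1;
}
```

With this structure the handler never re-enters the library while the
interrupted thread holds the CQ lock, which is the situation shown in the
backtrace above.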

Thanks,
Guy


From michael.arndt at informatik.tu-chemnitz.de  Mon Feb  5 08:56:58 2007
From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt)
Date: Mon, 5 Feb 2007 17:56:58 +0100
Subject: [openib-general] Unknown SMP Recv
References: <000901c74938$e10b2a30$21606d86@one7>
	<1170689654.4525.201415.camel@hal.voltaire.com>
Message-ID: <001401c74946$a664a2e0$21606d86@one7>

Hi,

> A router should not allow a SMP to cross a subnet boundary. SMPs are
> restricted to the local subnet.

I work on a discovery mechanism for switchless InfiniBand architectures 
like rings, tori or maybe hypercubes. There is just one single subnet, no 
switches or routers. Please ignore the background and focus on the problem 
with the second packet. Maybe you have some ideas even if you are not 
involved in the whole project. That would be nice.

Thanks Michael 


From mshefty at ichips.intel.com  Mon Feb  5 09:07:34 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 05 Feb 2007 09:07:34 -0800
Subject: [openib-general] [PATCH] RE:  regression in ofed 1.2
In-Reply-To: <20070204131500.GE14288@mellanox.co.il>
References: <1170355331.16637.25.camel@stevo-desktop>
	<20070204131500.GE14288@mellanox.co.il>
Message-ID: <45C76456.6090804@ichips.intel.com>

>>The name is "ib_mcast_wq" which is too long for older kernels.
>>
>>Did we loose a backport patch?
> 
> 
> Not sure what happened here.
> Sean, could you rename ib_mcast_wq to ib_mcast please?

I renamed the workqueue for what I requested to pull upstream, and I added a 
patch to my pull request to rename a couple of other workqueues.

Didn't you already apply a rename patch to the ofed code?

- Sean


From halr at voltaire.com  Mon Feb  5 09:13:12 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 05 Feb 2007 12:13:12 -0500
Subject: [openib-general] Unknown SMP Recv
In-Reply-To: <001401c74946$a664a2e0$21606d86@one7>
References: <000901c74938$e10b2a30$21606d86@one7>
	<1170689654.4525.201415.camel@hal.voltaire.com>
	<001401c74946$a664a2e0$21606d86@one7>
Message-ID: <1170695591.4525.207604.camel@hal.voltaire.com>

On Mon, 2007-02-05 at 11:56, Michael Arndt wrote:
> Hi,
> 
> > A router should not allow a SMP to cross a subnet boundary. SMPs are
> > restricted to the local subnet.
> 
> I work on a discovering mechanism for switchless InfiniBand Architectures 
> like Rings, Tori or maybe Hyper-Cubes. There is just one single subnet, no 
> switches or routers. Please ignore the background and focus to the problem 
> about the second packet. Maybe you have some ideas even you are not involved 
> in the hole project. That would be nice.

Guess you don't mean IB router when you say router in your description.

I also have no theories without more information:

Is the sender a normal node? Does "normal node" mean standard OpenIB
without changes? How was the SMI changed? On which nodes? Only the
intermediate one?

Aside from the initial path being [0][1][1], what are the hop count and
hop pointer ? What are DrDLID and DrSLID as well as the LIDs in the LRH
of the SMP ?

-- Hal

> Thanks Michael 


From swise at opengridcomputing.com  Mon Feb  5 09:19:21 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Mon, 05 Feb 2007 11:19:21 -0600
Subject: [openib-general] cxgb3.git tree merged to 2.6.20
Message-ID: <1170695961.16661.26.camel@stevo-desktop>

All, 

I've updated my tree git://staging.openfabrics.org/~swise/cxgb3.git to
linux-2.6.20.  

Branches:

cxgb3 - my development branch with commits that were used to review the
rdma driver (large patch series) + the T3 Ethernet driver.  

for-roland - branch where roland can pull the latest rdma driver (the
same code that is in OFED 1.2)

for-ofed_1_2 - branch used to deliver the original ethernet and rdma
driver code to the ofed_1_2 tree.  It is up to date with the ofed_1_2
tree wrt the drivers. 


Steve.


From suri at baymicrosystems.com  Mon Feb  5 09:31:02 2007
From: suri at baymicrosystems.com (Suresh Shelvapille)
Date: Mon, 5 Feb 2007 12:31:02 -0500
Subject: [openib-general] patches to 2.6.19.1 kernel for switch Operation
In-Reply-To: <1170072757.4555.242192.camel@hal.voltaire.com>
References: <000601c7419f$d4470c60$ff0da8c0@amr.corp.intel.com>
	<1170072757.4555.242192.camel@hal.voltaire.com>
Message-ID: <039701c7494b$6bd5d860$1914a8c0@surioffice>


Hal:

We are upgrading to 2.6.19.1 kernel and I finally ported the changes
required for Switch operation from my current kernel (2.6.12) version. 

I have tested these changes for a switch with different SM(s). But I need
the community's help to test the changes on different HCAs to make sure I
have not broken anything.

Please see if the changes look OK.

Thanks,
Suri


-------------- next part --------------
A non-text attachment was scrubbed...
Name: smi.c.ptch
Type: application/octet-stream
Size: 1257 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070205/e1c0dc54/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: agent.c.ptch
Type: application/octet-stream
Size: 1079 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070205/e1c0dc54/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mad.c.ptch
Type: application/octet-stream
Size: 3501 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070205/e1c0dc54/attachment-0002.obj>

From akepner at sgi.com  Mon Feb  5 09:33:22 2007
From: akepner at sgi.com (akepner at sgi.com)
Date: Mon, 5 Feb 2007 09:33:22 -0800 (PST)
Subject: [openib-general] idea for ofed 1 2 kernel file structure
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
Message-ID: <Pine.LNX.4.61.0702050930360.26852@localhost.localdomain>

On Sun, 4 Feb 2007, Michael S. Tsirkin wrote:

> Hi!
> I looked a current ofed 1.2 kernel tree and there is 1 thing I dislike:
> It is hard to see changes that are specific to OFED since we have whole
> kernel history mixed in.

I agree.

>
> It would easy to split OFED specific files In separate directory and
> have OFED scripts combine that with upstream kernel.
>
> All out of tree modules we distribute would go there too.
> What do others think about this?
>

I like that idea very much.

-- 
Arthur


From or.gerlitz at gmail.com  Mon Feb  5 10:16:00 2007
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Mon, 5 Feb 2007 20:16:00 +0200
Subject: [openib-general] Immediate data question
In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
	<adaveigvg7q.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net>
Message-ID: <15ddcffd0702051016x4587a6das87c4ef116296662b@mail.gmail.com>

On 2/5/07, Tang, Changqing <changquing.tang at hp.com> wrote:

> On sender side:
>         opcode = IBV_WR_SEND_WITH_IMM;
>         imm_data = my_4_bytes_data;
>         Do I still need to specify sg_list and num_sge ?

At the sender side I think you can do well with:
opcode = IBV_WR_SEND
send_flags |= IBV_SEND_INLINE
sge.addr = pointer to the 4 bytes
sge.length = 4
sge.lkey = don't care

Since the 4 bytes are copied by the IB library from sge.addr
during the execution of ibv_post_send(), the ownership of sge.addr is
yours once the call returns.

> On receiver side, because the immediate data is inside the completion
> structure, do I need to post a receive for above message ?

Yes, I don't see how you can get away from posting a receive WR.

> The reason I ask is that at some point, I can not(or hard) to provide
> registered memory only for 4 bytes data.

what about the mpi impl. header ??? do you have a case where only 4
bytes need to be passed to the other side?

Or.


From mst at mellanox.co.il  Mon Feb  5 10:42:07 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 5 Feb 2007 20:42:07 +0200
Subject: [openib-general] [PATCH] RE:  regression in ofed 1.2
In-Reply-To: <45C76456.6090804@ichips.intel.com>
References: <45C76456.6090804@ichips.intel.com>
Message-ID: <20070205184207.GB15775@mellanox.co.il>

> Quoting Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [openib-general] [PATCH] RE:  regression in ofed 1.2
> 
> >>The name is "ib_mcast_wq" which is too long for older kernels.
> >>
> >>Did we loose a backport patch?
> > 
> > 
> > Not sure what happened here.
> > Sean, could you rename ib_mcast_wq to ib_mcast please?
> 
> I renamed the workqueue for what I requested to pull upstream, and I added a 
> patch to my pull request to rename a couple of other workqueues.
> 
> Didn't you already apply a rename patch to the ofed code?

Yes, but I assumed it was in your branch, so I threw it out when I took your
latest code.

-- 
MST


From mst at mellanox.co.il  Mon Feb  5 10:42:46 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 5 Feb 2007 20:42:46 +0200
Subject: [openib-general] idea for ofed 1 2 kernel file structure
In-Reply-To: <Pine.LNX.4.61.0702050930360.26852@localhost.localdomain>
References: <Pine.LNX.4.61.0702050930360.26852@localhost.localdomain>
Message-ID: <20070205184246.GC15775@mellanox.co.il>

> Quoting akepner at sgi.com <akepner at sgi.com>:
> Subject: Re: [openib-general] idea for ofed 1 2 kernel file structure
> 
> On Sun, 4 Feb 2007, Michael S. Tsirkin wrote:
> 
> > Hi!
> > I looked a current ofed 1.2 kernel tree and there is 1 thing I dislike:
> > It is hard to see changes that are specific to OFED since we have whole
> > kernel history mixed in.
> 
> I agree.
> 
> >
> > It would easy to split OFED specific files In separate directory and
> > have OFED scripts combine that with upstream kernel.
> >
> > All out of tree modules we distribute would go there too.
> > What do others think about this?
> >
> 
> I like that idea very much.

Could you address Roland's proposal as well?

-- 
MST


From mst at mellanox.co.il  Mon Feb  5 10:57:09 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 5 Feb 2007 20:57:09 +0200
Subject: [openib-general] Fwd: bug in mthca_qp.c (GEN 2)
Message-ID: <20070205185709.GB16598@mellanox.co.il>

Roland, what do you think?
Looks pretty severe actually.

----- Forwarded message from Jack Morgenstein <jackm at mellanox.co.il> -----

Subject: bug in mthca_qp.c (GEN 2)
Date: Mon, 5 Feb 2007 12:44:11 +0200
From: Jack Morgenstein <jackm at mellanox.co.il>

static void to_ib_ah_attr(struct mthca_dev *dev, struct ib_ah_attr *ib_ah_attr,
    struct mthca_qp_path *path)
{
 memset(ib_ah_attr, 0, sizeof *path);
 
SHOULD BE:
     memset(ib_ah_attr, 0, sizeof *ib_ah_attr);
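
The general idiom behind the fix is to size the memset from the pointer
being cleared. The sketch below illustrates it with hypothetical stand-in
types (struct small_attr and struct big_path are not the kernel structs,
and their sizes are made up); with a mismatched sizeof, the call either
writes past the destination or leaves part of it uncleared, depending on
which type is larger.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins; the real structs live in the kernel headers. */
struct small_attr { unsigned char a[8]; };
struct big_path   { unsigned char p[32]; };

/* Size the memset from the destination pointer, never from another type. */
static void clear_attr(struct small_attr *attr)
{
    memset(attr, 0, sizeof *attr);
}
```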


----- End forwarded message -----

-- 
MST


From swise at opengridcomputing.com  Mon Feb  5 11:43:43 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Mon, 05 Feb 2007 13:43:43 -0600
Subject: [openib-general] [PATCH] ofed_1_2 - iw_cxgb3 - Add standard GPL
	header to tcb.h
Message-ID: <1170704623.16661.54.camel@stevo-desktop>

Add standard GPL header to tcb.h

From: Steve Wise <swise at opengridcomputing.com>

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/tcb.h |   33 +++++++++++++++++++++++++++++++--
 1 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/tcb.h b/drivers/infiniband/hw/cxgb3/tcb.h
index f287a7c..c702dc1 100644
--- a/drivers/infiniband/hw/cxgb3/tcb.h
+++ b/drivers/infiniband/hw/cxgb3/tcb.h
@@ -1,5 +1,34 @@
-/* This file is automatically generated --- do not edit */
-
+/*
+ * Copyright (c) 2007 Chelsio, Inc. All rights reserved.
+ *	
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
 #ifndef _TCB_DEFS_H
 #define _TCB_DEFS_H
 

From mst at mellanox.co.il  Mon Feb  5 12:12:23 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 5 Feb 2007 22:12:23 +0200
Subject: [openib-general] [PATCHv6 RFC] IPoIB CM Experimental support
Message-ID: <20070205201223.GD16598@mellanox.co.il>

The following patch adds experimental support for IPoIB connected mode.
The idea is to increase performance by increasing the MTU
from the maximum of 2K (theoretically 4K) supported by IPoIB on top of UD.
With this code, I'm able to get 800MByte/sec or more with netperf
without options on a Mellanox 4x back-to-back DDR system.

Some notes on code:
1. SRQ is used for scalability to large cluster sizes
2. Only RC connections are used (UC does not support SRQ now)
3. Retry count is set to 0 since spec draft warns against retries
4. Each connection is used for data transfers in only 1 direction,
   so each connection is either active(TX) or passive (RX).
   2 sides that want to communicate create 2 connections.
5. Each active (TX) connection has a separate CQ for send completions -
   this keeps the code simple without CQ resize and other tricks
6. To detect stale passive side connections (where the remote side
   is down), we keep an LRU list of passive connections (updated once
   per second per connection) and destroy a connection after it has been unused
   for several seconds. The LRU rule makes it possible to avoid scanning
   connections that have recently been active.
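
The LRU rule in note 6 can be sketched in user-space C as follows. This is
only an illustration under assumptions: the demo_* names, the sentinel-based
list and the integer clock are hypothetical and stand in for the kernel's
list_head, jiffies and the passive_ids list used by the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one passive (RX) connection. */
struct demo_conn {
	struct demo_conn *prev, *next;
	unsigned long last_used;	/* time of last recorded activity */
	int alive;
};

/* Circular list with a sentinel: head.next is the most recently used. */
struct demo_lru {
	struct demo_conn head;
};

static void demo_lru_init(struct demo_lru *lru)
{
	lru->head.prev = lru->head.next = &lru->head;
}

static void demo_unlink(struct demo_conn *c)
{
	c->prev->next = c->next;
	c->next->prev = c->prev;
}

static void demo_push_front(struct demo_lru *lru, struct demo_conn *c)
{
	c->next = lru->head.next;
	c->prev = &lru->head;
	lru->head.next->prev = c;
	lru->head.next = c;
}

static void demo_add(struct demo_lru *lru, struct demo_conn *c,
		     unsigned long now)
{
	c->last_used = now;
	c->alive = 1;
	demo_push_front(lru, c);
}

/* Record activity: refresh the timestamp and move the entry to the front. */
static void demo_touch(struct demo_lru *lru, struct demo_conn *c,
		       unsigned long now)
{
	assert(c->alive);
	c->last_used = now;
	demo_unlink(c);
	demo_push_front(lru, c);
}

/*
 * Reap stale entries starting from the tail (least recently used) and
 * stop at the first fresh one: everything in front of it is newer.
 * Returns the number of entries reaped.
 */
static int demo_reap(struct demo_lru *lru, unsigned long now,
		     unsigned long timeout)
{
	int reaped = 0;

	while (lru->head.prev != &lru->head) {
		struct demo_conn *c = lru->head.prev;

		if (now - c->last_used < timeout)
			break;
		demo_unlink(c);
		c->alive = 0;
		++reaped;
	}
	return reaped;
}
```

Because the list is kept in recency order, the reaper touches only the
stale tail plus one fresh entry, however many active connections exist.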

Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

---

OK, I have addressed the comment from Pradeep Satyanarayana
and added a small cosmetic improvement. This is hopefully the final
version for review, and the last call for comments before I request
an upstream merge in a couple of days.

Please review.

Besides the 2 cosmetic changes above, this is just a rebase on top of Roland's
for-linus branch, so it should be functionally equivalent to what's in -mm now.
However, and just for the record, I can't access the lab now and might not be
able to do this tomorrow either - so this patch was only compile-tested.

This applies on top of Roland's for-linus tree.

I still keep the sysfs flag to enable/disable CM - this is safe,
but maybe we can go back to only looking at the device MTU, now that
multicast works?

Changes from PATCHv5:
- with debug enabled, show qpn instead of a pointer - this is prettier
- rename ipoib_cm_modify_rx_rts to ipoib_cm_modify_rx_qp, since the RX QP
  actually stays in RTR. Thanks to Pradeep Satyanarayana <pradeep at us.ibm.com>
  for pointing this out.
- Reduce MTU on connected->datagram mode change

Changes from PATCHv4:
- Fix TX ring full recovery when TX ring is destroyed (bug 320)

Changes from PATCHv3:
- Fix TX ring full recovery
- Whitespace fix

Changes from PATCHv2:
- Using path MTU discovery, multicast and UDP traffic to UD mode now work,
  only a small number of packets is dropped.
- Use timer to clean up stale RX connections
- Make CM use same CQ IPoIB uses for UD (good for
  mixed UD/CM traffic and for NAPI if we ever enable it)
- Tone down warning messages - only some packets are now dropped
  in CM/UD setup

CM support is also still labeled as experimental and is disabled by
default, although it's been very stable for me, and the code is complete
as far as I'm concerned. Would it be easier to merge it this way?

Note that the connected mode support adds very little overhead when not activated
at run time, and zero data-path overhead when not activated at compile time.
Here's a short description of what the patch does:

a. The code is here:
git://git.openfabrics.org/~mst/linux-2.6 ipoib-cm-for-roland
>git show
will show this patch

b. How to activate:
Server:
#modprobe ib_ipoib
#echo connected > /sys/class/net/ib0/mode
#/sbin/ifconfig ib0 mtu 65520
#./netperf-2.4.2/src/netserver

Client:
#modprobe ib_ipoib
#echo connected > /sys/class/net/ib0/mode
#/sbin/ifconfig ib0 mtu 65520
#./netperf-2.4.2/src/netperf -H 11.4.3.68 -f M
        TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.4.3.68 (11.4.3.68)
        port 0 AF_INET : demo
        Recv   Send    Send
        Socket Socket  Message  Elapsed
        Size   Size    Size     Time     Throughput
        bytes  bytes   bytes    secs.    MBytes/sec

        87380  16384  16384    10.01     891.21

c. TODO list
(Optional) Send side S/G support

d. Limitations
With MTU > 2044, UDP multicast and UDP connections to IPoIB UD mode
will currently drop some packets, since we sometimes get packets that are
too large to send over a UD QP. Typically a single packet will be dropped
every several minutes until path MTU discovery kicks in and lowers
the path MTU to this destination.
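
The sizes involved can be worked through with a small sketch. The DEMO_*
names are hypothetical; the connected-mode formulas mirror the IPOIB_CM_*
enum values in the patch below, the 4 KiB page size is an assumption, and
2044 is taken here as the 2K IB MTU minus the 4-byte IPoIB encapsulation
header.

```c
#include <assert.h>

#define DEMO_PAGE_SIZE    4096u			/* assumption: 4 KiB pages */
#define DEMO_ENCAP_LEN    4u			/* IPoIB header length */
#define DEMO_UD_PAYLOAD   2048u			/* 2K IB MTU */
#define DEMO_UD_MTU       (DEMO_UD_PAYLOAD - DEMO_ENCAP_LEN)

#define DEMO_CM_MTU       (0x10000u - 0x10u)	/* as IPOIB_CM_MTU */
#define DEMO_CM_BUF_SIZE  (DEMO_CM_MTU + DEMO_ENCAP_LEN)
#define DEMO_CM_HEAD_SIZE (DEMO_CM_BUF_SIZE % DEMO_PAGE_SIZE)
#define DEMO_CM_RX_SG \
	((DEMO_CM_BUF_SIZE + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE)

/* Nonzero if an IP packet of ip_len bytes fits in a single UD send. */
static int demo_fits_ud(unsigned int ip_len)
{
	return ip_len <= DEMO_UD_MTU;
}
```

With these assumptions a packet built for the 65520-byte connected-mode MTU
does not fit a UD QP, which is the case where a drop occurs until the path
MTU for that destination is lowered.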


diff --git a/drivers/infiniband/ulp/ipoib/Kconfig b/drivers/infiniband/ulp/ipoib/Kconfig
index c75322d..0ffca11 100644
--- a/drivers/infiniband/ulp/ipoib/Kconfig
+++ b/drivers/infiniband/ulp/ipoib/Kconfig
@@ -8,6 +8,20 @@ config INFINIBAND_IPOIB
 
 	  See Documentation/infiniband/ipoib.txt for more information
 
+config INFINIBAND_IPOIB_CM
+	bool "IP-over-InfiniBand Connected Mode support"
+	depends on INFINIBAND_IPOIB && EXPERIMENTAL
+	default n
+	---help---
+	  This option enables experimental support for IPoIB connected mode.
+	  After enabling this option, you need to switch to connected mode through
+	  /sys/class/net/ibXXX/mode to actually create connections, and then increase
+	  the interface MTU with e.g. ifconfig ib0 mtu 65520.
+
+	  WARNING: Enabling connected mode will trigger some
+	  packet drops for multicast and UD mode traffic from this interface,
+	  unless you limit mtu for these destinations to 2044.
+
 config INFINIBAND_IPOIB_DEBUG
 	bool "IP-over-InfiniBand debugging" if EMBEDDED
 	depends on INFINIBAND_IPOIB
diff --git a/drivers/infiniband/ulp/ipoib/Makefile b/drivers/infiniband/ulp/ipoib/Makefile
index 8935e74..98ee38e 100644
--- a/drivers/infiniband/ulp/ipoib/Makefile
+++ b/drivers/infiniband/ulp/ipoib/Makefile
@@ -5,5 +5,6 @@ ib_ipoib-y					:= ipoib_main.o \
 						   ipoib_multicast.o \
 						   ipoib_verbs.o \
 						   ipoib_vlan.o
+ib_ipoib-$(CONFIG_INFINIBAND_IPOIB_CM)		+= ipoib_cm.o
 ib_ipoib-$(CONFIG_INFINIBAND_IPOIB_DEBUG)	+= ipoib_fs.o
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 07deee8..8082d50 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -62,6 +62,10 @@ enum {
 
 	IPOIB_ENCAP_LEN 	  = 4,
 
+	IPOIB_CM_MTU              = 0x10000 - 0x10, /* padding to align header to 16 */
+	IPOIB_CM_BUF_SIZE         = IPOIB_CM_MTU  + IPOIB_ENCAP_LEN,
+	IPOIB_CM_HEAD_SIZE 	  = IPOIB_CM_BUF_SIZE % PAGE_SIZE,
+	IPOIB_CM_RX_SG            = ALIGN(IPOIB_CM_BUF_SIZE, PAGE_SIZE) / PAGE_SIZE,
 	IPOIB_RX_RING_SIZE 	  = 128,
 	IPOIB_TX_RING_SIZE 	  = 64,
 	IPOIB_MAX_QUEUE_SIZE	  = 8192,
@@ -81,6 +85,8 @@ enum {
 	IPOIB_MCAST_RUN 	  = 6,
 	IPOIB_STOP_REAPER         = 7,
 	IPOIB_MCAST_STARTED       = 8,
+	IPOIB_FLAG_NETIF_STOPPED  = 9,
+	IPOIB_FLAG_ADMIN_CM 	  = 10,
 
 	IPOIB_MAX_BACKOFF_SECONDS = 16,
 
@@ -90,6 +96,14 @@ enum {
 	IPOIB_MCAST_FLAG_ATTACHED = 3,
 };
 
+
+#define	IPOIB_OP_RECV   (1ul << 31)
+#ifdef CONFIG_INFINIBAND_IPOIB_CM
+#define	IPOIB_CM_OP_SRQ (1ul << 30)
+#else
+#define	IPOIB_CM_OP_SRQ (0)
+#endif
+
 /* structs */
 
 struct ipoib_header {
@@ -113,6 +127,61 @@ struct ipoib_tx_buf {
 	u64		mapping;
 };
 
+#ifdef CONFIG_INFINIBAND_IPOIB_CM
+struct ib_cm_id;
+
+struct ipoib_cm_data {
+	__be32 qpn; /* High byte MUST be ignored on receive */
+	__be32 mtu;
+};
+
+struct ipoib_cm_rx {
+	struct ib_cm_id     *id;
+	struct ib_qp        *qp;
+	struct list_head     list;
+	struct net_device   *dev;
+	unsigned long        jiffies;
+};
+
+struct ipoib_cm_tx {
+	struct ib_cm_id     *id;
+	struct ib_cq        *cq;
+	struct ib_qp        *qp;
+	struct list_head     list;
+	struct net_device   *dev;
+	struct ipoib_neigh  *neigh;
+	struct ipoib_path   *path;
+	struct ipoib_tx_buf *tx_ring;
+	unsigned             tx_head;
+	unsigned             tx_tail;
+	unsigned long        flags;
+	u32                  mtu;
+	struct ib_wc         ibwc[IPOIB_NUM_WC];
+};
+
+struct ipoib_cm_rx_buf {
+	struct sk_buff *skb;
+	u64 mapping[IPOIB_CM_RX_SG];
+};
+
+struct ipoib_cm_dev_priv {
+	struct ib_srq  	       *srq;
+	struct ipoib_cm_rx_buf *srq_ring;
+	struct ib_cm_id        *id;
+	struct list_head        passive_ids;
+	struct work_struct      start_task;
+	struct work_struct      reap_task;
+	struct work_struct      skb_task;
+	struct delayed_work     stale_task;
+	struct sk_buff_head     skb_queue;
+	struct list_head        start_list;
+	struct list_head        reap_list;
+	struct ib_wc            ibwc[IPOIB_NUM_WC];
+	struct ib_sge           rx_sge[IPOIB_CM_RX_SG];
+	struct ib_recv_wr       rx_wr;
+};
+
+#endif
 /*
  * Device private locking: tx_lock protects members used in TX fast
  * path (and we use LLTX so upper layers don't do extra locking).
@@ -179,6 +248,10 @@ struct ipoib_dev_priv {
 	struct list_head child_intfs;
 	struct list_head list;
 
+#ifdef CONFIG_INFINIBAND_IPOIB_CM
+	struct ipoib_cm_dev_priv cm;
+#endif
+
 #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG
 	struct list_head fs_list;
 	struct dentry *mcg_dentry;
@@ -212,6 +285,9 @@ struct ipoib_path {
 
 struct ipoib_neigh {
 	struct ipoib_ah    *ah;
+#ifdef CONFIG_INFINIBAND_IPOIB_CM
+	struct ipoib_cm_tx *cm;
+#endif
 	union ib_gid        dgid;
 	struct sk_buff_head queue;
 
@@ -315,6 +391,146 @@ int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey);
 void ipoib_pkey_poll(struct work_struct *work);
 int ipoib_pkey_dev_delay_open(struct net_device *dev);
 
+#ifdef CONFIG_INFINIBAND_IPOIB_CM
+
+#define IPOIB_FLAGS_RC          0x80
+#define IPOIB_FLAGS_UC          0x40
+
+/* We don't support UC connections at the moment */
+#define IPOIB_CM_SUPPORTED(ha)   (ha[0] & (IPOIB_FLAGS_RC))
+
+static inline int ipoib_cm_admin_enabled(struct net_device *dev)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	return IPOIB_CM_SUPPORTED(dev->dev_addr) &&
+		test_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags);
+}
+
+static inline int ipoib_cm_enabled(struct net_device *dev, struct neighbour *n)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	return IPOIB_CM_SUPPORTED(n->ha) &&
+		test_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags);
+}
+
+static inline int ipoib_cm_up(struct ipoib_neigh *neigh)
+
+{
+	return test_bit(IPOIB_FLAG_OPER_UP, &neigh->cm->flags);
+}
+
+static inline struct ipoib_cm_tx *ipoib_cm_get(struct ipoib_neigh *neigh)
+{
+	return neigh->cm;
+}
+
+static inline void ipoib_cm_set(struct ipoib_neigh *neigh, struct ipoib_cm_tx *tx)
+{
+	neigh->cm = tx;
+}
+
+void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_tx *tx);
+int ipoib_cm_dev_open(struct net_device *dev);
+void ipoib_cm_dev_stop(struct net_device *dev);
+int ipoib_cm_dev_init(struct net_device *dev);
+int ipoib_cm_add_mode_attr(struct net_device *dev);
+void ipoib_cm_dev_cleanup(struct net_device *dev);
+struct ipoib_cm_tx *ipoib_cm_create_tx(struct net_device *dev, struct ipoib_path *path,
+				    struct ipoib_neigh *neigh);
+void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx);
+void ipoib_cm_skb_too_long(struct net_device* dev, struct sk_buff *skb,
+			   unsigned int mtu);
+void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc);
+#else
+
+struct ipoib_cm_tx;
+
+static inline int ipoib_cm_admin_enabled(struct net_device *dev)
+{
+	return 0;
+}
+static inline int ipoib_cm_enabled(struct net_device *dev, struct neighbour *n)
+
+{
+	return 0;
+}
+
+static inline int ipoib_cm_up(struct ipoib_neigh *neigh)
+
+{
+	return 0;
+}
+
+static inline struct ipoib_cm_tx *ipoib_cm_get(struct ipoib_neigh *neigh)
+{
+	return NULL;
+}
+
+static inline void ipoib_cm_set(struct ipoib_neigh *neigh, struct ipoib_cm_tx *tx)
+{
+}
+
+static inline
+void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_tx *tx)
+{
+	return;
+}
+
+static inline
+int ipoib_cm_dev_open(struct net_device *dev)
+{
+	return 0;
+}
+
+static inline
+void ipoib_cm_dev_stop(struct net_device *dev)
+{
+	return;
+}
+
+static inline
+int ipoib_cm_dev_init(struct net_device *dev)
+{
+	return -ENOSYS;
+}
+
+static inline
+void ipoib_cm_dev_cleanup(struct net_device *dev)
+{
+	return;
+}
+
+static inline
+struct ipoib_cm_tx *ipoib_cm_create_tx(struct net_device *dev, struct ipoib_path *path,
+				    struct ipoib_neigh *neigh)
+{
+	return NULL;
+}
+
+static inline
+void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx)
+{
+	return;
+}
+
+static inline
+int ipoib_cm_add_mode_attr(struct net_device *dev)
+{
+	return 0;
+}
+
+static inline void ipoib_cm_skb_too_long(struct net_device* dev, struct sk_buff *skb,
+					 unsigned int mtu)
+{
+	dev_kfree_skb_any(skb);
+}
+
+static inline void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
+{
+}
+
+#endif
+
 #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG
 void ipoib_create_debug_files(struct net_device *dev);
 void ipoib_delete_debug_files(struct net_device *dev);
@@ -392,4 +608,6 @@ extern int ipoib_debug_level;
 
 #define IPOIB_GID_ARG(gid)	IPOIB_GID_RAW_ARG((gid).raw)
 
+#define IPOIB_QPN(ha) (be32_to_cpup((__be32 *) ha) & 0xffffff)
+
 #endif /* _IPOIB_H */
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
new file mode 100644
index 0000000..a618a40
--- /dev/null
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -0,0 +1,1236 @@
+/*
+ * Copyright (c) 2006 Mellanox Technologies. All rights reserved
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * $Id$
+ */
+
+#include <rdma/ib_cm.h>
+#include <rdma/ib_cache.h>
+#include <net/dst.h>
+#include <net/icmp.h>
+
+#ifdef CONFIG_IPV6
+#include <linux/icmpv6.h>
+#endif
+
+#ifdef CONFIG_INFINIBAND_IPOIB_DEBUG_DATA
+static int data_debug_level;
+
+module_param_named(cm_data_debug_level, data_debug_level, int, 0644);
+MODULE_PARM_DESC(cm_data_debug_level,
+		 "Enable data path debug tracing for connected mode if > 0");
+#endif
+
+#include "ipoib.h"
+
+#define IPOIB_CM_IETF_ID 0x1000000000000000ULL
+
+#define IPOIB_CM_RX_UPDATE_TIME (256 * HZ)
+#define IPOIB_CM_RX_TIMEOUT     (2 * 256 * HZ)
+#define IPOIB_CM_RX_DELAY       (3 * 256 * HZ)
+#define IPOIB_CM_RX_UPDATE_MASK (0x3)
+
+struct ipoib_cm_id {
+	struct ib_cm_id *id;
+	int flags;
+	u32 remote_qpn;
+	u32 remote_mtu;
+};
+
+static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id,
+			       struct ib_cm_event *event);
+
+static void ipoib_cm_dma_unmap_rx(struct ipoib_dev_priv *priv,
+				  u64 mapping[IPOIB_CM_RX_SG])
+{
+	int i;
+
+	ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_CM_HEAD_SIZE, DMA_FROM_DEVICE);
+
+	for (i = 0; i < IPOIB_CM_RX_SG - 1; ++i)
+		ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE, DMA_FROM_DEVICE);
+}
+
+static int ipoib_cm_post_receive(struct net_device *dev, int id)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ib_recv_wr *bad_wr;
+	int i, ret;
+
+	priv->cm.rx_wr.wr_id = id | IPOIB_CM_OP_SRQ;
+
+	for (i = 0; i < IPOIB_CM_RX_SG; ++i)
+		priv->cm.rx_sge[i].addr = priv->cm.srq_ring[id].mapping[i];
+
+	ret = ib_post_srq_recv(priv->cm.srq, &priv->cm.rx_wr, &bad_wr);
+	if (unlikely(ret)) {
+		ipoib_warn(priv, "post srq failed for buf %d (%d)\n", id, ret);
+		ipoib_cm_dma_unmap_rx(priv, priv->cm.srq_ring[id].mapping);
+		dev_kfree_skb_any(priv->cm.srq_ring[id].skb);
+		priv->cm.srq_ring[id].skb = NULL;
+	}
+
+	return ret;
+}
+
+static int ipoib_cm_alloc_rx_skb(struct net_device *dev, int id,
+				 u64 mapping[IPOIB_CM_RX_SG])
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct sk_buff *skb;
+	int i;
+
+	skb = dev_alloc_skb(IPOIB_CM_HEAD_SIZE + 12);
+	if (unlikely(!skb))
+		return -ENOMEM;
+
+	/*
+	 * IPoIB adds a 4 byte header. So we need 12 more bytes to align the
+	 * IP header to a multiple of 16.
+	 */
+	skb_reserve(skb, 12);
+
+	mapping[0] = ib_dma_map_single(priv->ca, skb->data, IPOIB_CM_HEAD_SIZE,
+				       DMA_FROM_DEVICE);
+	if (unlikely(ib_dma_mapping_error(priv->ca, mapping[0]))) {
+		dev_kfree_skb_any(skb);
+		return -EIO;
+	}
+
+	for (i = 0; i < IPOIB_CM_RX_SG - 1; i++) {
+		struct page *page = alloc_page(GFP_ATOMIC);
+
+		if (!page)
+			goto partial_error;
+		skb_fill_page_desc(skb, i, page, 0, PAGE_SIZE);
+
+		mapping[i + 1] = ib_dma_map_page(priv->ca, skb_shinfo(skb)->frags[i].page,
+						 0, PAGE_SIZE, DMA_FROM_DEVICE);
+		if (unlikely(ib_dma_mapping_error(priv->ca, mapping[i + 1])))
+			goto partial_error;
+	}
+
+	priv->cm.srq_ring[id].skb = skb;
+	return 0;
+
+partial_error:
+
+	ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_CM_HEAD_SIZE, DMA_FROM_DEVICE);
+
+	for (; i >= 0; --i)
+		ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE, DMA_FROM_DEVICE);
+
+	kfree_skb(skb);
+	return -ENOMEM;
+}
+
+static struct ib_qp *ipoib_cm_create_rx_qp(struct net_device *dev,
+					   struct ipoib_cm_rx *p)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ib_qp_init_attr attr = {
+		.send_cq = priv->cq, /* does not matter, we never send anything */
+		.recv_cq = priv->cq,
+		.srq = priv->cm.srq,
+		.cap.max_send_wr = 1, /* FIXME: 0 Seems not to work */
+		.cap.max_send_sge = 1, /* FIXME: 0 Seems not to work */
+		.sq_sig_type = IB_SIGNAL_ALL_WR,
+		.qp_type = IB_QPT_RC,
+		.qp_context = p,
+	};
+	return ib_create_qp(priv->pd, &attr);
+}
+
+static int ipoib_cm_modify_rx_qp(struct net_device *dev,
+				  struct ib_cm_id *cm_id, struct ib_qp *qp)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ib_qp_attr qp_attr;
+	int qp_attr_mask, ret;
+
+	qp_attr.qp_state = IB_QPS_INIT;
+	ret = ib_cm_init_qp_attr(cm_id, &qp_attr, &qp_attr_mask);
+	if (ret) {
+		ipoib_warn(priv, "failed to init QP attr for INIT: %d\n", ret);
+		return ret;
+	}
+	ret = ib_modify_qp(qp, &qp_attr, qp_attr_mask);
+	if (ret) {
+		ipoib_warn(priv, "failed to modify QP to INIT: %d\n", ret);
+		return ret;
+	}
+	qp_attr.qp_state = IB_QPS_RTR;
+	ret = ib_cm_init_qp_attr(cm_id, &qp_attr, &qp_attr_mask);
+	if (ret) {
+		ipoib_warn(priv, "failed to init QP attr for RTR: %d\n", ret);
+		return ret;
+	}
+	qp_attr.rq_psn = 0 /* FIXME */;
+	ret = ib_modify_qp(qp, &qp_attr, qp_attr_mask);
+	if (ret) {
+		ipoib_warn(priv, "failed to modify QP to RTR: %d\n", ret);
+		return ret;
+	}
+	return 0;
+}
+
+static int ipoib_cm_send_rep(struct net_device *dev, struct ib_cm_id *cm_id,
+			     struct ib_qp *qp, struct ib_cm_req_event_param *req)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_cm_data data = {};
+	struct ib_cm_rep_param rep = {};
+
+	data.qpn = cpu_to_be32(priv->qp->qp_num);
+	data.mtu = cpu_to_be32(IPOIB_CM_BUF_SIZE);
+
+	rep.private_data = &data;
+	rep.private_data_len = sizeof data;
+	rep.flow_control = 0;
+	rep.rnr_retry_count = req->rnr_retry_count;
+	rep.target_ack_delay = 20; /* FIXME */
+	rep.srq = 1;
+	rep.qp_num = qp->qp_num;
+	rep.starting_psn = 0 /* FIXME */;
+	return ib_send_cm_rep(cm_id, &rep);
+}
+
+static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
+{
+	struct net_device *dev = cm_id->context;
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_cm_rx *p;
+	unsigned long flags;
+	int ret;
+
+	ipoib_dbg(priv, "REQ arrived\n");
+	p = kzalloc(sizeof *p, GFP_KERNEL);
+	if (!p)
+		return -ENOMEM;
+	p->dev = dev;
+	p->id = cm_id;
+	p->qp = ipoib_cm_create_rx_qp(dev, p);
+	if (IS_ERR(p->qp)) {
+		ret = PTR_ERR(p->qp);
+		goto err_qp;
+	}
+
+	ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp);
+	if (ret)
+		goto err_modify;
+
+	ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd);
+	if (ret) {
+		ipoib_warn(priv, "failed to send REP: %d\n", ret);
+		goto err_rep;
+	}
+
+	cm_id->context = p;
+	p->jiffies = jiffies;
+	spin_lock_irqsave(&priv->lock, flags);
+	list_add(&p->list, &priv->cm.passive_ids);
+	spin_unlock_irqrestore(&priv->lock, flags);
+	queue_delayed_work(ipoib_workqueue,
+			   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
+	return 0;
+
+err_rep:
+err_modify:
+	ib_destroy_qp(p->qp);
+err_qp:
+	kfree(p);
+	return ret;
+}
+
+static int ipoib_cm_rx_handler(struct ib_cm_id *cm_id,
+			       struct ib_cm_event *event)
+{
+	struct ipoib_cm_rx *p;
+	struct ipoib_dev_priv *priv;
+	unsigned long flags;
+	int ret;
+
+	switch (event->event) {
+	case IB_CM_REQ_RECEIVED:
+		return ipoib_cm_req_handler(cm_id, event);
+	case IB_CM_DREQ_RECEIVED:
+		p = cm_id->context;
+		ib_send_cm_drep(cm_id, NULL, 0);
+		/* Fall through */
+	case IB_CM_REJ_RECEIVED:
+		p = cm_id->context;
+		priv = netdev_priv(p->dev);
+		spin_lock_irqsave(&priv->lock, flags);
+		if (list_empty(&p->list))
+			ret = 0; /* Connection is going away already. */
+		else {
+			list_del_init(&p->list);
+			ret = -ECONNRESET;
+		}
+		spin_unlock_irqrestore(&priv->lock, flags);
+		if (ret) {
+			ib_destroy_qp(p->qp);
+			kfree(p);
+			return ret;
+		}
+		return 0;
+	default:
+		return 0;
+	}
+}
+/* Adjust length of skb with fragments to match received data */
+static void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space,
+			  unsigned int length)
+{
+	int i, num_frags;
+	unsigned int size;
+
+	/* put header into skb */
+	size = min(length, hdr_space);
+	skb->tail += size;
+	skb->len += size;
+	length -= size;
+
+	num_frags = skb_shinfo(skb)->nr_frags;
+	for (i = 0; i < num_frags; i++) {
+		skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+
+		if (length == 0) {
+			/* don't need this page */
+			__free_page(frag->page);
+			--skb_shinfo(skb)->nr_frags;
+		} else {
+			size = min(length, (unsigned) PAGE_SIZE);
+
+			frag->size = size;
+			skb->data_len += size;
+			skb->truesize += size;
+			skb->len += size;
+			length -= size;
+		}
+	}
+}
+
+void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	unsigned int wr_id = wc->wr_id & ~IPOIB_CM_OP_SRQ;
+	struct sk_buff *skb;
+	struct ipoib_cm_rx *p;
+	unsigned long flags;
+	u64 mapping[IPOIB_CM_RX_SG];
+
+	ipoib_dbg_data(priv, "cm recv completion: id %d, op %d, status: %d\n",
+		       wr_id, wc->opcode, wc->status);
+
+	if (unlikely(wr_id >= ipoib_recvq_size)) {
+		ipoib_warn(priv, "cm recv completion event with wrid %d (> %d)\n",
+			   wr_id, ipoib_recvq_size);
+		return;
+	}
+
+	skb  = priv->cm.srq_ring[wr_id].skb;
+
+	if (unlikely(wc->status != IB_WC_SUCCESS)) {
+		ipoib_dbg(priv, "cm recv error "
+			   "(status=%d, wrid=%d vend_err %x)\n",
+			   wc->status, wr_id, wc->vendor_err);
+		++priv->stats.rx_dropped;
+		goto repost;
+	}
+
+	if (!likely(wr_id & IPOIB_CM_RX_UPDATE_MASK)) {
+		p = wc->qp->qp_context;
+		if (time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_UPDATE_TIME)) {
+			spin_lock_irqsave(&priv->lock, flags);
+			p->jiffies = jiffies;
+			/* Move this entry to list head, but do
+			 * not re-add it if it has been removed. */
+			if (!list_empty(&p->list))
+				list_move(&p->list, &priv->cm.passive_ids);
+			spin_unlock_irqrestore(&priv->lock, flags);
+			queue_delayed_work(ipoib_workqueue,
+					   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
+		}
+	}
+
+	if (unlikely(ipoib_cm_alloc_rx_skb(dev, wr_id, mapping))) {
+		/*
+		 * If we can't allocate a new RX buffer, dump
+		 * this packet and reuse the old buffer.
+		 */
+		ipoib_dbg(priv, "failed to allocate receive buffer %d\n", wr_id);
+		++priv->stats.rx_dropped;
+		goto repost;
+	}
+
+	ipoib_cm_dma_unmap_rx(priv, priv->cm.srq_ring[wr_id].mapping);
+	memcpy(priv->cm.srq_ring[wr_id].mapping, mapping, sizeof mapping);
+
+	ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n",
+		       wc->byte_len, wc->slid);
+
+	skb_put_frags(skb, IPOIB_CM_HEAD_SIZE, wc->byte_len);
+
+	skb->protocol = ((struct ipoib_header *) skb->data)->proto;
+	skb->mac.raw = skb->data;
+	skb_pull(skb, IPOIB_ENCAP_LEN);
+
+	dev->last_rx = jiffies;
+	++priv->stats.rx_packets;
+	priv->stats.rx_bytes += skb->len;
+
+	skb->dev = dev;
+	/* XXX get correct PACKET_ type here */
+	skb->pkt_type = PACKET_HOST;
+	netif_rx_ni(skb);
+
+repost:
+	if (unlikely(ipoib_cm_post_receive(dev, wr_id)))
+		ipoib_warn(priv, "ipoib_cm_post_receive failed "
+			   "for buf %d\n", wr_id);
+}
+
+static inline int post_send(struct ipoib_dev_priv *priv,
+			    struct ipoib_cm_tx *tx,
+			    unsigned int wr_id,
+			    u64 addr, int len)
+{
+	struct ib_send_wr *bad_wr;
+
+	priv->tx_sge.addr             = addr;
+	priv->tx_sge.length           = len;
+
+	priv->tx_wr.wr_id 	      = wr_id;
+
+	return ib_post_send(tx->qp, &priv->tx_wr, &bad_wr);
+}
+
+void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_tx *tx)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_tx_buf *tx_req;
+	u64 addr;
+
+	if (unlikely(skb->len > tx->mtu)) {
+		ipoib_warn(priv, "packet len %d (> %d) too long to send, dropping\n",
+			   skb->len, tx->mtu);
+		++priv->stats.tx_dropped;
+		++priv->stats.tx_errors;
+		ipoib_cm_skb_too_long(dev, skb, tx->mtu - INFINIBAND_ALEN);
+		return;
+	}
+
+	ipoib_dbg_data(priv, "sending packet: head 0x%x length %d connection 0x%x\n",
+		       tx->tx_head, skb->len, tx->qp->qp_num);
+
+	/*
+	 * We put the skb into the tx_ring _before_ we call post_send()
+	 * because it's entirely possible that the completion handler will
+	 * run before we execute anything after the post_send().  That
+	 * means we have to make sure everything is properly recorded and
+	 * our state is consistent before we call post_send().
+	 */
+	tx_req = &tx->tx_ring[tx->tx_head & (ipoib_sendq_size - 1)];
+	tx_req->skb = skb;
+	addr = ib_dma_map_single(priv->ca, skb->data, skb->len, DMA_TO_DEVICE);
+	if (unlikely(ib_dma_mapping_error(priv->ca, addr))) {
+		++priv->stats.tx_errors;
+		dev_kfree_skb_any(skb);
+		return;
+	}
+
+	tx_req->mapping = addr;
+
+	if (unlikely(post_send(priv, tx, tx->tx_head & (ipoib_sendq_size - 1),
+			        addr, skb->len))) {
+		ipoib_warn(priv, "post_send failed\n");
+		++priv->stats.tx_errors;
+		ib_dma_unmap_single(priv->ca, addr, skb->len, DMA_TO_DEVICE);
+		dev_kfree_skb_any(skb);
+	} else {
+		dev->trans_start = jiffies;
+		++tx->tx_head;
+
+		if (tx->tx_head - tx->tx_tail == ipoib_sendq_size) {
+			ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net queue\n",
+				  tx->qp->qp_num);
+			netif_stop_queue(dev);
+			set_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags);
+		}
+	}
+}
+
+static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx,
+				  struct ib_wc *wc)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	unsigned int wr_id = wc->wr_id;
+	struct ipoib_tx_buf *tx_req;
+	unsigned long flags;
+
+	ipoib_dbg_data(priv, "cm send completion: id %d, op %d, status: %d\n",
+		       wr_id, wc->opcode, wc->status);
+
+	if (unlikely(wr_id >= ipoib_sendq_size)) {
+		ipoib_warn(priv, "cm send completion event with wrid %d (> %d)\n",
+			   wr_id, ipoib_sendq_size);
+		return;
+	}
+
+	tx_req = &tx->tx_ring[wr_id];
+
+	ib_dma_unmap_single(priv->ca, tx_req->mapping, tx_req->skb->len, DMA_TO_DEVICE);
+
+	/* FIXME: is this right? Shouldn't we only increment on success? */
+	++priv->stats.tx_packets;
+	priv->stats.tx_bytes += tx_req->skb->len;
+
+	dev_kfree_skb_any(tx_req->skb);
+
+	spin_lock_irqsave(&priv->tx_lock, flags);
+	++tx->tx_tail;
+	if (unlikely(test_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags)) &&
+	    tx->tx_head - tx->tx_tail <= ipoib_sendq_size >> 1) {
+		clear_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags);
+		netif_wake_queue(dev);
+	}
+
+	if (wc->status != IB_WC_SUCCESS &&
+	    wc->status != IB_WC_WR_FLUSH_ERR) {
+		struct ipoib_neigh *neigh;
+
+		ipoib_dbg(priv, "failed cm send event "
+			   "(status=%d, wrid=%d vend_err %x)\n",
+			   wc->status, wr_id, wc->vendor_err);
+
+		spin_lock(&priv->lock);
+		neigh = tx->neigh;
+
+		if (neigh) {
+			neigh->cm = NULL;
+			list_del(&neigh->list);
+			if (neigh->ah)
+				ipoib_put_ah(neigh->ah);
+			ipoib_neigh_free(dev, neigh);
+
+			tx->neigh = NULL;
+		}
+
+		/* queue would be re-started anyway when TX is destroyed,
+		 * but it makes sense to do it ASAP here. */
+		if (test_and_clear_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags))
+			netif_wake_queue(dev);
+
+		if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) {
+			list_move(&tx->list, &priv->cm.reap_list);
+			queue_work(ipoib_workqueue, &priv->cm.reap_task);
+		}
+
+		clear_bit(IPOIB_FLAG_OPER_UP, &tx->flags);
+
+		spin_unlock(&priv->lock);
+	}
+
+	spin_unlock_irqrestore(&priv->tx_lock, flags);
+}
+
+static void ipoib_cm_tx_completion(struct ib_cq *cq, void *tx_ptr)
+{
+	struct ipoib_cm_tx *tx = tx_ptr;
+	int n, i;
+
+	ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
+	do {
+		n = ib_poll_cq(cq, IPOIB_NUM_WC, tx->ibwc);
+		for (i = 0; i < n; ++i)
+			ipoib_cm_handle_tx_wc(tx->dev, tx, tx->ibwc + i);
+	} while (n == IPOIB_NUM_WC);
+}
+
+int ipoib_cm_dev_open(struct net_device *dev)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	int ret;
+
+	if (!IPOIB_CM_SUPPORTED(dev->dev_addr))
+		return 0;
+
+	priv->cm.id = ib_create_cm_id(priv->ca, ipoib_cm_rx_handler, dev);
+	if (IS_ERR(priv->cm.id)) {
+		printk(KERN_WARNING "%s: failed to create CM ID\n", priv->ca->name);
+		return PTR_ERR(priv->cm.id);
+	}
+
+	ret = ib_cm_listen(priv->cm.id, cpu_to_be64(IPOIB_CM_IETF_ID | priv->qp->qp_num),
+			   0, NULL);
+	if (ret) {
+		printk(KERN_WARNING "%s: failed to listen on ID 0x%llx\n", priv->ca->name,
+		       IPOIB_CM_IETF_ID | priv->qp->qp_num);
+		ib_destroy_cm_id(priv->cm.id);
+		return ret;
+	}
+	return 0;
+}
+
+void ipoib_cm_dev_stop(struct net_device *dev)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_cm_rx *p;
+	unsigned long flags;
+
+	if (!IPOIB_CM_SUPPORTED(dev->dev_addr))
+		return;
+
+	ib_destroy_cm_id(priv->cm.id);
+	spin_lock_irqsave(&priv->lock, flags);
+	while (!list_empty(&priv->cm.passive_ids)) {
+		p = list_entry(priv->cm.passive_ids.next, typeof(*p), list);
+		list_del_init(&p->list);
+		spin_unlock_irqrestore(&priv->lock, flags);
+		ib_destroy_cm_id(p->id);
+		ib_destroy_qp(p->qp);
+		kfree(p);
+		spin_lock_irqsave(&priv->lock, flags);
+	}
+	spin_unlock_irqrestore(&priv->lock, flags);
+
+	cancel_delayed_work(&priv->cm.stale_task);
+}
+
+static int ipoib_cm_rep_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
+{
+	struct ipoib_cm_tx *p = cm_id->context;
+	struct ipoib_dev_priv *priv = netdev_priv(p->dev);
+	struct ipoib_cm_data *data = event->private_data;
+	struct sk_buff_head skqueue;
+	struct ib_qp_attr qp_attr;
+	int qp_attr_mask, ret;
+	struct sk_buff *skb;
+	unsigned long flags;
+
+	p->mtu = be32_to_cpu(data->mtu);
+
+	if (p->mtu < priv->dev->mtu + IPOIB_ENCAP_LEN) {
+		ipoib_warn(priv, "Rejecting connection: mtu %d < device mtu %d + 4\n",
+			   p->mtu, priv->dev->mtu);
+		return -EINVAL;
+	}
+
+	qp_attr.qp_state = IB_QPS_RTR;
+	ret = ib_cm_init_qp_attr(cm_id, &qp_attr, &qp_attr_mask);
+	if (ret) {
+		ipoib_warn(priv, "failed to init QP attr for RTR: %d\n", ret);
+		return ret;
+	}
+
+	qp_attr.rq_psn = 0 /* FIXME */;
+	ret = ib_modify_qp(p->qp, &qp_attr, qp_attr_mask);
+	if (ret) {
+		ipoib_warn(priv, "failed to modify QP to RTR: %d\n", ret);
+		return ret;
+	}
+
+	qp_attr.qp_state = IB_QPS_RTS;
+	ret = ib_cm_init_qp_attr(cm_id, &qp_attr, &qp_attr_mask);
+	if (ret) {
+		ipoib_warn(priv, "failed to init QP attr for RTS: %d\n", ret);
+		return ret;
+	}
+	ret = ib_modify_qp(p->qp, &qp_attr, qp_attr_mask);
+	if (ret) {
+		ipoib_warn(priv, "failed to modify QP to RTS: %d\n", ret);
+		return ret;
+	}
+
+	skb_queue_head_init(&skqueue);
+
+	spin_lock_irqsave(&priv->lock, flags);
+	set_bit(IPOIB_FLAG_OPER_UP, &p->flags);
+	if (p->neigh)
+		while ((skb = __skb_dequeue(&p->neigh->queue)))
+			__skb_queue_tail(&skqueue, skb);
+	spin_unlock_irqrestore(&priv->lock, flags);
+
+	while ((skb = __skb_dequeue(&skqueue))) {
+		skb->dev = p->dev;
+		if (dev_queue_xmit(skb))
+			ipoib_warn(priv, "dev_queue_xmit failed "
+				   "to requeue packet\n");
+	}
+
+	ret = ib_send_cm_rtu(cm_id, NULL, 0);
+	if (ret) {
+		ipoib_warn(priv, "failed to send RTU: %d\n", ret);
+		return ret;
+	}
+	return 0;
+}
+
+static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ib_cq *cq)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ib_qp_init_attr attr = {};
+	attr.recv_cq = priv->cq;
+	attr.srq = priv->cm.srq;
+	attr.cap.max_send_wr = ipoib_sendq_size;
+	attr.cap.max_send_sge = 1;
+	attr.sq_sig_type = IB_SIGNAL_ALL_WR;
+	attr.qp_type = IB_QPT_RC;
+	attr.send_cq = cq;
+	return ib_create_qp(priv->pd, &attr);
+}
+
+static int ipoib_cm_send_req(struct net_device *dev,
+			     struct ib_cm_id *id, struct ib_qp *qp,
+			     u32 qpn,
+			     struct ib_sa_path_rec *pathrec)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_cm_data data = {};
+	struct ib_cm_req_param req = {};
+
+	data.qpn = cpu_to_be32(priv->qp->qp_num);
+	data.mtu = cpu_to_be32(IPOIB_CM_BUF_SIZE);
+
+	req.primary_path 	      = pathrec;
+	req.alternate_path 	      = NULL;
+	req.service_id                = cpu_to_be64(IPOIB_CM_IETF_ID | qpn);
+	req.qp_num 		      = qp->qp_num;
+	req.qp_type 		      = qp->qp_type;
+	req.private_data 	      = &data;
+	req.private_data_len 	      = sizeof data;
+	req.flow_control 	      = 0;
+
+	req.starting_psn              = 0; /* FIXME */
+
+	/*
+	 * Pick some arbitrary defaults here; we could make these
+	 * module parameters if anyone cared about setting them.
+	 */
+	req.responder_resources	      = 4;
+	req.remote_cm_response_timeout = 20;
+	req.local_cm_response_timeout  = 20;
+	req.retry_count 	      = 0; /* RFC draft warns against retries */
+	req.rnr_retry_count 	      = 0; /* RFC draft warns against retries */
+	req.max_cm_retries 	      = 15;
+	req.srq 	              = 1;
+	return ib_send_cm_req(id, &req);
+}
+
+static int ipoib_cm_modify_tx_init(struct net_device *dev,
+				  struct ib_cm_id *cm_id, struct ib_qp *qp)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ib_qp_attr qp_attr;
+	int qp_attr_mask, ret;
+	ret = ib_find_cached_pkey(priv->ca, priv->port, priv->pkey, &qp_attr.pkey_index);
+	if (ret) {
+		ipoib_warn(priv, "pkey 0x%x not in cache: %d\n", priv->pkey, ret);
+		return ret;
+	}
+
+	qp_attr.qp_state = IB_QPS_INIT;
+	qp_attr.qp_access_flags = IB_ACCESS_LOCAL_WRITE;
+	qp_attr.port_num = priv->port;
+	qp_attr_mask = IB_QP_STATE | IB_QP_ACCESS_FLAGS | IB_QP_PKEY_INDEX | IB_QP_PORT;
+
+	ret = ib_modify_qp(qp, &qp_attr, qp_attr_mask);
+	if (ret) {
+		ipoib_warn(priv, "failed to modify tx QP to INIT: %d\n", ret);
+		return ret;
+	}
+	return 0;
+}
+
+static int ipoib_cm_tx_init(struct ipoib_cm_tx *p, u32 qpn,
+			    struct ib_sa_path_rec *pathrec)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(p->dev);
+	int ret;
+
+	p->tx_ring = kzalloc(ipoib_sendq_size * sizeof *p->tx_ring,
+				GFP_KERNEL);
+	if (!p->tx_ring) {
+		ipoib_warn(priv, "failed to allocate tx ring\n");
+		ret = -ENOMEM;
+		goto err_tx;
+	}
+
+	p->cq = ib_create_cq(priv->ca, ipoib_cm_tx_completion, NULL, p,
+			     ipoib_sendq_size + 1);
+	if (IS_ERR(p->cq)) {
+		ret = PTR_ERR(p->cq);
+		ipoib_warn(priv, "failed to allocate tx cq: %d\n", ret);
+		goto err_cq;
+	}
+
+	ret = ib_req_notify_cq(p->cq, IB_CQ_NEXT_COMP);
+	if (ret) {
+		ipoib_warn(priv, "failed to request completion notification: %d\n", ret);
+		goto err_req_notify;
+	}
+
+	p->qp = ipoib_cm_create_tx_qp(p->dev, p->cq);
+	if (IS_ERR(p->qp)) {
+		ret = PTR_ERR(p->qp);
+		ipoib_warn(priv, "failed to allocate tx qp: %d\n", ret);
+		goto err_qp;
+	}
+
+	p->id = ib_create_cm_id(priv->ca, ipoib_cm_tx_handler, p);
+	if (IS_ERR(p->id)) {
+		ret = PTR_ERR(p->id);
+		ipoib_warn(priv, "failed to create tx cm id: %d\n", ret);
+		goto err_id;
+	}
+
+	ret = ipoib_cm_modify_tx_init(p->dev, p->id,  p->qp);
+	if (ret) {
+		ipoib_warn(priv, "failed to modify tx qp to init: %d\n", ret);
+		goto err_modify;
+	}
+
+	ret = ipoib_cm_send_req(p->dev, p->id, p->qp, qpn, pathrec);
+	if (ret) {
+		ipoib_warn(priv, "failed to send cm req: %d\n", ret);
+		goto err_send_cm;
+	}
+
+	ipoib_dbg(priv, "Request connection 0x%x for gid " IPOIB_GID_FMT " qpn 0x%x\n",
+		  p->qp->qp_num, IPOIB_GID_ARG(pathrec->dgid), qpn);
+
+	return 0;
+
+err_send_cm:
+err_modify:
+	ib_destroy_cm_id(p->id);
+err_id:
+	p->id = NULL;
+	ib_destroy_qp(p->qp);
+err_req_notify:
+err_qp:
+	p->qp = NULL;
+	ib_destroy_cq(p->cq);
+err_cq:
+	p->cq = NULL;
+err_tx:
+	return ret;
+}
+
+static void ipoib_cm_tx_destroy(struct ipoib_cm_tx *p)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(p->dev);
+	struct ipoib_tx_buf *tx_req;
+
+	ipoib_dbg(priv, "Destroy active connection 0x%x head 0x%x tail 0x%x\n",
+		  p->qp ? p->qp->qp_num : 0, p->tx_head, p->tx_tail);
+
+	if (p->id)
+		ib_destroy_cm_id(p->id);
+
+	if (p->qp)
+		ib_destroy_qp(p->qp);
+
+	if (p->cq)
+		ib_destroy_cq(p->cq);
+
+	if (test_bit(IPOIB_FLAG_NETIF_STOPPED, &p->flags))
+		netif_wake_queue(p->dev);
+
+	if (p->tx_ring) {
+		while ((int) p->tx_tail - (int) p->tx_head < 0) {
+			tx_req = &p->tx_ring[p->tx_tail & (ipoib_sendq_size - 1)];
+			ib_dma_unmap_single(priv->ca, tx_req->mapping, tx_req->skb->len,
+					 DMA_TO_DEVICE);
+			dev_kfree_skb_any(tx_req->skb);
+			++p->tx_tail;
+		}
+
+		kfree(p->tx_ring);
+	}
+
+	kfree(p);
+}
+
+static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id,
+			       struct ib_cm_event *event)
+{
+	struct ipoib_cm_tx *tx = cm_id->context;
+	struct ipoib_dev_priv *priv = netdev_priv(tx->dev);
+	struct net_device *dev = priv->dev;
+	struct ipoib_neigh *neigh;
+	unsigned long flags;
+	int ret;
+
+	switch (event->event) {
+	case IB_CM_DREQ_RECEIVED:
+		ipoib_dbg(priv, "DREQ received.\n");
+		ib_send_cm_drep(cm_id, NULL, 0);
+		break;
+	case IB_CM_REP_RECEIVED:
+		ipoib_dbg(priv, "REP received.\n");
+		ret = ipoib_cm_rep_handler(cm_id, event);
+		if (ret)
+			ib_send_cm_rej(cm_id, IB_CM_REJ_CONSUMER_DEFINED,
+				       NULL, 0, NULL, 0);
+		break;
+	case IB_CM_REQ_ERROR:
+	case IB_CM_REJ_RECEIVED:
+	case IB_CM_TIMEWAIT_EXIT:
+		ipoib_dbg(priv, "CM error %d.\n", event->event);
+		spin_lock_irqsave(&priv->tx_lock, flags);
+		spin_lock(&priv->lock);
+		neigh = tx->neigh;
+
+		if (neigh) {
+			neigh->cm = NULL;
+			list_del(&neigh->list);
+			if (neigh->ah)
+				ipoib_put_ah(neigh->ah);
+			ipoib_neigh_free(dev, neigh);
+
+			tx->neigh = NULL;
+		}
+
+		if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) {
+			list_move(&tx->list, &priv->cm.reap_list);
+			queue_work(ipoib_workqueue, &priv->cm.reap_task);
+		}
+
+		spin_unlock(&priv->lock);
+		spin_unlock_irqrestore(&priv->tx_lock, flags);
+		break;
+	default:
+		break;
+	}
+
+	return 0;
+}
+
+struct ipoib_cm_tx *ipoib_cm_create_tx(struct net_device *dev, struct ipoib_path *path,
+				       struct ipoib_neigh *neigh)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_cm_tx *tx;
+
+	tx = kzalloc(sizeof *tx, GFP_ATOMIC);
+	if (!tx)
+		return NULL;
+
+	neigh->cm = tx;
+	tx->neigh = neigh;
+	tx->path = path;
+	tx->dev = dev;
+	list_add(&tx->list, &priv->cm.start_list);
+	set_bit(IPOIB_FLAG_INITIALIZED, &tx->flags);
+	queue_work(ipoib_workqueue, &priv->cm.start_task);
+	return tx;
+}
+
+void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(tx->dev);
+	if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) {
+		list_move(&tx->list, &priv->cm.reap_list);
+		queue_work(ipoib_workqueue, &priv->cm.reap_task);
+		ipoib_dbg(priv, "Reap connection for gid " IPOIB_GID_FMT "\n",
+			  IPOIB_GID_ARG(tx->neigh->dgid));
+		tx->neigh = NULL;
+	}
+}
+
+static void ipoib_cm_tx_start(struct work_struct *work)
+{
+	struct ipoib_dev_priv *priv = container_of(work, struct ipoib_dev_priv,
+						   cm.start_task);
+	struct net_device *dev = priv->dev;
+	struct ipoib_neigh *neigh;
+	struct ipoib_cm_tx *p;
+	unsigned long flags;
+	int ret;
+
+	struct ib_sa_path_rec pathrec;
+	u32 qpn;
+
+	spin_lock_irqsave(&priv->tx_lock, flags);
+	spin_lock(&priv->lock);
+	while (!list_empty(&priv->cm.start_list)) {
+		p = list_entry(priv->cm.start_list.next, typeof(*p), list);
+		list_del_init(&p->list);
+		neigh = p->neigh;
+		qpn = IPOIB_QPN(neigh->neighbour->ha);
+		memcpy(&pathrec, &p->path->pathrec, sizeof pathrec);
+		spin_unlock(&priv->lock);
+		spin_unlock_irqrestore(&priv->tx_lock, flags);
+		ret = ipoib_cm_tx_init(p, qpn, &pathrec);
+		spin_lock_irqsave(&priv->tx_lock, flags);
+		spin_lock(&priv->lock);
+		if (ret) {
+			neigh = p->neigh;
+			if (neigh) {
+				neigh->cm = NULL;
+				list_del(&neigh->list);
+				if (neigh->ah)
+					ipoib_put_ah(neigh->ah);
+				ipoib_neigh_free(dev, neigh);
+			}
+			list_del(&p->list);
+			kfree(p);
+		}
+	}
+	spin_unlock(&priv->lock);
+	spin_unlock_irqrestore(&priv->tx_lock, flags);
+}
+
+static void ipoib_cm_tx_reap(struct work_struct *work)
+{
+	struct ipoib_dev_priv *priv = container_of(work, struct ipoib_dev_priv,
+						   cm.reap_task);
+	struct ipoib_cm_tx *p;
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->tx_lock, flags);
+	spin_lock(&priv->lock);
+	while (!list_empty(&priv->cm.reap_list)) {
+		p = list_entry(priv->cm.reap_list.next, typeof(*p), list);
+		list_del(&p->list);
+		spin_unlock(&priv->lock);
+		spin_unlock_irqrestore(&priv->tx_lock, flags);
+		ipoib_cm_tx_destroy(p);
+		spin_lock_irqsave(&priv->tx_lock, flags);
+		spin_lock(&priv->lock);
+	}
+	spin_unlock(&priv->lock);
+	spin_unlock_irqrestore(&priv->tx_lock, flags);
+}
+
+static void ipoib_cm_skb_reap(struct work_struct *work)
+{
+	struct ipoib_dev_priv *priv = container_of(work, struct ipoib_dev_priv,
+						   cm.skb_task);
+	struct net_device *dev = priv->dev;
+	struct sk_buff *skb;
+	unsigned long flags;
+
+	unsigned mtu = priv->mcast_mtu;
+
+	spin_lock_irqsave(&priv->tx_lock, flags);
+	spin_lock(&priv->lock);
+	while ((skb = skb_dequeue(&priv->cm.skb_queue))) {
+		spin_unlock(&priv->lock);
+		spin_unlock_irqrestore(&priv->tx_lock, flags);
+		if (skb->protocol == htons(ETH_P_IP))
+			icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));
+#ifdef CONFIG_IPV6
+		else if (skb->protocol == htons(ETH_P_IPV6))
+			icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, dev);
+#endif
+		dev_kfree_skb_any(skb);
+		spin_lock_irqsave(&priv->tx_lock, flags);
+		spin_lock(&priv->lock);
+	}
+	spin_unlock(&priv->lock);
+	spin_unlock_irqrestore(&priv->tx_lock, flags);
+}
+
+void ipoib_cm_skb_too_long(struct net_device* dev, struct sk_buff *skb,
+			   unsigned int mtu)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	int e = skb_queue_empty(&priv->cm.skb_queue);
+
+	if (skb->dst)
+		skb->dst->ops->update_pmtu(skb->dst, mtu);
+
+	skb_queue_tail(&priv->cm.skb_queue, skb);
+	if (e)
+		queue_work(ipoib_workqueue, &priv->cm.skb_task);
+}
+
+static void ipoib_cm_stale_task(struct work_struct *work)
+{
+	struct ipoib_dev_priv *priv = container_of(work, struct ipoib_dev_priv,
+						   cm.stale_task.work);
+	struct ipoib_cm_rx *p;
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->lock, flags);
+	while (!list_empty(&priv->cm.passive_ids)) {
+		/* List is sorted by LRU, start from tail,
+		 * stop when we see a recently used entry */
+		p = list_entry(priv->cm.passive_ids.prev, typeof(*p), list);
+		if (time_before_eq(jiffies, p->jiffies + IPOIB_CM_RX_TIMEOUT))
+			break;
+		list_del_init(&p->list);
+		spin_unlock_irqrestore(&priv->lock, flags);
+		ib_destroy_cm_id(p->id);
+		ib_destroy_qp(p->qp);
+		kfree(p);
+		spin_lock_irqsave(&priv->lock, flags);
+	}
+	spin_unlock_irqrestore(&priv->lock, flags);
+}
+
+
+static ssize_t show_mode(struct class_device *cdev, char *buf)
+{
+	struct net_device *dev = container_of(cdev, struct net_device, class_dev);
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+
+	if (test_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags))
+		return sprintf(buf, "connected\n");
+	else
+		return sprintf(buf, "datagram\n");
+}
+
+static ssize_t set_mode(struct class_device *cdev,
+			const char *buf, size_t count)
+{
+	struct net_device *dev = container_of(cdev, struct net_device, class_dev);
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+
+	/* flush paths if we switch modes so that connections are restarted */
+	if (IPOIB_CM_SUPPORTED(dev->dev_addr) && !strcmp(buf, "connected\n")) {
+		set_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags);
+		ipoib_warn(priv, "enabling connected mode "
+			   "will cause multicast packet drops\n");
+		ipoib_flush_paths(dev);
+		return count;
+	}
+
+	if (!strcmp(buf, "datagram\n")) {
+		clear_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags);
+		dev->mtu = min(priv->mcast_mtu, dev->mtu);
+		ipoib_flush_paths(dev);
+		return count;
+	}
+
+	return -EINVAL;
+}
+
+static CLASS_DEVICE_ATTR(mode, S_IWUGO | S_IRUGO, show_mode, set_mode);
+
+int ipoib_cm_add_mode_attr(struct net_device *dev)
+{
+	return class_device_create_file(&dev->class_dev, &class_device_attr_mode);
+}
+
+int ipoib_cm_dev_init(struct net_device *dev)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ib_srq_init_attr srq_init_attr = {
+		.attr = {
+			.max_wr  = ipoib_recvq_size,
+			.max_sge = IPOIB_CM_RX_SG
+		}
+	};
+	int ret, i;
+
+	INIT_LIST_HEAD(&priv->cm.passive_ids);
+	INIT_LIST_HEAD(&priv->cm.reap_list);
+	INIT_LIST_HEAD(&priv->cm.start_list);
+	INIT_WORK(&priv->cm.start_task, ipoib_cm_tx_start);
+	INIT_WORK(&priv->cm.reap_task, ipoib_cm_tx_reap);
+	INIT_WORK(&priv->cm.skb_task, ipoib_cm_skb_reap);
+	INIT_DELAYED_WORK(&priv->cm.stale_task, ipoib_cm_stale_task);
+
+	skb_queue_head_init(&priv->cm.skb_queue);
+
+	priv->cm.srq = ib_create_srq(priv->pd, &srq_init_attr);
+	if (IS_ERR(priv->cm.srq)) {
+		ret = PTR_ERR(priv->cm.srq);
+		priv->cm.srq = NULL;
+		return ret;
+	}
+
+	priv->cm.srq_ring = kzalloc(ipoib_recvq_size * sizeof *priv->cm.srq_ring,
+				    GFP_KERNEL);
+	if (!priv->cm.srq_ring) {
+		printk(KERN_WARNING "%s: failed to allocate CM ring (%d entries)\n",
+		       priv->ca->name, ipoib_recvq_size);
+		ipoib_cm_dev_cleanup(dev);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < IPOIB_CM_RX_SG; ++i)
+		priv->cm.rx_sge[i].lkey	= priv->mr->lkey;
+
+	priv->cm.rx_sge[0].length = IPOIB_CM_HEAD_SIZE;
+	for (i = 1; i < IPOIB_CM_RX_SG; ++i)
+		priv->cm.rx_sge[i].length = PAGE_SIZE;
+	priv->cm.rx_wr.next = NULL;
+	priv->cm.rx_wr.sg_list = priv->cm.rx_sge;
+	priv->cm.rx_wr.num_sge = IPOIB_CM_RX_SG;
+
+	for (i = 0; i < ipoib_recvq_size; ++i) {
+		if (ipoib_cm_alloc_rx_skb(dev, i, priv->cm.srq_ring[i].mapping)) {
+			ipoib_warn(priv, "failed to allocate receive buffer %d\n", i);
+			ipoib_cm_dev_cleanup(dev);
+			return -ENOMEM;
+		}
+		if (ipoib_cm_post_receive(dev, i)) {
+			ipoib_warn(priv, "ipoib_cm_post_receive failed for buf %d\n", i);
+			ipoib_cm_dev_cleanup(dev);
+			return -EIO;
+		}
+	}
+
+	priv->dev->dev_addr[0] = IPOIB_FLAGS_RC;
+	return 0;
+}
+
+void ipoib_cm_dev_cleanup(struct net_device *dev)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	int i, ret;
+
+	if (!priv->cm.srq)
+		return;
+
+	ipoib_dbg(priv, "Cleanup ipoib connected mode.\n");
+
+	ret = ib_destroy_srq(priv->cm.srq);
+	if (ret)
+		ipoib_warn(priv, "ib_destroy_srq failed: %d\n", ret);
+
+	priv->cm.srq = NULL;
+	if (!priv->cm.srq_ring)
+		return;
+	for (i = 0; i < ipoib_recvq_size; ++i)
+		if (priv->cm.srq_ring[i].skb) {
+			ipoib_cm_dma_unmap_rx(priv, priv->cm.srq_ring[i].mapping);
+			dev_kfree_skb_any(priv->cm.srq_ring[i].skb);
+			priv->cm.srq_ring[i].skb = NULL;
+		}
+	kfree(priv->cm.srq_ring);
+	priv->cm.srq_ring = NULL;
+}
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 59d9594..f2aa923 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -50,8 +50,6 @@ MODULE_PARM_DESC(data_debug_level,
 		 "Enable data path debug tracing if > 0");
 #endif
 
-#define	IPOIB_OP_RECV	(1ul << 31)
-
 static DEFINE_MUTEX(pkey_mutex);
 
 struct ipoib_ah *ipoib_create_ah(struct net_device *dev,
@@ -268,10 +266,11 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
 
 	spin_lock_irqsave(&priv->tx_lock, flags);
 	++priv->tx_tail;
-	if (netif_queue_stopped(dev) &&
-	    test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags) &&
-	    priv->tx_head - priv->tx_tail <= ipoib_sendq_size >> 1)
+	if (unlikely(test_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags)) &&
+	    priv->tx_head - priv->tx_tail <= ipoib_sendq_size >> 1) {
+		clear_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags);
 		netif_wake_queue(dev);
+	}
 	spin_unlock_irqrestore(&priv->tx_lock, flags);
 
 	if (wc->status != IB_WC_SUCCESS &&
@@ -283,7 +282,9 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
 
 static void ipoib_ib_handle_wc(struct net_device *dev, struct ib_wc *wc)
 {
-	if (wc->wr_id & IPOIB_OP_RECV)
+	if (wc->wr_id & IPOIB_CM_OP_SRQ)
+		ipoib_cm_handle_rx_wc(dev, wc);
+	else if (wc->wr_id & IPOIB_OP_RECV)
 		ipoib_ib_handle_rx_wc(dev, wc);
 	else
 		ipoib_ib_handle_tx_wc(dev, wc);
@@ -327,12 +328,12 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb,
 	struct ipoib_tx_buf *tx_req;
 	u64 addr;
 
-	if (unlikely(skb->len > dev->mtu + INFINIBAND_ALEN)) {
+	if (unlikely(skb->len > priv->mcast_mtu + INFINIBAND_ALEN)) {
 		ipoib_warn(priv, "packet len %d (> %d) too long to send, dropping\n",
-			   skb->len, dev->mtu + INFINIBAND_ALEN);
+			   skb->len, priv->mcast_mtu + INFINIBAND_ALEN);
 		++priv->stats.tx_dropped;
 		++priv->stats.tx_errors;
-		dev_kfree_skb_any(skb);
+		ipoib_cm_skb_too_long(dev, skb, priv->mcast_mtu);
 		return;
 	}
 
@@ -372,6 +373,7 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb,
 		if (priv->tx_head - priv->tx_tail == ipoib_sendq_size) {
 			ipoib_dbg(priv, "TX ring full, stopping kernel net queue\n");
 			netif_stop_queue(dev);
+			set_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags);
 		}
 	}
 }
@@ -424,6 +426,13 @@ int ipoib_ib_dev_open(struct net_device *dev)
 		return -1;
 	}
 
+	ret = ipoib_cm_dev_open(dev);
+	if (ret) {
+		ipoib_warn(priv, "ipoib_cm_dev_open returned %d\n", ret);
+		ipoib_ib_dev_stop(dev);
+		return -1;
+	}
+
 	clear_bit(IPOIB_STOP_REAPER, &priv->flags);
 	queue_delayed_work(ipoib_workqueue, &priv->ah_reap_task, HZ);
 
@@ -509,6 +518,8 @@ int ipoib_ib_dev_stop(struct net_device *dev)
 
 	clear_bit(IPOIB_FLAG_INITIALIZED, &priv->flags);
 
+	ipoib_cm_dev_stop(dev);
+
 	/*
 	 * Move our QP to the error state and then reinitialize in
 	 * when all work requests have completed or have been flushed.
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 705eb1d..19e82db 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -49,8 +49,6 @@
 
 #include <net/dst.h>
 
-#define IPOIB_QPN(ha) (be32_to_cpup((__be32 *) ha) & 0xffffff)
-
 MODULE_AUTHOR("Roland Dreier");
 MODULE_DESCRIPTION("IP-over-InfiniBand net driver");
 MODULE_LICENSE("Dual BSD/GPL");
@@ -145,6 +143,8 @@ static int ipoib_stop(struct net_device *dev)
 
 	netif_stop_queue(dev);
 
+	clear_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags);
+
 	/*
 	 * Now flush workqueue to make sure a scheduled task doesn't
 	 * bring our internal state back up.
@@ -178,8 +178,18 @@ static int ipoib_change_mtu(struct net_device *dev, int new_mtu)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 
-	if (new_mtu > IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN)
+	/* dev->mtu > 2K ==> connected mode */
+	if (ipoib_cm_admin_enabled(dev) && new_mtu <= IPOIB_CM_MTU) {
+		if (new_mtu > priv->mcast_mtu)
+			ipoib_warn(priv, "mtu > %d will cause multicast packet drops.\n",
+				   priv->mcast_mtu);
+		dev->mtu = new_mtu;
+		return 0;
+	}
+
+	if (new_mtu > IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN) {
 		return -EINVAL;
+	}
 
 	priv->admin_mtu = new_mtu;
 
@@ -414,6 +424,20 @@ static void path_rec_completion(int status,
 			memcpy(&neigh->dgid.raw, &path->pathrec.dgid.raw,
 			       sizeof(union ib_gid));
 
+			if (ipoib_cm_enabled(dev, neigh->neighbour)) {
+				if (!ipoib_cm_get(neigh))
+					ipoib_cm_set(neigh, ipoib_cm_create_tx(dev,
+									       path,
+									       neigh));
+				if (!ipoib_cm_get(neigh)) {
+					list_del(&neigh->list);
+					if (neigh->ah)
+						ipoib_put_ah(neigh->ah);
+					ipoib_neigh_free(dev, neigh);
+					continue;
+				}
+			}
+
 			while ((skb = __skb_dequeue(&neigh->queue)))
 				__skb_queue_tail(&skqueue, skb);
 		}
@@ -520,7 +544,25 @@ static void neigh_add_path(struct sk_buff *skb, struct net_device *dev)
 		memcpy(&neigh->dgid.raw, &path->pathrec.dgid.raw,
 		       sizeof(union ib_gid));
 
-		ipoib_send(dev, skb, path->ah, IPOIB_QPN(skb->dst->neighbour->ha));
+		if (ipoib_cm_enabled(dev, neigh->neighbour)) {
+			if (!ipoib_cm_get(neigh))
+				ipoib_cm_set(neigh, ipoib_cm_create_tx(dev, path, neigh));
+			if (!ipoib_cm_get(neigh)) {
+				list_del(&neigh->list);
+				if (neigh->ah)
+					ipoib_put_ah(neigh->ah);
+				ipoib_neigh_free(dev, neigh);
+				goto err_drop;
+			}
+			if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE)
+				__skb_queue_tail(&neigh->queue, skb);
+			else {
+				ipoib_warn(priv, "queue length limit %d. Packet drop.\n",
+					   skb_queue_len(&neigh->queue));
+				goto err_drop;
+			}
+		} else
+			ipoib_send(dev, skb, path->ah, IPOIB_QPN(skb->dst->neighbour->ha));
 	} else {
 		neigh->ah  = NULL;
 
@@ -538,6 +580,7 @@ err_list:
 
 err_path:
 	ipoib_neigh_free(dev, neigh);
+err_drop:
 	++priv->stats.tx_dropped;
 	dev_kfree_skb_any(skb);
 
@@ -640,7 +683,12 @@ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 		neigh = *to_ipoib_neigh(skb->dst->neighbour);
 
-		if (likely(neigh->ah)) {
+		if (ipoib_cm_get(neigh)) {
+			if (ipoib_cm_up(neigh)) {
+				ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
+				goto out;
+			}
+		} else if (neigh->ah) {
 			if (unlikely(memcmp(&neigh->dgid.raw,
 					    skb->dst->neighbour->ha + 4,
 					    sizeof(union ib_gid)))) {
@@ -805,6 +853,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour)
 	neigh->neighbour = neighbour;
 	*to_ipoib_neigh(neighbour) = neigh;
 	skb_queue_head_init(&neigh->queue);
+	ipoib_cm_set(neigh, NULL);
 
 	return neigh;
 }
@@ -818,6 +867,8 @@ void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh)
 		++priv->stats.tx_dropped;
 		dev_kfree_skb_any(skb);
 	}
+	if (ipoib_cm_get(neigh))
+		ipoib_cm_destroy_tx(ipoib_cm_get(neigh));
 	kfree(neigh);
 }
 
@@ -1081,6 +1132,8 @@ static struct net_device *ipoib_add_port(const char *format,
 
 	ipoib_create_debug_files(priv->dev);
 
+	if (ipoib_cm_add_mode_attr(priv->dev))
+		goto sysfs_failed;
 	if (ipoib_add_pkey_attr(priv->dev))
 		goto sysfs_failed;
 	if (class_device_create_file(&priv->dev->class_dev,
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index b04b72c..fea737f 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -597,7 +597,9 @@ void ipoib_mcast_join_task(struct work_struct *work)
 
 	priv->mcast_mtu = ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu) -
 		IPOIB_ENCAP_LEN;
-	dev->mtu = min(priv->mcast_mtu, priv->admin_mtu);
+
+	if (!ipoib_cm_admin_enabled(dev))
+		dev->mtu = min(priv->mcast_mtu, priv->admin_mtu);
 
 	ipoib_dbg_mcast(priv, "successfully joined all multicast groups\n");
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
index 7b717c6..3cb551b 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
@@ -168,35 +168,41 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca)
 		.qp_type     = IB_QPT_UD
 	};
 
+	int ret, size;
+
 	priv->pd = ib_alloc_pd(priv->ca);
 	if (IS_ERR(priv->pd)) {
 		printk(KERN_WARNING "%s: failed to allocate PD\n", ca->name);
 		return -ENODEV;
 	}
 
-	priv->cq = ib_create_cq(priv->ca, ipoib_ib_completion, NULL, dev,
-				ipoib_sendq_size + ipoib_recvq_size + 1);
+	priv->mr = ib_get_dma_mr(priv->pd, IB_ACCESS_LOCAL_WRITE);
+	if (IS_ERR(priv->mr)) {
+		printk(KERN_WARNING "%s: ib_get_dma_mr failed\n", ca->name);
+		goto out_free_pd;
+	}
+
+	size = ipoib_sendq_size + ipoib_recvq_size + 1;
+	ret = ipoib_cm_dev_init(dev);
+	if (!ret)
+		size += ipoib_recvq_size;
+
+	priv->cq = ib_create_cq(priv->ca, ipoib_ib_completion, NULL, dev, size);
 	if (IS_ERR(priv->cq)) {
 		printk(KERN_WARNING "%s: failed to create CQ\n", ca->name);
-		goto out_free_pd;
+		goto out_free_mr;
 	}
 
 	if (ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP))
 		goto out_free_cq;
 
-	priv->mr = ib_get_dma_mr(priv->pd, IB_ACCESS_LOCAL_WRITE);
-	if (IS_ERR(priv->mr)) {
-		printk(KERN_WARNING "%s: ib_get_dma_mr failed\n", ca->name);
-		goto out_free_cq;
-	}
-
 	init_attr.send_cq = priv->cq;
 	init_attr.recv_cq = priv->cq,
 
 	priv->qp = ib_create_qp(priv->pd, &init_attr);
 	if (IS_ERR(priv->qp)) {
 		printk(KERN_WARNING "%s: failed to create QP\n", ca->name);
-		goto out_free_mr;
+		goto out_free_cq;
 	}
 
 	priv->dev->dev_addr[1] = (priv->qp->qp_num >> 16) & 0xff;
@@ -212,12 +218,12 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca)
 
 	return 0;
 
-out_free_mr:
-	ib_dereg_mr(priv->mr);
-
 out_free_cq:
 	ib_destroy_cq(priv->cq);
 
+out_free_mr:
+	ib_dereg_mr(priv->mr);
+
 out_free_pd:
 	ib_dealloc_pd(priv->pd);
 	return -ENODEV;
@@ -235,12 +241,14 @@ void ipoib_transport_dev_cleanup(struct net_device *dev)
 		clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
 	}
 
-	if (ib_dereg_mr(priv->mr))
-		ipoib_warn(priv, "ib_dereg_mr failed\n");
-
 	if (ib_destroy_cq(priv->cq))
 		ipoib_warn(priv, "ib_cq_destroy failed\n");
 
+	ipoib_cm_dev_cleanup(dev);
+
+	if (ib_dereg_mr(priv->mr))
+		ipoib_warn(priv, "ib_dereg_mr failed\n");
+
 	if (ib_dealloc_pd(priv->pd))
 		ipoib_warn(priv, "ib_dealloc_pd failed\n");
 }
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index f887780..d9fd82d 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -115,6 +115,8 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
 
 	ipoib_create_debug_files(priv->dev);
 
+	if (ipoib_cm_add_mode_attr(priv->dev))
+		goto sysfs_failed;
 	if (ipoib_add_pkey_attr(priv->dev))
 		goto sysfs_failed;
 
-- 
MST


From rowland at cse.ohio-state.edu  Mon Feb  5 12:12:59 2007
From: rowland at cse.ohio-state.edu (Shaun Rowland)
Date: Mon, 05 Feb 2007 15:12:59 -0500
Subject: [openib-general] MVAPICH2 SRPM and install file patches
In-Reply-To: <1170669047.6049.4.camel@vladsk-laptop>
References: <45C14344.9010602@cse.ohio-state.edu>
	<1170669047.6049.4.camel@vladsk-laptop>
Message-ID: <45C78FCB.8010807@cse.ohio-state.edu>

Vladimir Sokolovsky wrote:
> On Wed, 2007-01-31 at 20:32 -0500, Shaun Rowland wrote:
>> I've placed the MVAPICH2 SRPM on the OFA server in ~rowland/ofed_1_2,
>> and it is linked to here:
>>
>> http://www.openfabrics.org/~rowland/ofed_1_2/
>>
> 
> Hi Shaun,
> Please change mvapich2.spec to avoid using of %build macro.
> It removes RPM_BUILD_ROOT on SuSE distros:
> 
> Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.9418
> + umask 022
> + cd /var/tmp/OFEDRPM/BUILD
> + /bin/rm -rf /var/tmp/OFED
> ++ dirname /var/tmp/OFED
> + /bin/mkdir -p /var/tmp
> + /bin/mkdir /var/tmp/OFED
> + cd mvapich2-0.9.8
> + export OPEN_IB_HOME=/var/tmp/OFED/usr/local/ofed
> + OPEN_IB_HOME=/var/tmp/OFED/usr/local/ofed
> 

Thank you for pointing out this issue on SuSE. I've made the change and
placed a new SRPM in my directory (mvapich2-0.9.8-2.src.rpm) and updated
my latest.txt file.
-- 
Shaun Rowland	rowland at cse.ohio-state.edu
http://www.cse.ohio-state.edu/~rowland/


From rowland at cse.ohio-state.edu  Mon Feb  5 12:30:46 2007
From: rowland at cse.ohio-state.edu (Shaun Rowland)
Date: Mon, 05 Feb 2007 15:30:46 -0500
Subject: [openib-general] MVAPICH2 rpmbuild issue
In-Reply-To: <1170675863.6049.11.camel@vladsk-laptop>
References: <45C14344.9010602@cse.ohio-state.edu>
	<1170675863.6049.11.camel@vladsk-laptop>
Message-ID: <45C793F6.8090003@cse.ohio-state.edu>

Vladimir Sokolovsky wrote:
> Hi Shaun,
> Please check the following issue:

Hi Vladimir. I can tell from the output what seems to have happened, but
I don't know why it happened. When I tested using the install/build
scripts you had given us originally, I tested against OFED 1.1 files to
understand how the build procedure worked. From that testing, the first
thing I found that I had to deal with was the fact that the openib
packages were built in /var/tmp/OFED and left there for other packages
to be built against. Since the openib files are in a location other than
their final destination, I created the %ofed_build_root macro to define
this location and, in addition, set a %ofed_bootstrap condition in the
RPM. From the rpmbuild command below, this appears to be called how I
expect. However:

> Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.84872
> + umask 022
> + cd /var/tmp/OFEDRPM/BUILD
> + cd mvapich2-0.9.8
> + export OPEN_IB_HOME=/var/tmp/OFED/usr/local/ofed
> + OPEN_IB_HOME=/var/tmp/OFED/usr/local/ofed
> + '[' -d /var/tmp/OFED/usr/local/ofed/lib ']'
> + '[' -d /var/tmp/OFED/usr/local/ofed/lib64 ']'

The two lines above come from the part of the spec file where I set
LD_LIBRARY_PATH so that MVAPICH2 can be built. I do this because, again,
the files are not in their final destination directory, but in
/var/tmp/OFED/$STACK_PREFIX/lib[64]. I am testing for either the lib or
lib64 directory in that path, and evidently neither is being found, since
no export of LD_LIBRARY_PATH follows in the trace. This is also why:

> + export PREFIX=/var/tmp/OFED/usr/local/ofed/mpi/gcc/mvapich2-0.9.8-1
> + PREFIX=/var/tmp/OFED/usr/local/ofed/mpi/gcc/mvapich2-0.9.8-1
> + export CC=gcc CXX=g++ F77=gfortran
> + CC=gcc
> + CXX=g++
> + F77=gfortran
> + export ROMIO=yes
> + ROMIO=yes
> + export SHARED_LIBS=yes
> + SHARED_LIBS=yes
> + ./make.mvapich2.gen2
> Could not find the OPEN_IB_HOME/lib64 or OPEN_IB_HOME/lib directory.
> Exiting.
> error: Bad exit status from /var/tmp/rpm-tmp.84872 (%install)

our make.mvapich2.gen2 script fails. It actually exits if neither of
these directories can be found. It is basically the same check, except
that make.mvapich2.gen2 does not set LD_LIBRARY_PATH; it only exits when
the directories are missing. It would be possible to do the
LD_LIBRARY_PATH setting in make.mvapich2.gen2 as well, but usually it
isn't necessary - so I had added the code to the spec file myself.
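
For illustration only, the check described above amounts to something
like the following sketch. This is not the actual code from the spec
file or from make.mvapich2.gen2; the helper name find_ofed_libdir is
made up here, and the default value of OPEN_IB_HOME is just the path
seen in the trace.

```shell
#!/bin/sh
# Hypothetical helper: print the first of lib/lib64 that exists under
# the given OFED prefix; return non-zero if neither directory exists.
find_ofed_libdir() {
    for d in "$1/lib" "$1/lib64"; do
        if [ -d "$d" ]; then
            echo "$d"
            return 0
        fi
    done
    return 1
}

# OPEN_IB_HOME is the variable from the build trace; the default is
# only a placeholder for this sketch.
OPEN_IB_HOME=${OPEN_IB_HOME:-/var/tmp/OFED/usr/local/ofed}

if libdir=$(find_ofed_libdir "$OPEN_IB_HOME"); then
    # Prepend the directory, keeping any existing search path.
    LD_LIBRARY_PATH=$libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
    export LD_LIBRARY_PATH
    echo "using $libdir"
else
    echo "Could not find the OPEN_IB_HOME/lib64 or OPEN_IB_HOME/lib directory."
fi
```

If neither directory exists, no export happens, which matches the trace
quoted earlier, where the two directory tests are not followed by an
export line.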

So my question in this case, given the error output, is what happened
to /var/tmp/OFED/usr/local/ofed/lib or
/var/tmp/OFED/usr/local/ofed/lib64? The rpmbuild is not finding those
directories, but the files should still be there for MVAPICH2 to be
built against, yes? Unless the build process has changed, it seems these
directories do not exist when I was expecting them to exist. They should
be there at that location, right?

I've not tried the new install/build scripts since you've updated them.
I think I need to make an openib SRPM for this or ? I am currently
investigating this and will attempt to use the new scripts on my own
testing system. I will also check if there are any files I can use
instead of making an SRPM if that's even necessary (it seems that it
was, so I had not done it yet).

-- 
Shaun Rowland	rowland at cse.ohio-state.edu
http://www.cse.ohio-state.edu/~rowland/


From swise at opengridcomputing.com  Mon Feb  5 13:43:43 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Mon, 05 Feb 2007 15:43:43 -0600
Subject: [openib-general] idea for ofed 1 2 kernel file structure
In-Reply-To: <adaveigvg7q.fsf@cisco.com>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
	<adaveigvg7q.fsf@cisco.com>
Message-ID: <1170711823.16661.78.camel@stevo-desktop>

On Mon, 2007-02-05 at 06:20 -0800, Roland Dreier wrote:
>  > I looked a current ofed 1.2 kernel tree and there is 1 thing I dislike:
>  > It is hard to see changes that are specific to OFED since we have whole
>  > kernel history mixed in.
> 
> I'm not sure how you have your branches set up, but if you have
> something like a "linus" branch that tracks the upstream kernel, it's
> easy to do stuff like "git log linus.." or "git diff linus.. drivers/infiniband"
> and see the differences that way.
> 
> Using git that way (which is what it's designed for, after all) seems
> better than some scripts to munge together two trees.
> 

So "git log linus.." would show commits in the current branch that are
not in the linus branch, correct?

That would work.  Two branches:  one with the main kernel git tree, and
one based on that plus the ofed-specific changes.
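
A small sketch of that range syntax, run in a throwaway repository. It
assumes git is installed; the function name, the "ofed" branch name, the
commit messages and the demo identity are only illustrative.

```shell
# Build a scratch repository with a "linus" branch and an "ofed" branch
# on top of it, then list what is only on the current (ofed) branch.
demo_linus_range() (
    set -e
    repo=$(mktemp -d)
    cd "$repo"
    git init -q
    git checkout -q -b linus
    git -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "upstream commit"
    git checkout -q -b ofed
    git -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "ofed-specific change"
    # Commits reachable from the current branch but not from linus.
    git log --format=%s linus..
)
```

Calling demo_linus_range lists only the commit made on top of the
linus branch, not the upstream one.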


From sweitzen at cisco.com  Mon Feb  5 13:44:52 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Mon, 5 Feb 2007 13:44:52 -0800
Subject: [openib-general] MVAPICH2 SRPM and install file patches
In-Reply-To: <45C14344.9010602@cse.ohio-state.edu>
References: <45C14344.9010602@cse.ohio-state.edu>
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA302F3F4A0@xmb-sjc-216.amer.cisco.com>

Shaun,

Thanks for doing this.

I see things like romio and shlibs configurable in the patch, what about
other MVAPICH2 features like fault tolerance, multi rail, threads, and
MPD?  How can I configure them when I use install.sh to compile and
install OFED?

I also didn't quite understand the ib-vs-iwarp configuration, I thought
OFED 1.2 would support both.

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 

> -----Original Message-----
> From: openib-general-bounces at openib.org 
> [mailto:openib-general-bounces at openib.org] On Behalf Of Shaun Rowland
> Sent: Wednesday, January 31, 2007 5:33 PM
> To: vlad at dev.mellanox.co.il
> Cc: openfabrics-ewg at openib.org; openib-general at openib.org
> Subject: [openib-general] MVAPICH2 SRPM and install file patches
> 
> I've placed the MVAPICH2 SRPM on the OFA server in ~rowland/ofed_1_2,
> and it is linked to here:
> 
> http://www.openfabrics.org/~rowland/ofed_1_2/
> 
> Additionally, I am including a patch in this email that updates the
> ofed_1_2_scripts files from the GIT repository we were given to
> handle the MVAPICH2 SRPM file. Basically, installing MVAPICH2 is similar
> to the other MPI packages, except that I have added a choice option to
> build with iWARP support or not. The default is IB only. If the user has
> selected the librdmacm packages and the mvapich2 package, this choice is
> presented. This is also saved in the ofed.conf file using an
> MVAPICH2_IMPL variable, and the librdmacm packages are added as
> dependencies if the iWARP version of MVAPICH2 is desired and they are
> not already in the ofed.conf file, which seems like standard behavior in
> the scripts. The resulting binary RPM uses the name convention
> mvapich2_<compiler> as normal in either case. There are various ways
> this could be implemented, perhaps in a better manner. This is what I
> was able to come up with by today. Since the installation scripts given
> were very similar to the original OFED 1.1 scripts, I was able to test
> the installation procedure using OFED 1.1 files. Everything worked for
> me, including building the mpitests package against the mvapich2
> package. There are some comments about this in what I have done. I hope
> that it is helpful in getting our SRPM integrated into the installation
> scripts.
> 
> Additionally, I put a README file in my ofed_1_2 directory that contains
> information about the macros that can be used with our SRPM file. The
> SRPM can be used to install against an existing OFED installation, and
> those macros control various aspects of the result. There is one special
> macro I use for when the SRPM is being built along with the OFED source,
> and its use should be clear in the patched build.sh script and
> associated comment.
> -- 
> Shaun Rowland	rowland at cse.ohio-state.edu
> http://www.cse.ohio-state.edu/~rowland/
> 


From rdreier at cisco.com  Mon Feb  5 14:00:51 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 05 Feb 2007 14:00:51 -0800
Subject: [openib-general] Fwd: bug in mthca_qp.c (GEN 2)
In-Reply-To: <20070205185709.GB16598@mellanox.co.il> (Michael S.
	Tsirkin's message of "Mon, 5 Feb 2007 20:57:09 +0200")
References: <20070205185709.GB16598@mellanox.co.il>
Message-ID: <aday7ncqn70.fsf@cisco.com>

 > Roland, what do you think?
 > Looks pretty severe actually.

 > static void to_ib_ah_attr(struct mthca_dev *dev, struct ib_ah_attr *ib_ah_attr,
 >     struct mthca_qp_path *path)
 > {
 >  memset(ib_ah_attr, 0, sizeof *path);

It's definitely a bug but I don't think it's very severe -- the only
calls to to_ib_ah_attr are in mthca_query_qp, where the function is
used to fill in fields embedded in a struct ib_qp_attr, and even
though the memset overruns the ib_ah_attr slightly, it only zeros out
fields that are set later in the function anyway.  So with current
code at least the bug is harmless.


Anyway, I queued the patch below for 2.6.21:

IB/mthca: Use correct structure size in call to memset()

When clearing the ib_ah_attr parameter in to_ib_ah_attr(), use sizeof
*ib_ah_attr instead of sizeof *path.

Pointed out by Jack Morgenstein <jackm at mellanox.co.il>.

Signed-off-by: Roland Dreier <rolandd at cisco.com>
---
 drivers/infiniband/hw/mthca/mthca_qp.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c
index 5f5214c..224c93d 100644
--- a/drivers/infiniband/hw/mthca/mthca_qp.c
+++ b/drivers/infiniband/hw/mthca/mthca_qp.c
@@ -399,7 +399,7 @@ static int to_ib_qp_access_flags(int mthca_flags)
 static void to_ib_ah_attr(struct mthca_dev *dev, struct ib_ah_attr *ib_ah_attr,
 				struct mthca_qp_path *path)
 {
-	memset(ib_ah_attr, 0, sizeof *path);
+	memset(ib_ah_attr, 0, sizeof *ib_ah_attr);
 	ib_ah_attr->port_num 	  = (be32_to_cpu(path->port_pkey) >> 24) & 0x3;
 
 	if (ib_ah_attr->port_num == 0 || ib_ah_attr->port_num > dev->limits.num_ports)
-- 
1.4.4.1


From rdreier at cisco.com  Mon Feb  5 14:09:12 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 05 Feb 2007 14:09:12 -0800
Subject: [openib-general] Immediate data question
In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net>
	(Changqing Tang's message of "Mon, 5 Feb 2007 15:48:29 -0000")
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
	<adaveigvg7q.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net>
Message-ID: <adatzy0qmt3.fsf@cisco.com>

 > 	If I only want to send/recv 4 bytes with immediate data:

I assume you mean that you only want to send the 4 bytes of immediate
data, and nothing else.

 > On sender side:
 > 	opcode = IBV_WR_SEND_WITH_IMM;
 > 	imm_data = my_4_bytes_data;
 > 
 > 	Do I still need to specify sg_list and num_sge ?

Well, you should be able to specify num_sge = 0.  But to be honest I'm
not positive that 0-length sends are allowed; I know that 0-length
RDMA WRITE operations are allowed.

 > On receiver side, because the immediate data is inside the completion
 > structure, do I need to post a receive for above message ?

Yes, otherwise how would you get the immediate data?

 > If I need to post a receive, do I need to specify sg_list and num_sge
 > for the receive ?

I believe that a 0-length receive with num_sge = 0 should be fine, at
least to handle an RDMA write with immediate data.  But again I'm not positive.

 - R.


From vlad at mellanox.co.il  Mon Feb  5 14:25:51 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Tue, 6 Feb 2007 00:25:51 +0200
Subject: [openib-general] OFED-1.2 first release
Message-ID: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com>

Hi,

OFED-1.2-20070205-1823.tgz can be downloaded from

http://www.openfabrics.org/builds/ofed-1.2/


The first OFED package includes:


ofa_kernel-1.2-alpha1.src.rpm

ofa_user-1.2-alpha1.src.rpm

mvapich-0.9.9-971.src.rpm

mvapich2-0.9.8-1.src.rpm

openmpi-1.2b4ofedr13470-1ofed.src.rpm

mpitests-2.0-698.src.rpm

open-iscsi-generic-2.0-742.src.rpm

ib-bonding-0.9.0-1.src.rpm

ofed-docs-1.2-0.src.rpm

ofed-scripts-1.2-0.src.rpm


Known issues:

srptools - compilation fails

openib_diags - compilation fails

ibutils - not included yet


To build OFED RPMs:

cd OFED-1.2-20070205-1823

./build.sh


Created RPMs will be stored under OFED-1.2-20070205-1823/RPMS/

directory.


To install OFED RPMs:

cd OFED-1.2-20070205-1823

./install.sh


For a detailed installation guide, see

OFED-1.2-xxx/docs/OFED_Installation_Guide.txt


-- 

Vladimir Sokolovsky <vlad at dev.mellanox.co.il>
<mailto:vlad at dev.mellanox.co.il> 

Mellanox Technologies Ltd.
 

From dledford at redhat.com  Mon Feb  5 14:26:10 2007
From: dledford at redhat.com (Doug Ledford)
Date: Mon, 05 Feb 2007 17:26:10 -0500
Subject: [openib-general] Web site needs update
Message-ID: <1170714371.2716.275.camel@fc6.xsintricity.com>

The web site lists the svn repo, which is mostly empty now, and the
README says the web site lists the various git repos for accessing the
source code, but there are no git repos listed on the web site.  Could
we please have the authoritative git repos for the different components
being worked on listed on the web site for easy reference?

-- 
Doug Ledford <dledford at redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband

From rowland at cse.ohio-state.edu  Mon Feb  5 14:47:11 2007
From: rowland at cse.ohio-state.edu (Shaun Rowland)
Date: Mon, 05 Feb 2007 17:47:11 -0500
Subject: [openib-general] MVAPICH2 SRPM and install file patches
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA302F3F4A0@xmb-sjc-216.amer.cisco.com>
References: <45C14344.9010602@cse.ohio-state.edu>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA302F3F4A0@xmb-sjc-216.amer.cisco.com>
Message-ID: <45C7B3EF.2030903@cse.ohio-state.edu>

Scott Weitzenkamp (sweitzen) wrote:
> Shaun,
> 
> Thanks for doing this.
> 
> I see things like romio and shlibs configurable in the patch, what about
> other MVAPICH2 features like fault tolerance, multi rail, threads, and
> MPD?  How can I configure them when I use install.sh to compile and
> install OFED?

Hi Scott. I had thought about this a little when I was testing with the
install/build scripts Vlad gave us. I would appreciate his input if I
get anything wrong here as well. From the perspective of the user
running the install.sh script, the MPI packages are essentially built
one way. You do get to pick the compiler(s) to use, but as for other
options - you would have to edit the build.sh function associated with
the desired package. I created a hack for the iwarp vs ib configuration
for MVAPICH2 because I needed to distinguish between the two (for
reasons I will outline at the end of this message).

Theoretically, you should be able to export the proper variables from
our make.mvapich2.* scripts before running the install.sh script, and
the features would be enabled. For instance, you could do:

export MULTI_THREAD=yes
./install.sh

This is not a good solution for installing OFED, but it should work since
it does not conflict with anything else, at least as far as I am aware. I
see that I also need to update the make.mvapich2.iwarp script to have the
multithreading option, so it would not quite work 100% right now.

As far as each feature you asked about:

* fault tolerance
      - This is controlled during the build process with $ENABLE_CKPT
        and requires $BLCR_HOME pointing to a BLCR installation. This
        only works for single threaded builds without rdmacm support
        (the ib case only, essentially).

* multi rail
      - This is controlled by runtime environment variables after
        installation.

* threads
      - This is controlled by $MULTI_THREAD during the build process.
        As noted above, there's a restriction with fault tolerance.

* MPD
      - MPD is used by MVAPICH2 as it is based on MPICH2.

There are actually a number of options that could be chosen. I believe
from our side, it will be good for me to go ahead and put these in our
SRPM now. Our SRPM can be used outside of the OFED installation system
of course, and these should really be there. There are even other
"devices", like uDAPL.

I did the SRPM in the install/build script patches the way I did
because that seemed like a good set of options for how the OFED
installation system works. There's no framework or examples of asking
about features to build in an MPI package. I just quickly tacked on the
iwarp question and made up a new configuration variable for the
ofed.conf file, but it's not necessarily a good way to do it.

One possibility would be to create a shell function that sets various
build options for MPI packages. Variables could be set in this function
using some name convention, in our case perhaps MVAPICH2_OPT_<whatever>.
In such a function (probably one for each package, that seems to be the
convention), it would be easier to code all the exceptions for features
- if there are any. There are some in our case, as I've mentioned. This
configuration function could be called when the user is choosing to
install MVAPICH2.
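
As a rough sketch of what such a configuration function could look
like: the function name and the MVAPICH2_OPT_* variable names below are
invented for illustration and do not exist in the current scripts. Only
the restrictions (fault tolerance needs BLCR and a single-threaded build
without rdmacm support) come from the feature list above.

```shell
# Hypothetical per-package configuration function. It fills in defaults
# and rejects option combinations that cannot be built together.
mvapich2_config() {
    : "${MVAPICH2_OPT_IMPL:=ib}"            # ib or iwarp
    : "${MVAPICH2_OPT_MULTI_THREAD:=no}"
    : "${MVAPICH2_OPT_CKPT:=no}"

    if [ "$MVAPICH2_OPT_CKPT" = yes ]; then
        # Fault tolerance requires a BLCR installation.
        if [ -z "${BLCR_HOME:-}" ]; then
            echo "mvapich2: checkpointing requires BLCR_HOME" >&2
            return 1
        fi
        # It only works for single-threaded builds without rdmacm
        # support, which is essentially the ib case.
        if [ "$MVAPICH2_OPT_MULTI_THREAD" = yes ] ||
           [ "$MVAPICH2_OPT_IMPL" != ib ]; then
            echo "mvapich2: checkpointing needs a single-threaded ib build" >&2
            return 1
        fi
    fi
    return 0
}
```

Keeping the exceptions in one function per package would let the
installer call it once, right after the user selects the package, and
stop early on an invalid combination.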

This leads to a number of problems. Can the user select different
options for each of the compiler versions of the MPI package? I think
clearly the answer should be "no". Even as implemented now, you cannot
install the iwarp and ib version of MVAPICH2 at the same time during the
install process. You must choose one or the other. Being able to do
either would require one of two changes:

1. Having another level of installer system configuration where I could
select the devices desired, and options for each device (by device
here, I mean uDAPL, IB, iWARP).

- or -

2. Make multiple RPM packages to fit into how the installer currently
interacts with SRPMs, prompts, etc.

I've only had a limited time to investigate this, so what I have done
so far mostly fits with how the OFED install system does things with the
other packages - except for my iwarp vs ib question prompt. I think
there's potential for a lot of complication here. A configuration
function for each package would be one possible way to contain that;
however, I'd have to go back and check out how things work again to see
how something like that would fit in.

So, I will add these new feature options to our SRPM because they could
be used outside of the OFED installation system anyway, and we would
like that to be possible and give the ability to set these options.
However, I cannot say what would be best for the OFED installation
system. It might be better to just go with what we have now - more
"mainstream" builds, and let the user do their own build if they want to
highly customize or something. Otherwise, I've given one possible idea
from the perspective of someone who is new to the install system. Vlad,
do you have any opinion here? Do you see where I am coming from as far
as what kind of situation we are talking about with presenting options
for MPI package builds?

> I also didn't quite understand the ib-vs-iwarp configuration, I thought
> OFED 1.2 would support both.

There are 2 reasons our SRPM has to be told whether it is being built
for iWARP or IB:

1. We need to use -DRDMA_CM_RNIC during the build for iWARP (this is
actually done by invoking our make.mvapich2.iwarp script in the RPM build).

2. If the %auto_req macro is set to 0, then simple RPM names for the
install requirements are used:

Autoreq: 0
Requires: libibumad libibverbs [default]
Requires: libibumad libibverbs librdmacm [iWARP]

This is not actually done right now (Autoreq is currently used), but it
is there as a possibility. Vlad, I was thinking that you might want to
change our function in build.sh to set auto_req to 0 instead of 1. I see
that is how MVAPICH is doing requires, instead of letting Autoreq do it.
I think it will probably work either way, but using --define 'auto_req
0' will probably cut down on some potential issues. I had set it to 1
because it seemed that this was how things worked in OFED 1.1.
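
To make the two Requires lists above concrete, here is a small sketch of
the selection. The function name is hypothetical and this is not code
from the spec file; it only mirrors the package names listed above.

```shell
# Hypothetical helper: print the package names that would go on the
# "Requires:" line when auto_req is 0, for a given build type.
mvapich2_requires() {
    case "$1" in
        iwarp) echo "libibumad libibverbs librdmacm" ;;
        *)     echo "libibumad libibverbs" ;;
    esac
}
```

Calling mvapich2_requires with "iwarp" adds librdmacm; any other
argument gives the default list.
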
-- 
Shaun Rowland	rowland at cse.ohio-state.edu
http://www.cse.ohio-state.edu/~rowland/


From changquing.tang at hp.com  Mon Feb  5 14:54:33 2007
From: changquing.tang at hp.com (Tang, Changqing)
Date: Mon, 5 Feb 2007 22:54:33 -0000
Subject: [openib-general] Immediate data question
In-Reply-To: <adatzy0qmt3.fsf@cisco.com>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com><adaveigvg7q.fsf@cisco.com><349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net>
	<adatzy0qmt3.fsf@cisco.com>
Message-ID: <349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net>


Thank you. Other than using immediate data to send a notification from
one end of a QP to the other, is there any other way to do this? For
example, can I modify the QP state from RTS to another state on one end,
so that the other end gets some notification when I query the QP?

--CQ


> -----Original Message-----
> From: Roland Dreier [mailto:rdreier at cisco.com] 
> Sent: Monday, February 05, 2007 4:09 PM
> To: Tang, Changqing
> Cc: Michael S. Tsirkin; openib-general at openib.org
> Subject: Re: Immediate data question
> 
>  > 	If I only want to send/recv 4 bytes with immediate data:
> 
> I assume you mean that you only want to send the 4 bytes of 
> immediate data, and nothing else.
> 
>  > On sender side:
>  > 	opcode = IBV_WR_SEND_WITH_IMM;
>  > 	imm_data = my_4_bytes_data;
>  > 
>  > 	Do I still need to specify sg_list and num_sge ?
> 
> Well, you should be able to specify num_sge = 0.  But to be 
> honest I'm not positive that 0-length sends are allowed; I 
> know that 0-length RDMA WRITE operations are allowed.
> 
>  > On receiver side, because the immediate data is inside the completion
>  > structure, do I need to post a receive for above message ?
> 
> Yes, otherwise how would you get the immediate data?
> 
>  > If I need to post a receive, do I need to specify sg_list and num_sge
>  > for the receive ?
> 
> I believe that a 0-length receive with num_sge = 0 should be 
> fine, at least to handle an RDMA write with immediate data.  
> But again I'm not positive.
> 
>  - R.
> 


From rdreier at cisco.com  Mon Feb  5 15:02:32 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 05 Feb 2007 15:02:32 -0800
Subject: [openib-general] Immediate data question
In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net>
	(Changqing Tang's message of "Mon, 5 Feb 2007 22:54:33 -0000")
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
	<adaveigvg7q.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net>
	<adatzy0qmt3.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net>
Message-ID: <ada7iuwp5rr.fsf@cisco.com>

    Changqing> Thank you. Other than using immediate data to send
    Changqing> notification from one end to the other of a QP, is
    Changqing> there any other way to do this ? For example, can I
    Changqing> modify QP state from RTS to other state on one end, and
    Changqing> then the other end gets some notification when I query
    Changqing> the QP ?

Not that I know of.  You would need to do something that triggers
something to be sent on the wire, and I don't know of any way to do
that other than posting a work request.

 - R.


From swise at opengridcomputing.com  Mon Feb  5 15:09:29 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Mon, 05 Feb 2007 17:09:29 -0600
Subject: [openib-general] MVAPICH2 SRPM and install file patches
In-Reply-To: <45C7B3EF.2030903@cse.ohio-state.edu>
References: <45C14344.9010602@cse.ohio-state.edu>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA302F3F4A0@xmb-sjc-216.amer.cisco.com>
	<45C7B3EF.2030903@cse.ohio-state.edu>
Message-ID: <1170716969.16661.97.camel@stevo-desktop>


> > I also didn't quite understand the ib-vs-iwarp configuration, I thought
> > OFED 1.2 would support both.
> 
> There are 2 reasons our SRPM has to be told whether it is being built
> for iWARP or IB:
> 
> 1. We need to use -DRDMA_CM_RNIC during the build for iWARP (this is
> actually done by invoking our make.mvapich2.iwarp script in the RPM build).

I believe the iWARP build will work over IB too.  The difference, I
think, is that the iWARP build uses the RDMA-CM and the IB build uses
the IB-CM.  

Shaun, is this correct?  

If so, I suggest you define these options differently.  Perhaps IBCM vs
RDMACM? Right now it implies that you cannot run the same mvapich build
over both transports.  

My 2 cents.


Steve.


From swise at opengridcomputing.com  Mon Feb  5 16:19:23 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Mon, 05 Feb 2007 18:19:23 -0600
Subject: [openib-general] OFED-1.2 first release
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com>
References: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com>
Message-ID: <1170721163.16661.111.camel@stevo-desktop>

BTW:  The README.txt still talks about OFED-1.1 and the October 2006
release.


Steve.

On Tue, 2007-02-06 at 00:25 +0200, Vladimir Sokolovsky wrote:
> Hi,
> 
> OFED-1.2-20070205-1823.tgz can be downloaded from
> 
> http://www.openfabrics.org/builds/ofed-1.2/
> 

> 


From akepner at sgi.com  Mon Feb  5 16:33:02 2007
From: akepner at sgi.com (akepner at sgi.com)
Date: Mon, 5 Feb 2007 16:33:02 -0800 (PST)
Subject: [openib-general] idea for ofed 1 2 kernel file structure
In-Reply-To: <20070205184246.GC15775@mellanox.co.il>
References: <Pine.LNX.4.61.0702050930360.26852@localhost.localdomain>
	<20070205184246.GC15775@mellanox.co.il>
Message-ID: <Pine.LNX.4.61.0702051628200.4774@localhost.localdomain>

On Mon, 5 Feb 2007, Michael S. Tsirkin wrote:

> ....
> Could you address Roland's proposal as well?
>

Regarding the use of git to track the differences in
OFED/kernel.org trees?

I had to go (re)learn some git stuff, but now I think
that this will work fine.

-- 
Arthur


From swise at opengridcomputing.com  Mon Feb  5 17:07:03 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Mon, 05 Feb 2007 19:07:03 -0600
Subject: [openib-general] OFED-1.2 first release
In-Reply-To: <1170721163.16661.111.camel@stevo-desktop>
References: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com>
	<1170721163.16661.111.camel@stevo-desktop>
Message-ID: <1170724023.19728.5.camel@stevo-desktop>


I think there might be some dependency problem.  I selected libibverbs,
libcxgb3, librdmacm, perftest, mvapich2/IWARP and mpitests.  For some
reason it pulled in libibumad as a prereq, but not libibcommon...

Also, I think mvapich2/IWARP links with libibumad or libibcommon and it
doesn't need to when using librdmacm.


[root at r2-iw redhat-release-4AS-5.5]# rpm -U *
error: Failed dependencies:
        libibcommon.so.1()(64bit) is needed by libibumad-1.0.2-0.x86_64
        libibcommon.so.1(IBCOMMON_1.0)(64bit) is needed by libibumad-1.0.2-0.x86_64
    Suggested resolutions:
        libibcommon-1.0-1.x86_64.rpm


> On Tue, 2007-02-06 at 00:25 +0200, Vladimir Sokolovsky wrote:
> > Hi,
> > 
> > OFED-1.2-20070205-1823.tgz can be downloaded from
> > 
> > http://www.openfabrics.org/builds/ofed-1.2/
> > 
> 
> > 
> 
> 


From mst at mellanox.co.il  Mon Feb  5 21:13:56 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 6 Feb 2007 07:13:56 +0200
Subject: [openib-general] idea for ofed 1 2 kernel file structure
In-Reply-To: <1170711823.16661.78.camel@stevo-desktop>
References: <1170711823.16661.78.camel@stevo-desktop>
Message-ID: <20070206051356.GF16598@mellanox.co.il>

> Quoting Steve Wise <swise at opengridcomputing.com>:
> Subject: Re: [openib-general] idea for ofed 1 2 kernel file structure
> 
> On Mon, 2007-02-05 at 06:20 -0800, Roland Dreier wrote:
> >  > I looked a current ofed 1.2 kernel tree and there is 1 thing I dislike:
> >  > It is hard to see changes that are specific to OFED since we have whole
> >  > kernel history mixed in.
> > 
> > I'm not sure how you have your branches set up, but if you have
> > something like a "linus" branch that tracks the upstream kernel, it's
> > easy to do stuff like "git log linus.." or "git diff linus.. drivers/infiniband"
> > and see the differences that way.
> > 
> > Using git that way (which is what it's designed for, after all) seems
> > better than some scripts to munge together two trees.
> > 
> 
> So "git log linus.." would show commits in the current branch that are
> not in the linus branch, correct?
> 
> That would work.  Two branches:  one with the main kernel git tree, and
> one based on that plus the ofed-specific changes.

Well, that's what we have now.
The master branch tracks upstream kernel.

-- 
MST


From mst at mellanox.co.il  Mon Feb  5 21:16:33 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 6 Feb 2007 07:16:33 +0200
Subject: [openib-general] Web site needs update
In-Reply-To: <1170714371.2716.275.camel@fc6.xsintricity.com>
References: <1170714371.2716.275.camel@fc6.xsintricity.com>
Message-ID: <20070206051633.GG16598@mellanox.co.il>

> Quoting Doug Ledford <dledford at redhat.com>:
> Subject: Web site needs update
> 
> The web site lists the svn repo, which is mostly empty now, and the
> README says the web site lists the various git repos for accessing the
> source code, but there are no git repos listed on the web site.  Could
> we please have the authoritative git repos for the different components
> being worked on listed on the web site for easy reference?

I think the thing to do now is to finally move openfabrics.org
and openib.org to point to the new server.

Then we'll be able to fix this.

-- 
MST


From sweitzen at cisco.com  Mon Feb  5 21:26:49 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Mon, 5 Feb 2007 21:26:49 -0800
Subject: [openib-general] OFED-1.2 first release
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com>
References: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com>
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA302F3F68C@xmb-sjc-216.amer.cisco.com>

Vlad and Tziporet,
 
It might help if you elaborated on what you meant by "first release".
You have been saying "code freeze", but really this is "feature freeze",
right?  This announcement is quite a bit different from previous OFED
announcements, where you detailed which features were available and which
OSes were supported.  The daily build email mentions compiling against
kernels, but I haven't seen which distros were actually tested.  Are we
starting from scratch on compiling and testing with distros like RHEL4?
Do you anticipate we will just go day by day with builds trying to
stabilize things initially?
 
In any case, here's what I see when I try to compile with install.sh on
RHEL4 U3 x86_64:
 
...
/tmp/OFED-1.2-20070205-1823/build.sh: line 802: kernel-ib: command not found
Running rpmbuild --rebuild --target=noarch --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed' /tmp/OFED-1.2-20070205-1823/SRPMS/ofed-docs-1.2-0.src.rpm
Running /bin/mv -f /var/tmp/OFEDRPM/RPMS/noarch/ofed-docs-1.2-0.noarch.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1
Running rpmbuild --rebuild --target=noarch --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed' /tmp/OFED-1.2-20070205-1823/SRPMS/ofed-scripts-1.2-0.src.rpm
Running /bin/mv -f /var/tmp/OFEDRPM/RPMS/noarch/ofed-scripts-1.2-0.noarch.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1
Running rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed' /tmp/OFED-1.2-20070205-1823/SRPMS/ib-bonding-0.9.0-1.src.rpm
Running /bin/mv -f /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1

ERROR: Failed executing "/bin/mv -f /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1"

See log file: /tmp/OFED.10899.log

# tail -10 /tmp/OFED.10899.log
Checking for unpackaged file(s): /usr/lib/rpm/check-files /var/tmp/ib-bonding-0.9.0-root
Wrote: /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1-rh-x86_64.rpm
Wrote: /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-debuginfo-0.9.0-1-rh-x86_64.rpm
Executing(--clean): /bin/sh -e /var/tmp/rpm-tmp.98615
+ umask 022
+ cd /var/tmp/OFEDRPM/BUILD
+ rm -rf ib-bonding-0.9.0
+ exit 0
/bin/mv: cannot stat `/var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm': No such file or directory
ERROR: Failed executing "/bin/mv -f /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1"
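
The log appears to show a file name mismatch: rpmbuild wrote
ib-bonding-0.9.0-1-rh-x86_64.rpm, while the mv command looks for
ib-bonding-0.9.0-1.x86_64.rpm. As a sketch only, a build script could
look up the file by its name prefix instead of assuming the exact name.
The helper name find_built_rpm is hypothetical and is not part of
build.sh.

```shell
# Hypothetical helper: print the first RPM in directory $1 whose file
# name starts with $2 (for example "ib-bonding-0.9.0-1"). Returns
# non-zero if no such file exists.
find_built_rpm() {
    for f in "$1/$2"*.rpm; do
        # If the glob matched nothing, $f is the literal pattern and
        # the existence test fails.
        if [ -e "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    return 1
}
```

The debuginfo package is not picked up by the prefix
"ib-bonding-0.9.0-1" because its name continues with "debuginfo" right
after "ib-bonding-".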


Scott


From sweitzen at cisco.com  Mon Feb  5 21:43:41 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Mon, 5 Feb 2007 21:43:41 -0800
Subject: [openib-general] OFED-1.2 first release
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA302F3F68C@xmb-sjc-216.amer.cisco.com>
References: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA302F3F68C@xmb-sjc-216.amer.cisco.com>
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA302F3F697@xmb-sjc-216.amer.cisco.com>

Moving on, I set ib_bonding=n in ofed.conf and tried install.sh again, and
now get this:
 
...
Building MVAPICH RPM. Please wait...
 
Using gcc compiler
Running rpmbuild -v --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_name mvapich_gcc' --define 'ofed 1' --define 'compiler gcc' --define 'openib_prefix /usr/local/ofed' --define 'build_root /var/tmp/OFED' --define '_prefix /usr/local/ofed/mpi/gcc/mvapich-0.9.9' /tmp/OFED-1.2-20070205-1823/SRPMS/mvapich-0.9.9-971.src.rpm
 
ERROR: Failed executing "rpmbuild -v --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_name mvapich_gcc' --define 'ofed 1' --define 'compiler gcc' --define 'openib_prefix /usr/local/ofed' --define 'build_root /var/tmp/OFED' --define '_prefix /usr/local/ofed/mpi/gcc/mvapich-0.9.9' /tmp/OFED-1.2-20070205-1823/SRPMS/mvapich-0.9.9-971.src.rpm"
 
See log file: /tmp/OFED.6120.log
 
#  tail /tmp/OFED.6120.log
+ LANG=C
+ export LANG
+ unset DISPLAY
/var/tmp/rpm-tmp.870: line 33: syntax error near unexpected token `)'
error: Bad exit status from /var/tmp/rpm-tmp.870 (%install)
 

RPM build errors:
    Bad exit status from /var/tmp/rpm-tmp.870 (%install)
ERROR: Failed executing "rpmbuild -v --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_name mvapich_gcc' --define 'ofed 1' --define 'compiler gcc' --define 'openib_prefix /usr/local/ofed' --define 'build_root /var/tmp/OFED' --define '_prefix /usr/local/ofed/mpi/gcc/mvapich-0.9.9' /tmp/OFED-1.2-20070205-1823/SRPMS/mvapich-0.9.9-971.src.rpm"

Scott

________________________________

	From: Scott Weitzenkamp (sweitzen) 
	Sent: Monday, February 05, 2007 9:27 PM
	To: Vladimir Sokolovsky; openfabrics-ewg at openib.org; Tziporet
Koren; Scott Weitzenkamp (sweitzen)
	Cc: openib-general at openib.org
	Subject: RE: [openib-general] OFED-1.2 first release
	
	
	Vlad and Tziporet,
	 
	It might help if you elaborated on what you meant by "first
	release", you have been saying "code freeze" but really this is
	"feature freeze", right?  This announcement is quite a bit different
	from previous OFED announcements, where you detailed what features
	were available and what OS were supported.  The daily build email
	mentions compiling against kernels, but I haven't seen what distros
	were actually tested.  Are we starting from scratch on compiling and
	testing with distros like RHEL4?  Do you anticipate we will just go
	day by day with builds trying to stabilize things initially?
	 
	In any case, here's what I see when I try to compile with
	install.sh on RHEL4 U3 x86_64:
	 
	...
	/tmp/OFED-1.2-20070205-1823/build.sh: line 802: kernel-ib: command not found
	Running rpmbuild --rebuild --target=noarch --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed' /tmp/OFED-1.2-20070205-1823/SRPMS/ofed-docs-1.2-0.src.rpm
	Running /bin/mv -f /var/tmp/OFEDRPM/RPMS/noarch/ofed-docs-1.2-0.noarch.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1
	Running rpmbuild --rebuild --target=noarch --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed' /tmp/OFED-1.2-20070205-1823/SRPMS/ofed-scripts-1.2-0.src.rpm
	Running /bin/mv -f /var/tmp/OFEDRPM/RPMS/noarch/ofed-scripts-1.2-0.noarch.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1
	Running rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed' /tmp/OFED-1.2-20070205-1823/SRPMS/ib-bonding-0.9.0-1.src.rpm
	Running /bin/mv -f /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1
	 
	ERROR: Failed executing "/bin/mv -f /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1"
	 
	See log file: /tmp/OFED.10899.log
	 
	# tail -10 /tmp/OFED.10899.log
	Checking for unpackaged file(s): /usr/lib/rpm/check-files /var/tmp/ib-bonding-0.9.0-root
	Wrote: /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1-rh-x86_64.rpm
	Wrote: /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-debuginfo-0.9.0-1-rh-x86_64.rpm
	Executing(--clean): /bin/sh -e /var/tmp/rpm-tmp.98615
	+ umask 022
	+ cd /var/tmp/OFEDRPM/BUILD
	+ rm -rf ib-bonding-0.9.0
	+ exit 0
	/bin/mv: cannot stat `/var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm': No such file or directory
	ERROR: Failed executing "/bin/mv -f /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1"
	

	Scott


From monil at voltaire.com  Mon Feb  5 23:48:00 2007
From: monil at voltaire.com (Moni Levy)
Date: Tue, 6 Feb 2007 09:48:00 +0200
Subject: [openib-general] OFED-1.2 first release
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA302F3F68C@xmb-sjc-216.amer.cisco.com>
References: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA302F3F68C@xmb-sjc-216.amer.cisco.com>
Message-ID: <6a122cc00702052348u5cf38560j689f6072992fd4ad@mail.gmail.com>

Vlad,

> # tail -10 /tmp/OFED.10899.log
> Wrote:
> /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1-rh-x86_64.rpm
> Wrote:
> /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-debuginfo-0.9.0-1-rh-x86_64.rpm
> Executing(--clean): /bin/sh -e /var/tmp/rpm-tmp.98615
> + umask 022
> + cd /var/tmp/OFEDRPM/BUILD
> + rm -rf ib-bonding-0.9.0
> + exit 0
> /bin/mv: cannot stat
> `/var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm

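The built RPM name differs slightly from the one the script expects:
rpmbuild wrote ib-bonding-0.9.0-1-rh-x86_64.rpm, but the script looks for
ib-bonding-0.9.0-1.x86_64.rpm. Can you fix that in the script, or should
we change the name of the RPM?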

-- Moni
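
For illustration only, one way a script could tolerate the naming
difference is to look the package up by pattern instead of by one fixed
file name. This is a rough sketch, not the actual build.sh code: the
function name find_built_rpm and its arguments are made up here, and the
pattern is loose (a release of "1" would also match "10").

```shell
#!/bin/sh
# Hypothetical helper (not part of build.sh): print the first binary RPM in
# a directory whose name starts with <name>-<version-release> and ends with
# <arch>.rpm, so that both "-1.x86_64.rpm" and "-1-rh-x86_64.rpm" are found.
# $1 = directory, $2 = package name, $3 = version-release, $4 = arch
find_built_rpm() {
    dir=$1; name=$2; verrel=$3; arch=$4
    for f in "$dir/$name-$verrel"*"$arch.rpm"; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -e "$f" ] || continue
        printf '%s\n' "$f"
        return 0
    done
    return 1
}
```

A caller could then run something like
rpm=$(find_built_rpm /var/tmp/OFEDRPM/RPMS/x86_64 ib-bonding 0.9.0-1 x86_64)
and move "$rpm" only when the lookup succeeds. The debuginfo package is
not matched because its name does not start with "ib-bonding-0.9.0-1".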

> ': No such file or directory
> ERROR: Failed executing "/bin/mv -f
> /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.
> 0-1.x86_64.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1"


From dotanb at dev.mellanox.co.il  Mon Feb  5 23:56:04 2007
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Tue, 06 Feb 2007 09:56:04 +0200
Subject: [openib-general] Immediate data question
In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
	<adaveigvg7q.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net>
Message-ID: <45C83494.6@dev.mellanox.co.il>

Hi CQ.

Tang, Changqing wrote:
> Roland:
> 	If I only want to send/recv 4 bytes with immediate data:
>
> On sender side:
> 	opcode = IBV_WR_SEND_WITH_IMM;
> 	imm_data = my_4_bytes_data;
>
> 	Do I still need to specify sg_list and num_sge ?
>   
If the only data being sent is the immediate data, no MR needs to be
registered on this side.
The SR will look like this:
sr.opcode = IBV_WR_SEND_WITH_IMM;
sr.imm_data = my_4_bytes_data;
sr.num_sge = 0;
> On receiver side, because the immediate data is inside the completion
> structure, do I need to post a receive for above message ?
> If I need to post a receive, do I need to specify sg_list and num_sge
> for the receive ?
>   
On the receiver side you must post an RR (because the SEND opcode consumes an RR).

If you are using a UD QP, you must add an s/g list with 40 bytes (of
registered memory); a UD receive buffer needs this space for the GRH.
If you are not using a UD QP, the s/g list on this side can be empty
(num_sge = 0), and the data that was sent will be provided to you in
wc.imm_data.
> I looked the spec but did not find useful information.
>
> The reason I ask is that at some point, I can not(or hard) to provide
> registered memory only for 4 bytes data.
>   
I think that you can avoid registering those 4 bytes ...


Hope this helped you
Dotan


From sweitzen at cisco.com  Tue Feb  6 00:06:45 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Tue, 6 Feb 2007 00:06:45 -0800
Subject: [openib-general] OFED-1.2 first release
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA302F3F697@xmb-sjc-216.amer.cisco.com>
References: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA302F3F68C@xmb-sjc-216.amer.cisco.com>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA302F3F697@xmb-sjc-216.amer.cisco.com>
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA302F9B609@xmb-sjc-216.amer.cisco.com>

Not getting MPI RPMS for Intel compilers, either.
 
Running /bin/rpm -Uhv /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/mpitests_mvapich2_gcc-2.0-698.x86_64.rpm
/tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/mvapich2_intel-0.9.8-1.x86_64.rpm not found
Running /bin/rpm -Uhv /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/openmpi_gcc-1.2b4ofedr13470-1ofed.x86_64.rpm
Running /bin/rpm -Uhv /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/mpitests_openmpi_gcc-2.0-698.x86_64.rpm
/tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/openmpi_intel-1.2b4ofedr13470-1ofed.x86_64.rpm not found
ERROR: -.x86_64.rpm not found under /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1.
Installation finished successfully...
Installation finished successfully...

Scott


From vlad at mellanox.co.il  Tue Feb  6 00:19:45 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Tue, 06 Feb 2007 10:19:45 +0200
Subject: [openib-general] openib diags installation issue
In-Reply-To: <1170599665.5887.14.camel@vladsk-laptop>
References: <1170599665.5887.14.camel@vladsk-laptop>
Message-ID: <1170749985.6537.2.camel@vladsk-laptop>

Hi Hal,
Please merge the following commit to the ofed_1_2 branch of the management.git:

commit	6c819523a6a58e2ac4948327f256e49984dce9fb
Diags/Makefile.am: Fix for executing 'make DESTDIR=/var/tmp/OFED install'

Thanks,

-- 
Vladimir Sokolovsky <vlad at mellanox.co.il>
Mellanox Technologies Ltd.


From vlad at lists.openfabrics.org  Tue Feb  6 02:22:40 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Tue,  6 Feb 2007 02:22:40 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070206-0200 daily build status
Message-ID: <20070206102240.A9687E60807@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod 

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.14
Passed on powerpc with linux-2.6.19
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.12
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.16
Passed on ppc64 with linux-2.6.18
Passed on powerpc with linux-2.6.12
Passed on powerpc with linux-2.6.16
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.14
Passed on ppc64 with linux-2.6.12
Passed on ia64 with linux-2.6.19
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.19
Passed on powerpc with linux-2.6.14
Passed on ia64 with linux-2.6.18
Passed on powerpc with linux-2.6.15
Passed on ppc64 with linux-2.6.13
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.13

Failed:
Build failed on ia64 with linux-2.6.16.21-0.8-default
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070206-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:380: error: implicit declaration of function ‘register_netevent_notifier’
/home/vlad/tmp/ofa_1_2_kernel-20070206-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: In function ‘addr_cleanup’:
/home/vlad/tmp/ofa_1_2_kernel-20070206-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:386: error: implicit declaration of function ‘unregister_netevent_notifier’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070206-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070206-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070206-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070206-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------


From tziporet at mellanox.co.il  Tue Feb  6 02:35:18 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Tue, 06 Feb 2007 12:35:18 +0200
Subject: [openib-general] [openfabrics-ewg] OFED-1.2 first package (was
	release)
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com>
References: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com>
Message-ID: <45C859E6.7020507@mellanox.co.il>

Vladimir Sokolovsky wrote:
> Hi,
>
> OFED-1.2-20070205-1823.tgz can be downloaded from
>
> http://www.openfabrics.org/builds/ofed-1.2/
Just a clarification:
This is the first OFED package, and it's not the alpha release yet.
We published it so everybody can fix the issues we have already found
and do basic installation testing.
Daily builds will be available from tomorrow.

The plan is to have the first alpha release on Monday.
A detailed release mail will be sent with the release.

All - please work closely with Vlad to resolve all issues so we can make 
this Alpha.

Thanks,
Tziporet

 
From ogerlitz at voltaire.com  Tue Feb  6 02:40:57 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Tue, 06 Feb 2007 12:40:57 +0200
Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast
 support
In-Reply-To: <45C37BE9.5040105@ichips.intel.com>
References: <000701c741a6$16dc4760$ff0da8c0@amr.corp.intel.com>
	<45BF8E17.2010805@ichips.intel.com> <45C37BE9.5040105@ichips.intel.com>
Message-ID: <45C85B39.4080700@voltaire.com>

Sean Hefty wrote:
>> Sean Hefty (3):
>>        rdma_cm: Increment port number after close to avoid re-use.
>>        ib_sa: track multicast join/leave requests
>>        rdma_cm: add multicast communication support
> 
> Assuming that you haven't look at this yet, I updated the ib_sa patch 
> above to shorten the workqueue name, plus added a fourth patch to 
> shorten the workqueue names for ib_addr and rdma_cm.  E.g. "ib_mcast_wq" 
> became "ib_mcast".

> Let me know if you need any assistance.

Roland,

Can you comment on the status of merging the multicast changes for 2.6.21?

We have been working (developing and testing) with a userspace rdma cm
based multicast app over this code for the last two months and are very
satisfied with it. The testing included IPoIB, the user space app and
multicast interoperability between them.

Or.


From halr at voltaire.com  Tue Feb  6 04:18:26 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 06 Feb 2007 07:18:26 -0500
Subject: [openib-general] openib diags installation issue
In-Reply-To: <1170749985.6537.2.camel@vladsk-laptop>
References: <1170599665.5887.14.camel@vladsk-laptop>
	<1170749985.6537.2.camel@vladsk-laptop>
Message-ID: <1170764304.4525.280004.camel@hal.voltaire.com>

Hi Vlad,

On Tue, 2007-02-06 at 03:19, Vladimir Sokolovsky wrote:
> Hi Hal,
> Please merge the following commit to the ofed_1_2 branch of the management.git:
> 
> commit	6c819523a6a58e2ac4948327f256e49984dce9fb
> Diags/Makefile.am: Fix for executing 'make DESTDIR=/var/tmp/OFED install'
> 
> Thanks,

Applied. Thanks.

-- Hal


From tziporet at mellanox.co.il  Tue Feb  6 04:51:36 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Tue, 6 Feb 2007 14:51:36 +0200
Subject: [openib-general] OFED-1.2 first release
Message-ID: <6C2C79E72C305246B504CBA17B5500C9A0DD5C@mtlexch01.mtl.com>

I know - I just took the docs from OFED 1.1.
I will work on the docs once we have a working package.

Tziporet 

-----Original Message-----
From: openib-general-bounces at openib.org
[mailto:openib-general-bounces at openib.org] On Behalf Of Steve Wise
Sent: Tuesday, February 06, 2007 2:19 AM
To: Vladimir Sokolovsky
Cc: openib-general
Subject: Re: [openib-general] OFED-1.2 first release

BTW:  The README.txt still talks about OFED-1.1 and the October 2006
release.


From mst at mellanox.co.il  Tue Feb  6 05:26:39 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 6 Feb 2007 15:26:39 +0200
Subject: [openib-general] Backport and fix patches for ipath driver
In-Reply-To: <45C7AB72.4040400@pathscale.com>
References: <45C7AB72.4040400@pathscale.com>
Message-ID: <20070206132639.GA6937@mellanox.co.il>

> Quoting  Bryan O'Sullivan <bos at pathscale.com>:
> Subject: Backport and fix patches for ipath driver
> 
> Hi, Vlad and Tziporet -
> 
> Here's a round of fix and backport patches for the ipath driver, for 
> dropping into the OFED 1.2 tree.  The way in which they're organised 
> should, I hope, be clear.

Looks good, fixes look much cleaner than what we had for OFED 1.1.
I think fixes can be applied already.
However, I'm not sure the backports are ready to be applied as is yet.

Just taking a look at random:

./backport/2.6.18/ipath-50-mad-kmem_cache-2.6.19.patch
BACKPORT - kmem_cache_t disappeared after 2.6.19
diff -r a290ff6e9ae7 drivers/infiniband/core/mad.c
--- a/drivers/infiniband/core/mad.c     Wed Jan 31 14:47:02 2007 -0800
+++ b/drivers/infiniband/core/mad.c     Wed Jan 31 14:48:00 2007 -0800
@@ -46,7 +46,7 @@ MODULE_AUTHOR("Hal Rosenstock");
 MODULE_AUTHOR("Hal Rosenstock");
 MODULE_AUTHOR("Sean Hefty");

-static struct kmem_cache *ib_mad_cache;
+static kmem_cache_t *ib_mad_cache;

 static struct list_head ib_mad_port_list;
 static u32 ib_mad_client_id = 0;

This changes a core file, and does not seem to be related to ipath at all.
What problem does this solve?  I note that mad.c already seems to build fine on 2.6.18
for us - this is part of daily build.

Another example that looks strange:

BACKPORT - workqueues changed in 2.6.20

diff -r 8b94fcef1edd drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c        Thu Feb 01 08:54:29 2007 -0800
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c        Thu Feb 01 08:57:19 2007 -0800
@@ -241,7 +241,7 @@ static struct ipath_devdata *ipath_alloc
        dd->pcidev = pdev;
        pci_set_drvdata(pdev, dd);

-       INIT_DELAYED_WORK(&dd->link_work, check_link_status);
+       INIT_WORK(&dd->link_work, check_link_status);

        list_add(&dd->ipath_list, &ipath_dev_list);


INIT_DELAYED_WORK is implemented in kernel_addons, so this should
not be necessary.

@@ -725,6 +725,7 @@ static void __devexit ipath_remove_one(s
         */
        ipath_shutdown_device(dd);

+#undef cancel_delayed_work
        cancel_delayed_work(&dd->link_work);
        flush_scheduled_work();

This undef looks quite ugly. What does it do?

Please go over the backport patches and check whether they are really necessary.
I think you will mostly discover that the kernel_addons mechanism makes
the backport patches unnecessary. If not, you should try adding
things under kernel_addons as first choice so that everyone benefits.


-- 
MST


From swise at opengridcomputing.com  Tue Feb  6 05:53:03 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 07:53:03 -0600
Subject: [openib-general] idea for ofed 1 2 kernel file structure
In-Reply-To: <20070206051356.GF16598@mellanox.co.il>
References: <1170711823.16661.78.camel@stevo-desktop>
	<20070206051356.GF16598@mellanox.co.il>
Message-ID: <1170769983.19662.0.camel@stevo-desktop>

On Tue, 2007-02-06 at 07:13 +0200, Michael S. Tsirkin wrote:
> > Quoting Steve Wise <swise at opengridcomputing.com>:
> > Subject: Re: [openib-general] idea for ofed 1 2 kernel file structure
> > 
> > On Mon, 2007-02-05 at 06:20 -0800, Roland Dreier wrote:
> > >  > I looked a current ofed 1.2 kernel tree and there is 1 thing I dislike:
> > >  > It is hard to see changes that are specific to OFED since we have whole
> > >  > kernel history mixed in.
> > > 
> > > I'm not sure how you have your branches set up, but if you have
> > > something like a "linus" branch that tracks the upstream kernel, it's
> > > easy to do stuff like "git log linus.." or "git diff linus.. drivers/infiniband"
> > > and see the differences that way.
> > > 
> > > Using git that way (which is what it's designed for, after all) seems
> > > better than some scripts to munge together two trees.
> > > 
> > 
> > So git "log linus.." would show commits in the current branch that are
> > not in the linus branch, correct?
> > 
> > That would work.  Two branches:  one with the main kernel git tree, and
> > based on that + the ofed-specific changes.  
> 
> Well, that's what we have now.
> The master branch tracks upstream kernel.
> 

I didn't realize "git log master.." would show only the ofed-specific
commits...
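
For reference, a small sketch of the two-dot range discussed in this
thread. It assumes a base branch (master here) that tracks the upstream
kernel; the function name commits_not_in is made up for illustration.
"base.." is shorthand for "base..HEAD", i.e. commits reachable from the
current branch but not from base.

```shell
#!/bin/sh
# Hypothetical wrapper: list commits on the current branch that are not in
# the given base branch, optionally limited to the paths that follow.
# $1 = base branch, remaining arguments = optional paths
commits_not_in() {
    base=$1
    shift
    # "--" separates the revision range from any path arguments.
    git log --format='%h %s' "$base.." -- "$@"
}
```

From the OFED branch, "commits_not_in master" would list the ofed-specific
commits, and "commits_not_in master drivers/infiniband" would restrict the
list to that directory.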


From mst at mellanox.co.il  Tue Feb  6 05:58:30 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 6 Feb 2007 15:58:30 +0200
Subject: [openib-general] QoS in opensm will not be part of OFED 1.2
In-Reply-To: <20070205154922.GC4246@mellanox.co.il>
References: <1170690105.4525.201879.camel@hal.voltaire.com>
	<20070205154922.GC4246@mellanox.co.il>
Message-ID: <20070206135830.GA7750@mellanox.co.il>

> Quoting Michael S. Tsirkin <mst at mellanox.co.il>:
> Subject: Re: QoS in opensm will not be part of OFED 1.2
> 
> > > > > I had an AI to check the QoS status with OSM.
> > > > > Conclusions are that QoS support in OpenSM will not be part of OFED 1.2 
> > > > > (I updated the plan on the Wiki)
> > > > > 
> > > > > The reasons for this are:
> > > > > 1. Code not ready at code freeze.
> > > > > 2. There are technical discussion in the list regarding some 
> > > > >    implementation details (e.g. XML or text syntax).
> > > > > 3. SPEC is not published by IBTA yet.
> > > > 
> > > > I think this last reason also applies to the end client QoS changes as
> > > > well.
> > > 
> > > Yes. But the other 2 don't.
> > 
> > Right but I think that precludes it from being included in OFED right
> > now.
> 
> Since the code is already included in OFED, moving it out would violate the feature
> freeze rules, unless there's an actual bug this would fix.

OTOH, you are right in that without SM support we can't claim to have this
feature at all. So, to avoid controversy, I have just removed the QoS patches
from IB core and pushed the code out.

-- 
MST


From halr at voltaire.com  Tue Feb  6 06:21:52 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 06 Feb 2007 09:21:52 -0500
Subject: [openib-general] [RFC] [PATCH] ib_usa: export multicast and
 informinfo registration to userspace
In-Reply-To: <000001c74726$94d0f500$e598070a@amr.corp.intel.com>
References: <000001c74726$94d0f500$e598070a@amr.corp.intel.com>
Message-ID: <1170771710.4525.287718.camel@hal.voltaire.com>

On Fri, 2007-02-02 at 19:02, Sean Hefty wrote:
> Export SA client capabilities for multicast and SA event registration
> to userspace.  Multicast and event registration are tracked on a per
> port basis, with tracking done by the ib_sa kernel module.
> 
> Based on feedback from the list, a new userspace SA module was added,
> rather than trying to rework the usermad interface.  The user to kernel
> interface is minimal, but was designed to be flexible enough to add
> additional SA client support if needed.  (E.g. local SA cache lookup,
> SA queries, service registration, etc.)
> 
> Signed-off-by: Sean Hefty <sean.hefty at intel.com>
> ---
> The following patch is also available from the user_sa branch of my
> rdma-dev.git tree, and is dependent on the informinfo branch/patch
> posted earlier to the list.  (A couple of small fixes to the informinfo
> code have been added since the original patches.)  A userspace sa library
> is also available.
> 
> The informinfo and userspace support was completed as part of the
> PathForward project at the request of the US National Laboratories.
> 

[snip...]

> diff --git a/drivers/infiniband/core/usa.c b/drivers/infiniband/core/usa.c
> new file mode 100644
> index 0000000..ae05091
> --- /dev/null
> +++ b/drivers/infiniband/core/usa.c
> @@ -0,0 +1,792 @@

[snip...]

> +static int process_mcast(struct usa_file *file, struct ib_usa_request *req,
> +			 int out_len)
> +{
> +	/* Only indirect requests are currently supported. */
> +	if (!req->local)
> +		return -ENOSYS;
> +
> +	switch (req->method) {
> +	case IB_MGMT_METHOD_GET:
> +		return get_mcast(file, req, out_len);
> +	case IB_MGMT_METHOD_SET:
> +		return join_mcast(file, req, out_len);
> +	default:
> +		return -EINVAL;

Should leaving a multicast group also be supported ?

-- Hal


From halr at voltaire.com  Tue Feb  6 06:34:25 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 06 Feb 2007 09:34:25 -0500
Subject: [openib-general] QoS in opensm will not be part of OFED 1.2
In-Reply-To: <20070206135830.GA7750@mellanox.co.il>
References: <1170690105.4525.201879.camel@hal.voltaire.com>
	<20070205154922.GC4246@mellanox.co.il>
	<20070206135830.GA7750@mellanox.co.il>
Message-ID: <1170772464.4525.288496.camel@hal.voltaire.com>

On Tue, 2007-02-06 at 08:58, Michael S. Tsirkin wrote:
> > Quoting Michael S. Tsirkin <mst at mellanox.co.il>:
> > Subject: Re: QoS in opensm will not be part of OFED 1.2
> > 
> > > > > > I had an AI to check the QoS status with OSM.
> > > > > > Conclusions are that QoS support in OpenSM will not be part of OFED 1.2 
> > > > > > (I updated the plan on the Wiki)
> > > > > > 
> > > > > > The reasons for this are:
> > > > > > 1. Code not ready at code freeze.
> > > > > > 2. There are technical discussions on the list regarding some
> > > > > >    implementation details (e.g. XML or text syntax).
> > > > > > 3. SPEC is not published by IBTA yet.
> > > > > 
> > > > > I think this last reason also applies to the end client QoS changes as
> > > > > well.
> > > > 
> > > > Yes. But the other 2 don't.
> > > 
> > > Right but I think that precludes it from being included in OFED right
> > > now.
> > 
> > Since the code is already included in OFED, moving it out would violate the feature
> > freeze rules, unless there's an actual bug this would fix.
> 
> OTOH, you are right in that without SM support we can't claim to have this
> feature at all. So, to avoid controversy, I have just removed the QoS patches
> from IB core and pushed the code out.

I think that the mthca patch to encode SL in sched_queue field to
improve hardware QoS guarantees for connected QPs is useful as this can
be exercised by IPoIB-CM. If so, should/can this be included ?

-- Hal


From mst at mellanox.co.il  Tue Feb  6 06:41:03 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 6 Feb 2007 16:41:03 +0200
Subject: [openib-general] QoS in opensm will not be part of OFED 1.2
In-Reply-To: <1170772464.4525.288496.camel@hal.voltaire.com>
References: <1170772464.4525.288496.camel@hal.voltaire.com>
Message-ID: <20070206144103.GB9534@mellanox.co.il>

> > > Quoting Michael S. Tsirkin <mst at mellanox.co.il>:
> > > Subject: Re: QoS in opensm will not be part of OFED 1.2
> > > 
> > > > > > > I had an AI to check the QoS status with OSM.
> > > > > > > Conclusions are that QoS support in OpenSM will not be part of OFED 1.2 
> > > > > > > (I updated the plan on the Wiki)
> > > > > > > 
> > > > > > > The reasons for this are:
> > > > > > > 1. Code not ready at code freeze.
> > > > > > > 2. There are technical discussions on the list regarding some
> > > > > > >    implementation details (e.g. XML or text syntax).
> > > > > > > 3. SPEC is not published by IBTA yet.
> > > > > > 
> > > > > > I think this last reason also applies to the end client QoS changes as
> > > > > > well.
> > > > > 
> > > > > Yes. But the other 2 don't.
> > > > 
> > > > Right but I think that precludes it from being included in OFED right
> > > > now.
> > > 
> > > Since the code is already included in OFED, moving it out would violate the feature
> > > freeze rules, unless there's an actual bug this would fix.
> > 
> > OTOH, you are right in that without SM support we can't claim to have this
> > feature at all. So, to avoid controversy, I have just removed the QoS patches
> > from IB core and pushed the code out.
> 
> I think that the mthca patch to encode SL in sched_queue field to
> improve hardware QoS guarantees for connected QPs is useful as this can
> be exercised by IPoIB-CM. If so, should/can this be included ?

OK. Note this is still untested, and off by default.

-- 
MST


From vlad at dev.mellanox.co.il  Tue Feb  6 06:59:19 2007
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Tue, 06 Feb 2007 16:59:19 +0200
Subject: [openib-general] [openfabrics-ewg] OFED 1.2 release - to be
 reviewed in the meeting today
In-Reply-To: <6a122cc00702010817j52958d85n1d141316e29a7ebf@mail.gmail.com>
References: <45BDFF11.9080901@mellanox.co.il>
	<45BFF296.8000908@cse.ohio-state.edu> <45C08E47.2040506@mellanox.co.il>
	<6a122cc00702010817j52958d85n1d141316e29a7ebf@mail.gmail.com>
Message-ID: <1170773959.6537.17.camel@vladsk-laptop>

On Thu, 2007-02-01 at 18:17 +0200, Moni Levy wrote:
> Tziporet,
> On 1/31/07, Tziporet Koren <tziporet at mellanox.co.il> wrote:
> > Shaun Rowland wrote:
> > >
> > > Hi. I am not exactly sure where the ofed_1_2 directory for MPI SRPMs is
> > > supposed to go. I assume from previous meetings this is just a
> > > filesystem directory. Should it be a directory in my home directory on
> > > staging.openfabrics.org, in ~/public_html, or is there something else I
> > > need to do to put this into place? I think from the previous MPI
> > > specific meeting, this was supposed to be done in a web directory. Since
> > > I am unclear, I wanted to ask here.
> >
> > Please place your SRPM under your home directory at ofed_1_2 directory.
> > Then you can make this directory accessible to the web in this way:
> > 1. mkdir public_html
> > 2. chmod 755 public_html
> >
> > Now you can put any stuff under public_html (also symbolic links) and it
> > will be available via web
> > www.openfabrics.org/~<user name>/
> 
> I have put the ib-bonding SRPM in ~monis/ofed_1_2
> 
> --Moni

Hi Moni,
Please move ~monis/ofed_1_2 to ~monis/public_html/ofed_1_2

Thanks,

-- 
Vladimir Sokolovsky <vlad at dev.mellanox.co.il>
Mellanox Technologies Ltd.


From mst at mellanox.co.il  Tue Feb  6 07:03:21 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 6 Feb 2007 17:03:21 +0200
Subject: [openib-general] idea for ofed 1 2 kernel file structure
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
Message-ID: <20070206150321.GA21776@mellanox.co.il>

> Quoting  Michael S. Tsirkin <mst at mellanox.co.il>:
> Subject: idea for ofed 1 2 kernel file structure
> 
> Hi!
> 
> I looked at the current ofed 1.2 kernel tree and there is 1 thing I dislike:
> 
> It is hard to see changes that are specific to OFED since we have whole kernel
> history mixed in.
> 
>  
> 
> It would be easy to split OFED-specific files into a separate directory and have OFED
> scripts combine that with the upstream kernel.
> 
>  
> 
> All out of tree modules we distribute would go there too.
> 
> What do others think about this?

OK, I didn't quite get whether the majority likes this or not,
so I created such a repository, extracted the ofed specific history
and imported it there.

Take a look here:
git://git.openfabrics.org/~mst/newofed.git

Build scripts will have to be adjusted to add
necessary kernel components that we use.

Another nice thing about this layout is that users (if they so wish)
will be able to use just a linux kernel source tarball instead of the full linux
kernel git tree.

OFED maintainers, you are the primary users of the OFED git.
Please comment which layout is better for you.

-- 
MST


From swise at opengridcomputing.com  Tue Feb  6 07:15:57 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 09:15:57 -0600
Subject: [openib-general] [PATCH] ofed_1_2 Cleanup RHEL4U4
 neteventbackport]
In-Reply-To: <20070204155244.GC20087@mellanox.co.il>
References: <1170604137.4129.13.camel@linux-q667.site>
	<20070204155244.GC20087@mellanox.co.il>
Message-ID: <1170774957.19662.13.camel@stevo-desktop>

Hey guys,

This still hasn't been pulled in yet.  It's trivial and it's up to you whether
it goes in, but lemme know so I can remove it from my list of pending
patches.

Thanks,


Steve.


On Sun, 2007-02-04 at 17:52 +0200, Michael S. Tsirkin wrote:
> No, but it really makes sense. Vlad?
> 
> Quoting Steve Wise <swise at opengridcomputing.com>:
> Subject: Re: [openib-general] [PATCH] ofed_1_2 Cleanup RHEL4U4 neteventbackport]
> 
> Vlad/Michael,
> 
> I'm still tracking this as an outstanding patch.  Have you pulled this
> in yet?
> 
> Thanks,
> 
> Steve.
> 
> 
> On Thu, 2007-02-01 at 14:07 -0600, Steve Wise wrote:
> > From: Steve Wise <swise at opengridcomputing.com>
> > 
> > Add skbuff.h to include list for RHEL4U4 netevent.c file.  This makes
> > it identical to the SLES9SP3 file.
> > 
> > Signed-off-by: Steve Wise <swise at opengridcomputing.com>
> > ---
> > 
> >  .../backport/2.6.9_U4/include/src/netevent.c       |    1 +
> >  1 files changed, 1 insertions(+), 0 deletions(-)
> > 
> > diff --git a/kernel_addons/backport/2.6.9_U4/include/src/netevent.c b/kernel_addons/backport/2.6.9_U4/include/src/netevent.c
> > index 1589300..87fb55c 100644
> > --- a/kernel_addons/backport/2.6.9_U4/include/src/netevent.c
> > +++ b/kernel_addons/backport/2.6.9_U4/include/src/netevent.c
> > @@ -13,6 +13,7 @@
> >   *	Fixes:
> >   */
> >  
> > +#include <linux/skbuff.h>
> >  #include <linux/rtnetlink.h>
> >  #include <linux/notifier.h>
> >  #include <linux/mutex.h>
> > 
> > 


From swise at opengridcomputing.com  Tue Feb  6 07:38:41 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 09:38:41 -0600
Subject: [openib-general] OFED-1.2 first release - provider library install
	problem
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com>
References: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com>
Message-ID: <1170776321.19662.28.camel@stevo-desktop>

Vlad,

After installing the test alpha1 build rpms on rhel4u4 with a
kernel.org 2.6.20 kernel, it appears that the provider library config
files didn't get installed for libcxgb3:

[root at r1-iw ~]#  rping -s -a 0.0.0.0 -p 9999
libibverbs: Warning: couldn't open config directory '/usr/local/ofed/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: couldn't open config directory '/usr/local/ofed/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
libibverbs: Warning: no userspace device-specific driver found for uverbs0
Segmentation fault
[root at r1-iw ~]# ls /usr/local/ofed/etc
ls: /usr/local/ofed/etc: No such file or directory
[root at r1-iw ~]#

I'm running with the cxgb3 driver, so I guess libcxgb3 didn't install
itself correctly?  This works when doing 'make install' from the
userspace tarballs.  Is there some rpm magic missing?  I'm not sure how
to debug this as I'm rpm-challenged (but willing to learn :).

It appears libcxgb3 installed its v2 libs correctly:  

[root at r1-iw ~]# ls /usr/local/ofed/lib64
libcxgb3.a           libdat.a          libibcommon.so        libibumad.a         libibverbs.so.1.0.0
libcxgb3-rdmav2.so   libdat.so         libibcommon.so.1      libibumad.so        librdmacm.so
libcxgb3.so          libdat.so.1       libibcommon.so.1.0.0  libibumad.so.1      librdmacm.so.0.9.0
libdaplcma.a         libdat.so.1.0.2   libibmad.a            libibumad.so.1.0.0
libdaplcma.so        libibcm.so        libibmad.so           libibverbs.a
libdaplcma.so.1      libibcm.so.0.9.0  libibmad.so.1         libibverbs.so
libdaplcma.so.1.0.2  libibcommon.a     libibmad.so.1.2.0     libibverbs.so.1
[root at r1-iw ~]#

But /usr/local/ofed/etc/libibverbs.d didn't get created, and the cxgb3.driver file didn't get installed.


Steve.
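
One way to confirm the symptom above independently of rping is to check
whether the config directory exists and holds any driver files.  The
following is a minimal sketch, not part of libibverbs: the function name
count_driver_files is hypothetical, and filtering on a ".driver" suffix is
an assumption taken from the "cxgb3.driver" file name mentioned in the
report, not a statement of how libibverbs scans the directory.

```c
#include <dirent.h>
#include <string.h>

/*
 * Hypothetical helper: count entries in dir_path whose names end in
 * ".driver".  Returns -1 if the directory cannot be opened (for example,
 * because it was never created by the install).
 */
int count_driver_files(const char *dir_path)
{
	const char *suffix = ".driver";
	size_t slen = strlen(suffix);
	struct dirent *ent;
	int count = 0;
	DIR *dir = opendir(dir_path);

	if (!dir)
		return -1;

	while ((ent = readdir(dir)) != NULL) {
		size_t len = strlen(ent->d_name);

		/* Match names like "cxgb3.driver" but not a bare ".driver". */
		if (len > slen && strcmp(ent->d_name + len - slen, suffix) == 0)
			count++;
	}
	closedir(dir);
	return count;
}
```

Calling count_driver_files("/usr/local/ofed/etc/libibverbs.d") on the
system described above would be expected to return -1, since ls reports
that /usr/local/ofed/etc does not exist.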


From monis at voltaire.com  Tue Feb  6 07:47:57 2007
From: monis at voltaire.com (Moni Shoua)
Date: Tue, 06 Feb 2007 17:47:57 +0200
Subject: [openib-general] [PATCH] IB/ipoib get net_device from ipoib_neigh
 instead of linux neighbour
Message-ID: <45C8A32D.2000504@voltaire.com>

Michael, Roland,

I'd appreciate it if you took a look at this and gave your comments.

The patch here refers to this thread about adding bonding 
support for IPoIB interfaces and is necessary for it to work properly.
http://openib.org/pipermail/openib-general/2007-January/031934.html

The patch here is for upstream kernel while there is a version of the patch 
for OFED as well (for kernels up to 2.6.16)
http://openib.org/pipermail/openib-general/2007-January/031935.html

thanks
- MoniS

------------------------------------------------------------------------------
IPoIB uses a two layer neighboring scheme, such that for each struct neighbour
whose device is an ipoib one, there is a struct ipoib_neigh buddy which is
created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour)
call.

When using the bonding driver, neighbours are created by the net stack on behalf
of the bonding (master) device. On the tx flow the bonding code gets an skb such
that skb->dev points to the master device; it changes this skb to point to the
slave device and calls the slave's hard_start_xmit function.

Combining these two flows, there is a hole: some code in ipoib
(ipoib_neigh_destructor) assumes that for each struct neighbour it gets, n->dev
is an ipoib device, so that, for example, netdev_priv(n->dev) would be of type
struct ipoib_dev_priv. With bonding, n->dev is the master device, so that
assumption does not hold.

To fix it, this patch adds a dev field to struct ipoib_neigh which is used
instead of the struct neighbour dev one.

Signed-off-by: Moni Shoua <monis at voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>
---
 ipoib.h           |    4 +++-
 ipoib_main.c      |   23 +++++++++++++----------
 ipoib_multicast.c |    2 +-
 3 files changed, 17 insertions(+), 12 deletions(-)

Index: infiniband/drivers/infiniband/ulp/ipoib/ipoib.h
===================================================================
--- infiniband.orig/drivers/infiniband/ulp/ipoib/ipoib.h	2007-01-22 12:11:25.000000000 +0200
+++ infiniband/drivers/infiniband/ulp/ipoib/ipoib.h	2007-01-22 12:18:06.101698456 +0200
@@ -216,6 +216,7 @@ struct ipoib_neigh {
 	struct sk_buff_head queue;
 
 	struct neighbour   *neighbour;
+	struct net_device *dev;
 
 	struct list_head    list;
 };
@@ -232,7 +233,8 @@ static inline struct ipoib_neigh **to_ip
 				     INFINIBAND_ALEN, sizeof(void *));
 }
 
-struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh);
+struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh,
+				      struct net_device *dev);
 void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh);
 
 extern struct workqueue_struct *ipoib_workqueue;
Index: infiniband/drivers/infiniband/ulp/ipoib/ipoib_main.c
===================================================================
--- infiniband.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c	2007-01-22 12:11:33.000000000 +0200
+++ infiniband/drivers/infiniband/ulp/ipoib/ipoib_main.c	2007-01-22 12:34:57.599156580 +0200
@@ -490,7 +490,7 @@ static void neigh_add_path(struct sk_buf
 	struct ipoib_path *path;
 	struct ipoib_neigh *neigh;
 
-	neigh = ipoib_neigh_alloc(skb->dst->neighbour);
+	neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev);
 	if (!neigh) {
 		++priv->stats.tx_dropped;
 		dev_kfree_skb_any(skb);
@@ -769,32 +769,34 @@ static void ipoib_set_mcast_list(struct 
 static void ipoib_neigh_destructor(struct neighbour *n)
 {
 	struct ipoib_neigh *neigh;
-	struct ipoib_dev_priv *priv = netdev_priv(n->dev);
+	struct ipoib_dev_priv *priv;
 	unsigned long flags;
 	struct ipoib_ah *ah = NULL;
 
-	ipoib_dbg(priv,
-		  "neigh_destructor for %06x " IPOIB_GID_FMT "\n",
-		  IPOIB_QPN(n->ha),
-		  IPOIB_GID_RAW_ARG(n->ha + 4));
-
-	spin_lock_irqsave(&priv->lock, flags);
 
 	neigh = *to_ipoib_neigh(n);
 	if (neigh) {
+		priv = netdev_priv(neigh->dev);
+		ipoib_dbg(priv,
+			  "neigh_destructor for %06x " IPOIB_GID_FMT "\n",
+			  IPOIB_QPN(n->ha),
+			  IPOIB_GID_RAW_ARG(n->ha + 4));
+
+		spin_lock_irqsave(&priv->lock, flags);
 		if (neigh->ah)
 			ah = neigh->ah;
 		list_del(&neigh->list);
 		ipoib_neigh_free(n->dev, neigh);
+		spin_unlock_irqrestore(&priv->lock, flags);
 	}
 
-	spin_unlock_irqrestore(&priv->lock, flags);
 
 	if (ah)
 		ipoib_put_ah(ah);
 }
 
-struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour)
+struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour,
+				      struct net_device *dev)
 {
 	struct ipoib_neigh *neigh;
 
@@ -803,6 +805,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(st
 		return NULL;
 
 	neigh->neighbour = neighbour;
+	neigh->dev = dev;
 	*to_ipoib_neigh(neighbour) = neigh;
 	skb_queue_head_init(&neigh->queue);
 
Index: infiniband/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
===================================================================
--- infiniband.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c	2007-01-22 12:11:25.000000000 +0200
+++ infiniband/drivers/infiniband/ulp/ipoib/ipoib_multicast.c	2007-01-22 12:18:06.151689482 +0200
@@ -774,7 +774,7 @@ out:
 		if (skb->dst            &&
 		    skb->dst->neighbour &&
 		    !*to_ipoib_neigh(skb->dst->neighbour)) {
-			struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb->dst->neighbour);
+			struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev);
 
 			if (neigh) {
 				kref_get(&mcast->ah->ref);
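
The idea behind the patch can be modelled outside the kernel.  The sketch
below is only an illustration under stated assumptions: the structs are
simplified userspace stand-ins (not the real kernel definitions), the
"priv" pointer replaces the to_ipoib_neigh() lookup, and the function
names neigh_alloc and destructor_dev are hypothetical.  It shows that once
the transmitting device is recorded at allocation time, the destructor no
longer needs to trust n->dev, which may be the bonding master.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for the kernel types (assumptions, not real layouts). */
struct net_device {
	const char *name;
};

struct neighbour {
	struct net_device *dev;   /* with bonding: the master device */
	void *priv;               /* stand-in for *to_ipoib_neigh(n) */
};

struct ipoib_neigh {
	struct neighbour  *neighbour;
	struct net_device *dev;   /* the field the patch adds */
};

/* Mirrors the patched ipoib_neigh_alloc(): remember the device used on tx. */
struct ipoib_neigh *neigh_alloc(struct neighbour *n, struct net_device *dev)
{
	struct ipoib_neigh *neigh = malloc(sizeof(*neigh));

	if (!neigh)
		return NULL;

	neigh->neighbour = n;
	neigh->dev = dev;
	n->priv = neigh;
	return neigh;
}

/*
 * Device the destructor should use for its private data, or NULL when no
 * ipoib_neigh was ever attached (in which case there is nothing to free).
 */
struct net_device *destructor_dev(struct neighbour *n)
{
	struct ipoib_neigh *neigh = n->priv;

	return neigh ? neigh->dev : NULL;
}
```

With a neighbour whose dev is a bonding master and an ipoib_neigh
allocated with the slave device, destructor_dev() returns the slave, which
is the device whose private data is of the ipoib type.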


From halr at voltaire.com  Tue Feb  6 07:52:50 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 06 Feb 2007 10:52:50 -0500
Subject: [openib-general] [PATCH 1/2] OpenSM: Add a printable node
 description to osm_node_t
Message-ID: <1170777169.4525.293473.camel@hal.voltaire.com>

OpenSM: Add a printable node description to osm_node_t
Also, convert the memcpy's to use this rather than a temporary one

Signed-off-by: Ira K. Weiny <weiny2 at llnl.gov>
Signed-off-by: Hal Rosenstock <halr at voltaire.com>

diff --git a/osm/include/opensm/osm_node.h b/osm/include/opensm/osm_node.h
index 8417f10..6f95d5d 100644
--- a/osm/include/opensm/osm_node.h
+++ b/osm/include/opensm/osm_node.h
@@ -107,6 +107,7 @@ typedef struct _osm_node
 	ib_node_desc_t	node_desc;
 	uint32_t	discovery_count;
 	uint32_t	physp_tbl_size;
+	char		print_desc[IB_NODE_DESCRIPTION_SIZE+1];
 	osm_physp_t	physp_table[1];
 } osm_node_t;
 /*
@@ -135,6 +136,9 @@ typedef struct _osm_node
 *		than the number of ports in the node, since port numbers
 *		start with 1 for some bizzare reason.
 *
+*	print_desc
+*		A printable version of the node description.
+*
 *	phsyp_table
 *		Array of physical port objects belonging to this node.
 *		Index is contiguous by local port number.
diff --git a/osm/opensm/osm_drop_mgr.c b/osm/opensm/osm_drop_mgr.c
index 6c5939e..0d08ff6 100644
--- a/osm/opensm/osm_drop_mgr.c
+++ b/osm/opensm/osm_drop_mgr.c
@@ -367,19 +367,12 @@ __osm_drop_mgr_remove_port(
 
   if (osm_log_is_active( p_mgr->p_log, OSM_LOG_INFO ))
   {
-    char desc[IB_NODE_DESCRIPTION_SIZE + 1];
-
-    if (p_node)
-    {
-      memcpy(desc, p_node->node_desc.description, IB_NODE_DESCRIPTION_SIZE);
-      desc[IB_NODE_DESCRIPTION_SIZE] = '\0';
-    }
     osm_log( p_mgr->p_log, OSM_LOG_INFO,
              "__osm_drop_mgr_remove_port: "
              "Removed port with GUID:0x%016" PRIx64
              " LID range [0x%X,0x%X] of node:%s\n",
              cl_ntoh64( port_gid.unicast.interface_id ),
-             min_lid_ho, max_lid_ho, p_node ? desc : "UNKNOWN" );
+             min_lid_ho, max_lid_ho, p_node ? p_node->print_desc : "UNKNOWN" );
   }
 
  Exit:
diff --git a/osm/opensm/osm_node_desc_rcv.c b/osm/opensm/osm_node_desc_rcv.c
index 13c5a93..fc96c12 100644
--- a/osm/opensm/osm_node_desc_rcv.c
+++ b/osm/opensm/osm_node_desc_rcv.c
@@ -69,23 +69,23 @@ __osm_nd_rcv_process_nd(
   IN osm_node_t* const p_node,
   IN const ib_node_desc_t* const p_nd )
 {
-  char desc[IB_NODE_DESCRIPTION_SIZE + 1];
   OSM_LOG_ENTER( p_rcv->p_log, __osm_nd_rcv_process_nd );
 
+  memcpy( &p_node->node_desc.description, p_nd, sizeof(*p_nd) );
+
+  /* also set up a printable version */
+  memcpy( &p_node->print_desc, p_nd, sizeof(*p_nd) );
+  p_node->print_desc[IB_NODE_DESCRIPTION_SIZE] = '\0';
+
   if( osm_log_is_active( p_rcv->p_log, OSM_LOG_VERBOSE ) )
   {
-    memcpy( desc, p_nd, sizeof(*p_nd) );
-    /* Guarantee null termination before printing. */
-    desc[IB_NODE_DESCRIPTION_SIZE] = '\0';
-
     osm_log( p_rcv->p_log, OSM_LOG_VERBOSE,
              "__osm_nd_rcv_process_nd: "
              "Node 0x%" PRIx64 "\n\t\t\t\tDescription = %s\n",
-             cl_ntoh64( osm_node_get_node_guid( p_node )), desc );
+             cl_ntoh64( osm_node_get_node_guid( p_node )),
+             p_node->print_desc);
   }
 
-  memcpy( &p_node->node_desc.description, p_nd, sizeof(*p_nd) );
-
   OSM_LOG_EXIT( p_rcv->p_log );
 }
 
diff --git a/osm/opensm/osm_state_mgr.c b/osm/opensm/osm_state_mgr.c
index 16297c9..2905857 100644
--- a/osm/opensm/osm_state_mgr.c
+++ b/osm/opensm/osm_state_mgr.c
@@ -1076,7 +1076,6 @@ __osm_topology_file_create(
    const osm_node_t *p_node;
    char *file_name;
    FILE *rc;
-   char desc[IB_NODE_DESCRIPTION_SIZE + 1];
 
    OSM_LOG_ENTER( p_mgr->p_log, __osm_topology_file_create );
 
@@ -1139,10 +1138,6 @@ __osm_topology_file_create(
                p_default_physp = p_physp;
             }
 
-            memcpy(desc, p_node->node_desc.description,
-                   IB_NODE_DESCRIPTION_SIZE);
-            desc[IB_NODE_DESCRIPTION_SIZE] = '\0';
-
             fprintf( rc, "{ %s%s Ports:%02X"
                      " SystemGUID:%016" PRIx64
                      " NodeGUID:%016" PRIx64
@@ -1165,7 +1160,7 @@ __osm_topology_file_create(
                                 ( &p_node->node_info ) ),
                      cl_ntoh16( p_node->node_info.device_id ),
                      cl_ntoh32( p_node->node_info.revision ),
-                     desc,
+                     p_node->print_desc,
                      cl_ntoh16( p_default_physp->port_info.base_lid ),
                      cPort );
 
@@ -1180,10 +1175,6 @@ __osm_topology_file_create(
                p_default_physp = p_rphysp;
             }
 
-            memcpy(desc, p_nbnode->node_desc.description,
-                   IB_NODE_DESCRIPTION_SIZE);
-            desc[IB_NODE_DESCRIPTION_SIZE] = '\0';
-
             fprintf( rc, "{ %s%s Ports:%02X"
                      " SystemGUID:%016" PRIx64
                      " NodeGUID:%016" PRIx64
@@ -1206,7 +1197,7 @@ __osm_topology_file_create(
                                 ( &p_nbnode->node_info ) ),
                      cl_ntoh32( p_nbnode->node_info.device_id ),
                      cl_ntoh32( p_nbnode->node_info.revision ),
-                     desc,
+                     p_nbnode->print_desc,
                      cl_ntoh16( p_default_physp->port_info.base_lid ),
                      p_rphysp->port_num );
 
@@ -1662,7 +1653,6 @@ __osm_state_mgr_report_new_ports(
    ib_net64_t port_guid;
    uint16_t min_lid_ho;
    uint16_t max_lid_ho;
-   char desc[IB_NODE_DESCRIPTION_SIZE + 1];
 
    OSM_LOG_ENTER( p_mgr->p_log, __osm_state_mgr_report_new_ports );
 
@@ -1704,19 +1694,13 @@ __osm_state_mgr_report_new_ports(
                   ib_get_err_str( status ) );
       }
       osm_port_get_lid_range_ho( p_port, &min_lid_ho, &max_lid_ho );
-      if (p_port->p_node)
-      {
-         memcpy(desc, p_port->p_node->node_desc.description,
-                IB_NODE_DESCRIPTION_SIZE);
-         desc[IB_NODE_DESCRIPTION_SIZE] = '\0';
-      }
       osm_log( p_mgr->p_log, OSM_LOG_INFO,
                "__osm_state_mgr_report_new_ports: "
                "Discovered new port with GUID:0x%016" PRIx64
                " LID range [0x%X,0x%X] of node:%s\n",
                cl_ntoh64( port_gid.unicast.interface_id ),
                min_lid_ho, max_lid_ho,
-               p_port->p_node ? desc : "UNKNOWN" );
+               p_port->p_node ? p_port->p_node->print_desc : "UNKNOWN" );
 
       p_port =
          ( osm_port_t
diff --git a/osm/opensm/osm_ucast_ftree.c b/osm/opensm/osm_ucast_ftree.c
index cb40ab6..21aa4a8 100644
--- a/osm/opensm/osm_ucast_ftree.c
+++ b/osm/opensm/osm_ucast_ftree.c
@@ -1251,7 +1251,6 @@ __osm_ftree_fabric_dump_hca_ordering(
    uint32_t             i;
    uint32_t             j;
 
-   char desc[IB_NODE_DESCRIPTION_SIZE + 1];
    char path[1024];
    FILE * p_hca_ordering_file;
    char * filename = "osm-ftree-ca-order.dump";
@@ -1278,11 +1277,10 @@ __osm_ftree_fabric_dump_hca_ordering(
       {
          p_group = p_sw->down_port_groups[j];
          p_hca = p_group->remote_hca_or_sw.remote_hca;
-         memcpy(desc,p_hca->p_osm_node->node_desc.description,IB_NODE_DESCRIPTION_SIZE);
-         desc[IB_NODE_DESCRIPTION_SIZE] = '\0';
 
          fprintf(p_hca_ordering_file,"0x%x\t%s\n", 
-                 cl_ntoh16(p_group->remote_base_lid), desc);
+                 cl_ntoh16(p_group->remote_base_lid),
+                 p_hca->p_osm_node->print_desc);
       }
 
       /* now print dummy HCAs */
diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c
index ded3880..3564ba7 100644
--- a/osm/opensm/osm_ucast_mgr.c
+++ b/osm/opensm/osm_ucast_mgr.c
@@ -361,14 +361,12 @@ ucast_mgr_dump_lfts(cl_map_item_t *p_map
 	unsigned max_port = osm_switch_get_num_ports(p_sw);
 	uint16_t lid;
 	uint8_t port;
-	char desc[IB_NODE_DESCRIPTION_SIZE + 1];
 
-	memcpy(desc, p_node->node_desc.description, IB_NODE_DESCRIPTION_SIZE);
-	desc[IB_NODE_DESCRIPTION_SIZE] = '\0';
 	fprintf(file, "Unicast lids [0x0-0x%x] of switch Lid %u guid 0x%016"
 		PRIx64 " (\'%s\'):\n",
 		max_lid, osm_node_get_base_lid(p_node, 0),
-		cl_ntoh64(osm_node_get_node_guid(p_node)), desc);
+		cl_ntoh64(osm_node_get_node_guid(p_node)),
+		p_node->print_desc);
 	for (lid = 0; lid <= max_lid; lid++) {
 		osm_port_t *p_port;
 		port = osm_switch_get_port_by_lid(p_sw, lid);
@@ -381,12 +379,10 @@ ucast_mgr_dump_lfts(cl_map_item_t *p_map
 		p_port = cl_ptr_vector_get(&p_mgr->p_subn->port_lid_tbl, lid);
 		if (p_port) {
 			p_node = osm_port_get_parent_node(p_port);
-			memcpy(desc, p_node->node_desc.description,
-			       IB_NODE_DESCRIPTION_SIZE);
-			desc[IB_NODE_DESCRIPTION_SIZE] = '\0';
 			fprintf(file, "%s portguid 0x016%" PRIx64 ": \'%s\'",
 				ib_get_node_type_str(osm_node_get_type(p_node)),
-				cl_ntoh64(osm_port_get_guid(p_port)), desc);
+				cl_ntoh64(osm_port_get_guid(p_port)),
+				p_node->print_desc);
 		}
 		else
 			fprintf(file, "unknown node and type");
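
The pattern the patch centralizes can be shown in isolation: a fixed-size
description field that is not guaranteed to be NUL-terminated is copied
once into a buffer one byte larger, which is then terminated so it can be
passed to "%s".  This is a minimal sketch, not OpenSM code: the struct
names, node_set_desc, and DESC_SIZE (a stand-in for
IB_NODE_DESCRIPTION_SIZE, assumed here to be 64) are all hypothetical.

```c
#include <string.h>

#define DESC_SIZE 64	/* stand-in for IB_NODE_DESCRIPTION_SIZE (assumed value) */

/* Raw description as received: may fill all DESC_SIZE bytes, no terminator. */
struct node_desc {
	unsigned char description[DESC_SIZE];
};

struct node {
	struct node_desc node_desc;
	char print_desc[DESC_SIZE + 1];	/* always NUL-terminated copy */
};

/* Store the raw description and refresh the printable copy in one place. */
void node_set_desc(struct node *p_node, const struct node_desc *p_nd)
{
	memcpy(&p_node->node_desc, p_nd, sizeof(*p_nd));

	memcpy(p_node->print_desc, p_nd->description, DESC_SIZE);
	p_node->print_desc[DESC_SIZE] = '\0';
}
```

Keeping the printable copy next to the raw field means every log or dump
site can use print_desc directly instead of repeating the
copy-and-terminate step with a local buffer.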


From halr at voltaire.com  Tue Feb  6 07:53:05 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 06 Feb 2007 10:53:05 -0500
Subject: [openib-general] [PATCH 2/2] OpenSM/osm_sa_mcmember_record.c: Add
 NodeDescription to mcast group join error messages
Message-ID: <1170777169.4525.293474.camel@hal.voltaire.com>

OpenSM/osm_sa_mcmember_record.c: Add NodeDescription to mcast group join
error messages

Signed-off-by: Ira K. Weiny <weiny2 at llnl.gov>
Signed-off-by: Hal Rosenstock <halr at voltaire.com>

diff --git a/osm/opensm/osm_sa_mcmember_record.c b/osm/opensm/osm_sa_mcmember_record.c
index 2c55198..62d00ac 100644
--- a/osm/opensm/osm_sa_mcmember_record.c
+++ b/osm/opensm/osm_sa_mcmember_record.c
@@ -1610,9 +1610,11 @@ __osm_mcmr_rcv_join_mgrp(
                "__osm_mcmr_rcv_join_mgrp: ERR 1B10: "
                "Provided Join State != FullMember - required for create, "
                "MGID: 0x%016" PRIx64 " : "
-               "0x%016" PRIx64 "\n",
+               "0x%016" PRIx64 " from port 0x%016" PRIx64 " (%s)\n",
                cl_ntoh64( p_recvd_mcmember_rec->mgid.unicast.prefix ),
-               cl_ntoh64( p_recvd_mcmember_rec->mgid.unicast.interface_id ) );
+               cl_ntoh64( p_recvd_mcmember_rec->mgid.unicast.interface_id ),
+               cl_ntoh64( portguid ),
+               p_port->p_node->print_desc);
       sa_status = IB_SA_MAD_STATUS_REQ_INVALID;
       osm_sa_send_error( p_rcv->p_resp, p_madw, sa_status );
       goto Exit;
@@ -1649,14 +1651,15 @@ __osm_mcmr_rcv_join_mgrp(
                "component mask = 0x%016" PRIx64 ", "
                "expected comp mask = 0x%016" PRIx64 ", "
                "MGID: 0x%016" PRIx64 " : "
-               "0x%016" PRIx64 " from port 0x%016" PRIx64 "\n",
+               "0x%016" PRIx64 " from port 0x%016" PRIx64 " (%s)\n",
                ib_get_sa_method_str(p_sa_mad->method),
                p_recvd_mcmember_rec->scope_state,
                cl_ntoh64(p_sa_mad->comp_mask),
                CL_NTOH64(REQUIRED_MC_CREATE_COMP_MASK),
                cl_ntoh64( p_recvd_mcmember_rec->mgid.unicast.prefix ),
                cl_ntoh64( p_recvd_mcmember_rec->mgid.unicast.interface_id ),
-               cl_ntoh64( portguid ) );
+               cl_ntoh64( portguid ),
+               p_port->p_node->print_desc);
 
       sa_status = IB_SA_MAD_STATUS_INSUF_COMPS;
       osm_sa_send_error( p_rcv->p_resp, p_madw, sa_status );
@@ -1713,9 +1716,10 @@ __osm_mcmr_rcv_join_mgrp(
     osm_log( p_rcv->p_log, OSM_LOG_ERROR,
              "__osm_mcmr_rcv_join_mgrp: ERR 1B12: "
              "__validate_more_comp_fields, __validate_port_caps, "
-             "or JoinState = 0 failed from port 0x%016" PRIx64 ", "
+             "or JoinState = 0 failed from port 0x%016" PRIx64 " (%s), "
              "sending IB_SA_MAD_STATUS_REQ_INVALID\n",
-             cl_ntoh64( portguid ) );
+             cl_ntoh64( portguid ),
+             p_port->p_node->print_desc);
 
     sa_status = IB_SA_MAD_STATUS_REQ_INVALID;
     osm_sa_send_error( p_rcv->p_resp, p_madw, sa_status );
@@ -1742,8 +1746,10 @@ __osm_mcmr_rcv_join_mgrp(
 
       osm_log( p_rcv->p_log, OSM_LOG_ERROR,
                "__osm_mcmr_rcv_join_mgrp: ERR 1B13: "
-               "__validate_modify failed, "
-               "sending IB_SA_MAD_STATUS_REQ_INVALID\n" );
+               "__validate_modify failed from port 0x%016" PRIx64 " (%s), "
+               "sending IB_SA_MAD_STATUS_REQ_INVALID\n",
+               cl_ntoh64( portguid ),
+               p_port->p_node->print_desc);
 
       sa_status = IB_SA_MAD_STATUS_REQ_INVALID;
       osm_sa_send_error( p_rcv->p_resp, p_madw, sa_status );
@@ -1794,8 +1800,10 @@ __osm_mcmr_rcv_join_mgrp(
   {
     osm_log( p_rcv->p_log, OSM_LOG_ERROR,
              "__osm_mcmr_rcv_join_mgrp: ERR 1B14: "
-             "osm_sm_mcgrp_join failed, "
-             "sending IB_SA_MAD_STATUS_NO_RESOURCES\n" );
+             "osm_sm_mcgrp_join failed from port 0x%016" PRIx64 " (%s), "
+             "sending IB_SA_MAD_STATUS_NO_RESOURCES\n",
+             cl_ntoh64( portguid ),
+             p_port->p_node->print_desc);
 
     CL_PLOCK_EXCL_ACQUIRE(p_rcv->p_lock);
 

From mst at mellanox.co.il  Tue Feb  6 08:02:59 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 6 Feb 2007 18:02:59 +0200
Subject: [openib-general] [PATCH] IB/ipoib get net_device from
 ipoib_neigh instead of linux neighbour
In-Reply-To: <45C8A32D.2000504@voltaire.com>
References: <45C8A32D.2000504@voltaire.com>
Message-ID: <20070206160259.GC21776@mellanox.co.il>

> ------------------------------------------------------------------------------
> IPoIB uses a two layer neighboring scheme, such that for each struct neighbour
> whose device is an ipoib one, there is a struct ipoib_neigh buddy which is
> created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour)
> call.
> 
> When using the bonding driver, neighbours are created by the net stack on behalf
> of the bonding (master) device. On the tx flow the bonding code gets an skb such
> that skb->dev points to the master device, it changes this skb to point on the
> slave device and calls the slave hard_start_xmit function.
> 
> Combining these two flows, there is a hole if some code at ipoib
> (ipoib_neigh_destructor) assumes that for each struct neighbour it gets, n->dev
> is an ipoib device so for example netdev_priv(n->dev) would be of type struct
> ipoib_dev_priv.

Could you please elaborate how ipoib_neigh_destructor comes to be called at all?
At what point does ipoib_neigh_setup_dev get called?

> To fix it, this patch adds a dev field to struct ipoib_neigh which is used
> instead of the struct neighbour dev one.

What I am concerned with is - if the master is not an IPoIB device,
what guarantee do we have that to_ipoib_neigh will return 0
and not part of an actual hardware address?

Without bonding, the reason is that dev points to an ipoib device,
so we know hw address is 20 bytes.
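
To make the concern concrete, here is a minimal userspace sketch of the layout being discussed. It assumes, as I understand the IPoIB code of that period, that the ipoib_neigh pointer is stored in the neighbour's address buffer at the first pointer-aligned offset past a 20-byte hardware address. struct model_neigh, MODEL_HA_ROOM and every model_* helper are hypothetical stand-ins, not kernel definitions.

```c
#include <stddef.h>
#include <string.h>

#define INFINIBAND_ALEN 20  /* IPoIB hardware address length, per the thread */
#define MODEL_HA_ROOM   32  /* assumed room for an address in this model */

/* Hypothetical stand-in for struct neighbour: only the trailing address
 * buffer matters here. The leading pointer keeps the struct pointer-aligned. */
struct model_neigh {
	void *dev;
	int flags;
	unsigned char ha[MODEL_HA_ROOM + sizeof(void *)];
};

/* Round x up to a multiple of a (a must be a power of two). */
#define MODEL_ALIGN(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/* Offset of the companion-pointer slot: just past a 20-byte address. */
static size_t model_slot_offset(void)
{
	return MODEL_ALIGN(offsetof(struct model_neigh, ha) + INFINIBAND_ALEN,
			   sizeof(void *));
}

/* Store the companion pointer in the slot (memcpy avoids aliasing issues). */
static void model_set_companion(struct model_neigh *n, void *companion)
{
	memcpy((unsigned char *)n + model_slot_offset(), &companion,
	       sizeof(companion));
}

/* Read the companion pointer back out of the slot. */
static void *model_get_companion(const struct model_neigh *n)
{
	void *companion;

	memcpy(&companion, (const unsigned char *)n + model_slot_offset(),
	       sizeof(companion));
	return companion;
}

/* Nonzero if an address of addr_len bytes would spill into the slot, so the
 * slot could hold address bytes rather than 0 or a valid pointer. */
static int model_addr_overlaps_slot(size_t addr_len)
{
	return offsetof(struct model_neigh, ha) + addr_len > model_slot_offset();
}
```

In this model a 20-byte (or shorter) address leaves the slot alone, while a longer address would overlap it, which is the case the question above is asking about.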

-- 
MST


From swise at opengridcomputing.com  Tue Feb  6 08:06:27 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 10:06:27 -0600
Subject: [openib-general] build.sh not building libmthca
Message-ID: <1170777987.19662.31.camel@stevo-desktop>

Another build problem with the alpha test package:

If I run build.sh and _only_ select libmthca, it claims it builds it ok,
but doesn't produce the .rpm file...

Steve.


From darwish.07 at gmail.com  Tue Feb  6 08:07:25 2007
From: darwish.07 at gmail.com (Ahmed S. Darwish)
Date: Tue, 6 Feb 2007 18:07:25 +0200
Subject: [openib-general] [PATCH 2.6.20] infinband: Use ARRAY_SIZE macro
	when appropriate
In-Reply-To: <20070206160204.GA8991@Ahmed>
Message-ID: <20070206160725.GJ8991@Ahmed>

Hi all,

A patch to use ARRAY_SIZE macro already defined in kernel.h

Signed-off-by: Ahmed S. Darwish <darwish.07 at gmail.com>
---
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 63d2a39..7fabb42 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -36,6 +36,7 @@
 #include <linux/module.h>
 #include <linux/string.h>
 #include <linux/errno.h>
+#include <linux/kernel.h>
 #include <linux/slab.h>
 #include <linux/init.h>
 #include <linux/mutex.h>
@@ -93,7 +94,7 @@ static int ib_device_check_mandatory(struct ib_device *device)
 	};
 	int i;
 
-	for (i = 0; i < sizeof mandatory_table / sizeof mandatory_table[0]; ++i) {
+	for (i = 0; i < ARRAY_SIZE(mandatory_table); ++i) {
 		if (!*(void **) ((void *) device + mandatory_table[i].offset)) {
 			printk(KERN_WARNING "Device %s is missing mandatory function %s\n",
 			       device->name, mandatory_table[i].name);
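
For readers outside the kernel tree, the equivalence this patch relies on can be sketched in userspace C. The macro below is a simplified stand-in for the one in kernel.h (some kernel versions add a compile-time check that the argument is a real array), and model_table with its entries is a made-up table that only resembles mandatory_table.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's ARRAY_SIZE macro. */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical table shaped like mandatory_table: an offset plus a name. */
struct model_entry {
	size_t offset;
	const char *name;
};

static const struct model_entry model_table[] = {
	{ 0,  "query_device" },
	{ 8,  "query_port" },
	{ 16, "query_pkey" },
};

/* Element count written the open-coded way that the patch removes. */
static size_t count_open_coded(void)
{
	return sizeof model_table / sizeof model_table[0];
}

/* The same count through the macro that the patch switches to. */
static size_t count_with_macro(void)
{
	return ARRAY_SIZE(model_table);
}
```

Both functions yield the number of entries in the table, so the loop bound is unchanged; the macro just states the intent once.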

-- 
Ahmed S. Darwish
http://darwish-07.blogspot.com


From monis at voltaire.com  Tue Feb  6 08:24:10 2007
From: monis at voltaire.com (Moni Shoua)
Date: Tue, 06 Feb 2007 18:24:10 +0200
Subject: [openib-general] [PATCH] IB/ipoib get net_device from
 ipoib_neigh instead of linux neighbour
In-Reply-To: <20070206160259.GC21776@mellanox.co.il>
References: <45C8A32D.2000504@voltaire.com>
	<20070206160259.GC21776@mellanox.co.il>
Message-ID: <45C8ABAA.10500@voltaire.com>

Michael S. Tsirkin wrote:
>>------------------------------------------------------------------------------
>>IPoIB uses a two layer neighboring scheme, such that for each struct neighbour
>>whose device is an ipoib one, there is a struct ipoib_neigh buddy which is
>>created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour)
>>call.
>>
>>When using the bonding driver, neighbours are created by the net stack on behalf
>>of the bonding (master) device. On the tx flow the bonding code gets an skb such
>>that skb->dev points to the master device, it changes this skb to point on the
>>slave device and calls the slave hard_start_xmit function.
>>
>>Combining these two flows, there is a hole if some code at ipoib
>>(ipoib_neigh_destructor) assumes that for each struct neighbour it gets, n->dev
>>is an ipoib device so for example netdev_priv(n->dev) would be of type struct
>>ipoib_dev_priv.
> 
> 
> Could you please elaborate how ipoib_neigh_destructor comes to be called at all?
> At what point does ipoib_neigh_setup_dev get called?
> 
> 
The bond device uses its slave's neigh_setup function.
Please look at line 19 below from the bonding code.
static void bond_setup_by_slave(struct net_device *bond_dev,
     11 +               struct net_device *slave_dev)
     12 +{
     13 +   bond_dev->hard_header           = slave_dev->hard_header;
     14 +   bond_dev->rebuild_header        = slave_dev->rebuild_header;
     15 +   bond_dev->hard_header_cache = slave_dev->hard_header_cache;
     16 +   bond_dev->header_cache_update   = slave_dev->header_cache_update;
     17 +   bond_dev->hard_header_parse = slave_dev->hard_header_parse;
     18 +
     19 +   bond_dev->neigh_setup       = slave_dev->neigh_setup;
     20 +
     21 +   bond_dev->type          = slave_dev->type;
     22 +   bond_dev->hard_header_len   = slave_dev->hard_header_len;
     23 +   bond_dev->addr_len      = slave_dev->addr_len;
     24 +
     25 +   memcpy(bond_dev->broadcast, slave_dev->broadcast,
     26 +       slave_dev->addr_len);
     27 +}
>>To fix it, this patch adds a dev field to struct ipoib_neigh which is used
>>instead of the struct neighbour dev one.
> 
> 
> What I am concerned with is - if the master is not an IPoIB device,
> what guarantee do we have that to_ipoib_neigh will return 0
> and not part of an actual hardware address?
> 
> Without bonding, the reason is that dev points to an ipoib device,
> so we know hw address is 20 bytes.
> 

I guess you meant "if the slave is not an IPoIB device"...

The bond device doesn't allow devices of different types to be grouped
together as its slaves. Furthermore, bond_setup_by_slave is called only for non
Ethernet devices (we are considering changing the logic to "called only for
IPoIB devices", just for safety).
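
As a rough illustration of the two rules described above (slaves must share one type, and the copy from the slave happens only for non-Ethernet devices), here is a small userspace model. It is a sketch under assumptions, not the bonding driver's code: the model_* names are invented, the two type constants mirror ARPHRD_ETHER and ARPHRD_INFINIBAND from linux/if_arp.h, and the bond is assumed to start out as an Ethernet-type device.

```c
#include <stddef.h>

/* Values mirror ARPHRD_ETHER and ARPHRD_INFINIBAND in <linux/if_arp.h>. */
enum { MODEL_ARPHRD_ETHER = 1, MODEL_ARPHRD_INFINIBAND = 32 };

/* Hypothetical cut-down net_device with only the fields used here. */
struct model_netdev {
	int type;
	int addr_len;
	int (*neigh_setup)(struct model_netdev *dev);
};

struct model_bond {
	struct model_netdev dev;  /* the master device */
	int slave_count;
};

/* Stand-in for the slave's neigh_setup hook (ipoib_neigh_setup_dev). */
static int model_ipoib_neigh_setup(struct model_netdev *dev)
{
	(void)dev;
	return 0;
}

/* Copy the slave's link-level personality onto the master. */
static void model_setup_by_slave(struct model_netdev *bond_dev,
				 const struct model_netdev *slave)
{
	bond_dev->neigh_setup = slave->neigh_setup;
	bond_dev->type = slave->type;
	bond_dev->addr_len = slave->addr_len;
}

/* Add a slave: the first non-Ethernet slave shapes the master, and any
 * later slave must match the master's type. Returns 0 or -1. */
static int model_enslave(struct model_bond *bond,
			 const struct model_netdev *slave)
{
	if (bond->slave_count == 0) {
		if (slave->type != MODEL_ARPHRD_ETHER)
			model_setup_by_slave(&bond->dev, slave);
	} else if (slave->type != bond->dev.type) {
		return -1;
	}
	bond->slave_count++;
	return 0;
}
```

Once an IPoIB slave has been added, the master carries the slave's neigh_setup hook, which is how the IPoIB neighbour setup ends up running for neighbours created on behalf of the bond.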


From swise at opengridcomputing.com  Tue Feb  6 08:41:42 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 10:41:42 -0600
Subject: [openib-general] build.sh not building libmthca
In-Reply-To: <1170777987.19662.31.camel@stevo-desktop>
References: <1170777987.19662.31.camel@stevo-desktop>
Message-ID: <1170780102.19662.45.camel@stevo-desktop>

Do you want me to use bugzilla to track these issues?


On Tue, 2007-02-06 at 10:06 -0600, Steve Wise wrote:
> Another build problem with the alpha test package:
> 
> If I run build.sh and _only_ select libmthca, it claims it builds it ok,
> but doesn't produce the .rpm file...
> 
> Steve.
> 
> 
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> 


From xma at us.ibm.com  Tue Feb  6 08:41:10 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 6 Feb 2007 08:41:10 -0800
Subject: [openib-general] [PATCH] enable IPoIB only if broadcast join
 finish
In-Reply-To: <OF15ED804B.C1AB3DA2-ON87257279.0050CC75-88257279.00259EE9@us.ibm.com>
Message-ID: <OFB2A1FFEC.76A511C3-ON8725727A.005B7581-8825727A.005BA949@us.ibm.com>


Roland,

      Could you please review this patch when you have time? I am looking
forward to seeing your comments to address a customer issue. Appreciate
your help.

Thanks
Shirley Ma

From vlad at mellanox.co.il  Tue Feb  6 08:48:36 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Tue, 06 Feb 2007 18:48:36 +0200
Subject: [openib-general] [PATCH] ofed_1_2 Cleanup RHEL4U4 netevent
 backport]
In-Reply-To: <1170604137.4129.13.camel@linux-q667.site>
References: <1170360441.16637.41.camel@stevo-desktop>
	<1170604137.4129.13.camel@linux-q667.site>
Message-ID: <1170780516.6537.28.camel@vladsk-laptop>

On Sun, 2007-02-04 at 09:48 -0600, Steve Wise wrote:
> Vlad/Michael,
> 
> I'm still tracking this as an outstanding patch.  Have you pulled this
> in yet?
> 
> Thanks,
> 
> Steve.


Applied.


-- 
Vladimir Sokolovsky <vlad at mellanox.co.il>
Mellanox Technologies Ltd.


From vlad at mellanox.co.il  Tue Feb  6 08:49:18 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Tue, 06 Feb 2007 18:49:18 +0200
Subject: [openib-general] [PATCH] ofed_1_2 - iw_cxgb3 - Add standard GPL
 header to tcb.h
In-Reply-To: <1170704623.16661.54.camel@stevo-desktop>
References: <1170704623.16661.54.camel@stevo-desktop>
Message-ID: <1170780558.6537.30.camel@vladsk-laptop>

On Mon, 2007-02-05 at 13:43 -0600, Steve Wise wrote:
> Add standard GPL header to tcb.h
> 
> From: Steve Wise <swise at opengridcomputing.com>
> 
> Signed-off-by: Steve Wise <swise at opengridcomputing.com>
> ---

Applied.


-- 
Vladimir Sokolovsky <vlad at mellanox.co.il>
Mellanox Technologies Ltd.


From swise at opengridcomputing.com  Tue Feb  6 08:50:46 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 10:50:46 -0600
Subject: [openib-general] OFED-1.2 first release - provider library
 install problem
In-Reply-To: <1170776321.19662.28.camel@stevo-desktop>
References: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com>
	<1170776321.19662.28.camel@stevo-desktop>
Message-ID: <1170780646.19662.48.camel@stevo-desktop>

FYI: The libmthca rpm has the same issue...

Steve.


On Tue, 2007-02-06 at 09:38 -0600, Steve Wise wrote:
> Vlad,
> 
> After installing the  test alpha1 build rpms on rhel4u4 with a
> kernel.org 2.6.20 kernel, it appears that the provider library config
> files didn't get installed for libcxgb3:
> 
> [root at r1-iw ~]#  rping -s -a 0.0.0.0 -p 9999
> libibverbs: Warning: couldn't open config directory '/usr/local/ofed/etc/libibverbs.d'.
> libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
> libibverbs: Warning: couldn't open config directory '/usr/local/ofed/etc/libibverbs.d'.
> libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
> libibverbs: Warning: no userspace device-specific driver found for uverbs0
> Segmentation fault
> [root at r1-iw ~]# ls /usr/local/ofed/etc
> ls: /usr/local/ofed/etc: No such file or directory
> [root at r1-iw ~]#
> 
> I'm running with the cxgb3 driver, so I guess libcxgb3 didn't install
> itself correctly?  This works when doing 'make install' from the
> userspace tarballs.  Is there some rpm magic missing?  I'm not sure how
> to debug this as I'm rpm-challenged (but willing to learn :).
> 
> It appears libcxgb3 installed its v2 libs correctly:  
> 
> [root at r1-iw ~]# ls /usr/local/ofed/lib64
> libcxgb3.a           libdat.a          libibcommon.so        libibumad.a         libibverbs.so.1.0.0
> libcxgb3-rdmav2.so   libdat.so         libibcommon.so.1      libibumad.so        librdmacm.so
> libcxgb3.so          libdat.so.1       libibcommon.so.1.0.0  libibumad.so.1      librdmacm.so.0.9.0
> libdaplcma.a         libdat.so.1.0.2   libibmad.a            libibumad.so.1.0.0
> libdaplcma.so        libibcm.so        libibmad.so           libibverbs.a
> libdaplcma.so.1      libibcm.so.0.9.0  libibmad.so.1         libibverbs.so
> libdaplcma.so.1.0.2  libibcommon.a     libibmad.so.1.2.0     libibverbs.so.1
> [root at r1-iw ~]#
> 
> But /usr/local/ofed/etc/libibverbs.d didn't get created, and the cxgb3.driver file didn't get installed.
> 
> 
> 
> Steve.
> 
> 
> 


From jsquyres at cisco.com  Tue Feb  6 09:05:36 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Tue, 6 Feb 2007 12:05:36 -0500
Subject: [openib-general] build.sh not building libmthca
In-Reply-To: <1170780102.19662.45.camel@stevo-desktop>
References: <1170777987.19662.31.camel@stevo-desktop>
	<1170780102.19662.45.camel@stevo-desktop>
Message-ID: <BE5F0242-F8BC-4F63-8E03-CCE24EC000CF@cisco.com>

Yes, please file all bugs in bugzilla.

Thanks!


On Feb 6, 2007, at 11:41 AM, Steve Wise wrote:

> Do you want me to use bugzilla to track these issues?
>
>
> On Tue, 2007-02-06 at 10:06 -0600, Steve Wise wrote:
>> Another build problem with the alpha test package:
>>
>> If I run build.sh and _only_ select libmthca, it claims it builds  
>> it ok,
>> but doesn't produce the .rpm file...
>>
>> Steve.
>>
>>


-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


From swise at opengridcomputing.com  Tue Feb  6 09:09:07 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 11:09:07 -0600
Subject: [openib-general] OFED-1.2 first release - provider library
 install problem
In-Reply-To: <1170780646.19662.48.camel@stevo-desktop>
References: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com>
	<1170776321.19662.28.camel@stevo-desktop>
	<1170780646.19662.48.camel@stevo-desktop>
Message-ID: <1170781747.19662.54.camel@stevo-desktop>

bug 339 opened.


On Tue, 2007-02-06 at 10:50 -0600, Steve Wise wrote:
> provider library install problem


From swise at opengridcomputing.com  Tue Feb  6 09:09:27 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 11:09:27 -0600
Subject: [openib-general] build.sh not building libmthca
In-Reply-To: <BE5F0242-F8BC-4F63-8E03-CCE24EC000CF@cisco.com>
References: <1170777987.19662.31.camel@stevo-desktop>
	<1170780102.19662.45.camel@stevo-desktop>
	<BE5F0242-F8BC-4F63-8E03-CCE24EC000CF@cisco.com>
Message-ID: <1170781767.19662.56.camel@stevo-desktop>

bug 338 opened.

On Tue, 2007-02-06 at 12:05 -0500, Jeff Squyres wrote:
> Yes, please file all bugs in bugzilla.
> 
> Thanks!
> 
> 
> On Feb 6, 2007, at 11:41 AM, Steve Wise wrote:
> 
> > Do you want me to use bugzilla to track these issues?
> >
> >
> > On Tue, 2007-02-06 at 10:06 -0600, Steve Wise wrote:
> >> Another build problem with the alpha test package:
> >>
> >> If I run build.sh and _only_ select libmthca, it claims it builds  
> >> it ok,
> >> but doesn't produce the .rpm file...
> >>
> >> Steve.
> >>
> >>
> 
> 


From mshefty at ichips.intel.com  Tue Feb  6 09:08:52 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 06 Feb 2007 09:08:52 -0800
Subject: [openib-general] [RFC] [PATCH] ib_usa: export multicast and
 informinfo registration to userspace
In-Reply-To: <1170771710.4525.287718.camel@hal.voltaire.com>
References: <000001c74726$94d0f500$e598070a@amr.corp.intel.com>
	<1170771710.4525.287718.camel@hal.voltaire.com>
Message-ID: <45C8B624.9020602@ichips.intel.com>

>>+static int process_mcast(struct usa_file *file, struct ib_usa_request *req,
>>+			 int out_len)
>>+{
>>+	/* Only indirect requests are currently supported. */
>>+	if (!req->local)
>>+		return -ENOSYS;
>>+
>>+	switch (req->method) {
>>+	case IB_MGMT_METHOD_GET:
>>+		return get_mcast(file, req, out_len);
>>+	case IB_MGMT_METHOD_SET:
>>+		return join_mcast(file, req, out_len);
>>+	default:
>>+		return -EINVAL;
> 
> 
> Should leaving a multicast group also be supported ?

It is - just in a different way.  Once a user has joined a multicast group or 
registered for notices, they leave/unregister by issuing a 'free' request.  The 
majority of the code ends up being the same.

- Sean


From mst at mellanox.co.il  Tue Feb  6 09:14:24 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 6 Feb 2007 19:14:24 +0200
Subject: [openib-general] [PATCH] IB/ipoib get net_device from
 ipoib_neigh instead of linux neighbour
In-Reply-To: <45C8ABAA.10500@voltaire.com>
References: <45C8ABAA.10500@voltaire.com>
Message-ID: <20070206171424.GB24372@mellanox.co.il>

> >>------------------------------------------------------------------------------
> >>IPoIB uses a two layer neighboring scheme, such that for each struct neighbour
> >>whose device is an ipoib one, there is a struct ipoib_neigh buddy which is
> >>created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour)
> >>call.
> >>
> >>When using the bonding driver, neighbours are created by the net stack on behalf
> >>of the bonding (master) device. On the tx flow the bonding code gets an skb such
> >>that skb->dev points to the master device, it changes this skb to point on the
> >>slave device and calls the slave hard_start_xmit function.
> >>
> >>Combining these two flows, there is a hole if some code at ipoib
> >>(ipoib_neigh_destructor) assumes that for each struct neighbour it gets, n->dev
> >>is an ipoib device so for example netdev_priv(n->dev) would be of type struct
> >>ipoib_dev_priv.
> > 
> > 
> > Could you please elaborate how ipoib_neigh_destructor comes to be called at all?
> > At what point does ipoib_neigh_setup_dev get called?
> > 
> > 
> The bond device uses its slave's neigh_setup function.
> Please look at line 19 below from the bonding code.
> static void bond_setup_by_slave(struct net_device *bond_dev,
>      11 +               struct net_device *slave_dev)
>      12 +{
>      13 +   bond_dev->hard_header           = slave_dev->hard_header;
>      14 +   bond_dev->rebuild_header        = slave_dev->rebuild_header;
>      15 +   bond_dev->hard_header_cache = slave_dev->hard_header_cache;
>      16 +   bond_dev->header_cache_update   = slave_dev->header_cache_update;
>      17 +   bond_dev->hard_header_parse = slave_dev->hard_header_parse;
>      18 +
>      19 +   bond_dev->neigh_setup       = slave_dev->neigh_setup;
>      20 +
>      21 +   bond_dev->type          = slave_dev->type;
>      22 +   bond_dev->hard_header_len   = slave_dev->hard_header_len;
>      23 +   bond_dev->addr_len      = slave_dev->addr_len;
>      24 +
>      25 +   memcpy(bond_dev->broadcast, slave_dev->broadcast,
>      26 +       slave_dev->addr_len);
>      27 +}

Another concern: assume that one device goes away (e.g. hotplug).
It seems that neighbours whose dev field points to another device will not be destroyed.
Correct?

Therefore in your design, it seems that to_ipoib_neigh()->dev
will get us a pointer to device that has been removed already.
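
A toy model of that scenario, for illustration only: it assumes a per-device flush that matches on the neighbour's own dev field, and every model_* name is hypothetical rather than taken from the kernel or from the patch.

```c
#include <stddef.h>

struct model_dev {
	const char *name;
};

/* Hypothetical stand-in: the neighbour records the device it was created
 * for, and the ipoib companion caches the device used for transmit. */
struct model_neigh {
	struct model_dev *dev;        /* bonding master in the bonded case */
	struct model_dev *ipoib_dev;  /* cached slave, NULL once torn down */
};

/* Tear down every neighbour created on behalf of `gone`, the way a
 * per-device flush would. Returns the number of entries torn down. */
static size_t model_flush_dev(struct model_neigh *tbl, size_t n,
			      const struct model_dev *gone)
{
	size_t i, flushed = 0;

	for (i = 0; i < n; i++) {
		if (tbl[i].dev == gone) {
			tbl[i].dev = NULL;
			tbl[i].ipoib_dev = NULL;
			flushed++;
		}
	}
	return flushed;
}

/* Count entries that still cache `gone` as their ipoib device. */
static size_t model_count_stale(const struct model_neigh *tbl, size_t n,
				const struct model_dev *gone)
{
	size_t i, stale = 0;

	for (i = 0; i < n; i++)
		if (tbl[i].ipoib_dev == gone)
			stale++;
	return stale;
}
```

With a table entry whose dev is the master and whose cached device is a slave, flushing by the slave tears down nothing and the entry keeps pointing at the slave, which is the stale pointer described above.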

> >>To fix it, this patch adds a dev field to struct ipoib_neigh which is used
> >>instead of the struct neighbour dev one.
> > 
> > 
> > What I am concerned with is - if the master is not an IPoIB device,
> > what guarantee do we have that to_ipoib_neigh will return 0
> > and not part of an actual hardware address?
> > 
> > Without bonding, the reason is that dev points to an ipoib device,
> > so we know hw address is 20 bytes.
> > 
> 
> I guess you meant "if the slave is not an IPoIB device"...

Yes.

> The bond device doesn't allow devices of different types to be grouped
> together as its slaves.

I see.

> Furthermore, bond_setup_by_slave is called only for non
> Ethernet devices (we are considering changing the logic to "called only for
> IPoIB devices", just for safety).

Why is this necessary, BTW?

-- 
MST


From vlad at dev.mellanox.co.il  Tue Feb  6 09:25:00 2007
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Tue, 06 Feb 2007 19:25:00 +0200
Subject: [openib-general] build.sh not building libmthca
In-Reply-To: <1170780102.19662.45.camel@stevo-desktop>
References: <1170777987.19662.31.camel@stevo-desktop>
	<1170780102.19662.45.camel@stevo-desktop>
Message-ID: <1170782700.6537.32.camel@vladsk-laptop>

On Tue, 2007-02-06 at 10:41 -0600, Steve Wise wrote:
> Do you want me to use bugzilla to track these issues?
> 
Yes, please.


-- 
Vladimir Sokolovsky <vlad at dev.mellanox.co.il>
Mellanox Technologies Ltd.


From swise at opengridcomputing.com  Tue Feb  6 09:28:26 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 11:28:26 -0600
Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport - IWCM
 workaround for ip_dev_find() bug.
Message-ID: <1170782906.19662.61.camel@stevo-desktop>

I propose the following fix for supporting iWARP on SLES9SP3.  

This fixes bug 325.

Sean, can you please review this?  

Steve.


-----------

SLES9SP3 Backport - IWCM workaround for ip_dev_find() bug.

Acquire the cma_dev based on the ib device of the incoming
connect request.

This overcomes a sles9sp3 bug where ip_dev_find(local_ipaddr) always
returns the loopback net_device pointer instead of the actual local
interface pointer.  Note: this workaround leaves the rdma_dev_addr in
the new connection request rdma_cm_id incomplete.  But ULPs don't really
use this, so we'll have to live with it for SLES9SP3.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 .../iwcm_ip_dev_find_workaround.patch              |   91 +++++++++++++++++++++++
 1 files changed, 91 insertions(+), 0 deletions(-)

diff --git a/kernel_patches/backport/2.6.5_sles9_sp3/iwcm_ip_dev_find_workaround.patch b/kernel_patches/backport/2.6.5_sles9_sp3/iwcm_ip_dev_find_workaround.patch
new file mode 100644
index 0000000..a9d5bfe
--- /dev/null
+++ b/kernel_patches/backport/2.6.5_sles9_sp3/iwcm_ip_dev_find_workaround.patch
@@ -0,0 +1,91 @@
+SLES9SP3 Backport - IWCM workaround for ip_dev_find() bug.
+
+From: Steve Wise <swise at opengridcomputing.com>
+
+Acquire the cma_dev based on the ib device of the incoming
+connect request.
+
+This overcomes a sles9sp3 bug where ip_dev_find(local_ipaddr) always
+returns the loopback net_device pointer instead of the actual local
+interface pointer.  Note: this workaround leaves the rdma_dev_addr in
+the new connection request rdma_cm_id incomplete.  But ULPs don't really
+use this, so we'll have to live with it for SLES9SP3.
+
+Signed-off-by: Steve Wise <swise at opengridcomputing.com>
+---
+
+ drivers/infiniband/core/cma.c |   33 +++++++++++++++------------------
+ 1 files changed, 15 insertions(+), 18 deletions(-)
+
+diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
+index 9e0ab04..c89b611 100644
+--- a/drivers/infiniband/core/cma.c
++++ b/drivers/infiniband/core/cma.c
+@@ -1128,13 +1128,25 @@ static int cma_iw_handler(struct iw_cm_i
+ 	return ret;
+ }
+ 
++static int iw_cma_acquire_dev(struct iw_cm_id *cm_id, struct rdma_id_private *id_priv)
++{
++	struct cma_device *cma_dev;
++
++	list_for_each_entry(cma_dev, &dev_list, list) {
++		if (cma_dev->device == cm_id->device) {
++			cma_attach_to_dev(id_priv, cma_dev);
++			return 0;
++		}
++	}
++	return -ENODEV;
++}
++
+ static int iw_conn_req_handler(struct iw_cm_id *cm_id,
+ 			       struct iw_cm_event *iw_event)
+ {
+ 	struct rdma_cm_id *new_cm_id;
+ 	struct rdma_id_private *listen_id, *conn_id;
+ 	struct sockaddr_in *sin;
+-	struct net_device *dev = NULL;
+ 	struct rdma_cm_event event;
+ 	int ret;
+ 
+@@ -1157,22 +1169,8 @@ static int iw_conn_req_handler(struct iw
+ 	atomic_inc(&conn_id->dev_remove);
+ 	conn_id->state = CMA_CONNECT;
+ 
+-	dev = ip_dev_find(iw_event->local_addr.sin_addr.s_addr);
+-	if (!dev) {
+-		ret = -EADDRNOTAVAIL;
+-		cma_release_remove(conn_id);
+-		rdma_destroy_id(new_cm_id);
+-		goto out;
+-	}
+-	ret = rdma_copy_addr(&conn_id->id.route.addr.dev_addr, dev, NULL);
+-	if (ret) {
+-		cma_release_remove(conn_id);
+-		rdma_destroy_id(new_cm_id);
+-		goto out;
+-	}
+-
+ 	mutex_lock(&lock);
+-	ret = cma_acquire_dev(conn_id);
++	ret = iw_cma_acquire_dev(cm_id, conn_id);
+ 	mutex_unlock(&lock);
+ 	if (ret) {
+ 		cma_release_remove(conn_id);
+@@ -1184,6 +1182,7 @@ static int iw_conn_req_handler(struct iw
+ 	cm_id->context = conn_id;
+ 	cm_id->cm_handler = cma_iw_handler;
+ 
++	new_cm_id->route.addr.dev_addr.dev_type = RDMA_NODE_RNIC;
+ 	sin = (struct sockaddr_in *) &new_cm_id->route.addr.src_addr;
+ 	*sin = iw_event->local_addr;
+ 	sin = (struct sockaddr_in *) &new_cm_id->route.addr.dst_addr;
+@@ -1203,8 +1202,6 @@ static int iw_conn_req_handler(struct iw
+ 	}
+ 
+ out:
+-	if (dev)
+-		dev_put(dev);
+ 	cma_release_remove(listen_id);
+ 	return ret;
+ }
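
The lookup that the workaround adds can be tried out in userspace with a sketch like the one below. It only models the idea (match on the device the connect request arrived on, rather than on an address lookup); the model_* types are invented stand-ins for cma_device, rdma_id_private and the kernel list, and no locking is modeled.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ib_device. */
struct model_ib_device {
	const char *name;
};

/* Hypothetical stand-in for struct cma_device on a singly linked list. */
struct model_cma_dev {
	struct model_ib_device *device;
	struct model_cma_dev *next;
};

/* Hypothetical stand-in for struct rdma_id_private. */
struct model_id_priv {
	struct model_cma_dev *cma_dev;
};

/* Attach id_priv to the cma device whose ib device is the one the connect
 * request arrived on. Returns 0 on success or -ENODEV if none matches. */
static int model_iw_acquire_dev(struct model_cma_dev *dev_list,
				const struct model_ib_device *incoming,
				struct model_id_priv *id_priv)
{
	struct model_cma_dev *cur;

	for (cur = dev_list; cur != NULL; cur = cur->next) {
		if (cur->device == incoming) {
			id_priv->cma_dev = cur;
			return 0;
		}
	}
	return -ENODEV;
}
```

Because the match is on the device pointer, it does not depend on what ip_dev_find() returns for the local address.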


From sweitzen at cisco.com  Tue Feb  6 09:33:42 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Tue, 6 Feb 2007 09:33:42 -0800
Subject: [openib-general] OFED-1.2 first release
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA302F9B609@xmb-sjc-216.amer.cisco.com>
References: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA302F3F68C@xmb-sjc-216.amer.cisco.com>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA302F3F697@xmb-sjc-216.amer.cisco.com>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA302F9B609@xmb-sjc-216.amer.cisco.com>
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA302F9B703@xmb-sjc-216.amer.cisco.com>

sdpnetstat is getting added to the dapl-devel RPM.
 
# rpm -qlip dapl-devel-1.2.0-0.x86_64.rpm
Name        : dapl-devel                   Relocations: (not relocatable)
Version     : 1.2.0                             Vendor: OpenFabrics
Release     : 0                             Build Date: Mon 05 Feb 2007 09:48:50 PM PST
Install Date: (not installed)               Build Host: svbu-qa1850-1.cisco.com
Group       : System Environment/Libraries   Source RPM: ofa_user-1.2-alpha1.src.rpm
Size        : 692598                           License: GPL/BSD
Signature   : (none)
URL         : http://www.openfabrics.org/
Summary     : Development files for the libdat and libdapl libraries
Description :
Static libraries and header files for the libdat and libdapl library.
/usr/local/ofed/bin/sdpnetstat
/usr/local/ofed/include/dat/dat.h
/usr/local/ofed/include/dat/dat_error.h
/usr/local/ofed/include/dat/dat_platform_specific.h
/usr/local/ofed/include/dat/dat_redirection.h
/usr/local/ofed/include/dat/dat_registry.h
/usr/local/ofed/include/dat/dat_vendor_specific.h
/usr/local/ofed/include/dat/udat.h
/usr/local/ofed/include/dat/udat_config.h
/usr/local/ofed/include/dat/udat_redirection.h
/usr/local/ofed/include/dat/udat_vendor_specific.h
/usr/local/ofed/lib64/libdaplcma.a
/usr/local/ofed/lib64/libdaplcma.so
/usr/local/ofed/lib64/libdat.a
/usr/local/ofed/lib64/libdat.so


________________________________

	From: Scott Weitzenkamp (sweitzen) 
	Sent: Tuesday, February 06, 2007 12:07 AM
	To: Scott Weitzenkamp (sweitzen); 'Vladimir Sokolovsky'; 'openfabrics-ewg at openib.org'; 'Tziporet Koren'
	Cc: 'openib-general at openib.org'
	Subject: RE: [openib-general] OFED-1.2 first release
	
	
	Not getting MPI RPMS for Intel compilers, either.
	 
	Running /bin/rpm -Uhv /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/mpitests_mvapich2_gcc-2.0-698.x86_64.rpm
	/tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/mvapich2_intel-0.9.8-1.x86_64.rpm not found
	Running /bin/rpm -Uhv /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/openmpi_gcc-1.2b4ofedr13470-1ofed.x86_64.rpm
	Running /bin/rpm -Uhv /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/mpitests_openmpi_gcc-2.0-698.x86_64.rpm
	/tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/openmpi_intel-1.2b4ofedr13470-1ofed.x86_64.rpm not found
	ERROR: -.x86_64.rpm not found under /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1.
	Installation finished successfully...
	
	Scott

________________________________

		From: Scott Weitzenkamp (sweitzen) 
		Sent: Monday, February 05, 2007 9:44 PM
		To: Scott Weitzenkamp (sweitzen); 'Vladimir Sokolovsky'; 'openfabrics-ewg at openib.org'; 'Tziporet Koren'
		Cc: 'openib-general at openib.org'
		Subject: RE: [openib-general] OFED-1.2 first release
		
		
		Moving on, I set ib_bonding=n in ofed.conf and try install.sh again, and now get this:
		 
		...
		Building MVAPICH RPM. Please wait...
		 
		Using gcc compiler
		Running rpmbuild -v --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_name mvapich_gcc' --define 'ofed 1' --define 'compiler gcc' --define 'openib_prefix /usr/local/ofed' --define 'build_root /var/tmp/OFED' --define '_prefix /usr/local/ofed/mpi/gcc/mvapich-0.9.9' /tmp/OFED-1.2-20070205-1823/SRPMS/mvapich-0.9.9-971.src.rpm
		 
		ERROR: Failed executing "rpmbuild -v --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_name mvapich_gcc' --define 'ofed 1' --define 'compiler gcc' --define 'openib_prefix /usr/local/ofed' --define 'build_root /var/tmp/OFED' --define '_prefix /usr/local/ofed/mpi/gcc/mvapich-0.9.9' /tmp/OFED-1.2-20070205-1823/SRPMS/mvapich-0.9.9-971.src.rpm"
		 
		See log file: /tmp/OFED.6120.log
		 
		#  tail /tmp/OFED.6120.log
		+ LANG=C
		+ export LANG
		+ unset DISPLAY
		/var/tmp/rpm-tmp.870: line 33: syntax error near unexpected token `)'
		error: Bad exit status from /var/tmp/rpm-tmp.870 (%install)
		 
		
		RPM build errors:
		    Bad exit status from /var/tmp/rpm-tmp.870 (%install)
		ERROR: Failed executing "rpmbuild -v --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_name mvapich_gcc' --define 'ofed 1' --define 'compiler gcc' --define 'openib_prefix /usr/local/ofed' --define 'build_root /var/tmp/OFED' --define '_prefix /usr/local/ofed/mpi/gcc/mvapich-0.9.9' /tmp/OFED-1.2-20070205-1823/SRPMS/mvapich-0.9.9-971.src.rpm"
		
		Scott

________________________________

			From: Scott Weitzenkamp (sweitzen) 
			Sent: Monday, February 05, 2007 9:27 PM
			To: Vladimir Sokolovsky; openfabrics-ewg at openib.org; Tziporet Koren; Scott Weitzenkamp (sweitzen)
			Cc: openib-general at openib.org
			Subject: RE: [openib-general] OFED-1.2 first release
			
			
			Vlad and Tziporet,
			 
			It might help if you elaborated on what you meant by "first
			release". You have been saying "code freeze", but really this is
			"feature freeze", right?  This announcement is quite a bit
			different from previous OFED announcements, where you detailed
			what features were available and what OSes were supported.  The
			daily build email mentions compiling against kernels, but I
			haven't seen what distros were actually tested.  Are we starting
			from scratch on compiling and testing with distros like RHEL4?
			Do you anticipate we will just go day by day with builds, trying
			to stabilize things initially?
			 
			In any case, here's what I see when I try to
compile with install.sh on RHEL4 U3 x86_64:
			 
			...
			/tmp/OFED-1.2-20070205-1823/build.sh: line 802: kernel-ib: command not found
			Running rpmbuild --rebuild --target=noarch --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed' /tmp/OFED-1.2-20070205-1823/SRPMS/ofed-docs-1.2-0.src.rpm
			Running /bin/mv -f /var/tmp/OFEDRPM/RPMS/noarch/ofed-docs-1.2-0.noarch.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1
			Running rpmbuild --rebuild --target=noarch --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed' /tmp/OFED-1.2-20070205-1823/SRPMS/ofed-scripts-1.2-0.src.rpm
			Running /bin/mv -f /var/tmp/OFEDRPM/RPMS/noarch/ofed-scripts-1.2-0.noarch.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1
			Running rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed' /tmp/OFED-1.2-20070205-1823/SRPMS/ib-bonding-0.9.0-1.src.rpm
			Running /bin/mv -f /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1
			 
			ERROR: Failed executing "/bin/mv -f /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1"
			 
			See log file: /tmp/OFED.10899.log
			 
			# tail -10 /tmp/OFED.10899.log
			Checking for unpackaged file(s): /usr/lib/rpm/check-files /var/tmp/ib-bonding-0.9.0-root
			Wrote: /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1-rh-x86_64.rpm
			Wrote: /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-debuginfo-0.9.0-1-rh-x86_64.rpm
			Executing(--clean): /bin/sh -e /var/tmp/rpm-tmp.98615
			+ umask 022
			+ cd /var/tmp/OFEDRPM/BUILD
			+ rm -rf ib-bonding-0.9.0
			+ exit 0
			/bin/mv: cannot stat `/var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm': No such file or directory
			ERROR: Failed executing "/bin/mv -f /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1"
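
The log above suggests a plain file-name mismatch: rpmbuild wrote ib-bonding-0.9.0-1-rh-x86_64.rpm, while the script then tried to move ib-bonding-0.9.0-1.x86_64.rpm. One way around that kind of mismatch is to move whatever rpmbuild actually produced rather than a predicted name. The sketch below is only an illustration of that idea; the function name move_built_rpms and its arguments are hypothetical and are not taken from build.sh.

```shell
# move_built_rpms SRC DEST NAME  (hypothetical helper, not part of build.sh)
# Move every RPM in SRC whose file name starts with NAME into DEST,
# instead of assuming one exact release/arch suffix.
move_built_rpms() {
    src=$1
    dest=$2
    name=$3
    moved=0
    for rpm in "$src"/"$name"*.rpm; do
        [ -e "$rpm" ] || continue       # glob matched nothing
        mv -f "$rpm" "$dest"/ || return 1
        moved=$((moved + 1))
    done
    [ "$moved" -gt 0 ]                  # fail if nothing was produced
}
```

With the paths from the log, a call might look like `move_built_rpms /var/tmp/OFEDRPM/RPMS/x86_64 /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1 ib-bonding`. Note that this would also pick up the debuginfo package.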
			

			Scott

________________________________

				From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Vladimir Sokolovsky
				Sent: Monday, February 05, 2007 2:26 PM
				To: openfabrics-ewg at openib.org
				Cc: openib-general at openib.org
				Subject: [openib-general] OFED-1.2 first release
				
				
				Hi,
				
				OFED-1.2-20070205-1823.tgz can be downloaded from
				http://www.openfabrics.org/builds/ofed-1.2/

				The first OFED package includes:

				ofa_kernel-1.2-alpha1.src.rpm
				ofa_user-1.2-alpha1.src.rpm
				mvapich-0.9.9-971.src.rpm
				mvapich2-0.9.8-1.src.rpm
				openmpi-1.2b4ofedr13470-1ofed.src.rpm
				mpitests-2.0-698.src.rpm
				open-iscsi-generic-2.0-742.src.rpm
				ib-bonding-0.9.0-1.src.rpm
				ofed-docs-1.2-0.src.rpm
				ofed-scripts-1.2-0.src.rpm

				Known issues:

				srptools - compilation fails
				openib_diags - compilation fails
				ibutils - not included yet

				To build OFED RPMs:

				cd OFED-1.2-20070205-1823
				./build.sh

				Created RPMs will be stored under OFED-1.2-20070205-1823/RPMS/ directory.

				To install OFED RPMs:

				cd OFED-1.2-20070205-1823
				./install.sh

				For a detailed installation guide, see
				OFED-1.2-xxx/docs/OFED_Installation_Guide.txt

				-- 
				Vladimir Sokolovsky <vlad at dev.mellanox.co.il>
				
				Mellanox Technologies Ltd.
				 

From mshefty at ichips.intel.com  Tue Feb  6 09:37:13 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 06 Feb 2007 09:37:13 -0800
Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -
 IWCM workaround for ip_dev_find() bug.
In-Reply-To: <1170782906.19662.61.camel@stevo-desktop>
References: <1170782906.19662.61.camel@stevo-desktop>
Message-ID: <45C8BCC9.4070003@ichips.intel.com>

Steve Wise wrote:
> I propose the following fix for supporting iWARP on SLES9SP3.  
> 
> This fixes bug 325.
> 
> Sean, can you please review this?  

The changes seem fine with me.

Does this bug affect the ib_addr module as well?  (addr_resolve_local and 
rdma_translate_ip)

- Sean


From swise at opengridcomputing.com  Tue Feb  6 10:02:04 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 12:02:04 -0600
Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -
 IWCM workaround for ip_dev_find() bug.
In-Reply-To: <45C8BCC9.4070003@ichips.intel.com>
References: <1170782906.19662.61.camel@stevo-desktop>
	<45C8BCC9.4070003@ichips.intel.com>
Message-ID: <1170784924.19662.79.camel@stevo-desktop>

On Tue, 2007-02-06 at 09:37 -0800, Sean Hefty wrote:
> Steve Wise wrote:
> > I propose the following fix for supporting iWARP on SLES9SP3.  
> > 
> > This fixes bug 325.
> > 
> > Sean, can you please review this?  
> 
> The changes seem fine with me.
> 
> Does this bug affect the ib_addr module as well?  (addr_resolve_local and 
> rdma_translate_ip)
> 

Actually, yes it does.  Here's one case (that I just tested :):

If you rdma_bind() to an explicit local address, it will fail.

Foo!

I guess I'll need to address the uses of ip_dev_find() in addr.c as well
before we commit this.

What really bothers me is I cannot find the kernel code in the
2.6.5-7.244 kernel that is doing this (returning loopback for all local
devices).  ip_dev_find() does a FIB lookup to find this.  I dug around
the fib code but so far haven't found the culprit.  

I welcome any help from anyone out there interested in the rdma-cm
working on sles9sp3.  I would think if SDP does an rdma_bind() then SDP
will also see this bug when run on sles9sp3.  (Are SUSE folks
listening?)  Any thoughts?

Steve.


From swise at opengridcomputing.com  Tue Feb  6 10:09:17 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 12:09:17 -0600
Subject: [openib-general] OFED-1.2 first release
In-Reply-To: <1170724023.19728.5.camel@stevo-desktop>
References: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com>
	<1170721163.16661.111.camel@stevo-desktop>
	<1170724023.19728.5.camel@stevo-desktop>
Message-ID: <1170785357.19662.83.camel@stevo-desktop>

I opened bug 340 for this.


On Mon, 2007-02-05 at 19:07 -0600, Steve Wise wrote:
> I think there might be some dependency problem.  I selected libibverbs,
> libcxgb3, librdmacm, perftest, mvapich2/IWARP and mpitests.  For some
> reason it pulled in libibumad as a prereq, but not libibcommon...
> 
> Also, I think mvapich2/IWARP links with libibumad or libibcommon and it
> doesn't need to when using librdmacm.
> 
> 
> 
> [root at r2-iw redhat-release-4AS-5.5]# rpm -U *
> error: Failed dependencies:
>         libibcommon.so.1()(64bit) is needed by libibumad-1.0.2-0.x86_64
>         libibcommon.so.1(IBCOMMON_1.0)(64bit) is needed by libibumad-1.0.2-0.x86_64
>     Suggested resolutions:
>         libibcommon-1.0-1.x86_64.rpm
> 
> 
> > On Tue, 2007-02-06 at 00:25 +0200, Vladimir Sokolovsky wrote:
> > > Hi,
> > > 
> > > OFED-1.2-20070205-1823.tgz can be downloaded from
> > > 
> > > http://www.openfabrics.org/builds/ofed-1.2/
> > > 
> > 
> > > 
> > 
> > 
> > _______________________________________________
> > openib-general mailing list
> > openib-general at openib.org
> > http://openib.org/mailman/listinfo/openib-general
> > 
> > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> > 
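
The "Failed dependencies" output quoted above lists the same shared library twice (once plain, once with a symbol version). A small filter can reduce such output to the distinct missing capabilities, which makes it easier to see which package was left out of the selection. This is a minimal sketch; the function name missing_deps is made up for illustration, and it assumes the "X is needed by Y" wording shown in the quoted rpm output.

```shell
# missing_deps  (hypothetical helper)
# Read rpm "Failed dependencies" output on stdin and print each distinct
# missing capability, stripped of "(...)" qualifiers and version tests.
# Leading whitespace and mail-quote markers (">") are ignored.
missing_deps() {
    sed -n 's/^[>[:space:]]*\([^[:space:](]*\).* is needed by .*/\1/p' | sort -u
}
```

Fed the output quoted above (for example via `rpm -U * 2>&1 | missing_deps`), it prints the single line libibcommon.so.1, matching rpm's own suggested resolution of libibcommon-1.0-1.x86_64.rpm.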


From mshefty at ichips.intel.com  Tue Feb  6 10:35:40 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 06 Feb 2007 10:35:40 -0800
Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -
 IWCM workaround for ip_dev_find() bug.
In-Reply-To: <1170784924.19662.79.camel@stevo-desktop>
References: <1170782906.19662.61.camel@stevo-desktop>
	<45C8BCC9.4070003@ichips.intel.com>
	<1170784924.19662.79.camel@stevo-desktop>
Message-ID: <45C8CA7C.3080705@ichips.intel.com>

> Actually, yes it does.  Here's one case (that I just tested :):
> 
> If you rdma_bind() to an explicit local address, it will fail.
> 
> Foo!
> 
> I guess I'll need to address the uses of ip_dev_find() in addr.c as well
> before we commit this.

Can we just backport our own version of ip_dev_find()?  We had this once before 
in svn when they removed it from being exported from the kernel.

- Sean


From sweitzen at cisco.com  Tue Feb  6 10:54:34 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Tue, 6 Feb 2007 10:54:34 -0800
Subject: [openib-general] OFED-1.2 first release
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA302F9B703@xmb-sjc-216.amer.cisco.com>
References: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA302F3F68C@xmb-sjc-216.amer.cisco.com>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA302F3F697@xmb-sjc-216.amer.cisco.com>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA302F9B609@xmb-sjc-216.amer.cisco.com>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA302F9B703@xmb-sjc-216.amer.cisco.com>
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA302F9B77F@xmb-sjc-216.amer.cisco.com>

libibverbs is not working.  I have opened bugs 342-346 for the issues
I've found so far:
 
# ibv_devices
libibverbs: Warning: couldn't open config directory '/usr/local/ofed/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
    device                 node GUID
    ------              ----------------
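
To narrow down the first warning, it can help to check whether the config directory exists at all and whether anything was installed into it. The sketch below does only that. The function name check_ibverbs_conf is hypothetical, the default path is simply the one printed in the warning, and the assumption that driver packages drop plain config files into that directory is mine.

```shell
# check_ibverbs_conf [DIR]  (hypothetical helper)
# Returns 0 if DIR contains at least one regular file, 1 if DIR is
# missing, 2 if DIR exists but is empty.
check_ibverbs_conf() {
    dir=${1:-/usr/local/ofed/etc/libibverbs.d}   # path taken from the warning
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    found=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue                  # skip non-files / empty glob
        found=1
        echo "config file: $f"
    done
    if [ "$found" -eq 0 ]; then
        echo "empty: $dir"
        return 2
    fi
    return 0
}
```

A "missing" result would point at packaging (the directory was never created under this prefix), while "empty" would suggest that no device-specific driver package was installed.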

 
Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 

________________________________

	From: Scott Weitzenkamp (sweitzen) 
	Sent: Tuesday, February 06, 2007 9:34 AM
	To: Scott Weitzenkamp (sweitzen); 'Vladimir Sokolovsky';
'openfabrics-ewg at openib.org'; 'Tziporet Koren'
	Cc: 'openib-general at openib.org'
	Subject: RE: [openib-general] OFED-1.2 first release
	
	
	sdpnetstat is getting added to the dapl-devel RPM.
	 
	# rpm -qlip dapl-devel-1.2.0-0.x86_64.rpm
	Name        : dapl-devel                   Relocations: (not relocatable)
	Version     : 1.2.0                             Vendor: OpenFabrics
	Release     : 0                             Build Date: Mon 05 Feb 2007 09:48:50 PM PST
	Install Date: (not installed)               Build Host: svbu-qa1850-1.cisco.com
	Group       : System Environment/Libraries   Source RPM: ofa_user-1.2-alpha1.src.rpm
	Size        : 692598                           License: GPL/BSD
	Signature   : (none)
	URL         : http://www.openfabrics.org/
	Summary     : Development files for the libdat and libdapl libraries
	Description :
	Static libraries and header files for the libdat and libdapl library.
	/usr/local/ofed/bin/sdpnetstat
	/usr/local/ofed/include/dat/dat.h
	/usr/local/ofed/include/dat/dat_error.h
	/usr/local/ofed/include/dat/dat_platform_specific.h
	/usr/local/ofed/include/dat/dat_redirection.h
	/usr/local/ofed/include/dat/dat_registry.h
	/usr/local/ofed/include/dat/dat_vendor_specific.h
	/usr/local/ofed/include/dat/udat.h
	/usr/local/ofed/include/dat/udat_config.h
	/usr/local/ofed/include/dat/udat_redirection.h
	/usr/local/ofed/include/dat/udat_vendor_specific.h
	/usr/local/ofed/lib64/libdaplcma.a
	/usr/local/ofed/lib64/libdaplcma.so
	/usr/local/ofed/lib64/libdat.a
	/usr/local/ofed/lib64/libdat.so
	

________________________________

		From: Scott Weitzenkamp (sweitzen) 
		Sent: Tuesday, February 06, 2007 12:07 AM
		To: Scott Weitzenkamp (sweitzen); 'Vladimir Sokolovsky';
'openfabrics-ewg at openib.org'; 'Tziporet Koren'
		Cc: 'openib-general at openib.org'
		Subject: RE: [openib-general] OFED-1.2 first release
		
		
		Not getting MPI RPMS for Intel compilers, either.
		 
		Running /bin/rpm -Uhv /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/mpitests_mvapich2_gcc-2.0-698.x86_64.rpm
		/tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/mvapich2_intel-0.9.8-1.x86_64.rpm not found
		Running /bin/rpm -Uhv /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/openmpi_gcc-1.2b4ofedr13470-1ofed.x86_64.rpm
		Running /bin/rpm -Uhv /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/mpitests_openmpi_gcc-2.0-698.x86_64.rpm
		/tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/openmpi_intel-1.2b4ofedr13470-1ofed.x86_64.rpm not found
		ERROR: -.x86_64.rpm not found under /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1.
		Installation finished successfully...
		
		Scott

________________________________

			From: Scott Weitzenkamp (sweitzen) 
			Sent: Monday, February 05, 2007 9:44 PM
			To: Scott Weitzenkamp (sweitzen); 'Vladimir
Sokolovsky'; 'openfabrics-ewg at openib.org'; 'Tziporet Koren'
			Cc: 'openib-general at openib.org'
			Subject: RE: [openib-general] OFED-1.2 first
release
			
			
			Moving on, I set ib_bonding=n in ofed.conf and tried
			install.sh again, and now get this:
			 
			...
			Building MVAPICH RPM. Please wait...
			 
			Using gcc compiler
			Running rpmbuild -v --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_name mvapich_gcc' --define 'ofed 1' --define 'compiler gcc' --define 'openib_prefix /usr/local/ofed' --define 'build_root /var/tmp/OFED' --define '_prefix /usr/local/ofed/mpi/gcc/mvapich-0.9.9' /tmp/OFED-1.2-20070205-1823/SRPMS/mvapich-0.9.9-971.src.rpm
			 
			ERROR: Failed executing "rpmbuild -v --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_name mvapich_gcc' --define 'ofed 1' --define 'compiler gcc' --define 'openib_prefix /usr/local/ofed' --define 'build_root /var/tmp/OFED' --define '_prefix /usr/local/ofed/mpi/gcc/mvapich-0.9.9' /tmp/OFED-1.2-20070205-1823/SRPMS/mvapich-0.9.9-971.src.rpm"
			 
			See log file: /tmp/OFED.6120.log
			 
			#  tail /tmp/OFED.6120.log
			+ LANG=C
			+ export LANG
			+ unset DISPLAY
			/var/tmp/rpm-tmp.870: line 33: syntax error near unexpected token `)'
			error: Bad exit status from /var/tmp/rpm-tmp.870 (%install)
			 
			
			RPM build errors:
			    Bad exit status from /var/tmp/rpm-tmp.870 (%install)
			ERROR: Failed executing "rpmbuild -v --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_name mvapich_gcc' --define 'ofed 1' --define 'compiler gcc' --define 'openib_prefix /usr/local/ofed' --define 'build_root /var/tmp/OFED' --define '_prefix /usr/local/ofed/mpi/gcc/mvapich-0.9.9' /tmp/OFED-1.2-20070205-1823/SRPMS/mvapich-0.9.9-971.src.rpm"
			
			Scott


From rdreier at cisco.com  Tue Feb  6 10:57:55 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 06 Feb 2007 10:57:55 -0800
Subject: [openib-general] [PATCH 2.6.20] infinband: Use ARRAY_SIZE macro
 when appropriate
In-Reply-To: <20070206160725.GJ8991@Ahmed> (Ahmed S. Darwish's message
	of "Tue, 6 Feb 2007 18:07:25 +0200")
References: <20070206160725.GJ8991@Ahmed>
Message-ID: <ada4ppz866k.fsf@cisco.com>

Thanks, queued in my tree for 2.6.21


From swise at opengridcomputing.com  Tue Feb  6 11:07:38 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 13:07:38 -0600
Subject: [openib-general] OFED-1.2 first release
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA302F9B77F@xmb-sjc-216.amer.cisco.com>
References: <6C2C79E72C305246B504CBA17B5500C922B30E@mtlexch01.mtl.com>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA302F3F68C@xmb-sjc-216.amer.cisco.com>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA302F3F697@xmb-sjc-216.amer.cisco.com>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA302F9B609@xmb-sjc-216.amer.cisco.com>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA302F9B703@xmb-sjc-216.amer.cisco.com>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA302F9B77F@xmb-sjc-216.amer.cisco.com>
Message-ID: <1170788858.19662.85.camel@stevo-desktop>

I already opened one for the libibverbs.d problem: bug 339.


On Tue, 2007-02-06 at 10:54 -0800, Scott Weitzenkamp (sweitzen) wrote:
> libibverbs is not working.  I have opened bugs 342-346 for the issues
> I've found so far:
>  
> # ibv_devices
> libibverbs: Warning: couldn't open config directory '/usr/local/ofed/etc/libibverbs.d'.
> libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
>     device                 node GUID
>     ------              ----------------
> 
>  
> Scott Weitzenkamp
> SQA and Release Manager
> Server Virtualization Business Unit
> Cisco Systems
>  
> 
>         
>         ______________________________________________________________
>         From: Scott Weitzenkamp (sweitzen) 
>         Sent: Tuesday, February 06, 2007 9:34 AM
>         To: Scott Weitzenkamp (sweitzen); 'Vladimir Sokolovsky';
>         'openfabrics-ewg at openib.org'; 'Tziporet Koren'
>         Cc: 'openib-general at openib.org'
>         Subject: RE: [openib-general] OFED-1.2 first release
>         
>         
>         
>         sdpnetstat is getting added to the dapl-devel RPM.
>          
>         # rpm -qlip dapl-devel-1.2.0-0.x86_64.rpm
>         Name        : dapl-devel                   Relocations: (not
>         relocatable)
>         Version     : 1.2.0                             Vendor:
>         OpenFabrics
>         Release     : 0                             Build Date: Mon 05
>         Feb 2007 09:48:50
>          PM PST
>         Install Date: (not installed)               Build Host:
>         svbu-qa1850-1.cisco.com
>         Group       : System Environment/Libraries   Source RPM:
>         ofa_user-1.2-alpha1.src
>         .rpm
>         Size        : 692598                           License:
>         GPL/BSD
>         Signature   : (none)
>         URL         : http://www.openfabrics.org/
>         Summary     : Development files for the libdat and libdapl
>         libraries
>         Description :
>         Static libraries and header files for the libdat and libdapl
>         library.
>         /usr/local/ofed/bin/sdpnetstat
>         /usr/local/ofed/include/dat/dat.h
>         /usr/local/ofed/include/dat/dat_error.h
>         /usr/local/ofed/include/dat/dat_platform_specific.h
>         /usr/local/ofed/include/dat/dat_redirection.h
>         /usr/local/ofed/include/dat/dat_registry.h
>         /usr/local/ofed/include/dat/dat_vendor_specific.h
>         /usr/local/ofed/include/dat/udat.h
>         /usr/local/ofed/include/dat/udat_config.h
>         /usr/local/ofed/include/dat/udat_redirection.h
>         /usr/local/ofed/include/dat/udat_vendor_specific.h
>         /usr/local/ofed/lib64/libdaplcma.a
>         /usr/local/ofed/lib64/libdaplcma.so
>         /usr/local/ofed/lib64/libdat.a
>         /usr/local/ofed/lib64/libdat.so
>         
>         
>                 
>                 ______________________________________________________
>                 From: Scott Weitzenkamp (sweitzen) 
>                 Sent: Tuesday, February 06, 2007 12:07 AM
>                 To: Scott Weitzenkamp (sweitzen); 'Vladimir
>                 Sokolovsky'; 'openfabrics-ewg at openib.org'; 'Tziporet
>                 Koren'
>                 Cc: 'openib-general at openib.org'
>                 Subject: RE: [openib-general] OFED-1.2 first release
>                 
>                 
>                 
>                 Not getting MPI RPMS for Intel compilers, either.
>                  
>                 Running /bin/rpm
>                 -Uhv /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/mp
>                 itests_mvapich2_gcc-2.0-698.x86_64.rpm
>                 /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/mvapich2_intel-0.9.8-1.x
>                 86_64.rpm not found
>                 Running /bin/rpm
>                 -Uhv /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/op
>                 enmpi_gcc-1.2b4ofedr13470-1ofed.x86_64.rpm
>                 Running /bin/rpm
>                 -Uhv /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/mp
>                 itests_openmpi_gcc-2.0-698.x86_64.rpm
>                 /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1/openmpi_intel-1.2b4ofedr
>                 13470-1ofed.x86_64.rpm not found
>                 ERROR: -.x86_64.rpm not found
>                 under /tmp/OFED-1.2-20070205-1823/RPMS/redhat-rele
>                 ase-4AS-4.1.
>                 Installation finished successfully...
>                 
>                 Scott
>                 
>                         
>                         ______________________________________________
>                         From: Scott Weitzenkamp (sweitzen) 
>                         Sent: Monday, February 05, 2007 9:44 PM
>                         To: Scott Weitzenkamp (sweitzen); 'Vladimir
>                         Sokolovsky'; 'openfabrics-ewg at openib.org';
>                         'Tziporet Koren'
>                         Cc: 'openib-general at openib.org'
>                         Subject: RE: [openib-general] OFED-1.2 first
>                         release
>                         
>                         
>                         
>                         Moving on, I set ib_bonding=n in ofed.conf and
>                         try install.sh again, and now get this:
>                          
>                         ...
>                         Building MVAPICH RPM. Please wait...
>                          
>                         Using gcc compiler
>                         Running rpmbuild -v --rebuild --define
>                         '_topdir /var/tmp/OFEDRPM' --define '_nam
>                         e mvapich_gcc' --define 'ofed 1' --define
>                         'compiler gcc' --define 'openib_prefix
>                          /usr/local/ofed' --define
>                         'build_root /var/tmp/OFED' --define
>                         '_prefix /usr/loc
>                         al/ofed/mpi/gcc/mvapich-0.9.9' /tmp/OFED-1.2-20070205-1823/SRPMS/mvapich-0.9.9-9
>                         71.src.rpm
>                          
>                         ERROR: Failed executing "rpmbuild -v --rebuild
>                         --define '_topdir /var/tmp/OFEDRP
>                         M' --define '_name mvapich_gcc' --define 'ofed
>                         1' --define 'compiler gcc' --defi
>                         ne 'openib_prefix /usr/local/ofed' --define
>                         'build_root /var/tmp/OFED' --define
>                         '_prefix /usr/local/ofed/mpi/gcc/mvapich-0.9.9' /tmp/OFED-1.2-20070205-1823/SRPM
>                         S/mvapich-0.9.9-971.src.rpm"
>                          
>                         See log file: /tmp/OFED.6120.log
>                          
>                         #  tail /tmp/OFED.6120.log
>                         + LANG=C
>                         + export LANG
>                         + unset DISPLAY
>                         /var/tmp/rpm-tmp.870: line 33: syntax error
>                         near unexpected token `)'
>                         error: Bad exit status
>                         from /var/tmp/rpm-tmp.870 (%install)
>                          
>                         
>                         RPM build errors:
>                             Bad exit status from /var/tmp/rpm-tmp.870
>                         (%install)
>                         ERROR: Failed executing "rpmbuild -v --rebuild
>                         --define '_topdir /var/tmp/OFEDRP
>                         M' --define '_name mvapich_gcc' --define 'ofed
>                         1' --define 'compiler gcc' --defi
>                         ne 'openib_prefix /usr/local/ofed' --define
>                         'build_root /var/tmp/OFED' --define
>                         '_prefix /usr/local/ofed/mpi/gcc/mvapich-0.9.9' /tmp/OFED-1.2-20070205-1823/SRPM
>                         S/mvapich-0.9.9-971.src.rpm"
>                         
>                         Scott
>                         
>                                 
>                                 ______________________________________
>                                 From: Scott Weitzenkamp (sweitzen) 
>                                 Sent: Monday, February 05, 2007 9:27
>                                 PM
>                                 To: Vladimir Sokolovsky;
>                                 openfabrics-ewg at openib.org; Tziporet
>                                 Koren; Scott Weitzenkamp (sweitzen)
>                                 Cc: openib-general at openib.org
>                                 Subject: RE: [openib-general] OFED-1.2
>                                 first release
>                                 
>                                 
>                                 
>                                 Vlad and Tziporet,
>                                  
>                                 It might help if you elaborated on
>                                 what you meant by "first release", you
>                                 have been saying "code freeze" but
>                                 really this is "feature freeze",
>                                 right?  This announcement is quite a
>                                 bit different from previous OFED
>                                 announcements, where you detailed what
>                                 features were available and what OS
>                                 were supported.  The daily build email
>                                 mentions compiling against kernels,
>                                 but I haven't seen what distros were
>                                 actually tested.  Are we starting from
>                                 scratch on compiling and testing with
>                                 distros like RHEL4?  Do you anticipate
>                                 we will just go day by day with builds
>                                 trying to stabilize things initially?
>                                  
>                                 In any case, here's what I see when I
>                                 try to compile with install.sh on
>                                 RHEL4 U3 x86_64:
>                                  
>                                 ...
>                                 /tmp/OFED-1.2-20070205-1823/build.sh: line 802: kernel-ib: command not found
>                                 Running rpmbuild --rebuild --target=noarch --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed' /tmp/OFED-1.2-20070205-1823/SRPMS/ofed-docs-1.2-0.src.rpm
>                                 Running /bin/mv -f /var/tmp/OFEDRPM/RPMS/noarch/ofed-docs-1.2-0.noarch.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1
>                                 Running rpmbuild --rebuild --target=noarch --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed' /tmp/OFED-1.2-20070205-1823/SRPMS/ofed-scripts-1.2-0.src.rpm
>                                 Running /bin/mv -f /var/tmp/OFEDRPM/RPMS/noarch/ofed-scripts-1.2-0.noarch.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1
>                                 Running rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed' /tmp/OFED-1.2-20070205-1823/SRPMS/ib-bonding-0.9.0-1.src.rpm
>                                 Running /bin/mv -f /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1
>                                 
>                                 ERROR: Failed executing "/bin/mv -f /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1"
>                                 
>                                 See log file: /tmp/OFED.10899.log
>                                 
>                                 # tail -10 /tmp/OFED.10899.log
>                                 Checking for unpackaged file(s): /usr/lib/rpm/check-files /var/tmp/ib-bonding-0.9.0-root
>                                 Wrote: /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1-rh-x86_64.rpm
>                                 Wrote: /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-debuginfo-0.9.0-1-rh-x86_64.rpm
>                                 Executing(--clean): /bin/sh -e /var/tmp/rpm-tmp.98615
>                                 + umask 022
>                                 + cd /var/tmp/OFEDRPM/BUILD
>                                 + rm -rf ib-bonding-0.9.0
>                                 + exit 0
>                                 /bin/mv: cannot stat `/var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm': No such file or directory
>                                 ERROR: Failed executing "/bin/mv -f /var/tmp/OFEDRPM/RPMS/x86_64/ib-bonding-0.9.0-1.x86_64.rpm /tmp/OFED-1.2-20070205-1823/RPMS/redhat-release-4AS-4.1"
>                                 
>                                 
>                                 
>                                  
>                                 Scott
>                                 
>                                         
>                                         ______________________________
>                                         From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Vladimir Sokolovsky
>                                         Sent: Monday, February 05, 2007 2:26 PM
>                                         To: openfabrics-ewg at openib.org
>                                         Cc: openib-general at openib.org
>                                         Subject: [openib-general] OFED-1.2 first release
>                                         
>                                         
>                                         
>                                         Hi,
>                                         
>                                         OFED-1.2-20070205-1823.tgz can be downloaded from
>                                         http://www.openfabrics.org/builds/ofed-1.2/
>                                         
>                                         The first OFED package includes:
>                                         
>                                         ofa_kernel-1.2-alpha1.src.rpm
>                                         ofa_user-1.2-alpha1.src.rpm
>                                         mvapich-0.9.9-971.src.rpm
>                                         mvapich2-0.9.8-1.src.rpm
>                                         openmpi-1.2b4ofedr13470-1ofed.src.rpm
>                                         mpitests-2.0-698.src.rpm
>                                         open-iscsi-generic-2.0-742.src.rpm
>                                         ib-bonding-0.9.0-1.src.rpm
>                                         ofed-docs-1.2-0.src.rpm
>                                         ofed-scripts-1.2-0.src.rpm
>                                         
>                                         Known issues:
>                                         
>                                         srptools - compilation fails
>                                         openib_diags - compilation fails
>                                         ibutils - not included yet
>                                         
>                                         To build OFED RPMs:
>                                         
>                                         cd OFED-1.2-20070205-1823
>                                         ./build.sh
>                                         
>                                         Created RPMs will be stored under OFED-1.2-20070205-1823/RPMS/ directory.
>                                         
>                                         To install OFED RPMs:
>                                         
>                                         cd OFED-1.2-20070205-1823
>                                         ./install.sh
>                                         
>                                         For a detailed installation guide, see
>                                         OFED-1.2-xxx/docs/OFED_Installation_Guide.txt
>                                         
>                                         -- 
>                                         Vladimir Sokolovsky <vlad at dev.mellanox.co.il>
>                                         Mellanox Technologies Ltd.


From mst at mellanox.co.il  Tue Feb  6 11:22:31 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 6 Feb 2007 21:22:31 +0200
Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -
 IWCM workaround for ip_dev_find() bug.
In-Reply-To: <45C8CA7C.3080705@ichips.intel.com>
References: <1170782906.19662.61.camel@stevo-desktop>
	<45C8BCC9.4070003@ichips.intel.com>
	<1170784924.19662.79.camel@stevo-desktop>
	<45C8CA7C.3080705@ichips.intel.com>
Message-ID: <20070206192231.GG24372@mellanox.co.il>

> Quoting Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport - IWCM workaround for ip_dev_find() bug.
> 
> > Actually, yes it does.  Here's one case (that I just tested :):
> > 
> > If you rdma_bind() to an explicit local address, it will fail.
> > 
> > Foo!
> > 
> > I guess I'll need to address the uses of ip_dev_find() in addr.c as well
> > before we commit this.
> 
> Can we just backport our own version of ip_dev_find()?  We had this once before 
> in svn when they removed it from being exported from the kernel.

Yes, this is in kernel_addons for 2.6.19 or something like that.
Just copy from there, much cleaner than the patch.
	
-- 
MST


From Tim.Snider at lsi.com  Tue Feb  6 11:15:32 2007
From: Tim.Snider at lsi.com (Snider, Tim)
Date: Tue, 6 Feb 2007 12:15:32 -0700
Subject: [openib-general] Run srp and ipoib on same port simultaneously?
Message-ID: <18A61515E49B764AB09447A336E51F5693EC25@NAMAIL2.ad.lsil.com>

Is there anything that prevents 2 ULPs - srp and ipoib - from running
simultaneously on the same port in OFED 1.1.1?
If so what about different ports on the same hca?
 
Timothy Snider 
Storage Architect
Strategic Planning, Technology and Architecture

LSI Logic Corporation
3718 North Rock Road
Wichita, KS 67226
(316) 636-8736 
tim.snider at lsi.com <mailto:tim.snider at lsi.com>  

 

From bugzilla-daemon at lists.openfabrics.org  Tue Feb  6 11:26:23 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Tue,  6 Feb 2007 11:26:23 -0800 (PST)
Subject: [openib-general] [Bug 347] New: rdma cm backport to EL4 - U3 broken
Message-ID: <bug-347-1@https.bugs.openfabrics.org/>

https://bugs.openfabrics.org/show_bug.cgi?id=347

           Summary: rdma cm backport to EL4 - U3 broken
           Product: OpenFabrics Linux
           Version: 1.2
          Platform: X86-64
        OS/Version: RHEL 4
            Status: NEW
          Severity: blocker
          Priority: P1
         Component: IB Core
        AssignedTo: bugzilla at openib.org
        ReportedBy: robert.j.woodruff at intel.com


librdmacm: couldn't read ABI version.
librdmacm: assuming: 4

I was able to fix this by applying the following backport patch
when running on EL4-U3

diff -Naurp linux-2.6.9/drivers/infiniband/core/ucma.c linux-2.6.9-openib-drivers-git013007-fixups/drivers/infiniband/core/ucma.c
--- linux-2.6.9/drivers/infiniband/core/ucma.c  2007-01-30 13:13:54.000000000 -0800
+++ linux-2.6.9-openib-drivers-git013007-fixups/drivers/infiniband/core/ucma.c  2007-01-30 13:35:56.000000000 -0800
@@ -1045,13 +1045,13 @@ static struct miscdevice ucma_misc = {
        .fops   = &ucma_fops,
 };

-static ssize_t show_abi_version(struct device *dev,
-                               struct device_attribute *attr,
-                               char *buf)
+static struct class *ucma_class;
+static ssize_t show_abi_version(struct class *class_dev, char *buf)
 {
-       return sprintf(buf, "%d\n", RDMA_USER_CM_ABI_VERSION);
+        return sprintf(buf, "%d\n", RDMA_USER_CM_ABI_VERSION);
 }
-static DEVICE_ATTR(abi_version, S_IRUGO, show_abi_version, NULL);
+static CLASS_ATTR(abi_version, S_IRUGO, show_abi_version, NULL);
+

 static int __init ucma_init(void)
 {
@@ -1061,22 +1061,29 @@ static int __init ucma_init(void)
        if (ret)
                return ret;

-       ret = device_create_file(ucma_misc.this_device, &dev_attr_abi_version);
-       if (ret) {
-               printk(KERN_ERR "rdma_ucm: couldn't create abi_version attr\n");
-               goto err;
-       }
-       return 0;
+        ucma_class = class_create(THIS_MODULE, "infiniband_ucma");
+        if (IS_ERR(ucma_class)) {
+                ret = PTR_ERR(ucma_class);
+                printk(KERN_ERR "rdma_ucm: couldn't create class infiniband_ucma\n");
+                goto err;
+        }
+
+        ret = class_create_file(ucma_class, &class_attr_abi_version);
+        if (ret) {
+                printk(KERN_ERR "user_verbs: couldn't create abi_version attribute\n");
+                goto err;
+        }
+
+        return 0;
 err:
-       misc_deregister(&ucma_misc);
-       return ret;
+        misc_deregister(&ucma_misc);
+        return ret;
 }

+
 static void __exit ucma_cleanup(void)
 {
-       device_remove_file(ucma_misc.this_device, &dev_attr_abi_version);
        misc_deregister(&ucma_misc);
-       idr_destroy(&ctx_idr);
 }

 module_init(ucma_init);


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From rdreier at cisco.com  Tue Feb  6 11:33:02 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 06 Feb 2007 11:33:02 -0800
Subject: [openib-general] Run srp and ipoib on same port simultaneously?
In-Reply-To: <18A61515E49B764AB09447A336E51F5693EC25@NAMAIL2.ad.lsil.com>
	(Tim Snider's message of "Tue, 6 Feb 2007 12:15:32 -0700")
References: <18A61515E49B764AB09447A336E51F5693EC25@NAMAIL2.ad.lsil.com>
Message-ID: <adaabzrxes1.fsf@cisco.com>

 > Is there anything that prevents 2 ULPs - srp and ipoib - from running
 > simultaneously on the same port in OFED 1.1.1?

No, there isn't.  Are you seeing problems trying it?

 - R.


From swise at opengridcomputing.com  Tue Feb  6 11:42:03 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 13:42:03 -0600
Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -
 IWCM workaround for ip_dev_find() bug.
In-Reply-To: <20070206192231.GG24372@mellanox.co.il>
References: <1170782906.19662.61.camel@stevo-desktop>
	<45C8BCC9.4070003@ichips.intel.com>
	<1170784924.19662.79.camel@stevo-desktop>
	<45C8CA7C.3080705@ichips.intel.com>
	<20070206192231.GG24372@mellanox.co.il>
Message-ID: <1170790923.19662.95.camel@stevo-desktop>

On Tue, 2007-02-06 at 21:22 +0200, Michael S. Tsirkin wrote:
> > Quoting Sean Hefty <mshefty at ichips.intel.com>:
> > Subject: Re: [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport - IWCM workaround for ip_dev_find() bug.
> > 
> > > Actually, yes it does.  Here's one case (that I just tested :):
> > > 
> > > If you rdma_bind() to an explicit local address, it will fail.
> > > 
> > > Foo!
> > > 
> > > I guess I'll need to address the uses of ip_dev_find() in addr.c as well
> > > before we commit this.
> > 
> > Can we just backport our own version of ip_dev_find()?  We had this once before 
> > in svn when they removed it from being exported from the kernel.
> 
> Yes, this is in kernel_addons for 2.6.19 or something like that.
> Just copy from there, much cleaner than the patch.
> 	

I just realized that ip_dev_find() is being redefined to xxx_ip_dev_find
for sles9sp3.  So maybe this function is causing the error.  Stay tuned.


Steve.


From rdreier at cisco.com  Tue Feb  6 11:54:13 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 06 Feb 2007 11:54:13 -0800
Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast
 support
In-Reply-To: <45C85B39.4080700@voltaire.com> (Or Gerlitz's message of
	"Tue, 06 Feb 2007 12:40:57 +0200")
References: <000701c741a6$16dc4760$ff0da8c0@amr.corp.intel.com>
	<45BF8E17.2010805@ichips.intel.com> <45C37BE9.5040105@ichips.intel.com>
	<45C85B39.4080700@voltaire.com>
Message-ID: <adad54nvz8a.fsf@cisco.com>

 > Can you comment on the multicast changes merge for 2.6.21 status?

Where are the final patches that you want to merge?

 - R.


From Tim.Snider at lsi.com  Tue Feb  6 11:48:34 2007
From: Tim.Snider at lsi.com (Snider, Tim)
Date: Tue, 6 Feb 2007 12:48:34 -0700
Subject: [openib-general] Run srp and ipoib on same port simultaneously?
Message-ID: <18A61515E49B764AB09447A336E51F5693EC31@NAMAIL2.ad.lsil.com>

No specific problems using the two, just asking; I've been looking at
other things recently.
I'm trying to get a single server to:
	1.	Connect Lustre servers using ipoib and 
	2.	recognize the IB storage using srp.
All the ibv_xx_ping_pong routines work between servers. Pings using
ipoib IP addresses also work. 
Lustre says ipoib is down, and srp doesn't see LUNs as it did yesterday. 
I'm trying to rule out pilot errors / configuration problems.
Thanks


-----Original Message-----
From: Roland Dreier [mailto:rdreier at cisco.com] 
Sent: Tuesday, February 06, 2007 1:33 PM
To: Snider, Tim
Cc: openib-general at openib.org
Subject: Re: [openib-general] Run srp and ipoib on same port
simultaneously?

 > Is there anything that prevents 2 ULPs - srp and ipoib - from running
 > simultaneously on the same port in OFED 1.1.1?

No, there isn't.  Are you seeing problems trying it?

 - R.


From rdreier at cisco.com  Tue Feb  6 11:59:25 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 06 Feb 2007 11:59:25 -0800
Subject: [openib-general] [PATCH] enable IPoIB only if broadcast join
	finish
In-Reply-To: <OF15ED804B.C1AB3DA2-ON87257279.0050CC75-88257279.00259EE9@us.ibm.com>
	(Shirley Ma's message of "Mon, 5 Feb 2007 07:50:55 -0700")
References: <OF15ED804B.C1AB3DA2-ON87257279.0050CC75-88257279.00259EE9@us.ibm.com>
Message-ID: <ada4ppzvyzm.fsf@cisco.com>

 > Here is the patch, if possible please give your input asap, we have an
 > urgent customer issue need to be resolved:

I guess this is OK, but what is the urgent issue it fixes?

 - R.


From rdreier at cisco.com  Tue Feb  6 11:58:38 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 06 Feb 2007 11:58:38 -0800
Subject: [openib-general] [libmthca] deadlock while trying to destroy QP
In-Reply-To: <45C75EA2.6000905@Voltaire.COM> (guyg@voltaire.com's
	message of "Mon, 05 Feb 2007 18:43:14 +0200")
References: <45C75EA2.6000905@Voltaire.COM>
Message-ID: <ada8xfbvz0x.fsf@cisco.com>

 > #0  0x0000003a6ce09172 in pthread_spin_lock () from /lib64/tls/libpthread.so.0
 > #1  0x0000002a959cf449 in mthca_cq_clean (cq=0x607240, qpn=3277830, srq=0x0) at src/cq.c:554
 > #2  0x0000002a959d28b9 in mthca_destroy_qp (qp=0x607400) at src/mthca.h:246
 > #3  0x000000000040117b in client_sig_handler ()
 > #4  <signal handler called>
 > #5  0x0000003a6ce09165 in pthread_spin_lock () from /lib64/tls/libpthread.so.0
 > #6  0x0000002a959cec91 in mthca_poll_cq (ibcq=0x607240, ne=1, wc=0x7fbffff590) at src/cq.c:467
 > #7  0x0000002a9557bf73 in ibv_poll_cq (cq=0x607240, num_entries=1, wc=0x7fbffff590) at /usr/local/ofed/include/infiniband/verbs.h:824

I guess my first reaction is "don't do that."  Trying to do something
as complex as destroying a QP from a signal handler seems very fragile
to me, and I wouldn't consider ibv_destroy_qp() safe to call from a
signal handler.

Can you just have your signal handler set a flag instead, and check
the flag from the normal flow of your program?
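
A minimal sketch of the flag approach, in plain C.  The names
stop_requested, on_signal, install_stop_handler and should_stop are
hypothetical; only sigaction() and the verbs calls mentioned in the
comments are real APIs.  The handler does nothing except store to a
volatile sig_atomic_t, so it never touches the CQ lock that
ibv_poll_cq() may be holding when the signal arrives.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Written by the handler, read by the normal program flow. */
static volatile sig_atomic_t stop_requested = 0;

/* Handler: record the request and return. No locks, no verbs calls. */
static void on_signal(int signo)
{
    (void)signo;
    stop_requested = 1;
}

/* Install the handler for SIGINT and SIGTERM.
 * Returns 0 on success, -1 if sigaction() fails. */
static int install_stop_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGINT, &sa, NULL) != 0)
        return -1;
    if (sigaction(SIGTERM, &sa, NULL) != 0)
        return -1;
    return 0;
}

/* Checked from the main loop, for example:
 *
 *     while (!should_stop()) {
 *         n = ibv_poll_cq(cq, 1, &wc);
 *         ...
 *     }
 *     ibv_destroy_qp(qp);    (outside the handler, after polling ends)
 */
static int should_stop(void)
{
    return stop_requested != 0;
}
```

With this layout the QP is destroyed only after the polling loop has
returned, so the two CQ lock acquisitions in the backtrace can no
longer nest on the same thread.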

 > Does destroy_qp needs to be dependent on the CQ?

Yes, it needs to lock the CQ to get rid of stale completions for the
QP being destroyed.

 - R.


From sean.hefty at intel.com  Tue Feb  6 12:00:22 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 6 Feb 2007 12:00:22 -0800
Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast
 support
In-Reply-To: <adad54nvz8a.fsf@cisco.com>
Message-ID: <000101c74a29$6f796610$e598070a@amr.corp.intel.com>

> > Can you comment on the multicast changes merge for 2.6.21 status?
>
>Where are the final patches that you want to merge?

Try the for-roland branch at git.openfabrics.org/~shefty/scm/rdma-dev.git.  If
this doesn't work, or you hit any snags, let me know, and I'll try to correct
any issues so that simple pulls work in the future.  Note that my tree is still
at rc6.

There should be 4 patches.

- Sean


From rdreier at cisco.com  Tue Feb  6 12:02:56 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 06 Feb 2007 12:02:56 -0800
Subject: [openib-general] [libmthca] deadlock while trying to destroy QP
In-Reply-To: <ada8xfbvz0x.fsf@cisco.com> (Roland Dreier's message of
	"Tue, 06 Feb 2007 11:58:38 -0800")
References: <45C75EA2.6000905@Voltaire.COM> <ada8xfbvz0x.fsf@cisco.com>
Message-ID: <adalkjbuk9b.fsf@cisco.com>

> I guess my first reaction is "don't do that."

eg look at http://www.gnu.org/software/libc/manual/html_node/Nonreentrancy.html


From xma at us.ibm.com  Tue Feb  6 12:16:59 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 6 Feb 2007 12:16:59 -0800
Subject: [openib-general] [PATCH] enable IPoIB only if broadcast join
	finish
In-Reply-To: <ada4ppzvyzm.fsf@cisco.com>
Message-ID: <OF8B1E1C32.955A161A-ON8725727A.006E31BB-8825727A.006F6B69@us.ibm.com>


      Thanks Roland, I will apply the patch to the customer's cluster.

      The problem I found: when failover brings the new IPoIB interface up
in the existing fabric, with a limited number of multicast join groups in
our configuration, the interface joins the broadcast group successfully, but
the all-hosts multicast group join fails. The ib interface then comes UP, but
not RUNNING, and it doesn't work at all.
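
The UP-but-not-RUNNING state described above can be checked from user
space.  A minimal sketch, assuming Linux and the standard SIOCGIFFLAGS
ioctl; classify_flags, read_if_flags and the enum names are hypothetical
helpers, not part of IPoIB or OFED.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <net/if.h>      /* IFF_UP, IFF_RUNNING, struct ifreq */
#include <string.h>
#include <sys/ioctl.h>   /* ioctl(), SIOCGIFFLAGS on Linux */
#include <sys/socket.h>
#include <unistd.h>

enum link_state { LINK_DOWN = 0, LINK_UP_NOT_RUNNING = 1, LINK_UP_RUNNING = 2 };

/* IFF_UP is the administrative state; IFF_RUNNING is set only when the
 * driver reports the link as operational. */
static enum link_state classify_flags(unsigned int flags)
{
    if (!(flags & IFF_UP))
        return LINK_DOWN;
    return (flags & IFF_RUNNING) ? LINK_UP_RUNNING : LINK_UP_NOT_RUNNING;
}

/* Read the flags of a named interface. Returns 0 on success, -1 on error
 * (no such interface, or the socket could not be created). */
static int read_if_flags(const char *ifname, unsigned int *flags)
{
    struct ifreq ifr;
    int fd, rc;

    assert(ifname != NULL && flags != NULL);
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
    rc = ioctl(fd, SIOCGIFFLAGS, &ifr);
    close(fd);
    if (rc < 0)
        return -1;
    *flags = (unsigned short)ifr.ifr_flags;
    return 0;
}
```

Calling read_if_flags("ib0", &f) and then classify_flags(f) on the
affected node (the interface name is only an example) would be expected
to give LINK_UP_NOT_RUNNING while the problem is present.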

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638


Roland Dreier <rdreier at cisco.com>
02/06/2007 11:59 AM

To: Shirley Ma/Beaverton/IBM at IBMUS
cc: "Michael S. Tsirkin" <mst at mellanox.co.il>, openib-general at openib.org
Subject: Re: [PATCH] enable IPoIB only if broadcast join finish

 > Here is the patch, if possible please give your input asap, we have an
 > urgent customer issue need to be resolved:

I guess this is OK, but what is the urgent issue it fixes?

 - R.

From swise at opengridcomputing.com  Tue Feb  6 12:24:43 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 14:24:43 -0600
Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -
 IWCM workaround for ip_dev_find() bug.
In-Reply-To: <1170790923.19662.95.camel@stevo-desktop>
References: <1170782906.19662.61.camel@stevo-desktop>
	<45C8BCC9.4070003@ichips.intel.com>
	<1170784924.19662.79.camel@stevo-desktop>
	<45C8CA7C.3080705@ichips.intel.com>
	<20070206192231.GG24372@mellanox.co.il>
	<1170790923.19662.95.camel@stevo-desktop>
Message-ID: <1170793483.19662.112.camel@stevo-desktop>

> > > 
> > > Can we just backport our own version of ip_dev_find()?  We had this once before 
> > > in svn when they removed it from being exported from the kernel.
> > 
> > Yes, this is in kernel_addons for 2.6.19 or something like that.
> > Just copy from there, much cleaner than the patch.
> > 	
> 
> I just realized that ip_dev_find() is being redefined to xxx_ip_dev_find
> for sles9sp3.  So maybe this function is causing the error.  Stay tuned.

xxx_ip_dev_find() is returning the wrong interface (sometimes).  I added
printks to xxx_ip_dev_find().  Then I ran rping -s -a <local ip addr>
and it failed because xxx_ip_dev_find() returned loopback instead of my
eth device.  

Here is the function with printks:

static inline struct net_device *xxx_ip_dev_find(u32 addr)
{
        struct net_device *dev;
        u32 ip;

        read_lock(&dev_base_lock);
        printk("%s looking for dev with addr %x\n", __FUNCTION__, addr);
        for (dev = dev_base; dev; dev = dev->next) {
                ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
                printk("%s dev %p name %s ipaddr %x\n", __FUNCTION__,
                        dev, dev->name, ip);
                if (ip == addr) {
                        dev_hold(dev);
                        break;
                }
        }
        read_unlock(&dev_base_lock);

        return dev;
}


Here is the printk log showing loopback being returned:

xxx_ip_dev_find looking for dev with addr 8846a8c0
xxx_ip_dev_find dev ffffffff804000e0 name lo ipaddr 8846a8c0

The address bound to eth3 is 192.168.70.136 (0xc0a84688).  For some
reason, this line:

                ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);

Returns the 192.168.70.136 address for device->name == "lo".

Riddle me that!

Also, sometimes it works ok because the loopback interface gets some
other ip address that is assigned to the local system as opposed to my
rdma address.  For example, I booted up the sles9sp3 system with a
rebuilt kernel and no ofed modules installed.  The system gets
10.10.0.136 via DHCP for its "public" interface.  I then built the ofed
modules and installed them.  I then loaded them and configured my rnic
interface with 192.168.70.136.  I ran rping and bound to the local
ipaddr and it worked.  The log showed that inet_select_addr() returned
10.10.0.136 for loopback and thus xxx_ip_dev_find() continued walking
the list and found the correct ethernet interface.  I then rebooted and
ran the test again and it failed.  So somehow module load order affects
this, I think.
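
One possible explanation, offered as an assumption and not verified
against the SLES9 SP3 source: in mainline 2.6 kernels of this period,
inet_select_addr() falls back to scanning every device when the device
it was given has no address within the requested scope.  127.0.0.1 on lo
has host scope, which is narrower than RT_SCOPE_LINK, so for lo the call
would return the first suitable address of some other device, and which
one comes first depends on device registration order.  The user-space
model below only illustrates that behaviour; struct model_dev and all
function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same numeric values as the kernel's RT_SCOPE_* constants. */
enum { SCOPE_UNIVERSE = 0, SCOPE_LINK = 253, SCOPE_HOST = 254 };

/* Simplified device: a single primary address, host byte order. */
struct model_dev {
    const char *name;
    uint32_t addr;
    int scope;
};

/* Model of a selector with a fallback: prefer the device's own address
 * if its scope fits, otherwise take the first fitting address found on
 * any device in list order. */
static uint32_t model_select_addr(const struct model_dev *devs, size_t n,
                                  size_t idx, int scope)
{
    size_t i;

    assert(idx < n);
    if (devs[idx].scope <= scope)
        return devs[idx].addr;
    for (i = 0; i < n; i++)
        if (devs[i].scope != SCOPE_LINK && devs[i].scope <= scope)
            return devs[i].addr;
    return 0;
}

/* Lookup in the style of xxx_ip_dev_find(): compare the selected
 * address. Returns the index of the first match, or -1. */
static int find_by_selected_addr(const struct model_dev *devs, size_t n,
                                 uint32_t addr)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (model_select_addr(devs, n, i, SCOPE_LINK) == addr)
            return (int)i;
    return -1;
}

/* Lookup that only considers addresses configured on the device itself. */
static int find_by_own_addr(const struct model_dev *devs, size_t n,
                            uint32_t addr)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (devs[i].addr == addr)
            return (int)i;
    return -1;
}
```

In this model, a list ordered lo, eth3, eth0 makes the selector-based
lookup stop at lo, while lo, eth0, eth3 lets it reach eth3.  That would
match both the failing and the working runs described above, if the
assumption about the fallback holds.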

grrrr.


Steve.


From mst at mellanox.co.il  Tue Feb  6 12:32:54 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 6 Feb 2007 22:32:54 +0200
Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport
 -IWCM workaround for ip_dev_find() bug.
In-Reply-To: <1170793483.19662.112.camel@stevo-desktop>
References: <1170793483.19662.112.camel@stevo-desktop>
Message-ID: <20070206203253.GL24372@mellanox.co.il>

> Quoting Steve Wise <swise at opengridcomputing.com>:
> Subject: Re: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -IWCM workaround for ip_dev_find() bug.
> 
> > > > 
> > > > Can we just backport our own version of ip_dev_find()?  We had this once before 
> > > > in svn when they removed it from being exported from the kernel.
> > > 
> > > Yes, this is in kernel_addons for 2.6.19 or something like that.
> > > Just copy from there, much cleaner than the patch.
> > > 	
> > 
> > I just realized that ip_dev_find() is being redefined to xxx_ip_dev_find
> > for sles9sp3.  So maybe this function is causing the error.  Stay tuned.
> 
> xxx_ip_dev_find() is returning the wrong interface (sometimes).  I added
> printks to xxx_ip_dev_find().  Then I ran rping -s -a <local ip addr>
> and it failed because xxx_ip_dev_find() returned loopback instead of my
> eth device.  
> 
> Here is the function with printks:
> 
> static inline struct net_device *xxx_ip_dev_find(u32 addr)
> {
>         struct net_device *dev;
>         u32 ip;
> 
>         read_lock(&dev_base_lock);
>         printk("%s looking for dev with addr %x\n", __FUNCTION__, addr);
>         for (dev = dev_base; dev; dev = dev->next) {
>                 ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
>                 printk("%s dev %p name %s ipaddr %x\n", __FUNCTION__,
>                         dev, dev->name, ip);
>                 if (ip == addr) {
>                         dev_hold(dev);
>                         break;
>                 }
>         }
>         read_unlock(&dev_base_lock);
> 
>         return dev;
> }
> 
> 
> Here is the printk log showing loopback being returned:
> 
> xxx_ip_dev_find looking for dev with addr 8846a8c0
> xxx_ip_dev_find dev ffffffff804000e0 name lo ipaddr 8846a8c0
> 
> The address bound to eth3 is 192.168.70.136 (0xc0a84688).  For some
> reason, this line:
> 
>                 ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
> 
> Returns the 192.168.70.136 address for device->name == "lo".
> 
> Riddle me that!
> 
> Also, sometimes it works ok because the loopback interface gets some
> other ip address that is assigned to the local system as opposed to my
> rdma address.  For example, I booted up the sles9sp3 system with a
> rebuilt kernel and no ofed modules installed.  The system gets
> 10.10.0.136 via DHCP for its "public" interface.  I then built the ofed
> modules and installed them.  I then loaded them and configured my rnic
> interface with 192.168.70.136.  I ran rping and bound to the local
> ipaddr and it worked.  The log showed that inet_select_addr() returned
> 10.10.0.136 for loopback and thus xxx_ip_dev_find() continued walking
> the list and found the correct ethernet interface.  I then rebooted and
> ran the test again and it failed.  So somehow module load order affects
> this, I think.
> 
> grrrr.


Try copying inet_select_addr source in from some upstream kernel,
look at that.

-- 
MST


From michael.arndt at informatik.tu-chemnitz.de  Tue Feb  6 12:58:29 2007
From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt)
Date: Tue, 6 Feb 2007 21:58:29 +0100
Subject: [openib-general] Unknown SMP Recv
References: <000901c74938$e10b2a30$21606d86@one7>
	<1170689654.4525.201415.camel@hal.voltaire.com>
	<001401c74946$a664a2e0$21606d86@one7>
	<1170695591.4525.207604.camel@hal.voltaire.com>
Message-ID: <000601c74a31$8e279480$21606d86@one7>

Hi,

> Guess you don't mean IB router when you say router in your description.
yes

> Is the sender a normal node ? Is normal node mean standard OpenIB
> without changes ? How was the SMI changed ? On which nodes ? Only the
> intermediate one ?

Yes, the sender is a normal node without any changes. Yes, the SMI is changed 
only on the intermediate ones.

> Aside from the initial path being [0][1][1], what are the hop count and
> hop pointer ? What are DrDLID and DrSLID as well as the LIDs in the LRH
> of the SMP ?

node 1 -> node 2 -> node 3 (router on node 2)

The original packet has the initial path [0][1][1], return path [0][2][2], 
hop count and hop pointer are 2 (SubnGetResp), the Dr_DLID and DrSLID are 
permissive. The LIDs in the LRH are both 0.


Thanks Michael


From michael.arndt at informatik.tu-chemnitz.de  Tue Feb  6 13:14:17 2007
From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt)
Date: Tue, 6 Feb 2007 22:14:17 +0100
Subject: [openib-general] Unknown SMP Recv
References: <000901c74938$e10b2a30$21606d86@one7>
	<1170689654.4525.201415.camel@hal.voltaire.com>
	<001401c74946$a664a2e0$21606d86@one7>
	<1170695591.4525.207604.camel@hal.voltaire.com>
Message-ID: <002001c74a33$c2ec1db0$21606d86@one7>

Sorry,

there was a little mistake.

The original packet has the initial path [0][1][1], return path [0][2][2],
hop count and hop pointer are 2 (SubnGetResp), the Dr_DLID and DrSLID are
permissive.

The packet I am asking about has the initial path [0][0][0], return path 
[0][0][0],
hop count and hop pointer are 2 (SubnGetResp), the Dr_DLID and DrSLID are 0. 
And the LIDs in the LRH are 0. The rest of the SMP header is the same as it is 
in the original header.

Michael Arndt


From swise at opengridcomputing.com  Tue Feb  6 13:17:43 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 15:17:43 -0600
Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport
 -IWCM workaround for ip_dev_find() bug.
In-Reply-To: <20070206203253.GL24372@mellanox.co.il>
References: <1170793483.19662.112.camel@stevo-desktop>
	<20070206203253.GL24372@mellanox.co.il>
Message-ID: <1170796663.19662.117.camel@stevo-desktop>


> Try copying inet_select_addr source in from some upstream kernel,
> look at that.
> 

It appears that xxx_ip_dev_find() should be calling inet_select_addr
with RT_SCOPE_HOST and not RT_SCOPE_LINK.  Everything works fine for me
if I change xxx_ip_dev_find() to use RT_SCOPE_HOST.  


From the header file linux/rtnetlink.h.  Note the comment on HOST vs
LINK:


/* rtm_scope

   Really it is not scope, but sort of distance to the destination.
   NOWHERE are reserved for not existing destinations, HOST is our
   local addresses, LINK are destinations, located on directly attached
   link and UNIVERSE is everywhere in the Universe.

   Intermediate values are also possible f.e. interior routes
   could be assigned a value between UNIVERSE and LINK.
*/

enum rt_scope_t
{
        RT_SCOPE_UNIVERSE=0,
/* User defined values  */
        RT_SCOPE_SITE=200,
        RT_SCOPE_LINK=253,
        RT_SCOPE_HOST=254,
        RT_SCOPE_NOWHERE=255
};
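
The difference between the two scopes can be sketched with a small
user-space model. This is only an illustration under stated assumptions:
the struct and the model_* names are hypothetical, each device is given a
single primary address, and the fallback walk over all devices follows a
reading of the upstream inet_select_addr() source, which may not match the
sles9sp3 kernel exactly.

```c
#include <stddef.h>
#include <stdint.h>

/* Scope values copied from the rt_scope_t enum quoted above. */
enum { MODEL_SCOPE_UNIVERSE = 0, MODEL_SCOPE_LINK = 253, MODEL_SCOPE_HOST = 254 };

/* Hypothetical stand-in for a net_device with one primary address. */
struct model_dev {
	const char *name;
	uint32_t    addr;        /* host byte order, for readability */
	int         addr_scope;  /* scope attached to that address */
};

/*
 * Simplified model of inet_select_addr(dev, 0, scope):
 *  1. use the device's own address unless its scope value exceeds the
 *     requested one;
 *  2. otherwise fall back to the first qualifying address on ANY device.
 */
static uint32_t model_select_addr(const struct model_dev *devs, size_t n,
				  size_t idx, int scope)
{
	size_t i;

	if (devs[idx].addr_scope <= scope)
		return devs[idx].addr;

	for (i = 0; i < n; i++)
		if (devs[i].addr_scope != MODEL_SCOPE_LINK &&
		    devs[i].addr_scope <= scope)
			return devs[i].addr;
	return 0;
}

/* Model of the xxx_ip_dev_find() loop: first device whose pick matches. */
static const char *model_dev_find(const struct model_dev *devs, size_t n,
				  uint32_t addr, int scope)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (model_select_addr(devs, n, i, scope) == addr)
			return devs[i].name;
	return NULL;
}

/* lo holds 127.0.0.1 at host scope; eth3 holds 192.168.70.136. */
static const struct model_dev model_devs[] = {
	{ "lo",   0x7f000001u, MODEL_SCOPE_HOST },
	{ "eth3", 0xc0a84688u, MODEL_SCOPE_UNIVERSE },
};
```

In this model, looking up 0xc0a84688 with MODEL_SCOPE_LINK skips the
host-scoped 127.0.0.1 on lo, falls back to the first other address in the
list, and so reports "lo"; with MODEL_SCOPE_HOST the loopback address is
accepted for lo and the walk continues to eth3. If another interface (for
example the DHCP one) precedes eth3 in the list, the LINK lookup on lo
returns that address instead, which would fit the "sometimes it works"
observation.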


From swise at opengridcomputing.com  Tue Feb  6 13:34:03 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 15:34:03 -0600
Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport
 -IWCM workaround for ip_dev_find() bug.
In-Reply-To: <1170796663.19662.117.camel@stevo-desktop>
References: <1170793483.19662.112.camel@stevo-desktop>
	<20070206203253.GL24372@mellanox.co.il>
	<1170796663.19662.117.camel@stevo-desktop>
Message-ID: <1170797643.19662.120.camel@stevo-desktop>

How shall I fix this?  

I think the correct scope is RT_SCOPE_HOST.

Anyone know why RT_SCOPE_LINK was chosen?


On Tue, 2007-02-06 at 15:17 -0600, Steve Wise wrote:
> > Try copying inet_select_addr source in from some upstream kernel,
> > look at that.
> > 
> 
> It appears that xxx_ip_find_dev() should be calling inet_select_addr
> with RT_SCOPE_HOST and not RT_SCOPE_LINK.  Everything works fine for me
> if I change xxx_ip_find_dev() to use RT_SCOPE_HOST.  
> 
> 
> >From the header file linux/rtnetlink.h.  Note the comment on HOST vs
> LINK:
> 
> 
> /* rtm_scope
> 
>    Really it is not scope, but sort of distance to the destination.
>    NOWHERE are reserved for not existing destinations, HOST is our
>    local addresses, LINK are destinations, located on directly attached
>    link and UNIVERSE is everywhere in the Universe.
> 
>    Intermediate values are also possible f.e. interior routes
>    could be assigned a value between UNIVERSE and LINK.
> */
> 
> enum rt_scope_t
> {
>         RT_SCOPE_UNIVERSE=0,
> /* User defined values  */
>         RT_SCOPE_SITE=200,
>         RT_SCOPE_LINK=253,
>         RT_SCOPE_HOST=254,
>         RT_SCOPE_NOWHERE=255
> };
> 
> 
> 


From mst at mellanox.co.il  Tue Feb  6 13:36:56 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 6 Feb 2007 23:36:56 +0200
Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport
 -IWCM workaround for ip_dev_find() bug.
In-Reply-To: <1170797643.19662.120.camel@stevo-desktop>
References: <1170793483.19662.112.camel@stevo-desktop>
	<20070206203253.GL24372@mellanox.co.il>
	<1170796663.19662.117.camel@stevo-desktop>
	<1170797643.19662.120.camel@stevo-desktop>
Message-ID: <20070206213656.GN24372@mellanox.co.il>

> How shall I fix this?  

Patch?

-- 
MST


From halr at voltaire.com  Tue Feb  6 13:46:12 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 06 Feb 2007 16:46:12 -0500
Subject: [openib-general] Unknown SMP Recv
In-Reply-To: <002001c74a33$c2ec1db0$21606d86@one7>
References: <000901c74938$e10b2a30$21606d86@one7>
	<1170689654.4525.201415.camel@hal.voltaire.com>
	<001401c74946$a664a2e0$21606d86@one7>
	<1170695591.4525.207604.camel@hal.voltaire.com>
	<002001c74a33$c2ec1db0$21606d86@one7>
Message-ID: <1170798366.4525.314959.camel@hal.voltaire.com>

On Tue, 2007-02-06 at 16:14, Michael Arndt wrote:
> Sorry,
> 
> there was a little mistake.
> 
> The orginal packet has the initial path [0][1][1], return path [0][2][2],
> hop count and hop pointer are 2 (SubnGetResp), the Dr_DLID and DrSLID are
> permissive.

Is this the response ? If so, what's the status ? What is the attribute
ID ?

> The packet I asking for

Is this the outgoing packet ?

>  has the  initial path [0][0][0], return path 
> [0][0][0],
> hop count and hop pointer are 2 (SubnGetResp),

Should this be SubnGet rather than SubnGetResp ?

-- Hal

>  the Dr_DLID and DrSLID are 0. 
> And the LIDs in LRH are 0. The rest of the smp header is the same as it is 
> in the original header.

> Micheal Arndt
> 
> 


From swise at opengridcomputing.com  Tue Feb  6 14:02:00 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 16:02:00 -0600
Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport
 -IWCM workaround for ip_dev_find() bug.
In-Reply-To: <20070206213656.GN24372@mellanox.co.il>
References: <1170793483.19662.112.camel@stevo-desktop>
	<20070206203253.GL24372@mellanox.co.il>
	<1170796663.19662.117.camel@stevo-desktop>
	<1170797643.19662.120.camel@stevo-desktop>
	<20070206213656.GN24372@mellanox.co.il>
Message-ID: <1170799320.19662.124.camel@stevo-desktop>

On Tue, 2007-02-06 at 23:36 +0200, Michael S. Tsirkin wrote:
> > How shall I fix this?  
> 
> Patch?
> 

Riiight.  I'm afraid if I use HOST instead of LINK that I'll break some
strange SDP loopback feature or some such thing.  And I'm not in a
position to test that.

But I can post a patch.  Shall I just change sles9sp3 since we don't see
(yet) any problems with the other distros?


From mst at mellanox.co.il  Tue Feb  6 14:12:32 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 7 Feb 2007 00:12:32 +0200
Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport
 -IWCM workaroundfor ip_dev_find() bug.
In-Reply-To: <1170799320.19662.124.camel@stevo-desktop>
References: <1170799320.19662.124.camel@stevo-desktop>
Message-ID: <20070206221232.GO24372@mellanox.co.il>

> Quoting Steve Wise <swise at opengridcomputing.com>:
> Subject: Re: [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -IWCM workaroundfor ip_dev_find() bug.
> 
> On Tue, 2007-02-06 at 23:36 +0200, Michael S. Tsirkin wrote:
> > > How shall I fix this?  
> > 
> > Patch?
> > 
> 
> Riiight.  I'm afraid if I use HOST instead of LINK that I'll break some
> strange SDP loopback feature or some such thing.  And I'm not in a
> position to test that.
> 
> But I can post a patch.  Shall I just change sles9sp3 since we don't see
> (yet) any problems with the other distros?

If you post one that updates all kernels it will be easier to test.

-- 
MST


From michael.arndt at informatik.tu-chemnitz.de  Tue Feb  6 14:14:13 2007
From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt)
Date: Tue, 6 Feb 2007 23:14:13 +0100
Subject: [openib-general] Unknown SMP Recv
References: <000901c74938$e10b2a30$21606d86@one7>
	<1170689654.4525.201415.camel@hal.voltaire.com>
	<001401c74946$a664a2e0$21606d86@one7>
	<1170695591.4525.207604.camel@hal.voltaire.com>
	<002001c74a33$c2ec1db0$21606d86@one7>
	<1170798366.4525.314959.camel@hal.voltaire.com>
Message-ID: <000401c74a3c$2204c6a0$21606d86@one7>

> Is this the response ? If so, what's the status ? What is the attribute
> ID ?
Yes, it's a response. The attribute is NodeInfo or PortInfo or whatever; the 
attribute ID didn't change from the original packet (first receive). The 
status is 0 and the D-bit is set, because it is a response.

>> The packet I asking for
>
> Is this the outgoing packet ?

No, it is a receive. I send one SubnGet and receive two SubnGetResp (one is OK 
and one is as I described).

>Should this be SubnGet rather than SubnGetResp ?

I don't know. The scenario is (node1, sender) -> (node2, router, which 
receives the two SubnGetResp) -> (node3, responder). The effect appears only 
on the way back.

Thanks Michael


From swise at opengridcomputing.com  Tue Feb  6 14:15:43 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 16:15:43 -0600
Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport
 -IWCM workaroundfor ip_dev_find() bug.
In-Reply-To: <20070206221232.GO24372@mellanox.co.il>
References: <1170799320.19662.124.camel@stevo-desktop>
	<20070206221232.GO24372@mellanox.co.il>
Message-ID: <1170800143.19662.125.camel@stevo-desktop>

On Wed, 2007-02-07 at 00:12 +0200, Michael S. Tsirkin wrote:
> > Quoting Steve Wise <swise at opengridcomputing.com>:
> > Subject: Re: [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -IWCM workaroundfor ip_dev_find() bug.
> > 
> > On Tue, 2007-02-06 at 23:36 +0200, Michael S. Tsirkin wrote:
> > > > How shall I fix this?  
> > > 
> > > Patch?
> > > 
> > 
> > Riiight.  I'm afraid if I use HOST instead of LINK that I'll break some
> > strange SDP loopback feature or some such thing.  And I'm not in a
> > position to test that.
> > 
> > But I can post a patch.  Shall I just change sles9sp3 since we don't see
> > (yet) any problems with the other distros?
> 
> If you post one that updates all kernels it will be easier to test.
> 

I'm ok with this.  Stay tuned.


Steve.


From rdreier at cisco.com  Tue Feb  6 14:32:40 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 06 Feb 2007 14:32:40 -0800
Subject: [openib-general] [PATCHv6 RFC] IPoIB CM Experimental support
In-Reply-To: <20070205201223.GD16598@mellanox.co.il> (Michael S.
	Tsirkin's message of "Mon, 5 Feb 2007 22:12:23 +0200")
References: <20070205201223.GD16598@mellanox.co.il>
Message-ID: <adaveiesyrb.fsf@cisco.com>

Looks pretty good, but one thing worries me:

Overall looks great, I'll merge it up.  A few quick questions:

 > +#ifdef CONFIG_IPV6

I think this really needs to be

#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)

but I'm not clear on what happens if IPoIB is built-in and IPv6 is
built as a module, since then icmpv6_send() isn't available until the
ipv6 module is loaded.  It seems ip_gre.c has the same problem, so
I'll ask on netdev about this.
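
A small stand-alone sketch of why the plain #ifdef is not enough: when a
tristate option is built as a module, kbuild defines CONFIG_IPV6_MODULE
rather than CONFIG_IPV6. The MODEL_* macros below are hypothetical and
exist only to make the two tests observable; the #define at the top
simulates an IPv6=m configuration.

```c
/* Simulate a kernel configured with IPv6 built as a module (=m). */
#define CONFIG_IPV6_MODULE 1

/* The test as written in the patch: only sees a built-in IPv6. */
#ifdef CONFIG_IPV6
#define MODEL_IPV6_SEEN_BY_IFDEF 1
#else
#define MODEL_IPV6_SEEN_BY_IFDEF 0
#endif

/* The suggested test: sees IPv6 whether built-in or modular. */
#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
#define MODEL_IPV6_SEEN_BY_EITHER 1
#else
#define MODEL_IPV6_SEEN_BY_EITHER 0
#endif
```

With this configuration the first macro evaluates to 0 and the second to
1. The sketch says nothing about the link-time question of a built-in
IPoIB calling into a modular ipv6; that remains open here.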

Also a few other minor things:

 > +#ifdef CONFIG_INFINIBAND_IPOIB_CM
 > +struct ib_cm_id;

this #ifdef in ipoib.h is just guarding declarations; we might as well
declare everything even if it's not used.

 > +	rep.starting_psn = 0 /* FIXME */;

any reason not to just do:

	rep.starting_psn = random32() & 0xffffff;

?
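
For reference, the mask keeps the value within the 24 bits that a PSN
occupies. A minimal user-space sketch, where model_starting_psn() is a
hypothetical helper and the random value is passed in by the caller
instead of coming from the kernel's random32():

```c
#include <stdint.h>

/* Keep only the low 24 bits, the width of a packet sequence number. */
static uint32_t model_starting_psn(uint32_t rnd)
{
	return rnd & 0xffffff;
}
```

For example, an input of 0xdeadbeef yields 0xadbeef.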

 > +	req.srq 	              = 15;

This just should be 1, right?


 - R.


From swise at opengridcomputing.com  Tue Feb  6 15:39:13 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 06 Feb 2007 17:39:13 -0600
Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport
 -IWCM workaroundfor ip_dev_find() bug.
In-Reply-To: <20070206221232.GO24372@mellanox.co.il>
References: <1170799320.19662.124.camel@stevo-desktop>
	<20070206221232.GO24372@mellanox.co.il>
Message-ID: <1170805153.19662.155.camel@stevo-desktop>

Here it is (only tested with rping over iWARP on sles9sp3):

----------------


xxx_ip_dev_find() must use scope HOST.

From: Steve Wise <swise at opengridcomputing.com>

xxx_ip_dev_find() calls inet_select_addr() with RT_SCOPE_LINK and so returns
the wrong interface on some kernels.  The correct scope is RT_SCOPE_HOST.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 .../backport/2.6.11/include/linux/inetdevice.h     |    2 +-
 .../backport/2.6.11_FC4/include/linux/inetdevice.h |    2 +-
 .../backport/2.6.12/include/linux/inetdevice.h     |    2 +-
 .../backport/2.6.13/include/linux/inetdevice.h     |    2 +-
 .../2.6.13_suse10_0_u/include/linux/inetdevice.h   |    2 +-
 .../backport/2.6.14/include/linux/inetdevice.h     |    2 +-
 .../backport/2.6.15/include/linux/inetdevice.h     |    2 +-
 .../2.6.15_ubuntu606/include/linux/inetdevice.h    |    2 +-
 .../backport/2.6.16/include/linux/inetdevice.h     |    2 +-
 .../backport/2.6.17/include/linux/inetdevice.h     |    2 +-
 .../2.6.5_sles9_sp3/include/linux/inetdevice.h     |    2 +-
 .../backport/2.6.9_U2/include/linux/inetdevice.h   |    2 +-
 .../backport/2.6.9_U3/include/linux/inetdevice.h   |    2 +-
 .../backport/2.6.9_U4/include/linux/inetdevice.h   |    2 +-
 14 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/kernel_addons/backport/2.6.11/include/linux/inetdevice.h b/kernel_addons/backport/2.6.11/include/linux/inetdevice.h
index 7244487..2d3c50f 100644
--- a/kernel_addons/backport/2.6.11/include/linux/inetdevice.h
+++ b/kernel_addons/backport/2.6.11/include/linux/inetdevice.h
@@ -13,7 +13,7 @@ static inline struct net_device *xxx_ip_
 
 	read_lock(&dev_base_lock);
 	for (dev = dev_base; dev; dev = dev->next) {
-		ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
+		ip = inet_select_addr(dev, 0, RT_SCOPE_HOST);
 		if (ip == addr) {
 			dev_hold(dev);
 			break;
diff --git a/kernel_addons/backport/2.6.11_FC4/include/linux/inetdevice.h b/kernel_addons/backport/2.6.11_FC4/include/linux/inetdevice.h
index 7244487..2d3c50f 100644
--- a/kernel_addons/backport/2.6.11_FC4/include/linux/inetdevice.h
+++ b/kernel_addons/backport/2.6.11_FC4/include/linux/inetdevice.h
@@ -13,7 +13,7 @@ static inline struct net_device *xxx_ip_
 
 	read_lock(&dev_base_lock);
 	for (dev = dev_base; dev; dev = dev->next) {
-		ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
+		ip = inet_select_addr(dev, 0, RT_SCOPE_HOST);
 		if (ip == addr) {
 			dev_hold(dev);
 			break;
diff --git a/kernel_addons/backport/2.6.12/include/linux/inetdevice.h b/kernel_addons/backport/2.6.12/include/linux/inetdevice.h
index 7244487..2d3c50f 100644
--- a/kernel_addons/backport/2.6.12/include/linux/inetdevice.h
+++ b/kernel_addons/backport/2.6.12/include/linux/inetdevice.h
@@ -13,7 +13,7 @@ static inline struct net_device *xxx_ip_
 
 	read_lock(&dev_base_lock);
 	for (dev = dev_base; dev; dev = dev->next) {
-		ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
+		ip = inet_select_addr(dev, 0, RT_SCOPE_HOST);
 		if (ip == addr) {
 			dev_hold(dev);
 			break;
diff --git a/kernel_addons/backport/2.6.13/include/linux/inetdevice.h b/kernel_addons/backport/2.6.13/include/linux/inetdevice.h
index 7a32313..fd0aa36 100644
--- a/kernel_addons/backport/2.6.13/include/linux/inetdevice.h
+++ b/kernel_addons/backport/2.6.13/include/linux/inetdevice.h
@@ -11,7 +11,7 @@ static inline struct net_device *xxx_ip_
 
 	read_lock(&dev_base_lock);
 	for (dev = dev_base; dev; dev = dev->next) {
-		ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
+		ip = inet_select_addr(dev, 0, RT_SCOPE_HOST);
 		if (ip == addr) {
 			dev_hold(dev);
 			break;
diff --git a/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/inetdevice.h b/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/inetdevice.h
index 7a32313..fd0aa36 100644
--- a/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/inetdevice.h
+++ b/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/inetdevice.h
@@ -11,7 +11,7 @@ static inline struct net_device *xxx_ip_
 
 	read_lock(&dev_base_lock);
 	for (dev = dev_base; dev; dev = dev->next) {
-		ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
+		ip = inet_select_addr(dev, 0, RT_SCOPE_HOST);
 		if (ip == addr) {
 			dev_hold(dev);
 			break;
diff --git a/kernel_addons/backport/2.6.14/include/linux/inetdevice.h b/kernel_addons/backport/2.6.14/include/linux/inetdevice.h
index 7a32313..fd0aa36 100644
--- a/kernel_addons/backport/2.6.14/include/linux/inetdevice.h
+++ b/kernel_addons/backport/2.6.14/include/linux/inetdevice.h
@@ -11,7 +11,7 @@ static inline struct net_device *xxx_ip_
 
 	read_lock(&dev_base_lock);
 	for (dev = dev_base; dev; dev = dev->next) {
-		ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
+		ip = inet_select_addr(dev, 0, RT_SCOPE_HOST);
 		if (ip == addr) {
 			dev_hold(dev);
 			break;
diff --git a/kernel_addons/backport/2.6.15/include/linux/inetdevice.h b/kernel_addons/backport/2.6.15/include/linux/inetdevice.h
index 7a32313..fd0aa36 100644
--- a/kernel_addons/backport/2.6.15/include/linux/inetdevice.h
+++ b/kernel_addons/backport/2.6.15/include/linux/inetdevice.h
@@ -11,7 +11,7 @@ static inline struct net_device *xxx_ip_
 
 	read_lock(&dev_base_lock);
 	for (dev = dev_base; dev; dev = dev->next) {
-		ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
+		ip = inet_select_addr(dev, 0, RT_SCOPE_HOST);
 		if (ip == addr) {
 			dev_hold(dev);
 			break;
diff --git a/kernel_addons/backport/2.6.15_ubuntu606/include/linux/inetdevice.h b/kernel_addons/backport/2.6.15_ubuntu606/include/linux/inetdevice.h
index 7a32313..fd0aa36 100644
--- a/kernel_addons/backport/2.6.15_ubuntu606/include/linux/inetdevice.h
+++ b/kernel_addons/backport/2.6.15_ubuntu606/include/linux/inetdevice.h
@@ -11,7 +11,7 @@ static inline struct net_device *xxx_ip_
 
 	read_lock(&dev_base_lock);
 	for (dev = dev_base; dev; dev = dev->next) {
-		ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
+		ip = inet_select_addr(dev, 0, RT_SCOPE_HOST);
 		if (ip == addr) {
 			dev_hold(dev);
 			break;
diff --git a/kernel_addons/backport/2.6.16/include/linux/inetdevice.h b/kernel_addons/backport/2.6.16/include/linux/inetdevice.h
index 7a32313..fd0aa36 100644
--- a/kernel_addons/backport/2.6.16/include/linux/inetdevice.h
+++ b/kernel_addons/backport/2.6.16/include/linux/inetdevice.h
@@ -11,7 +11,7 @@ static inline struct net_device *xxx_ip_
 
 	read_lock(&dev_base_lock);
 	for (dev = dev_base; dev; dev = dev->next) {
-		ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
+		ip = inet_select_addr(dev, 0, RT_SCOPE_HOST);
 		if (ip == addr) {
 			dev_hold(dev);
 			break;
diff --git a/kernel_addons/backport/2.6.17/include/linux/inetdevice.h b/kernel_addons/backport/2.6.17/include/linux/inetdevice.h
index 7a32313..fd0aa36 100644
--- a/kernel_addons/backport/2.6.17/include/linux/inetdevice.h
+++ b/kernel_addons/backport/2.6.17/include/linux/inetdevice.h
@@ -11,7 +11,7 @@ static inline struct net_device *xxx_ip_
 
 	read_lock(&dev_base_lock);
 	for (dev = dev_base; dev; dev = dev->next) {
-		ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
+		ip = inet_select_addr(dev, 0, RT_SCOPE_HOST);
 		if (ip == addr) {
 			dev_hold(dev);
 			break;
diff --git a/kernel_addons/backport/2.6.5_sles9_sp3/include/linux/inetdevice.h b/kernel_addons/backport/2.6.5_sles9_sp3/include/linux/inetdevice.h
index 7244487..2d3c50f 100644
--- a/kernel_addons/backport/2.6.5_sles9_sp3/include/linux/inetdevice.h
+++ b/kernel_addons/backport/2.6.5_sles9_sp3/include/linux/inetdevice.h
@@ -13,7 +13,7 @@ static inline struct net_device *xxx_ip_
 
 	read_lock(&dev_base_lock);
 	for (dev = dev_base; dev; dev = dev->next) {
-		ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
+		ip = inet_select_addr(dev, 0, RT_SCOPE_HOST);
 		if (ip == addr) {
 			dev_hold(dev);
 			break;
diff --git a/kernel_addons/backport/2.6.9_U2/include/linux/inetdevice.h b/kernel_addons/backport/2.6.9_U2/include/linux/inetdevice.h
index 7244487..2d3c50f 100644
--- a/kernel_addons/backport/2.6.9_U2/include/linux/inetdevice.h
+++ b/kernel_addons/backport/2.6.9_U2/include/linux/inetdevice.h
@@ -13,7 +13,7 @@ static inline struct net_device *xxx_ip_
 
 	read_lock(&dev_base_lock);
 	for (dev = dev_base; dev; dev = dev->next) {
-		ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
+		ip = inet_select_addr(dev, 0, RT_SCOPE_HOST);
 		if (ip == addr) {
 			dev_hold(dev);
 			break;
diff --git a/kernel_addons/backport/2.6.9_U3/include/linux/inetdevice.h b/kernel_addons/backport/2.6.9_U3/include/linux/inetdevice.h
index 7244487..2d3c50f 100644
--- a/kernel_addons/backport/2.6.9_U3/include/linux/inetdevice.h
+++ b/kernel_addons/backport/2.6.9_U3/include/linux/inetdevice.h
@@ -13,7 +13,7 @@ static inline struct net_device *xxx_ip_
 
 	read_lock(&dev_base_lock);
 	for (dev = dev_base; dev; dev = dev->next) {
-		ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
+		ip = inet_select_addr(dev, 0, RT_SCOPE_HOST);
 		if (ip == addr) {
 			dev_hold(dev);
 			break;
diff --git a/kernel_addons/backport/2.6.9_U4/include/linux/inetdevice.h b/kernel_addons/backport/2.6.9_U4/include/linux/inetdevice.h
index 7244487..2d3c50f 100644
--- a/kernel_addons/backport/2.6.9_U4/include/linux/inetdevice.h
+++ b/kernel_addons/backport/2.6.9_U4/include/linux/inetdevice.h
@@ -13,7 +13,7 @@ static inline struct net_device *xxx_ip_
 
 	read_lock(&dev_base_lock);
 	for (dev = dev_base; dev; dev = dev->next) {
-		ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
+		ip = inet_select_addr(dev, 0, RT_SCOPE_HOST);
 		if (ip == addr) {
 			dev_hold(dev);
 			break;


From halr at voltaire.com  Tue Feb  6 16:19:51 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 06 Feb 2007 19:19:51 -0500
Subject: [openib-general] Unknown SMP Recv
In-Reply-To: <002001c74a33$c2ec1db0$21606d86@one7>
References: <000901c74938$e10b2a30$21606d86@one7>
	<1170689654.4525.201415.camel@hal.voltaire.com>
	<001401c74946$a664a2e0$21606d86@one7>
	<1170695591.4525.207604.camel@hal.voltaire.com>
	<002001c74a33$c2ec1db0$21606d86@one7>
Message-ID: <1170807564.4525.324195.camel@hal.voltaire.com>

On Tue, 2007-02-06 at 16:14, Michael Arndt wrote:
> Sorry,
> 
> there was a little mistake.

I think I understand what you are saying now. The below are the 2
responses you get.

> The orginal packet has the initial path [0][1][1], return path [0][2][2],
> hop count and hop pointer are 2 (SubnGetResp), the Dr_DLID and DrSLID are
> permissive.

This sounds like the good response and appears to traverse your 3 nodes.

> The packet I asking for has the  initial path [0][0][0], return path 
> [0][0][0],
> hop count and hop pointer are 2 (SubnGetResp), the Dr_DLID and DrSLID are 0. 
> And the LIDs in LRH are 0. The rest of the smp header is the same as it is 
> in the original header.

This is the bogus extra response. Since your sender node is unmodified,
it is unlikely an issue there. It seems like the intermediate node might
be responding and forwarding the packet on although it should only do
one of those two things. You did mention the SMI on the intermediate
node was modified, right ? Also, note that the SMI is not validated and
has some known issues for switches (e.g. intermediate hops).
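
One way to tell the two responses apart in debugging code is to check
whether the path arrays are populated for every hop the header claims.
The following is a rough sketch only: the struct is a hypothetical subset
of a directed-route SMP header, the permissive-LID test assumes a purely
directed-route exchange as described in this thread, and none of it is
taken from the actual SMI code.

```c
#include <stdint.h>

#define MODEL_MAX_PATH_HOPS  64
#define MODEL_LID_PERMISSIVE 0xFFFF

/* Hypothetical subset of a directed-route SMP header. */
struct model_dr_smp {
	uint8_t  hop_ptr;
	uint8_t  hop_cnt;
	uint16_t dr_slid;
	uint16_t dr_dlid;
	uint8_t  initial_path[MODEL_MAX_PATH_HOPS];
	uint8_t  return_path[MODEL_MAX_PATH_HOPS];
};

/*
 * Returns 1 if a multi-hop response carries a port number in both path
 * arrays for hops 1..hop_cnt and has permissive DR LIDs, 0 otherwise.
 */
static int model_resp_looks_routed(const struct model_dr_smp *smp)
{
	uint8_t i;

	if (smp->hop_cnt == 0 || smp->hop_cnt >= MODEL_MAX_PATH_HOPS)
		return 0;	/* not a multi-hop response */
	if (smp->dr_slid != MODEL_LID_PERMISSIVE ||
	    smp->dr_dlid != MODEL_LID_PERMISSIVE)
		return 0;
	for (i = 1; i <= smp->hop_cnt; i++)
		if (smp->initial_path[i] == 0 || smp->return_path[i] == 0)
			return 0;
	return 1;
}
```

Applied to the two packets described above, the first (paths [0][1][1]
and [0][2][2], permissive LIDs) passes the check, while the second (all
path entries and DR LIDs zero, yet hop count 2) does not.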

-- Hal

> Micheal Arndt
> 
> 


From krkumar2 at in.ibm.com  Tue Feb  6 22:56:50 2007
From: krkumar2 at in.ibm.com (Krishna Kumar)
Date: Wed, 07 Feb 2007 12:26:50 +0530
Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in cm_conn_req_handler()
Message-ID: <20070207065650.24166.6979.sendpatchset@localhost.localdomain>

(I had submitted this once earlier but got no response)

cm_conn_req_handler() :
	1. Calling destroy_cm_id leaks 3 work 'free' list entries.
	2. cm_id is freed up wrongly and not cm_id_priv (though the
	   effect is the same since cm_id is the first element of
	   cm_id_priv, but still a bug if the top level cm_id changes).
	3. Reject message has to be sent on failure. Tested this
	   without the fix and found the client hangs, waited for about
	   20 mins and then did Ctrl-C but the process is unkillable.
	4. Setting IWCM_F_CALLBACK_DESTROY on cm_id (child handle)
	   doesn't achieve anything, since checking for
	   IWCM_F_CALLBACK_DESTROY in the parent's flag (in
	   cm_work_handler) means that this will never be true.

All 4 above cases were tested by injecting random error in
iw_conn_req_handler() and running rdma_bw/krping, they were
confirmed. I added the BUG_ON() to confirm the earlier check
for id_priv->refcount==0 should always be true (and could be
removed).
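
Point 2 can be illustrated outside the kernel. In the sketch below all
model_* names are hypothetical stand-ins, not the real iwcm structures:
freeing through the embedded id pointer only matches freeing the private
structure while the id is the first member, and a container_of-style
lookup is what stays correct if the layout changes.

```c
#include <stddef.h>

struct model_cm_id { int device; };

/* Layout described above: the public id is the first member. */
struct model_priv_first {
	struct model_cm_id id;
	int refcount;
};

/* Same members, but the id is no longer first. */
struct model_priv_moved {
	int refcount;
	struct model_cm_id id;
};

/* User-space equivalent of the kernel's container_of(). */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* 1 if the embedded id shares its address with the enclosing struct. */
static int model_same_address_first(struct model_priv_first *p)
{
	return (void *)&p->id == (void *)p;
}

static int model_same_address_moved(struct model_priv_moved *p)
{
	return (void *)&p->id == (void *)p;
}

/* Recover the enclosing struct from the embedded id, whatever the layout. */
static struct model_priv_moved *model_owner(struct model_cm_id *id)
{
	return model_container_of(id, struct model_priv_moved, id);
}
```

With the first layout the two addresses coincide, so freeing either
pointer has the same effect; with the second they differ, and only the
pointer returned by model_owner() is safe to free.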

Patch against 2.6.20

Signed-off-by: Krishna Kumar <krkumar2 at in.ibm.com>
---
diff -ruNp org/drivers/infiniband/core/iwcm.c new/drivers/infiniband/core/iwcm.c
--- org/drivers/infiniband/core/iwcm.c	2007-01-24 10:25:26.000000000 +0530
+++ new/drivers/infiniband/core/iwcm.c	2007-01-24 10:25:31.000000000 +0530
@@ -647,10 +647,9 @@ static void cm_conn_req_handler(struct i
 	/* Call the client CM handler */
 	ret = cm_id->cm_handler(cm_id, iw_event);
 	if (ret) {
-		set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags);
-		destroy_cm_id(cm_id);
-		if (atomic_read(&cm_id_priv->refcount)==0)
-			kfree(cm_id);
+		BUG_ON(atomic_read(&cm_id_priv->refcount) != 1);
+		iw_cm_reject(cm_id, NULL, 0);
+		iw_destroy_cm_id(cm_id);
 	}
 
 out:


From mst at mellanox.co.il  Tue Feb  6 23:41:39 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 7 Feb 2007 09:41:39 +0200
Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport
 -IWCM workaroundfor ip_dev_find() bug.
In-Reply-To: <1170805153.19662.155.camel@stevo-desktop>
References: <1170799320.19662.124.camel@stevo-desktop>
	<20070206221232.GO24372@mellanox.co.il>
	<1170805153.19662.155.camel@stevo-desktop>
Message-ID: <20070207074139.GA20290@mellanox.co.il>

> Quoting Steve Wise <swise at opengridcomputing.com>:
> Subject: Re: [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport -IWCM workaroundfor ip_dev_find() bug.
> 
> Here it is (only tested with rping over iWARP on sles9sp3):
> 
> ----------------
> 
> 
> xxx_ip_dev_find() must use scope HOST.
> 
> From: Steve Wise <swise at opengridcomputing.com>
> 
> Function xxx_ip_dev_find(RT_SCOPE_LINK) returns the wrong interface on
> some kernels.  The correct scope is RT_SCOPE_HOST.
> 
> Signed-off-by: Steve Wise <swise at opengridcomputing.com>

OK. I don't have access to the lab at the moment, but hope to test this
by next week.

-- 
MST


From mst at mellanox.co.il  Tue Feb  6 23:53:39 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 7 Feb 2007 09:53:39 +0200
Subject: [openib-general] [PATCHv6 RFC] IPoIB CM Experimental support
In-Reply-To: <adaveiesyrb.fsf@cisco.com>
References: <adaveiesyrb.fsf@cisco.com>
Message-ID: <20070207075339.GB20290@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCHv6 RFC] IPoIB CM Experimental support
> 
> Looks pretty good, but one thing worries me:
> 
> Overall looks great, I'll merge it up.

Great, thanks!
Just to clarify: do you intend to fix up the comments below or do you prefer for
me to do it and repost? If the latter, it's easy for me, but I won't have access
to the lab today so an updated patch won't be tested till tomorrow.

> A few quick questions: > +#ifdef CONFIG_IPV6
> 
> I think this really needs to be
> 
> #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
> 
> but I'm not clear on what happens if IPoIB is built-in and IPv6 is
> built as a module, since then icmpv6_send() isn't available until the
> ipv6 module is loaded.  It seems ip_gre.c has the same problem, so
> I'll ask on netdev about this.

I see this just got answered.

> Also a few other minor things:
> 
>  > +#ifdef CONFIG_INFINIBAND_IPOIB_CM
>  > +struct ib_cm_id;
> 
> this #ifdef in ipoib.h is just guarding declarations; we might as well
> declare everything even if it's not used.

Yes. I wasn't sure which way you'd prefer it.

>  > +	rep.starting_psn = 0 /* FIXME */;
> 
> any reason not to just do:
> 
> 	rep.starting_psn = random32() & 0xffffff;
> 
> ?

Well, randomness is a resource after all, and since we don't have the additional
security provided by PSNs in IPoIB UD, it seemed we do not need it for
IPoIB CM either. So maybe the right thing is just to remove the FIXME comment.

>  > +	req.srq 	              = 15;
> 
> This just should be 1, right?

Of course. It's a 1-bit field.

-- 
MST


From guyg at voltaire.com  Wed Feb  7 01:37:27 2007
From: guyg at voltaire.com (Guy German)
Date: Wed, 07 Feb 2007 11:37:27 +0200
Subject: [openib-general] [libmthca] deadlock while trying to destroy QP
In-Reply-To: <ada8xfbvz0x.fsf@cisco.com>
References: <45C75EA2.6000905@Voltaire.COM> <ada8xfbvz0x.fsf@cisco.com>
Message-ID: <45C99DD7.9030304@voltaire.com>

Roland Dreier wrote:
> I guess my first reaction is "don't do that."  Trying to do something
> as complex as destroying a QP from a signal handler seems very fragile
> to me, and I wouldn't consider ibv_destroy_qp() safe to call from a
> signal handler.

Fair enough.

Thanks,
Guy


From vlad at lists.openfabrics.org  Wed Feb  7 02:22:19 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Wed,  7 Feb 2007 02:22:19 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070207-0200 daily build status
Message-ID: <20070207102219.8CA72E60804@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod 

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.18
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.19
Passed on powerpc with linux-2.6.18
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.18
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.18
Passed on x86_64 with linux-2.6.12
Passed on ppc64 with linux-2.6.12
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.15
Passed on ppc64 with linux-2.6.18
Passed on x86_64 with linux-2.6.13
Passed on ppc64 with linux-2.6.19
Passed on x86_64 with linux-2.6.14
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.16
Passed on powerpc with linux-2.6.15
Passed on powerpc with linux-2.6.12
Passed on ia64 with linux-2.6.13
Passed on powerpc with linux-2.6.14
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.13
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.14
Passed on ppc64 with linux-2.6.17
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.15

Failed:
Build failed on ia64 with linux-2.6.16.21-0.8-default
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070207-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:380: error: implicit declaration of function ‘register_netevent_notifier’
/home/vlad/tmp/ofa_1_2_kernel-20070207-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: In function ‘addr_cleanup’:
/home/vlad/tmp/ofa_1_2_kernel-20070207-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:386: error: implicit declaration of function ‘unregister_netevent_notifier’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070207-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070207-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070207-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070207-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------


From grossmann at hlrs.de  Wed Feb  7 03:03:45 2007
From: grossmann at hlrs.de (Thomas =?iso-8859-1?q?Gro=DFmann?=)
Date: Wed, 7 Feb 2007 12:03:45 +0100
Subject: [openib-general] Problem with SRP with 512 byte sector size with >
	2 TB LUNs
Message-ID: <200702071203.45309.grossmann@hlrs.de>

Hello,

We have a disk-array connected over a Mellanox MT25204 IB
card. We have configured LUNs with a size of over 2 TB with
512 byte sector size and are using OpenIB 1.1 and SUSE SLES 10 x86_64. 
I get the following output in /var/log/messages when adding a LUN:

Feb  2 09:59:57 data1 kernel:   Vendor: DDN       Model: S2A 9550          Rev: 3.03
Feb  2 09:59:57 data1 kernel:   Type:   Direct-Access                      ANSI SCSI revision: 06
Feb  2 09:59:57 data1 kernel: sdc : very big device. try to use READ CAPACITY(16).
Feb  2 09:59:57 data1 kernel: sdc : READ CAPACITY(16) failed.
Feb  2 09:59:57 data1 kernel: sdc : status=0, message=00, host=5, driver=00
Feb  2 09:59:57 data1 kernel: sdc : use 0xffffffff as device size
Feb  2 09:59:57 data1 kernel: SCSI device sdc: 4294967296 512-byte hdwr sectors (2199023 MB)
Feb  2 09:59:57 data1 kernel: sdc: Write Protect is off
Feb  2 09:59:57 data1 kernel: sdc: Mode Sense: 97 00 10 08
Feb  2 09:59:57 data1 kernel: SCSI device sdc: drive cache: write back w/ FUA
Feb  2 09:59:57 data1 kernel: sdc : very big device. try to use READ CAPACITY(16).
Feb  2 09:59:57 data1 kernel: sdc : READ CAPACITY(16) failed.
Feb  2 09:59:57 data1 kernel: sdc : status=0, message=00, host=5, driver=00
Feb  2 09:59:57 data1 kernel: sdc : use 0xffffffff as device size
Feb  2 09:59:57 data1 kernel: SCSI device sdc: 4294967296 512-byte hdwr sectors (2199023 MB)
Feb  2 09:59:57 data1 kernel: sdc: Write Protect is off
Feb  2 09:59:57 data1 kernel: sdc: Mode Sense: 97 00 10 08
Feb  2 09:59:57 data1 kernel: SCSI device sdc: drive cache: write back w/ FUA
Feb  2 09:59:57 data1 kernel:  sdc: unknown partition table
Feb  2 09:59:57 data1 kernel: sd 8:0:0:0: Attached scsi disk sdc
Feb  2 09:59:57 data1 kernel: sd 8:0:0:0: Attached scsi generic sg2 type 0
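
The "4294967296 512-byte hdwr sectors (2199023 MB)" figure in the log above is what a 32-bit block address allows. READ CAPACITY(10) reports the last logical block address in a 32-bit field, so once the kernel falls back to 0xffffffff the device cannot appear larger than 2^32 sectors. A minimal sketch of that arithmetic follows; the function and macro names are illustrative and are not taken from the kernel.

```c
#include <stdint.h>
#include <stdio.h>

/* READ CAPACITY(10) returns the last LBA in a 32-bit field. */
#define RC10_MAX_LAST_LBA 0xffffffffULL

/* Capacity in bytes for a device whose last addressable block is last_lba. */
uint64_t capacity_bytes(uint64_t last_lba, uint32_t sector_size)
{
	return (last_lba + 1) * sector_size;
}

/* Print the ceiling that a 32-bit LBA imposes for a given sector size. */
void print_rc10_limit(uint32_t sector_size)
{
	uint64_t bytes = capacity_bytes(RC10_MAX_LAST_LBA, sector_size);

	printf("%u-byte sectors: %llu sectors, %llu MB\n",
	       (unsigned int)sector_size,
	       (unsigned long long)(RC10_MAX_LAST_LBA + 1),
	       (unsigned long long)(bytes / 1000000));
}
```

Calling print_rc10_limit(512) reproduces the sector count and the decimal-megabyte size shown in the log. Anything larger needs the 64-bit address returned by READ CAPACITY(16).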

I found in the Changelog of kernel 2.6.20 the following instruction:
target_host->max_cmd_len = sizeof ((struct srp_cmd *) (void *) 0L)->cdb;
(added to the function srp_create_target to achieve READ CAPACITY(16) )
and added it to the ib_srp module of OpenIB 1.1. 

The output was then:
Feb  5 17:53:07 data1 kernel:   Vendor: DDN       Model: S2A 9550          Rev: 3.03
Feb  5 17:53:07 data1 kernel:   Type:   Direct-Access                      ANSI SCSI revision: 06
Feb  5 17:53:07 data1 kernel: sdc : very big device. try to use READ CAPACITY(16).
Feb  5 17:53:07 data1 kernel: sdc : sector size 0 reported, assuming 512.
Feb  5 17:53:07 data1 kernel: SCSI device sdc: 1 512-byte hdwr sectors (0 MB)
Feb  5 17:53:07 data1 kernel: sdc: Write Protect is off
Feb  5 17:53:07 data1 kernel: sdc: Mode Sense: 97 00 10 08
Feb  5 17:53:07 data1 kernel: SCSI device sdc: drive cache: write back w/ FUA
Feb  5 17:53:07 data1 kernel: sdc : very big device. try to use READ CAPACITY(16).
Feb  5 17:53:07 data1 kernel: sdc : sector size 0 reported, assuming 512.
Feb  5 17:53:07 data1 kernel: SCSI device sdc: 1 512-byte hdwr sectors (0 MB)
Feb  5 17:53:07 data1 kernel: sdc: Write Protect is off
Feb  5 17:53:07 data1 kernel: sdc: Mode Sense: 97 00 10 08
Feb  5 17:53:07 data1 kernel: SCSI device sdc: drive cache: write back w/ FUA
Feb  5 17:53:07 data1 kernel:  sdc: unknown partition table
Feb  5 17:53:07 data1 kernel: sd 9:0:0:0: Attached scsi disk sdc
Feb  5 17:53:07 data1 kernel: sd 9:0:0:0: Attached scsi generic sg2 type 0

The same output was shown when trying to add a LUN using kernel 2.6.20.
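
The sizeof expression quoted above takes the size of a structure member without needing an instance of the structure. A minimal userspace sketch of the idiom follows. struct demo_cmd is a hypothetical stand-in for struct srp_cmd, and its 16-byte cdb field is an assumption chosen so that a 16-byte CDB such as READ CAPACITY(16) fits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct srp_cmd; only the cdb member matters here. */
struct demo_cmd {
	uint8_t  opcode;
	uint8_t  reserved[3];
	uint64_t tag;
	uint8_t  cdb[16];
};

/*
 * sizeof does not evaluate its operand, so the null pointer is never
 * dereferenced; the expression is resolved at compile time.
 */
#define MEMBER_SIZE(type, member) (sizeof(((type *)0)->member))

/* Largest CDB length the demo structure can carry. */
size_t demo_max_cmd_len(void)
{
	return MEMBER_SIZE(struct demo_cmd, cdb);
}
```

Deriving the limit from the structure keeps the advertised maximum command length in step with the field that actually carries the CDB.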

Is it possible to add LUNs with > 2 TB and 512 byte sectors?
Why does the READ CAPACITY(16) command fail?

Kind regards,
Thomas

-- 
 Thomas Großmann                
 High Performance Computing Center Stuttgart (HLRS)                                      
  
 Allmandring 30                                                
 70550 Stuttgart, Germany   

 E-Mail: grossmann at hlrs.de                                                              
 Phone: ++49-711-685-65529
 Fax  : ++49-711-685-65832


From mst at mellanox.co.il  Wed Feb  7 04:10:22 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 7 Feb 2007 14:10:22 +0200
Subject: [openib-general] resolving sending mails from OFA new server
In-Reply-To: <3D84A59A1AD3584DA02AEAD240E8863F039CFAB2@ES22SNLNT.srn.sandia.gov>
References: <3D84A59A1AD3584DA02AEAD240E8863F039CFAB2@ES22SNLNT.srn.sandia.gov>
Message-ID: <20070207121022.GA1102@mellanox.co.il>

> Michael,
> 
> I put something together at bugmail at lists.openfabrics.org.  I did not
> get a chance to try it out, so let me know if it's working out for you.
> Keywords used in the e-mail format come from the bugmail_help.html
> included w/ Bugzilla (it is posted at
> http://www.openfabrics.org/docs/bugmail_help.html).  
> 
> Michael

I just tried both and it worked flawlessly.
Thanks very much!

Guys, you should try the email gateway; it is amazing,
especially for adding text to bugs: just put
[Bug XXX] in the mail subject.

Michael, one small request: could the messages that bugzilla
generates have the From field set to bugmail at lists.openfabrics.org
and not bugzilla-daemon at openib.org, as they do today?
This way I can add text to a bug just by replying to it.

Thanks,
	MST
-- 
MST


From mst at mellanox.co.il  Wed Feb  7 04:35:34 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 7 Feb 2007 14:35:34 +0200
Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in
	cm_conn_req_handler()
In-Reply-To: <20070207065650.24166.6979.sendpatchset@localhost.localdomain>
References: <20070207065650.24166.6979.sendpatchset@localhost.localdomain>
Message-ID: <20070207123534.GD716@mellanox.co.il>

-		set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags);
-		destroy_cm_id(cm_id);
-		if (atomic_read(&cm_id_priv->refcount)==0)
-			kfree(cm_id);
+		BUG_ON(atomic_read(&cm_id_priv->refcount) != 1);
+		iw_cm_reject(cm_id, NULL, 0);
+		iw_destroy_cm_id(cm_id);

And BTW, lots of lines with atomic_read()==0 in them have broken whitespace
in iwcm.c. Does anyone care enough to fix them?

-- 
MST


From halr at voltaire.com  Wed Feb  7 05:49:17 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 07 Feb 2007 08:49:17 -0500
Subject: [openib-general] patches to 2.6.19.1 kernel for switch Operation
In-Reply-To: <039701c7494b$6bd5d860$1914a8c0@surioffice>
References: <000601c7419f$d4470c60$ff0da8c0@amr.corp.intel.com>
	<1170072757.4555.242192.camel@hal.voltaire.com>
	<039701c7494b$6bd5d860$1914a8c0@surioffice>
Message-ID: <1170856154.4525.372809.camel@hal.voltaire.com>

Suri,

On Mon, 2007-02-05 at 12:31, Suresh Shelvapille wrote:
> Hal:
> 
> We are upgrading to 2.6.19.1 kernel

Glad to hear this.

>  and I finally ported the changes
> required for Switch operation from my current kernel (2.6.12) version. 
> 
> I have tested these changes for a switch with different SM(s). But I need
> the community's help to test the changes on different HCAs to make sure I
> have not broken anything.
> 
> Please see if the changes look OK.

Have you tested these changes on end nodes (HCAs) ? If so, what tests
have you performed ?

It would be easier to comment if your changes were included inline
rather than as attachments.

Also, you should attach your S-O-B line.

Thanks.

-- Hal

> Thanks,
> Suri


From swise at opengridcomputing.com  Wed Feb  7 06:24:32 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Wed, 07 Feb 2007 08:24:32 -0600
Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in
 cm_conn_req_handler()
In-Reply-To: <20070207065650.24166.6979.sendpatchset@localhost.localdomain>
References: <20070207065650.24166.6979.sendpatchset@localhost.localdomain>
Message-ID: <1170858272.14381.1.camel@stevo-desktop>

This looks good for 2.6.21 IMO.

Acked-by: Steve Wise <swise at opengridcomputing.com>


On Wed, 2007-02-07 at 12:26 +0530, Krishna Kumar wrote:
> (I had submitted this once earlier but got no response)
> 
> cm_conn_req_handler() :
> 	1. Calling destroy_cm_id leaks 3 work 'free' list entries.
> 	2. cm_id is freed up wrongly and not cm_id_priv (though the
> 	   effect is the same since cm_id is the first element of
> 	   cm_id_priv, but still a bug if the top level cm_id changes).
> 	3. Reject message has to be sent on failure. Tested this
> 	   without the fix and found the client hangs, waited for about
> 	   20 mins and then did Ctrl-C but the process is unkillable.
> 	4. Setting IWCM_F_CALLBACK_DESTROY on cm_id (child handle)
> 	   doesn't achieve anything, since checking for
> 	   IWCM_F_CALLBACK_DESTROY in the parent's flag (in
> 	   cm_work_handler) means that this will never be true.
> 
> All 4 above cases were tested by injecting random error in
> iw_conn_req_handler() and running rdma_bw/krping, they were
> confirmed. I added the BUG_ON() to confirm the earlier check
> for id_priv->refcount==0 should always be true (and could be
> removed).
> 
> Patch against 2.6.20
> 
> Signed-off-by: Krishna Kumar <krkumar2 at in.ibm.com>
> ---
> diff -ruNp org/drivers/infiniband/core/iwcm.c new/drivers/infiniband/core/iwcm.c
> --- org/drivers/infiniband/core/iwcm.c	2007-01-24 10:25:26.000000000 +0530
> +++ new/drivers/infiniband/core/iwcm.c	2007-01-24 10:25:31.000000000 +0530
> @@ -647,10 +647,9 @@ static void cm_conn_req_handler(struct i
>  	/* Call the client CM handler */
>  	ret = cm_id->cm_handler(cm_id, iw_event);
>  	if (ret) {
> -		set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags);
> -		destroy_cm_id(cm_id);
> -		if (atomic_read(&cm_id_priv->refcount)==0)
> -			kfree(cm_id);
> +		BUG_ON(atomic_read(&cm_id_priv->refcount) != 1);
> +		iw_cm_reject(cm_id, NULL, 0);
> +		iw_destroy_cm_id(cm_id);
>  	}
>  
>  out:
> 
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> 
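
Point 2 of the quoted description can be illustrated in userspace C. Freeing a pointer to the first member happens to free the whole wrapper, but recovering the wrapper explicitly stays correct if the member ever moves. This is a sketch only: struct demo_id and struct demo_id_priv are hypothetical stand-ins for cm_id and cm_id_priv, and demo_container_of mimics the kernel's container_of macro.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical public handle, standing in for the cm_id seen by clients. */
struct demo_id {
	int state;
};

/* Hypothetical private wrapper, standing in for cm_id_priv. */
struct demo_id_priv {
	struct demo_id id;	/* first member, so &priv->id == (void *)priv */
	int refcount;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Allocate the wrapper and hand out only the embedded public handle. */
struct demo_id *demo_id_create(void)
{
	struct demo_id_priv *priv = calloc(1, sizeof(*priv));

	if (!priv)
		return NULL;
	priv->refcount = 1;
	return &priv->id;
}

/* Free through the container, which stays correct if 'id' ever moves. */
void demo_id_destroy(struct demo_id *id)
{
	struct demo_id_priv *priv;

	if (!id)
		return;
	priv = demo_container_of(id, struct demo_id_priv, id);
	free(priv);
}
```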


From michael.arndt at informatik.tu-chemnitz.de  Wed Feb  7 06:38:37 2007
From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt)
Date: Wed, 7 Feb 2007 15:38:37 +0100
Subject: [openib-general] Unknown SMP Recv
References: <000901c74938$e10b2a30$21606d86@one7>
	<1170689654.4525.201415.camel@hal.voltaire.com>
	<001401c74946$a664a2e0$21606d86@one7>
	<1170695591.4525.207604.camel@hal.voltaire.com>
	<002001c74a33$c2ec1db0$21606d86@one7>
	<1170807564.4525.324195.camel@hal.voltaire.com>
Message-ID: <000801c74ac5$a70c6a90$21606d86@one7>

Hi,

> This sounds like the good response and appears to traverse your 3 nodes.

Yes, that's right

> This is the bogus extra response. Since your sender node is unmodified,
> it is unlikely an issue there. It seems like the intermediate node might
> be responding and forwarding the packet on although it should only do
> one of those two things. You did mention the SMI on the intermediate
> node was modified, right ? Also, note that the SMI is not validated and
> has some known issues for switches (e.g. intermediate hops).

The sender and the responder are unmodified (node1, node3). I have debugged 
the whole SMI and the ib_mad_recv_done_handler and handle_outgoing_dr_smp 
functions and did not find the bogus extra response. As far as I can tell 
from debugging, the responder sends one packet, which is correct, and the 
intermediate node does not receive a bogus extra packet. So the extra packet 
did not pass through the SMI, that's for sure. I use libibumad to implement 
the forwarding mechanism and also use the select function to catch any 
receive I should handle. Maybe something is wrong there.

Thanks Michael Arndt 


From tom at opengridcomputing.com  Wed Feb  7 07:01:23 2007
From: tom at opengridcomputing.com (Tom Tucker)
Date: Wed, 07 Feb 2007 09:01:23 -0600
Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in
 cm_conn_req_handler()
In-Reply-To: <1170858272.14381.1.camel@stevo-desktop>
References: <20070207065650.24166.6979.sendpatchset@localhost.localdomain>
	<1170858272.14381.1.camel@stevo-desktop>
Message-ID: <1170860483.11491.21.camel@trinity.ogc.int>

On Wed, 2007-02-07 at 08:24 -0600, Steve Wise wrote:
> This looks good for 2.6.21 IMO.
> 
> Acked-by: Steve Wise <swise at opengridcomputing.com>
> 
> 
> On Wed, 2007-02-07 at 12:26 +0530, Krishna Kumar wrote:
> > (I had submitted this once earlier but got no response)


> > 
> > cm_conn_req_handler() :
> > 	1. Calling destroy_cm_id leaks 3 work 'free' list entries.

When dealloc_work_entries was added to the iw_destroy_cm_id function, it
needed ALSO to be added everywhere destroy_cm_id was called. So you need
to call dealloc_work_entries everywhere you call destroy_cm_id or this
leak remains all over the place, e.g. cm_work_handler

> > 	2. cm_id is freed up wrongly and not cm_id_priv (though the
> > 	   effect is the same since cm_id is the first element of
> > 	   cm_id_priv, but still a bug if the top level cm_id changes).
> > 	3. Reject message has to be sent on failure. Tested this
> > 	   without the fix and found the client hangs, waited for about
> > 	   20 mins and then did Ctrl-C but the process is unkillable.

This should be added to the switch statement in destroy_cm_id (not here)
so that it doesn't need to be added everywhere the cm_id is destroyed
when it's in a state that requires a reject.

> > 	4. Setting IWCM_F_CALLBACK_DESTROY on cm_id (child handle)
> > 	   doesn't achieve anything, since checking for
> > 	   IWCM_F_CALLBACK_DESTROY in the parent's flag (in
> > 	   cm_work_handler) means that this will never be true.

destroy_cm_id exists to allow cm_id to be destroyed without waiting. If
you're changing it to iw_destroy_cm_id, that may be fine, but all the
setbit/getbit stuff is a side show.  You must be certain that
iw_destroy_cm_id can't wait. If it does, you'll shut down the entire
IWCM.
 
> > 
> > All 4 above cases were tested by injecting random error in
> > iw_conn_req_handler() and running rdma_bw/krping, they were
> > confirmed. I added the BUG_ON() to confirm the earlier check
> > for id_priv->refcount==0 should always be true (and could be
> > removed).
> > 
> > Patch against 2.6.20
> > 
> > Signed-off-by: Krishna Kumar <krkumar2 at in.ibm.com>
> > ---
> > diff -ruNp org/drivers/infiniband/core/iwcm.c new/drivers/infiniband/core/iwcm.c
> > --- org/drivers/infiniband/core/iwcm.c	2007-01-24 10:25:26.000000000 +0530
> > +++ new/drivers/infiniband/core/iwcm.c	2007-01-24 10:25:31.000000000 +0530
> > @@ -647,10 +647,9 @@ static void cm_conn_req_handler(struct i
> >  	/* Call the client CM handler */
> >  	ret = cm_id->cm_handler(cm_id, iw_event);
> >  	if (ret) {
> > -		set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags);
> > -		destroy_cm_id(cm_id);
> > -		if (atomic_read(&cm_id_priv->refcount)==0)
> > -			kfree(cm_id);
> > +		BUG_ON(atomic_read(&cm_id_priv->refcount) != 1);
> > +		iw_cm_reject(cm_id, NULL, 0);
> > +		iw_destroy_cm_id(cm_id);
> >  	}
> >  
> >  out:
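
The reply to point 3 above suggests issuing the reject from the switch statement in destroy_cm_id, so that every caller gets it for free. A minimal userspace model of that design follows. All names and states here are hypothetical and do not reflect the real IWCM state machine; the counters exist only to make the behaviour observable.

```c
/* Hypothetical connection states; the real IWCM state names may differ. */
enum demo_state {
	DEMO_IDLE,
	DEMO_CONN_RECV,		/* request received, not yet accepted or rejected */
	DEMO_ESTABLISHED,
	DEMO_DESTROYING
};

struct demo_conn {
	enum demo_state state;
	int rejects_sent;	/* counts rejects sent to the peer */
	int disconnects_sent;	/* counts disconnects sent to the peer */
};

static void demo_send_reject(struct demo_conn *c)
{
	c->rejects_sent++;
}

static void demo_disconnect(struct demo_conn *c)
{
	c->disconnects_sent++;
}

/*
 * One teardown routine decides what the peer must be told, based on the
 * state the connection is in when it is destroyed.
 */
void demo_destroy(struct demo_conn *c)
{
	switch (c->state) {
	case DEMO_CONN_RECV:
		demo_send_reject(c);
		break;
	case DEMO_ESTABLISHED:
		demo_disconnect(c);
		break;
	case DEMO_IDLE:
	case DEMO_DESTROYING:
		break;
	}
	c->state = DEMO_DESTROYING;
}
```

With the decision in one place, an error path such as the one in cm_conn_req_handler() only has to call the destroy routine; it cannot forget the reject.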


From yosefe at voltaire.com  Wed Feb  7 07:20:17 2007
From: yosefe at voltaire.com (Yosef Etigin)
Date: Wed, 07 Feb 2007 17:20:17 +0200
Subject: [openib-general] issues with compilation of ofed 1.2
Message-ID: <45C9EE31.2040602@voltaire.com>


******************************************************************
1. When compiling without ibutils I get the following error:


RPM build errors:
    user vladsk does not exist - using root
    group vladsk does not exist - using root
    user vladsk does not exist - using root
    group vladsk does not exist - using root
    File not found by glob: /var/tmp/OFED/usr/local/ofed/man/man1/ibv_*
    File not found by glob: /var/tmp/OFED/usr/local/ofed/man/man8/opensm*
    File not found by glob: /var/tmp/OFED/usr/local/ofed/man/man8/osmtest*
ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed ' --define 'build_root /var/tmp/OFED ' --define 'configure_options --with-ipoibtools --with-libcxgb3 --with-libibcm --with-libibcommon --with-libibmad --with-libibumad --with-libibverbs --with-libmthca --with-opensm --with-librdmacm --with-libsdp --with-sdpnetstat --with-mstflint --with-perftest --mandir=/usr/local/ofed /man' --define 'configure_options32 %{nil}' --define 'build_32bit 0' /tmp/regtest/OFED-1.2-20070205-1823/SRPMS/ofa_user-1.2-alpha1.src.rpm"

******************************************************************
2. After adding ibutils, compilation passes on RH4 (U4 and U3).
However, when executing an application that uses libibverbs, I get this error:

libibverbs: Warning: couldn't open config directory '/usr/local/ofed/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
No IB devices found

Workaround: copy libibverbs.d from an installation of OFED 1.2 from the daily build packages to /usr/local/ofed/etc/


******************************************************************
3. Uninstall script does not always successfully remove libcxgb3 package


******************************************************************
4. When compiling on SLES10 I get this error:

MTHOME directory /var/tmp/OFED/usr/local/ofed does not exist.
Exiting.
error: Bad exit status from /var/tmp/rpm-tmp.37387 (%build)


RPM build errors:
    user rowland does not exist - using root
    group mvapich does not exist - using root
    user rowland does not exist - using root
    group mvapich does not exist - using root
    Bad exit status from /var/tmp/rpm-tmp.37387 (%build)
ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_name mvapich2_gcc' --define '_prefix /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-1' --define 'build_root /var/tmp/OFED' --define 'open_ib_home /usr/local/ofed' --define 'ofed_build_root /var/tmp/OFED' --define 'comp_env CC=gcc CXX=g++ F77=gfortran' --define 'iwarp 0' --define 'romio 1' --define 'shared_libs 1' --define 'auto_req 1' /tmp/OFED-1.2-20070205-1823/SRPMS/mvapich2-0.9.8-1.src.rpm"

******************************************************************
5. When compiling on SLES10 SP1 I get this error:

In file included from /usr/src/linux-2.6.16.37-0.9/include/linux/inetdevice.h:7,
                 from /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/core/addr.c:32:
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/kernel_addons/backport/2.6.16_sles10/include/linux/netdevice.h:7: error: redefinition of ‘netif_tx_lock’
/usr/src/linux-2.6.16.37-0.9/include/linux/netdevice.h:927: error: previous definition of ‘netif_tx_lock’ was here
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/kernel_addons/backport/2.6.16_sles10/include/linux/netdevice.h: In function ‘netif_tx_lock’:
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/kernel_addons/backport/2.6.16_sles10/include/linux/netdevice.h:8: error: ‘struct net_device’ has no member named ‘xmit_lock’
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/kernel_addons/backport/2.6.16_sles10/include/linux/netdevice.h: At top level:
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/kernel_addons/backport/2.6.16_sles10/include/linux/netdevice.h:13: error: redefinition of ‘netif_tx_unlock’
/usr/src/linux-2.6.16.37-0.9/include/linux/netdevice.h:947: error: previous definition of ‘netif_tx_unlock’ was here
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/kernel_addons/backport/2.6.16_sles10/include/linux/netdevice.h: In function ‘netif_tx_unlock’:
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/kernel_addons/backport/2.6.16_sles10/include/linux/netdevice.h:15: error: ‘struct net_device’ has no member named ‘xmit_lock’
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/core/addr.c: At top level:
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/core/addr.c:61: warning: initialization from incompatible pointer type
make[6]: *** [/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/core/addr.o] Error 1


******************************************************************
6. On [PPC64/SLES10] I get this compilation error:

make[2]: Entering directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/librdmacm'
if /bin/sh ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include  -I../libibverbs/include -g -Wall -D_GNU_SOURCE -m64 -g -O2 -MT cma.lo -MD -MP -MF ".deps/cma.Tpo" -c -o cma.lo `test -f 'src/cma.c' || echo './'`src/cma.c; \
then mv -f ".deps/cma.Tpo" ".deps/cma.Plo"; else rm -f ".deps/cma.Tpo"; exit 1; fi
mkdir .libs
 gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I../libibverbs/include -g -Wall -D_GNU_SOURCE -m64 -g -O2 -MT cma.lo -MD -MP -MF .deps/cma.Tpo -c src/cma.c  -fPIC -DPIC -o .libs/cma.o
/bin/sh ./libtool --tag=CC --mode=link gcc -g -Wall -D_GNU_SOURCE -m64 -g -O2 -L../libibverbs/src -libverbs -lsysfs -L. -o src/librdmacm.la -rpath /usr/local/ofed/lib64 -avoid-version -Wl,--version-script=./src/librdmacm.map cma.lo  
mkdir src/.libs
gcc -shared  .libs/cma.o  -Wl,--rpath -Wl,/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/libibverbs/src/.libs /var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/libibverbs/src/.libs/libibverbs.so /usr/lib/libsysfs.so -L/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/librdmacm  -m64 -Wl,--version-script=./src/librdmacm.map -Wl,-soname -Wl,librdmacm.so -o src/.libs/librdmacm.so
/usr/lib/libsysfs.so: could not read symbols: File in wrong format
collect2: ld returned 1 exit status
make[2]: *** [src/librdmacm.la] Error 1
make[2]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/librdmacm'
make[1]: *** [all] Error 2

************************
7. On RHAS5 beta 2, the setup requires the sysfsutils-devel RPM, which is not included in this distro.


--
Yosef Etigin
Alex Tabachnik


From swise at opengridcomputing.com  Wed Feb  7 07:30:18 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Wed, 07 Feb 2007 09:30:18 -0600
Subject: [openib-general] dapltest?
Message-ID: <1170862218.14381.4.camel@stevo-desktop>

Hey Arlin,

Shouldn't dapl/test be shipped with OFED?  It appears not to be...

Steve.


From monis at voltaire.com  Wed Feb  7 07:35:58 2007
From: monis at voltaire.com (Moni Shoua)
Date: Wed, 07 Feb 2007 17:35:58 +0200
Subject: [openib-general] [PATCH] IB/ipoib get net_device from
 ipoib_neigh instead of linux neighbour
In-Reply-To: <20070206171424.GB24372@mellanox.co.il>
References: <45C8ABAA.10500@voltaire.com>
	<20070206171424.GB24372@mellanox.co.il>
Message-ID: <45C9F1DE.8090409@voltaire.com>


> Another concern: assume that one device goes away (e.g. hotplug).
> It seems that neighbours whose dev field point to another device, will not be destroyed.
> Correct?
I agree.
> 
> Therefore in your design, it seems that to_ipoib_neigh()->dev
> will get us a pointer to device that has been removed already.
> 
I agree that this is a problem. I think it would be best to prevent an IPoIB device
from disappearing, or ib_ipoib from being unloaded, as long as the IPoIB
device is a slave. Unfortunately, I don't see how this can be done just
by fixing something in bonding or IPoIB. 
However, any slave knows it has a master (dev->master). 
What do you think about a solution where IPoIB first tries to clean up the
neighbours that belong to its master before deleting the IPoIB device?

>> Furthermore, bond_setup_by_slave is called only for non
>> Ethernet devices (we consider to change the logic to "called only for
>> IPoIB devices just for safety).
> 
> Why is this necessary, BTW?
> 
If we don't do that, we get a memory leak because the neigh destructor will
never be called for non IPoIB devices although they carry ipoib_neigh
with them.


From vlad at dev.mellanox.co.il  Wed Feb  7 08:42:02 2007
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Wed, 07 Feb 2007 18:42:02 +0200
Subject: [openib-general] Open MPI rpmbuild fails in OFED-1.2
Message-ID: <1170866522.6223.8.camel@vladsk-laptop>

Hi Jeff,
Please remove %build macro from the RPM spec file.
On SuSE distros it removes RPM_BUILD_ROOT.

Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.23343
+ umask 022
+ cd /var/tmp/OFEDRPM/BUILD
+ /bin/rm -rf /var/tmp/OFED
++ dirname /var/tmp/OFED
+ /bin/mkdir -p /var/tmp
+ /bin/mkdir /var/tmp/OFED
+ cd openmpi-1.2b4ofedr13470
+ fortify_source=1
+ test '' '!=' ''
...

-- 
Vladimir Sokolovsky <vlad at dev.mellanox.co.il>
Mellanox Technologies Ltd.


From jsquyres at cisco.com  Wed Feb  7 08:52:24 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Wed, 7 Feb 2007 11:52:24 -0500
Subject: [openib-general] Open MPI rpmbuild fails in OFED-1.2
In-Reply-To: <1170866522.6223.8.camel@vladsk-laptop>
References: <1170866522.6223.8.camel@vladsk-laptop>
Message-ID: <7FDCD3BB-A76F-4C36-8939-4E7C634F0D86@cisco.com>

The "%build" directive is not just a macro, it's also a section  
qualifier indicating the beginning of the build section.  From

http://fedora.redhat.com/docs/drafts/rpm-guide-en/ch08s02.html#id2966770

"The build section starts with a %build statement."

Is there something else that I should replace it with that will also  
start the build section?


On Feb 7, 2007, at 11:42 AM, Vladimir Sokolovsky wrote:

> Hi Jeff,
> Please remove %build macro from the RPM spec file.
> On SuSE distros it removes RPM_BUILD_ROOT.
>
> Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.23343
> + umask 022
> + cd /var/tmp/OFEDRPM/BUILD
> + /bin/rm -rf /var/tmp/OFED
> ++ dirname /var/tmp/OFED
> + /bin/mkdir -p /var/tmp
> + /bin/mkdir /var/tmp/OFED
> + cd openmpi-1.2b4ofedr13470
> + fortify_source=1
> + test '' '!=' ''
> ...
>
> -- 
> Vladimir Sokolovsky <vlad at dev.mellanox.co.il>
> Mellanox Technologies Ltd.


-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


From vlad at dev.mellanox.co.il  Wed Feb  7 09:00:20 2007
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Wed, 07 Feb 2007 19:00:20 +0200
Subject: [openib-general] Open MPI rpmbuild fails in OFED-1.2
In-Reply-To: <7FDCD3BB-A76F-4C36-8939-4E7C634F0D86@cisco.com>
References: <1170866522.6223.8.camel@vladsk-laptop>
	<7FDCD3BB-A76F-4C36-8939-4E7C634F0D86@cisco.com>
Message-ID: <1170867620.6223.11.camel@vladsk-laptop>

I propose to replace %build by %install.
Otherwise %build removes /var/tmp/OFED (on SuSE) which includes all
installed libraries.

Regards,
Vladimir

On Wed, 2007-02-07 at 11:52 -0500, Jeff Squyres wrote:
> The "%build" directive is not just a macro, it's also a section  
> qualifier indicating the beginning of the build section.  From
> 
> http://fedora.redhat.com/docs/drafts/rpm-guide-en/ch08s02.html#id2966770
> 
> "The build section starts with a %build statement."
> 
> Is there something else that I should replace it with that will also  
> start the build section?
> 
> 
> 
> On Feb 7, 2007, at 11:42 AM, Vladimir Sokolovsky wrote:
> 
> > Hi Jeff,
> > Please remove %build macro from the RPM spec file.
> > On SuSE distros it removes RPM_BUILD_ROOT.
> >
> > Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.23343
> > + umask 022
> > + cd /var/tmp/OFEDRPM/BUILD
> > + /bin/rm -rf /var/tmp/OFED
> > ++ dirname /var/tmp/OFED
> > + /bin/mkdir -p /var/tmp
> > + /bin/mkdir /var/tmp/OFED
> > + cd openmpi-1.2b4ofedr13470
> > + fortify_source=1
> > + test '' '!=' ''
> > ...
> >
> > -- 
> > Vladimir Sokolovsky <vlad at dev.mellanox.co.il>
> > Mellanox Technologies Ltd.
> 
> 


From rdreier at cisco.com  Wed Feb  7 09:58:14 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 07 Feb 2007 09:58:14 -0800
Subject: [openib-general] Problem with SRP with 512 byte sector size
 with > 2 TB LUNs
References: <200702071203.45309.grossmann@hlrs.de>
Message-ID: <adaabzpluix.fsf@cisco.com>

 > Is it possible to add LUNs with > 2 TB and 512 byte sectors ?
 > Why does the READ CAPACITY(16) command fail ?

It seems that the DDN target is not reporting good information -- I
don't see anything obviously wrong in what the kernel is doing (now
that SRP sends a READ CAPACITY command).  Do you know if the same type
of config works over fibre channel?

 - R.


From jsquyres at cisco.com  Wed Feb  7 09:58:41 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Wed, 7 Feb 2007 12:58:41 -0500
Subject: [openib-general] Open MPI rpmbuild fails in OFED-1.2
In-Reply-To: <1170867620.6223.11.camel@vladsk-laptop>
References: <1170866522.6223.8.camel@vladsk-laptop>
	<7FDCD3BB-A76F-4C36-8939-4E7C634F0D86@cisco.com>
	<1170867620.6223.11.camel@vladsk-laptop>
Message-ID: <212D8756-09B1-4637-ADCA-1CD8A535403A@cisco.com>

My $0.02: This is another in a growing list of issues suggesting that the  
whole "build everything in DESTDIR" approach is problematic.

I have distinct %build and %install sections in the Open MPI specfile  
-- they're really intended for two different things.  Specifically: I  
wouldn't call the SuSE %build behavior a bug -- it reflects how they  
want RPM designers to write RPMs.  It appears that we're trying to  
circumvent their intended approach.  Shouldn't that be a warning  
flag?  :-)

I've heard offhand comments that there were problems with trying to  
use chroot for building OFED.  The two that I'm aware of are:

1. need to be root to make a chroot.
    My thought: who cares?
2. takes up lots of extra disk space.
    My thought: does it matter?  Do we know of anyone with small-disk  
servers who is building OFED? (and/or: can you hard-link files  
to make a chroot environment?  I don't know)

Are there other issues?  More specifically, which is going to be  
simpler: a) fixing the growing list of problems with the DESTDIR  
approach or b) switching to a chroot environment?

A simple search for "chroot" on freshmeat, for example, turns up a  
number of projects that can be used to help automate the creation of  
chroot environments.

Again -- this is all my $0.02.  Comments?


On Feb 7, 2007, at 12:00 PM, Vladimir Sokolovsky wrote:

> I propose to replace %build by %install.
> Otherwise %build removes /var/tmp/OFED (on SuSE) which includes all
> installed libraries.
>
> Regards,
> Vladimir
>
> On Wed, 2007-02-07 at 11:52 -0500, Jeff Squyres wrote:
>> The "%build" directive is not just a macro, it's also a section
>> qualifier indicating the beginning of the build section.  From
>>
>> http://fedora.redhat.com/docs/drafts/rpm-guide-en/ch08s02.html#id2966770
>>
>> "The build section starts with a %build statement."
>>
>> Is there something else that I should replace it with that will also
>> start the build section?
>>
>>
>>
>> On Feb 7, 2007, at 11:42 AM, Vladimir Sokolovsky wrote:
>>
>>> Hi Jeff,
>>> Please remove %build macro from the RPM spec file.
>>> On SuSE distros it removes RPM_BUILD_ROOT.
>>>
>>> Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.23343
>>> + umask 022
>>> + cd /var/tmp/OFEDRPM/BUILD
>>> + /bin/rm -rf /var/tmp/OFED
>>> ++ dirname /var/tmp/OFED
>>> + /bin/mkdir -p /var/tmp
>>> + /bin/mkdir /var/tmp/OFED
>>> + cd openmpi-1.2b4ofedr13470
>>> + fortify_source=1
>>> + test '' '!=' ''
>>> ...
>>>
>>> -- 
>>> Vladimir Sokolovsky <vlad at dev.mellanox.co.il>
>>> Mellanox Technologies Ltd.
>>
>>


-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


From mst at mellanox.co.il  Wed Feb  7 10:24:26 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 7 Feb 2007 20:24:26 +0200
Subject: [openib-general] [PATCH] IB/ipoib get net_device from
 ipoib_neigh instead of linux neighbour
In-Reply-To: <45C9F1DE.8090409@voltaire.com>
References: <45C9F1DE.8090409@voltaire.com>
Message-ID: <20070207182426.GB9131@mellanox.co.il>

> Quoting Moni Shoua <monis at voltaire.com>:
> Subject: Re: [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour
> 
> 
> > Another concern: assume that one device goes away (e.g. hotplug).
> > It seems that neighbours whose dev field points to another device will not be destroyed.
> > Correct?
>
> I agree.
>
> > Therefore in your design, it seems that to_ipoib_neigh()->dev
> > will get us a pointer to a device that has been removed already.
> > 
> I agree that this is a problem.

I think we can solve this if we track all ipoib neighbours, like we do for old kernels,
and then flush ipoib neighbours on any hotplug event.
Roland, does this sound too awful?

> I think it would be best to prevent an IPoIB device
> from disappearing, or ib_ipoib from being unloaded, as long as the IPoIB
> device is a slave. Unfortunately, I don't see how this can be done just
> by fixing something in bonding or IPoIB. 

So hotplug is blocked potentially forever?
This does not sound good.

> However, any slave knows he has a master (dev->master). 
> What do you think about a solution where IPoIB first tries to clean up the
> neighbours that belong to its master before deleting the IPoIB device?

How?

> >> Furthermore, bond_setup_by_slave is called only for non-Ethernet
> >> devices (we are considering changing the logic to "called only for
> >> IPoIB devices", just for safety).
> > 
> > Why is this necessary, BTW?
> > 
> If we don't do that, we get a memory leak because the neigh destructor will
> never be called for non IPoIB devices although they carry ipoib_neigh
> with them.

How can this happen? If it does, I think we are back to where we started:
to_ipoib_neigh is broken for non-IPoIB device.
I thought you said only devices of the same type can be paired?


-- 
MST


From rdreier at cisco.com  Wed Feb  7 10:39:48 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 07 Feb 2007 10:39:48 -0800
Subject: [openib-general] [PATCHv6 RFC] IPoIB CM Experimental support
In-Reply-To: <20070207075339.GB20290@mellanox.co.il> (Michael S.
	Tsirkin's message of "Wed, 7 Feb 2007 09:53:39 +0200")
References: <adaveiesyrb.fsf@cisco.com> <20070207075339.GB20290@mellanox.co.il>
Message-ID: <adazm7pizgr.fsf@cisco.com>

 > Well, randomness is a resource after all, and since we don't have the additional
 > security provided by PSNs in IPoIB UD, it seemed we do not need it for
 > IPoIB CM either. So maybe the right thing is just to remove the FIXME comment.

random32() doesn't use up any entropy. Random PSNs help avoid problems
with stale connections, so I think we should do it.

I noticed some funny code in ipoib_cm_skb_reap():

	__be32 mtu = cpu_to_be32(priv->mcast_mtu);

// htonl(__be32)??
			icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));
// no htonl() here -- is this correct?
			icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, dev);

what is the right thing?

 - R.


From mshefty at ichips.intel.com  Wed Feb  7 10:55:00 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 07 Feb 2007 10:55:00 -0800
Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for
 unicast packets
In-Reply-To: <20070126180840.GD12386@obsidianresearch.com>
References: <20070126000319.GA12386@obsidianresearch.com>
	<ada4pqel66k.fsf@cisco.com>
	<20070126180840.GD12386@obsidianresearch.com>
Message-ID: <45CA2084.7090503@ichips.intel.com>

> Oops, I'll fix these style things and send a new patch.

Jason, what's the status of this patch?  (I ask because I'm starting to look at 
router support in the stack.)

- Sean


From mst at mellanox.co.il  Wed Feb  7 10:57:45 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 7 Feb 2007 20:57:45 +0200
Subject: [openib-general] [PATCHv6 RFC] IPoIB CM Experimental support
In-Reply-To: <adazm7pizgr.fsf@cisco.com>
References: <adaveiesyrb.fsf@cisco.com>
	<20070207075339.GB20290@mellanox.co.il> <adazm7pizgr.fsf@cisco.com>
Message-ID: <20070207185745.GD9131@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCHv6 RFC] IPoIB CM Experimental support
> 
>  > Well, randomness is a resource after all, and since we don't have the additional
>  > security provided by PSNs in IPoIB UD, it seemed we do not need it for
>  > IPoIB CM either. So maybe the right thing is just to remove the FIXME comment.
> 
> random32() doesn't use up any entropy. Random PSNs help avoid problems
> with stale connections, so I think we should do it.

Well, stale connections don't pose any real problems for IPoIB CM - worst case a
connection is torn down and recreated.  But I don't have a strong opinion
anyway - that's why I put the FIXME there. So I'm OK with random32, too.

> I noticed some funny code in ipoib_cm_skb_reap():
> 
> 	__be32 mtu = cpu_to_be32(priv->mcast_mtu);
> 
> // htonl(__be32)??
> 			icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));
> // no htonl() here -- is this correct?
> 			icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, dev);
> 
> what is the right thing?

Both are right I think.
These two functions seem to accept parameters in different format:

include/net/icmp.h:extern void  icmp_send(struct sk_buff *skb_in,  int type, int
					  code, __be32 info);


include/linux/icmpv6.h:extern void                icmpv6_send(struct sk_buff *skb,
include/linux/icmpv6.h-                                       int type, int code,
include/linux/icmpv6.h-                                       __u32 info,
include/linux/icmpv6.h-                                       struct net_device *dev);

BTW, I just looked at ip_gre.c and it has the same code.

-- 
MST


From ardavis at ichips.intel.com  Wed Feb  7 11:03:20 2007
From: ardavis at ichips.intel.com (Arlin Davis)
Date: Wed, 07 Feb 2007 11:03:20 -0800
Subject: [openib-general] dapltest?
In-Reply-To: <1170862218.14381.4.camel@stevo-desktop>
References: <1170862218.14381.4.camel@stevo-desktop>
Message-ID: <45CA2278.3090309@ichips.intel.com>

Steve Wise wrote:

>Hey Arlin,
>
>Shouldn't dapl/test be shipped with OFED?  It appears not to be...
>  
>

Yes,  I will try to get to this by next week at the latest. Can you add 
a bugzilla report to track against?

-arlin


From rdreier at cisco.com  Wed Feb  7 11:03:46 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 07 Feb 2007 11:03:46 -0800
Subject: [openib-general] [PATCHv6 RFC] IPoIB CM Experimental support
In-Reply-To: <20070207185745.GD9131@mellanox.co.il> (Michael S.
	Tsirkin's message of "Wed, 7 Feb 2007 20:57:45 +0200")
References: <adaveiesyrb.fsf@cisco.com>
	<20070207075339.GB20290@mellanox.co.il> <adazm7pizgr.fsf@cisco.com>
	<20070207185745.GD9131@mellanox.co.il>
Message-ID: <adazm7phjsd.fsf@cisco.com>

 > > I noticed some funny code in ipoib_cm_skb_reap():
 > > 
 > > 	__be32 mtu = cpu_to_be32(priv->mcast_mtu);
 > > 
 > > // htonl(__be32)??
 > > 			icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));
 > > // no htonl() here -- is this correct?
 > > 			icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, dev);
 > > 
 > > what is the right thing?
 > 
 > Both are right I think.

You're right -- the mistake is making mtu __be32 and preswapping it.
I'll fix it up in my tree.
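
For illustration only, here is a small userspace sketch of the two
patterns. It assumes htonl() from <arpa/inet.h> as a stand-in for the
kernel's cpu_to_be32(); the helper names are made up for this example and
none of this is the actual ipoib_cm_skb_reap() code.

```c
#include <stdint.h>
#include <arpa/inet.h>   /* htonl(), ntohl() */

/* Problem pattern: converted once when stored, then again at the call. */
uint32_t icmp_info_double_swapped(uint32_t host_mtu)
{
    uint32_t mtu = htonl(host_mtu);   /* stands in for cpu_to_be32() */
    return htonl(mtu);                /* the two conversions cancel on any host */
}

/* Keep mtu in host order and convert only for the IPv4 call. */
uint32_t icmp_info_v4(uint32_t host_mtu)
{
    return htonl(host_mtu);           /* icmp_send() takes a __be32 info */
}

/* The IPv6 call takes the value unchanged. */
uint32_t icmp_info_v6(uint32_t host_mtu)
{
    return host_mtu;                  /* icmpv6_send() takes a __u32 info */
}
```

With the preswapped variable, the IPv4 path ends up passing a host-order
value where a big-endian one is expected, and the IPv6 path passes a
big-endian value where a host-order one is expected. Keeping mtu as a
plain int avoids both.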

 > These two functions seem to accept parameters in different format:
 > 
 > include/net/icmp.h:extern void  icmp_send(struct sk_buff *skb_in,  int type, int
 > 					  code, __be32 info);
 > 
 > 
 > include/linux/icmpv6.h:extern void                icmpv6_send(struct sk_buff *skb,
 > include/linux/icmpv6.h-                                       int type, int code,
 > include/linux/icmpv6.h-                                       __u32 info,
 > include/linux/icmpv6.h-                                       struct net_device *dev);
 > 
 > BTW, I just looked at ip_gre.c and it has the same code.

no, it leaves mtu as an int rather than swapping it.

 - R.


From sweitzen at cisco.com  Wed Feb  7 11:07:06 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Wed, 7 Feb 2007 11:07:06 -0800
Subject: [openib-general] dapltest?
In-Reply-To: <45CA2278.3090309@ichips.intel.com>
References: <1170862218.14381.4.camel@stevo-desktop>
	<45CA2278.3090309@ichips.intel.com>
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA302F9BCDA@xmb-sjc-216.amer.cisco.com>

I opened bug 350; I would like dapltest (and any other useful dapl test
programs) too.

Scott 

> -----Original Message-----
> From: openib-general-bounces at openib.org 
> [mailto:openib-general-bounces at openib.org] On Behalf Of Arlin Davis
> Sent: Wednesday, February 07, 2007 11:03 AM
> To: Steve Wise
> Cc: openib-general; Arlin Davis
> Subject: Re: [openib-general] dapltest?
> 
> Steve Wise wrote:
> 
> >Hey Arlin,
> >
> >Shouldn't dapl/test be shipped with OFED?  It appears not to be...
> >  
> >
> 
> Yes,  I will try to get to this by next week at the latest. 
> Can you add 
> a bugzilla report to track against?
> 
> -arlin
> 
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit 
> http://openib.org/mailman/listinfo/openib-general
> 


From rdreier at cisco.com  Wed Feb  7 11:13:58 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 07 Feb 2007 11:13:58 -0800
Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for
 unicast packets
In-Reply-To: <20070207191154.GC11411@obsidianresearch.com> (Jason
	Gunthorpe's message of "Wed, 7 Feb 2007 12:11:54 -0700")
References: <20070126000319.GA12386@obsidianresearch.com>
	<ada4pqel66k.fsf@cisco.com>
	<20070126180840.GD12386@obsidianresearch.com>
	<45CA2084.7090503@ichips.intel.com>
	<20070207191154.GC11411@obsidianresearch.com>
Message-ID: <adad54lhjbd.fsf@cisco.com>

 > I was going to resend it after Roland's earlier patch to clean up the 
 > ib_init_ah_from_path was accepted..

Sorry, I started having second thoughts about the part about changing
it to return void (it seems more sensible to check it in the other places
it's called).  But I'll look at that again soon.

 - R.


From jgunthorpe at obsidianresearch.com  Wed Feb  7 11:11:54 2007
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Wed, 7 Feb 2007 12:11:54 -0700
Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for
 unicast packets
In-Reply-To: <45CA2084.7090503@ichips.intel.com>
References: <20070126000319.GA12386@obsidianresearch.com>
	<ada4pqel66k.fsf@cisco.com>
	<20070126180840.GD12386@obsidianresearch.com>
	<45CA2084.7090503@ichips.intel.com>
Message-ID: <20070207191154.GC11411@obsidianresearch.com>

On Wed, Feb 07, 2007 at 10:55:00AM -0800, Sean Hefty wrote:
> >Oops, I'll fix these style things and send a new patch.
> 
> Jason, what's the status of this patch?  (I ask because I'm starting to 
> look at router support in the stack.)

I was going to resend it after Roland's earlier patch to clean up the 
ib_init_ah_from_path was accepted..

I didn't get too far on getting CMA to work. Beyond the bad HopLimit
field I was seeing, Hal pointed out a number of problems in IBA that
would prevent it from working as is :<

Jason


From changquing.tang at hp.com  Wed Feb  7 11:38:07 2007
From: changquing.tang at hp.com (Tang, Changqing)
Date: Wed, 7 Feb 2007 19:38:07 -0000
Subject: [openib-general] Immediate data question
In-Reply-To: <ada7iuwp5rr.fsf@cisco.com>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com><adaveigvg7q.fsf@cisco.com><349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net><adatzy0qmt3.fsf@cisco.com><349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net>
	<ada7iuwp5rr.fsf@cisco.com>
Message-ID: <349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net>


Roland:
	This is a followup question. Suppose one process uses
IBV_WR_SEND_WITH_IMM and IBV_SEND_INLINE to send 8 bytes, but the
receiver process does not post the corresponding receive to the QP;
instead, this receiver process and other processes are doing heavy
RDMA_WRITE/READ traffic with each other.

	Does this pending SEND_WITH_IMM message affect the performance
of the receiver process? Is this message buffered in the receiver's
HCA, or does the sender retry and get an RNR ack until the receiver
posts a receive?

	Thanks.

--CQ


> -----Original Message-----
> From: Roland Dreier [mailto:rdreier at cisco.com] 
> Sent: Monday, February 05, 2007 5:03 PM
> To: Tang, Changqing
> Cc: Michael S. Tsirkin; openib-general at openib.org
> Subject: Re: Immediate data question
> 
>     Changqing> Thank you. Other than using immediate data to send
>     Changqing> notification from one end to the other of a QP, is
>     Changqing> there any other way to do this ? For example, can I
>     Changqing> modify QP state from RTS to other state on one end, and
>     Changqing> then the other end gets some notification when I query
>     Changqing> the QP ?
> 
> Not that I know of.  You would need to do something that 
> triggers something to be sent on the wire, and I don't know 
> of any way to do that other than posting a work request.
> 
>  - R.
> 


From mst at mellanox.co.il  Wed Feb  7 11:49:49 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 7 Feb 2007 21:49:49 +0200
Subject: [openib-general] Open MPI rpmbuild fails in OFED-1.2
In-Reply-To: <212D8756-09B1-4637-ADCA-1CD8A535403A@cisco.com>
References: <1170866522.6223.8.camel@vladsk-laptop>
	<7FDCD3BB-A76F-4C36-8939-4E7C634F0D86@cisco.com>
	<1170867620.6223.11.camel@vladsk-laptop>
	<212D8756-09B1-4637-ADCA-1CD8A535403A@cisco.com>
Message-ID: <20070207194949.GB12140@mellanox.co.il>

> Quoting Jeff Squyres <jsquyres at cisco.com>:
> Subject: Re: Open MPI rpmbuild fails in OFED-1.2
> 
> My $0.02: This is another in a growing list of issues suggesting that
> the whole "build everything in DESTDIR" approach is problematic.

I don't know much about RPM, and I am not exactly sure why
our source RPMs are so complicated.

However, with plain configure/make we are able to
build all openfabrics components within the build directory,
without any chroot tricks.

So let's not give up yet. IMO it is very nice to be able to build in
a standard environment, without being root.

Note that what is biting us here is mostly the large number of modules:
simple single-module packages don't have this problem - and this
is really a design decision we took.

-- 
MST


From mst at mellanox.co.il  Wed Feb  7 11:55:19 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 7 Feb 2007 21:55:19 +0200
Subject: [openib-general] [PATCHv6 RFC] IPoIB CM Experimental support
In-Reply-To: <adazm7phjsd.fsf@cisco.com>
References: <adaveiesyrb.fsf@cisco.com>
	<20070207075339.GB20290@mellanox.co.il> <adazm7pizgr.fsf@cisco.com>
	<20070207185745.GD9131@mellanox.co.il> <adazm7phjsd.fsf@cisco.com>
Message-ID: <20070207195519.GC12140@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCHv6 RFC] IPoIB CM Experimental support
> 
>  > > I noticed some funny code in ipoib_cm_skb_reap():
>  > > 
>  > > 	__be32 mtu = cpu_to_be32(priv->mcast_mtu);
>  > > 
>  > > // htonl(__be32)??
>  > > 			icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));
>  > > // no htonl() here -- is this correct?
>  > > 			icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, dev);
>  > > 
>  > > what is the right thing?
>  > 
>  > Both are right I think.
> 
> You're right -- the mistake is making mtu __be32 and preswapping it.
> I'll fix it up in my tree.

Let me know when you push it out, I'll start testing it.

>  > These two functions seem to accept parameters in different format:
>  > 
>  > include/net/icmp.h:extern void  icmp_send(struct sk_buff *skb_in,  int type, int
>  > 					  code, __be32 info);
>  > 
>  > 
>  > include/linux/icmpv6.h:extern void                icmpv6_send(struct sk_buff *skb,
>  > include/linux/icmpv6.h-                                       int type, int code,
>  > include/linux/icmpv6.h-                                       __u32 info,
>  > include/linux/icmpv6.h-                                       struct net_device *dev);
>  > 
>  > BTW, I just looked at ip_gre.c and it has the same code.
> 
> no, it leaves mtu as an int rather than swapping it.

You are right of course. sparse would have found it.

-- 
MST


From swise at opengridcomputing.com  Wed Feb  7 12:02:23 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Wed, 07 Feb 2007 14:02:23 -0600
Subject: [openib-general] dapl broken for iWARP
Message-ID: <1170878543.30334.52.camel@stevo-desktop>

Arlin,

The OFED dapl code is assuming the responder_resources and
initiator_depth passed up on a connection request event are from the
remote peer.  This doesn't happen for iWARP.  In the current iWARP
specifications, it's up to the application to exchange this information
somehow. So these are defaulting to 0 on the server side of any dapl
connection over iWARP.  

This is a fairly recent change, I think.  We need to come up with some
way to deal with this for OFED 1.2 IMO.


Steve.


From mshefty at ichips.intel.com  Wed Feb  7 12:24:08 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 07 Feb 2007 12:24:08 -0800
Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for
 unicast packets
In-Reply-To: <20070207191154.GC11411@obsidianresearch.com>
References: <20070126000319.GA12386@obsidianresearch.com>
	<ada4pqel66k.fsf@cisco.com>
	<20070126180840.GD12386@obsidianresearch.com>
	<45CA2084.7090503@ichips.intel.com>
	<20070207191154.GC11411@obsidianresearch.com>
Message-ID: <45CA3568.1000508@ichips.intel.com>

> I didn't get too far on getting CMA to work. Beyond the bad HopLimit
> field I was seeing, Hal pointed out a number of problems in IBA that
> would prevent it from working as is :<

I've started thinking about what it would take to get the rdma cm to work across 
a router.  I think the rdma cm may need to treat IPv6 addresses as a GID for 
this to work across subnets, versus trying to map an ipoib IP address to a GID 
based on ARP.

- Sean


From mst at mellanox.co.il  Wed Feb  7 11:59:14 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 7 Feb 2007 21:59:14 +0200
Subject: [openib-general] RFC ofed 1 2 kernel file structure
In-Reply-To: <20070206150321.GA21776@mellanox.co.il>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
	<20070206150321.GA21776@mellanox.co.il>
Message-ID: <20070207195914.GD12140@mellanox.co.il>

Repost. Could everyone please look at
git://git.openfabrics.org/~mst/newofed.git
and tell me whether this looks acceptable?

Thanks,
	MST

Quoting r. Michael S. Tsirkin <mst at mellanox.co.il>:
Subject: Re: idea for ofed 1 2 kernel file structure

> Quoting  Michael S. Tsirkin <mst at mellanox.co.il>:
> It would be easy to split OFED-specific files into a separate directory and have OFED
> 
> All out of tree modules we distribute would go there too.
> 
> What do others think about this?

OK, I didn't quite get whether the majority likes this or not,
so I created such a repository, extracted the ofed specific history
and imported it there.

Take a look here:
git://git.openfabrics.org/~mst/newofed.git

Build scripts will have to be adjusted to add
necessary kernel components that we use.

Another nice thing about this layout is that users (if they so wish)
will be able to use just a Linux kernel source tarball instead of the full
Linux kernel git tree.

OFED maintainers, you are the primary users of the OFED git.
Please comment which layout is better for you.

-- 
MST



From mshefty at ichips.intel.com  Wed Feb  7 13:07:48 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 07 Feb 2007 13:07:48 -0800
Subject: [openib-general] RFC ofed 1 2 kernel file structure
In-Reply-To: <20070207195914.GD12140@mellanox.co.il>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
	<20070206150321.GA21776@mellanox.co.il>
	<20070207195914.GD12140@mellanox.co.il>
Message-ID: <45CA3FA4.9050900@ichips.intel.com>

Michael S. Tsirkin wrote:
> Repost. Could everyone please look at
> git://git.openfabrics.org/~mst/newofed.git
> and tell me whether this looks acceptable?

I don't see anything listed for this off of the web site, and cloning it 
produces an empty tree.

- Sean


From HNGUYEN at de.ibm.com  Wed Feb  7 12:56:17 2007
From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen)
Date: Wed, 7 Feb 2007 21:56:17 +0100
Subject: [openib-general] RFC ofed 1 2 kernel file structure
In-Reply-To: <OF2E30C900.7023ABF3-ONC125727B.00729C88-C125727B.0072ED54@LocalDomain>
Message-ID: <OF7664C3CE.3203B60C-ONC125727B.0072FA58-C125727B.007303E9@de.ibm.com>

> I could clone it:
Should be "I could not clone it"


From HNGUYEN at de.ibm.com  Wed Feb  7 12:55:19 2007
From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen)
Date: Wed, 7 Feb 2007 21:55:19 +0100
Subject: [openib-general] RFC ofed 1 2 kernel file structure
In-Reply-To: <20070207195914.GD12140@mellanox.co.il>
Message-ID: <OF2E30C900.7023ABF3-ONC125727B.00729C88-C125727B.0072ED54@de.ibm.com>

Hi Michael,
> Repost. Could everyone please look at
> git://git.openfabrics.org/~mst/newofed.git
> and tell me whether this looks acceptable?
I could clone it:
$git clone git://git.openfabrics.org/~mst/newofed.git
fatal: Unable to look up git.openfabrics.org (Temporary failure in name
resolution)
fetch-pack from 'git://git.openfabrics.org/~mst/newofed.git' failed.
$git clone git://git.openfabrics.org/~mst/newofed.git
fatal: Unable to look up git.openfabrics.org (Temporary failure in name
resolution)
fetch-pack from 'git://git.openfabrics.org/~mst/newofed.git' failed.

I tried to use web git pointing to
http://www.openfabrics.org/git/?p=~mst/newofed.git;a=tree
and got this:
403 Forbidden - Reading tree failed

Is there something else I need to pay attention to?

Thanks
Nam


From mst at mellanox.co.il  Wed Feb  7 13:18:00 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 7 Feb 2007 23:18:00 +0200
Subject: [openib-general] RFC ofed 1 2 kernel file structure
In-Reply-To: <OF2E30C900.7023ABF3-ONC125727B.00729C88-C125727B.0072ED54@de.ibm.com>
References: <OF2E30C900.7023ABF3-ONC125727B.00729C88-C125727B.0072ED54@de.ibm.com>
Message-ID: <20070207211800.GI12140@mellanox.co.il>

> Quoting r. Hoang-Nam Nguyen <HNGUYEN at de.ibm.com>:
> Subject: Re: [openib-general] RFC ofed 1 2 kernel file structure
> 
> Hi Michael,
> > Repost. Could everyone please look at
> > git://git.openfabrics.org/~mst/newofed.git
> > and tell me whether this looks acceptable?
> I could clone it:
> $git clone git://git.openfabrics.org/~mst/newofed.git
> fatal: Unable to look up git.openfabrics.org (Temporary failure in name
> resolution)
> fetch-pack from 'git://git.openfabrics.org/~mst/newofed.git' failed.
> $git clone git://git.openfabrics.org/~mst/newofed.git
> fatal: Unable to look up git.openfabrics.org (Temporary failure in name
> resolution)
> fetch-pack from 'git://git.openfabrics.org/~mst/newofed.git' failed.
> 
> I tried to use web git pointing to
> http://www.openfabrics.org/git/?p=~mst/newofed.git;a=tree
> and got this:
> 403 Forbidden - Reading tree failed
> 
> Is there something else I need to pay attention to?

Pls try again.


-- 
MST


From mst at mellanox.co.il  Wed Feb  7 13:18:23 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 7 Feb 2007 23:18:23 +0200
Subject: [openib-general] RFC ofed 1 2 kernel file structure
In-Reply-To: <45CA3FA4.9050900@ichips.intel.com>
References: <45CA3FA4.9050900@ichips.intel.com>
Message-ID: <20070207211823.GJ12140@mellanox.co.il>

> Quoting Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [openib-general] RFC ofed 1 2 kernel file structure
> 
> Michael S. Tsirkin wrote:
> > Repost. Could everyone please look at
> > git://git.openfabrics.org/~mst/newofed.git
> > and tell me whether this looks acceptable?
> 
> I don't see anything listed for this off of the web site, and cloning it 
> produces an empty tree.

Pls try again now.

-- 
MST


From rdreier at cisco.com  Wed Feb  7 13:26:31 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 07 Feb 2007 13:26:31 -0800
Subject: [openib-general] Immediate data question
In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net>
	(Changqing Tang's message of "Wed, 7 Feb 2007 19:38:07 -0000")
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
	<adaveigvg7q.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net>
	<adatzy0qmt3.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net>
	<ada7iuwp5rr.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net>
Message-ID: <adamz3pfym0.fsf@cisco.com>

    Changqing> 	Does this pending SEND_WITH_IMM message affect the
    Changqing> performance of the receiver process ? Is this message
    Changqing> buffered in the receiver's HCA, or the sender retry and
    Changqing> get RNR ack until receiver posts a receive ?

If no receive is pending, then the responder sends an RNR NAK and the
sender will wait for the RNR timeout and retry, etc.
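
As a rough illustration of that retry loop, here is a toy model in plain
C. It is only a sketch: the function name and the "attempt" counter are
invented for the example, and the only detail taken from IB itself is
that an rnr_retry value of 7 means "retry indefinitely". A real sender
also waits for the RNR timer between attempts, which the model skips.

```c
#include <stdbool.h>

#define RNR_RETRY_INFINITE 7   /* rnr_retry value 7 means "retry forever" */

/*
 * Toy model: the responder has a receive posted only from attempt number
 * recv_posted_at onward (0-based). Returns the number of RNR NAKs the
 * sender saw before the send completed, or -1 if the retry budget ran
 * out first.
 */
int send_with_rnr_retry(int rnr_retry, int recv_posted_at)
{
    int naks = 0;

    for (int attempt = 0; ; attempt++) {
        bool recv_ready = attempt >= recv_posted_at;

        if (recv_ready)
            return naks;          /* message consumed a posted receive */

        naks++;                   /* responder answered with an RNR NAK */
        if (rnr_retry != RNR_RETRY_INFINITE && naks > rnr_retry)
            return -1;            /* retries exhausted, the send fails */
        /* a real sender would wait for the RNR timeout here */
    }
}
```

In this model nothing is buffered on the responder side: every attempt
made before the receive is posted costs one NAK and one more wait on the
sender.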

 - R.


From jgunthorpe at obsidianresearch.com  Wed Feb  7 13:31:08 2007
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Wed, 7 Feb 2007 14:31:08 -0700
Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for
 unicast packets
In-Reply-To: <45CA3568.1000508@ichips.intel.com>
References: <20070126000319.GA12386@obsidianresearch.com>
	<ada4pqel66k.fsf@cisco.com>
	<20070126180840.GD12386@obsidianresearch.com>
	<45CA2084.7090503@ichips.intel.com>
	<20070207191154.GC11411@obsidianresearch.com>
	<45CA3568.1000508@ichips.intel.com>
Message-ID: <20070207213108.GD11411@obsidianresearch.com>

On Wed, Feb 07, 2007 at 12:24:08PM -0800, Sean Hefty wrote:
> >I didn't get too far on getting CMA to work. Beyond the bad HopLimit
> >field I was seeing, Hal pointed out a number of problems in IBA that
> >would prevent it from working as is :<
> 
> I've started thinking about what it would take to get the rdma cm to work 
> across a router.  I think the rdma cm may need to treat IPv6 addresses as a 
> GID for this to work across subnets, versus trying to map an ipoib IP 
> address to a GID based on ARP.

I don't think that is the main problem - though clearly the way things
are now (for better or worse) rdma cm requires the IPoIB subnet to
span all of the IB subnets.. The main problem with the protocol is in
the LID selection for routed paths on the passive side. It can't rely
on the active side to identify the lids if a router is involved.

One feature I've thought has been underused in IBA is the raw IPv6
packet feature. It would be nice to have a linux netdev interface to
be able to do IPv6 traffic using GID addressing. That would seem to me
to be the natural way to bolt native GID addressing into rdma
cm..

Jason


From rdreier at cisco.com  Wed Feb  7 13:35:25 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 07 Feb 2007 13:35:25 -0800
Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for
 unicast packets
In-Reply-To: <45CA3568.1000508@ichips.intel.com> (Sean Hefty's message
	of "Wed, 07 Feb 2007 12:24:08 -0800")
References: <20070126000319.GA12386@obsidianresearch.com>
	<ada4pqel66k.fsf@cisco.com>
	<20070126180840.GD12386@obsidianresearch.com>
	<45CA2084.7090503@ichips.intel.com>
	<20070207191154.GC11411@obsidianresearch.com>
	<45CA3568.1000508@ichips.intel.com>
Message-ID: <ada7iutfy76.fsf@cisco.com>

 > I've started thinking about what it would take to get the rdma cm to
 > work across a router.  I think the rdma cm may need to treat IPv6
 > addresses as a GID for this to work across subnets, versus trying to
 > map an ipoib IP address to a GID based on ARP.

Hmm, why is that?  Shouldn't IPoIB work through a router, and
correctly get the GID of the final destination via ARP just fine?

If the RDMA CM treats IPv6 addresses as GIDs, then this breaks things
on a normal subnet with IPoIB interfaces configured with IPv6 addresses.

 - R.


From swise at opengridcomputing.com  Wed Feb  7 13:57:40 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Wed, 07 Feb 2007 15:57:40 -0600
Subject: [openib-general] dapl broken for iWARP
In-Reply-To: <1170878543.30334.52.camel@stevo-desktop>
References: <1170878543.30334.52.camel@stevo-desktop>
Message-ID: <1170885460.31481.0.camel@stevo-desktop>


On Wed, 2007-02-07 at 14:02 -0600, Steve Wise wrote:
> Arlin,
> 
> The OFED dapl code is assuming the responder_resources and
> initiator_depth passed up on a connection request event are from the
> remote peer.  This doesn't happen for iWARP.  In the current iWARP
> specifications, it's up to the application to exchange this information
> somehow. So these are defaulting to 0 on the server side of any dapl
> connection over iWARP.  
> 
> This is a fairly recent change, I think.  We need to come up with some
> way to deal with this for OFED 1.2 IMO.
> 

The IWCM could set these to the device max values for instance.
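
Something along these lines, as a sketch only. The struct and function
names below are hypothetical stand-ins, not the real IWCM or verbs
structures; the point is just the fallback rule.

```c
#include <stdint.h>

/* Hypothetical stand-in for the device limits. */
struct dev_limits {
    uint8_t max_responder_resources;
    uint8_t max_initiator_depth;
};

/* Hypothetical stand-in for the parameters passed up with a connect request. */
struct conn_req_params {
    uint8_t responder_resources;
    uint8_t initiator_depth;
};

/* Any value the transport left at 0 is replaced by the device maximum. */
void fill_missing_from_device(struct conn_req_params *p,
                              const struct dev_limits *dev)
{
    if (p->responder_resources == 0)
        p->responder_resources = dev->max_responder_resources;
    if (p->initiator_depth == 0)
        p->initiator_depth = dev->max_initiator_depth;
}
```

One caveat with this sketch: it treats 0 as "not supplied", so a peer
that really wants 0 cannot be told apart from a transport that supplied
nothing.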

Steve.


From jgunthorpe at obsidianresearch.com  Wed Feb  7 14:03:04 2007
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Wed, 7 Feb 2007 15:03:04 -0700
Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for
 unicast packets
In-Reply-To: <ada7iutfy76.fsf@cisco.com>
References: <20070126000319.GA12386@obsidianresearch.com>
	<ada4pqel66k.fsf@cisco.com>
	<20070126180840.GD12386@obsidianresearch.com>
	<45CA2084.7090503@ichips.intel.com>
	<20070207191154.GC11411@obsidianresearch.com>
	<45CA3568.1000508@ichips.intel.com> <ada7iutfy76.fsf@cisco.com>
Message-ID: <20070207220304.GE11411@obsidianresearch.com>

On Wed, Feb 07, 2007 at 01:35:25PM -0800, Roland Dreier wrote:
> Hmm, why is that?  Shouldn't IPoIB work through a router, and
> correctly get the GID of the final destination via ARP just fine?

Basically, if IB routers are used, and the IPoIB feature of *not*
spanning a subnet is used (for scalability?) then you need an
alternate way to specify addresses to rdma cm.

I agree that special casing some IPv6 addresses is a bad idea. It
needs to be integrated correctly with NET and the routing table/etc

Jason


From mst at mellanox.co.il  Wed Feb  7 14:14:49 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 8 Feb 2007 00:14:49 +0200
Subject: [openib-general] RFC ofed 1 2 kernel file structure
In-Reply-To: <20070207195914.GD12140@mellanox.co.il>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
	<20070206150321.GA21776@mellanox.co.il>
	<20070207195914.GD12140@mellanox.co.il>
Message-ID: <20070207221449.GL12140@mellanox.co.il>

> Quoting Michael S. Tsirkin <mst at mellanox.co.il>:
> Subject: RFC ofed 1 2 kernel file structure
> 
> Repost. Could everyone please look at
> git://git.openfabrics.org/~mst/newofed.git
> and tell me whether this looks acceptable?

All, pls try now.
	
-- 
MST


From bos at pathscale.com  Wed Feb  7 14:18:56 2007
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Wed, 07 Feb 2007 14:18:56 -0800
Subject: [openib-general] RFC ofed 1 2 kernel file structure
In-Reply-To: <20070207221449.GL12140@mellanox.co.il>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
	<20070206150321.GA21776@mellanox.co.il>
	<20070207195914.GD12140@mellanox.co.il>
	<20070207221449.GL12140@mellanox.co.il>
Message-ID: <45CA5050.2070105@pathscale.com>

Michael S. Tsirkin wrote:

> All, pls try now.

This is similar in layout to the sort of tree we've used internally all 
along, so it's fine by me.  One small problem: I don't like the 
combination of lower and upper case names of makefile and Makefile in 
the top-level directory.

Also, it's no longer obvious to me how to tell which kernel version the 
sources are pulled from.  I used to be able to check the top-level 
Makefile or git history, but I no longer know what to look at.

	<b


From changquing.tang at hp.com  Wed Feb  7 14:28:48 2007
From: changquing.tang at hp.com (Tang, Changqing)
Date: Wed, 7 Feb 2007 22:28:48 -0000
Subject: [openib-general] Immediate data question
In-Reply-To: <adamz3pfym0.fsf@cisco.com>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
	<adaveigvg7q.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net>
	<adatzy0qmt3.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net>
	<ada7iuwp5rr.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net>
	<adamz3pfym0.fsf@cisco.com>
Message-ID: <349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net>

>     Changqing> 	Does this pending SEND_WITH_IMM message 
> affect the
>     Changqing> performance of the receiver process ? Is this message
>     Changqing> buffered in the receiver's HCA, or the sender retry and
>     Changqing> get RNR ack until receiver posts a receive ?
> 
> If no receive is pending, then the responder sends an RNR NAK 
> and the sender will wait for the RNR timeout and retry, etc.

What I mean is: is there any penalty to the receiver's overall
performance if RNR happens continuously on one of its QPs?

--CQ


> 
>  - R.
> 


From mshefty at ichips.intel.com  Wed Feb  7 14:31:10 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 07 Feb 2007 14:31:10 -0800
Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for
 unicast packets
In-Reply-To: <20070207220304.GE11411@obsidianresearch.com>
References: <20070126000319.GA12386@obsidianresearch.com>
	<ada4pqel66k.fsf@cisco.com>
	<20070126180840.GD12386@obsidianresearch.com>
	<45CA2084.7090503@ichips.intel.com>
	<20070207191154.GC11411@obsidianresearch.com>
	<45CA3568.1000508@ichips.intel.com> <ada7iutfy76.fsf@cisco.com>
	<20070207220304.GE11411@obsidianresearch.com>
Message-ID: <45CA532E.3050605@ichips.intel.com>

> Basically, if IB routers are used, and the IPoIB feature of *not*
> spanning a subnet is used (for scalability?) then you need an
> alternate way to specify addresses to rdma cm.

This was the case I was thinking of.  Without global IB name service resolution, 
how do you get the GID of the remote system?

> I agree that special casing some IPv6 addresses is a bad idea. It
> needs to be integrated correctly with NET and the routing table/etc

I haven't given this more than a few minutes of thought, but I was thinking more 
along the lines of a port having an assigned GID that's the same as an assigned 
IPv6 address.  (Is there some reason this wouldn't work?)  IP name service 
resolution would map the name to the IPv6 address.  The mapping from the IPv6 
address to a GID would then be straightforward, as opposed to using a mapping 
using ARP.

If name service resolution gives me an IPv6 address that's off of the local 
subnet, but the ARP response gives me an address that's on the local subnet, 
then I think we can assume that ARP was unsuccessful in resolving the address to 
the remote GID.  (I.e. the GID should be for a router.)  If this is true, then 
we need some other way to acquire the DGID.
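
The check described above can be sketched as follows.  This is only an
illustration, assuming (as proposed in this message) that a port's GID is
identical to its assigned IPv6 address, so that the subnet prefix is simply
the upper 64 bits.  The function names are hypothetical and not part of any
rdma cm API.

```python
import ipaddress


def gid_subnet_prefix(gid: str) -> int:
    """Return the upper 64 bits (the subnet prefix) of a GID in IPv6 notation."""
    return int(ipaddress.IPv6Address(gid)) >> 64


def arp_gid_is_probably_router(local_gid: str, resolved_addr: str, arp_gid: str) -> bool:
    """Heuristic: name service returned an off-subnet address, but ARP
    returned an on-subnet GID, so that GID most likely belongs to a router
    and the real DGID must be obtained some other way."""
    local = gid_subnet_prefix(local_gid)
    off_subnet_target = gid_subnet_prefix(resolved_addr) != local
    on_subnet_answer = gid_subnet_prefix(arp_gid) == local
    return off_subnet_target and on_subnet_answer
```

With a local GID of 2000::c2, a resolved address of 2001::b1 and an ARP
answer of 2000::a0, the function reports that the ARP result is probably a
router GID.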

- Sean


From pw at osc.edu  Wed Feb  7 14:31:46 2007
From: pw at osc.edu (Pete Wyckoff)
Date: Wed, 7 Feb 2007 17:31:46 -0500
Subject: [openib-general] sharing qp between user and kernel
Message-ID: <20070207223146.GA28637@osc.edu>

We're writing a kernel module that is an IB verbs consumer.  The
plan was to connect up the QP in userspace and do some preliminary
communication, then hand the QP to the kernel and let it use the QP
directly to do some more communication.  This works fine on ammasso,
but fails on mthca.

In particular, this code in mthca_alloc_wqe_buf():

        /*
         * If this is a userspace QP, we don't actually have to 
         * allocate anything.  All we need is to calculate the WQE
         * sizes and the send_wqe_offset, so we're done now.
         */
        if (pd->ibpd.uobject)
                return 0;

prevents the allocation of space for WQEs required by
kernel-initiated posts.  Just commenting out this section led to
failures elsewhere (local prot error on a userspace cq poll for a
receive).

Before I dig into this anymore, do you expect this to work?  Are
there fundamental problems with QP sharing between user and kernel?
It would sure be nice not to have to stick the connection management
aspects into the kernel.

		-- Pete


From swise at opengridcomputing.com  Wed Feb  7 14:40:48 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Wed, 07 Feb 2007 16:40:48 -0600
Subject: [openib-general] sharing qp between user and kernel
In-Reply-To: <20070207223146.GA28637@osc.edu>
References: <20070207223146.GA28637@osc.edu>
Message-ID: <1170888048.31481.3.camel@stevo-desktop>

On Wed, 2007-02-07 at 17:31 -0500, Pete Wyckoff wrote:
> is an IB verbs consumer.  The
> plan was to connect up the QP in userspace and do some preliminary
> communication, then hand the QP to the kernel and let it use the QP
> directly to do some more communication.  This works fine on ammasso,
> but fails on mthca. 

I think the only reason it works on ammasso is because ammasso doesn't
do any kernel bypass.  

For devices that _do_ kernel bypass, I'm not sure it will work.  

It will _not_ work for the Chelsio iWARP device as it's implemented
today.  Once the decision is made to do kernel bypass, the kernel loses
track of the state of the resources shared by HW and library.

Steve.


From mshefty at ichips.intel.com  Wed Feb  7 14:40:51 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 07 Feb 2007 14:40:51 -0800
Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for
 unicast packets
In-Reply-To: <20070207213108.GD11411@obsidianresearch.com>
References: <20070126000319.GA12386@obsidianresearch.com>
	<ada4pqel66k.fsf@cisco.com>
	<20070126180840.GD12386@obsidianresearch.com>
	<45CA2084.7090503@ichips.intel.com>
	<20070207191154.GC11411@obsidianresearch.com>
	<45CA3568.1000508@ichips.intel.com>
	<20070207213108.GD11411@obsidianresearch.com>
Message-ID: <45CA5573.80802@ichips.intel.com>

> I don't think that is the main problem - though clearly the way things
> are now (for better or worse) rdma cm requires the IPoIB subnet to
> span all of the IB subnets.. The main problem with the protocol is in
> the LID selection for routed paths on the passive side. It can't rely
> on the active side to identify the lids if a router is involved.

Are you referring to the SLID in the CM REQ?  If so, I've been looking at this 
issue as well.  I simply cannot think of any way to come up with this LID, and 
my current solution is to punt this problem over to the passive side, which 
could use the SLID of the router that the CM REQ is received from.  If not, 
well, then I just rambled more than usual.

- Sean


From jgunthorpe at obsidianresearch.com  Wed Feb  7 14:49:28 2007
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Wed, 7 Feb 2007 15:49:28 -0700
Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for
 unicast packets
In-Reply-To: <45CA5573.80802@ichips.intel.com>
References: <20070126000319.GA12386@obsidianresearch.com>
	<ada4pqel66k.fsf@cisco.com>
	<20070126180840.GD12386@obsidianresearch.com>
	<45CA2084.7090503@ichips.intel.com>
	<20070207191154.GC11411@obsidianresearch.com>
	<45CA3568.1000508@ichips.intel.com>
	<20070207213108.GD11411@obsidianresearch.com>
	<45CA5573.80802@ichips.intel.com>
Message-ID: <20070207224928.GF11411@obsidianresearch.com>

On Wed, Feb 07, 2007 at 02:40:51PM -0800, Sean Hefty wrote:
> Are you referring to the SLID in the CM REQ?  If so, I've been looking at 
> this issue as well.  I simply cannot think of any way to come up with this 
> LID, and my current solution is to punt this problem over to the passive 
> side, which could use the SLID of the router that the CM REQ is received 
> from.  If not, well, then I just rambled more than usual.

Yes, this is the problem.

The active side clearly cannot learn what the SLID of the passive
side's router should be.

We don't want to have the routers snoop and alter CM GMPs.

The passive side cannot use information from the LRH to get the router
LID since the LRH may not be reversible.

The only option seems to be to have the passive side do a path record
query on a SGID in the CM REQ...

This is a spec problem unfortunately.

Jason


From ardavis at ichips.intel.com  Wed Feb  7 15:05:38 2007
From: ardavis at ichips.intel.com (Arlin Davis)
Date: Wed, 07 Feb 2007 15:05:38 -0800
Subject: [openib-general] dapl broken for iWARP
In-Reply-To: <1170885460.31481.0.camel@stevo-desktop>
References: <1170878543.30334.52.camel@stevo-desktop>
	<1170885460.31481.0.camel@stevo-desktop>
Message-ID: <45CA5B42.6090503@ichips.intel.com>

Steve Wise wrote:

>On Wed, 2007-02-07 at 14:02 -0600, Steve Wise wrote:
>  
>
>>Arlin,
>>
>>The OFED dapl code is assuming the responder_resources and
>>initiator_depth passed up on a connection request event are from the
>>remote peer.  This doesn't happen for iWARP.  In the current iWARP
>>specifications, it's up to the application to exchange this information
>>somehow. So these are defaulting to 0 on the server side of any dapl
>>connection over iWARP.  
>>
>>This is a fairly recent change, I think.  We need to come up with some
>>way to deal with this for OFED 1.2 IMO.
>>    
>>
Yes, this was changed recently to sync up with the rdma_cm changes that 
exposed the values.

>>    
>>
>
>The IWCM could set these to the device max values for instance.
>  
>
That would work fine as long as you know the remote settings will be 
equal or better. The provider just sets the min of local device max 
values and the remote values provided with the request.
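
As a rough sketch of the min() rule described above (the ConnParams type
and the provider_accept name are hypothetical, not the dapl or rdma_cm
API), assuming each value is clamped independently:

```python
from dataclasses import dataclass


@dataclass
class ConnParams:
    responder_resources: int
    initiator_depth: int


def provider_accept(local_max: ConnParams, from_request: ConnParams) -> ConnParams:
    """Clamp each value to the smaller of the local device maximum and the
    value delivered with the connection request event."""
    return ConnParams(
        responder_resources=min(local_max.responder_resources,
                                from_request.responder_resources),
        initiator_depth=min(local_max.initiator_depth,
                            from_request.initiator_depth),
    )
```

If the request event carries zeros, as it does on the server side over
iWARP today, the result is zero for both values.  If the IWCM reports the
device max instead, the result is simply the local device max.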

-arlin


From mshefty at ichips.intel.com  Wed Feb  7 15:09:17 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 07 Feb 2007 15:09:17 -0800
Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for
 unicast packets
In-Reply-To: <20070207224928.GF11411@obsidianresearch.com>
References: <20070126000319.GA12386@obsidianresearch.com>
	<ada4pqel66k.fsf@cisco.com>
	<20070126180840.GD12386@obsidianresearch.com>
	<45CA2084.7090503@ichips.intel.com>
	<20070207191154.GC11411@obsidianresearch.com>
	<45CA3568.1000508@ichips.intel.com>
	<20070207213108.GD11411@obsidianresearch.com>
	<45CA5573.80802@ichips.intel.com>
	<20070207224928.GF11411@obsidianresearch.com>
Message-ID: <45CA5C1D.4060009@ichips.intel.com>

> We don't want to have the routers snoop and alter CM GMPs.

agreed

> The passive side cannot use information from the LRH to get the router
> LID since the LRH may not be reversible.

argh... I was interpreting symmetric paths at the network layer (SGID to DGID) 
and applying it at the link layer as well.  (See the last couple of sentences on 
page 222 of the spec.)

> The only option seems to be to have the passive side do a path record
> query on a SGID in the CM REQ...

I've thought of that too, and it is what Yaron mentioned in his OFA DevCon 
slides.  I'd just like to avoid adding even more complexity to the ib_cm 
state management if at all possible.

> This is a spec problem unfortunately.

aye...

- Sean


From swise at opengridcomputing.com  Wed Feb  7 15:12:09 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Wed, 07 Feb 2007 17:12:09 -0600
Subject: [openib-general] dapl broken for iWARP
In-Reply-To: <45CA5B42.6090503@ichips.intel.com>
References: <1170878543.30334.52.camel@stevo-desktop>
	<1170885460.31481.0.camel@stevo-desktop>
	<45CA5B42.6090503@ichips.intel.com>
Message-ID: <1170889929.31481.11.camel@stevo-desktop>

On Wed, 2007-02-07 at 15:05 -0800, Arlin Davis wrote:
> Steve Wise wrote:
> 
> >On Wed, 2007-02-07 at 14:02 -0600, Steve Wise wrote:
> >  
> >
> >>Arlin,
> >>
> >>The OFED dapl code is assuming the responder_resources and
> >>initiator_depth passed up on a connection request event are from the
> >>remote peer.  This doesn't happen for iWARP.  In the current iWARP
> >>specifications, it's up to the application to exchange this information
> >>somehow. So these are defaulting to 0 on the server side of any dapl
> >>connection over iWARP.  
> >>
> >>This is a fairly recent change, I think.  We need to come up with some
> >>way to deal with this for OFED 1.2 IMO.
> >>    
> >>
> Yes, this was changed recently to sync up with the rdma_cm changes that 
> exposed the values.
> 
> >>    
> >>
> >
> >The IWCM could set these to the device max values for instance.
> >  
> >
> That would work fine as long as you know the remote settings will be 
> equal or better. The provider just sets the min of local device max 
> values and the remote values provided with the request.
> 

I know Krishna Kumar is working on a solution for exchanging this info
in private data so the IWCM can "do the right thing".  Stay tuned for a
patch series to review for this.  But this functionality is definitely
post OFED-1.2.  


So for OFED 1.2, I will set these to the device max in the IWCM.
Assuming the other side is OFED 1.2 DAPL, then it will work fine.

Steve.


From jgunthorpe at obsidianresearch.com  Wed Feb  7 15:33:57 2007
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Wed, 7 Feb 2007 16:33:57 -0700
Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for
 unicast packets
In-Reply-To: <45CA532E.3050605@ichips.intel.com>
References: <20070126000319.GA12386@obsidianresearch.com>
	<ada4pqel66k.fsf@cisco.com>
	<20070126180840.GD12386@obsidianresearch.com>
	<45CA2084.7090503@ichips.intel.com>
	<20070207191154.GC11411@obsidianresearch.com>
	<45CA3568.1000508@ichips.intel.com> <ada7iutfy76.fsf@cisco.com>
	<20070207220304.GE11411@obsidianresearch.com>
	<45CA532E.3050605@ichips.intel.com>
Message-ID: <20070207233357.GG11411@obsidianresearch.com>

On Wed, Feb 07, 2007 at 02:31:10PM -0800, Sean Hefty wrote:

> >I agree that special casing some IPv6 addresses is a bad idea. It
> >needs to be integrated correctly with NET and the routing table/etc
 
> I haven't given this more than a few minutes of thought, but I was thinking 
> more along the lines of a port having an assigned GID that's the same as an
> assigned IPv6 address.  (Is there some reason this wouldn't work?)  IP name 
> service resolution would map the name to the IPv6 address.  The mapping 
> from the IPv6 address to a GID would then be straightforward, as opposed to 
> using a mapping using ARP.

Right, I also like the idea of using DNS as a global GID name
service.

> If name service resolution gives me an IPv6 address that's off of the local
> subnet, but the ARP response gives me an address that's on the local 
> subnet, then I think we can assume that ARP was unsuccessful in resolving 
> the address to the remote GID.  (I.e. the GID should be for a router.)  If 
> this is true, then we need some other way to acquire the DGID.

This is where I think you have problems... Why would you ARP for an
off-subnet address? Why would the router answer?  You push the address
through the route table and ARP the router address that results.

All of that is why I think another netdevice is a tidy
solution. ping6/tcp/etc using this device would generate packets that
follow the same path as RDMA connections would. No special rules about
broadcast groups are required. The route table is used to instruct the
kernel what IPv6 prefixes are IB GIDs and which are not by associating
the output of the route with the ib0 device. The admins can use any
means to set that up. Something that looks like:

$ ip addr
1: ib0: <BROADCAST,MULTICAST,UP,10000> mtu 2048 qdisc pfifo_fast qlen 1000
    link/ib [my GID..]
    inet6 fe80::c2/64 scope link dynamic <<-- My LL GID
    inet6 2000::c2/64 scope global dynamic  <<-- My GID

Both are maintained by the kernel.

$ ip -6 route
fe80::/64 dev ib0
2000::/64 dev ib0 src 2000::c2
2001::/64 dev ib0 src 2000::c2  <<-- Tells the kernel that 2001::/64
                                     is a GID and to use path records
                                     to do lookups at the SM
2002::/64 via fe80::a0 dev ib0 src 2000::c2 <<-- 2002::/64 is a GID
                                                 but don't query the SM and
                                                 direct things to IB
                                                 router fe80::a0
$ ping6 -I ib0 2001::b1
 ^--- Generate packet structured as: LRH,GRH,ICMP6,PING_DATA
      Set the GRH.SGID to 2000::c2, DGID to 2001::b1 as per the route
      table
      Do a SM Path Record query for 2001::b1 and use that to set the LRH
$ ping6 -I ib0 2002::b1
 ^--- Generate packet structured as: LRH,GRH,ICMP6,PING_DATA
      Set the GRH.SGID to 2000::c2, DGID to 2002::b1 as per the route
      table
      Do a SM Path Record query for fe80::a0 and use that to set the
      LRH
$ traceroute6 -I ib0 2001::b1
 ^--- Same as the ping, except the IB router can capture the packet when
      the hop limit runs out and produce an ICMP error.

Note: In all three cases the LRH.LNH would be set to 1 (non-IBA raw
IPv6). RDMA CM would use the usual value of 3.
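
The lookup rules in the example above can be sketched in a few lines.  This
is illustrative only: the ROUTES table just mirrors the hypothetical
'ip -6 route' listing, and path_record_target is an invented name, not an
existing kernel or library function.

```python
import ipaddress

# (prefix, via) pairs mirroring the example route table; via=None means the
# destination is directly reachable and is itself looked up at the SM.
ROUTES = [
    ("fe80::/64", None),
    ("2000::/64", None),
    ("2001::/64", None),
    ("2002::/64", "fe80::a0"),
]


def path_record_target(dgid, routes=ROUTES):
    """Return the GID to use in the SM path record query that fills in the
    LRH, or None if no route matches.  The GRH.DGID stays the final
    destination in every case."""
    addr = ipaddress.IPv6Address(dgid)
    best = None
    for prefix, via in routes:
        net = ipaddress.IPv6Network(prefix)
        # Longest-prefix match, as a normal route table would do.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, via)
    if best is None:
        return None
    return best[1] if best[1] is not None else dgid
```

For 2001::b1 the query target is the destination itself, while for
2002::b1 it is the IB router fe80::a0.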

This also provides at least a mechanism, if not a full solution, to
the MTU problem. Linux already allows route entries to specify a MTU
and with closer integration of the raw IPV6 stuff it becomes possible
for routers to send ICMP6 errors as raw IPv6 packet and for Linux to
capture them and update the route. The ICMP6 errors are crucial to
having path MTU type functions converge quickly.

RDMA CM would use the same rules for addressing CM packets.

A further refinement would be to layer the entire path record query
mechanism in the kernel over this so that the admin has local control
over the IB routing table (if desired). A 2nd refinement would be to
use the ND cache of such an ib0 device as a local path record query
cache (again lets the admin see what is going on and override/discard
SA queries using the usual 'ip neigh' command). There might even be
good potential for sa replication using the already existing userspace
arpd stuff.

Overall I would just view something like this as further integrating
the IB stack with the existing rich services provided by NET rather
than trying to duplicate a small portion of them with separate
interfaces. [For instance with something like this netlink could be
used instead of the sysfs probing for many cases]

But yes, it is a bit outside what the current framework envisions..

Jason


From rdreier at cisco.com  Wed Feb  7 15:41:43 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 07 Feb 2007 15:41:43 -0800
Subject: [openib-general] Immediate data question
In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net>
	(Changqing Tang's message of "Wed, 7 Feb 2007 22:28:48 -0000")
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
	<adaveigvg7q.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net>
	<adatzy0qmt3.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net>
	<ada7iuwp5rr.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net>
	<adamz3pfym0.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net>
Message-ID: <adahctxeds8.fsf@cisco.com>

    Changqing> What I mean is that, is there any performance penalty
    Changqing> for receiver's overall performance if RNR happens
    Changqing> continuously on one of the QP ?

Not for the receiver, but the sender will be severely slowed down by
having to wait for the RNR timeouts.
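
A back-of-the-envelope sketch of the sender-side cost, assuming each RNR
NAK makes the sender wait one full RNR timer interval before retrying.  The
function name and the timer values passed in are illustrative assumptions;
the only verbs detail relied on is that an rnr_retry value of 7 means
"retry indefinitely".

```python
def sender_stall_ms(rnr_naks, rnr_timer_ms, rnr_retry=7):
    """Total time the sender spends waiting on RNR timeouts, in ms.

    Returns None if the number of NAKs exceeds a finite rnr_retry count,
    in which case the send would fail instead of eventually completing."""
    if rnr_retry != 7 and rnr_naks > rnr_retry:
        return None
    return rnr_naks * rnr_timer_ms
```

The receiver does no extra work per NAK in this model; all of the delay
accumulates on the sending QP.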


From rdreier at cisco.com  Wed Feb  7 15:43:40 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 07 Feb 2007 15:43:40 -0800
Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for
 unicast packets
In-Reply-To: <20070207220304.GE11411@obsidianresearch.com> (Jason
	Gunthorpe's message of "Wed, 7 Feb 2007 15:03:04 -0700")
References: <20070126000319.GA12386@obsidianresearch.com>
	<ada4pqel66k.fsf@cisco.com>
	<20070126180840.GD12386@obsidianresearch.com>
	<45CA2084.7090503@ichips.intel.com>
	<20070207191154.GC11411@obsidianresearch.com>
	<45CA3568.1000508@ichips.intel.com> <ada7iutfy76.fsf@cisco.com>
	<20070207220304.GE11411@obsidianresearch.com>
Message-ID: <adad54ledoz.fsf@cisco.com>

    Jason> Basically, if IB routers are used, and the IPoIB feature of
    Jason> *not* spanning a subnet is used (for scalability?) then
    Jason> you need an alternate way to specify addresses to rdma cm.

You mean if the IB router is also an IP router for IPoIB?

Then I think there are some serious semantic problems to solve for the
RDMA CM -- because you are using an IP address to define a
destination, but since that address is on the other side of an IP
router, there's no way to know it even belongs to an IB port.

 - R.


From rdreier at cisco.com  Wed Feb  7 15:50:25 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 07 Feb 2007 15:50:25 -0800
Subject: [openib-general] sharing qp between user and kernel
In-Reply-To: <20070207223146.GA28637@osc.edu> (Pete Wyckoff's message of
	"Wed, 7 Feb 2007 17:31:46 -0500")
References: <20070207223146.GA28637@osc.edu>
Message-ID: <ada8xf9eddq.fsf@cisco.com>

    Pete> Before I dig into this anymore, do you expect this to work?
    Pete> Are there fundamental problems with QP sharing between user
    Pete> and kernel?  It would sure be nice not to have to stick the
    Pete> connection management aspects into the kernel.

No, I wouldn't expect this to work.  At first glance at least, yes,
there are fundamental problems.  Sharing a QP between user and
kernelspace, where userspace is doing full kernel bypass (as e.g. mthca
does -- there are NO system calls when doing post work request, poll
CQ and request CQ notification operations), seems like a huge
problem.  I don't see any way that the kernel can keep a consistent
view of the QP state unless userspace has to call into the kernel for
every operation, which would kill performance.
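
A toy model of the consistency problem, not mthca's actual queue layout:
assume each side keeps a private copy of the send queue head index, which
is what happens when userspace posts without any system calls.

```python
class Poster:
    """One producer with a private copy of the send-queue head index."""

    def __init__(self, name, ring):
        self.name = name
        self.ring = ring
        self.head = 0  # private: the other producer never sees updates

    def post(self, wqe):
        # Write the work request into the slot this producer believes is free.
        self.ring[self.head % len(self.ring)] = (self.name, wqe)
        self.head += 1


ring = [None] * 4
user = Poster("user", ring)
kernel = Poster("kernel", ring)

user.post("send A")
kernel.post("send B")  # lands in slot 0 too, overwriting "send A"
```

After the two posts, slot 0 holds only the kernel's request and both sides
believe they have posted exactly one WQE.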

 - R.


From halr at voltaire.com  Wed Feb  7 16:19:56 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 07 Feb 2007 19:19:56 -0500
Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for
 unicast packets
In-Reply-To: <45CA3568.1000508@ichips.intel.com>
References: <20070126000319.GA12386@obsidianresearch.com>
	<ada4pqel66k.fsf@cisco.com>
	<20070126180840.GD12386@obsidianresearch.com>
	<45CA2084.7090503@ichips.intel.com>
	<20070207191154.GC11411@obsidianresearch.com>
	<45CA3568.1000508@ichips.intel.com>
Message-ID: <1170893995.31538.23321.camel@hal.voltaire.com>

On Wed, 2007-02-07 at 15:24, Sean Hefty wrote:
> > I didn't get too far on getting CMA to work. Beyond the bad HopLimit
> > feild I was seeing Hal pointed out a number of problems in IBA that
> > would prevent it from working as is :<
> 
> I've started thinking about what it would take to get the rdma cm to work across 
> a router.  I think the rdma cm may need to treat IPv6 addresses as a GID for 
> this to work across subnets, versus trying to map an ipoib IP address to a GID 
> based on ARP.

An IB GID is IPv6-like but is not an IPv6 address, so I don't think this
is a good idea, and I don't see how you get around mapping IP addresses to
GIDs in an IB routed network given the way things are spec'd. I think
that the RDMA CM assumes a single IPoIB subnet. Does it work when the
destination is on another subnet? I think there are some unaddressed
gateway issues here to make that work, and these may have been punted
(at spec time). Arkady might be a good person to comment on this.

-- Hal

> - Sean
> 
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> 


From halr at voltaire.com  Wed Feb  7 16:23:47 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 07 Feb 2007 19:23:47 -0500
Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for
 unicast packets
In-Reply-To: <20070207213108.GD11411@obsidianresearch.com>
References: <20070126000319.GA12386@obsidianresearch.com>
	<ada4pqel66k.fsf@cisco.com>
	<20070126180840.GD12386@obsidianresearch.com>
	<45CA2084.7090503@ichips.intel.com>
	<20070207191154.GC11411@obsidianresearch.com>
	<45CA3568.1000508@ichips.intel.com>
	<20070207213108.GD11411@obsidianresearch.com>
Message-ID: <1170894226.31538.23544.camel@hal.voltaire.com>

On Wed, 2007-02-07 at 16:31, Jason Gunthorpe wrote:
> On Wed, Feb 07, 2007 at 12:24:08PM -0800, Sean Hefty wrote:
> > >I didn't get too far on getting CMA to work. Beyond the bad HopLimit
> > >field I was seeing, Hal pointed out a number of problems in IBA that
> > >would prevent it from working as is :<
> > 
> > I've started thinking about what it would take to get the rdma cm to work 
> > across a router.  I think the rdma cm may need to treat IPv6 addresses as a 
> > GID for this to work across subnets, versus trying to map an ipoib IP 
> > address to a GID based on ARP.
> 
> I don't think that is the main problem - though clearly the way things
> are now (for better or worse) rdma cm requires the IPoIB subnet to
> span all of the IB subnets.. The main problem with the protocol is in
> the LID selection for routed paths on the passive side. It can't rely
> on the active side to identify the lids if a router is involved.
> 
> One feature I've thought has been underused in IBA is the raw IPv6
> packet feature.

I thought raw support (including IPv6 header) although still in the spec
was largely deprecated as the CRC protection was deemed too weak.

-- Hal

>  It would be nice to have a linux netdev interface to
> be able to do IPv6 traffic using GID addressing. That would seem to me
> to be the natural way to bolt native GID addressing into rdma
> cm..
> 
> Jason
> 


From halr at voltaire.com  Wed Feb  7 16:27:41 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 07 Feb 2007 19:27:41 -0500
Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for
 unicast packets
In-Reply-To: <20070207224928.GF11411@obsidianresearch.com>
References: <20070126000319.GA12386@obsidianresearch.com>
	<ada4pqel66k.fsf@cisco.com>
	<20070126180840.GD12386@obsidianresearch.com>
	<45CA2084.7090503@ichips.intel.com>
	<20070207191154.GC11411@obsidianresearch.com>
	<45CA3568.1000508@ichips.intel.com>
	<20070207213108.GD11411@obsidianresearch.com>
	<45CA5573.80802@ichips.intel.com>
	<20070207224928.GF11411@obsidianresearch.com>
Message-ID: <1170894459.31538.23768.camel@hal.voltaire.com>

On Wed, 2007-02-07 at 17:49, Jason Gunthorpe wrote:
> On Wed, Feb 07, 2007 at 02:40:51PM -0800, Sean Hefty wrote:
> > Are you referring to the SLID in the CM REQ?  If so, I've been looking at 
> > this issue as well.  I simply cannot think of any way to come up with this 
> > LID, and my current solution is to punt this problem over to the passive 
> > side, which could use the SLID of the router that the CM REQ is received 
> > from.  If not, well, then I just rambled more than usual.
> 
> Yes, this is the problem.
> 
> The active side clearly cannot learn what the SLID of the passive
> side's router should be.
> 
> We don't want to have the routers snoop and alter CM GMPs.
> 
> The passive side cannot use information from the LRH to get the router
> LID since the LRH may not be reversible.
> 
> The only option seems to be to have the passive side do a path record
> query on a SGID in the CM REQ...
> 
> This is a spec problem unfortunately.

Yes, and I would expect that this will be changed.

-- Hal

> 
> Jason
> 


From jgunthorpe at obsidianresearch.com  Wed Feb  7 17:30:55 2007
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Wed, 7 Feb 2007 18:30:55 -0700
Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for
 unicast packets
In-Reply-To: <1170894226.31538.23544.camel@hal.voltaire.com>
References: <20070126000319.GA12386@obsidianresearch.com>
	<ada4pqel66k.fsf@cisco.com>
	<20070126180840.GD12386@obsidianresearch.com>
	<45CA2084.7090503@ichips.intel.com>
	<20070207191154.GC11411@obsidianresearch.com>
	<45CA3568.1000508@ichips.intel.com>
	<20070207213108.GD11411@obsidianresearch.com>
	<1170894226.31538.23544.camel@hal.voltaire.com>
Message-ID: <20070208013055.GH11411@obsidianresearch.com>

On Wed, Feb 07, 2007 at 07:23:47PM -0500, Hal Rosenstock wrote:
> > One feature I've thought has been underused in IBA is the raw IPv6
> > packet feature.
> 
> I thought raw support (including IPv6 header) although still in the spec
> was largely deprecated as the CRC protection was deemed too weak.

I would envision using the raw support primarily for ICMP6, i.e.
diagnostics (ping/traceroute) and router messages (Packet Too Big, ICMP
Redirect, etc.), not to displace IPoIB as a high-performance solution. In
this role the reduced MTU that you get because of CRC-16's limited
protection shouldn't be a big problem.

Jason


From sean.hefty at intel.com  Wed Feb  7 19:23:47 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Wed, 7 Feb 2007 19:23:47 -0800
Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate for
 unicast packets
In-Reply-To: <20070207233357.GG11411@obsidianresearch.com>
Message-ID: <000001c74b30$8bca8fd0$3dd4180a@amr.corp.intel.com>

>> If name service resolution gives me an IPv6 address that's off of the local
>> subnet, but the ARP response gives me an address that's on the local
>> subnet, then I think we can assume that ARP was unsuccessful in resolving
>> the address to the remote GID.  (I.e. the GID should be for a router.)  If
>> this is true, then we need some other way to acquire the DGID.
>
>This is where I think you have problems... Why would you ARP for an
>off-subnet address? Why would the router answer?  You push the address
>through the route table and ARP the router address that results.

I'm confusing myself.  I was considering different IB subnets, and trying to
determine whether they shared the same IP subnet.  The GIDs may have different
subnet prefixes, but the IP addresses may not, and I'm not sure how to relate
this back to using DNS.

>All of that is why I think another netdevice is a tidy
>solution. ping6/tcp/etc using this device would generate packets that
>follow the same path as RDMA connections would. No special rules about
>broadcast groups are required. The route table is used to instruct the
>kernel what IPv6 prefixes are IB GIDs and which are not by associating
>the output of the route with the ib0 device. The admins can use any
>means to set that up. Something that looks like:

At first glance, this seems like a decent approach to explore.

>But yes, it is a bit outside what the current framework envisions..

I'm fine with that.  My short-term objective is to enable basic router support
within the host stack, and I think I have an idea of what that takes.  I'd just
also like to have an idea of how an application could transfer data between
routed IB subnets, including providing a way for the application to locate a
given remote node.

- Sean


From mst at mellanox.co.il  Wed Feb  7 20:37:27 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 8 Feb 2007 06:37:27 +0200
Subject: [openib-general] RFC ofed 1 2 kernel file structure
In-Reply-To: <45CA5050.2070105@pathscale.com>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
	<20070206150321.GA21776@mellanox.co.il>
	<20070207195914.GD12140@mellanox.co.il>
	<20070207221449.GL12140@mellanox.co.il>
	<45CA5050.2070105@pathscale.com>
Message-ID: <20070208043727.GP12140@mellanox.co.il>

> Quoting Bryan O'Sullivan <bos at pathscale.com>:
> Subject: Re: RFC ofed 1 2 kernel file structure
> 
> Michael S. Tsirkin wrote:
> 
> > All, pls try now.
> 
> This is similar in layout to the sort of tree we've used internally all 
> along, so it's fine by me.  One small problem: I don't like the 
> combination of lower and upper case names of makefile and Makefile in 
> the top-level directory.

ofed_1_2 has the same.

> Also, it's no longer obvious to me to tell what kernel version the 
> sources are pulled from.  I used to be able to check the top-level 
> Makefile or git history, but I no longer know what to look at.

This will be part of BUILD_ID.

-- 
MST


From mst at mellanox.co.il  Wed Feb  7 22:40:55 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 8 Feb 2007 08:40:55 +0200
Subject: [openib-general] more comments on cxgb3
Message-ID: <20070208064055.GR12140@mellanox.co.il>

OK, so I looked at cxgb3 some more.
To summarise my previous comments, I think the cxio hal layer needs to go to
make the code readable - if I understand correctly it is there for historical
reasons only.

I started looking at userspace/kernel interaction, and then
went over to other code under cxgb3 (but not core/).

- Consider a user that does e.g. create QP, but never calls mmap.
  Is there some code that will clean out the unclaimed mmap object?
  I couldn't find it, and iwch_dealloc_ucontext does not seem to
  do anything with it.

- Passing physical address to userspace and back looks suspicious.
  Especially this:
                uresp.physaddr = virt_to_phys(chp->cq.queue);
  Could you elaborate on the design here? What are these phy addresses
  and how come userspace needs to know the phy address?
  You are not doing DMA by this address, by any chance?

- It seems that by passing in huge resource sizes, userspace will be able to
  drink up unlimited amounts of kernel memory.
  mthca handles this by using the mlock rlimit, should something be done here
  as well?
 
A couple of comments on PDBG macro.
- I'd like to suggest following the practice of prefixing macro names with module name
  (same goes for functions like get_mhp really) - unless they are local to file.

- You are using __FUNCTION__ a lot - it might be best just to kill it,
  messages are unique so you'll be able to locate the msg source anyway,
  save some kernel text and logs will be shorter. In any case I think
  __func__ is the recommended gcc way to get the name currently.

- comment near pr_debug definition in include/linux/kernel.h says:
	/* If you are writing a driver, please use dev_dbg instead */
  so it might be a good idea for PDBG to follow this rule.

- log messages do not look very informative to me.
  I also think there are a bit too many of them currently.
  For example, I do not think it is a good idea to print
  the kernel pointers out.

  For example, in code like the following:
	mhp = get_mhp(rhp, (sg_list[i].lkey) >> 8);
	if (!mhp) {
		PDBG("%s %d\n", __FUNCTION__, __LINE__);
		return -EIO;
	}

  might be better to say 
  "MR key XXX does not exist. Exiting.".
  -EIO also looks like a strange error code to return here, does it not?
  Maybe something like EINVAL would be more appropriate?

- I wonder about the names like get_mhp - what does "mhp" mean?
static inline struct iwch_mr *get_mhp(struct iwch_dev *rhp, u32 mmid)
{
        return idr_find(&rhp->mmidr, mmid);
}

Looks like it looks up an mr. Is that right? Maybe the name should be changed
to convey this meaning.

- In the following code, what does "missing pdid check" mean?
/*
 * TBD: this is going to be moved to firmware. Missing pdid/qpid check for now.
 */
This sounds interesting.
Does this mean the code does not validate the PD currently?

I have the same question for:
        /* TBD: check perms */
in iwch_bind_mw.

BTW, does TBD stand for "To Be Done" here?
google says:
>Definitions of TBD on the Web:

    * To Be Determined, Defined, Decided.
      www.csr.com/ptot.htm

    * to be determined
      www.liberalsagainstterrorism.com/wiki/index.php/Counterinsurgency_Operations/Glossary

    * Treasury Board (Secretariat)
      www.psc-cfp.gc.ca/centres/definitions_and_notes_e.htm

    * The three letter abbreviation TBD may be/mean, depending on context: * an  acronym for "To Be Determined" ("...at a later point in time.", typically)* the Douglas Devastator, a US Navy torpedo bomber of World War II
      en.wikipedia.org/wiki/TBD

What is to be determined here?
Do you mean TODO really?

- iwch_sgl2pbl_map is used in several places, and seems a bit too big to be inline
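
To illustrate the logging and error-code comments above, here is a minimal
userspace sketch, not driver code. The demo_* names, the table, and the use
of fprintf in place of a kernel debug macro are all assumptions made for
illustration; only the ">> 8" shift and the suggested message text come from
the snippet quoted above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver's MR type; not the real structure. */
struct demo_mr {
	uint32_t mmid;
};

#define DEMO_NUM_MRS 2
static struct demo_mr demo_table[DEMO_NUM_MRS] = { { 0x10 }, { 0x22 } };

/* Named after what it returns (an MR) rather than "mhp". */
static struct demo_mr *demo_lookup_mr(uint32_t mmid)
{
	size_t i;

	for (i = 0; i < DEMO_NUM_MRS; i++)
		if (demo_table[i].mmid == mmid)
			return &demo_table[i];
	return NULL;
}

/* Returns 0 if the lkey maps to a known MR, -EINVAL otherwise. */
static int demo_check_lkey(uint32_t lkey)
{
	uint32_t mmid = lkey >> 8;	/* same shift as in the quoted snippet */

	if (!demo_lookup_mr(mmid)) {
		/* Say what failed instead of printing function name and line. */
		fprintf(stderr, "MR key 0x%x does not exist\n", (unsigned)lkey);
		return -EINVAL;
	}
	return 0;
}
```

The message names the offending key and does not print any pointer, and the
caller gets an error code that says the argument was bad rather than that
I/O failed.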

Well, it's time to go do my day job now :)

Hope this helps,

-- 
MST


From monil at voltaire.com  Wed Feb  7 23:02:31 2007
From: monil at voltaire.com (Moni Levy)
Date: Thu, 8 Feb 2007 09:02:31 +0200
Subject: [openib-general] issues with compilation of ofed 1.2
In-Reply-To: <45C9EE31.2040602@voltaire.com>
References: <45C9EE31.2040602@voltaire.com>
Message-ID: <6a122cc00702072302s18c1c4b7i3f1e4a1b3f3d0381@mail.gmail.com>

Doug,
On 2/7/07, Yosef Etigin <yosefe at voltaire.com> wrote:
> 7. On RHAS5 beta 2, the setup requires sysfsutils-devel RPM which is not included in this distro.

Can you please help us with that ?

-- Moni

>
> --
> Yosef Etigin
> Alex Tabachnik
>


From vlad at lists.openfabrics.org  Thu Feb  8 02:24:24 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Thu,  8 Feb 2007 02:24:24 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070208-0200 daily build status
Message-ID: <20070208102424.660EAE60808@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod 

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.12
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.17
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.18
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.12
Passed on ppc64 with linux-2.6.19
Passed on ia64 with linux-2.6.19
Passed on powerpc with linux-2.6.16
Passed on powerpc with linux-2.6.13
Passed on powerpc with linux-2.6.12
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.12
Passed on powerpc with linux-2.6.15
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.14
Passed on ia64 with linux-2.6.13
Passed on ppc64 with linux-2.6.18
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.15
Passed on ppc64 with linux-2.6.13
Passed on ia64 with linux-2.6.18
Passed on powerpc with linux-2.6.14
Passed on ia64 with linux-2.6.17
Passed on ppc64 with linux-2.6.15
Passed on ia64 with linux-2.6.16

Failed:
Build failed on ia64 with linux-2.6.16.21-0.8-default
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070208-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:380: error: implicit declaration of function ‘register_netevent_notifier’
/home/vlad/tmp/ofa_1_2_kernel-20070208-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: In function ‘addr_cleanup’:
/home/vlad/tmp/ofa_1_2_kernel-20070208-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:386: error: implicit declaration of function ‘unregister_netevent_notifier’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070208-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070208-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070208-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070208-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------


From amip at mellanox.co.il  Thu Feb  8 04:33:29 2007
From: amip at mellanox.co.il (Ami Perlmutter)
Date: Thu, 8 Feb 2007 14:33:29 +0200
Subject: [openib-general] bug in netpipe
Message-ID: <6C2C79E72C305246B504CBA17B5500C9C41CE2@mtlexch01.mtl.com>

Hi
I've been running netpipe over Infiniband's SDP and uncovered a race
when using the "-r" option.
The problem is when both sides close their sockets, the listening socket
is closed last, which allows a faster
client to try to connect to it before it closes. When this happens, next
time the client tries to use the socket it gets
a connection reset error.

From jsquyres at cisco.com  Thu Feb  8 05:35:13 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Thu, 8 Feb 2007 08:35:13 -0500
Subject: [openib-general] Open MPI rpmbuild fails in OFED-1.2
In-Reply-To: <20070207194949.GB12140@mellanox.co.il>
References: <1170866522.6223.8.camel@vladsk-laptop>
	<7FDCD3BB-A76F-4C36-8939-4E7C634F0D86@cisco.com>
	<1170867620.6223.11.camel@vladsk-laptop>
	<212D8756-09B1-4637-ADCA-1CD8A535403A@cisco.com>
	<20070207194949.GB12140@mellanox.co.il>
Message-ID: <1A992437-34C5-4F97-8963-1C99876E0A50@cisco.com>

On Feb 7, 2007, at 2:49 PM, Michael S. Tsirkin wrote:

>> My $0.02: This is another in a growing list of issues reflecting the
>> whole "build everything in DESTDIR" is a problematic approach.
>
> I don't know much about RPM, and I am not exactly sure why are
> our source RPMs so complicated.

It's a combination of two things:

1) similar to what you said below, we have lots of software packages  
that are all dependent upon each other
--> this leads to a conglomeration of rpath's and shared library  
dependencies that are incorrect

2) we're trying to *use* the software when it is installed in the  
DESTDIR
--> this means that you have to put special-case in the software so  
that they look for support files in both the DESTDIR *and* the final  
installation directory

One way to think about what we're doing is making a disk image and  
then snapshotting RPMs from that disk image.  That's a natural  
candidate for chroot.

> However, with the plain configure/make we are able to
> build all openfabrics components within build directory,
> without any chroot tricks.
>
> So let's not give up yet, IMO it is very nice to be able to build in
> standard environment, without being root.

Yes, it's nice from the user perspective.  But it's fairly annoying  
from the developer point of view because you have to add all these  
special cases.

> Note that what is biting us here is mostly the large number of  
> modules:
> simple single-module packages don't have this problem - and this
> is really a design decision we took.

Understood.  I guess I'm asking whether all these special cases  
required to support the DESTDIR approach are a) worth it, b) going to  
take more time to get right (which end up being somewhat fragile)  
than to use a chroot/image-based approach.

Again, just my $0.02, and I might get shouted down.  But I thought  
I'd at least ask...  :-)

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


From mst at mellanox.co.il  Thu Feb  8 05:43:05 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 8 Feb 2007 15:43:05 +0200
Subject: [openib-general] Open MPI rpmbuild fails in OFED-1.2
In-Reply-To: <1A992437-34C5-4F97-8963-1C99876E0A50@cisco.com>
References: <1A992437-34C5-4F97-8963-1C99876E0A50@cisco.com>
Message-ID: <20070208134305.GC20183@mellanox.co.il>

> Quoting Jeff Squyres <jsquyres at cisco.com>:
> Subject: Re: Open MPI rpmbuild fails in OFED-1.2
> 
> On Feb 7, 2007, at 2:49 PM, Michael S. Tsirkin wrote:
> 
> >> My $0.02: This is another in a growing list of issues reflecting the
> >> whole "build everything in DESTDIR" is a problematic approach.
> >
> > I don't know much about RPM, and I am not exactly sure why are
> > our source RPMs so complicated.
> 
> It's a combination of two things:
> 
> 1) similar to what you said below, we have lots of software packages  
> that are all dependent upon each other
> --> this leads to a conglomeration of rpath's and shared library  
> dependencies that are incorrect
> 
> 2) we're trying to *use* the software when it is installed in the  
> DESTDIR
> --> this means that you have to put special-case in the software so  
> that they look for support files in both the DESTDIR *and* the final  
> installation directory

How do you mean, use?

Hmm. I guess my question is - this works fine when I run OFED's
configure script, why is SRPM so much more difficult?

-- 
MST


From jsquyres at cisco.com  Thu Feb  8 06:14:59 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Thu, 8 Feb 2007 09:14:59 -0500
Subject: [openib-general] Open MPI rpmbuild fails in OFED-1.2
In-Reply-To: <20070208134305.GC20183@mellanox.co.il>
References: <1A992437-34C5-4F97-8963-1C99876E0A50@cisco.com>
	<20070208134305.GC20183@mellanox.co.il>
Message-ID: <4FE6D655-AF12-4EF5-A058-4CFC9597C12B@cisco.com>

On Feb 8, 2007, at 8:43 AM, Michael S. Tsirkin wrote:

>> 2) we're trying to *use* the software when it is installed in the
>> DESTDIR
>> --> this means that you have to put special-case in the software so
>> that they look for support files in both the DESTDIR *and* the final
>> installation directory
>
> How do you mean, use?

The easiest example is that MPI wrapper compilers are used to compile  
the MPI test suites (mpicc, etc.).  This means that OMPI's libraries  
and support files (e.g., help files, wrapper compiler data files)  
need to be found, even though they're not in their final installation  
locations.

> Hmm. I guess my question is - this works fine when I run OFED's
> configure script, why is SRPM so much more difficult?

It's not the single SRPM that is the problem.  We've had an OMPI SRPM  
that works fine for a long, long time.  A single DESTDIR build is no  
problem, especially for an Autoconf/Automake/Libtool-based project  
like Open MPI.

The problems are:

- libibverbs and other support libraries are in the DESTDIR when OMPI  
is built (but eventually will move).  So OMPI has to rpath *BOTH*  
locations for libibverbs (i.e., the DESTDIR and the final  
installdir), one of which will be a lie.  God help you if you're  
trying to build OFED on a machine where a previous version of OFED is  
installed -- i.e., where libibverbs exists in *BOTH* the DESTDIR and  
the final prefix!  (this specific problem actually caused me to waste  
a few hours while developing the new OMPI build stuff in build.sh  
last week)

- I didn't look closely at the OFED 1.2 build scripts yet, but we ran  
into problems during the development of OFED 1.1 where dependent OFA  
libraries needed to link to each other.  In OFED, those links were  
simply removed because of the whole DESTDIR/installdir duality.  This  
actually caused problems in some scenarios.  IIRC, the one I remember  
is that the link between libmthca and libibverbs was effectively  
removed by removing AC_CHECK_LIB from libmthca's configure.ac (recall  
that mthca uses some of the public symbols in libibverbs) because  
AC_CHECK_LIB was looking in the installdir.  That may not be 100%  
right -- I don't recall all the details.

- we *use* OMPI in the DESTDIR (and MVAPICH), as described above.  I  
had to  add a patch to the upcoming OMPI v1.2 community release to  
first examine the environment and look for a specific variable to re- 
root all of the compiled-in directories (it's too late in the OMPI  
v1.2 release process to put this patch in the official v1.2  
release).  What a pain.  :-\

So the OMPI path issue is resolvable (at the cost of adding a whole  
pile of code to OMPI), but the rpath issue is not.  Once you link an  
app, its rpaths are fixed and you can't change them based on an  
environment variable.  Hence, the only solution is to rpath *both*  
directories, but even that has problems and ambiguities (as described  
above).  In fairness, we could tell the user to set LD_LIBRARY_PATH,  
but no one seems to want to do that (users always screw it up, and it  
becomes problematic for rsh/ssh-based scenarios).

All this plus the fact that we're clearly going outside of what the  
SuSE RPM developers intend just indicates to me that this doesn't  
seem Right...

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


From halr at voltaire.com  Thu Feb  8 06:19:43 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 08 Feb 2007 09:19:43 -0500
Subject: [openib-general] [PATCH] OpenSM/osm_ucast_lash.c: In
 osm_get_lash_sl, fix SL when CA ports on same switch
Message-ID: <1170944383.31538.74632.camel@hal.voltaire.com>

OpenSM/osm_ucast_lash.c: In osm_get_lash_sl, fix SL when CA ports on same switch
    
This change resolves an issue with strange SL assignment when
two HCAs communicate with each other and are on the same switch.
Since LASH is switch to switch routing, the get_lash_sl
function was casting 9999 (the value assigned to the
variable NONE) to be a uint8_t when asked for an SL assignment
in this case. This change resolves this issue.
    
Signed-off-by: Thomas Sødring <tsodring at simula.no>
Signed-off-by: Hal Rosenstock <halr at voltaire.com>

diff --git a/osm/opensm/osm_ucast_lash.c b/osm/opensm/osm_ucast_lash.c
index 5dfe068..e5f751c 100644
--- a/osm/opensm/osm_ucast_lash.c
+++ b/osm/opensm/osm_ucast_lash.c
@@ -1468,6 +1468,7 @@ uint8_t osm_get_lash_sl(osm_opensm_t *p_
 			osm_port_t *p_src_port, osm_port_t *p_dst_port)
 {
 	unsigned dst_id;
+	unsigned src_id;
 	osm_switch_t *p_sw;
 
 	if (p_osm->routing_engine.ucast_build_fwd_tables != lash_process)
@@ -1482,6 +1483,10 @@ uint8_t osm_get_lash_sl(osm_opensm_t *p_
 	if (!p_sw || !p_sw->priv)
 		return OSM_DEFAULT_SL;
 
+	src_id = get_lash_id(p_sw);
+	if (src_id == dst_id) 
+		return OSM_DEFAULT_SL;
+
 	return (uint8_t)((switch_t *)p_sw->priv)->routing_table[dst_id].lane;
 }
 

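For readers following the commit message above, a small standalone sketch of
what casting NONE to uint8_t produces. This is not OpenSM code: the demo_*
names are made up, DEMO_NONE takes the value 9999 from the commit message,
and DEMO_DEFAULT_SL is an assumed stand-in for OSM_DEFAULT_SL.

```c
#include <stdint.h>

#define DEMO_NONE	9999	/* value the commit message gives for NONE */
#define DEMO_DEFAULT_SL	0	/* assumption: stand-in for OSM_DEFAULT_SL */

/* An unchecked cast keeps only the low 8 bits: 9999 - 39 * 256 = 15. */
static uint8_t demo_cast_lane(int lane)
{
	return (uint8_t)lane;
}

/* Shape of the fix: same-switch traffic never consults the lane value. */
static uint8_t demo_get_sl(unsigned src_id, unsigned dst_id, int lane)
{
	if (src_id == dst_id)
		return DEMO_DEFAULT_SL;
	return (uint8_t)lane;
}
```

Without the src_id == dst_id check, the placeholder value would silently
turn into 15 rather than being rejected.
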
From ogerlitz at voltaire.com  Thu Feb  8 06:35:35 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 08 Feb 2007 16:35:35 +0200
Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast
 support
In-Reply-To: <45C85B39.4080700@voltaire.com>
References: <000701c741a6$16dc4760$ff0da8c0@amr.corp.intel.com>
	<45BF8E17.2010805@ichips.intel.com> <45C37BE9.5040105@ichips.intel.com>
	<45C85B39.4080700@voltaire.com>
Message-ID: <45CB3537.8060508@voltaire.com>

Or Gerlitz wrote:
> Sean Hefty wrote:

>>> Sean Hefty (3):
>>>        rdma_cm: Increment port number after close to avoid re-use.
>>>        ib_sa: track multicast join/leave requests
>>>        rdma_cm: add multicast communication support

>> Assuming that you haven't looked at this yet, I updated the ib_sa patch 
>> above to shorten the workqueue name, plus added a fourth patch to 
>> shorten the workqueue names for ib_addr and rdma_cm.  E.g. "ib_mcast_wq" 
>> became "ib_mcast".

> Roland,

> We are working (developing and testing) with a userspace rdma cm based 
> multicast app over this code during the last two months and are very 
> satisfied with it. The testing included IPoIB, the user space app and 
> multicast interoperability between them.

Roland,

Can you comment on the status of this merge request?

thanks,

Or.


From swise at opengridcomputing.com  Thu Feb  8 06:57:19 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 08 Feb 2007 08:57:19 -0600
Subject: [openib-general] more comments on cxgb3
In-Reply-To: <20070208064055.GR12140@mellanox.co.il>
References: <20070208064055.GR12140@mellanox.co.il>
Message-ID: <1170946639.3049.10.camel@stevo-desktop>

On Thu, 2007-02-08 at 08:40 +0200, Michael S. Tsirkin wrote:
> OK, so I looked at cxgb3 some more.

Thanks!

> To summarise my previous comments, I think the cxio hal layer needs to go to
> make the code readable - if I understand correctly it is there for historical
> reasons only.
> 

I can do this but it's low on the list of todos.


> I started looking at userspace/kernel interaction, and then
> went over to other code under cxgb3 (but not core/).
> 
> - Consider a user that does e.g. create QP, but never calls mmap.
>   Is there some code that will clean out the unclaimed mmap object?
>   I couldn't find it, and iwch_dealloc_ucontext does not seem to
>   do anything with it.
> 

This is a bug.  I've got a fix for it too.  

> - Passing physical address to userspace and back looks suspicious.
>   Especially this:
>                 uresp.physaddr = virt_to_phys(chp->cq.queue);
>   Could you elaborate on the design here? What are these phy addresses
>   and how come userspace needs to know the phy address?
>   You are not doing DMA by this address, by any chance?
> 

No, it's not used for DMA by the HW.  The physaddr is passed up to the
user, and the user then mmaps() using that as the offset.

I took this code from the ipath driver.  It has been pointed out to me
that this is broken for 32-bit userspace on a 64-bit kernel because mmap2()
cannot pass down 64 bits.  So I need to rework this code.

> - It seems that by passing in huge resource sizes, userspace will be able to
>   drink up unlimited amounts of kernel memory.
>   mthca handles this by using the mlock rlimit, should something be done here
>   as well?

Hmm.  That's a good point.  I'll put this on the todo as well.  So mthca
adds to process's rlimit value as things are allocated out of kernel
memory (or maybe even anything that gets pinned)?
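
As a rough picture of the kind of accounting being asked about, here is a
userspace sketch that charges pinned pages against a byte limit such as
RLIMIT_MEMLOCK. It is an assumption about the general technique, not a copy
of what mthca or the uverbs core does; demo_account_pinned, DEMO_PAGE_SHIFT
and the 4 KB page size are all made up for the example.

```c
#include <errno.h>

#define DEMO_PAGE_SHIFT 12	/* assumption: 4 KB pages */

/*
 * Charge npages against a per-process counter.  Returns 0 and updates
 * *locked_pages when the new total stays within limit_bytes, otherwise
 * returns -ENOMEM and leaves the counter untouched.
 */
static int demo_account_pinned(unsigned long *locked_pages,
			       unsigned long npages,
			       unsigned long limit_bytes)
{
	unsigned long limit_pages = limit_bytes >> DEMO_PAGE_SHIFT;
	unsigned long total = *locked_pages + npages;

	/* Reject on wrap-around as well as on exceeding the limit. */
	if (total < *locked_pages || total > limit_pages)
		return -ENOMEM;

	*locked_pages = total;
	return 0;
}
```

The counter would have to be decremented again when the resource is
destroyed, otherwise a long-running process slowly runs out of budget.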

>  
> A couple of comments on PDBG macro.
> - I'd like to suggest following the practice of prefixing macro names with module name
>   (same goes for functions like get_mhp really) - unless they are local to file.
> 
> - You are using __FUNCTION__ a lot - it might be best just to kill it,
>   messages are unique so you'll be able to locate the msg source anyway,
>   save some kernel text and logs will be shorter. In any case I think
>   __func__ is the recommended gcc way to get the name currently.
> 
> - comment near pr_debug definition in include/linux/kernel.h says:
> 	/* If you are writing a driver, please use dev_dbg instead */
>   so it might be a good idea for PDBG to follow this rule.
> 
> - log messages do not look very informative to me.
>   I also think there are a bit too many of them currently.
>   For example, I do not think it is a good idea to print
>   the kernel pointers out.
> 
>   For example, in code like the following:
> 	mhp = get_mhp(rhp, (sg_list[i].lkey) >> 8);
> 	if (!mhp) {
> 		PDBG("%s %d\n", __FUNCTION__, __LINE__);
> 		return -EIO;
> 	}
> 
>   might be better to say 
>   "MR key XXX does not exist. Exiting.".
>   -EIO also looks like a strange error code to return here, does it not?
>   Maybe something like EINVAL would be more appropriate?
> 

I'll take a todo to clean up the debug stuff. 

> - I wonder about the names like get_mhp - what does "mhp" mean?
> static inline struct iwch_mr *get_mhp(struct iwch_dev *rhp, u32 mmid)
> {
>         return idr_find(&rhp->mmidr, mmid);
> }
> 

Memory Handle Pointer.


> Looks like it looks up an mr. Is that right? Maybe the name should be changed
> to convey this meaning.
> 
> - In the following code, what does "missing pdid check" mean?
> /*
>  * TBD: this is going to be moved to firmware. Missing pdid/qpid check for now.
>  */
> This sounds interesting.
> Does this mean the code does not validate the PD currently?
> 

I need firmware support for this.  It will be done asap and I can remove
this code entirely.

> I have the same question for:
>         /* TBD: check perms */
> in iwch_bind_mw.
> 
> BTW, does TBD stand for "To Be Done" here?

Yes.

> Do you mean TODO really?
> 
> - iwch_sgl2pbl_map is used in several places, and seems a bit too big to be inline
> 
> Well, it's time to go do my day job now :)
> 
> Hope this helps,
> 

Thanks again Michael!

Steve.


From monis at voltaire.com  Thu Feb  8 07:00:07 2007
From: monis at voltaire.com (Moni Shoua)
Date: Thu, 08 Feb 2007 17:00:07 +0200
Subject: [openib-general] [PATCH] IB/ipoib get net_device from
 ipoib_neigh instead of linux neighbour
In-Reply-To: <20070207182426.GB9131@mellanox.co.il>
References: <45C9F1DE.8090409@voltaire.com>
	<20070207182426.GB9131@mellanox.co.il>
Message-ID: <45CB3AF7.5000909@voltaire.com>

Michael S. Tsirkin wrote:
>> Quoting Moni Shoua <monis at voltaire.com>:
>> Subject: Re: [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour
>>
>>
>>> Another concern: assume that one device goes away (e.g. hotplug).
>>> It seems that neighbours whose dev field point to another device, will not be destroyed.
>>> Correct?
>> I agree.
>>
>>> Therefore in your design, it seems that to_ipoib_neigh()->dev
>>> will get us a pointer to device that has been removed already.
>>>
>> I agree that this is a problem.
> 
> I think we can solve this if we track all ipoib neighbours, like we do for old kernels,
> and then flush ipoib neighbours on any hotplug event.
> Roland, does this sound too awful?
> 
>> I think it would be best to prevent an IPoIB device
>> from disappearing, or ib_ipoib from being unloaded, as long as the IPoIB
>> device is a slave. Unfortunately, I don't see how this can be done just
>> by fixing something in bonding or IPoIB. 
> 
> So hotplug is blocked potentially forever?
> This does not sound good.
OK, so I'm dropping this thought.
> 
>> However, any slave knows he has a master (dev->master). 
>> What do you think about a solution where IPoIB first tries to clean up the
>> neighbours that belong to its master before deleting the IPoIB device?
> 
> How?
Let me know what you think about this; I hope it makes sense.
In IPoIB, before calling unregister_netdev, do:
	for each kernel neighbour n
		if  n->dev == ib_dev->master
			delete n
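
Spelled out as a standalone userspace sketch, the loop above could look like
the following. The demo_* structures and the singly linked list are
hypothetical stand-ins; the real kernel neighbour table is hashed, reference
counted and locked, none of which is modelled here.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for net_device and the kernel neighbour entry. */
struct demo_dev {
	int id;
	struct demo_dev *master;
};

struct demo_neigh {
	struct demo_dev *dev;
	struct demo_neigh *next;
};

/* Push a new neighbour for dev onto the list; returns the new head. */
static struct demo_neigh *demo_add_neigh(struct demo_neigh *head,
					 struct demo_dev *dev)
{
	struct demo_neigh *n = malloc(sizeof(*n));

	if (!n)
		return head;
	n->dev = dev;
	n->next = head;
	return n;
}

/* Unlink and free every neighbour whose dev is master; returns the count. */
static int demo_flush_neighs(struct demo_neigh **head,
			     const struct demo_dev *master)
{
	int removed = 0;

	while (*head) {
		struct demo_neigh *n = *head;

		if (n->dev == master) {
			*head = n->next;	/* unlink before freeing */
			free(n);
			removed++;
		} else {
			head = &n->next;
		}
	}
	return removed;
}
```

The slave would call demo_flush_neighs(&list, ib_dev->master) before it
unregisters itself, so that no remaining entry can lead back to it.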

Michael, as I see it we have to deal with 2 cases.
1. IPoIB device is deleted (unregister_netdev) - the IPoIB netdev is not in the kernel's address space.
	We have to make sure that no one holds a pointer to it after it is deleted.
2. ib_ipoib module is unloaded (modprobe -r) - the ipoib_neigh_destructor is not in the kernel's address space.
	We have to make sure no one calls it after the module is unloaded.
I think that if nothing prevents the execution of the "code" above, it serves both cases.
Do you see any problem with that?
Do I have to maintain my own list of neighbours or use the kernel's arp table for that?

I am trying to study the neighbour cleanup function and do something like that but
I would be happy to learn from others as well.


>>>> Furthermore, bond_setup_by_slave is called only for non-Ethernet
>>>> devices (we are considering changing the logic to "called only for
>>>> IPoIB devices", just for safety).
>>> Why is this necessary, BTW?
>>>
>> If we don't do that, we get a memory leak because the neigh destructor will
>> never be called for non IPoIB devices although they carry ipoib_neigh
>> with them.
> 
> How can this happen? If it does, I think we are back to where we started:
> to_ipoib_neigh is broken for non-IPoIB device.
> I thought you said only devices of the same type can be paired?
> 
> 
The scenario is:
1. kernel allocates a neighbour structure for bond0, puts it on an skb and passes it to the bond xmit function.
2. bond0 passes the skb to ipoib
3. ipoib allocates ipoib_neigh and hangs it on the linux neighbour. 
4. a while after that, the kernel wants to destroy the neighbour (cleanup) but 
doesn't call ipoib_neigh_destructor because the neigh setup registered the destructor for the ibX device.


From pw at osc.edu  Thu Feb  8 07:24:09 2007
From: pw at osc.edu (Pete Wyckoff)
Date: Thu, 8 Feb 2007 10:24:09 -0500
Subject: [openib-general] sharing qp between user and kernel
In-Reply-To: <ada8xf9eddq.fsf@cisco.com>
References: <20070207223146.GA28637@osc.edu> <ada8xf9eddq.fsf@cisco.com>
Message-ID: <20070208152409.GC31079@osc.edu>

rdreier at cisco.com wrote on Wed, 07 Feb 2007 15:50 -0800:
>     Pete> Before I dig into this anymore, do you expect this to work?
>     Pete> Are there fundamental problems with QP sharing between user
>     Pete> and kernel?  It would sure be nice not to have to stick the
>     Pete> connection management aspects into the kernel.
> 
> No, I wouldn't expect this to work.  At first glance at least, yes,
> there are fundamental problems.  Sharing a QP between user and
> kernelspace, where userspace is doing full kernel bypass (as eg mthca
> does -- there are NO system calls when doing post work request, poll
> CQ and request CQ notification operations), seems like a huge
> problem.  I don't see any way that the kernel can keep a consistent
> view of the QP state unless userspace has to call into the kernel for
> every operation, which would kill performance.

My hope was not to allow full QP sharing between user and kernel,
but just a limited interface to "send this kernel data now".  It
requires that the kernel register some physical memory, and submit
the work requests.  Perhaps the kernel can invoke the equivalent of
the userspace post function instead of trying to use the kernel API
for sending.

Thanks to all for the comments.  We'll keep working with non-bypass
devices to see if the approach offers any advantages first.

		-- Pete


From mst at mellanox.co.il  Thu Feb  8 07:29:47 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 8 Feb 2007 17:29:47 +0200
Subject: [openib-general] [PATCH] IB/ipoib_cm: fix up issues from code review
In-Reply-To: <adaveiesyrb.fsf@cisco.com>
References: <adaveiesyrb.fsf@cisco.com>
Message-ID: <20070208152947.GA6560@mellanox.co.il>

The following lightly tested patch addresses Roland's comments on IPoIB CM.
Applies on top of PATCHv6:

- Randomise RQ PSN
- Fix for modular IPv6
- MTU endian-ness fix for ICMPs
- Cosmetics

Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

---

Roland, do you want me to repost the full fixed-up patch instead?

Pls let me know when IPoIB CM is in for-2.6.21,
I'll switch to that for my testing.
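
For anyone wondering about the "modular IPv6" item, here is a small
standalone illustration of the preprocessor side of it. It assumes the
usual kbuild behaviour where CONFIG_IPV6=m defines CONFIG_IPV6_MODULE and
leaves CONFIG_IPV6 undefined; the demo_* functions exist only for this
example.

```c
/* Simulate a kernel configured with CONFIG_IPV6=m. */
#define CONFIG_IPV6_MODULE 1

/* Old test: true only when IPv6 is built in. */
static int demo_old_check(void)
{
#ifdef CONFIG_IPV6
	return 1;
#else
	return 0;
#endif
}

/* New test: true when IPv6 is built in or built as a module. */
static int demo_new_check(void)
{
#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
	return 1;
#else
	return 0;
#endif
}
```

With IPv6 as a module the old test compiles the ICMPv6 code out while the
new one keeps it. As I understand it, the Kconfig change in the patch below
is the matching half, meant to keep IPoIB from being built in when IPv6 is
only a module.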

diff --git a/drivers/infiniband/ulp/ipoib/Kconfig b/drivers/infiniband/ulp/ipoib/Kconfig
index 0ffca11..af78ccc 100644
--- a/drivers/infiniband/ulp/ipoib/Kconfig
+++ b/drivers/infiniband/ulp/ipoib/Kconfig
@@ -1,6 +1,6 @@
 config INFINIBAND_IPOIB
 	tristate "IP-over-InfiniBand"
-	depends on INFINIBAND && NETDEVICES && INET
+	depends on INFINIBAND && NETDEVICES && INET && (IPV6 || IPV6=n)
 	---help---
 	  Support for the IP-over-InfiniBand protocol (IPoIB). This
 	  transports IP packets over InfiniBand so you can use your IB
diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 8082d50..eb885ee 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -127,7 +127,6 @@ struct ipoib_tx_buf {
 	u64		mapping;
 };
 
-#ifdef CONFIG_INFINIBAND_IPOIB_CM
 struct ib_cm_id;
 
 struct ipoib_cm_data {
@@ -181,7 +180,6 @@ struct ipoib_cm_dev_priv {
 	struct ib_recv_wr       rx_wr;
 };
 
-#endif
 /*
  * Device private locking: tx_lock protects members used in TX fast
  * path (and we use LLTX so upper layers don't do extra locking).
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index e7e7cc0..8ee6f06 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -37,7 +37,7 @@
 #include <net/dst.h>
 #include <net/icmp.h>
 
-#ifdef CONFIG_IPV6
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
 #include <linux/icmpv6.h>
 #endif
 
@@ -170,7 +170,8 @@ static struct ib_qp *ipoib_cm_create_rx_qp(struct net_device *dev,
 }
 
 static int ipoib_cm_modify_rx_qp(struct net_device *dev,
-				  struct ib_cm_id *cm_id, struct ib_qp *qp)
+				  struct ib_cm_id *cm_id, struct ib_qp *qp,
+				  unsigned psn)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct ib_qp_attr qp_attr;
@@ -193,7 +194,7 @@ static int ipoib_cm_modify_rx_qp(struct net_device *dev,
 		ipoib_warn(priv, "failed to init QP attr for RTR: %d\n", ret);
 		return ret;
 	}
-	qp_attr.rq_psn = 0 /* FIXME */;
+	qp_attr.rq_psn = psn;
 	ret = ib_modify_qp(qp, &qp_attr, qp_attr_mask);
 	if (ret) {
 		ipoib_warn(priv, "failed to modify QP to RTR: %d\n", ret);
@@ -203,7 +204,8 @@ static int ipoib_cm_modify_rx_qp(struct net_device *dev,
 }
 
 static int ipoib_cm_send_rep(struct net_device *dev, struct ib_cm_id *cm_id,
-			     struct ib_qp *qp, struct ib_cm_req_event_param *req)
+			     struct ib_qp *qp, struct ib_cm_req_event_param *req,
+			     unsigned psn)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct ipoib_cm_data data = {};
@@ -219,7 +221,7 @@ static int ipoib_cm_send_rep(struct net_device *dev, struct ib_cm_id *cm_id,
 	rep.target_ack_delay = 20; /* FIXME */
 	rep.srq = 1;
 	rep.qp_num = qp->qp_num;
-	rep.starting_psn = 0 /* FIXME */;
+	rep.starting_psn = psn;
 	return ib_send_cm_rep(cm_id, &rep);
 }
 
@@ -229,6 +231,7 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct ipoib_cm_rx *p;
 	unsigned long flags;
+	unsigned psn;
 	int ret;
 
 	ipoib_dbg(priv, "REQ arrived\n");
@@ -243,11 +246,12 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 		goto err_qp;
 	}
 
-	ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp);
+	psn = random32() & 0xffffff;
+	ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn);
 	if (ret)
 		goto err_modify;
 
-	ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd);
+	ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn);
 	if (ret) {
 		ipoib_warn(priv, "failed to send REP: %d\n", ret);
 		goto err_rep;
@@ -742,7 +746,7 @@ static int ipoib_cm_send_req(struct net_device *dev,
 	req.retry_count 	      = 0; /* RFC draft warns against retries */
 	req.rnr_retry_count 	      = 0; /* RFC draft warns against retries */
 	req.max_cm_retries 	      = 15;
-	req.srq 	              = 15;
+	req.srq 	              = 1;
 	return ib_send_cm_req(id, &req);
 }
 
@@ -1041,7 +1045,7 @@ static void ipoib_cm_skb_reap(struct work_struct *work)
 	struct sk_buff *skb;
 	unsigned long flags;
 
-	__be32 mtu = cpu_to_be32(priv->mcast_mtu);
+	unsigned mtu = priv->mcast_mtu;
 
 	spin_lock_irqsave(&priv->tx_lock, flags);
 	spin_lock(&priv->lock);
@@ -1050,7 +1054,7 @@ static void ipoib_cm_skb_reap(struct work_struct *work)
 		spin_unlock_irqrestore(&priv->tx_lock, flags);
 		if (skb->protocol == htons(ETH_P_IP))
 			icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));
-#ifdef CONFIG_IPV6
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
 		else if (skb->protocol == htons(ETH_P_IPV6))
 			icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, dev);
 #endif
-- 
MST


From swise at opengridcomputing.com  Thu Feb  8 07:32:11 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 08 Feb 2007 09:32:11 -0600
Subject: [openib-general] sharing qp between user and kernel
In-Reply-To: <20070208152409.GC31079@osc.edu>
References: <20070207223146.GA28637@osc.edu> <ada8xf9eddq.fsf@cisco.com>
	<20070208152409.GC31079@osc.edu>
Message-ID: <1170948731.3049.24.camel@stevo-desktop>

On Thu, 2007-02-08 at 10:24 -0500, Pete Wyckoff wrote:
> rdreier at cisco.com wrote on Wed, 07 Feb 2007 15:50 -0800:
> >     Pete> Before I dig into this anymore, do you expect this to work?
> >     Pete> Are there fundamental problems with QP sharing between user
> >     Pete> and kernel?  It would sure be nice not to have to stick the
> >     Pete> connection management aspects into the kernel.
> > 
> > No, I wouldn't expect this to work.  At first glance at least, yes,
> > there are fundamental problems.  Sharing a QP between user and
> > kernelspace, where userspace is doing full kernel bypass (as eg mthca
> > does -- there are NO system calls when doing post work request, poll
> > CQ and request CQ notification operations), seems like a huge
> > problem.  I don't see any way that the kernel can keep a consistent
> > view of the QP state unless userspace has to call into the kernel for
> > every operation, which would kill performance.
> 
> My hope was not to allow full QP sharing between user and kernel,
> but just a limited interface to "send this kernel data now".  It
> requires that the kernel register some physical memory, and submit
> the work requests.  Perhaps the kernel can invoke the equivalent of
> the userspace post function instead of trying to use the kernel API
> for sending.
> 

You could map the kernel data into the user process and then have the
user process post the WR.  But the user process would have to have that
memory registered as part of an MR to post it.  It could be done though.
So basically, instead of sharing the QP, give your kernel module access
to memory from a registered MR.  The kernel module can produce the data in
that memory and then tell the user process to post the WR...
 

Steve


From mst at mellanox.co.il  Thu Feb  8 07:34:10 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 8 Feb 2007 17:34:10 +0200
Subject: [openib-general] more comments on cxgb3
In-Reply-To: <1170946639.3049.10.camel@stevo-desktop>
References: <1170946639.3049.10.camel@stevo-desktop>
Message-ID: <20070208153410.GB6560@mellanox.co.il>

> > - It seems that by passing in huge resource sizes, userspace will be able to
> >   drink up unlimited amounts of kernel memory.
> >   mthca handles this by using the mlock rlimit, should something be done here
> >   as well?
> 
> Hmm.  That's a good point.  I'll put this on the todo as well.  So mthca
> adds to process's rlimit value as things are allocated out of kernel
> memory (or maybe even anything that gets pinned)?

Yes. The code is actually in uverbs core, mthca uses that.
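
Roughly, the check works like the sketch below: the pages about to be pinned
are added to what the process already has locked, and the total is compared
against RLIMIT_MEMLOCK. This is a user-space approximation only, not the
uverbs code; may_pin(), memlock_limit() and their arguments are hypothetical
names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: may `npages` more pages be pinned, given `locked`
 * pages already accounted to the process and a limit in bytes such as the
 * one returned by getrlimit(RLIMIT_MEMLOCK)?
 */
static bool may_pin(size_t locked, size_t npages, size_t page_size,
		    rlim_t limit_bytes)
{
	assert(page_size > 0);
	if (limit_bytes == RLIM_INFINITY)
		return true;

	rlim_t limit_pages = limit_bytes / page_size;

	/* written this way to avoid overflow in locked + npages */
	return locked <= limit_pages && npages <= limit_pages - locked;
}

/* Read the current soft limit; returns 0 on success, -1 on failure. */
static int memlock_limit(rlim_t *out)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -1;
	*out = rl.rlim_cur;
	return 0;
}
```

A driver that allocates queue memory on behalf of userspace would run the
equivalent of may_pin() before the allocation and fail the request otherwise.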

> > - I wonder about the names like get_mhp - what does "mhp" mean?
> > static inline struct iwch_mr *get_mhp(struct iwch_dev *rhp, u32 mmid)
> > {
> >         return idr_find(&rhp->mmidr, mmid);
> > }
> > 
> 
> Memory Handle Pointer.

hmm, what's a Handle? Maybe a better name can be found.

-- 
MST


From Arkady.Kanevsky at netapp.com  Thu Feb  8 07:43:16 2007
From: Arkady.Kanevsky at netapp.com (Kanevsky, Arkady)
Date: Thu, 8 Feb 2007 10:43:16 -0500
Subject: [openib-general] dapl broken for iWARP
Message-ID: <C98692FD98048C41885E0B0FACD9DFB803AFA4AF@exnane01.hq.netapp.com>

That is correct.
I am working with Krishna on it.
Expect patches soon.

By the way, the problem is not DAPL-specific,
and neither is the proposed solution.

There are 3 aspects to the solution.
The first is the APIs. We suggest that we do not augment these.
That is, a connection requestor sets its QP
RDMA ORD and IRD.
When the connection is established, the user can check the QP RDMA ORD and IRD
to see what is now available to use over the connection.
We may consider extending QP attributes to support passing
transport-specific parameters in the future,
for example, an iWARP MPA CRC request.

The second is the semantic that the CM provides.
The proposal is to match the IBCM semantic.
That is, the CM guarantees that local IRD is >= remote ORD.
This guarantees that incoming RDMA Read requests will not overwhelm
the QP RDMA Read capabilities.
Again, there are no changes to IBCM, only to IWCM.
Notice that as part of this, IWCM will pass the needed info down to the
driver and extract it from the driver.

The final part is an iWARP CM extension to exchange RDMA ORD and IRD.
This is similar to the IBTA Annex for IP Addressing.
The harder part is that this will eventually require an IETF MPA spec
extension, and that the MPA protocol is implemented in RNIC HW by many
vendors and hence cannot be done by IWCM itself.
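
The intended semantic can be sketched as follows: the responder clamps what
the peer asked for to its device maximum, and the requester then lowers its
own values to what the responder granted, so that IRD >= remote ORD holds in
both directions. This is an illustration only; the struct and function names
are hypothetical and are not IWCM or IBCM API.

```c
#include <assert.h>
#include <stdint.h>

struct rdma_read_limits {
	uint8_t ord;	/* outstanding RDMA Reads this side may initiate */
	uint8_t ird;	/* incoming RDMA Reads this side can service */
};

static uint8_t min_u8(uint8_t a, uint8_t b)
{
	return a < b ? a : b;
}

/* Responder: never grant more than the local device supports. */
static struct rdma_read_limits
responder_accept(struct rdma_read_limits requested_by_peer,
		 struct rdma_read_limits device_max)
{
	struct rdma_read_limits local;

	local.ird = min_u8(requested_by_peer.ord, device_max.ird);
	local.ord = min_u8(requested_by_peer.ird, device_max.ord);
	return local;
}

/* Requester: shrink its own values to what the responder granted. */
static struct rdma_read_limits
requester_finish(struct rdma_read_limits requested,
		 struct rdma_read_limits granted_by_peer)
{
	struct rdma_read_limits local;

	local.ord = min_u8(requested.ord, granted_by_peer.ird);
	local.ird = min_u8(requested.ird, granted_by_peer.ord);
	/* the guarantee from the proposal: remote IRD >= local ORD */
	assert(granted_by_peer.ird >= local.ord);
	return local;
}
```

After the exchange, each side can read back its final ORD and IRD from the QP,
which is why no API change is needed.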

Thanks,

Arkady Kanevsky                       email: arkady at netapp.com
Network Appliance Inc.               phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.        Fax: 781-895-1195
Waltham, MA 02451                   central phone: 781-768-5300
 

> -----Original Message-----
> From: Steve Wise [mailto:swise at opengridcomputing.com] 
> Sent: Wednesday, February 07, 2007 6:12 PM
> To: Arlin Davis
> Cc: openib-general
> Subject: Re: [openib-general] dapl broken for iWARP
> 
> On Wed, 2007-02-07 at 15:05 -0800, Arlin Davis wrote:
> > Steve Wise wrote:
> > 
> > >On Wed, 2007-02-07 at 14:02 -0600, Steve Wise wrote:
> > >  
> > >
> > >>Arlin,
> > >>
> > >>The OFED dapl code is assuming the responder_resources and 
> > >>initiator_depth passed up on a connection request event 
> are from the 
> > >>remote peer.  This doesn't happen for iWARP.  In the 
> current iWARP 
> > >>specifications, its up to the application to exchange this 
> > >>information somehow. So these are defaulting to 0 on the 
> server side 
> > >>of any dapl connection over iWARP.
> > >>
> > >>This is a fairly recent change, I think.  We need to come up with 
> > >>some way to deal with this for OFED 1.2 IMO.
> > >>    
> > >>
> > Yes, this was changed recently to sync up with the rdma_cm changes 
> > that exposed the values.
> > 
> > >>    
> > >>
> > >
> > >The IWCM could set these to the device max values for instance.
> > >  
> > >
> > That would work fine as long as you know the remote 
> settings will be 
> > equal or better. The provider just sets the min of local device max 
> > values and the remote values provided with the request.
> > 
> 
> I know Krishna Kumar is working on a solution for exchanging 
> this info in private data so the IWCM can "do the right 
> thing".  Stay tuned for a patch series to review for this.  
> But this functionality is definitely post OFED-1.2.  
> 
> 
> So for the OFED-1.2, I will set these to the device max in the IWCM.
> Assuming the other side is OFED 1.2 DAPL, then it will work fine.
> 
> Steve.
> 
> 
> 


From swise at opengridcomputing.com  Thu Feb  8 07:49:16 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 08 Feb 2007 09:49:16 -0600
Subject: [openib-general] more comments on cxgb3
In-Reply-To: <20070208064055.GR12140@mellanox.co.il>
References: <20070208064055.GR12140@mellanox.co.il>
Message-ID: <1170949756.3049.32.camel@stevo-desktop>

> - Consider a user that does e.g. create QP, but never calls mmap.
>   Is there some code that will clean out the unclamed mmap object?
>   I couldn't find it, and iwch_dealloc_ucontext does not seem to
>   do anything with it.

BTW: Here is my fix for this.

-----

Clean up pending mmaps on ucontext deallocation.

From: Steve Wise <swise at opengridcomputing.com>

Free all pending mmap structs when the ucontext is deallocated.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/iwch_provider.c |    1 +
 drivers/infiniband/hw/cxgb3/iwch_provider.h |   15 +++++++++++++++
 2 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index db2b0a8..98568ee 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -99,6 +99,7 @@ static int iwch_dealloc_ucontext(struct 
 	struct iwch_dev *rhp = to_iwch_dev(context->device);
 	struct iwch_ucontext *ucontext = to_iwch_ucontext(context);
 	PDBG("%s context %p\n", __FUNCTION__, context);
+	free_mmaps(ucontext);
 	cxio_release_ucontext(&rhp->rdev, &ucontext->uctx);
 	kfree(ucontext);
 	return 0;
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h
index 1ede8a7..c8c07ee 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.h
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h
@@ -199,6 +199,21 @@ struct iwch_mm_entry {
 	unsigned len;
 };
 
+static inline void free_mmaps(struct iwch_ucontext *ucontext)
+{
+	struct list_head *pos, *nxt;
+	struct iwch_mm_entry *mm;
+
+	spin_lock(&ucontext->mmap_lock);
+	list_for_each_safe(pos, nxt, &ucontext->mmaps) {
+		mm = list_entry(pos, struct iwch_mm_entry, entry);
+		list_del(&mm->entry);
+		kfree(mm);
+	}
+	spin_unlock(&ucontext->mmap_lock);
+	return;
+}
+
 static inline struct iwch_mm_entry *remove_mmap(struct iwch_ucontext *ucontext,
 						u64 addr, unsigned len)
 {
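
The pattern in free_mmaps() above, deleting entries while walking the list,
depends on saving the next pointer before each kfree(), which is what
list_for_each_safe does. A stand-alone user-space sketch of the same idea,
assuming a hypothetical singly linked mm_entry list, with free() in place of
kfree() and no locking:

```c
#include <assert.h>
#include <stdlib.h>

struct mm_entry {
	struct mm_entry *next;
	unsigned long addr;
	unsigned len;
};

/* Push a new pending entry at the head; returns 0 on success. */
static int add_mmap(struct mm_entry **head, unsigned long addr, unsigned len)
{
	struct mm_entry *mm;

	assert(head != NULL);
	mm = malloc(sizeof(*mm));
	if (!mm)
		return -1;
	mm->addr = addr;
	mm->len = len;
	mm->next = *head;
	*head = mm;
	return 0;
}

/* Free every pending entry; returns how many were released. */
static unsigned free_all_mmaps(struct mm_entry **head)
{
	struct mm_entry *pos, *nxt;
	unsigned freed = 0;

	assert(head != NULL);
	for (pos = *head; pos; pos = nxt) {
		nxt = pos->next;	/* save before pos is freed */
		free(pos);
		freed++;
	}
	*head = NULL;
	return freed;
}
```

Reading pos->next after free(pos) would be a use-after-free, which is the bug
the "safe" iterator avoids.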


From ossrosch at linux.vnet.ibm.com  Thu Feb  8 09:20:10 2007
From: ossrosch at linux.vnet.ibm.com (Stefan Roscher)
Date: Thu, 8 Feb 2007 18:20:10 +0100
Subject: [openib-general] [PATCH ofed-1.2] libehca: fix build error with
 disable-libcheck option
Message-ID: <200702081820.10992.ossrosch@linux.vnet.ibm.com>

This patch fixes libehca build errors when the disable-libcheck option is chosen.


Signed-off-by: Stefan Roscher <stefan.roscher at de.ibm.com>
---


configure.in     |    3 +--
src/ehca_uinit.c |    2 +-
2 files changed, 2 insertions(+), 3 deletions(-)


diff -Nurp libehca_old/configure.in libehca_new/configure.in
--- libehca_old/configure.in	2007-02-08 17:42:09.000000000 +0100
+++ libehca_new/configure.in	2007-02-08 17:18:20.000000000 +0100
@@ -28,6 +28,7 @@ AC_CHECK_LIB(ibverbs, 
 dnl Checks for header files.
 AC_CHECK_HEADER(infiniband/driver.h, [],
     AC_MSG_ERROR([<infiniband/driver.h> not found.  libehca requires libibverbs.]))
+fi
 
 dnl Checks for library functions
 AC_CHECK_FUNCS(ibv_read_sysfs_file ibv_register_driver)
@@ -43,7 +44,6 @@ rm -f $dummy.c
 AM_CONDITIONAL(HAVE_IBV_DEVICE_LIBRARY_EXTENSION,
     test $IBV_DEVICE_LIBRARY_EXTENSION != IBV_DEVICE_LIBRARY_EXTENSION)
 AC_SUBST(IBV_DEVICE_LIBRARY_EXTENSION)
-fi
 
 dnl Checks for programs.
 AC_PROG_CC
@@ -55,4 +55,3 @@ if test "$disable_libcheck" == "yes"
 then
 echo "#define HAVE_IBV_READ_SYSFS_FILE 1" >> config.h
 fi
-
diff -Nurp libehca_old/src/ehca_uinit.c libehca_new/src/ehca_uinit.c
--- libehca_old/src/ehca_uinit.c	2007-02-08 17:42:09.000000000 +0100
+++ libehca_new/src/ehca_uinit.c	2007-02-08 17:18:20.000000000 +0100
@@ -55,7 +55,7 @@
 #include <fcntl.h>
 #endif
 
-#ifdef HAVE_SYSFS_LIBSYSFS_H
+#ifndef HAVE_IBV_REGISTER_DRIVER
 #include <sysfs/libsysfs.h>
 #endif


From ossrosch at linux.vnet.ibm.com  Thu Feb  8 09:32:14 2007
From: ossrosch at linux.vnet.ibm.com (Stefan Roscher)
Date: Thu, 8 Feb 2007 18:32:14 +0100
Subject: [openib-general] [PATCH ofed-1.2] ofa_user.spec: fix libehca
	directory structure
Message-ID: <200702081832.14862.ossrosch@linux.vnet.ibm.com>

Correct the directory structure according to the new driver loading scheme from libibverbs.


Signed-off-by: Stefan Roscher <stefan.roscher at de.ibm.com>
---


--- ofa_user.spec_old	2007-02-08 09:03:33.000000000 -0800
+++ ofa_user.spec_new	2007-02-08 09:07:32.000000000 -0800
@@ -693,11 +693,11 @@ touch libosmvendor-devel-files
             /bin/ls -1 $RPM_BUILD_ROOT%{_libdir32}/ipathverbs*.a | sed -e "s@$RPM_BUILD_ROOT@@g" >> libipathverbs-devel-files
         fi
 
-        if ( /bin/ls $RPM_BUILD_ROOT%{_libdir32}/infiniband/libehca*so* > /dev/null 2>&1 ); then
-            /bin/ls -1 $RPM_BUILD_ROOT%{_libdir32}/infiniband/libehca*so* | sed -e "s@$RPM_BUILD_ROOT@@g" > libehca-files
+        if ( /bin/ls $RPM_BUILD_ROOT%{_libdir32}/libehca*so* > /dev/null 2>&1 ); then
+            /bin/ls -1 $RPM_BUILD_ROOT%{_libdir32}/libehca*so* | sed -e "s@$RPM_BUILD_ROOT@@g" > libehca-files
         fi
-        if ( /bin/ls $RPM_BUILD_ROOT%{_libdir32}/infiniband/libehca*.a > /dev/null 2>&1 ); then
-            /bin/ls -1 $RPM_BUILD_ROOT%{_libdir32}/infiniband/libehca*.a | sed -e "s@$RPM_BUILD_ROOT@@g" >> libehca-devel-files
+        if ( /bin/ls $RPM_BUILD_ROOT%{_libdir32}/libehca*.a > /dev/null 2>&1 ); then
+            /bin/ls -1 $RPM_BUILD_ROOT%{_libdir32}/libehca*.a | sed -e "s@$RPM_BUILD_ROOT@@g" >> libehca-devel-files
         fi
 
         if ( /bin/ls $RPM_BUILD_ROOT%{_libdir32}/libibcommon*so.* > /dev/null 2>&1 ); then
@@ -1165,14 +1165,14 @@ fi
 %if %{build_libehca}
 %files -n libehca -f libehca-files
 %defattr(-,root,root,-)
-%{_libdir}/infiniband/libehca*.so*
+%{_libdir}/libehca*.so*
 # %doc AUTHORS COPYING ChangeLog README
 %endif
 
 %if %{build_libehca_devel}
 %files -n libehca-devel -f libehca-devel-files
 %defattr(-,root,root,-)
-%{_libdir}/infiniband/libehca*.a
+%{_libdir}/libehca*.a
 %endif
 
 %if %{build_libsdp}
 

From sashak at voltaire.com  Thu Feb  8 09:45:53 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Thu, 8 Feb 2007 19:45:53 +0200
Subject: [openib-general] [Fwd: Re: win related [was: Re: [PATCH 1/2]
 opensm: sigusr1: syslog() fixes]]
In-Reply-To: <45B33135.4010606@dev.mellanox.co.il>
References: <45B33135.4010606@dev.mellanox.co.il>
Message-ID: <20070208174553.GT22807@sashak.voltaire.com>

On 11:24 Sun 21 Jan     , Yevgeny Kliteynik wrote:
> Tzachi, Yossi, please join the thread.
> What do you think about distributing a copy of the pthread DLL 
> with opensm?

Any news here? Thanks.

Sasha

> 
> -- Yevgeny.
> 
> -------- Original Message --------
> Subject: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: syslog() fixes]
> Date: Fri, 19 Jan 2007 00:20:32 +0200
> From: Sasha Khapyorsky <sashak at voltaire.com>
> To: Michael S. Tsirkin <mst at mellanox.co.il>
> CC: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>,        OPENIB <openib-general at openib.org>
> References: <20070118194403.GA23783 at sashak.voltaire.com> <20070118215023.GP9890 at mellanox.co.il>
> 
> On 23:50 Thu 18 Jan     , Michael S. Tsirkin wrote:
> > > Quoting Sasha Khapyorsky <sashak at voltaire.com>:
> > > Subject: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: syslog() fixes]
> > > 
> > > On 07:00 Thu 18 Jan     , Michael S. Tsirkin wrote:
> > > > > What about pure opensource - http://sourceware.org/pthreads-win32/? It
> > > > > is licensed under LGPL, I see on the net many positive reports about
> > > > > stability and usability.
> > > > 
> > > > I used it to do a windows port of linux complib at some point and opensm
> > > > seemed to work fine with it. What it was lacking at that point was
> > > > support for 64 bit applications, and for some reason (which is
> > > > still unclear to me) there was a strong desire to run opensm in 64 bit mode.
> > > > Seems to have been fixed now, BTW.
> > > 
> > > So this seems to be good option for OpenSM on Windows. Right?
> > 
> > No idea. Distributing a copy of the pthread DLL with opensm does not
> > look like a problem. But is it worth it?
> 
> Sure, it makes windows porting much more transparent and let us to
> use standard *nix stuff w/out #ifndef WIN32. Other (generic) benefit
> is that posix is more standard and powerful than wrappers like complib.
> 
> Sasha
> 


From mshefty at ichips.intel.com  Thu Feb  8 10:23:11 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 08 Feb 2007 10:23:11 -0800
Subject: [openib-general] Problem is routing CM REQ was: Use a GRH when
 appropriate for unicast packets
In-Reply-To: <1170894459.31538.23768.camel@hal.voltaire.com>
References: <20070126000319.GA12386@obsidianresearch.com>
	<ada4pqel66k.fsf@cisco.com>
	<20070126180840.GD12386@obsidianresearch.com>
	<45CA2084.7090503@ichips.intel.com>
	<20070207191154.GC11411@obsidianresearch.com>
	<45CA3568.1000508@ichips.intel.com>
	<20070207213108.GD11411@obsidianresearch.com>
	<45CA5573.80802@ichips.intel.com>
	<20070207224928.GF11411@obsidianresearch.com>
	<1170894459.31538.23768.camel@hal.voltaire.com>
Message-ID: <45CB6A8F.2030705@ichips.intel.com>

>>The active side clearly cannot learn what the SLID of the passive
>>side's router should be.
>>
>>We don't want to have the routers snoop and alter CM GMPs.
>>
>>The passive side cannot use information from the LRH to get the router
>>LID since the LRH may not be reversible.
>>
>>The only option seems to be to have the passive side do a path record
>>query on a SGID in the CM REQ...
>>
>>This is a spec problem unfortunately.
> 
> 
> Yes and I would expect that this would be changed.

Looking at the problem more, I think that the issue extends to the remote port 
LID as well.  My expectation with a local path record query is that the SLID is 
the local port, and the DLID is the local router.  This should be sufficient for 
one-way UD traffic, but for connected traffic we still need to discover the 
remote router and remote port LIDs.

I think that we need a way for the local node to query the remote SA to obtain 
this information.  Or we need a new path record for routable paths that includes 
this information.

- Sean


From tzachid at mellanox.co.il  Thu Feb  8 10:31:17 2007
From: tzachid at mellanox.co.il (Tzachi Dar)
Date: Thu, 8 Feb 2007 20:31:17 +0200
Subject: [openib-general] [Fwd: Re: win related [was: Re: [PATCH 1/2]
 opensm: sigusr1: syslog() fixes]]
Message-ID: <6C2C79E72C305246B504CBA17B5500C9C41DBF@mtlexch01.mtl.com>

The Windows open IB project has decided on using a BSD-only license.
The common implementation of pthreads is, as far as I know, LGPL, which
means that it cannot be used in open IB.

The only two ways that I see around this are 1) change the license of
open IB Windows, which might be complicated, or 2) find an
implementation of pthreads that is BSD-licensed.

Thanks
Tzachi

> -----Original Message-----
> From: Sasha Khapyorsky [mailto:sashak at voltaire.com] 
> Sent: Thursday, February 08, 2007 7:46 PM
> To: Tzachi Dar; Yossi Leybovich
> Cc: Yevgeny Kliteynik; OPENIB; Michael S. Tsirkin; Hal Rosenstock
> Subject: Re: [Fwd: Re: win related [was: Re: [PATCH 1/2] 
> opensm: sigusr1: syslog() fixes]]
> 
> On 11:24 Sun 21 Jan     , Yevgeny Kliteynik wrote:
> > Tzachi, Yossi, please join the thread.
> > What do you think about distributing a copy of the pthread DLL with 
> > opensm?
> 
> Any news here? Thanks.
> 
> Sasha
> 
> > 
> > -- Yevgeny.
> > 
> > -------- Original Message --------
> > Subject: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: 
> > syslog() fixes]
> > Date: Fri, 19 Jan 2007 00:20:32 +0200
> > From: Sasha Khapyorsky <sashak at voltaire.com>
> > To: Michael S. Tsirkin <mst at mellanox.co.il>
> > CC: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>,        
> OPENIB <openib-general at openib.org>
> > References: <20070118194403.GA23783 at sashak.voltaire.com> 
> > <20070118215023.GP9890 at mellanox.co.il>
> > 
> > On 23:50 Thu 18 Jan     , Michael S. Tsirkin wrote:
> > > > Quoting Sasha Khapyorsky <sashak at voltaire.com>:
> > > > Subject: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: 
> > > > syslog() fixes]
> > > > 
> > > > On 07:00 Thu 18 Jan     , Michael S. Tsirkin wrote:
> > > > > > What about pure opensource - 
> > > > > > http://sourceware.org/pthreads-win32/? It is licensed under 
> > > > > > LGPL, I see on the net many positive reports about 
> stability and usability.
> > > > > 
> > > > > I used it to do a windows port of linux complib at some point 
> > > > > and opensm seemed to work fine with it. What it was 
> lacking at 
> > > > > that point was support for 64 bit applications, and for some 
> > > > > reason (which is still unclear to me) there was a 
> strong desire to run opensm in 64 bit mode.
> > > > > Seems to have been fixed now, BTW.
> > > > 
> > > > So this seems to be good option for OpenSM on Windows. Right?
> > > 
> > > No idea. Distributing a copy of the pthread DLL with 
> opensm does not 
> > > look like a problem. But is it worth it?
> > 
> > Sure, it makes windows porting much more transparent and 
> let us to use 
> > standard *nix stuff w/out #ifndef WIN32. Other (generic) benefit is 
> > that posix is more standard and powerful than wrappers like complib.
> > 
> > Sasha
> > 
> 


From jgunthorpe at obsidianresearch.com  Thu Feb  8 11:08:09 2007
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Thu, 8 Feb 2007 12:08:09 -0700
Subject: [openib-general] Problem is routing CM REQ was: Use a GRH when
 appropriate for unicast packets
In-Reply-To: <45CB6A8F.2030705@ichips.intel.com>
References: <ada4pqel66k.fsf@cisco.com>
	<20070126180840.GD12386@obsidianresearch.com>
	<45CA2084.7090503@ichips.intel.com>
	<20070207191154.GC11411@obsidianresearch.com>
	<45CA3568.1000508@ichips.intel.com>
	<20070207213108.GD11411@obsidianresearch.com>
	<45CA5573.80802@ichips.intel.com>
	<20070207224928.GF11411@obsidianresearch.com>
	<1170894459.31538.23768.camel@hal.voltaire.com>
	<45CB6A8F.2030705@ichips.intel.com>
Message-ID: <20070208190809.GL11411@obsidianresearch.com>

On Thu, Feb 08, 2007 at 10:23:11AM -0800, Sean Hefty wrote:
> >>The active side clearly cannot learn what the SLID of the passive
> >>side's router should be.
> >>
> >>We don't want to have the routers snoop and alter CM GMPs.
> >>
> >>The passive side cannot use information from the LRH to get the router
> >>LID since the LRH may not be reversible.
> >>
> >>The only option seems to be to have the passive side do a path record
> >>query on a SGID in the CM REQ...
> >>
> >>This is a spec problem unfortunately.
> >
> >
> >Yes and I would expect that this would be changed.
> 
> Looking at the problem more, I think that the issue extends to the remote 
> port LID as well.  My expectation with a local path record query is that 
> the SLID is the local port, and the DLID is the local router.  This should 
> be sufficient for one-way UD traffic, but for connected traffic we still 
> need to discover the remote router and remote port LIDs.

Hum, you mean to meet the LID validation rules of 9.6.1.5? That is a
huge PITA..

[IMHO, 9.6.1.5 C9-54 is a mistake, if there is a GRH then the LRH.SLID
 should not be validated against the QP context since it makes it
 extra hard for multipath routing and QoS to work...]

Here is one thought on how to do this:
To meet this rule each side of the CM must take the SLID from
the incoming LRH as the DLID for the connection. This SLID will be
one of the SLIDs for the local router. The other side doesn't need to
know what it is. The passive side will get the router SLID from the
REQ and the active side gets it from the ACK.

The passive side is easy, it just path record queries the DGID and
requests the DLID == the incoming LRH.SLID.

The nasty problem is with the active side - CMA will select a router
LID it uses as the DLID and the router may select a different LID for
it to use as the SLID when it processes the ACK. By C9-54 they have to
be the same :< So the active side might have to do another path record
query to move its DLID and SL to match the router's chosen
SLID. Double suck :P

Overarching all of this is some mechanism where the SM and all the
routers collaborate to keep the router SLID the same for the duration
of every RC flow. (One simple way would be to have the SM encode the
SLID it wants the router to pick in the Flow Label or TClass..)

Suck.

Another idea would be to encode the local router SLID in the flow
label and have the CM exchange and use asymmetric flow labels.. That
would move control over SL selection into the endpoints and remove the
possible 2nd path record query from the active side - but I haven't
checked whether the CM can exchange flow labels in the ACK..
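
To make the flow label idea concrete: the GRH FlowLabel is 20 bits and a LID
is 16 bits, so a LID fits with 4 bits to spare. A rough sketch follows; the
layout (LID in the low 16 bits) and the function names are purely an
assumption for illustration, nothing the spec defines:

```c
#include <assert.h>
#include <stdint.h>

#define FLOW_LABEL_MASK	0xfffffu	/* GRH FlowLabel is 20 bits */
#define LID_MASK	0xffffu		/* LIDs are 16 bits */

/* Hypothetical layout: bits [19:16] spare, bits [15:0] router LID. */
static uint32_t flow_label_encode(uint16_t router_lid, uint8_t spare)
{
	assert(spare <= 0xf);
	return (((uint32_t)spare << 16) | router_lid) & FLOW_LABEL_MASK;
}

/* Recover the router LID from a flow label built by flow_label_encode(). */
static uint16_t flow_label_router_lid(uint32_t flow_label)
{
	return (uint16_t)(flow_label & LID_MASK);
}
```

With asymmetric labels, each endpoint would encode its own local router LID,
since that LID only means something on its own subnet.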

> I think that we need a way for the local node to query the remote SA to 
> obtain this information.  Or we need a new path record for routable paths 
> that includes this information.

Being able to query doesn't really help matters since you still can't
tell the router what SLID to use.. The main idea is that the router
lid is only useful to the endpoint on the same subnet so there is no
reason to make the non-local side fetch it.

Jason


From sashak at voltaire.com  Thu Feb  8 11:46:56 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Thu, 8 Feb 2007 21:46:56 +0200
Subject: [openib-general] [Fwd: Re: win related [was: Re: [PATCH 1/2]
 opensm: sigusr1: syslog() fixes]]
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C9C41DBF@mtlexch01.mtl.com>
References: <6C2C79E72C305246B504CBA17B5500C9C41DBF@mtlexch01.mtl.com>
Message-ID: <20070208194656.GV22807@sashak.voltaire.com>

On 20:31 Thu 08 Feb     , Tzachi Dar wrote:
> The windows open IB has decided on using a BSD only license. 
> The common implementation of pthreads as far as I know is LGPL, which
> means that it can not be used in open IB.

Why not? AFAIK it works perfectly (see (5,6 and Preamble)):
http://www.gnu.org/copyleft/lesser.html

And of course there are tons of examples where BSD software links against
LGPLed glibc.

> The only two ways that I see around this are 1) Change the license of
> open IB windows which might be a complicated thing. 2) Find an
> implementation of pthreads that is BSD.

BTW, just wondering... What is the relation between windows open IB and OFA
(and OFA's "dual-license rule")?

Sasha

> 
> Thanks
> Tzachi
> 
> > -----Original Message-----
> > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] 
> > Sent: Thursday, February 08, 2007 7:46 PM
> > To: Tzachi Dar; Yossi Leybovich
> > Cc: Yevgeny Kliteynik; OPENIB; Michael S. Tsirkin; Hal Rosenstock
> > Subject: Re: [Fwd: Re: win related [was: Re: [PATCH 1/2] 
> > opensm: sigusr1: syslog() fixes]]
> > 
> > On 11:24 Sun 21 Jan     , Yevgeny Kliteynik wrote:
> > > Tzachi, Yossi, please join the thread.
> > > What do you think about distributing a copy of the pthread DLL with 
> > > opensm?
> > 
> > Any news here? Thanks.
> > 
> > Sasha
> > 
> > > 
> > > -- Yevgeny.
> > > 
> > > -------- Original Message --------
> > > Subject: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: 
> > > syslog() fixes]
> > > Date: Fri, 19 Jan 2007 00:20:32 +0200
> > > From: Sasha Khapyorsky <sashak at voltaire.com>
> > > To: Michael S. Tsirkin <mst at mellanox.co.il>
> > > CC: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>,        
> > OPENIB <openib-general at openib.org>
> > > References: <20070118194403.GA23783 at sashak.voltaire.com> 
> > > <20070118215023.GP9890 at mellanox.co.il>
> > > 
> > > On 23:50 Thu 18 Jan     , Michael S. Tsirkin wrote:
> > > > > Quoting Sasha Khapyorsky <sashak at voltaire.com>:
> > > > > Subject: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: 
> > > > > syslog() fixes]
> > > > > 
> > > > > On 07:00 Thu 18 Jan     , Michael S. Tsirkin wrote:
> > > > > > > What about pure opensource - 
> > > > > > > http://sourceware.org/pthreads-win32/? It is licensed under 
> > > > > > > LGPL, I see on the net many positive reports about 
> > stability and usability.
> > > > > > 
> > > > > > I used it to do a windows port of linux complib at some point 
> > > > > > and opensm seemed to work fine with it. What it was 
> > lacking at 
> > > > > > that point was support for 64 bit applications, and for some 
> > > > > > reason (which is still unclear to me) there was a 
> > strong desire to run opensm in 64 bit mode.
> > > > > > Seems to have been fixed now, BTW.
> > > > > 
> > > > > So this seems to be good option for OpenSM on Windows. Right?
> > > > 
> > > > No idea. Distributing a copy of the pthread DLL with 
> > opensm does not 
> > > > look like a problem. But is it worth it?
> > > 
> > > Sure, it makes windows porting much more transparent and 
> > let us to use 
> > > standard *nix stuff w/out #ifndef WIN32. Other (generic) benefit is 
> > > that posix is more standard and powerful than wrappers like complib.
> > > 
> > > Sasha
> > > 
> > 


From rdreier at cisco.com  Thu Feb  8 11:47:22 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 08 Feb 2007 11:47:22 -0800
Subject: [openib-general] more comments on cxgb3
In-Reply-To: <1170949756.3049.32.camel@stevo-desktop> (Steve Wise's
	message of "Thu, 08 Feb 2007 09:49:16 -0600")
References: <20070208064055.GR12140@mellanox.co.il>
	<1170949756.3049.32.camel@stevo-desktop>
Message-ID: <ada4ppwo2id.fsf@cisco.com>

 > diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
 > index db2b0a8..98568ee 100644
 > --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
 > +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
 > @@ -99,6 +99,7 @@ static int iwch_dealloc_ucontext(struct 
 >  	struct iwch_dev *rhp = to_iwch_dev(context->device);
 >  	struct iwch_ucontext *ucontext = to_iwch_ucontext(context);
 >  	PDBG("%s context %p\n", __FUNCTION__, context);
 > +	free_mmaps(ucontext);
 >  	cxio_release_ucontext(&rhp->rdev, &ucontext->uctx);
 >  	kfree(ucontext);
 >  	return 0;
 > diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h
 > index 1ede8a7..c8c07ee 100644
 > --- a/drivers/infiniband/hw/cxgb3/iwch_provider.h
 > +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h
 > @@ -199,6 +199,21 @@ struct iwch_mm_entry {
 >  	unsigned len;
 >  };
 >  
 > +static inline void free_mmaps(struct iwch_ucontext *ucontext)
 > +{
 > +	struct list_head *pos, *nxt;
 > +	struct iwch_mm_entry *mm;
 > +
 > +	spin_lock(&ucontext->mmap_lock);
 > +	list_for_each_safe(pos, nxt, &ucontext->mmaps) {
 > +		mm = list_entry(pos, struct iwch_mm_entry, entry);
 > +		list_del(&mm->entry);
 > +		kfree(mm);
 > +	}
 > +	spin_unlock(&ucontext->mmap_lock);
 > +	return;
 > +}

Since you only have one caller, I would suggest just open-coding the
deletion at the call-site (since that function is really too big to
inline if it ever grows another caller).  And I don't think you need
the locking either, since there better be no one else looking at the
context structure while you're in the process of freeing it.

Something like:

 	struct iwch_dev *rhp = to_iwch_dev(context->device);
 	struct iwch_ucontext *ucontext = to_iwch_ucontext(context);
	struct iwch_mm_entry *mm, *tmp;

 	PDBG("%s context %p\n", __FUNCTION__, context);
	list_for_each_entry_safe(mm, tmp, &ucontext->mmaps, entry)
		kfree(mm);
 	cxio_release_ucontext(&rhp->rdev, &ucontext->uctx);
 	kfree(ucontext);
 	return 0;
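
For anyone who wants to play with the pattern outside the kernel, a
minimal userspace sketch follows. It is only an approximation: the list
helpers are hand-rolled stand-ins for <linux/list.h>, and struct mm_entry,
entry_of(), free_all_entries() and add_entries() are hypothetical names
used for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Hypothetical entry type; only the embedded list node matters here. */
struct mm_entry {
	unsigned long addr;
	struct list_head entry;
};

#define entry_of(ptr) \
	((struct mm_entry *)((char *)(ptr) - offsetof(struct mm_entry, entry)))

/*
 * Free every entry on the list and return how many were freed.
 * The next pointer is saved before free(), which is the point of the
 * "_safe" iterators: the current node must not be touched once it has
 * been released.
 */
static int free_all_entries(struct list_head *head)
{
	struct list_head *pos, *nxt;
	int freed = 0;

	assert(head != NULL);
	for (pos = head->next; pos != head; pos = nxt) {
		nxt = pos->next;
		free(entry_of(pos));
		freed++;
	}
	list_init(head);	/* leave the head empty and reusable */
	return freed;
}

/* Helper for illustration: allocate and append n entries. */
static int add_entries(struct list_head *head, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		struct mm_entry *mm = malloc(sizeof(*mm));

		if (!mm)
			return -1;
		mm->addr = (unsigned long)i;
		list_add_tail(&mm->entry, head);
	}
	return 0;
}
```

No list_del() is needed per entry when the whole list is being torn down,
as long as nothing walks the list afterwards.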

 - R.


From sean.hefty at intel.com  Thu Feb  8 11:54:38 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Thu, 8 Feb 2007 11:54:38 -0800
Subject: [openib-general] Problem is routing CM REQ was: Use a GRH when
 appropriate for unicast packets
In-Reply-To: <20070208190809.GL11411@obsidianresearch.com>
Message-ID: <000201c74bba$f72b7890$e598070a@amr.corp.intel.com>

>Hum, you mean to meet the LID validation rules of 9.6.1.5? That is a
>huge PITA..
>
>[IMHO, 9.6.1.5 C9-54 is a mistake, if there is a GRH then the LRH.SLID
> should not be validated against the QP context since it makes it
> extra hard for multipath routing and QoS to work...]

Yes - this gets messy.

>Here is one thought on how to do this:
>To meet this rule each side of the CM must take the SLID from
>the incoming LRH as the DLID for the connection. This SLID will be
>one of the SLIDs for the local router. The other side doesn't need to
>know what it is. The passive side will get the router SLID from the
>REQ and the active side gets it from the ACK.
>
>The passive side is easy, it just path record queries the DGID and
>requests the DLID == the incoming LRH.SLID.

This requires that the passive side be able to issue path record queries, but I
think that it could work for static routes.  A point was made to me that the
remote side could be a TCA without query capabilities.

There's still the issue of what value is carried in the remote port LID in the
CM REQ (12.7.21), and I haven't even gotten to APM yet...

>The nasty problem is with the active side - CMA will select a router
>lid it uses as the DLID and the router may select a different LID for
>it to use as the SLID when it processes the ACK. By C9-54 they have to
>be the same :< So the active side might have to do another path record
>query to move its DLID and SL to match the router's chosen
>SLID. Double suck :P

As long as the SA and local routers are in sync, we may be okay here without a
second path record query.

- Sean


From hnguyen at linux.vnet.ibm.com  Thu Feb  8 12:00:51 2007
From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen)
Date: Thu, 8 Feb 2007 21:00:51 +0100
Subject: [openib-general] [PATCH ofed-1.2] libehca: fix build error with
 disable-libcheck option
In-Reply-To: <200702081820.10992.ossrosch@linux.vnet.ibm.com>
References: <200702081820.10992.ossrosch@linux.vnet.ibm.com>
Message-ID: <200702082100.51705.hnguyen@linux.vnet.ibm.com>

> This patch fixes libehca build errors if the disable-libcheck option is chosen.
Applied


From swise at opengridcomputing.com  Thu Feb  8 12:26:38 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 08 Feb 2007 14:26:38 -0600
Subject: [openib-general] [PATCH 2/5] No need to disable interrupts for mmap
	locking.
In-Reply-To: <20070208202634.4382.15287.stgit@dell3.ogc.int>
References: <20070208202634.4382.15287.stgit@dell3.ogc.int>
Message-ID: <20070208202638.4382.98241.stgit@dell3.ogc.int>

From: Steve Wise <swise at opengridcomputing.com>

Lock mmap_lock is never taken from non-process context, so just use
bare spin_lock()/spin_unlock().

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/iwch_provider.h |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h
index a8cfeaf..1ede8a7 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.h
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h
@@ -205,29 +205,29 @@ static inline struct iwch_mm_entry *remo
 	struct list_head *pos, *nxt;
 	struct iwch_mm_entry *mm;
 
-	spin_lock_irq(&ucontext->mmap_lock);
+	spin_lock(&ucontext->mmap_lock);
 	list_for_each_safe(pos, nxt, &ucontext->mmaps) {
 
 		mm = list_entry(pos, struct iwch_mm_entry, entry);
 		if (mm->addr == addr && mm->len == len) {
 			list_del_init(&mm->entry);
-			spin_unlock_irq(&ucontext->mmap_lock);
+			spin_unlock(&ucontext->mmap_lock);
 			PDBG("%s addr 0x%llx len %d\n", __FUNCTION__, mm->addr,
 			     mm->len);
 			return mm;
 		}
 	}
-	spin_unlock_irq(&ucontext->mmap_lock);
+	spin_unlock(&ucontext->mmap_lock);
 	return NULL;
 }
 
 static inline void insert_mmap(struct iwch_ucontext *ucontext,
 			       struct iwch_mm_entry *mm)
 {
-	spin_lock_irq(&ucontext->mmap_lock);
+	spin_lock(&ucontext->mmap_lock);
 	PDBG("%s addr 0x%llx len %d\n", __FUNCTION__, mm->addr, mm->len);
 	list_add_tail(&mm->entry, &ucontext->mmaps);
-	spin_unlock_irq(&ucontext->mmap_lock);
+	spin_unlock(&ucontext->mmap_lock);
 }
 
 enum iwch_qp_attr_mask {


From swise at opengridcomputing.com  Thu Feb  8 12:26:34 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 08 Feb 2007 14:26:34 -0600
Subject: [openib-general] [PATCH  0/5] iw_cxgb3 - misc cleanup and fixes
Message-ID: <20070208202634.4382.15287.stgit@dell3.ogc.int>


Here are some fixes to address various comments from Michael and Roland.

This is _not_ for ofed_1_2, but rather for merging into 2.6.21.


Steve.


From swise at opengridcomputing.com  Thu Feb  8 12:26:40 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 08 Feb 2007 14:26:40 -0600
Subject: [openib-general] [PATCH 3/5] Clean up pending mmaps on ucontext
	deallocation.
In-Reply-To: <20070208202634.4382.15287.stgit@dell3.ogc.int>
References: <20070208202634.4382.15287.stgit@dell3.ogc.int>
Message-ID: <20070208202640.4382.90592.stgit@dell3.ogc.int>

From: Steve Wise <swise at opengridcomputing.com>

Free all pending mmap structs when the ucontext is deallocated.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/iwch_provider.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index db2b0a8..85484ac 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -98,7 +98,11 @@ static int iwch_dealloc_ucontext(struct 
 {
 	struct iwch_dev *rhp = to_iwch_dev(context->device);
 	struct iwch_ucontext *ucontext = to_iwch_ucontext(context);
+	struct iwch_mm_entry *mm, *tmp;
+
 	PDBG("%s context %p\n", __FUNCTION__, context);
+	list_for_each_entry_safe(mm, tmp, &ucontext->mmaps, entry)
+		kfree(mm);
 	cxio_release_ucontext(&rhp->rdev, &ucontext->uctx);
 	kfree(ucontext);
 	return 0;


From swise at opengridcomputing.com  Thu Feb  8 12:26:42 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 08 Feb 2007 14:26:42 -0600
Subject: [openib-general] [PATCH 4/5] Get rid of static rdev table.
In-Reply-To: <20070208202634.4382.15287.stgit@dell3.ogc.int>
References: <20070208202634.4382.15287.stgit@dell3.ogc.int>
Message-ID: <20070208202642.4382.43612.stgit@dell3.ogc.int>

From: Steve Wise <swise at opengridcomputing.com>

Use a linked list.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/core/cxio_hal.c |   57 +++++++++------------------
 drivers/infiniband/hw/cxgb3/core/cxio_hal.h |    2 -
 2 files changed, 19 insertions(+), 40 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c
index 2c4e351..acffe16 100644
--- a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c
+++ b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c
@@ -43,49 +43,28 @@ #include "cxio_hal.h"
 #include "cxgb3_offload.h"
 #include "sge_defs.h"
 
-static struct cxio_rdev *rdev_tbl[T3_MAX_NUM_RNIC];
+static LIST_HEAD(rdev_list);
 static cxio_hal_ev_callback_func_t cxio_ev_cb = NULL;
 
 static inline struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name)
 {
-	int i;
-	for (i = 0; i < T3_MAX_NUM_RNIC; i++)
-		if (rdev_tbl[i])
-			if (!strcmp(rdev_tbl[i]->dev_name, dev_name))
-				return rdev_tbl[i];
+	struct cxio_rdev *rdev;
+
+	list_for_each_entry(rdev, &rdev_list, entry)
+		if (!strcmp(rdev->dev_name, dev_name))
+			return rdev;
 	return NULL;
 }
 
 static inline struct cxio_rdev *cxio_hal_find_rdev_by_t3cdev(struct t3cdev
 							     *tdev)
 {
-	int i;
-	for (i = 0; i < T3_MAX_NUM_RNIC; i++)
-		if (rdev_tbl[i])
-			if (rdev_tbl[i]->t3cdev_p == tdev)
-				return rdev_tbl[i];
-	return NULL;
-}
-
-static inline int cxio_hal_add_rdev(struct cxio_rdev *rdev_p)
-{
-	int i;
-	for (i = 0; i < T3_MAX_NUM_RNIC; i++)
-		if (!rdev_tbl[i]) {
-			rdev_tbl[i] = rdev_p;
-			break;
-		}
-	return (i == T3_MAX_NUM_RNIC);
-}
+	struct cxio_rdev *rdev;
 
-static inline void cxio_hal_delete_rdev(struct cxio_rdev *rdev_p)
-{
-	int i;
-	for (i = 0; i < T3_MAX_NUM_RNIC; i++)
-		if (rdev_tbl[i] == rdev_p) {
-			rdev_tbl[i] = NULL;
-			break;
-		}
+	list_for_each_entry(rdev, &rdev_list, entry)
+		if (rdev->t3cdev_p == tdev)
+			return rdev;
+	return NULL;
 }
 
 int cxio_hal_cq_op(struct cxio_rdev *rdev_p, struct t3_cq *cq,
@@ -937,8 +916,7 @@ int cxio_rdev_open(struct cxio_rdev *rde
 		return -EINVAL;
 	}
 
-	if (cxio_hal_add_rdev(rdev_p))
-		return -ENOMEM;
+	list_add_tail(&rdev_p->entry, &rdev_list);
 
 	PDBG("%s opening rnic dev %s\n", __FUNCTION__, rdev_p->dev_name);
 	memset(&rdev_p->ctrl_qp, 0, sizeof(rdev_p->ctrl_qp));
@@ -1018,7 +996,7 @@ err3:
 err2:
 	cxio_hal_destroy_ctrl_qp(rdev_p);
 err1:
-	cxio_hal_delete_rdev(rdev_p);
+	list_del(&rdev_p->entry);
 	return err;
 }
 
@@ -1027,7 +1005,7 @@ void cxio_rdev_close(struct cxio_rdev *r
 	if (rdev_p) {
 		cxio_hal_pblpool_destroy(rdev_p);
 		cxio_hal_rqtpool_destroy(rdev_p);
-		cxio_hal_delete_rdev(rdev_p);
+		list_del(&rdev_p->entry);
 		rdev_p->t3cdev_p->ulp = NULL;
 		cxio_hal_destroy_ctrl_qp(rdev_p);
 		cxio_hal_destroy_resource(rdev_p->rscp);
@@ -1038,7 +1016,6 @@ int __init cxio_hal_init(void)
 {
 	if (cxio_hal_init_rhdl_resource(T3_MAX_NUM_RI))
 		return -ENOMEM;
-	memset(rdev_tbl, 0, T3_MAX_NUM_RNIC * sizeof(void *));
 	t3_register_cpl_handler(CPL_ASYNC_NOTIF, cxio_hal_ev_handler);
 	return 0;
 }
@@ -1046,9 +1023,11 @@ int __init cxio_hal_init(void)
 void __exit cxio_hal_exit(void)
 {
 	int i;
+	struct cxio_rdev *rdev, *tmp;
+
 	t3_register_cpl_handler(CPL_ASYNC_NOTIF, NULL);
-	for (i = 0; i < T3_MAX_NUM_RNIC; i++)
-		cxio_rdev_close(rdev_tbl[i]);
+	list_for_each_entry_safe(rdev, tmp, &rdev_list, entry)
+		cxio_rdev_close(rdev);
 	cxio_hal_destroy_rhdl_resource();
 }
 
diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.h b/drivers/infiniband/hw/cxgb3/core/cxio_hal.h
index d5ae282..8fb2999 100644
--- a/drivers/infiniband/hw/cxgb3/core/cxio_hal.h
+++ b/drivers/infiniband/hw/cxgb3/core/cxio_hal.h
@@ -47,7 +47,6 @@ #define T3_CTRL_QP_SIZE_LOG2  8
 #define T3_CTRL_CQ_ID    0
 
 /* TBD */
-#define T3_MAX_NUM_RNIC  8
 #define T3_MAX_NUM_RI (1<<15)
 #define T3_MAX_NUM_QP (1<<15)
 #define T3_MAX_NUM_CQ (1<<15)
@@ -106,6 +105,7 @@ struct cxio_rdev {
 	struct cxio_ucontext uctx;
 	struct gen_pool *pbl_pool;
 	struct gen_pool *rqt_pool;
+	struct list_head entry;
 };
 
 static inline int cxio_num_stags(struct cxio_rdev *rdev_p)
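
Outside the kernel, the shape of this change can be sketched roughly as
below: a registry kept on an intrusive doubly linked list, so there is no
fixed slot count and adding a device cannot fail for lack of a free slot.
This is an illustration only; struct rdev, rdev_add(), rdev_del() and
rdev_find_by_name() are hypothetical names, not the driver's.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device record with intrusive list links. */
struct rdev {
	const char *dev_name;
	struct rdev *next, *prev;
};

/* Sentinel head: an empty list points back at itself. */
static struct rdev rdev_list = { NULL, &rdev_list, &rdev_list };

/* Append at the tail; unlike a fixed table this has no "table full" case. */
static void rdev_add(struct rdev *r)
{
	r->prev = rdev_list.prev;
	r->next = &rdev_list;
	rdev_list.prev->next = r;
	rdev_list.prev = r;
}

/* Unlink in O(1) without searching for the slot first. */
static void rdev_del(struct rdev *r)
{
	r->prev->next = r->next;
	r->next->prev = r->prev;
	r->next = r->prev = NULL;
}

/* Linear lookup by name, returning NULL when nothing matches. */
static struct rdev *rdev_find_by_name(const char *name)
{
	struct rdev *r;

	assert(name != NULL);
	for (r = rdev_list.next; r != &rdev_list; r = r->next)
		if (!strcmp(r->dev_name, name))
			return r;
	return NULL;
}
```

Note that the sketch does no locking of its own; callers are assumed to
serialize access to the list.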


From swise at opengridcomputing.com  Thu Feb  8 12:26:44 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 08 Feb 2007 14:26:44 -0600
Subject: [openib-general] [PATCH 5/5] Hold the iwch device mutex around
	cxio_rdev_open().
In-Reply-To: <20070208202634.4382.15287.stgit@dell3.ogc.int>
References: <20070208202634.4382.15287.stgit@dell3.ogc.int>
Message-ID: <20070208202644.4382.75136.stgit@dell3.ogc.int>

From: Steve Wise <swise at opengridcomputing.com>

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/iwch.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch.c b/drivers/infiniband/hw/cxgb3/iwch.c
index 0c95f2c..c353a9b 100644
--- a/drivers/infiniband/hw/cxgb3/iwch.c
+++ b/drivers/infiniband/hw/cxgb3/iwch.c
@@ -119,7 +119,10 @@ static void open_rnic_dev(struct t3cdev 
 	rnicp->rdev.ulp = rnicp;
 	rnicp->rdev.t3cdev_p = tdev;
 
+	mutex_lock(&dev_mutex);
+
 	if (cxio_rdev_open(&rnicp->rdev)) {
+		mutex_unlock(&dev_mutex);
 		printk(KERN_ERR MOD "Unable to open CXIO rdev\n");
 		ib_dealloc_device(&rnicp->ibdev);
 		return;
@@ -127,7 +130,6 @@ static void open_rnic_dev(struct t3cdev 
 
 	rnic_init(rnicp);
 
-	mutex_lock(&dev_mutex);
 	list_add_tail(&rnicp->entry, &dev_list);
 	mutex_unlock(&dev_mutex);
 
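The locking shape used here can be sketched in userspace with pthreads:
take the mutex before the open step, drop it on the failure path, and keep
holding it until the device is on the list. This is only a sketch under
assumptions; struct dev, open_dev(), register_dev() and count_devs() are
hypothetical names, and open_dev() merely stands in for the real open
routine.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical device record. */
struct dev {
	int broken;		/* set to make the stand-in open fail */
	struct dev *next;
};

static pthread_mutex_t dev_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct dev *dev_list;	/* protected by dev_mutex */

/* Hypothetical stand-in for the open routine: 0 on success, -1 on error. */
static int open_dev(struct dev *d)
{
	return d->broken ? -1 : 0;
}

/*
 * Open a device and publish it on the list as one critical section.
 * Every return path releases the mutex exactly once.
 */
static int register_dev(struct dev *d)
{
	int err;

	assert(d != NULL);
	pthread_mutex_lock(&dev_mutex);
	err = open_dev(d);
	if (err) {
		pthread_mutex_unlock(&dev_mutex);
		return err;
	}
	d->next = dev_list;
	dev_list = d;
	pthread_mutex_unlock(&dev_mutex);
	return 0;
}

/* Count registered devices under the same lock. */
static int count_devs(void)
{
	struct dev *d;
	int n = 0;

	pthread_mutex_lock(&dev_mutex);
	for (d = dev_list; d; d = d->next)
		n++;
	pthread_mutex_unlock(&dev_mutex);
	return n;
}
```

A device that fails to open never becomes visible on the list, and the
early return leaves the mutex released.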

From halr at voltaire.com  Thu Feb  8 12:39:44 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 08 Feb 2007 15:39:44 -0500
Subject: [openib-general] Problem is routing CM REQ was: Use a GRH when
 appropriate for unicast packets
In-Reply-To: <000201c74bba$f72b7890$e598070a@amr.corp.intel.com>
References: <000201c74bba$f72b7890$e598070a@amr.corp.intel.com>
Message-ID: <1170967182.31538.96962.camel@hal.voltaire.com>

On Thu, 2007-02-08 at 14:54, Sean Hefty wrote:
> >Hum, you mean to meet the LID validation rules of 9.6.1.5? That is a
> >huge PITA..
> >
> >[IMHO, 9.6.1.5 C9-54 is a mistake, if there is a GRH then the LRH.SLID
> > should not be validated against the QP context since it makes it
> > extra hard for multipath routing and QoS to work...]
> 
> Yes - this gets messy.
> 
> >Here is one thought on how to do this:
> >To meet this rule each side of the CM must take the SLID from
> >the incoming LRH as the DLID for the connection. This SLID will be
> >one of the SLIDs for the local router. The other side doesn't need to
> >know what it is. The passive side will get the router SLID from the
> >REQ and the active side gets it from the ACK.
> >
> >The passive side is easy, it just path record queries the DGID and
> >requests the DLID == the incoming LRH.SLID.
> 
> This requires that the passive side be able to issue path record queries, but I
> think that it could work for static routes.  A point was made to me that the
> remote side could be a TCA without query capabilities.

Are you referring to SA query capabilities ? Would such a device just be
expected to work without change in an IB routed environment anyway ?

-- Hal

> 
> There's still the issue of what value is carried in the remote port LID in the
> CM REQ (12.7.21), and I haven't even gotten to APM yet...
> 
> >The nasty problem is with the active side - CMA will select a router
> >lid it uses as the DLID and the router may select a different LID for
> >it to use as the SLID when it processes the ACK. By C9-54 they have to
> >be the same :< So the active side might have to do another path record
> >query to move its DLID and SL to match the router's chosen
> >SLID. Double suck :P
> 
> As long as the SA and local routers are in sync, we may be okay here without a
> second path record query.
> 
> - Sean
> 
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> 


From tziporet at mellanox.co.il  Thu Feb  8 13:18:06 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Thu, 08 Feb 2007 23:18:06 +0200
Subject: [openib-general] new OFED 1.2 package
Message-ID: <45CB938E.5040305@mellanox.co.il>

A new OFED package was uploaded to the OFA server:
http://www.openfabrics.org/builds/ofed-1.2/OFED-1.2-20070208-1508.tgz

Many of the issues reported on the previous version are resolved 
(bugzilla will be updated next week).

Because of lab restructuring, we ran only basic tests on RHEL up4 and
SLES10 (x86 and x86_64).

All - we are going for our weekend now.
Please report all issues you encounter so we will be able to fix and do 
the alpha release on Monday.

Thanks,
Tziporet & Vlad


From tzachid at mellanox.co.il  Thu Feb  8 13:24:08 2007
From: tzachid at mellanox.co.il (Tzachi Dar)
Date: Thu, 8 Feb 2007 23:24:08 +0200
Subject: [openib-general] [Fwd: Re: win related [was: Re: [PATCH 1/2]
 opensm: sigusr1: syslog() fixes]]
Message-ID: <6C2C79E72C305246B504CBA17B5500C9C41DD5@mtlexch01.mtl.com>

See below.

Thanks
Tzachi 

> -----Original Message-----
> From: Sasha Khapyorsky [mailto:sashak at voltaire.com] 
> Sent: Thursday, February 08, 2007 9:47 PM
> To: Tzachi Dar
> Cc: Yossi Leybovich; Gilad Shainer; Yevgeny Kliteynik; 
> OPENIB; Michael S. Tsirkin; Hal Rosenstock
> Subject: Re: [Fwd: Re: win related [was: Re: [PATCH 1/2] 
> opensm: sigusr1: syslog() fixes]]
> 
> On 20:31 Thu 08 Feb     , Tzachi Dar wrote:
> > The windows open IB has decided on using a BSD only license. 
> > The common implementation of pthreads as far as I know is 
> LGPL, which 
> > means that it can not be used in open IB.
> 
> Why not? AFAIK it works perfectly (see (5,6 and Preamble)):
> http://www.gnu.org/copyleft/lesser.html
> 
> And of course there are tons of examples when BSD software 
> links against LGPLed glibc.

I could of course write you an answer more than 5 pages long about why *I*
don't think that using GPL software is bad for everyone, but I guess that my
opinion doesn't really matter, so I won't do it.
The page that you referenced is from the GNU org, and even there it is hard
to say that they are trying to encourage you to use the LGPL license. In any
case, the main point is that when open IB windows was formed there was a
general decision that it would use the BSD license. If we start having
components under the LGPL this will break that decision, and therefore this
requires some voting by the open IB organization.


> > The only two ways that I see around this are 1) Change the 
> license of 
> > open IB windows which might be a complicated thing. 2) Find an 
> > implementation of pthreads that is BSD.
> 
> BTW, just wondering... What is relation between windows open 
> IB and OFA (and OFA's "dual-license rule")?
Well, the way I see it, one can take code from the Linux part under the BSD
license and use it in the windows part. The other way around seems fine to me,
but some say that since the windows BSD license requires that some text
always remain there, the other way around is not possible. As I'm not an
expert in that area I don't know who is right.


> Sasha
> 
> > 
> > Thanks
> > Tzachi
> > 
> > > -----Original Message-----
> > > From: Sasha Khapyorsky [mailto:sashak at voltaire.com]
> > > Sent: Thursday, February 08, 2007 7:46 PM
> > > To: Tzachi Dar; Yossi Leybovich
> > > Cc: Yevgeny Kliteynik; OPENIB; Michael S. Tsirkin; Hal Rosenstock
> > > Subject: Re: [Fwd: Re: win related [was: Re: [PATCH 1/2]
> > > opensm: sigusr1: syslog() fixes]]
> > > 
> > > On 11:24 Sun 21 Jan     , Yevgeny Kliteynik wrote:
> > > > Tzachi, Yossi, please join the thread.
> > > > What do you think about distributing a copy of the pthread DLL 
> > > > with opensm?
> > > 
> > > Any news here? Thanks.
> > > 
> > > Sasha
> > > 
> > > > 
> > > > -- Yevgeny.
> > > > 
> > > > -------- Original Message --------
> > > > Subject: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: 
> > > > syslog() fixes]
> > > > Date: Fri, 19 Jan 2007 00:20:32 +0200
> > > > From: Sasha Khapyorsky <sashak at voltaire.com>
> > > > To: Michael S. Tsirkin <mst at mellanox.co.il>
> > > > CC: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>,        
> > > OPENIB <openib-general at openib.org>
> > > > References: <20070118194403.GA23783 at sashak.voltaire.com>
> > > > <20070118215023.GP9890 at mellanox.co.il>
> > > > 
> > > > On 23:50 Thu 18 Jan     , Michael S. Tsirkin wrote:
> > > > > > Quoting Sasha Khapyorsky <sashak at voltaire.com>:
> > > > > > Subject: Re: win related [was: Re: [PATCH 1/2] 
> opensm: sigusr1: 
> > > > > > syslog() fixes]
> > > > > > 
> > > > > > On 07:00 Thu 18 Jan     , Michael S. Tsirkin wrote:
> > > > > > > > What about pure opensource - 
> > > > > > > > http://sourceware.org/pthreads-win32/? It is licensed 
> > > > > > > > under LGPL, I see on the net many positive reports about
> > > stability and usability.
> > > > > > > 
> > > > > > > I used it to do a windows port of linux complib at some 
> > > > > > > point and opensm seemed to work fine with it. What it was
> > > lacking at
> > > > > > > that point was support for 64 bit applications, 
> and for some 
> > > > > > > reason (which is still unclear to me) there was a
> > > strong desire to run opensm in 64 bit mode.
> > > > > > > Seems to have been fixed now, BTW.
> > > > > > 
> > > > > > So this seems to be good option for OpenSM on 
> Windows. Right?
> > > > > 
> > > > > No idea. Distributing a copy of the pthread DLL with
> > > opensm does not
> > > > > look like a problem. But is it worth it?
> > > > 
> > > > Sure, it makes windows porting much more transparent and
> > > let us to use
> > > > standard *nix stuff w/out #ifndef WIN32. Other 
> (generic) benefit 
> > > > is that posix is more standard and powerful than 
> wrappers like complib.
> > > > 
> > > > Sasha
> > > > 
> > > 
> 


From Shainer at Mellanox.com  Thu Feb  8 13:34:37 2007
From: Shainer at Mellanox.com (Gilad Shainer)
Date: Thu, 8 Feb 2007 13:34:37 -0800
Subject: [openib-general] [Fwd: Re: win related [was: Re: [PATCH 1/2]
 opensm: sigusr1: syslog() fixes]]
Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784F618167@mtiexch01.mti.com>

Windows Open IB is part of OpenFabrics. OpenFabrics includes Linux and
Windows communities. The Linux code is dual license while the Windows
code is BSD only.

Gilad.

  
-----Original Message-----
From: Tzachi Dar 
Sent: Thursday, February 08, 2007 1:24 PM
To: Sasha Khapyorsky
Cc: Yossi Leybovich; Gilad Shainer; Yevgeny Kliteynik; OPENIB; Michael
S. Tsirkin; Hal Rosenstock
Subject: RE: [Fwd: Re: win related [was: Re: [PATCH 1/2] opensm:
sigusr1: syslog() fixes]]

See below.

Thanks
Tzachi 

> -----Original Message-----
> From: Sasha Khapyorsky [mailto:sashak at voltaire.com]
> Sent: Thursday, February 08, 2007 9:47 PM
> To: Tzachi Dar
> Cc: Yossi Leybovich; Gilad Shainer; Yevgeny Kliteynik; OPENIB; Michael
> S. Tsirkin; Hal Rosenstock
> Subject: Re: [Fwd: Re: win related [was: Re: [PATCH 1/2]
> opensm: sigusr1: syslog() fixes]]
> 
> On 20:31 Thu 08 Feb     , Tzachi Dar wrote:
> > The windows open IB has decided on using a BSD only license. 
> > The common implementation of pthreads as far as I know is
> LGPL, which
> > means that it can not be used in open IB.
> 
> Why not? AFAIK it works perfectly (see (5,6 and Preamble)):
> http://www.gnu.org/copyleft/lesser.html
> 
> And of course there are tons of examples when BSD software links 
> against LGPLed glibc.

I could of course write you an answer more than 5 pages long about why *I*
don't think that using GPL software is bad for everyone, but I guess that my
opinion doesn't really matter, so I won't do it.
The page that you referenced is from the GNU org, and even there it is hard
to say that they are trying to encourage you to use the LGPL license. In any
case, the main point is that when open IB windows was formed there was a
general decision that it would use the BSD license. If we start having
components under the LGPL this will break that decision, and therefore this
requires some voting by the open IB organization.


> > The only two ways that I see around this are 1) Change the
> license of
> > open IB windows which might be a complicated thing. 2) Find an 
> > implementation of pthreads that is BSD.
> 
> BTW, just wondering... What is relation between windows open IB and 
> OFA (and OFA's "dual-license rule")?
Well, the way I see it, one can take code from the Linux part under the BSD
license and use it in the windows part. The other way around seems fine to me,
but some say that since the windows BSD license requires that some text
always remain there, the other way around is not possible. As I'm not an
expert in that area I don't know who is right.


> Sasha
> 
> > 
> > Thanks
> > Tzachi
> > 
> > > -----Original Message-----
> > > From: Sasha Khapyorsky [mailto:sashak at voltaire.com]
> > > Sent: Thursday, February 08, 2007 7:46 PM
> > > To: Tzachi Dar; Yossi Leybovich
> > > Cc: Yevgeny Kliteynik; OPENIB; Michael S. Tsirkin; Hal Rosenstock
> > > Subject: Re: [Fwd: Re: win related [was: Re: [PATCH 1/2]
> > > opensm: sigusr1: syslog() fixes]]
> > > 
> > > On 11:24 Sun 21 Jan     , Yevgeny Kliteynik wrote:
> > > > Tzachi, Yossi, please join the thread.
> > > > What do you think about distributing a copy of the pthread DLL 
> > > > with opensm?
> > > 
> > > Any news here? Thanks.
> > > 
> > > Sasha
> > > 
> > > > 
> > > > -- Yevgeny.
> > > > 
> > > > -------- Original Message --------
> > > > Subject: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: 
> > > > syslog() fixes]
> > > > Date: Fri, 19 Jan 2007 00:20:32 +0200
> > > > From: Sasha Khapyorsky <sashak at voltaire.com>
> > > > To: Michael S. Tsirkin <mst at mellanox.co.il>
> > > > CC: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>,        
> > > OPENIB <openib-general at openib.org>
> > > > References: <20070118194403.GA23783 at sashak.voltaire.com>
> > > > <20070118215023.GP9890 at mellanox.co.il>
> > > > 
> > > > On 23:50 Thu 18 Jan     , Michael S. Tsirkin wrote:
> > > > > > Quoting Sasha Khapyorsky <sashak at voltaire.com>:
> > > > > > Subject: Re: win related [was: Re: [PATCH 1/2]
> opensm: sigusr1: 
> > > > > > syslog() fixes]
> > > > > > 
> > > > > > On 07:00 Thu 18 Jan     , Michael S. Tsirkin wrote:
> > > > > > > > What about pure opensource - 
> > > > > > > > http://sourceware.org/pthreads-win32/? It is licensed 
> > > > > > > > under LGPL, I see on the net many positive reports about
> > > stability and usability.
> > > > > > > 
> > > > > > > I used it to do a windows port of linux complib at some 
> > > > > > > point and opensm seemed to work fine with it. What it was
> > > lacking at
> > > > > > > that point was support for 64 bit applications,
> and for some
> > > > > > > reason (which is still unclear to me) there was a
> > > strong desire to run opensm in 64 bit mode.
> > > > > > > Seems to have been fixed now, BTW.
> > > > > > 
> > > > > > So this seems to be good option for OpenSM on
> Windows. Right?
> > > > > 
> > > > > No idea. Distributing a copy of the pthread DLL with
> > > opensm does not
> > > > > look like a problem. But is it worth it?
> > > > 
> > > > Sure, it makes windows porting much more transparent and
> > > let us to use
> > > > standard *nix stuff w/out #ifndef WIN32. Other
> (generic) benefit
> > > > is that posix is more standard and powerful than
> wrappers like complib.
> > > > 
> > > > Sasha
> > > > 
> > > 
> 


From krause at cup.hp.com  Thu Feb  8 13:19:38 2007
From: krause at cup.hp.com (Michael Krause)
Date: Thu, 08 Feb 2007 13:19:38 -0800
Subject: [openib-general] Immediate data question
In-Reply-To: <adahctxeds8.fsf@cisco.com>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
	<adaveigvg7q.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net>
	<adatzy0qmt3.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net>
	<ada7iuwp5rr.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net>
	<adamz3pfym0.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net>
	<adahctxeds8.fsf@cisco.com>
Message-ID: <6.2.0.14.2.20070208131610.085c52f8@esmail.cup.hp.com>

At 03:41 PM 2/7/2007, Roland Dreier wrote:
>     Changqing> What I mean is that, is there any performance penalty
>     Changqing> for receiver's overall performance if RNR happens
>     Changqing> continuously on one of the QP ?
>
>Not for the receiver, but the sender will be severely slowed down by
>having to wait for the RNR timeouts.

RNR = Receiver Not Ready, so by definition the data flow isn't going to
progress until the receiver is ready to receive data.  If a receive QP
enters RNR on an RC connection, then it is likely not progressing as
desired.  RNR was initially put in place to enable a receiver to apply
back pressure to the sender without causing a fatal error condition.  It
should rarely be entered and therefore should have negligible impact on
overall performance; however, while an RNR condition lasts, no forward
progress occurs on that connection, so its performance is essentially zero.
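
To put a rough number on "essentially zero", here is a minimal sketch of how
long a sender can sit idle in RNR back-off. It assumes the sender waits one
RNR timer period before each retry and that a retry count of 7 means "retry
forever", as in the verbs API. The function name and the plain-microsecond
timer argument are made up for illustration; real hardware carries an
encoded timer value instead.

```c
#include <assert.h>
#include <stdint.h>

/* In the verbs API, an rnr_retry value of 7 means "retry forever". */
#define RNR_RETRY_INFINITE 7u

/*
 * Rough upper bound, in microseconds, on the time a sender spends idle in
 * RNR back-off before its QP reports an error.
 *
 * rnr_timer_us: delay requested by the receiver in each RNR NAK
 *               (hypothetical plain-microsecond input).
 * rnr_retry:    retry count configured on the sender, 0..7.
 *
 * Returns -1 when the retry count is infinite, i.e. the stall is unbounded.
 */
int64_t rnr_worst_case_stall_us(uint32_t rnr_timer_us, unsigned int rnr_retry)
{
	assert(rnr_retry <= RNR_RETRY_INFINITE);

	if (rnr_retry == RNR_RETRY_INFINITE)
		return -1;

	/* The sender waits one timer period before each of its retries. */
	return (int64_t)rnr_timer_us * (int64_t)rnr_retry;
}
```

For example, a 10240 us timer with rnr_retry = 3 bounds the stall at about
30720 us, during which the QP moves no data at all.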

Mike 


From krause at cup.hp.com  Thu Feb  8 13:26:49 2007
From: krause at cup.hp.com (Michael Krause)
Date: Thu, 08 Feb 2007 13:26:49 -0800
Subject: [openib-general] dapl broken for iWARP
In-Reply-To: <C98692FD98048C41885E0B0FACD9DFB803AFA4AF@exnane01.hq.netapp.com>
References: <C98692FD98048C41885E0B0FACD9DFB803AFA4AF@exnane01.hq.netapp.com>
Message-ID: <6.2.0.14.2.20070208132315.08989298@esmail.cup.hp.com>

At 07:43 AM 2/8/2007, Kanevsky, Arkady wrote:
>That is correct.
>I am working with Krishna on it.
>Expect patches soon.
>
>By the way the problem is not DAPL specific
>and so is a proposed solution.
>
>There are 3 aspects of the solution.
>One is APIs. We suggest that we do not augment these.
>That is a connection requestor sets its QP
>RDMA ORD and IRD.
>When connection is established user can check the QP RDMA ORD and IRD
>to see what he has now to use over the connection.
>We may consider to extend QP attributes to support transport specific
>parameters passing in the future.
>For example, iWARP MPA CRC request.
>
>Second is the semantic that CM provides.
>The proposal is to match IBCM semantic.
>That is, the CM guarantees that local IRD is >= remote ORD.
>This guarantees that incoming RDMA Read requests will not overwhelm
>the QP RDMA Read capabilities.
>Again, there are no changes to IBCM, only to IWCM.
>Notice that as part of this, IWCM will pass the needed info down to
>the driver and extract it from the driver.
>
>The final part is iWARP CM extension to exchange RDMA ORD, IRD.
>This is similar to IBTA Annex for IP Addressing.
>The harder part is that this will eventually require IETF MPA spec extension,
>and the fact that MPA protocol is implemented in RNIC HW by many vendors,
>and hence can not be done by IWCM itself.

We looked at this quite a bit during the creation of the
specification.  All of the targeted usage models exchange this information
as part of their "hello" or login exchanges.  As such, the "hum" was to
not change MPA to communicate such information and to leave it to software
to exchange these values through existing mechanisms.  I seriously doubt
there will be much support for modifying the MPA specification at this
stage, since the implementations are largely complete and a modification
would have to deal with the legacy interoperability issue, which likely
would be solved in software anyway.  It would be simpler to modify
the underlying DAPL implementation to exchange the information and keep
this hidden from both the application and the RNIC providers.
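
A minimal sketch of such a software exchange, under stated assumptions: each
side places its IRD and ORD in the connection private data, and each side
then clamps its own values against what the peer advertised, which yields
the "local IRD >= remote ORD" guarantee described in the quoted proposal.
The wire layout, struct names, and function names below are hypothetical;
they are not defined by MPA, by DAPL, or by any posted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read limits one side of a connection is prepared to use. */
struct rdma_read_limits {
	uint16_t ird;	/* inbound RDMA Reads this side can service at once */
	uint16_t ord;	/* outbound RDMA Reads this side may have in flight */
};

/*
 * Hypothetical "hello" layout: two 16-bit big-endian fields at the start
 * of the connection private data.
 */
struct rdma_read_hello {
	uint8_t ird[2];
	uint8_t ord[2];
};

/* Serialize the local limits into the private-data layout. */
void hello_pack(struct rdma_read_hello *h, struct rdma_read_limits l)
{
	assert(h != NULL);
	h->ird[0] = (uint8_t)(l.ird >> 8);
	h->ird[1] = (uint8_t)(l.ird & 0xff);
	h->ord[0] = (uint8_t)(l.ord >> 8);
	h->ord[1] = (uint8_t)(l.ord & 0xff);
}

/* Parse the limits the peer advertised. */
struct rdma_read_limits hello_unpack(const struct rdma_read_hello *h)
{
	struct rdma_read_limits l;

	assert(h != NULL);
	l.ird = (uint16_t)((h->ird[0] << 8) | h->ird[1]);
	l.ord = (uint16_t)((h->ord[0] << 8) | h->ord[1]);
	return l;
}

static uint16_t min_u16(uint16_t a, uint16_t b)
{
	return a < b ? a : b;
}

/*
 * Clamp the local limits against the peer's advertised ones: never issue
 * more reads than the peer can absorb, and never reserve more inbound
 * resources than the peer will use.
 */
struct rdma_read_limits negotiate_read_limits(struct rdma_read_limits local,
					      struct rdma_read_limits peer)
{
	struct rdma_read_limits out;

	out.ord = min_u16(local.ord, peer.ird);
	out.ird = min_u16(local.ird, peer.ord);
	return out;
}
```

If both sides run negotiate_read_limits() on each other's advertised values,
each side's resulting ORD equals the other side's resulting IRD, so neither
the application nor the RNIC provider has to be aware of the exchange.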

Mike


>Thanks,
>
>Arkady Kanevsky                       email: arkady at netapp.com
>Network Appliance Inc.               phone: 781-768-5395
>1601 Trapelo Rd. - Suite 16.        Fax: 781-895-1195
>Waltham, MA 02451                   central phone: 781-768-5300
>
>
> > -----Original Message-----
> > From: Steve Wise [mailto:swise at opengridcomputing.com]
> > Sent: Wednesday, February 07, 2007 6:12 PM
> > To: Arlin Davis
> > Cc: openib-general
> > Subject: Re: [openib-general] dapl broken for iWARP
> >
> > On Wed, 2007-02-07 at 15:05 -0800, Arlin Davis wrote:
> > > Steve Wise wrote:
> > >
> > > >On Wed, 2007-02-07 at 14:02 -0600, Steve Wise wrote:
> > > >
> > > >
> > > >>Arlin,
> > > >>
> > > >>The OFED dapl code is assuming the responder_resources and
> > > >>initiator_depth passed up on a connection request event
> > are from the
> > > >>remote peer.  This doesn't happen for iWARP.  In the
> > current iWARP
> > > >>specifications, its up to the application to exchange this
> > > >>information somehow. So these are defaulting to 0 on the
> > server side
> > > >>of any dapl connection over iWARP.
> > > >>
> > > >>This is a fairly recent change, I think.  We need to come up with
> > > >>some way to deal with this for OFED 1.2 IMO.
> > > >>
> > > >>
> > > Yes, this was changed recently to sync up with the rdma_cm changes
> > > that exposed the values.
> > >
> > > >>
> > > >>
> > > >
> > > >The IWCM could set these to the device max values for instance.
> > > >
> > > >
> > > That would work fine as long as you know the remote
> > settings will be
> > > equal or better. The provider just sets the min of local device max
> > > values and the remote values provided with the request.
> > >
> >
> > I know Krishna Kumar is working on a solution for exchanging
> > this info in private data so the IWCM can "do the right
> > thing".  Stay tuned for a patch series to review for this.
> > But this functionality is definitely post OFED-1.2.
> >
> >
> > So for the OFED-1.2, I will set these to the device max in the IWCM.
> > Assuming the other side is OFED 1.2 DAPL, then it will work fine.
> >
> > Steve.
> >
> >
> >
> > _______________________________________________
> > openib-general mailing list
> > openib-general at openib.org
> > http://openib.org/mailman/listinfo/openib-general
> >
> > To unsubscribe, please visit
> > http://openib.org/mailman/listinfo/openib-general
> >
>


From krause at cup.hp.com  Thu Feb  8 13:36:34 2007
From: krause at cup.hp.com (Michael Krause)
Date: Thu, 08 Feb 2007 13:36:34 -0800
Subject: [openib-general] Problem is routing CM REQ was: Use a GRH when
 appropriate for unicast packets
In-Reply-To: <1170967182.31538.96962.camel@hal.voltaire.com>
References: <000201c74bba$f72b7890$e598070a@amr.corp.intel.com>
	<1170967182.31538.96962.camel@hal.voltaire.com>
Message-ID: <6.2.0.14.2.20070208133129.084a01e0@esmail.cup.hp.com>

At 12:39 PM 2/8/2007, Hal Rosenstock wrote:
>On Thu, 2007-02-08 at 14:54, Sean Hefty wrote:
> > >Hum, you mean to meet the LID validation rules of 9.6.1.5? That is a
> > >huge PITA..
> > >
> > >[IMHO, 9.6.1.5 C9-54 is a mistake, if there is a GRH then the LRH.SLID
> > > should not be validated against the QP context since it makes it
> > > extra hard for multipath routing and QoS to work...]

If you examine the prior diagram, the packet validation is quite precise
and intent on catching any misrouted packets as early in the validation
process as possible.  This particular compliance statement makes clear
the type of connection and how to pattern match.  The protocol was
designed to work within a single subnet as well as across subnets.  Hence,
the GRH must be validated in conjunction with the LRH and the QP context in
order to ensure an intermediate component did not misroute the
packet.  As described, an RC QP must flow through at most a single path at
any given time in order to ensure packet ordering is maintained (IB
requires strong ordering, so multi-path within a single RC is not
allowed).  As for QoS, one can arbitrate a packet for an RC QP relative to
other flows without any additional complexity.  If one wants to segregate
a set of RC QPs onto different paths as well as arbitration slots, that is
allowed and supported by the architecture even when going between the same
set of ports: simply use multiple LIDs and SLs during connection
establishment.
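
A loose sketch of the kind of check being debated, not the text of C9-54:
the receiver compares the LRH.SLID, and the GRH source GID when one is
expected, against what the RC QP context recorded at connection setup. The
struct and function names are hypothetical, LMC path bits are ignored, and
the handling of an unexpected GRH is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* What an RC QP remembers about its peer (hypothetical layout). */
struct rc_qp_ctx {
	uint16_t remote_lid;	/* DLID we send to; a router LID when routed */
	bool     uses_grh;	/* connection was set up with a GRH */
	uint8_t  remote_gid[16];
};

/* The header fields of an incoming packet that matter for this check. */
struct rx_headers {
	uint16_t lrh_slid;
	bool     has_grh;
	uint8_t  grh_sgid[16];
};

/* Return true if the packet's source matches the QP context. */
bool rc_packet_source_ok(const struct rc_qp_ctx *qp,
			 const struct rx_headers *p)
{
	assert(qp != NULL && p != NULL);

	/* The local source must be the LID this QP sends to. */
	if (p->lrh_slid != qp->remote_lid)
		return false;

	/* A connection set up with a GRH must also match on the source GID. */
	if (qp->uses_grh) {
		if (!p->has_grh)
			return false;
		if (memcmp(p->grh_sgid, qp->remote_gid, 16) != 0)
			return false;
	}
	return true;
}
```

Under this model, a router that answers from a different LID than the one
the active side chose as its DLID fails the first comparison, which is the
situation the quoted messages below are trying to work around.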

Mike

> >
> > Yes - this gets messy.
> >
> > >Here is one thought on how to do this:
> > >To meet this rule each side of the CM must take the SLID from
> > >the incoming LRH as the DLID for the connection. This SLID will be
> > >one of the SLIDs for the local router. The other side doesn't need to
> > >know what it is. The passive side will get the router SLID from the
> > >REQ and the active side gets it from the ACK.
> > >
> > >The passive side is easy, it just path record queries the DGID and
> > >requests the DLID == the incoming LRH.SLID.
> >
> > This requires that the passive side be able to issue path record
> > queries, but I think that it could work for static routes.  A point was
> > made to me that the remote side could be a TCA without query capabilities.
>
>Are you referring to SA query capabilities ? Would such a device just be
>expected to work without change in an IB routed environment anyway ?
>
>-- Hal
>
> >
> > There's still the issue of what value is carried in the remote port LID
> > in the CM REQ (12.7.21), and I haven't even gotten to APM yet...
> >
> > >The nasty problem is with the active side - CMA will select a router
> > >lid it uses as the DLID and the router may select a different LID for
> > >it to use as the SLID when it processes the ACK. By C9-54 they have to
> > >be the same :< So the active side might have to do another path record
> > >query to move its DLID and SL to match the router's chosen
> > >SLID. Double suck :P
> >
> > As long as the SA and local routers are in sync, we may be okay here
> > without a second path record query.
> >
> > - Sean
> >
> >
>
>


From sashak at voltaire.com  Thu Feb  8 14:09:20 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Fri, 9 Feb 2007 00:09:20 +0200
Subject: [openib-general] [Fwd: Re: win related [was: Re: [PATCH 1/2]
 opensm: sigusr1: syslog() fixes]]
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C9C41DD5@mtlexch01.mtl.com>
References: <6C2C79E72C305246B504CBA17B5500C9C41DD5@mtlexch01.mtl.com>
Message-ID: <20070208220920.GY22807@sashak.voltaire.com>

On 23:24 Thu 08 Feb     , Tzachi Dar wrote:
> > > The windows open IB has decided on using a BSD only license. 
> > > The common implementation of pthreads as far as I know is 
> > LGPL, which 
> > > means that it can not be used in open IB.
> > 
> > Why not? AFAIK it works perfectly (see (5,6 and Preamble)):
> > http://www.gnu.org/copyleft/lesser.html
> > 
> > And of course there are tons of examples when BSD software 
> > links against LGPLed glibc.
> 
> I can of course write you an answer that will be more than 5 pages long
> of why *I* don't think that using GPL software is bad for everyone, but
> I guess that my opinion doesn't really matter, so I won't do it.

I didn't mean to take it in this direction, sorry.

I referred to the original LGPL text, which states explicitly that
non-(L)GPL programs can be linked against LGPLed libraries.

And again, there are a lot of examples (Apache, Mozilla, Xorg, etc.) where
this works.

> The page that you have referenced is of the GNU org, and even there it
> is hard to say that they are trying to encourage you to use the LGPL
> license. In any case, the main point is that when open IB windows was
> formed there was a general decision that it will use BSD license. If we
> start having components with the LGPL this will break that decision, and
> therefore this requires some voting of the open IB organization.

You are not going to maintain pthreads-win32 as an OpenIB component, but
to use it as a third-party library. I think this should not be very
different from using the native Windows thread DLL (which I guess is not
BSD either). I don't see any LGPL issue here. Makes sense?

Sasha


From mshefty at ichips.intel.com  Thu Feb  8 14:02:02 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 08 Feb 2007 14:02:02 -0800
Subject: [openib-general] Problem is routing CM REQ was: Use a GRH when
 appropriate for unicast packets
In-Reply-To: <1170967182.31538.96962.camel@hal.voltaire.com>
References: <000201c74bba$f72b7890$e598070a@amr.corp.intel.com>
	<1170967182.31538.96962.camel@hal.voltaire.com>
Message-ID: <45CB9DDA.8020303@ichips.intel.com>

>>This requires that the passive side be able to issue path record queries, but I
>>think that it could work for static routes.  A point was made to me that the
>>remote side could be a TCA without query capabilities.
> 
> Are you referring to SA query capabilities ? Would such a device just be
> expected to work without change in an IB routed environment anyway ?

Yes I was referring to SA query capability, such as a path record query.  Since 
the spec requires that the path information be provided by the active side, I 
think that such a device could work without change.  (But it does mean that the 
active side has to provide some way to obtain the necessary information to put 
into a CM REQ, plus know what the remote router will do.)

- Sean


From mst at mellanox.co.il  Thu Feb  8 14:20:31 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 9 Feb 2007 00:20:31 +0200
Subject: [openib-general] [Fwd: Re: win related [was: Re: [PATCH 1/2]
 opensm: sigusr1: syslog() fixes]]
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C9C41DD5@mtlexch01.mtl.com>
References: <6C2C79E72C305246B504CBA17B5500C9C41DD5@mtlexch01.mtl.com>
Message-ID: <20070208222031.GD6560@mellanox.co.il>

> Well, the way I see it one can take code from the Linux part under the BSD
> license and use it in the Windows part. The other way around seems fine to me but
> some say that since the Windows BSD license requires that some text will always
> remain there, the other way around is not possible. As I'm not an expert in
> that area I don't know who is right.

Interesting. Where does this idea come from? AFAIK BSD license is well known to be
GPL-compatible, so there should be no problem moving code in either direction.

-- 
MST


From mst at mellanox.co.il  Thu Feb  8 14:24:43 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 9 Feb 2007 00:24:43 +0200
Subject: [openib-general] [Fwd: Re: win related [was: Re: [PATCH 1/2]
 opensm: sigusr1: syslog() fixes]]
In-Reply-To: <20070208222031.GD6560@mellanox.co.il>
References: <6C2C79E72C305246B504CBA17B5500C9C41DD5@mtlexch01.mtl.com>
	<20070208222031.GD6560@mellanox.co.il>
Message-ID: <20070208222443.GE6560@mellanox.co.il>

> Quoting r. Michael S. Tsirkin <mst at mellanox.co.il>:
> Subject: Re: [Fwd: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: syslog() fixes]]
> 
> > Well, the way I see it one can take code from the Linux part under the BSD
> > license and use it in the Windows part. The other way around seems fine to me but
> > some say that since the Windows BSD license requires that some text will always
> > remain there, the other way around is not possible. As I'm not an expert in
> > that area I don't know who is right.
> 
> Interesting. Where does this idea come from?

Maybe this?
http://www.gnu.org/philosophy/bsd.html
Note that openib license does not include the advertising clause.

> AFAIK BSD license is well known to be
> GPL-compatible, so there should be no problem moving code in either direction.

-- 
MST


From dledford at redhat.com  Thu Feb  8 14:28:13 2007
From: dledford at redhat.com (Doug Ledford)
Date: Thu, 08 Feb 2007 17:28:13 -0500
Subject: [openib-general] issues with compilation of ofed 1.2
In-Reply-To: <6a122cc00702072302s18c1c4b7i3f1e4a1b3f3d0381@mail.gmail.com>
References: <45C9EE31.2040602@voltaire.com>
	<6a122cc00702072302s18c1c4b7i3f1e4a1b3f3d0381@mail.gmail.com>
Message-ID: <1170973693.19297.2.camel@firewall.xsintricity.com>

On Thu, 2007-02-08 at 09:02 +0200, Moni Levy wrote:
> Doug,
> On 2/7/07, Yosef Etigin <yosefe at voltaire.com> wrote:
> > 7. On RHAS5 beta 2, the setup requires sysfsutils-devel RPM which is not included in this distro.
> 
> Can you please help us with that ?

The value of the sysfsutils is far overshadowed by the value of libsysfs
(and libsysfs is far more commonly used).  So, in RHEL5, the rpm package
names reflect this:

libsysfs
sysfsutils (I think, might be libsysfs-utils)
libsysfs-devel

It's all still there, just a different name.

> -- Moni
> 
> >
> > --
> > Yosef Etigin
> > Alex Tabachnik
> >
-- 
Doug Ledford <dledford at redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband

From rowland at cse.ohio-state.edu  Thu Feb  8 14:24:24 2007
From: rowland at cse.ohio-state.edu (Shaun Rowland)
Date: Thu, 08 Feb 2007 17:24:24 -0500
Subject: [openib-general] Open MPI rpmbuild fails in OFED-1.2
In-Reply-To: <20070208134305.GC20183@mellanox.co.il>
References: <1A992437-34C5-4F97-8963-1C99876E0A50@cisco.com>
	<20070208134305.GC20183@mellanox.co.il>
Message-ID: <45CBA318.9030304@cse.ohio-state.edu>

Michael S. Tsirkin wrote:
>> Quoting Jeff Squyres <jsquyres at cisco.com>:

>> 2) we're trying to *use* the software when it is installed in the  
>> DESTDIR
>> --> this means that you have to put special-case in the software so  
>> that they look for support files in both the DESTDIR *and* the final  
>> installation directory

Either that, or fix your resulting package so that it will work with the
final installation directory case (not work with both), and then setup a
temporary environment that will allow it to work for the rest of the
build process. For the mpitests package being built against our RPM
result, this is the approach I took. It took me a little time to figure
out how to do this, because it is odd.

> How do you mean, use?

I assume this means linking against the libraries. In the mpitests RPM
build, it could mean using mpicc, etc. from the MPI package while it
itself is not working in its final destination directory. No one does
this sort of thing normally when building software packages from source
code.

> Hmm. I guess my question is - this works fine when I run OFED's
> configure script, why is SRPM so much more difficult?

Anyone can correct me if I am reading this wrong, but I've commented on
this at least once before - somewhat indirectly. I've built and
supported open source software at the OSU CSE department for a long time
- so I've built many different packages from source about a million
times. When you build a source code package, obviously you make sure the
necessary libraries can be found. These libraries are in some system
location - their "final installation directory". If this location is not
in the default search path, you can deal with that various ways or add
the path to the system's default. If your package builds its own
libraries and also uses them, then you deal with that yourself in your
package's build system. When you say OFED's configure script, this is
the situation I see.

Never have I purposely built the libraries required for a package in a
temporary location, built the package against those libraries, and then
moved everything to a final location. It makes things more difficult.
Take our SRPM for example. If you have OFED installed, I am mainly
concerned with the stack prefix, by default /usr/local/ofed. If I build
our code with the libraries in their final location, I don't have to
worry about subtle things like the various scripts having this path hard
coded in them. Most packages rightly make the assumption that these
paths you use are to dependencies that have already been installed, and
if there is some need to incorporate those paths into the package build
result, for whatever reason, it's all right to do that. In our SRPM, I
need to fix some things in the OFED installer case because the libraries
I am building against are not in their final location. These are things
that I do not have to do normally, however to be fair - I already have
to fix some things because of the RPM BuildRoot usage anyway because our
package is not installed in its final destination either in any RPM
building scenario. It only goes into its final destination directory
when you install the actual RPM. With RPMs, this is a good, safe way to
build _individual_ packages. In the SuSE case, the %build section is
assuming this and cleaning things up before you start, because - why
would there be anything in there BuildRoot already? Is this right or
wrong? That's a matter of opinion. There's information in various RPM
building resources that mention some of this stuff. Luckily this is not
a big deal for me to handle in our case. However, it could have been.
This is like a "bootstrap" situation, but it's not normally how one
would go and build some source code package on their system, and RPMs
are all about reproducible source code builds. Again, to be totally
fair, you wouldn't normally install your package into a BuildRoot
prefixed "prefix" location either, but that seems easier to deal with
than the location of libraries your package may depend on. And to go
even further, as in the mpitests RPM build case, would one normally
expect the RPM build result that is installed in a BuildRoot prefixed
"prefix" and just left there to even work? I would say it is absolutely
not safe to just assume that for any given RPM build. This all depends
on the source code you are trying to build and what it does exactly. Any
time you are using paths that don't reflect the final destination of
whatever dependency, you have the potential to have to deal with extra
work to fix the final result.

In the usual SRPM building case, the packages that your package depends
on would already be installed on the system in their final destination
directory. I could even require these RPM packages as build requirements
- something I am not doing in the spec file itself, yes? This would mean
that I could take a few steps out of my SRPM spec file. From what I am
reading here, this would be one reason to think of a chroot situation.
But it seems to me that this would make things potentially difficult.
Another option could be to go ahead and install the OFA packages before
the MPI packages. Either way would work for me because I've already
handled this DESTDIR situation (even for the mpitests being built
against our RPM result - because I leave it in DESTDIR after the RPM
build is done if this is all being done by the OFED installer). In
addition, if you were to follow this logic, the MPI packages would be
installed in their final location before the mpitests packages were
built against them. If this were done instead, I don't think chroot
would be required. However, it does mean the packages have to be
installed, and from what I've seen - in a 3 step process.

When I was first trying to make our SRPM work with the OFED install
scripts, this was the first thing I had to fix. And I definitely was not
expecting this type of situation. Now, perhaps I have misread this
thread and applied it to my own experience. If I have not misread it,
then I understand what the point is. This does not mean I advocate
changing it personally. From my testing, I've made this aspect of our
build work. I had no problems moving my %build section code to the
%install section. To me the question is if this is too difficult to deal
with. It depends on the package being built, but no standalone packages
I am aware of contain logic for "temporary paths" to dependencies, or
again to be fair in the RPM building scenario, a "temporary prefix".
None of what I am saying applies to a package that builds and uses its
own libraries though. In those cases, the developers obviously have to
deal with that.
-- 
Shaun Rowland	rowland at cse.ohio-state.edu
http://www.cse.ohio-state.edu/~rowland/


From rdreier at cisco.com  Thu Feb  8 14:41:05 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 08 Feb 2007 14:41:05 -0800
Subject: [openib-general] [PATCH] IB/ipoib_cm: fix up issues from code
	review
In-Reply-To: <20070208152947.GA6560@mellanox.co.il> (Michael S.
	Tsirkin's message of "Thu, 8 Feb 2007 17:29:47 +0200")
References: <adaveiesyrb.fsf@cisco.com> <20070208152947.GA6560@mellanox.co.il>
Message-ID: <ada3b5gtgqm.fsf@cisco.com>

OK, I pulled this in and fixed it to build with the netdevice
class_device-ectomy that just went upstream, and pushed it out on my
for-2.6.21 branch like this.
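
For reference, the receive-buffer constants that the patch below adds to
ipoib.h can be worked through with a small stand-alone sketch. The page size
is a parameter here because PAGE_SIZE is architecture dependent; 4096 is only
an assumption for the example. The EX_ names, the struct, and the function
are illustrative and are not part of the patch.

```c
#include <assert.h>

#define EX_IPOIB_ENCAP_LEN 4u
/* 64 KiB minus 16 bytes of padding, as in the patch's IPOIB_CM_MTU. */
#define EX_IPOIB_CM_MTU    (0x10000u - 0x10u)

struct cm_rx_layout {
	unsigned int mtu;	/* connected-mode MTU */
	unsigned int buf_size;	/* MTU plus the IPoIB encapsulation header */
	unsigned int head_size;	/* bytes in the partially filled first page */
	unsigned int rx_sg;	/* scatter/gather entries per receive buffer */
};

/* Recompute the layout for a given power-of-two page size. */
struct cm_rx_layout cm_rx_layout_for(unsigned int page_size)
{
	struct cm_rx_layout l;

	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	l.mtu = EX_IPOIB_CM_MTU;
	l.buf_size = l.mtu + EX_IPOIB_ENCAP_LEN;
	l.head_size = l.buf_size % page_size;
	/* Same result as ALIGN(buf_size, page_size) / page_size. */
	l.rx_sg = (l.buf_size + page_size - 1) / page_size;
	return l;
}
```

With 4 KiB pages this gives an MTU of 65520 (the value the Kconfig help text
passes to ifconfig), a 65524-byte buffer, a 4084-byte head fragment, and 16
scatter/gather entries per receive.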

diff --git a/drivers/infiniband/ulp/ipoib/Kconfig b/drivers/infiniband/ulp/ipoib/Kconfig
index c75322d..af78ccc 100644
--- a/drivers/infiniband/ulp/ipoib/Kconfig
+++ b/drivers/infiniband/ulp/ipoib/Kconfig
@@ -1,6 +1,6 @@
 config INFINIBAND_IPOIB
 	tristate "IP-over-InfiniBand"
-	depends on INFINIBAND && NETDEVICES && INET
+	depends on INFINIBAND && NETDEVICES && INET && (IPV6 || IPV6=n)
 	---help---
 	  Support for the IP-over-InfiniBand protocol (IPoIB). This
 	  transports IP packets over InfiniBand so you can use your IB
@@ -8,6 +8,20 @@ config INFINIBAND_IPOIB
 
 	  See Documentation/infiniband/ipoib.txt for more information
 
+config INFINIBAND_IPOIB_CM
+	bool "IP-over-InfiniBand Connected Mode support"
+	depends on INFINIBAND_IPOIB && EXPERIMENTAL
+	default n
+	---help---
+	  This option enables experimental support for IPoIB connected mode.
+	  After enabling this option, you need to switch to connected mode through
+	  /sys/class/net/ibXXX/mode to actually create connections, and then increase
+	  the interface MTU with e.g. ifconfig ib0 mtu 65520.
+
+	  WARNING: Enabling connected mode will trigger some
+	  packet drops for multicast and UD mode traffic from this interface,
+	  unless you limit mtu for these destinations to 2044.
+
 config INFINIBAND_IPOIB_DEBUG
 	bool "IP-over-InfiniBand debugging" if EMBEDDED
 	depends on INFINIBAND_IPOIB
diff --git a/drivers/infiniband/ulp/ipoib/Makefile b/drivers/infiniband/ulp/ipoib/Makefile
index 8935e74..98ee38e 100644
--- a/drivers/infiniband/ulp/ipoib/Makefile
+++ b/drivers/infiniband/ulp/ipoib/Makefile
@@ -5,5 +5,6 @@ ib_ipoib-y					:= ipoib_main.o \
 						   ipoib_multicast.o \
 						   ipoib_verbs.o \
 						   ipoib_vlan.o
+ib_ipoib-$(CONFIG_INFINIBAND_IPOIB_CM)		+= ipoib_cm.o
 ib_ipoib-$(CONFIG_INFINIBAND_IPOIB_DEBUG)	+= ipoib_fs.o
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 07deee8..eb885ee 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -62,6 +62,10 @@ enum {
 
 	IPOIB_ENCAP_LEN 	  = 4,
 
+	IPOIB_CM_MTU              = 0x10000 - 0x10, /* padding to align header to 16 */
+	IPOIB_CM_BUF_SIZE         = IPOIB_CM_MTU  + IPOIB_ENCAP_LEN,
+	IPOIB_CM_HEAD_SIZE 	  = IPOIB_CM_BUF_SIZE % PAGE_SIZE,
+	IPOIB_CM_RX_SG            = ALIGN(IPOIB_CM_BUF_SIZE, PAGE_SIZE) / PAGE_SIZE,
 	IPOIB_RX_RING_SIZE 	  = 128,
 	IPOIB_TX_RING_SIZE 	  = 64,
 	IPOIB_MAX_QUEUE_SIZE	  = 8192,
@@ -81,6 +85,8 @@ enum {
 	IPOIB_MCAST_RUN 	  = 6,
 	IPOIB_STOP_REAPER         = 7,
 	IPOIB_MCAST_STARTED       = 8,
+	IPOIB_FLAG_NETIF_STOPPED  = 9,
+	IPOIB_FLAG_ADMIN_CM 	  = 10,
 
 	IPOIB_MAX_BACKOFF_SECONDS = 16,
 
@@ -90,6 +96,14 @@ enum {
 	IPOIB_MCAST_FLAG_ATTACHED = 3,
 };
 
+
+#define	IPOIB_OP_RECV   (1ul << 31)
+#ifdef CONFIG_INFINIBAND_IPOIB_CM
+#define	IPOIB_CM_OP_SRQ (1ul << 30)
+#else
+#define	IPOIB_CM_OP_SRQ (0)
+#endif
+
 /* structs */
 
 struct ipoib_header {
@@ -113,6 +127,59 @@ struct ipoib_tx_buf {
 	u64		mapping;
 };
 
+struct ib_cm_id;
+
+struct ipoib_cm_data {
+	__be32 qpn; /* High byte MUST be ignored on receive */
+	__be32 mtu;
+};
+
+struct ipoib_cm_rx {
+	struct ib_cm_id     *id;
+	struct ib_qp        *qp;
+	struct list_head     list;
+	struct net_device   *dev;
+	unsigned long        jiffies;
+};
+
+struct ipoib_cm_tx {
+	struct ib_cm_id     *id;
+	struct ib_cq        *cq;
+	struct ib_qp        *qp;
+	struct list_head     list;
+	struct net_device   *dev;
+	struct ipoib_neigh  *neigh;
+	struct ipoib_path   *path;
+	struct ipoib_tx_buf *tx_ring;
+	unsigned             tx_head;
+	unsigned             tx_tail;
+	unsigned long        flags;
+	u32                  mtu;
+	struct ib_wc         ibwc[IPOIB_NUM_WC];
+};
+
+struct ipoib_cm_rx_buf {
+	struct sk_buff *skb;
+	u64 mapping[IPOIB_CM_RX_SG];
+};
+
+struct ipoib_cm_dev_priv {
+	struct ib_srq  	       *srq;
+	struct ipoib_cm_rx_buf *srq_ring;
+	struct ib_cm_id        *id;
+	struct list_head        passive_ids;
+	struct work_struct      start_task;
+	struct work_struct      reap_task;
+	struct work_struct      skb_task;
+	struct delayed_work     stale_task;
+	struct sk_buff_head     skb_queue;
+	struct list_head        start_list;
+	struct list_head        reap_list;
+	struct ib_wc            ibwc[IPOIB_NUM_WC];
+	struct ib_sge           rx_sge[IPOIB_CM_RX_SG];
+	struct ib_recv_wr       rx_wr;
+};
+
 /*
  * Device private locking: tx_lock protects members used in TX fast
  * path (and we use LLTX so upper layers don't do extra locking).
@@ -179,6 +246,10 @@ struct ipoib_dev_priv {
 	struct list_head child_intfs;
 	struct list_head list;
 
+#ifdef CONFIG_INFINIBAND_IPOIB_CM
+	struct ipoib_cm_dev_priv cm;
+#endif
+
 #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG
 	struct list_head fs_list;
 	struct dentry *mcg_dentry;
@@ -212,6 +283,9 @@ struct ipoib_path {
 
 struct ipoib_neigh {
 	struct ipoib_ah    *ah;
+#ifdef CONFIG_INFINIBAND_IPOIB_CM
+	struct ipoib_cm_tx *cm;
+#endif
 	union ib_gid        dgid;
 	struct sk_buff_head queue;
 
@@ -315,6 +389,146 @@ int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey);
 void ipoib_pkey_poll(struct work_struct *work);
 int ipoib_pkey_dev_delay_open(struct net_device *dev);
 
+#ifdef CONFIG_INFINIBAND_IPOIB_CM
+
+#define IPOIB_FLAGS_RC          0x80
+#define IPOIB_FLAGS_UC          0x40
+
+/* We don't support UC connections at the moment */
+#define IPOIB_CM_SUPPORTED(ha)   (ha[0] & (IPOIB_FLAGS_RC))
+
+static inline int ipoib_cm_admin_enabled(struct net_device *dev)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	return IPOIB_CM_SUPPORTED(dev->dev_addr) &&
+		test_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags);
+}
+
+static inline int ipoib_cm_enabled(struct net_device *dev, struct neighbour *n)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	return IPOIB_CM_SUPPORTED(n->ha) &&
+		test_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags);
+}
+
+static inline int ipoib_cm_up(struct ipoib_neigh *neigh)
+
+{
+	return test_bit(IPOIB_FLAG_OPER_UP, &neigh->cm->flags);
+}
+
+static inline struct ipoib_cm_tx *ipoib_cm_get(struct ipoib_neigh *neigh)
+{
+	return neigh->cm;
+}
+
+static inline void ipoib_cm_set(struct ipoib_neigh *neigh, struct ipoib_cm_tx *tx)
+{
+	neigh->cm = tx;
+}
+
+void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_tx *tx);
+int ipoib_cm_dev_open(struct net_device *dev);
+void ipoib_cm_dev_stop(struct net_device *dev);
+int ipoib_cm_dev_init(struct net_device *dev);
+int ipoib_cm_add_mode_attr(struct net_device *dev);
+void ipoib_cm_dev_cleanup(struct net_device *dev);
+struct ipoib_cm_tx *ipoib_cm_create_tx(struct net_device *dev, struct ipoib_path *path,
+				    struct ipoib_neigh *neigh);
+void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx);
+void ipoib_cm_skb_too_long(struct net_device *dev, struct sk_buff *skb,
+			   unsigned int mtu);
+void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc);
+#else
+
+struct ipoib_cm_tx;
+
+static inline int ipoib_cm_admin_enabled(struct net_device *dev)
+{
+	return 0;
+}
+static inline int ipoib_cm_enabled(struct net_device *dev, struct neighbour *n)
+
+{
+	return 0;
+}
+
+static inline int ipoib_cm_up(struct ipoib_neigh *neigh)
+
+{
+	return 0;
+}
+
+static inline struct ipoib_cm_tx *ipoib_cm_get(struct ipoib_neigh *neigh)
+{
+	return NULL;
+}
+
+static inline void ipoib_cm_set(struct ipoib_neigh *neigh, struct ipoib_cm_tx *tx)
+{
+}
+
+static inline
+void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_tx *tx)
+{
+	return;
+}
+
+static inline
+int ipoib_cm_dev_open(struct net_device *dev)
+{
+	return 0;
+}
+
+static inline
+void ipoib_cm_dev_stop(struct net_device *dev)
+{
+	return;
+}
+
+static inline
+int ipoib_cm_dev_init(struct net_device *dev)
+{
+	return -ENOSYS;
+}
+
+static inline
+void ipoib_cm_dev_cleanup(struct net_device *dev)
+{
+	return;
+}
+
+static inline
+struct ipoib_cm_tx *ipoib_cm_create_tx(struct net_device *dev, struct ipoib_path *path,
+				    struct ipoib_neigh *neigh)
+{
+	return NULL;
+}
+
+static inline
+void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx)
+{
+	return;
+}
+
+static inline
+int ipoib_cm_add_mode_attr(struct net_device *dev)
+{
+	return 0;
+}
+
+static inline void ipoib_cm_skb_too_long(struct net_device *dev, struct sk_buff *skb,
+					 unsigned int mtu)
+{
+	dev_kfree_skb_any(skb);
+}
+
+static inline void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
+{
+}
+
+#endif
+
 #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG
 void ipoib_create_debug_files(struct net_device *dev);
 void ipoib_delete_debug_files(struct net_device *dev);
@@ -392,4 +606,6 @@ extern int ipoib_debug_level;
 
 #define IPOIB_GID_ARG(gid)	IPOIB_GID_RAW_ARG((gid).raw)
 
+#define IPOIB_QPN(ha) (be32_to_cpup((__be32 *) ha) & 0xffffff)
+
 #endif /* _IPOIB_H */
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
new file mode 100644
index 0000000..2d48387
--- /dev/null
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -0,0 +1,1237 @@
+/*
+ * Copyright (c) 2006 Mellanox Technologies. All rights reserved
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * $Id$
+ */
+
+#include <rdma/ib_cm.h>
+#include <rdma/ib_cache.h>
+#include <net/dst.h>
+#include <net/icmp.h>
+#include <linux/icmpv6.h>
+
+#ifdef CONFIG_INFINIBAND_IPOIB_DEBUG_DATA
+static int data_debug_level;
+
+module_param_named(cm_data_debug_level, data_debug_level, int, 0644);
+MODULE_PARM_DESC(cm_data_debug_level,
+		 "Enable data path debug tracing for connected mode if > 0");
+#endif
+
+#include "ipoib.h"
+
+#define IPOIB_CM_IETF_ID 0x1000000000000000ULL
+
+#define IPOIB_CM_RX_UPDATE_TIME (256 * HZ)
+#define IPOIB_CM_RX_TIMEOUT     (2 * 256 * HZ)
+#define IPOIB_CM_RX_DELAY       (3 * 256 * HZ)
+#define IPOIB_CM_RX_UPDATE_MASK (0x3)
+
+struct ipoib_cm_id {
+	struct ib_cm_id *id;
+	int flags;
+	u32 remote_qpn;
+	u32 remote_mtu;
+};
+
+static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id,
+			       struct ib_cm_event *event);
+
+static void ipoib_cm_dma_unmap_rx(struct ipoib_dev_priv *priv,
+				  u64 mapping[IPOIB_CM_RX_SG])
+{
+	int i;
+
+	ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_CM_HEAD_SIZE, DMA_FROM_DEVICE);
+
+	for (i = 0; i < IPOIB_CM_RX_SG - 1; ++i)
+		ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE, DMA_FROM_DEVICE);
+}
+
+static int ipoib_cm_post_receive(struct net_device *dev, int id)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ib_recv_wr *bad_wr;
+	int i, ret;
+
+	priv->cm.rx_wr.wr_id = id | IPOIB_CM_OP_SRQ;
+
+	for (i = 0; i < IPOIB_CM_RX_SG; ++i)
+		priv->cm.rx_sge[i].addr = priv->cm.srq_ring[id].mapping[i];
+
+	ret = ib_post_srq_recv(priv->cm.srq, &priv->cm.rx_wr, &bad_wr);
+	if (unlikely(ret)) {
+		ipoib_warn(priv, "post srq failed for buf %d (%d)\n", id, ret);
+		ipoib_cm_dma_unmap_rx(priv, priv->cm.srq_ring[id].mapping);
+		dev_kfree_skb_any(priv->cm.srq_ring[id].skb);
+		priv->cm.srq_ring[id].skb = NULL;
+	}
+
+	return ret;
+}
+
+static int ipoib_cm_alloc_rx_skb(struct net_device *dev, int id,
+				 u64 mapping[IPOIB_CM_RX_SG])
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct sk_buff *skb;
+	int i;
+
+	skb = dev_alloc_skb(IPOIB_CM_HEAD_SIZE + 12);
+	if (unlikely(!skb))
+		return -ENOMEM;
+
+	/*
+	 * IPoIB adds a 4 byte header. So we need 12 more bytes to align the
+	 * IP header to a multiple of 16.
+	 */
+	skb_reserve(skb, 12);
+
+	mapping[0] = ib_dma_map_single(priv->ca, skb->data, IPOIB_CM_HEAD_SIZE,
+				       DMA_FROM_DEVICE);
+	if (unlikely(ib_dma_mapping_error(priv->ca, mapping[0]))) {
+		dev_kfree_skb_any(skb);
+		return -EIO;
+	}
+
+	for (i = 0; i < IPOIB_CM_RX_SG - 1; i++) {
+		struct page *page = alloc_page(GFP_ATOMIC);
+
+		if (!page)
+			goto partial_error;
+		skb_fill_page_desc(skb, i, page, 0, PAGE_SIZE);
+
+		mapping[i + 1] = ib_dma_map_page(priv->ca, skb_shinfo(skb)->frags[i].page,
+						 0, PAGE_SIZE, DMA_FROM_DEVICE);
+		if (unlikely(ib_dma_mapping_error(priv->ca, mapping[i + 1])))
+			goto partial_error;
+	}
+
+	priv->cm.srq_ring[id].skb = skb;
+	return 0;
+
+partial_error:
+
+	ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_CM_HEAD_SIZE, DMA_FROM_DEVICE);
+
+	for (; i > 0; --i)
+		ib_dma_unmap_single(priv->ca, mapping[i], PAGE_SIZE, DMA_FROM_DEVICE);
+
+	kfree_skb(skb);
+	return -ENOMEM;
+}
+
+static struct ib_qp *ipoib_cm_create_rx_qp(struct net_device *dev,
+					   struct ipoib_cm_rx *p)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ib_qp_init_attr attr = {
+		.send_cq = priv->cq, /* does not matter, we never send anything */
+		.recv_cq = priv->cq,
+		.srq = priv->cm.srq,
+		.cap.max_send_wr = 1, /* FIXME: 0 Seems not to work */
+		.cap.max_send_sge = 1, /* FIXME: 0 Seems not to work */
+		.sq_sig_type = IB_SIGNAL_ALL_WR,
+		.qp_type = IB_QPT_RC,
+		.qp_context = p,
+	};
+	return ib_create_qp(priv->pd, &attr);
+}
+
+static int ipoib_cm_modify_rx_qp(struct net_device *dev,
+				  struct ib_cm_id *cm_id, struct ib_qp *qp,
+				  unsigned psn)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ib_qp_attr qp_attr;
+	int qp_attr_mask, ret;
+
+	qp_attr.qp_state = IB_QPS_INIT;
+	ret = ib_cm_init_qp_attr(cm_id, &qp_attr, &qp_attr_mask);
+	if (ret) {
+		ipoib_warn(priv, "failed to init QP attr for INIT: %d\n", ret);
+		return ret;
+	}
+	ret = ib_modify_qp(qp, &qp_attr, qp_attr_mask);
+	if (ret) {
+		ipoib_warn(priv, "failed to modify QP to INIT: %d\n", ret);
+		return ret;
+	}
+	qp_attr.qp_state = IB_QPS_RTR;
+	ret = ib_cm_init_qp_attr(cm_id, &qp_attr, &qp_attr_mask);
+	if (ret) {
+		ipoib_warn(priv, "failed to init QP attr for RTR: %d\n", ret);
+		return ret;
+	}
+	qp_attr.rq_psn = psn;
+	ret = ib_modify_qp(qp, &qp_attr, qp_attr_mask);
+	if (ret) {
+		ipoib_warn(priv, "failed to modify QP to RTR: %d\n", ret);
+		return ret;
+	}
+	return 0;
+}
+
+static int ipoib_cm_send_rep(struct net_device *dev, struct ib_cm_id *cm_id,
+			     struct ib_qp *qp, struct ib_cm_req_event_param *req,
+			     unsigned psn)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_cm_data data = {};
+	struct ib_cm_rep_param rep = {};
+
+	data.qpn = cpu_to_be32(priv->qp->qp_num);
+	data.mtu = cpu_to_be32(IPOIB_CM_BUF_SIZE);
+
+	rep.private_data = &data;
+	rep.private_data_len = sizeof data;
+	rep.flow_control = 0;
+	rep.rnr_retry_count = req->rnr_retry_count;
+	rep.target_ack_delay = 20; /* FIXME */
+	rep.srq = 1;
+	rep.qp_num = qp->qp_num;
+	rep.starting_psn = psn;
+	return ib_send_cm_rep(cm_id, &rep);
+}
+
+static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
+{
+	struct net_device *dev = cm_id->context;
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_cm_rx *p;
+	unsigned long flags;
+	unsigned psn;
+	int ret;
+
+	ipoib_dbg(priv, "REQ arrived\n");
+	p = kzalloc(sizeof *p, GFP_KERNEL);
+	if (!p)
+		return -ENOMEM;
+	p->dev = dev;
+	p->id = cm_id;
+	p->qp = ipoib_cm_create_rx_qp(dev, p);
+	if (IS_ERR(p->qp)) {
+		ret = PTR_ERR(p->qp);
+		goto err_qp;
+	}
+
+	psn = random32() & 0xffffff;
+	ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn);
+	if (ret)
+		goto err_modify;
+
+	ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn);
+	if (ret) {
+		ipoib_warn(priv, "failed to send REP: %d\n", ret);
+		goto err_rep;
+	}
+
+	cm_id->context = p;
+	p->jiffies = jiffies;
+	spin_lock_irqsave(&priv->lock, flags);
+	list_add(&p->list, &priv->cm.passive_ids);
+	spin_unlock_irqrestore(&priv->lock, flags);
+	queue_delayed_work(ipoib_workqueue,
+			   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
+	return 0;
+
+err_rep:
+err_modify:
+	ib_destroy_qp(p->qp);
+err_qp:
+	kfree(p);
+	return ret;
+}
+
+static int ipoib_cm_rx_handler(struct ib_cm_id *cm_id,
+			       struct ib_cm_event *event)
+{
+	struct ipoib_cm_rx *p;
+	struct ipoib_dev_priv *priv;
+	unsigned long flags;
+	int ret;
+
+	switch (event->event) {
+	case IB_CM_REQ_RECEIVED:
+		return ipoib_cm_req_handler(cm_id, event);
+	case IB_CM_DREQ_RECEIVED:
+		p = cm_id->context;
+		ib_send_cm_drep(cm_id, NULL, 0);
+		/* Fall through */
+	case IB_CM_REJ_RECEIVED:
+		p = cm_id->context;
+		priv = netdev_priv(p->dev);
+		spin_lock_irqsave(&priv->lock, flags);
+		if (list_empty(&p->list))
+			ret = 0; /* Connection is going away already. */
+		else {
+			list_del_init(&p->list);
+			ret = -ECONNRESET;
+		}
+		spin_unlock_irqrestore(&priv->lock, flags);
+		if (ret) {
+			ib_destroy_qp(p->qp);
+			kfree(p);
+			return ret;
+		}
+		return 0;
+	default:
+		return 0;
+	}
+}
+/* Adjust length of skb with fragments to match received data */
+static void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space,
+			  unsigned int length)
+{
+	int i, num_frags;
+	unsigned int size;
+
+	/* put header into skb */
+	size = min(length, hdr_space);
+	skb->tail += size;
+	skb->len += size;
+	length -= size;
+
+	num_frags = skb_shinfo(skb)->nr_frags;
+	for (i = 0; i < num_frags; i++) {
+		skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+
+		if (length == 0) {
+			/* don't need this page */
+			__free_page(frag->page);
+			--skb_shinfo(skb)->nr_frags;
+		} else {
+			size = min(length, (unsigned) PAGE_SIZE);
+
+			frag->size = size;
+			skb->data_len += size;
+			skb->truesize += size;
+			skb->len += size;
+			length -= size;
+		}
+	}
+}
+
+void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	unsigned int wr_id = wc->wr_id & ~IPOIB_CM_OP_SRQ;
+	struct sk_buff *skb;
+	struct ipoib_cm_rx *p;
+	unsigned long flags;
+	u64 mapping[IPOIB_CM_RX_SG];
+
+	ipoib_dbg_data(priv, "cm recv completion: id %d, op %d, status: %d\n",
+		       wr_id, wc->opcode, wc->status);
+
+	if (unlikely(wr_id >= ipoib_recvq_size)) {
+		ipoib_warn(priv, "cm recv completion event with wrid %d (>= %d)\n",
+			   wr_id, ipoib_recvq_size);
+		return;
+	}
+
+	skb  = priv->cm.srq_ring[wr_id].skb;
+
+	if (unlikely(wc->status != IB_WC_SUCCESS)) {
+		ipoib_dbg(priv, "cm recv error "
+			   "(status=%d, wrid=%d vend_err %x)\n",
+			   wc->status, wr_id, wc->vendor_err);
+		++priv->stats.rx_dropped;
+		goto repost;
+	}
+
+	if (unlikely(!(wr_id & IPOIB_CM_RX_UPDATE_MASK))) {
+		p = wc->qp->qp_context;
+		if (time_after_eq(jiffies, p->jiffies + IPOIB_CM_RX_UPDATE_TIME)) {
+			spin_lock_irqsave(&priv->lock, flags);
+			p->jiffies = jiffies;
+			/* Move this entry to list head, but do
+			 * not re-add it if it has been removed. */
+			if (!list_empty(&p->list))
+				list_move(&p->list, &priv->cm.passive_ids);
+			spin_unlock_irqrestore(&priv->lock, flags);
+			queue_delayed_work(ipoib_workqueue,
+					   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
+		}
+	}
+
+	if (unlikely(ipoib_cm_alloc_rx_skb(dev, wr_id, mapping))) {
+		/*
+		 * If we can't allocate a new RX buffer, dump
+		 * this packet and reuse the old buffer.
+		 */
+		ipoib_dbg(priv, "failed to allocate receive buffer %d\n", wr_id);
+		++priv->stats.rx_dropped;
+		goto repost;
+	}
+
+	ipoib_cm_dma_unmap_rx(priv, priv->cm.srq_ring[wr_id].mapping);
+	memcpy(priv->cm.srq_ring[wr_id].mapping, mapping, sizeof mapping);
+
+	ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n",
+		       wc->byte_len, wc->slid);
+
+	skb_put_frags(skb, IPOIB_CM_HEAD_SIZE, wc->byte_len);
+
+	skb->protocol = ((struct ipoib_header *) skb->data)->proto;
+	skb->mac.raw = skb->data;
+	skb_pull(skb, IPOIB_ENCAP_LEN);
+
+	dev->last_rx = jiffies;
+	++priv->stats.rx_packets;
+	priv->stats.rx_bytes += skb->len;
+
+	skb->dev = dev;
+	/* XXX get correct PACKET_ type here */
+	skb->pkt_type = PACKET_HOST;
+	netif_rx_ni(skb);
+
+repost:
+	if (unlikely(ipoib_cm_post_receive(dev, wr_id)))
+		ipoib_warn(priv, "ipoib_cm_post_receive failed "
+			   "for buf %d\n", wr_id);
+}
+
+static inline int post_send(struct ipoib_dev_priv *priv,
+			    struct ipoib_cm_tx *tx,
+			    unsigned int wr_id,
+			    u64 addr, int len)
+{
+	struct ib_send_wr *bad_wr;
+
+	priv->tx_sge.addr             = addr;
+	priv->tx_sge.length           = len;
+
+	priv->tx_wr.wr_id 	      = wr_id;
+
+	return ib_post_send(tx->qp, &priv->tx_wr, &bad_wr);
+}
+
+void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_tx *tx)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_tx_buf *tx_req;
+	u64 addr;
+
+	if (unlikely(skb->len > tx->mtu)) {
+		ipoib_warn(priv, "packet len %d (> %d) too long to send, dropping\n",
+			   skb->len, tx->mtu);
+		++priv->stats.tx_dropped;
+		++priv->stats.tx_errors;
+		ipoib_cm_skb_too_long(dev, skb, tx->mtu - INFINIBAND_ALEN);
+		return;
+	}
+
+	ipoib_dbg_data(priv, "sending packet: head 0x%x length %d connection 0x%x\n",
+		       tx->tx_head, skb->len, tx->qp->qp_num);
+
+	/*
+	 * We put the skb into the tx_ring _before_ we call post_send()
+	 * because it's entirely possible that the completion handler will
+	 * run before we execute anything after the post_send().  That
+	 * means we have to make sure everything is properly recorded and
+	 * our state is consistent before we call post_send().
+	 */
+	tx_req = &tx->tx_ring[tx->tx_head & (ipoib_sendq_size - 1)];
+	tx_req->skb = skb;
+	addr = ib_dma_map_single(priv->ca, skb->data, skb->len, DMA_TO_DEVICE);
+	if (unlikely(ib_dma_mapping_error(priv->ca, addr))) {
+		++priv->stats.tx_errors;
+		dev_kfree_skb_any(skb);
+		return;
+	}
+
+	tx_req->mapping = addr;
+
+	if (unlikely(post_send(priv, tx, tx->tx_head & (ipoib_sendq_size - 1),
+			        addr, skb->len))) {
+		ipoib_warn(priv, "post_send failed\n");
+		++priv->stats.tx_errors;
+		ib_dma_unmap_single(priv->ca, addr, skb->len, DMA_TO_DEVICE);
+		dev_kfree_skb_any(skb);
+	} else {
+		dev->trans_start = jiffies;
+		++tx->tx_head;
+
+		if (tx->tx_head - tx->tx_tail == ipoib_sendq_size) {
+			ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net queue\n",
+				  tx->qp->qp_num);
+			netif_stop_queue(dev);
+			set_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags);
+		}
+	}
+}
+
+static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx,
+				  struct ib_wc *wc)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	unsigned int wr_id = wc->wr_id;
+	struct ipoib_tx_buf *tx_req;
+	unsigned long flags;
+
+	ipoib_dbg_data(priv, "cm send completion: id %d, op %d, status: %d\n",
+		       wr_id, wc->opcode, wc->status);
+
+	if (unlikely(wr_id >= ipoib_sendq_size)) {
+		ipoib_warn(priv, "cm send completion event with wrid %d (>= %d)\n",
+			   wr_id, ipoib_sendq_size);
+		return;
+	}
+
+	tx_req = &tx->tx_ring[wr_id];
+
+	ib_dma_unmap_single(priv->ca, tx_req->mapping, tx_req->skb->len, DMA_TO_DEVICE);
+
+	/* FIXME: is this right? Shouldn't we only increment on success? */
+	++priv->stats.tx_packets;
+	priv->stats.tx_bytes += tx_req->skb->len;
+
+	dev_kfree_skb_any(tx_req->skb);
+
+	spin_lock_irqsave(&priv->tx_lock, flags);
+	++tx->tx_tail;
+	if (unlikely(test_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags)) &&
+	    tx->tx_head - tx->tx_tail <= ipoib_sendq_size >> 1) {
+		clear_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags);
+		netif_wake_queue(dev);
+	}
+
+	if (wc->status != IB_WC_SUCCESS &&
+	    wc->status != IB_WC_WR_FLUSH_ERR) {
+		struct ipoib_neigh *neigh;
+
+		ipoib_dbg(priv, "failed cm send event "
+			   "(status=%d, wrid=%d vend_err %x)\n",
+			   wc->status, wr_id, wc->vendor_err);
+
+		spin_lock(&priv->lock);
+		neigh = tx->neigh;
+
+		if (neigh) {
+			neigh->cm = NULL;
+			list_del(&neigh->list);
+			if (neigh->ah)
+				ipoib_put_ah(neigh->ah);
+			ipoib_neigh_free(dev, neigh);
+
+			tx->neigh = NULL;
+		}
+
+		/* queue would be re-started anyway when TX is destroyed,
+		 * but it makes sense to do it ASAP here. */
+		if (test_and_clear_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags))
+			netif_wake_queue(dev);
+
+		if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) {
+			list_move(&tx->list, &priv->cm.reap_list);
+			queue_work(ipoib_workqueue, &priv->cm.reap_task);
+		}
+
+		clear_bit(IPOIB_FLAG_OPER_UP, &tx->flags);
+
+		spin_unlock(&priv->lock);
+	}
+
+	spin_unlock_irqrestore(&priv->tx_lock, flags);
+}
+
+static void ipoib_cm_tx_completion(struct ib_cq *cq, void *tx_ptr)
+{
+	struct ipoib_cm_tx *tx = tx_ptr;
+	int n, i;
+
+	ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
+	do {
+		n = ib_poll_cq(cq, IPOIB_NUM_WC, tx->ibwc);
+		for (i = 0; i < n; ++i)
+			ipoib_cm_handle_tx_wc(tx->dev, tx, tx->ibwc + i);
+	} while (n == IPOIB_NUM_WC);
+}
+
+int ipoib_cm_dev_open(struct net_device *dev)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	int ret;
+
+	if (!IPOIB_CM_SUPPORTED(dev->dev_addr))
+		return 0;
+
+	priv->cm.id = ib_create_cm_id(priv->ca, ipoib_cm_rx_handler, dev);
+	if (IS_ERR(priv->cm.id)) {
+		printk(KERN_WARNING "%s: failed to create CM ID\n", priv->ca->name);
+		return PTR_ERR(priv->cm.id);
+	}
+
+	ret = ib_cm_listen(priv->cm.id, cpu_to_be64(IPOIB_CM_IETF_ID | priv->qp->qp_num),
+			   0, NULL);
+	if (ret) {
+		printk(KERN_WARNING "%s: failed to listen on ID 0x%llx\n", priv->ca->name,
+		       IPOIB_CM_IETF_ID | priv->qp->qp_num);
+		ib_destroy_cm_id(priv->cm.id);
+		return ret;
+	}
+	return 0;
+}
+
+void ipoib_cm_dev_stop(struct net_device *dev)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_cm_rx *p;
+	unsigned long flags;
+
+	if (!IPOIB_CM_SUPPORTED(dev->dev_addr))
+		return;
+
+	ib_destroy_cm_id(priv->cm.id);
+	spin_lock_irqsave(&priv->lock, flags);
+	while (!list_empty(&priv->cm.passive_ids)) {
+		p = list_entry(priv->cm.passive_ids.next, typeof(*p), list);
+		list_del_init(&p->list);
+		spin_unlock_irqrestore(&priv->lock, flags);
+		ib_destroy_cm_id(p->id);
+		ib_destroy_qp(p->qp);
+		kfree(p);
+		spin_lock_irqsave(&priv->lock, flags);
+	}
+	spin_unlock_irqrestore(&priv->lock, flags);
+
+	cancel_delayed_work(&priv->cm.stale_task);
+}
+
+static int ipoib_cm_rep_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
+{
+	struct ipoib_cm_tx *p = cm_id->context;
+	struct ipoib_dev_priv *priv = netdev_priv(p->dev);
+	struct ipoib_cm_data *data = event->private_data;
+	struct sk_buff_head skqueue;
+	struct ib_qp_attr qp_attr;
+	int qp_attr_mask, ret;
+	struct sk_buff *skb;
+	unsigned long flags;
+
+	p->mtu = be32_to_cpu(data->mtu);
+
+	if (p->mtu < priv->dev->mtu + IPOIB_ENCAP_LEN) {
+		ipoib_warn(priv, "Rejecting connection: mtu %d < device mtu %d + 4\n",
+			   p->mtu, priv->dev->mtu);
+		return -EINVAL;
+	}
+
+	qp_attr.qp_state = IB_QPS_RTR;
+	ret = ib_cm_init_qp_attr(cm_id, &qp_attr, &qp_attr_mask);
+	if (ret) {
+		ipoib_warn(priv, "failed to init QP attr for RTR: %d\n", ret);
+		return ret;
+	}
+
+	qp_attr.rq_psn = 0 /* FIXME */;
+	ret = ib_modify_qp(p->qp, &qp_attr, qp_attr_mask);
+	if (ret) {
+		ipoib_warn(priv, "failed to modify QP to RTR: %d\n", ret);
+		return ret;
+	}
+
+	qp_attr.qp_state = IB_QPS_RTS;
+	ret = ib_cm_init_qp_attr(cm_id, &qp_attr, &qp_attr_mask);
+	if (ret) {
+		ipoib_warn(priv, "failed to init QP attr for RTS: %d\n", ret);
+		return ret;
+	}
+	ret = ib_modify_qp(p->qp, &qp_attr, qp_attr_mask);
+	if (ret) {
+		ipoib_warn(priv, "failed to modify QP to RTS: %d\n", ret);
+		return ret;
+	}
+
+	skb_queue_head_init(&skqueue);
+
+	spin_lock_irqsave(&priv->lock, flags);
+	set_bit(IPOIB_FLAG_OPER_UP, &p->flags);
+	if (p->neigh)
+		while ((skb = __skb_dequeue(&p->neigh->queue)))
+			__skb_queue_tail(&skqueue, skb);
+	spin_unlock_irqrestore(&priv->lock, flags);
+
+	while ((skb = __skb_dequeue(&skqueue))) {
+		skb->dev = p->dev;
+		if (dev_queue_xmit(skb))
+			ipoib_warn(priv, "dev_queue_xmit failed "
+				   "to requeue packet\n");
+	}
+
+	ret = ib_send_cm_rtu(cm_id, NULL, 0);
+	if (ret) {
+		ipoib_warn(priv, "failed to send RTU: %d\n", ret);
+		return ret;
+	}
+	return 0;
+}
+
+static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ib_cq *cq)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ib_qp_init_attr attr = {};
+	attr.recv_cq = priv->cq;
+	attr.srq = priv->cm.srq;
+	attr.cap.max_send_wr = ipoib_sendq_size;
+	attr.cap.max_send_sge = 1;
+	attr.sq_sig_type = IB_SIGNAL_ALL_WR;
+	attr.qp_type = IB_QPT_RC;
+	attr.send_cq = cq;
+	return ib_create_qp(priv->pd, &attr);
+}
+
+static int ipoib_cm_send_req(struct net_device *dev,
+			     struct ib_cm_id *id, struct ib_qp *qp,
+			     u32 qpn,
+			     struct ib_sa_path_rec *pathrec)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_cm_data data = {};
+	struct ib_cm_req_param req = {};
+
+	data.qpn = cpu_to_be32(priv->qp->qp_num);
+	data.mtu = cpu_to_be32(IPOIB_CM_BUF_SIZE);
+
+	req.primary_path 	      = pathrec;
+	req.alternate_path 	      = NULL;
+	req.service_id                = cpu_to_be64(IPOIB_CM_IETF_ID | qpn);
+	req.qp_num 		      = qp->qp_num;
+	req.qp_type 		      = qp->qp_type;
+	req.private_data 	      = &data;
+	req.private_data_len 	      = sizeof data;
+	req.flow_control 	      = 0;
+
+	req.starting_psn              = 0; /* FIXME */
+
+	/*
+	 * Pick some arbitrary defaults here; we could make these
+	 * module parameters if anyone cared about setting them.
+	 */
+	req.responder_resources	      = 4;
+	req.remote_cm_response_timeout = 20;
+	req.local_cm_response_timeout  = 20;
+	req.retry_count 	      = 0; /* RFC draft warns against retries */
+	req.rnr_retry_count 	      = 0; /* RFC draft warns against retries */
+	req.max_cm_retries 	      = 15;
+	req.srq 	              = 1;
+	return ib_send_cm_req(id, &req);
+}
+
+static int ipoib_cm_modify_tx_init(struct net_device *dev,
+				  struct ib_cm_id *cm_id, struct ib_qp *qp)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ib_qp_attr qp_attr;
+	int qp_attr_mask, ret;
+	ret = ib_find_cached_pkey(priv->ca, priv->port, priv->pkey, &qp_attr.pkey_index);
+	if (ret) {
+		ipoib_warn(priv, "pkey 0x%x not in cache: %d\n", priv->pkey, ret);
+		return ret;
+	}
+
+	qp_attr.qp_state = IB_QPS_INIT;
+	qp_attr.qp_access_flags = IB_ACCESS_LOCAL_WRITE;
+	qp_attr.port_num = priv->port;
+	qp_attr_mask = IB_QP_STATE | IB_QP_ACCESS_FLAGS | IB_QP_PKEY_INDEX | IB_QP_PORT;
+
+	ret = ib_modify_qp(qp, &qp_attr, qp_attr_mask);
+	if (ret) {
+		ipoib_warn(priv, "failed to modify tx QP to INIT: %d\n", ret);
+		return ret;
+	}
+	return 0;
+}
+
+static int ipoib_cm_tx_init(struct ipoib_cm_tx *p, u32 qpn,
+			    struct ib_sa_path_rec *pathrec)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(p->dev);
+	int ret;
+
+	p->tx_ring = kzalloc(ipoib_sendq_size * sizeof *p->tx_ring,
+				GFP_KERNEL);
+	if (!p->tx_ring) {
+		ipoib_warn(priv, "failed to allocate tx ring\n");
+		ret = -ENOMEM;
+		goto err_tx;
+	}
+
+	p->cq = ib_create_cq(priv->ca, ipoib_cm_tx_completion, NULL, p,
+			     ipoib_sendq_size + 1);
+	if (IS_ERR(p->cq)) {
+		ret = PTR_ERR(p->cq);
+		ipoib_warn(priv, "failed to allocate tx cq: %d\n", ret);
+		goto err_cq;
+	}
+
+	ret = ib_req_notify_cq(p->cq, IB_CQ_NEXT_COMP);
+	if (ret) {
+		ipoib_warn(priv, "failed to request completion notification: %d\n", ret);
+		goto err_req_notify;
+	}
+
+	p->qp = ipoib_cm_create_tx_qp(p->dev, p->cq);
+	if (IS_ERR(p->qp)) {
+		ret = PTR_ERR(p->qp);
+		ipoib_warn(priv, "failed to allocate tx qp: %d\n", ret);
+		goto err_qp;
+	}
+
+	p->id = ib_create_cm_id(priv->ca, ipoib_cm_tx_handler, p);
+	if (IS_ERR(p->id)) {
+		ret = PTR_ERR(p->id);
+		ipoib_warn(priv, "failed to create tx cm id: %d\n", ret);
+		goto err_id;
+	}
+
+	ret = ipoib_cm_modify_tx_init(p->dev, p->id,  p->qp);
+	if (ret) {
+		ipoib_warn(priv, "failed to modify tx qp to init: %d\n", ret);
+		goto err_modify;
+	}
+
+	ret = ipoib_cm_send_req(p->dev, p->id, p->qp, qpn, pathrec);
+	if (ret) {
+		ipoib_warn(priv, "failed to send cm req: %d\n", ret);
+		goto err_send_cm;
+	}
+
+	ipoib_dbg(priv, "Request connection 0x%x for gid " IPOIB_GID_FMT " qpn 0x%x\n",
+		  p->qp->qp_num, IPOIB_GID_ARG(pathrec->dgid), qpn);
+
+	return 0;
+
+err_send_cm:
+err_modify:
+	ib_destroy_cm_id(p->id);
+err_id:
+	p->id = NULL;
+	ib_destroy_qp(p->qp);
+err_req_notify:
+err_qp:
+	p->qp = NULL;
+	ib_destroy_cq(p->cq);
+err_cq:
+	p->cq = NULL;
+err_tx:
+	return ret;
+}
+
+static void ipoib_cm_tx_destroy(struct ipoib_cm_tx *p)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(p->dev);
+	struct ipoib_tx_buf *tx_req;
+
+	ipoib_dbg(priv, "Destroy active connection 0x%x head 0x%x tail 0x%x\n",
+		  p->qp ? p->qp->qp_num : 0, p->tx_head, p->tx_tail);
+
+	if (p->id)
+		ib_destroy_cm_id(p->id);
+
+	if (p->qp)
+		ib_destroy_qp(p->qp);
+
+	if (p->cq)
+		ib_destroy_cq(p->cq);
+
+	if (test_bit(IPOIB_FLAG_NETIF_STOPPED, &p->flags))
+		netif_wake_queue(p->dev);
+
+	if (p->tx_ring) {
+		while ((int) p->tx_tail - (int) p->tx_head < 0) {
+			tx_req = &p->tx_ring[p->tx_tail & (ipoib_sendq_size - 1)];
+			ib_dma_unmap_single(priv->ca, tx_req->mapping, tx_req->skb->len,
+					 DMA_TO_DEVICE);
+			dev_kfree_skb_any(tx_req->skb);
+			++p->tx_tail;
+		}
+
+		kfree(p->tx_ring);
+	}
+
+	kfree(p);
+}
+
+static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id,
+			       struct ib_cm_event *event)
+{
+	struct ipoib_cm_tx *tx = cm_id->context;
+	struct ipoib_dev_priv *priv = netdev_priv(tx->dev);
+	struct net_device *dev = priv->dev;
+	struct ipoib_neigh *neigh;
+	unsigned long flags;
+	int ret;
+
+	switch (event->event) {
+	case IB_CM_DREQ_RECEIVED:
+		ipoib_dbg(priv, "DREQ received.\n");
+		ib_send_cm_drep(cm_id, NULL, 0);
+		break;
+	case IB_CM_REP_RECEIVED:
+		ipoib_dbg(priv, "REP received.\n");
+		ret = ipoib_cm_rep_handler(cm_id, event);
+		if (ret)
+			ib_send_cm_rej(cm_id, IB_CM_REJ_CONSUMER_DEFINED,
+				       NULL, 0, NULL, 0);
+		break;
+	case IB_CM_REQ_ERROR:
+	case IB_CM_REJ_RECEIVED:
+	case IB_CM_TIMEWAIT_EXIT:
+		ipoib_dbg(priv, "CM error %d.\n", event->event);
+		spin_lock_irqsave(&priv->tx_lock, flags);
+		spin_lock(&priv->lock);
+		neigh = tx->neigh;
+
+		if (neigh) {
+			neigh->cm = NULL;
+			list_del(&neigh->list);
+			if (neigh->ah)
+				ipoib_put_ah(neigh->ah);
+			ipoib_neigh_free(dev, neigh);
+
+			tx->neigh = NULL;
+		}
+
+		if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) {
+			list_move(&tx->list, &priv->cm.reap_list);
+			queue_work(ipoib_workqueue, &priv->cm.reap_task);
+		}
+
+		spin_unlock(&priv->lock);
+		spin_unlock_irqrestore(&priv->tx_lock, flags);
+		break;
+	default:
+		break;
+	}
+
+	return 0;
+}
+
+struct ipoib_cm_tx *ipoib_cm_create_tx(struct net_device *dev, struct ipoib_path *path,
+				       struct ipoib_neigh *neigh)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_cm_tx *tx;
+
+	tx = kzalloc(sizeof *tx, GFP_ATOMIC);
+	if (!tx)
+		return NULL;
+
+	neigh->cm = tx;
+	tx->neigh = neigh;
+	tx->path = path;
+	tx->dev = dev;
+	list_add(&tx->list, &priv->cm.start_list);
+	set_bit(IPOIB_FLAG_INITIALIZED, &tx->flags);
+	queue_work(ipoib_workqueue, &priv->cm.start_task);
+	return tx;
+}
+
+void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(tx->dev);
+	if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) {
+		list_move(&tx->list, &priv->cm.reap_list);
+		queue_work(ipoib_workqueue, &priv->cm.reap_task);
+		ipoib_dbg(priv, "Reap connection for gid " IPOIB_GID_FMT "\n",
+			  IPOIB_GID_ARG(tx->neigh->dgid));
+		tx->neigh = NULL;
+	}
+}
+
+static void ipoib_cm_tx_start(struct work_struct *work)
+{
+	struct ipoib_dev_priv *priv = container_of(work, struct ipoib_dev_priv,
+						   cm.start_task);
+	struct net_device *dev = priv->dev;
+	struct ipoib_neigh *neigh;
+	struct ipoib_cm_tx *p;
+	unsigned long flags;
+	int ret;
+
+	struct ib_sa_path_rec pathrec;
+	u32 qpn;
+
+	spin_lock_irqsave(&priv->tx_lock, flags);
+	spin_lock(&priv->lock);
+	while (!list_empty(&priv->cm.start_list)) {
+		p = list_entry(priv->cm.start_list.next, typeof(*p), list);
+		list_del_init(&p->list);
+		neigh = p->neigh;
+		qpn = IPOIB_QPN(neigh->neighbour->ha);
+		memcpy(&pathrec, &p->path->pathrec, sizeof pathrec);
+		spin_unlock(&priv->lock);
+		spin_unlock_irqrestore(&priv->tx_lock, flags);
+		ret = ipoib_cm_tx_init(p, qpn, &pathrec);
+		spin_lock_irqsave(&priv->tx_lock, flags);
+		spin_lock(&priv->lock);
+		if (ret) {
+			neigh = p->neigh;
+			if (neigh) {
+				neigh->cm = NULL;
+				list_del(&neigh->list);
+				if (neigh->ah)
+					ipoib_put_ah(neigh->ah);
+				ipoib_neigh_free(dev, neigh);
+			}
+			list_del(&p->list);
+			kfree(p);
+		}
+	}
+	spin_unlock(&priv->lock);
+	spin_unlock_irqrestore(&priv->tx_lock, flags);
+}
+
+static void ipoib_cm_tx_reap(struct work_struct *work)
+{
+	struct ipoib_dev_priv *priv = container_of(work, struct ipoib_dev_priv,
+						   cm.reap_task);
+	struct ipoib_cm_tx *p;
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->tx_lock, flags);
+	spin_lock(&priv->lock);
+	while (!list_empty(&priv->cm.reap_list)) {
+		p = list_entry(priv->cm.reap_list.next, typeof(*p), list);
+		list_del(&p->list);
+		spin_unlock(&priv->lock);
+		spin_unlock_irqrestore(&priv->tx_lock, flags);
+		ipoib_cm_tx_destroy(p);
+		spin_lock_irqsave(&priv->tx_lock, flags);
+		spin_lock(&priv->lock);
+	}
+	spin_unlock(&priv->lock);
+	spin_unlock_irqrestore(&priv->tx_lock, flags);
+}
+
+static void ipoib_cm_skb_reap(struct work_struct *work)
+{
+	struct ipoib_dev_priv *priv = container_of(work, struct ipoib_dev_priv,
+						   cm.skb_task);
+	struct net_device *dev = priv->dev;
+	struct sk_buff *skb;
+	unsigned long flags;
+
+	unsigned mtu = priv->mcast_mtu;
+
+	spin_lock_irqsave(&priv->tx_lock, flags);
+	spin_lock(&priv->lock);
+	while ((skb = skb_dequeue(&priv->cm.skb_queue))) {
+		spin_unlock(&priv->lock);
+		spin_unlock_irqrestore(&priv->tx_lock, flags);
+		if (skb->protocol == htons(ETH_P_IP))
+			icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+		else if (skb->protocol == htons(ETH_P_IPV6))
+			icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, dev);
+#endif
+		dev_kfree_skb_any(skb);
+		spin_lock_irqsave(&priv->tx_lock, flags);
+		spin_lock(&priv->lock);
+	}
+	spin_unlock(&priv->lock);
+	spin_unlock_irqrestore(&priv->tx_lock, flags);
+}
+
+void ipoib_cm_skb_too_long(struct net_device* dev, struct sk_buff *skb,
+			   unsigned int mtu)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	int e = skb_queue_empty(&priv->cm.skb_queue);
+
+	if (skb->dst)
+		skb->dst->ops->update_pmtu(skb->dst, mtu);
+
+	skb_queue_tail(&priv->cm.skb_queue, skb);
+	if (e)
+		queue_work(ipoib_workqueue, &priv->cm.skb_task);
+}
+
+static void ipoib_cm_stale_task(struct work_struct *work)
+{
+	struct ipoib_dev_priv *priv = container_of(work, struct ipoib_dev_priv,
+						   cm.stale_task.work);
+	struct ipoib_cm_rx *p;
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->lock, flags);
+	while (!list_empty(&priv->cm.passive_ids)) {
+		/* List is sorted by LRU, start from tail,
+		 * stop when we see a recently used entry */
+		p = list_entry(priv->cm.passive_ids.prev, typeof(*p), list);
+		if (time_before_eq(jiffies, p->jiffies + IPOIB_CM_RX_TIMEOUT))
+			break;
+		list_del_init(&p->list);
+		spin_unlock_irqrestore(&priv->lock, flags);
+		ib_destroy_cm_id(p->id);
+		ib_destroy_qp(p->qp);
+		kfree(p);
+		spin_lock_irqsave(&priv->lock, flags);
+	}
+	spin_unlock_irqrestore(&priv->lock, flags);
+}
+
+
+static ssize_t show_mode(struct device *d, struct device_attribute *attr, 
+			 char *buf)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(to_net_dev(d));
+
+	if (test_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags))
+		return sprintf(buf, "connected\n");
+	else
+		return sprintf(buf, "datagram\n");
+}
+
+static ssize_t set_mode(struct device *d, struct device_attribute *attr,
+			const char *buf, size_t count)
+{
+	struct net_device *dev = to_net_dev(d);
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+
+	/* flush paths if we switch modes so that connections are restarted */
+	if (IPOIB_CM_SUPPORTED(dev->dev_addr) && !strcmp(buf, "connected\n")) {
+		set_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags);
+		ipoib_warn(priv, "enabling connected mode "
+			   "will cause multicast packet drops\n");
+		ipoib_flush_paths(dev);
+		return count;
+	}
+
+	if (!strcmp(buf, "datagram\n")) {
+		clear_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags);
+		dev->mtu = min(priv->mcast_mtu, dev->mtu);
+		ipoib_flush_paths(dev);
+		return count;
+	}
+
+	return -EINVAL;
+}
+
+static DEVICE_ATTR(mode, S_IWUGO | S_IRUGO, show_mode, set_mode);
+
+int ipoib_cm_add_mode_attr(struct net_device *dev)
+{
+	return device_create_file(&dev->dev, &dev_attr_mode);
+}
+
+int ipoib_cm_dev_init(struct net_device *dev)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ib_srq_init_attr srq_init_attr = {
+		.attr = {
+			.max_wr  = ipoib_recvq_size,
+			.max_sge = IPOIB_CM_RX_SG
+		}
+	};
+	int ret, i;
+
+	INIT_LIST_HEAD(&priv->cm.passive_ids);
+	INIT_LIST_HEAD(&priv->cm.reap_list);
+	INIT_LIST_HEAD(&priv->cm.start_list);
+	INIT_WORK(&priv->cm.start_task, ipoib_cm_tx_start);
+	INIT_WORK(&priv->cm.reap_task, ipoib_cm_tx_reap);
+	INIT_WORK(&priv->cm.skb_task, ipoib_cm_skb_reap);
+	INIT_DELAYED_WORK(&priv->cm.stale_task, ipoib_cm_stale_task);
+
+	skb_queue_head_init(&priv->cm.skb_queue);
+
+	priv->cm.srq = ib_create_srq(priv->pd, &srq_init_attr);
+	if (IS_ERR(priv->cm.srq)) {
+		ret = PTR_ERR(priv->cm.srq);
+		priv->cm.srq = NULL;
+		return ret;
+	}
+
+	priv->cm.srq_ring = kzalloc(ipoib_recvq_size * sizeof *priv->cm.srq_ring,
+				    GFP_KERNEL);
+	if (!priv->cm.srq_ring) {
+		printk(KERN_WARNING "%s: failed to allocate CM ring (%d entries)\n",
+		       priv->ca->name, ipoib_recvq_size);
+		ipoib_cm_dev_cleanup(dev);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < IPOIB_CM_RX_SG; ++i)
+		priv->cm.rx_sge[i].lkey	= priv->mr->lkey;
+
+	priv->cm.rx_sge[0].length = IPOIB_CM_HEAD_SIZE;
+	for (i = 1; i < IPOIB_CM_RX_SG; ++i)
+		priv->cm.rx_sge[i].length = PAGE_SIZE;
+	priv->cm.rx_wr.next = NULL;
+	priv->cm.rx_wr.sg_list = priv->cm.rx_sge;
+	priv->cm.rx_wr.num_sge = IPOIB_CM_RX_SG;
+
+	for (i = 0; i < ipoib_recvq_size; ++i) {
+		if (ipoib_cm_alloc_rx_skb(dev, i, priv->cm.srq_ring[i].mapping)) {
+			ipoib_warn(priv, "failed to allocate receive buffer %d\n", i);
+			ipoib_cm_dev_cleanup(dev);
+			return -ENOMEM;
+		}
+		if (ipoib_cm_post_receive(dev, i)) {
+			ipoib_warn(priv, "ipoib_cm_post_receive failed for buf %d\n", i);
+			ipoib_cm_dev_cleanup(dev);
+			return -EIO;
+		}
+	}
+
+	priv->dev->dev_addr[0] = IPOIB_FLAGS_RC;
+	return 0;
+}
+
+void ipoib_cm_dev_cleanup(struct net_device *dev)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	int i, ret;
+
+	if (!priv->cm.srq)
+		return;
+
+	ipoib_dbg(priv, "Cleanup ipoib connected mode.\n");
+
+	ret = ib_destroy_srq(priv->cm.srq);
+	if (ret)
+		ipoib_warn(priv, "ib_destroy_srq failed: %d\n", ret);
+
+	priv->cm.srq = NULL;
+	if (!priv->cm.srq_ring)
+		return;
+	for (i = 0; i < ipoib_recvq_size; ++i)
+		if (priv->cm.srq_ring[i].skb) {
+			ipoib_cm_dma_unmap_rx(priv, priv->cm.srq_ring[i].mapping);
+			dev_kfree_skb_any(priv->cm.srq_ring[i].skb);
+			priv->cm.srq_ring[i].skb = NULL;
+		}
+	kfree(priv->cm.srq_ring);
+	priv->cm.srq_ring = NULL;
+}
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 59d9594..f2aa923 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -50,8 +50,6 @@ MODULE_PARM_DESC(data_debug_level,
 		 "Enable data path debug tracing if > 0");
 #endif
 
-#define	IPOIB_OP_RECV	(1ul << 31)
-
 static DEFINE_MUTEX(pkey_mutex);
 
 struct ipoib_ah *ipoib_create_ah(struct net_device *dev,
@@ -268,10 +266,11 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
 
 	spin_lock_irqsave(&priv->tx_lock, flags);
 	++priv->tx_tail;
-	if (netif_queue_stopped(dev) &&
-	    test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags) &&
-	    priv->tx_head - priv->tx_tail <= ipoib_sendq_size >> 1)
+	if (unlikely(test_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags)) &&
+	    priv->tx_head - priv->tx_tail <= ipoib_sendq_size >> 1) {
+		clear_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags);
 		netif_wake_queue(dev);
+	}
 	spin_unlock_irqrestore(&priv->tx_lock, flags);
 
 	if (wc->status != IB_WC_SUCCESS &&
@@ -283,7 +282,9 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
 
 static void ipoib_ib_handle_wc(struct net_device *dev, struct ib_wc *wc)
 {
-	if (wc->wr_id & IPOIB_OP_RECV)
+	if (wc->wr_id & IPOIB_CM_OP_SRQ)
+		ipoib_cm_handle_rx_wc(dev, wc);
+	else if (wc->wr_id & IPOIB_OP_RECV)
 		ipoib_ib_handle_rx_wc(dev, wc);
 	else
 		ipoib_ib_handle_tx_wc(dev, wc);
@@ -327,12 +328,12 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb,
 	struct ipoib_tx_buf *tx_req;
 	u64 addr;
 
-	if (unlikely(skb->len > dev->mtu + INFINIBAND_ALEN)) {
+	if (unlikely(skb->len > priv->mcast_mtu + INFINIBAND_ALEN)) {
 		ipoib_warn(priv, "packet len %d (> %d) too long to send, dropping\n",
-			   skb->len, dev->mtu + INFINIBAND_ALEN);
+			   skb->len, priv->mcast_mtu + INFINIBAND_ALEN);
 		++priv->stats.tx_dropped;
 		++priv->stats.tx_errors;
-		dev_kfree_skb_any(skb);
+		ipoib_cm_skb_too_long(dev, skb, priv->mcast_mtu);
 		return;
 	}
 
@@ -372,6 +373,7 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb,
 		if (priv->tx_head - priv->tx_tail == ipoib_sendq_size) {
 			ipoib_dbg(priv, "TX ring full, stopping kernel net queue\n");
 			netif_stop_queue(dev);
+			set_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags);
 		}
 	}
 }
@@ -424,6 +426,13 @@ int ipoib_ib_dev_open(struct net_device *dev)
 		return -1;
 	}
 
+	ret = ipoib_cm_dev_open(dev);
+	if (ret) {
+		ipoib_warn(priv, "ipoib_cm_dev_open returned %d\n", ret);
+		ipoib_ib_dev_stop(dev);
+		return -1;
+	}
+
 	clear_bit(IPOIB_STOP_REAPER, &priv->flags);
 	queue_delayed_work(ipoib_workqueue, &priv->ah_reap_task, HZ);
 
@@ -509,6 +518,8 @@ int ipoib_ib_dev_stop(struct net_device *dev)
 
 	clear_bit(IPOIB_FLAG_INITIALIZED, &priv->flags);
 
+	ipoib_cm_dev_stop(dev);
+
 	/*
 	 * Move our QP to the error state and then reinitialize in
 	 * when all work requests have completed or have been flushed.
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index af5ee2e..18d27fd 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -49,8 +49,6 @@
 
 #include <net/dst.h>
 
-#define IPOIB_QPN(ha) (be32_to_cpup((__be32 *) ha) & 0xffffff)
-
 MODULE_AUTHOR("Roland Dreier");
 MODULE_DESCRIPTION("IP-over-InfiniBand net driver");
 MODULE_LICENSE("Dual BSD/GPL");
@@ -145,6 +143,8 @@ static int ipoib_stop(struct net_device *dev)
 
 	netif_stop_queue(dev);
 
+	clear_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags);
+
 	/*
 	 * Now flush workqueue to make sure a scheduled task doesn't
 	 * bring our internal state back up.
@@ -178,8 +178,18 @@ static int ipoib_change_mtu(struct net_device *dev, int new_mtu)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 
-	if (new_mtu > IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN)
+	/* dev->mtu > 2K ==> connected mode */
+	if (ipoib_cm_admin_enabled(dev) && new_mtu <= IPOIB_CM_MTU) {
+		if (new_mtu > priv->mcast_mtu)
+			ipoib_warn(priv, "mtu > %d will cause multicast packet drops.\n",
+				   priv->mcast_mtu);
+		dev->mtu = new_mtu;
+		return 0;
+	}
+
+	if (new_mtu > IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN) {
 		return -EINVAL;
+	}
 
 	priv->admin_mtu = new_mtu;
 
@@ -414,6 +424,20 @@ static void path_rec_completion(int status,
 			memcpy(&neigh->dgid.raw, &path->pathrec.dgid.raw,
 			       sizeof(union ib_gid));
 
+			if (ipoib_cm_enabled(dev, neigh->neighbour)) {
+				if (!ipoib_cm_get(neigh))
+					ipoib_cm_set(neigh, ipoib_cm_create_tx(dev,
+									       path,
+									       neigh));
+				if (!ipoib_cm_get(neigh)) {
+					list_del(&neigh->list);
+					if (neigh->ah)
+						ipoib_put_ah(neigh->ah);
+					ipoib_neigh_free(dev, neigh);
+					continue;
+				}
+			}
+
 			while ((skb = __skb_dequeue(&neigh->queue)))
 				__skb_queue_tail(&skqueue, skb);
 		}
@@ -520,7 +544,25 @@ static void neigh_add_path(struct sk_buff *skb, struct net_device *dev)
 		memcpy(&neigh->dgid.raw, &path->pathrec.dgid.raw,
 		       sizeof(union ib_gid));
 
-		ipoib_send(dev, skb, path->ah, IPOIB_QPN(skb->dst->neighbour->ha));
+		if (ipoib_cm_enabled(dev, neigh->neighbour)) {
+			if (!ipoib_cm_get(neigh))
+				ipoib_cm_set(neigh, ipoib_cm_create_tx(dev, path, neigh));
+			if (!ipoib_cm_get(neigh)) {
+				list_del(&neigh->list);
+				if (neigh->ah)
+					ipoib_put_ah(neigh->ah);
+				ipoib_neigh_free(dev, neigh);
+				goto err_drop;
+			}
+			if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE)
+				__skb_queue_tail(&neigh->queue, skb);
+			else {
+				ipoib_warn(priv, "queue length limit %d. Packet drop.\n",
+					   skb_queue_len(&neigh->queue));
+				goto err_drop;
+			}
+		} else
+			ipoib_send(dev, skb, path->ah, IPOIB_QPN(skb->dst->neighbour->ha));
 	} else {
 		neigh->ah  = NULL;
 
@@ -538,6 +580,7 @@ err_list:
 
 err_path:
 	ipoib_neigh_free(dev, neigh);
+err_drop:
 	++priv->stats.tx_dropped;
 	dev_kfree_skb_any(skb);
 
@@ -640,7 +683,12 @@ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 		neigh = *to_ipoib_neigh(skb->dst->neighbour);
 
-		if (likely(neigh->ah)) {
+		if (ipoib_cm_get(neigh)) {
+			if (ipoib_cm_up(neigh)) {
+				ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
+				goto out;
+			}
+		} else if (neigh->ah) {
 			if (unlikely(memcmp(&neigh->dgid.raw,
 					    skb->dst->neighbour->ha + 4,
 					    sizeof(union ib_gid)))) {
@@ -805,6 +853,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour)
 	neigh->neighbour = neighbour;
 	*to_ipoib_neigh(neighbour) = neigh;
 	skb_queue_head_init(&neigh->queue);
+	ipoib_cm_set(neigh, NULL);
 
 	return neigh;
 }
@@ -818,6 +867,8 @@ void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh)
 		++priv->stats.tx_dropped;
 		dev_kfree_skb_any(skb);
 	}
+	if (ipoib_cm_get(neigh))
+		ipoib_cm_destroy_tx(ipoib_cm_get(neigh));
 	kfree(neigh);
 }
 
@@ -1080,6 +1131,8 @@ static struct net_device *ipoib_add_port(const char *format,
 
 	ipoib_create_debug_files(priv->dev);
 
+	if (ipoib_cm_add_mode_attr(priv->dev))
+		goto sysfs_failed;
 	if (ipoib_add_pkey_attr(priv->dev))
 		goto sysfs_failed;
 	if (device_create_file(&priv->dev->dev, &dev_attr_create_child))
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index b04b72c..fea737f 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -597,7 +597,9 @@ void ipoib_mcast_join_task(struct work_struct *work)
 
 	priv->mcast_mtu = ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu) -
 		IPOIB_ENCAP_LEN;
-	dev->mtu = min(priv->mcast_mtu, priv->admin_mtu);
+
+	if (!ipoib_cm_admin_enabled(dev))
+		dev->mtu = min(priv->mcast_mtu, priv->admin_mtu);
 
 	ipoib_dbg_mcast(priv, "successfully joined all multicast groups\n");
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
index 7b717c6..3cb551b 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
@@ -168,35 +168,41 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca)
 		.qp_type     = IB_QPT_UD
 	};
 
+	int ret, size;
+
 	priv->pd = ib_alloc_pd(priv->ca);
 	if (IS_ERR(priv->pd)) {
 		printk(KERN_WARNING "%s: failed to allocate PD\n", ca->name);
 		return -ENODEV;
 	}
 
-	priv->cq = ib_create_cq(priv->ca, ipoib_ib_completion, NULL, dev,
-				ipoib_sendq_size + ipoib_recvq_size + 1);
+	priv->mr = ib_get_dma_mr(priv->pd, IB_ACCESS_LOCAL_WRITE);
+	if (IS_ERR(priv->mr)) {
+		printk(KERN_WARNING "%s: ib_get_dma_mr failed\n", ca->name);
+		goto out_free_pd;
+	}
+
+	size = ipoib_sendq_size + ipoib_recvq_size + 1;
+	ret = ipoib_cm_dev_init(dev);
+	if (!ret)
+		size += ipoib_recvq_size;
+
+	priv->cq = ib_create_cq(priv->ca, ipoib_ib_completion, NULL, dev, size);
 	if (IS_ERR(priv->cq)) {
 		printk(KERN_WARNING "%s: failed to create CQ\n", ca->name);
-		goto out_free_pd;
+		goto out_free_mr;
 	}
 
 	if (ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP))
 		goto out_free_cq;
 
-	priv->mr = ib_get_dma_mr(priv->pd, IB_ACCESS_LOCAL_WRITE);
-	if (IS_ERR(priv->mr)) {
-		printk(KERN_WARNING "%s: ib_get_dma_mr failed\n", ca->name);
-		goto out_free_cq;
-	}
-
 	init_attr.send_cq = priv->cq;
 	init_attr.recv_cq = priv->cq,
 
 	priv->qp = ib_create_qp(priv->pd, &init_attr);
 	if (IS_ERR(priv->qp)) {
 		printk(KERN_WARNING "%s: failed to create QP\n", ca->name);
-		goto out_free_mr;
+		goto out_free_cq;
 	}
 
 	priv->dev->dev_addr[1] = (priv->qp->qp_num >> 16) & 0xff;
@@ -212,12 +218,12 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca)
 
 	return 0;
 
-out_free_mr:
-	ib_dereg_mr(priv->mr);
-
 out_free_cq:
 	ib_destroy_cq(priv->cq);
 
+out_free_mr:
+	ib_dereg_mr(priv->mr);
+
 out_free_pd:
 	ib_dealloc_pd(priv->pd);
 	return -ENODEV;
@@ -235,12 +241,14 @@ void ipoib_transport_dev_cleanup(struct net_device *dev)
 		clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
 	}
 
-	if (ib_dereg_mr(priv->mr))
-		ipoib_warn(priv, "ib_dereg_mr failed\n");
-
 	if (ib_destroy_cq(priv->cq))
 		ipoib_warn(priv, "ib_cq_destroy failed\n");
 
+	ipoib_cm_dev_cleanup(dev);
+
+	if (ib_dereg_mr(priv->mr))
+		ipoib_warn(priv, "ib_dereg_mr failed\n");
+
 	if (ib_dealloc_pd(priv->pd))
 		ipoib_warn(priv, "ib_dealloc_pd failed\n");
 }
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index 085eafe..6762988 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -115,6 +115,8 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
 
 	ipoib_create_debug_files(priv->dev);
 
+	if (ipoib_cm_add_mode_attr(priv->dev))
+		goto sysfs_failed;
 	if (ipoib_add_pkey_attr(priv->dev))
 		goto sysfs_failed;
 
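
The wake-up test in the ipoib_ib_handle_tx_wc() hunk above relies on
tx_head - tx_tail staying correct when the unsigned counters wrap.  A
minimal user-space sketch of that arithmetic follows, assuming both
counters are plain unsigned ints; struct tx_ring and its helpers are
hypothetical names for illustration and do not appear in the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the tx_head/tx_tail counters in the patch. */
struct tx_ring {
	unsigned head;   /* incremented for every posted send */
	unsigned tail;   /* incremented for every completed send */
	unsigned size;   /* ring capacity, playing the role of ipoib_sendq_size */
	bool stopped;    /* mirrors IPOIB_FLAG_NETIF_STOPPED */
};

/* Sends still outstanding; unsigned subtraction survives wraparound. */
static unsigned tx_outstanding(const struct tx_ring *r)
{
	return r->head - r->tail;
}

/* Post one send and mark the queue stopped once the ring is full. */
static void tx_post(struct tx_ring *r)
{
	assert(tx_outstanding(r) < r->size);
	++r->head;
	if (tx_outstanding(r) == r->size)
		r->stopped = true;
}

/* Complete one send; returns true when the queue should be woken. */
static bool tx_complete(struct tx_ring *r)
{
	assert(tx_outstanding(r) > 0);
	++r->tail;
	if (r->stopped && tx_outstanding(r) <= r->size >> 1) {
		r->stopped = false;
		return true;
	}
	return false;
}
```

With size = 4 the queue stops after the fourth post and is woken once
only two sends remain outstanding, including when head has wrapped
past zero.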

From mst at mellanox.co.il  Thu Feb  8 14:54:50 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 9 Feb 2007 00:54:50 +0200
Subject: [openib-general] [PATCH] IB/ipoib_cm: fix up issues from code
	review
In-Reply-To: <ada3b5gtgqm.fsf@cisco.com>
References: <ada3b5gtgqm.fsf@cisco.com>
Message-ID: <20070208225450.GJ6560@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH] IB/ipoib_cm: fix up issues from code review
> 
> OK, I pulled this in and fixed it to build with the netdevice
> class_device-ectomy that just went upstream, and pushed it out on my
> for-2.6.21 branch like this.

Thanks!

-- 
MST


From sashak at voltaire.com  Thu Feb  8 15:14:12 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Fri, 9 Feb 2007 01:14:12 +0200
Subject: [openib-general] [PATCH TRIVIAL] opensm: remove #ifdef __WIN__ in
 not shared file.
Message-ID: <20070208231412.GA22807@sashak.voltaire.com>


opensm/main.c is not shared by win OpenSM, and #ifdef __WIN__ is not
needed here.

Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
---
 osm/opensm/main.c |    5 -----
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/osm/opensm/main.c b/osm/opensm/main.c
index 69c940c..fa09360 100644
--- a/osm/opensm/main.c
+++ b/osm/opensm/main.c
@@ -65,10 +65,6 @@ static volatile unsigned int osm_usr1_flag = 0;
 #define GUID_ARRAY_SIZE 64
 #define INVALID_GUID (0xFFFFFFFFFFFFFFFFULL)
 
-#ifdef __WIN__
-#define block_signals()
-#define setup_signals()
-#else
 static void mark_exit_flag(int signum)
 {
 	if(!osm_exit_flag)
@@ -119,7 +115,6 @@ static void setup_signals()
 #endif
 	pthread_sigmask(SIG_SETMASK, &saved_sigset, NULL);
 }
-#endif /* __WIN__ */
 
 /**********************************************************************
  **********************************************************************/
-- 
1.5.0.rc2.g11a3


From sashak at voltaire.com  Thu Feb  8 15:16:18 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Fri, 9 Feb 2007 01:16:18 +0200
Subject: [openib-general] [PATCH TRIVIAL] osmtest: use more descriptive
	constant names
Message-ID: <20070208231618.GB22807@sashak.voltaire.com>


Use more descriptive constant names for osmtest flows.

Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
---
 osm/osmtest/include/osmtest.h |   12 ++++++++++++
 osm/osmtest/main.c            |   20 ++++++++++----------
 osm/osmtest/osmtest.c         |   22 +++++++++++++---------
 3 files changed, 35 insertions(+), 19 deletions(-)

diff --git a/osm/osmtest/include/osmtest.h b/osm/osmtest/include/osmtest.h
index 39afbaf..13131dd 100644
--- a/osm/osmtest/include/osmtest.h
+++ b/osm/osmtest/include/osmtest.h
@@ -56,6 +56,18 @@
 #include "osmtest_base.h"
 #include "osmtest_subnet.h"
 
+enum OSMT_FLOWS {
+	OSMT_FLOW_ALL = 0,
+	OSMT_FLOW_CREATE_INVENTORY,
+	OSMT_FLOW_VALIDATE_INVENTORY,
+	OSMT_FLOW_SERVICE_REGISTRATION,
+	OSMT_FLOW_EVENT_FORWARDING,
+	OSMT_FLOW_STRESS_SA,
+	OSMT_FLOW_MULTICAST,
+	OSMT_FLOW_QOS,
+	OSMT_FLOW_TRAP,
+};
+
 /****s* OpenSM: Subnet/osmtest_opt_t
  * NAME
  * osmtest_opt_t
diff --git a/osm/osmtest/main.c b/osm/osmtest/main.c
index ca5805b..5f402b7 100644
--- a/osm/osmtest/main.c
+++ b/osm/osmtest/main.c
@@ -354,7 +354,7 @@ main( int argc,
 	opt.create = FALSE;
 	opt.mmode = 1;
 	opt.ignore_path_records = FALSE; /*  Do path Records too */
-	opt.flow = 0; /*  run all validation tests */
+	opt.flow = OSMT_FLOW_ALL; /*  run all validation tests */
 	strcpy(flow_name, "All Validations");
 	strcpy( opt.file_name, "osmtest.dat" );
 
@@ -396,31 +396,31 @@ main( int argc,
 
 		  if (!strcmp("c", optarg)) {
 			 strcpy(flow_name, "Create Inventory");
-			 opt.flow = 1;
+			 opt.flow = OSMT_FLOW_CREATE_INVENTORY;
 		  } else if (!strcmp("v", optarg)) {
 			 strcpy(flow_name, "Validate Inventory");
-			 opt.flow = 2;
+			 opt.flow = OSMT_FLOW_VALIDATE_INVENTORY;
 		  } else if (!strcmp("s", optarg)) {
 			 strcpy(flow_name, "Services Registration");
-			 opt.flow = 3;
+			 opt.flow = OSMT_FLOW_SERVICE_REGISTRATION;
 		  } else if (!strcmp("e", optarg)) {
 			 strcpy(flow_name, "Event Forwarding");
-			 opt.flow = 4;
+			 opt.flow = OSMT_FLOW_EVENT_FORWARDING;
 		  } else if (!strcmp("f", optarg)) {
 			 strcpy(flow_name, "Stress SA");
-			 opt.flow = 5;
+			 opt.flow = OSMT_FLOW_STRESS_SA;
 		  } else if (!strcmp("m", optarg)) {
 			 strcpy(flow_name, "Multicast");
-			 opt.flow = 6;
+			 opt.flow = OSMT_FLOW_MULTICAST;
 		  } else if (!strcmp("q", optarg)) {
 			 strcpy(flow_name, "QoS: VLArb and SLtoVL");
-			 opt.flow = 7;
+			 opt.flow = OSMT_FLOW_QOS;
 		  } else if (!strcmp("t", optarg)) {
 			 strcpy(flow_name, "Trap 64/65");
-			 opt.flow = 8;
+			 opt.flow = OSMT_FLOW_TRAP;
 		  } else if (!strcmp("a", optarg)) {
 			 strcpy(flow_name, "All Validations");
-			 opt.flow = 0;
+			 opt.flow = OSMT_FLOW_ALL;
 		  } else {
 			 printf( "\nError: unknown flow %s\n",flow_name);
 			 exit(2);
diff --git a/osm/osmtest/osmtest.c b/osm/osmtest/osmtest.c
index 3c16a6f..ce185ec 100644
--- a/osm/osmtest/osmtest.c
+++ b/osm/osmtest/osmtest.c
@@ -7948,7 +7948,7 @@ osmtest_run( IN osmtest_t * const p_osmt )
     goto Exit;
   }
 
-  if( p_osmt->opt.flow == 1 )
+  if( p_osmt->opt.flow == OSMT_FLOW_CREATE_INVENTORY )
   {
     /*
      * Creating an inventory file with all nodes, ports and paths
@@ -7965,7 +7965,7 @@ osmtest_run( IN osmtest_t * const p_osmt )
   }
   else
   {
-    if( p_osmt->opt.flow == 5 )
+    if( p_osmt->opt.flow == OSMT_FLOW_STRESS_SA )
     {
       /*
        * Stress SA - flood the it with queries
@@ -8030,7 +8030,8 @@ osmtest_run( IN osmtest_t * const p_osmt )
       /*
        * Run normal validition tests.
        */
-       if (p_osmt->opt.flow == 0 || p_osmt->opt.flow == 2)
+       if (p_osmt->opt.flow == OSMT_FLOW_ALL ||
+           p_osmt->opt.flow == OSMT_FLOW_VALIDATE_INVENTORY)
        {
          /*
           * Only validate the given inventory file
@@ -8056,7 +8057,7 @@ osmtest_run( IN osmtest_t * const p_osmt )
          }
        }
 
-       if (p_osmt->opt.flow == 0)
+       if (p_osmt->opt.flow == OSMT_FLOW_ALL)
        {
          status = osmtest_wrong_sm_key_ignored( p_osmt );
          if( status != IB_SUCCESS )
@@ -8069,7 +8070,8 @@ osmtest_run( IN osmtest_t * const p_osmt )
          }
        }
 
-       if (p_osmt->opt.flow == 0 || p_osmt->opt.flow == 3)
+       if (p_osmt->opt.flow == OSMT_FLOW_ALL ||
+           p_osmt->opt.flow == OSMT_FLOW_SERVICE_REGISTRATION)
        {
          /*
           * run service registration, deregistration, and lease test
@@ -8085,7 +8087,8 @@ osmtest_run( IN osmtest_t * const p_osmt )
          }
        }
 
-       if (p_osmt->opt.flow == 0 || p_osmt->opt.flow == 4)
+       if (p_osmt->opt.flow == OSMT_FLOW_ALL ||
+           p_osmt->opt.flow == OSMT_FLOW_EVENT_FORWARDING)
        {
           /* 
            * Run event forwarding test
@@ -8110,7 +8113,7 @@ osmtest_run( IN osmtest_t * const p_osmt )
 #endif
         }
 
-        if (p_osmt->opt.flow == 7)
+        if (p_osmt->opt.flow == OSMT_FLOW_QOS)
         {
           /* 
            * QoS info: dump VLArb and SLtoVL tables.
@@ -8138,7 +8141,7 @@ osmtest_run( IN osmtest_t * const p_osmt )
           }
         }
 
-        if (p_osmt->opt.flow == 8)
+        if (p_osmt->opt.flow == OSMT_FLOW_TRAP)
         {
           /*
            * Run trap 64/65 flow (this flow requires running of external tool)
@@ -8162,7 +8165,8 @@ osmtest_run( IN osmtest_t * const p_osmt )
 #endif
         }
 
-        if (p_osmt->opt.flow == 0 || p_osmt->opt.flow == 6)
+        if (p_osmt->opt.flow == OSMT_FLOW_ALL ||
+            p_osmt->opt.flow == OSMT_FLOW_MULTICAST)
         {
           /*
            * Multicast flow
-- 
1.5.0.rc2.g11a3
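
With the flows named, the strcmp() chain in main.c could also be driven
from a table.  A minimal sketch, assuming the option letters and flow
names shown in the patch above; flow_entry, flow_table and lookup_flow
are hypothetical and not part of osmtest.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Enum as introduced by the patch. */
enum OSMT_FLOWS {
	OSMT_FLOW_ALL = 0,
	OSMT_FLOW_CREATE_INVENTORY,
	OSMT_FLOW_VALIDATE_INVENTORY,
	OSMT_FLOW_SERVICE_REGISTRATION,
	OSMT_FLOW_EVENT_FORWARDING,
	OSMT_FLOW_STRESS_SA,
	OSMT_FLOW_MULTICAST,
	OSMT_FLOW_QOS,
	OSMT_FLOW_TRAP,
};

/* Hypothetical table entry: option letter, display name, flow value. */
struct flow_entry {
	const char *opt;
	const char *name;
	enum OSMT_FLOWS flow;
};

static const struct flow_entry flow_table[] = {
	{ "a", "All Validations",       OSMT_FLOW_ALL },
	{ "c", "Create Inventory",      OSMT_FLOW_CREATE_INVENTORY },
	{ "v", "Validate Inventory",    OSMT_FLOW_VALIDATE_INVENTORY },
	{ "s", "Services Registration", OSMT_FLOW_SERVICE_REGISTRATION },
	{ "e", "Event Forwarding",      OSMT_FLOW_EVENT_FORWARDING },
	{ "f", "Stress SA",             OSMT_FLOW_STRESS_SA },
	{ "m", "Multicast",             OSMT_FLOW_MULTICAST },
	{ "q", "QoS: VLArb and SLtoVL", OSMT_FLOW_QOS },
	{ "t", "Trap 64/65",            OSMT_FLOW_TRAP },
};

/* Returns the matching entry, or NULL for an unknown flow letter. */
static const struct flow_entry *lookup_flow(const char *arg)
{
	size_t i;

	assert(arg != NULL);
	for (i = 0; i < sizeof flow_table / sizeof flow_table[0]; ++i)
		if (!strcmp(flow_table[i].opt, arg))
			return &flow_table[i];
	return NULL;
}
```

A caller would copy entry->name into flow_name and entry->flow into
opt.flow, and report the unknown-flow error when NULL comes back.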


From swise at opengridcomputing.com  Thu Feb  8 15:11:21 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 08 Feb 2007 17:11:21 -0600
Subject: [openib-general] dapl broken for iWARP
In-Reply-To: <1170885460.31481.0.camel@stevo-desktop>
References: <1170878543.30334.52.camel@stevo-desktop>
	<1170885460.31481.0.camel@stevo-desktop>
Message-ID: <1170976281.3049.122.camel@stevo-desktop>

On Wed, 2007-02-07 at 15:57 -0600, Steve Wise wrote:
> On Wed, 2007-02-07 at 14:02 -0600, Steve Wise wrote:
> > Arlin,
> > 
> > The OFED dapl code is assuming the responder_resources and
> > initiator_depth passed up on a connection request event are from the
> > remote peer.  This doesn't happen for iWARP.  In the current iWARP
> > specifications, it's up to the application to exchange this information
> > somehow. So these are defaulting to 0 on the server side of any dapl
> > connection over iWARP.  
> > 
> > This is a fairly recent change, I think.  We need to come up with some
> > way to deal with this for OFED 1.2 IMO.
> > 
> 
> The IWCM could set these to the device max values for instance.
> 
> Steve.
> 

There is a slight problem with all this.  There are no device attributes
currently for ORD and IRD.  The ammasso driver maps these to
max_qp_rd_atom (IRD) and max_qp_init_rd_atom (ORD).  But this is screwy.
We need new attributes for these.

For OFED 1.2, I think I should just have the IWCM set them to 8.  The
only RNIC in ofed is cxgb3 and it supports 8...


Steve.
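
A minimal sketch of the proposed workaround, assuming a value of 0
means the peer supplied nothing (which is what the server side sees
over iWARP today).  The struct and function names are hypothetical
stand-ins, not the IWCM's actual types; only the value 8 comes from
the discussion above.

```c
#include <assert.h>
#include <stddef.h>

/* Value proposed above for OFED 1.2; the macro name is hypothetical. */
#define IWCM_DEFAULT_RD 8

/* Hypothetical stand-in for the parameters of a connect-request event. */
struct conn_req_params {
	unsigned char responder_resources;  /* IRD */
	unsigned char initiator_depth;      /* ORD */
};

/* Fill in defaults for any value the transport did not deliver. */
static void fill_rd_defaults(struct conn_req_params *p)
{
	assert(p != NULL);
	if (p->responder_resources == 0)
		p->responder_resources = IWCM_DEFAULT_RD;
	if (p->initiator_depth == 0)
		p->initiator_depth = IWCM_DEFAULT_RD;
}
```

Values that are already non-zero are left alone, so a transport that
does exchange them would not be overridden.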


From mshefty at ichips.intel.com  Thu Feb  8 15:43:24 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 08 Feb 2007 15:43:24 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <45CB6A8F.2030705@ichips.intel.com>
References: <20070126000319.GA12386@obsidianresearch.com>
	<ada4pqel66k.fsf@cisco.com>
	<20070126180840.GD12386@obsidianresearch.com>
	<45CA2084.7090503@ichips.intel.com>
	<20070207191154.GC11411@obsidianresearch.com>
	<45CA3568.1000508@ichips.intel.com>
	<20070207213108.GD11411@obsidianresearch.com>
	<45CA5573.80802@ichips.intel.com>
	<20070207224928.GF11411@obsidianresearch.com>
	<1170894459.31538.23768.camel@hal.voltaire.com>
	<45CB6A8F.2030705@ichips.intel.com>
Message-ID: <45CBB59C.4010709@ichips.intel.com>

> Looking at the problem more, I think that the issue extends to the remote port 
> LID as well.  My expectation with a local path record query is that the SLID is 
> the local port, and the DLID is the local router.  This should be sufficient for 
> one-way UD traffic, but for connected traffic we still need to discover the 
> remote router and remote port LIDs.

Given a path record query for:

SGID - local
DGID - remote

What would be the SLID and DLID?

And if the query is reversed, such that:

SGID - remote
DGID - local

Are the SLID/DLID values simply reversed?

What if the DGID in the second case were a multicast GID?  What does the SLID 
become in this case?

- Sean


From halr at voltaire.com  Thu Feb  8 15:18:03 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 08 Feb 2007 18:18:03 -0500
Subject: [openib-general] Problem is routing CM REQ was: Use a GRH when
 appropriate for unicast packets
In-Reply-To: <45CB9DDA.8020303@ichips.intel.com>
References: <000201c74bba$f72b7890$e598070a@amr.corp.intel.com>
	<1170967182.31538.96962.camel@hal.voltaire.com>
	<45CB9DDA.8020303@ichips.intel.com>
Message-ID: <1170976680.31538.106389.camel@hal.voltaire.com>

On Thu, 2007-02-08 at 17:02, Sean Hefty wrote:
> >>This requires that the passive side be able to issue path record queries, but I
> >>think that it could work for static routes.  A point was made to me that the
> >>remote side could be a TCA without query capabilities.
> > 
> > Are you referring to SA query capabilities ? Would such a device just be
> > expected to work without change in an IB routed environment anyway ?
> 
> Yes I was referring to SA query capability, such as a path record query.  Since 
> the spec requires that the path information be provided by the active side, I 
> think that such a device could work without change.  (But it does mean that the 
> active side has to provide some way to obtain the necessary information to put 
> into a CM REQ, plus know what the remote router will do.)

It also means it needs to be able to put a GRH in on the sending side.
Not sure that is "free" in implementations as you have been noting for
OpenIB recently.

-- Hal

> - Sean


From rdreier at cisco.com  Thu Feb  8 15:50:42 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 08 Feb 2007 15:50:42 -0800
Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in
 cm_conn_req_handler()
In-Reply-To: <1170860483.11491.21.camel@trinity.ogc.int> (Tom Tucker's
	message of "Wed, 07 Feb 2007 09:01:23 -0600")
References: <20070207065650.24166.6979.sendpatchset@localhost.localdomain>
	<1170858272.14381.1.camel@stevo-desktop>
	<1170860483.11491.21.camel@trinity.ogc.int>
Message-ID: <adazm7onr8t.fsf@cisco.com>

Hmm, Steve likes it, Tom doesn't.  Can you guys arm wrestle or
something and tell me if this patch is correct or not?

 - R.


From rdreier at cisco.com  Thu Feb  8 15:56:28 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 08 Feb 2007 15:56:28 -0800
Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast
 support
In-Reply-To: <45CB3537.8060508@voltaire.com> (Or Gerlitz's message of
	"Thu, 08 Feb 2007 16:35:35 +0200")
References: <000701c741a6$16dc4760$ff0da8c0@amr.corp.intel.com>
	<45BF8E17.2010805@ichips.intel.com> <45C37BE9.5040105@ichips.intel.com>
	<45C85B39.4080700@voltaire.com> <45CB3537.8060508@voltaire.com>
Message-ID: <adaodo4nqz7.fsf@cisco.com>

I merged the "increment port number" and "remove redundant '_wq'"
patches from git.openfabrics.org/~shefty/scm/rdma-dev.git for-roland

I plan to review the multicast stuff next week and I hope to merge it
for 2.6.21.  Or, have you or anyone else at Voltaire read over the
code in addition to using it?  Do you see anything that should be
cleaned up?

 - R.


From rdreier at cisco.com  Thu Feb  8 16:26:25 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 08 Feb 2007 16:26:25 -0800
Subject: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes
In-Reply-To: <20070208202634.4382.15287.stgit@dell3.ogc.int> (Steve
	Wise's message of "Thu, 08 Feb 2007 14:26:34 -0600")
References: <20070208202634.4382.15287.stgit@dell3.ogc.int>
Message-ID: <ada64acnpla.fsf@cisco.com>

OK, I've pulled the cxgb3 stuff into a single commit in my for-2.6.21
branch.  I took the liberty of cleaning up some sparse warnings, etc.
There's still a few other obvious things to fix up:

    drivers/infiniband/hw/cxgb3/iwch_ev.c:102:6: warning: symbol 'iwch_ev_dispatch' was not declared. Should it be static?

  Rather than putting an extern in iwch.c, please put a proper
  declaration in an appropriate header file included from iwch.c.
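
  A minimal sketch of that layout, collapsed into one file for
  brevity.  The names ev.h, ev_msg, ev_dispatch and deliver are
  hypothetical and are not the cxgb3 driver's; the point is only that
  the definer and the caller share one declaration.

```c
#include <assert.h>
#include <stddef.h>

/* ---- ev.h (hypothetical shared header) ---- */
#ifndef EV_H
#define EV_H
struct ev_msg {
	int code;
	int handled;
};
/* Single declaration seen by both the definer and the callers. */
void ev_dispatch(struct ev_msg *msg);
#endif /* EV_H */

/* ---- ev.c: would #include "ev.h", so the definition below is
 * checked against the declaration and the warning goes away ---- */
void ev_dispatch(struct ev_msg *msg)
{
	assert(msg != NULL);
	msg->handled = 1;
}

/* ---- caller.c: would #include "ev.h" instead of carrying its
 * own "extern void ev_dispatch(...)" line ---- */
static int deliver(int code)
{
	struct ev_msg msg = { code, 0 };

	ev_dispatch(&msg);
	return msg.handled;
}
```

  If the prototype and the definition ever drift apart, the compiler
  then catches it in ev.c rather than leaving a silent mismatch.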

Also I agree with MST, I would like to see the core/ subdirectory die
completely.

 - R.


From swise at opengridcomputing.com  Thu Feb  8 16:39:10 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 08 Feb 2007 18:39:10 -0600
Subject: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes
In-Reply-To: <ada64acnpla.fsf@cisco.com>
References: <20070208202634.4382.15287.stgit@dell3.ogc.int>
	<ada64acnpla.fsf@cisco.com>
Message-ID: <1170981550.3049.130.camel@stevo-desktop>

On Thu, 2007-02-08 at 16:26 -0800, Roland Dreier wrote:
> OK, I've pulled the cxgb3 stuff into a single commit in my for-2.6.21
> branch.  I took the liberty of cleaning up some sparse warnings, etc.
> There's still a few other obvious things to fix up:
> 
>     drivers/infiniband/hw/cxgb3/iwch_ev.c:102:6: warning: symbol 'iwch_ev_dispatch' was not declared. Should it be static?
> 
>   Rather than putting an extern in iwch.c, please put a proper
>   declaration in an appropriate header file included from iwch.c.
> 

ok.

> Also I agree with MST, I would like to see the core/ subdirectory die
> completely.
> 

ok ok...I'll kill the subdir...


From rdreier at cisco.com  Thu Feb  8 16:40:09 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 08 Feb 2007 16:40:09 -0800
Subject: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes
In-Reply-To: <ada64acnpla.fsf@cisco.com> (Roland Dreier's message of
	"Thu, 08 Feb 2007 16:26:25 -0800")
References: <20070208202634.4382.15287.stgit@dell3.ogc.int>
	<ada64acnpla.fsf@cisco.com>
Message-ID: <adatzxwmady.fsf@cisco.com>

Oh yeah -- Steve, please keep sending cleanup patches based on my tree
now.  I'm planning on asking Linus to merge what's in for-2.6.21 in
the next couple of days, but there's still more than a week before the
merge window closes, and even after the merge window closes I'll still
accept fixes/cleanups for stuff already upstream.

And here's what I have pending in for-2.6.21 so far:

Ahmed S. Darwish (1):
      IB/core: Use ARRAY_SIZE macro for mandatory_table

Akinobu Mita (1):
      IB/ehca: Fix memleak on module unloading

David Howells (1):
      IB/mthca: Work around gcc bug on sparc64

Michael S. Tsirkin (1):
      IPoIB: Connected mode experimental support

Roland Dreier (1):
      IB/mthca: Use correct structure size in call to memset()

Sean Hefty (2):
      RDMA/cma: Increment port number after close to avoid re-use
      IB: Remove redundant "_wq" from workqueue names

Steve Wise (1):
      RDMA/cxgb3: Add driver for Chelsio T3 Rnic


From swise at opengridcomputing.com  Thu Feb  8 16:40:53 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 08 Feb 2007 18:40:53 -0600
Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport
 -IWCM workaroundfor ip_dev_find() bug.
In-Reply-To: <1170805153.19662.155.camel@stevo-desktop>
References: <1170799320.19662.124.camel@stevo-desktop>
	<20070206221232.GO24372@mellanox.co.il>
	<1170805153.19662.155.camel@stevo-desktop>
Message-ID: <1170981653.3049.133.camel@stevo-desktop>

Michael, 

From your email, it sounded like you would regression test this.  Is it
ready to pull in?  

Thanks!

Steve.


On Tue, 2007-02-06 at 17:39 -0600, Steve Wise wrote:
> Here it is (only tested with rping over iWARP on sles9sp3):
> 
> ----------------
> 
> 
> xxx_ip_dev_find() must use scope HOST.
> 
> From: Steve Wise <swise at opengridcomputing.com>
> 
> Function xxx_ip_dev_find(RT_SCOPE_LINK) returns the wrong interface on
> some kernels.  The correct scope is RT_SCOPE_HOST.
> 
> Signed-off-by: Steve Wise <swise at opengridcomputing.com>
> ---
> 
>  .../backport/2.6.11/include/linux/inetdevice.h     |    2 +-
>  .../backport/2.6.11_FC4/include/linux/inetdevice.h |    2 +-
>  .../backport/2.6.12/include/linux/inetdevice.h     |    2 +-
>  .../backport/2.6.13/include/linux/inetdevice.h     |    2 +-
>  .../2.6.13_suse10_0_u/include/linux/inetdevice.h   |    2 +-
>  .../backport/2.6.14/include/linux/inetdevice.h     |    2 +-
>  .../backport/2.6.15/include/linux/inetdevice.h     |    2 +-
>  .../2.6.15_ubuntu606/include/linux/inetdevice.h    |    2 +-
>  .../backport/2.6.16/include/linux/inetdevice.h     |    2 +-
>  .../backport/2.6.17/include/linux/inetdevice.h     |    2 +-
>  .../2.6.5_sles9_sp3/include/linux/inetdevice.h     |    2 +-
>  .../backport/2.6.9_U2/include/linux/inetdevice.h   |    2 +-
>  .../backport/2.6.9_U3/include/linux/inetdevice.h   |    2 +-
>  .../backport/2.6.9_U4/include/linux/inetdevice.h   |    2 +-
>  14 files changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/kernel_addons/backport/2.6.11/include/linux/inetdevice.h b/kernel_addons/backport/2.6.11/include/linux/inetdevice.h
> index 7244487..2d3c50f 100644
> --- a/kernel_addons/backport/2.6.11/include/linux/inetdevice.h
> +++ b/kernel_addons/backport/2.6.11/include/linux/inetdevice.h
> @@ -13,7 +13,7 @@ static inline struct net_device *xxx_ip_
>  
>  	read_lock(&dev_base_lock);
>  	for (dev = dev_base; dev; dev = dev->next) {
> -		ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
> +		ip = inet_select_addr(dev, 0, RT_SCOPE_HOST);
>  		if (ip == addr) {
>  			dev_hold(dev);
>  			break;
> diff --git a/kernel_addons/backport/2.6.11_FC4/include/linux/inetdevice.h b/kernel_addons/backport/2.6.11_FC4/include/linux/inetdevice.h
> index 7244487..2d3c50f 100644
> --- a/kernel_addons/backport/2.6.11_FC4/include/linux/inetdevice.h
> +++ b/kernel_addons/backport/2.6.11_FC4/include/linux/inetdevice.h
> @@ -13,7 +13,7 @@ static inline struct net_device *xxx_ip_
>  
>  	read_lock(&dev_base_lock);
>  	for (dev = dev_base; dev; dev = dev->next) {
> -		ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
> +		ip = inet_select_addr(dev, 0, RT_SCOPE_HOST);
>  		if (ip == addr) {
>  			dev_hold(dev);
>  			break;
> diff --git a/kernel_addons/backport/2.6.12/include/linux/inetdevice.h b/kernel_addons/backport/2.6.12/include/linux/inetdevice.h
> index 7244487..2d3c50f 100644
> --- a/kernel_addons/backport/2.6.12/include/linux/inetdevice.h
> +++ b/kernel_addons/backport/2.6.12/include/linux/inetdevice.h
> @@ -13,7 +13,7 @@ static inline struct net_device *xxx_ip_
>  
>  	read_lock(&dev_base_lock);
>  	for (dev = dev_base; dev; dev = dev->next) {
> -		ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
> +		ip = inet_select_addr(dev, 0, RT_SCOPE_HOST);
>  		if (ip == addr) {
>  			dev_hold(dev);
>  			break;
> diff --git a/kernel_addons/backport/2.6.13/include/linux/inetdevice.h b/kernel_addons/backport/2.6.13/include/linux/inetdevice.h
> index 7a32313..fd0aa36 100644
> --- a/kernel_addons/backport/2.6.13/include/linux/inetdevice.h
> +++ b/kernel_addons/backport/2.6.13/include/linux/inetdevice.h
> @@ -11,7 +11,7 @@ static inline struct net_device *xxx_ip_
>  
>  	read_lock(&dev_base_lock);
>  	for (dev = dev_base; dev; dev = dev->next) {
> -		ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
> +		ip = inet_select_addr(dev, 0, RT_SCOPE_HOST);
>  		if (ip == addr) {
>  			dev_hold(dev);
>  			break;
> diff --git a/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/inetdevice.h b/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/inetdevice.h
> index 7a32313..fd0aa36 100644
> --- a/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/inetdevice.h
> +++ b/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/inetdevice.h
> @@ -11,7 +11,7 @@ static inline struct net_device *xxx_ip_
>  
>  	read_lock(&dev_base_lock);
>  	for (dev = dev_base; dev; dev = dev->next) {
> -		ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
> +		ip = inet_select_addr(dev, 0, RT_SCOPE_HOST);
>  		if (ip == addr) {
>  			dev_hold(dev);
>  			break;
> diff --git a/kernel_addons/backport/2.6.14/include/linux/inetdevice.h b/kernel_addons/backport/2.6.14/include/linux/inetdevice.h
> index 7a32313..fd0aa36 100644
> --- a/kernel_addons/backport/2.6.14/include/linux/inetdevice.h
> +++ b/kernel_addons/backport/2.6.14/include/linux/inetdevice.h
> @@ -11,7 +11,7 @@ static inline struct net_device *xxx_ip_
>  
>  	read_lock(&dev_base_lock);
>  	for (dev = dev_base; dev; dev = dev->next) {
> -		ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
> +		ip = inet_select_addr(dev, 0, RT_SCOPE_HOST);
>  		if (ip == addr) {
>  			dev_hold(dev);
>  			break;
> diff --git a/kernel_addons/backport/2.6.15/include/linux/inetdevice.h b/kernel_addons/backport/2.6.15/include/linux/inetdevice.h
> index 7a32313..fd0aa36 100644
> --- a/kernel_addons/backport/2.6.15/include/linux/inetdevice.h
> +++ b/kernel_addons/backport/2.6.15/include/linux/inetdevice.h
> @@ -11,7 +11,7 @@ static inline struct net_device *xxx_ip_
>  
>  	read_lock(&dev_base_lock);
>  	for (dev = dev_base; dev; dev = dev->next) {
> -		ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
> +		ip = inet_select_addr(dev, 0, RT_SCOPE_HOST);
>  		if (ip == addr) {
>  			dev_hold(dev);
>  			break;
> diff --git a/kernel_addons/backport/2.6.15_ubuntu606/include/linux/inetdevice.h b/kernel_addons/backport/2.6.15_ubuntu606/include/linux/inetdevice.h
> index 7a32313..fd0aa36 100644
> --- a/kernel_addons/backport/2.6.15_ubuntu606/include/linux/inetdevice.h
> +++ b/kernel_addons/backport/2.6.15_ubuntu606/include/linux/inetdevice.h
> @@ -11,7 +11,7 @@ static inline struct net_device *xxx_ip_
>  
>  	read_lock(&dev_base_lock);
>  	for (dev = dev_base; dev; dev = dev->next) {
> -		ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
> +		ip = inet_select_addr(dev, 0, RT_SCOPE_HOST);
>  		if (ip == addr) {
>  			dev_hold(dev);
>  			break;
> diff --git a/kernel_addons/backport/2.6.16/include/linux/inetdevice.h b/kernel_addons/backport/2.6.16/include/linux/inetdevice.h
> index 7a32313..fd0aa36 100644
> --- a/kernel_addons/backport/2.6.16/include/linux/inetdevice.h
> +++ b/kernel_addons/backport/2.6.16/include/linux/inetdevice.h
> @@ -11,7 +11,7 @@ static inline struct net_device *xxx_ip_
>  
>  	read_lock(&dev_base_lock);
>  	for (dev = dev_base; dev; dev = dev->next) {
> -		ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
> +		ip = inet_select_addr(dev, 0, RT_SCOPE_HOST);
>  		if (ip == addr) {
>  			dev_hold(dev);
>  			break;
> diff --git a/kernel_addons/backport/2.6.17/include/linux/inetdevice.h b/kernel_addons/backport/2.6.17/include/linux/inetdevice.h
> index 7a32313..fd0aa36 100644
> --- a/kernel_addons/backport/2.6.17/include/linux/inetdevice.h
> +++ b/kernel_addons/backport/2.6.17/include/linux/inetdevice.h
> @@ -11,7 +11,7 @@ static inline struct net_device *xxx_ip_
>  
>  	read_lock(&dev_base_lock);
>  	for (dev = dev_base; dev; dev = dev->next) {
> -		ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
> +		ip = inet_select_addr(dev, 0, RT_SCOPE_HOST);
>  		if (ip == addr) {
>  			dev_hold(dev);
>  			break;
> diff --git a/kernel_addons/backport/2.6.5_sles9_sp3/include/linux/inetdevice.h b/kernel_addons/backport/2.6.5_sles9_sp3/include/linux/inetdevice.h
> index 7244487..2d3c50f 100644
> --- a/kernel_addons/backport/2.6.5_sles9_sp3/include/linux/inetdevice.h
> +++ b/kernel_addons/backport/2.6.5_sles9_sp3/include/linux/inetdevice.h
> @@ -13,7 +13,7 @@ static inline struct net_device *xxx_ip_
>  
>  	read_lock(&dev_base_lock);
>  	for (dev = dev_base; dev; dev = dev->next) {
> -		ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
> +		ip = inet_select_addr(dev, 0, RT_SCOPE_HOST);
>  		if (ip == addr) {
>  			dev_hold(dev);
>  			break;
> diff --git a/kernel_addons/backport/2.6.9_U2/include/linux/inetdevice.h b/kernel_addons/backport/2.6.9_U2/include/linux/inetdevice.h
> index 7244487..2d3c50f 100644
> --- a/kernel_addons/backport/2.6.9_U2/include/linux/inetdevice.h
> +++ b/kernel_addons/backport/2.6.9_U2/include/linux/inetdevice.h
> @@ -13,7 +13,7 @@ static inline struct net_device *xxx_ip_
>  
>  	read_lock(&dev_base_lock);
>  	for (dev = dev_base; dev; dev = dev->next) {
> -		ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
> +		ip = inet_select_addr(dev, 0, RT_SCOPE_HOST);
>  		if (ip == addr) {
>  			dev_hold(dev);
>  			break;
> diff --git a/kernel_addons/backport/2.6.9_U3/include/linux/inetdevice.h b/kernel_addons/backport/2.6.9_U3/include/linux/inetdevice.h
> index 7244487..2d3c50f 100644
> --- a/kernel_addons/backport/2.6.9_U3/include/linux/inetdevice.h
> +++ b/kernel_addons/backport/2.6.9_U3/include/linux/inetdevice.h
> @@ -13,7 +13,7 @@ static inline struct net_device *xxx_ip_
>  
>  	read_lock(&dev_base_lock);
>  	for (dev = dev_base; dev; dev = dev->next) {
> -		ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
> +		ip = inet_select_addr(dev, 0, RT_SCOPE_HOST);
>  		if (ip == addr) {
>  			dev_hold(dev);
>  			break;
> diff --git a/kernel_addons/backport/2.6.9_U4/include/linux/inetdevice.h b/kernel_addons/backport/2.6.9_U4/include/linux/inetdevice.h
> index 7244487..2d3c50f 100644
> --- a/kernel_addons/backport/2.6.9_U4/include/linux/inetdevice.h
> +++ b/kernel_addons/backport/2.6.9_U4/include/linux/inetdevice.h
> @@ -13,7 +13,7 @@ static inline struct net_device *xxx_ip_
>  
>  	read_lock(&dev_base_lock);
>  	for (dev = dev_base; dev; dev = dev->next) {
> -		ip = inet_select_addr(dev, 0, RT_SCOPE_LINK);
> +		ip = inet_select_addr(dev, 0, RT_SCOPE_HOST);
>  		if (ip == addr) {
>  			dev_hold(dev);
>  			break;
> 
> 
> 


From michael.arndt at informatik.tu-chemnitz.de  Thu Feb  8 16:39:06 2007
From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt)
Date: Fri, 9 Feb 2007 01:39:06 +0100
Subject: [openib-general] Unknown SMP Recv
References: <000901c74938$e10b2a30$21606d86@one7>
	<1170689654.4525.201415.camel@hal.voltaire.com>
	<001401c74946$a664a2e0$21606d86@one7>
	<1170695591.4525.207604.camel@hal.voltaire.com>
	<002001c74a33$c2ec1db0$21606d86@one7>
	<1170807564.4525.324195.camel@hal.voltaire.com>
Message-ID: <001e01c74be2$b4889310$21606d86@one7>

Hi,

I think I have found the problem. It is the timeout parameter of the
umad_send function. How exactly should I handle this parameter? It seems
that it should be zero if no response is expected. But what value
should it be if a response is expected? In a test I used zero for
SubnGetResp packets, because no further packets should follow, and 100 for
SubnGet or SubnSet. But when the router is stressed, the umad_send function
fails and returns error -5 on every thirtieth packet. Any ideas or advice?

Thanks Michael 


From bsharp at NetEffect.com  Thu Feb  8 17:19:46 2007
From: bsharp at NetEffect.com (Bob Sharp)
Date: Thu, 8 Feb 2007 19:19:46 -0600
Subject: [openib-general] dapl broken for iWARP
Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC06AD1A81@venom2>

> For OFED 1.2, I think I should just have the IWCM set them to 8.  The
> only RNIC in ofed is cxgb3 and it supports 8...
> 
Steve,

If we can't create the new attributes for RNICs, it seems like it would be
better to agree on the mapping of IRD/ORD to IB parameters than it would
be to limit these parameters to 8.  That number seems a bit low.

Bob


From swise at opengridcomputing.com  Thu Feb  8 17:41:22 2007
From: swise at opengridcomputing.com (Steve WIse)
Date: Thu, 08 Feb 2007 19:41:22 -0600
Subject: [openib-general] dapl broken for iWARP
In-Reply-To: <5E701717F2B2ED4EA60F87C8AA57B7CC06AD1A81@venom2>
References: <5E701717F2B2ED4EA60F87C8AA57B7CC06AD1A81@venom2>
Message-ID: <1170985282.25474.2.camel@linux-q667.site>

On Thu, 2007-02-08 at 19:19 -0600, Bob Sharp wrote:
> > For OFED 1.2, I think I should just have the IWCM set them to 8.  The
> > only RNIC in ofed is cxgb3 and it supports 8...
> > 
> Steve,
> 
> If we can't create the new attributes for RNICs, it seems like it would be
> better to agree on the mapping of IRD/ORD to IB parameters than it would
> be to limit these parameters to 8.  That number seems a bit low.
> 

Hey Bob,

This is for the OFED 1.2 release only, and it's too late to be adding new
features, methinks, since we're at feature freeze.  For the upstream
kernel (i.e. 2.6.21) we can define the attributes.


> Bob


From bsharp at NetEffect.com  Thu Feb  8 17:51:34 2007
From: bsharp at NetEffect.com (Bob Sharp)
Date: Thu, 8 Feb 2007 19:51:34 -0600
Subject: [openib-general] dapl broken for iWARP
References: <5E701717F2B2ED4EA60F87C8AA57B7CC06AD1A81@venom2>
	<1170985282.25474.2.camel@linux-q667.site>
Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC01E5DDD5@venom2>

> > > For OFED 1.2, I think I should just have the IWCM set them to 8.  The
> > > only RNIC in ofed is cxgb3 and it supports 8...
> > > 
> > Steve,
> > 
> > If we can't create the new attributes for RNICs, it seems like it would be
> > better to agree on the mapping of IRD/ORD to IB parameters than it would
> > be to limit these parameters to 8.  That number seems a bit low.
> > 
> 
> Hey Bob,
> 
> This is for the OFED 1.2 release only and its too late to be adding new
> features methinks since we're at feature freeze.  For the upstream
> kernel (ie 2.6.21) we can define the attributes.
> 

I figured as much.  So let's just go with your Ammasso mapping of IRD/ORD to
the IB parameters that RNICs don't use for now.


From krkumar2 at in.ibm.com  Thu Feb  8 19:29:20 2007
From: krkumar2 at in.ibm.com (Krishna Kumar2)
Date: Fri, 9 Feb 2007 08:59:20 +0530
Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in
 cm_conn_req_handler()
In-Reply-To: <adazm7onr8t.fsf@cisco.com>
Message-ID: <OFB574507E.6CAEE34E-ON6525727D.0013194D-6525727D.00132A8C@in.ibm.com>

Roland,

Yes, we will do some "arm wrestling" today :)

thanks,

 KK

Roland Dreier <rdreier at cisco.com> wrote on 02/09/2007 05:20:42 AM:

> Hmm, Steve likes it, Tom doesn't.  Can you guys arm wrestle or
> something and tell me if this patch is correct or not?
>
>  - R.


From halr at voltaire.com  Thu Feb  8 20:15:36 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 08 Feb 2007 23:15:36 -0500
Subject: [openib-general] Unknown SMP Recv
In-Reply-To: <001e01c74be2$b4889310$21606d86@one7>
References: <000901c74938$e10b2a30$21606d86@one7>
	<1170689654.4525.201415.camel@hal.voltaire.com>
	<001401c74946$a664a2e0$21606d86@one7>
	<1170695591.4525.207604.camel@hal.voltaire.com>
	<002001c74a33$c2ec1db0$21606d86@one7>
	<1170807564.4525.324195.camel@hal.voltaire.com>
	<001e01c74be2$b4889310$21606d86@one7>
Message-ID: <1170994529.31538.124584.camel@hal.voltaire.com>

On Thu, 2007-02-08 at 19:39, Michael Arndt wrote:
> Hi,
> 
> I think I have found the problem. It is the timeout parameter of the
> umad_send function. How exactly should I handle this parameter? It seems
> that it should be zero if no response is expected. But what value
> should it be if a response is expected? In a test I used zero for
> SubnGetResp packets, because no further packets should follow, and 100 for
> SubnGet or SubnSet. But when the router is stressed, the umad_send function
> fails and returns error -5 on every thirtieth packet. Any ideas or advice?

umad_send takes the timeout in msec. 100 msec is too short. Try
something on the order of seconds. Note also that a negative 'timeout_ms'
value makes the kernel wait for the reply forever.
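
As a rough illustration of the rule above (a sketch only, not code from
libibumad): the helper below picks the value to pass as the timeout_ms
argument of umad_send. The names mad_timeout_ms, MAD_TIMEOUT_REQUEST_MS and
MAD_TIMEOUT_FOREVER_MS, and the 2000 msec figure, are assumptions made for
this example; using zero for sends that expect no reply follows Michael's
observation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values for this sketch; not defined by libibumad. */
enum {
	MAD_TIMEOUT_REQUEST_MS = 2000,	/* "on the order of seconds" */
	MAD_TIMEOUT_FOREVER_MS = -1	/* negative: wait for the reply forever */
};

/*
 * Choose the timeout_ms argument for one send.
 * expects_response is false for replies such as SubnGetResp and
 * true for requests such as SubnGet or SubnSet.
 */
int mad_timeout_ms(bool expects_response, int request_timeout_ms)
{
	if (!expects_response)
		return 0;	/* nothing will come back, so do not wait */

	/* A zero timeout on a request would read as "no reply expected". */
	assert(request_timeout_ms != 0);
	return request_timeout_ms;
}
```

With this, a SubnGetResp would be sent with mad_timeout_ms(false, 0), and a
SubnGet or SubnSet with mad_timeout_ms(true, MAD_TIMEOUT_REQUEST_MS).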

-- Hal

> Thanks Michael 


From rdreier at cisco.com  Thu Feb  8 20:23:18 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 08 Feb 2007 20:23:18 -0800
Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in
 cm_conn_req_handler()
In-Reply-To: <OFB574507E.6CAEE34E-ON6525727D.0013194D-6525727D.00132A8C@in.ibm.com>
	(Krishna Kumar2's message of "Fri, 9 Feb 2007 08:59:20 +0530")
References: <OFB574507E.6CAEE34E-ON6525727D.0013194D-6525727D.00132A8C@in.ibm.com>
Message-ID: <ada7iusm021.fsf@cisco.com>

BTW, while looking at iwcm.c, I noticed the following highly dubious
code for the first time:

	static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv)
	{
		int ret = 0;
	
		BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
		if (atomic_dec_and_test(&cm_id_priv->refcount)) {
			BUG_ON(!list_empty(&cm_id_priv->work_list));
			if (waitqueue_active(&cm_id_priv->destroy_comp.wait)) {
				BUG_ON(cm_id_priv->state != IW_CM_STATE_DESTROYING);
				BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY,
						&cm_id_priv->flags));
				ret = 1;
			}
			complete(&cm_id_priv->destroy_comp);
		}
	
		return ret;
	}

The test of waitqueue_active on destroy_comp.wait looks really bad for
two reasons: first, it is relying on an internal implementation detail
of struct completion that really shouldn't be used by generic code.
And second, it seems to me that this doesn't even work right, since
there is a race something like the following:

iw_destroy_cm_id():
destroy_cm_id(cm_id); // still 1 ref left

				cm_work_handler():
					if (iwcm_deref_id()) // drop last ref
						return;
					// no one waiting yet, doesn't
					// return, but destroy_comp is
					// signaled

wait_for_completion(&cm_id_priv->destroy_comp);
// destroy_comp is signaled, proceed
kfree(cm_id_priv);

					// continue using cm_id_priv
					// OOPS

I don't understand this code well enough for the fix to be obvious.

 - R.


From jgunthorpe at obsidianresearch.com  Thu Feb  8 20:37:27 2007
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Thu, 8 Feb 2007 21:37:27 -0700
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <45CBB59C.4010709@ichips.intel.com>
References: <20070126180840.GD12386@obsidianresearch.com>
	<45CA2084.7090503@ichips.intel.com>
	<20070207191154.GC11411@obsidianresearch.com>
	<45CA3568.1000508@ichips.intel.com>
	<20070207213108.GD11411@obsidianresearch.com>
	<45CA5573.80802@ichips.intel.com>
	<20070207224928.GF11411@obsidianresearch.com>
	<1170894459.31538.23768.camel@hal.voltaire.com>
	<45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com>
Message-ID: <20070209043727.GN11411@obsidianresearch.com>

On Thu, Feb 08, 2007 at 03:43:24PM -0800, Sean Hefty wrote:
> > Looking at the problem more, I think that the issue extends to the remote port 
> > LID as well.  My expectation with a local path record query is that the SLID is 
> > the local port, and the DLID is the local router.  This should be sufficient for 
> > one-way UD traffic, but for connected traffic we still need to discover the 
> > remote router and remote port LIDs.
> 
> Given a path record query for:
> 
> SGID - local
> DGID - remote
> 
> What would be the SLID and DLID?
> 
> And if the query is reversed, such that:
> 
> SGID - remote
> DGID - local
> 
> Are the SLID/DLID values simply reversed?

I have a follow-up question to this. With CM, how is the SL for each
side determined? I'm looking through the code here and it looks like
the SL of the active side is passed in the REQ to the passive side (i.e.
both sides are the same). But cma_query_ib_route does not set the
reversible bit when it asks for the path. If you don't set the
reversible bit, isn't it necessary to make a 2nd path query to get the
reverse path's SL? [Path responses without the reversible bit set
are actually simplex paths, and reversing them will probably run into
SL2VL mapping tables that cause the packets to be dropped, i.e. o7-8]

In fact, to get an optimal path, aren't 3 path records required:
1) A reversible path from active to passive from the CM GMPs
  (required by C12-5.1.3)
2) An optimal non-reversible path from active to passive
3) An optimal non-reversible path from passive to active

Jason


From krkumar2 at in.ibm.com  Thu Feb  8 21:01:23 2007
From: krkumar2 at in.ibm.com (Krishna Kumar2)
Date: Fri, 9 Feb 2007 10:31:23 +0530
Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in
 cm_conn_req_handler()
In-Reply-To: <ada7iusm021.fsf@cisco.com>
Message-ID: <OF7A2A4E2F.CE6B521A-ON6525727D.0019D888-6525727D.001B97B5@in.ibm.com>

Regarding the race - can this and the other problem (of
using an internal data structure) both be taken care of by
changing iwcm_deref_id to return 1 if atomic_dec_and_test
finds the last reference? Then the waitqueue_active()
code can be removed; just do the completion (reaching
here implies that someone is in the middle of
iw_destroy_cm_id).

The question is: what is the issue if we return 1 even when
no one is waiting in iw_destroy_cm_id(), which results
in cm_work_handler() returning early?
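
A userspace sketch of this idea, using C11 atomics in place of the kernel's
atomic_t and a plain flag in place of struct completion (both are stand-ins,
and the names id_priv, id_init and deref_id are made up for the example, not
taken from iwcm.c):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct id_priv {
	atomic_int refcount;
	atomic_bool destroy_done;	/* stand-in for destroy_comp */
};

void id_init(struct id_priv *p, int refs)
{
	atomic_init(&p->refcount, refs);
	atomic_init(&p->destroy_done, false);
}

/*
 * Drop one reference. Returns true exactly when the last reference
 * was dropped, without looking at who is (or is not) waiting.
 * After a true return the caller must not touch p again.
 */
bool deref_id(struct id_priv *p)
{
	assert(atomic_load(&p->refcount) > 0);

	/* atomic_fetch_sub returns the value before the decrement. */
	if (atomic_fetch_sub(&p->refcount, 1) == 1) {
		atomic_store(&p->destroy_done, true);	/* complete() */
		return true;
	}
	return false;
}
```

Whether cm_work_handler() may safely return in the no-waiter case is the
open question above; the sketch only shows that the return value no longer
depends on the state of the wait queue.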

thanks,

- KK

> BTW, while looking at iwcm.c, I noticed the following highly dubious
> code for the first time:
>
>    static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv)
>    {
>       int ret = 0;
>
>       BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
>       if (atomic_dec_and_test(&cm_id_priv->refcount)) {
>          BUG_ON(!list_empty(&cm_id_priv->work_list));
>          if (waitqueue_active(&cm_id_priv->destroy_comp.wait)) {
>             BUG_ON(cm_id_priv->state != IW_CM_STATE_DESTROYING);
>             BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY,
>                   &cm_id_priv->flags));
>             ret = 1;
>          }
>          complete(&cm_id_priv->destroy_comp);
>       }
>
>       return ret;
>    }
>
> The test of waitqueue_active on destroy_comp.wait looks really bad for
> two reasons: first, it is relying on an internal implementation detail
> of struct completion that really shouldn't be used by generic code.
> And second, it seems to me that this doesn't even work right, since
> there is a race something like the following:
>
> iw_destroy_cm_id():
> destroy_cm_id(cm_id); // still 1 ref left
>
>             cm_work_handler():
>                if (iwcm_deref_id()) // drop last ref
>                   return;
>                // no one waiting yet, doesn't
>                // return, but destroy_comp is
>                // signaled
>
> wait_for_completion(&cm_id_priv->destroy_comp);
> // destroy_comp is signaled, proceed
> kfree(cm_id_priv);
>
>                // continue using cm_id_priv
>                // OOPS
>
> I don't understand this code well enough for the fix to be obvious.
>
>  - R.


From mst at mellanox.co.il  Thu Feb  8 22:51:49 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 9 Feb 2007 08:51:49 +0200
Subject: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes
In-Reply-To: <1170981550.3049.130.camel@stevo-desktop>
References: <1170981550.3049.130.camel@stevo-desktop>
Message-ID: <20070209065149.GL6560@mellanox.co.il>

> > Also I agree with MST, I would like to see the core/ subdirectory die
> > completely.
> > 
> 
> ok ok...I'll kill the subdir...

It's not just the directory BTW. Stuff like building completions in
t3_cqe format and then reformatting to ib_wc seems to be much more confusing
(and some of it is actually on datapath).
Same goes for t3_wq and I suspect everything else defined in cxio_wr.h -
please, use the native types from include/rdma/.

Having to wade through 3 driver-specific layers of abstractions just because I want to
for example change API in ib_verbs.h and need to update all drivers will be
very taxing. I understand your design calls for 2 layers, but at least the API exposed
by code in drivers/net is fairly small, while cxio_wr.h declares 27 structures
which seem to just duplicate ib_verbs.h.

-- 
MST


From mst at mellanox.co.il  Thu Feb  8 23:19:15 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 9 Feb 2007 09:19:15 +0200
Subject: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3
 Backport-IWCM workaroundfor ip_dev_find() bug.
In-Reply-To: <1170981653.3049.133.camel@stevo-desktop>
References: <1170981653.3049.133.camel@stevo-desktop>
Message-ID: <20070209071915.GN6560@mellanox.co.il>

> Quoting Steve Wise <swise at opengridcomputing.com>:
> Subject: Re: [openib-general] [PATCH] [RFC] ofed_1_2 - SLES9SP3 Backport-IWCM workaroundfor ip_dev_find() bug.
> 
> Michael, 
> 
> From your email, it sounded like you would regression test this.

Not yet, we had lab restructuring - hopefully next week.


-- 
MST


From mst at mellanox.co.il  Thu Feb  8 23:28:52 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 9 Feb 2007 09:28:52 +0200
Subject: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes
In-Reply-To: <adatzxwmady.fsf@cisco.com>
References: <20070208202634.4382.15287.stgit@dell3.ogc.int>
	<ada64acnpla.fsf@cisco.com> <adatzxwmady.fsf@cisco.com>
Message-ID: <20070209072852.GP6560@mellanox.co.il>

> And here's what I have pending in for-2.6.21 so far:

What about the mthca memory registration patches?
I thought they are on their way. Should I repost?

-- 
MST


From mst at mellanox.co.il  Fri Feb  9 00:04:19 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 9 Feb 2007 10:04:19 +0200
Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast
	support
In-Reply-To: <adaodo4nqz7.fsf@cisco.com>
References: <000701c741a6$16dc4760$ff0da8c0@amr.corp.intel.com>
	<45BF8E17.2010805@ichips.intel.com> <45C37BE9.5040105@ichips.intel.com>
	<45C85B39.4080700@voltaire.com> <45CB3537.8060508@voltaire.com>
	<adaodo4nqz7.fsf@cisco.com>
Message-ID: <20070209080418.GQ6560@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: please pull for 2.6.21: fix + add IB multicast support
> 
> I merged the "increment port number" and "remove redundant '_wq'"
> patches from git.openfabrics.org/~shefty/scm/rdma-dev.git for-roland
> 
> I plan to review the multicast stuff next week and I hope to merge it
> for 2.6.21.  Or, have you or anyone else at Voltaire read over the
> code in addition to using it?  Do you see anything that should be
> cleaned up?

I looked at the code briefly, don't have much time at the moment
unfortunately.

+static void join_group(struct mcast_group *group, struct mcast_member *member,
+                      u8 join_state)
+{
+       member->state = MCAST_MEMBER;
+       adjust_membership(group, join_state, 1);
+       group->rec.join_state |= join_state;
+       member->multicast.rec = group->rec;
+       member->multicast.rec.join_state = join_state;
+       list_del(&member->list);
+       list_add(&member->list, &group->active_list);
+}

Can be just list_move.
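
For illustration, a minimal userspace re-implementation of the list helpers
(a sketch modelled on include/linux/list.h, without the kernel's pointer
poisoning; init_list_head is a made-up name for this example) shows why the
two calls collapse into one:

```c
#include <assert.h>

struct list_head {
	struct list_head *next, *prev;
};

void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head. */
void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink entry from whatever list it is on. */
void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Unlink entry and re-insert it right after head, in one call. */
void list_move(struct list_head *entry, struct list_head *head)
{
	assert(entry != head);
	list_del(entry);
	list_add(entry, head);
}
```

So the last two lines of join_group() could read
list_move(&member->list, &group->active_list).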

The patch allocates everything with kzalloc, but then goes ahead and initializes everything anyway.
So just kmalloc it - no reason to pay for zeroed memory if uninitialized memory will do.
List of places:

+       member = kzalloc(sizeof *member, gfp_mask);
+       if (!member)
+               return ERR_PTR(-ENOMEM);


Same here:

+       group = kzalloc(sizeof *group, gfp_mask);
+       if (!group)
+               return NULL;
+

and same here:

+       iter = kzalloc(sizeof *iter + attr_size, GFP_KERNEL);
+       if (!iter)
+               return ERR_PTR(-ENOMEM);
+

It seems same goes for

+       mc = kzalloc(sizeof(*mc), GFP_KERNEL);
+       if (!mc)
+               return NULL;

in ucma.c - everything gets initialized by the calling function - but
I'm a bit less sure, needs checking.

By the way, it seems same goes for

+       bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL);
+       if (!bind_list)
+               return -ENOMEM;

in cma_alloc_any_port in the port randomization patch that was merged
and for cma_alloc_port in existing code.


-- 
MST

You seem to be careful to do list_del_init for member->list all over,


From or.gerlitz at gmail.com  Fri Feb  9 01:21:30 2007
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Fri, 9 Feb 2007 11:21:30 +0200
Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast
 support
In-Reply-To: <adaodo4nqz7.fsf@cisco.com>
References: <000701c741a6$16dc4760$ff0da8c0@amr.corp.intel.com>
	<45BF8E17.2010805@ichips.intel.com> <45C37BE9.5040105@ichips.intel.com>
	<45C85B39.4080700@voltaire.com> <45CB3537.8060508@voltaire.com>
	<adaodo4nqz7.fsf@cisco.com>
Message-ID: <15ddcffd0702090121t3314f577ue42584282991984a@mail.gmail.com>

On 2/9/07, Roland Dreier <rdreier at cisco.com> wrote:
> I plan to review the multicast stuff next week and I hope to merge it for 2.6.21

thanks, good news!

> Or, have you or anyone else at Voltaire read over the
> code in addition to using it?  Do you see anything that should be
> cleaned up?

OK, most of the review I did (and the interaction with Sean to add changes) was
on the rdma_cm: add multicast communication support patch, and I was
less focused on the ib_sa: track multicast join/leave requests patch.
However, I recall that there were some discussions between Sean and
Michael and they reached an agreement.

I will look at the ib_sa patch on Sunday and let Sean/you know if I
have any comments.

Or.


From mst at mellanox.co.il  Fri Feb  9 01:29:21 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 9 Feb 2007 11:29:21 +0200
Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast
	support
In-Reply-To: <15ddcffd0702090121t3314f577ue42584282991984a@mail.gmail.com>
References: <000701c741a6$16dc4760$ff0da8c0@amr.corp.intel.com>
	<45BF8E17.2010805@ichips.intel.com> <45C37BE9.5040105@ichips.intel.com>
	<45C85B39.4080700@voltaire.com> <45CB3537.8060508@voltaire.com>
	<adaodo4nqz7.fsf@cisco.com>
	<15ddcffd0702090121t3314f577ue42584282991984a@mail.gmail.com>
Message-ID: <20070209092921.GU6560@mellanox.co.il>

> > Or, have you or anyone else at Voltaire read over the
> > code in addition to using it?  Do you see anything that should be
> > cleaned up?
> 
> OK, most of the review I did (and the interaction with Sean to add
> changes) was on the "rdma_cm: add multicast communication support"
> patch, and I was less focused on the "ib_sa: track multicast join/leave
> requests" patch. However, I recall that there were some discussions
> between Sean and Michael and that they reached an agreement.

Yes, we reached an agreement.
These patches have also seen some limited testing in the OFED tree.

-- 
MST


From vlad at lists.openfabrics.org  Fri Feb  9 02:24:20 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Fri,  9 Feb 2007 02:24:20 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070209-0200 daily build status
Message-ID: <20070209102420.B6FB6E60807@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod 

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.18
Passed on powerpc with linux-2.6.18
Passed on powerpc with linux-2.6.19
Passed on powerpc with linux-2.6.17
Passed on ia64 with linux-2.6.19
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.15
Passed on powerpc with linux-2.6.14
Passed on x86_64 with linux-2.6.16
Passed on ppc64 with linux-2.6.19
Passed on ppc64 with linux-2.6.12
Passed on powerpc with linux-2.6.12
Passed on ppc64 with linux-2.6.16
Passed on powerpc with linux-2.6.15
Passed on powerpc with linux-2.6.13
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.14
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.15
Passed on ia64 with linux-2.6.18
Passed on ppc64 with linux-2.6.13
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.14
Passed on ppc64 with linux-2.6.18
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.17

Failed:
Build failed on ia64 with linux-2.6.16.21-0.8-default
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070209-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:380: error: implicit declaration of function ‘register_netevent_notifier’
/home/vlad/tmp/ofa_1_2_kernel-20070209-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: In function ‘addr_cleanup’:
/home/vlad/tmp/ofa_1_2_kernel-20070209-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:386: error: implicit declaration of function ‘unregister_netevent_notifier’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070209-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070209-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070209-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070209-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------


From halr at voltaire.com  Fri Feb  9 04:12:55 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 09 Feb 2007 07:12:55 -0500
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <45CBB59C.4010709@ichips.intel.com>
References: <20070126000319.GA12386@obsidianresearch.com>
	<ada4pqel66k.fsf@cisco.com>
	<20070126180840.GD12386@obsidianresearch.com>
	<45CA2084.7090503@ichips.intel.com>
	<20070207191154.GC11411@obsidianresearch.com>
	<45CA3568.1000508@ichips.intel.com>
	<20070207213108.GD11411@obsidianresearch.com>
	<45CA5573.80802@ichips.intel.com>
	<20070207224928.GF11411@obsidianresearch.com>
	<1170894459.31538.23768.camel@hal.voltaire.com>
	<45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com>
Message-ID: <1171023168.31538.153989.camel@hal.voltaire.com>

On Thu, 2007-02-08 at 18:43, Sean Hefty wrote:
> > Looking at the problem more, I think that the issue extends to the remote port 
> > LID as well.  My expectation with a local path record query is that the SLID is 
> > the local port, and the DLID is the local router.  This should be sufficient for 
> > one-way UD traffic, but for connected traffic we still need to discover the 
> > remote router and remote port LIDs.
> 
> Given a path record query for:
> 
> SGID - local
> DGID - remote
> 
> What would be the SLID and DLID?

SLID corresponding to SGID and a DLID for some IB router on the subnet
which can route to the remote DGID.

> And if the query is reversed, such that:
> 
> SGID - remote
> DGID - local
> 
> Are the SLID/DLID values simply reversed?

An SM is free to choose which SLID and DLID to supply; if there are multiple
LIDs for the ports in question, it can choose alternates. The key here is
whether a reversible path has been requested or not. It is also not
clear what reversible means in the context of an IB internetwork
(multiple IB subnets interconnected by IB routers).

> What if the DGID in the second case were a multicast GID?

So you are asking about what an SA PR lookup for a remote SGID to a DGID
which is an MGID would yield? I think this too is beyond the spec.

> What does the SLID become in this case?

The SLID couldn't be valid (on a remote subnet) so I'm not sure what
would be said for this case.

-- Hal

> - Sean


From halr at voltaire.com  Fri Feb  9 04:15:31 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 09 Feb 2007 07:15:31 -0500
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <20070209043727.GN11411@obsidianresearch.com>
References: <20070126180840.GD12386@obsidianresearch.com>
	<45CA2084.7090503@ichips.intel.com>
	<20070207191154.GC11411@obsidianresearch.com>
	<45CA3568.1000508@ichips.intel.com>
	<20070207213108.GD11411@obsidianresearch.com>
	<45CA5573.80802@ichips.intel.com>
	<20070207224928.GF11411@obsidianresearch.com>
	<1170894459.31538.23768.camel@hal.voltaire.com>
	<45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com>
	<20070209043727.GN11411@obsidianresearch.com>
Message-ID: <1171023319.31538.154141.camel@hal.voltaire.com>

On Thu, 2007-02-08 at 23:37, Jason Gunthorpe wrote:
> On Thu, Feb 08, 2007 at 03:43:24PM -0800, Sean Hefty wrote:
> > > Looking at the problem more, I think that the issue extends to the remote port 
> > > LID as well.  My expectation with a local path record query is that the SLID is 
> > > the local port, and the DLID is the local router.  This should be sufficient for 
> > > one-way UD traffic, but for connected traffic we still need to discover the 
> > > remote router and remote port LIDs.
> > 
> > Given a path record query for:
> > 
> > SGID - local
> > DGID - remote
> > 
> > What would be the SLID and DLID?
> > 
> > And if the query is reversed, such that:
> > 
> > SGID - remote
> > DGID - local
> > 
> > Are the SLID/DLID values simply reversed?
> 
> I have a follow up question to this.. With CM how is the SL for each
> side determined? I'm looking through the code here and it looks like
> the SL of the active side is passed in the REQ to the passive side (ie
> both sides are the same) But cma_query_ib_route does not set the
> reversible bit when it asks for the path. If you don't set the
> reversible bit isn't it necessary to make a 2nd path query to get the
> reverse path's SL? [Path responses without the reversible bit set
> are actually simplex paths and reversing them probably will run into
> SL2VL mapping tables that cause the packets to be dropped ie o7-8]
> 
> In fact, to get an optimal path aren't 3 path records required:
> 1) A reversible path from active to passive from the CM GMPs
>   (required by C12-5.1.3)
> 2) An optimal non-reversible path from active to passive
> 3) An optimal non-reversible path from passive to active

What you are saying seems correct to me although I am not sure about
reversibility in the intersubnet case.

It may be that the non reversible paths supplied (in a single subnet)
happen to also be reversible so this all works.

-- Hal

> Jason
> 
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> 


From ossrosch at linux.vnet.ibm.com  Fri Feb  9 05:37:01 2007
From: ossrosch at linux.vnet.ibm.com (Stefan Roscher)
Date: Fri, 9 Feb 2007 14:37:01 +0100
Subject: [openib-general] [PATCH ofed-1.2] ofa_user.spec: fix installation
 path for ehca.driver
Message-ID: <200702091437.02142.ossrosch@linux.vnet.ibm.com>

Hi Vladimir,

we tested the newest ofed1.2 package and found that the ehca.driver file is
not copied into /usr/local/ofed/etc/libibverbs.d/

This patch adds the installation path for ehca.driver to ofa_user.spec.
Please ensure you first apply the ofa_user.spec patch I sent yesterday:
http://openib.org/pipermail/openib-general/2007-February/032736.html


Signed-off-by: Stefan Roscher <stefan.roscher at de.ibm.com>
---


ofa_user.spec |    1 +
1 files changed, 1 insertion(+)


diff -Nurp ofed_scripts_old/ofa_user.spec ofed_scripts_new/ofa_user.spec
--- ofed_scripts_old/ofa_user.spec	2007-02-09 14:00:38.000000000 +0100
+++ ofed_scripts_new/ofa_user.spec	2007-02-09 14:02:45.000000000 +0100
@@ -1165,6 +1165,7 @@ fi
 %files -n libehca -f libehca-files
 %defattr(-,root,root,-)
 %{_libdir}/libehca*.so*
+%config %{_prefix}/etc/libibverbs.d/ehca.driver
 # %doc AUTHORS COPYING ChangeLog README
 %endif
 

From halr at voltaire.com  Fri Feb  9 06:04:21 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 09 Feb 2007 09:04:21 -0500
Subject: [openib-general] [PATCH TRIVIAL] osmtest: use more descriptive
 constant names
In-Reply-To: <20070208231618.GB22807@sashak.voltaire.com>
References: <20070208231618.GB22807@sashak.voltaire.com>
Message-ID: <1171029859.31538.160613.camel@hal.voltaire.com>

On Thu, 2007-02-08 at 18:16, Sasha Khapyorsky wrote:
> Use more descriptive constant names for osmtest flows.
> 
> Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>

Thanks. Applied (to master and ofed_1_2).

-- Hal


From tom at opengridcomputing.com  Fri Feb  9 06:22:39 2007
From: tom at opengridcomputing.com (Tom Tucker)
Date: Fri, 09 Feb 2007 08:22:39 -0600
Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in
 cm_conn_req_handler()
In-Reply-To: <OFB574507E.6CAEE34E-ON6525727D.0013194D-6525727D.00132A8C@in.ibm.com>
References: <OFB574507E.6CAEE34E-ON6525727D.0013194D-6525727D.00132A8C@in.ibm.com>
Message-ID: <1171030959.26453.6.camel@trinity.ogc.int>


Kumar:

I _LOVE_ the patch and the fact that you're making this code better. I
just want to tweak it a little bit...

* Please convince yourself (and me ;-)) that the iw_cm_destroy_id can
never block where you've put it. I'll bet that it's fine, but convince
yourself too. Your comment scared me a little -- that's all.

* Please see if the call to reject can be moved to the destroy
switch so that we don't have to call it everywhere else.

* Please make sure that everywhere we call destroy_cm_id, the cleanup of
the work queue is also done.

Thanks,
Tom


On Fri, 2007-02-09 at 08:59 +0530, Krishna Kumar2 wrote:
> Roland,
> 
> Yes, we will do some "arm wrestling" today :)
> 
> thanks,
> 
>  KK
> 
> Roland Dreier <rdreier at cisco.com> wrote on 02/09/2007 05:20:42 AM:
> 
> > Hmm, Steve likes it, Tom doesn't.  Can you guys arm wrestle or
> > something and tell me if this patch is correct or not?
> >
> >  - R.
> 


From swise at opengridcomputing.com  Fri Feb  9 06:23:47 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 09 Feb 2007 08:23:47 -0600
Subject: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes
In-Reply-To: <20070209065149.GL6560@mellanox.co.il>
References: <1170981550.3049.130.camel@stevo-desktop>
	<20070209065149.GL6560@mellanox.co.il>
Message-ID: <1171031027.4896.1.camel@stevo-desktop>

On Fri, 2007-02-09 at 08:51 +0200, Michael S. Tsirkin wrote:
> > > Also I agree with MST, I would like to see the core/ subdirectory die
> > > completely.
> > > 
> > 
> > ok ok...I'll kill the subdir...
> 
> It's not just the directory BTW. Stuff like building completions in
> t3_cqe format and then reformatting to ib_wc seems to be much more confusing
> (and some of it is actually on datapath).

The t3_cqe format is built BY THE HW.


> Same goes for t3_wq and I suspect everything else defined in cxio_wr.h -
> please, use the native types from include/rdma/.
> 

Ditto.  t3_wq is the HW format.

> Having to wade through 3 driver-specific layers of abstractions just because I want to
> for example change API in ib_verbs.h and need to update all drivers will be
> very taxing. I understand your design calls for 2 layers, but at least the API exposed
> by code in drivers/net is fairly small, while cxio_wr.h declares 27 structures
> which seem to just duplicate ib_verbs.h.

cxio_wr.h is hw format.  You want me to change ib_verbs.h to make WRs
and CQEs align with Chelsio hardware?


From swise at opengridcomputing.com  Fri Feb  9 06:58:45 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 09 Feb 2007 08:58:45 -0600
Subject: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes
In-Reply-To: <1171031027.4896.1.camel@stevo-desktop>
References: <1170981550.3049.130.camel@stevo-desktop>
	<20070209065149.GL6560@mellanox.co.il>
	<1171031027.4896.1.camel@stevo-desktop>
Message-ID: <1171033125.4896.21.camel@stevo-desktop>

On Fri, 2007-02-09 at 08:23 -0600, Steve Wise wrote:
> On Fri, 2007-02-09 at 08:51 +0200, Michael S. Tsirkin wrote:
> > > > Also I agree with MST, I would like to see the core/ subdirectory die
> > > > completely.
> > > > 
> > > 
> > > ok ok...I'll kill the subdir...
> > 
> > It's not just the directory BTW. Stuff like building completions in
> > t3_cqe format and then reformatting to ib_wc seems to be much more confusing
> > (and some of it is actually on datapath).
> 
> The t3_cqe format is built BY THE HW.
> 
> 
> > Same goes for t3_wq and I suspect everything else defined in cxio_wr.h -
> > please, use the native types from include/rdma/.
> > 
> 
> Ditto.  t3_wq is the HW format.
> 

To be more precise:  

struct t3_wq is the struct used to describe the T3 HW WQ, which is both
the SQ and RQ for the QP.  

struct t3_cq is the struct used to describe the T3 HW CQ -and- a SW CQ
used to maintain proper completion ordering that isn't maintained by the
HW for some operations.  

union t3_wr defines the union of all the HW-specific WR structs.

struct t3_cqe defines the HW CQE format.

All of this is very tightly integrated with the HW.

These HW-specific structs are included in a high-level struct that
defines the object and has all the stuff needed to integrate into the
OFA stack.

Example:

struct iwch_qp defines the top-level QP structure that maintains both
the T3 HW struct (struct t3_wq) and the OFA struct (struct ib_qp) as
well as attributes, wait objects, locks, etc to correctly implement the
OFA QP object.   

This is similar to what other providers do:

hw/mthca/mthca_provider.h: mthca_qp includes 2 mthca_wq structs for the
SQ and RQ.

hw/amso/c2_provider.h: c2_qp includes 2 c2_mq structs for the SQ and RQ
message queues.

hw/ipath/ipath_verbs.h: ipath_qp includes ipath_swqe and ipath_rq for
their work queues.
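
For anyone following along, the layering can be sketched roughly like this.
It is a standalone illustration, not the iw_cxgb3 code: struct generic_qp,
struct hw_wq, struct provider_qp and the helpers are made-up names standing
in for the roles of struct ib_qp, struct t3_wq and struct iwch_qp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the generic QP object the upper layers see. */
struct generic_qp {
	uint32_t qp_num;
};

/* Stand-in for a hardware work queue descriptor. */
struct hw_wq {
	uint32_t wptr;      /* producer index */
	uint32_t size_log2; /* queue depth is 1 << size_log2 */
};

/* Provider-private QP: embeds the generic object next to the HW state. */
struct provider_qp {
	struct generic_qp gqp;
	struct hw_wq wq;
	int refcnt;
};

/* Recover the provider QP from the generic pointer handed back by callers. */
static inline struct provider_qp *to_provider_qp(struct generic_qp *gqp)
{
	assert(gqp != NULL);
	return (struct provider_qp *)
		((char *)gqp - offsetof(struct provider_qp, gqp));
}

/* Example of reaching HW state starting from the generic object. */
static inline uint32_t hw_wq_depth(struct generic_qp *gqp)
{
	return 1u << to_provider_qp(gqp)->wq.size_log2;
}
```

The upper layers only ever hold the generic pointer; the provider gets back
to its own state with pointer arithmetic, which is the same idea as
container_of() in the kernel.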

WRT data path operations, consider iwch_poll_cq_one().  The CQE is in T3
FW format and must be converted into an OFA struct ib_wc.  There's no way
around this, right?  mthca_poll_one() does the same thing.  Ditto for
c2_poll_one().
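
A rough sketch of that conversion step, under loud assumptions: the CQE
layout, the opcode values and every name below (struct hw_cqe, struct
generic_wc, hw_cqe_to_wc()) are invented for illustration and do not match
the T3, mthca or amso formats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hardware CQE layout; real devices define their own. */
struct hw_cqe {
	uint64_t cookie; /* consumer work request id echoed by the device */
	uint32_t len;    /* bytes transferred */
	uint8_t opcode;  /* device-specific opcode */
	uint8_t status;  /* 0 means success on this imaginary device */
};

/* Device opcodes of the imaginary hardware. */
enum { HW_OP_SEND = 0x10, HW_OP_WRITE = 0x11, HW_OP_READ = 0x12,
       HW_OP_RECV = 0x20 };

/* Stand-ins for the generic completion types the stack expects. */
enum wc_opcode { WC_SEND, WC_RDMA_WRITE, WC_RDMA_READ, WC_RECV };
enum wc_status { WC_SUCCESS, WC_GENERAL_ERR };

struct generic_wc {
	uint64_t wr_id;
	enum wc_status status;
	enum wc_opcode opcode;
	uint32_t byte_len;
};

/* Translate one hardware CQE; returns 0, or -1 for an unknown opcode. */
static inline int hw_cqe_to_wc(const struct hw_cqe *cqe, struct generic_wc *wc)
{
	assert(cqe != NULL && wc != NULL);

	switch (cqe->opcode) {
	case HW_OP_SEND:  wc->opcode = WC_SEND;       break;
	case HW_OP_WRITE: wc->opcode = WC_RDMA_WRITE; break;
	case HW_OP_READ:  wc->opcode = WC_RDMA_READ;  break;
	case HW_OP_RECV:  wc->opcode = WC_RECV;       break;
	default:
		return -1;
	}
	wc->wr_id = cqe->cookie;
	wc->byte_len = cqe->len;
	wc->status = cqe->status ? WC_GENERAL_ERR : WC_SUCCESS;
	return 0;
}
```

Whatever the device format is, some per-completion translation like this
has to happen somewhere on the poll path.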


Hope this helps.  

Steve.


From mst at mellanox.co.il  Fri Feb  9 07:03:07 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 9 Feb 2007 17:03:07 +0200
Subject: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes
In-Reply-To: <1171031027.4896.1.camel@stevo-desktop>
References: <1170981550.3049.130.camel@stevo-desktop>
	<20070209065149.GL6560@mellanox.co.il>
	<1171031027.4896.1.camel@stevo-desktop>
Message-ID: <20070209150307.GW6560@mellanox.co.il>

> Quoting r. Steve Wise <swise at opengridcomputing.com>:
> Subject: Re: [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes
> 
> On Fri, 2007-02-09 at 08:51 +0200, Michael S. Tsirkin wrote:
> > > > Also I agree with MST, I would like to see the core/ subdirectory die
> > > > completely.
> > > > 
> > > 
> > > ok ok...I'll kill the subdir...
> > 
> > It's not just the directory BTW. Stuff like building completions in
> > t3_cqe format and then reformatting to ib_wc seems to be much more confusing
> > (and some of it is actually on datapath).
> 
> The t3_cqe format is built BY THE HW.

I understand, I did not get that.

But for example create_read_req_cqe builds it in software.
It could build ib_wc instead.

...

> > Having to wade through 3 driver-specific layers of abstractions just because I want to
> > for example change API in ib_verbs.h and need to update all drivers will be
> > very taxing. I understand your design calls for 2 layers, but at least the API exposed
> > by code in drivers/net is fairly small, while cxio_wr.h declares 27 structures
> > which seem to just duplicate ib_verbs.h.
> 
> cxio_wr.h is hw format.  You want me to change ib_verbs.h to make WRs
> and CQEs align with Chelsio hardware?

No, but it need not be part of the interface.  The reason I was confused is because
you seem to create an extra copy e.g.  for t3_cqe.  cxio_poll_cq currently
creates an intermediate copy of the completion on the stack, I think it could
format ib_wc directly instead.

-- 
MST


From Arkady.Kanevsky at netapp.com  Fri Feb  9 07:15:47 2007
From: Arkady.Kanevsky at netapp.com (Kanevsky, Arkady)
Date: Fri, 9 Feb 2007 10:15:47 -0500
Subject: [openib-general] dapl broken for iWARP
Message-ID: <C98692FD98048C41885E0B0FACD9DFB803AFA726@exnane01.hq.netapp.com>


Steve,
what is the issue with using
max_qp_rd_atom and max_qp_init_rd_atom
besides the bad name?
Thanks,

Arkady Kanevsky                       email: arkady at netapp.com
Network Appliance Inc.               phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.        Fax: 781-895-1195
Waltham, MA 02451                   central phone: 781-768-5300
 

> -----Original Message-----
> From: Steve Wise [mailto:swise at opengridcomputing.com] 
> Sent: Thursday, February 08, 2007 6:11 PM
> To: Arlin Davis
> Cc: openib-general
> Subject: Re: [openib-general] dapl broken for iWARP
> 
> On Wed, 2007-02-07 at 15:57 -0600, Steve Wise wrote:
> > On Wed, 2007-02-07 at 14:02 -0600, Steve Wise wrote:
> > > Arlin,
> > > 
> > > The OFED dapl code is assuming the responder_resources and 
> > > initiator_depth passed up on a connection request event are from
> > > the remote peer.  This doesn't happen for iWARP.  In the current
> > > iWARP specifications, its up to the application to exchange this
> > > information somehow. So these are defaulting to 0 on the server
> > > side of any dapl connection over iWARP.
> > > 
> > > This is a fairly recent change, I think.  We need to come up with 
> > > some way to deal with this for OFED 1.2 IMO.
> > > 
> > 
> > The IWCM could set these to the device max values for instance.
> > 
> > Steve.
> > 
> 
> There is a slight problem with all this.  There are no device 
> attributes currently for ORD and IRD.  The ammasso driver 
> maps these to max_qp_rd_atom (IRD) and 
> max_qp_init_rd_atom(ORD).  But this is screwy.
> We need new attribute for these.
> 
> For OFED 1.2, I think I should just have the IWCM set them to 
> 8.  The only RNIC in ofed is cxgb3 and it supports 8...
> 
> 
> Steve.
> 


From swise at opengridcomputing.com  Fri Feb  9 07:25:45 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 09 Feb 2007 09:25:45 -0600
Subject: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes
In-Reply-To: <20070209150307.GW6560@mellanox.co.il>
References: <1170981550.3049.130.camel@stevo-desktop>
	<20070209065149.GL6560@mellanox.co.il>
	<1171031027.4896.1.camel@stevo-desktop>
	<20070209150307.GW6560@mellanox.co.il>
Message-ID: <1171034745.4896.37.camel@stevo-desktop>

> I understand, I did not get that.
> 
> But for example create_read_req_cqe builds it in software.
> It could build ib_wc instead.
> 

Reads are handled in a slightly different manner.  This is due to the
fact that the T3 HW can complete a read out of order.  For example:

POST READ
POST WRITE

The posted read triggers the HW to send an RDMA_READ_REQUEST.  Immediately
after that the HW can (and will) send the RDMA_WRITE.  Once the peer TCP
ACKs the WRITE, the HW will post a CQE for the WRITE.  That completion
might happen before the peer sends back the READ_RESPONSE.  Since the
RDMAC verbs spec sez WRs must be completed in order, the T3 driver has
to deal with this.  (and it's painful :)

In addition, I have to maintain other state about a read.  1) the
consumer wr_id.  For non reads, the wr_id is actually reflected back by
the HW from the WQE to the CQE.  For reads, this doesn't happen.  2) the
CQE for a read completion doesn't contain the original length.  I need
to pull that from the associated original WQE.  

So all this means the driver needs to construct a proper read cqe from
several parts.  That's why it creates it locally on the stack. 
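
To illustrate the bookkeeping (this is a simplified standalone sketch, not
the iw_cxgb3 code, and struct sw_sq, sq_post(), sq_hw_complete() and
sq_poll() are names made up for it): the driver remembers each posted WR in
order, lets the hardware mark entries done in any order, and only hands
completions to the consumer from the head of the queue. The wr_id and length
come from the software record, which covers the read case where the CQE does
not carry them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SQ_DEPTH 8u

/* Software shadow of one posted work request. */
struct sw_sqe {
	uint64_t wr_id; /* consumer's id, kept here for completions */
	uint32_t len;   /* original length, kept here for completions */
	bool complete;  /* set once the hardware has finished it */
};

struct sw_sq {
	struct sw_sqe e[SQ_DEPTH];
	uint32_t head; /* oldest WR not yet reported to the consumer */
	uint32_t tail; /* next free slot */
};

/* Record a posted WR; returns its slot index, or -1 if the queue is full. */
static inline int sq_post(struct sw_sq *sq, uint64_t wr_id, uint32_t len)
{
	assert(sq != NULL);
	if (sq->tail - sq->head == SQ_DEPTH)
		return -1;
	uint32_t slot = sq->tail % SQ_DEPTH;
	sq->e[slot] = (struct sw_sqe){ .wr_id = wr_id, .len = len,
				       .complete = false };
	sq->tail++;
	return (int)slot;
}

/* The hardware finished the WR in this slot, possibly out of order. */
static inline void sq_hw_complete(struct sw_sq *sq, int slot)
{
	assert(sq != NULL && slot >= 0 && (uint32_t)slot < SQ_DEPTH);
	sq->e[slot].complete = true;
}

/* Report the next completion only if the oldest outstanding WR is done. */
static inline bool sq_poll(struct sw_sq *sq, uint64_t *wr_id, uint32_t *len)
{
	assert(sq != NULL && wr_id != NULL && len != NULL);
	if (sq->head == sq->tail)
		return false;
	struct sw_sqe *e = &sq->e[sq->head % SQ_DEPTH];
	if (!e->complete)
		return false;
	*wr_id = e->wr_id;
	*len = e->len;
	sq->head++;
	return true;
}
```

With a READ posted before a WRITE, a poll after only the WRITE has finished
returns nothing; once the READ response arrives, the READ and then the WRITE
are reported in posting order.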

BUT: 

You're right though:  All CQEs get copied out of the HWCQ and into an
on-stack variable in iwch_poll_cq_one().  Removing this, however,
requires rethinking all the READ logic which assumes the CQE is copied
out of the HWCQ.  I cannot make this change right now because of
stability concerns (it took me long enough to understand how to
correctly handle the read case as it stands :-)


> ...
> 
> > > Having to wade through 3 driver-specific layers of abstractions just because I want to
> > > for example change API in ib_verbs.h and need to update all drivers will be
> > > very taxing. I understand your design calls for 2 layers, but at least the API exposed
> > > by code in drivers/net is fairly small, while cxio_wr.h declares 27 structures
> > > which seem to just duplicate ib_verbs.h.
> > 
> > cxio_wr.h is hw format.  You want me to change ib_verbs.h to make WRs
> > and CQEs align with Chelsio hardware?
> 
> No, but it need not be part of interface.  The reason I was confused is because
> you seem to create an extra copy e.g.  for t3_cqe.  cxio_poll_cq currently
> creates an intermediate copy of the completion on the stack, I think it could
> format ib_wc directly instead.
> 

I'll log this as a performance optimization that we can do later. 

Thanks for helping review this stuff!!

Steve.


From swise at opengridcomputing.com  Fri Feb  9 07:26:57 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 09 Feb 2007 09:26:57 -0600
Subject: [openib-general] dapl broken for iWARP
In-Reply-To: <C98692FD98048C41885E0B0FACD9DFB803AFA726@exnane01.hq.netapp.com>
References: <C98692FD98048C41885E0B0FACD9DFB803AFA726@exnane01.hq.netapp.com>
Message-ID: <1171034817.4896.39.camel@stevo-desktop>

On Fri, 2007-02-09 at 10:15 -0500, Kanevsky, Arkady wrote:
> Steve,
> what is an issue of using 
> max_qp_rd_atom and max_qp_init_rd_atom
> beside the bad name?

It's a hack.

But Bob already asked to do this, so I guess I will.

We still don't ensure interoperability with DAPL consumers.  A global
value would.  Using device max values won't.


> Thanks,
> 
> Arkady Kanevsky                       email: arkady at netapp.com
> Network Appliance Inc.               phone: 781-768-5395
> 1601 Trapelo Rd. - Suite 16.        Fax: 781-895-1195
> Waltham, MA 02451                   central phone: 781-768-5300
>  
> 
> > -----Original Message-----
> > From: Steve Wise [mailto:swise at opengridcomputing.com] 
> > Sent: Thursday, February 08, 2007 6:11 PM
> > To: Arlin Davis
> > Cc: openib-general
> > Subject: Re: [openib-general] dapl broken for iWARP
> > 
> > On Wed, 2007-02-07 at 15:57 -0600, Steve Wise wrote:
> > > On Wed, 2007-02-07 at 14:02 -0600, Steve Wise wrote:
> > > > Arlin,
> > > > 
> > > > The OFED dapl code is assuming the responder_resources and 
> > > > initiator_depth passed up on a connection request event are from
> > > > the remote peer.  This doesn't happen for iWARP.  In the current
> > > > iWARP specifications, its up to the application to exchange this
> > > > information somehow. So these are defaulting to 0 on the server
> > > > side of any dapl connection over iWARP.
> > > > 
> > > > This is a fairly recent change, I think.  We need to come up with 
> > > > some way to deal with this for OFED 1.2 IMO.
> > > > 
> > > 
> > > The IWCM could set these to the device max values for instance.
> > > 
> > > Steve.
> > > 
> > 
> > There is a slight problem with all this.  There are no device 
> > attributes currently for ORD and IRD.  The ammasso driver 
> > maps these to max_qp_rd_atom (IRD) and 
> > max_qp_init_rd_atom(ORD).  But this is screwy.
> > We need new attribute for these.
> > 
> > For OFED 1.2, I think I should just have the IWCM set them to 
> > 8.  The only RNIC in ofed is cxgb3 and it supports 8...
> > 
> > 
> > Steve.
> > 


From Arkady.Kanevsky at netapp.com  Fri Feb  9 07:29:57 2007
From: Arkady.Kanevsky at netapp.com (Kanevsky, Arkady)
Date: Fri, 9 Feb 2007 10:29:57 -0500
Subject: [openib-general] dapl broken for iWARP
Message-ID: <C98692FD98048C41885E0B0FACD9DFB803AFA730@exnane01.hq.netapp.com>

Mike,
this is not a DAPL issue.
There are 2 ways to deal with it.
One is for all ULPs to use private data to exchange CM info.
Yes, some ULPs, like SDP, do that in their hello message.

Another is to let the CM handle it.
This way the ULP does not have to deal with it.
This is analogous to the IBTA CM IP addressing Annex.
It ensures backwards compatibility and does not break any existing apps
which use MPA as specified by IETF.

No need to bother IETF until we have it working.
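
As a sketch of the semantic only (not IWCM code, and not tied to any wire
format): once both sides have exchanged their ORD/IRD values, each side can
clamp its own values against the peer's so that the local IRD is never
smaller than the remote ORD. struct rd_limits and rd_clamp() are names made
up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* RDMA read limits for one side of a connection. */
struct rd_limits {
	uint8_t ord; /* outstanding reads this side may initiate */
	uint8_t ird; /* incoming reads this side can absorb */
};

static inline uint8_t min_u8(uint8_t a, uint8_t b)
{
	return a < b ? a : b;
}

/*
 * Clamp what this side wants against what the peer advertised:
 * never initiate more reads than the peer can absorb, and do not
 * reserve more responder resources than the peer will ever use.
 */
static inline struct rd_limits rd_clamp(struct rd_limits want,
					struct rd_limits peer)
{
	struct rd_limits r;

	assert(want.ord > 0 || want.ird > 0 || peer.ord == 0 || peer.ird == 0
	       || 1); /* any combination is accepted, including all zeros */
	r.ord = min_u8(want.ord, peer.ird);
	r.ird = min_u8(want.ird, peer.ord);
	return r;
}
```

The responder would call rd_clamp() with its device maximums and the
requestor's proposal, reply with the result, and the requestor would call it
again with its own proposal and that reply. After both steps each side's IRD
is at least the other side's ORD.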
Thanks,

Arkady Kanevsky                       email: arkady at netapp.com
Network Appliance Inc.               phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.        Fax: 781-895-1195
Waltham, MA 02451                   central phone: 781-768-5300
 

> -----Original Message-----
> From: Michael Krause [mailto:krause at cup.hp.com] 
> Sent: Thursday, February 08, 2007 4:27 PM
> To: Kanevsky, Arkady; Steve Wise; Arlin Davis
> Cc: openib-general
> Subject: Re: [openib-general] dapl broken for iWARP
> 
> At 07:43 AM 2/8/2007, Kanevsky, Arkady wrote:
> >That is correct.
> >I am working with Krishna on it.
> >Expect patches soon.
> >
> >By the way the problem is not DAPL specific and so is a proposed 
> >solution.
> >
> >There are 3 aspects of the solution.
> >One is APIs. We suggest that we do not augment these.
> >That is a connection requestor sets its QP RDMA ORD and IRD.
> >When connection is established user can check the QP RDMA ORD and IRD
> >to see what he has now to use over the connection.
> >We may consider to extend QP attributes to support transport specific
> >parameters passing in the future.
> >For example, iWARP MPA CRC request.
> >
> >Second is the semantic that CM provides.
> >The proposal is to match IBCM semantic.
> >That is CM guarantee that local IRD is >= remote ORD.
> >This guarantees that incoming RDMA Read requests will not overwhelm the
> >QP RDMA Read capabilities.
> >Again there is not changes to IBCM only to IWCM.
> >Notice that as part of this IWCM will pass down to driver and extract
> >from driver needed info.
> >
> >The final part is iWARP CM extension to exchange RDMA ORD, IRD.
> >This is similar to IBTA Annex for IP Addressing.
> >The harder part that this will eventually require IETF MPA spec
> >extension, and the fact that MPA protocol is implemented in RNIC HW by
> >many vendors, and hence can not be done by IWCM itself.
> 
> We looked at this quite a bit during the creation of the
> specification.  All of the targeted usage models exchange this
> information as part of their "hello" or login exchanges.  As such, the
> "hum" was to not change MPA to communicate such information and leave
> it to software to exchange these values through existing mechanisms.
> I seriously doubt there will be much support for modifying the MPA
> specification at this stage since the implementations are largely
> complete and a modification would have to deal with the legacy
> interoperability issue which likely would be solved in software any
> way.  It would be simpler to simply modify the underlying DAPL
> implementation to exchange the information and keep this hidden from
> both the application and the RNIC providers.
> 
> Mike
> 
> 
> >Thanks,
> >
> >Arkady Kanevsky                       email: arkady at netapp.com
> >Network Appliance Inc.               phone: 781-768-5395
> >1601 Trapelo Rd. - Suite 16.        Fax: 781-895-1195
> >Waltham, MA 02451                   central phone: 781-768-5300
> >
> >
> > > -----Original Message-----
> > > From: Steve Wise [mailto:swise at opengridcomputing.com]
> > > Sent: Wednesday, February 07, 2007 6:12 PM
> > > To: Arlin Davis
> > > Cc: openib-general
> > > Subject: Re: [openib-general] dapl broken for iWARP
> > >
> > > On Wed, 2007-02-07 at 15:05 -0800, Arlin Davis wrote:
> > > > Steve Wise wrote:
> > > >
> > > > >On Wed, 2007-02-07 at 14:02 -0600, Steve Wise wrote:
> > > > >
> > > > >
> > > > >>Arlin,
> > > > >>
> > > > >>The OFED dapl code is assuming the responder_resources and
> > > > >>initiator_depth passed up on a connection request event are from
> > > > >>the remote peer.  This doesn't happen for iWARP.  In the current
> > > > >>iWARP specifications, its up to the application to exchange this
> > > > >>information somehow. So these are defaulting to 0 on the server
> > > > >>side of any dapl connection over iWARP.
> > > > >>
> > > > >>This is a fairly recent change, I think.  We need to come up with
> > > > >>some way to deal with this for OFED 1.2 IMO.
> > > > >>
> > > > >>
> > > > Yes, this was changed recently to sync up with the rdma_cm changes
> > > > that exposed the values.
> > > >
> > > > >>
> > > > >>
> > > > >
> > > > >The IWCM could set these to the device max values for instance.
> > > > >
> > > > >
> > > > That would work fine as long as you know the remote settings will be
> > > > equal or better. The provider just sets the min of local device max
> > > > values and the remote values provided with the request.
> > > >
> > >
> > > I know Krishna Kumar is working on a solution for exchanging
> > > this info in private data so the IWCM can "do the right
> > > thing".  Stay tuned for a patch series to review for this.
> > > But this functionality is definitely post OFED-1.2.
> > >
> > >
> > > So for the OFED-1.2, I will set these to the device max in the IWCM.
> > > Assuming the other side is OFED 1.2 DAPL, then it will work fine.
> > >
> > > Steve.
> > >
> > >
> > >
> > > _______________________________________________
> > > openib-general mailing list
> > > openib-general at openib.org
> > > http://openib.org/mailman/listinfo/openib-general
> > >
> > > To unsubscribe, please visit
> > > http://openib.org/mailman/listinfo/openib-general
> > >
> >
> 
> 


From Arkady.Kanevsky at netapp.com  Fri Feb  9 07:32:56 2007
From: Arkady.Kanevsky at netapp.com (Kanevsky, Arkady)
Date: Fri, 9 Feb 2007 10:32:56 -0500
Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate
 forunicast packets
Message-ID: <C98692FD98048C41885E0B0FACD9DFB803AFA733@exnane01.hq.netapp.com>

Hal,
Unfortunately, IBTA punted on this issue.
We considered it for the IBTA CM IP address annex but in the end
could not handle all the cases.
Thanks,

Arkady Kanevsky                       email: arkady at netapp.com
Network Appliance Inc.               phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.        Fax: 781-895-1195
Waltham, MA 02451                   central phone: 781-768-5300
 

> -----Original Message-----
> From: Hal Rosenstock [mailto:halr at voltaire.com] 
> Sent: Wednesday, February 07, 2007 7:20 PM
> To: Sean Hefty
> Cc: Jason Gunthorpe; Roland Dreier; 
> openib-general at openib.org; Kanevsky, Arkady
> Subject: Re: [openib-general] [PATCH] IPOIB: Use a GRH when 
> appropriate forunicast packets
> 
> On Wed, 2007-02-07 at 15:24, Sean Hefty wrote:
> > > I didn't get too far on getting CMA to work. Beyond the bad HopLimit
> > > field I was seeing, Hal pointed out a number of problems in IBA that
> > > would prevent it from working as is :<
> > 
> > I've started thinking about what it would take to get the rdma cm to
> > work across a router.  I think the rdma cm may need to treat IPv6
> > addresses as a GID for this to work across subnets, versus trying to
> > map an ipoib IP address to a GID based on ARP.
> 
> An IB GID is IPv6 like but not an IPv6 address so I don't 
> think this is a good idea and don't see how you get around 
> mapping IP addresses to GIDs in an IB routed network given 
> the way things are spec'd. I think that the RDMA CM assumes a 
> single IPoIB subnet. Does it work when the destination is on 
> another subnet ? I think there are some unaddressed gateway 
> issues here to make that work and these may have been punted 
> (during spec time). Arkady might be a good person to comment on this.
> 
> -- Hal
> 
> > - Sean
> > 
> 


From rdreier at cisco.com  Fri Feb  9 07:34:10 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 09 Feb 2007 07:34:10 -0800
Subject: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes
In-Reply-To: <20070209072852.GP6560@mellanox.co.il> (Michael S.
	Tsirkin's message of "Fri, 9 Feb 2007 09:28:52 +0200")
References: <20070208202634.4382.15287.stgit@dell3.ogc.int>
	<ada64acnpla.fsf@cisco.com> <adatzxwmady.fsf@cisco.com>
	<20070209072852.GP6560@mellanox.co.il>
Message-ID: <adahctvl4zx.fsf@cisco.com>

    Michael> What about the mthca memory registration patches?  I
    Michael> thought they are on their way. Should I repost?

Sorry, I forgot about that.  Yes, please resend the latest state.


From tom at opengridcomputing.com  Fri Feb  9 07:41:08 2007
From: tom at opengridcomputing.com (Tom Tucker)
Date: Fri, 09 Feb 2007 09:41:08 -0600
Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in
 cm_conn_req_handler()
In-Reply-To: <ada7iusm021.fsf@cisco.com>
References: <OFB574507E.6CAEE34E-ON6525727D.0013194D-6525727D.00132A8C@in.ibm.com>
	<ada7iusm021.fsf@cisco.com>
Message-ID: <1171035668.26453.11.camel@trinity.ogc.int>

Roland:

This looks bad. Lemme noodle...

On Thu, 2007-02-08 at 20:23 -0800, Roland Dreier wrote:
> BTW, while looking at iwcm.c, I noticed the following highly dubious
> code for the first time:
> 
> 	static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv)
> 	{
> 		int ret = 0;
> 	
> 		BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
> 		if (atomic_dec_and_test(&cm_id_priv->refcount)) {
> 			BUG_ON(!list_empty(&cm_id_priv->work_list));
> 			if (waitqueue_active(&cm_id_priv->destroy_comp.wait)) {
> 				BUG_ON(cm_id_priv->state != IW_CM_STATE_DESTROYING);
> 				BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY,
> 						&cm_id_priv->flags));
> 				ret = 1;
> 			}
> 			complete(&cm_id_priv->destroy_comp);
> 		}
> 	
> 		return ret;
> 	}
> 
> The test of waitqueue_active on destroy_comp.wait looks really bad for
> two reasons: first, it is relying on an internal implementation detail
> of struct completion that really shouldn't be used by generic code.
> And second, it seems to me that this doesn't even work right, since
> there is a race something like the following:
> 
> iw_destroy_cm_id():
> destroy_cm_id(cm_id); // still 1 ref left
> 
> 				cm_work_handler():
> 					if (iwcm_deref_id()) // drop last ref
> 						return;
> 					// no one waiting yet, doesn't
> 					// return, but destroy_comp is
> 					// signaled
> 
> wait_for_completion(&cm_id_priv->destroy_comp);
> // destroy_comp is signaled, proceed
> kfree(cm_id_priv);
> 
> 					// continue using cm_id_priv
> 					// OOPS
> 
> I don't understand this code well enough for the fix to be obvious.
> 
>  - R.
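
For illustration only, here is a userspace sketch of one conventional way to avoid depending on waitqueue_active(): the dereference reports "last reference dropped" purely from the atomic decrement, so the caller that drops the last reference always returns early, whether or not anyone is already waiting. The names struct obj, obj_init and obj_put are hypothetical stand-ins, C11 atomics replace the kernel atomic_t API, and this is not presented as the fix for iwcm.c.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct iwcm_id_private. */
struct obj {
	atomic_int refcount;
};

static void obj_init(struct obj *o, int refs)
{
	atomic_init(&o->refcount, refs);
}

/*
 * Drop one reference.  Returns true only for the caller that dropped the
 * last one; that caller must not touch *o afterwards unless it owns the
 * teardown.  The result depends solely on the atomic decrement, not on
 * whether somebody is already waiting for destruction.
 */
static bool obj_put(struct obj *o)
{
	/* atomic_fetch_sub returns the value held before the subtraction. */
	return atomic_fetch_sub(&o->refcount, 1) == 1;
}
```

In the interleaving described above, the work handler would see obj_put() return true and return immediately instead of continuing to use the object.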


From halr at voltaire.com  Fri Feb  9 07:39:01 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 09 Feb 2007 10:39:01 -0500
Subject: [openib-general] [PATCH] IPOIB: Use a GRH when appropriate
 forunicast packets
In-Reply-To: <C98692FD98048C41885E0B0FACD9DFB803AFA733@exnane01.hq.netapp.com>
References: <C98692FD98048C41885E0B0FACD9DFB803AFA733@exnane01.hq.netapp.com>
Message-ID: <1171035534.31538.166197.camel@hal.voltaire.com>

Arkady,

On Fri, 2007-02-09 at 10:32, Kanevsky, Arkady wrote:
> Hal,
> unfortunately, IBTA punted on this issue.
> We considered it for IBTA CM IP address annex but at the end
> could not handle all the cases.

Thanks.

Any idea if this issue might be addressed (no pun intended) or whether
it is left for implementors to decide if/how to try to handle this ?

-- Hal

> Thanks,
> 
> Arkady Kanevsky                       email: arkady at netapp.com
> Network Appliance Inc.               phone: 781-768-5395
> 1601 Trapelo Rd. - Suite 16.        Fax: 781-895-1195
> Waltham, MA 02451                   central phone: 781-768-5300
>  
> 
> > -----Original Message-----
> > From: Hal Rosenstock [mailto:halr at voltaire.com] 
> > Sent: Wednesday, February 07, 2007 7:20 PM
> > To: Sean Hefty
> > Cc: Jason Gunthorpe; Roland Dreier; 
> > openib-general at openib.org; Kanevsky, Arkady
> > Subject: Re: [openib-general] [PATCH] IPOIB: Use a GRH when 
> > appropriate forunicast packets
> > 
> > On Wed, 2007-02-07 at 15:24, Sean Hefty wrote:
> > > > I didn't get too far on getting CMA to work. Beyond the bad HopLimit
> > > > field I was seeing, Hal pointed out a number of problems in IBA that
> > > > would prevent it from working as is :<
> > > 
> > > I've started thinking about what it would take to get the rdma cm to
> > > work across a router.  I think the rdma cm may need to treat IPv6
> > > addresses as a GID for this to work across subnets, versus trying to
> > > map an ipoib IP address to a GID based on ARP.
> > 
> > An IB GID is IPv6 like but not an IPv6 address so I don't 
> > think this is a good idea and don't see how you get around 
> > mapping IP addresses to GIDs in an IB routed network given 
> > the way things are spec'd. I think that the RDMA CM assumes a 
> > single IPoIB subnet. Does it work when the destination is on 
> > another subnet ? I think there are some unaddressed gateway 
> > issues here to make that work and these may have been punted 
> > (during spec time). Arkady might be a good person to comment on this.
> > 
> > -- Hal
> > 
> > > - Sean
> > > 
> > 


From purdy at sgi.com  Fri Feb  9 08:05:16 2007
From: purdy at sgi.com (Dale Purdy)
Date: Fri, 9 Feb 2007 10:05:16 -0600
Subject: [openib-general] [PATCH] OpenSM/osm_ucast_lash.c: In
 osm_get_lash_sl, fix SL when CA ports on same switch
In-Reply-To: <1170944383.31538.74632.camel@hal.voltaire.com>
References: <1170944383.31538.74632.camel@hal.voltaire.com>
Message-ID: <Pine.SGI.4.58.0702091001590.47103@cantor.americas.sgi.com>

We have successfully tested this bug fix and would like to see it
pushed into the 1.2 branch.

Dale

On Thu, 8 Feb 2007, Hal Rosenstock wrote:

> OpenSM/osm_ucast_lash.c: In osm_get_lash_sl, fix SL when CA ports on same switch

This change resolves an issue with strange SL assignment when
two HCAs communicate with each other and are on the same switch.
Since LASH is switch-to-switch routing, the get_lash_sl
function was casting 9999 (the value assigned to the
variable NONE) to a uint8_t when asked for an SL assignment
in this case. This change resolves this issue.

Signed-off-by: Thomas Sødring <tsodring at simula.no>
Signed-off-by: Hal Rosenstock <halr at voltaire.com>

diff --git a/osm/opensm/osm_ucast_lash.c b/osm/opensm/osm_ucast_lash.c
index 5dfe068..e5f751c 100644
--- a/osm/opensm/osm_ucast_lash.c
+++ b/osm/opensm/osm_ucast_lash.c
@@ -1468,6 +1468,7 @@ uint8_t osm_get_lash_sl(osm_opensm_t *p_
 			osm_port_t *p_src_port, osm_port_t *p_dst_port)
 {
 	unsigned dst_id;
+	unsigned src_id;
 	osm_switch_t *p_sw;

 	if (p_osm->routing_engine.ucast_build_fwd_tables != lash_process)
@@ -1482,6 +1483,10 @@ uint8_t osm_get_lash_sl(osm_opensm_t *p_
 	if (!p_sw || !p_sw->priv)
 		return OSM_DEFAULT_SL;

+	src_id = get_lash_id(p_sw);
+	if (src_id == dst_id)
+		return OSM_DEFAULT_SL;
+
 	return (uint8_t)((switch_t *)p_sw->priv)->routing_table[dst_id].lane;
 }
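
As a side note on why the SL looked strange rather than obviously invalid: converting 9999 to uint8_t keeps only the low 8 bits, and 9999 mod 256 is 15, which happens to be a legal SL value. A minimal standalone sketch of that effect and of the same-switch guard follows; DEFAULT_SL and the helper names are hypothetical stand-ins (the value 0 for the default SL is an assumption), not the OpenSM definitions.

```c
#include <stdint.h>

/* Hypothetical stand-ins; the real OpenSM definitions may differ. */
#define NONE 9999        /* sentinel value named in the commit message */
#define DEFAULT_SL 0     /* assumed value of OSM_DEFAULT_SL */

/* Narrowing an int to uint8_t keeps only the low 8 bits. */
static uint8_t lane_to_sl_unchecked(int lane)
{
	return (uint8_t)lane;
}

/* Same-switch traffic has no LASH lane, so fall back to the default SL. */
static uint8_t lane_to_sl_checked(unsigned src_id, unsigned dst_id, int lane)
{
	if (src_id == dst_id)
		return DEFAULT_SL;
	return (uint8_t)lane;
}
```

With these definitions, lane_to_sl_unchecked(NONE) yields 15, while lane_to_sl_checked() with equal source and destination ids yields the default SL.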




From halr at voltaire.com  Fri Feb  9 08:22:52 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 09 Feb 2007 11:22:52 -0500
Subject: [openib-general] [PATCH] OpenSM/osm_ucast_lash.c: In
 osm_get_lash_sl, fix SL when CA ports on same switch
In-Reply-To: <Pine.SGI.4.58.0702091001590.47103@cantor.americas.sgi.com>
References: <1170944383.31538.74632.camel@hal.voltaire.com>
	<Pine.SGI.4.58.0702091001590.47103@cantor.americas.sgi.com>
Message-ID: <1171038146.31538.168863.camel@hal.voltaire.com>

On Fri, 2007-02-09 at 11:05, Dale Purdy wrote:
> We have successfully tested this bug fix

Thanks.

>  and would like to see it
> pushed into the 1.2 branch.

Already pushed for ofed_1_2. I will be sending a note to Vlad to pick
these up and it should be in alpha.

-- Hal

> Dale
> 
> On Thu, 8 Feb 2007, Hal Rosenstock wrote:
> 
> > OpenSM/osm_ucast_lash.c: In osm_get_lash_sl, fix SL when CA ports on same switch
> 
> This change resolves an issue with strange SL assignment when
> two HCAs communicate with each other and are on the same switch.
> Since LASH is switch-to-switch routing, the get_lash_sl
> function was casting 9999 (the value assigned to the
> variable NONE) to a uint8_t when asked for an SL assignment
> in this case. This change resolves this issue.
> 
> Signed-off-by: Thomas Sødring <tsodring at simula.no>
> Signed-off-by: Hal Rosenstock <halr at voltaire.com>
> 
> diff --git a/osm/opensm/osm_ucast_lash.c b/osm/opensm/osm_ucast_lash.c
> index 5dfe068..e5f751c 100644
> --- a/osm/opensm/osm_ucast_lash.c
> +++ b/osm/opensm/osm_ucast_lash.c
> @@ -1468,6 +1468,7 @@ uint8_t osm_get_lash_sl(osm_opensm_t *p_
>  			osm_port_t *p_src_port, osm_port_t *p_dst_port)
>  {
>  	unsigned dst_id;
> +	unsigned src_id;
>  	osm_switch_t *p_sw;
> 
>  	if (p_osm->routing_engine.ucast_build_fwd_tables != lash_process)
> @@ -1482,6 +1483,10 @@ uint8_t osm_get_lash_sl(osm_opensm_t *p_
>  	if (!p_sw || !p_sw->priv)
>  		return OSM_DEFAULT_SL;
> 
> +	src_id = get_lash_id(p_sw);
> +	if (src_id == dst_id)
> +		return OSM_DEFAULT_SL;
> +
>  	return (uint8_t)((switch_t *)p_sw->priv)->routing_table[dst_id].lane;
>  }
> 
> 
> 


From swise at opengridcomputing.com  Fri Feb  9 08:49:58 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 09 Feb 2007 10:49:58 -0600
Subject: [openib-general] [PATCH] for-2.6.21 Declare iwch_ev_dispatch in
	iwch.h
Message-ID: <1171039798.4896.49.camel@stevo-desktop>

Declare iwch_ev_dispatch in iwch.h

Remove the extern declaration from iwch.c and put it in iwch.h

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/iwch.c |    2 --
 drivers/infiniband/hw/cxgb3/iwch.h |    2 ++
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch.c b/drivers/infiniband/hw/cxgb3/iwch.c
index c353a9b..4611afa 100644
--- a/drivers/infiniband/hw/cxgb3/iwch.c
+++ b/drivers/infiniband/hw/cxgb3/iwch.c
@@ -162,8 +162,6 @@ static void close_rnic_dev(struct t3cdev
 	mutex_unlock(&dev_mutex);
 }
 
-extern void iwch_ev_dispatch(struct cxio_rdev *rdev_p, struct sk_buff *skb);
-
 static int __init iwch_init_module(void)
 {
 	int err;
diff --git a/drivers/infiniband/hw/cxgb3/iwch.h b/drivers/infiniband/hw/cxgb3/iwch.h
index 29cf2e8..6517ef8 100644
--- a/drivers/infiniband/hw/cxgb3/iwch.h
+++ b/drivers/infiniband/hw/cxgb3/iwch.h
@@ -172,4 +172,6 @@ static inline void remove_handle(struct 
 
 extern struct cxgb3_client t3c_client;
 extern cxgb3_cpl_handler_func t3c_handlers[NUM_CPL_CMDS];
+extern void iwch_ev_dispatch(struct cxio_rdev *rdev_p, struct sk_buff *skb);
+
 #endif


From mshefty at ichips.intel.com  Fri Feb  9 09:09:05 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 09 Feb 2007 09:09:05 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <20070209043727.GN11411@obsidianresearch.com>
References: <20070126180840.GD12386@obsidianresearch.com>
	<45CA2084.7090503@ichips.intel.com>
	<20070207191154.GC11411@obsidianresearch.com>
	<45CA3568.1000508@ichips.intel.com>
	<20070207213108.GD11411@obsidianresearch.com>
	<45CA5573.80802@ichips.intel.com>
	<20070207224928.GF11411@obsidianresearch.com>
	<1170894459.31538.23768.camel@hal.voltaire.com>
	<45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com>
	<20070209043727.GN11411@obsidianresearch.com>
Message-ID: <45CCAAB1.7000103@ichips.intel.com>

> I have a follow up question to this.. With CM how is the SL for each
> side determined? I'm looking through the code here and it looks like
> the SL of the active side is passed in the REQ to the passive side (ie
> both sides are the same) But cma_query_ib_route does not set the
> reversible bit when it asks for the path. If you don't set the
> reversible bit isn't it necessary to make a 2nd path query to get the
> reverse path's SL? [Path responses without the reversible bit set
> are actually simplex paths and reversing them probably will run into
> SL2VL mapping tables that cause the packets to be dropped ie o7-8]

Complete support for non-reversible paths is missing.  It would take some 
additional work to add this in, and would likely require API changes. 
(Personally, I would like to keep ignoring this until it becomes an issue.)  For 
now, the CMA should at least set the reversible bit for its query.

I don't know that the reversible bit in a path record can really apply across 
subnets.

- Sean


From michael.arndt at informatik.tu-chemnitz.de  Fri Feb  9 09:14:49 2007
From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt)
Date: Fri, 9 Feb 2007 18:14:49 +0100
Subject: [openib-general] Unknown SMP Recv
References: <000901c74938$e10b2a30$21606d86@one7>
	<1170689654.4525.201415.camel@hal.voltaire.com>
	<001401c74946$a664a2e0$21606d86@one7>
	<1170695591.4525.207604.camel@hal.voltaire.com>
	<002001c74a33$c2ec1db0$21606d86@one7>
	<1170807564.4525.324195.camel@hal.voltaire.com>
	<001e01c74be2$b4889310$21606d86@one7>
	<1170994529.31538.124584.camel@hal.voltaire.com>
Message-ID: <000401c74c6d$ce4875f0$21606d86@one7>

Hi,

> umad_send takes the timeout in msec. 100 msec is too short. Try
> something on the order of seconds. Note also that negative 'timeout_ms'
> value makes the kernel wait for the reply forever.

I have tried many values, but sooner or later umad_send fails,
which is bad because the SM thinks a port or node is unreachable if no
response arrives. Everything works fine if I sleep after every send, but that
can't be the right approach. What can I do, or is there a known bug in
libibumad that I have missed?

I modified the __osm_state_mgr_sweep_hop_1 function so that it sends not just
one packet with [0][1] but also [0][1][1] and [0][1][1][1]. There, too, it
sometimes happens that a packet is not sent. I am puzzled, because when the
SM gets all PortInfos from a switch there are many sends as well, but that
seems to work.

Is the packet size for the umad_send 256 max?

Thanks Michael


From mshefty at ichips.intel.com  Fri Feb  9 09:22:15 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 09 Feb 2007 09:22:15 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <1171023168.31538.153989.camel@hal.voltaire.com>
References: <20070126000319.GA12386@obsidianresearch.com>
	<ada4pqel66k.fsf@cisco.com>
	<20070126180840.GD12386@obsidianresearch.com>
	<45CA2084.7090503@ichips.intel.com>
	<20070207191154.GC11411@obsidianresearch.com>
	<45CA3568.1000508@ichips.intel.com>
	<20070207213108.GD11411@obsidianresearch.com>
	<45CA5573.80802@ichips.intel.com>
	<20070207224928.GF11411@obsidianresearch.com>
	<1170894459.31538.23768.camel@hal.voltaire.com>
	<45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com>
	<1171023168.31538.153989.camel@hal.voltaire.com>
Message-ID: <45CCADC7.5000804@ichips.intel.com>

> SLID corresponding to SGID and a DLID for some IB router on the subnet
> which can route to the remote DGID.

This was my assumption as well.

> An SM is free to choose SLID and DLID to supply to if there are multiple
> LIDs for the ports in question it can choose alternates. The key here is
> whether a reversible path has been requested or not. It is also not
> clear what reversible means in the context of an IB internetwork
> (multiple IB subnets interconnected by IB routers).

For simplicity, assume a single path.  My assumption in this case was that the 
SLID/DLID values would be reversed.  That is, the LIDs are relative to the local 
subnet, not the SGID.  But if I set the SGID = DGID = remote GID, then the LIDs 
would be relative to the remote subnet.  (Assuming that the local SA could 
support such a query at all.)

It seems that in order to meet the requirements of the spec, we need a way to 
perform inter-subnet queries.  (The alternative being to change the spec...) 
And if the local SA can return a path record to a remote DGID, then it also 
seems like the local SA must be able to collect some sort of information about 
the path to the remote subnet.  (How it does this seems TBD.)

So... I'm thinking that the solution to these problems should rest within the 
local SA...

- Sean


From mshefty at ichips.intel.com  Fri Feb  9 10:01:03 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 09 Feb 2007 10:01:03 -0800
Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast
 support
In-Reply-To: <20070209080418.GQ6560@mellanox.co.il>
References: <000701c741a6$16dc4760$ff0da8c0@amr.corp.intel.com>
	<45BF8E17.2010805@ichips.intel.com> <45C37BE9.5040105@ichips.intel.com>
	<45C85B39.4080700@voltaire.com> <45CB3537.8060508@voltaire.com>
	<adaodo4nqz7.fsf@cisco.com> <20070209080418.GQ6560@mellanox.co.il>
Message-ID: <45CCB6DF.3020602@ichips.intel.com>

> +       member = kzalloc(sizeof *member, gfp_mask);
> +       if (!member)
> +               return ERR_PTR(-ENOMEM);

This appears okay to replace with kmalloc.

> +       group = kzalloc(sizeof *group, gfp_mask);
> +       if (!group)
> +               return NULL;
> +

We would need additional initialization code to clear the members array and
set the state and last_join fields.

> and same here:
> 
> +       iter = kzalloc(sizeof *iter + attr_size, GFP_KERNEL);
> +       if (!iter)
> +               return ERR_PTR(-ENOMEM);

I think this is coming from the local SA cache patch, which isn't part of this 
pull request.

> +
> 
> It seems same goes for
> 
> +       mc = kzalloc(sizeof(*mc), GFP_KERNEL);
> +       if (!mc)
> +               return NULL;

We would need to set events_reported.

> +       bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL);
> +       if (!bind_list)
> +               return -ENOMEM;

This looks like it can be replaced with kmalloc.
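
To make the trade-off concrete, here is a userspace sketch using calloc/malloc in place of kzalloc/kmalloc. The struct layout, the array size and the payload field are hypothetical; only the field names members, state and last_join come from the discussion above. The point is that dropping the zeroing allocator is safe only if every field that is read before being written gets an explicit initializer.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_STATES 4	/* hypothetical array size */

/* Hypothetical stand-in; field names follow the message, types are guesses. */
struct group {
	void *members[MAX_STATES];
	int state;
	unsigned long last_join;
	char payload[64];	/* assumed to be fully overwritten by the caller */
};

/* Zeroing allocator: every field starts at 0, as with kzalloc(). */
static struct group *group_alloc_zeroed(void)
{
	return calloc(1, sizeof(struct group));
}

/* Non-zeroing allocator: fields read before being written are set by hand. */
static struct group *group_alloc_explicit(void)
{
	struct group *g = malloc(sizeof *g);

	if (!g)
		return NULL;
	memset(g->members, 0, sizeof g->members);
	g->state = 0;
	g->last_join = 0;
	return g;
}
```

Whether the explicit version is worth it depends on how much of the structure really needs zeroing; for small structures the difference is likely negligible.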

Roland, let me know how you'd like to handle any changes.

- Sean


From halr at voltaire.com  Fri Feb  9 09:58:51 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 09 Feb 2007 12:58:51 -0500
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <45CCADC7.5000804@ichips.intel.com>
References: <20070126000319.GA12386@obsidianresearch.com>
	<ada4pqel66k.fsf@cisco.com>
	<20070126180840.GD12386@obsidianresearch.com>
	<45CA2084.7090503@ichips.intel.com>
	<20070207191154.GC11411@obsidianresearch.com>
	<45CA3568.1000508@ichips.intel.com>
	<20070207213108.GD11411@obsidianresearch.com>
	<45CA5573.80802@ichips.intel.com>
	<20070207224928.GF11411@obsidianresearch.com>
	<1170894459.31538.23768.camel@hal.voltaire.com>
	<45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com>
	<1171023168.31538.153989.camel@hal.voltaire.com>
	<45CCADC7.5000804@ichips.intel.com>
Message-ID: <1171043929.31538.174521.camel@hal.voltaire.com>

On Fri, 2007-02-09 at 12:22, Sean Hefty wrote:
> > SLID corresponding to SGID and a DLID for some IB router on the subnet
> > which can route to the remote DGID.
> 
> This was my assumption as well.
> 
> > An SM is free to choose SLID and DLID to supply to if there are multiple
> > LIDs for the ports in question it can choose alternates. The key here is
> > whether a reversible path has been requested or not. It is also not
> > clear what reversible means in the context of an IB internetwork
> > (multiple IB subnets interconnected by IB routers).
> 
> For simplicity, assume a single path.  My assumption in this case was that the 
> SLID/DLID values would be reversed.  That is, the LIDs are relative to the local 
> subnet, not the SGID.  But if I set the SGID = DGID = remote GID, then the LIDs 
> would be relative to the remote subnet.  (Assuming that the local SA could 
> support such a query at all.)
> 
> It seems that in order to meet the requirements of the spec, we need a way to 
> perform inter-subnet queries.  (The alternative being to change the spec...) 
> And if the local SA can return a path record to a remote DGID, then it also 
> seems like the local SA must be able to collect some sort of information about 
> the path to the remote subnet.  (How it does this seems TBD.)
> 
> So... I'm thinking that the solution to these problems should rest within the 
> local SA...

Yes, this seems most consistent with what is there now although there
are some issues to work out on how some of the fields are supported and
which queries would work intersubnet (as well as how they would work).

-- Hal

> - Sean


From halr at voltaire.com  Fri Feb  9 10:12:56 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 09 Feb 2007 13:12:56 -0500
Subject: [openib-general] Unknown SMP Recv
In-Reply-To: <000401c74c6d$ce4875f0$21606d86@one7>
References: <000901c74938$e10b2a30$21606d86@one7>
	<1170689654.4525.201415.camel@hal.voltaire.com>
	<001401c74946$a664a2e0$21606d86@one7>
	<1170695591.4525.207604.camel@hal.voltaire.com>
	<002001c74a33$c2ec1db0$21606d86@one7>
	<1170807564.4525.324195.camel@hal.voltaire.com>
	<001e01c74be2$b4889310$21606d86@one7>
	<1170994529.31538.124584.camel@hal.voltaire.com>
	<000401c74c6d$ce4875f0$21606d86@one7>
Message-ID: <1171044773.31538.175280.camel@hal.voltaire.com>

On Fri, 2007-02-09 at 12:14, Michael Arndt wrote:
> Hi,
> 
>  > umad_send takes the timeout in msec. 100 msec is too short. Try
> > something on the order of seconds. Note also that negative 'timeout_ms'
> > value makes the kernel wait for the reply forever.
> 
> I have tried many values, but sooner or later the umad_send broke down, 
> which is bad because the SM thinks a port or node is unreachable if there 
> didn't come an response. All works fine if I sleep after every send but that 
> can't be the right track. What can I do or is there a known bug in the 
> libibumad that I have slipped?

I have no clue; I don't really understand what you have changed so it is
hard to know.

> I modified the __osm_state_mgr_sweep_hop_1 function so it send not just one 
> packets with [0][1] but also [0][1][1], [0][1][1][1].

I don't understand what you are trying to do and the scope of your
changes.

>  Any there it happens too that one packet is not be sent sometimes.

I can't parse this sentence.

> I'm wondering because if the SM get all PortInfos from an switch 
> there are be many sends too, but it seems be that it works.

Yes, the SM will poll for each port on a switch for its PortInfo and
each of these is a separate SubnGet.

> Is the packet size for the umad_send 256 max?

It depends on the MAD type. SMPs are limited to a single MAD (256 bytes)
whereas GMPs can be larger if the class supports RMPP (as SA does).

-- Hal

> Thanks Michael
> 
> 
> 
> 
> 


From michael.arndt at informatik.tu-chemnitz.de  Fri Feb  9 10:38:12 2007
From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt)
Date: Fri, 9 Feb 2007 19:38:12 +0100
Subject: [openib-general] Unknown SMP Recv
References: <000901c74938$e10b2a30$21606d86@one7>
	<1170689654.4525.201415.camel@hal.voltaire.com>
	<001401c74946$a664a2e0$21606d86@one7>
	<1170695591.4525.207604.camel@hal.voltaire.com>
	<002001c74a33$c2ec1db0$21606d86@one7>
	<1170807564.4525.324195.camel@hal.voltaire.com>
	<001e01c74be2$b4889310$21606d86@one7>
	<1170994529.31538.124584.camel@hal.voltaire.com>
	<000401c74c6d$ce4875f0$21606d86@one7>
	<1171044773.31538.175280.camel@hal.voltaire.com>
Message-ID: <000401c74c79$74439b50$21606d86@one7>

Hi,

> I have no clue; I don't really understand what you have changed so it is
> hard to know.

For example: if I send ten SMPs like:

    for (i=0;i<10;i++){
        umad_send(portid, agentid, msg, len, timeout, repeats);
    }

(with timeout > 0), then only the first one is sent and all the other
umad_send calls return -5.


Thanks Michael


From jsquyres at cisco.com  Fri Feb  9 10:38:04 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Fri, 9 Feb 2007 13:38:04 -0500
Subject: [openib-general] Open MPI rpmbuild fails in OFED-1.2
In-Reply-To: <1170866522.6223.8.camel@vladsk-laptop>
References: <1170866522.6223.8.camel@vladsk-laptop>
Message-ID: <7CDAEF93-7E07-45CE-9D66-99F3ED98405B@cisco.com>

New SRPM on server that munges the %build section into the %install  
section.

Yuck.  :-)


On Feb 7, 2007, at 11:42 AM, Vladimir Sokolovsky wrote:

> Hi Jeff,
> Please remove %build macro from the RPM spec file.
> On SuSE distros it removes RPM_BUILD_ROOT.
>
> Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.23343
> + umask 022
> + cd /var/tmp/OFEDRPM/BUILD
> + /bin/rm -rf /var/tmp/OFED
> ++ dirname /var/tmp/OFED
> + /bin/mkdir -p /var/tmp
> + /bin/mkdir /var/tmp/OFED
> + cd openmpi-1.2b4ofedr13470
> + fortify_source=1
> + test '' '!=' ''
> ...
>
> -- 
> Vladimir Sokolovsky <vlad at dev.mellanox.co.il>
> Mellanox Technologies Ltd.


-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


From sashak at voltaire.com  Fri Feb  9 11:04:18 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Fri, 9 Feb 2007 21:04:18 +0200
Subject: [openib-general] Unknown SMP Recv
In-Reply-To: <000401c74c79$74439b50$21606d86@one7>
References: <000901c74938$e10b2a30$21606d86@one7>
	<1170689654.4525.201415.camel@hal.voltaire.com>
	<001401c74946$a664a2e0$21606d86@one7>
	<1170695591.4525.207604.camel@hal.voltaire.com>
	<002001c74a33$c2ec1db0$21606d86@one7>
	<1170807564.4525.324195.camel@hal.voltaire.com>
	<001e01c74be2$b4889310$21606d86@one7>
	<1170994529.31538.124584.camel@hal.voltaire.com>
	<000401c74c6d$ce4875f0$21606d86@one7>
	<1171044773.31538.175280.camel@hal.voltaire.com>
	<000401c74c79$74439b50$21606d86@one7>
Message-ID: <1171051141.2767.7.camel@localhost>

Hi Michael,

On Fri, 2007-02-09 at 19:38 +0100, Michael Arndt wrote:
> Hi,
> 
> > I have no clue; I don't really understand what you have changed so it is
> > hard to know.
> 
> For example: if I send ten SMPs like:
> 
>     for (i=0;i<10;i++){
>         umad_send(portid, agentid, msg, len, timeout, repeats);
>     }
> 
>     timeout > 0!
> than only the first one is sent and all other umad_send calls returning 
> with -5.

That is strange; I did a similar thing (see
management/diags/src/mcm_rereg_test.c) and it worked fine for me.

Which libibumad version are you using? Also, I understand you made some
changes in the stack; are they related to user_mad? Could you publish them?

Sasha


From halr at voltaire.com  Fri Feb  9 11:03:07 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 09 Feb 2007 14:03:07 -0500
Subject: [openib-general] Unknown SMP Recv
In-Reply-To: <000401c74c79$74439b50$21606d86@one7>
References: <000901c74938$e10b2a30$21606d86@one7>
	<1170689654.4525.201415.camel@hal.voltaire.com>
	<001401c74946$a664a2e0$21606d86@one7>
	<1170695591.4525.207604.camel@hal.voltaire.com>
	<002001c74a33$c2ec1db0$21606d86@one7>
	<1170807564.4525.324195.camel@hal.voltaire.com>
	<001e01c74be2$b4889310$21606d86@one7>
	<1170994529.31538.124584.camel@hal.voltaire.com>
	<000401c74c6d$ce4875f0$21606d86@one7>
	<1171044773.31538.175280.camel@hal.voltaire.com>
	<000401c74c79$74439b50$21606d86@one7>
Message-ID: <1171047785.31538.178263.camel@hal.voltaire.com>

On Fri, 2007-02-09 at 13:38, Michael Arndt wrote:
> Hi,
> 
> > I have no clue; I don't really understand what you have changed so it is
> > hard to know.
> 
> For example: if I send ten SMPs like:
> 
>     for (i=0;i<10;i++){
>         umad_send(portid, agentid, msg, len, timeout, repeats);
>     }
> 
>     timeout > 0!
> than only the first one is sent and all other umad_send calls returning 
> with -5.

-5 is -EIO.

For some reason, umad_send returns this after the write to the fd that
passes the send to the user_mad kernel module:

        n = write(port->dev_fd, mad, length + sizeof *mad);
        if (n == length + sizeof *mad)
                return 0;

        DEBUG("write returned %d != sizeof umad %zu + length %d (%m)",
              n, sizeof *mad, length);
        if (!errno)
                errno = EIO;
        return -EIO;

I have no clue as to why subsequent (non first) writes are failing to
write the proper amount of data. Do you have or can you create a simple
test program to demonstrate this ?
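
As a starting point for such a test, the check above can be reproduced outside libibumad with plain POSIX write(). The helper name write_exact_or_eio is hypothetical; it mirrors the quoted logic in that anything other than a full-length write is reported as -EIO, while errno keeps whatever reason write() itself recorded.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper mirroring the check quoted above: anything other than
 * a full-length write is reported as -EIO.  errno is only overwritten when
 * it is still zero, so a reason set by write() itself is preserved.
 */
static int write_exact_or_eio(int fd, const void *buf, size_t len)
{
	ssize_t n = write(fd, buf, len);

	if (n == (ssize_t)len)
		return 0;
	if (!errno)
		errno = EIO;
	return -EIO;
}
```

Because the return value is always -EIO on failure, printing errno right after a failing call is probably the quickest way to see the underlying reason.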

-- Hal

> Thanks Michael


From changquing.tang at hp.com  Fri Feb  9 11:11:04 2007
From: changquing.tang at hp.com (Tang, Changqing)
Date: Fri, 9 Feb 2007 19:11:04 -0000
Subject: [openib-general] Immediate data question
In-Reply-To: <6.2.0.14.2.20070208131610.085c52f8@esmail.cup.hp.com>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
	<adaveigvg7q.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net>
	<adatzy0qmt3.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net>
	<ada7iuwp5rr.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net>
	<adamz3pfym0.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net>
	<adahctxeds8.fsf@cisco.com>
	<6.2.0.14.2.20070208131610.085c52f8@esmail.cup.hp.com>
Message-ID: <349DCDA352EACF42A0C49FA6DCEA840362A87E@G3W0634.americas.hpqcorp.net>

> >
> >Not for the receiver, but the sender will be severely slowed down by 
> >having to wait for the RNR timeouts.
> 
> RNR = Receiver Not Ready so by definition, the data flow 
> isn't going to 
> progress until the receiver is ready to receive data.   If a 
> receive QP 
> enters RNR for a RC, then it is likely not progressing as 
> desired.   RNR 
> was initially put in place to enable a receiver to create 
> back pressure to the sender without causing a fatal error 
> condition.  It should rarely be entered and therefore should 
> have negligible impact on overall performance however when a 
> RNR occurs, no forward progress will occur so performance is 
> essentially zero.

Mike:
	I still do not quite understand this issue. I have two
situations that have RNR triggered.

1. Process A and process B are connected with a QP. A first posts a send
to B, but B does not post a receive. Then A and B do RDMA_WRITEs to each
other for a long time, each just checking memory for the RDMA_WRITE
message. Finally B posts a receive. Does the first pending send in A
block all the later RDMA_WRITEs? If not, since RNR is triggered
periodically until B posts the receive, does it affect the RDMA_WRITE
performance between A and B?

2. Extend the above to three processes: A connects to B and B connects
to C, so B has two QPs but one CQ. A posts a send to B, and B does not
post a receive; instead B and C do RDMA_WRITEs, or send/recv, for a long
time. But B must send RNR periodically to A, right? So does the pending
message from A affect B's overall performance between B and C?

	Thank you.

--CQ


> 
> Mike 
> 
> 
> 


From jgunthorpe at obsidianresearch.com  Fri Feb  9 11:20:46 2007
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Fri, 9 Feb 2007 12:20:46 -0700
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <1171043929.31538.174521.camel@hal.voltaire.com>
References: <45CA3568.1000508@ichips.intel.com>
	<20070207213108.GD11411@obsidianresearch.com>
	<45CA5573.80802@ichips.intel.com>
	<20070207224928.GF11411@obsidianresearch.com>
	<1170894459.31538.23768.camel@hal.voltaire.com>
	<45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com>
	<1171023168.31538.153989.camel@hal.voltaire.com>
	<45CCADC7.5000804@ichips.intel.com>
	<1171043929.31538.174521.camel@hal.voltaire.com>
Message-ID: <20070209192046.GP11411@obsidianresearch.com>

On Fri, Feb 09, 2007 at 12:58:51PM -0500, Hal Rosenstock wrote:
> > For simplicity, assume a single path.  My assumption in this case was that the 
> > SLID/DLID values would be reversed.  That is, the LIDs are relative to the local 
> > subnet, not the SGID.  But if I set the SGID = DGID = remote GID, then the LIDs 
> > would be relative to the remote subnet.  (Assuming that the local SA could 
> > support such a query at all.)
> > 
> > It seems that in order to meet the requirements of the spec, we need a way to 
> > perform inter-subnet queries.  (The alternative being to change the spec...) 
> > And if the local SA can return a path record to a remote DGID, then it also 
> > seems like the local SA must be able to collect some sort of information about 
> > the path to the remote subnet.  (How it does this seems TBD.)
> > 
> > So... I'm thinking that the solution to these problems should rest within the 
> > local SA...
> 
> Yes, this seems most consistent with what is there now although there
> are some issues to work out on how some of the fields are supported and
> which queries would work intersubnet (as well as how they would work).

I agree, some kind of inter subnet query will have to be used to make
this work consistently with the rest of IBA.

It looks to me like we overall need to have this look like:
- Routers need to be able to support inter-subnet reversible paths
  to meet the requirements for CM.
  - Inter-subnet reversible paths are defined to mean that when the LRH
    is selected on the destination subnet by the router it is reversible.
  - This can be signaled by using TClass and/or FlowLabel fields in the GRH.
- Routers need to be able to produce knowable SLIDs to meet the QP LID
  matching requirement
  - The LID to use can be signaled by using TClass and/or FlowLabel
- A kind of inter-subnet path record query is needed that can
  return a local and remote GRH and LRH. These four structures need to
  be *linked* so that:
   - Side A GRH.SGID = active side's Port GID
   - Side A GRH.DGID = passive side's Port GID
   - Side A LRH.SLID = any active side's port LID
   - Side A LRH.DLID = A subnet router
   - Side A LRH.SL   = SL to A subnet router

   - Side B GRH.SGID = Side A GRH.DGID
   - Side B GRH.DGID = Side A GRH.SGID
   - Side B LRH.SLID = any passive side's port LID
   - Side B LRH.DLID = B subnet router
   - Side B LRH.SL   = SL to B subnet router
   
   - When the A subnet router sees Side B GRH it produces
      LRH.SLID = Side A LRH.DLID
      LRH.DLID = Side A LRH.SLID
      LRH.SL   = SL to Side A Active side (may be != to Side A LRH.SL)
   - Similarly for Side B.      

   This linkage requirement is necessary due to the QP LID matching
   rules. I'm imagining that like SL the GRH.TClass and GRH.FlowLabel
   could be different in each direction.

   I'd think of this query as a generic duplex PathRecord query.
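
   As a rough sketch of the linkage described above: the struct and
   function names below are made up for illustration and do not exist
   in any current SA or verbs interface. The first helper checks the GRH
   mirroring between the two sides, the second builds the LRH the A
   subnet router would have to produce when it sees Side B's GRH.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical types for illustration only. */
struct sketch_grh { uint8_t sgid[16]; uint8_t dgid[16]; };
struct sketch_lrh { uint16_t slid; uint16_t dlid; uint8_t sl; };

/* One direction of the duplex path: the headers that side sends with. */
struct sketch_side { struct sketch_grh grh; struct sketch_lrh lrh; };

struct sketch_duplex_path { struct sketch_side a; struct sketch_side b; };

/* True when Side B's GRH is Side A's GRH with SGID and DGID swapped. */
static bool duplex_grh_linked(const struct sketch_duplex_path *p)
{
    assert(p != NULL);

    return memcmp(p->b.grh.sgid, p->a.grh.dgid, 16) == 0 &&
           memcmp(p->b.grh.dgid, p->a.grh.sgid, 16) == 0;
}

/*
 * LRH the A subnet router would produce on seeing Side B's GRH:
 * Side A's LIDs reversed, with an SL picked by the router
 * (sl_to_active), which may differ from Side A's own LRH.SL.
 */
static struct sketch_lrh router_lrh_toward_a(const struct sketch_side *a,
                                             uint8_t sl_to_active)
{
    struct sketch_lrh lrh;

    assert(a != NULL);

    lrh.slid = a->lrh.dlid;
    lrh.dlid = a->lrh.slid;
    lrh.sl   = sl_to_active;
    return lrh;
}
```

   The same pair of checks would apply with the roles of A and B
   swapped.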

   Offhand I don't see that the existing path record query structure
   has enough information to do this. In particular, in cases
   where each subnet has more than 1 router port there is no real
   guarantee that querying for the SGID -> DGID direction and then the
   DGID -> SGID direction uses the same router ports without providing
   both router LIDs as part of the query.

Whatever responds to this query must be interacting with the router(s)
to ensure they recognize the GRHs and produce LRHs to meet all the
above requirements.

** The hackish and simple thing to do right now is to just demand that
   routers *always* use reversible LRHs with a single SLID and have the
   passive side pick up the QP LIDs from the LRH if it is routed.

Jason


From halr at voltaire.com  Fri Feb  9 11:47:23 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 09 Feb 2007 14:47:23 -0500
Subject: [openib-general] patches to 2.6.19.1 kernel for switch Operation
In-Reply-To: <039701c7494b$6bd5d860$1914a8c0@surioffice>
References: <000601c7419f$d4470c60$ff0da8c0@amr.corp.intel.com>
	<1170072757.4555.242192.camel@hal.voltaire.com>
	<039701c7494b$6bd5d860$1914a8c0@surioffice>
Message-ID: <1171050441.31538.180858.camel@hal.voltaire.com>

Suri,

On Mon, 2007-02-05 at 12:31, Suresh Shelvapille wrote:
> Hal:
> 
> We are upgrading to 2.6.19.1 kernel and I finally ported the changes
> required for Switch operation from my current kernel (2.6.12) version. 
> 
> I have tested these changes for a switch with different SM(s). But I need
> the community's help to test the changes on different HCAs to make sure I
> have not broken anything.
> 
> Please see if the changes look OK.

Here are my initial comments on these patches based only on code
inspection:

mad.c:
@@ -1871,24 +1877,49 @@
...
        if (recv->mad.mad.mad_hdr.mgmt_class ==
            IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) {
-               if (!smi_handle_dr_smp_recv(&recv->mad.smp,
-                                           port_priv->device->node_type,
-                                           port_priv->port_num,
-                                           port_priv->device->phys_port_cnt))
-                       goto out;
-               if (!smi_check_forward_dr_smp(&recv->mad.smp))
-                       goto local;
-               if (!smi_handle_dr_smp_send(&recv->mad.smp,
-                                           port_priv->device->node_type,
-                                           port_priv->port_num))
+
+               int retsmi;
+
+               retsmi = smi_handle_dr_smp_recv(&recv->mad.smp,
+                                               port_priv->device->node_type,
+                                               port_num,
+                                               port_priv->device->phys_port_cnt);
+               if (!retsmi)
                        goto out;
-               if (!smi_check_local_smp(&recv->mad.smp, port_priv->device))
+               else if (retsmi == 2) {
+                       if (!response) {
+                               printk(KERN_ERR PFX "No memory for forwarded MAD\n");
+                               goto out;
+                       }
+                       memcpy(response, recv, sizeof(*response));
+                       response->header.recv_wc.wc = &response->header.wc;
+                       response->header.recv_wc.recv_buf.mad = &response->mad.mad;
+                       response->header.recv_wc.recv_buf.grh = &response->grh;
+
+                       /* in case of forward, output port should be the one
+                        * in either the Initial path(for outgoing) or return_path(return)
+                        */
+                       if (!ib_get_smp_direction(&recv->mad.smp))
+                               port_num = recv->mad.smp.initial_path[recv->mad.smp.hop_ptr+1];
+                       else
+                               port_num = recv->mad.smp.return_path[recv->mad.smp.hop_ptr-1];
+
+                       if (!agent_send_response(&response->mad.mad, 
+                                                &response->grh, wc, 
+                                                port_priv->device,
+                                                port_num,
+                                                qp_info->qp->qp_num))
+                               response = NULL;

Per the above change, it appears that smi_check_forward_dr_smp and
smi_handle_dr_smp_send are no longer used at least here
(smi_check_forward_dr_smp is not used at all with this change). Couldn't
these be fixed to do the right thing for this case (as well as existing
cases) ? I'm not sure your changes work for end ports (CA and router
ports).

Also, based on smi comments below, there might also be changes to
following:
+                       if (!ib_get_smp_direction(&recv->mad.smp))
+                               port_num = recv->mad.smp.initial_path[recv->mad.smp.hop_ptr+1];
+                       else
+                               port_num = recv->mad.smp.return_path[recv->mad.smp.hop_ptr-1];
+

smi.c:
@@ -147,13 +147,18 @@
...
 
                /* C14-9:3 -- We're at the end of the DR segment of path */
                if (hop_ptr == hop_cnt) {
                        if (hop_cnt)
                                smp->return_path[hop_ptr] = port_num;
+                       smp->hop_ptr++;
+
                        /* smp->hop_ptr updated when sending */

The comment indicates that hop_ptr should be updated when sending, not
here. Can't this be done ?

@@ -168,8 +173,8 @@
 
                /* C14-13:1 */
                if (hop_cnt && hop_ptr == hop_cnt + 1) {
-                       smp->hop_ptr--;
-                       return (smp->return_path[smp->hop_ptr] ==
+                       /* smp->hop_ptr--;*/
+                       return (smp->return_path[smp->hop_ptr-1] ==
                                port_num);
                }
This change affects more than switches as now the hop_ptr is not correct
per SMI. I think this also should be handled differently.

agent.c: 
@@ -113,6 +119,11 @@
 
        memcpy(send_buf->mad, mad, sizeof *mad);
        send_buf->ah = ah;
+       mad_send_wr = container_of(send_buf,
+                                  struct ib_mad_send_wr_private,
+                                  send_buf);
+       mad_send_wr->send_wr.wr.ud.port_num = port_num;
+       
        if ((ret = ib_post_send_mad(send_buf, NULL))) {

Shouldn't this only be for switches ? Not sure it causes a problem for
other than switches, but I think would be more consistent with the
current code. So I think this change should be surrounded by:
if (device->node_type == RDMA_NODE_IB_SWITCH) {
...
}

-- Hal

> Thanks,
> Suri


From halr at voltaire.com  Fri Feb  9 12:01:59 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 09 Feb 2007 15:01:59 -0500
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <45CCD1E2.5050806@ichips.intel.com>
References: <45CA3568.1000508@ichips.intel.com>
	<20070207213108.GD11411@obsidianresearch.com>
	<45CA5573.80802@ichips.intel.com>
	<20070207224928.GF11411@obsidianresearch.com>
	<1170894459.31538.23768.camel@hal.voltaire.com>
	<45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com>
	<1171023168.31538.153989.camel@hal.voltaire.com>
	<45CCADC7.5000804@ichips.intel.com>
	<1171043929.31538.174521.camel@hal.voltaire.com>
	<20070209192046.GP11411@obsidianresearch.com>
	<45CCD1E2.5050806@ichips.intel.com>
Message-ID: <1171051315.31538.181667.camel@hal.voltaire.com>

On Fri, 2007-02-09 at 14:56, Sean Hefty wrote:
> I don't see a way to issue the SA query to the remote subnet though.

Even though SA queries can go intersubnet as they are GMPs and can
contain a GRH, the missing part (right now) is locating the SA on that
remote subnet if this is a needed function. In any case, as there needs
to be some SA PathRecord forwarding from SM to SM on a per subnet basis
from source to destination, this will need to be solved (at least the
SMs will likely know and that could be exposed as well to SA clients).

-- Hal


From mshefty at ichips.intel.com  Fri Feb  9 11:56:18 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 09 Feb 2007 11:56:18 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <20070209192046.GP11411@obsidianresearch.com>
References: <45CA3568.1000508@ichips.intel.com>
	<20070207213108.GD11411@obsidianresearch.com>
	<45CA5573.80802@ichips.intel.com>
	<20070207224928.GF11411@obsidianresearch.com>
	<1170894459.31538.23768.camel@hal.voltaire.com>
	<45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com>
	<1171023168.31538.153989.camel@hal.voltaire.com>
	<45CCADC7.5000804@ichips.intel.com>
	<1171043929.31538.174521.camel@hal.voltaire.com>
	<20070209192046.GP11411@obsidianresearch.com>
Message-ID: <45CCD1E2.5050806@ichips.intel.com>

> - A kind of inter-subnet path record query is needed that can
>   return a local and remote GRH and LRH. These four structures need to
>   be *linked* so that:
>    - Side A GRH.SGID = active side's Port GID
>    - Side A GRH.DGID = passive side's Port GID
>    - Side A LRH.SLID = any active side's port LID
>    - Side A LRH.DLID = A subnet router
>    - Side A LRH.SL   = SL to A subnet router
> 
>    - Side B GRH.SGID = Side A GRH.DGID
>    - Side B GRH.DGID = Side A GRH.SGID
>    - Side B LRH.SLID = any passive side's port LID
>    - Side B LRH.DLID = B subnet router
>    - Side B LRH.SL   = SL to B subnet router

Something along this line is what I was considering as well.

>    Offhand I don't see that the existing path record query structure
>    has enough information to do this. In particular, in cases
>    where each subnet has more than 1 router port there is no real
>    guarantee that querying for the SGID -> DGID direction and then the
>    DGID -> SGID direction uses the same router ports without providing
>    both router LIDs as part of the query.

I'm trying to figure out a way to get this information, but I'm still at a loss.
If there were a way to query both the local SA and remote SA using the same
SGID/DGID pair, it's possible that the combined path records could be used to
form this data, i.e. set SGID = local port and DGID = remote port for both
queries.

I don't see a way to issue the SA query to the remote subnet though.

> ** The hackish and simple thing to do right now is to just demand that
>    routers *always* use reversible LRHs with a single SLID and have the
>    passive side pick up the QP LIDs from the LRH if it is routed.

Yep - we also need to hack the CM to set/replace the SLID/DLID carried in the CM 
REQ.

- Sean


From michael.arndt at informatik.tu-chemnitz.de  Fri Feb  9 12:19:04 2007
From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt)
Date: Fri, 9 Feb 2007 21:19:04 +0100
Subject: [openib-general] Unknown SMP Recv
References: <000901c74938$e10b2a30$21606d86@one7>
	<1170689654.4525.201415.camel@hal.voltaire.com>
	<001401c74946$a664a2e0$21606d86@one7>
	<1170695591.4525.207604.camel@hal.voltaire.com>
	<002001c74a33$c2ec1db0$21606d86@one7>
	<1170807564.4525.324195.camel@hal.voltaire.com>
	<001e01c74be2$b4889310$21606d86@one7>
	<1170994529.31538.124584.camel@hal.voltaire.com>
	<000401c74c6d$ce4875f0$21606d86@one7>
	<1171044773.31538.175280.camel@hal.voltaire.com>
	<000401c74c79$74439b50$21606d86@one7>
	<1171051141.2767.7.camel@localhost>
Message-ID: <001001c74c87$8b653470$21606d86@one7>

Hi,

> It is strange, I did similar thing (you can see in
> management/diags/src/mcm_rereg_test.c) and it worked fine for me.

What location is that?

>Which libibumad version you are using? Also I understand you did some
>changes in the stack, is it related to user_mad? Could you publish this?

I use OFED-1.1 and the libibumad version that comes with it. The stack on
which I tested this was not modified, to rule that out. It is a diploma
thesis and I will publish it as soon as possible ;)...in German...sorry.

The whole example code Hal was asking for is below. I have marked the 
position with /* here */. Currently the retry parameter is zero, but I also 
tested 3.

Thanks Michael

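// ---- Includes --------------------------------
#include <infiniband/umad.h>
#include <arpa/inet.h>   /* htons, htonl */
#include <stdint.h>
#include <stdio.h>       /* printf, sprintf */
#include <stdlib.h>      /* atoi */
#include <string.h>
#include <errno.h>

#include "sender.h"

// ---- Defines and declarations ----------------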

 static const uint8_t  CLASS_SUBN_DIRECTED_ROUTE = 0x81;
 static const uint8_t  CLASS_SUBN_LID_ROUTE = 0x1;

 static int long drmad_tid = 0x123;

 // Prototypes

 void drsmp_get_init(void *umad, DRPath *path, uint16_t attr, int mod);
 void drsmp_set_init(void *umad, DRPath *path, uint16_t attr, int mod, void *data);
 char * drmad_status_str(struct drsmp *drsmp);
 int str2DRPath(char *str, DRPath *path);
 int set_bit(int nr, void *method_mask);


// ---- Main ------------------------------------

int main (void){

 int Port_ID = 0;
 int Agent_ID = 0;
 int ret;
 int i;
 int length, timeout_ms = 10000;


 void *umad;
 struct drsmp *smp;


// ---- Settings --------------------------------
 int Portnummer = 1;
 char Devicename [2][UMAD_CA_NAME_LEN];
 DRPath Path;
 char Path_Str[64];

 uint16_t attribute = MAD_ATTR_PORT_INFO; // PortInfo
 int modifier = 1;

 struct _register_info{
  int Management_Class;
  int Management_Version;
  uint8_t RMPP_Version;
  uint32_t Method_Mask[4];
 } Register_Info;

 // ++ Assign values ++

 memset(&Register_Info, 0, sizeof Register_Info); // Method_Mask must start cleared
 Register_Info.Management_Class = CLASS_SUBN_DIRECTED_ROUTE;
 Register_Info.Management_Version = 1;
 Register_Info.RMPP_Version = 0;

 set_bit(0x01,&(Register_Info.Method_Mask));
 set_bit(0x02,&(Register_Info.Method_Mask));
 set_bit(0x81,&(Register_Info.Method_Mask));
 set_bit(0x03,&(Register_Info.Method_Mask));
 set_bit(0x05,&(Register_Info.Method_Mask));
 set_bit(0x06,&(Register_Info.Method_Mask));

 sprintf(Path_Str,"0,1,1,1");


// ---- Init Phase ------------------------------
 printf("... Init Lib ...");
 umad_init();
 printf("done\n\n");

 // ++ Debug ++
 umad_debug(0);

 printf("... Get CAs Names ...");
 ret = umad_get_cas_names(Devicename,2);
 if (ret <= 0) { // 0 means no CA was found, negative means error
  printf("Error: umad_get_cas_names: %i\n",ret);
  return -1;
 }
 else {
  printf("done\n\n");
  for (i = 0;i < ret;i++){
   printf("Devicename: %s\n",Devicename[i]);
  }

 }
 // ++ Open ++
 printf("... Open Port ...");
 if ((Port_ID = umad_open_port(Devicename[0],Portnummer)) < 0)
 {
  printf("Error: umad_open_port: %i\n",Port_ID);
  return -1;
 }
 else printf("done\n\n");
 // ++ Register ++
 printf("... Register User Mad ...");
 if ((Agent_ID = umad_register(Port_ID,Register_Info.Management_Class,
            Register_Info.Management_Version,
            Register_Info.RMPP_Version,
            0)) < 0){
  printf("Error: umad_register : %i\n",Agent_ID);
  goto Exit;
 }
 else printf("done\n\n");
// ---- Build packet ----------------------------

 printf("... Allocate packet ...");
 if (!(umad = umad_alloc(1, umad_size() + IB_MAD_SIZE))){
  printf("Error: umad_alloc\n");
  goto Exit;
 }
 printf("done\n\n");

 smp = umad_get_mad(umad);
 printf("... Smp Pointer ... done\n");

 if ((str2DRPath(Path_Str, &Path)) < 0) printf("Error: str2DRPath\n");

 printf("... Build SMP ...");
 drsmp_get_init(umad,&Path,attribute,modifier);
 printf("... done ...\n\n");


 //xdump(stderr, "before send:\n", smp, 256);
 dump_dr_smp(smp);

 length = IB_MAD_SIZE;

/* here */
 for (i = 0; i < 10; i++){
     printf("... Send Mad ...");
       if ((ret = umad_send(Port_ID, Agent_ID, umad, length, 200, 0)) < 0)
          printf("Error: umad_send : %i\n",ret);
       else printf("done\n\n");
 }

/*
 for (i = 0; i < 10; i++){
   printf("... Recv Mad ...");
   if (umad_recv(Port_ID, umad, &length, timeout_ms) != Agent_ID)
        printf("Error: umad_recv: %s\n", drmad_status_str(smp));
   else printf("done\n\n");
 }
*/

 dump_dr_smp(smp);
 switch (attribute){
  case MAD_ATTR_NODE_INFO : dump_node_info((const struct node_info*)&(smp->data[0])); break;
  case MAD_ATTR_PORT_INFO : dump_port_info(0,0,0,(const struct port_info*)&(smp->data[0])); break;
 }


// ---- Down Phase ------------------------------ 
Exit:
 printf("... Unregister User Mad ...");
 if (umad_unregister(Port_ID,Agent_ID) < 0)
  printf("Error: umad_unregister\n");
 else printf("done\n\n");

 printf("... Close Port ...");
 if (Port_ID != -1) {
  if ((umad_close_port(Port_ID)) != 0)
   printf("Error: umad_close_port\n");
  else printf("done\n\n");
 }
 else printf("nothing to do\n\n");

 return 0;
}

// ---- SMP packet ------------------------------


void drsmp_get_init(void *umad, DRPath *path, uint16_t attr, int mod)
{
   struct drsmp *smp = (struct drsmp *)(umad_get_mad(umad));

   memset(smp, 0, sizeof (*smp));

   smp->base_version  = 1;
   smp->mgmt_class    = CLASS_SUBN_DIRECTED_ROUTE;
   smp->class_version = 1;

   smp->method        = 0x01;
   smp->attr_id      = (uint16_t)htons((uint16_t)attr);
   smp->attr_mod     = htonl(mod);
   smp->tid           = htonll(drmad_tid++);
   smp->dr_slid       = 0xffff;
   smp->dr_dlid       = 0xffff;

   umad_set_addr(umad, 0xffff, 0, 0, 0);

   if (path) {
      memcpy(smp->initial_path, path->path, path->hop_cnt+1);
      smp->hop_cnt = path->hop_cnt;   /* only dereference path when it is set */
   }
}

void drsmp_set_init(void *umad, DRPath *path, uint16_t attr, int mod, void *data)
{
   struct drsmp *smp = (struct drsmp *)(umad_get_mad(umad));

   memset(smp, 0, sizeof (*smp));

   smp->method        = 2;    /* SET */
   smp->attr_id      = (uint16_t)htons((uint16_t)attr);
   smp->attr_mod     = htonl(mod);
   smp->tid           = htonll(drmad_tid++);
   smp->dr_slid       = 0xffff;
   smp->dr_dlid       = 0xffff;

   umad_set_addr(umad, 0xffff, 0, 0, 0);

   if (path) {
      memcpy(smp->initial_path, path->path, path->hop_cnt+1);
      smp->hop_cnt = path->hop_cnt;   /* only dereference path when it is set */
   }

   if (data)
      memcpy(smp->data, data, sizeof smp->data);
}

int str2DRPath(char *str, DRPath *path)
{
   char *s;

   path->hop_cnt = -1;

   //DEBUG("DR str: %s", str);
   while (str && *str) {
      if ((s = strchr(str, ',')))
         *s = 0;
      path->path[++path->hop_cnt] = atoi(str);
      if (!s)
         break;
      str = s+1;
   }

#if 0
   if (path->path[0] != 0 ||
      (path->hop_cnt > 0 && dev_port && path->path[1] != dev_port)) {
      DEBUG("hop 0 != 0 or hop 1 != dev_port");
      return -1;
   }
#endif

   return path->hop_cnt;
}


From mshefty at ichips.intel.com  Fri Feb  9 12:34:40 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 09 Feb 2007 12:34:40 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <1171051315.31538.181667.camel@hal.voltaire.com>
References: <45CA3568.1000508@ichips.intel.com>
	<20070207213108.GD11411@obsidianresearch.com>
	<45CA5573.80802@ichips.intel.com>
	<20070207224928.GF11411@obsidianresearch.com>
	<1170894459.31538.23768.camel@hal.voltaire.com>
	<45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com>
	<1171023168.31538.153989.camel@hal.voltaire.com>
	<45CCADC7.5000804@ichips.intel.com>
	<1171043929.31538.174521.camel@hal.voltaire.com>
	<20070209192046.GP11411@obsidianresearch.com>
	<45CCD1E2.5050806@ichips.intel.com>
	<1171051315.31538.181667.camel@hal.voltaire.com>
Message-ID: <45CCDAE0.1080102@ichips.intel.com>

> the missing part (right now) is locating the SA on that
> remote subnet if this is a needed function.

Maybe we can expose this to SA clients through a ServiceRecord?  This doesn't 
solve how the two SAs find each other (or any of the other difficult stuff), but 
with this and the path record query ability that we mentioned, I think we may 
have a solution for the host stack.

- Sean


From sashak at voltaire.com  Fri Feb  9 13:41:19 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Fri, 09 Feb 2007 23:41:19 +0200
Subject: [openib-general] Unknown SMP Recv
In-Reply-To: <001001c74c87$8b653470$21606d86@one7>
References: <000901c74938$e10b2a30$21606d86@one7>
	<1170689654.4525.201415.camel@hal.voltaire.com>
	<001401c74946$a664a2e0$21606d86@one7>
	<1170695591.4525.207604.camel@hal.voltaire.com>
	<002001c74a33$c2ec1db0$21606d86@one7>
	<1170807564.4525.324195.camel@hal.voltaire.com>
	<001e01c74be2$b4889310$21606d86@one7>
	<1170994529.31538.124584.camel@hal.voltaire.com>
	<000401c74c6d$ce4875f0$21606d86@one7>
	<1171044773.31538.175280.camel@hal.voltaire.com>
	<000401c74c79$74439b50$21606d86@one7>
	<1171051141.2767.7.camel@localhost>
	<001001c74c87$8b653470$21606d86@one7>
Message-ID: <1171057279.2767.20.camel@localhost>

On Fri, 2007-02-09 at 21:19 +0100, Michael Arndt wrote:
> Hi,
> 
> > It is strange, I did similar thing (you can see in
> > management/diags/src/mcm_rereg_test.c) and it worked fine for me.
> 
> What location is that?

Do

  git clone git://git.openfabrics.org/~halr/management

and find this as management/diags/src/mcm_rereg_test.c .

Or you can look at this via gitweb interface:
http://git.openfabrics.org/git

Sasha

> >Which libibumad version you are using? Also I understand you did some
> >changes in the stack, is it related to user_mad? Could you publish this?
> 
> I use OFED-1.1 and attached libibumad version. The stack where I have tested 
> this context wasn't changed to exclude this. It is a diploma thesis and will 
> publish as soon as posible ;)...in german ...sorry.
> 
> The hole example code Hal was asking for is below. I have marked the 
> position with /* here */. Currently is the retry parameter zero, but I also 
> tested 3.
> 
> Thanks Michael
> 
> // ---- Includes --------------------------------
> #include <infiniband/umad.h>
> #include <string.h>
> #include <errno.h>
> 
> #include "sender.h"
> 
> // ---- Defines und Deklarationen ---------------
> 
>  static const uint8_t  CLASS_SUBN_DIRECTED_ROUTE = 0x81;
>  static const uint8_t  CLASS_SUBN_LID_ROUTE = 0x1;
> 
>  static int long drmad_tid = 0x123;
> 
>  // Prototypes
> 
>  void drsmp_get_init(void *umad, DRPath *path, uint16_t attr, int mod);
>  void drsmp_set_init(void *umad, DRPath *path, uint16_t attr, int mod, void 
> *data);
>  char * drmad_status_str(struct drsmp *drsmp);
>  int str2DRPath(char *str, DRPath *path);
>  int set_bit(int nr, void *method_mask);
> 
> 
> 
> // ---- Main ------------------------------------
> 
> int main (void){
> 
>  int Port_ID = 0;
>  int Agent_ID = 0;
>  int ret;
>  int i;
>  int length, timeout_ms = 10000;
> 
> 
>  void *umad;
>  struct drsmp *smp;
> 
> 
> // ---- Einstellungen ---------------------------
>  int Portnummer = 1;
>  char Devicename [2][UMAD_CA_NAME_LEN];
>  DRPath Path;
>  char Path_Str[64];
> 
>  uint16_t attribute = MAD_ATTR_PORT_INFO; // PortInfo
>  int modifier = 1;
> 
>  struct _register_info{
>   int Management_Class;
>   int Management_Version;
>   uint8_t RMPP_Version;
>   uint32_t Method_Mask[4];
>  } Register_Info;
> 
>  // ++ Wertzuweisung ++
> 
>  Register_Info.Management_Class = CLASS_SUBN_DIRECTED_ROUTE;
>  Register_Info.Management_Version = 1;
>  Register_Info.RMPP_Version = 0;
> 
>  set_bit(0x01,&(Register_Info.Method_Mask));
>  set_bit(0x02,&(Register_Info.Method_Mask));
>  set_bit(0x81,&(Register_Info.Method_Mask));
>  set_bit(0x03,&(Register_Info.Method_Mask));
>  set_bit(0x05,&(Register_Info.Method_Mask));
>  set_bit(0x06,&(Register_Info.Method_Mask));
> 
>  sprintf(Path_Str,"0,1,1,1");
> 
> 
> // ---- Init Phase ------------------------------
>  printf("... Init Lib ...");
>  umad_init();
>  printf("done\n\n");
> 
>  // ++ Debug ++
>  umad_debug(0);
> 
>  printf("... Get CAs Names ...");
>  ret = umad_get_cas_names(Devicename,2);
>  if (!ret) {
>   printf("Fehler: umad_get_cas_names: %i\n",ret);
>   return -1;
>  }
>  else {
>   printf("done\n\n");
>   for (i = 0;i < ret;i++){
>    printf("Devicename: %s\n",Devicename[i]);
>   }
> 
>  }
>  // ++ Open ++
>  printf("... Open Port ...");
>  if ((Port_ID = umad_open_port(Devicename[0],Portnummer)) < 0)
>  {
>   printf("Fehler: umad_open_port: %i\n",Port_ID);
>   return -1;
>  }
>  else printf("done\n\n");
>  // ++ Register ++
>  printf("... Register User Mad ...");
>  if ((Agent_ID = umad_register(Port_ID,Register_Info.Management_Class,
>             Register_Info.Management_Version,
>             Register_Info.RMPP_Version,
>             0)) < 0){
>   printf("Fehler: umad_register : %i\n",Agent_ID);
>   goto Exit;
>  }
>  else printf("done\n\n");
> // ---- Paket bauen -----------------------------
> 
>  printf("... Paket allokieren ...");
>  if (!(umad = umad_alloc(1, umad_size() + IB_MAD_SIZE))){
>   printf("Fehler: umad_alloc\n");
>   goto Exit;
>  }
>  printf("done\n\n");
> 
>  smp = umad_get_mad(umad);
>  printf("... Smp Pointer ... done\n");
> 
>  if ((str2DRPath(Path_Str, &Path)) < 0) printf("Fehler: str2DRPath\n");
> 
>  printf("... SMP bauen ...");
>  drsmp_get_init(umad,&Path,attribute,modifier);
>  printf("... done ...\n\n");
> 
> 
>  //xdump(stderr, "before send:\n", smp, 256);
>  dump_dr_smp(smp);
> 
>  length = IB_MAD_SIZE;
> 
> /* here */
>  for (i = 0; i < 10; i++){
>      printf("... Send Mad ...");
>        if ((ret = umad_send(Port_ID, Agent_ID, umad, length, 200, 0)) < 0)
>           printf("Fehler: umad_send : %i\n",ret);
>        else printf("done\n\n");
>  }
> 
> /*
>  for (i = 0; i < 10; i++){
>    printf("... Recv Mad ...");
>    if (umad_recv(Port_ID, umad, &length, timeout_ms) != Agent_ID)
>         printf("Fehler umad_recv: %s\n", drmad_status_str(smp));
>    else printf("done\n\n");
>  }
> */
> 
>  dump_dr_smp(smp);
>  switch (attribute){
>   case MAD_ATTR_NODE_INFO : dump_node_info((const struct 
> node_info*)&(smp->data[0])); break;
>   case MAD_ATTR_PORT_INFO : dump_port_info(0,0,0,(const struct 
> port_info*)&(smp->data[0])); break;
>  }
> 
> 
> // ---- Down Phase ------------------------------ 
> Exit:
>  printf("... Unregister User Mad ...");
>  if (umad_unregister(Port_ID,Agent_ID) < 0)
>   printf("Fehler: umad_unregister\n");
>  else printf("done\n\n");
> 
>  printf("... Close Port ...");
>  if (Port_ID != -1)
>   if ((umad_close_port(Port_ID)) != 0){
>    printf("Fehler: umad_close_port\n");
>   }
>   else printf("done\n\n");
>  else printf("nix zu tun\n\n");
> 
> }
> 
> // ---- SMP Paket -------------------------------
> 
> 
> void drsmp_get_init(void *umad, DRPath *path, uint16_t attr, int mod)
> {
>    struct drsmp *smp = (struct drsmp *)(umad_get_mad(umad));
> 
>    memset(smp, 0, sizeof (*smp));
> 
>    smp->base_version  = 1;
>    smp->mgmt_class    = CLASS_SUBN_DIRECTED_ROUTE;
>    smp->class_version = 1;
> 
>    smp->method        = 0x01;
>    smp->attr_id      = (uint16_t)htons((uint16_t)attr);
>    smp->attr_mod     = htonl(mod);
>    smp->tid           = htonll(drmad_tid++);
>    smp->dr_slid       = 0xffff;
>    smp->dr_dlid       = 0xffff;
> 
>    umad_set_addr(umad, 0xffff, 0, 0, 0);
> 
>    if (path) {
>       /* copy hops 0..hop_cnt; only touch path when one was passed in */
>       memcpy(smp->initial_path, path->path, path->hop_cnt+1);
>       smp->hop_cnt = path->hop_cnt;
>    }
> }
> 
> void drsmp_set_init(void *umad, DRPath *path, uint16_t attr, int mod,
>                     void *data)
> {
>    struct drsmp *smp = (struct drsmp *)(umad_get_mad(umad));
> 
>    memset(smp, 0, sizeof (*smp));
> 
>    smp->method        = 2;    /* SET */
>    smp->attr_id      = (uint16_t)htons((uint16_t)attr);
>    smp->attr_mod     = htonl(mod);
>    smp->tid           = htonll(drmad_tid++);
>    smp->dr_slid       = 0xffff;
>    smp->dr_dlid       = 0xffff;
> 
>    umad_set_addr(umad, 0xffff, 0, 0, 0);
> 
>    if (path) {
>       /* copy hops 0..hop_cnt; only touch path when one was passed in */
>       memcpy(smp->initial_path, path->path, path->hop_cnt+1);
>       smp->hop_cnt = path->hop_cnt;
>    }
> 
>    if (data)
>       memcpy(smp->data, data, sizeof smp->data);
> }
> 
> int str2DRPath(char *str, DRPath *path)
> {
>    char *s;
> 
>    path->hop_cnt = -1;
> 
>    //DEBUG("DR str: %s", str);
>    while (str && *str) {
>       if ((s = strchr(str, ',')))
>          *s = 0;
>       path->path[++path->hop_cnt] = atoi(str);
>       if (!s)
>          break;
>       str = s+1;
>    }
> 
> #if 0
>    if (path->path[0] != 0 ||
>       (path->hop_cnt > 0 && dev_port && path->path[1] != dev_port)) {
>       DEBUG("hop 0 != 0 or hop 1 != dev_port");
>       return -1;
>    }
> #endif
> 
>    return path->hop_cnt;
> }
> 
> 


From halr at voltaire.com  Fri Feb  9 13:45:29 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 09 Feb 2007 16:45:29 -0500
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <20070209192046.GP11411@obsidianresearch.com>
References: <45CA3568.1000508@ichips.intel.com>
	<20070207213108.GD11411@obsidianresearch.com>
	<45CA5573.80802@ichips.intel.com>
	<20070207224928.GF11411@obsidianresearch.com>
	<1170894459.31538.23768.camel@hal.voltaire.com>
	<45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com>
	<1171023168.31538.153989.camel@hal.voltaire.com>
	<45CCADC7.5000804@ichips.intel.com>
	<1171043929.31538.174521.camel@hal.voltaire.com>
	<20070209192046.GP11411@obsidianresearch.com>
Message-ID: <1171057501.31538.187596.camel@hal.voltaire.com>

On Fri, 2007-02-09 at 14:20, Jason Gunthorpe wrote:
> On Fri, Feb 09, 2007 at 12:58:51PM -0500, Hal Rosenstock wrote:
> > > For simplicity, assume a single path.  My assumption in this case was that the 
> > > SLID/DLID values would be reversed.  That is, the LIDs are relative to the local 
> > > subnet, not the SGID.  But if I set the SGID = DGID = remote GID, then the LIDs 
> > > would be relative to the remote subnet.  (Assuming that the local SA could 
> > > support such a query at all.)
> > > 
> > > It seems that in order to meet the requirements of the spec, we need a way to 
> > > perform inter-subnet queries.  (The alternative being to change the spec...) 
> > > And if the local SA can return a path record to a remote DGID, then it also 
> > > seems like the local SA must be able to collect some sort of information about 
> > > the path to the remote subnet.  (How it does this seems TBD.)
> > > 
> > > So... I'm thinking that the solution to these problems should rest within the 
> > > local SA...
> > 
> > Yes, this seems most consistent with what is there now although there
> > are some issues to work out on how some of the fields are supported and
> > which queries would work intersubnet (as well as how they would work).
> 
> I agree, some kind of inter subnet query will have to be used to make
> this work consistently with the rest of IBA.
> 
> It looks to me like we overall need to have this look like:
> - Routers need to be able to support inter-subnet reversible paths
>   to meet the requirements for CM.
>   - Inter-subnet reversible paths are defined to mean that when the LRH
>     is selected on the destination subnet by the router it is reversible.
>   - This can be signaled by using TClass and/or FlowLabel fields in the GRH.
> - Routers need to be able to produce knowable SLIDs to meet the QP LID
>   matching requirement
>   - The LID to use can be signaled by using TClass and/or FlowLabel
> - A kind of inter-subnet path record query is needed that can
>   return a local and remote GRH and LRH. These four structures need to
>   be *linked* so that:
>    - Side A GRH.SGID = active side's Port GID
>    - Side A GRH.DGID = passive side's Port GID
>    - Side A LRH.SLID = any active side's port LID
>    - Side A LRH.DLID = A subnet router
>    - Side A LRH.SL   = SL to A subnet router
> 
>    - Side B GRH.SGID = Side A GRH.DGID
>    - Side B GRH.DGID = Side A GRH.SGID
>    - Side B LRH.SLID = any passive side's port LID
>    - Side B LRH.DLID = B subnet router
>    - Side B LRH.SL   = SL to B subnet router
>    
>    - When the A subnet router sees Side B GRH it produces
>       LRH.SLID = Side A LRH.DLID
>       LRH.DLID = Side A LRH.SLID
>       LRH.SL   = SL to Side A Active side (may be != to Side A LRH.SL)
>    - Similarly for Side B.      
> 
>    This linkage requirement is necessary due to the QP LID matching
>    rules. I'm imagining that like SL the GRH.TClass and GRH.FlowLabel
>    could be different in each direction.
> 
>    I'd think of this query as a generic duplex PathRecord query.
> 
>    Off hand I don't see that the existing path record query structure
>    has enough information to do this.. Particularly, in cases
>    where each subnet has more than 1 router port there is no real
>    guarantee that querying for the SGID -> DGID direction and then the
>    DGID -> SGID direction uses the same router ports without providing
>    both router LIDs as part of the query.

Router LIDs rather than GIDs (in the case of LMC > 0) ?

The SA PathRecord may have room but the MultiPathRecord is pretty
tightly packed now.

-- Hal

> Whatever responds to this query must be interacting with the router(s)
> to ensure they recognize the GRHs and produce LRHs to meet all the
> above requirements.
> 
> ** The hackish and simple thing to do right now is to just demand that
>    routers *always* use reversible LRHs with a single SLID and have the
>    passive side pick up the QP lids from the LRH if it is routed..
> 
> Jason
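
For illustration, the linkage rules in the quoted list can be sketched in C.
This is only a sketch: struct grh_view, struct lrh_view, struct side_path and
both functions are hypothetical names made up here, not part of any OFED API,
and only the fields named in the list are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified header views; field names follow the list above. */
struct grh_view { uint8_t sgid[16]; uint8_t dgid[16]; };
struct lrh_view { uint16_t slid; uint16_t dlid; uint8_t sl; };

/* One side of the assumed duplex record: what that side puts on the wire. */
struct side_path { struct grh_view grh; struct lrh_view lrh; };

/* GRH linkage: Side B's GIDs are Side A's GIDs swapped. */
static bool duplex_grh_linked(const struct side_path *a,
                              const struct side_path *b)
{
    assert(a && b);
    return memcmp(b->grh.sgid, a->grh.dgid, 16) == 0 &&
           memcmp(b->grh.dgid, a->grh.sgid, 16) == 0;
}

/* LRH the A subnet router produces when it sees Side B's GRH:
 * SLID and DLID are Side A's values reversed; the SL may differ
 * from Side A's SL, so it is passed in separately. */
static struct lrh_view router_reverse_lrh(const struct lrh_view *side_a,
                                          uint8_t return_sl)
{
    struct lrh_view out;

    assert(side_a);
    out.slid = side_a->dlid;
    out.dlid = side_a->slid;
    out.sl   = return_sl;
    return out;
}
```

With Side A LRH = {SLID 0x10, DLID 0x20}, router_reverse_lrh() yields
{SLID 0x20, DLID 0x10}, which is the pair the active side's QP has to see.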


From halr at voltaire.com  Fri Feb  9 13:54:48 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 09 Feb 2007 16:54:48 -0500
Subject: [openib-general] Unknown SMP Recv
In-Reply-To: <001001c74c87$8b653470$21606d86@one7>
References: <000901c74938$e10b2a30$21606d86@one7>
	<1170689654.4525.201415.camel@hal.voltaire.com>
	<001401c74946$a664a2e0$21606d86@one7>
	<1170695591.4525.207604.camel@hal.voltaire.com>
	<002001c74a33$c2ec1db0$21606d86@one7>
	<1170807564.4525.324195.camel@hal.voltaire.com>
	<001e01c74be2$b4889310$21606d86@one7>
	<1170994529.31538.124584.camel@hal.voltaire.com>
	<000401c74c6d$ce4875f0$21606d86@one7>
	<1171044773.31538.175280.camel@hal.voltaire.com>
	<000401c74c79$74439b50$21606d86@one7>
	<1171051141.2767.7.camel@localhost>
	<001001c74c87$8b653470$21606d86@one7>
Message-ID: <1171058084.31538.188191.camel@hal.voltaire.com>

On Fri, 2007-02-09 at 15:19, Michael Arndt wrote:
> Hi,
> 
> > It is strange, I did similar thing (you can see in
> > management/diags/src/mcm_rereg_test.c) and it worked fine for me.
> 
> What location is that?
> 
> >Which libibumad version you are using? Also I understand you did some
> >changes in the stack, is it related to user_mad? Could you publish this?
> 
> I use OFED-1.1 and the attached libibumad version. To rule that out, I tested 
> this on a stack that wasn't changed. The changes are part of a diploma thesis, 
> which I will publish as soon as possible ;) ... in German, sorry.
> 
> The whole example code Hal was asking for is below. I have marked the 
> position with /* here */. Currently the retry parameter is zero, but I also 
> tested 3.
> 
> Thanks Michael
> 
> // ---- Includes --------------------------------
> #include <infiniband/umad.h>
> #include <string.h>
> #include <errno.h>
> 
> #include "sender.h"

Can you provide this as well ?

-- Hal

> 
> // ---- Defines and declarations ----------------
> 
>  static const uint8_t  CLASS_SUBN_DIRECTED_ROUTE = 0x81;
>  static const uint8_t  CLASS_SUBN_LID_ROUTE = 0x1;
> 
>  static int long drmad_tid = 0x123;
> 
>  // Prototypes
> 
>  void drsmp_get_init(void *umad, DRPath *path, uint16_t attr, int mod);
>  void drsmp_set_init(void *umad, DRPath *path, uint16_t attr, int mod,
>                      void *data);
>  char * drmad_status_str(struct drsmp *drsmp);
>  int str2DRPath(char *str, DRPath *path);
>  int set_bit(int nr, void *method_mask);
> 
> 
> 
> // ---- Main ------------------------------------
> 
> int main (void){
> 
>  int Port_ID = 0;
>  int Agent_ID = 0;
>  int ret;
>  int i;
>  int length, timeout_ms = 10000;
> 
> 
>  void *umad;
>  struct drsmp *smp;
> 
> 
> // ---- Settings --------------------------------
>  int Portnummer = 1;
>  char Devicename [2][UMAD_CA_NAME_LEN];
>  DRPath Path;
>  char Path_Str[64];
> 
>  uint16_t attribute = MAD_ATTR_PORT_INFO; // PortInfo
>  int modifier = 1;
> 
>  struct _register_info{
>   int Management_Class;
>   int Management_Version;
>   uint8_t RMPP_Version;
>   uint32_t Method_Mask[4];
>  } Register_Info;
> 
>  // ++ Value assignment ++
> 
>  Register_Info.Management_Class = CLASS_SUBN_DIRECTED_ROUTE;
>  Register_Info.Management_Version = 1;
>  Register_Info.RMPP_Version = 0;
> 
>  set_bit(0x01,&(Register_Info.Method_Mask));
>  set_bit(0x02,&(Register_Info.Method_Mask));
>  set_bit(0x81,&(Register_Info.Method_Mask));
>  set_bit(0x03,&(Register_Info.Method_Mask));
>  set_bit(0x05,&(Register_Info.Method_Mask));
>  set_bit(0x06,&(Register_Info.Method_Mask));
> 
>  sprintf(Path_Str,"0,1,1,1");
> 
> 
> // ---- Init Phase ------------------------------
>  printf("... Init Lib ...");
>  umad_init();
>  printf("done\n\n");
> 
>  // ++ Debug ++
>  umad_debug(0);
> 
>  printf("... Get CAs Names ...");
>  ret = umad_get_cas_names(Devicename,2);
>  if (ret <= 0) {   /* negative: error, zero: no CA found */
>   printf("Error: umad_get_cas_names: %i\n",ret);
>   return -1;
>  }
>  else {
>   printf("done\n\n");
>   for (i = 0;i < ret;i++){
>    printf("Devicename: %s\n",Devicename[i]);
>   }
> 
>  }
>  // ++ Open ++
>  printf("... Open Port ...");
>  if ((Port_ID = umad_open_port(Devicename[0],Portnummer)) < 0)
>  {
>   printf("Error: umad_open_port: %i\n",Port_ID);
>   return -1;
>  }
>  else printf("done\n\n");
>  // ++ Register ++
>  printf("... Register User Mad ...");
>  if ((Agent_ID = umad_register(Port_ID,Register_Info.Management_Class,
>             Register_Info.Management_Version,
>             Register_Info.RMPP_Version,
>             0)) < 0){
>   printf("Error: umad_register : %i\n",Agent_ID);
>   goto Exit;
>  }
>  else printf("done\n\n");
> // ---- Build packet ----------------------------
> 
>  printf("... Allocate packet ...");
>  if (!(umad = umad_alloc(1, umad_size() + IB_MAD_SIZE))){
>   printf("Error: umad_alloc\n");
>   goto Exit;
>  }
>  printf("done\n\n");
> 
>  smp = umad_get_mad(umad);
>  printf("... Smp Pointer ... done\n");
> 
>  if ((str2DRPath(Path_Str, &Path)) < 0) printf("Error: str2DRPath\n");
> 
>  printf("... Build SMP ...");
>  drsmp_get_init(umad,&Path,attribute,modifier);
>  printf("... done ...\n\n");
> 
> 
>  //xdump(stderr, "before send:\n", smp, 256);
>  dump_dr_smp(smp);
> 
>  length = IB_MAD_SIZE;
> 
> /* here */
>  for (i = 0; i < 10; i++){
>      printf("... Send Mad ...");
>        if ((ret = umad_send(Port_ID, Agent_ID, umad, length, 200, 0)) < 0)
>           printf("Error: umad_send : %i\n",ret);
>        else printf("done\n\n");
>  }
> 
> /*
>  for (i = 0; i < 10; i++){
>    printf("... Recv Mad ...");
>    if (umad_recv(Port_ID, umad, &length, timeout_ms) != Agent_ID)
>         printf("Error umad_recv: %s\n", drmad_status_str(smp));
>    else printf("done\n\n");
>  }
> */
> 
>  dump_dr_smp(smp);
>  switch (attribute){
>   case MAD_ATTR_NODE_INFO :
>    dump_node_info((const struct node_info*)&(smp->data[0])); break;
>   case MAD_ATTR_PORT_INFO :
>    dump_port_info(0,0,0,(const struct port_info*)&(smp->data[0])); break;
>  }
> 
> 
> // ---- Down Phase ------------------------------ 
> Exit:
>  printf("... Unregister User Mad ...");
>  if (umad_unregister(Port_ID,Agent_ID) < 0)
>   printf("Error: umad_unregister\n");
>  else printf("done\n\n");
> 
>  printf("... Close Port ...");
>  if (Port_ID != -1)
>   if ((umad_close_port(Port_ID)) != 0){
>    printf("Error: umad_close_port\n");
>   }
>   else printf("done\n\n");
>  else printf("nothing to do\n\n");
> 
> }
> 
> // ---- SMP packet ------------------------------
> 
> 
> void drsmp_get_init(void *umad, DRPath *path, uint16_t attr, int mod)
> {
>    struct drsmp *smp = (struct drsmp *)(umad_get_mad(umad));
> 
>    memset(smp, 0, sizeof (*smp));
> 
>    smp->base_version  = 1;
>    smp->mgmt_class    = CLASS_SUBN_DIRECTED_ROUTE;
>    smp->class_version = 1;
> 
>    smp->method        = 0x01;
>    smp->attr_id      = (uint16_t)htons((uint16_t)attr);
>    smp->attr_mod     = htonl(mod);
>    smp->tid           = htonll(drmad_tid++);
>    smp->dr_slid       = 0xffff;
>    smp->dr_dlid       = 0xffff;
> 
>    umad_set_addr(umad, 0xffff, 0, 0, 0);
> 
>    if (path) {
>       /* copy hops 0..hop_cnt; only touch path when one was passed in */
>       memcpy(smp->initial_path, path->path, path->hop_cnt+1);
>       smp->hop_cnt = path->hop_cnt;
>    }
> }
> 
> void drsmp_set_init(void *umad, DRPath *path, uint16_t attr, int mod,
>                     void *data)
> {
>    struct drsmp *smp = (struct drsmp *)(umad_get_mad(umad));
> 
>    memset(smp, 0, sizeof (*smp));
> 
>    smp->method        = 2;    /* SET */
>    smp->attr_id      = (uint16_t)htons((uint16_t)attr);
>    smp->attr_mod     = htonl(mod);
>    smp->tid           = htonll(drmad_tid++);
>    smp->dr_slid       = 0xffff;
>    smp->dr_dlid       = 0xffff;
> 
>    umad_set_addr(umad, 0xffff, 0, 0, 0);
> 
>    if (path) {
>       /* copy hops 0..hop_cnt; only touch path when one was passed in */
>       memcpy(smp->initial_path, path->path, path->hop_cnt+1);
>       smp->hop_cnt = path->hop_cnt;
>    }
> 
>    if (data)
>       memcpy(smp->data, data, sizeof smp->data);
> }
> 
> int str2DRPath(char *str, DRPath *path)
> {
>    char *s;
> 
>    path->hop_cnt = -1;
> 
>    //DEBUG("DR str: %s", str);
>    while (str && *str) {
>       if ((s = strchr(str, ',')))
>          *s = 0;
>       path->path[++path->hop_cnt] = atoi(str);
>       if (!s)
>          break;
>       str = s+1;
>    }
> 
> #if 0
>    if (path->path[0] != 0 ||
>       (path->hop_cnt > 0 && dev_port && path->path[1] != dev_port)) {
>       DEBUG("hop 0 != 0 or hop 1 != dev_port");
>       return -1;
>    }
> #endif
> 
>    return path->hop_cnt;
> }
> 
> 
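
A side note on str2DRPath() as quoted above: it overwrites the commas in its
input string and never checks hop_cnt against the size of the path array.
A minimal sketch of a variant that leaves the input untouched and rejects
over-long or malformed paths could look like this. The name
str2DRPath_checked and the DR_PATH_MAX constant are made up for this sketch;
the value 64 is assumed to match the path[] array of the posted DRPath.

```c
#include <assert.h>
#include <stdlib.h>

#define DR_PATH_MAX 64   /* assumed size of DRPath.path[] */

typedef struct {
    char path[DR_PATH_MAX];
    int hop_cnt;
} DRPath;

/* Parse a string such as "0,1,1,1" into path->path[] without modifying str.
 * Returns the hop count (number of entries minus one) or -1 on bad input. */
static int str2DRPath_checked(const char *str, DRPath *path)
{
    int n = 0;

    assert(path);
    path->hop_cnt = -1;
    if (!str || !*str)
        return -1;

    for (;;) {
        char *end;
        long port = strtol(str, &end, 10);

        /* no digits, port number out of range, or too many hops */
        if (end == str || port < 0 || port > 255 || n >= DR_PATH_MAX)
            return -1;
        path->path[n++] = (char)(unsigned char)port;

        if (*end == '\0')
            break;               /* last entry consumed */
        if (*end != ',')
            return -1;           /* unexpected separator */
        str = end + 1;
    }

    path->hop_cnt = n - 1;
    return path->hop_cnt;
}
```

For the path string used in the test program, "0,1,1,1", this returns 3,
the same hop count the original function produces.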


From halr at voltaire.com  Fri Feb  9 13:48:47 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 09 Feb 2007 16:48:47 -0500
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <45CCDAE0.1080102@ichips.intel.com>
References: <45CA3568.1000508@ichips.intel.com>
	<20070207213108.GD11411@obsidianresearch.com>
	<45CA5573.80802@ichips.intel.com>
	<20070207224928.GF11411@obsidianresearch.com>
	<1170894459.31538.23768.camel@hal.voltaire.com>
	<45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com>
	<1171023168.31538.153989.camel@hal.voltaire.com>
	<45CCADC7.5000804@ichips.intel.com>
	<1171043929.31538.174521.camel@hal.voltaire.com>
	<20070209192046.GP11411@obsidianresearch.com>
	<45CCD1E2.5050806@ichips.intel.com>
	<1171051315.31538.181667.camel@hal.voltaire.com>
	<45CCDAE0.1080102@ichips.intel.com>
Message-ID: <1171057719.31538.187820.camel@hal.voltaire.com>

On Fri, 2007-02-09 at 15:34, Sean Hefty wrote:
> > the /missing part (right now) is locating the SA on that
> > remote subnet if this is a needed function.
> 
> Maybe we can expose this to SA clients through a ServiceRecord?

That might be one way if there were a standardized service name for SA
and there was some way to globally distribute those across SAs.
The hard part is the global distribution of this information.

-- Hal

>   This doesn't 
> solve how the two SAs find each other (or any of the other difficult stuff), but 
> with this and the path record query ability that we mentioned, I think we may 
> have a solution for the host stack.
> 
> - Sean


From panda at cse.ohio-state.edu  Fri Feb  9 14:28:16 2007
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Fri, 9 Feb 2007 17:28:16 -0500 (EST)
Subject: [openib-general] MVAPICH 0.9.9-beta release is available
Message-ID: <200702092228.l19MSGEo006670@xi.cse.ohio-state.edu>

The MVAPICH team is pleased to announce the availability of MVAPICH
0.9.9-beta with the following NEW features:

- Message coalescing support to enable reduction of the per-queue-pair
  send queues and thereby reduce memory requirements on large-scale
  clusters. This design also significantly increases the small-message
  rate.

- Designs for avoiding hot-spots in networks of large-scale clusters

  - Multi-pathing support leveraging LMC mechanism
  - Multi-port support for enabling user processes to bind to 
    different IB ports for balanced communication performance
    on multi-core platforms

- Multi-core optimized scalable shared memory design

- Memory Hook support provided by integration with ptmalloc2 library. 
  This provides safe release of memory to the Operating System and
  is expected to benefit the memory usage of applications that 
  frequently use malloc and free operations.

- Optimized, high-performance shared memory aware collective
  operations for multi-core platforms

- Shared-Memory only channel (This interface support is useful for
  running MPI jobs on multi-processor systems without using any 
  high-performance network. For example, multi-core servers, 
  desktops, and laptops; and clusters with serial nodes.)

A new "Multiple-pair Bandwidth and Message Rate" test is also
available as a part of OSU_Benchmarks.

For downloading MVAPICH 0.9.9-beta package and accessing the anonymous
SVN, please visit the following URL:

http://nowlab.cse.ohio-state.edu/projects/mpi-iba/

MVAPICH 0.9.9-beta is also available for OFED 1.2 testing.

All feedback, including bug reports and hints for performance tuning,
is welcome. Please post it to the mvapich-discuss mailing list.

Thanks, 

MVAPICH Team


From jgunthorpe at obsidianresearch.com  Fri Feb  9 14:38:45 2007
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Fri, 9 Feb 2007 15:38:45 -0700
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <1171057501.31538.187596.camel@hal.voltaire.com>
References: <45CA5573.80802@ichips.intel.com>
	<20070207224928.GF11411@obsidianresearch.com>
	<1170894459.31538.23768.camel@hal.voltaire.com>
	<45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com>
	<1171023168.31538.153989.camel@hal.voltaire.com>
	<45CCADC7.5000804@ichips.intel.com>
	<1171043929.31538.174521.camel@hal.voltaire.com>
	<20070209192046.GP11411@obsidianresearch.com>
	<1171057501.31538.187596.camel@hal.voltaire.com>
Message-ID: <20070209223845.GR11411@obsidianresearch.com>

On Fri, Feb 09, 2007 at 04:45:29PM -0500, Hal Rosenstock wrote:

> >    Off hand I don't see that the existing path record query structure
> >    has enough information to do this.. Particularly, in cases
> >    where each subnet has more than 1 router port there is no real
> >    guarantee that querying for the SGID -> DGID direction and then the
> >    DGID -> SGID direction uses the same router ports without providing
> >    both router LIDs as part of the query.
> 
> Router LIDs rather than GIDs (in the case of LMC > 0) ?

Yes, it is the router LID that is matched by the QP, the router GID
never makes it into any packets or PR responses. To elaborate on it..

Basically you need to specify the egress subnet *and* the egress
router LID when constructing the path to handle the case of multiple
fabric and router paths. The GID of the target and the LID of the
target's router port is enough to disambiguate all the possible
multipaths down to a set that will match the QP programming.

This is all because of the LID matching rules. The ultimate router
egress LID must be controlled when establishing the path. It must
match the DLID in the QP, so it must be specified when the path is
looked up so that the SA/Routers/etc can provide a PR that meets the
egress LID requirement. This is not just to ensure that the router
selects the right LRH.SLID in the case of LMC >0 but to also ensure
that the *right* router port is used in the case of multiple
(redundant) routed paths.

Basically the idea where each end of a RC QP could independently do a
Path Record query for the remote GID cannot work due to the LID
matching rule. 

Sean: Even if you can query both SA's there isn't enough information
to force things to use the same router path in each direction.

Jason
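
As a rough illustration of the LID matching rule discussed above, the check
can be written as a few lines of C. struct qp_path and qp_lid_match() are
hypothetical names used only for this sketch, and the port is simplified to
a single LID (LMC = 0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of what a connected QP has been programmed with. */
struct qp_path {
    uint16_t local_lid;    /* LID of the QP's own port */
    uint16_t remote_dlid;  /* DLID the QP sends to, e.g. a router port LID */
};

/* The QP accepts an incoming packet only if the packet's LRH.SLID is the
 * DLID the QP sends to and the packet's LRH.DLID is the QP's own LID. */
static bool qp_lid_match(const struct qp_path *qp,
                         uint16_t pkt_slid, uint16_t pkt_dlid)
{
    assert(qp);
    return pkt_slid == qp->remote_dlid && pkt_dlid == qp->local_lid;
}
```

If the QP was set up with remote_dlid = 0x20 (one router port) and the
return traffic arrives through a second router port with LID 0x21,
qp_lid_match() returns false. That is the case two independent Path Record
queries cannot rule out.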


From swise at opengridcomputing.com  Fri Feb  9 14:46:57 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 09 Feb 2007 16:46:57 -0600
Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in
 cm_conn_req_handler()
In-Reply-To: <20070207065650.24166.6979.sendpatchset@localhost.localdomain>
References: <20070207065650.24166.6979.sendpatchset@localhost.localdomain>
Message-ID: <1171061217.4525.15.camel@stevo-desktop>


> All 4 above cases were tested by injecting random error in
> iw_conn_req_handler() and running rdma_bw/krping, they were
> confirmed. I added the BUG_ON() to confirm the earlier check
> for id_priv->refcount==0 should always be true (and could be
> removed).

Can you post the test case you're using for this? 


Steve.


From mshefty at ichips.intel.com  Fri Feb  9 15:08:12 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 09 Feb 2007 15:08:12 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <20070209223845.GR11411@obsidianresearch.com>
References: <45CA5573.80802@ichips.intel.com>
	<20070207224928.GF11411@obsidianresearch.com>
	<1170894459.31538.23768.camel@hal.voltaire.com>
	<45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com>
	<1171023168.31538.153989.camel@hal.voltaire.com>
	<45CCADC7.5000804@ichips.intel.com>
	<1171043929.31538.174521.camel@hal.voltaire.com>
	<20070209192046.GP11411@obsidianresearch.com>
	<1171057501.31538.187596.camel@hal.voltaire.com>
	<20070209223845.GR11411@obsidianresearch.com>
Message-ID: <45CCFEDC.3040700@ichips.intel.com>

> Sean: Even if you can query both SA's there isn't enough information
> to force things to use the same router path in each direction.

My assumption is that the remote SA contains the necessary information about how 
a packet coming from the local SGID to the remote DGID would be routed on the 
remote subnet.  The returned path record must specify the SLID that the remote 
router will send from, along with the DLID that the router will map the DGID to.

Likewise for the local SA.  As long as the path is reversible, then my 
expectation is that the local router will use the returned LIDs for packets 
coming from the remote DGID to the local SGID.

The route itself is determined using the SGID, DGID, TClass, FlowLabel.  So, as 
long as the two queries match on these fields, I would think that it would work.

- Sean
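
The condition in the last paragraph, that both queries agree on the fields
that select the route, could be checked along these lines. This is a sketch
only: struct route_key and route_keys_match() are hypothetical and not an
existing SA client interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: the four fields named above as determining the route. */
struct route_key {
    uint8_t  sgid[16];
    uint8_t  dgid[16];
    uint8_t  tclass;
    uint32_t flow_label;   /* only the low 20 bits are significant */
};

/* True if the query sent to the local SA and the query sent to the
 * remote SA were made with the same SGID, DGID, TClass and FlowLabel. */
static bool route_keys_match(const struct route_key *local_q,
                             const struct route_key *remote_q)
{
    assert(local_q && remote_q);
    return memcmp(local_q->sgid, remote_q->sgid, 16) == 0 &&
           memcmp(local_q->dgid, remote_q->dgid, 16) == 0 &&
           local_q->tclass == remote_q->tclass &&
           (local_q->flow_label & 0xFFFFFu) ==
           (remote_q->flow_label & 0xFFFFFu);
}
```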


From michael.arndt at informatik.tu-chemnitz.de  Fri Feb  9 15:14:35 2007
From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt)
Date: Sat, 10 Feb 2007 00:14:35 +0100
Subject: [openib-general] Unknown SMP Recv
References: <000901c74938$e10b2a30$21606d86@one7>
	<1170689654.4525.201415.camel@hal.voltaire.com>
	<001401c74946$a664a2e0$21606d86@one7>
	<1170695591.4525.207604.camel@hal.voltaire.com>
	<002001c74a33$c2ec1db0$21606d86@one7>
	<1170807564.4525.324195.camel@hal.voltaire.com>
	<001e01c74be2$b4889310$21606d86@one7>
	<1170994529.31538.124584.camel@hal.voltaire.com>
	<000401c74c6d$ce4875f0$21606d86@one7>
	<1171044773.31538.175280.camel@hal.voltaire.com>
	<000401c74c79$74439b50$21606d86@one7>
	<1171051141.2767.7.camel@localhost>
	<001001c74c87$8b653470$21606d86@one7>
	<1171058084.31538.188191.camel@hal.voltaire.com>
Message-ID: <000801c74ca0$108a83e0$21606d86@one7>

Hi,

Below are the two missing files, sender.h and helper.c.

Thanks Michael

############################# sender.h #############################
// ---- Includes --------------------------------
#include <infiniband/umad.h>
#include <string.h>
#include <errno.h>
#include <stdio.h>
#include <sys/select.h>
#include <netinet/in.h>
#include <endian.h>

// ---- Defines ---------------------------------

   #define IB_MAD_SIZE           256
   #define UMAD_DEV_NAME_SZ      32
   #define UMAD_DEV_FILE_SZ      256

   #define DIRECTION (uint16_t)htons(0x8000)

   #define BUF_SIZE              4096
   #define MCLASS_SUBN_DIR    0x81
   #define MCLASS_SUBN_LID    0x01
   #define MCLASS_SUBN_ADM    0x03
   #define SMP_STATUS_MASK_HO 0x7FFF
   #define SMP_STATUS_MASK   (uint16_t)htons(SMP_STATUS_MASK_HO)
   #define PRIx64    "lx"
   #define SM_METHOD_STR_UNKNOWN_VAL 0x21

 #define NODE_INFO_PORT_NUM_MASK     (ntohl(0xFF000000))
 #define NODE_INFO_VEND_ID_MASK      (ntohl(0x00FFFFFF))


 #define MAD_ATTR_PORT_INFO          0x0015
 #define MAD_ATTR_NODE_INFO          0x0011
 #define MAD_ATTR_NODE_DESC          0x0010

 #define IB_NOTICE_NODE_TYPE_ROUTER        (ntohl(0x000003))

 #define SM_ATTR_STR_UNKNOWN_VAL 0x21

 #define PORT_LINK_SPEED_SHIFT    4
 #define PORT_LINK_SPEED_SUPPORTED_MASK 0xF0
 #define PORT_LINK_SPEED_ACTIVE_MASK    0xF0
 #define PORT_LINK_SPEED_ENABLED_MASK      0x0F

 #define LINK_NO_CHANGE 0
 #define LINK_DOWN      1
 #define LINK_INIT   2
 #define LINK_ARMED     3
 #define LINK_ACTIVE    4
 #define LINK_ACT_DEFER 5

 #define PORT_STATE_MASK       0x0F
 #define PORT_LMC_MASK         0x07
 #define PORT_LMC_MAX          0x07
 #define PORT_MPB_MASK         0xC0
 #define PORT_MPB_SHIFT        6
 #define PORT_LINK_SPEED_SHIFT    4
 #define PORT_LINK_SPEED_SUPPORTED_MASK 0xF0
 #define PORT_LINK_SPEED_ACTIVE_MASK    0xF0
 #define PORT_LINK_SPEED_ENABLED_MASK      0x0F
 #define PORT_PHYS_STATE_MASK        0xF0
 #define PORT_PHYS_STATE_SHIFT    4
 #define PORT_LNKDWNDFTSTATE_MASK    0x0F


#ifndef __BYTE_ORDER
#error "__BYTE_ORDER macro undefined. Missing in endian.h?"
#endif

#if __BYTE_ORDER == __LITTLE_ENDIAN
 #define CPU_LE    1
 #define CPU_BE    0
#else
 #define CPU_LE    0
 #define CPU_BE    1
#endif

#if CPU_LE
   #define NODE_INFO_PORT_NUM_SHIFT 0
#else
   #define NODE_INFO_PORT_NUM_SHIFT 24
#endif


#define own_ntoh64( x )     (uint64_t)(             \
         (((uint64_t)(x) & 0x00000000000000FFULL) << 56) |  \
         (((uint64_t)(x) & 0x000000000000FF00ULL) << 40) |  \
         (((uint64_t)(x) & 0x0000000000FF0000ULL) << 24) |  \
         (((uint64_t)(x) & 0x00000000FF000000ULL) << 8 ) |  \
         (((uint64_t)(x) & 0x000000FF00000000ULL) >> 8 ) |  \
         (((uint64_t)(x) & 0x0000FF0000000000ULL) >> 24) |  \
         (((uint64_t)(x) & 0x00FF000000000000ULL) >> 40) |  \
         (((uint64_t)(x) & 0xFF00000000000000ULL) >> 56) )

#define own_ntoh64_2( x )     (uint64_t)(             \
         (((uint64_t)(x) & 0x00000000000000FFULL) << 24) |  \
         (((uint64_t)(x) & 0x000000000000FF00ULL) << 8) |  \
         (((uint64_t)(x) & 0x0000000000FF0000ULL) >> 8) |  \
         (((uint64_t)(x) & 0x00000000FF000000ULL) >> 24 ) |  \
         (((uint64_t)(x) & 0x000000FF00000000ULL) << 24 ) |  \
         (((uint64_t)(x) & 0x0000FF0000000000ULL) << 8) |  \
         (((uint64_t)(x) & 0x00FF000000000000ULL) >> 8) |  \
         (((uint64_t)(x) & 0xFF00000000000000ULL) >> 24) )


// ---- Declarations ----------------------------


struct  Port {
   char dev_file[UMAD_DEV_FILE_SZ];
   char dev_name[UMAD_DEV_NAME_SZ];
   int dev_port;
   int dev_fd;
   int id;
};

struct _register_info{
      int Management_Class;
      int Management_Version;
      uint8_t RMPP_Version;
      uint32_t Method_Mask[4];
   } Register_Info;

typedef struct {
   char path[64];
   int hop_cnt;
} DRPath;

struct drsmp {
   uint8_t     base_version;
   uint8_t     mgmt_class;
   uint8_t     class_version;
   uint8_t     method;
   uint16_t  status;
   uint8_t     hop_ptr;
   uint8_t     hop_cnt;
   uint64_t  tid;
   uint16_t  attr_id;
   uint16_t  resv;
   uint32_t  attr_mod;
   uint64_t  mkey;
   uint16_t  dr_slid;
   uint16_t  dr_dlid;
   uint32_t    reserved[7];
   uint8_t     data[64];
   uint8_t     initial_path[64];
   uint8_t     return_path[64];
};

struct node_info
{
   uint8_t         base_version;
   uint8_t         class_version;
   uint8_t         node_type;
   uint8_t         num_ports;
   uint64_t        sys_guid;
   uint64_t        node_guid;
   uint64_t        port_guid;
   uint16_t        partition_cap;
   uint16_t        device_id;
   uint32_t        revision;
   uint32_t        port_num_vendor_id;

};

struct port_info
{
   uint64_t        m_key;
   uint64_t        subnet_prefix;
   uint16_t        base_lid;
   uint16_t        master_sm_base_lid;
   uint32_t        capability_mask;
   uint16_t        diag_code;
   uint16_t        m_key_lease_period;
   uint8_t           local_port_num;
   uint8_t           link_width_enabled;
   uint8_t           link_width_supported;
   uint8_t           link_width_active;
   uint8_t           state_info1; /* LinkSpeedSupported and PortState */
   uint8_t           state_info2; /* PortPhysState and LinkDownDefaultState */
   uint8_t           mkey_lmc;
   uint8_t           link_speed;  /* LinkSpeedEnabled and LinkSpeedActive */
   uint8_t           mtu_smsl;
   uint8_t           vl_cap;      /* VLCap and InitType */
   uint8_t           vl_high_limit;
   uint8_t           vl_arb_high_cap;
   uint8_t           vl_arb_low_cap;
   uint8_t           mtu_cap;
   uint8_t           vl_stall_life;
   uint8_t           vl_enforce;
   uint16_t        m_key_violations;
   uint16_t        p_key_violations;
   uint16_t        q_key_violations;
   uint8_t           guid_cap;
   uint8_t           subnet_timeout; /* cli_rereg(1b), resrv(2b), timeout(5b) */
   uint8_t           resp_time_value;
   uint8_t           error_threshold;

};


// ---- Prototypes

int routing(struct drsmp* smp, struct umad_ca* Devices_Info, int Devices_cnt);


int set_bit(int nr, void *method_mask);

char *drmad_status_str(struct drsmp *drsmp);

void dump_dr_smp(const struct drsmp* const p_smp);

############################## helper.c ##############################

// ---- Include ---------------------------------

#include "sender.h"

// ---- Helper functions ------------------------

const char* sm_method_str[] =
{
  "RESERVED0",              /* 0 */
  "SubnGet",              /* 1 */
  "SubnSet",              /* 2 */
  "RESERVED3",               /* 3 */
  "RESERVED4",               /* 4 */
  "SubnTrap",                /* 5 */
  "RESERVED6",               /* 6 */
  "SubnTrapRepress",         /* 7 */
  "RESERVED8",               /* 8 */
  "RESERVED9",               /* 9 */
  "RESERVEDA",               /* A */
  "RESERVEDB",               /* B */
  "RESERVEDC",               /* C */
  "RESERVEDD",               /* D */
  "RESERVEDE",               /* E */
  "RESERVEDF",               /* F */
  "RESERVED10",              /* 10 */
  "SubnGetResp",             /* 11 */
  "RESERVED12",           /* 12 */
  "RESERVED13",           /* 13 */
  "RESERVED14",           /* 14 */
  "RESERVED15",           /* 15 */
  "RESERVED16",              /* 16 */
  "RESERVED17",           /* 17 */
  "RESERVED18",           /* 18 */
  "RESERVED19",           /* 19 */
  "RESERVED1A",           /* 1A */
  "RESERVED1B",           /* 1B */
  "RESERVED1C",           /* 1C */
  "RESERVED1D",           /* 1D */
  "RESERVED1E",             /* 1E */
  "RESERVED1F",             /* 1F */
  "UNKNOWN"                  /* 20 */
};

const char* node_type_str[] =
{
   "UNKNOWN",
   "Channel Adapter",
   "Switch",
   "Router",
   "Subnet Management"
};

const char* sm_attr_str[] =
{
  "RESERVED",                  /* 0 */
  "ClassPortInfo",             /* 1 */
  "Notice",                    /* 2 */
  "InformInfo",                /* 3 */
  "RESERVED",                  /* 4 */
  "RESERVED",                  /* 5 */
  "RESERVED",                  /* 6 */
  "RESERVED",                  /* 7 */
  "RESERVED",                  /* 8 */
  "RESERVED",                  /* 9 */
  "RESERVED",                  /* A */
  "RESERVED",                  /* B */
  "RESERVED",                  /* C */
  "RESERVED",                  /* D */
  "RESERVED",                  /* E */
  "RESERVED",                  /* F */
  "NodeDescription",           /* 10 */
  "NodeInfo",                  /* 11 */
  "SwitchInfo",                /* 12 */
  "UNKNOWN",                   /* 13 */
  "GUIDInfo",                  /* 14 */
  "PortInfo",                  /* 15 */
  "P_KeyTable",                /* 16 */
  "SLtoVLMappingTable",        /* 17 */
  "VLArbitrationTable",        /* 18 */
  "LinearForwardingTable",     /* 19 */
  "RandomForwardingTable",     /* 1A */
  "MulticastForwardingTable",  /* 1B */
  "UNKNOWN",                   /* 1C */
  "UNKNOWN",                   /* 1D */
  "UNKNOWN",                   /* 1E */
  "UNKNOWN",                   /* 1F */
  "SMInfo",                    /* 20 */
  "UNKNOWN"                    /* 21 - always highest value */
};


const char* sa_attr_str[] =
{
  "RESERVED",                  /* 0 */
  "ClassPortInfo",             /* 1 */
  "Notice",                    /* 2 */
  "InformInfo",                /* 3 */
  "RESERVED",                  /* 4 */
  "RESERVED",                  /* 5 */
  "RESERVED",                  /* 6 */
  "RESERVED",                  /* 7 */
  "RESERVED",                  /* 8 */
  "RESERVED",                  /* 9 */
  "RESERVED",                  /* A */
  "RESERVED",                  /* B */
  "RESERVED",                  /* C */
  "RESERVED",                  /* D */
  "RESERVED",                  /* E */
  "RESERVED",                  /* F */
  "RESERVED",                  /* 10 */
  "NodeRecord",                /* 11 */
  "PortInfoRecord",            /* 12 */
  "SLtoVLMappingTableRecord",  /* 13 */
  "SwitchInfoRecord",          /* 14 */
  "LinearForwardingTableRecord", /* 15 */
  "RandomForwardingTableRecord", /* 16 */
  "MulticastForwardingTableRecord",  /* 17 */
  "SMInfoRecord",              /* 18 */
  "RESERVED",                  /* 19 */
  "RandomForwardingTable",     /* 1A */
  "MulticastForwardingTable",  /* 1B */
  "UNKNOWN",                   /* 1C */
  "UNKNOWN",                   /* 1D */
  "UNKNOWN",                   /* 1E */
  "UNKNOWN",                   /* 1F */
  "LinkRecord",                /* 20 */
  "UNKNOWN",                   /* 21 */
  "UNKNOWN",                   /* 22 */
  "UNKNOWN",                   /* 23 */
  "UNKNOWN",                   /* 24 */
  "UNKNOWN",                   /* 25 */
  "UNKNOWN",                   /* 26 */
  "UNKNOWN",                   /* 27 */
  "UNKNOWN",                   /* 28 */
  "UNKNOWN",                   /* 29 */
  "UNKNOWN",                   /* 2A */
  "UNKNOWN",                   /* 2B */
  "UNKNOWN",                   /* 2C */
  "UNKNOWN",                   /* 2D */
  "UNKNOWN",                   /* 2E */
  "UNKNOWN",                   /* 2F */
  "GuidInfoRecord",            /* 30 */
  "ServiceRecord",             /* 31 */
  "UNKNOWN",                   /* 32 */
  "P_KeyTableRecord",          /* 33 */
  "UNKNOWN",                   /* 34 */
  "PathRecord",                /* 35 */
  "VLArbitrationTableRecord",  /* 36 */
  "UNKNOWN",                   /* 37 */
  "MCMemberRecord",            /* 38 */
  "TraceRecord",               /* 39 */
  "MultiPathRecord",           /* 3A */
  "ServiceAssociationRecord",  /* 3B */
  "UNKNOWN",                   /* 3C */
  "UNKNOWN",                   /* 3D */
  "UNKNOWN",                   /* 3E */
  "UNKNOWN",                   /* 3F */
  "UNKNOWN",                   /* 40 */
  "UNKNOWN",                   /* 41 */
  "UNKNOWN",                   /* 42 */
  "UNKNOWN",                   /* 43 */
  "UNKNOWN",                   /* 44 */
  "UNKNOWN",                   /* 45 */
  "UNKNOWN",                   /* 46 */
  "UNKNOWN",                   /* 47 */
  "UNKNOWN",                   /* 48 */
  "UNKNOWN",                   /* 49 */
  "UNKNOWN",                   /* 4A */
  "UNKNOWN",                   /* 4B */
  "UNKNOWN",                   /* 4C */
  "UNKNOWN",                   /* 4D */
  "UNKNOWN",                   /* 4E */
  "UNKNOWN",                   /* 4F */
  "UNKNOWN",                   /* 50 */
  "UNKNOWN",                   /* 51 */
  "UNKNOWN",                   /* 52 */
  "UNKNOWN",                   /* 53 */
  "UNKNOWN",                   /* 54 */
  "UNKNOWN",                   /* 55 */
  "UNKNOWN",                   /* 56 */
  "UNKNOWN",                   /* 57 */
  "UNKNOWN",                   /* 58 */
  "UNKNOWN",                   /* 59 */
  "UNKNOWN",                   /* 5A */
  "UNKNOWN",                   /* 5B */
  "UNKNOWN",                   /* 5C */
  "UNKNOWN",                   /* 5D */
  "UNKNOWN",                   /* 5E */
  "UNKNOWN",                   /* 5F */
  "UNKNOWN",                   /* 60 */
  "UNKNOWN",                   /* 61 */
  "UNKNOWN",                   /* 62 */
  "UNKNOWN",                   /* 63 */
  "UNKNOWN",                   /* 64 */
  "UNKNOWN",                   /* 65 */
  "UNKNOWN",                   /* 66 */
  "UNKNOWN",                   /* 67 */
  "UNKNOWN",                   /* 68 */
  "UNKNOWN",                   /* 69 */
  "UNKNOWN",                   /* 6A */
  "UNKNOWN",                   /* 6B */
  "UNKNOWN",                   /* 6C */
  "UNKNOWN",                   /* 6D */
  "UNKNOWN",                   /* 6E */
  "UNKNOWN",                   /* 6F */
  "UNKNOWN",                   /* 70 */
  "UNKNOWN",                   /* 71 */
  "UNKNOWN",                   /* 72 */
  "UNKNOWN",                   /* 73 */
  "UNKNOWN",                   /* 74 */
  "UNKNOWN",                   /* 75 */
  "UNKNOWN",                   /* 76 */
  "UNKNOWN",                   /* 77 */
  "UNKNOWN",                   /* 78 */
  "UNKNOWN",                   /* 79 */
  "UNKNOWN",                   /* 7A */
  "UNKNOWN",                   /* 7B */
   "UNKNOWN",                   /* 7C */
  "UNKNOWN",                   /* 7D */
  "UNKNOWN",                   /* 7E */
  "UNKNOWN",                   /* 7F */
  "UNKNOWN",                   /* 80 */
  "UNKNOWN",                   /* 81 */
  "UNKNOWN",                   /* 82 */
  "UNKNOWN",                   /* 83 */
  "UNKNOWN",                   /* 84 */
  "UNKNOWN",                   /* 85 */
  "UNKNOWN",                   /* 86 */
  "UNKNOWN",                   /* 87 */
  "UNKNOWN",                   /* 88 */
  "UNKNOWN",                   /* 89 */
  "UNKNOWN",                   /* 8A */
  "UNKNOWN",                   /* 8B */
  "UNKNOWN",                   /* 8C */
  "UNKNOWN",                   /* 8D */
  "UNKNOWN",                   /* 8E */
  "UNKNOWN",                   /* 8F */
  "UNKNOWN",                   /* 90 */
  "UNKNOWN",                   /* 91 */
  "UNKNOWN",                   /* 92 */
  "UNKNOWN",                   /* 93 */
  "UNKNOWN",                   /* 94 */
  "UNKNOWN",                   /* 95 */
  "UNKNOWN",                   /* 96 */
  "UNKNOWN",                   /* 97 */
  "UNKNOWN",                   /* 98 */
  "UNKNOWN",                   /* 99 */
  "UNKNOWN",                   /* 9A */
  "UNKNOWN",                   /* 9B */
  "UNKNOWN",                   /* 9C */
  "UNKNOWN",                   /* 9D */
  "UNKNOWN",                   /* 9E */
  "UNKNOWN",                   /* 9F */
  "UNKNOWN",                   /* A0 */
  "UNKNOWN",                   /* A1 */
  "UNKNOWN",                   /* A2 */
  "UNKNOWN",                   /* A3 */
  "UNKNOWN",                   /* A4 */
  "UNKNOWN",                   /* A5 */
  "UNKNOWN",                   /* A6 */
  "UNKNOWN",                   /* A7 */
  "UNKNOWN",                   /* A8 */
  "UNKNOWN",                   /* A9 */
  "UNKNOWN",                   /* AA */
  "UNKNOWN",                   /* AB */
  "UNKNOWN",                   /* AC */
  "UNKNOWN",                   /* AD */
  "UNKNOWN",                   /* AE */
  "UNKNOWN",                   /* AF */
  "UNKNOWN",                   /* B0 */
  "UNKNOWN",                   /* B1 */
  "UNKNOWN",                   /* B2 */
  "UNKNOWN",                   /* B3 */
  "UNKNOWN",                   /* B4 */
  "UNKNOWN",                   /* B5 */
  "UNKNOWN",                   /* B6 */
  "UNKNOWN",                   /* B7 */
  "UNKNOWN",                   /* B8 */
  "UNKNOWN",                   /* B9 */
  "UNKNOWN",                   /* BA */
 "UNKNOWN",                   /* BB */
  "UNKNOWN",                   /* BC */
  "UNKNOWN",                   /* BD */
  "UNKNOWN",                   /* BE */
  "UNKNOWN",                   /* BF */
  "UNKNOWN",                   /* C0 */
  "UNKNOWN",                   /* C1 */
  "UNKNOWN",                   /* C2 */
  "UNKNOWN",                   /* C3 */
  "UNKNOWN",                   /* C4 */
  "UNKNOWN",                   /* C5 */
  "UNKNOWN",                   /* C6 */
  "UNKNOWN",                   /* C7 */
  "UNKNOWN",                   /* C8 */
  "UNKNOWN",                   /* C9 */
  "UNKNOWN",                   /* CA */
  "UNKNOWN",                   /* CB */
  "UNKNOWN",                   /* CC */
  "UNKNOWN",                   /* CD */
  "UNKNOWN",                   /* CE */
  "UNKNOWN",                   /* CF */
  "UNKNOWN",                   /* D0 */
  "UNKNOWN",                   /* D1 */
  "UNKNOWN",                   /* D2 */
  "UNKNOWN",                   /* D3 */
  "UNKNOWN",                   /* D4 */
  "UNKNOWN",                   /* D5 */
  "UNKNOWN",                   /* D6 */
  "UNKNOWN",                   /* D7 */
  "UNKNOWN",                   /* D8 */
  "UNKNOWN",                   /* D9 */
  "UNKNOWN",                   /* DA */
  "UNKNOWN",                   /* DB */
  "UNKNOWN",                   /* DC */
  "UNKNOWN",                   /* DD */
  "UNKNOWN",                   /* DE */
  "UNKNOWN",                   /* DF */
  "UNKNOWN",                   /* E0 */
  "UNKNOWN",                   /* E1 */
  "UNKNOWN",                   /* E2 */
  "UNKNOWN",                   /* E3 */
  "UNKNOWN",                   /* E4 */
  "UNKNOWN",                   /* E5 */
  "UNKNOWN",                   /* E6 */
  "UNKNOWN",                   /* E7 */
  "UNKNOWN",                   /* E8 */
  "UNKNOWN",                   /* E9 */
  "UNKNOWN",                   /* EA */
  "UNKNOWN",                   /* EB */
  "UNKNOWN",                   /* EC */
  "UNKNOWN",                   /* ED */
  "UNKNOWN",                   /* EE */
  "UNKNOWN",                   /* EF */
  "UNKNOWN",                   /* F0 */
  "UNKNOWN",                   /* F1 */
  "UNKNOWN",                   /* F2 */
  "InformInfoRecord",          /* F3 */
  "UNKNOWN"                    /* F4 - always highest value */
};


const char* port_state_str[] =
{
   "No State Change (NOP)",
   "DOWN",
   "INIT",
   "ARMED",
   "ACTIVE",
   "ACTDEFER",
   "UNKNOWN"
};


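int set_bit(int nr, void *method_mask)
{
   long mask, *addr = method_mask;
   int retval;

   /* Index by the real width of long; a fixed 32-bit split (nr >> 5,
      nr & 0x1f) picks the wrong word where long is 64 bits wide. */
   addr += nr / (8 * sizeof(long));
   mask = 1L << (nr % (8 * sizeof(long)));
   retval = (mask & *addr) != 0;
   *addr |= mask;
   return retval;
}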

char * drmad_status_str(struct drsmp *drsmp)
{
   switch (drsmp->status) {
   case 0:
      return "success";
   case ETIMEDOUT:
      return "timeout";
   }
   return "unknown error";
}

const char* get_sm_method_str(uint8_t method )
{
  if (method & 0x80) method = (method & 0x0F) | 0x10;
  if( method >= SM_METHOD_STR_UNKNOWN_VAL  )
    method = SM_METHOD_STR_UNKNOWN_VAL;
  return( sm_method_str[method] );
}

uint16_t smp_get_status(uint16_t status )
{
   return( (uint16_t)(status & SMP_STATUS_MASK) );
}

uint8_t node_info_get_local_port_num(const struct node_info* const p_ni)
{
   return( (uint8_t)(( p_ni->port_num_vendor_id & NODE_INFO_PORT_NUM_MASK )
                     >> NODE_INFO_PORT_NUM_SHIFT ));
}


const char* get_node_type_str(uint32_t node_type)
{
   if( node_type >= IB_NOTICE_NODE_TYPE_ROUTER )
      node_type = 0;
   return( node_type_str[node_type] );
}

uint32_t node_info_get_vendor_id(const struct node_info* const p_ni )
{
   return( (uint32_t)( p_ni->port_num_vendor_id &
                       NODE_INFO_VEND_ID_MASK ) );
}

const char* get_sm_attr_str(uint16_t attr )
{
  uint16_t host_attr;
  host_attr = ntohs( attr );

  if( host_attr >= SM_ATTR_STR_UNKNOWN_VAL  )
    host_attr = SM_ATTR_STR_UNKNOWN_VAL;

  return( sm_attr_str[host_attr] );
}

uint8_t port_info_get_link_speed_sup(const struct port_info* const p_pi )
{
   return( (uint8_t)((p_pi->state_info1 & PORT_LINK_SPEED_SUPPORTED_MASK) >>
                     PORT_LINK_SPEED_SHIFT) );
}

const char* get_port_state_str(uint8_t port_state )
{
   if( port_state > LINK_ACTIVE )  port_state = LINK_ACTIVE + 1;
   return( port_state_str[port_state] );
}

uint8_t port_info_get_mpb(const struct port_info* const  p_pi )
{
   return( (uint8_t)((p_pi->mkey_lmc & PORT_MPB_MASK) >> PORT_MPB_SHIFT) );
}

uint8_t port_info_get_lmc(const struct port_info* const  p_pi )
{
   return( (uint8_t)(p_pi->mkey_lmc & PORT_LMC_MASK) );
}

uint8_t port_info_get_client_rereg(struct port_info const* p_pi )
{
  return ( (p_pi->subnet_timeout & 0x80 ) >> 7);
}

uint8_t port_info_get_timeout(struct port_info const*   p_pi )
{
  return(p_pi->subnet_timeout & 0x1F );
}


uint8_t port_info_get_port_state(const struct port_info* const p_pi )
{
   return( (uint8_t)(p_pi->state_info1 & PORT_STATE_MASK) );
}


void dump_dr_smp( const struct drsmp * const p_smp)
{
  uint32_t i;
  char buf[BUF_SIZE];
  char line[BUF_SIZE];

   sprintf( buf,
             "SMP dump:\n"
             "\t\t\t\tbase_ver................0x%X\n"
             "\t\t\t\tmgmt_class..............0x%X\n"
             "\t\t\t\tclass_ver...............0x%X\n"
             "\t\t\t\tmethod..................0x%X (%s)\n",
             p_smp->base_version,
             p_smp->mgmt_class,
             p_smp->class_version,
             p_smp->method, get_sm_method_str(p_smp->method));

   if (p_smp->mgmt_class == MCLASS_SUBN_DIR)
   {
      sprintf( line,
               "\t\t\t\tD bit...................0x%X\n"
               "\t\t\t\tstatus..................0x%X\n",
               (p_smp->status & DIRECTION) == DIRECTION,
               smp_get_status(p_smp->status) );
   }
   else
   {
      sprintf( line, "\t\t\t\tstatus..................0x%X\n",
               ntohs(p_smp->status) );
   }
   strcat( buf, line );

   sprintf( line,
             "\t\t\t\thop_ptr.................0x%X\n"
             "\t\t\t\thop_count...............0x%X\n"
             "\t\t\t\ttrans_id................0x%" PRIx64 "\n"
             "\t\t\t\tattr_id.................0x%X (%s)\n"
             "\t\t\t\tresv....................0x%X\n"
             "\t\t\t\tattr_mod................0x%X\n"
             "\t\t\t\tm_key...................0x%016" PRIx64 "\n",
             p_smp->hop_ptr,
             p_smp->hop_cnt,
             own_ntoh64(p_smp->tid),
             ntohs(p_smp->attr_id),
             get_sm_attr_str(p_smp->attr_id),
             ntohs(p_smp->resv),
             ntohl(p_smp->attr_mod),
             own_ntoh64(p_smp->mkey)   /* 64-bit field, must match PRIx64 */
             );
   strcat( buf, line );

   if (p_smp->mgmt_class == MCLASS_SUBN_DIR)
   {
      sprintf( line,
               "\t\t\t\tdr_slid.................0x%X\n"
               "\t\t\t\tdr_dlid.................0x%X\n",
               ntohs(p_smp->dr_slid),
               ntohs(p_smp->dr_dlid)
               );
      strcat( buf, line );

      strcat( buf, "\n\t\t\t\tInitial path: " );

      for( i = 0; i <= p_smp->hop_cnt; i++ )
      {
        sprintf( line, "[%X]", p_smp->initial_path[i] );
        strcat( buf, line );
      }

      strcat( buf, "\n\t\t\t\tReturn path:  " );

      for( i = 0; i <= p_smp->hop_cnt; i++ )
      {
        sprintf( line, "[%X]", p_smp->return_path[i] );
        strcat( buf, line );
      }

      strcat( buf, "\n\t\t\t\tReserved:     " );

      for( i = 0; i < 7; i++ )
      {
        sprintf( line, "[%0X]", p_smp->reserved[i] );
        strcat( buf, line );
      }

      strcat( buf, "\n" );

      for( i = 0; i < 64; i += 16 )
      {
        sprintf( line, "\n\t\t\t\t%02X %02X %02X %02X "
                 "%02X %02X %02X %02X"
                 "   %02X %02X %02X %02X %02X %02X %02X %02X\n",
                 p_smp->data[i],
                 p_smp->data[i+1],
                 p_smp->data[i+2],
                 p_smp->data[i+3],
                 p_smp->data[i+4],
                 p_smp->data[i+5],
                 p_smp->data[i+6],
                 p_smp->data[i+7],
                 p_smp->data[i+8],
                 p_smp->data[i+9],
                 p_smp->data[i+10],
                 p_smp->data[i+11],
                 p_smp->data[i+12],
                 p_smp->data[i+13],
                 p_smp->data[i+14],
                 p_smp->data[i+15] );

        strcat( buf, line );
      }
    }
    else
    {
      // not a Direct Route so provide source and destination lids
      strcat(buf, "\t\t\t\tMAD IS LID ROUTED\n");
    }
    printf("%s",buf);
}

void dump_node_info(const struct node_info* const p_ni)
{
 printf(   "NodeInfo dump:\n"
             "\t\t\t\tbase_version............0x%X\n"
             "\t\t\t\tclass_version...........0x%X\n"
             "\t\t\t\tnode_type...............%s\n"
             "\t\t\t\tnum_ports...............0x%X\n"
             "\t\t\t\tsys_guid................0x%016" PRIx64 "\n"
             "\t\t\t\tnode_guid...............0x%016" PRIx64 "\n"
             "\t\t\t\tport_guid...............0x%016" PRIx64 "\n"
             "\t\t\t\tpartition_cap...........0x%X\n"
             "\t\t\t\tdevice_id...............0x%X\n"
             "\t\t\t\trevision................0x%X\n"
             "\t\t\t\tport_num................0x%X\n"
             "\t\t\t\tvendor_id...............0x%X\n"
             "",
             p_ni->base_version,
             p_ni->class_version,
             get_node_type_str( p_ni->node_type ),
             p_ni->num_ports,
             own_ntoh64_2(p_ni->sys_guid),
             own_ntoh64_2( p_ni->node_guid ),
             own_ntoh64_2( p_ni->port_guid ),
             ntohs( p_ni->partition_cap ),
             ntohs( p_ni->device_id ),
             ntohl( p_ni->revision ),
             node_info_get_local_port_num( p_ni ),
             ntohl( node_info_get_vendor_id( p_ni ) )
             );

}

void dump_port_info(const uint64_t node_guid, const uint64_t port_guid,
                    const uint8_t port_num, const struct port_info* const p_pi)
{
  printf(
             "PortInfo dump:\n"
             "\t\t\t\tport number.............0x%X\n"
             "\t\t\t\tnode_guid...............0x%016" PRIx64 "\n"
             "\t\t\t\tport_guid...............0x%016" PRIx64 "\n"
             "\t\t\t\tm_key...................0x%016" PRIx64 "\n"
             "\t\t\t\tsubnet_prefix...........0x%016" PRIx64 "\n"
             "\t\t\t\tbase_lid................0x%X\n"
             "\t\t\t\tmaster_sm_base_lid......0x%X\n"
             "\t\t\t\tcapability_mask.........0x%X\n"
             "\t\t\t\tdiag_code...............0x%X\n"
             "\t\t\t\tm_key_lease_period......0x%X\n"
             "\t\t\t\tlocal_port_num..........0x%X\n"
             "\t\t\t\tlink_width_enabled......0x%X\n"
             "\t\t\t\tlink_width_supported....0x%X\n"
             "\t\t\t\tlink_width_active.......0x%X\n"
             "\t\t\t\tlink_speed_supported....0x%X\n"
             "\t\t\t\tport_state..............%s\n"
             "\t\t\t\tstate_info2.............0x%X\n"
             "\t\t\t\tm_key_protect_bits......0x%X\n"
             "\t\t\t\tlmc.....................0x%X\n"
             "\t\t\t\tlink_speed..............0x%X\n"
             "\t\t\t\tmtu_smsl................0x%X\n"
             "\t\t\t\tvl_cap_init_type........0x%X\n"
             "\t\t\t\tvl_high_limit...........0x%X\n"
             "\t\t\t\tvl_arb_high_cap.........0x%X\n"
             "\t\t\t\tvl_arb_low_cap..........0x%X\n"
             "\t\t\t\tinit_rep_mtu_cap........0x%X\n"
             "\t\t\t\tvl_stall_life...........0x%X\n"
             "\t\t\t\tvl_enforce..............0x%X\n"
             "\t\t\t\tm_key_violations........0x%X\n"
             "\t\t\t\tp_key_violations........0x%X\n"
             "\t\t\t\tq_key_violations........0x%X\n"
             "\t\t\t\tguid_cap................0x%X\n"
             "\t\t\t\tclient_reregister.......0x%X\n"
             "\t\t\t\tsubnet_timeout..........0x%X\n"
             "\t\t\t\tresp_time_value.........0x%X\n"
             "\t\t\t\terror_threshold.........0x%X\n"
             "",
             port_num,
             own_ntoh64( node_guid ),
             own_ntoh64( port_guid ),
             own_ntoh64( p_pi->m_key ),
             own_ntoh64( p_pi->subnet_prefix ),
             ntohs( p_pi->base_lid ),
             ntohs( p_pi->master_sm_base_lid ),
             ntohl( p_pi->capability_mask ),
             ntohs( p_pi->diag_code ),
             ntohs( p_pi->m_key_lease_period ),
             p_pi->local_port_num,
             p_pi->link_width_enabled,
             p_pi->link_width_supported,
             p_pi->link_width_active,
             port_info_get_link_speed_sup( p_pi ),
             get_port_state_str( port_info_get_port_state( p_pi ) ),
             p_pi->state_info2,
             port_info_get_mpb( p_pi ),
             port_info_get_lmc( p_pi ),
             p_pi->link_speed,
             p_pi->mtu_smsl,
             p_pi->vl_cap,
             p_pi->vl_high_limit,
             p_pi->vl_arb_high_cap,
             p_pi->vl_arb_low_cap,
             p_pi->mtu_cap,
             p_pi->vl_stall_life,
             p_pi->vl_enforce,
             ntohs( p_pi->m_key_violations ),
             ntohs( p_pi->p_key_violations ),
             ntohs( p_pi->q_key_violations ),
             p_pi->guid_cap,
             port_info_get_client_rereg( p_pi ),
             port_info_get_timeout( p_pi ),
             p_pi->resp_time_value,
             p_pi->error_threshold
             );

}
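
A minimal stand-alone sketch of how the packed subnet_timeout byte splits into
its fields, mirroring port_info_get_client_rereg() and port_info_get_timeout()
above. The struct pi_sketch type and the sketch_* names are made up for
illustration and are not part of sender.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the single struct port_info field used here. */
struct pi_sketch {
    uint8_t subnet_timeout; /* cli_rereg(1b), resrv(2b), timeout(5b) */
};

/* Bit 7 holds the client reregistration flag. */
static uint8_t sketch_get_client_rereg(const struct pi_sketch *p_pi)
{
    assert(p_pi != NULL);
    return (uint8_t)((p_pi->subnet_timeout & 0x80) >> 7);
}

/* Bits 4..0 hold the subnet timeout value; bits 6..5 are reserved. */
static uint8_t sketch_get_timeout(const struct pi_sketch *p_pi)
{
    assert(p_pi != NULL);
    return (uint8_t)(p_pi->subnet_timeout & 0x1F);
}
```

For example, a raw byte of 0x92 (binary 1 00 10010) decodes to
client_reregister 1 and subnet_timeout 0x12.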


From mshefty at ichips.intel.com  Fri Feb  9 15:57:59 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 09 Feb 2007 15:57:59 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <1171057719.31538.187820.camel@hal.voltaire.com>
References: <45CA3568.1000508@ichips.intel.com>
	<20070207213108.GD11411@obsidianresearch.com>
	<45CA5573.80802@ichips.intel.com>
	<20070207224928.GF11411@obsidianresearch.com>
	<1170894459.31538.23768.camel@hal.voltaire.com>
	<45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com>
	<1171023168.31538.153989.camel@hal.voltaire.com>
	<45CCADC7.5000804@ichips.intel.com>
	<1171043929.31538.174521.camel@hal.voltaire.com>
	<20070209192046.GP11411@obsidianresearch.com>
	<45CCD1E2.5050806@ichips.intel.com>
	<1171051315.31538.181667.camel@hal.voltaire.com>
	<45CCDAE0.1080102@ichips.intel.com>
	<1171057719.31538.187820.camel@hal.voltaire.com>
Message-ID: <45CD0A87.9010800@ichips.intel.com>

> The hard part is the global distribution of this information.

The best idea I can come up with for locating remote SAs is to have each SA 
assign itself a specific Unicast Global GID Assigned Value.  So, each SA 
gives itself a GID similar to: 64-bit subnet prefix :: 1.  Hosts on remote 
subnets can then direct requests to the SAs on the remote subnet.
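
A rough sketch of what such a GID would look like in memory, assuming the usual
128-bit layout with the subnet prefix in the upper 64 bits and the value 1 in
the lower 64 bits, both big-endian. The struct sketch_gid type and the function
names are hypothetical and only for illustration; nothing here is an existing
API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 128-bit GID container: raw bytes in network (big-endian) order. */
struct sketch_gid {
    uint8_t raw[16];
};

/* Store a 64-bit host-order value at dst in big-endian byte order. */
static void sketch_put_be64(uint8_t *dst, uint64_t v)
{
    int i;

    for (i = 7; i >= 0; i--) {
        dst[i] = (uint8_t)(v & 0xFF);
        v >>= 8;
    }
}

/* Build "<subnet prefix>::1": upper 64 bits = prefix, lower 64 bits = 1. */
static void sketch_make_sa_gid(struct sketch_gid *gid, uint64_t subnet_prefix)
{
    assert(gid != NULL);
    sketch_put_be64(&gid->raw[0], subnet_prefix);
    sketch_put_be64(&gid->raw[8], 1);
}
```

With a prefix of 0xFE80000000000000 (used here only as an example value) this
gives fe80:0000:0000:0000:0000:0000:0000:0001.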

SA failover would need to take this GID with them...

- Sean


From jgunthorpe at obsidianresearch.com  Fri Feb  9 16:48:20 2007
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Fri, 9 Feb 2007 17:48:20 -0700
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <45CCFEDC.3040700@ichips.intel.com>
References: <1170894459.31538.23768.camel@hal.voltaire.com>
	<45CB6A8F.2030705@ichips.intel.com> <45CBB59C.4010709@ichips.intel.com>
	<1171023168.31538.153989.camel@hal.voltaire.com>
	<45CCADC7.5000804@ichips.intel.com>
	<1171043929.31538.174521.camel@hal.voltaire.com>
	<20070209192046.GP11411@obsidianresearch.com>
	<1171057501.31538.187596.camel@hal.voltaire.com>
	<20070209223845.GR11411@obsidianresearch.com>
	<45CCFEDC.3040700@ichips.intel.com>
Message-ID: <20070210004820.GS11411@obsidianresearch.com>

On Fri, Feb 09, 2007 at 03:08:12PM -0800, Sean Hefty wrote:

> The route itself is determined using the SGID, DGID, TClass, FlowLabel.  
> So, as long as the two queries match on these fields, I would think that it 
> would work.

So basically what you are saying is that the TClass and FlowLabel act
as some kind of global disambiguation that lets all SAs know that the
tuple <SGID,DGID,TClass,FlowLabel> MUST be matched with <LRH_A,LRH_B>
on each side.

I can see how this can work, but I think it has big implications, like
global SA database sharing, maybe larger router tables, or limited
router multipath, etc. [1]

I've been thinking that the <SGID,DGID,TClass,FlowLabel> tuple would
only reflect 2 of the 4 LIDs (i.e., the ones the router chooses on entry
to the final subnet).

I personally can't see anything discussed so far as a slam dunk answer
to this broader problem.

The very simple reversible-paths-only solution still seems best to me
only because it involves the least work and only requires that IBA
specify routed reversible paths.

The only missing bit is a way to signal that the target should have
this behavior in the REQ message. Perhaps something like setting the
DLID in the REQ to 0xFFFF?
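
One possible reading of that idea, as a sketch only: the passive side checks
the DLID carried in the REQ, and if it is 0xFFFF it answers along the reverse
of the path the REQ arrived on instead of using the LIDs from the REQ. The
struct, the macro and the function name are hypothetical; this is not existing
CM behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Marker value floated above; treated here purely as an assumption. */
#define SKETCH_REVERSE_PATH_DLID 0xFFFFu

/* Hypothetical SLID/DLID pair as seen in a local route header. */
struct sketch_lids {
    uint16_t slid;
    uint16_t dlid;
};

/*
 * Choose the LIDs for the reply direction.
 * req_slid/req_dlid: LIDs carried in the REQ payload.
 * rx: SLID/DLID from the LRH of the received REQ packet.
 */
static struct sketch_lids sketch_reply_lids(uint16_t req_slid, uint16_t req_dlid,
                                            const struct sketch_lids *rx)
{
    struct sketch_lids out;

    assert(rx != NULL);
    if (req_dlid == SKETCH_REVERSE_PATH_DLID) {
        /* Reversible path requested: swap the LIDs the REQ arrived with. */
        out.slid = rx->dlid;
        out.dlid = rx->slid;
    } else {
        /* Otherwise reply from the REQ's DLID back to the REQ's SLID. */
        out.slid = req_dlid;
        out.dlid = req_slid;
    }
    return out;
}
```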

Jason

[1] - Interestingly, with this scheme the first PR query must select
all 4 LIDs (although it may not know what they are...). The PR itself
would return the first two local LIDs, and those would also have to be
encoded in the GRH. The 2nd remote PR would recover those LIDs from
the GRH to build the return GRH. Since routers route based on the GRH,
every GRH encodes the destination LIDs too!


From sean.hefty at intel.com  Fri Feb  9 18:08:34 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Fri, 9 Feb 2007 18:08:34 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <20070210004820.GS11411@obsidianresearch.com>
Message-ID: <000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com>

>So basically what you are saying is that the TClass and FlowLabel act
>as some kind of global disambiguation that lets all SAs know that the
>tuple <SGID,DGID,TClass,FlowLabel> MUST be matched with <LRH_A,LRH_B>
>on each side.

Sort of...  My reasoning is that if you look at a packet traveling from the
source QP to the destination QP, and examine the packet in some intermediate
subnet (say between two routers), then the only information that it carries is
the <SGID, DGID, TClass, FlowLabel> tuple.  This information must be sufficient
to direct the routing at the endpoints.

I don't see that all SAs need to know this information.  An SA would:

1. Given local and remote GIDs, know which router the packet will arrive on.
2. Know the SLID/DLID mapping used by that router.

It shouldn't need information about the paths used by packets on the remote
subnet.  If a subnet has multiple routers into it, they can forward packets to
the correct router if needed.  (Could the routers just forward to the end node
and insert the expected SLID?)
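
To make the two requirements above concrete, here is a small sketch of the
kind of table an SA might keep: each entry maps a <SGID, DGID, TClass,
FlowLabel> tuple to the SLID/DLID pair the entry router would use on the local
subnet. All type and function names are hypothetical, and a linear search is
used only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The only routing information a packet carries between subnets. */
struct sketch_route_key {
    uint8_t  sgid[16];
    uint8_t  dgid[16];
    uint8_t  tclass;
    uint32_t flow_label;   /* only the low 20 bits are meaningful */
};

/* What the SA would need to know for that tuple on its own subnet. */
struct sketch_route_entry {
    struct sketch_route_key key;
    uint16_t slid;         /* LID of the router port the packet enters on */
    uint16_t dlid;         /* LID of the destination end node */
};

/* Compare field by field so struct padding never affects the result. */
static int sketch_key_equal(const struct sketch_route_key *a,
                            const struct sketch_route_key *b)
{
    return memcmp(a->sgid, b->sgid, sizeof a->sgid) == 0 &&
           memcmp(a->dgid, b->dgid, sizeof a->dgid) == 0 &&
           a->tclass == b->tclass &&
           a->flow_label == b->flow_label;
}

/* Return the matching entry, or NULL if the tuple is unknown. */
static const struct sketch_route_entry *
sketch_lookup(const struct sketch_route_entry *tbl, size_t n,
              const struct sketch_route_key *key)
{
    size_t i;

    assert(tbl != NULL && key != NULL);
    for (i = 0; i < n; i++)
        if (sketch_key_equal(&tbl[i].key, key))
            return &tbl[i];
    return NULL;
}
```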

If the path is reversible (with reversible defined relative to SLID/DLID that is
returned in the path record), then the active node would only need two SA
queries - one to each subnet.  For non-reversible paths, 4 queries may be
needed.

>I've been thinking that the <SGID,DGID,TClass,FlowLabel> tuple would
>only reflect 2 of the 4 LIDs (i.e., the ones the router chooses on entry
>to the final subnet).

This was my thinking as well, which is why I think 2 path record queries are
needed.  Each path would specify 2 of the 4 LIDs that we need.  The local path
record gives us the local QP information, and the remote path record is used to
fill in the SLID/DLID in the CM REQ.

>The very simple reversible-paths-only solution still seems best to me
>only because it involves the least work and only requires that IBA
>specify routed reversible paths.

I'm still trying to find a solution that doesn't violate the architecture as
defined.  I don't see why my idea wouldn't work yet.  It just requires some
unspecified coordination between the local SA and local routers.

- Sean


From vlad at lists.openfabrics.org  Sat Feb 10 02:23:54 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Sat, 10 Feb 2007 02:23:54 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070210-0200 daily build status
Message-ID: <20070210102354.64E09E60804@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod 

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.13
Passed on ppc64 with linux-2.6.18
Passed on powerpc with linux-2.6.19
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.15
Passed on ppc64 with linux-2.6.19
Passed on ia64 with linux-2.6.19
Passed on powerpc with linux-2.6.17
Passed on powerpc with linux-2.6.14
Passed on powerpc with linux-2.6.15
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.12
Passed on powerpc with linux-2.6.13
Passed on ia64 with linux-2.6.18
Passed on powerpc with linux-2.6.12
Passed on ppc64 with linux-2.6.13
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.17
Passed on ia64 with linux-2.6.13
Passed on ppc64 with linux-2.6.14
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.14

Failed:
Build failed on ia64 with linux-2.6.16.21-0.8-default
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070210-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:380: error: implicit declaration of function ‘register_netevent_notifier’
/home/vlad/tmp/ofa_1_2_kernel-20070210-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: In function ‘addr_cleanup’:
/home/vlad/tmp/ofa_1_2_kernel-20070210-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:386: error: implicit declaration of function ‘unregister_netevent_notifier’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070210-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070210-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070210-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070210-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------


From halr at voltaire.com  Sat Feb 10 07:49:15 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 10 Feb 2007 10:49:15 -0500
Subject: [openib-general] Unknown SMP Recv
In-Reply-To: <001001c74c87$8b653470$21606d86@one7>
References: <000901c74938$e10b2a30$21606d86@one7>
	<1170689654.4525.201415.camel@hal.voltaire.com>
	<001401c74946$a664a2e0$21606d86@one7>
	<1170695591.4525.207604.camel@hal.voltaire.com>
	<002001c74a33$c2ec1db0$21606d86@one7>
	<1170807564.4525.324195.camel@hal.voltaire.com>
	<001e01c74be2$b4889310$21606d86@one7>
	<1170994529.31538.124584.camel@hal.voltaire.com>
	<000401c74c6d$ce4875f0$21606d86@one7>
	<1171044773.31538.175280.camel@hal.voltaire.com>
	<000401c74c79$74439b50$21606d86@one7>
	<1171051141.2767.7.camel@localhost>
	<001001c74c87$8b653470$21606d86@one7>
Message-ID: <1171122546.31538.251673.camel@hal.voltaire.com>

On Fri, 2007-02-09 at 15:19, Michael Arndt wrote:
> Hi,
> 
> > It is strange, I did a similar thing (you can see it in
> > management/diags/src/mcm_rereg_test.c) and it worked fine for me.
> 
> What location is that?
> 
> >Which libibumad version are you using? Also, I understand you made some
> >changes in the stack; are they related to user_mad? Could you publish them?
> 
> I use OFED-1.1 and the attached libibumad version. The stack I tested with
> was not changed, to rule this out. It is a diploma thesis and I will
> publish it as soon as possible ;)...in German ...sorry.
> 
> The whole example code Hal was asking for is below.

Some comments interspersed below with my modified version which sends
the 10 SMPs.

-- Hal

>  I have marked the
> position with /* here */. Currently the retry parameter is zero, but I also
> tested 3.
> 
> Thanks Michael
> 
> // ---- Includes --------------------------------
> #include <infiniband/umad.h>
> #include <string.h>
> #include <errno.h>
> 
> #include "sender.h"
> 
> // ---- Defines and declarations ----------------
> 
>  static const uint8_t  CLASS_SUBN_DIRECTED_ROUTE = 0x81;
>  static const uint8_t  CLASS_SUBN_LID_ROUTE = 0x1;
> 
>  static int long drmad_tid = 0x123;
> 
>  // Prototypes
> 
>  void drsmp_get_init(void *umad, DRPath *path, uint16_t attr, int mod);
>  void drsmp_set_init(void *umad, DRPath *path, uint16_t attr, int mod, void 
> *data);
>  char * drmad_status_str(struct drsmp *drsmp);
>  int str2DRPath(char *str, DRPath *path);
>  int set_bit(int nr, void *method_mask);
> 
> 
> 
> // ---- Main ------------------------------------
> 
> int main (void){
> 
>  int Port_ID = 0;
>  int Agent_ID = 0;
>  int ret;
>  int i;
>  int length, timeout_ms = 10000;
> 
> 
>  void *umad;
>  struct drsmp *smp;
> 
> 
> // ---- Settings --------------------------------
>  int Portnummer = 1;
>  char Devicename [2][UMAD_CA_NAME_LEN];
>  DRPath Path;
>  char Path_Str[64];
> 
>  uint16_t attribute = MAD_ATTR_PORT_INFO; // PortInfo
>  int modifier = 1;
> 
>  struct _register_info{
>   int Management_Class;
>   int Management_Version;
>   uint8_t RMPP_Version;
>   uint32_t Method_Mask[4];
>  } Register_Info;
> 
>  // ++ Value assignment ++
> 
>  Register_Info.Management_Class = CLASS_SUBN_DIRECTED_ROUTE;
>  Register_Info.Management_Version = 1;
>  Register_Info.RMPP_Version = 0;
> 
>  set_bit(0x01,&(Register_Info.Method_Mask));
>  set_bit(0x02,&(Register_Info.Method_Mask));
>  set_bit(0x81,&(Register_Info.Method_Mask));

This sets a bit past the end of the method mask and overwrites whatever
follows it.
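
The overrun can be made concrete. Method_Mask is declared as uint32_t[4],
i.e. 128 bits, so only bit numbers 0 to 127 fit, and 0x81 is 129. Below is a
minimal sketch of a bounds-checked helper. It assumes that set_bit(nr, mask)
simply sets bit number nr counting from the start of the mask (the poster's
set_bit is not shown), and mask_set_bit and METHOD_MASK_BITS are
hypothetical names used only for illustration:

```c
#include <stdint.h>

#define METHOD_MASK_WORDS 4
#define METHOD_MASK_BITS  (METHOD_MASK_WORDS * 32)   /* 128 bits */

/*
 * Set bit nr in a 128-bit method mask stored as four 32-bit words.
 * Returns 0 on success, -1 if nr does not fit in the mask.
 */
int mask_set_bit(unsigned int nr, uint32_t mask[METHOD_MASK_WORDS])
{
    if (nr >= METHOD_MASK_BITS)
        return -1;                      /* e.g. 0x81 == 129 is rejected */
    mask[nr / 32] |= (uint32_t)1 << (nr % 32);
    return 0;
}
```

With a check like this, the call for 0x81 would fail visibly instead of
writing past the array.
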

>  set_bit(0x03,&(Register_Info.Method_Mask));
>  set_bit(0x05,&(Register_Info.Method_Mask));
>  set_bit(0x06,&(Register_Info.Method_Mask));

Several of these methods don't apply to the SM class.

Also, your umad_register call doesn't use this mask, so it is not needed
in that case. Are you trying to use solicited or unsolicited sends?
It is unclear to me what you really want.

>  sprintf(Path_Str,"0,1,1,1");
> 
> 
> // ---- Init Phase ------------------------------
>  printf("... Init Lib ...");
>  umad_init();
>  printf("done\n\n");
> 
>  // ++ Debug ++
>  umad_debug(0);
> 
>  printf("... Get CAs Names ...");
>  ret = umad_get_cas_names(Devicename,2);
>  if (!ret) {
>   printf("Fehler: umad_get_cas_names: %i\n",ret);
>   return -1;
>  }
>  else {
>   printf("done\n\n");
>   for (i = 0;i < ret;i++){
>    printf("Devicename: %s\n",Devicename[i]);
>   }
> 
>  }
>  // ++ Open ++
>  printf("... Open Port ...");
>  if ((Port_ID = umad_open_port(Devicename[0],Portnummer)) < 0)
>  {
>   printf("Fehler: umad_open_port: %i\n",Port_ID);
>   return -1;
>  }
>  else printf("done\n\n");
>  // ++ Register ++
>  printf("... Register User Mad ...");
>  if ((Agent_ID = umad_register(Port_ID,Register_Info.Management_Class,
>             Register_Info.Management_Version,
>             Register_Info.RMPP_Version,
>             0)) < 0){

See previous comment on this.

>   printf("Fehler: umad_register : %i\n",Agent_ID);
>   goto Exit;
>  }
>  else printf("done\n\n");
> // ---- Build packet ----------------------------
> 
>  printf("... Paket allokieren ...");
>  if (!(umad = umad_alloc(1, umad_size() + IB_MAD_SIZE))){
>   printf("Fehler: umad_alloc\n");
>   goto Exit;
>  }
>  printf("done\n\n");
> 
>  smp = umad_get_mad(umad);
>  printf("... Smp Pointer ... done\n");
> 
>  if ((str2DRPath(Path_Str, &Path)) < 0) printf("Fehler: str2DRPath\n");

I moved this up to where Path_Str was initially set. It wouldn't
actually send the packets multiple times without doing this. I didn't
investigate this further.
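
A possible explanation, judging only from the str2DRPath source quoted
further down: it overwrites each comma in its input with a NUL, so a second
call on the same Path_Str would see just "0" and return a hop count of 0.
Below is a minimal sketch of a parser that leaves its input untouched and
can therefore be called any number of times. dr_path_t, DR_MAX_HOPS and
parse_dr_path are hypothetical names, not part of sender.h or libibumad:

```c
#include <stdint.h>
#include <stdlib.h>

#define DR_MAX_HOPS 64

typedef struct {
    int     hop_cnt;               /* index of the last hop, -1 if empty */
    uint8_t path[DR_MAX_HOPS];
} dr_path_t;

/*
 * Parse a string such as "0,1,1,1" without modifying it.
 * Returns the index of the last path entry (3 for the example),
 * or -1 on empty or malformed input.
 */
int parse_dr_path(const char *str, dr_path_t *out)
{
    int n = 0;

    out->hop_cnt = -1;
    if (!str || !*str)
        return -1;

    for (;;) {
        char *end;
        long port = strtol(str, &end, 10);

        /* no digits, port out of range, or too many hops */
        if (end == str || port < 0 || port > 255 || n >= DR_MAX_HOPS)
            return -1;
        out->path[n++] = (uint8_t)port;

        if (*end == '\0')
            break;                 /* consumed the whole string */
        if (*end != ',')
            return -1;             /* unexpected separator */
        str = end + 1;
    }

    out->hop_cnt = n - 1;
    return out->hop_cnt;
}
```

Parsing "0,1,1,1" twice in a row gives a hop count of 3 both times, because
the string is never altered.
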

>  printf("... SMP bauen ...");
>  drsmp_get_init(umad,&Path,attribute,modifier);
>  printf("... done ...\n\n");
> 
> 
>  //xdump(stderr, "before send:\n", smp, 256);
>  dump_dr_smp(smp);

I got a segfault on this, so I commented it out.

>  length = IB_MAD_SIZE;
> 
> /* here */
>  for (i = 0; i < 10; i++){
>      printf("... Send Mad ...");
>        if ((ret = umad_send(Port_ID, Agent_ID, umad, length, 200, 0)) < 0)

The main problem is this:
You cannot reuse the same umad allocation for multiple umad_sends.
That's why you get the error. So I changed this.

Also, since you are not using solicited sends there is no need for the
timeout to be specified but that doesn't really matter.
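
The allocate-per-send pattern described above can be sketched without
InfiniBand hardware. In the sketch the transport is a stand-in: send_fn,
counting_send and send_each_in_fresh_buffer are hypothetical names, and
plain calloc/free take the place of the buffer management. In the real
program the corresponding libibumad calls would be umad_alloc(),
umad_send() and umad_free(). Freeing each buffer after its send is an
addition of this sketch, and it assumes the send call has copied the data
by the time it returns:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical transport hook: returns 0 on success, negative on error. */
typedef int (*send_fn)(void *buf, size_t len, void *ctx);

/* Stand-in transport for trying the sketch out: just counts the calls. */
int counting_send(void *buf, size_t len, void *ctx)
{
    (void)buf;
    (void)len;
    ++*(int *)ctx;
    return 0;
}

/*
 * Send `count` messages, each from its own freshly zeroed buffer.
 * Returns the number of successful sends, or -1 if an allocation fails.
 */
int send_each_in_fresh_buffer(int count, size_t len, send_fn send, void *ctx)
{
    int sent = 0;

    for (int i = 0; i < count; i++) {
        void *buf = calloc(1, len);    /* new buffer for every send */

        if (!buf)
            return -1;
        /* the real program would build the SMP into buf here */
        if (send(buf, len, ctx) == 0)
            sent++;
        free(buf);                     /* released once the send returned */
    }
    return sent;
}
```
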

>           printf("Fehler: umad_send : %i\n",ret);
>        else printf("done\n\n");
>  }
> 
> /*
>  for (i = 0; i < 10; i++){
>    printf("... Recv Mad ...");
>    if (umad_recv(Port_ID, umad, &length, timeout_ms) != Agent_ID)
>         printf("Fehler umad_recv: %s\n", drmad_status_str(smp));
>    else printf("done\n\n");
>  }
> */
> 
>  dump_dr_smp(smp);

I also got a segfault on this, so I commented it out as well.

>  switch (attribute){
>   case MAD_ATTR_NODE_INFO : dump_node_info((const struct 
> node_info*)&(smp->data[0])); break;
>   case MAD_ATTR_PORT_INFO : dump_port_info(0,0,0,(const struct 
> port_info*)&(smp->data[0])); break;
>  }

I also got a segfault on this, so I commented it out as well.

> 
> // ---- Down Phase ------------------------------ 
> Exit:
>  printf("... Unregister User Mad ...");
>  if (umad_unregister(Port_ID,Agent_ID) < 0)
>   printf("Fehler: umad_unregister\n");
>  else printf("done\n\n");
> 
>  printf("... Close Port ...");
>  if (Port_ID != -1)
>   if ((umad_close_port(Port_ID)) != 0){
>    printf("Fehler: umad_close_port\n");
>   }
>   else printf("done\n\n");
>  else printf("nix zu tun\n\n");
> 
> }
> 
> // ---- SMP packet ------------------------------
> 
> 
> void drsmp_get_init(void *umad, DRPath *path, uint16_t attr, int mod)
> {
>    struct drsmp *smp = (struct drsmp *)(umad_get_mad(umad));
> 
>    memset(smp, 0, sizeof (*smp));
> 
>    smp->base_version  = 1;
>    smp->mgmt_class    = CLASS_SUBN_DIRECTED_ROUTE;
>    smp->class_version = 1;
> 
>    smp->method        = 0x01;
>    smp->attr_id      = (uint16_t)htons((uint16_t)attr);
>    smp->attr_mod     = htonl(mod);
>    smp->tid           = htonll(drmad_tid++);
>    smp->dr_slid       = 0xffff;
>    smp->dr_dlid       = 0xffff;
> 
>    umad_set_addr(umad, 0xffff, 0, 0, 0);
> 
>    if (path)
>       memcpy(smp->initial_path, path->path, path->hop_cnt+1);
> 
>    smp->hop_cnt = path->hop_cnt;
> }
> 
> void drsmp_set_init(void *umad, DRPath *path, uint16_t attr, int mod, void 
> *data)
> {
>    struct drsmp *smp = (struct drsmp *)(umad_get_mad(umad));
> 
>    memset(smp, 0, sizeof (*smp));
> 
>    smp->method        = 2;    /* SET */
>    smp->attr_id      = (uint16_t)htons((uint16_t)attr);
>    smp->attr_mod     = htonl(mod);
>    smp->tid           = htonll(drmad_tid++);
>    smp->dr_slid       = 0xffff;
>    smp->dr_dlid       = 0xffff;
> 
>    umad_set_addr(umad, 0xffff, 0, 0, 0);
> 
>    if (path)
>       memcpy(smp->initial_path, path->path, path->hop_cnt+1);
> 
>    if (data)
>       memcpy(smp->data, data, sizeof smp->data);
> 
>    smp->hop_cnt = path->hop_cnt;
> }
> 
> int str2DRPath(char *str, DRPath *path)
> {
>    char *s;
> 
>    path->hop_cnt = -1;
> 
>    //DEBUG("DR str: %s", str);
>    while (str && *str) {
>       if ((s = strchr(str, ',')))
>          *s = 0;
>       path->path[++path->hop_cnt] = atoi(str);
>       if (!s)
>          break;
>       str = s+1;
>    }
> 
> #if 0
>    if (path->path[0] != 0 ||
>       (path->hop_cnt > 0 && dev_port && path->path[1] != dev_port)) {
>       DEBUG("hop 0 != 0 or hop 1 != dev_port");
>       return -1;
>    }
> #endif
> 
>    return path->hop_cnt;
> }
> 

Here's my modified version.

---
// ---- Includes --------------------------------
#include <infiniband/umad.h>
#include <string.h>
#include <errno.h>

#include "sender.h"

// ---- Defines and declarations ----------------

 static const uint8_t  CLASS_SUBN_DIRECTED_ROUTE = 0x81;
 static const uint8_t  CLASS_SUBN_LID_ROUTE = 0x1;

 static int long drmad_tid = 0x123;

 // Prototypes

 void drsmp_get_init(void *umad, DRPath *path, uint16_t attr, int mod);
 void drsmp_set_init(void *umad, DRPath *path, uint16_t attr, int mod, void 
*data);
 char * drmad_status_str(struct drsmp *drsmp);
 int str2DRPath(char *str, DRPath *path);
 int set_bit(int nr, void *method_mask);


// ---- Main ------------------------------------

int main (void){

 int Port_ID = 0;
 int Agent_ID = 0;
 int ret;
 int i;
 int length, timeout_ms = 10000;


 void *umad;
 struct drsmp *smp;


// ---- Settings --------------------------------
 int Portnummer = 1;
 char Devicename [2][UMAD_CA_NAME_LEN];
 DRPath Path;
 char Path_Str[64];

 uint16_t attribute = MAD_ATTR_PORT_INFO; // PortInfo
 int modifier = 1;

 struct _register_info{
  int Management_Class;
  int Management_Version;
  uint8_t RMPP_Version;
  uint32_t Method_Mask[4];
 } Register_Info;

 // ++ Value assignment ++

 Register_Info.Management_Class = CLASS_SUBN_DIRECTED_ROUTE;
 Register_Info.Management_Version = 1;
 Register_Info.RMPP_Version = 0;

 set_bit(0x01,&(Register_Info.Method_Mask));
 set_bit(0x02,&(Register_Info.Method_Mask));
#if 0
 set_bit(0x81,&(Register_Info.Method_Mask));
#endif
 set_bit(0x03,&(Register_Info.Method_Mask));
 set_bit(0x05,&(Register_Info.Method_Mask));
 set_bit(0x06,&(Register_Info.Method_Mask));

 sprintf(Path_Str,"0,1,1,1");
#if 1
if ((str2DRPath(Path_Str, &Path)) < 0) printf("Fehler: str2DRPath\n");
#endif

// ---- Init Phase ------------------------------
 printf("... Init Lib ...");
 umad_init();
 printf("done\n\n");

 // ++ Debug ++
 umad_debug(0);

 printf("... Get CAs Names ...");
 ret = umad_get_cas_names(Devicename,2);
 if (ret <= 0) { /* negative on error, 0 if no CA was found */
  printf("Fehler: umad_get_cas_names: %i\n",ret);
  return -1;
 }
 else {
  printf("done\n\n");
  for (i = 0;i < ret;i++){
   printf("Devicename: %s\n",Devicename[i]);
  }

 }
 // ++ Open ++
 printf("... Open Port ...");
 if ((Port_ID = umad_open_port(Devicename[0],Portnummer)) < 0)
 {
  printf("Fehler: umad_open_port: %i\n",Port_ID);
  return -1;
 }
 else printf("done\n\n");
 // ++ Register ++
 printf("... Register User Mad ...");
#if 1
 if ((Agent_ID = umad_register(Port_ID,Register_Info.Management_Class,
            Register_Info.Management_Version,
            Register_Info.RMPP_Version,
            0)) < 0){
#else
 if ((Agent_ID = umad_register(Port_ID,Register_Info.Management_Class,
            Register_Info.Management_Version,
            Register_Info.RMPP_Version,
            &(Register_Info.Method_Mask[0]))) < 0){
#endif
  printf("Fehler: umad_register : %i\n",Agent_ID);
  goto Exit;
 }
 else printf("done\n\n");
// ---- Build packet ----------------------------

#if 0
 printf("... Paket allokieren ...");
 if (!(umad = umad_alloc(1, umad_size() + IB_MAD_SIZE))){
  printf("Fehler: umad_alloc\n");
  goto Exit;
 }
 printf("done\n\n");

 smp = umad_get_mad(umad);
 printf("... Smp Pointer ... done\n");

 if ((str2DRPath(Path_Str, &Path)) < 0) printf("Fehler: str2DRPath\n");

 printf("... SMP bauen ...");
 drsmp_get_init(umad,&Path,attribute,modifier);
 printf("... done ...\n\n");
#endif

 //xdump(stderr, "before send:\n", smp, 256);
#if 0
 dump_dr_smp(smp);
#endif

 length = IB_MAD_SIZE;

/* here */
 for (i = 0; i < 10; i++){

#if 1
 printf("... Paket allokieren ...");
 if (!(umad = umad_alloc(1, umad_size() + IB_MAD_SIZE))){
  printf("Fehler: umad_alloc %p\n", umad);
  goto Exit;
 }
 printf("done\n\n");

 smp = umad_get_mad(umad);
 printf("... Smp Pointer ... done\n");

#if 0
 if ((str2DRPath(Path_Str, &Path)) < 0) printf("Fehler: str2DRPath\n");
#endif

 printf("... SMP bauen ...");
 drsmp_get_init(umad,&Path,attribute,modifier);
 printf("... done ...\n\n");
#endif

     printf("... Send Mad ...");
#if 0
      if ((ret = umad_send(Port_ID, Agent_ID, umad, length, 200, 0)) < 0)
#else
      if ((ret = umad_send(Port_ID, Agent_ID, umad, length, 0, 0)) < 0)
#endif
          printf("Fehler: umad_send : %i\n",ret);
       else printf("done\n\n");
 }

/*
 for (i = 0; i < 10; i++){
   printf("... Recv Mad ...");
   if (umad_recv(Port_ID, umad, &length, timeout_ms) != Agent_ID)
        printf("Fehler umad_recv: %s\n", drmad_status_str(smp));
   else printf("done\n\n");
 }
*/

#if 0
 dump_dr_smp(smp);

 switch (attribute){
  case MAD_ATTR_NODE_INFO : dump_node_info((const struct 
node_info*)&(smp->data[0])); break;
  case MAD_ATTR_PORT_INFO : dump_port_info(0,0,0,(const struct 
port_info*)&(smp->data[0])); break;
 }
#endif

// ---- Down Phase ------------------------------ 
Exit:
 printf("... Unregister User Mad ...");
 if (umad_unregister(Port_ID,Agent_ID) < 0)
  printf("Fehler: umad_unregister\n");
 else printf("done\n\n");

 printf("... Close Port ...");
 if (Port_ID != -1)
  if ((umad_close_port(Port_ID)) != 0){
   printf("Fehler: umad_close_port\n");
  }
  else printf("done\n\n");
 else printf("nix zu tun\n\n");

}

// ---- SMP packet ------------------------------


void drsmp_get_init(void *umad, DRPath *path, uint16_t attr, int mod)
{
   struct drsmp *smp = (struct drsmp *)(umad_get_mad(umad));

   memset(smp, 0, sizeof (*smp));

   smp->base_version  = 1;
   smp->mgmt_class    = CLASS_SUBN_DIRECTED_ROUTE;
   smp->class_version = 1;

   smp->method        = 0x01;
   smp->attr_id      = (uint16_t)htons((uint16_t)attr);
   smp->attr_mod     = htonl(mod);
   smp->tid           = htonll(drmad_tid++);
   smp->dr_slid       = 0xffff;
   smp->dr_dlid       = 0xffff;

   umad_set_addr(umad, 0xffff, 0, 0, 0);

   if (path)
      memcpy(smp->initial_path, path->path, path->hop_cnt+1);

   smp->hop_cnt = path->hop_cnt;
}

void drsmp_set_init(void *umad, DRPath *path, uint16_t attr, int mod, void 
*data)
{
   struct drsmp *smp = (struct drsmp *)(umad_get_mad(umad));

   memset(smp, 0, sizeof (*smp));

   smp->method        = 2;    /* SET */
   smp->attr_id      = (uint16_t)htons((uint16_t)attr);
   smp->attr_mod     = htonl(mod);
   smp->tid           = htonll(drmad_tid++);
   smp->dr_slid       = 0xffff;
   smp->dr_dlid       = 0xffff;

   umad_set_addr(umad, 0xffff, 0, 0, 0);

   if (path)
      memcpy(smp->initial_path, path->path, path->hop_cnt+1);

   if (data)
      memcpy(smp->data, data, sizeof smp->data);

   smp->hop_cnt = path->hop_cnt;
}

int str2DRPath(char *str, DRPath *path)
{
   char *s;

   path->hop_cnt = -1;

   //DEBUG("DR str: %s", str);
   while (str && *str) {
      if ((s = strchr(str, ',')))
         *s = 0;
      path->path[++path->hop_cnt] = atoi(str);
      if (!s)
         break;
      str = s+1;
   }

#if 0
   if (path->path[0] != 0 ||
      (path->hop_cnt > 0 && dev_port && path->path[1] != dev_port)) {
      DEBUG("hop 0 != 0 or hop 1 != dev_port");
      return -1;
   }
#endif

   return path->hop_cnt;
}


From mst at mellanox.co.il  Sat Feb 10 09:51:18 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sat, 10 Feb 2007 19:51:18 +0200
Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast
	support
In-Reply-To: <45CCB6DF.3020602@ichips.intel.com>
References: <000701c741a6$16dc4760$ff0da8c0@amr.corp.intel.com>
	<45BF8E17.2010805@ichips.intel.com> <45C37BE9.5040105@ichips.intel.com>
	<45C85B39.4080700@voltaire.com> <45CB3537.8060508@voltaire.com>
	<adaodo4nqz7.fsf@cisco.com> <20070209080418.GQ6560@mellanox.co.il>
	<45CCB6DF.3020602@ichips.intel.com>
Message-ID: <20070210175118.GX6560@mellanox.co.il>

> > +
> > 
> > It seems same goes for
> > 
> > +       mc = kzalloc(sizeof(*mc), GFP_KERNEL);
> > +       if (!mc)
> > +               return NULL;
> 
> We would need to set events_reported.

IMO, probably worth it to init just this one field rather than use up
initialized memory - and I think it's clearer.

-- 
MST


From rdreier at cisco.com  Sat Feb 10 10:32:08 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Sat, 10 Feb 2007 10:32:08 -0800
Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast
	support
In-Reply-To: <20070210175118.GX6560@mellanox.co.il> (Michael S.
	Tsirkin's message of "Sat, 10 Feb 2007 19:51:18 +0200")
References: <000701c741a6$16dc4760$ff0da8c0@amr.corp.intel.com>
	<45BF8E17.2010805@ichips.intel.com> <45C37BE9.5040105@ichips.intel.com>
	<45C85B39.4080700@voltaire.com> <45CB3537.8060508@voltaire.com>
	<adaodo4nqz7.fsf@cisco.com> <20070209080418.GQ6560@mellanox.co.il>
	<45CCB6DF.3020602@ichips.intel.com>
	<20070210175118.GX6560@mellanox.co.il>
Message-ID: <adabqk1hniv.fsf@cisco.com>

 > IMO, probably worth it to init just this one field rather than use up
 > initialized memory - and I think it's clearer.

What do you mean by using up initialized memory?  kzalloc() just does
a memset(0), and it's not like there's a limit on the number of times
we're allowed to call memset().

 - R.


From swise at opengridcomputing.com  Sat Feb 10 10:52:53 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Sat, 10 Feb 2007 12:52:53 -0600
Subject: [openib-general] [PATCH] for-2.6.21 Remove hw/cxgb3/core
	subdirectory.
Message-ID: <1171133573.11017.41.camel@stevo-desktop>

From: Steve Wise <swise at opengridcomputing.com>

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/Makefile             |    4 
 drivers/infiniband/hw/cxgb3/core/cxio_dbg.c      |  205 ----
 drivers/infiniband/hw/cxgb3/core/cxio_hal.c      | 1280 ----------------------
 drivers/infiniband/hw/cxgb3/core/cxio_hal.h      |  201 ---
 drivers/infiniband/hw/cxgb3/core/cxio_resource.c |  331 ------
 drivers/infiniband/hw/cxgb3/core/cxio_resource.h |   70 -
 drivers/infiniband/hw/cxgb3/core/cxio_wr.h       |  685 ------------
 drivers/infiniband/hw/cxgb3/cxio_dbg.c           |  205 ++++
 drivers/infiniband/hw/cxgb3/cxio_hal.c           | 1280 ++++++++++++++++++++++
 drivers/infiniband/hw/cxgb3/cxio_hal.h           |  201 +++
 drivers/infiniband/hw/cxgb3/cxio_resource.c      |  331 ++++++
 drivers/infiniband/hw/cxgb3/cxio_resource.h      |   70 +
 drivers/infiniband/hw/cxgb3/cxio_wr.h            |  685 ++++++++++++
 drivers/infiniband/hw/cxgb3/iwch_provider.c      |    2 
 14 files changed, 2775 insertions(+), 2775 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/Makefile b/drivers/infiniband/hw/cxgb3/Makefile
index ae63195..0e110f3 100644
--- a/drivers/infiniband/hw/cxgb3/Makefile
+++ b/drivers/infiniband/hw/cxgb3/Makefile
@@ -4,9 +4,9 @@ EXTRA_CFLAGS += -I$(TOPDIR)/drivers/net/
 obj-$(CONFIG_INFINIBAND_CXGB3) += iw_cxgb3.o
 
 iw_cxgb3-y :=  iwch_cm.o iwch_ev.o iwch_cq.o iwch_qp.o iwch_mem.o \
-	       iwch_provider.o iwch.o core/cxio_hal.o core/cxio_resource.o
+	       iwch_provider.o iwch.o cxio_hal.o cxio_resource.o
 
 ifdef CONFIG_INFINIBAND_CXGB3_DEBUG
 EXTRA_CFLAGS += -DDEBUG
-iw_cxgb3-y += core/cxio_dbg.o
+iw_cxgb3-y += cxio_dbg.o
 endif
diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c b/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c
deleted file mode 100644
index dfaa704..0000000
--- a/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c
+++ /dev/null
@@ -1,205 +0,0 @@
-/*
- * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- *     Redistribution and use in source and binary forms, with or
- *     without modification, are permitted provided that the following
- *     conditions are met:
- *
- *      - Redistributions of source code must retain the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer.
- *
- *      - Redistributions in binary form must reproduce the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer in the documentation and/or other materials
- *        provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- */
-#ifdef DEBUG
-#include <linux/types.h>
-#include "common.h"
-#include "cxgb3_ioctl.h"
-#include "cxio_hal.h"
-#include "cxio_wr.h"
-
-void cxio_dump_tpt(struct cxio_rdev *rdev, u32 stag)
-{
-	struct ch_mem_range *m;
-	u64 *data;
-	int rc;
-	int size = 32;
-
-	m = kmalloc(sizeof(*m) + size, GFP_ATOMIC);
-	if (!m) {
-		PDBG("%s couldn't allocate memory.\n", __FUNCTION__);
-		return;
-	}
-	m->mem_id = MEM_PMRX;
-	m->addr = (stag>>8) * 32 + rdev->rnic_info.tpt_base;
-	m->len = size;
-	PDBG("%s TPT addr 0x%x len %d\n", __FUNCTION__, m->addr, m->len);
-	rc = rdev->t3cdev_p->ctl(rdev->t3cdev_p, RDMA_GET_MEM, m);
-	if (rc) {
-		PDBG("%s toectl returned error %d\n", __FUNCTION__, rc);
-		kfree(m);
-		return;
-	}
-
-	data = (u64 *)m->buf;
-	while (size > 0) {
-		PDBG("TPT %08x: %016llx\n", m->addr, (u64)*data);
-		size -= 8;
-		data++;
-		m->addr += 8;
-	}
-	kfree(m);
-}
-
-void cxio_dump_pbl(struct cxio_rdev *rdev, u32 pbl_addr, uint len, u8 shift)
-{
-	struct ch_mem_range *m;
-	u64 *data;
-	int rc;
-	int size, npages;
-
-	shift += 12;
-	npages = (len + (1ULL << shift) - 1) >> shift;
-	size = npages * sizeof(u64);
-
-	m = kmalloc(sizeof(*m) + size, GFP_ATOMIC);
-	if (!m) {
-		PDBG("%s couldn't allocate memory.\n", __FUNCTION__);
-		return;
-	}
-	m->mem_id = MEM_PMRX;
-	m->addr = pbl_addr;
-	m->len = size;
-	PDBG("%s PBL addr 0x%x len %d depth %d\n",
-		__FUNCTION__, m->addr, m->len, npages);
-	rc = rdev->t3cdev_p->ctl(rdev->t3cdev_p, RDMA_GET_MEM, m);
-	if (rc) {
-		PDBG("%s toectl returned error %d\n", __FUNCTION__, rc);
-		kfree(m);
-		return;
-	}
-
-	data = (u64 *)m->buf;
-	while (size > 0) {
-		PDBG("PBL %08x: %016llx\n", m->addr, (u64)*data);
-		size -= 8;
-		data++;
-		m->addr += 8;
-	}
-	kfree(m);
-}
-
-void cxio_dump_wqe(union t3_wr *wqe)
-{
-	__be64 *data = (__be64 *)wqe;
-	uint size = (uint)(be64_to_cpu(*data) & 0xff);
-
-	if (size == 0)
-		size = 8;
-	while (size > 0) {
-		PDBG("WQE %p: %016llx\n", data, be64_to_cpu(*data));
-		size--;
-		data++;
-	}
-}
-
-void cxio_dump_wce(struct t3_cqe *wce)
-{
-	__be64 *data = (__be64 *)wce;
-	int size = sizeof(*wce);
-
-	while (size > 0) {
-		PDBG("WCE %p: %016llx\n", data, be64_to_cpu(*data));
-		size -= 8;
-		data++;
-	}
-}
-
-void cxio_dump_rqt(struct cxio_rdev *rdev, u32 hwtid, int nents)
-{
-	struct ch_mem_range *m;
-	int size = nents * 64;
-	u64 *data;
-	int rc;
-
-	m = kmalloc(sizeof(*m) + size, GFP_ATOMIC);
-	if (!m) {
-		PDBG("%s couldn't allocate memory.\n", __FUNCTION__);
-		return;
-	}
-	m->mem_id = MEM_PMRX;
-	m->addr = ((hwtid)<<10) + rdev->rnic_info.rqt_base;
-	m->len = size;
-	PDBG("%s RQT addr 0x%x len %d\n", __FUNCTION__, m->addr, m->len);
-	rc = rdev->t3cdev_p->ctl(rdev->t3cdev_p, RDMA_GET_MEM, m);
-	if (rc) {
-		PDBG("%s toectl returned error %d\n", __FUNCTION__, rc);
-		kfree(m);
-		return;
-	}
-
-	data = (u64 *)m->buf;
-	while (size > 0) {
-		PDBG("RQT %08x: %016llx\n", m->addr, (u64)*data);
-		size -= 8;
-		data++;
-		m->addr += 8;
-	}
-	kfree(m);
-}
-
-void cxio_dump_tcb(struct cxio_rdev *rdev, u32 hwtid)
-{
-	struct ch_mem_range *m;
-	int size = TCB_SIZE;
-	u32 *data;
-	int rc;
-
-	m = kmalloc(sizeof(*m) + size, GFP_ATOMIC);
-	if (!m) {
-		PDBG("%s couldn't allocate memory.\n", __FUNCTION__);
-		return;
-	}
-	m->mem_id = MEM_CM;
-	m->addr = hwtid * size;
-	m->len = size;
-	PDBG("%s TCB %d len %d\n", __FUNCTION__, m->addr, m->len);
-	rc = rdev->t3cdev_p->ctl(rdev->t3cdev_p, RDMA_GET_MEM, m);
-	if (rc) {
-		PDBG("%s toectl returned error %d\n", __FUNCTION__, rc);
-		kfree(m);
-		return;
-	}
-
-	data = (u32 *)m->buf;
-	while (size > 0) {
-		printk("%2u: %08x %08x %08x %08x %08x %08x %08x %08x\n",
-			m->addr,
-			*(data+2), *(data+3), *(data),*(data+1),
-			*(data+6), *(data+7), *(data+4), *(data+5));
-		size -= 32;
-		data += 8;
-		m->addr += 32;
-	}
-	kfree(m);
-}
-#endif
diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c
deleted file mode 100644
index 19553b3..0000000
--- a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c
+++ /dev/null
@@ -1,1280 +0,0 @@
-/*
- * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- *     Redistribution and use in source and binary forms, with or
- *     without modification, are permitted provided that the following
- *     conditions are met:
- *
- *      - Redistributions of source code must retain the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer.
- *
- *      - Redistributions in binary form must reproduce the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer in the documentation and/or other materials
- *        provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- */
-#include <asm/semaphore.h>
-#include <asm/delay.h>
-
-#include <linux/netdevice.h>
-#include <linux/sched.h>
-#include <linux/spinlock.h>
-#include <linux/pci.h>
-
-#include "cxio_resource.h"
-#include "cxio_hal.h"
-#include "cxgb3_offload.h"
-#include "sge_defs.h"
-
-static LIST_HEAD(rdev_list);
-static cxio_hal_ev_callback_func_t cxio_ev_cb = NULL;
-
-static inline struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name)
-{
-	struct cxio_rdev *rdev;
-
-	list_for_each_entry(rdev, &rdev_list, entry)
-		if (!strcmp(rdev->dev_name, dev_name))
-			return rdev;
-	return NULL;
-}
-
-static inline struct cxio_rdev *cxio_hal_find_rdev_by_t3cdev(struct t3cdev
-							     *tdev)
-{
-	struct cxio_rdev *rdev;
-
-	list_for_each_entry(rdev, &rdev_list, entry)
-		if (rdev->t3cdev_p == tdev)
-			return rdev;
-	return NULL;
-}
-
-int cxio_hal_cq_op(struct cxio_rdev *rdev_p, struct t3_cq *cq,
-		   enum t3_cq_opcode op, u32 credit)
-{
-	int ret;
-	struct t3_cqe *cqe;
-	u32 rptr;
-
-	struct rdma_cq_op setup;
-	setup.id = cq->cqid;
-	setup.credits = (op == CQ_CREDIT_UPDATE) ? credit : 0;
-	setup.op = op;
-	ret = rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_OP, &setup);
-
-	if ((ret < 0) || (op == CQ_CREDIT_UPDATE))
-		return ret;
-
-	/*
-	 * If the rearm returned an index other than our current index,
-	 * then there might be CQE's in flight (being DMA'd).  We must wait
-	 * here for them to complete or the consumer can miss a notification.
-	 */
-	if (Q_PTR2IDX((cq->rptr), cq->size_log2) != ret) {
-		int i=0;
-
-		rptr = cq->rptr;
-
-		/*
-		 * Keep the generation correct by bumping rptr until it
-		 * matches the index returned by the rearm - 1.
-		 */
-		while (Q_PTR2IDX((rptr+1), cq->size_log2) != ret)
-			rptr++;
-
-		/*
-		 * Now rptr is the index for the (last) cqe that was
-		 * in-flight at the time the HW rearmed the CQ.  We
-		 * spin until that CQE is valid.
-		 */
-		cqe = cq->queue + Q_PTR2IDX(rptr, cq->size_log2);
-		while (!CQ_VLD_ENTRY(rptr, cq->size_log2, cqe)) {
-			udelay(1);
-			if (i++ > 1000000) {
-				BUG_ON(1);
-				printk(KERN_ERR "%s: stalled rnic\n",
-				       rdev_p->dev_name);
-				return -EIO;
-			}
-		}
-	}
-	return 0;
-}
-
-static inline int cxio_hal_clear_cq_ctx(struct cxio_rdev *rdev_p, u32 cqid)
-{
-	struct rdma_cq_setup setup;
-	setup.id = cqid;
-	setup.base_addr = 0;	/* NULL address */
-	setup.size = 0;		/* disaable the CQ */
-	setup.credits = 0;
-	setup.credit_thres = 0;
-	setup.ovfl_mode = 0;
-	return (rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_SETUP, &setup));
-}
-
-int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev_p, u32 qpid)
-{
-	u64 sge_cmd;
-	struct t3_modify_qp_wr *wqe;
-	struct sk_buff *skb = alloc_skb(sizeof(*wqe), GFP_KERNEL);
-	if (!skb) {
-		PDBG("%s alloc_skb failed\n", __FUNCTION__);
-		return -ENOMEM;
-	}
-	wqe = (struct t3_modify_qp_wr *) skb_put(skb, sizeof(*wqe));
-	memset(wqe, 0, sizeof(*wqe));
-	build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_QP_MOD, 3, 1, qpid, 7);
-	wqe->flags = cpu_to_be32(MODQP_WRITE_EC);
-	sge_cmd = qpid << 8 | 3;
-	wqe->sge_cmd = cpu_to_be64(sge_cmd);
-	skb->priority = CPL_PRIORITY_CONTROL;
-	return (cxgb3_ofld_send(rdev_p->t3cdev_p, skb));
-}
-
-int cxio_create_cq(struct cxio_rdev *rdev_p, struct t3_cq *cq)
-{
-	struct rdma_cq_setup setup;
-	int size = (1UL << (cq->size_log2)) * sizeof(struct t3_cqe);
-
-	cq->cqid = cxio_hal_get_cqid(rdev_p->rscp);
-	if (!cq->cqid)
-		return -ENOMEM;
-	cq->sw_queue = kzalloc(size, GFP_KERNEL);
-	if (!cq->sw_queue)
-		return -ENOMEM;
-	cq->queue = dma_alloc_coherent(&(rdev_p->rnic_info.pdev->dev),
-					     (1UL << (cq->size_log2)) *
-					     sizeof(struct t3_cqe),
-					     &(cq->dma_addr), GFP_KERNEL);
-	if (!cq->queue) {
-		kfree(cq->sw_queue);
-		return -ENOMEM;
-	}
-	pci_unmap_addr_set(cq, mapping, cq->dma_addr);
-	memset(cq->queue, 0, size);
-	setup.id = cq->cqid;
-	setup.base_addr = (u64) (cq->dma_addr);
-	setup.size = 1UL << cq->size_log2;
-	setup.credits = 65535;
-	setup.credit_thres = 1;
-	if (rdev_p->t3cdev_p->type == T3B)
-		setup.ovfl_mode = 0;
-	else
-		setup.ovfl_mode = 1;
-	return (rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_SETUP, &setup));
-}
-
-int cxio_resize_cq(struct cxio_rdev *rdev_p, struct t3_cq *cq)
-{
-	struct rdma_cq_setup setup;
-	setup.id = cq->cqid;
-	setup.base_addr = (u64) (cq->dma_addr);
-	setup.size = 1UL << cq->size_log2;
-	setup.credits = setup.size;
-	setup.credit_thres = setup.size;	/* TBD: overflow recovery */
-	setup.ovfl_mode = 1;
-	return (rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_SETUP, &setup));
-}
-
-static u32 get_qpid(struct cxio_rdev *rdev_p, struct cxio_ucontext *uctx)
-{
-	struct cxio_qpid_list *entry;
-	u32 qpid;
-	int i;
-
-	mutex_lock(&uctx->lock);
-	if (!list_empty(&uctx->qpids)) {
-		entry = list_entry(uctx->qpids.next, struct cxio_qpid_list,
-				   entry);
-		list_del(&entry->entry);
-		qpid = entry->qpid;
-		kfree(entry);
-	} else {
-		qpid = cxio_hal_get_qpid(rdev_p->rscp);
-		if (!qpid)
-			goto out;
-		for (i = qpid+1; i & rdev_p->qpmask; i++) {
-			entry = kmalloc(sizeof *entry, GFP_KERNEL);
-			if (!entry)
-				break;
-			entry->qpid = i;
-			list_add_tail(&entry->entry, &uctx->qpids);
-		}
-	}
-out:
-	mutex_unlock(&uctx->lock);
-	PDBG("%s qpid 0x%x\n", __FUNCTION__, qpid);
-	return qpid;
-}
-
-static void put_qpid(struct cxio_rdev *rdev_p, u32 qpid,
-		     struct cxio_ucontext *uctx)
-{
-	struct cxio_qpid_list *entry;
-
-	entry = kmalloc(sizeof *entry, GFP_KERNEL);
-	if (!entry)
-		return;
-	PDBG("%s qpid 0x%x\n", __FUNCTION__, qpid);
-	entry->qpid = qpid;
-	mutex_lock(&uctx->lock);
-	list_add_tail(&entry->entry, &uctx->qpids);
-	mutex_unlock(&uctx->lock);
-}
-
-void cxio_release_ucontext(struct cxio_rdev *rdev_p, struct cxio_ucontext *uctx)
-{
-	struct list_head *pos, *nxt;
-	struct cxio_qpid_list *entry;
-
-	mutex_lock(&uctx->lock);
-	list_for_each_safe(pos, nxt, &uctx->qpids) {
-		entry = list_entry(pos, struct cxio_qpid_list, entry);
-		list_del_init(&entry->entry);
-		if (!(entry->qpid & rdev_p->qpmask))
-			cxio_hal_put_qpid(rdev_p->rscp, entry->qpid);
-		kfree(entry);
-	}
-	mutex_unlock(&uctx->lock);
-}
-
-void cxio_init_ucontext(struct cxio_rdev *rdev_p, struct cxio_ucontext *uctx)
-{
-	INIT_LIST_HEAD(&uctx->qpids);
-	mutex_init(&uctx->lock);
-}
-
-int cxio_create_qp(struct cxio_rdev *rdev_p, u32 kernel_domain,
-		   struct t3_wq *wq, struct cxio_ucontext *uctx)
-{
-	int depth = 1UL << wq->size_log2;
-	int rqsize = 1UL << wq->rq_size_log2;
-
-	wq->qpid = get_qpid(rdev_p, uctx);
-	if (!wq->qpid)
-		return -ENOMEM;
-
-	wq->rq = kzalloc(depth * sizeof(u64), GFP_KERNEL);
-	if (!wq->rq)
-		goto err1;
-
-	wq->rq_addr = cxio_hal_rqtpool_alloc(rdev_p, rqsize);
-	if (!wq->rq_addr)
-		goto err2;
-
-	wq->sq = kzalloc(depth * sizeof(struct t3_swsq), GFP_KERNEL);
-	if (!wq->sq)
-		goto err3;
-
-	wq->queue = dma_alloc_coherent(&(rdev_p->rnic_info.pdev->dev),
-					     depth * sizeof(union t3_wr),
-					     &(wq->dma_addr), GFP_KERNEL);
-	if (!wq->queue)
-		goto err4;
-
-	memset(wq->queue, 0, depth * sizeof(union t3_wr));
-	pci_unmap_addr_set(wq, mapping, wq->dma_addr);
-	wq->doorbell = (void __iomem *)rdev_p->rnic_info.kdb_addr;
-	if (!kernel_domain)
-		wq->udb = (u64)rdev_p->rnic_info.udbell_physbase +
-					(wq->qpid << rdev_p->qpshift);
-	PDBG("%s qpid 0x%x doorbell 0x%p udb 0x%llx\n", __FUNCTION__,
-	     wq->qpid, wq->doorbell, wq->udb);
-	return 0;
-err4:
-	kfree(wq->sq);
-err3:
-	cxio_hal_rqtpool_free(rdev_p, wq->rq_addr, rqsize);
-err2:
-	kfree(wq->rq);
-err1:
-	put_qpid(rdev_p, wq->qpid, uctx);
-	return -ENOMEM;
-}
-
-int cxio_destroy_cq(struct cxio_rdev *rdev_p, struct t3_cq *cq)
-{
-	int err;
-	err = cxio_hal_clear_cq_ctx(rdev_p, cq->cqid);
-	kfree(cq->sw_queue);
-	dma_free_coherent(&(rdev_p->rnic_info.pdev->dev),
-			  (1UL << (cq->size_log2))
-			  * sizeof(struct t3_cqe), cq->queue,
-			  pci_unmap_addr(cq, mapping));
-	cxio_hal_put_cqid(rdev_p->rscp, cq->cqid);
-	return err;
-}
-
-int cxio_destroy_qp(struct cxio_rdev *rdev_p, struct t3_wq *wq,
-		    struct cxio_ucontext *uctx)
-{
-	dma_free_coherent(&(rdev_p->rnic_info.pdev->dev),
-			  (1UL << (wq->size_log2))
-			  * sizeof(union t3_wr), wq->queue,
-			  pci_unmap_addr(wq, mapping));
-	kfree(wq->sq);
-	cxio_hal_rqtpool_free(rdev_p, wq->rq_addr, (1UL << wq->rq_size_log2));
-	kfree(wq->rq);
-	put_qpid(rdev_p, wq->qpid, uctx);
-	return 0;
-}
-
-static void insert_recv_cqe(struct t3_wq *wq, struct t3_cq *cq)
-{
-	struct t3_cqe cqe;
-
-	PDBG("%s wq %p cq %p sw_rptr 0x%x sw_wptr 0x%x\n", __FUNCTION__,
-	     wq, cq, cq->sw_rptr, cq->sw_wptr);
-	memset(&cqe, 0, sizeof(cqe));
-	cqe.header = cpu_to_be32(V_CQE_STATUS(TPT_ERR_SWFLUSH) |
-			         V_CQE_OPCODE(T3_SEND) |
-				 V_CQE_TYPE(0) |
-				 V_CQE_SWCQE(1) |
-				 V_CQE_QPID(wq->qpid) |
-				 V_CQE_GENBIT(Q_GENBIT(cq->sw_wptr,
-						       cq->size_log2)));
-	*(cq->sw_queue + Q_PTR2IDX(cq->sw_wptr, cq->size_log2)) = cqe;
-	cq->sw_wptr++;
-}
-
-void cxio_flush_rq(struct t3_wq *wq, struct t3_cq *cq, int count)
-{
-	u32 ptr;
-
-	PDBG("%s wq %p cq %p\n", __FUNCTION__, wq, cq);
-
-	/* flush RQ */
-	PDBG("%s rq_rptr %u rq_wptr %u skip count %u\n", __FUNCTION__,
-	    wq->rq_rptr, wq->rq_wptr, count);
-	ptr = wq->rq_rptr + count;
-	while (ptr++ != wq->rq_wptr)
-		insert_recv_cqe(wq, cq);
-}
-
-static void insert_sq_cqe(struct t3_wq *wq, struct t3_cq *cq,
-		          struct t3_swsq *sqp)
-{
-	struct t3_cqe cqe;
-
-	PDBG("%s wq %p cq %p sw_rptr 0x%x sw_wptr 0x%x\n", __FUNCTION__,
-	     wq, cq, cq->sw_rptr, cq->sw_wptr);
-	memset(&cqe, 0, sizeof(cqe));
-	cqe.header = cpu_to_be32(V_CQE_STATUS(TPT_ERR_SWFLUSH) |
-			         V_CQE_OPCODE(sqp->opcode) |
-			         V_CQE_TYPE(1) |
-			         V_CQE_SWCQE(1) |
-			         V_CQE_QPID(wq->qpid) |
-			         V_CQE_GENBIT(Q_GENBIT(cq->sw_wptr,
-						       cq->size_log2)));
-	cqe.u.scqe.wrid_hi = sqp->sq_wptr;
-
-	*(cq->sw_queue + Q_PTR2IDX(cq->sw_wptr, cq->size_log2)) = cqe;
-	cq->sw_wptr++;
-}
-
-void cxio_flush_sq(struct t3_wq *wq, struct t3_cq *cq, int count)
-{
-	__u32 ptr;
-	struct t3_swsq *sqp = wq->sq + Q_PTR2IDX(wq->sq_rptr, wq->sq_size_log2);
-
-	ptr = wq->sq_rptr + count;
-	sqp += count;
-	while (ptr != wq->sq_wptr) {
-		insert_sq_cqe(wq, cq, sqp);
-		sqp++;
-		ptr++;
-	}
-}
-
-/*
- * Move all CQEs from the HWCQ into the SWCQ.
- */
-void cxio_flush_hw_cq(struct t3_cq *cq)
-{
-	struct t3_cqe *cqe, *swcqe;
-
-	PDBG("%s cq %p cqid 0x%x\n", __FUNCTION__, cq, cq->cqid);
-	cqe = cxio_next_hw_cqe(cq);
-	while (cqe) {
-		PDBG("%s flushing hwcq rptr 0x%x to swcq wptr 0x%x\n",
-		     __FUNCTION__, cq->rptr, cq->sw_wptr);
-		swcqe = cq->sw_queue + Q_PTR2IDX(cq->sw_wptr, cq->size_log2);
-		*swcqe = *cqe;
-		swcqe->header |= cpu_to_be32(V_CQE_SWCQE(1));
-		cq->sw_wptr++;
-		cq->rptr++;
-		cqe = cxio_next_hw_cqe(cq);
-	}
-}
-
-static inline int cqe_completes_wr(struct t3_cqe *cqe, struct t3_wq *wq)
-{
-	if (CQE_OPCODE(*cqe) == T3_TERMINATE)
-		return 0;
-
-	if ((CQE_OPCODE(*cqe) == T3_RDMA_WRITE) && RQ_TYPE(*cqe))
-		return 0;
-
-	if ((CQE_OPCODE(*cqe) == T3_READ_RESP) && SQ_TYPE(*cqe))
-		return 0;
-
-	if ((CQE_OPCODE(*cqe) == T3_SEND) && RQ_TYPE(*cqe) &&
-	    Q_EMPTY(wq->rq_rptr, wq->rq_wptr))
-		return 0;
-
-	return 1;
-}
-
-void cxio_count_scqes(struct t3_cq *cq, struct t3_wq *wq, int *count)
-{
-	struct t3_cqe *cqe;
-	u32 ptr;
-
-	*count = 0;
-	ptr = cq->sw_rptr;
-	while (!Q_EMPTY(ptr, cq->sw_wptr)) {
-		cqe = cq->sw_queue + (Q_PTR2IDX(ptr, cq->size_log2));
-		if ((SQ_TYPE(*cqe) || (CQE_OPCODE(*cqe) == T3_READ_RESP)) &&
-		    (CQE_QPID(*cqe) == wq->qpid))
-			(*count)++;
-		ptr++;
-	}
-	PDBG("%s cq %p count %d\n", __FUNCTION__, cq, *count);
-}
-
-void cxio_count_rcqes(struct t3_cq *cq, struct t3_wq *wq, int *count)
-{
-	struct t3_cqe *cqe;
-	u32 ptr;
-
-	*count = 0;
-	PDBG("%s count zero %d\n", __FUNCTION__, *count);
-	ptr = cq->sw_rptr;
-	while (!Q_EMPTY(ptr, cq->sw_wptr)) {
-		cqe = cq->sw_queue + (Q_PTR2IDX(ptr, cq->size_log2));
-		if (RQ_TYPE(*cqe) && (CQE_OPCODE(*cqe) != T3_READ_RESP) &&
-		    (CQE_QPID(*cqe) == wq->qpid) && cqe_completes_wr(cqe, wq))
-			(*count)++;
-		ptr++;
-	}
-	PDBG("%s cq %p count %d\n", __FUNCTION__, cq, *count);
-}
-
-static int cxio_hal_init_ctrl_cq(struct cxio_rdev *rdev_p)
-{
-	struct rdma_cq_setup setup;
-	setup.id = 0;
-	setup.base_addr = 0;	/* NULL address */
-	setup.size = 1;		/* enable the CQ */
-	setup.credits = 0;
-
-	/* force SGE to redirect to RspQ and interrupt */
-	setup.credit_thres = 0;
-	setup.ovfl_mode = 1;
-	return (rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_SETUP, &setup));
-}
-
-static int cxio_hal_init_ctrl_qp(struct cxio_rdev *rdev_p)
-{
-	int err;
-	u64 sge_cmd, ctx0, ctx1;
-	u64 base_addr;
-	struct t3_modify_qp_wr *wqe;
-	struct sk_buff *skb = alloc_skb(sizeof(*wqe), GFP_KERNEL);
-
-
-	if (!skb) {
-		PDBG("%s alloc_skb failed\n", __FUNCTION__);
-		return -ENOMEM;
-	}
-	err = cxio_hal_init_ctrl_cq(rdev_p);
-	if (err) {
-		PDBG("%s err %d initializing ctrl_cq\n", __FUNCTION__, err);
-		return err;
-	}
-	rdev_p->ctrl_qp.workq = dma_alloc_coherent(
-					&(rdev_p->rnic_info.pdev->dev),
-					(1 << T3_CTRL_QP_SIZE_LOG2) *
-					sizeof(union t3_wr),
-					&(rdev_p->ctrl_qp.dma_addr),
-					GFP_KERNEL);
-	if (!rdev_p->ctrl_qp.workq) {
-		PDBG("%s dma_alloc_coherent failed\n", __FUNCTION__);
-		return -ENOMEM;
-	}
-	pci_unmap_addr_set(&rdev_p->ctrl_qp, mapping,
-			   rdev_p->ctrl_qp.dma_addr);
-	rdev_p->ctrl_qp.doorbell = (void __iomem *)rdev_p->rnic_info.kdb_addr;
-	memset(rdev_p->ctrl_qp.workq, 0,
-	       (1 << T3_CTRL_QP_SIZE_LOG2) * sizeof(union t3_wr));
-
-	init_MUTEX(&rdev_p->ctrl_qp.sem);
-	init_waitqueue_head(&rdev_p->ctrl_qp.waitq);
-
-	/* update HW Ctrl QP context */
-	base_addr = rdev_p->ctrl_qp.dma_addr;
-	base_addr >>= 12;
-	ctx0 = (V_EC_SIZE((1 << T3_CTRL_QP_SIZE_LOG2)) |
-		V_EC_BASE_LO((u32) base_addr & 0xffff));
-	ctx0 <<= 32;
-	ctx0 |= V_EC_CREDITS(FW_WR_NUM);
-	base_addr >>= 16;
-	ctx1 = (u32) base_addr;
-	base_addr >>= 32;
-	ctx1 |= ((u64) (V_EC_BASE_HI((u32) base_addr & 0xf) | V_EC_RESPQ(0) |
-			V_EC_TYPE(0) | V_EC_GEN(1) |
-			V_EC_UP_TOKEN(T3_CTL_QP_TID) | F_EC_VALID)) << 32;
-	wqe = (struct t3_modify_qp_wr *) skb_put(skb, sizeof(*wqe));
-	memset(wqe, 0, sizeof(*wqe));
-	build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_QP_MOD, 0, 1,
-		       T3_CTL_QP_TID, 7);
-	wqe->flags = cpu_to_be32(MODQP_WRITE_EC);
-	sge_cmd = (3ULL << 56) | FW_RI_SGEEC_START << 8 | 3;
-	wqe->sge_cmd = cpu_to_be64(sge_cmd);
-	wqe->ctx1 = cpu_to_be64(ctx1);
-	wqe->ctx0 = cpu_to_be64(ctx0);
-	PDBG("CtrlQP dma_addr 0x%llx workq %p size %d\n",
-	     (u64) rdev_p->ctrl_qp.dma_addr, rdev_p->ctrl_qp.workq,
-	     1 << T3_CTRL_QP_SIZE_LOG2);
-	skb->priority = CPL_PRIORITY_CONTROL;
-	return (cxgb3_ofld_send(rdev_p->t3cdev_p, skb));
-}
-
-static int cxio_hal_destroy_ctrl_qp(struct cxio_rdev *rdev_p)
-{
-	dma_free_coherent(&(rdev_p->rnic_info.pdev->dev),
-			  (1UL << T3_CTRL_QP_SIZE_LOG2)
-			  * sizeof(union t3_wr), rdev_p->ctrl_qp.workq,
-			  pci_unmap_addr(&rdev_p->ctrl_qp, mapping));
-	return cxio_hal_clear_qp_ctx(rdev_p, T3_CTRL_QP_ID);
-}
-
-/* write len bytes of data into addr (32B aligned address)
- * If data is NULL, clear len byte of memory to zero.
- * caller aquires the sem before the call
- */
-static int cxio_hal_ctrl_qp_write_mem(struct cxio_rdev *rdev_p, u32 addr,
-				      u32 len, void *data, int completion)
-{
-	u32 i, nr_wqe, copy_len;
-	u8 *copy_data;
-	u8 wr_len, utx_len;	/* lenght in 8 byte flit */
-	enum t3_wr_flags flag;
-	__be64 *wqe;
-	u64 utx_cmd;
-	addr &= 0x7FFFFFF;
-	nr_wqe = len % 96 ? len / 96 + 1 : len / 96;	/* 96B max per WQE */
-	PDBG("%s wptr 0x%x rptr 0x%x len %d, nr_wqe %d data %p addr 0x%0x\n",
-	     __FUNCTION__, rdev_p->ctrl_qp.wptr, rdev_p->ctrl_qp.rptr, len,
-	     nr_wqe, data, addr);
-	utx_len = 3;		/* in 32B unit */
-	for (i = 0; i < nr_wqe; i++) {
-		if (Q_FULL(rdev_p->ctrl_qp.rptr, rdev_p->ctrl_qp.wptr,
-		           T3_CTRL_QP_SIZE_LOG2)) {
-			PDBG("%s ctrl_qp full wtpr 0x%0x rptr 0x%0x, "
-			     "wait for more space i %d\n", __FUNCTION__,
-			     rdev_p->ctrl_qp.wptr, rdev_p->ctrl_qp.rptr, i);
-			if (wait_event_interruptible(rdev_p->ctrl_qp.waitq,
-					     !Q_FULL(rdev_p->ctrl_qp.rptr,
-						     rdev_p->ctrl_qp.wptr,
-						     T3_CTRL_QP_SIZE_LOG2))) {
-				PDBG("%s ctrl_qp workq interrupted\n",
-				     __FUNCTION__);
-				return -ERESTARTSYS;
-			}
-			PDBG("%s ctrl_qp wakeup, continue posting work request "
-			     "i %d\n", __FUNCTION__, i);
-		}
-		wqe = (__be64 *)(rdev_p->ctrl_qp.workq + (rdev_p->ctrl_qp.wptr %
-						(1 << T3_CTRL_QP_SIZE_LOG2)));
-		flag = 0;
-		if (i == (nr_wqe - 1)) {
-			/* last WQE */
-			flag = completion ? T3_COMPLETION_FLAG : 0;
-			if (len % 32)
-				utx_len = len / 32 + 1;
-			else
-				utx_len = len / 32;
-		}
-
-		/*
-		 * Force a CQE to return the credit to the workq in case
-		 * we posted more than half the max QP size of WRs
-		 */
-		if ((i != 0) &&
-		    (i % (((1 << T3_CTRL_QP_SIZE_LOG2)) >> 1) == 0)) {
-			flag = T3_COMPLETION_FLAG;
-			PDBG("%s force completion at i %d\n", __FUNCTION__, i);
-		}
-
-		/* build the utx mem command */
-		wqe += (sizeof(struct t3_bypass_wr) >> 3);
-		utx_cmd = (T3_UTX_MEM_WRITE << 28) | (addr + i * 3);
-		utx_cmd <<= 32;
-		utx_cmd |= (utx_len << 28) | ((utx_len << 2) + 1);
-		*wqe = cpu_to_be64(utx_cmd);
-		wqe++;
-		copy_data = (u8 *) data + i * 96;
-		copy_len = len > 96 ? 96 : len;
-
-		/* clear memory content if data is NULL */
-		if (data)
-			memcpy(wqe, copy_data, copy_len);
-		else
-			memset(wqe, 0, copy_len);
-		if (copy_len % 32)
-			memset(((u8 *) wqe) + copy_len, 0,
-			       32 - (copy_len % 32));
-		wr_len = ((sizeof(struct t3_bypass_wr)) >> 3) + 1 +
-			 (utx_len << 2);
-		wqe = (__be64 *)(rdev_p->ctrl_qp.workq + (rdev_p->ctrl_qp.wptr %
-			      (1 << T3_CTRL_QP_SIZE_LOG2)));
-
-		/* wptr in the WRID[31:0] */
-		((union t3_wrid *)(wqe+1))->id0.low = rdev_p->ctrl_qp.wptr;
-
-		/*
-		 * This must be the last write with a memory barrier
-		 * for the genbit
-		 */
-		build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_BP, flag,
-			       Q_GENBIT(rdev_p->ctrl_qp.wptr,
-					T3_CTRL_QP_SIZE_LOG2), T3_CTRL_QP_ID,
-			       wr_len);
-		if (flag == T3_COMPLETION_FLAG)
-			ring_doorbell(rdev_p->ctrl_qp.doorbell, T3_CTRL_QP_ID);
-		len -= 96;
-		rdev_p->ctrl_qp.wptr++;
-	}
-	return 0;
-}
-
-/* IN: stag key, pdid, perm, zbva, to, len, page_size, pbl, and pbl_size
- * OUT: stag index, actual pbl_size, pbl_addr allocated.
- * TBD: shared memory region support
- */
-static int __cxio_tpt_op(struct cxio_rdev *rdev_p, u32 reset_tpt_entry,
-			 u32 *stag, u8 stag_state, u32 pdid,
-			 enum tpt_mem_type type, enum tpt_mem_perm perm,
-			 u32 zbva, u64 to, u32 len, u8 page_size, __be64 *pbl,
-			 u32 *pbl_size, u32 *pbl_addr)
-{
-	int err;
-	struct tpt_entry tpt;
-	u32 stag_idx;
-	u32 wptr;
-	int rereg = (*stag != T3_STAG_UNSET);
-
-	stag_state = stag_state > 0;
-	stag_idx = (*stag) >> 8;
-
-	if ((!reset_tpt_entry) && !(*stag != T3_STAG_UNSET)) {
-		stag_idx = cxio_hal_get_stag(rdev_p->rscp);
-		if (!stag_idx)
-			return -ENOMEM;
-		*stag = (stag_idx << 8) | ((*stag) & 0xFF);
-	}
-	PDBG("%s stag_state 0x%0x type 0x%0x pdid 0x%0x, stag_idx 0x%x\n",
-	     __FUNCTION__, stag_state, type, pdid, stag_idx);
-
-	if (reset_tpt_entry)
-		cxio_hal_pblpool_free(rdev_p, *pbl_addr, *pbl_size << 3);
-	else if (!rereg) {
-		*pbl_addr = cxio_hal_pblpool_alloc(rdev_p, *pbl_size << 3);
-		if (!*pbl_addr) {
-			return -ENOMEM;
-		}
-	}
-
-	down_interruptible(&rdev_p->ctrl_qp.sem);
-
-	/* write PBL first if any - update pbl only if pbl list exist */
-	if (pbl) {
-
-		PDBG("%s *pdb_addr 0x%x, pbl_base 0x%x, pbl_size %d\n",
-		     __FUNCTION__, *pbl_addr, rdev_p->rnic_info.pbl_base,
-		     *pbl_size);
-		err = cxio_hal_ctrl_qp_write_mem(rdev_p,
-				(*pbl_addr >> 5),
-				(*pbl_size << 3), pbl, 0);
-		if (err)
-			goto ret;
-	}
-
-	/* write TPT entry */
-	if (reset_tpt_entry)
-		memset(&tpt, 0, sizeof(tpt));
-	else {
-		tpt.valid_stag_pdid = cpu_to_be32(F_TPT_VALID |
-				V_TPT_STAG_KEY((*stag) & M_TPT_STAG_KEY) |
-				V_TPT_STAG_STATE(stag_state) |
-				V_TPT_STAG_TYPE(type) | V_TPT_PDID(pdid));
-		BUG_ON(page_size >= 28);
-		tpt.flags_pagesize_qpid = cpu_to_be32(V_TPT_PERM(perm) |
-				F_TPT_MW_BIND_ENABLE |
-				V_TPT_ADDR_TYPE((zbva ? TPT_ZBTO : TPT_VATO)) |
-				V_TPT_PAGE_SIZE(page_size));
-		tpt.rsvd_pbl_addr = reset_tpt_entry ? 0 :
-				    cpu_to_be32(V_TPT_PBL_ADDR(PBL_OFF(rdev_p, *pbl_addr)>>3));
-		tpt.len = cpu_to_be32(len);
-		tpt.va_hi = cpu_to_be32((u32) (to >> 32));
-		tpt.va_low_or_fbo = cpu_to_be32((u32) (to & 0xFFFFFFFFULL));
-		tpt.rsvd_bind_cnt_or_pstag = 0;
-		tpt.rsvd_pbl_size = reset_tpt_entry ? 0 :
-				  cpu_to_be32(V_TPT_PBL_SIZE((*pbl_size) >> 2));
-	}
-	err = cxio_hal_ctrl_qp_write_mem(rdev_p,
-				       stag_idx +
-				       (rdev_p->rnic_info.tpt_base >> 5),
-				       sizeof(tpt), &tpt, 1);
-
-	/* release the stag index to free pool */
-	if (reset_tpt_entry)
-		cxio_hal_put_stag(rdev_p->rscp, stag_idx);
-ret:
-	wptr = rdev_p->ctrl_qp.wptr;
-	up(&rdev_p->ctrl_qp.sem);
-	if (!err)
-		if (wait_event_interruptible(rdev_p->ctrl_qp.waitq,
-					     SEQ32_GE(rdev_p->ctrl_qp.rptr,
-						      wptr)))
-			return -ERESTARTSYS;
-	return err;
-}
-
-/* IN : stag key, pdid, pbl_size
- * Out: stag index, actaul pbl_size, and pbl_addr allocated.
- */
-int cxio_allocate_stag(struct cxio_rdev *rdev_p, u32 * stag, u32 pdid,
-		       enum tpt_mem_perm perm, u32 * pbl_size, u32 * pbl_addr)
-{
-	*stag = T3_STAG_UNSET;
-	return (__cxio_tpt_op(rdev_p, 0, stag, 0, pdid, TPT_NON_SHARED_MR,
-			      perm, 0, 0ULL, 0, 0, NULL, pbl_size, pbl_addr));
-}
-
-int cxio_register_phys_mem(struct cxio_rdev *rdev_p, u32 *stag, u32 pdid,
-			   enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len,
-			   u8 page_size, __be64 *pbl, u32 *pbl_size,
-			   u32 *pbl_addr)
-{
-	*stag = T3_STAG_UNSET;
-	return __cxio_tpt_op(rdev_p, 0, stag, 1, pdid, TPT_NON_SHARED_MR, perm,
-			     zbva, to, len, page_size, pbl, pbl_size, pbl_addr);
-}
-
-int cxio_reregister_phys_mem(struct cxio_rdev *rdev_p, u32 *stag, u32 pdid,
-			   enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len,
-			   u8 page_size, __be64 *pbl, u32 *pbl_size,
-			   u32 *pbl_addr)
-{
-	return __cxio_tpt_op(rdev_p, 0, stag, 1, pdid, TPT_NON_SHARED_MR, perm,
-			     zbva, to, len, page_size, pbl, pbl_size, pbl_addr);
-}
-
-int cxio_dereg_mem(struct cxio_rdev *rdev_p, u32 stag, u32 pbl_size,
-		   u32 pbl_addr)
-{
-	return __cxio_tpt_op(rdev_p, 1, &stag, 0, 0, 0, 0, 0, 0ULL, 0, 0, NULL,
-			     &pbl_size, &pbl_addr);
-}
-
-int cxio_allocate_window(struct cxio_rdev *rdev_p, u32 * stag, u32 pdid)
-{
-	u32 pbl_size = 0;
-	*stag = T3_STAG_UNSET;
-	return __cxio_tpt_op(rdev_p, 0, stag, 0, pdid, TPT_MW, 0, 0, 0ULL, 0, 0,
-			     NULL, &pbl_size, NULL);
-}
-
-int cxio_deallocate_window(struct cxio_rdev *rdev_p, u32 stag)
-{
-	return __cxio_tpt_op(rdev_p, 1, &stag, 0, 0, 0, 0, 0, 0ULL, 0, 0, NULL,
-			     NULL, NULL);
-}
-
-int cxio_rdma_init(struct cxio_rdev *rdev_p, struct t3_rdma_init_attr *attr)
-{
-	struct t3_rdma_init_wr *wqe;
-	struct sk_buff *skb = alloc_skb(sizeof(*wqe), GFP_ATOMIC);
-	if (!skb)
-		return -ENOMEM;
-	PDBG("%s rdev_p %p\n", __FUNCTION__, rdev_p);
-	wqe = (struct t3_rdma_init_wr *) __skb_put(skb, sizeof(*wqe));
-	wqe->wrh.op_seop_flags = cpu_to_be32(V_FW_RIWR_OP(T3_WR_INIT));
-	wqe->wrh.gen_tid_len = cpu_to_be32(V_FW_RIWR_TID(attr->tid) |
-					   V_FW_RIWR_LEN(sizeof(*wqe) >> 3));
-	wqe->wrid.id1 = 0;
-	wqe->qpid = cpu_to_be32(attr->qpid);
-	wqe->pdid = cpu_to_be32(attr->pdid);
-	wqe->scqid = cpu_to_be32(attr->scqid);
-	wqe->rcqid = cpu_to_be32(attr->rcqid);
-	wqe->rq_addr = cpu_to_be32(attr->rq_addr - rdev_p->rnic_info.rqt_base);
-	wqe->rq_size = cpu_to_be32(attr->rq_size);
-	wqe->mpaattrs = attr->mpaattrs;
-	wqe->qpcaps = attr->qpcaps;
-	wqe->ulpdu_size = cpu_to_be16(attr->tcp_emss);
-	wqe->flags = cpu_to_be32(attr->flags);
-	wqe->ord = cpu_to_be32(attr->ord);
-	wqe->ird = cpu_to_be32(attr->ird);
-	wqe->qp_dma_addr = cpu_to_be64(attr->qp_dma_addr);
-	wqe->qp_dma_size = cpu_to_be32(attr->qp_dma_size);
-	wqe->rsvd = 0;
-	skb->priority = 0;	/* 0=>ToeQ; 1=>CtrlQ */
-	return (cxgb3_ofld_send(rdev_p->t3cdev_p, skb));
-}
-
-void cxio_register_ev_cb(cxio_hal_ev_callback_func_t ev_cb)
-{
-	cxio_ev_cb = ev_cb;
-}
-
-void cxio_unregister_ev_cb(cxio_hal_ev_callback_func_t ev_cb)
-{
-	cxio_ev_cb = NULL;
-}
-
-static int cxio_hal_ev_handler(struct t3cdev *t3cdev_p, struct sk_buff *skb)
-{
-	static int cnt;
-	struct cxio_rdev *rdev_p = NULL;
-	struct respQ_msg_t *rsp_msg = (struct respQ_msg_t *) skb->data;
-	PDBG("%d: %s cq_id 0x%x cq_ptr 0x%x genbit %0x overflow %0x an %0x"
-	     " se %0x notify %0x cqbranch %0x creditth %0x\n",
-	     cnt, __FUNCTION__, RSPQ_CQID(rsp_msg), RSPQ_CQPTR(rsp_msg),
-	     RSPQ_GENBIT(rsp_msg), RSPQ_OVERFLOW(rsp_msg), RSPQ_AN(rsp_msg),
-	     RSPQ_SE(rsp_msg), RSPQ_NOTIFY(rsp_msg), RSPQ_CQBRANCH(rsp_msg),
-	     RSPQ_CREDIT_THRESH(rsp_msg));
-	PDBG("CQE: QPID 0x%0x genbit %0x type 0x%0x status 0x%0x opcode %d "
-	     "len 0x%0x wrid_hi_stag 0x%x wrid_low_msn 0x%x\n",
-	     CQE_QPID(rsp_msg->cqe), CQE_GENBIT(rsp_msg->cqe),
-	     CQE_TYPE(rsp_msg->cqe), CQE_STATUS(rsp_msg->cqe),
-	     CQE_OPCODE(rsp_msg->cqe), CQE_LEN(rsp_msg->cqe),
-	     CQE_WRID_HI(rsp_msg->cqe), CQE_WRID_LOW(rsp_msg->cqe));
-	rdev_p = (struct cxio_rdev *)t3cdev_p->ulp;
-	if (!rdev_p) {
-		PDBG("%s called by t3cdev %p with null ulp\n", __FUNCTION__,
-		     t3cdev_p);
-		return 0;
-	}
-	if (CQE_QPID(rsp_msg->cqe) == T3_CTRL_QP_ID) {
-		rdev_p->ctrl_qp.rptr = CQE_WRID_LOW(rsp_msg->cqe) + 1;
-		wake_up_interruptible(&rdev_p->ctrl_qp.waitq);
-		dev_kfree_skb_irq(skb);
-	} else if (CQE_QPID(rsp_msg->cqe) == 0xfff8)
-		dev_kfree_skb_irq(skb);
-	else if (cxio_ev_cb)
-		(*cxio_ev_cb) (rdev_p, skb);
-	else
-		dev_kfree_skb_irq(skb);
-	cnt++;
-	return 0;
-}
-
-/* Caller takes care of locking if needed */
-int cxio_rdev_open(struct cxio_rdev *rdev_p)
-{
-	struct net_device *netdev_p = NULL;
-	int err = 0;
-	if (strlen(rdev_p->dev_name)) {
-		if (cxio_hal_find_rdev_by_name(rdev_p->dev_name)) {
-			return -EBUSY;
-		}
-		netdev_p = dev_get_by_name(rdev_p->dev_name);
-		if (!netdev_p) {
-			return -EINVAL;
-		}
-		dev_put(netdev_p);
-	} else if (rdev_p->t3cdev_p) {
-		if (cxio_hal_find_rdev_by_t3cdev(rdev_p->t3cdev_p)) {
-			return -EBUSY;
-		}
-		netdev_p = rdev_p->t3cdev_p->lldev;
-		strncpy(rdev_p->dev_name, rdev_p->t3cdev_p->name,
-			T3_MAX_DEV_NAME_LEN);
-	} else {
-		PDBG("%s t3cdev_p or dev_name must be set\n", __FUNCTION__);
-		return -EINVAL;
-	}
-
-	list_add_tail(&rdev_p->entry, &rdev_list);
-
-	PDBG("%s opening rnic dev %s\n", __FUNCTION__, rdev_p->dev_name);
-	memset(&rdev_p->ctrl_qp, 0, sizeof(rdev_p->ctrl_qp));
-	if (!rdev_p->t3cdev_p)
-		rdev_p->t3cdev_p = T3CDEV(netdev_p);
-	rdev_p->t3cdev_p->ulp = (void *) rdev_p;
-	err = rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_GET_PARAMS,
-					 &(rdev_p->rnic_info));
-	if (err) {
-		printk(KERN_ERR "%s t3cdev_p(%p)->ctl returned error %d.\n",
-		     __FUNCTION__, rdev_p->t3cdev_p, err);
-		goto err1;
-	}
-	err = rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, GET_PORTS,
-				    &(rdev_p->port_info));
-	if (err) {
-		printk(KERN_ERR "%s t3cdev_p(%p)->ctl returned error %d.\n",
-		     __FUNCTION__, rdev_p->t3cdev_p, err);
-		goto err1;
-	}
-
-	/*
-	 * qpshift is the number of bits to shift the qpid left in order
-	 * to get the correct address of the doorbell for that qp.
-	 */
-	cxio_init_ucontext(rdev_p, &rdev_p->uctx);
-	rdev_p->qpshift = PAGE_SHIFT -
-			  ilog2(65536 >>
-			            ilog2(rdev_p->rnic_info.udbell_len >>
-					      PAGE_SHIFT));
-	rdev_p->qpnr = rdev_p->rnic_info.udbell_len >> PAGE_SHIFT;
-	rdev_p->qpmask = (65536 >> ilog2(rdev_p->qpnr)) - 1;
-	PDBG("%s rnic %s info: tpt_base 0x%0x tpt_top 0x%0x num stags %d "
-	     "pbl_base 0x%0x pbl_top 0x%0x rqt_base 0x%0x, rqt_top 0x%0x\n",
-	     __FUNCTION__, rdev_p->dev_name, rdev_p->rnic_info.tpt_base,
-	     rdev_p->rnic_info.tpt_top, cxio_num_stags(rdev_p),
-	     rdev_p->rnic_info.pbl_base,
-	     rdev_p->rnic_info.pbl_top, rdev_p->rnic_info.rqt_base,
-	     rdev_p->rnic_info.rqt_top);
-	PDBG("udbell_len 0x%0x udbell_physbase 0x%lx kdb_addr %p qpshift %lu "
-	     "qpnr %d qpmask 0x%x\n",
-	     rdev_p->rnic_info.udbell_len,
-	     rdev_p->rnic_info.udbell_physbase, rdev_p->rnic_info.kdb_addr,
-	     rdev_p->qpshift, rdev_p->qpnr, rdev_p->qpmask);
-
-	err = cxio_hal_init_ctrl_qp(rdev_p);
-	if (err) {
-		printk(KERN_ERR "%s error %d initializing ctrl_qp.\n",
-		       __FUNCTION__, err);
-		goto err1;
-	}
-	err = cxio_hal_init_resource(rdev_p, cxio_num_stags(rdev_p), 0,
-				     0, T3_MAX_NUM_QP, T3_MAX_NUM_CQ,
-				     T3_MAX_NUM_PD);
-	if (err) {
-		printk(KERN_ERR "%s error %d initializing hal resources.\n",
-		       __FUNCTION__, err);
-		goto err2;
-	}
-	err = cxio_hal_pblpool_create(rdev_p);
-	if (err) {
-		printk(KERN_ERR "%s error %d initializing pbl mem pool.\n",
-		       __FUNCTION__, err);
-		goto err3;
-	}
-	err = cxio_hal_rqtpool_create(rdev_p);
-	if (err) {
-		printk(KERN_ERR "%s error %d initializing rqt mem pool.\n",
-		       __FUNCTION__, err);
-		goto err4;
-	}
-	return 0;
-err4:
-	cxio_hal_pblpool_destroy(rdev_p);
-err3:
-	cxio_hal_destroy_resource(rdev_p->rscp);
-err2:
-	cxio_hal_destroy_ctrl_qp(rdev_p);
-err1:
-	list_del(&rdev_p->entry);
-	return err;
-}
-
-void cxio_rdev_close(struct cxio_rdev *rdev_p)
-{
-	if (rdev_p) {
-		cxio_hal_pblpool_destroy(rdev_p);
-		cxio_hal_rqtpool_destroy(rdev_p);
-		list_del(&rdev_p->entry);
-		rdev_p->t3cdev_p->ulp = NULL;
-		cxio_hal_destroy_ctrl_qp(rdev_p);
-		cxio_hal_destroy_resource(rdev_p->rscp);
-	}
-}
-
-int __init cxio_hal_init(void)
-{
-	if (cxio_hal_init_rhdl_resource(T3_MAX_NUM_RI))
-		return -ENOMEM;
-	t3_register_cpl_handler(CPL_ASYNC_NOTIF, cxio_hal_ev_handler);
-	return 0;
-}
-
-void __exit cxio_hal_exit(void)
-{
-	struct cxio_rdev *rdev, *tmp;
-
-	t3_register_cpl_handler(CPL_ASYNC_NOTIF, NULL);
-	list_for_each_entry_safe(rdev, tmp, &rdev_list, entry)
-		cxio_rdev_close(rdev);
-	cxio_hal_destroy_rhdl_resource();
-}
-
-static inline void flush_completed_wrs(struct t3_wq *wq, struct t3_cq *cq)
-{
-	struct t3_swsq *sqp;
-	__u32 ptr = wq->sq_rptr;
-	int count = Q_COUNT(wq->sq_rptr, wq->sq_wptr);
-
-	sqp = wq->sq + Q_PTR2IDX(ptr, wq->sq_size_log2);
-	while (count--)
-		if (!sqp->signaled) {
-			ptr++;
-			sqp = wq->sq + Q_PTR2IDX(ptr,  wq->sq_size_log2);
-		} else if (sqp->complete) {
-
-			/*
-			 * Insert this completed cqe into the swcq.
-			 */
-			PDBG("%s moving cqe into swcq sq idx %ld cq idx %ld\n",
-			     __FUNCTION__, Q_PTR2IDX(ptr,  wq->sq_size_log2),
-			     Q_PTR2IDX(cq->sw_wptr, cq->size_log2));
-			sqp->cqe.header |= htonl(V_CQE_SWCQE(1));
-			*(cq->sw_queue + Q_PTR2IDX(cq->sw_wptr, cq->size_log2))
-				= sqp->cqe;
-			cq->sw_wptr++;
-			sqp->signaled = 0;
-			break;
-		} else
-			break;
-}
-
-static inline void create_read_req_cqe(struct t3_wq *wq,
-				       struct t3_cqe *hw_cqe,
-				       struct t3_cqe *read_cqe)
-{
-	read_cqe->u.scqe.wrid_hi = wq->oldest_read->sq_wptr;
-	read_cqe->len = wq->oldest_read->read_len;
-	read_cqe->header = htonl(V_CQE_QPID(CQE_QPID(*hw_cqe)) |
-				 V_CQE_SWCQE(SW_CQE(*hw_cqe)) |
-				 V_CQE_OPCODE(T3_READ_REQ) |
-				 V_CQE_TYPE(1));
-}
-
-/*
- * Return a ptr to the next read wr in the SWSQ or NULL.
- */
-static inline void advance_oldest_read(struct t3_wq *wq)
-{
-
-	u32 rptr = wq->oldest_read - wq->sq + 1;
-	u32 wptr = Q_PTR2IDX(wq->sq_wptr, wq->sq_size_log2);
-
-	while (Q_PTR2IDX(rptr, wq->sq_size_log2) != wptr) {
-		wq->oldest_read = wq->sq + Q_PTR2IDX(rptr, wq->sq_size_log2);
-
-		if (wq->oldest_read->opcode == T3_READ_REQ)
-			return;
-		rptr++;
-	}
-	wq->oldest_read = NULL;
-}
-
-/*
- * cxio_poll_cq
- *
- * Caller must:
- *     check the validity of the first CQE,
- *     supply the wq assicated with the qpid.
- *
- * credit: cq credit to return to sge.
- * cqe_flushed: 1 iff the CQE is flushed.
- * cqe: copy of the polled CQE.
- *
- * return value:
- *     0       CQE returned,
- *    -1       CQE skipped, try again.
- */
-int cxio_poll_cq(struct t3_wq *wq, struct t3_cq *cq, struct t3_cqe *cqe,
-		     u8 *cqe_flushed, u64 *cookie, u32 *credit)
-{
-	int ret = 0;
-	struct t3_cqe *hw_cqe, read_cqe;
-
-	*cqe_flushed = 0;
-	*credit = 0;
-	hw_cqe = cxio_next_cqe(cq);
-
-	PDBG("%s CQE OOO %d qpid 0x%0x genbit %d type %d status 0x%0x"
-	     " opcode 0x%0x len 0x%0x wrid_hi_stag 0x%x wrid_low_msn 0x%x\n",
-	     __FUNCTION__, CQE_OOO(*hw_cqe), CQE_QPID(*hw_cqe),
-	     CQE_GENBIT(*hw_cqe), CQE_TYPE(*hw_cqe), CQE_STATUS(*hw_cqe),
-	     CQE_OPCODE(*hw_cqe), CQE_LEN(*hw_cqe), CQE_WRID_HI(*hw_cqe),
-	     CQE_WRID_LOW(*hw_cqe));
-
-	/*
-	 * skip cqe's not affiliated with a QP.
-	 */
-	if (wq == NULL) {
-		ret = -1;
-		goto skip_cqe;
-	}
-
-	/*
-	 * Gotta tweak READ completions:
-	 *	1) the cqe doesn't contain the sq_wptr from the wr.
-	 *	2) opcode not reflected from the wr.
-	 *	3) read_len not reflected from the wr.
-	 *	4) cq_type is RQ_TYPE not SQ_TYPE.
-	 */
-	if (RQ_TYPE(*hw_cqe) && (CQE_OPCODE(*hw_cqe) == T3_READ_RESP)) {
-
-		/*
-		 * Don't write to the HWCQ, so create a new read req CQE
-		 * in local memory.
-		 */
-		create_read_req_cqe(wq, hw_cqe, &read_cqe);
-		hw_cqe = &read_cqe;
-		advance_oldest_read(wq);
-	}
-
-	/*
-	 * T3A: Discard TERMINATE CQEs.
-	 */
-	if (CQE_OPCODE(*hw_cqe) == T3_TERMINATE) {
-		ret = -1;
-		wq->error = 1;
-		goto skip_cqe;
-	}
-
-	if (CQE_STATUS(*hw_cqe) || wq->error) {
-		*cqe_flushed = wq->error;
-		wq->error = 1;
-
-		/*
-		 * T3A inserts errors into the CQE.  We cannot return
-		 * these as work completions.
-		 */
-		/* incoming write failures */
-		if ((CQE_OPCODE(*hw_cqe) == T3_RDMA_WRITE)
-		     && RQ_TYPE(*hw_cqe)) {
-			ret = -1;
-			goto skip_cqe;
-		}
-		/* incoming read request failures */
-		if ((CQE_OPCODE(*hw_cqe) == T3_READ_RESP) && SQ_TYPE(*hw_cqe)) {
-			ret = -1;
-			goto skip_cqe;
-		}
-
-		/* incoming SEND with no receive posted failures */
-		if ((CQE_OPCODE(*hw_cqe) == T3_SEND) && RQ_TYPE(*hw_cqe) &&
-		    Q_EMPTY(wq->rq_rptr, wq->rq_wptr)) {
-			ret = -1;
-			goto skip_cqe;
-		}
-		goto proc_cqe;
-	}
-
-	/*
-	 * RECV completion.
-	 */
-	if (RQ_TYPE(*hw_cqe)) {
-
-		/*
-		 * HW only validates 4 bits of MSN.  So we must validate that
-		 * the MSN in the SEND is the next expected MSN.  If its not,
-		 * then we complete this with TPT_ERR_MSN and mark the wq in
-		 * error.
-		 */
-		if (unlikely((CQE_WRID_MSN(*hw_cqe) != (wq->rq_rptr + 1)))) {
-			wq->error = 1;
-			hw_cqe->header |= htonl(V_CQE_STATUS(TPT_ERR_MSN));
-			goto proc_cqe;
-		}
-		goto proc_cqe;
-	}
-
-	/*
-	 * If we get here its a send completion.
-	 *
-	 * Handle out of order completion. These get stuffed
-	 * in the SW SQ. Then the SW SQ is walked to move any
-	 * now in-order completions into the SW CQ.  This handles
-	 * 2 cases:
-	 *	1) reaping unsignaled WRs when the first subsequent
-	 *	   signaled WR is completed.
-	 *	2) out of order read completions.
-	 */
-	if (!SW_CQE(*hw_cqe) && (CQE_WRID_SQ_WPTR(*hw_cqe) != wq->sq_rptr)) {
-		struct t3_swsq *sqp;
-
-		PDBG("%s out of order completion going in swsq at idx %ld\n",
-		     __FUNCTION__,
-		     Q_PTR2IDX(CQE_WRID_SQ_WPTR(*hw_cqe), wq->sq_size_log2));
-		sqp = wq->sq +
-		      Q_PTR2IDX(CQE_WRID_SQ_WPTR(*hw_cqe), wq->sq_size_log2);
-		sqp->cqe = *hw_cqe;
-		sqp->complete = 1;
-		ret = -1;
-		goto flush_wq;
-	}
-
-proc_cqe:
-	*cqe = *hw_cqe;
-
-	/*
-	 * Reap the associated WR(s) that are freed up with this
-	 * completion.
-	 */
-	if (SQ_TYPE(*hw_cqe)) {
-		wq->sq_rptr = CQE_WRID_SQ_WPTR(*hw_cqe);
-		PDBG("%s completing sq idx %ld\n", __FUNCTION__,
-		     Q_PTR2IDX(wq->sq_rptr, wq->sq_size_log2));
-		*cookie = (wq->sq +
-			   Q_PTR2IDX(wq->sq_rptr, wq->sq_size_log2))->wr_id;
-		wq->sq_rptr++;
-	} else {
-		PDBG("%s completing rq idx %ld\n", __FUNCTION__,
-		     Q_PTR2IDX(wq->rq_rptr, wq->rq_size_log2));
-		*cookie = *(wq->rq + Q_PTR2IDX(wq->rq_rptr, wq->rq_size_log2));
-		wq->rq_rptr++;
-	}
-
-flush_wq:
-	/*
-	 * Flush any completed cqes that are now in-order.
-	 */
-	flush_completed_wrs(wq, cq);
-
-skip_cqe:
-	if (SW_CQE(*hw_cqe)) {
-		PDBG("%s cq %p cqid 0x%x skip sw cqe sw_rptr 0x%x\n",
-		     __FUNCTION__, cq, cq->cqid, cq->sw_rptr);
-		++cq->sw_rptr;
-	} else {
-		PDBG("%s cq %p cqid 0x%x skip hw cqe rptr 0x%x\n",
-		     __FUNCTION__, cq, cq->cqid, cq->rptr);
-		++cq->rptr;
-
-		/*
-		 * T3A: compute credits.
-		 */
-		if (((cq->rptr - cq->wptr) > (1 << (cq->size_log2 - 1)))
-		    || ((cq->rptr - cq->wptr) >= 128)) {
-			*credit = cq->rptr - cq->wptr;
-			cq->wptr = cq->rptr;
-		}
-	}
-	return ret;
-}
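
Note on the "T3A: compute credits" step at the end of cxio_poll_cq() above:
the following is a minimal userspace sketch of that threshold test, not the
driver code. The names sketch_cq and sketch_consume_hw_cqe are hypothetical,
and the struct carries only the three fields the test reads. One deliberate
difference: the driver leaves *credit untouched below the threshold, whereas
the sketch returns 0 in that case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the struct t3_cq fields used by the credit test. */
struct sketch_cq {
	uint32_t rptr;      /* free-running read pointer */
	uint32_t wptr;      /* read pointer value last reported as credits */
	uint32_t size_log2; /* log2 of the CQ depth */
};

/*
 * Consume one hardware CQE. Returns the number of credits to hand back
 * once more than half the queue, or at least 128 entries, are pending;
 * returns 0 otherwise.
 */
static inline uint32_t sketch_consume_hw_cqe(struct sketch_cq *cq)
{
	uint32_t pending;

	assert(cq->size_log2 >= 1 && cq->size_log2 < 32);
	++cq->rptr;
	/* Unsigned subtraction stays correct when rptr wraps past 2^32. */
	pending = cq->rptr - cq->wptr;
	if (pending > (1U << (cq->size_log2 - 1)) || pending >= 128) {
		cq->wptr = cq->rptr;
		return pending;
	}
	return 0;
}
```

With an 8-entry queue (size_log2 = 3) the first four calls return 0 and the
fifth returns 5, because the test is "more than half", not "at least half".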
diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.h b/drivers/infiniband/hw/cxgb3/core/cxio_hal.h
deleted file mode 100644
index 8fb2999..0000000
--- a/drivers/infiniband/hw/cxgb3/core/cxio_hal.h
+++ /dev/null
@@ -1,201 +0,0 @@
-/*
- * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- *     Redistribution and use in source and binary forms, with or
- *     without modification, are permitted provided that the following
- *     conditions are met:
- *
- *      - Redistributions of source code must retain the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer.
- *
- *      - Redistributions in binary form must reproduce the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer in the documentation and/or other materials
- *        provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- */
-#ifndef  __CXIO_HAL_H__
-#define  __CXIO_HAL_H__
-
-#include <linux/list.h>
-#include <linux/mutex.h>
-
-#include "t3_cpl.h"
-#include "t3cdev.h"
-#include "cxgb3_ctl_defs.h"
-#include "cxio_wr.h"
-
-#define T3_CTRL_QP_ID    FW_RI_SGEEC_START
-#define T3_CTL_QP_TID	 FW_RI_TID_START
-#define T3_CTRL_QP_SIZE_LOG2  8
-#define T3_CTRL_CQ_ID    0
-
-/* TBD */
-#define T3_MAX_NUM_RI (1<<15)
-#define T3_MAX_NUM_QP (1<<15)
-#define T3_MAX_NUM_CQ (1<<15)
-#define T3_MAX_NUM_PD (1<<15)
-#define T3_MAX_PBL_SIZE 256
-#define T3_MAX_RQ_SIZE 1024
-#define T3_MAX_NUM_STAG (1<<15)
-
-#define T3_STAG_UNSET 0xffffffff
-
-#define T3_MAX_DEV_NAME_LEN 32
-
-struct cxio_hal_ctrl_qp {
-	u32 wptr;
-	u32 rptr;
-	struct semaphore sem;	/* for the wptr, can sleep */
-	wait_queue_head_t waitq;	/* wait for RspQ/CQE msg */
-	union t3_wr *workq;	/* the work request queue */
-	dma_addr_t dma_addr;	/* pci bus address of the workq */
-	DECLARE_PCI_UNMAP_ADDR(mapping)
-	void __iomem *doorbell;
-};
-
-struct cxio_hal_resource {
-	struct kfifo *tpt_fifo;
-	spinlock_t tpt_fifo_lock;
-	struct kfifo *qpid_fifo;
-	spinlock_t qpid_fifo_lock;
-	struct kfifo *cqid_fifo;
-	spinlock_t cqid_fifo_lock;
-	struct kfifo *pdid_fifo;
-	spinlock_t pdid_fifo_lock;
-};
-
-struct cxio_qpid_list {
-	struct list_head entry;
-	u32 qpid;
-};
-
-struct cxio_ucontext {
-	struct list_head qpids;
-	struct mutex lock;
-};
-
-struct cxio_rdev {
-	char dev_name[T3_MAX_DEV_NAME_LEN];
-	struct t3cdev *t3cdev_p;
-	struct rdma_info rnic_info;
-	struct adap_ports port_info;
-	struct cxio_hal_resource *rscp;
-	struct cxio_hal_ctrl_qp ctrl_qp;
-	void *ulp;
-	unsigned long qpshift;
-	u32 qpnr;
-	u32 qpmask;
-	struct cxio_ucontext uctx;
-	struct gen_pool *pbl_pool;
-	struct gen_pool *rqt_pool;
-	struct list_head entry;
-};
-
-static inline int cxio_num_stags(struct cxio_rdev *rdev_p)
-{
-	return min((int)T3_MAX_NUM_STAG, (int)((rdev_p->rnic_info.tpt_top - rdev_p->rnic_info.tpt_base) >> 5));
-}
-
-typedef void (*cxio_hal_ev_callback_func_t) (struct cxio_rdev * rdev_p,
-					     struct sk_buff * skb);
-
-#define RSPQ_CQID(rsp) (be32_to_cpu(rsp->cq_ptrid) & 0xffff)
-#define RSPQ_CQPTR(rsp) ((be32_to_cpu(rsp->cq_ptrid) >> 16) & 0xffff)
-#define RSPQ_GENBIT(rsp) ((be32_to_cpu(rsp->flags) >> 16) & 1)
-#define RSPQ_OVERFLOW(rsp) ((be32_to_cpu(rsp->flags) >> 17) & 1)
-#define RSPQ_AN(rsp) ((be32_to_cpu(rsp->flags) >> 18) & 1)
-#define RSPQ_SE(rsp) ((be32_to_cpu(rsp->flags) >> 19) & 1)
-#define RSPQ_NOTIFY(rsp) ((be32_to_cpu(rsp->flags) >> 20) & 1)
-#define RSPQ_CQBRANCH(rsp) ((be32_to_cpu(rsp->flags) >> 21) & 1)
-#define RSPQ_CREDIT_THRESH(rsp) ((be32_to_cpu(rsp->flags) >> 22) & 1)
-
-struct respQ_msg_t {
-	__be32 flags;		/* flit 0 */
-	__be32 cq_ptrid;
-	__be64 rsvd;		/* flit 1 */
-	struct t3_cqe cqe;	/* flits 2-3 */
-};
-
-enum t3_cq_opcode {
-	CQ_ARM_AN = 0x2,
-	CQ_ARM_SE = 0x6,
-	CQ_FORCE_AN = 0x3,
-	CQ_CREDIT_UPDATE = 0x7
-};
-
-int cxio_rdev_open(struct cxio_rdev *rdev);
-void cxio_rdev_close(struct cxio_rdev *rdev);
-int cxio_hal_cq_op(struct cxio_rdev *rdev, struct t3_cq *cq,
-		   enum t3_cq_opcode op, u32 credit);
-int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev, u32 qpid);
-int cxio_create_cq(struct cxio_rdev *rdev, struct t3_cq *cq);
-int cxio_destroy_cq(struct cxio_rdev *rdev, struct t3_cq *cq);
-int cxio_resize_cq(struct cxio_rdev *rdev, struct t3_cq *cq);
-void cxio_release_ucontext(struct cxio_rdev *rdev, struct cxio_ucontext *uctx);
-void cxio_init_ucontext(struct cxio_rdev *rdev, struct cxio_ucontext *uctx);
-int cxio_create_qp(struct cxio_rdev *rdev, u32 kernel_domain, struct t3_wq *wq,
-		   struct cxio_ucontext *uctx);
-int cxio_destroy_qp(struct cxio_rdev *rdev, struct t3_wq *wq,
-		    struct cxio_ucontext *uctx);
-int cxio_peek_cq(struct t3_wq *wr, struct t3_cq *cq, int opcode);
-int cxio_allocate_stag(struct cxio_rdev *rdev, u32 * stag, u32 pdid,
-		       enum tpt_mem_perm perm, u32 * pbl_size, u32 * pbl_addr);
-int cxio_register_phys_mem(struct cxio_rdev *rdev, u32 * stag, u32 pdid,
-			   enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len,
-			   u8 page_size, __be64 *pbl, u32 *pbl_size,
-			   u32 *pbl_addr);
-int cxio_reregister_phys_mem(struct cxio_rdev *rdev, u32 * stag, u32 pdid,
-			   enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len,
-			   u8 page_size, __be64 *pbl, u32 *pbl_size,
-			   u32 *pbl_addr);
-int cxio_dereg_mem(struct cxio_rdev *rdev, u32 stag, u32 pbl_size,
-		   u32 pbl_addr);
-int cxio_allocate_window(struct cxio_rdev *rdev, u32 * stag, u32 pdid);
-int cxio_deallocate_window(struct cxio_rdev *rdev, u32 stag);
-int cxio_rdma_init(struct cxio_rdev *rdev, struct t3_rdma_init_attr *attr);
-void cxio_register_ev_cb(cxio_hal_ev_callback_func_t ev_cb);
-void cxio_unregister_ev_cb(cxio_hal_ev_callback_func_t ev_cb);
-u32 cxio_hal_get_rhdl(void);
-void cxio_hal_put_rhdl(u32 rhdl);
-u32 cxio_hal_get_pdid(struct cxio_hal_resource *rscp);
-void cxio_hal_put_pdid(struct cxio_hal_resource *rscp, u32 pdid);
-int __init cxio_hal_init(void);
-void __exit cxio_hal_exit(void);
-void cxio_flush_rq(struct t3_wq *wq, struct t3_cq *cq, int count);
-void cxio_flush_sq(struct t3_wq *wq, struct t3_cq *cq, int count);
-void cxio_count_rcqes(struct t3_cq *cq, struct t3_wq *wq, int *count);
-void cxio_count_scqes(struct t3_cq *cq, struct t3_wq *wq, int *count);
-void cxio_flush_hw_cq(struct t3_cq *cq);
-int cxio_poll_cq(struct t3_wq *wq, struct t3_cq *cq, struct t3_cqe *cqe,
-		     u8 *cqe_flushed, u64 *cookie, u32 *credit);
-
-#define MOD "iw_cxgb3: "
-#define PDBG(fmt, args...) pr_debug(MOD fmt, ## args)
-
-#ifdef DEBUG
-void cxio_dump_tpt(struct cxio_rdev *rev, u32 stag);
-void cxio_dump_pbl(struct cxio_rdev *rev, u32 pbl_addr, uint len, u8 shift);
-void cxio_dump_wqe(union t3_wr *wqe);
-void cxio_dump_wce(struct t3_cqe *wce);
-void cxio_dump_rqt(struct cxio_rdev *rdev, u32 hwtid, int nents);
-void cxio_dump_tcb(struct cxio_rdev *rdev, u32 hwtid);
-#endif
-
-#endif
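
Note on the RSPQ_* accessors in cxio_hal.h above: a small userspace sketch of
the same bit layout may help when reading them. It assumes ntohl()/htonl()
from <arpa/inet.h> as stand-ins for be32_to_cpu()/cpu_to_be32(); the struct
sketch_rsp, the enum and the helper names are hypothetical and only mirror
the two 32-bit words the macros touch.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>	/* ntohl/htonl stand in for be32_to_cpu/cpu_to_be32 */

/* Hypothetical mirror of the first two words of struct respQ_msg_t. */
struct sketch_rsp {
	uint32_t flags;    /* big-endian */
	uint32_t cq_ptrid; /* big-endian: CQ pointer in the high 16 bits, CQ id in the low 16 */
};

/* Bit positions taken from the RSPQ_* macros. */
enum sketch_rsp_bit {
	SKETCH_GENBIT        = 16,
	SKETCH_OVERFLOW      = 17,
	SKETCH_AN            = 18,
	SKETCH_SE            = 19,
	SKETCH_NOTIFY        = 20,
	SKETCH_CQBRANCH      = 21,
	SKETCH_CREDIT_THRESH = 22
};

static inline uint32_t sketch_cqid(const struct sketch_rsp *rsp)
{
	return ntohl(rsp->cq_ptrid) & 0xffff;
}

static inline uint32_t sketch_cqptr(const struct sketch_rsp *rsp)
{
	return (ntohl(rsp->cq_ptrid) >> 16) & 0xffff;
}

static inline uint32_t sketch_flag(const struct sketch_rsp *rsp,
				   enum sketch_rsp_bit bit)
{
	return (ntohl(rsp->flags) >> bit) & 1;
}
```

Converting to host order before shifting is what keeps the accessors
independent of the CPU's endianness.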
diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_resource.c b/drivers/infiniband/hw/cxgb3/core/cxio_resource.c
deleted file mode 100644
index 997aa32..0000000
--- a/drivers/infiniband/hw/cxgb3/core/cxio_resource.c
+++ /dev/null
@@ -1,331 +0,0 @@
-/*
- * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- *     Redistribution and use in source and binary forms, with or
- *     without modification, are permitted provided that the following
- *     conditions are met:
- *
- *      - Redistributions of source code must retain the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer.
- *
- *      - Redistributions in binary form must reproduce the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer in the documentation and/or other materials
- *        provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- */
-/* Crude resource management */
-#include <linux/kernel.h>
-#include <linux/random.h>
-#include <linux/slab.h>
-#include <linux/kfifo.h>
-#include <linux/spinlock.h>
-#include <linux/errno.h>
-#include "cxio_resource.h"
-#include "cxio_hal.h"
-
-static struct kfifo *rhdl_fifo;
-static spinlock_t rhdl_fifo_lock;
-
-#define RANDOM_SIZE 16
-
-static int __cxio_init_resource_fifo(struct kfifo **fifo,
-				   spinlock_t *fifo_lock,
-				   u32 nr, u32 skip_low,
-				   u32 skip_high,
-				   int random)
-{
-	u32 i, j, entry = 0, idx;
-	u32 random_bytes;
-	u32 rarray[16];
-	spin_lock_init(fifo_lock);
-
-	*fifo = kfifo_alloc(nr * sizeof(u32), GFP_KERNEL, fifo_lock);
-	if (IS_ERR(*fifo))
-		return -ENOMEM;
-
-	for (i = 0; i < skip_low + skip_high; i++)
-		__kfifo_put(*fifo, (unsigned char *) &entry, sizeof(u32));
-	if (random) {
-		j = 0;
-		random_bytes = random32();
-		for (i = 0; i < RANDOM_SIZE; i++)
-			rarray[i] = i + skip_low;
-		for (i = skip_low + RANDOM_SIZE; i < nr - skip_high; i++) {
-			if (j >= RANDOM_SIZE) {
-				j = 0;
-				random_bytes = random32();
-			}
-			idx = (random_bytes >> (j * 2)) & 0xF;
-			__kfifo_put(*fifo,
-				(unsigned char *) &rarray[idx],
-				sizeof(u32));
-			rarray[idx] = i;
-			j++;
-		}
-		for (i = 0; i < RANDOM_SIZE; i++)
-			__kfifo_put(*fifo,
-				(unsigned char *) &rarray[i],
-				sizeof(u32));
-	} else
-		for (i = skip_low; i < nr - skip_high; i++)
-			__kfifo_put(*fifo, (unsigned char *) &i, sizeof(u32));
-
-	for (i = 0; i < skip_low + skip_high; i++)
-		kfifo_get(*fifo, (unsigned char *) &entry, sizeof(u32));
-	return 0;
-}
-
-static int cxio_init_resource_fifo(struct kfifo **fifo, spinlock_t * fifo_lock,
-				   u32 nr, u32 skip_low, u32 skip_high)
-{
-	return (__cxio_init_resource_fifo(fifo, fifo_lock, nr, skip_low,
-					  skip_high, 0));
-}
-
-static int cxio_init_resource_fifo_random(struct kfifo **fifo,
-				   spinlock_t * fifo_lock,
-				   u32 nr, u32 skip_low, u32 skip_high)
-{
-
-	return (__cxio_init_resource_fifo(fifo, fifo_lock, nr, skip_low,
-					  skip_high, 1));
-}
-
-static int cxio_init_qpid_fifo(struct cxio_rdev *rdev_p)
-{
-	u32 i;
-
-	spin_lock_init(&rdev_p->rscp->qpid_fifo_lock);
-
-	rdev_p->rscp->qpid_fifo = kfifo_alloc(T3_MAX_NUM_QP * sizeof(u32),
-					      GFP_KERNEL,
-					      &rdev_p->rscp->qpid_fifo_lock);
-	if (IS_ERR(rdev_p->rscp->qpid_fifo))
-		return -ENOMEM;
-
-	for (i = 16; i < T3_MAX_NUM_QP; i++)
-		if (!(i & rdev_p->qpmask))
-			__kfifo_put(rdev_p->rscp->qpid_fifo,
-				    (unsigned char *) &i, sizeof(u32));
-	return 0;
-}
-
-int cxio_hal_init_rhdl_resource(u32 nr_rhdl)
-{
-	return cxio_init_resource_fifo(&rhdl_fifo, &rhdl_fifo_lock, nr_rhdl, 1,
-				       0);
-}
-
-void cxio_hal_destroy_rhdl_resource(void)
-{
-	kfifo_free(rhdl_fifo);
-}
-
-/* nr_* must be power of 2 */
-int cxio_hal_init_resource(struct cxio_rdev *rdev_p,
-			   u32 nr_tpt, u32 nr_pbl,
-			   u32 nr_rqt, u32 nr_qpid, u32 nr_cqid, u32 nr_pdid)
-{
-	int err = 0;
-	struct cxio_hal_resource *rscp;
-
-	rscp = kmalloc(sizeof(*rscp), GFP_KERNEL);
-	if (!rscp)
-		return -ENOMEM;
-	rdev_p->rscp = rscp;
-	err = cxio_init_resource_fifo_random(&rscp->tpt_fifo,
-				      &rscp->tpt_fifo_lock,
-				      nr_tpt, 1, 0);
-	if (err)
-		goto tpt_err;
-	err = cxio_init_qpid_fifo(rdev_p);
-	if (err)
-		goto qpid_err;
-	err = cxio_init_resource_fifo(&rscp->cqid_fifo, &rscp->cqid_fifo_lock,
-				      nr_cqid, 1, 0);
-	if (err)
-		goto cqid_err;
-	err = cxio_init_resource_fifo(&rscp->pdid_fifo, &rscp->pdid_fifo_lock,
-				      nr_pdid, 1, 0);
-	if (err)
-		goto pdid_err;
-	return 0;
-pdid_err:
-	kfifo_free(rscp->cqid_fifo);
-cqid_err:
-	kfifo_free(rscp->qpid_fifo);
-qpid_err:
-	kfifo_free(rscp->tpt_fifo);
-tpt_err:
-	return -ENOMEM;
-}
-
-/*
- * returns 0 if no resource available
- */
-static inline u32 cxio_hal_get_resource(struct kfifo *fifo)
-{
-	u32 entry;
-	if (kfifo_get(fifo, (unsigned char *) &entry, sizeof(u32)))
-		return entry;
-	else
-		return 0;	/* fifo empty */
-}
-
-static inline void cxio_hal_put_resource(struct kfifo *fifo, u32 entry)
-{
-	BUG_ON(kfifo_put(fifo, (unsigned char *) &entry, sizeof(u32)) == 0);
-}
-
-u32 cxio_hal_get_rhdl(void)
-{
-	return cxio_hal_get_resource(rhdl_fifo);
-}
-
-void cxio_hal_put_rhdl(u32 rhdl)
-{
-	cxio_hal_put_resource(rhdl_fifo, rhdl);
-}
-
-u32 cxio_hal_get_stag(struct cxio_hal_resource *rscp)
-{
-	return cxio_hal_get_resource(rscp->tpt_fifo);
-}
-
-void cxio_hal_put_stag(struct cxio_hal_resource *rscp, u32 stag)
-{
-	cxio_hal_put_resource(rscp->tpt_fifo, stag);
-}
-
-u32 cxio_hal_get_qpid(struct cxio_hal_resource *rscp)
-{
-	u32 qpid = cxio_hal_get_resource(rscp->qpid_fifo);
-	PDBG("%s qpid 0x%x\n", __FUNCTION__, qpid);
-	return qpid;
-}
-
-void cxio_hal_put_qpid(struct cxio_hal_resource *rscp, u32 qpid)
-{
-	PDBG("%s qpid 0x%x\n", __FUNCTION__, qpid);
-	cxio_hal_put_resource(rscp->qpid_fifo, qpid);
-}
-
-u32 cxio_hal_get_cqid(struct cxio_hal_resource *rscp)
-{
-	return cxio_hal_get_resource(rscp->cqid_fifo);
-}
-
-void cxio_hal_put_cqid(struct cxio_hal_resource *rscp, u32 cqid)
-{
-	cxio_hal_put_resource(rscp->cqid_fifo, cqid);
-}
-
-u32 cxio_hal_get_pdid(struct cxio_hal_resource *rscp)
-{
-	return cxio_hal_get_resource(rscp->pdid_fifo);
-}
-
-void cxio_hal_put_pdid(struct cxio_hal_resource *rscp, u32 pdid)
-{
-	cxio_hal_put_resource(rscp->pdid_fifo, pdid);
-}
-
-void cxio_hal_destroy_resource(struct cxio_hal_resource *rscp)
-{
-	kfifo_free(rscp->tpt_fifo);
-	kfifo_free(rscp->cqid_fifo);
-	kfifo_free(rscp->qpid_fifo);
-	kfifo_free(rscp->pdid_fifo);
-	kfree(rscp);
-}
-
-/*
- * PBL Memory Manager.  Uses Linux generic allocator.
- */
-
-#define MIN_PBL_SHIFT 8			/* 256B == min PBL size (32 entries) */
-#define PBL_CHUNK 2*1024*1024
-
-u32 cxio_hal_pblpool_alloc(struct cxio_rdev *rdev_p, int size)
-{
-	unsigned long addr = gen_pool_alloc(rdev_p->pbl_pool, size);
-	PDBG("%s addr 0x%x size %d\n", __FUNCTION__, (u32)addr, size);
-	return (u32)addr;
-}
-
-void cxio_hal_pblpool_free(struct cxio_rdev *rdev_p, u32 addr, int size)
-{
-	PDBG("%s addr 0x%x size %d\n", __FUNCTION__, addr, size);
-	gen_pool_free(rdev_p->pbl_pool, (unsigned long)addr, size);
-}
-
-int cxio_hal_pblpool_create(struct cxio_rdev *rdev_p)
-{
-	unsigned long i;
-	rdev_p->pbl_pool = gen_pool_create(MIN_PBL_SHIFT, -1);
-	if (rdev_p->pbl_pool)
-		for (i = rdev_p->rnic_info.pbl_base;
-		     i <= rdev_p->rnic_info.pbl_top - PBL_CHUNK + 1;
-		     i += PBL_CHUNK)
-			gen_pool_add(rdev_p->pbl_pool, i, PBL_CHUNK, -1);
-	return rdev_p->pbl_pool ? 0 : -ENOMEM;
-}
-
-void cxio_hal_pblpool_destroy(struct cxio_rdev *rdev_p)
-{
-	gen_pool_destroy(rdev_p->pbl_pool);
-}
-
-/*
- * RQT Memory Manager.  Uses Linux generic allocator.
- */
-
-#define MIN_RQT_SHIFT 10	/* 1KB == min RQT size (16 entries) */
-#define RQT_CHUNK 2*1024*1024
-
-u32 cxio_hal_rqtpool_alloc(struct cxio_rdev *rdev_p, int size)
-{
-	unsigned long addr = gen_pool_alloc(rdev_p->rqt_pool, size << 6);
-	PDBG("%s addr 0x%x size %d\n", __FUNCTION__, (u32)addr, size << 6);
-	return (u32)addr;
-}
-
-void cxio_hal_rqtpool_free(struct cxio_rdev *rdev_p, u32 addr, int size)
-{
-	PDBG("%s addr 0x%x size %d\n", __FUNCTION__, addr, size << 6);
-	gen_pool_free(rdev_p->rqt_pool, (unsigned long)addr, size << 6);
-}
-
-int cxio_hal_rqtpool_create(struct cxio_rdev *rdev_p)
-{
-	unsigned long i;
-	rdev_p->rqt_pool = gen_pool_create(MIN_RQT_SHIFT, -1);
-	if (rdev_p->rqt_pool)
-		for (i = rdev_p->rnic_info.rqt_base;
-		     i <= rdev_p->rnic_info.rqt_top - RQT_CHUNK + 1;
-		     i += RQT_CHUNK)
-			gen_pool_add(rdev_p->rqt_pool, i, RQT_CHUNK, -1);
-	return rdev_p->rqt_pool ? 0 : -ENOMEM;
-}
-
-void cxio_hal_rqtpool_destroy(struct cxio_rdev *rdev_p)
-{
-	gen_pool_destroy(rdev_p->rqt_pool);
-}
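
Note on the ID fifos in cxio_resource.c above: the get helpers return 0 when
the fifo is empty, which only works because the callers pass skip_low = 1 so
that ID 0 is never handed out. Below is a minimal userspace sketch of that
convention. It does not use the kernel kfifo API; the ring buffer, the struct
and every sketch_* name are assumptions made for illustration, and locking
and the randomised fill order are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical ring of free IDs with free-running in/out indices. */
struct sketch_id_fifo {
	uint32_t *slots;
	uint32_t size;	/* capacity, must be a power of two */
	uint32_t in;	/* producer index */
	uint32_t out;	/* consumer index */
};

/* Fill the fifo with IDs skip_low .. nr-1; skip_low >= 1 keeps 0 reserved. */
static inline int sketch_fifo_init(struct sketch_id_fifo *f, uint32_t nr,
				   uint32_t skip_low)
{
	uint32_t i;

	assert(nr != 0 && (nr & (nr - 1)) == 0);
	assert(skip_low >= 1);
	f->slots = malloc(nr * sizeof(*f->slots));
	if (!f->slots)
		return -1;
	f->size = nr;
	f->in = f->out = 0;
	for (i = skip_low; i < nr; i++)
		f->slots[f->in++ & (nr - 1)] = i;
	return 0;
}

/* Returns a free ID, or 0 if none is available. */
static inline uint32_t sketch_fifo_get(struct sketch_id_fifo *f)
{
	if (f->in == f->out)
		return 0;
	return f->slots[f->out++ & (f->size - 1)];
}

/* Returns 0 on success, -1 if the fifo is already full. */
static inline int sketch_fifo_put(struct sketch_id_fifo *f, uint32_t id)
{
	if (f->in - f->out == f->size)
		return -1;
	f->slots[f->in++ & (f->size - 1)] = id;
	return 0;
}

static inline void sketch_fifo_destroy(struct sketch_id_fifo *f)
{
	free(f->slots);
	f->slots = NULL;
}
```

The driver treats a failed put as fatal (BUG_ON); the sketch reports it to
the caller instead, which is a design choice of the sketch only.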
diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_resource.h b/drivers/infiniband/hw/cxgb3/core/cxio_resource.h
deleted file mode 100644
index a6bbe83..0000000
--- a/drivers/infiniband/hw/cxgb3/core/cxio_resource.h
+++ /dev/null
@@ -1,70 +0,0 @@
-/*
- * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- *     Redistribution and use in source and binary forms, with or
- *     without modification, are permitted provided that the following
- *     conditions are met:
- *
- *      - Redistributions of source code must retain the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer.
- *
- *      - Redistributions in binary form must reproduce the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer in the documentation and/or other materials
- *        provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- */
-#ifndef __CXIO_RESOURCE_H__
-#define __CXIO_RESOURCE_H__
-
-#include <linux/kernel.h>
-#include <linux/random.h>
-#include <linux/slab.h>
-#include <linux/kfifo.h>
-#include <linux/spinlock.h>
-#include <linux/errno.h>
-#include <linux/genalloc.h>
-#include "cxio_hal.h"
-
-extern int cxio_hal_init_rhdl_resource(u32 nr_rhdl);
-extern void cxio_hal_destroy_rhdl_resource(void);
-extern int cxio_hal_init_resource(struct cxio_rdev *rdev_p,
-				  u32 nr_tpt, u32 nr_pbl,
-				  u32 nr_rqt, u32 nr_qpid, u32 nr_cqid,
-				  u32 nr_pdid);
-extern u32 cxio_hal_get_stag(struct cxio_hal_resource *rscp);
-extern void cxio_hal_put_stag(struct cxio_hal_resource *rscp, u32 stag);
-extern u32 cxio_hal_get_qpid(struct cxio_hal_resource *rscp);
-extern void cxio_hal_put_qpid(struct cxio_hal_resource *rscp, u32 qpid);
-extern u32 cxio_hal_get_cqid(struct cxio_hal_resource *rscp);
-extern void cxio_hal_put_cqid(struct cxio_hal_resource *rscp, u32 cqid);
-extern void cxio_hal_destroy_resource(struct cxio_hal_resource *rscp);
-
-#define PBL_OFF(rdev_p, a) ( (a) - (rdev_p)->rnic_info.pbl_base )
-extern int cxio_hal_pblpool_create(struct cxio_rdev *rdev_p);
-extern void cxio_hal_pblpool_destroy(struct cxio_rdev *rdev_p);
-extern u32 cxio_hal_pblpool_alloc(struct cxio_rdev *rdev_p, int size);
-extern void cxio_hal_pblpool_free(struct cxio_rdev *rdev_p, u32 addr, int size);
-
-#define RQT_OFF(rdev_p, a) ( (a) - (rdev_p)->rnic_info.rqt_base )
-extern int cxio_hal_rqtpool_create(struct cxio_rdev *rdev_p);
-extern void cxio_hal_rqtpool_destroy(struct cxio_rdev *rdev_p);
-extern u32 cxio_hal_rqtpool_alloc(struct cxio_rdev *rdev_p, int size);
-extern void cxio_hal_rqtpool_free(struct cxio_rdev *rdev_p, u32 addr, int size);
-#endif
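
Note on cxio_hal_pblpool_create() and cxio_hal_rqtpool_create(), declared in
the header above: both walk the adapter region in fixed 2MB steps and add
only whole chunks, so any tail shorter than one chunk is never added to the
pool. The sketch below counts the chunks that loop would add. It is a
userspace illustration; sketch_count_chunks and SKETCH_CHUNK are hypothetical
names, and the up-front size check is an addition of the sketch (the driver
loop has no such guard).

```c
#include <assert.h>

#define SKETCH_CHUNK (2UL * 1024 * 1024)	/* same value as PBL_CHUNK / RQT_CHUNK */

/*
 * Count the whole chunks in the inclusive range [base, top], using the
 * same loop bounds as the pool-create functions.
 */
static inline unsigned long sketch_count_chunks(unsigned long base,
						unsigned long top)
{
	unsigned long i, n = 0;

	/* Added guard: avoid unsigned underflow for regions under one chunk. */
	if (top < base || top - base < SKETCH_CHUNK - 1)
		return 0;
	for (i = base; i <= top - SKETCH_CHUNK + 1; i += SKETCH_CHUNK)
		n++;
	return n;
}
```

For a 5MB region the function returns 2, so the final 1MB would go unused
under these assumptions.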
diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_wr.h b/drivers/infiniband/hw/cxgb3/core/cxio_wr.h
deleted file mode 100644
index 103fc42..0000000
--- a/drivers/infiniband/hw/cxgb3/core/cxio_wr.h
+++ /dev/null
@@ -1,685 +0,0 @@
-/*
- * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- *     Redistribution and use in source and binary forms, with or
- *     without modification, are permitted provided that the following
- *     conditions are met:
- *
- *      - Redistributions of source code must retain the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer.
- *
- *      - Redistributions in binary form must reproduce the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer in the documentation and/or other materials
- *        provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- */
-#ifndef __CXIO_WR_H__
-#define __CXIO_WR_H__
-
-#include <asm/io.h>
-#include <linux/pci.h>
-#include <linux/timer.h>
-#include "firmware_exports.h"
-
-#define T3_MAX_SGE      4
-
-#define Q_EMPTY(rptr,wptr) ((rptr)==(wptr))
-#define Q_FULL(rptr,wptr,size_log2)  ( (((wptr)-(rptr))>>(size_log2)) && \
-				       ((rptr)!=(wptr)) )
-#define Q_GENBIT(ptr,size_log2) (!(((ptr)>>size_log2)&0x1))
-#define Q_FREECNT(rptr,wptr,size_log2) ((1UL<<size_log2)-((wptr)-(rptr)))
-#define Q_COUNT(rptr,wptr) ((wptr)-(rptr))
-#define Q_PTR2IDX(ptr,size_log2) (ptr & ((1UL<<size_log2)-1))
-
-static inline void ring_doorbell(void __iomem *doorbell, u32 qpid)
-{
-	writel(((1<<31) | qpid), doorbell);
-}
-
-#define SEQ32_GE(x,y) (!( (((u32) (x)) - ((u32) (y))) & 0x80000000 ))
-
-enum t3_wr_flags {
-	T3_COMPLETION_FLAG = 0x01,
-	T3_NOTIFY_FLAG = 0x02,
-	T3_SOLICITED_EVENT_FLAG = 0x04,
-	T3_READ_FENCE_FLAG = 0x08,
-	T3_LOCAL_FENCE_FLAG = 0x10
-} __attribute__ ((packed));
-
-enum t3_wr_opcode {
-	T3_WR_BP = FW_WROPCODE_RI_BYPASS,
-	T3_WR_SEND = FW_WROPCODE_RI_SEND,
-	T3_WR_WRITE = FW_WROPCODE_RI_RDMA_WRITE,
-	T3_WR_READ = FW_WROPCODE_RI_RDMA_READ,
-	T3_WR_INV_STAG = FW_WROPCODE_RI_LOCAL_INV,
-	T3_WR_BIND = FW_WROPCODE_RI_BIND_MW,
-	T3_WR_RCV = FW_WROPCODE_RI_RECEIVE,
-	T3_WR_INIT = FW_WROPCODE_RI_RDMA_INIT,
-	T3_WR_QP_MOD = FW_WROPCODE_RI_MODIFY_QP
-} __attribute__ ((packed));
-
-enum t3_rdma_opcode {
-	T3_RDMA_WRITE,		/* IETF RDMAP v1.0 ... */
-	T3_READ_REQ,
-	T3_READ_RESP,
-	T3_SEND,
-	T3_SEND_WITH_INV,
-	T3_SEND_WITH_SE,
-	T3_SEND_WITH_SE_INV,
-	T3_TERMINATE,
-	T3_RDMA_INIT,		/* CHELSIO RI specific ... */
-	T3_BIND_MW,
-	T3_FAST_REGISTER,
-	T3_LOCAL_INV,
-	T3_QP_MOD,
-	T3_BYPASS
-} __attribute__ ((packed));
-
-static inline enum t3_rdma_opcode wr2opcode(enum t3_wr_opcode wrop)
-{
-	switch (wrop) {
-		case T3_WR_BP: return T3_BYPASS;
-		case T3_WR_SEND: return T3_SEND;
-		case T3_WR_WRITE: return T3_RDMA_WRITE;
-		case T3_WR_READ: return T3_READ_REQ;
-		case T3_WR_INV_STAG: return T3_LOCAL_INV;
-		case T3_WR_BIND: return T3_BIND_MW;
-		case T3_WR_INIT: return T3_RDMA_INIT;
-		case T3_WR_QP_MOD: return T3_QP_MOD;
-		default: break;
-	}
-	return -1;
-}
-
-
-/* Work request id */
-union t3_wrid {
-	struct {
-		u32 hi;
-		u32 low;
-	} id0;
-	u64 id1;
-};
-
-#define WRID(wrid)		(wrid.id1)
-#define WRID_GEN(wrid)		(wrid.id0.wr_gen)
-#define WRID_IDX(wrid)		(wrid.id0.wr_idx)
-#define WRID_LO(wrid)		(wrid.id0.wr_lo)
-
-struct fw_riwrh {
-	__be32 op_seop_flags;
-	__be32 gen_tid_len;
-};
-
-#define S_FW_RIWR_OP		24
-#define M_FW_RIWR_OP		0xff
-#define V_FW_RIWR_OP(x)		((x) << S_FW_RIWR_OP)
-#define G_FW_RIWR_OP(x)	((((x) >> S_FW_RIWR_OP)) & M_FW_RIWR_OP)
-
-#define S_FW_RIWR_SOPEOP	22
-#define M_FW_RIWR_SOPEOP	0x3
-#define V_FW_RIWR_SOPEOP(x)	((x) << S_FW_RIWR_SOPEOP)
-
-#define S_FW_RIWR_FLAGS		8
-#define M_FW_RIWR_FLAGS		0x3fffff
-#define V_FW_RIWR_FLAGS(x)	((x) << S_FW_RIWR_FLAGS)
-#define G_FW_RIWR_FLAGS(x)	((((x) >> S_FW_RIWR_FLAGS)) & M_FW_RIWR_FLAGS)
-
-#define S_FW_RIWR_TID		8
-#define V_FW_RIWR_TID(x)	((x) << S_FW_RIWR_TID)
-
-#define S_FW_RIWR_LEN		0
-#define V_FW_RIWR_LEN(x)	((x) << S_FW_RIWR_LEN)
-
-#define S_FW_RIWR_GEN           31
-#define V_FW_RIWR_GEN(x)        ((x)  << S_FW_RIWR_GEN)
-
-struct t3_sge {
-	__be32 stag;
-	__be32 len;
-	__be64 to;
-};
-
-/* If num_sgle is zero, flit 5+ contains immediate data.*/
-struct t3_send_wr {
-	struct fw_riwrh wrh;	/* 0 */
-	union t3_wrid wrid;	/* 1 */
-
-	u8 rdmaop;		/* 2 */
-	u8 reserved[3];
-	__be32 rem_stag;
-	__be32 plen;		/* 3 */
-	__be32 num_sgle;
-	struct t3_sge sgl[T3_MAX_SGE];	/* 4+ */
-};
-
-struct t3_local_inv_wr {
-	struct fw_riwrh wrh;	/* 0 */
-	union t3_wrid wrid;	/* 1 */
-	__be32 stag;		/* 2 */
-	__be32 reserved3;
-};
-
-struct t3_rdma_write_wr {
-	struct fw_riwrh wrh;	/* 0 */
-	union t3_wrid wrid;	/* 1 */
-	u8 rdmaop;		/* 2 */
-	u8 reserved[3];
-	__be32 stag_sink;
-	__be64 to_sink;		/* 3 */
-	__be32 plen;		/* 4 */
-	__be32 num_sgle;
-	struct t3_sge sgl[T3_MAX_SGE];	/* 5+ */
-};
-
-struct t3_rdma_read_wr {
-	struct fw_riwrh wrh;	/* 0 */
-	union t3_wrid wrid;	/* 1 */
-	u8 rdmaop;		/* 2 */
-	u8 reserved[3];
-	__be32 rem_stag;
-	__be64 rem_to;		/* 3 */
-	__be32 local_stag;	/* 4 */
-	__be32 local_len;
-	__be64 local_to;	/* 5 */
-};
-
-enum t3_addr_type {
-	T3_VA_BASED_TO = 0x0,
-	T3_ZERO_BASED_TO = 0x1
-} __attribute__ ((packed));
-
-enum t3_mem_perms {
-	T3_MEM_ACCESS_LOCAL_READ = 0x1,
-	T3_MEM_ACCESS_LOCAL_WRITE = 0x2,
-	T3_MEM_ACCESS_REM_READ = 0x4,
-	T3_MEM_ACCESS_REM_WRITE = 0x8
-} __attribute__ ((packed));
-
-struct t3_bind_mw_wr {
-	struct fw_riwrh wrh;	/* 0 */
-	union t3_wrid wrid;	/* 1 */
-	u16 reserved;		/* 2 */
-	u8 type;
-	u8 perms;
-	__be32 mr_stag;
-	__be32 mw_stag;		/* 3 */
-	__be32 mw_len;
-	__be64 mw_va;		/* 4 */
-	__be32 mr_pbl_addr;	/* 5 */
-	u8 reserved2[3];
-	u8 mr_pagesz;
-};
-
-struct t3_receive_wr {
-	struct fw_riwrh wrh;	/* 0 */
-	union t3_wrid wrid;	/* 1 */
-	u8 pagesz[T3_MAX_SGE];
-	__be32 num_sgle;		/* 2 */
-	struct t3_sge sgl[T3_MAX_SGE];	/* 3+ */
-	__be32 pbl_addr[T3_MAX_SGE];
-};
-
-struct t3_bypass_wr {
-	struct fw_riwrh wrh;
-	union t3_wrid wrid;	/* 1 */
-};
-
-struct t3_modify_qp_wr {
-	struct fw_riwrh wrh;	/* 0 */
-	union t3_wrid wrid;	/* 1 */
-	__be32 flags;		/* 2 */
-	__be32 quiesce;		/* 2 */
-	__be32 max_ird;		/* 3 */
-	__be32 max_ord;		/* 3 */
-	__be64 sge_cmd;		/* 4 */
-	__be64 ctx1;		/* 5 */
-	__be64 ctx0;		/* 6 */
-};
-
-enum t3_modify_qp_flags {
-	MODQP_QUIESCE  = 0x01,
-	MODQP_MAX_IRD  = 0x02,
-	MODQP_MAX_ORD  = 0x04,
-	MODQP_WRITE_EC = 0x08,
-	MODQP_READ_EC  = 0x10,
-};
-
-
-enum t3_mpa_attrs {
-	uP_RI_MPA_RX_MARKER_ENABLE = 0x1,
-	uP_RI_MPA_TX_MARKER_ENABLE = 0x2,
-	uP_RI_MPA_CRC_ENABLE = 0x4,
-	uP_RI_MPA_IETF_ENABLE = 0x8
-} __attribute__ ((packed));
-
-enum t3_qp_caps {
-	uP_RI_QP_RDMA_READ_ENABLE = 0x01,
-	uP_RI_QP_RDMA_WRITE_ENABLE = 0x02,
-	uP_RI_QP_BIND_ENABLE = 0x04,
-	uP_RI_QP_FAST_REGISTER_ENABLE = 0x08,
-	uP_RI_QP_STAG0_ENABLE = 0x10
-} __attribute__ ((packed));
-
-struct t3_rdma_init_attr {
-	u32 tid;
-	u32 qpid;
-	u32 pdid;
-	u32 scqid;
-	u32 rcqid;
-	u32 rq_addr;
-	u32 rq_size;
-	enum t3_mpa_attrs mpaattrs;
-	enum t3_qp_caps qpcaps;
-	u16 tcp_emss;
-	u32 ord;
-	u32 ird;
-	u64 qp_dma_addr;
-	u32 qp_dma_size;
-	u32 flags;
-};
-
-struct t3_rdma_init_wr {
-	struct fw_riwrh wrh;	/* 0 */
-	union t3_wrid wrid;	/* 1 */
-	__be32 qpid;		/* 2 */
-	__be32 pdid;
-	__be32 scqid;		/* 3 */
-	__be32 rcqid;
-	__be32 rq_addr;		/* 4 */
-	__be32 rq_size;
-	u8 mpaattrs;		/* 5 */
-	u8 qpcaps;
-	__be16 ulpdu_size;
-	__be32 flags;		/* bits 31-1 - reserved */
-				/* bit     0 - set if RECV posted */
-	__be32 ord;		/* 6 */
-	__be32 ird;
-	__be64 qp_dma_addr;	/* 7 */
-	__be32 qp_dma_size;	/* 8 */
-	u32 rsvd;
-};
-
-struct t3_genbit {
-	u64 flit[15];
-	__be64 genbit;
-};
-
-enum rdma_init_wr_flags {
-	RECVS_POSTED = 1,
-};
-
-union t3_wr {
-	struct t3_send_wr send;
-	struct t3_rdma_write_wr write;
-	struct t3_rdma_read_wr read;
-	struct t3_receive_wr recv;
-	struct t3_local_inv_wr local_inv;
-	struct t3_bind_mw_wr bind;
-	struct t3_bypass_wr bypass;
-	struct t3_rdma_init_wr init;
-	struct t3_modify_qp_wr qp_mod;
-	struct t3_genbit genbit;
-	u64 flit[16];
-};
-
-#define T3_SQ_CQE_FLIT	  13
-#define T3_SQ_COOKIE_FLIT 14
-
-#define T3_RQ_COOKIE_FLIT 13
-#define T3_RQ_CQE_FLIT	  14
-
-static inline enum t3_wr_opcode fw_riwrh_opcode(struct fw_riwrh *wqe)
-{
-	return G_FW_RIWR_OP(be32_to_cpu(wqe->op_seop_flags));
-}
-
-static inline void build_fw_riwrh(struct fw_riwrh *wqe, enum t3_wr_opcode op,
-				  enum t3_wr_flags flags, u8 genbit, u32 tid,
-				  u8 len)
-{
-	wqe->op_seop_flags = cpu_to_be32(V_FW_RIWR_OP(op) |
-					 V_FW_RIWR_SOPEOP(M_FW_RIWR_SOPEOP) |
-					 V_FW_RIWR_FLAGS(flags));
-	wmb();
-	wqe->gen_tid_len = cpu_to_be32(V_FW_RIWR_GEN(genbit) |
-				       V_FW_RIWR_TID(tid) |
-				       V_FW_RIWR_LEN(len));
-	/* 2nd gen bit... */
-	((union t3_wr *)wqe)->genbit.genbit = cpu_to_be64(genbit);
-}
-
-/*
- * T3 ULP2_TX commands
- */
-enum t3_utx_mem_op {
-	T3_UTX_MEM_READ = 2,
-	T3_UTX_MEM_WRITE = 3
-};
-
-/* T3 MC7 RDMA TPT entry format */
-
-enum tpt_mem_type {
-	TPT_NON_SHARED_MR = 0x0,
-	TPT_SHARED_MR = 0x1,
-	TPT_MW = 0x2,
-	TPT_MW_RELAXED_PROTECTION = 0x3
-};
-
-enum tpt_addr_type {
-	TPT_ZBTO = 0,
-	TPT_VATO = 1
-};
-
-enum tpt_mem_perm {
-	TPT_LOCAL_READ = 0x8,
-	TPT_LOCAL_WRITE = 0x4,
-	TPT_REMOTE_READ = 0x2,
-	TPT_REMOTE_WRITE = 0x1
-};
-
-struct tpt_entry {
-	__be32 valid_stag_pdid;
-	__be32 flags_pagesize_qpid;
-
-	__be32 rsvd_pbl_addr;
-	__be32 len;
-	__be32 va_hi;
-	__be32 va_low_or_fbo;
-
-	__be32 rsvd_bind_cnt_or_pstag;
-	__be32 rsvd_pbl_size;
-};
-
-#define S_TPT_VALID		31
-#define V_TPT_VALID(x)		((x) << S_TPT_VALID)
-#define F_TPT_VALID		V_TPT_VALID(1U)
-
-#define S_TPT_STAG_KEY		23
-#define M_TPT_STAG_KEY		0xFF
-#define V_TPT_STAG_KEY(x)	((x) << S_TPT_STAG_KEY)
-#define G_TPT_STAG_KEY(x)	(((x) >> S_TPT_STAG_KEY) & M_TPT_STAG_KEY)
-
-#define S_TPT_STAG_STATE	22
-#define V_TPT_STAG_STATE(x)	((x) << S_TPT_STAG_STATE)
-#define F_TPT_STAG_STATE	V_TPT_STAG_STATE(1U)
-
-#define S_TPT_STAG_TYPE		20
-#define M_TPT_STAG_TYPE		0x3
-#define V_TPT_STAG_TYPE(x)	((x) << S_TPT_STAG_TYPE)
-#define G_TPT_STAG_TYPE(x)	(((x) >> S_TPT_STAG_TYPE) & M_TPT_STAG_TYPE)
-
-#define S_TPT_PDID		0
-#define M_TPT_PDID		0xFFFFF
-#define V_TPT_PDID(x)		((x) << S_TPT_PDID)
-#define G_TPT_PDID(x)		(((x) >> S_TPT_PDID) & M_TPT_PDID)
-
-#define S_TPT_PERM		28
-#define M_TPT_PERM		0xF
-#define V_TPT_PERM(x)		((x) << S_TPT_PERM)
-#define G_TPT_PERM(x)		(((x) >> S_TPT_PERM) & M_TPT_PERM)
-
-#define S_TPT_REM_INV_DIS	27
-#define V_TPT_REM_INV_DIS(x)	((x) << S_TPT_REM_INV_DIS)
-#define F_TPT_REM_INV_DIS	V_TPT_REM_INV_DIS(1U)
-
-#define S_TPT_ADDR_TYPE		26
-#define V_TPT_ADDR_TYPE(x)	((x) << S_TPT_ADDR_TYPE)
-#define F_TPT_ADDR_TYPE		V_TPT_ADDR_TYPE(1U)
-
-#define S_TPT_MW_BIND_ENABLE	25
-#define V_TPT_MW_BIND_ENABLE(x)	((x) << S_TPT_MW_BIND_ENABLE)
-#define F_TPT_MW_BIND_ENABLE    V_TPT_MW_BIND_ENABLE(1U)
-
-#define S_TPT_PAGE_SIZE		20
-#define M_TPT_PAGE_SIZE		0x1F
-#define V_TPT_PAGE_SIZE(x)	((x) << S_TPT_PAGE_SIZE)
-#define G_TPT_PAGE_SIZE(x)	(((x) >> S_TPT_PAGE_SIZE) & M_TPT_PAGE_SIZE)
-
-#define S_TPT_PBL_ADDR		0
-#define M_TPT_PBL_ADDR		0x1FFFFFFF
-#define V_TPT_PBL_ADDR(x)	((x) << S_TPT_PBL_ADDR)
-#define G_TPT_PBL_ADDR(x)       (((x) >> S_TPT_PBL_ADDR) & M_TPT_PBL_ADDR)
-
-#define S_TPT_QPID		0
-#define M_TPT_QPID		0xFFFFF
-#define V_TPT_QPID(x)		((x) << S_TPT_QPID)
-#define G_TPT_QPID(x)		(((x) >> S_TPT_QPID) & M_TPT_QPID)
-
-#define S_TPT_PSTAG		0
-#define M_TPT_PSTAG		0xFFFFFF
-#define V_TPT_PSTAG(x)		((x) << S_TPT_PSTAG)
-#define G_TPT_PSTAG(x)		(((x) >> S_TPT_PSTAG) & M_TPT_PSTAG)
-
-#define S_TPT_PBL_SIZE		0
-#define M_TPT_PBL_SIZE		0xFFFFF
-#define V_TPT_PBL_SIZE(x)	((x) << S_TPT_PBL_SIZE)
-#define G_TPT_PBL_SIZE(x)	(((x) >> S_TPT_PBL_SIZE) & M_TPT_PBL_SIZE)
-
-/*
- * CQE defs
- */
-struct t3_cqe {
-	__be32 header;
-	__be32 len;
-	union {
-		struct {
-			__be32 stag;
-			__be32 msn;
-		} rcqe;
-		struct {
-			u32 wrid_hi;
-			u32 wrid_low;
-		} scqe;
-	} u;
-};
-
-#define S_CQE_OOO	  31
-#define M_CQE_OOO	  0x1
-#define G_CQE_OOO(x)	  ((((x) >> S_CQE_OOO)) & M_CQE_OOO)
-#define V_CEQ_OOO(x)	  ((x)<<S_CQE_OOO)
-
-#define S_CQE_QPID        12
-#define M_CQE_QPID        0x7FFFF
-#define G_CQE_QPID(x)     ((((x) >> S_CQE_QPID)) & M_CQE_QPID)
-#define V_CQE_QPID(x)	  ((x)<<S_CQE_QPID)
-
-#define S_CQE_SWCQE       11
-#define M_CQE_SWCQE       0x1
-#define G_CQE_SWCQE(x)    ((((x) >> S_CQE_SWCQE)) & M_CQE_SWCQE)
-#define V_CQE_SWCQE(x)	  ((x)<<S_CQE_SWCQE)
-
-#define S_CQE_GENBIT      10
-#define M_CQE_GENBIT      0x1
-#define G_CQE_GENBIT(x)   (((x) >> S_CQE_GENBIT) & M_CQE_GENBIT)
-#define V_CQE_GENBIT(x)	  ((x)<<S_CQE_GENBIT)
-
-#define S_CQE_STATUS      5
-#define M_CQE_STATUS      0x1F
-#define G_CQE_STATUS(x)   ((((x) >> S_CQE_STATUS)) & M_CQE_STATUS)
-#define V_CQE_STATUS(x)   ((x)<<S_CQE_STATUS)
-
-#define S_CQE_TYPE        4
-#define M_CQE_TYPE        0x1
-#define G_CQE_TYPE(x)     ((((x) >> S_CQE_TYPE)) & M_CQE_TYPE)
-#define V_CQE_TYPE(x)     ((x)<<S_CQE_TYPE)
-
-#define S_CQE_OPCODE      0
-#define M_CQE_OPCODE      0xF
-#define G_CQE_OPCODE(x)   ((((x) >> S_CQE_OPCODE)) & M_CQE_OPCODE)
-#define V_CQE_OPCODE(x)   ((x)<<S_CQE_OPCODE)
-
-#define SW_CQE(x)         (G_CQE_SWCQE(be32_to_cpu((x).header)))
-#define CQE_OOO(x)        (G_CQE_OOO(be32_to_cpu((x).header)))
-#define CQE_QPID(x)       (G_CQE_QPID(be32_to_cpu((x).header)))
-#define CQE_GENBIT(x)     (G_CQE_GENBIT(be32_to_cpu((x).header)))
-#define CQE_TYPE(x)       (G_CQE_TYPE(be32_to_cpu((x).header)))
-#define SQ_TYPE(x)	  (CQE_TYPE((x)))
-#define RQ_TYPE(x)	  (!CQE_TYPE((x)))
-#define CQE_STATUS(x)     (G_CQE_STATUS(be32_to_cpu((x).header)))
-#define CQE_OPCODE(x)     (G_CQE_OPCODE(be32_to_cpu((x).header)))
-
-#define CQE_LEN(x)        (be32_to_cpu((x).len))
-
-/* used for RQ completion processing */
-#define CQE_WRID_STAG(x)  (be32_to_cpu((x).u.rcqe.stag))
-#define CQE_WRID_MSN(x)   (be32_to_cpu((x).u.rcqe.msn))
-
-/* used for SQ completion processing */
-#define CQE_WRID_SQ_WPTR(x)	((x).u.scqe.wrid_hi)
-#define CQE_WRID_WPTR(x)	((x).u.scqe.wrid_low)
-
-/* generic accessor macros */
-#define CQE_WRID_HI(x)		((x).u.scqe.wrid_hi)
-#define CQE_WRID_LOW(x)		((x).u.scqe.wrid_low)
-
-#define TPT_ERR_SUCCESS                     0x0
-#define TPT_ERR_STAG                        0x1	 /* STAG invalid: either the */
-						 /* STAG is offlimt, being 0, */
-						 /* or STAG_key mismatch */
-#define TPT_ERR_PDID                        0x2	 /* PDID mismatch */
-#define TPT_ERR_QPID                        0x3	 /* QPID mismatch */
-#define TPT_ERR_ACCESS                      0x4	 /* Invalid access right */
-#define TPT_ERR_WRAP                        0x5	 /* Wrap error */
-#define TPT_ERR_BOUND                       0x6	 /* base and bounds voilation */
-#define TPT_ERR_INVALIDATE_SHARED_MR        0x7	 /* attempt to invalidate a  */
-						 /* shared memory region */
-#define TPT_ERR_INVALIDATE_MR_WITH_MW_BOUND 0x8	 /* attempt to invalidate a  */
-						 /* shared memory region */
-#define TPT_ERR_ECC                         0x9	 /* ECC error detected */
-#define TPT_ERR_ECC_PSTAG                   0xA	 /* ECC error detected when  */
-						 /* reading PSTAG for a MW  */
-						 /* Invalidate */
-#define TPT_ERR_PBL_ADDR_BOUND              0xB	 /* pbl addr out of bounds:  */
-						 /* software error */
-#define TPT_ERR_SWFLUSH			    0xC	 /* SW FLUSHED */
-#define TPT_ERR_CRC                         0x10 /* CRC error */
-#define TPT_ERR_MARKER                      0x11 /* Marker error */
-#define TPT_ERR_PDU_LEN_ERR                 0x12 /* invalid PDU length */
-#define TPT_ERR_OUT_OF_RQE                  0x13 /* out of RQE */
-#define TPT_ERR_DDP_VERSION                 0x14 /* wrong DDP version */
-#define TPT_ERR_RDMA_VERSION                0x15 /* wrong RDMA version */
-#define TPT_ERR_OPCODE                      0x16 /* invalid rdma opcode */
-#define TPT_ERR_DDP_QUEUE_NUM               0x17 /* invalid ddp queue number */
-#define TPT_ERR_MSN                         0x18 /* MSN error */
-#define TPT_ERR_TBIT                        0x19 /* tag bit not set correctly */
-#define TPT_ERR_MO                          0x1A /* MO not 0 for TERMINATE  */
-						 /* or READ_REQ */
-#define TPT_ERR_MSN_GAP                     0x1B
-#define TPT_ERR_MSN_RANGE                   0x1C
-#define TPT_ERR_IRD_OVERFLOW                0x1D
-#define TPT_ERR_RQE_ADDR_BOUND              0x1E /* RQE addr out of bounds:  */
-						 /* software error */
-#define TPT_ERR_INTERNAL_ERR                0x1F /* internal error (opcode  */
-						 /* mismatch) */
-
-struct t3_swsq {
-	__u64			wr_id;
-	struct t3_cqe		cqe;
-	__u32			sq_wptr;
-	__be32			read_len;
-	int			opcode;
-	int			complete;
-	int			signaled;
-};
-
-/*
- * A T3 WQ implements both the SQ and RQ.
- */
-struct t3_wq {
-	union t3_wr *queue;		/* DMA accessable memory */
-	dma_addr_t dma_addr;		/* DMA address for HW */
-	DECLARE_PCI_UNMAP_ADDR(mapping)	/* unmap kruft */
-	u32 error;			/* 1 once we go to ERROR */
-	u32 qpid;
-	u32 wptr;			/* idx to next available WR slot */
-	u32 size_log2;			/* total wq size */
-	struct t3_swsq *sq;		/* SW SQ */
-	struct t3_swsq *oldest_read;	/* tracks oldest pending read */
-	u32 sq_wptr;			/* sq_wptr - sq_rptr == count of */
-	u32 sq_rptr;			/* pending wrs */
-	u32 sq_size_log2;		/* sq size */
-	u64 *rq;			/* SW RQ (holds consumer wr_ids */
-	u32 rq_wptr;			/* rq_wptr - rq_rptr == count of */
-	u32 rq_rptr;			/* pending wrs */
-	u64 *rq_oldest_wr;		/* oldest wr on the SW RQ */
-	u32 rq_size_log2;		/* rq size */
-	u32 rq_addr;			/* rq adapter address */
-	void __iomem *doorbell;		/* kernel db */
-	u64 udb;			/* user db if any */
-};
-
-struct t3_cq {
-	u32 cqid;
-	u32 rptr;
-	u32 wptr;
-	u32 size_log2;
-	dma_addr_t dma_addr;
-	DECLARE_PCI_UNMAP_ADDR(mapping)
-	struct t3_cqe *queue;
-	struct t3_cqe *sw_queue;
-	u32 sw_rptr;
-	u32 sw_wptr;
-};
-
-#define CQ_VLD_ENTRY(ptr,size_log2,cqe) (Q_GENBIT(ptr,size_log2) == \
-					 CQE_GENBIT(*cqe))
-
-static inline void cxio_set_wq_in_error(struct t3_wq *wq)
-{
-	wq->queue->flit[13] = 1;
-}
-
-static inline struct t3_cqe *cxio_next_hw_cqe(struct t3_cq *cq)
-{
-	struct t3_cqe *cqe;
-
-	cqe = cq->queue + (Q_PTR2IDX(cq->rptr, cq->size_log2));
-	if (CQ_VLD_ENTRY(cq->rptr, cq->size_log2, cqe))
-		return cqe;
-	return NULL;
-}
-
-static inline struct t3_cqe *cxio_next_sw_cqe(struct t3_cq *cq)
-{
-	struct t3_cqe *cqe;
-
-	if (!Q_EMPTY(cq->sw_rptr, cq->sw_wptr)) {
-		cqe = cq->sw_queue + (Q_PTR2IDX(cq->sw_rptr, cq->size_log2));
-		return cqe;
-	}
-	return NULL;
-}
-
-static inline struct t3_cqe *cxio_next_cqe(struct t3_cq *cq)
-{
-	struct t3_cqe *cqe;
-
-	if (!Q_EMPTY(cq->sw_rptr, cq->sw_wptr)) {
-		cqe = cq->sw_queue + (Q_PTR2IDX(cq->sw_rptr, cq->size_log2));
-		return cqe;
-	}
-	cqe = cq->queue + (Q_PTR2IDX(cq->rptr, cq->size_log2));
-	if (CQ_VLD_ENTRY(cq->rptr, cq->size_log2, cqe))
-		return cqe;
-	return NULL;
-}
-
-#endif
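Reviewer's note on the CQE defs above: the header fields follow an S_ (shift) / M_ (mask) / V_ (put) / G_ (get) naming pattern, and CQ_VLD_ENTRY() compares the generation bit expected for a ring pointer with the one stored in the CQE header. A minimal user-space sketch of that idea follows. The shift and mask values are copied from the header; SK_V(), SK_G() and the sketch_* helpers are hypothetical names. The bodies of sketch_ptr2idx() and sketch_genbit() are assumptions, because Q_PTR2IDX() and Q_GENBIT() are not defined in this part of the patch. The sketch works on host-order words and leaves out the be32 conversions.

```c
#include <assert.h>
#include <stdint.h>

/* Shift/mask values copied from the CQE defs in cxio_wr.h. */
#define S_CQE_QPID	12
#define M_CQE_QPID	0x7FFFF
#define S_CQE_GENBIT	10
#define M_CQE_GENBIT	0x1
#define S_CQE_STATUS	5
#define M_CQE_STATUS	0x1F
#define S_CQE_OPCODE	0
#define M_CQE_OPCODE	0xF

/* Hypothetical generic stand-ins for the per-field V_xxx()/G_xxx() macros. */
#define SK_V(val, shift)	((uint32_t)(val) << (shift))
#define SK_G(word, shift, mask)	(((word) >> (shift)) & (mask))

/* ASSUMPTION: ring index is the pointer modulo the ring size (a power of 2). */
static uint32_t sketch_ptr2idx(uint32_t ptr, uint32_t size_log2)
{
	return ptr & ((1U << size_log2) - 1);
}

/*
 * ASSUMPTION: the expected generation bit flips each time the free-running
 * pointer wraps the ring, starting at 1 for the first pass.
 */
static uint32_t sketch_genbit(uint32_t ptr, uint32_t size_log2)
{
	return !((ptr >> size_log2) & 0x1);
}

/* An entry is valid when its stored genbit matches the expected one. */
static int sketch_cq_vld_entry(uint32_t ptr, uint32_t size_log2,
			       uint32_t header)
{
	return sketch_genbit(ptr, size_log2) ==
	       SK_G(header, S_CQE_GENBIT, M_CQE_GENBIT);
}
```

Under these assumptions, a zeroed ring never looks valid on the first pass, which would be consistent with cxio_create_cq() clearing the queue memory before handing it to the hardware.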
diff --git a/drivers/infiniband/hw/cxgb3/cxio_dbg.c b/drivers/infiniband/hw/cxgb3/cxio_dbg.c
new file mode 100644
index 0000000..dfaa704
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/cxio_dbg.c
@@ -0,0 +1,205 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#ifdef DEBUG
+#include <linux/types.h>
+#include "common.h"
+#include "cxgb3_ioctl.h"
+#include "cxio_hal.h"
+#include "cxio_wr.h"
+
+void cxio_dump_tpt(struct cxio_rdev *rdev, u32 stag)
+{
+	struct ch_mem_range *m;
+	u64 *data;
+	int rc;
+	int size = 32;
+
+	m = kmalloc(sizeof(*m) + size, GFP_ATOMIC);
+	if (!m) {
+		PDBG("%s couldn't allocate memory.\n", __FUNCTION__);
+		return;
+	}
+	m->mem_id = MEM_PMRX;
+	m->addr = (stag>>8) * 32 + rdev->rnic_info.tpt_base;
+	m->len = size;
+	PDBG("%s TPT addr 0x%x len %d\n", __FUNCTION__, m->addr, m->len);
+	rc = rdev->t3cdev_p->ctl(rdev->t3cdev_p, RDMA_GET_MEM, m);
+	if (rc) {
+		PDBG("%s toectl returned error %d\n", __FUNCTION__, rc);
+		kfree(m);
+		return;
+	}
+
+	data = (u64 *)m->buf;
+	while (size > 0) {
+		PDBG("TPT %08x: %016llx\n", m->addr, (u64)*data);
+		size -= 8;
+		data++;
+		m->addr += 8;
+	}
+	kfree(m);
+}
+
+void cxio_dump_pbl(struct cxio_rdev *rdev, u32 pbl_addr, uint len, u8 shift)
+{
+	struct ch_mem_range *m;
+	u64 *data;
+	int rc;
+	int size, npages;
+
+	shift += 12;
+	npages = (len + (1ULL << shift) - 1) >> shift;
+	size = npages * sizeof(u64);
+
+	m = kmalloc(sizeof(*m) + size, GFP_ATOMIC);
+	if (!m) {
+		PDBG("%s couldn't allocate memory.\n", __FUNCTION__);
+		return;
+	}
+	m->mem_id = MEM_PMRX;
+	m->addr = pbl_addr;
+	m->len = size;
+	PDBG("%s PBL addr 0x%x len %d depth %d\n",
+		__FUNCTION__, m->addr, m->len, npages);
+	rc = rdev->t3cdev_p->ctl(rdev->t3cdev_p, RDMA_GET_MEM, m);
+	if (rc) {
+		PDBG("%s toectl returned error %d\n", __FUNCTION__, rc);
+		kfree(m);
+		return;
+	}
+
+	data = (u64 *)m->buf;
+	while (size > 0) {
+		PDBG("PBL %08x: %016llx\n", m->addr, (u64)*data);
+		size -= 8;
+		data++;
+		m->addr += 8;
+	}
+	kfree(m);
+}
+
+void cxio_dump_wqe(union t3_wr *wqe)
+{
+	__be64 *data = (__be64 *)wqe;
+	uint size = (uint)(be64_to_cpu(*data) & 0xff);
+
+	if (size == 0)
+		size = 8;
+	while (size > 0) {
+		PDBG("WQE %p: %016llx\n", data, be64_to_cpu(*data));
+		size--;
+		data++;
+	}
+}
+
+void cxio_dump_wce(struct t3_cqe *wce)
+{
+	__be64 *data = (__be64 *)wce;
+	int size = sizeof(*wce);
+
+	while (size > 0) {
+		PDBG("WCE %p: %016llx\n", data, be64_to_cpu(*data));
+		size -= 8;
+		data++;
+	}
+}
+
+void cxio_dump_rqt(struct cxio_rdev *rdev, u32 hwtid, int nents)
+{
+	struct ch_mem_range *m;
+	int size = nents * 64;
+	u64 *data;
+	int rc;
+
+	m = kmalloc(sizeof(*m) + size, GFP_ATOMIC);
+	if (!m) {
+		PDBG("%s couldn't allocate memory.\n", __FUNCTION__);
+		return;
+	}
+	m->mem_id = MEM_PMRX;
+	m->addr = ((hwtid)<<10) + rdev->rnic_info.rqt_base;
+	m->len = size;
+	PDBG("%s RQT addr 0x%x len %d\n", __FUNCTION__, m->addr, m->len);
+	rc = rdev->t3cdev_p->ctl(rdev->t3cdev_p, RDMA_GET_MEM, m);
+	if (rc) {
+		PDBG("%s toectl returned error %d\n", __FUNCTION__, rc);
+		kfree(m);
+		return;
+	}
+
+	data = (u64 *)m->buf;
+	while (size > 0) {
+		PDBG("RQT %08x: %016llx\n", m->addr, (u64)*data);
+		size -= 8;
+		data++;
+		m->addr += 8;
+	}
+	kfree(m);
+}
+
+void cxio_dump_tcb(struct cxio_rdev *rdev, u32 hwtid)
+{
+	struct ch_mem_range *m;
+	int size = TCB_SIZE;
+	u32 *data;
+	int rc;
+
+	m = kmalloc(sizeof(*m) + size, GFP_ATOMIC);
+	if (!m) {
+		PDBG("%s couldn't allocate memory.\n", __FUNCTION__);
+		return;
+	}
+	m->mem_id = MEM_CM;
+	m->addr = hwtid * size;
+	m->len = size;
+	PDBG("%s TCB %d len %d\n", __FUNCTION__, m->addr, m->len);
+	rc = rdev->t3cdev_p->ctl(rdev->t3cdev_p, RDMA_GET_MEM, m);
+	if (rc) {
+		PDBG("%s toectl returned error %d\n", __FUNCTION__, rc);
+		kfree(m);
+		return;
+	}
+
+	data = (u32 *)m->buf;
+	while (size > 0) {
+		printk("%2u: %08x %08x %08x %08x %08x %08x %08x %08x\n",
+			m->addr,
+			*(data+2), *(data+3), *(data), *(data+1),
+			*(data+6), *(data+7), *(data+4), *(data+5));
+		size -= 32;
+		data += 8;
+		m->addr += 32;
+	}
+	kfree(m);
+}
+#endif
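Reviewer's note on the dump helpers above: cxio_dump_tpt() and cxio_dump_pbl() both derive an adapter memory range from their arguments before calling RDMA_GET_MEM. The arithmetic can be checked in isolation with a small user-space sketch. The sketch_* function names are hypothetical; the formulas are the ones used in the patch. The 32-byte stride matches struct tpt_entry (eight 32-bit words), and the low 8 bits dropped from the stag are presumably the 8-bit STAG key, which is an inference from M_TPT_STAG_KEY rather than something the patch states.

```c
#include <assert.h>
#include <stdint.h>

/* TPT entry address: index is stag >> 8, each entry is 32 bytes. */
static uint32_t sketch_tpt_addr(uint32_t stag, uint32_t tpt_base)
{
	return (stag >> 8) * 32 + tpt_base;
}

/*
 * Number of pages covered by len bytes, where the page size is
 * 2^(shift + 12) bytes; rounds up like cxio_dump_pbl() does.
 * Assumes shift + 12 stays below 64.
 */
static uint32_t sketch_pbl_npages(uint32_t len, uint8_t shift)
{
	unsigned int page_shift = (unsigned int)shift + 12;

	return (uint32_t)((len + (1ULL << page_shift) - 1) >> page_shift);
}

/* Bytes of PBL to read: one 64-bit entry per page. */
static uint32_t sketch_pbl_bytes(uint32_t len, uint8_t shift)
{
	return sketch_pbl_npages(len, shift) * (uint32_t)sizeof(uint64_t);
}
```

For example, a 4097-byte region with shift 0 (4 KB pages) spans two pages, so the dump would read 16 bytes of PBL.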
diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.c b/drivers/infiniband/hw/cxgb3/cxio_hal.c
new file mode 100644
index 0000000..19553b3
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/cxio_hal.c
@@ -0,0 +1,1280 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#include <asm/semaphore.h>
+#include <asm/delay.h>
+
+#include <linux/netdevice.h>
+#include <linux/sched.h>
+#include <linux/spinlock.h>
+#include <linux/pci.h>
+
+#include "cxio_resource.h"
+#include "cxio_hal.h"
+#include "cxgb3_offload.h"
+#include "sge_defs.h"
+
+static LIST_HEAD(rdev_list);
+static cxio_hal_ev_callback_func_t cxio_ev_cb = NULL;
+
+static inline struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name)
+{
+	struct cxio_rdev *rdev;
+
+	list_for_each_entry(rdev, &rdev_list, entry)
+		if (!strcmp(rdev->dev_name, dev_name))
+			return rdev;
+	return NULL;
+}
+
+static inline struct cxio_rdev *cxio_hal_find_rdev_by_t3cdev(struct t3cdev
+							     *tdev)
+{
+	struct cxio_rdev *rdev;
+
+	list_for_each_entry(rdev, &rdev_list, entry)
+		if (rdev->t3cdev_p == tdev)
+			return rdev;
+	return NULL;
+}
+
+int cxio_hal_cq_op(struct cxio_rdev *rdev_p, struct t3_cq *cq,
+		   enum t3_cq_opcode op, u32 credit)
+{
+	int ret;
+	struct t3_cqe *cqe;
+	u32 rptr;
+
+	struct rdma_cq_op setup;
+	setup.id = cq->cqid;
+	setup.credits = (op == CQ_CREDIT_UPDATE) ? credit : 0;
+	setup.op = op;
+	ret = rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_OP, &setup);
+
+	if ((ret < 0) || (op == CQ_CREDIT_UPDATE))
+		return ret;
+
+	/*
+	 * If the rearm returned an index other than our current index,
+	 * then there might be CQEs in flight (being DMA'd).  We must wait
+	 * here for them to complete or the consumer can miss a notification.
+	 */
+	if (Q_PTR2IDX((cq->rptr), cq->size_log2) != ret) {
+		int i = 0;
+
+		rptr = cq->rptr;
+
+		/*
+		 * Keep the generation correct by bumping rptr until it
+		 * matches the index returned by the rearm - 1.
+		 */
+		while (Q_PTR2IDX((rptr+1), cq->size_log2) != ret)
+			rptr++;
+
+		/*
+		 * Now rptr is the index for the (last) cqe that was
+		 * in-flight at the time the HW rearmed the CQ.  We
+		 * spin until that CQE is valid.
+		 */
+		cqe = cq->queue + Q_PTR2IDX(rptr, cq->size_log2);
+		while (!CQ_VLD_ENTRY(rptr, cq->size_log2, cqe)) {
+			udelay(1);
+			if (i++ > 1000000) {
+				printk(KERN_ERR "%s: stalled rnic\n",
+				       rdev_p->dev_name);
+				BUG_ON(1);
+				return -EIO;
+			}
+		}
+	}
+	return 0;
+}
+
+static inline int cxio_hal_clear_cq_ctx(struct cxio_rdev *rdev_p, u32 cqid)
+{
+	struct rdma_cq_setup setup;
+	setup.id = cqid;
+	setup.base_addr = 0;	/* NULL address */
+	setup.size = 0;		/* disable the CQ */
+	setup.credits = 0;
+	setup.credit_thres = 0;
+	setup.ovfl_mode = 0;
+	return (rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_SETUP, &setup));
+}
+
+int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev_p, u32 qpid)
+{
+	u64 sge_cmd;
+	struct t3_modify_qp_wr *wqe;
+	struct sk_buff *skb = alloc_skb(sizeof(*wqe), GFP_KERNEL);
+	if (!skb) {
+		PDBG("%s alloc_skb failed\n", __FUNCTION__);
+		return -ENOMEM;
+	}
+	wqe = (struct t3_modify_qp_wr *) skb_put(skb, sizeof(*wqe));
+	memset(wqe, 0, sizeof(*wqe));
+	build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_QP_MOD, 3, 1, qpid, 7);
+	wqe->flags = cpu_to_be32(MODQP_WRITE_EC);
+	sge_cmd = qpid << 8 | 3;
+	wqe->sge_cmd = cpu_to_be64(sge_cmd);
+	skb->priority = CPL_PRIORITY_CONTROL;
+	return (cxgb3_ofld_send(rdev_p->t3cdev_p, skb));
+}
+
+int cxio_create_cq(struct cxio_rdev *rdev_p, struct t3_cq *cq)
+{
+	struct rdma_cq_setup setup;
+	int size = (1UL << (cq->size_log2)) * sizeof(struct t3_cqe);
+
+	cq->cqid = cxio_hal_get_cqid(rdev_p->rscp);
+	if (!cq->cqid)
+		return -ENOMEM;
+	cq->sw_queue = kzalloc(size, GFP_KERNEL);
+	if (!cq->sw_queue)
+		return -ENOMEM;
+	cq->queue = dma_alloc_coherent(&(rdev_p->rnic_info.pdev->dev),
+					     (1UL << (cq->size_log2)) *
+					     sizeof(struct t3_cqe),
+					     &(cq->dma_addr), GFP_KERNEL);
+	if (!cq->queue) {
+		kfree(cq->sw_queue);
+		return -ENOMEM;
+	}
+	pci_unmap_addr_set(cq, mapping, cq->dma_addr);
+	memset(cq->queue, 0, size);
+	setup.id = cq->cqid;
+	setup.base_addr = (u64) (cq->dma_addr);
+	setup.size = 1UL << cq->size_log2;
+	setup.credits = 65535;
+	setup.credit_thres = 1;
+	if (rdev_p->t3cdev_p->type == T3B)
+		setup.ovfl_mode = 0;
+	else
+		setup.ovfl_mode = 1;
+	return (rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_SETUP, &setup));
+}
+
+int cxio_resize_cq(struct cxio_rdev *rdev_p, struct t3_cq *cq)
+{
+	struct rdma_cq_setup setup;
+	setup.id = cq->cqid;
+	setup.base_addr = (u64) (cq->dma_addr);
+	setup.size = 1UL << cq->size_log2;
+	setup.credits = setup.size;
+	setup.credit_thres = setup.size;	/* TBD: overflow recovery */
+	setup.ovfl_mode = 1;
+	return (rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_SETUP, &setup));
+}
+
+static u32 get_qpid(struct cxio_rdev *rdev_p, struct cxio_ucontext *uctx)
+{
+	struct cxio_qpid_list *entry;
+	u32 qpid;
+	int i;
+
+	mutex_lock(&uctx->lock);
+	if (!list_empty(&uctx->qpids)) {
+		entry = list_entry(uctx->qpids.next, struct cxio_qpid_list,
+				   entry);
+		list_del(&entry->entry);
+		qpid = entry->qpid;
+		kfree(entry);
+	} else {
+		qpid = cxio_hal_get_qpid(rdev_p->rscp);
+		if (!qpid)
+			goto out;
+		for (i = qpid+1; i & rdev_p->qpmask; i++) {
+			entry = kmalloc(sizeof *entry, GFP_KERNEL);
+			if (!entry)
+				break;
+			entry->qpid = i;
+			list_add_tail(&entry->entry, &uctx->qpids);
+		}
+	}
+out:
+	mutex_unlock(&uctx->lock);
+	PDBG("%s qpid 0x%x\n", __FUNCTION__, qpid);
+	return qpid;
+}
+
+static void put_qpid(struct cxio_rdev *rdev_p, u32 qpid,
+		     struct cxio_ucontext *uctx)
+{
+	struct cxio_qpid_list *entry;
+
+	entry = kmalloc(sizeof *entry, GFP_KERNEL);
+	if (!entry)
+		return;
+	PDBG("%s qpid 0x%x\n", __FUNCTION__, qpid);
+	entry->qpid = qpid;
+	mutex_lock(&uctx->lock);
+	list_add_tail(&entry->entry, &uctx->qpids);
+	mutex_unlock(&uctx->lock);
+}
+
+void cxio_release_ucontext(struct cxio_rdev *rdev_p, struct cxio_ucontext *uctx)
+{
+	struct list_head *pos, *nxt;
+	struct cxio_qpid_list *entry;
+
+	mutex_lock(&uctx->lock);
+	list_for_each_safe(pos, nxt, &uctx->qpids) {
+		entry = list_entry(pos, struct cxio_qpid_list, entry);
+		list_del_init(&entry->entry);
+		if (!(entry->qpid & rdev_p->qpmask))
+			cxio_hal_put_qpid(rdev_p->rscp, entry->qpid);
+		kfree(entry);
+	}
+	mutex_unlock(&uctx->lock);
+}
+
+void cxio_init_ucontext(struct cxio_rdev *rdev_p, struct cxio_ucontext *uctx)
+{
+	INIT_LIST_HEAD(&uctx->qpids);
+	mutex_init(&uctx->lock);
+}
+
+int cxio_create_qp(struct cxio_rdev *rdev_p, u32 kernel_domain,
+		   struct t3_wq *wq, struct cxio_ucontext *uctx)
+{
+	int depth = 1UL << wq->size_log2;
+	int rqsize = 1UL << wq->rq_size_log2;
+
+	wq->qpid = get_qpid(rdev_p, uctx);
+	if (!wq->qpid)
+		return -ENOMEM;
+
+	wq->rq = kzalloc(depth * sizeof(u64), GFP_KERNEL);
+	if (!wq->rq)
+		goto err1;
+
+	wq->rq_addr = cxio_hal_rqtpool_alloc(rdev_p, rqsize);
+	if (!wq->rq_addr)
+		goto err2;
+
+	wq->sq = kzalloc(depth * sizeof(struct t3_swsq), GFP_KERNEL);
+	if (!wq->sq)
+		goto err3;
+
+	wq->queue = dma_alloc_coherent(&(rdev_p->rnic_info.pdev->dev),
+					     depth * sizeof(union t3_wr),
+					     &(wq->dma_addr), GFP_KERNEL);
+	if (!wq->queue)
+		goto err4;
+
+	memset(wq->queue, 0, depth * sizeof(union t3_wr));
+	pci_unmap_addr_set(wq, mapping, wq->dma_addr);
+	wq->doorbell = (void __iomem *)rdev_p->rnic_info.kdb_addr;
+	if (!kernel_domain)
+		wq->udb = (u64)rdev_p->rnic_info.udbell_physbase +
+					(wq->qpid << rdev_p->qpshift);
+	PDBG("%s qpid 0x%x doorbell 0x%p udb 0x%llx\n", __FUNCTION__,
+	     wq->qpid, wq->doorbell, wq->udb);
+	return 0;
+err4:
+	kfree(wq->sq);
+err3:
+	cxio_hal_rqtpool_free(rdev_p, wq->rq_addr, rqsize);
+err2:
+	kfree(wq->rq);
+err1:
+	put_qpid(rdev_p, wq->qpid, uctx);
+	return -ENOMEM;
+}
+
+int cxio_destroy_cq(struct cxio_rdev *rdev_p, struct t3_cq *cq)
+{
+	int err;
+	err = cxio_hal_clear_cq_ctx(rdev_p, cq->cqid);
+	kfree(cq->sw_queue);
+	dma_free_coherent(&(rdev_p->rnic_info.pdev->dev),
+			  (1UL << (cq->size_log2))
+			  * sizeof(struct t3_cqe), cq->queue,
+			  pci_unmap_addr(cq, mapping));
+	cxio_hal_put_cqid(rdev_p->rscp, cq->cqid);
+	return err;
+}
+
+int cxio_destroy_qp(struct cxio_rdev *rdev_p, struct t3_wq *wq,
+		    struct cxio_ucontext *uctx)
+{
+	dma_free_coherent(&(rdev_p->rnic_info.pdev->dev),
+			  (1UL << (wq->size_log2))
+			  * sizeof(union t3_wr), wq->queue,
+			  pci_unmap_addr(wq, mapping));
+	kfree(wq->sq);
+	cxio_hal_rqtpool_free(rdev_p, wq->rq_addr, (1UL << wq->rq_size_log2));
+	kfree(wq->rq);
+	put_qpid(rdev_p, wq->qpid, uctx);
+	return 0;
+}
+
+static void insert_recv_cqe(struct t3_wq *wq, struct t3_cq *cq)
+{
+	struct t3_cqe cqe;
+
+	PDBG("%s wq %p cq %p sw_rptr 0x%x sw_wptr 0x%x\n", __FUNCTION__,
+	     wq, cq, cq->sw_rptr, cq->sw_wptr);
+	memset(&cqe, 0, sizeof(cqe));
+	cqe.header = cpu_to_be32(V_CQE_STATUS(TPT_ERR_SWFLUSH) |
+			         V_CQE_OPCODE(T3_SEND) |
+				 V_CQE_TYPE(0) |
+				 V_CQE_SWCQE(1) |
+				 V_CQE_QPID(wq->qpid) |
+				 V_CQE_GENBIT(Q_GENBIT(cq->sw_wptr,
+						       cq->size_log2)));
+	*(cq->sw_queue + Q_PTR2IDX(cq->sw_wptr, cq->size_log2)) = cqe;
+	cq->sw_wptr++;
+}
+
+void cxio_flush_rq(struct t3_wq *wq, struct t3_cq *cq, int count)
+{
+	u32 ptr;
+
+	PDBG("%s wq %p cq %p\n", __FUNCTION__, wq, cq);
+
+	/* flush RQ */
+	PDBG("%s rq_rptr %u rq_wptr %u skip count %u\n", __FUNCTION__,
+	    wq->rq_rptr, wq->rq_wptr, count);
+	ptr = wq->rq_rptr + count;
+	while (ptr++ != wq->rq_wptr)
+		insert_recv_cqe(wq, cq);
+}
+
+static void insert_sq_cqe(struct t3_wq *wq, struct t3_cq *cq,
+		          struct t3_swsq *sqp)
+{
+	struct t3_cqe cqe;
+
+	PDBG("%s wq %p cq %p sw_rptr 0x%x sw_wptr 0x%x\n", __FUNCTION__,
+	     wq, cq, cq->sw_rptr, cq->sw_wptr);
+	memset(&cqe, 0, sizeof(cqe));
+	cqe.header = cpu_to_be32(V_CQE_STATUS(TPT_ERR_SWFLUSH) |
+			         V_CQE_OPCODE(sqp->opcode) |
+			         V_CQE_TYPE(1) |
+			         V_CQE_SWCQE(1) |
+			         V_CQE_QPID(wq->qpid) |
+			         V_CQE_GENBIT(Q_GENBIT(cq->sw_wptr,
+						       cq->size_log2)));
+	cqe.u.scqe.wrid_hi = sqp->sq_wptr;
+
+	*(cq->sw_queue + Q_PTR2IDX(cq->sw_wptr, cq->size_log2)) = cqe;
+	cq->sw_wptr++;
+}
+
+void cxio_flush_sq(struct t3_wq *wq, struct t3_cq *cq, int count)
+{
+	__u32 ptr;
+	struct t3_swsq *sqp = wq->sq + Q_PTR2IDX(wq->sq_rptr, wq->sq_size_log2);
+
+	ptr = wq->sq_rptr + count;
+	sqp += count;
+	while (ptr != wq->sq_wptr) {
+		insert_sq_cqe(wq, cq, sqp);
+		sqp++;
+		ptr++;
+	}
+}
+
+/*
+ * Move all CQEs from the HWCQ into the SWCQ.
+ */
+void cxio_flush_hw_cq(struct t3_cq *cq)
+{
+	struct t3_cqe *cqe, *swcqe;
+
+	PDBG("%s cq %p cqid 0x%x\n", __FUNCTION__, cq, cq->cqid);
+	cqe = cxio_next_hw_cqe(cq);
+	while (cqe) {
+		PDBG("%s flushing hwcq rptr 0x%x to swcq wptr 0x%x\n",
+		     __FUNCTION__, cq->rptr, cq->sw_wptr);
+		swcqe = cq->sw_queue + Q_PTR2IDX(cq->sw_wptr, cq->size_log2);
+		*swcqe = *cqe;
+		swcqe->header |= cpu_to_be32(V_CQE_SWCQE(1));
+		cq->sw_wptr++;
+		cq->rptr++;
+		cqe = cxio_next_hw_cqe(cq);
+	}
+}
+
+static inline int cqe_completes_wr(struct t3_cqe *cqe, struct t3_wq *wq)
+{
+	if (CQE_OPCODE(*cqe) == T3_TERMINATE)
+		return 0;
+
+	if ((CQE_OPCODE(*cqe) == T3_RDMA_WRITE) && RQ_TYPE(*cqe))
+		return 0;
+
+	if ((CQE_OPCODE(*cqe) == T3_READ_RESP) && SQ_TYPE(*cqe))
+		return 0;
+
+	if ((CQE_OPCODE(*cqe) == T3_SEND) && RQ_TYPE(*cqe) &&
+	    Q_EMPTY(wq->rq_rptr, wq->rq_wptr))
+		return 0;
+
+	return 1;
+}
+
+void cxio_count_scqes(struct t3_cq *cq, struct t3_wq *wq, int *count)
+{
+	struct t3_cqe *cqe;
+	u32 ptr;
+
+	*count = 0;
+	ptr = cq->sw_rptr;
+	while (!Q_EMPTY(ptr, cq->sw_wptr)) {
+		cqe = cq->sw_queue + (Q_PTR2IDX(ptr, cq->size_log2));
+		if ((SQ_TYPE(*cqe) || (CQE_OPCODE(*cqe) == T3_READ_RESP)) &&
+		    (CQE_QPID(*cqe) == wq->qpid))
+			(*count)++;
+		ptr++;
+	}
+	PDBG("%s cq %p count %d\n", __FUNCTION__, cq, *count);
+}
+
+void cxio_count_rcqes(struct t3_cq *cq, struct t3_wq *wq, int *count)
+{
+	struct t3_cqe *cqe;
+	u32 ptr;
+
+	*count = 0;
+	PDBG("%s count zero %d\n", __FUNCTION__, *count);
+	ptr = cq->sw_rptr;
+	while (!Q_EMPTY(ptr, cq->sw_wptr)) {
+		cqe = cq->sw_queue + (Q_PTR2IDX(ptr, cq->size_log2));
+		if (RQ_TYPE(*cqe) && (CQE_OPCODE(*cqe) != T3_READ_RESP) &&
+		    (CQE_QPID(*cqe) == wq->qpid) && cqe_completes_wr(cqe, wq))
+			(*count)++;
+		ptr++;
+	}
+	PDBG("%s cq %p count %d\n", __FUNCTION__, cq, *count);
+}
+
+static int cxio_hal_init_ctrl_cq(struct cxio_rdev *rdev_p)
+{
+	struct rdma_cq_setup setup;
+	setup.id = 0;
+	setup.base_addr = 0;	/* NULL address */
+	setup.size = 1;		/* enable the CQ */
+	setup.credits = 0;
+
+	/* force SGE to redirect to RspQ and interrupt */
+	setup.credit_thres = 0;
+	setup.ovfl_mode = 1;
+	return (rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_SETUP, &setup));
+}
+
+static int cxio_hal_init_ctrl_qp(struct cxio_rdev *rdev_p)
+{
+	int err;
+	u64 sge_cmd, ctx0, ctx1;
+	u64 base_addr;
+	struct t3_modify_qp_wr *wqe;
+	struct sk_buff *skb = alloc_skb(sizeof(*wqe), GFP_KERNEL);
+
+
+	if (!skb) {
+		PDBG("%s alloc_skb failed\n", __FUNCTION__);
+		return -ENOMEM;
+	}
+	err = cxio_hal_init_ctrl_cq(rdev_p);
+	if (err) {
+		PDBG("%s err %d initializing ctrl_cq\n", __FUNCTION__, err);
+		return err;
+	}
+	rdev_p->ctrl_qp.workq = dma_alloc_coherent(
+					&(rdev_p->rnic_info.pdev->dev),
+					(1 << T3_CTRL_QP_SIZE_LOG2) *
+					sizeof(union t3_wr),
+					&(rdev_p->ctrl_qp.dma_addr),
+					GFP_KERNEL);
+	if (!rdev_p->ctrl_qp.workq) {
+		PDBG("%s dma_alloc_coherent failed\n", __FUNCTION__);
+		return -ENOMEM;
+	}
+	pci_unmap_addr_set(&rdev_p->ctrl_qp, mapping,
+			   rdev_p->ctrl_qp.dma_addr);
+	rdev_p->ctrl_qp.doorbell = (void __iomem *)rdev_p->rnic_info.kdb_addr;
+	memset(rdev_p->ctrl_qp.workq, 0,
+	       (1 << T3_CTRL_QP_SIZE_LOG2) * sizeof(union t3_wr));
+
+	init_MUTEX(&rdev_p->ctrl_qp.sem);
+	init_waitqueue_head(&rdev_p->ctrl_qp.waitq);
+
+	/* update HW Ctrl QP context */
+	base_addr = rdev_p->ctrl_qp.dma_addr;
+	base_addr >>= 12;
+	ctx0 = (V_EC_SIZE((1 << T3_CTRL_QP_SIZE_LOG2)) |
+		V_EC_BASE_LO((u32) base_addr & 0xffff));
+	ctx0 <<= 32;
+	ctx0 |= V_EC_CREDITS(FW_WR_NUM);
+	base_addr >>= 16;
+	ctx1 = (u32) base_addr;
+	base_addr >>= 32;
+	ctx1 |= ((u64) (V_EC_BASE_HI((u32) base_addr & 0xf) | V_EC_RESPQ(0) |
+			V_EC_TYPE(0) | V_EC_GEN(1) |
+			V_EC_UP_TOKEN(T3_CTL_QP_TID) | F_EC_VALID)) << 32;
+	wqe = (struct t3_modify_qp_wr *) skb_put(skb, sizeof(*wqe));
+	memset(wqe, 0, sizeof(*wqe));
+	build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_QP_MOD, 0, 1,
+		       T3_CTL_QP_TID, 7);
+	wqe->flags = cpu_to_be32(MODQP_WRITE_EC);
+	sge_cmd = (3ULL << 56) | FW_RI_SGEEC_START << 8 | 3;
+	wqe->sge_cmd = cpu_to_be64(sge_cmd);
+	wqe->ctx1 = cpu_to_be64(ctx1);
+	wqe->ctx0 = cpu_to_be64(ctx0);
+	PDBG("CtrlQP dma_addr 0x%llx workq %p size %d\n",
+	     (u64) rdev_p->ctrl_qp.dma_addr, rdev_p->ctrl_qp.workq,
+	     1 << T3_CTRL_QP_SIZE_LOG2);
+	skb->priority = CPL_PRIORITY_CONTROL;
+	return (cxgb3_ofld_send(rdev_p->t3cdev_p, skb));
+}
+
+static int cxio_hal_destroy_ctrl_qp(struct cxio_rdev *rdev_p)
+{
+	dma_free_coherent(&(rdev_p->rnic_info.pdev->dev),
+			  (1UL << T3_CTRL_QP_SIZE_LOG2)
+			  * sizeof(union t3_wr), rdev_p->ctrl_qp.workq,
+			  pci_unmap_addr(&rdev_p->ctrl_qp, mapping));
+	return cxio_hal_clear_qp_ctx(rdev_p, T3_CTRL_QP_ID);
+}
+
+/* Write len bytes of data into addr (32B aligned address).
+ * If data is NULL, clear len bytes of memory to zero.
+ * The caller acquires the sem before the call.
+ */
+static int cxio_hal_ctrl_qp_write_mem(struct cxio_rdev *rdev_p, u32 addr,
+				      u32 len, void *data, int completion)
+{
+	u32 i, nr_wqe, copy_len;
+	u8 *copy_data;
+	u8 wr_len, utx_len;	/* length in 8-byte flits */
+	enum t3_wr_flags flag;
+	__be64 *wqe;
+	u64 utx_cmd;
+	addr &= 0x7FFFFFF;
+	nr_wqe = len % 96 ? len / 96 + 1 : len / 96;	/* 96B max per WQE */
+	PDBG("%s wptr 0x%x rptr 0x%x len %d, nr_wqe %d data %p addr 0x%0x\n",
+	     __FUNCTION__, rdev_p->ctrl_qp.wptr, rdev_p->ctrl_qp.rptr, len,
+	     nr_wqe, data, addr);
+	utx_len = 3;		/* in 32B units */
+	for (i = 0; i < nr_wqe; i++) {
+		if (Q_FULL(rdev_p->ctrl_qp.rptr, rdev_p->ctrl_qp.wptr,
+		           T3_CTRL_QP_SIZE_LOG2)) {
+			PDBG("%s ctrl_qp full wptr 0x%0x rptr 0x%0x, "
+			     "wait for more space i %d\n", __FUNCTION__,
+			     rdev_p->ctrl_qp.wptr, rdev_p->ctrl_qp.rptr, i);
+			if (wait_event_interruptible(rdev_p->ctrl_qp.waitq,
+					     !Q_FULL(rdev_p->ctrl_qp.rptr,
+						     rdev_p->ctrl_qp.wptr,
+						     T3_CTRL_QP_SIZE_LOG2))) {
+				PDBG("%s ctrl_qp workq interrupted\n",
+				     __FUNCTION__);
+				return -ERESTARTSYS;
+			}
+			PDBG("%s ctrl_qp wakeup, continue posting work request "
+			     "i %d\n", __FUNCTION__, i);
+		}
+		wqe = (__be64 *)(rdev_p->ctrl_qp.workq + (rdev_p->ctrl_qp.wptr %
+						(1 << T3_CTRL_QP_SIZE_LOG2)));
+		flag = 0;
+		if (i == (nr_wqe - 1)) {
+			/* last WQE */
+			flag = completion ? T3_COMPLETION_FLAG : 0;
+			if (len % 32)
+				utx_len = len / 32 + 1;
+			else
+				utx_len = len / 32;
+		}
+
+		/*
+		 * Force a CQE to return the credit to the workq in case
+		 * we posted more than half the max QP size of WRs
+		 */
+		if ((i != 0) &&
+		    (i % (((1 << T3_CTRL_QP_SIZE_LOG2)) >> 1) == 0)) {
+			flag = T3_COMPLETION_FLAG;
+			PDBG("%s force completion at i %d\n", __FUNCTION__, i);
+		}
+
+		/* build the utx mem command */
+		wqe += (sizeof(struct t3_bypass_wr) >> 3);
+		utx_cmd = (T3_UTX_MEM_WRITE << 28) | (addr + i * 3);
+		utx_cmd <<= 32;
+		utx_cmd |= (utx_len << 28) | ((utx_len << 2) + 1);
+		*wqe = cpu_to_be64(utx_cmd);
+		wqe++;
+		copy_data = (u8 *) data + i * 96;
+		copy_len = len > 96 ? 96 : len;
+
+		/* clear memory content if data is NULL */
+		if (data)
+			memcpy(wqe, copy_data, copy_len);
+		else
+			memset(wqe, 0, copy_len);
+		if (copy_len % 32)
+			memset(((u8 *) wqe) + copy_len, 0,
+			       32 - (copy_len % 32));
+		wr_len = ((sizeof(struct t3_bypass_wr)) >> 3) + 1 +
+			 (utx_len << 2);
+		wqe = (__be64 *)(rdev_p->ctrl_qp.workq + (rdev_p->ctrl_qp.wptr %
+			      (1 << T3_CTRL_QP_SIZE_LOG2)));
+
+		/* wptr in the WRID[31:0] */
+		((union t3_wrid *)(wqe+1))->id0.low = rdev_p->ctrl_qp.wptr;
+
+		/*
+		 * This must be the last write with a memory barrier
+		 * for the genbit
+		 */
+		build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_BP, flag,
+			       Q_GENBIT(rdev_p->ctrl_qp.wptr,
+					T3_CTRL_QP_SIZE_LOG2), T3_CTRL_QP_ID,
+			       wr_len);
+		if (flag == T3_COMPLETION_FLAG)
+			ring_doorbell(rdev_p->ctrl_qp.doorbell, T3_CTRL_QP_ID);
+		len -= 96;
+		rdev_p->ctrl_qp.wptr++;
+	}
+	return 0;
+}
+
+/* IN: stag key, pdid, perm, zbva, to, len, page_size, pbl, and pbl_size
+ * OUT: stag index, actual pbl_size, pbl_addr allocated.
+ * TBD: shared memory region support
+ */
+static int __cxio_tpt_op(struct cxio_rdev *rdev_p, u32 reset_tpt_entry,
+			 u32 *stag, u8 stag_state, u32 pdid,
+			 enum tpt_mem_type type, enum tpt_mem_perm perm,
+			 u32 zbva, u64 to, u32 len, u8 page_size, __be64 *pbl,
+			 u32 *pbl_size, u32 *pbl_addr)
+{
+	int err;
+	struct tpt_entry tpt;
+	u32 stag_idx;
+	u32 wptr;
+	int rereg = (*stag != T3_STAG_UNSET);
+
+	stag_state = stag_state > 0;
+	stag_idx = (*stag) >> 8;
+
+	if (!reset_tpt_entry && !rereg) {
+		stag_idx = cxio_hal_get_stag(rdev_p->rscp);
+		if (!stag_idx)
+			return -ENOMEM;
+		*stag = (stag_idx << 8) | ((*stag) & 0xFF);
+	}
+	PDBG("%s stag_state 0x%0x type 0x%0x pdid 0x%0x, stag_idx 0x%x\n",
+	     __FUNCTION__, stag_state, type, pdid, stag_idx);
+
+	if (reset_tpt_entry)
+		cxio_hal_pblpool_free(rdev_p, *pbl_addr, *pbl_size << 3);
+	else if (!rereg) {
+		*pbl_addr = cxio_hal_pblpool_alloc(rdev_p, *pbl_size << 3);
+		if (!*pbl_addr) {
+			return -ENOMEM;
+		}
+	}
+
+	down(&rdev_p->ctrl_qp.sem);
+
+	/* write PBL first if any - update pbl only if pbl list exists */
+	if (pbl) {
+
+		PDBG("%s *pbl_addr 0x%x, pbl_base 0x%x, pbl_size %d\n",
+		     __FUNCTION__, *pbl_addr, rdev_p->rnic_info.pbl_base,
+		     *pbl_size);
+		err = cxio_hal_ctrl_qp_write_mem(rdev_p,
+				(*pbl_addr >> 5),
+				(*pbl_size << 3), pbl, 0);
+		if (err)
+			goto ret;
+	}
+
+	/* write TPT entry */
+	if (reset_tpt_entry)
+		memset(&tpt, 0, sizeof(tpt));
+	else {
+		tpt.valid_stag_pdid = cpu_to_be32(F_TPT_VALID |
+				V_TPT_STAG_KEY((*stag) & M_TPT_STAG_KEY) |
+				V_TPT_STAG_STATE(stag_state) |
+				V_TPT_STAG_TYPE(type) | V_TPT_PDID(pdid));
+		BUG_ON(page_size >= 28);
+		tpt.flags_pagesize_qpid = cpu_to_be32(V_TPT_PERM(perm) |
+				F_TPT_MW_BIND_ENABLE |
+				V_TPT_ADDR_TYPE((zbva ? TPT_ZBTO : TPT_VATO)) |
+				V_TPT_PAGE_SIZE(page_size));
+		tpt.rsvd_pbl_addr = reset_tpt_entry ? 0 :
+				    cpu_to_be32(V_TPT_PBL_ADDR(PBL_OFF(rdev_p, *pbl_addr)>>3));
+		tpt.len = cpu_to_be32(len);
+		tpt.va_hi = cpu_to_be32((u32) (to >> 32));
+		tpt.va_low_or_fbo = cpu_to_be32((u32) (to & 0xFFFFFFFFULL));
+		tpt.rsvd_bind_cnt_or_pstag = 0;
+		tpt.rsvd_pbl_size = reset_tpt_entry ? 0 :
+				  cpu_to_be32(V_TPT_PBL_SIZE((*pbl_size) >> 2));
+	}
+	err = cxio_hal_ctrl_qp_write_mem(rdev_p,
+				       stag_idx +
+				       (rdev_p->rnic_info.tpt_base >> 5),
+				       sizeof(tpt), &tpt, 1);
+
+	/* release the stag index to free pool */
+	if (reset_tpt_entry)
+		cxio_hal_put_stag(rdev_p->rscp, stag_idx);
+ret:
+	wptr = rdev_p->ctrl_qp.wptr;
+	up(&rdev_p->ctrl_qp.sem);
+	if (!err)
+		if (wait_event_interruptible(rdev_p->ctrl_qp.waitq,
+					     SEQ32_GE(rdev_p->ctrl_qp.rptr,
+						      wptr)))
+			return -ERESTARTSYS;
+	return err;
+}
+
+/* IN : stag key, pdid, pbl_size
+ * Out: stag index, actual pbl_size, and pbl_addr allocated.
+ */
+int cxio_allocate_stag(struct cxio_rdev *rdev_p, u32 * stag, u32 pdid,
+		       enum tpt_mem_perm perm, u32 * pbl_size, u32 * pbl_addr)
+{
+	*stag = T3_STAG_UNSET;
+	return (__cxio_tpt_op(rdev_p, 0, stag, 0, pdid, TPT_NON_SHARED_MR,
+			      perm, 0, 0ULL, 0, 0, NULL, pbl_size, pbl_addr));
+}
+
+int cxio_register_phys_mem(struct cxio_rdev *rdev_p, u32 *stag, u32 pdid,
+			   enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len,
+			   u8 page_size, __be64 *pbl, u32 *pbl_size,
+			   u32 *pbl_addr)
+{
+	*stag = T3_STAG_UNSET;
+	return __cxio_tpt_op(rdev_p, 0, stag, 1, pdid, TPT_NON_SHARED_MR, perm,
+			     zbva, to, len, page_size, pbl, pbl_size, pbl_addr);
+}
+
+int cxio_reregister_phys_mem(struct cxio_rdev *rdev_p, u32 *stag, u32 pdid,
+			   enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len,
+			   u8 page_size, __be64 *pbl, u32 *pbl_size,
+			   u32 *pbl_addr)
+{
+	return __cxio_tpt_op(rdev_p, 0, stag, 1, pdid, TPT_NON_SHARED_MR, perm,
+			     zbva, to, len, page_size, pbl, pbl_size, pbl_addr);
+}
+
+int cxio_dereg_mem(struct cxio_rdev *rdev_p, u32 stag, u32 pbl_size,
+		   u32 pbl_addr)
+{
+	return __cxio_tpt_op(rdev_p, 1, &stag, 0, 0, 0, 0, 0, 0ULL, 0, 0, NULL,
+			     &pbl_size, &pbl_addr);
+}
+
+int cxio_allocate_window(struct cxio_rdev *rdev_p, u32 * stag, u32 pdid)
+{
+	u32 pbl_size = 0;
+	*stag = T3_STAG_UNSET;
+	return __cxio_tpt_op(rdev_p, 0, stag, 0, pdid, TPT_MW, 0, 0, 0ULL, 0, 0,
+			     NULL, &pbl_size, NULL);
+}
+
+int cxio_deallocate_window(struct cxio_rdev *rdev_p, u32 stag)
+{
+	return __cxio_tpt_op(rdev_p, 1, &stag, 0, 0, 0, 0, 0, 0ULL, 0, 0, NULL,
+			     NULL, NULL);
+}
+
+int cxio_rdma_init(struct cxio_rdev *rdev_p, struct t3_rdma_init_attr *attr)
+{
+	struct t3_rdma_init_wr *wqe;
+	struct sk_buff *skb = alloc_skb(sizeof(*wqe), GFP_ATOMIC);
+	if (!skb)
+		return -ENOMEM;
+	PDBG("%s rdev_p %p\n", __FUNCTION__, rdev_p);
+	wqe = (struct t3_rdma_init_wr *) __skb_put(skb, sizeof(*wqe));
+	wqe->wrh.op_seop_flags = cpu_to_be32(V_FW_RIWR_OP(T3_WR_INIT));
+	wqe->wrh.gen_tid_len = cpu_to_be32(V_FW_RIWR_TID(attr->tid) |
+					   V_FW_RIWR_LEN(sizeof(*wqe) >> 3));
+	wqe->wrid.id1 = 0;
+	wqe->qpid = cpu_to_be32(attr->qpid);
+	wqe->pdid = cpu_to_be32(attr->pdid);
+	wqe->scqid = cpu_to_be32(attr->scqid);
+	wqe->rcqid = cpu_to_be32(attr->rcqid);
+	wqe->rq_addr = cpu_to_be32(attr->rq_addr - rdev_p->rnic_info.rqt_base);
+	wqe->rq_size = cpu_to_be32(attr->rq_size);
+	wqe->mpaattrs = attr->mpaattrs;
+	wqe->qpcaps = attr->qpcaps;
+	wqe->ulpdu_size = cpu_to_be16(attr->tcp_emss);
+	wqe->flags = cpu_to_be32(attr->flags);
+	wqe->ord = cpu_to_be32(attr->ord);
+	wqe->ird = cpu_to_be32(attr->ird);
+	wqe->qp_dma_addr = cpu_to_be64(attr->qp_dma_addr);
+	wqe->qp_dma_size = cpu_to_be32(attr->qp_dma_size);
+	wqe->rsvd = 0;
+	skb->priority = 0;	/* 0=>ToeQ; 1=>CtrlQ */
+	return (cxgb3_ofld_send(rdev_p->t3cdev_p, skb));
+}
+
+void cxio_register_ev_cb(cxio_hal_ev_callback_func_t ev_cb)
+{
+	cxio_ev_cb = ev_cb;
+}
+
+void cxio_unregister_ev_cb(cxio_hal_ev_callback_func_t ev_cb)
+{
+	cxio_ev_cb = NULL;
+}
+
+static int cxio_hal_ev_handler(struct t3cdev *t3cdev_p, struct sk_buff *skb)
+{
+	static int cnt;
+	struct cxio_rdev *rdev_p = NULL;
+	struct respQ_msg_t *rsp_msg = (struct respQ_msg_t *) skb->data;
+	PDBG("%d: %s cq_id 0x%x cq_ptr 0x%x genbit %0x overflow %0x an %0x"
+	     " se %0x notify %0x cqbranch %0x creditth %0x\n",
+	     cnt, __FUNCTION__, RSPQ_CQID(rsp_msg), RSPQ_CQPTR(rsp_msg),
+	     RSPQ_GENBIT(rsp_msg), RSPQ_OVERFLOW(rsp_msg), RSPQ_AN(rsp_msg),
+	     RSPQ_SE(rsp_msg), RSPQ_NOTIFY(rsp_msg), RSPQ_CQBRANCH(rsp_msg),
+	     RSPQ_CREDIT_THRESH(rsp_msg));
+	PDBG("CQE: QPID 0x%0x genbit %0x type 0x%0x status 0x%0x opcode %d "
+	     "len 0x%0x wrid_hi_stag 0x%x wrid_low_msn 0x%x\n",
+	     CQE_QPID(rsp_msg->cqe), CQE_GENBIT(rsp_msg->cqe),
+	     CQE_TYPE(rsp_msg->cqe), CQE_STATUS(rsp_msg->cqe),
+	     CQE_OPCODE(rsp_msg->cqe), CQE_LEN(rsp_msg->cqe),
+	     CQE_WRID_HI(rsp_msg->cqe), CQE_WRID_LOW(rsp_msg->cqe));
+	rdev_p = (struct cxio_rdev *)t3cdev_p->ulp;
+	if (!rdev_p) {
+		PDBG("%s called by t3cdev %p with null ulp\n", __FUNCTION__,
+		     t3cdev_p);
+		return 0;
+	}
+	if (CQE_QPID(rsp_msg->cqe) == T3_CTRL_QP_ID) {
+		rdev_p->ctrl_qp.rptr = CQE_WRID_LOW(rsp_msg->cqe) + 1;
+		wake_up_interruptible(&rdev_p->ctrl_qp.waitq);
+		dev_kfree_skb_irq(skb);
+	} else if (CQE_QPID(rsp_msg->cqe) == 0xfff8)
+		dev_kfree_skb_irq(skb);
+	else if (cxio_ev_cb)
+		(*cxio_ev_cb) (rdev_p, skb);
+	else
+		dev_kfree_skb_irq(skb);
+	cnt++;
+	return 0;
+}
+
+/* Caller takes care of locking if needed */
+int cxio_rdev_open(struct cxio_rdev *rdev_p)
+{
+	struct net_device *netdev_p = NULL;
+	int err = 0;
+	if (strlen(rdev_p->dev_name)) {
+		if (cxio_hal_find_rdev_by_name(rdev_p->dev_name)) {
+			return -EBUSY;
+		}
+		netdev_p = dev_get_by_name(rdev_p->dev_name);
+		if (!netdev_p) {
+			return -EINVAL;
+		}
+		dev_put(netdev_p);
+	} else if (rdev_p->t3cdev_p) {
+		if (cxio_hal_find_rdev_by_t3cdev(rdev_p->t3cdev_p)) {
+			return -EBUSY;
+		}
+		netdev_p = rdev_p->t3cdev_p->lldev;
+		strncpy(rdev_p->dev_name, rdev_p->t3cdev_p->name,
+			T3_MAX_DEV_NAME_LEN);
+	} else {
+		PDBG("%s t3cdev_p or dev_name must be set\n", __FUNCTION__);
+		return -EINVAL;
+	}
+
+	list_add_tail(&rdev_p->entry, &rdev_list);
+
+	PDBG("%s opening rnic dev %s\n", __FUNCTION__, rdev_p->dev_name);
+	memset(&rdev_p->ctrl_qp, 0, sizeof(rdev_p->ctrl_qp));
+	if (!rdev_p->t3cdev_p)
+		rdev_p->t3cdev_p = T3CDEV(netdev_p);
+	rdev_p->t3cdev_p->ulp = (void *) rdev_p;
+	err = rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_GET_PARAMS,
+					 &(rdev_p->rnic_info));
+	if (err) {
+		printk(KERN_ERR "%s t3cdev_p(%p)->ctl returned error %d.\n",
+		     __FUNCTION__, rdev_p->t3cdev_p, err);
+		goto err1;
+	}
+	err = rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, GET_PORTS,
+				    &(rdev_p->port_info));
+	if (err) {
+		printk(KERN_ERR "%s t3cdev_p(%p)->ctl returned error %d.\n",
+		     __FUNCTION__, rdev_p->t3cdev_p, err);
+		goto err1;
+	}
+
+	/*
+	 * qpshift is the number of bits to shift the qpid left in order
+	 * to get the correct address of the doorbell for that qp.
+	 */
+	cxio_init_ucontext(rdev_p, &rdev_p->uctx);
+	rdev_p->qpshift = PAGE_SHIFT -
+			  ilog2(65536 >>
+			            ilog2(rdev_p->rnic_info.udbell_len >>
+					      PAGE_SHIFT));
+	rdev_p->qpnr = rdev_p->rnic_info.udbell_len >> PAGE_SHIFT;
+	rdev_p->qpmask = (65536 >> ilog2(rdev_p->qpnr)) - 1;
+	PDBG("%s rnic %s info: tpt_base 0x%0x tpt_top 0x%0x num stags %d "
+	     "pbl_base 0x%0x pbl_top 0x%0x rqt_base 0x%0x, rqt_top 0x%0x\n",
+	     __FUNCTION__, rdev_p->dev_name, rdev_p->rnic_info.tpt_base,
+	     rdev_p->rnic_info.tpt_top, cxio_num_stags(rdev_p),
+	     rdev_p->rnic_info.pbl_base,
+	     rdev_p->rnic_info.pbl_top, rdev_p->rnic_info.rqt_base,
+	     rdev_p->rnic_info.rqt_top);
+	PDBG("udbell_len 0x%0x udbell_physbase 0x%lx kdb_addr %p qpshift %lu "
+	     "qpnr %d qpmask 0x%x\n",
+	     rdev_p->rnic_info.udbell_len,
+	     rdev_p->rnic_info.udbell_physbase, rdev_p->rnic_info.kdb_addr,
+	     rdev_p->qpshift, rdev_p->qpnr, rdev_p->qpmask);
+
+	err = cxio_hal_init_ctrl_qp(rdev_p);
+	if (err) {
+		printk(KERN_ERR "%s error %d initializing ctrl_qp.\n",
+		       __FUNCTION__, err);
+		goto err1;
+	}
+	err = cxio_hal_init_resource(rdev_p, cxio_num_stags(rdev_p), 0,
+				     0, T3_MAX_NUM_QP, T3_MAX_NUM_CQ,
+				     T3_MAX_NUM_PD);
+	if (err) {
+		printk(KERN_ERR "%s error %d initializing hal resources.\n",
+		       __FUNCTION__, err);
+		goto err2;
+	}
+	err = cxio_hal_pblpool_create(rdev_p);
+	if (err) {
+		printk(KERN_ERR "%s error %d initializing pbl mem pool.\n",
+		       __FUNCTION__, err);
+		goto err3;
+	}
+	err = cxio_hal_rqtpool_create(rdev_p);
+	if (err) {
+		printk(KERN_ERR "%s error %d initializing rqt mem pool.\n",
+		       __FUNCTION__, err);
+		goto err4;
+	}
+	return 0;
+err4:
+	cxio_hal_pblpool_destroy(rdev_p);
+err3:
+	cxio_hal_destroy_resource(rdev_p->rscp);
+err2:
+	cxio_hal_destroy_ctrl_qp(rdev_p);
+err1:
+	list_del(&rdev_p->entry);
+	return err;
+}
+
+void cxio_rdev_close(struct cxio_rdev *rdev_p)
+{
+	if (rdev_p) {
+		cxio_hal_pblpool_destroy(rdev_p);
+		cxio_hal_rqtpool_destroy(rdev_p);
+		list_del(&rdev_p->entry);
+		rdev_p->t3cdev_p->ulp = NULL;
+		cxio_hal_destroy_ctrl_qp(rdev_p);
+		cxio_hal_destroy_resource(rdev_p->rscp);
+	}
+}
+
+int __init cxio_hal_init(void)
+{
+	if (cxio_hal_init_rhdl_resource(T3_MAX_NUM_RI))
+		return -ENOMEM;
+	t3_register_cpl_handler(CPL_ASYNC_NOTIF, cxio_hal_ev_handler);
+	return 0;
+}
+
+void __exit cxio_hal_exit(void)
+{
+	struct cxio_rdev *rdev, *tmp;
+
+	t3_register_cpl_handler(CPL_ASYNC_NOTIF, NULL);
+	list_for_each_entry_safe(rdev, tmp, &rdev_list, entry)
+		cxio_rdev_close(rdev);
+	cxio_hal_destroy_rhdl_resource();
+}
+
+static inline void flush_completed_wrs(struct t3_wq *wq, struct t3_cq *cq)
+{
+	struct t3_swsq *sqp;
+	__u32 ptr = wq->sq_rptr;
+	int count = Q_COUNT(wq->sq_rptr, wq->sq_wptr);
+
+	sqp = wq->sq + Q_PTR2IDX(ptr, wq->sq_size_log2);
+	while (count--)
+		if (!sqp->signaled) {
+			ptr++;
+			sqp = wq->sq + Q_PTR2IDX(ptr,  wq->sq_size_log2);
+		} else if (sqp->complete) {
+
+			/*
+			 * Insert this completed cqe into the swcq.
+			 */
+			PDBG("%s moving cqe into swcq sq idx %ld cq idx %ld\n",
+			     __FUNCTION__, Q_PTR2IDX(ptr,  wq->sq_size_log2),
+			     Q_PTR2IDX(cq->sw_wptr, cq->size_log2));
+			sqp->cqe.header |= htonl(V_CQE_SWCQE(1));
+			*(cq->sw_queue + Q_PTR2IDX(cq->sw_wptr, cq->size_log2))
+				= sqp->cqe;
+			cq->sw_wptr++;
+			sqp->signaled = 0;
+			break;
+		} else
+			break;
+}
+
+static inline void create_read_req_cqe(struct t3_wq *wq,
+				       struct t3_cqe *hw_cqe,
+				       struct t3_cqe *read_cqe)
+{
+	read_cqe->u.scqe.wrid_hi = wq->oldest_read->sq_wptr;
+	read_cqe->len = wq->oldest_read->read_len;
+	read_cqe->header = htonl(V_CQE_QPID(CQE_QPID(*hw_cqe)) |
+				 V_CQE_SWCQE(SW_CQE(*hw_cqe)) |
+				 V_CQE_OPCODE(T3_READ_REQ) |
+				 V_CQE_TYPE(1));
+}
+
+/*
+ * Advance wq->oldest_read to the next read wr in the SWSQ, or NULL if none.
+ */
+static inline void advance_oldest_read(struct t3_wq *wq)
+{
+
+	u32 rptr = wq->oldest_read - wq->sq + 1;
+	u32 wptr = Q_PTR2IDX(wq->sq_wptr, wq->sq_size_log2);
+
+	while (Q_PTR2IDX(rptr, wq->sq_size_log2) != wptr) {
+		wq->oldest_read = wq->sq + Q_PTR2IDX(rptr, wq->sq_size_log2);
+
+		if (wq->oldest_read->opcode == T3_READ_REQ)
+			return;
+		rptr++;
+	}
+	wq->oldest_read = NULL;
+}
+
+/*
+ * cxio_poll_cq
+ *
+ * Caller must:
+ *     check the validity of the first CQE,
+ *     supply the wq associated with the qpid.
+ *
+ * credit: cq credit to return to sge.
+ * cqe_flushed: 1 iff the CQE is flushed.
+ * cqe: copy of the polled CQE.
+ *
+ * return value:
+ *     0       CQE returned,
+ *    -1       CQE skipped, try again.
+ */
+int cxio_poll_cq(struct t3_wq *wq, struct t3_cq *cq, struct t3_cqe *cqe,
+		     u8 *cqe_flushed, u64 *cookie, u32 *credit)
+{
+	int ret = 0;
+	struct t3_cqe *hw_cqe, read_cqe;
+
+	*cqe_flushed = 0;
+	*credit = 0;
+	hw_cqe = cxio_next_cqe(cq);
+
+	PDBG("%s CQE OOO %d qpid 0x%0x genbit %d type %d status 0x%0x"
+	     " opcode 0x%0x len 0x%0x wrid_hi_stag 0x%x wrid_low_msn 0x%x\n",
+	     __FUNCTION__, CQE_OOO(*hw_cqe), CQE_QPID(*hw_cqe),
+	     CQE_GENBIT(*hw_cqe), CQE_TYPE(*hw_cqe), CQE_STATUS(*hw_cqe),
+	     CQE_OPCODE(*hw_cqe), CQE_LEN(*hw_cqe), CQE_WRID_HI(*hw_cqe),
+	     CQE_WRID_LOW(*hw_cqe));
+
+	/*
+	 * skip cqe's not affiliated with a QP.
+	 */
+	if (wq == NULL) {
+		ret = -1;
+		goto skip_cqe;
+	}
+
+	/*
+	 * Gotta tweak READ completions:
+	 *	1) the cqe doesn't contain the sq_wptr from the wr.
+	 *	2) opcode not reflected from the wr.
+	 *	3) read_len not reflected from the wr.
+	 *	4) cq_type is RQ_TYPE not SQ_TYPE.
+	 */
+	if (RQ_TYPE(*hw_cqe) && (CQE_OPCODE(*hw_cqe) == T3_READ_RESP)) {
+
+		/*
+		 * Don't write to the HWCQ, so create a new read req CQE
+		 * in local memory.
+		 */
+		create_read_req_cqe(wq, hw_cqe, &read_cqe);
+		hw_cqe = &read_cqe;
+		advance_oldest_read(wq);
+	}
+
+	/*
+	 * T3A: Discard TERMINATE CQEs.
+	 */
+	if (CQE_OPCODE(*hw_cqe) == T3_TERMINATE) {
+		ret = -1;
+		wq->error = 1;
+		goto skip_cqe;
+	}
+
+	if (CQE_STATUS(*hw_cqe) || wq->error) {
+		*cqe_flushed = wq->error;
+		wq->error = 1;
+
+		/*
+		 * T3A inserts errors into the CQE.  We cannot return
+		 * these as work completions.
+		 */
+		/* incoming write failures */
+		if ((CQE_OPCODE(*hw_cqe) == T3_RDMA_WRITE)
+		     && RQ_TYPE(*hw_cqe)) {
+			ret = -1;
+			goto skip_cqe;
+		}
+		/* incoming read request failures */
+		if ((CQE_OPCODE(*hw_cqe) == T3_READ_RESP) && SQ_TYPE(*hw_cqe)) {
+			ret = -1;
+			goto skip_cqe;
+		}
+
+		/* incoming SEND with no receive posted failures */
+		if ((CQE_OPCODE(*hw_cqe) == T3_SEND) && RQ_TYPE(*hw_cqe) &&
+		    Q_EMPTY(wq->rq_rptr, wq->rq_wptr)) {
+			ret = -1;
+			goto skip_cqe;
+		}
+		goto proc_cqe;
+	}
+
+	/*
+	 * RECV completion.
+	 */
+	if (RQ_TYPE(*hw_cqe)) {
+
+		/*
+		 * HW only validates 4 bits of MSN.  So we must validate that
+		 * the MSN in the SEND is the next expected MSN.  If it's not,
+		 * then we complete this with TPT_ERR_MSN and mark the wq in
+		 * error.
+		 */
+		if (unlikely((CQE_WRID_MSN(*hw_cqe) != (wq->rq_rptr + 1)))) {
+			wq->error = 1;
+			hw_cqe->header |= htonl(V_CQE_STATUS(TPT_ERR_MSN));
+			goto proc_cqe;
+		}
+		goto proc_cqe;
+	}
+
+	/*
+	 * If we get here it's a send completion.
+	 *
+	 * Handle out of order completion. These get stuffed
+	 * in the SW SQ. Then the SW SQ is walked to move any
+	 * now in-order completions into the SW CQ.  This handles
+	 * 2 cases:
+	 *	1) reaping unsignaled WRs when the first subsequent
+	 *	   signaled WR is completed.
+	 *	2) out of order read completions.
+	 */
+	if (!SW_CQE(*hw_cqe) && (CQE_WRID_SQ_WPTR(*hw_cqe) != wq->sq_rptr)) {
+		struct t3_swsq *sqp;
+
+		PDBG("%s out of order completion going in swsq at idx %ld\n",
+		     __FUNCTION__,
+		     Q_PTR2IDX(CQE_WRID_SQ_WPTR(*hw_cqe), wq->sq_size_log2));
+		sqp = wq->sq +
+		      Q_PTR2IDX(CQE_WRID_SQ_WPTR(*hw_cqe), wq->sq_size_log2);
+		sqp->cqe = *hw_cqe;
+		sqp->complete = 1;
+		ret = -1;
+		goto flush_wq;
+	}
+
+proc_cqe:
+	*cqe = *hw_cqe;
+
+	/*
+	 * Reap the associated WR(s) that are freed up with this
+	 * completion.
+	 */
+	if (SQ_TYPE(*hw_cqe)) {
+		wq->sq_rptr = CQE_WRID_SQ_WPTR(*hw_cqe);
+		PDBG("%s completing sq idx %ld\n", __FUNCTION__,
+		     Q_PTR2IDX(wq->sq_rptr, wq->sq_size_log2));
+		*cookie = (wq->sq +
+			   Q_PTR2IDX(wq->sq_rptr, wq->sq_size_log2))->wr_id;
+		wq->sq_rptr++;
+	} else {
+		PDBG("%s completing rq idx %ld\n", __FUNCTION__,
+		     Q_PTR2IDX(wq->rq_rptr, wq->rq_size_log2));
+		*cookie = *(wq->rq + Q_PTR2IDX(wq->rq_rptr, wq->rq_size_log2));
+		wq->rq_rptr++;
+	}
+
+flush_wq:
+	/*
+	 * Flush any completed cqes that are now in-order.
+	 */
+	flush_completed_wrs(wq, cq);
+
+skip_cqe:
+	if (SW_CQE(*hw_cqe)) {
+		PDBG("%s cq %p cqid 0x%x skip sw cqe sw_rptr 0x%x\n",
+		     __FUNCTION__, cq, cq->cqid, cq->sw_rptr);
+		++cq->sw_rptr;
+	} else {
+		PDBG("%s cq %p cqid 0x%x skip hw cqe rptr 0x%x\n",
+		     __FUNCTION__, cq, cq->cqid, cq->rptr);
+		++cq->rptr;
+
+		/*
+		 * T3A: compute credits.
+		 */
+		if (((cq->rptr - cq->wptr) > (1 << (cq->size_log2 - 1)))
+		    || ((cq->rptr - cq->wptr) >= 128)) {
+			*credit = cq->rptr - cq->wptr;
+			cq->wptr = cq->rptr;
+		}
+	}
+	return ret;
+}
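
A note on the credit return at the end of cxio_poll_cq(): credits go back
to the SGE only once the consumer has moved more than half the CQ depth,
or at least 128 entries, past the point where credits were last returned.
The standalone sketch below restates that rule outside the kernel. The
names `struct cq_ptrs` and `cq_credits_to_return()` are illustrative
assumptions and do not appear in the patch; only the threshold arithmetic
is taken from it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the three t3_cq fields the rule reads. */
struct cq_ptrs {
	uint32_t rptr;		/* next entry the consumer will read */
	uint32_t wptr;		/* rptr value at the last credit return */
	unsigned size_log2;	/* log2 of the CQ depth */
};

/*
 * Return the number of credits to hand back, or 0 if the threshold
 * has not been reached.  Unsigned subtraction keeps the distance
 * correct when the 32-bit pointers wrap.
 */
static uint32_t cq_credits_to_return(struct cq_ptrs *cq)
{
	uint32_t pending;

	assert(cq->size_log2 >= 1 && cq->size_log2 <= 31);
	pending = cq->rptr - cq->wptr;
	if (pending > (1u << (cq->size_log2 - 1)) || pending >= 128) {
		cq->wptr = cq->rptr;	/* credits are now accounted for */
		return pending;
	}
	return 0;
}
```

With a 16-entry CQ, for instance, nothing is returned until nine entries
have been consumed; a large CQ is capped by the fixed 128-entry limit
instead.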
diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.h b/drivers/infiniband/hw/cxgb3/cxio_hal.h
new file mode 100644
index 0000000..8fb2999
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/cxio_hal.h
@@ -0,0 +1,201 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#ifndef  __CXIO_HAL_H__
+#define  __CXIO_HAL_H__
+
+#include <linux/list.h>
+#include <linux/mutex.h>
+
+#include "t3_cpl.h"
+#include "t3cdev.h"
+#include "cxgb3_ctl_defs.h"
+#include "cxio_wr.h"
+
+#define T3_CTRL_QP_ID    FW_RI_SGEEC_START
+#define T3_CTL_QP_TID	 FW_RI_TID_START
+#define T3_CTRL_QP_SIZE_LOG2  8
+#define T3_CTRL_CQ_ID    0
+
+/* TBD */
+#define T3_MAX_NUM_RI (1<<15)
+#define T3_MAX_NUM_QP (1<<15)
+#define T3_MAX_NUM_CQ (1<<15)
+#define T3_MAX_NUM_PD (1<<15)
+#define T3_MAX_PBL_SIZE 256
+#define T3_MAX_RQ_SIZE 1024
+#define T3_MAX_NUM_STAG (1<<15)
+
+#define T3_STAG_UNSET 0xffffffff
+
+#define T3_MAX_DEV_NAME_LEN 32
+
+struct cxio_hal_ctrl_qp {
+	u32 wptr;
+	u32 rptr;
+	struct semaphore sem;	/* for the wptr, can sleep */
+	wait_queue_head_t waitq;	/* wait for RspQ/CQE msg */
+	union t3_wr *workq;	/* the work request queue */
+	dma_addr_t dma_addr;	/* pci bus address of the workq */
+	DECLARE_PCI_UNMAP_ADDR(mapping)
+	void __iomem *doorbell;
+};
+
+struct cxio_hal_resource {
+	struct kfifo *tpt_fifo;
+	spinlock_t tpt_fifo_lock;
+	struct kfifo *qpid_fifo;
+	spinlock_t qpid_fifo_lock;
+	struct kfifo *cqid_fifo;
+	spinlock_t cqid_fifo_lock;
+	struct kfifo *pdid_fifo;
+	spinlock_t pdid_fifo_lock;
+};
+
+struct cxio_qpid_list {
+	struct list_head entry;
+	u32 qpid;
+};
+
+struct cxio_ucontext {
+	struct list_head qpids;
+	struct mutex lock;
+};
+
+struct cxio_rdev {
+	char dev_name[T3_MAX_DEV_NAME_LEN];
+	struct t3cdev *t3cdev_p;
+	struct rdma_info rnic_info;
+	struct adap_ports port_info;
+	struct cxio_hal_resource *rscp;
+	struct cxio_hal_ctrl_qp ctrl_qp;
+	void *ulp;
+	unsigned long qpshift;
+	u32 qpnr;
+	u32 qpmask;
+	struct cxio_ucontext uctx;
+	struct gen_pool *pbl_pool;
+	struct gen_pool *rqt_pool;
+	struct list_head entry;
+};
+
+static inline int cxio_num_stags(struct cxio_rdev *rdev_p)
+{
+	return min((int)T3_MAX_NUM_STAG, (int)((rdev_p->rnic_info.tpt_top - rdev_p->rnic_info.tpt_base) >> 5));
+}
+
+typedef void (*cxio_hal_ev_callback_func_t) (struct cxio_rdev * rdev_p,
+					     struct sk_buff * skb);
+
+#define RSPQ_CQID(rsp) (be32_to_cpu((rsp)->cq_ptrid) & 0xffff)
+#define RSPQ_CQPTR(rsp) ((be32_to_cpu((rsp)->cq_ptrid) >> 16) & 0xffff)
+#define RSPQ_GENBIT(rsp) ((be32_to_cpu((rsp)->flags) >> 16) & 1)
+#define RSPQ_OVERFLOW(rsp) ((be32_to_cpu((rsp)->flags) >> 17) & 1)
+#define RSPQ_AN(rsp) ((be32_to_cpu((rsp)->flags) >> 18) & 1)
+#define RSPQ_SE(rsp) ((be32_to_cpu((rsp)->flags) >> 19) & 1)
+#define RSPQ_NOTIFY(rsp) ((be32_to_cpu((rsp)->flags) >> 20) & 1)
+#define RSPQ_CQBRANCH(rsp) ((be32_to_cpu((rsp)->flags) >> 21) & 1)
+#define RSPQ_CREDIT_THRESH(rsp) ((be32_to_cpu((rsp)->flags) >> 22) & 1)
+
+struct respQ_msg_t {
+	__be32 flags;		/* flit 0 */
+	__be32 cq_ptrid;
+	__be64 rsvd;		/* flit 1 */
+	struct t3_cqe cqe;	/* flits 2-3 */
+};
+
+enum t3_cq_opcode {
+	CQ_ARM_AN = 0x2,
+	CQ_ARM_SE = 0x6,
+	CQ_FORCE_AN = 0x3,
+	CQ_CREDIT_UPDATE = 0x7
+};
+
+int cxio_rdev_open(struct cxio_rdev *rdev);
+void cxio_rdev_close(struct cxio_rdev *rdev);
+int cxio_hal_cq_op(struct cxio_rdev *rdev, struct t3_cq *cq,
+		   enum t3_cq_opcode op, u32 credit);
+int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev, u32 qpid);
+int cxio_create_cq(struct cxio_rdev *rdev, struct t3_cq *cq);
+int cxio_destroy_cq(struct cxio_rdev *rdev, struct t3_cq *cq);
+int cxio_resize_cq(struct cxio_rdev *rdev, struct t3_cq *cq);
+void cxio_release_ucontext(struct cxio_rdev *rdev, struct cxio_ucontext *uctx);
+void cxio_init_ucontext(struct cxio_rdev *rdev, struct cxio_ucontext *uctx);
+int cxio_create_qp(struct cxio_rdev *rdev, u32 kernel_domain, struct t3_wq *wq,
+		   struct cxio_ucontext *uctx);
+int cxio_destroy_qp(struct cxio_rdev *rdev, struct t3_wq *wq,
+		    struct cxio_ucontext *uctx);
+int cxio_peek_cq(struct t3_wq *wr, struct t3_cq *cq, int opcode);
+int cxio_allocate_stag(struct cxio_rdev *rdev, u32 * stag, u32 pdid,
+		       enum tpt_mem_perm perm, u32 * pbl_size, u32 * pbl_addr);
+int cxio_register_phys_mem(struct cxio_rdev *rdev, u32 * stag, u32 pdid,
+			   enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len,
+			   u8 page_size, __be64 *pbl, u32 *pbl_size,
+			   u32 *pbl_addr);
+int cxio_reregister_phys_mem(struct cxio_rdev *rdev, u32 * stag, u32 pdid,
+			   enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len,
+			   u8 page_size, __be64 *pbl, u32 *pbl_size,
+			   u32 *pbl_addr);
+int cxio_dereg_mem(struct cxio_rdev *rdev, u32 stag, u32 pbl_size,
+		   u32 pbl_addr);
+int cxio_allocate_window(struct cxio_rdev *rdev, u32 * stag, u32 pdid);
+int cxio_deallocate_window(struct cxio_rdev *rdev, u32 stag);
+int cxio_rdma_init(struct cxio_rdev *rdev, struct t3_rdma_init_attr *attr);
+void cxio_register_ev_cb(cxio_hal_ev_callback_func_t ev_cb);
+void cxio_unregister_ev_cb(cxio_hal_ev_callback_func_t ev_cb);
+u32 cxio_hal_get_rhdl(void);
+void cxio_hal_put_rhdl(u32 rhdl);
+u32 cxio_hal_get_pdid(struct cxio_hal_resource *rscp);
+void cxio_hal_put_pdid(struct cxio_hal_resource *rscp, u32 pdid);
+int __init cxio_hal_init(void);
+void __exit cxio_hal_exit(void);
+void cxio_flush_rq(struct t3_wq *wq, struct t3_cq *cq, int count);
+void cxio_flush_sq(struct t3_wq *wq, struct t3_cq *cq, int count);
+void cxio_count_rcqes(struct t3_cq *cq, struct t3_wq *wq, int *count);
+void cxio_count_scqes(struct t3_cq *cq, struct t3_wq *wq, int *count);
+void cxio_flush_hw_cq(struct t3_cq *cq);
+int cxio_poll_cq(struct t3_wq *wq, struct t3_cq *cq, struct t3_cqe *cqe,
+		     u8 *cqe_flushed, u64 *cookie, u32 *credit);
+
+#define MOD "iw_cxgb3: "
+#define PDBG(fmt, args...) pr_debug(MOD fmt, ## args)
+
+#ifdef DEBUG
+void cxio_dump_tpt(struct cxio_rdev *rev, u32 stag);
+void cxio_dump_pbl(struct cxio_rdev *rev, u32 pbl_addr, uint len, u8 shift);
+void cxio_dump_wqe(union t3_wr *wqe);
+void cxio_dump_wce(struct t3_cqe *wce);
+void cxio_dump_rqt(struct cxio_rdev *rdev, u32 hwtid, int nents);
+void cxio_dump_tcb(struct cxio_rdev *rdev, u32 hwtid);
+#endif
+
+#endif
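
For readers checking the RSPQ_* accessors in cxio_hal.h against a raw
capture: flit 0 of struct respQ_msg_t is two big-endian 32-bit words,
flags followed by cq_ptrid. The userspace sketch below decodes those eight
bytes using the same bit positions as the macros. `struct rspq_fields`,
`load_be32()` and `rspq_decode()` are hypothetical names introduced for
illustration; they are not part of the driver.

```c
#include <stdint.h>

/* Decoded view of flit 0 (8 bytes) of a response queue message. */
struct rspq_fields {
	uint16_t cqid;		/* cq_ptrid bits 15:0 */
	uint16_t cqptr;		/* cq_ptrid bits 31:16 */
	unsigned genbit;	/* flags bit 16 */
	unsigned overflow;	/* flags bit 17 */
	unsigned an;		/* flags bit 18 */
	unsigned se;		/* flags bit 19 */
	unsigned notify;	/* flags bit 20 */
	unsigned cqbranch;	/* flags bit 21 */
	unsigned credit_thresh;	/* flags bit 22 */
};

/* Portable big-endian load, standing in for be32_to_cpu(). */
static uint32_t load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static struct rspq_fields rspq_decode(const uint8_t *flit0)
{
	uint32_t flags = load_be32(flit0);		/* first word */
	uint32_t cq_ptrid = load_be32(flit0 + 4);	/* second word */
	struct rspq_fields f;

	f.cqid = (uint16_t)(cq_ptrid & 0xffff);
	f.cqptr = (uint16_t)((cq_ptrid >> 16) & 0xffff);
	f.genbit = (flags >> 16) & 1;
	f.overflow = (flags >> 17) & 1;
	f.an = (flags >> 18) & 1;
	f.se = (flags >> 19) & 1;
	f.notify = (flags >> 20) & 1;
	f.cqbranch = (flags >> 21) & 1;
	f.credit_thresh = (flags >> 22) & 1;
	return f;
}
```

Decoding from bytes rather than casting to a struct sidesteps host
endianness, so the same helper should work on any capture of the message.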
diff --git a/drivers/infiniband/hw/cxgb3/cxio_resource.c b/drivers/infiniband/hw/cxgb3/cxio_resource.c
new file mode 100644
index 0000000..997aa32
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/cxio_resource.c
@@ -0,0 +1,331 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+/* Crude resource management */
+#include <linux/kernel.h>
+#include <linux/random.h>
+#include <linux/slab.h>
+#include <linux/kfifo.h>
+#include <linux/spinlock.h>
+#include <linux/errno.h>
+#include "cxio_resource.h"
+#include "cxio_hal.h"
+
+static struct kfifo *rhdl_fifo;
+static spinlock_t rhdl_fifo_lock;
+
+#define RANDOM_SIZE 16
+
+static int __cxio_init_resource_fifo(struct kfifo **fifo,
+				   spinlock_t *fifo_lock,
+				   u32 nr, u32 skip_low,
+				   u32 skip_high,
+				   int random)
+{
+	u32 i, j, entry = 0, idx;
+	u32 random_bytes;
+	u32 rarray[RANDOM_SIZE];
+	spin_lock_init(fifo_lock);
+
+	*fifo = kfifo_alloc(nr * sizeof(u32), GFP_KERNEL, fifo_lock);
+	if (IS_ERR(*fifo))
+		return -ENOMEM;
+
+	for (i = 0; i < skip_low + skip_high; i++)
+		__kfifo_put(*fifo, (unsigned char *) &entry, sizeof(u32));
+	if (random) {
+		j = 0;
+		random_bytes = random32();
+		for (i = 0; i < RANDOM_SIZE; i++)
+			rarray[i] = i + skip_low;
+		for (i = skip_low + RANDOM_SIZE; i < nr - skip_high; i++) {
+			if (j >= RANDOM_SIZE) {
+				j = 0;
+				random_bytes = random32();
+			}
+			idx = (random_bytes >> (j * 2)) & 0xF;
+			__kfifo_put(*fifo,
+				(unsigned char *) &rarray[idx],
+				sizeof(u32));
+			rarray[idx] = i;
+			j++;
+		}
+		for (i = 0; i < RANDOM_SIZE; i++)
+			__kfifo_put(*fifo,
+				(unsigned char *) &rarray[i],
+				sizeof(u32));
+	} else
+		for (i = skip_low; i < nr - skip_high; i++)
+			__kfifo_put(*fifo, (unsigned char *) &i, sizeof(u32));
+
+	for (i = 0; i < skip_low + skip_high; i++)
+		kfifo_get(*fifo, (unsigned char *) &entry, sizeof(u32));
+	return 0;
+}
+
+static int cxio_init_resource_fifo(struct kfifo **fifo, spinlock_t * fifo_lock,
+				   u32 nr, u32 skip_low, u32 skip_high)
+{
+	return (__cxio_init_resource_fifo(fifo, fifo_lock, nr, skip_low,
+					  skip_high, 0));
+}
+
+static int cxio_init_resource_fifo_random(struct kfifo **fifo,
+				   spinlock_t * fifo_lock,
+				   u32 nr, u32 skip_low, u32 skip_high)
+{
+
+	return (__cxio_init_resource_fifo(fifo, fifo_lock, nr, skip_low,
+					  skip_high, 1));
+}
+
+static int cxio_init_qpid_fifo(struct cxio_rdev *rdev_p)
+{
+	u32 i;
+
+	spin_lock_init(&rdev_p->rscp->qpid_fifo_lock);
+
+	rdev_p->rscp->qpid_fifo = kfifo_alloc(T3_MAX_NUM_QP * sizeof(u32),
+					      GFP_KERNEL,
+					      &rdev_p->rscp->qpid_fifo_lock);
+	if (IS_ERR(rdev_p->rscp->qpid_fifo))
+		return -ENOMEM;
+
+	for (i = 16; i < T3_MAX_NUM_QP; i++)
+		if (!(i & rdev_p->qpmask))
+			__kfifo_put(rdev_p->rscp->qpid_fifo,
+				    (unsigned char *) &i, sizeof(u32));
+	return 0;
+}
+
+int cxio_hal_init_rhdl_resource(u32 nr_rhdl)
+{
+	return cxio_init_resource_fifo(&rhdl_fifo, &rhdl_fifo_lock, nr_rhdl, 1,
+				       0);
+}
+
+void cxio_hal_destroy_rhdl_resource(void)
+{
+	kfifo_free(rhdl_fifo);
+}
+
+/* nr_* must be power of 2 */
+int cxio_hal_init_resource(struct cxio_rdev *rdev_p,
+			   u32 nr_tpt, u32 nr_pbl,
+			   u32 nr_rqt, u32 nr_qpid, u32 nr_cqid, u32 nr_pdid)
+{
+	int err = 0;
+	struct cxio_hal_resource *rscp;
+
+	rscp = kmalloc(sizeof(*rscp), GFP_KERNEL);
+	if (!rscp)
+		return -ENOMEM;
+	rdev_p->rscp = rscp;
+	err = cxio_init_resource_fifo_random(&rscp->tpt_fifo,
+				      &rscp->tpt_fifo_lock,
+				      nr_tpt, 1, 0);
+	if (err)
+		goto tpt_err;
+	err = cxio_init_qpid_fifo(rdev_p);
+	if (err)
+		goto qpid_err;
+	err = cxio_init_resource_fifo(&rscp->cqid_fifo, &rscp->cqid_fifo_lock,
+				      nr_cqid, 1, 0);
+	if (err)
+		goto cqid_err;
+	err = cxio_init_resource_fifo(&rscp->pdid_fifo, &rscp->pdid_fifo_lock,
+				      nr_pdid, 1, 0);
+	if (err)
+		goto pdid_err;
+	return 0;
+pdid_err:
+	kfifo_free(rscp->cqid_fifo);
+cqid_err:
+	kfifo_free(rscp->qpid_fifo);
+qpid_err:
+	kfifo_free(rscp->tpt_fifo);
+tpt_err:
+	return -ENOMEM;
+}
+
+/*
+ * returns 0 if no resource available
+ */
+static inline u32 cxio_hal_get_resource(struct kfifo *fifo)
+{
+	u32 entry;
+	if (kfifo_get(fifo, (unsigned char *) &entry, sizeof(u32)))
+		return entry;
+	else
+		return 0;	/* fifo empty */
+}
+
+static inline void cxio_hal_put_resource(struct kfifo *fifo, u32 entry)
+{
+	BUG_ON(kfifo_put(fifo, (unsigned char *) &entry, sizeof(u32)) == 0);
+}
+
+u32 cxio_hal_get_rhdl(void)
+{
+	return cxio_hal_get_resource(rhdl_fifo);
+}
+
+void cxio_hal_put_rhdl(u32 rhdl)
+{
+	cxio_hal_put_resource(rhdl_fifo, rhdl);
+}
+
+u32 cxio_hal_get_stag(struct cxio_hal_resource *rscp)
+{
+	return cxio_hal_get_resource(rscp->tpt_fifo);
+}
+
+void cxio_hal_put_stag(struct cxio_hal_resource *rscp, u32 stag)
+{
+	cxio_hal_put_resource(rscp->tpt_fifo, stag);
+}
+
+u32 cxio_hal_get_qpid(struct cxio_hal_resource *rscp)
+{
+	u32 qpid = cxio_hal_get_resource(rscp->qpid_fifo);
+	PDBG("%s qpid 0x%x\n", __FUNCTION__, qpid);
+	return qpid;
+}
+
+void cxio_hal_put_qpid(struct cxio_hal_resource *rscp, u32 qpid)
+{
+	PDBG("%s qpid 0x%x\n", __FUNCTION__, qpid);
+	cxio_hal_put_resource(rscp->qpid_fifo, qpid);
+}
+
+u32 cxio_hal_get_cqid(struct cxio_hal_resource *rscp)
+{
+	return cxio_hal_get_resource(rscp->cqid_fifo);
+}
+
+void cxio_hal_put_cqid(struct cxio_hal_resource *rscp, u32 cqid)
+{
+	cxio_hal_put_resource(rscp->cqid_fifo, cqid);
+}
+
+u32 cxio_hal_get_pdid(struct cxio_hal_resource *rscp)
+{
+	return cxio_hal_get_resource(rscp->pdid_fifo);
+}
+
+void cxio_hal_put_pdid(struct cxio_hal_resource *rscp, u32 pdid)
+{
+	cxio_hal_put_resource(rscp->pdid_fifo, pdid);
+}
+
+void cxio_hal_destroy_resource(struct cxio_hal_resource *rscp)
+{
+	kfifo_free(rscp->tpt_fifo);
+	kfifo_free(rscp->cqid_fifo);
+	kfifo_free(rscp->qpid_fifo);
+	kfifo_free(rscp->pdid_fifo);
+	kfree(rscp);
+}
+
+/*
+ * PBL Memory Manager.  Uses Linux generic allocator.
+ */
+
+#define MIN_PBL_SHIFT 8			/* 256B == min PBL size (32 entries) */
+#define PBL_CHUNK (2*1024*1024)
+
+u32 cxio_hal_pblpool_alloc(struct cxio_rdev *rdev_p, int size)
+{
+	unsigned long addr = gen_pool_alloc(rdev_p->pbl_pool, size);
+	PDBG("%s addr 0x%x size %d\n", __FUNCTION__, (u32)addr, size);
+	return (u32)addr;
+}
+
+void cxio_hal_pblpool_free(struct cxio_rdev *rdev_p, u32 addr, int size)
+{
+	PDBG("%s addr 0x%x size %d\n", __FUNCTION__, addr, size);
+	gen_pool_free(rdev_p->pbl_pool, (unsigned long)addr, size);
+}
+
+int cxio_hal_pblpool_create(struct cxio_rdev *rdev_p)
+{
+	unsigned long i;
+	rdev_p->pbl_pool = gen_pool_create(MIN_PBL_SHIFT, -1);
+	if (rdev_p->pbl_pool)
+		for (i = rdev_p->rnic_info.pbl_base;
+		     i <= rdev_p->rnic_info.pbl_top - PBL_CHUNK + 1;
+		     i += PBL_CHUNK)
+			gen_pool_add(rdev_p->pbl_pool, i, PBL_CHUNK, -1);
+	return rdev_p->pbl_pool ? 0 : -ENOMEM;
+}
+
+void cxio_hal_pblpool_destroy(struct cxio_rdev *rdev_p)
+{
+	gen_pool_destroy(rdev_p->pbl_pool);
+}
+
+/*
+ * RQT Memory Manager.  Uses Linux generic allocator.
+ */
+
+#define MIN_RQT_SHIFT 10	/* 1KB == min RQT size (16 entries) */
+#define RQT_CHUNK (2*1024*1024)
+
+u32 cxio_hal_rqtpool_alloc(struct cxio_rdev *rdev_p, int size)
+{
+	unsigned long addr = gen_pool_alloc(rdev_p->rqt_pool, size << 6);
+	PDBG("%s addr 0x%x size %d\n", __FUNCTION__, (u32)addr, size << 6);
+	return (u32)addr;
+}
+
+void cxio_hal_rqtpool_free(struct cxio_rdev *rdev_p, u32 addr, int size)
+{
+	PDBG("%s addr 0x%x size %d\n", __FUNCTION__, addr, size << 6);
+	gen_pool_free(rdev_p->rqt_pool, (unsigned long)addr, size << 6);
+}
+
+int cxio_hal_rqtpool_create(struct cxio_rdev *rdev_p)
+{
+	unsigned long i;
+	rdev_p->rqt_pool = gen_pool_create(MIN_RQT_SHIFT, -1);
+	if (rdev_p->rqt_pool)
+		for (i = rdev_p->rnic_info.rqt_base;
+		     i <= rdev_p->rnic_info.rqt_top - RQT_CHUNK + 1;
+		     i += RQT_CHUNK)
+			gen_pool_add(rdev_p->rqt_pool, i, RQT_CHUNK, -1);
+	return rdev_p->rqt_pool ? 0 : -ENOMEM;
+}
+
+void cxio_hal_rqtpool_destroy(struct cxio_rdev *rdev_p)
+{
+	gen_pool_destroy(rdev_p->rqt_pool);
+}
diff --git a/drivers/infiniband/hw/cxgb3/cxio_resource.h b/drivers/infiniband/hw/cxgb3/cxio_resource.h
new file mode 100644
index 0000000..a6bbe83
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/cxio_resource.h
@@ -0,0 +1,70 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#ifndef __CXIO_RESOURCE_H__
+#define __CXIO_RESOURCE_H__
+
+#include <linux/kernel.h>
+#include <linux/random.h>
+#include <linux/slab.h>
+#include <linux/kfifo.h>
+#include <linux/spinlock.h>
+#include <linux/errno.h>
+#include <linux/genalloc.h>
+#include "cxio_hal.h"
+
+extern int cxio_hal_init_rhdl_resource(u32 nr_rhdl);
+extern void cxio_hal_destroy_rhdl_resource(void);
+extern int cxio_hal_init_resource(struct cxio_rdev *rdev_p,
+				  u32 nr_tpt, u32 nr_pbl,
+				  u32 nr_rqt, u32 nr_qpid, u32 nr_cqid,
+				  u32 nr_pdid);
+extern u32 cxio_hal_get_stag(struct cxio_hal_resource *rscp);
+extern void cxio_hal_put_stag(struct cxio_hal_resource *rscp, u32 stag);
+extern u32 cxio_hal_get_qpid(struct cxio_hal_resource *rscp);
+extern void cxio_hal_put_qpid(struct cxio_hal_resource *rscp, u32 qpid);
+extern u32 cxio_hal_get_cqid(struct cxio_hal_resource *rscp);
+extern void cxio_hal_put_cqid(struct cxio_hal_resource *rscp, u32 cqid);
+extern void cxio_hal_destroy_resource(struct cxio_hal_resource *rscp);
+
+#define PBL_OFF(rdev_p, a) ( (a) - (rdev_p)->rnic_info.pbl_base )
+extern int cxio_hal_pblpool_create(struct cxio_rdev *rdev_p);
+extern void cxio_hal_pblpool_destroy(struct cxio_rdev *rdev_p);
+extern u32 cxio_hal_pblpool_alloc(struct cxio_rdev *rdev_p, int size);
+extern void cxio_hal_pblpool_free(struct cxio_rdev *rdev_p, u32 addr, int size);
+
+#define RQT_OFF(rdev_p, a) ( (a) - (rdev_p)->rnic_info.rqt_base )
+extern int cxio_hal_rqtpool_create(struct cxio_rdev *rdev_p);
+extern void cxio_hal_rqtpool_destroy(struct cxio_rdev *rdev_p);
+extern u32 cxio_hal_rqtpool_alloc(struct cxio_rdev *rdev_p, int size);
+extern void cxio_hal_rqtpool_free(struct cxio_rdev *rdev_p, u32 addr, int size);
+#endif
diff --git a/drivers/infiniband/hw/cxgb3/cxio_wr.h b/drivers/infiniband/hw/cxgb3/cxio_wr.h
new file mode 100644
index 0000000..103fc42
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/cxio_wr.h
@@ -0,0 +1,685 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#ifndef __CXIO_WR_H__
+#define __CXIO_WR_H__
+
+#include <asm/io.h>
+#include <linux/pci.h>
+#include <linux/timer.h>
+#include "firmware_exports.h"
+
+#define T3_MAX_SGE      4
+
+#define Q_EMPTY(rptr,wptr) ((rptr)==(wptr))
+#define Q_FULL(rptr,wptr,size_log2)  ( (((wptr)-(rptr))>>(size_log2)) && \
+				       ((rptr)!=(wptr)) )
+#define Q_GENBIT(ptr,size_log2) (!(((ptr)>>(size_log2))&0x1))
+#define Q_FREECNT(rptr,wptr,size_log2) ((1UL<<(size_log2))-((wptr)-(rptr)))
+#define Q_COUNT(rptr,wptr) ((wptr)-(rptr))
+#define Q_PTR2IDX(ptr,size_log2) ((ptr) & ((1UL<<(size_log2))-1))
+
+static inline void ring_doorbell(void __iomem *doorbell, u32 qpid)
+{
+	writel(((1U<<31) | qpid), doorbell);
+}
+
+#define SEQ32_GE(x,y) (!( (((u32) (x)) - ((u32) (y))) & 0x80000000 ))
+
+enum t3_wr_flags {
+	T3_COMPLETION_FLAG = 0x01,
+	T3_NOTIFY_FLAG = 0x02,
+	T3_SOLICITED_EVENT_FLAG = 0x04,
+	T3_READ_FENCE_FLAG = 0x08,
+	T3_LOCAL_FENCE_FLAG = 0x10
+} __attribute__ ((packed));
+
+enum t3_wr_opcode {
+	T3_WR_BP = FW_WROPCODE_RI_BYPASS,
+	T3_WR_SEND = FW_WROPCODE_RI_SEND,
+	T3_WR_WRITE = FW_WROPCODE_RI_RDMA_WRITE,
+	T3_WR_READ = FW_WROPCODE_RI_RDMA_READ,
+	T3_WR_INV_STAG = FW_WROPCODE_RI_LOCAL_INV,
+	T3_WR_BIND = FW_WROPCODE_RI_BIND_MW,
+	T3_WR_RCV = FW_WROPCODE_RI_RECEIVE,
+	T3_WR_INIT = FW_WROPCODE_RI_RDMA_INIT,
+	T3_WR_QP_MOD = FW_WROPCODE_RI_MODIFY_QP
+} __attribute__ ((packed));
+
+enum t3_rdma_opcode {
+	T3_RDMA_WRITE,		/* IETF RDMAP v1.0 ... */
+	T3_READ_REQ,
+	T3_READ_RESP,
+	T3_SEND,
+	T3_SEND_WITH_INV,
+	T3_SEND_WITH_SE,
+	T3_SEND_WITH_SE_INV,
+	T3_TERMINATE,
+	T3_RDMA_INIT,		/* CHELSIO RI specific ... */
+	T3_BIND_MW,
+	T3_FAST_REGISTER,
+	T3_LOCAL_INV,
+	T3_QP_MOD,
+	T3_BYPASS
+} __attribute__ ((packed));
+
+static inline enum t3_rdma_opcode wr2opcode(enum t3_wr_opcode wrop)
+{
+	switch (wrop) {
+		case T3_WR_BP: return T3_BYPASS;
+		case T3_WR_SEND: return T3_SEND;
+		case T3_WR_WRITE: return T3_RDMA_WRITE;
+		case T3_WR_READ: return T3_READ_REQ;
+		case T3_WR_INV_STAG: return T3_LOCAL_INV;
+		case T3_WR_BIND: return T3_BIND_MW;
+		case T3_WR_INIT: return T3_RDMA_INIT;
+		case T3_WR_QP_MOD: return T3_QP_MOD;
+		default: break;
+	}
+	return -1;
+}
+
+
+/* Work request id */
+union t3_wrid {
+	struct {
+		u32 hi;
+		u32 low;
+	} id0;
+	u64 id1;
+};
+
+#define WRID(wrid)		(wrid.id1)
+#define WRID_GEN(wrid)		(wrid.id0.wr_gen)
+#define WRID_IDX(wrid)		(wrid.id0.wr_idx)
+#define WRID_LO(wrid)		(wrid.id0.wr_lo)
+
+struct fw_riwrh {
+	__be32 op_seop_flags;
+	__be32 gen_tid_len;
+};
+
+#define S_FW_RIWR_OP		24
+#define M_FW_RIWR_OP		0xff
+#define V_FW_RIWR_OP(x)		((x) << S_FW_RIWR_OP)
+#define G_FW_RIWR_OP(x)	((((x) >> S_FW_RIWR_OP)) & M_FW_RIWR_OP)
+
+#define S_FW_RIWR_SOPEOP	22
+#define M_FW_RIWR_SOPEOP	0x3
+#define V_FW_RIWR_SOPEOP(x)	((x) << S_FW_RIWR_SOPEOP)
+
+#define S_FW_RIWR_FLAGS		8
+#define M_FW_RIWR_FLAGS		0x3fffff
+#define V_FW_RIWR_FLAGS(x)	((x) << S_FW_RIWR_FLAGS)
+#define G_FW_RIWR_FLAGS(x)	((((x) >> S_FW_RIWR_FLAGS)) & M_FW_RIWR_FLAGS)
+
+#define S_FW_RIWR_TID		8
+#define V_FW_RIWR_TID(x)	((x) << S_FW_RIWR_TID)
+
+#define S_FW_RIWR_LEN		0
+#define V_FW_RIWR_LEN(x)	((x) << S_FW_RIWR_LEN)
+
+#define S_FW_RIWR_GEN           31
+#define V_FW_RIWR_GEN(x)        ((x)  << S_FW_RIWR_GEN)
+
+struct t3_sge {
+	__be32 stag;
+	__be32 len;
+	__be64 to;
+};
+
+/* If num_sgle is zero, flit 5+ contains immediate data.*/
+struct t3_send_wr {
+	struct fw_riwrh wrh;	/* 0 */
+	union t3_wrid wrid;	/* 1 */
+
+	u8 rdmaop;		/* 2 */
+	u8 reserved[3];
+	__be32 rem_stag;
+	__be32 plen;		/* 3 */
+	__be32 num_sgle;
+	struct t3_sge sgl[T3_MAX_SGE];	/* 4+ */
+};
+
+struct t3_local_inv_wr {
+	struct fw_riwrh wrh;	/* 0 */
+	union t3_wrid wrid;	/* 1 */
+	__be32 stag;		/* 2 */
+	__be32 reserved3;
+};
+
+struct t3_rdma_write_wr {
+	struct fw_riwrh wrh;	/* 0 */
+	union t3_wrid wrid;	/* 1 */
+	u8 rdmaop;		/* 2 */
+	u8 reserved[3];
+	__be32 stag_sink;
+	__be64 to_sink;		/* 3 */
+	__be32 plen;		/* 4 */
+	__be32 num_sgle;
+	struct t3_sge sgl[T3_MAX_SGE];	/* 5+ */
+};
+
+struct t3_rdma_read_wr {
+	struct fw_riwrh wrh;	/* 0 */
+	union t3_wrid wrid;	/* 1 */
+	u8 rdmaop;		/* 2 */
+	u8 reserved[3];
+	__be32 rem_stag;
+	__be64 rem_to;		/* 3 */
+	__be32 local_stag;	/* 4 */
+	__be32 local_len;
+	__be64 local_to;	/* 5 */
+};
+
+enum t3_addr_type {
+	T3_VA_BASED_TO = 0x0,
+	T3_ZERO_BASED_TO = 0x1
+} __attribute__ ((packed));
+
+enum t3_mem_perms {
+	T3_MEM_ACCESS_LOCAL_READ = 0x1,
+	T3_MEM_ACCESS_LOCAL_WRITE = 0x2,
+	T3_MEM_ACCESS_REM_READ = 0x4,
+	T3_MEM_ACCESS_REM_WRITE = 0x8
+} __attribute__ ((packed));
+
+struct t3_bind_mw_wr {
+	struct fw_riwrh wrh;	/* 0 */
+	union t3_wrid wrid;	/* 1 */
+	u16 reserved;		/* 2 */
+	u8 type;
+	u8 perms;
+	__be32 mr_stag;
+	__be32 mw_stag;		/* 3 */
+	__be32 mw_len;
+	__be64 mw_va;		/* 4 */
+	__be32 mr_pbl_addr;	/* 5 */
+	u8 reserved2[3];
+	u8 mr_pagesz;
+};
+
+struct t3_receive_wr {
+	struct fw_riwrh wrh;	/* 0 */
+	union t3_wrid wrid;	/* 1 */
+	u8 pagesz[T3_MAX_SGE];
+	__be32 num_sgle;		/* 2 */
+	struct t3_sge sgl[T3_MAX_SGE];	/* 3+ */
+	__be32 pbl_addr[T3_MAX_SGE];
+};
+
+struct t3_bypass_wr {
+	struct fw_riwrh wrh;
+	union t3_wrid wrid;	/* 1 */
+};
+
+struct t3_modify_qp_wr {
+	struct fw_riwrh wrh;	/* 0 */
+	union t3_wrid wrid;	/* 1 */
+	__be32 flags;		/* 2 */
+	__be32 quiesce;		/* 2 */
+	__be32 max_ird;		/* 3 */
+	__be32 max_ord;		/* 3 */
+	__be64 sge_cmd;		/* 4 */
+	__be64 ctx1;		/* 5 */
+	__be64 ctx0;		/* 6 */
+};
+
+enum t3_modify_qp_flags {
+	MODQP_QUIESCE  = 0x01,
+	MODQP_MAX_IRD  = 0x02,
+	MODQP_MAX_ORD  = 0x04,
+	MODQP_WRITE_EC = 0x08,
+	MODQP_READ_EC  = 0x10,
+};
+
+
+enum t3_mpa_attrs {
+	uP_RI_MPA_RX_MARKER_ENABLE = 0x1,
+	uP_RI_MPA_TX_MARKER_ENABLE = 0x2,
+	uP_RI_MPA_CRC_ENABLE = 0x4,
+	uP_RI_MPA_IETF_ENABLE = 0x8
+} __attribute__ ((packed));
+
+enum t3_qp_caps {
+	uP_RI_QP_RDMA_READ_ENABLE = 0x01,
+	uP_RI_QP_RDMA_WRITE_ENABLE = 0x02,
+	uP_RI_QP_BIND_ENABLE = 0x04,
+	uP_RI_QP_FAST_REGISTER_ENABLE = 0x08,
+	uP_RI_QP_STAG0_ENABLE = 0x10
+} __attribute__ ((packed));
+
+struct t3_rdma_init_attr {
+	u32 tid;
+	u32 qpid;
+	u32 pdid;
+	u32 scqid;
+	u32 rcqid;
+	u32 rq_addr;
+	u32 rq_size;
+	enum t3_mpa_attrs mpaattrs;
+	enum t3_qp_caps qpcaps;
+	u16 tcp_emss;
+	u32 ord;
+	u32 ird;
+	u64 qp_dma_addr;
+	u32 qp_dma_size;
+	u32 flags;
+};
+
+struct t3_rdma_init_wr {
+	struct fw_riwrh wrh;	/* 0 */
+	union t3_wrid wrid;	/* 1 */
+	__be32 qpid;		/* 2 */
+	__be32 pdid;
+	__be32 scqid;		/* 3 */
+	__be32 rcqid;
+	__be32 rq_addr;		/* 4 */
+	__be32 rq_size;
+	u8 mpaattrs;		/* 5 */
+	u8 qpcaps;
+	__be16 ulpdu_size;
+	__be32 flags;		/* bits 31-1 - reserved */
+				/* bit     0 - set if RECV posted */
+	__be32 ord;		/* 6 */
+	__be32 ird;
+	__be64 qp_dma_addr;	/* 7 */
+	__be32 qp_dma_size;	/* 8 */
+	u32 rsvd;
+};
+
+struct t3_genbit {
+	u64 flit[15];
+	__be64 genbit;
+};
+
+enum rdma_init_wr_flags {
+	RECVS_POSTED = 1,
+};
+
+union t3_wr {
+	struct t3_send_wr send;
+	struct t3_rdma_write_wr write;
+	struct t3_rdma_read_wr read;
+	struct t3_receive_wr recv;
+	struct t3_local_inv_wr local_inv;
+	struct t3_bind_mw_wr bind;
+	struct t3_bypass_wr bypass;
+	struct t3_rdma_init_wr init;
+	struct t3_modify_qp_wr qp_mod;
+	struct t3_genbit genbit;
+	u64 flit[16];
+};
+
+#define T3_SQ_CQE_FLIT	  13
+#define T3_SQ_COOKIE_FLIT 14
+
+#define T3_RQ_COOKIE_FLIT 13
+#define T3_RQ_CQE_FLIT	  14
+
+static inline enum t3_wr_opcode fw_riwrh_opcode(struct fw_riwrh *wqe)
+{
+	return G_FW_RIWR_OP(be32_to_cpu(wqe->op_seop_flags));
+}
+
+static inline void build_fw_riwrh(struct fw_riwrh *wqe, enum t3_wr_opcode op,
+				  enum t3_wr_flags flags, u8 genbit, u32 tid,
+				  u8 len)
+{
+	wqe->op_seop_flags = cpu_to_be32(V_FW_RIWR_OP(op) |
+					 V_FW_RIWR_SOPEOP(M_FW_RIWR_SOPEOP) |
+					 V_FW_RIWR_FLAGS(flags));
+	wmb();
+	wqe->gen_tid_len = cpu_to_be32(V_FW_RIWR_GEN(genbit) |
+				       V_FW_RIWR_TID(tid) |
+				       V_FW_RIWR_LEN(len));
+	/* 2nd gen bit... */
+	((union t3_wr *)wqe)->genbit.genbit = cpu_to_be64(genbit);
+}
+
+/*
+ * T3 ULP2_TX commands
+ */
+enum t3_utx_mem_op {
+	T3_UTX_MEM_READ = 2,
+	T3_UTX_MEM_WRITE = 3
+};
+
+/* T3 MC7 RDMA TPT entry format */
+
+enum tpt_mem_type {
+	TPT_NON_SHARED_MR = 0x0,
+	TPT_SHARED_MR = 0x1,
+	TPT_MW = 0x2,
+	TPT_MW_RELAXED_PROTECTION = 0x3
+};
+
+enum tpt_addr_type {
+	TPT_ZBTO = 0,
+	TPT_VATO = 1
+};
+
+enum tpt_mem_perm {
+	TPT_LOCAL_READ = 0x8,
+	TPT_LOCAL_WRITE = 0x4,
+	TPT_REMOTE_READ = 0x2,
+	TPT_REMOTE_WRITE = 0x1
+};
+
+struct tpt_entry {
+	__be32 valid_stag_pdid;
+	__be32 flags_pagesize_qpid;
+
+	__be32 rsvd_pbl_addr;
+	__be32 len;
+	__be32 va_hi;
+	__be32 va_low_or_fbo;
+
+	__be32 rsvd_bind_cnt_or_pstag;
+	__be32 rsvd_pbl_size;
+};
+
+#define S_TPT_VALID		31
+#define V_TPT_VALID(x)		((x) << S_TPT_VALID)
+#define F_TPT_VALID		V_TPT_VALID(1U)
+
+#define S_TPT_STAG_KEY		23
+#define M_TPT_STAG_KEY		0xFF
+#define V_TPT_STAG_KEY(x)	((x) << S_TPT_STAG_KEY)
+#define G_TPT_STAG_KEY(x)	(((x) >> S_TPT_STAG_KEY) & M_TPT_STAG_KEY)
+
+#define S_TPT_STAG_STATE	22
+#define V_TPT_STAG_STATE(x)	((x) << S_TPT_STAG_STATE)
+#define F_TPT_STAG_STATE	V_TPT_STAG_STATE(1U)
+
+#define S_TPT_STAG_TYPE		20
+#define M_TPT_STAG_TYPE		0x3
+#define V_TPT_STAG_TYPE(x)	((x) << S_TPT_STAG_TYPE)
+#define G_TPT_STAG_TYPE(x)	(((x) >> S_TPT_STAG_TYPE) & M_TPT_STAG_TYPE)
+
+#define S_TPT_PDID		0
+#define M_TPT_PDID		0xFFFFF
+#define V_TPT_PDID(x)		((x) << S_TPT_PDID)
+#define G_TPT_PDID(x)		(((x) >> S_TPT_PDID) & M_TPT_PDID)
+
+#define S_TPT_PERM		28
+#define M_TPT_PERM		0xF
+#define V_TPT_PERM(x)		((x) << S_TPT_PERM)
+#define G_TPT_PERM(x)		(((x) >> S_TPT_PERM) & M_TPT_PERM)
+
+#define S_TPT_REM_INV_DIS	27
+#define V_TPT_REM_INV_DIS(x)	((x) << S_TPT_REM_INV_DIS)
+#define F_TPT_REM_INV_DIS	V_TPT_REM_INV_DIS(1U)
+
+#define S_TPT_ADDR_TYPE		26
+#define V_TPT_ADDR_TYPE(x)	((x) << S_TPT_ADDR_TYPE)
+#define F_TPT_ADDR_TYPE		V_TPT_ADDR_TYPE(1U)
+
+#define S_TPT_MW_BIND_ENABLE	25
+#define V_TPT_MW_BIND_ENABLE(x)	((x) << S_TPT_MW_BIND_ENABLE)
+#define F_TPT_MW_BIND_ENABLE    V_TPT_MW_BIND_ENABLE(1U)
+
+#define S_TPT_PAGE_SIZE		20
+#define M_TPT_PAGE_SIZE		0x1F
+#define V_TPT_PAGE_SIZE(x)	((x) << S_TPT_PAGE_SIZE)
+#define G_TPT_PAGE_SIZE(x)	(((x) >> S_TPT_PAGE_SIZE) & M_TPT_PAGE_SIZE)
+
+#define S_TPT_PBL_ADDR		0
+#define M_TPT_PBL_ADDR		0x1FFFFFFF
+#define V_TPT_PBL_ADDR(x)	((x) << S_TPT_PBL_ADDR)
+#define G_TPT_PBL_ADDR(x)       (((x) >> S_TPT_PBL_ADDR) & M_TPT_PBL_ADDR)
+
+#define S_TPT_QPID		0
+#define M_TPT_QPID		0xFFFFF
+#define V_TPT_QPID(x)		((x) << S_TPT_QPID)
+#define G_TPT_QPID(x)		(((x) >> S_TPT_QPID) & M_TPT_QPID)
+
+#define S_TPT_PSTAG		0
+#define M_TPT_PSTAG		0xFFFFFF
+#define V_TPT_PSTAG(x)		((x) << S_TPT_PSTAG)
+#define G_TPT_PSTAG(x)		(((x) >> S_TPT_PSTAG) & M_TPT_PSTAG)
+
+#define S_TPT_PBL_SIZE		0
+#define M_TPT_PBL_SIZE		0xFFFFF
+#define V_TPT_PBL_SIZE(x)	((x) << S_TPT_PBL_SIZE)
+#define G_TPT_PBL_SIZE(x)	(((x) >> S_TPT_PBL_SIZE) & M_TPT_PBL_SIZE)
+
+/*
+ * CQE defs
+ */
+struct t3_cqe {
+	__be32 header;
+	__be32 len;
+	union {
+		struct {
+			__be32 stag;
+			__be32 msn;
+		} rcqe;
+		struct {
+			u32 wrid_hi;
+			u32 wrid_low;
+		} scqe;
+	} u;
+};
+
+#define S_CQE_OOO	  31
+#define M_CQE_OOO	  0x1
+#define G_CQE_OOO(x)	  ((((x) >> S_CQE_OOO)) & M_CQE_OOO)
+#define V_CEQ_OOO(x)	  ((x)<<S_CQE_OOO)
+
+#define S_CQE_QPID        12
+#define M_CQE_QPID        0x7FFFF
+#define G_CQE_QPID(x)     ((((x) >> S_CQE_QPID)) & M_CQE_QPID)
+#define V_CQE_QPID(x)	  ((x)<<S_CQE_QPID)
+
+#define S_CQE_SWCQE       11
+#define M_CQE_SWCQE       0x1
+#define G_CQE_SWCQE(x)    ((((x) >> S_CQE_SWCQE)) & M_CQE_SWCQE)
+#define V_CQE_SWCQE(x)	  ((x)<<S_CQE_SWCQE)
+
+#define S_CQE_GENBIT      10
+#define M_CQE_GENBIT      0x1
+#define G_CQE_GENBIT(x)   (((x) >> S_CQE_GENBIT) & M_CQE_GENBIT)
+#define V_CQE_GENBIT(x)	  ((x)<<S_CQE_GENBIT)
+
+#define S_CQE_STATUS      5
+#define M_CQE_STATUS      0x1F
+#define G_CQE_STATUS(x)   ((((x) >> S_CQE_STATUS)) & M_CQE_STATUS)
+#define V_CQE_STATUS(x)   ((x)<<S_CQE_STATUS)
+
+#define S_CQE_TYPE        4
+#define M_CQE_TYPE        0x1
+#define G_CQE_TYPE(x)     ((((x) >> S_CQE_TYPE)) & M_CQE_TYPE)
+#define V_CQE_TYPE(x)     ((x)<<S_CQE_TYPE)
+
+#define S_CQE_OPCODE      0
+#define M_CQE_OPCODE      0xF
+#define G_CQE_OPCODE(x)   ((((x) >> S_CQE_OPCODE)) & M_CQE_OPCODE)
+#define V_CQE_OPCODE(x)   ((x)<<S_CQE_OPCODE)
+
+#define SW_CQE(x)         (G_CQE_SWCQE(be32_to_cpu((x).header)))
+#define CQE_OOO(x)        (G_CQE_OOO(be32_to_cpu((x).header)))
+#define CQE_QPID(x)       (G_CQE_QPID(be32_to_cpu((x).header)))
+#define CQE_GENBIT(x)     (G_CQE_GENBIT(be32_to_cpu((x).header)))
+#define CQE_TYPE(x)       (G_CQE_TYPE(be32_to_cpu((x).header)))
+#define SQ_TYPE(x)	  (CQE_TYPE((x)))
+#define RQ_TYPE(x)	  (!CQE_TYPE((x)))
+#define CQE_STATUS(x)     (G_CQE_STATUS(be32_to_cpu((x).header)))
+#define CQE_OPCODE(x)     (G_CQE_OPCODE(be32_to_cpu((x).header)))
+
+#define CQE_LEN(x)        (be32_to_cpu((x).len))
+
+/* used for RQ completion processing */
+#define CQE_WRID_STAG(x)  (be32_to_cpu((x).u.rcqe.stag))
+#define CQE_WRID_MSN(x)   (be32_to_cpu((x).u.rcqe.msn))
+
+/* used for SQ completion processing */
+#define CQE_WRID_SQ_WPTR(x)	((x).u.scqe.wrid_hi)
+#define CQE_WRID_WPTR(x)	((x).u.scqe.wrid_low)
+
+/* generic accessor macros */
+#define CQE_WRID_HI(x)		((x).u.scqe.wrid_hi)
+#define CQE_WRID_LOW(x)		((x).u.scqe.wrid_low)
+
+#define TPT_ERR_SUCCESS                     0x0
+#define TPT_ERR_STAG                        0x1	 /* STAG invalid: either the */
+						 /* STAG is off limits, being 0, */
+						 /* or STAG_key mismatch */
+#define TPT_ERR_PDID                        0x2	 /* PDID mismatch */
+#define TPT_ERR_QPID                        0x3	 /* QPID mismatch */
+#define TPT_ERR_ACCESS                      0x4	 /* Invalid access right */
+#define TPT_ERR_WRAP                        0x5	 /* Wrap error */
+#define TPT_ERR_BOUND                       0x6	 /* base and bounds violation */
+#define TPT_ERR_INVALIDATE_SHARED_MR        0x7	 /* attempt to invalidate a  */
+						 /* shared memory region */
+#define TPT_ERR_INVALIDATE_MR_WITH_MW_BOUND 0x8	 /* attempt to invalidate a  */
+						 /* MR with a bound MW */
+#define TPT_ERR_ECC                         0x9	 /* ECC error detected */
+#define TPT_ERR_ECC_PSTAG                   0xA	 /* ECC error detected when  */
+						 /* reading PSTAG for a MW  */
+						 /* Invalidate */
+#define TPT_ERR_PBL_ADDR_BOUND              0xB	 /* pbl addr out of bounds:  */
+						 /* software error */
+#define TPT_ERR_SWFLUSH			    0xC	 /* SW FLUSHED */
+#define TPT_ERR_CRC                         0x10 /* CRC error */
+#define TPT_ERR_MARKER                      0x11 /* Marker error */
+#define TPT_ERR_PDU_LEN_ERR                 0x12 /* invalid PDU length */
+#define TPT_ERR_OUT_OF_RQE                  0x13 /* out of RQE */
+#define TPT_ERR_DDP_VERSION                 0x14 /* wrong DDP version */
+#define TPT_ERR_RDMA_VERSION                0x15 /* wrong RDMA version */
+#define TPT_ERR_OPCODE                      0x16 /* invalid rdma opcode */
+#define TPT_ERR_DDP_QUEUE_NUM               0x17 /* invalid ddp queue number */
+#define TPT_ERR_MSN                         0x18 /* MSN error */
+#define TPT_ERR_TBIT                        0x19 /* tag bit not set correctly */
+#define TPT_ERR_MO                          0x1A /* MO not 0 for TERMINATE  */
+						 /* or READ_REQ */
+#define TPT_ERR_MSN_GAP                     0x1B
+#define TPT_ERR_MSN_RANGE                   0x1C
+#define TPT_ERR_IRD_OVERFLOW                0x1D
+#define TPT_ERR_RQE_ADDR_BOUND              0x1E /* RQE addr out of bounds:  */
+						 /* software error */
+#define TPT_ERR_INTERNAL_ERR                0x1F /* internal error (opcode  */
+						 /* mismatch) */
+
+struct t3_swsq {
+	__u64			wr_id;
+	struct t3_cqe		cqe;
+	__u32			sq_wptr;
+	__be32			read_len;
+	int			opcode;
+	int			complete;
+	int			signaled;
+};
+
+/*
+ * A T3 WQ implements both the SQ and RQ.
+ */
+struct t3_wq {
+	union t3_wr *queue;		/* DMA accessible memory */
+	dma_addr_t dma_addr;		/* DMA address for HW */
+	DECLARE_PCI_UNMAP_ADDR(mapping)	/* unmap kruft */
+	u32 error;			/* 1 once we go to ERROR */
+	u32 qpid;
+	u32 wptr;			/* idx to next available WR slot */
+	u32 size_log2;			/* total wq size */
+	struct t3_swsq *sq;		/* SW SQ */
+	struct t3_swsq *oldest_read;	/* tracks oldest pending read */
+	u32 sq_wptr;			/* sq_wptr - sq_rptr == count of */
+	u32 sq_rptr;			/* pending wrs */
+	u32 sq_size_log2;		/* sq size */
+	u64 *rq;			/* SW RQ (holds consumer wr_ids) */
+	u32 rq_wptr;			/* rq_wptr - rq_rptr == count of */
+	u32 rq_rptr;			/* pending wrs */
+	u64 *rq_oldest_wr;		/* oldest wr on the SW RQ */
+	u32 rq_size_log2;		/* rq size */
+	u32 rq_addr;			/* rq adapter address */
+	void __iomem *doorbell;		/* kernel db */
+	u64 udb;			/* user db if any */
+};
+
+struct t3_cq {
+	u32 cqid;
+	u32 rptr;
+	u32 wptr;
+	u32 size_log2;
+	dma_addr_t dma_addr;
+	DECLARE_PCI_UNMAP_ADDR(mapping)
+	struct t3_cqe *queue;
+	struct t3_cqe *sw_queue;
+	u32 sw_rptr;
+	u32 sw_wptr;
+};
+
+#define CQ_VLD_ENTRY(ptr,size_log2,cqe) (Q_GENBIT(ptr,size_log2) == \
+					 CQE_GENBIT(*cqe))
+
+static inline void cxio_set_wq_in_error(struct t3_wq *wq)
+{
+	wq->queue->flit[13] = 1;
+}
+
+static inline struct t3_cqe *cxio_next_hw_cqe(struct t3_cq *cq)
+{
+	struct t3_cqe *cqe;
+
+	cqe = cq->queue + (Q_PTR2IDX(cq->rptr, cq->size_log2));
+	if (CQ_VLD_ENTRY(cq->rptr, cq->size_log2, cqe))
+		return cqe;
+	return NULL;
+}
+
+static inline struct t3_cqe *cxio_next_sw_cqe(struct t3_cq *cq)
+{
+	struct t3_cqe *cqe;
+
+	if (!Q_EMPTY(cq->sw_rptr, cq->sw_wptr)) {
+		cqe = cq->sw_queue + (Q_PTR2IDX(cq->sw_rptr, cq->size_log2));
+		return cqe;
+	}
+	return NULL;
+}
+
+static inline struct t3_cqe *cxio_next_cqe(struct t3_cq *cq)
+{
+	struct t3_cqe *cqe;
+
+	if (!Q_EMPTY(cq->sw_rptr, cq->sw_wptr)) {
+		cqe = cq->sw_queue + (Q_PTR2IDX(cq->sw_rptr, cq->size_log2));
+		return cqe;
+	}
+	cqe = cq->queue + (Q_PTR2IDX(cq->rptr, cq->size_log2));
+	if (CQ_VLD_ENTRY(cq->rptr, cq->size_log2, cqe))
+		return cqe;
+	return NULL;
+}
+
+#endif
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 8e519f2..d02cd72 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -50,7 +50,7 @@ #include <rdma/ib_verbs.h>
 #include <rdma/ib_smi.h>
 #include <rdma/ib_user_verbs.h>
 
-#include <cxio_hal.h>
+#include "cxio_hal.h"
 #include "iwch.h"
 #include "iwch_provider.h"
 #include "iwch_cm.h"


From swise at opengridcomputing.com  Sat Feb 10 11:23:43 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Sat, 10 Feb 2007 13:23:43 -0600
Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in
 cm_conn_req_handler()
In-Reply-To: <1171035668.26453.11.camel@trinity.ogc.int>
References: <OFB574507E.6CAEE34E-ON6525727D.0013194D-6525727D.00132A8C@in.ibm.com>
	<ada7iusm021.fsf@cisco.com> <1171035668.26453.11.camel@trinity.ogc.int>
Message-ID: <1171135423.11017.61.camel@stevo-desktop>

Here is a patch that Tom and I think fixes the race condition Roland
discovered, plus cleans up the issues Krishna attempted to fix in his
first patch.  I'm testing it now with a series of rping tests looping
with random sizes and counts and it seems to work, but the patch needs
more testing and review.  

Krishna, can you review this carefully and also test it and let us know
if you think it's good?  The patch is against for-2.6.21 from Roland's
tree.

Roland, can you review this too and verify that it fixes the race
condition?


Krishna: here are comments on your original patch description:


cm_conn_req_handler() :
>         1. Calling destroy_cm_id leaks 3 work 'free' list entries.

I don't think your original patch fixed all places this memory was
leaked.

This has been addressed in the patch below by creating a function
free_cm_id() that frees the list entries -and- frees the cm_id memory.
It is then called from the 3 places where the cm_id can be freed.
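
As a rough illustration of that cleanup (a userspace sketch, not the
patch itself; struct work_entry, struct cm_id_model and both function
names are hypothetical stand-ins for the iwcm internals), a single
helper can release the preallocated work entries and then the cm_id
memory, so every caller gets the same behavior:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for one preallocated work entry. */
struct work_entry {
	struct work_entry *next;
};

/* Hypothetical stand-in for the private cm_id structure. */
struct cm_id_model {
	struct work_entry *work_free_list;	/* unused, preallocated entries */
};

/*
 * Free every entry still on the free list, then the cm_id itself.
 * Returns the number of list entries released.
 */
static int free_cm_id_model(struct cm_id_model *id)
{
	int freed = 0;
	struct work_entry *e;

	assert(id != NULL);
	e = id->work_free_list;
	while (e) {
		struct work_entry *next = e->next;

		free(e);
		e = next;
		freed++;
	}
	free(id);
	return freed;
}

/* Allocate a cm_id with nr preallocated work entries, or NULL on failure. */
static struct cm_id_model *alloc_cm_id_model(int nr)
{
	struct cm_id_model *id = calloc(1, sizeof(*id));
	int i;

	if (!id)
		return NULL;
	for (i = 0; i < nr; i++) {
		struct work_entry *e = malloc(sizeof(*e));

		if (!e) {
			free_cm_id_model(id);	/* same helper on the error path */
			return NULL;
		}
		e->next = id->work_free_list;
		id->work_free_list = e;
	}
	return id;
}
```

Routing every free through one helper is what keeps a new call site
from leaking the list entries again.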

>         2. cm_id is freed up wrongly and not cm_id_priv (though the
>            effect is the same since cm_id is the first element of
>            cm_id_priv, but still a bug if the top level cm_id
>            changes).
> 

This is addressed in the patch below since we have a prototyped function
for freeing the cm_id_priv.

>         3. Reject message has to be sent on failure. Tested this
>            without the fix and found the client hangs, waited for
>            about 20 mins and then did Ctrl-C but the process is
>            unkillable.
> 

The call to iw_cm_reject() is now in destroy_cm_id() and is called based
on the cm_id state.  So whenever a connection is destroyed, if it needs
a rejection sent, it will be sent.
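
A minimal userspace model of that idea follows.  It is only a sketch:
the state names, struct model_conn and model_destroy() are
hypothetical, and it assumes the state that needs a rejection is the
one where a connect request was delivered but never accepted.

```c
#include <assert.h>

/* Simplified state set; the real iwcm has its own enum. */
enum model_state {
	MODEL_IDLE,
	MODEL_CONN_RECV,	/* connect request delivered, not yet accepted */
	MODEL_ESTABLISHED,
	MODEL_DESTROYING
};

struct model_conn {
	enum model_state state;
	int rejects_sent;	/* stands in for the reject call to the provider */
};

/* Destroy decides on its own whether a reject is owed, based on state. */
static void model_destroy(struct model_conn *c)
{
	enum model_state old = c->state;

	assert(old != MODEL_DESTROYING);	/* destroy runs once per id */
	c->state = MODEL_DESTROYING;
	if (old == MODEL_CONN_RECV)
		c->rejects_sent++;
}
```

Because the decision lives in the destroy path, callers do not have to
remember to send the reject themselves.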

>         4. Setting IWCM_F_CALLBACK_DESTROY on cm_id (child handle)
>            doesn't achieve anything, since checking for
>            IWCM_F_CALLBACK_DESTROY in the parent's flag (in
>            cm_work_handler) means that this will never be true.
> 

It does achieve something.  I think you are failing to consider the fact
that the iWARP provider can have a reference on the cm_id even in the
case where the callback function returns an error thus giving
destruction ownership to the IWCM.  Perhaps the ammasso provider never
does this, but cxgb3 can.  And we haven't put any restrictions on
exactly when the provider _must_ release its reference.  If the provider
_does_ have a reference at this point in the code, then the cm_id will
not be freed, and must be freed when the refcnt finally reaches zero
when the provider removes its reference.  

I wish to clarify this for everyone (and we need this text in
Documentation/infiniband/iwcm.txt IMO):

This design is based on the RDMA_CM and IB_CM behavior.  If the app
issues the destroy via rdma_destroy_cm_id, then we block that thread
until all references are gone.  If the app returns non-zero in a
callback for a given cm_id, then the CM owns destroying the cm_id and
the application is done with it. That's the short of it.  Here's the
long of it:

There are 2 paths for freeing iw_cm memory.

1) the application issues a rdma_destroy_cm_id() which calls
iw_destroy_cm_id().  In this case (and this case only), the thread is
blocked until the refcnt reaches 0, then the thread continues and frees
the memory. 

2) the application returns non zero from a callback function.  In this
case, the IWCM is responsible to destroy the cm_id.  However, the IWCM
_cannot_ block in its event handler thread because this can cause a
deadlock.  A deadlock can occur if the provider has a reference to the
cm_id and needs to post some event before removing the reference.  If
the IWCM were to block awaiting the refcnt to go to zero, it would
deadlock with the provider trying to post the last event before derefing
the cm_id.  So the IWCM_F_CALLBACK_DESTROY bit is used to indicate that
the IWCM owns destroying this.  If, in cm_work_handler(), the refcnt
goes to zero -and- the DESTROY bit is set, then the cm_id can be freed.
If the refcnt doesn't go to zero in that function, then either the
provider still has a reference, or subsequent queued work items have
additional references.  In either case, the cm_id is not freed and
cm_work_handler() keeps chunking through the events and processing them.
Since the cm_id is marked DESTROYING, the events get dropped and the
references released on the cm_id.  Eventually the cm_id will get freed
either in cm_work_handler() -or- in rem_ref() called by the provider if
the provider has the last reference.

So based on the above design, there are 3 places in the code where the
cm_id can be freed:

A) in case 1 above the memory will always be freed in iw_destroy_cm_id()
after the thread is awakened with a refcnt of zero.

B) in case 2 above if the last reference is due to a queued work event
for the IWCM.  In this case the memory is freed in cm_work_handler().

C) in case 2 above if the provider has the last reference, then the
cm_id is freed in rem_ref().
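
The ownership rule in A) through C) can be sketched as a single-threaded
userspace model.  All demo_* names are hypothetical, the blocking wait of
case 1 is replaced by an assertion, and no locking or atomics are
modelled, so treat this as an illustration of the rule rather than the
iwcm.c implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a cm_id: a plain counter plus the ownership flag. */
struct demo_id {
	int refcount;
	bool callback_destroy;	/* models IWCM_F_CALLBACK_DESTROY */
};

static int demo_frees;		/* counts frees so the behaviour can be checked */

static struct demo_id *demo_new(int refs)
{
	struct demo_id *id = calloc(1, sizeof(*id));

	if (id)
		id->refcount = refs;
	return id;
}

static void demo_free_id(struct demo_id *id)
{
	demo_frees++;
	free(id);
}

/* Drop one reference; returns 1 if it was the last one. */
static int demo_deref(struct demo_id *id)
{
	assert(id->refcount > 0);
	return --id->refcount == 0;
}

/*
 * Cases B and C: whoever drops the last reference frees the id, but only
 * if the CM owns destruction.  Returns 1 if the id was freed.
 */
static int demo_put(struct demo_id *id)
{
	if (demo_deref(id) && id->callback_destroy) {
		demo_free_id(id);
		return 1;
	}
	return 0;
}

/*
 * Case A: the application destroys the id.  The real code sleeps until the
 * count reaches zero; this model just asserts that it already has.
 */
static void demo_destroy(struct demo_id *id)
{
	demo_deref(id);
	assert(id->refcount == 0);
	demo_free_id(id);
}
```

In the model, an id with the flag clear is only ever freed by
demo_destroy(), and an id with the flag set is freed by whichever
demo_put() call drops the last reference.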


I hope this clarifies things.

Here's the proposed patch:

iw_cm_id destruction race condition fixes.

From: Steve Wise <swise at opengridcomputing.com>

Several changes:

- iwcm_deref_id() always calls complete() and returns 1 when the last
  reference is dropped.

- move iw_cm_reject() into destroy_cm_id() to reduce code replication.

- clean up race condition in cm_work_handler().

- create static void free_cm_id() which deallocs the work entries and then
  kfrees the cm_id memory.  This reduces code replication.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
Signed-off-by: Tom Tucker <tom at opengridcomputing.com>
---

 drivers/infiniband/core/iwcm.c |   48 +++++++++++++++++++++-------------------
 1 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c
index 1039ad5..403daed 100644
--- a/drivers/infiniband/core/iwcm.c
+++ b/drivers/infiniband/core/iwcm.c
@@ -146,6 +146,12 @@ static int copy_private_data(struct iw_c
 	return 0;
 }
 
+static void free_cm_id(struct iwcm_id_private *cm_id_priv)
+{
+	dealloc_work_entries(cm_id_priv);
+	kfree(cm_id_priv);
+}
+
 /*
  * Release a reference on cm_id. If the last reference is being
  * released, enable the waiting thread (in iw_destroy_cm_id) to
@@ -153,21 +159,14 @@ static int copy_private_data(struct iw_c
  */
 static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv)
 {
-	int ret = 0;
-
 	BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
 	if (atomic_dec_and_test(&cm_id_priv->refcount)) {
 		BUG_ON(!list_empty(&cm_id_priv->work_list));
-		if (waitqueue_active(&cm_id_priv->destroy_comp.wait)) {
-			BUG_ON(cm_id_priv->state != IW_CM_STATE_DESTROYING);
-			BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY,
-					&cm_id_priv->flags));
-			ret = 1;
-		}
 		complete(&cm_id_priv->destroy_comp);
+		return 1;
 	}
 
-	return ret;
+	return 0;
 }
 
 static void add_ref(struct iw_cm_id *cm_id)
@@ -181,7 +180,11 @@ static void rem_ref(struct iw_cm_id *cm_
 {
 	struct iwcm_id_private *cm_id_priv;
 	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
-	iwcm_deref_id(cm_id_priv);
+	if (iwcm_deref_id(cm_id_priv) &&
+	    test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) {
+		BUG_ON(!list_empty(&cm_id_priv->work_list));
+		free_cm_id(cm_id_priv);
+	}
 }
 
 static int cm_event_handler(struct iw_cm_id *cm_id, struct iw_cm_event *event);
@@ -355,8 +358,11 @@ static void destroy_cm_id(struct iw_cm_i
 	case IW_CM_STATE_CONN_RECV:
 		/*
 		 * App called destroy before/without calling accept after
-		 * receiving connection request event notification.
+		 * receiving connection request event notification or
+		 * returned non zero from the event callback function.
+		 * In either case, must tell the provider to reject.
 		 */
+		iw_cm_reject(cm_id, NULL, 0);
 		cm_id_priv->state = IW_CM_STATE_DESTROYING;
 		break;
 	case IW_CM_STATE_CONN_SENT:
@@ -391,9 +397,7 @@ void iw_destroy_cm_id(struct iw_cm_id *c
 
 	wait_for_completion(&cm_id_priv->destroy_comp);
 
-	dealloc_work_entries(cm_id_priv);
-
-	kfree(cm_id_priv);
+	free_cm_id(cm_id_priv);
 }
 EXPORT_SYMBOL(iw_destroy_cm_id);
 
@@ -639,7 +643,6 @@ static void cm_conn_req_handler(struct i
 
 	ret = alloc_work_entries(cm_id_priv, 3);
 	if (ret) {
-		iw_cm_reject(cm_id, NULL, 0);
 		iw_destroy_cm_id(cm_id);
 		goto out;
 	}
@@ -650,7 +653,7 @@ static void cm_conn_req_handler(struct i
 		set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags);
 		destroy_cm_id(cm_id);
 		if (atomic_read(&cm_id_priv->refcount)==0)
-			kfree(cm_id);
+			free_cm_id(cm_id_priv);
 	}
 
 out:
@@ -854,13 +857,12 @@ static void cm_work_handler(struct work_
 			destroy_cm_id(&cm_id_priv->id);
 		}
 		BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
-		if (iwcm_deref_id(cm_id_priv))
-			return;
-
-		if (atomic_read(&cm_id_priv->refcount)==0 &&
-		    test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) {
-			dealloc_work_entries(cm_id_priv);
-			kfree(cm_id_priv);
+		if (iwcm_deref_id(cm_id_priv)) {
+			if (test_bit(IWCM_F_CALLBACK_DESTROY,
+				     &cm_id_priv->flags)) {
+				BUG_ON(!list_empty(&cm_id_priv->work_list));
+				free_cm_id(cm_id_priv);
+			}
 			return;
 		}
 		spin_lock_irqsave(&cm_id_priv->lock, flags);


From rowland at cse.ohio-state.edu  Sat Feb 10 11:25:16 2007
From: rowland at cse.ohio-state.edu (Shaun Rowland)
Date: Sat, 10 Feb 2007 14:25:16 -0500
Subject: [openib-general] MVAPICH2 SRPM update and install files patch
Message-ID: <45CE1C1C.70406@cse.ohio-state.edu>

I updated the latest MVAPICH2 SRPM:

https://www.openfabrics.org/~rowland/ofed_1_2/

I am including a patch to the latest ofed_1_2_scripts git files. Since
these files are the same as those used in the OFED-1.2-20070208-1508.tgz
package, this patch can also be applied there. This patch is required to
use the new MVAPICH2 SRPM file and should not be used with the older
versions.

I've done the following:

- Updated some of the dependencies when mvapich2 is selected.

- Added new mvapich2 configuration prompts if mvapich2 is selected.
This is all contained within the mvapich2_config shell function. These
values are stored in the configuration file, etc. and prefixed with
MVAPICH2_CONF_.

There are two implementation choices for the MVAPICH2 build: OFA and
uDAPL. The OFA build should allow IB, IB + RDMA-CM, and iWARP to be
used. The mode is controlled by the following runtime environment variables:

IB
--
No additional environment variable required (default case).

IB + RDMA-CM
------------
MV2_USE_RDMA_CM=1

iWARP
-----
MV2_ENABLE_IWARP_MODE=1
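
As a rough illustration of the table above, the mode selection could be
modelled like this.  This is not MVAPICH2 code: the demo_* names are
hypothetical, and both the "value must be exactly 1" test and the
precedence when both variables are set are assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum demo_mode {
	DEMO_MODE_IB,		/* default, no variable set */
	DEMO_MODE_IB_RDMA_CM,	/* MV2_USE_RDMA_CM=1 */
	DEMO_MODE_IWARP		/* MV2_ENABLE_IWARP_MODE=1 */
};

/* A variable counts as enabled only if it is set to exactly "1". */
static int demo_is_one(const char *value)
{
	return value != NULL && strcmp(value, "1") == 0;
}

/*
 * Pure helper so the decision can be checked without touching the
 * environment.  Assumption: iWARP wins if both variables are set.
 */
static enum demo_mode demo_mode_from_values(const char *use_rdma_cm,
					    const char *enable_iwarp)
{
	if (demo_is_one(enable_iwarp))
		return DEMO_MODE_IWARP;
	if (demo_is_one(use_rdma_cm))
		return DEMO_MODE_IB_RDMA_CM;
	return DEMO_MODE_IB;
}

/* Read the two runtime variables named above from the environment. */
static enum demo_mode demo_select_mode(void)
{
	return demo_mode_from_values(getenv("MV2_USE_RDMA_CM"),
				     getenv("MV2_ENABLE_IWARP_MODE"));
}
```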

-- 
Shaun Rowland	rowland at cse.ohio-state.edu
http://www.cse.ohio-state.edu/~rowland/
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ofed_1_2_scripts.patch
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070210/29424cc1/attachment.ksh>

From swise at opengridcomputing.com  Sat Feb 10 12:36:03 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Sat, 10 Feb 2007 14:36:03 -0600
Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in
 cm_conn_req_handler()
In-Reply-To: <1171135423.11017.61.camel@stevo-desktop>
References: <OFB574507E.6CAEE34E-ON6525727D.0013194D-6525727D.00132A8C@in.ibm.com>
	<ada7iusm021.fsf@cisco.com> <1171035668.26453.11.camel@trinity.ogc.int>
	<1171135423.11017.61.camel@stevo-desktop>
Message-ID: <1171139763.11017.68.camel@stevo-desktop>

ugh. 

There is at least one bug in this patch.  I cannot call iw_cm_reject()
inside destroy_cm_id() because both functions grab the iw_cm lock...
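
For anyone following along, the problem is the usual one of re-taking a
non-recursive lock.  Below is a small userspace model; the demo_* names
are hypothetical and the lock returns an error where a real spinlock
would hang.  The split into a "caller holds the lock" helper is one
conventional way out, shown only as an assumption and not necessarily
the fix that will be used in iwcm.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: a second acquire fails instead of spinning. */
struct demo_lock {
	bool held;
};

static bool demo_lock_acquire(struct demo_lock *l)
{
	if (l->held)
		return false;	/* a real spinlock would deadlock here */
	l->held = true;
	return true;
}

static void demo_lock_release(struct demo_lock *l)
{
	l->held = false;
}

struct demo_conn {
	struct demo_lock lock;
	int rejects_sent;
};

/* Does the work; the caller must already hold c->lock. */
static void demo_reject_locked(struct demo_conn *c)
{
	c->rejects_sent++;
}

/* Public entry point that takes the lock itself, like iw_cm_reject(). */
static int demo_reject(struct demo_conn *c)
{
	if (!demo_lock_acquire(&c->lock))
		return -1;
	demo_reject_locked(c);
	demo_lock_release(&c->lock);
	return 0;
}

/* The problematic shape: calls the locking entry point with the lock held. */
static int demo_destroy_nested(struct demo_conn *c)
{
	int ret;

	if (!demo_lock_acquire(&c->lock))
		return -1;
	ret = demo_reject(c);
	demo_lock_release(&c->lock);
	return ret;
}

/* One possible shape: call the helper that expects the lock to be held. */
static int demo_destroy_split(struct demo_conn *c)
{
	if (!demo_lock_acquire(&c->lock))
		return -1;
	demo_reject_locked(c);
	demo_lock_release(&c->lock);
	return 0;
}
```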


On Sat, 2007-02-10 at 13:23 -0600, Steve Wise wrote:
> Here is a patch that Tom and I think fixes the race condition Roland
> discovered, plus cleans up the issues Krishna attempted to fix in his
> first patch.  I'm testing it now with a series of rping tests looping
> with random sizes and counts and it seems to work, but the patch needs
> more testing and review.  
> 
> Krishna, can you review this carefully and also test it and let us know
> if you think it's good?  The patch is against for-2.6.21 from Roland's
> tree.
> 
> Roland, can you review this too and verify that it fixes the race
> condition?
> 
> 
> Krishna: here are comments to your original patch description:
> 
> 
> cm_conn_req_handler() :
> >         1. Calling destroy_cm_id leaks 3 work 'free' list entries.
> 
> I don't think your original patch fixed all places this memory was
> leaked.
> 
> This has been addressed in the patch below by creating a function
> free_cm_id() that frees the list entries -and- frees the cm_id memory.
> It is then called from the 3 places where the cm_id can be freed.
> 
> >         2. cm_id is freed up wrongly and not cm_id_priv (though the
> >            effect is the same since cm_id is the first element of
> >            cm_id_priv, but still a bug if the top level cm_id
> > changes).
> > 
> 
> This is addressed in the patch below since we have a prototyped function
> for freeing the cm_id_priv.
> 
> >         3. Reject message has to be sent on failure. Tested this
> >            without the fix and found the client hangs, waited for
> > about
> >            20 mins and then did Ctrl-C but the process is unkillable.
> > 
> 
> The call to iw_cm_reject() is now in destroy_cm_id() and is called based
> on the cm_id state.  So whenever a connection is destroyed, if it needs
> a rejection sent, it will be sent.
> 
> >         4. Setting IWCM_F_CALLBACK_DESTROY on cm_id (child handle)
> >            doesn't achieve anything, since checking for
> >            IWCM_F_CALLBACK_DESTROY in the parent's flag (in
> >            cm_work_handler) means that this will never be true.
> > 
> 
> It does achieve something.  I think you are failing to consider the fact
> that the iWARP provider can have a reference on the cm_id even in the
> case where the callback function returns an error thus giving
> destruction ownership to the IWCM.  Perhaps the ammasso provider never
> does this, but cxgb3 can.  And we haven't put any restrictions on
> exactly when the provider _must_ release its reference.  If the provider
> _does_ have a reference at this point in the code, then the cm_id will
> not be freed, and must be freed when the refcnt finally reaches zero
> when the provider removes its reference.  
> 
> I wish to clarify this for everyone (and we need this text in
> Documentation/infiniband/iwcm.txt IMO):
> 
> This design is based on the RDMA_CM and IB_CM behavior.  If the app
> issues the destroy via rdma_destroy_cm_id, then we block that thread
> until all references are gone.  If the app returns non-zero in a
> callback for a given cm_id, then the CM owns destroying the cm_id and
> the application is done with it. That's the short of it.  Here's the
> long of it:
> 
> There are 2 paths for freeing iw_cm memory.
> 
> 1) the application issues a rdma_destroy_cm_id() which calls
> iw_destroy_cm_id().  In this case (and this case only), the thread is
> blocked until the refcnt reaches 0, then the thread continues and frees
> the memory. 
> 
> 2) the application returns non zero from a callback function.  In this
> case, the IWCM is responsible to destroy the cm_id.  However, the IWCM
> _cannot_ block in its event handler thread because this can cause a
> deadlock.  A deadlock can occur if the provider has a reference to the
> cm_id and needs to post some event before removing the reference.  If
> the IWCM were to block awaiting the refcnt to go to zero, it would
> deadlock with the provider trying to post the last event before derefing
> the cm_id.  So the IWCM_F_CALLBACK_DESTROY bit is used to indicate that
> the IWCM owns destroying this.  If, in cm_work_handler(), the refcnt
> goes to zero -and- the DESTROY bit is set, then the cm_id can be freed.
> If the refcnt doesn't go to zero in that function, then either the
> provider still has a reference, or subsequent queued work items have
> additional references.  In either case, the cm_id is not freed and
> cm_work_handler() keeps chunking through the events and processing them.
> Since the cm_id is marked DESTROYING, the events get dropped and the
> references released on the cm_id.  Eventually the cm_id will get freed
> either in cm_work_handler() -or- in rem_ref() called by the provider if
> the provider has the last reference.
> 
> So based on the above design, there are 3 places in the code where the
> cm_id can be freed:
> 
> A) in case 1 above the memory will always be freed in iw_destroy_cm_id()
> after the thread is awakened with a refcnt of zero.
> 
> B) in case 2 above if the last reference is due to a queued work event
> for the IWCM.  In this case the memory is freed in cm_work_handler().
> 
> C) in case 2 above if the provider has the last reference, then the
> cm_id is freed in rem_ref().
> 
> 
> I hope this clarifies things.
> 
> Here's the proposed patch:
> 
> iw_cm_id destruction race condition fixes.
> 
> From: Steve Wise <swise at opengridcomputing.com>
> 
> Several changes:
> 
> - iwcm_deref_id() always calls complete() and returns 1 when the last
>   reference is dropped.
> 
> - move iw_cm_reject() into destroy_cm_id() to reduce code replication.
> 
> - clean up race condition in cm_work_handler().
> 
> - create static void free_cm_id() which deallocs the work entries and then
>   kfrees the cm_id memory.  This reduces code replication.
> 
> Signed-off-by: Steve Wise <swise at opengridcomputing.com>
> Signed-off-by: Tom Tucker <tom at opengridcomputing.com>
> ---
> 
>  drivers/infiniband/core/iwcm.c |   48 +++++++++++++++++++++-------------------
>  1 files changed, 25 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c
> index 1039ad5..403daed 100644
> --- a/drivers/infiniband/core/iwcm.c
> +++ b/drivers/infiniband/core/iwcm.c
> @@ -146,6 +146,12 @@ static int copy_private_data(struct iw_c
>  	return 0;
>  }
>  
> +static void free_cm_id(struct iwcm_id_private *cm_id_priv)
> +{
> +	dealloc_work_entries(cm_id_priv);
> +	kfree(cm_id_priv);
> +}
> +
>  /*
>   * Release a reference on cm_id. If the last reference is being
>   * released, enable the waiting thread (in iw_destroy_cm_id) to
> @@ -153,21 +159,14 @@ static int copy_private_data(struct iw_c
>   */
>  static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv)
>  {
> -	int ret = 0;
> -
>  	BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
>  	if (atomic_dec_and_test(&cm_id_priv->refcount)) {
>  		BUG_ON(!list_empty(&cm_id_priv->work_list));
> -		if (waitqueue_active(&cm_id_priv->destroy_comp.wait)) {
> -			BUG_ON(cm_id_priv->state != IW_CM_STATE_DESTROYING);
> -			BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY,
> -					&cm_id_priv->flags));
> -			ret = 1;
> -		}
>  		complete(&cm_id_priv->destroy_comp);
> +		return 1;
>  	}
>  
> -	return ret;
> +	return 0;
>  }
>  
>  static void add_ref(struct iw_cm_id *cm_id)
> @@ -181,7 +180,11 @@ static void rem_ref(struct iw_cm_id *cm_
>  {
>  	struct iwcm_id_private *cm_id_priv;
>  	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
> -	iwcm_deref_id(cm_id_priv);
> +	if (iwcm_deref_id(cm_id_priv) &&
> +	    test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) {
> +		BUG_ON(!list_empty(&cm_id_priv->work_list));
> +		free_cm_id(cm_id_priv);
> +	}
>  }
>  
>  static int cm_event_handler(struct iw_cm_id *cm_id, struct iw_cm_event *event);
> @@ -355,8 +358,11 @@ static void destroy_cm_id(struct iw_cm_i
>  	case IW_CM_STATE_CONN_RECV:
>  		/*
>  		 * App called destroy before/without calling accept after
> -		 * receiving connection request event notification.
> +		 * receiving connection request event notification or
> +		 * returned non zero from the event callback function.
> +		 * In either case, must tell the provider to reject.
>  		 */
> +		iw_cm_reject(cm_id, NULL, 0);
>  		cm_id_priv->state = IW_CM_STATE_DESTROYING;
>  		break;
>  	case IW_CM_STATE_CONN_SENT:
> @@ -391,9 +397,7 @@ void iw_destroy_cm_id(struct iw_cm_id *c
>  
>  	wait_for_completion(&cm_id_priv->destroy_comp);
>  
> -	dealloc_work_entries(cm_id_priv);
> -
> -	kfree(cm_id_priv);
> +	free_cm_id(cm_id_priv);
>  }
>  EXPORT_SYMBOL(iw_destroy_cm_id);
>  
> @@ -639,7 +643,6 @@ static void cm_conn_req_handler(struct i
>  
>  	ret = alloc_work_entries(cm_id_priv, 3);
>  	if (ret) {
> -		iw_cm_reject(cm_id, NULL, 0);
>  		iw_destroy_cm_id(cm_id);
>  		goto out;
>  	}
> @@ -650,7 +653,7 @@ static void cm_conn_req_handler(struct i
>  		set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags);
>  		destroy_cm_id(cm_id);
>  		if (atomic_read(&cm_id_priv->refcount)==0)
> -			kfree(cm_id);
> +			free_cm_id(cm_id_priv);
>  	}
>  
>  out:
> @@ -854,13 +857,12 @@ static void cm_work_handler(struct work_
>  			destroy_cm_id(&cm_id_priv->id);
>  		}
>  		BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
> -		if (iwcm_deref_id(cm_id_priv))
> -			return;
> -
> -		if (atomic_read(&cm_id_priv->refcount)==0 &&
> -		    test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) {
> -			dealloc_work_entries(cm_id_priv);
> -			kfree(cm_id_priv);
> +		if (iwcm_deref_id(cm_id_priv)) {
> +			if (test_bit(IWCM_F_CALLBACK_DESTROY,
> +				     &cm_id_priv->flags)) {
> +				BUG_ON(!list_empty(&cm_id_priv->work_list));
> +				free_cm_id(cm_id_priv);
> +			}
>  			return;
>  		}
>  		spin_lock_irqsave(&cm_id_priv->lock, flags);
> 


From mst at mellanox.co.il  Sat Feb 10 13:10:21 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sat, 10 Feb 2007 23:10:21 +0200
Subject: [openib-general] [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes
In-Reply-To: <adahctvl4zx.fsf@cisco.com>
References: <adahctvl4zx.fsf@cisco.com>
Message-ID: <20070210211021.GA14903@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH 0/5] iw_cxgb3 - misc cleanup and fixes
> 
>     Michael> What about the mthca memory registration patches?  I
>     Michael> thought they are on their way. Should I repost?
> 
> Sorry, I forgot about that.  Yes, please resend the latest state.

OK, coming up.

-- 
MST


From mst at mellanox.co.il  Sat Feb 10 13:13:12 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sat, 10 Feb 2007 23:13:12 +0200
Subject: [openib-general] [PATCH 1 of 4] IB/mthca: merge MR and FMR space on
	64 bit
Message-ID: <20070210211312.GB14903@mellanox.co.il>

For Tavor, we currently reserve separate MPT and MTT space for
FMRs to avoid abusing the vmalloc space on 32 bit kernels. No
such problem exists on 64 bit kernels so let's not do it there.

This way we have a shared pool for MR and FMR resources, used on
demand.  This will also make it possible to write MTTs for
regular regions directly from driver.

Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

---

Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_mr.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_mr.c
+++ linux-2.6/drivers/infiniband/hw/mthca/mthca_mr.c
@@ -761,7 +761,7 @@ void mthca_arbel_fmr_unmap(struct mthca_
 int mthca_init_mr_table(struct mthca_dev *dev)
 {
 	unsigned long addr;
-	int err, i;
+	int mpts, mtts, err, i;
 
 	err = mthca_alloc_init(&dev->mr_table.mpt_alloc,
 			       dev->limits.num_mpts,
@@ -795,13 +795,21 @@ int mthca_init_mr_table(struct mthca_dev
 			err = -EINVAL;
 			goto err_fmr_mpt;
 		}
+		mpts = mtts = 1 << i;
+	} else {
+		mpts = dev->limits.num_mpts;
+		mtts = dev->limits.num_mtt_segs;
+	}
+
+	if (!mthca_is_memfree(dev) &&
+	    (dev->mthca_flags & MTHCA_FLAG_FMR)) {
 
 		addr = pci_resource_start(dev->pdev, 4) +
 			((pci_resource_len(dev->pdev, 4) - 1) &
 			 dev->mr_table.mpt_base);
 
 		dev->mr_table.tavor_fmr.mpt_base =
-			ioremap(addr, (1 << i) * sizeof(struct mthca_mpt_entry));
+			ioremap(addr, mpts * sizeof(struct mthca_mpt_entry));
 
 		if (!dev->mr_table.tavor_fmr.mpt_base) {
 			mthca_warn(dev, "MPT ioremap for FMR failed.\n");
@@ -814,19 +822,21 @@ int mthca_init_mr_table(struct mthca_dev
 			 dev->mr_table.mtt_base);
 
 		dev->mr_table.tavor_fmr.mtt_base =
-			ioremap(addr, (1 << i) * MTHCA_MTT_SEG_SIZE);
+			ioremap(addr, mtts * MTHCA_MTT_SEG_SIZE);
 		if (!dev->mr_table.tavor_fmr.mtt_base) {
 			mthca_warn(dev, "MTT ioremap for FMR failed.\n");
 			err = -ENOMEM;
 			goto err_fmr_mtt;
 		}
+	}
 
-		err = mthca_buddy_init(&dev->mr_table.tavor_fmr.mtt_buddy, i);
+	if (dev->limits.fmr_reserved_mtts) {
+		err = mthca_buddy_init(&dev->mr_table.tavor_fmr.mtt_buddy, fls(mtts - 1));
 		if (err)
 			goto err_fmr_mtt_buddy;
 
 		/* Prevent regular MRs from using FMR keys */
-		err = mthca_buddy_alloc(&dev->mr_table.mtt_buddy, i);
+		err = mthca_buddy_alloc(&dev->mr_table.mtt_buddy, fls(mtts - 1));
 		if (err)
 			goto err_reserve_fmr;
 
Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_profile.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_profile.c
+++ linux-2.6/drivers/infiniband/hw/mthca/mthca_profile.c
@@ -277,7 +277,7 @@ u64 mthca_make_profile(struct mthca_dev 
 	 * out of the MR pool. They don't use additional memory, but
 	 * we assign them as part of the HCA profile anyway.
 	 */
-	if (mthca_is_memfree(dev))
+	if (mthca_is_memfree(dev) || BITS_PER_LONG == 64)
 		dev->limits.fmr_reserved_mtts = 0;
 	else
 		dev->limits.fmr_reserved_mtts = request->fmr_reserved_mtts;
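
A note on the fls(mtts - 1) expression used for the buddy order in the
mthca_mr.c hunk: it rounds the count up to the next power of two, and
when mtts is 1 << i it gives back i.  The sketch below is a userspace
stand-in (demo_fls mimics the kernel's fls(), and the demo_* names are
hypothetical) to make the arithmetic easy to check.

```c
#include <assert.h>

/*
 * Userspace stand-in for the kernel's fls(): 1-based index of the highest
 * set bit, or 0 when x is 0.
 */
static int demo_fls(unsigned int x)
{
	int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/* Smallest order such that (1 << order) >= count, for count >= 1. */
static int demo_order_for(unsigned int count)
{
	return demo_fls(count - 1);
}
```

For example, a count of 1024 gives order 10, while 1025 gives order 11.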

-- 
MST


From mst at mellanox.co.il  Sat Feb 10 13:14:25 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sat, 10 Feb 2007 23:14:25 +0200
Subject: [openib-general] [PATCH 2 of 4] IB/mthca: always fill MTTs from CPU
Message-ID: <20070210211425.GC14903@mellanox.co.il>

Speed up memory registration by filling in MTTs directly.  This
reduces the number of FW commands needed to register an MR by at
least a factor of 2.  This applies to all memfree cards, and to
tavor mode on 64 bit systems with the patch I posted earlier.

Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

---

Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_dev.h
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_dev.h
+++ linux-2.6/drivers/infiniband/hw/mthca/mthca_dev.h
@@ -464,6 +464,8 @@ void mthca_uar_free(struct mthca_dev *de
 int mthca_pd_alloc(struct mthca_dev *dev, int privileged, struct mthca_pd *pd);
 void mthca_pd_free(struct mthca_dev *dev, struct mthca_pd *pd);
 
+int mthca_write_mtt_size(struct mthca_dev *dev);
+
 struct mthca_mtt *mthca_alloc_mtt(struct mthca_dev *dev, int size);
 void mthca_free_mtt(struct mthca_dev *dev, struct mthca_mtt *mtt);
 int mthca_write_mtt(struct mthca_dev *dev, struct mthca_mtt *mtt,
Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_mr.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_mr.c
+++ linux-2.6/drivers/infiniband/hw/mthca/mthca_mr.c
@@ -243,8 +243,8 @@ void mthca_free_mtt(struct mthca_dev *de
 	kfree(mtt);
 }
 
-int mthca_write_mtt(struct mthca_dev *dev, struct mthca_mtt *mtt,
-		    int start_index, u64 *buffer_list, int list_len)
+static int __mthca_write_mtt(struct mthca_dev *dev, struct mthca_mtt *mtt,
+			     int start_index, u64 *buffer_list, int list_len)
 {
 	struct mthca_mailbox *mailbox;
 	__be64 *mtt_entry;
@@ -295,6 +295,84 @@ out:
 	return err;
 }
 
+void mthca_tavor_write_mtt_seg(struct mthca_dev *dev, struct mthca_mtt *mtt,
+			      int start_index, u64 *buffer_list, int list_len)
+{
+	u64 __iomem *mtts;
+	u32 mtt_seg;
+	int i;
+
+	mtt_seg = mtt->first_seg * MTHCA_MTT_SEG_SIZE;
+	mtts = dev->mr_table.tavor_fmr.mtt_base + mtt_seg + start_index * sizeof (u64);
+	for (i = 0; i < list_len; ++i) {
+		__be64 mtt_entry = cpu_to_be64(buffer_list[i] |
+					       MTHCA_MTT_FLAG_PRESENT);
+		mthca_write64_raw(mtt_entry, mtts + i);
+	}
+}
+
+void mthca_arbel_write_mtt_seg(struct mthca_dev *dev, struct mthca_mtt *mtt,
+			      int start_index, u64 *buffer_list, int list_len)
+{
+	__be64 *mtts;
+	int i;
+	int s = start_index * sizeof (u64);
+
+	/* For Arbel, all MTTs must fit in the same page. */
+	BUG_ON(s / PAGE_SIZE != (s + list_len * sizeof(u64) - 1) / PAGE_SIZE);
+	/* Require full segments */
+	BUG_ON(s % MTHCA_MTT_SEG_SIZE);
+
+	mtts = mthca_table_find(dev->mr_table.mtt_table, mtt->first_seg +
+				s / MTHCA_MTT_SEG_SIZE);
+
+	BUG_ON(!mtts);
+
+	for (i = 0; i < list_len; ++i)
+		mtts[i] = cpu_to_be64(buffer_list[i] | MTHCA_MTT_FLAG_PRESENT);
+}
+
+int mthca_write_mtt_size(struct mthca_dev *dev)
+{
+	if (dev->mr_table.fmr_mtt_buddy != &dev->mr_table.mtt_buddy)
+		/*
+		 * Be friendly to WRITE_MTT command
+		 * and leave two empty slots for the
+		 * index and reserved fields of the
+		 * mailbox.
+		 */
+		return PAGE_SIZE / sizeof (u64) - 2;
+
+	/* For Arbel, all MTTs must fit in the same page. */
+	return mthca_is_memfree(dev) ? (PAGE_SIZE / sizeof (u64)) : 0x7ffffff;
+}
+
+int mthca_write_mtt(struct mthca_dev *dev, struct mthca_mtt *mtt,
+		    int start_index, u64 *buffer_list, int list_len)
+{
+	int size = mthca_write_mtt_size(dev);
+	int chunk;
+
+	if (dev->mr_table.fmr_mtt_buddy != &dev->mr_table.mtt_buddy)
+		return __mthca_write_mtt(dev, mtt, start_index, buffer_list, list_len);
+
+	while (list_len > 0) {
+		chunk = min(size, list_len);
+		if (mthca_is_memfree(dev))
+			mthca_arbel_write_mtt_seg(dev, mtt, start_index,
+						  buffer_list, chunk);
+		else
+			mthca_tavor_write_mtt_seg(dev, mtt, start_index,
+						  buffer_list, chunk);
+
+		list_len    -= chunk;
+		start_index += chunk;
+		buffer_list += chunk;
+	}
+
+	return 0;
+}
+
 static inline u32 tavor_hw_index_to_key(u32 ind)
 {
 	return ind;
Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_provider.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_provider.c
+++ linux-2.6/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -1015,6 +1015,7 @@ static struct ib_mr *mthca_reg_user_mr(s
 	int shift, n, len;
 	int i, j, k;
 	int err = 0;
+	int write_mtt_size;
 
 	shift = ffs(region->page_size) - 1;
 
@@ -1040,6 +1041,8 @@ static struct ib_mr *mthca_reg_user_mr(s
 
 	i = n = 0;
 
+	write_mtt_size = min(mthca_write_mtt_size(dev), (int)(PAGE_SIZE / sizeof *pages));
+
 	list_for_each_entry(chunk, &region->chunk_list, list)
 		for (j = 0; j < chunk->nmap; ++j) {
 			len = sg_dma_len(&chunk->page_list[j]) >> shift;
@@ -1047,14 +1050,11 @@ static struct ib_mr *mthca_reg_user_mr(s
 				pages[i++] = sg_dma_address(&chunk->page_list[j]) +
 					region->page_size * k;
 				/*
-				 * Be friendly to WRITE_MTT command
-				 * and leave two empty slots for the
-				 * index and reserved fields of the
-				 * mailbox.
+				 * Be friendly to write_mtt and pass it chunks
+				 * of appropriate size.
 				 */
-				if (i == PAGE_SIZE / sizeof (u64) - 2) {
-					err = mthca_write_mtt(dev, mr->mtt,
-							      n, pages, i);
+				if (i == write_mtt_size) {
+					err = mthca_write_mtt(dev, mr->mtt, n, pages, i);
 					if (err)
 						goto mtt_done;
 					n += i;
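
The chunking done by the new mthca_write_mtt() wrapper can be modelled in
userspace as follows.  The demo_* names, the callback, and the 4096-byte
page size are assumptions made for illustration; the real code writes to
device or ICM memory instead of calling a callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096	/* assumption: 4 KB pages */

/*
 * Entries that fit in one WRITE_MTT mailbox: a page of u64 slots minus the
 * two slots used for the index and reserved fields.
 */
static int demo_mailbox_chunk(void)
{
	return (int)(DEMO_PAGE_SIZE / sizeof(uint64_t)) - 2;
}

/* Hypothetical sink for one chunk of entries. */
typedef void (*demo_emit_fn)(int start_index, const uint64_t *buf, int len,
			     void *ctx);

/*
 * Split list_len entries into pieces of at most size entries (size must be
 * positive), advancing the index and buffer after each piece.  Returns the
 * number of pieces emitted.
 */
static int demo_write_chunked(int start_index, const uint64_t *buf,
			      int list_len, int size, demo_emit_fn emit,
			      void *ctx)
{
	int chunks = 0;

	while (list_len > 0) {
		int chunk = list_len < size ? list_len : size;

		emit(start_index, buf, chunk, ctx);
		list_len -= chunk;
		start_index += chunk;
		buf += chunk;
		chunks++;
	}
	return chunks;
}

/* Example sink: records the total entry count and the largest chunk seen. */
struct demo_stats {
	int total;
	int largest;
};

static void demo_count(int start_index, const uint64_t *buf, int len,
		       void *ctx)
{
	struct demo_stats *s = ctx;

	(void)start_index;
	(void)buf;
	s->total += len;
	if (len > s->largest)
		s->largest = len;
}
```

With 4 KB pages the mailbox limit works out to 510 entries, and 1200
entries written with a limit of 512 are split into three pieces.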

-- 
MST


From mst at mellanox.co.il  Sat Feb 10 13:15:08 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sat, 10 Feb 2007 23:15:08 +0200
Subject: [openib-general] [PATCH 3 of 4] IB/mthca: fix non-cache-coherent
 CPUs with memfree
Message-ID: <20070210211508.GD14903@mellanox.co.il>

Fix non-cache-coherent CPUs with memfree HCAs.

We allocate the MTT table with alloc_pages() and then do
pci_map_sg(), so we must call pci_dma_sync_sg after the CPU
writes to the MTT table (this works since device never writes the
MTTs on memfree).

For MPTs, both the device and CPU might write there, so we must
allocate dma coherent memory for these.

Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

---

Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_memfree.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_memfree.c
+++ linux-2.6/drivers/infiniband/hw/mthca/mthca_memfree.c
@@ -35,6 +35,8 @@
  */
 
 #include <linux/mm.h>
+#include <linux/scatterlist.h>
+#include <asm/page.h>
 
 #include "mthca_memfree.h"
 #include "mthca_dev.h"
@@ -58,22 +60,31 @@ struct mthca_user_db_table {
 	}                page[0];
 };
 
-void mthca_free_icm(struct mthca_dev *dev, struct mthca_icm *icm)
+void mthca_free_icm(struct mthca_dev *dev, struct mthca_icm *icm, int coherent)
 {
 	struct mthca_icm_chunk *chunk, *tmp;
+	void *buf;
 	int i;
 
 	if (!icm)
 		return;
 
 	list_for_each_entry_safe(chunk, tmp, &icm->chunk_list, list) {
-		if (chunk->nsg > 0)
-			pci_unmap_sg(dev->pdev, chunk->mem, chunk->npages,
-				     PCI_DMA_BIDIRECTIONAL);
-
-		for (i = 0; i < chunk->npages; ++i)
-			__free_pages(chunk->mem[i].page,
-				     get_order(chunk->mem[i].length));
+		if (coherent)
+			for (i = 0; i < chunk->npages; ++i) {
+				buf = lowmem_page_address(chunk->mem[i].page);
+				dma_free_coherent(&dev->pdev->dev, chunk->mem[i].length,
+						  buf, sg_dma_address(&chunk->mem[i]));
+			}
+		else {
+			if (chunk->nsg > 0)
+				pci_unmap_sg(dev->pdev, chunk->mem, chunk->npages,
+					     PCI_DMA_BIDIRECTIONAL);
+
+			for (i = 0; i < chunk->npages; ++i)
+				__free_pages(chunk->mem[i].page,
+					     get_order(chunk->mem[i].length));
+		}
 
 		kfree(chunk);
 	}
@@ -81,12 +92,41 @@ void mthca_free_icm(struct mthca_dev *de
 	kfree(icm);
 }
 
+static int mthca_alloc_icm_pages(struct scatterlist *mem, int order, gfp_t gfp_mask)
+{
+	mem->page = alloc_pages(gfp_mask, order);
+	if (!mem->page)
+		return -ENOMEM;
+
+	mem->length = PAGE_SIZE << order;
+	mem->offset = 0;
+	return 0;
+}
+
+static int mthca_alloc_icm_coherent(struct device *dev, struct scatterlist *mem,
+				    int order, gfp_t gfp_mask)
+{
+	void *buf = dma_alloc_coherent(dev, PAGE_SIZE << order, &sg_dma_address(mem),
+				       gfp_mask);
+	if (!buf)
+		return -ENOMEM;
+
+	sg_set_buf(mem, buf, PAGE_SIZE << order);
+	BUG_ON(mem->offset);
+	sg_dma_len(mem) = PAGE_SIZE << order;
+	return 0;
+}
+
 struct mthca_icm *mthca_alloc_icm(struct mthca_dev *dev, int npages,
-				  gfp_t gfp_mask)
+				  gfp_t gfp_mask, int coherent)
 {
 	struct mthca_icm *icm;
 	struct mthca_icm_chunk *chunk = NULL;
 	int cur_order;
+	int ret;
+
+	/* We use sg_set_buf for coherent allocs, which assumes low memory */
+	BUG_ON(coherent && (gfp_mask & __GFP_HIGHMEM));
 
 	icm = kmalloc(sizeof *icm, gfp_mask & ~(__GFP_HIGHMEM | __GFP_NOWARN));
 	if (!icm)
@@ -112,21 +152,28 @@ struct mthca_icm *mthca_alloc_icm(struct
 		while (1 << cur_order > npages)
 			--cur_order;
 
-		chunk->mem[chunk->npages].page = alloc_pages(gfp_mask, cur_order);
-		if (chunk->mem[chunk->npages].page) {
-			chunk->mem[chunk->npages].length = PAGE_SIZE << cur_order;
-			chunk->mem[chunk->npages].offset = 0;
+		if (coherent)
+			ret = mthca_alloc_icm_coherent(&dev->pdev->dev,
+						       &chunk->mem[chunk->npages],
+						       cur_order, gfp_mask);
+		else
+		       	ret = mthca_alloc_icm_pages(&chunk->mem[chunk->npages],
+						    cur_order, gfp_mask);
 
-			if (++chunk->npages == MTHCA_ICM_CHUNK_LEN) {
+		if (!ret) {
+			++chunk->npages;
+
+			if (!coherent && chunk->npages == MTHCA_ICM_CHUNK_LEN) {
 				chunk->nsg = pci_map_sg(dev->pdev, chunk->mem,
 							chunk->npages,
 							PCI_DMA_BIDIRECTIONAL);
 
 				if (chunk->nsg <= 0)
 					goto fail;
+			}
 
+			if (chunk->npages == MTHCA_ICM_CHUNK_LEN)
 				chunk = NULL;
-			}
 
 			npages -= 1 << cur_order;
 		} else {
@@ -136,7 +183,7 @@ struct mthca_icm *mthca_alloc_icm(struct
 		}
 	}
 
-	if (chunk) {
+	if (!coherent && chunk) {
 		chunk->nsg = pci_map_sg(dev->pdev, chunk->mem,
 					chunk->npages,
 					PCI_DMA_BIDIRECTIONAL);
@@ -148,7 +195,7 @@ struct mthca_icm *mthca_alloc_icm(struct
 	return icm;
 
 fail:
-	mthca_free_icm(dev, icm);
+	mthca_free_icm(dev, icm, coherent);
 	return NULL;
 }
 
@@ -167,7 +214,7 @@ int mthca_table_get(struct mthca_dev *de
 
 	table->icm[i] = mthca_alloc_icm(dev, MTHCA_TABLE_CHUNK_SIZE >> PAGE_SHIFT,
 					(table->lowmem ? GFP_KERNEL : GFP_HIGHUSER) |
-					__GFP_NOWARN);
+					__GFP_NOWARN, table->coherent);
 	if (!table->icm[i]) {
 		ret = -ENOMEM;
 		goto out;
@@ -175,7 +222,7 @@ int mthca_table_get(struct mthca_dev *de
 
 	if (mthca_MAP_ICM(dev, table->icm[i], table->virt + i * MTHCA_TABLE_CHUNK_SIZE,
 			  &status) || status) {
-		mthca_free_icm(dev, table->icm[i]);
+		mthca_free_icm(dev, table->icm[i], table->coherent);
 		table->icm[i] = NULL;
 		ret = -ENOMEM;
 		goto out;
@@ -204,16 +251,16 @@ void mthca_table_put(struct mthca_dev *d
 		mthca_UNMAP_ICM(dev, table->virt + i * MTHCA_TABLE_CHUNK_SIZE,
 				MTHCA_TABLE_CHUNK_SIZE / MTHCA_ICM_PAGE_SIZE,
 				&status);
-		mthca_free_icm(dev, table->icm[i]);
+		mthca_free_icm(dev, table->icm[i], table->coherent);
 		table->icm[i] = NULL;
 	}
 
 	mutex_unlock(&table->mutex);
 }
 
-void *mthca_table_find(struct mthca_icm_table *table, int obj)
+void *mthca_table_find(struct mthca_icm_table *table, int obj, dma_addr_t *dma_handle)
 {
-	int idx, offset, i;
+	int idx, offset, dma_offset, i;
 	struct mthca_icm_chunk *chunk;
 	struct mthca_icm *icm;
 	struct page *page = NULL;
@@ -225,13 +272,22 @@ void *mthca_table_find(struct mthca_icm_
 
 	idx = (obj & (table->num_obj - 1)) * table->obj_size;
 	icm = table->icm[idx / MTHCA_TABLE_CHUNK_SIZE];
-	offset = idx % MTHCA_TABLE_CHUNK_SIZE;
+	dma_offset = offset = idx % MTHCA_TABLE_CHUNK_SIZE;
 
 	if (!icm)
 		goto out;
 
 	list_for_each_entry(chunk, &icm->chunk_list, list) {
 		for (i = 0; i < chunk->npages; ++i) {
+			if (dma_handle && dma_offset >= 0) {
+				if (sg_dma_len(&chunk->mem[i]) > dma_offset)
+					*dma_handle = sg_dma_address(&chunk->mem[i]) +
+					       	dma_offset;
+				dma_offset -= sg_dma_len(&chunk->mem[i]);
+			}
+			/* DMA mapping can merge pages but not split them,
+			 * so if we found the page, dma_handle has already
+			 * been assigned to. */
 			if (chunk->mem[i].length > offset) {
 				page = chunk->mem[i].page;
 				goto out;
@@ -283,7 +339,7 @@ void mthca_table_put_range(struct mthca_
 struct mthca_icm_table *mthca_alloc_icm_table(struct mthca_dev *dev,
 					      u64 virt, int obj_size,
 					      int nobj, int reserved,
-					      int use_lowmem)
+					      int use_lowmem, int use_coherent)
 {
 	struct mthca_icm_table *table;
 	int num_icm;
@@ -302,6 +358,7 @@ struct mthca_icm_table *mthca_alloc_icm_
 	table->num_obj  = nobj;
 	table->obj_size = obj_size;
 	table->lowmem   = use_lowmem;
+	table->coherent = use_coherent;
 	mutex_init(&table->mutex);
 
 	for (i = 0; i < num_icm; ++i)
@@ -314,12 +371,12 @@ struct mthca_icm_table *mthca_alloc_icm_
 
 		table->icm[i] = mthca_alloc_icm(dev, chunk_size >> PAGE_SHIFT,
 						(use_lowmem ? GFP_KERNEL : GFP_HIGHUSER) |
-						__GFP_NOWARN);
+						__GFP_NOWARN, use_coherent);
 		if (!table->icm[i])
 			goto err;
 		if (mthca_MAP_ICM(dev, table->icm[i], virt + i * MTHCA_TABLE_CHUNK_SIZE,
 				  &status) || status) {
-			mthca_free_icm(dev, table->icm[i]);
+			mthca_free_icm(dev, table->icm[i], table->coherent);
 			table->icm[i] = NULL;
 			goto err;
 		}
@@ -339,7 +396,7 @@ err:
 			mthca_UNMAP_ICM(dev, virt + i * MTHCA_TABLE_CHUNK_SIZE,
 					MTHCA_TABLE_CHUNK_SIZE / MTHCA_ICM_PAGE_SIZE,
 				        &status);
-			mthca_free_icm(dev, table->icm[i]);
+			mthca_free_icm(dev, table->icm[i], table->coherent);
 		}
 
 	kfree(table);
@@ -357,7 +414,7 @@ void mthca_free_icm_table(struct mthca_d
 			mthca_UNMAP_ICM(dev, table->virt + i * MTHCA_TABLE_CHUNK_SIZE,
 					MTHCA_TABLE_CHUNK_SIZE / MTHCA_ICM_PAGE_SIZE,
 					&status);
-			mthca_free_icm(dev, table->icm[i]);
+			mthca_free_icm(dev, table->icm[i], table->coherent);
 		}
 
 	kfree(table);
Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_main.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_main.c
+++ linux-2.6/drivers/infiniband/hw/mthca/mthca_main.c
@@ -379,7 +379,7 @@ static int mthca_load_fw(struct mthca_de
 
 	mdev->fw.arbel.fw_icm =
 		mthca_alloc_icm(mdev, mdev->fw.arbel.fw_pages,
-				GFP_HIGHUSER | __GFP_NOWARN);
+				GFP_HIGHUSER | __GFP_NOWARN, 0);
 	if (!mdev->fw.arbel.fw_icm) {
 		mthca_err(mdev, "Couldn't allocate FW area, aborting.\n");
 		return -ENOMEM;
@@ -412,7 +412,7 @@ err_unmap_fa:
 	mthca_UNMAP_FA(mdev, &status);
 
 err_free:
-	mthca_free_icm(mdev, mdev->fw.arbel.fw_icm);
+	mthca_free_icm(mdev, mdev->fw.arbel.fw_icm, 0);
 	return err;
 }
 
@@ -441,7 +441,7 @@ static int mthca_init_icm(struct mthca_d
 		  (unsigned long long) aux_pages << 2);
 
 	mdev->fw.arbel.aux_icm = mthca_alloc_icm(mdev, aux_pages,
-						 GFP_HIGHUSER | __GFP_NOWARN);
+						 GFP_HIGHUSER | __GFP_NOWARN, 0);
 	if (!mdev->fw.arbel.aux_icm) {
 		mthca_err(mdev, "Couldn't allocate aux memory, aborting.\n");
 		return -ENOMEM;
@@ -467,7 +467,8 @@ static int mthca_init_icm(struct mthca_d
 	mdev->mr_table.mtt_table = mthca_alloc_icm_table(mdev, init_hca->mtt_base,
 							 MTHCA_MTT_SEG_SIZE,
 							 mdev->limits.num_mtt_segs,
-							 mdev->limits.reserved_mtts, 1);
+							 mdev->limits.reserved_mtts,
+							 1, 0);
 	if (!mdev->mr_table.mtt_table) {
 		mthca_err(mdev, "Failed to map MTT context memory, aborting.\n");
 		err = -ENOMEM;
@@ -477,7 +478,8 @@ static int mthca_init_icm(struct mthca_d
 	mdev->mr_table.mpt_table = mthca_alloc_icm_table(mdev, init_hca->mpt_base,
 							 dev_lim->mpt_entry_sz,
 							 mdev->limits.num_mpts,
-							 mdev->limits.reserved_mrws, 1);
+							 mdev->limits.reserved_mrws,
+							 1, 1);
 	if (!mdev->mr_table.mpt_table) {
 		mthca_err(mdev, "Failed to map MPT context memory, aborting.\n");
 		err = -ENOMEM;
@@ -487,7 +489,8 @@ static int mthca_init_icm(struct mthca_d
 	mdev->qp_table.qp_table = mthca_alloc_icm_table(mdev, init_hca->qpc_base,
 							dev_lim->qpc_entry_sz,
 							mdev->limits.num_qps,
-							mdev->limits.reserved_qps, 0);
+							mdev->limits.reserved_qps,
+						       	0, 0);
 	if (!mdev->qp_table.qp_table) {
 		mthca_err(mdev, "Failed to map QP context memory, aborting.\n");
 		err = -ENOMEM;
@@ -497,7 +500,8 @@ static int mthca_init_icm(struct mthca_d
 	mdev->qp_table.eqp_table = mthca_alloc_icm_table(mdev, init_hca->eqpc_base,
 							 dev_lim->eqpc_entry_sz,
 							 mdev->limits.num_qps,
-							 mdev->limits.reserved_qps, 0);
+							 mdev->limits.reserved_qps,
+							 0, 0);
 	if (!mdev->qp_table.eqp_table) {
 		mthca_err(mdev, "Failed to map EQP context memory, aborting.\n");
 		err = -ENOMEM;
@@ -507,7 +511,7 @@ static int mthca_init_icm(struct mthca_d
 	mdev->qp_table.rdb_table = mthca_alloc_icm_table(mdev, init_hca->rdb_base,
 							 MTHCA_RDB_ENTRY_SIZE,
 							 mdev->limits.num_qps <<
-							 mdev->qp_table.rdb_shift,
+							 mdev->qp_table.rdb_shift, 0,
 							 0, 0);
 	if (!mdev->qp_table.rdb_table) {
 		mthca_err(mdev, "Failed to map RDB context memory, aborting\n");
@@ -518,7 +522,8 @@ static int mthca_init_icm(struct mthca_d
        mdev->cq_table.table = mthca_alloc_icm_table(mdev, init_hca->cqc_base,
 						    dev_lim->cqc_entry_sz,
 						    mdev->limits.num_cqs,
-						    mdev->limits.reserved_cqs, 0);
+						    mdev->limits.reserved_cqs,
+						    0, 0);
 	if (!mdev->cq_table.table) {
 		mthca_err(mdev, "Failed to map CQ context memory, aborting.\n");
 		err = -ENOMEM;
@@ -530,7 +535,8 @@ static int mthca_init_icm(struct mthca_d
 			mthca_alloc_icm_table(mdev, init_hca->srqc_base,
 					      dev_lim->srq_entry_sz,
 					      mdev->limits.num_srqs,
-					      mdev->limits.reserved_srqs, 0);
+					      mdev->limits.reserved_srqs,
+					      0, 0);
 		if (!mdev->srq_table.table) {
 			mthca_err(mdev, "Failed to map SRQ context memory, "
 				  "aborting.\n");
@@ -550,7 +556,7 @@ static int mthca_init_icm(struct mthca_d
 						      mdev->limits.num_amgms,
 						      mdev->limits.num_mgms +
 						      mdev->limits.num_amgms,
-						      0);
+						      0, 0);
 	if (!mdev->mcg_table.table) {
 		mthca_err(mdev, "Failed to map MCG context memory, aborting.\n");
 		err = -ENOMEM;
@@ -588,7 +594,7 @@ err_unmap_aux:
 	mthca_UNMAP_ICM_AUX(mdev, &status);
 
 err_free_aux:
-	mthca_free_icm(mdev, mdev->fw.arbel.aux_icm);
+	mthca_free_icm(mdev, mdev->fw.arbel.aux_icm, 0);
 
 	return err;
 }
@@ -609,7 +615,7 @@ static void mthca_free_icms(struct mthca
 	mthca_unmap_eq_icm(mdev);
 
 	mthca_UNMAP_ICM_AUX(mdev, &status);
-	mthca_free_icm(mdev, mdev->fw.arbel.aux_icm);
+	mthca_free_icm(mdev, mdev->fw.arbel.aux_icm, 0);
 }
 
 static int mthca_init_arbel(struct mthca_dev *mdev)
@@ -693,7 +699,7 @@ err_free_icm:
 
 err_stop_fw:
 	mthca_UNMAP_FA(mdev, &status);
-	mthca_free_icm(mdev, mdev->fw.arbel.fw_icm);
+	mthca_free_icm(mdev, mdev->fw.arbel.fw_icm, 0);
 
 err_disable:
 	if (!(mdev->mthca_flags & MTHCA_FLAG_NO_LAM))
@@ -712,7 +718,7 @@ static void mthca_close_hca(struct mthca
 		mthca_free_icms(mdev);
 
 		mthca_UNMAP_FA(mdev, &status);
-		mthca_free_icm(mdev, mdev->fw.arbel.fw_icm);
+		mthca_free_icm(mdev, mdev->fw.arbel.fw_icm, 0);
 
 		if (!(mdev->mthca_flags & MTHCA_FLAG_NO_LAM))
 			mthca_DISABLE_LAM(mdev, &status);
Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_memfree.h
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_memfree.h
+++ linux-2.6/drivers/infiniband/hw/mthca/mthca_memfree.h
@@ -69,6 +69,7 @@ struct mthca_icm_table {
 	int               num_obj;
 	int               obj_size;
 	int               lowmem;
+	int               coherent;
 	struct mutex      mutex;
 	struct mthca_icm *icm[0];
 };
@@ -82,17 +83,17 @@ struct mthca_icm_iter {
 struct mthca_dev;
 
 struct mthca_icm *mthca_alloc_icm(struct mthca_dev *dev, int npages,
-				  gfp_t gfp_mask);
-void mthca_free_icm(struct mthca_dev *dev, struct mthca_icm *icm);
+				  gfp_t gfp_mask, int coherent);
+void mthca_free_icm(struct mthca_dev *dev, struct mthca_icm *icm, int coherent);
 
 struct mthca_icm_table *mthca_alloc_icm_table(struct mthca_dev *dev,
 					      u64 virt, int obj_size,
 					      int nobj, int reserved,
-					      int use_lowmem);
+					      int use_lowmem, int use_coherent);
 void mthca_free_icm_table(struct mthca_dev *dev, struct mthca_icm_table *table);
 int mthca_table_get(struct mthca_dev *dev, struct mthca_icm_table *table, int obj);
 void mthca_table_put(struct mthca_dev *dev, struct mthca_icm_table *table, int obj);
-void *mthca_table_find(struct mthca_icm_table *table, int obj);
+void *mthca_table_find(struct mthca_icm_table *table, int obj, dma_addr_t *dma_handle);
 int mthca_table_get_range(struct mthca_dev *dev, struct mthca_icm_table *table,
 			  int start, int end);
 void mthca_table_put_range(struct mthca_dev *dev, struct mthca_icm_table *table,
Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_mr.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_mr.c
+++ linux-2.6/drivers/infiniband/hw/mthca/mthca_mr.c
@@ -315,6 +315,7 @@ void mthca_arbel_write_mtt_seg(struct mt
 			      int start_index, u64 *buffer_list, int list_len)
 {
 	__be64 *mtts;
+	dma_addr_t dma_handle;
 	int i;
 	int s = start_index * sizeof (u64);
 
@@ -324,12 +325,14 @@ void mthca_arbel_write_mtt_seg(struct mt
 	BUG_ON(s % MTHCA_MTT_SEG_SIZE);
 
 	mtts = mthca_table_find(dev->mr_table.mtt_table, mtt->first_seg +
-				s / MTHCA_MTT_SEG_SIZE);
+				s / MTHCA_MTT_SEG_SIZE, &dma_handle);
 
 	BUG_ON(!mtts);
 
 	for (i = 0; i < list_len; ++i)
 		mtts[i] = cpu_to_be64(buffer_list[i] | MTHCA_MTT_FLAG_PRESENT);
+
+	dma_sync_single(&dev->pdev->dev, dma_handle, list_len * sizeof(u64), DMA_TO_DEVICE);
 }
 
 int mthca_write_mtt_size(struct mthca_dev *dev)
@@ -602,7 +605,7 @@ int mthca_fmr_alloc(struct mthca_dev *de
 		if (err)
 			goto err_out_mpt_free;
 
-		mr->mem.arbel.mpt = mthca_table_find(dev->mr_table.mpt_table, key);
+		mr->mem.arbel.mpt = mthca_table_find(dev->mr_table.mpt_table, key, NULL);
 		BUG_ON(!mr->mem.arbel.mpt);
 	} else
 		mr->mem.tavor.mpt = dev->mr_table.tavor_fmr.mpt_base +
@@ -616,7 +619,8 @@ int mthca_fmr_alloc(struct mthca_dev *de
 
 	if (mthca_is_memfree(dev)) {
 		mr->mem.arbel.mtts = mthca_table_find(dev->mr_table.mtt_table,
-						      mr->mtt->first_seg);
+						      mr->mtt->first_seg,
+						      &mr->mem.arbel.dma_handle);
 		BUG_ON(!mr->mem.arbel.mtts);
 	} else
 		mr->mem.tavor.mtts = dev->mr_table.tavor_fmr.mtt_base + mtt_seg;
@@ -790,6 +794,9 @@ int mthca_arbel_map_phys_fmr(struct ib_f
 		fmr->mem.arbel.mtts[i] = cpu_to_be64(page_list[i] |
 						     MTHCA_MTT_FLAG_PRESENT);
 
+	dma_sync_single(&dev->pdev->dev, fmr->mem.arbel.dma_handle,
+		       	list_len * sizeof(u64), DMA_TO_DEVICE);
+
 	fmr->mem.arbel.mpt->key    = cpu_to_be32(key);
 	fmr->mem.arbel.mpt->lkey   = cpu_to_be32(key);
 	fmr->mem.arbel.mpt->length = cpu_to_be64(list_len * (1ull << fmr->attr.page_shift));
Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_provider.h
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_provider.h
+++ linux-2.6/drivers/infiniband/hw/mthca/mthca_provider.h
@@ -89,6 +89,7 @@ struct mthca_fmr {
 		struct {
 			struct mthca_mpt_entry *mpt;
 			__be64 *mtts;
+			dma_addr_t dma_handle;
 		} arbel;
 	} mem;
 };

-- 
MST


From mst at mellanox.co.il  Sat Feb 10 13:17:26 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sat, 10 Feb 2007 23:17:26 +0200
Subject: [openib-general] [PATCH 4 of 4] IB/mthca: give reserved MTTs a
 separate cache line
Message-ID: <20070210211726.GE14903@mellanox.co.il>

This fixes two issues related to reserved MTTs and memory alignment.

1. MTTs are allocated in non-cache-coherent memory, so we must give
reserved MTTs their own cache line, to prevent the device and the
CPU from writing into the same cache line at the same time.

2. The reserved_mtts field has a different meaning on Tavor and Arbel,
so we are wasting MTT entries on memfree. Fix the Arbel case to match
Tavor semantics.

Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

---
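
The rounding done in the two hunks below can be checked in isolation. This is
a minimal userspace sketch, not driver code: ALIGN_UP mirrors the kernel's
ALIGN() for power-of-two alignments, MTT_SEG_SIZE is assumed to be 64 bytes
(the real value is whatever MTHCA_MTT_SEG_SIZE is in the tree), and the helper
names pad_reserved_segs() and entries_to_segs() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round x up to a multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a) (((x) + ((size_t)(a) - 1)) & ~((size_t)(a) - 1))

/* Assumed segment size in bytes; stands in for MTHCA_MTT_SEG_SIZE. */
#define MTT_SEG_SIZE 64

/*
 * Pad a count of reserved MTT segments so that the reserved region ends
 * on a cache-line boundary (the mthca_main.c hunk).
 */
static size_t pad_reserved_segs(size_t reserved_segs, size_t cache_line)
{
	assert(cache_line && (cache_line & (cache_line - 1)) == 0);
	return ALIGN_UP(reserved_segs * MTT_SEG_SIZE, cache_line) / MTT_SEG_SIZE;
}

/*
 * Convert a count of 8-byte MTT entries into whole segments, rounding up
 * (the memfree branch of the mthca_cmd.c hunk).
 */
static size_t entries_to_segs(size_t entries)
{
	return ALIGN_UP(entries * sizeof(uint64_t), MTT_SEG_SIZE) / MTT_SEG_SIZE;
}
```

With a hypothetical 128-byte cache line, pad_reserved_segs(3, 128) pads three
reserved segments to four, so the first segment the CPU writes starts on a
fresh cache line.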

Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_main.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_main.c
+++ linux-2.6/drivers/infiniband/hw/mthca/mthca_main.c
@@ -464,6 +464,10 @@ static int mthca_init_icm(struct mthca_d
 		goto err_unmap_aux;
 	}
 
+	/* CPU writes to non-reserved MTTs, while HCA might DMA to reserved mtts */
+	mdev->limits.reserved_mtts = ALIGN(mdev->limits.reserved_mtts * MTHCA_MTT_SEG_SIZE,
+					   dma_get_cache_alignment()) / MTHCA_MTT_SEG_SIZE;
+
 	mdev->mr_table.mtt_table = mthca_alloc_icm_table(mdev, init_hca->mtt_base,
 							 MTHCA_MTT_SEG_SIZE,
 							 mdev->limits.num_mtt_segs,
Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_cmd.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_cmd.c
+++ linux-2.6/drivers/infiniband/hw/mthca/mthca_cmd.c
@@ -1051,7 +1051,11 @@ int mthca_QUERY_DEV_LIM(struct mthca_dev
 	MTHCA_GET(field, outbox, QUERY_DEV_LIM_MAX_EQ_OFFSET);
 	dev_lim->max_eqs = 1 << (field & 0x7);
 	MTHCA_GET(field, outbox, QUERY_DEV_LIM_RSVD_MTT_OFFSET);
-	dev_lim->reserved_mtts = 1 << (field >> 4);
+	if (mthca_is_memfree(dev))
+		dev_lim->reserved_mtts = ALIGN((1 << (field >> 4)) * sizeof(u64),
+					       MTHCA_MTT_SEG_SIZE) / MTHCA_MTT_SEG_SIZE;
+	else
+		dev_lim->reserved_mtts = 1 << (field >> 4);
 	MTHCA_GET(field, outbox, QUERY_DEV_LIM_MAX_MRW_SZ_OFFSET);
 	dev_lim->max_mrw_sz = 1 << field;
 	MTHCA_GET(field, outbox, QUERY_DEV_LIM_RSVD_MRW_OFFSET);
-- 
MST


From swise at opengridcomputing.com  Sat Feb 10 13:26:35 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Sat, 10 Feb 2007 15:26:35 -0600
Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in
 cm_conn_req_handler()
In-Reply-To: <1171139763.11017.68.camel@stevo-desktop>
References: <OFB574507E.6CAEE34E-ON6525727D.0013194D-6525727D.00132A8C@in.ibm.com>
	<ada7iusm021.fsf@cisco.com> <1171035668.26453.11.camel@trinity.ogc.int>
	<1171135423.11017.61.camel@stevo-desktop>
	<1171139763.11017.68.camel@stevo-desktop>
Message-ID: <1171142795.11017.71.camel@stevo-desktop>

On Sat, 2007-02-10 at 14:36 -0600, Steve Wise wrote:
> ugh. 
> 
> There is at least one bug in this patch.  I cannot call iw_cm_reject()
> inside destroy_cm_id() because both functions grab the iw_cm lock...
> 
> 

This patch puts the iw_cm_reject() calls back in
cm_conn_req_handler()...


---

iw_cm_id destruction race condition fixes.

From: Steve Wise <swise at opengridcomputing.com>

Several changes:

- iwcm_deref_id() always wakes up if there's another reference.

- clean up race condition in cm_work_handler().

- create static void free_cm_id(), which deallocates the work entries and
  then kfrees the cm_id memory.  This reduces code duplication.

- rem_ref(): if this is the last reference and the IWCM owns freeing the
  cm_id, then free it.
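
The ownership rule in the last two items can be sketched in userspace as
follows. This is only an illustration under stated assumptions: it uses C11
atomics instead of the kernel's atomic_t, it is single-threaded, and
struct conn_id, conn_deref(), conn_free() and conn_put() are hypothetical
stand-ins for iwcm_id_private, iwcm_deref_id(), free_cm_id() and rem_ref().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct conn_id {
	atomic_int refcount;
	bool callback_destroy;	/* stands in for IWCM_F_CALLBACK_DESTROY */
};

/* Drop one reference; return 1 only to the caller that dropped the last one. */
static int conn_deref(struct conn_id *id)
{
	assert(atomic_load(&id->refcount) > 0);
	return atomic_fetch_sub(&id->refcount, 1) == 1;
}

/* Single release helper, so every path frees the object the same way. */
static void conn_free(struct conn_id *id)
{
	free(id);
}

/*
 * Drop a reference and free the object only if this was the last
 * reference and the core (not the caller of destroy) owns the free.
 * Returns 1 if the object was freed here.
 */
static int conn_put(struct conn_id *id)
{
	if (conn_deref(id) && id->callback_destroy) {
		conn_free(id);
		return 1;
	}
	return 0;
}
```

When callback_destroy is clear, the last conn_put() frees nothing; in the
patch that case is left to iw_destroy_cm_id(), which waits on destroy_comp
and then calls free_cm_id() itself.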

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
Signed-off-by: Tom Tucker <tom at opengridcomputing.com>
---

 drivers/infiniband/core/iwcm.c |   47 +++++++++++++++++++++-------------------
 1 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c
index 1039ad5..891d1fa 100644
--- a/drivers/infiniband/core/iwcm.c
+++ b/drivers/infiniband/core/iwcm.c
@@ -146,6 +146,12 @@ static int copy_private_data(struct iw_c
 	return 0;
 }
 
+static void free_cm_id(struct iwcm_id_private *cm_id_priv)
+{
+	dealloc_work_entries(cm_id_priv);
+	kfree(cm_id_priv);
+}
+
 /*
  * Release a reference on cm_id. If the last reference is being
  * released, enable the waiting thread (in iw_destroy_cm_id) to
@@ -153,21 +159,14 @@ static int copy_private_data(struct iw_c
  */
 static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv)
 {
-	int ret = 0;
-
 	BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
 	if (atomic_dec_and_test(&cm_id_priv->refcount)) {
 		BUG_ON(!list_empty(&cm_id_priv->work_list));
-		if (waitqueue_active(&cm_id_priv->destroy_comp.wait)) {
-			BUG_ON(cm_id_priv->state != IW_CM_STATE_DESTROYING);
-			BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY,
-					&cm_id_priv->flags));
-			ret = 1;
-		}
 		complete(&cm_id_priv->destroy_comp);
+		return 1;
 	}
 
-	return ret;
+	return 0;
 }
 
 static void add_ref(struct iw_cm_id *cm_id)
@@ -181,7 +180,11 @@ static void rem_ref(struct iw_cm_id *cm_
 {
 	struct iwcm_id_private *cm_id_priv;
 	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
-	iwcm_deref_id(cm_id_priv);
+	if (iwcm_deref_id(cm_id_priv) &&
+	    test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) {
+		BUG_ON(!list_empty(&cm_id_priv->work_list));
+		free_cm_id(cm_id_priv);
+	}
 }
 
 static int cm_event_handler(struct iw_cm_id *cm_id, struct iw_cm_event *event);
@@ -355,7 +358,9 @@ static void destroy_cm_id(struct iw_cm_i
 	case IW_CM_STATE_CONN_RECV:
 		/*
 		 * App called destroy before/without calling accept after
-		 * receiving connection request event notification.
+		 * receiving connection request event notification or
+		 * returned non zero from the event callback function.
+		 * In either case, must tell the provider to reject.
 		 */
 		cm_id_priv->state = IW_CM_STATE_DESTROYING;
 		break;
@@ -391,9 +396,7 @@ void iw_destroy_cm_id(struct iw_cm_id *c
 
 	wait_for_completion(&cm_id_priv->destroy_comp);
 
-	dealloc_work_entries(cm_id_priv);
-
-	kfree(cm_id_priv);
+	free_cm_id(cm_id_priv);
 }
 EXPORT_SYMBOL(iw_destroy_cm_id);
 
@@ -647,10 +650,11 @@ static void cm_conn_req_handler(struct i
 	/* Call the client CM handler */
 	ret = cm_id->cm_handler(cm_id, iw_event);
 	if (ret) {
+		iw_cm_reject(cm_id, NULL, 0);
 		set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags);
 		destroy_cm_id(cm_id);
 		if (atomic_read(&cm_id_priv->refcount)==0)
-			kfree(cm_id);
+			free_cm_id(cm_id_priv);
 	}
 
 out:
@@ -854,13 +858,12 @@ static void cm_work_handler(struct work_
 			destroy_cm_id(&cm_id_priv->id);
 		}
 		BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
-		if (iwcm_deref_id(cm_id_priv))
-			return;
-
-		if (atomic_read(&cm_id_priv->refcount)==0 &&
-		    test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) {
-			dealloc_work_entries(cm_id_priv);
-			kfree(cm_id_priv);
+		if (iwcm_deref_id(cm_id_priv)) {
+			if (test_bit(IWCM_F_CALLBACK_DESTROY,
+				     &cm_id_priv->flags)) {
+				BUG_ON(!list_empty(&cm_id_priv->work_list));
+				free_cm_id(cm_id_priv);
+			}
 			return;
 		}
 		spin_lock_irqsave(&cm_id_priv->lock, flags);


From rdreier at cisco.com  Sat Feb 10 15:12:04 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Sat, 10 Feb 2007 15:12:04 -0800
Subject: [openib-general] [PATCH] for-2.6.21 Remove hw/cxgb3/core
	subdirectory.
In-Reply-To: <1171133573.11017.41.camel@stevo-desktop> (Steve Wise's
	message of "Sat, 10 Feb 2007 12:52:53 -0600")
References: <1171133573.11017.41.camel@stevo-desktop>
Message-ID: <ada7iupfvzv.fsf@cisco.com>

Thanks, applied this and the previous patch, and pushed out my
for-2.6.21 branch.  I also rebased so the cxgb3 net driver builds now.


From mst at mellanox.co.il  Sat Feb 10 15:32:29 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 11 Feb 2007 01:32:29 +0200
Subject: [openib-general] integer overflow
Message-ID: <20070210233229.GE32216@mellanox.co.il>

Roland, the following code in ipoib:

	while ((int) priv->tx_tail - (int) priv->tx_head < 0) {

seems to rely on integer overflow which seems to be
undefined behaviour.

Should we care?

-- 
MST


From rdreier at cisco.com  Sat Feb 10 15:52:45 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Sat, 10 Feb 2007 15:52:45 -0800
Subject: [openib-general] integer overflow
In-Reply-To: <20070210233229.GE32216@mellanox.co.il> (Michael S.
	Tsirkin's message of "Sun, 11 Feb 2007 01:32:29 +0200")
References: <20070210233229.GE32216@mellanox.co.il>
Message-ID: <adavei9efjm.fsf@cisco.com>

 > 	while ((int) priv->tx_tail - (int) priv->tx_head < 0) {
 > 
 > seems to rely on integer overflow which seems to be
 > undefined behaviour.

tx_tail and tx_head are unsigned, and overflow is defined for unsigned
integers.

 - R.


From mst at mellanox.co.il  Sat Feb 10 15:59:35 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 11 Feb 2007 01:59:35 +0200
Subject: [openib-general] integer overflow
In-Reply-To: <adavei9efjm.fsf@cisco.com>
References: <adavei9efjm.fsf@cisco.com>
Message-ID: <20070210235935.GF32216@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: integer overflow
> 
>  > 	while ((int) priv->tx_tail - (int) priv->tx_head < 0) {
>  > 
>  > seems to rely on integer overflow which seems to be
>  > undefined behaviour.
> 
> tx_tail and tx_head are unsigned, and overflow is defined for unsigned
> integers.

Yes but we cast them to signed int here - no?


-- 
MST


From rdreier at cisco.com  Sat Feb 10 16:01:01 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Sat, 10 Feb 2007 16:01:01 -0800
Subject: [openib-general] integer overflow
In-Reply-To: <20070210235935.GF32216@mellanox.co.il> (Michael S.
	Tsirkin's message of "Sun, 11 Feb 2007 01:59:35 +0200")
References: <adavei9efjm.fsf@cisco.com> <20070210235935.GF32216@mellanox.co.il>
Message-ID: <adad54hef5u.fsf@cisco.com>

 > Yes but we cast them to signed int here - no?

That's true, I guess it is technically undefined.  But time_after() is
relying on the same thing working, so I would say we don't care.

 - R.
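
For reference, the comparison can be written so that the subtraction itself
never overflows. This is a sketch, not a proposed patch: the subtraction is
done on the unsigned values, where wraparound is defined, and only the result
is converted to int. That conversion is implementation-defined rather than
undefined in C, and it wraps as expected with GCC on two's complement
machines. ring_behind() and ring_drain() are hypothetical names.

```c
#include <assert.h>

/* True while tail is still behind head, even after the counters wrap. */
static int ring_behind(unsigned tail, unsigned head)
{
	/* Unsigned subtraction wraps modulo 2^N, so it cannot overflow. */
	return (int)(tail - head) < 0;
}

/* Walk tail up to head the way the driver loops do; return the step count. */
static unsigned ring_drain(unsigned tail, unsigned head, unsigned ring_size)
{
	unsigned n = 0;

	while (ring_behind(tail, head)) {
		++tail;
		++n;
	}
	assert(n <= ring_size);
	return n;
}
```

The result matches the cast-then-subtract form whenever that form does not
overflow, and it keeps working when tx_head has wrapped past zero while
tx_tail has not.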


From mst at mellanox.co.il  Sat Feb 10 16:31:40 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 11 Feb 2007 02:31:40 +0200
Subject: [openib-general] [PATCH RFC] use common cq for ipoib cm send side
Message-ID: <20070211003140.GH32216@mellanox.co.il>

The following untested patch moves all TX processing in IPoIB CM to the common CQ.
This should help reduce the number of interrupts for bi-directional traffic
(such as TCP). Is this a good idea? What do others think?

Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

---
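
Sharing one CQ means each completion has to say which path it belongs to, and
the patch does that with the top bits of wr_id. The sketch below shows the
tagging scheme on its own, assuming CM is configured in. OP_RECV and OP_CM
use the same bit positions as IPOIB_OP_RECV and IPOIB_OP_CM, but enum wc_kind,
tag_wr_id(), classify_wc() and ring_index_of() are hypothetical names, not
functions from the driver.

```c
#include <assert.h>
#include <stdint.h>

#define OP_RECV (UINT64_C(1) << 31)	/* same bit as IPOIB_OP_RECV */
#define OP_CM   (UINT64_C(1) << 30)	/* same bit as IPOIB_OP_CM */

enum wc_kind { WC_UD_TX, WC_UD_RX, WC_CM_TX, WC_CM_RX };

/* Combine a ring index with the flag bits that identify the path. */
static uint64_t tag_wr_id(unsigned ring_index, uint64_t flags)
{
	/* The index must not collide with the flag bits. */
	assert((ring_index & (OP_RECV | OP_CM)) == 0);
	return ring_index | flags;
}

/* Decide which handler a completion from the shared CQ belongs to. */
static enum wc_kind classify_wc(uint64_t wr_id)
{
	if (wr_id & OP_CM)
		return (wr_id & OP_RECV) ? WC_CM_RX : WC_CM_TX;
	return (wr_id & OP_RECV) ? WC_UD_RX : WC_UD_TX;
}

/* Strip the flag bits to recover the ring index. */
static unsigned ring_index_of(uint64_t wr_id)
{
	return (unsigned)(wr_id & ~(OP_RECV | OP_CM));
}
```

A CM send tagged as tag_wr_id(i, OP_CM) classifies as WC_CM_TX, while a CM
receive carries both bits, which matches the wr_id values set in
ipoib_cm_post_receive() and post_send() below.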

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index eb885ee..ef703c7 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -99,9 +99,9 @@ enum {
 
 #define	IPOIB_OP_RECV   (1ul << 31)
 #ifdef CONFIG_INFINIBAND_IPOIB_CM
-#define	IPOIB_CM_OP_SRQ (1ul << 30)
+#define	IPOIB_OP_CM     (1ul << 30)
 #else
-#define	IPOIB_CM_OP_SRQ (0)
+#define	IPOIB_OP_CM     (0)
 #endif
 
 /* structs */
@@ -144,7 +144,6 @@ struct ipoib_cm_rx {
 
 struct ipoib_cm_tx {
 	struct ib_cm_id     *id;
-	struct ib_cq        *cq;
 	struct ib_qp        *qp;
 	struct list_head     list;
 	struct net_device   *dev;
@@ -233,6 +232,7 @@ struct ipoib_dev_priv {
 	unsigned             tx_tail;
 	struct ib_sge        tx_sge;
 	struct ib_send_wr    tx_wr;
+	unsigned             tx_outstanding;
 
 	struct ib_wc ibwc[IPOIB_NUM_WC];
 
@@ -439,6 +439,7 @@ void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx);
 void ipoib_cm_skb_too_long(struct net_device* dev, struct sk_buff *skb,
 			   unsigned int mtu);
 void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc);
+void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc);
 #else
 
 struct ipoib_cm_tx;
@@ -527,6 +528,9 @@ static inline void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *w
 {
 }
 
+static inline void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
+{
+}
 #endif
 
 #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 8ee6f06..47c868c 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -85,7 +85,7 @@ static int ipoib_cm_post_receive(struct net_device *dev, int id)
 	struct ib_recv_wr *bad_wr;
 	int i, ret;
 
-	priv->cm.rx_wr.wr_id = id | IPOIB_CM_OP_SRQ;
+	priv->cm.rx_wr.wr_id = id | IPOIB_OP_CM | IPOIB_OP_RECV;
 
 	for (i = 0; i < IPOIB_CM_RX_SG; ++i)
 		priv->cm.rx_sge[i].addr = priv->cm.srq_ring[id].mapping[i];
@@ -346,7 +346,7 @@ static void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space,
 void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
-	unsigned int wr_id = wc->wr_id & ~IPOIB_CM_OP_SRQ;
+	unsigned int wr_id = wc->wr_id & ~(IPOIB_OP_CM | IPOIB_OP_RECV);
 	struct sk_buff *skb;
 	struct ipoib_cm_rx *p;
 	unsigned long flags;
@@ -433,7 +433,7 @@ static inline int post_send(struct ipoib_dev_priv *priv,
 	priv->tx_sge.addr             = addr;
 	priv->tx_sge.length           = len;
 
-	priv->tx_wr.wr_id 	      = wr_id;
+	priv->tx_wr.wr_id 	      = wr_id | IPOIB_OP_CM;
 
 	return ib_post_send(tx->qp, &priv->tx_wr, &bad_wr);
 }
@@ -484,20 +484,19 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_
 		dev->trans_start = jiffies;
 		++tx->tx_head;
 
-		if (tx->tx_head - tx->tx_tail == ipoib_sendq_size) {
+		if (++priv->tx_outstanding == ipoib_sendq_size) {
 			ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net queue\n",
 				  tx->qp->qp_num);
 			netif_stop_queue(dev);
-			set_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags);
 		}
 	}
 }
 
-static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx,
-				  struct ib_wc *wc)
+void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
-	unsigned int wr_id = wc->wr_id;
+	struct ipoib_cm_tx *tx = wc->qp->qp_context;
+	unsigned int wr_id = wc->wr_id & ~IPOIB_OP_CM;
 	struct ipoib_tx_buf *tx_req;
 	unsigned long flags;
 
@@ -522,11 +521,10 @@ static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx
 
 	spin_lock_irqsave(&priv->tx_lock, flags);
 	++tx->tx_tail;
-	if (unlikely(test_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags)) &&
-	    tx->tx_head - tx->tx_tail <= ipoib_sendq_size >> 1) {
-		clear_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags);
+	if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) &&
+	    netif_queue_stopped(dev) &&
+	    test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
 		netif_wake_queue(dev);
-	}
 
 	if (wc->status != IB_WC_SUCCESS &&
 	    wc->status != IB_WC_WR_FLUSH_ERR) {
@@ -551,8 +549,17 @@ static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx
 
 		/* queue would be re-started anyway when TX is destroyed,
 		 * but it makes sense to do it ASAP here. */
-		if (test_and_clear_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags))
-			netif_wake_queue(dev);
+		while ((int) tx->tx_tail - (int) tx->tx_head < 0) {
+			tx_req = &tx->tx_ring[tx->tx_tail & (ipoib_sendq_size - 1)];
+			ib_dma_unmap_single(priv->ca, tx_req->mapping, tx_req->skb->len,
+					 DMA_TO_DEVICE);
+			dev_kfree_skb_any(tx_req->skb);
+			++tx->tx_tail;
+			if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) &&
+			    netif_queue_stopped(tx->dev) &&
+			    test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
+				netif_wake_queue(tx->dev);
+		}
 
 		if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) {
 			list_move(&tx->list, &priv->cm.reap_list);
@@ -567,19 +574,6 @@ static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx
 	spin_unlock_irqrestore(&priv->tx_lock, flags);
 }
 
-static void ipoib_cm_tx_completion(struct ib_cq *cq, void *tx_ptr)
-{
-	struct ipoib_cm_tx *tx = tx_ptr;
-	int n, i;
-
-	ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
-	do {
-		n = ib_poll_cq(cq, IPOIB_NUM_WC, tx->ibwc);
-		for (i = 0; i < n; ++i)
-			ipoib_cm_handle_tx_wc(tx->dev, tx, tx->ibwc + i);
-	} while (n == IPOIB_NUM_WC);
-}
-
 int ipoib_cm_dev_open(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
@@ -699,17 +693,18 @@ static int ipoib_cm_rep_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 	return 0;
 }
 
-static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ib_cq *cq)
+static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ipoib_cm_tx *tx)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct ib_qp_init_attr attr = {};
 	attr.recv_cq = priv->cq;
+	attr.send_cq = priv->cq;
 	attr.srq = priv->cm.srq;
 	attr.cap.max_send_wr = ipoib_sendq_size;
 	attr.cap.max_send_sge = 1;
 	attr.sq_sig_type = IB_SIGNAL_ALL_WR;
 	attr.qp_type = IB_QPT_RC;
-	attr.send_cq = cq;
+	attr.qp_context = tx;
 	return ib_create_qp(priv->pd, &attr);
 }
 
@@ -789,21 +784,7 @@ static int ipoib_cm_tx_init(struct ipoib_cm_tx *p, u32 qpn,
 		goto err_tx;
 	}
 
-	p->cq = ib_create_cq(priv->ca, ipoib_cm_tx_completion, NULL, p,
-			     ipoib_sendq_size + 1);
-	if (IS_ERR(p->cq)) {
-		ret = PTR_ERR(p->cq);
-		ipoib_warn(priv, "failed to allocate tx cq: %d\n", ret);
-		goto err_cq;
-	}
-
-	ret = ib_req_notify_cq(p->cq, IB_CQ_NEXT_COMP);
-	if (ret) {
-		ipoib_warn(priv, "failed to request completion notification: %d\n", ret);
-		goto err_req_notify;
-	}
-
-	p->qp = ipoib_cm_create_tx_qp(p->dev, p->cq);
+	p->qp = ipoib_cm_create_tx_qp(p->dev, p);
 	if (IS_ERR(p->qp)) {
 		ret = PTR_ERR(p->qp);
 		ipoib_warn(priv, "failed to allocate tx qp: %d\n", ret);
@@ -840,12 +821,8 @@ err_modify:
 err_id:
 	p->id = NULL;
 	ib_destroy_qp(p->qp);
-err_req_notify:
 err_qp:
 	p->qp = NULL;
-	ib_destroy_cq(p->cq);
-err_cq:
-	p->cq = NULL;
 err_tx:
 	return ret;
 }
@@ -854,6 +831,7 @@ static void ipoib_cm_tx_destroy(struct ipoib_cm_tx *p)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(p->dev);
 	struct ipoib_tx_buf *tx_req;
+	unsigned long flags;
 
 	ipoib_dbg(priv, "Destroy active connection 0x%x head 0x%x tail 0x%x\n",
 		  p->qp ? p->qp->qp_num : 0, p->tx_head, p->tx_tail);
@@ -864,12 +842,6 @@ static void ipoib_cm_tx_destroy(struct ipoib_cm_tx *p)
 	if (p->qp)
 		ib_destroy_qp(p->qp);
 
-	if (p->cq)
-		ib_destroy_cq(p->cq);
-
-	if (test_bit(IPOIB_FLAG_NETIF_STOPPED, &p->flags))
-		netif_wake_queue(p->dev);
-
 	if (p->tx_ring) {
 		while ((int) p->tx_tail - (int) p->tx_head < 0) {
 			tx_req = &p->tx_ring[p->tx_tail & (ipoib_sendq_size - 1)];
@@ -877,6 +849,12 @@ static void ipoib_cm_tx_destroy(struct ipoib_cm_tx *p)
 					 DMA_TO_DEVICE);
 			dev_kfree_skb_any(tx_req->skb);
 			++p->tx_tail;
+			spin_lock_irqsave(&priv->tx_lock, flags);
+			if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) &&
+			    netif_queue_stopped(p->dev) &&
+			    test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
+				netif_wake_queue(p->dev);
+			spin_unlock_irqrestore(&priv->tx_lock, flags);
 		}
 
 		kfree(p->tx_ring);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index f2aa923..19a3d3e 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -266,11 +266,10 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
 
 	spin_lock_irqsave(&priv->tx_lock, flags);
 	++priv->tx_tail;
-	if (unlikely(test_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags)) &&
-	    priv->tx_head - priv->tx_tail <= ipoib_sendq_size >> 1) {
-		clear_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags);
+	if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) &&
+	    netif_queue_stopped(dev) &&
+	    test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
 		netif_wake_queue(dev);
-	}
 	spin_unlock_irqrestore(&priv->tx_lock, flags);
 
 	if (wc->status != IB_WC_SUCCESS &&
@@ -282,12 +281,17 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
 
 static void ipoib_ib_handle_wc(struct net_device *dev, struct ib_wc *wc)
 {
-	if (wc->wr_id & IPOIB_CM_OP_SRQ)
-		ipoib_cm_handle_rx_wc(dev, wc);
-	else if (wc->wr_id & IPOIB_OP_RECV)
-		ipoib_ib_handle_rx_wc(dev, wc);
-	else
-		ipoib_ib_handle_tx_wc(dev, wc);
+	if (wc->wr_id & IPOIB_OP_CM) {
+		if (wc->wr_id & IPOIB_OP_RECV)
+			ipoib_cm_handle_rx_wc(dev, wc);
+		else
+			ipoib_cm_handle_tx_wc(dev, wc);
+	} else {
+		if (wc->wr_id & IPOIB_OP_RECV)
+			ipoib_ib_handle_rx_wc(dev, wc);
+		else
+			ipoib_ib_handle_tx_wc(dev, wc);
+	}
 }
 
 void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr)
@@ -370,10 +374,9 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb,
 		address->last_send = priv->tx_head;
 		++priv->tx_head;
 
-		if (priv->tx_head - priv->tx_tail == ipoib_sendq_size) {
+		if (++priv->tx_outstanding == ipoib_sendq_size) {
 			ipoib_dbg(priv, "TX ring full, stopping kernel net queue\n");
 			netif_stop_queue(dev);
-			set_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags);
 		}
 	}
 }
@@ -549,6 +552,7 @@ int ipoib_ib_dev_stop(struct net_device *dev)
 						    DMA_TO_DEVICE);
 				dev_kfree_skb_any(tx_req->skb);
 				++priv->tx_tail;
+				--priv->tx_outstanding;
 			}
 
 			for (i = 0; i < ipoib_recvq_size; ++i) {
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 19e82db..7c7b136 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -900,7 +900,7 @@ int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port)
 		goto out_rx_ring_cleanup;
 	}
 
-	/* priv->tx_head & tx_tail are already 0 */
+	/* priv->tx_head, tx_tail & tx_outstanding are already 0 */
 
 	if (ipoib_ib_dev_init(dev, ca, port))
 		goto out_tx_ring_cleanup;

-- 
MST


From vlad at mellanox.co.il  Sat Feb 10 23:58:05 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Sun, 11 Feb 2007 09:58:05 +0200
Subject: [openib-general] [PATCH ofed-1.2] ofa_user.spec: fix libehca
 directory structure
In-Reply-To: <200702081832.14862.ossrosch@linux.vnet.ibm.com>
References: <200702081832.14862.ossrosch@linux.vnet.ibm.com>
Message-ID: <1171180685.5694.2.camel@vladsk-laptop>

On Thu, 2007-02-08 at 18:32 +0100, Stefan Roscher wrote:
> Correct directory structure according to new driver loading scheme from libibverbs
> 
> 
> Signed-off-by: Stefan Roscher <stefan.roscher at de.ibm.com>
> ---
> 
> 
> --- ofa_user.spec_old	2007-02-08 09:03:33.000000000 -0800
> +++ ofa_user.spec_new	2007-02-08 09:07:32.000000000 -0800

Applied.


-- 
Vladimir Sokolovsky <vlad at mellanox.co.il>
Mellanox Technologies Ltd.


From vlad at mellanox.co.il  Sun Feb 11 00:33:10 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Sun, 11 Feb 2007 10:33:10 +0200
Subject: [openib-general] [PATCH ofed-1.2] ofa_user.spec: fix
 installation path for ehca.driver
In-Reply-To: <200702091437.02142.ossrosch@linux.vnet.ibm.com>
References: <200702091437.02142.ossrosch@linux.vnet.ibm.com>
Message-ID: <1171182790.5694.4.camel@vladsk-laptop>

On Fri, 2007-02-09 at 14:37 +0100, Stefan Roscher wrote:
> Hi Vladimir,
> 
> we tested the newest ofed1.2 package and found out that ehca.driver file is
> not copied into /usr/local/ofed/etc/libibverbs.d/
> 
> This patch adds the installation path for ehca.driver to ofa_user.spec.
> Please ensure you first apply the ofa_user.spec patch I sent yesterday:
> http://openib.org/pipermail/openib-general/2007-February/032736.html
> 
> 
> Signed-off-by: Stefan Roscher <stefan.roscher at de.ibm.com>
> ---
> 
> 
> ofa_user.spec |    1 +
> 1 files changed, 1 insertion(+)
> 
> 
> 
> diff -Nurp ofed_scripts_old/ofa_user.spec ofed_scripts_new/ofa_user.spec
> --- ofed_scripts_old/ofa_user.spec	2007-02-09 14:00:38.000000000 +0100
> +++ ofed_scripts_new/ofa_user.spec	2007-02-09 14:02:45.000000000 +0100
> @@ -1165,6 +1165,7 @@ fi
>  %files -n libehca -f libehca-files
>  %defattr(-,root,root,-)
>  %{_libdir}/libehca*.so*
> +%config %{_prefix}/etc/libibverbs.d/ehca.driver
>  # %doc AUTHORS COPYING ChangeLog README
>  %endif

Applied.


-- 
Vladimir Sokolovsky <vlad at mellanox.co.il>
Mellanox Technologies Ltd.


From vlad at lists.openfabrics.org  Sun Feb 11 02:24:24 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Sun, 11 Feb 2007 02:24:24 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070211-0200 daily build status
Message-ID: <20070211102424.ACFEAE60808@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod 

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.14
Passed on ia64 with linux-2.6.19
Passed on x86_64 with linux-2.6.16
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.12
Passed on ppc64 with linux-2.6.18
Passed on powerpc with linux-2.6.18
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.13
Passed on powerpc with linux-2.6.15
Passed on powerpc with linux-2.6.12
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.15
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.12
Passed on ia64 with linux-2.6.18
Passed on ppc64 with linux-2.6.13
Passed on ppc64 with linux-2.6.19
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.14
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.14

Failed:
Build failed on ia64 with linux-2.6.16.21-0.8-default
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070211-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:380: error: implicit declaration of function ‘register_netevent_notifier’
/home/vlad/tmp/ofa_1_2_kernel-20070211-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: In function ‘addr_cleanup’:
/home/vlad/tmp/ofa_1_2_kernel-20070211-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:386: error: implicit declaration of function ‘unregister_netevent_notifier’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070211-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070211-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070211-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070211-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------


From ogerlitz at voltaire.com  Sun Feb 11 03:23:20 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Sun, 11 Feb 2007 13:23:20 +0200
Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast
 support
In-Reply-To: <adaodo4nqz7.fsf@cisco.com>
References: <000701c741a6$16dc4760$ff0da8c0@amr.corp.intel.com>
	<45BF8E17.2010805@ichips.intel.com> <45C37BE9.5040105@ichips.intel.com>
	<45C85B39.4080700@voltaire.com> <45CB3537.8060508@voltaire.com>
	<adaodo4nqz7.fsf@cisco.com>
Message-ID: <45CEFCA8.4000008@voltaire.com>

Roland Dreier wrote:
> I merged the "increment port number" and "remove redundant '_wq'"
> patches from git.openfabrics.org/~shefty/scm/rdma-dev.git for-roland
> 
> I plan to review the multicast stuff next week and I hope to merge it
> for 2.6.21.  Or, have you or anyone else at Voltaire read over the
> code in addition to using it?  Do you see anything that should be
> cleaned up?

OK, I spent some time today reviewing and playing with the "ib_sa:
track multicast join/leave requests" patch and have no special
comments. I think the two patches are ready to merge; let me know if
you have any specific questions.

Or.


From tziporet at mellanox.co.il  Sun Feb 11 05:43:10 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Sun, 11 Feb 2007 15:43:10 +0200
Subject: [openib-general] Reminder: OFED 1.2 coordination meeting on Mon
	Feb-12 9am PST
Message-ID: <45CF1D6E.4080101@mellanox.co.il>

Reminder: OFED 1.2 coordination meeting on Mon Feb-12 9am PST

Agenda:
* OFED 1.2 Alpha status update

Tziporet

-------------------------------------------------------------------------------------------

Bridge info:

Meeting ID:              2106670
Meeting Password:

Global Access Numbers:
http://cisco.com/en/US/about/doing_business/conferencing/index.html

     US/Canada:  +1.866.432.9903    United Kingdom:   +44.20.8824.0117
     India:      +91.80.4103.3979   Germany:          +49.619.6773.9002
     Japan:      +81.3.5763.9394    China:            +86.10.8515.5666

for world-wide access numbers see:

http://openib.org/pipermail/openib-general/2007-January/031282.html


From pasha at dev.mellanox.co.il  Sun Feb 11 06:52:52 2007
From: pasha at dev.mellanox.co.il (Pavel Shamis (Pasha))
Date: Sun, 11 Feb 2007 16:52:52 +0200
Subject: [openib-general] [openfabrics-ewg] MVAPICH 0.9.9-beta release
	is available
In-Reply-To: <200702092228.l19MSGEo006670@xi.cse.ohio-state.edu>
References: <200702092228.l19MSGEo006670@xi.cse.ohio-state.edu>
Message-ID: <45CF2DC4.8050402@dev.mellanox.co.il>


An SRPM with the latest version of mvapich (0.9.9-beta) was uploaded to
the OFED 1.2 repository: http://www.openfabrics.org/~pasha/ofed_1_2/mvapich/

Regards,
Pasha

Dhabaleswar Panda wrote:
> The MVAPICH team is pleased to announce the availability of MVAPICH
> 0.9.9-beta with the following NEW features:
> 
> - Message coalescing support to enable reduction of per Queue-pair
>   send queues for reduction in memory requirement on large scale
>   clusters. This design also increases the small message messaging
>   rate significantly.
> 
> - Designs for avoiding hot-spots in networks of large-scale clusters
> 
>   - Multi-pathing support leveraging LMC mechanism
>   - Multi-port support for enabling user processes to bind to 
>     different IB ports for balanced communication performance
>     on multi-core platforms
> 
> - Multi-core optimized scalable shared memory design
> 
> - Memory Hook support provided by integration with ptmalloc2 library. 
>   This provides safe release of memory to the Operating System and
>   is expected to benefit the memory usage of applications that 
>   frequently use malloc and free operations.
> 
> - Optimized, high-performance shared memory aware collective
>   operations for multi-core platforms
> 
> - Shared-Memory only channel (This interface support is useful for
>   running MPI jobs on multi-processor systems without using any 
>   high-performance network. For example, multi-core servers, 
>   desktops, and laptops; and clusters with serial nodes.)
> 
> A new "Multiple-pair Bandwidth and Message Rate" test is also
> available as a part of OSU_Benchmarks.
> 
> For downloading MVAPICH 0.9.9-beta package and accessing the anonymous
> SVN, please visit the following URL:
> 
> http://nowlab.cse.ohio-state.edu/projects/mpi-iba/
> 
> MVAPICH 0.9.9-beta is also available for OFED 1.2 testing.
> 
> All feedback, including bug reports and hints for performance tuning,
> is welcome. Please post it to the mvapich-discuss mailing list.
> 
> Thanks, 
> 
> MVAPICH Team
> 
> 


From sweitzen at cisco.com  Sun Feb 11 10:44:54 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Sun, 11 Feb 2007 10:44:54 -0800
Subject: [openib-general] Problem with install.sh openib-diags
	OFED-1.2-20070208-1508.tgz
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA302FEF308@xmb-sjc-216.amer.cisco.com>

I'm using install.sh on RHEL4 U3 x86_64
 
Preparing...
##################################################
kernel-ib-devel
##################################################
kernel-ib
##################################################
error: Failed dependencies:
        perl(IBswcountlimits) is needed by openib-diags-1.2.0-pre1.x86_64
ERROR: Failed executing "/bin/rpm -ihv
/tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/dapl-1.2.0-0.x86_64.rpm
/tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/dapl-devel-1.2.0-0.x86_64.rpm
/tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libibcommon-1.0.2-0.x86_64.rpm
/tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libibcommon-devel-1.0.2-0.x86_64.rpm
/tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libibmad-1.0.2-0.x86_64.rpm
/tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libibmad-devel-1.0.2-0.x86_64.rpm
/tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libibumad-1.0.2-0.x86_64.rpm
/tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libibumad-devel-1.0.2-0.x86_64.rpm
/tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libibverbs-1.1-pre1.x86_64.rpm
/tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libibverbs-devel-1.1-pre1.x86_64.rpm
/tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libibverbs-utils-1.1-pre1.x86_64.rpm
/tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libmthca-1.0.4-pre.x86_64.rpm
/tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libmthca-devel-1.0.4-pre.x86_64.rpm
/tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libopensm-3.0.1-0.x86_64.rpm
/tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libopensm-devel-3.0.1-0.x86_64.rpm
/tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libosmcomp-3.0.1-0.x86_64.rpm
/tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libosmcomp-devel-3.0.1-0.x86_64.rpm
/tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libosmvendor-3.0.1-0.x86_64.rpm
/tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libosmvendor-devel-3.0.1-0.x86_64.rpm
/tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/librdmacm-0.9.0-0.x86_64.rpm
/tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/librdmacm-devel-0.9.0-0.x86_64.rpm
/tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libsdp-1.1.99-0.x86_64.rpm
/tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/openib-diags-1.2.0-pre1.x86_64.rpm
/tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/perftest-1.2-0.x86_64.rpm "

 
Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 

From swise at opengridcomputing.com  Sun Feb 11 11:58:19 2007
From: swise at opengridcomputing.com (Steve WIse)
Date: Sun, 11 Feb 2007 13:58:19 -0600
Subject: [openib-general] [PATCH] ofed_1-2 IWCM - Set iniator depth and
 responder resources to device max values.
Message-ID: <1171223899.4027.1.camel@linux-q667.site>


IWCM - Set initiator depth and responder resources to device max values.

For OFED 1.2, the IWCM will set the initiator depth and responder
resources to the device max values for new connect request events.

    
Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 kernel_patches/fixes/iwcm_ordird.patch |   43 ++++++++++++++++++++++++++++++++
 1 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/kernel_patches/fixes/iwcm_ordird.patch b/kernel_patches/fixes/iwcm_ordird.patch
new file mode 100644
index 0000000..3a9f643
--- /dev/null
+++ b/kernel_patches/fixes/iwcm_ordird.patch
@@ -0,0 +1,43 @@
+commit 7175034c7adf6b5fb5ba311929376af7501387a1
+Author: Steve Wise <swise at opengridcomputing.com>
+Date:   Sat Feb 10 14:16:35 2007 -0600
+
+    IWCM - Set iniator depth and responder resources to device max values.
+    
+    For OFED 1.2, the IWCM will set the initiator depth and responder
+    resources to the device max values for new connect request events.
+    
+    Signed-off-by: Steve Wise <swise at opengridcomputing.com>
+
+diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
+index 9e0ab04..e3afdf8 100644
+--- a/drivers/infiniband/core/cma.c
++++ b/drivers/infiniband/core/cma.c
+@@ -1137,6 +1137,7 @@ static int iw_conn_req_handler(struct iw
+ 	struct net_device *dev = NULL;
+ 	struct rdma_cm_event event;
+ 	int ret;
++	struct ib_device_attr attr;
+ 
+ 	listen_id = cm_id->context;
+ 	atomic_inc(&listen_id->dev_remove);
+@@ -1189,10 +1190,19 @@ static int iw_conn_req_handler(struct iw
+ 	sin = (struct sockaddr_in *) &new_cm_id->route.addr.dst_addr;
+ 	*sin = iw_event->remote_addr;
+ 
++	ret = ib_query_device(conn_id->id.device, &attr);
++	if (ret) {
++		cma_release_remove(conn_id);
++		rdma_destroy_id(new_cm_id);
++		goto out;
++	}
++	
+ 	memset(&event, 0, sizeof event);
+ 	event.event = RDMA_CM_EVENT_CONNECT_REQUEST;
+ 	event.param.conn.private_data = iw_event->private_data;
+ 	event.param.conn.private_data_len = iw_event->private_data_len;
++	event.param.conn.initiator_depth = attr.max_qp_init_rd_atom;	
++	event.param.conn.responder_resources = attr.max_qp_rd_atom;
+ 	ret = conn_id->id.event_handler(&conn_id->id, &event);
+ 	if (ret) {
+ 		/* User wants to destroy the CM ID */


From mst at mellanox.co.il  Sun Feb 11 13:03:07 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 11 Feb 2007 23:03:07 +0200
Subject: [openib-general] [PATCH RFC] use common cq for ipoib cm send
	side
In-Reply-To: <20070211003140.GH32216@mellanox.co.il>
References: <20070211003140.GH32216@mellanox.co.il>
Message-ID: <20070211210307.GB28231@mellanox.co.il>

> Quoting Michael S. Tsirkin <mst at mellanox.co.il>:
> Subject: [PATCH RFC] use common cq for ipoib cm send side
> 
> The following untested patch moves all TX processing in IPoIB CM to common CQ.
> This should help reduce the number of interrupts for bi-directional traffic
> (such as TCP). Is this a good idea? What do others think?
> 
> Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

FYI, this was just thinking aloud.  The version below works fine here but the
performance gain seems to be very small (about 1%).  The gain with NAPI might
be bigger but this is yet to be tested.  I'll continue looking into this.

Feedback welcome.
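
For readers who want the flow-control idea without reading the whole diff, here
is a minimal userspace sketch of the per-device counter that the commit message
below describes. It is only a model under stated assumptions: struct txq_model,
txq_post() and txq_complete() are hypothetical names that do not appear in the
driver, the "stopped" flag stands in for netif_stop_queue()/netif_wake_queue(),
and the IPOIB_FLAG_ADMIN_UP check and the tx_lock locking done by the real code
are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-device TX state; not a driver struct. */
struct txq_model {
	unsigned sendq_size;	/* plays the role of ipoib_sendq_size (power of two) */
	unsigned outstanding;	/* plays the role of priv->tx_outstanding */
	bool stopped;		/* plays the role of the netif queue state */
};

/* Model of posting one send WR on any QP that shares the common CQ. */
static void txq_post(struct txq_model *q)
{
	/* The network stack does not hand us packets while the queue is stopped. */
	assert(!q->stopped && q->outstanding < q->sendq_size);

	if (++q->outstanding == q->sendq_size)
		q->stopped = true;	/* netif_stop_queue() in the driver */
}

/* Model of handling one send completion polled from the common CQ. */
static void txq_complete(struct txq_model *q)
{
	assert(q->outstanding > 0);

	/* Wake only when the count drops to exactly half of the send queue. */
	if (--q->outstanding == q->sendq_size >> 1 && q->stopped)
		q->stopped = false;	/* netif_wake_queue() in the driver */
}
```

With sendq_size = 8, the eighth txq_post() stops the queue and it stays stopped
until completions bring the count down to 4. Since the count is shared by all
QPs on the device, the total number of outstanding send WRs never exceeds the
send queue size, which is how the patch aims to avoid CQ overruns.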

 ipoib.h      |   10 +++++--
 ipoib_cm.c   |   78 +++++++++++++++--------------------------------------------
 ipoib_ib.c   |   28 ++++++++++++---------
 ipoib_main.c |    2 -
 4 files changed, 45 insertions(+), 73 deletions(-)

------------

Use common CQ for all TX QPs: keep a per-device counter of outstanding
tx WRs, and stop the interface when this counter reaches the send queue size, to
avoid CQ overruns.

Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

---

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index eb885ee..ef703c7 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -99,9 +99,9 @@ enum {
 
 #define	IPOIB_OP_RECV   (1ul << 31)
 #ifdef CONFIG_INFINIBAND_IPOIB_CM
-#define	IPOIB_CM_OP_SRQ (1ul << 30)
+#define	IPOIB_OP_CM     (1ul << 30)
 #else
-#define	IPOIB_CM_OP_SRQ (0)
+#define	IPOIB_OP_CM     (0)
 #endif
 
 /* structs */
@@ -144,7 +144,6 @@ struct ipoib_cm_rx {
 
 struct ipoib_cm_tx {
 	struct ib_cm_id     *id;
-	struct ib_cq        *cq;
 	struct ib_qp        *qp;
 	struct list_head     list;
 	struct net_device   *dev;
@@ -233,6 +232,7 @@ struct ipoib_dev_priv {
 	unsigned             tx_tail;
 	struct ib_sge        tx_sge;
 	struct ib_send_wr    tx_wr;
+	unsigned             tx_outstanding;
 
 	struct ib_wc ibwc[IPOIB_NUM_WC];
 
@@ -439,6 +439,7 @@ void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx);
 void ipoib_cm_skb_too_long(struct net_device* dev, struct sk_buff *skb,
 			   unsigned int mtu);
 void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc);
+void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc);
 #else
 
 struct ipoib_cm_tx;
@@ -527,6 +528,9 @@ static inline void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *w
 {
 }
 
+static inline void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
+{
+}
 #endif
 
 #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 8ee6f06..af36562 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -85,7 +85,7 @@ static int ipoib_cm_post_receive(struct net_device *dev, int id)
 	struct ib_recv_wr *bad_wr;
 	int i, ret;
 
-	priv->cm.rx_wr.wr_id = id | IPOIB_CM_OP_SRQ;
+	priv->cm.rx_wr.wr_id = id | IPOIB_OP_CM | IPOIB_OP_RECV;
 
 	for (i = 0; i < IPOIB_CM_RX_SG; ++i)
 		priv->cm.rx_sge[i].addr = priv->cm.srq_ring[id].mapping[i];
@@ -346,7 +346,7 @@ static void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space,
 void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
-	unsigned int wr_id = wc->wr_id & ~IPOIB_CM_OP_SRQ;
+	unsigned int wr_id = wc->wr_id & ~(IPOIB_OP_CM | IPOIB_OP_RECV);
 	struct sk_buff *skb;
 	struct ipoib_cm_rx *p;
 	unsigned long flags;
@@ -433,7 +433,7 @@ static inline int post_send(struct ipoib_dev_priv *priv,
 	priv->tx_sge.addr             = addr;
 	priv->tx_sge.length           = len;
 
-	priv->tx_wr.wr_id 	      = wr_id;
+	priv->tx_wr.wr_id 	      = wr_id | IPOIB_OP_CM;
 
 	return ib_post_send(tx->qp, &priv->tx_wr, &bad_wr);
 }
@@ -484,20 +484,19 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_
 		dev->trans_start = jiffies;
 		++tx->tx_head;
 
-		if (tx->tx_head - tx->tx_tail == ipoib_sendq_size) {
+		if (++priv->tx_outstanding == ipoib_sendq_size) {
 			ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net queue\n",
 				  tx->qp->qp_num);
 			netif_stop_queue(dev);
-			set_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags);
 		}
 	}
 }
 
-static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx,
-				  struct ib_wc *wc)
+void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
-	unsigned int wr_id = wc->wr_id;
+	struct ipoib_cm_tx *tx = wc->qp->qp_context;
+	unsigned int wr_id = wc->wr_id & ~IPOIB_OP_CM;
 	struct ipoib_tx_buf *tx_req;
 	unsigned long flags;
 
@@ -522,11 +521,10 @@ static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx
 
 	spin_lock_irqsave(&priv->tx_lock, flags);
 	++tx->tx_tail;
-	if (unlikely(test_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags)) &&
-	    tx->tx_head - tx->tx_tail <= ipoib_sendq_size >> 1) {
-		clear_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags);
+	if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) &&
+	    netif_queue_stopped(dev) &&
+	    test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
 		netif_wake_queue(dev);
-	}
 
 	if (wc->status != IB_WC_SUCCESS &&
 	    wc->status != IB_WC_WR_FLUSH_ERR) {
@@ -549,11 +547,6 @@ static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx
 			tx->neigh = NULL;
 		}
 
-		/* queue would be re-started anyway when TX is destroyed,
-		 * but it makes sense to do it ASAP here. */
-		if (test_and_clear_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags))
-			netif_wake_queue(dev);
-
 		if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) {
 			list_move(&tx->list, &priv->cm.reap_list);
 			queue_work(ipoib_workqueue, &priv->cm.reap_task);
@@ -567,19 +560,6 @@ static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx
 	spin_unlock_irqrestore(&priv->tx_lock, flags);
 }
 
-static void ipoib_cm_tx_completion(struct ib_cq *cq, void *tx_ptr)
-{
-	struct ipoib_cm_tx *tx = tx_ptr;
-	int n, i;
-
-	ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
-	do {
-		n = ib_poll_cq(cq, IPOIB_NUM_WC, tx->ibwc);
-		for (i = 0; i < n; ++i)
-			ipoib_cm_handle_tx_wc(tx->dev, tx, tx->ibwc + i);
-	} while (n == IPOIB_NUM_WC);
-}
-
 int ipoib_cm_dev_open(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
@@ -699,17 +679,18 @@ static int ipoib_cm_rep_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 	return 0;
 }
 
-static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ib_cq *cq)
+static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ipoib_cm_tx *tx)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct ib_qp_init_attr attr = {};
 	attr.recv_cq = priv->cq;
+	attr.send_cq = priv->cq;
 	attr.srq = priv->cm.srq;
 	attr.cap.max_send_wr = ipoib_sendq_size;
 	attr.cap.max_send_sge = 1;
 	attr.sq_sig_type = IB_SIGNAL_ALL_WR;
 	attr.qp_type = IB_QPT_RC;
-	attr.send_cq = cq;
+	attr.qp_context = tx;
 	return ib_create_qp(priv->pd, &attr);
 }
 
@@ -789,21 +770,7 @@ static int ipoib_cm_tx_init(struct ipoib_cm_tx *p, u32 qpn,
 		goto err_tx;
 	}
 
-	p->cq = ib_create_cq(priv->ca, ipoib_cm_tx_completion, NULL, p,
-			     ipoib_sendq_size + 1);
-	if (IS_ERR(p->cq)) {
-		ret = PTR_ERR(p->cq);
-		ipoib_warn(priv, "failed to allocate tx cq: %d\n", ret);
-		goto err_cq;
-	}
-
-	ret = ib_req_notify_cq(p->cq, IB_CQ_NEXT_COMP);
-	if (ret) {
-		ipoib_warn(priv, "failed to request completion notification: %d\n", ret);
-		goto err_req_notify;
-	}
-
-	p->qp = ipoib_cm_create_tx_qp(p->dev, p->cq);
+	p->qp = ipoib_cm_create_tx_qp(p->dev, p);
 	if (IS_ERR(p->qp)) {
 		ret = PTR_ERR(p->qp);
 		ipoib_warn(priv, "failed to allocate tx qp: %d\n", ret);
@@ -840,12 +807,8 @@ err_modify:
 err_id:
 	p->id = NULL;
 	ib_destroy_qp(p->qp);
-err_req_notify:
 err_qp:
 	p->qp = NULL;
-	ib_destroy_cq(p->cq);
-err_cq:
-	p->cq = NULL;
 err_tx:
 	return ret;
 }
@@ -854,6 +817,7 @@ static void ipoib_cm_tx_destroy(struct ipoib_cm_tx *p)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(p->dev);
 	struct ipoib_tx_buf *tx_req;
+	unsigned long flags;
 
 	ipoib_dbg(priv, "Destroy active connection 0x%x head 0x%x tail 0x%x\n",
 		  p->qp ? p->qp->qp_num : 0, p->tx_head, p->tx_tail);
@@ -864,12 +828,6 @@ static void ipoib_cm_tx_destroy(struct ipoib_cm_tx *p)
 	if (p->qp)
 		ib_destroy_qp(p->qp);
 
-	if (p->cq)
-		ib_destroy_cq(p->cq);
-
-	if (test_bit(IPOIB_FLAG_NETIF_STOPPED, &p->flags))
-		netif_wake_queue(p->dev);
-
 	if (p->tx_ring) {
 		while ((int) p->tx_tail - (int) p->tx_head < 0) {
 			tx_req = &p->tx_ring[p->tx_tail & (ipoib_sendq_size - 1)];
@@ -877,6 +835,12 @@ static void ipoib_cm_tx_destroy(struct ipoib_cm_tx *p)
 					 DMA_TO_DEVICE);
 			dev_kfree_skb_any(tx_req->skb);
 			++p->tx_tail;
+			spin_lock_irqsave(&priv->tx_lock, flags);
+			if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) &&
+			    netif_queue_stopped(p->dev) &&
+			    test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
+				netif_wake_queue(p->dev);
+			spin_unlock_irqrestore(&priv->tx_lock, flags);
 		}
 
 		kfree(p->tx_ring);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index f2aa923..19a3d3e 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -266,11 +266,10 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
 
 	spin_lock_irqsave(&priv->tx_lock, flags);
 	++priv->tx_tail;
-	if (unlikely(test_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags)) &&
-	    priv->tx_head - priv->tx_tail <= ipoib_sendq_size >> 1) {
-		clear_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags);
+	if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) &&
+	    netif_queue_stopped(dev) &&
+	    test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
 		netif_wake_queue(dev);
-	}
 	spin_unlock_irqrestore(&priv->tx_lock, flags);
 
 	if (wc->status != IB_WC_SUCCESS &&
@@ -282,12 +281,17 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
 
 static void ipoib_ib_handle_wc(struct net_device *dev, struct ib_wc *wc)
 {
-	if (wc->wr_id & IPOIB_CM_OP_SRQ)
-		ipoib_cm_handle_rx_wc(dev, wc);
-	else if (wc->wr_id & IPOIB_OP_RECV)
-		ipoib_ib_handle_rx_wc(dev, wc);
-	else
-		ipoib_ib_handle_tx_wc(dev, wc);
+	if (wc->wr_id & IPOIB_OP_CM) {
+		if (wc->wr_id & IPOIB_OP_RECV)
+			ipoib_cm_handle_rx_wc(dev, wc);
+		else
+			ipoib_cm_handle_tx_wc(dev, wc);
+	} else {
+		if (wc->wr_id & IPOIB_OP_RECV)
+			ipoib_ib_handle_rx_wc(dev, wc);
+		else
+			ipoib_ib_handle_tx_wc(dev, wc);
+	}
 }
 
 void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr)
@@ -370,10 +374,9 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb,
 		address->last_send = priv->tx_head;
 		++priv->tx_head;
 
-		if (priv->tx_head - priv->tx_tail == ipoib_sendq_size) {
+		if (++priv->tx_outstanding == ipoib_sendq_size) {
 			ipoib_dbg(priv, "TX ring full, stopping kernel net queue\n");
 			netif_stop_queue(dev);
-			set_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags);
 		}
 	}
 }
@@ -549,6 +552,7 @@ int ipoib_ib_dev_stop(struct net_device *dev)
 						    DMA_TO_DEVICE);
 				dev_kfree_skb_any(tx_req->skb);
 				++priv->tx_tail;
+				--priv->tx_outstanding;
 			}
 
 			for (i = 0; i < ipoib_recvq_size; ++i) {
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 19e82db..7c7b136 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -900,7 +900,7 @@ int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port)
 		goto out_rx_ring_cleanup;
 	}
 
-	/* priv->tx_head & tx_tail are already 0 */
+	/* priv->tx_head, tx_tail & tx_outstanding are already 0 */
 
 	if (ipoib_ib_dev_init(dev, ca, port))
 		goto out_tx_ring_cleanup;

-- 
MST
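
The stop/wake accounting in the hunks above can be modeled in userspace.
This is a minimal sketch, not the driver code: struct tx_model, tx_post()
and tx_complete() are hypothetical names, the boolean stands in for
netif_queue_stopped(), and the IPOIB_FLAG_ADMIN_UP test and the tx_lock
locking are left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the TX accounting; not the real ipoib structures. */
struct tx_model {
	unsigned int sendq_size;   /* ring size, assumed to be a power of two */
	unsigned int outstanding;  /* sends posted but not yet completed */
	bool stopped;              /* stands in for netif_queue_stopped() */
};

/* Post one send; stop the queue when the ring becomes full. */
void tx_post(struct tx_model *m)
{
	assert(m->outstanding < m->sendq_size);
	if (++m->outstanding == m->sendq_size)
		m->stopped = true;
}

/* Complete one send; wake the queue once it has drained to half full. */
void tx_complete(struct tx_model *m)
{
	assert(m->outstanding > 0);
	if (--m->outstanding == m->sendq_size >> 1 && m->stopped)
		m->stopped = false;
}
```

With a ring of 8 entries the queue stops on the 8th post and is woken by
the completion that brings the count down to 4, which is the hysteresis
the single tx_outstanding counter is meant to provide for both the
datagram and the connected-mode paths.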


From swise at opengridcomputing.com  Sun Feb 11 13:14:49 2007
From: swise at opengridcomputing.com (Steve WIse)
Date: Sun, 11 Feb 2007 15:14:49 -0600
Subject: [openib-general] [PATCH] iw_cxgb3 Change cxio semaphore to mutex.
Message-ID: <1171228489.4027.4.camel@linux-q667.site>


From: Steve Wise <swise at opengridcomputing.com>

Change cxio semaphore to mutex.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/cxio_hal.c |   10 +++++-----
 drivers/infiniband/hw/cxgb3/cxio_hal.h |    4 ++--
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.c b/drivers/infiniband/hw/cxgb3/cxio_hal.c
index 19553b3..de3cb15 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_hal.c
+++ b/drivers/infiniband/hw/cxgb3/cxio_hal.c
@@ -30,9 +30,9 @@
  * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
  * SOFTWARE.
  */
-#include <asm/semaphore.h>
 #include <asm/delay.h>
 
+#include <linux/mutex.h>
 #include <linux/netdevice.h>
 #include <linux/sched.h>
 #include <linux/spinlock.h>
@@ -527,7 +527,7 @@ static int cxio_hal_init_ctrl_qp(struct 
 	memset(rdev_p->ctrl_qp.workq, 0,
 	       (1 << T3_CTRL_QP_SIZE_LOG2) * sizeof(union t3_wr));
 
-	init_MUTEX(&rdev_p->ctrl_qp.sem);
+	mutex_init(&rdev_p->ctrl_qp.lock);
 	init_waitqueue_head(&rdev_p->ctrl_qp.waitq);
 
 	/* update HW Ctrl QP context */
@@ -570,7 +570,7 @@ static int cxio_hal_destroy_ctrl_qp(stru
 
 /* write len bytes of data into addr (32B aligned address)
  * If data is NULL, clear len byte of memory to zero.
- * caller aquires the sem before the call
+ * caller aquires the ctrl_qp lock before the call
  */
 static int cxio_hal_ctrl_qp_write_mem(struct cxio_rdev *rdev_p, u32 addr,
 				      u32 len, void *data, int completion)
@@ -705,7 +705,7 @@ static int __cxio_tpt_op(struct cxio_rde
 		}
 	}
 
-	down_interruptible(&rdev_p->ctrl_qp.sem);
+	mutex_lock(&rdev_p->ctrl_qp.lock);
 
 	/* write PBL first if any - update pbl only if pbl list exist */
 	if (pbl) {
@@ -752,7 +752,7 @@ static int __cxio_tpt_op(struct cxio_rde
 		cxio_hal_put_stag(rdev_p->rscp, stag_idx);
 ret:
 	wptr = rdev_p->ctrl_qp.wptr;
-	up(&rdev_p->ctrl_qp.sem);
+	mutex_unlock(&rdev_p->ctrl_qp.lock);
 	if (!err)
 		if (wait_event_interruptible(rdev_p->ctrl_qp.waitq,
 					     SEQ32_GE(rdev_p->ctrl_qp.rptr,
diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.h b/drivers/infiniband/hw/cxgb3/cxio_hal.h
index 8fb2999..1b97e80 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_hal.h
+++ b/drivers/infiniband/hw/cxgb3/cxio_hal.h
@@ -62,8 +62,8 @@ #define T3_MAX_DEV_NAME_LEN 32
 struct cxio_hal_ctrl_qp {
 	u32 wptr;
 	u32 rptr;
-	struct semaphore sem;	/* for the wtpr, can sleep */
-	wait_queue_head_t waitq;	/* wait for RspQ/CQE msg */
+	struct mutex lock;	/* for the wtpr, can sleep */
+	wait_queue_head_t waitq;/* wait for RspQ/CQE msg */
 	union t3_wr *workq;	/* the work request queue */
 	dma_addr_t dma_addr;	/* pci bus address of the workq */
 	DECLARE_PCI_UNMAP_ADDR(mapping)


From rdreier at cisco.com  Sun Feb 11 15:11:38 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Sun, 11 Feb 2007 15:11:38 -0800
Subject: [openib-general] [PATCH] iw_cxgb3 Change cxio semaphore to
	mutex.
In-Reply-To: <1171228489.4027.4.camel@linux-q667.site> (Steve WIse's
	message of "Sun, 11 Feb 2007 15:14:49 -0600")
References: <1171228489.4027.4.camel@linux-q667.site>
Message-ID: <adabqk0b87p.fsf@cisco.com>

Thanks, applied along with the following warning cleanup for archs
where u64 is unsigned long instead of unsigned long long:

diff --git a/drivers/infiniband/hw/cxgb3/cxio_dbg.c b/drivers/infiniband/hw/cxgb3/cxio_dbg.c
index dfaa704..5a7306f 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_dbg.c
+++ b/drivers/infiniband/hw/cxgb3/cxio_dbg.c
@@ -62,7 +62,7 @@ void cxio_dump_tpt(struct cxio_rdev *rdev, u32 stag)
 
 	data = (u64 *)m->buf;
 	while (size > 0) {
-		PDBG("TPT %08x: %016llx\n", m->addr, (u64)*data);
+		PDBG("TPT %08x: %016llx\n", m->addr, (unsigned long long) *data);
 		size -= 8;
 		data++;
 		m->addr += 8;
@@ -100,7 +100,7 @@ void cxio_dump_pbl(struct cxio_rdev *rdev, u32 pbl_addr, uint len, u8 shift)
 
 	data = (u64 *)m->buf;
 	while (size > 0) {
-		PDBG("PBL %08x: %016llx\n", m->addr, (u64)*data);
+		PDBG("PBL %08x: %016llx\n", m->addr, (unsigned long long) *data);
 		size -= 8;
 		data++;
 		m->addr += 8;
@@ -116,7 +116,8 @@ void cxio_dump_wqe(union t3_wr *wqe)
 	if (size == 0)
 		size = 8;
 	while (size > 0) {
-		PDBG("WQE %p: %016llx\n", data, be64_to_cpu(*data));
+		PDBG("WQE %p: %016llx\n", data,
+		     (unsigned long long) be64_to_cpu(*data));
 		size--;
 		data++;
 	}
@@ -128,7 +129,8 @@ void cxio_dump_wce(struct t3_cqe *wce)
 	int size = sizeof(*wce);
 
 	while (size > 0) {
-		PDBG("WCE %p: %016llx\n", data, be64_to_cpu(*data));
+		PDBG("WCE %p: %016llx\n", data,
+		     (unsigned long long) be64_to_cpu(*data));
 		size -= 8;
 		data++;
 	}
@@ -159,7 +161,7 @@ void cxio_dump_rqt(struct cxio_rdev *rdev, u32 hwtid, int nents)
 
 	data = (u64 *)m->buf;
 	while (size > 0) {
-		PDBG("RQT %08x: %016llx\n", m->addr, (u64)*data);
+		PDBG("RQT %08x: %016llx\n", m->addr, (unsigned long long) *data);
 		size -= 8;
 		data++;
 		m->addr += 8;
diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.c b/drivers/infiniband/hw/cxgb3/cxio_hal.c
index 19553b3..0531b94 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_hal.c
+++ b/drivers/infiniband/hw/cxgb3/cxio_hal.c
@@ -298,7 +298,7 @@ int cxio_create_qp(struct cxio_rdev *rdev_p, u32 kernel_domain,
 		wq->udb = (u64)rdev_p->rnic_info.udbell_physbase +
 					(wq->qpid << rdev_p->qpshift);
 	PDBG("%s qpid 0x%x doorbell 0x%p udb 0x%llx\n", __FUNCTION__,
-	     wq->qpid, wq->doorbell, wq->udb);
+	     wq->qpid, wq->doorbell, (unsigned long long) wq->udb);
 	return 0;
 err4:
 	kfree(wq->sq);
@@ -553,8 +553,8 @@ static int cxio_hal_init_ctrl_qp(struct cxio_rdev *rdev_p)
 	wqe->ctx1 = cpu_to_be64(ctx1);
 	wqe->ctx0 = cpu_to_be64(ctx0);
 	PDBG("CtrlQP dma_addr 0x%llx workq %p size %d\n",
-	     (u64) rdev_p->ctrl_qp.dma_addr, rdev_p->ctrl_qp.workq,
-	     1 << T3_CTRL_QP_SIZE_LOG2);
+	     (unsigned long long) rdev_p->ctrl_qp.dma_addr,
+	     rdev_p->ctrl_qp.workq, 1 << T3_CTRL_QP_SIZE_LOG2);
 	skb->priority = CPL_PRIORITY_CONTROL;
 	return (cxgb3_ofld_send(rdev_p->t3cdev_p, skb));
 }
diff --git a/drivers/infiniband/hw/cxgb3/iwch_cq.c b/drivers/infiniband/hw/cxgb3/iwch_cq.c
index 3d7c96f..98b3bdb 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cq.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cq.c
@@ -87,7 +87,7 @@ static int iwch_poll_cq_one(struct iwch_dev *rhp, struct iwch_cq *chp,
 	     "lo 0x%x cookie 0x%llx\n", __FUNCTION__,
 	     CQE_QPID(cqe), CQE_TYPE(cqe),
 	     CQE_OPCODE(cqe), CQE_STATUS(cqe), CQE_WRID_HI(cqe),
-	     CQE_WRID_LOW(cqe), cookie);
+	     CQE_WRID_LOW(cqe), (unsigned long long) cookie);
 
 	if (CQE_TYPE(cqe) == 0) {
 		if (!CQE_STATUS(cqe))
diff --git a/drivers/infiniband/hw/cxgb3/iwch_mem.c b/drivers/infiniband/hw/cxgb3/iwch_mem.c
index 5909ec5..2b6cd53 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_mem.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_mem.c
@@ -163,7 +163,9 @@ int build_phys_page_list(struct ib_phys_buf *buffer_list,
 			    ((u64) j << *shift));
 
 	PDBG("%s va 0x%llx mask 0x%llx shift %d len %lld pbl_size %d\n",
-	     __FUNCTION__, *iova_start, mask, *shift, *total_size, *npages);
+	     __FUNCTION__, (unsigned long long) *iova_start,
+	     (unsigned long long) mask, *shift, (unsigned long long) *total_size,
+	     *npages);
 
 	return 0;
 
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index d02cd72..549de0a 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -213,7 +213,7 @@ static struct ib_cq *iwch_create_cq(struct ib_device *ibdev, int entries,
 	}
 	PDBG("created cqid 0x%0x chp %p size 0x%0x, dma_addr 0x%0llx\n",
 	     chp->cq.cqid, chp, (1 << chp->cq.size_log2),
-	     (u64)chp->cq.dma_addr);
+	     (unsigned long long) chp->cq.dma_addr);
 	return &chp->ibcq;
 }
 
@@ -323,7 +323,7 @@ static int iwch_mmap(struct ib_ucontext *context, struct vm_area_struct *vma)
 	struct iwch_ucontext *ucontext;
 
 	PDBG("%s off 0x%lx addr 0x%llx len %d\n", __FUNCTION__, vma->vm_pgoff,
-	     pgaddr, len);
+	     (unsigned long long) pgaddr, len);
 
 	if (vma->vm_start & (PAGE_SIZE-1)) {
 	        return -EINVAL;
@@ -873,7 +873,8 @@ static struct ib_qp *iwch_create_qp(struct ib_pd *pd,
 	PDBG("%s sq_num_entries %d, rq_num_entries %d "
 	     "qpid 0x%0x qhp %p dma_addr 0x%llx size %d\n",
 	     __FUNCTION__, qhp->attr.sq_num_entries, qhp->attr.rq_num_entries,
-	     qhp->wq.qpid, qhp, (u64)qhp->wq.dma_addr, 1 << qhp->wq.size_log2);
+	     qhp->wq.qpid, qhp, (unsigned long long) qhp->wq.dma_addr,
+	     1 << qhp->wq.size_log2);
 	return &qhp->ibqp;
 }
 
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h
index b2eb29e..5680d82 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.h
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h
@@ -212,8 +212,8 @@ static inline struct iwch_mm_entry *remove_mmap(struct iwch_ucontext *ucontext,
 		if (mm->addr == addr && mm->len == len) {
 			list_del_init(&mm->entry);
 			spin_unlock(&ucontext->mmap_lock);
-			PDBG("%s addr 0x%llx len %d\n", __FUNCTION__, mm->addr,
-			     mm->len);
+			PDBG("%s addr 0x%llx len %d\n", __FUNCTION__,
+			     (unsigned long long) mm->addr, mm->len);
 			return mm;
 		}
 	}
@@ -225,7 +225,8 @@ static inline void insert_mmap(struct iwch_ucontext *ucontext,
 			       struct iwch_mm_entry *mm)
 {
 	spin_lock(&ucontext->mmap_lock);
-	PDBG("%s addr 0x%llx len %d\n", __FUNCTION__, mm->addr, mm->len);
+	PDBG("%s addr 0x%llx len %d\n", __FUNCTION__,
+	     (unsigned long long) mm->addr, mm->len);
 	list_add_tail(&mm->entry, &ucontext->mmaps);
 	spin_unlock(&ucontext->mmap_lock);
 }
diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c
index 8b44b69..e066727 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_qp.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c
@@ -329,7 +329,7 @@ int iwch_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 			       Q_GENBIT(qhp->wq.wptr, qhp->wq.size_log2),
 			       0, t3_wr_flit_cnt);
 		PDBG("%s cookie 0x%llx wq idx 0x%x swsq idx %ld opcode %d\n",
-		     __FUNCTION__, wr->wr_id, idx,
+		     __FUNCTION__, (unsigned long long) wr->wr_id, idx,
 		     Q_PTR2IDX(qhp->wq.sq_wptr, qhp->wq.sq_size_log2),
 		     sqp->opcode);
 		wr = wr->next;
@@ -381,8 +381,8 @@ int iwch_post_receive(struct ib_qp *ibqp, struct ib_recv_wr *wr,
 			       Q_GENBIT(qhp->wq.wptr, qhp->wq.size_log2),
 			       0, sizeof(struct t3_receive_wr) >> 3);
 		PDBG("%s cookie 0x%llx idx 0x%x rq_wptr 0x%x rw_rptr 0x%x "
-		     "wqe %p \n", __FUNCTION__, wr->wr_id, idx,
-		     qhp->wq.rq_wptr, qhp->wq.rq_rptr, wqe);
+		     "wqe %p \n", __FUNCTION__, (unsigned long long) wr->wr_id,
+		     idx, qhp->wq.rq_wptr, qhp->wq.rq_rptr, wqe);
 		++(qhp->wq.rq_wptr);
 		++(qhp->wq.wptr);
 		wr = wr->next;
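
The pattern behind the cleanup above can be shown outside the kernel. This
is a minimal userspace sketch under stated assumptions: uint64_t stands in
for u64, snprintf() stands in for PDBG()/printk(), and format_tpt() is a
hypothetical helper that is not part of the driver.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical userspace stand-in for the PDBG("TPT ...") call:
 * uint64_t plays the role of u64 and snprintf() the role of printk().
 */
int format_tpt(char *buf, size_t len, uint32_t addr, uint64_t data)
{
	assert(buf != NULL && len > 0);
	/*
	 * uint64_t may be unsigned long or unsigned long long depending on
	 * the platform, so cast to the exact type that %llx expects.
	 */
	return snprintf(buf, len, "TPT %08x: %016llx",
			(unsigned int) addr, (unsigned long long) data);
}
```

Casting to (u64) does not help on the affected architectures because the
cast target is itself unsigned long there; casting to unsigned long long
matches the %llx conversion everywhere.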


From jgunthorpe at obsidianresearch.com  Sun Feb 11 15:09:35 2007
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Sun, 11 Feb 2007 16:09:35 -0700
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com>
References: <20070210004820.GS11411@obsidianresearch.com>
	<000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com>
Message-ID: <20070211230935.GT11411@obsidianresearch.com>

On Fri, Feb 09, 2007 at 06:08:34PM -0800, Sean Hefty wrote:
> >So basically what you are saying is that the TClass and FlowLabel act
> >as some kind of global dis-ambiguation that lets all SAs know that the
> >tuple <SGID,DGID,TClass,FlowLabel> MUST be matched with <LRH_A,LRH_B>
> >on each side.
> 
> Sort of...  My reasoning is that if you look at a packet traveling
> from the source QP to the destination QP, and examine the packet in
> some intermediate subnet (say between two routers), then the only
> information that it carries is the <SGID, DGID, TClass, FlowLabel>
> tuple.  This information must be sufficient to direct the routing at
> the endpoints.

Ah, I think I missed the key step in your scheme.. You plan to query
the local SM for SGID=remote DGID=local? (ie reversed from 'normal'. I
was thinking only about the SGID=local DGID=remote query direction)

Yes, I agree this works in the simple cases. Quite well in fact...
The reversed direction of the PR query is very much aligned with the
idea that the GRH is only a destination affecting thing.

Let me try to outline what I think you are proposing.
This is the diagram I am thinking of:

   SA                                                      SA'
Node1 --> (LID 1) Router A -------  Router A' (LID A) ---> Node2
      |-> (LID 2) Router A                              |
      |-> (LID 3) Router B -------  Router B' (LID B) --|

Router A and Router B are independent redundant devices, not a route
cloud of some sort. B -> A' is not a possible path.

So your idea is to do:
  PR0: Node 1 asks SA for Node1 -> Node2 reversible path.
       SA returns SLID=Node1 DLID=1, FlowLabel=Magic Reversible
       indicator. This path is used for CM GMPs, or for the
       normal non-routed CM.
  PR1: Detecting a routed situation from PR0, 
       Node 1 asks SA for Node2 -> Node1. SA returns SLID=1
       DLID=Node1 and a GRH that configures Router A to use SLID=1
       You reverse the local LIDS from that path to get the QP
       configuration.
  PR2: Node 1 asks SA' for Node1 -> Node2. SA returns SLID=A
       DLID=Node.

OK. But what if:
  PR1: Node 1 asks SA for Node2 -> Node1. SA returns SLID=3
       DLID=Node1
  PR2: Node 1 asks SA' for Node1 -> Node2. SA returns SLID=A
       DLID=Node2.

Now the LIDs don't match and the QP won't work. SA' has no idea that
SA picked Router B.
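
The failing case can be written down as a toy model. This is only a sketch
of the diagram above: the numeric LID values, egress_slid() and
qp_slid_check() are hypothetical, and the model assumes each router on
Node1's subnet hands packets to exactly one peer router port on Node2's
subnet (A to A', B to B').

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical LID values for the diagram; real LIDs are assigned by each SM. */
enum { LID_ROUTER_A_1 = 1, LID_ROUTER_A_2 = 2, LID_ROUTER_B = 3 }; /* Node1's subnet */
enum { LID_ROUTER_A_PRIME = 0xA, LID_ROUTER_B_PRIME = 0xB };       /* Node2's subnet */

/*
 * SLID that Node2 sees for a packet that left Node1's subnet through the
 * router port with the given LID (A -> A', B -> B' in the diagram).
 */
int egress_slid(int local_router_lid)
{
	assert(local_router_lid >= LID_ROUTER_A_1 &&
	       local_router_lid <= LID_ROUTER_B);
	return local_router_lid == LID_ROUTER_B ? LID_ROUTER_B_PRIME
						: LID_ROUTER_A_PRIME;
}

/*
 * Node2's QP accepts the packet only if the SLID it arrives with matches
 * the SLID that SA' returned in PR2.
 */
bool qp_slid_check(int pr1_router_lid, int pr2_slid)
{
	return egress_slid(pr1_router_lid) == pr2_slid;
}
```

In this model qp_slid_check(1, 0xA) passes, while qp_slid_check(3, 0xA)
fails: SA chose Router B but SA' told Node2 to expect Router A'.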

> It shouldn't need information about the paths used by packets on the
> remote subnet.  If a subnet has multiple routers into it, they can
> forward packets to the correct router if needed.  (Could the routers
> just forward to the end node and insert the expected SLID?)

Right, this is a good way to solve the problem. Going with the
example above, SA' returns a GRH that configures Router B' to use
SLID=A and the GRH SA returned configures Router A to use SLID=3.
Router B' and A both are faking the SLID in the LRH.

This effectively defeats the QP SLID check and everything works :>
[Like I said before, this check seems to be a misfeature]

I can think of the following downsides:
 1) Re-reading Michael Krause's email makes me think that defeating
    the QP SLID check is contrary to the spirit of IBA
 2) Routers now require a GRH->LRH translation table size that
    is proportional to all the router LIDs in the subnet, not
    just its own LIDs. [Smart selection of the Flow Label could
    mitigate this growth though]
 3) The reverse PR query method requires 3 PR queries for the simple
    case and as many as 5 if you want non-reversible paths.
 4) Some means of remote SA communication needs to be decided
    pre-standardization :< (I agree that a magic GID seems best)

But... It is the SLID faking that solves the multiple-router-path
problem, not the reverse PR. Do you think something like that could
be standardized?

I guess the big question I have is if IBA chooses to standardize some
other method, how much chance is there that it would also make this
unsupportable? Ie by preventing the remote SA communication mechanism
or by defining a reverse PR to mean something else? I could easily
imagine the reverse PR being defined as a way to ask the local SA
about the *remote* LIDs.

[Actually, if you define it that way and use a MultiPathRecord query
 then there is enough information to return working LIDs
 for both subnets. The SAs would have to communicate between
 themselves and the routers using a new protocol, but that is
 doable. This does require that a PR be defined so that the LIDs are
 relative to the subnet of the SGID - not to the local subnet!]

> I'm still trying to find a solution that doesn't violate the
> architecture as defined.  I don't see why my idea wouldn't work yet.
> It just requires some unspecified coordination between the local SA
> and local routers.

I'd also very much like to not have to change the passive side to make
this work.

But this has turned into such a complex problem it seems really hard
to predict what will pass through to standardization.. That is the
main benefit I see of the small change to the passive side. No matter
what is standardized it can be accommodated in the resulting
standard, whereas defining a PR with SGID==offsubnet to mean one thing
or another seems much more risky.

Jason


From devesh28 at gmail.com  Sun Feb 11 21:10:48 2007
From: devesh28 at gmail.com (Devesh Sharma)
Date: Mon, 12 Feb 2007 10:40:48 +0530
Subject: [openib-general] Immediate data question
In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA840362A87E@G3W0634.americas.hpqcorp.net>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
	<adatzy0qmt3.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net>
	<ada7iuwp5rr.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net>
	<adamz3pfym0.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net>
	<adahctxeds8.fsf@cisco.com>
	<6.2.0.14.2.20070208131610.085c52f8@esmail.cup.hp.com>
	<349DCDA352EACF42A0C49FA6DCEA840362A87E@G3W0634.americas.hpqcorp.net>
Message-ID: <309a667c0702112110h7a79961fv9c3cf46d4392e1d4@mail.gmail.com>

On 2/10/07, Tang, Changqing <changquing.tang at hp.com> wrote:
> > >
> > >Not for the receiver, but the sender will be severely slowed down by
> > >having to wait for the RNR timeouts.
> >
> > RNR = Receiver Not Ready so by definition, the data flow
> > isn't going to
> > progress until the receiver is ready to receive data.   If a
> > receive QP
> > enters RNR for a RC, then it is likely not progressing as
> > desired.   RNR
> > was initially put in place to enable a receiver to create
> > back pressure to the sender without causing a fatal error
> > condition.  It should rarely be entered and therefore should
> > have negligible impact on overall performance however when a
> > RNR occurs, no forward progress will occur so performance is
> > essentially zero.
>
> Mike:
>         I still do not quite understand this issue. I have two
> situations that have RNR triggered.
>
> 1. process A and process B is connected with QP. A first post a send to
> B, B does not post receive. Then A and B are doing a long time
> RDMA_WRITE each other, A and B just check memory for the RDMA_WRITE
> message. Finally B will post a receive. Does the first pending send in A
> block all the later RDMA_WRITE ?
According to the IBTA spec, the HCA processes WR entries in the strict
order in which they are posted, so the send will block all WRs posted
after it. Even if the HCA has multiple processing elements, I think the
processing order will still be maintained by the HCA.
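
The ordering argument can be sketched as a small model. It is an
illustration under assumptions, not HCA behavior taken from the spec text:
enum wr_opcode and sq_progress() are hypothetical names, the queue is a
single RC send queue processed strictly in order, and RDMA_WRITE means a
plain write without immediate data (so it consumes no receive buffer).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical work request kinds for the model. */
enum wr_opcode { WR_SEND, WR_RDMA_WRITE };

/*
 * Returns how many WRs at the head of the send queue can complete when
 * the responder has recv_posted receive buffers available. A SEND with
 * no receive posted models an RNR NAK: it stalls, and so does everything
 * queued behind it.
 */
size_t sq_progress(const enum wr_opcode *sq, size_t n, size_t recv_posted)
{
	size_t done = 0;

	assert(sq != NULL || n == 0);
	while (done < n) {
		if (sq[done] == WR_SEND) {
			if (recv_posted == 0)
				break; /* head of line is blocked */
			recv_posted--;
		}
		done++;
	}
	return done;
}
```

For the queue { SEND, RDMA_WRITE, RDMA_WRITE } with no receive posted the
function returns 0, which is case 1 in the question: none of the later
RDMA_WRITEs on that QP make progress until B posts a receive.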
> If not, since RNR is triggered
> periodically till B post receive, does it affect the RDMA_WRITE
> performance between A and B ?
>
> 2. extend above to three processes, A connect to B, B connect to C, so B
> has two QPs, but one CQ. A posts a send to B, B does not post receive,
> rather B and C are doing a long time RDMA_WRITE, or send/recv. But B
> must sends RNR periodically to A, right?. So does the pending message
> from A affects B's overall performance  between B and C ?
>
>         Thank you.
>
> --CQ
>
>
> >
> > Mike
> >
> >
> >
>
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>
>


From sweitzen at cisco.com  Sun Feb 11 21:53:17 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Sun, 11 Feb 2007 21:53:17 -0800
Subject: [openib-general] no RDS in OFED 1.2?
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA302FEF389@xmb-sjc-216.amer.cisco.com>

I don't see RDS in the feature freeze builds yet.
 
Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 

From tziporet at mellanox.co.il  Sun Feb 11 23:30:31 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Mon, 12 Feb 2007 09:30:31 +0200
Subject: [openib-general] [openfabrics-ewg] no RDS in OFED 1.2?
Message-ID: <6C2C79E72C305246B504CBA17B5500C9A0DDD8@mtlexch01.mtl.com>

Vlad is working on this.
It will be in the alpha release
 
Tziporet

________________________________

From: openfabrics-ewg-bounces at openib.org
[mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Scott
Weitzenkamp (sweitzen)
Sent: Monday, February 12, 2007 7:53 AM
To: openfabrics-ewg at openib.org
Cc: openib-general
Subject: [openfabrics-ewg] no RDS in OFED 1.2?


I don't see RDS in the feature freeze builds yet.
 
Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 

From vlad at lists.openfabrics.org  Mon Feb 12 02:24:14 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Mon, 12 Feb 2007 02:24:14 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070212-0200 daily build status
Message-ID: <20070212102414.E14F9E60806@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod 

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.12
Passed on x86_64 with linux-2.6.20
Passed on powerpc with linux-2.6.19
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.16
Passed on powerpc with linux-2.6.18
Passed on ppc64 with linux-2.6.19
Passed on x86_64 with linux-2.6.15
Passed on powerpc with linux-2.6.16
Passed on powerpc with linux-2.6.12
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.12
Passed on powerpc with linux-2.6.15
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.17
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.14
Passed on ia64 with linux-2.6.19
Passed on ppc64 with linux-2.6.13
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.12

Failed:


From halr at voltaire.com  Mon Feb 12 03:36:10 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 12 Feb 2007 06:36:10 -0500
Subject: [openib-general] Problem with install.sh openib-diags
 OFED-1.2-20070208-1508.tgz
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA302FEF308@xmb-sjc-216.amer.cisco.com>
References: <A15335FBE9BD2449AF2C9EF3D1EB8EA302FEF308@xmb-sjc-216.amer.cisco.com>
Message-ID: <1171280132.31538.409786.camel@hal.voltaire.com>

On Sun, 2007-02-11 at 13:44, Scott Weitzenkamp (sweitzen) wrote:
> I'm using install.sh on RHEL4 U3 x86_64
>  
> Preparing...               
> ##################################################
> kernel-ib-devel            
> ##################################################
> kernel-ib                  
> ##################################################
> error: Failed dependencies:
>         perl(IBswcountlimits)

This is supposed to be IBswcountlimits.pm. I think there was a change
for the location of this to be under <prefix>/lib/perl some days ago,
but I am not sure whether this change is in the OFED 1.2 install (for
OFED-1.2-20070208).

Vlad, do you know what is causing this error ?

-- Hal

>  is needed by openib-diags-1.2.0-pre1.x86_64
> ERROR: Failed executing "/bin/rpm -ihv
> /tmp/OFED-1.2-20070208-1508/RPMS/redhat-\
> release-4AS-4.1/dapl-1.2.0-0.x86_64.rpm
> /tmp/OFED-1.2-20070208-1508/RPMS/redhat\
> -release-4AS-4.1/dapl-devel-1.2.0-0.x86_64.rpm
> /tmp/OFED-1.2-20070208-1508/RPMS\
> /redhat-release-4AS-4.1/libibcommon-1.0.2-0.x86_64.rpm
> /tmp/OFED-1.2-20070208-1\
> 508/RPMS/redhat-release-4AS-4.1/libibcommon-devel-1.0.2-0.x86_64.rpm
> /tmp/OFED-\
> 1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libibmad-1.0.2-0.x86_64.rpm /tmp/\
> OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libibmad-devel-1.0.2-0.x86_6\
> 4.rpm
> /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libibumad-1.0.2-0\
> .x86_64.rpm
> /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libibumad-d\
> evel-1.0.2-0.x86_64.rpm
> /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1\
> /libibverbs-1.1-pre1.x86_64.rpm
> /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release\
> -4AS-4.1/libibverbs-devel-1.1-pre1.x86_64.rpm
> /tmp/OFED-1.2-20070208-1508/RPMS/\
> redhat-release-4AS-4.1/libibverbs-utils-1.1-pre1.x86_64.rpm
> /tmp/OFED-1.2-20070\
> 208-1508/RPMS/redhat-release-4AS-4.1/libmthca-1.0.4-pre.x86_64.rpm
> /tmp/OFED-1.\
> 2-20070208-1508/RPMS/redhat-release-4AS-4.1/libmthca-devel-1.0.4-pre.x86_64.rpm\
>  /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libopensm-3.0.1-0.x86_\
> 64.rpm
> /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libopensm-devel-\
> 3.0.1-0.x86_64.rpm
> /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libo\
> smcomp-3.0.1-0.x86_64.rpm
> /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4\
> .1/libosmcomp-devel-3.0.1-0.x86_64.rpm
> /tmp/OFED-1.2-20070208-1508/RPMS/redhat-\
> release-4AS-4.1/libosmvendor-3.0.1-0.x86_64.rpm
> /tmp/OFED-1.2-20070208-1508/RPM\
> S/redhat-release-4AS-4.1/libosmvendor-devel-3.0.1-0.x86_64.rpm
> /tmp/OFED-1.2-20\
> 070208-1508/RPMS/redhat-release-4AS-4.1/librdmacm-0.9.0-0.x86_64.rpm
> /tmp/OFED-\
> 1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/librdmacm-devel-0.9.0-0.x86_64.rp\
> m
> /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/libsdp-1.1.99-0.x86_6\
> 4.rpm
> /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/openib-diags-1.2.\
> 0-pre1.x86_64.rpm
> /tmp/OFED-1.2-20070208-1508/RPMS/redhat-release-4AS-4.1/perft\
> est-1.2-0.x86_64.rpm "
> 
>  
> Scott Weitzenkamp
> SQA and Release Manager
> Server Virtualization Business Unit
> Cisco Systems
>  
> 


From bugzilla-daemon at lists.openfabrics.org  Mon Feb 12 04:40:58 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Mon, 12 Feb 2007 04:40:58 -0800 (PST)
Subject: [openib-general] [Bug 351] New: Routing table problem in SLES10
 when using port #2
Message-ID: <bug-351-1@https.bugs.openfabrics.org/>

https://bugs.openfabrics.org/show_bug.cgi?id=351

           Summary: Routing table problem in SLES10 when using port #2
           Product: OpenFabrics Linux
           Version: 1.2
          Platform: All
        OS/Version: SLES 10
            Status: NEW
          Severity: major
          Priority: P1
         Component: IPoIB
        AssignedTo: bugzilla at openib.org
        ReportedBy: yohadd at mellanox.co.il


There is an issue with the routing table on SLES10 when using IB port #2.
After a host reboot the routing table contains two entries for 12.X.X.X!
One of the entries is correct and points to ib1; the other one points to ib0.

Route output:

sw087:~ # route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.4.0.0         *               255.255.0.0     U     0      0        0 eth0   
12.4.0.0         *               255.255.0.0     U     0      0        0 ib1 
12.4.0.0         *               255.255.0.0     U     0      0        0 ib0  
link-local      *               255.255.0.0     U     0      0        0 eth0
11.4.0.0         *               255.255.0.0     U     0      0        0 ib0
loopback        *               255.0.0.0       U     0      0        0 lo
default         10.4.0.211      0.0.0.0         UG    0      0        0 eth0

The first entry for 12.4.0.0 points to I/F ib1, so in this configuration
IPoIB over 12.4.X.X works fine.
After restarting the ib1 I/F with the ifdown/ifup commands, the routing table
changes (the order of the two 12.4.0.0 entries is swapped) and IPoIB over
12.4.X.X no longer works.
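
Why the order of two otherwise identical entries matters can be shown with
a toy lookup. This is a sketch under assumptions, not the kernel's routing
code: struct route and route_lookup() are hypothetical, and the model
simply takes the first matching entry, ignoring metrics and longest-prefix
matching (both 12.4.0.0 entries have the same mask and metric here).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical routing table entry: network, netmask, outgoing interface. */
struct route {
	uint32_t dest;
	uint32_t mask;
	const char *iface;
};

/*
 * Returns the interface of the first entry whose network matches addr,
 * or NULL if no entry matches.
 */
const char *route_lookup(const struct route *tbl, size_t n, uint32_t addr)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if ((addr & tbl[i].mask) == tbl[i].dest)
			return tbl[i].iface;
	return NULL;
}
```

With the entries ordered ib1, ib0 a lookup for an address in 12.4.0.0/16
returns ib1; with the order swapped it returns ib0, the interface on the
wrong port, which matches the behavior reported after ifdown/ifup.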

Restarting ib1 I/F:

sw087:~ # ifdown ib1
    ib1       device: Mellanox Technologies MT23108 InfiniHost (rev a1)
sw087:~ # ifup ib1
    ib1       device: Mellanox Technologies MT23108 InfiniHost (rev a1)

Route output:
sw087:~ # route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.4.0.0         *               255.255.0.0     U     0      0        0 eth0 
12.4.0.0         *               255.255.0.0     U     0      0        0 ib0    
12.4.0.0         *               255.255.0.0     U     0      0        0 ib1  
link-local      *               255.255.0.0     U     0      0        0 eth0
11.4.0.0         *               255.255.0.0     U     0      0        0 ib0
loopback        *               255.0.0.0       U     0      0        0 lo
default         10.4.0.211      0.0.0.0         UG    0      0        0 eth0


Host info:

sw087:~ # hostinfo 
Name         =sw087
IP           =10.4.3.87
CpuNum       =4
CpuVendor    =GenuineIntel
CpuModel     =                  Intel(R) Xeon(TM) CPU 3.20GHz
CpuMhz       =3200.190
MemSizeKb    =4047700
MachType     =x86_64
KernelRev    =2.6.16.21-0.8-smp
ChipSet      =Intel Corporation E7520 Memory Controller Hub (rev 0c)
Os           =Welcome to SUSE Linux Enterprise Server 10 (x86_64) - Kernel \r (\l).
IBDevsNum    =1
HCA0Name     =mthca0
HCA0Desc     =sw087 HCA-1
HCA0Type     =MT23108
HCA0FWVer    =3.5.0
HCA0PSID     =MT_0030000001
HCA0GUIDS    =NODE:0x0002c9871297bce0;SYS:0x0002c9871297bce3
HCA0Ports    =1:0x0002c9871297bce1:0x0:11.4.3.87:DOWN;2:0x0002c9871297bce2:0x0:12.4.3.87:INIT
HCA1Name     =NONE
HCA1Desc     =NONE
HCA1Type     =NONE
HCA1FWVer    =NONE
HCA1PSID     =NONE
HCA1GUIDS    =NONE
HCA1Ports    =NONE
IBStack      =/usr/local/
IBStackType  =ofed
IBStackVer   =OFED-1.2-20070211-1558
IBMPI        =/usr/local//mpi
MST_BUILD    =4.3.6
IBADM_BUILD  =IBADM 2.1.0, 20060720-1410
WRITE_BW     =/usr/local/bin/ib_write_bw




From vlad at mellanox.co.il  Mon Feb 12 05:05:56 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Mon, 12 Feb 2007 15:05:56 +0200
Subject: [openib-general] Problem with install.sh openib-diags
 OFED-1.2-20070208-1508.tgz
In-Reply-To: <1171280132.31538.409786.camel@hal.voltaire.com>
References: <A15335FBE9BD2449AF2C9EF3D1EB8EA302FEF308@xmb-sjc-216.amer.cisco.com>
	<1171280132.31538.409786.camel@hal.voltaire.com>
Message-ID: <1171285556.6265.6.camel@vladsk-laptop>

On Mon, 2007-02-12 at 06:36 -0500, Hal Rosenstock wrote:
> On Sun, 2007-02-11 at 13:44, Scott Weitzenkamp (sweitzen) wrote:
> > I'm using install.sh on RHEL4 U3 x86_64
> >  
> > Preparing...               
> > ##################################################
> > kernel-ib-devel            
> > ##################################################
> > kernel-ib                  
> > ##################################################
> > error: Failed dependencies:
> >         perl(IBswcountlimits)
> 
> This is supposed to be IBswcountlimits.pm. I think there was a change
> for the location of this to be under <prefix>/lib/perl some days ago
> but not sure whether this change is in the OFED 1.2 install (for
> OFED-1.2-20070208). 
> 
> Vlad, do you know what is causing this error ?
> 
> -- Hal

I fixed this by adding the following line to ofa_user.spec file:

Provides: perl(IBswcountlimits)


-- 
Vladimir Sokolovsky <vlad at mellanox.co.il>
Mellanox Technologies Ltd.


From suri at baymicrosystems.com  Mon Feb 12 06:27:12 2007
From: suri at baymicrosystems.com (Suresh Shelvapille)
Date: Mon, 12 Feb 2007 09:27:12 -0500
Subject: [openib-general] patches to 2.6.19.1 kernel for switch Operation
In-Reply-To: <1171288297.31538.417657.camel@hal.voltaire.com>
References: <000601c7419f$d4470c60$ff0da8c0@amr.corp.intel.com>
	<1170072757.4555.242192.camel@hal.voltaire.com>
	<039701c7494b$6bd5d860$1914a8c0@surioffice>
	<1171050441.31538.180858.camel@hal.voltaire.com>
	<048101c74c91$e0f54dd0$1914a8c0@surioffice>
	<1171288297.31538.417657.camel@hal.voltaire.com>
Message-ID: <04ba01c74eb1$e77fd180$1914a8c0@surioffice>


Hal:

> > Ref: comment on mad.c (ib_mad_recv_done_handler()).
> >
> > Even if I make the relevant changes to smi.c functions how do I get the
> > packet to get forwarded, without making additional changes in this
> > function?
> >
> > Meaning, when smi_handle_dr_smp_send() and smi_check_forward_dr_smp() are
> > called and you determine that the packet has to be forwarded instead of
> > consumed, where do you actually do the send? I think this chain is missing!
> 
> My initial thought was what I wrote but in looking at this further, as
> you point out, the SMI routines are only updating the packet and
> indicating its disposition. The actual sending needs to be elsewhere.
> I'm not sure what the code ends up looking like with the changes
> suggested and would just like this to look as clean as possible and use
> the SMI routines where appropriate here. Does this make sense ?
> 
I am not sure I follow this last statement. 
 

From suri at baymicrosystems.com  Mon Feb 12 06:13:25 2007
From: suri at baymicrosystems.com (Suresh Shelvapille)
Date: Mon, 12 Feb 2007 09:13:25 -0500
Subject: [openib-general] FW: patches to 2.6.19.1 kernel for switch Operation
Message-ID: <04b901c74eaf$f934aa10$1914a8c0@surioffice>

Just copying the list.

-----Original Message-----
From: Suresh Shelvapille [mailto:suri at baymicrosystems.com] 
Sent: Friday, February 09, 2007 4:33 PM
To: 'Hal Rosenstock'
Subject: RE: patches to 2.6.19.1 kernel for switch Operation


Hal:

Many thanks for your response,

Ref: comment on mad.c (ib_mad_recv_done_handler()).

Even if I make the relevant changes to smi.c functions how do I get the
packet to get forwarded, without making additional changes in this function?

Meaning, when smi_handle_dr_smp_send() and smi_check_forward_dr_smp() are
called and you determine that the packet has to be forwarded instead of
consumed, where do you actually do the send? I think this chain is missing!


Thanks,
Suri

> +                       if (!agent_send_response(&response->mad.mad,
> +                                                &response->grh, wc,
> +                                                port_priv->device,
> +                                                port_num,
> +                                                qp_info->qp->qp_num))
> +                               response = NULL;
> 
> Per the above change, it appears that smi_check_forward_dr_smp and
> smi_handle_dr_smp_send are no longer used at least here
> (smi_check_forward_dr_smp is not used at all with this change). Couldn't
> these be fixed to do the right thing for this case (as well as existing
> cases) ? I'm not sure your changes work for end ports (CA and router
> ports).
> 
> Also, based on smi comments below, there might also be changes to
> following:
> +                       if (!ib_get_smp_direction(&recv->mad.smp))
> +                               port_num = recv->mad.smp.initial_path[recv->mad.smp.hop_ptr+1];
> +                       else
> +                               port_num = recv->mad.smp.return_path[recv->mad.smp.hop_ptr-1];
> +
> 


From halr at voltaire.com  Mon Feb 12 06:43:26 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 12 Feb 2007 09:43:26 -0500
Subject: [openib-general] FW: patches to 2.6.19.1 kernel for switch
 Operation
In-Reply-To: <04b901c74eaf$f934aa10$1914a8c0@surioffice>
References: <04b901c74eaf$f934aa10$1914a8c0@surioffice>
Message-ID: <1171291368.31538.420357.camel@hal.voltaire.com>

On Mon, 2007-02-12 at 09:13, Suresh Shelvapille wrote:
> Just copying the list.
> 
> -----Original Message-----
> From: Suresh Shelvapille [mailto:suri at baymicrosystems.com] 
> Sent: Friday, February 09, 2007 4:33 PM
> To: 'Hal Rosenstock'
> Subject: RE: patches to 2.6.19.1 kernel for switch Operation
> 
> 
> 
> Hal:
> 
> Many thanks for your response,
> 
> Ref: comment on mad.c (ib_mad_recv_done_handler()).
> 
> Even if I make the relevant changes to smi.c functions how do I get the
> packet to get forwarded, without making additional changes in this function?
> 
> Meaning, when smi_handle_dr_smp_send() and smi_check_forward_dr_smp() are
> called and you determine that the packet has to be forwarded instead of
> consumed, where do you actually do the send? I think this chain is missing!

My initial thought was what I wrote but in looking at this further, as
you point out, the SMI routines are only updating the packet and
indicating its disposition. The actual sending needs to be elsewhere.
I'm not sure what the code ends up looking like with the changes
suggested and would just like this to look as clean as possible and use
the SMI routines where appropriate here. Does this make sense ?
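
The split between "decide the disposition" and "do the send" might look
roughly like the sketch below. It is a model in Python, not the mad.c/smi.c
code: Disposition, DrSmp, check_forward, egress_port and recv_done are
hypothetical names, the forwarding test is a simplified assumption, and only
the port selection follows the quoted diff (initial_path[hop_ptr+1] when
outgoing, return_path[hop_ptr-1] when returning).

```python
from dataclasses import dataclass
from enum import Enum


class Disposition(Enum):
    # Hypothetical names, not the kernel's.
    CONSUME = "consume"
    FORWARD = "forward"


@dataclass
class DrSmp:
    hop_ptr: int
    hop_cnt: int
    returning: bool      # False = outgoing, True = returning
    initial_path: list
    return_path: list


def check_forward(smp):
    """Classify only: decide what should happen, do not send anything."""
    # Simplified assumption: only intermediate hops are forwarded.
    if not smp.returning:
        intermediate = 0 < smp.hop_ptr < smp.hop_cnt
    else:
        intermediate = 2 <= smp.hop_ptr <= smp.hop_cnt
    return Disposition.FORWARD if intermediate else Disposition.CONSUME


def egress_port(smp):
    """Port choice as written in the quoted diff."""
    if not smp.returning:
        return smp.initial_path[smp.hop_ptr + 1]
    return smp.return_path[smp.hop_ptr - 1]


def recv_done(smp, send):
    """Caller-side dispatch: the actual send lives here, not in the SMI check."""
    action = check_forward(smp)
    if action is Disposition.FORWARD:
        send(egress_port(smp), smp)
    return action


sent_ports = []
outgoing = DrSmp(1, 3, False, [0, 4, 7, 2], [0, 0, 0, 0])
print(recv_done(outgoing, lambda port, smp: sent_ports.append(port)), sent_ports)
```

The point of the sketch is only the layering: the check stays side-effect
free, and the receive handler owns the send.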

-- Hal

> Thanks,
> Suri
> 
> > +                       if (!agent_send_response(&response->mad.mad,
> > +                                                &response->grh, wc,
> > +                                                port_priv->device,
> > +                                                port_num,
> > +                                                qp_info->qp->qp_num))
> > +                               response = NULL;
> > 
> > Per the above change, it appears that smi_check_forward_dr_smp and
> > smi_handle_dr_smp_send are no longer used at least here
> > (smi_check_forward_dr_smp is not used at all with this change). Couldn't
> > these be fixed to do the right thing for this case (as well as existing
> > cases) ? I'm not sure your changes work for end ports (CA and router
> > ports).
> > 
> > Also, based on smi comments below, there might also be changes to
> > following:
> > +                       if (!ib_get_smp_direction(&recv->mad.smp))
> > +                               port_num = recv->mad.smp.initial_path[recv->mad.smp.hop_ptr+1];
> > +                       else
> > +                               port_num = recv->mad.smp.return_path[recv->mad.smp.hop_ptr-1];
> > +
> > 
> 
> 
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> 


From swise at opengridcomputing.com  Mon Feb 12 06:59:14 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Mon, 12 Feb 2007 08:59:14 -0600
Subject: [openib-general] OFED 1.2 build problem
Message-ID: <1171292354.16167.9.camel@stevo-desktop>

Dunno if this has already been resolved?

Building the 20070208-1508 OFED 1.2 kit.
RHEL4U4 with that distro's kernel.
Ran build.sh and selected "all".

It fails building ipath:

/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/hw/ipath/ipath_diag.c:44:22: linux/io.h: No such file or directory
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/hw/ipath/ipath_diag.c: In function `ipath_diag_open':
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/hw/ipath/ipath_diag.c:283: warning: implicit declaration of function `mutex_lock'
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/hw/ipath/ipath_diag.c:308: warning: implicit declaration of function `mutex_unlock'
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/hw/ipath/ipath_diag.c: In function `ipath_diagpkt_write':
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/hw/ipath/ipath_diag.c:429: warning: implicit declaration of function `__iowrite32_copy'
make[4]: *** [/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/hw/ipath/ipath_diag.o] Error 1
make[3]: *** [/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/hw/ipath] Error 2
make[2]: *** [/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband] Error 2
make[1]: *** [_module_/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2] Error 2
make[1]: Leaving directory `/usr/src/kernels/2.6.9-42.EL-smp-x86_64'
make: *** [kernel] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.66790 (%install)


From halr at voltaire.com  Mon Feb 12 07:18:46 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 12 Feb 2007 10:18:46 -0500
Subject: [openib-general] [PATCH TRIVIAL] opensm: remove #ifdef __WIN__
 in not shared file.
In-Reply-To: <20070208231412.GA22807@sashak.voltaire.com>
References: <20070208231412.GA22807@sashak.voltaire.com>
Message-ID: <1171293506.31538.422142.camel@hal.voltaire.com>

On Thu, 2007-02-08 at 18:14, Sasha Khapyorsky wrote:
> opensm/main.c is not shared by win OpenSM, and #ifdef __WIN__ is not
> needed here.
> 
> Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>

Thanks. Applied (to both master and ofed_1_2).

-- Hal


From mshefty at ichips.intel.com  Mon Feb 12 08:19:34 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 12 Feb 2007 08:19:34 -0800
Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in
 cm_conn_req_handler()
In-Reply-To: <1171135423.11017.61.camel@stevo-desktop>
References: <OFB574507E.6CAEE34E-ON6525727D.0013194D-6525727D.00132A8C@in.ibm.com>
	<ada7iusm021.fsf@cisco.com> <1171035668.26453.11.camel@trinity.ogc.int>
	<1171135423.11017.61.camel@stevo-desktop>
Message-ID: <45D09396.5060603@ichips.intel.com>

> This design is based on the RDMA_CM and IB_CM behavior.  If the app
> issues the destroy via rdma_destroy_cm_id, then we block that thread
> until all references are gone.  If the app returns non-zero in a
> callback for a given cm_id, then the CM owns destroying the cm_id and
> the application is done with it. That's the short of it.  Here's the
> long of it:

Note that the goal of this behavior is simply to ensure that no thread will 
touch any code in their callback after destroying their cm_id.  That is all that 
needs to be guaranteed, if this helps any.
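
One way to picture that guarantee is the following sketch. It is a userspace
model in Python under stated assumptions, not the RDMA CM or IWCM
implementation: CmId, deliver and destroy are hypothetical names, and the
case where the handler returns non-zero (so the CM destroys the id itself)
is left out.

```python
import threading


class CmId:
    """Model: destroy() blocks until running callbacks finish, and no
    callback starts once destroy() has been called."""

    def __init__(self, handler):
        self._handler = handler
        self._cond = threading.Condition()
        self._in_flight = 0
        self._destroyed = False

    def deliver(self, event):
        """Run the handler unless the id is destroyed; report whether it ran."""
        with self._cond:
            if self._destroyed:
                return False
            self._in_flight += 1
        try:
            self._handler(event)
        finally:
            with self._cond:
                self._in_flight -= 1
                self._cond.notify_all()
        return True

    def destroy(self):
        # Must not be called from inside the handler in this model:
        # the in-flight count would never drop to zero.
        with self._cond:
            self._destroyed = True
            while self._in_flight:
                self._cond.wait()


seen = []
cm_id = CmId(seen.append)
cm_id.deliver("connect request")
cm_id.destroy()
cm_id.deliver("late event")   # dropped, the handler is not called
print(seen)
```

After destroy() returns, the owner can free whatever the handler uses,
because no handler invocation is running or can start.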

- Sean


From swise at opengridcomputing.com  Mon Feb 12 08:20:07 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Mon, 12 Feb 2007 10:20:07 -0600
Subject: [openib-general] [PATCH] ofed_1-2 IWCM - Set iniator depth and
 responder resources to device max values.
In-Reply-To: <1171223899.4027.1.camel@linux-q667.site>
References: <1171223899.4027.1.camel@linux-q667.site>
Message-ID: <1171297207.16167.24.camel@stevo-desktop>

BTW:  We need this for the alpha1 build or DAPL applications won't work
over iWARP devices.

Steve.

On Sun, 2007-02-11 at 13:58 -0600, Steve Wise wrote:
> IWCM - Set initiator depth and responder resources to device max values.
> 
> For OFED 1.2, the IWCM will set the initiator depth and responder
> resources to the device max values for new connect request events.
> 
>     
> Signed-off-by: Steve Wise <swise at opengridcomputing.com>
> ---
> 
>  kernel_patches/fixes/iwcm_ordird.patch |   43 ++++++++++++++++++++++++++++++++
>  1 files changed, 43 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel_patches/fixes/iwcm_ordird.patch b/kernel_patches/fixes/iwcm_ordird.patch
> new file mode 100644
> index 0000000..3a9f643
> --- /dev/null
> +++ b/kernel_patches/fixes/iwcm_ordird.patch
> @@ -0,0 +1,43 @@
> +commit 7175034c7adf6b5fb5ba311929376af7501387a1
> +Author: Steve Wise <swise at opengridcomputing.com>
> +Date:   Sat Feb 10 14:16:35 2007 -0600
> +
> +    IWCM - Set iniator depth and responder resources to device max values.
> +    
> +    For OFED 1.2, the IWCM will set the initiator depth and responder
> +    resources to the device max values for new connect request events.
> +    
> +    Signed-off-by: Steve Wise <swise at opengridcomputing.com>
> +
> +diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> +index 9e0ab04..e3afdf8 100644
> +--- a/drivers/infiniband/core/cma.c
> ++++ b/drivers/infiniband/core/cma.c
> +@@ -1137,6 +1137,7 @@ static int iw_conn_req_handler(struct iw
> + 	struct net_device *dev = NULL;
> + 	struct rdma_cm_event event;
> + 	int ret;
> ++	struct ib_device_attr attr;
> + 
> + 	listen_id = cm_id->context;
> + 	atomic_inc(&listen_id->dev_remove);
> +@@ -1189,10 +1190,19 @@ static int iw_conn_req_handler(struct iw
> + 	sin = (struct sockaddr_in *) &new_cm_id->route.addr.dst_addr;
> + 	*sin = iw_event->remote_addr;
> + 
> ++	ret = ib_query_device(conn_id->id.device, &attr);
> ++	if (ret) {
> ++		cma_release_remove(conn_id);
> ++		rdma_destroy_id(new_cm_id);
> ++		goto out;
> ++	}
> ++	
> + 	memset(&event, 0, sizeof event);
> + 	event.event = RDMA_CM_EVENT_CONNECT_REQUEST;
> + 	event.param.conn.private_data = iw_event->private_data;
> + 	event.param.conn.private_data_len = iw_event->private_data_len;
> ++	event.param.conn.initiator_depth = attr.max_qp_init_rd_atom;	
> ++	event.param.conn.responder_resources = attr.max_qp_rd_atom;
> + 	ret = conn_id->id.event_handler(&conn_id->id, &event);
> + 	if (ret) {
> + 		/* User wants to destroy the CM ID */
> 
> 


From changquing.tang at hp.com  Mon Feb 12 08:21:57 2007
From: changquing.tang at hp.com (Tang, Changqing)
Date: Mon, 12 Feb 2007 16:21:57 -0000
Subject: [openib-general] Immediate data question
In-Reply-To: <309a667c0702112110h7a79961fv9c3cf46d4392e1d4@mail.gmail.com>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
	<adatzy0qmt3.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net>
	<ada7iuwp5rr.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net>
	<adamz3pfym0.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net>
	<adahctxeds8.fsf@cisco.com>
	<6.2.0.14.2.20070208131610.085c52f8@esmail.cup.hp.com>
	<349DCDA352EACF42A0C49FA6DCEA840362A87E@G3W0634.americas.hpqcorp.net>
	<309a667c0702112110h7a79961fv9c3cf46d4392e1d4@mail.gmail.com>
Message-ID: <349DCDA352EACF42A0C49FA6DCEA8403685756@G3W0634.americas.hpqcorp.net>

 
> > 1. Processes A and B are connected with a QP. A first posts a send
> > to B, but B does not post a receive. Then A and B do a long series of
> > RDMA_WRITEs to each other, and A and B just check memory for the
> > RDMA_WRITE message. Finally B posts a receive. Does the first pending
> > send in A block all the later RDMA_WRITEs?
> According to the IBTA spec, the HCA processes WR entries in the strict
> order in which they are posted, so the send will block all WRs posted
> after it. Even if the HCA has multiple processing elements, I think the
> processing order will still be maintained by the HCA.

Thanks, so I cannot use such a code style.
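
The ordering claim in the quoted reply can be modelled with a few lines. This
is a simplified sketch that assumes strictly in-order processing of one send
queue and ignores RNR retry counters and timers; drain is a hypothetical
helper, not a verbs call.

```python
from collections import deque


def drain(send_queue, receives_posted):
    """Process WRs in posting order; stop at a SEND with no receive posted."""
    completed = []
    while send_queue:
        wr = send_queue[0]
        if wr == "SEND":
            if receives_posted == 0:
                break  # head of queue stalls, everything behind it waits
            receives_posted -= 1
        completed.append(send_queue.popleft())
    return completed


queue = deque(["SEND", "RDMA_WRITE", "RDMA_WRITE"])
print(drain(queue, receives_posted=0))   # nothing completes yet
print(drain(queue, receives_posted=1))   # the whole queue drains in order
```

In this model the RDMA_WRITEs queued behind the SEND make no progress until
B posts its receive, which is the situation described in question 1.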


> >
> > 2. Extend the above to three processes: A connects to B and B connects
> > to C, so B has two QPs but one CQ. A posts a send to B, and B does not
> > post a receive; instead B and C do a long series of RDMA_WRITEs or
> > send/recv. But B must send RNR periodically to A, right? So does the
> > pending message from A affect B's overall performance between B
> > and C?

Do you have any idea about this second situation ?

--CQ


> >
> >         Thank you.
> >
> > --CQ
> >
> >
> > >
> > > Mike
> > >
> > >
> > >
> >
> 


From tom at opengridcomputing.com  Mon Feb 12 08:37:33 2007
From: tom at opengridcomputing.com (Tom Tucker)
Date: Mon, 12 Feb 2007 10:37:33 -0600
Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in
 cm_conn_req_handler()
In-Reply-To: <45D09396.5060603@ichips.intel.com>
References: <OFB574507E.6CAEE34E-ON6525727D.0013194D-6525727D.00132A8C@in.ibm.com>
	<ada7iusm021.fsf@cisco.com> <1171035668.26453.11.camel@trinity.ogc.int>
	<1171135423.11017.61.camel@stevo-desktop>
	<45D09396.5060603@ichips.intel.com>
Message-ID: <1171298253.12228.9.camel@trinity.ogc.int>

On Mon, 2007-02-12 at 08:19 -0800, Sean Hefty wrote:
> > This design is based on the RDMA_CM and IB_CM behavior.  If the app
> > issues the destroy via rdma_destroy_cm_id, then we block that thread
> > until all references are gone.  If the app returns non-zero in a
> > callback for a given cm_id, then the CM owns destroying the cm_id and
> > the application is done with it. That's the short of it.  Here's the
> > long of it:
> 
> Note that the goal of this behavior is simply to ensure that no thread will 
> touch any code in their callback after destroying their cm_id.  That is all that 
> needs to be guaranteed, if this helps any.

It helps a lot, actually. We've discussed simplifying this code by not
blocking the destroy and guaranteeing that events received after the
destroy are never delivered, but we didn't want to do something this
drastic without some time to get it right.

> 
> - Sean


From ossrosch at linux.vnet.ibm.com  Mon Feb 12 08:36:34 2007
From: ossrosch at linux.vnet.ibm.com (Stefan Roscher)
Date: Mon, 12 Feb 2007 17:36:34 +0100
Subject: [openib-general] 32-bit build for ppc64 is required
Message-ID: <200702121736.35468.ossrosch@linux.vnet.ibm.com>

Hi,

After building the latest OFED build package, we noticed that on PPC64 only
64-bit libraries were built.
Because we have customers using older userspace applications which are
certified for 32-bit, we think additional 32-bit support is a requirement for
64-bit builds.

If OFED 1.2 supports 32-bit on ppc64, we have to change the install
directory. I would suggest installing 32-bit binaries into the
/usr/local/ofed/bin32 directory, so that no changes to the current naming
conventions have to be made. The libraries are installed in the
/usr/local/ofed/lib directory.

Feedback appreciated.


Kind regards Stefan Roscher


From tziporet at mellanox.co.il  Mon Feb 12 08:42:10 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Mon, 12 Feb 2007 18:42:10 +0200
Subject: [openib-general] OFED 1.2 components list - for the meeting today
Message-ID: <45D098E2.6000804@mellanox.co.il>

This is the full OFED 1.2 components list that we will review in the meeting 

Tziporet

# Kernel
ib_verbs (core)
ib_mthca
ib_ipoib
ib_ipath - currently works on 2.6.20 only. Backport patches cannot be applied
ib_iser
ib_sdp
ib_srp
ib_ehca - PPC only
cxgb3
vnic
rds - currently works on kernel 2.6.20 and 2.6.19
ib-bonding - RHEL4UP3 & SLES10 
 
# User libraries
libibverbs
libibcm
libmthca
libipathverbs
libcxgb3
libsdp
libehca
sdpnetstat
libibcommon
libibmad
libibumad
libopensm
libosmcomp
libosmvendor
librdmacm
dapl - not working with iWARP

# User utilities
perftest
mstflint
ibutils
opensm
qlvnictools
openib-diags
srptools
ipoibtools
tvflash

# MPI:
mvapich
mvapich2 - Build issue
openmpi
mpitests

# OFED specific:
ofed_docs - taken from 1.1 - not yet updated for 1.2
ofed_scripts


From halr at voltaire.com  Mon Feb 12 08:49:15 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 12 Feb 2007 11:49:15 -0500
Subject: [openib-general] OFED 1.2 components list - for the meeting
 today
In-Reply-To: <45D098E2.6000804@mellanox.co.il>
References: <45D098E2.6000804@mellanox.co.il>
Message-ID: <1171298946.31538.427171.camel@hal.voltaire.com>

On Mon, 2007-02-12 at 11:42, Tziporet Koren wrote:
> This is the full OFED 1.2 components list that we will review in the meeting 
> 
> Tziporet
> 
> # Kernel
> ib_verbs (core)
> ib_mthca
> ib_ipoib
> ib_ipath - currently works on 2.6.20 only. Backport patches cannot be applied
> ib_iser
> ib_sdp
> ib_srp
> ib_ehca - PPC only
> cxgb3
> vnic
> rds - currently works on kernel 2.6.20 and 2.6.19
> ib-bonding - RHEL4UP3 & SLES10 

Was ib_madeye carried over from OFED 1.1 or does this need to be added
for OFED 1.2 ?

-- Hal

> # User libraries
> libibverbs
> libibcm
> libmthca
> libipathverbs
> libcxgb3
> libsdp
> libehca
> sdpnetstat
> libibcommon
> libibmad
> libibumad
> libopensm
> libosmcomp
> libosmvendor
> librdmacm
> dapl - not working with iWARP
> 
> # User utilities
> perftest
> mstflint
> ibutils
> opensm
> qlvnictools
> openib-diags
> srptools
> ipoibtools
> tvflash
> 
> # MPI:
> mvapich
> mvapich2 - Build issue
> openmpi
> mpitests
> 
> # OFED specific:
> ofed_docs - taken from 1.1 - not yet updated for 1.2
> ofed_scripts
> 
> 
> 


From swise at opengridcomputing.com  Mon Feb 12 08:55:36 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Mon, 12 Feb 2007 10:55:36 -0600
Subject: [openib-general] cxgb3 compilation fails on RHEL4.0U3
In-Reply-To: <1171296412.6265.24.camel@vladsk-laptop>
References: <1171296412.6265.24.camel@vladsk-laptop>
Message-ID: <1171299336.16167.31.camel@stevo-desktop>

I only backported to RHEL4U4 since that was the supported platform.  

Is OFED 1.2 supporting U3 too?  

I can add the backport if needed.


On Mon, 2007-02-12 at 18:06 +0200, Vladimir Sokolovsky wrote:
> Hi Steve,
> I got the following compilation failure on RHEL4.0U3 (2.6.9-34.ELsmp):
> 
>   gcc -Wp,-MD,/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/.cxgb3_offload.o.d -nostdinc -iwithprefix include -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/kernel_addons/backport/2.6.9_U3/include/  -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/include  -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/include  -Iinclude    -include include/linux/autoconf.h  -include /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/include/linux/autoconf.h   -D__KERNEL__ -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/kernel_addons/backport/2.6.9_U3/include/  -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/include  -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/include  -Iinclude    -include include/linux/autoconf.h  -include /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/include/linux/autoconf.h   -Wall -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Os -fomit-frame-pointer -g -Wdeclaration-after-statement  -mno-red-zone -mcmodel=kernel -pipe -fno-reorder-blocks      -Wno-sign-compare -funit-at-a-time   -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/include -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/include  -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/ulp/ipoib  -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/debug  -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/infiniband/hw/cxgb3/core  -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3  -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/rds   -DMODULE -DKBUILD_BASENAME=cxgb3_offload -DKBUILD_MODNAME=cxgb3 -c -o /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/.tmp_cxgb3_offload.o /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c
> /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:57: error: syntax error before "adapter_list_lock"
> /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:57: warning: type defaults to `int' in declaration of `adapter_list_lock'
> /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:57: error: incompatible types in initialization
> /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:57: error: initializer element is not constant
> /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:57: warning: data definition has no type or storage class
> /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c: In function `is_offloading':
> /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:885: warning: passing arg 1 of `_read_lock_bh' from incompatible pointer type
> /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:889: warning: passing arg 1 of `_read_unlock_bh' from incompatible pointer type
> /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:894: warning: passing arg 1 of `_read_unlock_bh' from incompatible pointer type
> /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c: In function `add_adapter':
> /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:1062: warning: passing arg 1 of `_write_lock_bh' from incompatible pointer type
> /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:1064: warning: passing arg 1 of `_write_unlock_bh' from incompatible pointer type
> /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c: In function `remove_adapter':
> /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:1069: warning: passing arg 1 of `_write_lock_bh' from incompatible pointer type
> /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.c:1071: warning: passing arg 1 of `_write_unlock_bh' from incompatible pointer type
> make[3]: *** [/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3/cxgb3_offload.o] Error 1
> make[2]: *** [/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2/drivers/net/cxgb3] Error 2
> make[1]: *** [_module_/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2] Error 2
> make[1]: Leaving directory `/usr/src/kernels/2.6.9-34.EL-smp-x86_64'
> make: *** [kernel] Error 2
> 
> 


From mshefty at ichips.intel.com  Mon Feb 12 09:23:06 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 12 Feb 2007 09:23:06 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <20070211230935.GT11411@obsidianresearch.com>
References: <20070210004820.GS11411@obsidianresearch.com>
	<000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com>
	<20070211230935.GT11411@obsidianresearch.com>
Message-ID: <45D0A27A.2010302@ichips.intel.com>

> Ah, I think I missed the key step in your scheme.. You plan to query
> the local SM for SGID=remote DGID=local? (ie reversed from 'normal'. I
> was thinking only about the SGID=local DGID=remote query direction)

I'm not sure that the query needs the GIDs reversed, as long as the path is 
reversible.  So, the local query would be:

SGID=local, DGID=remote, reversible=1

And the remote query would be:

SGID=local, DGID=remote, reversible=1,
TClass & FlowLabel=from previous query response

Use of reversible indicates that the remote side can send a packet back, and it 
will be received successfully at the local side.  This seems to imply 
information about the local routing tables and GID to LID mappings.  That is, 
packets traveling from the SGID->DGID and DGID->SGID use the same local LID pair.
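
As a rough illustration of the two queries above, the sketch below builds
them as plain dictionaries. It is not an SA client: local_query and
remote_query are hypothetical helpers, the field names are taken from this
message, and the GIDs and response values are made up for the example.

```python
def local_query(sgid, dgid):
    """Path record query sent to the local SA."""
    return {"SGID": sgid, "DGID": dgid, "Reversible": 1}


def remote_query(sgid, dgid, local_response):
    """Same query for the remote SA, carrying TClass and FlowLabel
    from the local SA's response."""
    query = local_query(sgid, dgid)
    query["TClass"] = local_response["TClass"]
    query["FlowLabel"] = local_response["FlowLabel"]
    return query


# Made-up values, for illustration only.
response = {"TClass": 0, "FlowLabel": 0x1234, "SLID": 5, "DLID": 1}
print(local_query("fec0::a:1", "fec0::b:2"))
print(remote_query("fec0::a:1", "fec0::b:2", response))
```

Both queries keep the same SGID/DGID orientation; only the second one pins
TClass and FlowLabel to what the first response returned.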

>    SA                                                      SA'
> Node1 --> (LID 1) Router A -------  Router A' (LID A) ---> Node2
>       |-> (LID 2) Router A                              |
>       |-> (LID 3) Router B -------  Router B' (LID B) --|
> 
> Router A and Router B are independent redundant devices, not a route
> cloud of some sort. B -> A' is not a possible path.

Since A' and B' connect to the same subnet, B -> A' should be a valid path.

> So your idea is to do:
>   PR0: Node 1 asks SA for Node1 -> Node2 reversible path.
>        SA returns SLID=Node1 DLID=1, FlowLabel=Magic Reversible
>        indicator. This path is used for CM GMPs, or for the
>        normal non-routed CM.
>   PR1: Detecting a routed situation from PR0, 
>        Node 1 asks SA for Node2 -> Node1. SA returns SLID=1
>        DLID=Node1 and a GRH that configures Router A to use SLID=1
>        You reverse the local LIDS from that path to get the QP
>        configuration.

I think PR0 and PR1 could be the same.

> I can think of the following downsides:
>  1) Re-reading Michael Krause's email makes me think that defeating
>     the QP SLID check is contrary to the spirit of IBA

I don't think we need to defeat the QP SLID check if we want extra routing, but 
having redundant routers use the same link layer address isn't necessarily a bad 
thing.

>  4) Some means of remote SA communication needs to be decided
>     pre-standardization :< (I agree that a magic GID seems best)

I think this is the first thing that must be solved, regardless of other 
details.  We should see if we can at least get agreement on this, and if there 
are any issues.

> But this has turned into such a complex problem it seems really hard
> to predict what will pass through to standardization.. That is the
> main benefit I see of the small change to the passive side. No matter
> what is standardized it can be accommodated in the resulting
> standard, whereas defining a PR with SGID==offsubnet to mean one thing
> or another seems much more risky.

I think the only thing we're asking for so far is a magic GID, unless I'm 
reading too much into what a reversible path indicates.

- Sean


From robert.j.woodruff at intel.com  Mon Feb 12 09:58:43 2007
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Mon, 12 Feb 2007 09:58:43 -0800
Subject: [openib-general] OFED 1.2 components list - for the meeting
 today
In-Reply-To: <45D098E2.6000804@mellanox.co.il>
Message-ID: <BAE9DCEF64577A439B3A37F36F9B691C01B288A3@orsmsx418.amr.corp.intel.com>


BTW.

Is the ibdiagui code going to be part of this release?
I did not see it in the list below. Or is it just part of
openib-diags?
I thought that we discussed this as an OFED 1.2 feature.
I have someone who is interested in trying it out.

woody
 

-----Original Message-----
From: openib-general-bounces at openib.org
[mailto:openib-general-bounces at openib.org] On Behalf Of Tziporet Koren
Sent: Monday, February 12, 2007 8:42 AM
To: OPENIB; EWG
Subject: [openib-general] OFED 1.2 components list - for the meeting
today

This is the full OFED 1.2 components list that we will review in the
meeting 

Tziporet

# Kernel
ib_verbs (core)
ib_mthca
ib_ipoib
ib_ipath - currently works on 2.6.20 only. Backport patches cannot be
applied
ib_iser
ib_sdp
ib_srp
ib_ehca - PPC only
cxgb3
vnic
rds - currently works on kernel 2.6.20 and 2.6.19
ib-bonding - RHEL4UP3 & SLES10 
 
# User libraries
libibverbs
libibcm
libmthca
libipathverbs
libcxgb3
libsdp
libehca
sdpnetstat
libibcommon
libibmad
libibumad
libopensm
libosmcomp
libosmvendor
librdmacm
dapl - not working with iWARP

# User utilities
perftest
mstflint
ibutils
opensm
qlvnictools
openib-diags
srptools
ipoibtools
tvflash

# MPI:
mvapich
mvapich2 - Build issue
openmpi
mpitests

# OFED specific:
ofed_docs - taken from 1.1 - not yet updated for 1.2
ofed_scripts


_______________________________________________
openib-general mailing list
openib-general at openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit
http://openib.org/mailman/listinfo/openib-general


From swise at opengridcomputing.com  Mon Feb 12 10:07:49 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Mon, 12 Feb 2007 12:07:49 -0600
Subject: [openib-general] cxgb3 compilation fails on RHEL4.0U3
In-Reply-To: <1171302989.12725.0.camel@vladsk-laptop>
References: <1171296412.6265.24.camel@vladsk-laptop>
	<1171299336.16167.31.camel@stevo-desktop>
	<1171302989.12725.0.camel@vladsk-laptop>
Message-ID: <1171303669.16167.48.camel@stevo-desktop>

On Mon, 2007-02-12 at 19:56 +0200, Vladimir Sokolovsky wrote:
> On Mon, 2007-02-12 at 10:55 -0600, Steve Wise wrote:
> > I only backported to RHEL4U4 since that was the supported platform.  
> > 
> > Is OFED 1.2 supporting U3 too?  
> > 
> > I can add the backport if needed.
> > 
> 
> 
> RHEL4U3 is not officially supported but there are some patches for cxgb3
> under kernel_patches/backport/2.6.9_U3:
> 
> kernel_patches/backport/2.6.9_U3/cxgb3_main_to_2_6_13.patch 
> kernel_patches/backport/2.6.9_U3/cxgb3_makefile_to_2_6_19.patch

Looks like Michael added this with commit: 

ea110866d640317fe889abdc3aaba317ae20da65

For alpha1, please just don't build cxgb3/libcxgb3 for RHEL4U3.

Steve.


From todd.rimmer at qlogic.com  Mon Feb 12 10:17:38 2007
From: todd.rimmer at qlogic.com (Todd Rimmer)
Date: Mon, 12 Feb 2007 12:17:38 -0600
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <45D0A27A.2010302@ichips.intel.com>
Message-ID: <4FB1BCCAE6CAED44A1DC005B1DE061191A5D8F@EPEXCH2.qlogic.org>

> From: Sean Hefty
> Sent: Monday, February 12, 2007 12:23 PM
> To: Jason Gunthorpe; Hal Rosenstock
> Cc: openib-general at openib.org
> Subject: Re: [openib-general] Problem is routing CM REQ

There has been a lot of good discussion and proposed designs for this
solution. I think it would be very helpful for Sean and Jason to put
together a single living document (which could become a kernel
Documentation/ file later) summarizing the present proposed solution and the
expectations from each component (router, SM/SA, CM, etc).

That would certainly be a lot easier to follow than attempting to piece
together the conclusions from this long email chain.  It would also
likely avoid omissions and allow for easier review by a larger audience.

Thank you,
Todd Rimmer


From halr at voltaire.com  Mon Feb 12 11:07:57 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 12 Feb 2007 14:07:57 -0500
Subject: [openib-general] OFED 1.2 components list - for the meeting
 today
In-Reply-To: <BAE9DCEF64577A439B3A37F36F9B691C01B288A3@orsmsx418.amr.corp.intel.com>
References: <BAE9DCEF64577A439B3A37F36F9B691C01B288A3@orsmsx418.amr.corp.intel.com>
Message-ID: <1171307245.31538.434613.camel@hal.voltaire.com>

On Mon, 2007-02-12 at 12:58, Woodruff, Robert J wrote:
> BTW.
> 
> Is the ibdiagui code going to be part of this release?
> I did not see it in the list below, or is it just part of
> openib-diags?

It's part of ibutils.

-- Hal

> I thought that we discussed this as an OFED 1.2 feature.
> I have someone that is interested in trying it out.
> 
> woody
>  
> 
> -----Original Message-----
> From: openib-general-bounces at openib.org
> [mailto:openib-general-bounces at openib.org] On Behalf Of Tziporet Koren
> Sent: Monday, February 12, 2007 8:42 AM
> To: OPENIB; EWG
> Subject: [openib-general] OFED 1.2 components list - for the meeting
> today
> 
> This is the full OFED 1.2 components list that we will review in the
> meeting 
> 
> Tziporet
> 
> # Kernel
> ib_verbs (core)
> ib_mthca
> ib_ipoib
> ib_ipath - currently works on 2.6.20 only. Backport patches cannot be
> applied
> ib_iser
> ib_sdp
> ib_srp
> ib_ehca - PPC only
> cxgb3
> vnic
> rds - currently works on kernel 2.6.20 and 2.6.19
> ib-bonding - RHEL4UP3 & SLES10 
>  
> # User libraries
> libibverbs
> libibcm
> libmthca
> libipathverbs
> libcxgb3
> libsdp
> libehca
> sdpnetstat
> libibcommon
> libibmad
> libibumad
> libopensm
> libosmcomp
> libosmvendor
> librdmacm
> dapl - not working with iWARP
> 
> # User utilities
> perftest
> mstflint
> ibutils
> opensm
> qlvnictools
> openib-diags
> srptools
> ipoibtools
> tvflash
> 
> # MPI:
> mvapich
> mvapich2 - Build issue
> openmpi
> mpitests
> 
> # OFED specific:
> ofed_docs - taken from 1.1 - not yet updated for 1.2
> ofed_scripts
> 
> 
> 
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general
> 


From rdreier at cisco.com  Mon Feb 12 11:11:06 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 12 Feb 2007 11:11:06 -0800
Subject: [openib-general] [PATCH 4 of 4] IB/mthca: give reserved MTTs a
 separate cache line
In-Reply-To: <20070210211726.GE14903@mellanox.co.il> (Michael S.
	Tsirkin's message of "Sat, 10 Feb 2007 23:17:26 +0200")
References: <20070210211726.GE14903@mellanox.co.il>
Message-ID: <adavei76vjp.fsf@cisco.com>

Thanks, applied as 2 separate patches.


From swise at opengridcomputing.com  Mon Feb 12 11:30:10 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Mon, 12 Feb 2007 13:30:10 -0600
Subject: [openib-general] [PATCH][RFC] iw_cxgb3/2.6.21 - Don't use the
 physical address for mapping memory into userspace.
Message-ID: <1171308610.16167.69.camel@stevo-desktop>

Roland, can you review this?


-----


From: Steve Wise <swise at opengridcomputing.com>

Currently iw_cxgb3 uses the physical address as the key/offset to return
to the user process for mapping kernel memory into userspace.  The user
process then calls mmap() using this key as the offset.  Because the
physical address is 64 bits, this introduces a problem with 32-bit
userspace, which might not be able to pass an arbitrary 64-bit address
back into the kernel (since mmap2() is limited to a 32-bit number of
pages for the offset, which limits it to 44-bit addresses).

Change the mmap logic to use a u32 counter as the offset for mapping.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/iwch_provider.c |   66 +++++++++++++++++----------
 drivers/infiniband/hw/cxgb3/iwch_provider.h |   13 +++--
 drivers/infiniband/hw/cxgb3/iwch_user.h     |    6 +-
 3 files changed, 52 insertions(+), 33 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index d02cd72..b2c88d6 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -115,7 +115,7 @@ static struct ib_ucontext *iwch_alloc_uc
 	struct iwch_dev *rhp = to_iwch_dev(ibdev);
 
 	PDBG("%s ibdev %p\n", __FUNCTION__, ibdev);
-	context = kmalloc(sizeof(*context), GFP_KERNEL);
+	context = kzalloc(sizeof(*context), GFP_KERNEL);
 	if (!context)
 		return ERR_PTR(-ENOMEM);
 	cxio_init_ucontext(&rhp->rdev, &context->uctx);
@@ -141,13 +141,14 @@ static int iwch_destroy_cq(struct ib_cq 
 }
 
 static struct ib_cq *iwch_create_cq(struct ib_device *ibdev, int entries,
-			     struct ib_ucontext *context,
+			     struct ib_ucontext *ib_context,
 			     struct ib_udata *udata)
 {
 	struct iwch_dev *rhp;
 	struct iwch_cq *chp;
 	struct iwch_create_cq_resp uresp;
 	struct iwch_create_cq_req ureq;
+	struct iwch_ucontext *ucontext = NULL;
 
 	PDBG("%s ib_dev %p entries %d\n", __FUNCTION__, ibdev, entries);
 	rhp = to_iwch_dev(ibdev);
@@ -155,12 +156,15 @@ static struct ib_cq *iwch_create_cq(stru
 	if (!chp)
 		return ERR_PTR(-ENOMEM);
 
-	if (context && !t3a_device(rhp)) {
-		if (ib_copy_from_udata(&ureq, udata, sizeof (ureq))) {
-			kfree(chp);
-			return ERR_PTR(-EFAULT);
+	if (ib_context) {
+		ucontext = to_iwch_ucontext(ib_context);
+		if (!t3a_device(rhp)) {
+			if (ib_copy_from_udata(&ureq, udata, sizeof (ureq))) {
+				kfree(chp);
+				return ERR_PTR(-EFAULT);
+			}
+			chp->user_rptr_addr = (u32 __user *)(unsigned long)ureq.user_rptr_addr;
 		}
-		chp->user_rptr_addr = (u32 __user *)(unsigned long)ureq.user_rptr_addr;
 	}
 
 	if (t3a_device(rhp)) {
@@ -190,7 +194,7 @@ static struct ib_cq *iwch_create_cq(stru
 	init_waitqueue_head(&chp->wait);
 	insert_handle(rhp, &rhp->cqidr, chp, chp->cq.cqid);
 
-	if (context) {
+	if (ucontext) {
 		struct iwch_mm_entry *mm;
 
 		mm = kmalloc(sizeof *mm, GFP_KERNEL);
@@ -200,16 +204,20 @@ static struct ib_cq *iwch_create_cq(stru
 		}
 		uresp.cqid = chp->cq.cqid;
 		uresp.size_log2 = chp->cq.size_log2;
-		uresp.physaddr = virt_to_phys(chp->cq.queue);
+		spin_lock(&ucontext->mmap_lock);
+		uresp.key = ucontext->key;
+		ucontext->key += PAGE_SIZE;
+		spin_unlock(&ucontext->mmap_lock);
 		if (ib_copy_to_udata(udata, &uresp, sizeof (uresp))) {
 			kfree(mm);
 			iwch_destroy_cq(&chp->ibcq);
 			return ERR_PTR(-EFAULT);
 		}
-		mm->addr = uresp.physaddr;
+		mm->key = uresp.key;
+		mm->addr = virt_to_phys(chp->cq.queue);
 		mm->len = PAGE_ALIGN((1UL << uresp.size_log2) *
 					     sizeof (struct t3_cqe));
-		insert_mmap(to_iwch_ucontext(context), mm);
+		insert_mmap(ucontext, mm);
 	}
 	PDBG("created cqid 0x%0x chp %p size 0x%0x, dma_addr 0x%0llx\n",
 	     chp->cq.cqid, chp, (1 << chp->cq.size_log2),
@@ -316,14 +324,14 @@ static int iwch_arm_cq(struct ib_cq *ibc
 static int iwch_mmap(struct ib_ucontext *context, struct vm_area_struct *vma)
 {
 	int len = vma->vm_end - vma->vm_start;
-	u64 pgaddr = vma->vm_pgoff << PAGE_SHIFT;
+	u32 key = vma->vm_pgoff << PAGE_SHIFT;
 	struct cxio_rdev *rdev_p;
 	int ret = 0;
 	struct iwch_mm_entry *mm;
 	struct iwch_ucontext *ucontext;
 
-	PDBG("%s off 0x%lx addr 0x%llx len %d\n", __FUNCTION__, vma->vm_pgoff,
-	     pgaddr, len);
+	PDBG("%s pgoff 0x%lx key 0x%x len %d\n", __FUNCTION__, vma->vm_pgoff,
+	     key, len);
 
 	if (vma->vm_start & (PAGE_SIZE-1)) {
 	        return -EINVAL;
@@ -332,13 +340,13 @@ static int iwch_mmap(struct ib_ucontext 
 	rdev_p = &(to_iwch_dev(context->device)->rdev);
 	ucontext = to_iwch_ucontext(context);
 
-	mm = remove_mmap(ucontext, pgaddr, len);
+	mm = remove_mmap(ucontext, key, len);
 	if (!mm)
 		return -EINVAL;
 	kfree(mm);
 
-	if ((pgaddr >= rdev_p->rnic_info.udbell_physbase) &&
-	    (pgaddr < (rdev_p->rnic_info.udbell_physbase +
+	if ((mm->addr >= rdev_p->rnic_info.udbell_physbase) &&
+	    (mm->addr < (rdev_p->rnic_info.udbell_physbase +
 		       rdev_p->rnic_info.udbell_len))) {
 
 		/*
@@ -351,15 +359,17 @@ static int iwch_mmap(struct ib_ucontext 
 		vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
 		vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND;
 		vma->vm_flags &= ~VM_MAYREAD;
-		ret = io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
-				       len, vma->vm_page_prot);
+		ret = io_remap_pfn_range(vma, vma->vm_start, 
+					 mm->addr >> PAGE_SHIFT,
+				         len, vma->vm_page_prot);
 	} else {
 
 		/*
 		 * Map WQ or CQ contig dma memory...
 		 */
-		ret = remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
-				       len, vma->vm_page_prot);
+		ret = remap_pfn_range(vma, vma->vm_start, 
+				      mm->addr >> PAGE_SHIFT,
+				      len, vma->vm_page_prot);
 	}
 
 	return ret;
@@ -853,18 +863,24 @@ static struct ib_qp *iwch_create_qp(stru
 		uresp.size_log2 = qhp->wq.size_log2;
 		uresp.sq_size_log2 = qhp->wq.sq_size_log2;
 		uresp.rq_size_log2 = qhp->wq.rq_size_log2;
-		uresp.physaddr = virt_to_phys(qhp->wq.queue);
-		uresp.doorbell = qhp->wq.udb;
+		spin_lock(&ucontext->mmap_lock);
+		uresp.key = ucontext->key;
+		ucontext->key += PAGE_SIZE;
+		uresp.db_key = ucontext->key;
+		ucontext->key += PAGE_SIZE;
+		spin_unlock(&ucontext->mmap_lock);
 		if (ib_copy_to_udata(udata, &uresp, sizeof (uresp))) {
 			kfree(mm1);
 			kfree(mm2);
 			iwch_destroy_qp(&qhp->ibqp);
 			return ERR_PTR(-EFAULT);
 		}
-		mm1->addr = uresp.physaddr;
+		mm1->key = uresp.key;
+		mm1->addr = virt_to_phys(qhp->wq.queue);
 		mm1->len = PAGE_ALIGN(wqsize * sizeof (union t3_wr));
 		insert_mmap(ucontext, mm1);
-		mm2->addr = uresp.doorbell & PAGE_MASK;
+		mm2->key = uresp.db_key;
+		mm2->addr = qhp->wq.udb & PAGE_MASK;
 		mm2->len = PAGE_SIZE;
 		insert_mmap(ucontext, mm2);
 	}
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h
index b2eb29e..463e746 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.h
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h
@@ -184,6 +184,7 @@ struct ib_qp *iwch_get_qp(struct ib_devi
 struct iwch_ucontext {
 	struct ib_ucontext ibucontext;
 	struct cxio_ucontext uctx;
+	u32 key;
 	spinlock_t mmap_lock;
 	struct list_head mmaps;
 };
@@ -196,11 +197,12 @@ static inline struct iwch_ucontext *to_i
 struct iwch_mm_entry {
 	struct list_head entry;
 	u64 addr;
+	u32 key;
 	unsigned len;
 };
 
 static inline struct iwch_mm_entry *remove_mmap(struct iwch_ucontext *ucontext,
-						u64 addr, unsigned len)
+						u32 key, unsigned len)
 {
 	struct list_head *pos, *nxt;
 	struct iwch_mm_entry *mm;
@@ -209,11 +211,11 @@ static inline struct iwch_mm_entry *remo
 	list_for_each_safe(pos, nxt, &ucontext->mmaps) {
 
 		mm = list_entry(pos, struct iwch_mm_entry, entry);
-		if (mm->addr == addr && mm->len == len) {
+		if (mm->key == key && mm->len == len) {
 			list_del_init(&mm->entry);
 			spin_unlock(&ucontext->mmap_lock);
-			PDBG("%s addr 0x%llx len %d\n", __FUNCTION__, mm->addr,
-			     mm->len);
+			PDBG("%s addr 0x%llx key 0x%x len %d\n", 
+			     __FUNCTION__, mm->addr, mm->key, mm->len);
 			return mm;
 		}
 	}
@@ -225,7 +227,8 @@ static inline void insert_mmap(struct iw
 			       struct iwch_mm_entry *mm)
 {
 	spin_lock(&ucontext->mmap_lock);
-	PDBG("%s addr 0x%llx len %d\n", __FUNCTION__, mm->addr, mm->len);
+	PDBG("%s addr 0x%llx key 0x%x len %d\n", __FUNCTION__, 
+	     mm->addr, mm->key, mm->len);
 	list_add_tail(&mm->entry, &ucontext->mmaps);
 	spin_unlock(&ucontext->mmap_lock);
 }
diff --git a/drivers/infiniband/hw/cxgb3/iwch_user.h b/drivers/infiniband/hw/cxgb3/iwch_user.h
index 14e1517..c4e7fbe 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_user.h
+++ b/drivers/infiniband/hw/cxgb3/iwch_user.h
@@ -47,14 +47,14 @@ struct iwch_create_cq_req {
 };
 
 struct iwch_create_cq_resp {
-	__u64 physaddr;
+	__u64 key;
 	__u32 cqid;
 	__u32 size_log2;
 };
 
 struct iwch_create_qp_resp {
-	__u64 physaddr;
-	__u64 doorbell;
+	__u64 key;
+	__u64 db_key;
 	__u32 qpid;
 	__u32 size_log2;
 	__u32 sq_size_log2;
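
For readers following the thread, the key scheme described in the
commit message can be sketched in user-space C as below. This is only
an illustration under stated assumptions, not the driver code: the
sketch_* names are hypothetical stand-ins for iwch_ucontext,
iwch_mm_entry, insert_mmap() and remove_mmap(), locking is left out,
and a 4 KiB page size is assumed. Each mapping gets the next
page-aligned u32 key, userspace passes that key back as the mmap()
offset, and the kernel side looks the real 64-bit address up by key.
The last helper shows where the 44-bit figure comes from: a 32-bit
page count times a 2^12-byte page.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SHIFT 12u                       /* assumes 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)

/* One pending mapping: the opaque key given to userspace and the
 * physical address it stands for. */
struct sketch_mm_entry {
	struct sketch_mm_entry *next;
	uint64_t addr;
	uint32_t key;
	unsigned len;
};

struct sketch_ucontext {
	uint32_t key;                   /* next key to hand out, starts at 0 */
	struct sketch_mm_entry *mmaps;  /* pending mappings */
};

/* Reserve the next page-aligned key (the patch does this under mmap_lock). */
static uint32_t sketch_alloc_key(struct sketch_ucontext *uc)
{
	uint32_t key = uc->key;

	uc->key += SKETCH_PAGE_SIZE;
	return key;
}

/* Remember that 'key' refers to 'len' bytes at physical address 'addr'. */
static int sketch_insert_mmap(struct sketch_ucontext *uc, uint32_t key,
			      uint64_t addr, unsigned len)
{
	struct sketch_mm_entry *mm = malloc(sizeof *mm);

	if (!mm)
		return -1;
	mm->key = key;
	mm->addr = addr;
	mm->len = len;
	mm->next = uc->mmaps;
	uc->mmaps = mm;
	return 0;
}

/* Consume the entry matching key and len.  The address is copied out
 * before the entry is freed, so nothing touches the entry afterwards.
 * Returns 0 on success, -1 if there is no matching entry. */
static int sketch_take_mmap(struct sketch_ucontext *uc, uint32_t key,
			    unsigned len, uint64_t *addr)
{
	struct sketch_mm_entry **pp;

	for (pp = &uc->mmaps; *pp; pp = &(*pp)->next) {
		struct sketch_mm_entry *mm = *pp;

		if (mm->key == key && mm->len == len) {
			*pp = mm->next;
			*addr = mm->addr;
			free(mm);
			return 0;
		}
	}
	return -1;
}

/* Widest byte offset a 32-bit page count can express. */
static unsigned sketch_mmap2_offset_bits(void)
{
	return 32u + SKETCH_PAGE_SHIFT;
}
```

A caller would allocate a key, insert the entry, return the key to
userspace, and later call sketch_take_mmap() with the offset that comes
back through mmap(). A key can be consumed only once, and a mismatched
length is rejected just like an unknown key.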


From rdreier at cisco.com  Mon Feb 12 11:58:13 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 12 Feb 2007 11:58:13 -0800
Subject: [openib-general] [PATCH][RFC] iw_cxgb3/2.6.21 - Don't use the
 physical address for mapping memory into userspace.
In-Reply-To: <1171308610.16167.69.camel@stevo-desktop> (Steve Wise's
	message of "Mon, 12 Feb 2007 13:30:10 -0600")
References: <1171308610.16167.69.camel@stevo-desktop>
Message-ID: <adalkj35esq.fsf@cisco.com>

Looks mostly sane (assuming it works on 32-bit userspace on 64-bit
kernel now), but:

 > -	context = kmalloc(sizeof(*context), GFP_KERNEL);
 > +	context = kzalloc(sizeof(*context), GFP_KERNEL);

Why do you need this?  Is this an unrelated change?

 - R.


From swise at opengridcomputing.com  Mon Feb 12 12:04:30 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Mon, 12 Feb 2007 14:04:30 -0600
Subject: [openib-general] [PATCH][RFC] iw_cxgb3/2.6.21 - Don't use the
 physical address for mapping memory into userspace.
In-Reply-To: <adalkj35esq.fsf@cisco.com>
References: <1171308610.16167.69.camel@stevo-desktop>
	<adalkj35esq.fsf@cisco.com>
Message-ID: <1171310670.16167.89.camel@stevo-desktop>

On Mon, 2007-02-12 at 11:58 -0800, Roland Dreier wrote:
> Looks mostly sane (assuming it works on 32-bit userspace on 64-bit
> kernel now), but:
> 
>  > -	context = kmalloc(sizeof(*context), GFP_KERNEL);
>  > +	context = kzalloc(sizeof(*context), GFP_KERNEL);
> 
> Why do you need this?  Is this an unrelated change?
> 

Because the key generator u32 is in the context now, and the kzalloc()
initializes it.  I could have done:  

context->key = 0;

But km -> kz was less typing. ;-)

Steve.
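
The two options mentioned above can be compared in a small user-space
sketch. It assumes calloc() and malloc() as stand-ins for kzalloc() and
kmalloc(), and the struct and function names are made up for the
illustration. Either way the key counter starts at 0; the zeroing
allocator just covers every field at once.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-process context holding the key
 * counter. */
struct ctx_sketch {
	uint32_t key;
	void *mmaps;
};

/* Option 1: zeroing allocator, so key starts at 0 with no extra code. */
static struct ctx_sketch *ctx_alloc_zeroed(void)
{
	return calloc(1, sizeof(struct ctx_sketch));
}

/* Option 2: plain allocator plus explicit initialization of the fields
 * that are read before they are written. */
static struct ctx_sketch *ctx_alloc_explicit(void)
{
	struct ctx_sketch *c = malloc(sizeof *c);

	if (c) {
		c->key = 0;
		c->mmaps = NULL;
	}
	return c;
}
```

The zeroing variant also protects any field added to the struct later,
at the cost of clearing memory that may be overwritten right away.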


From rdreier at cisco.com  Mon Feb 12 12:08:15 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 12 Feb 2007 12:08:15 -0800
Subject: [openib-general] [PATCH][RFC] iw_cxgb3/2.6.21 - Don't use the
 physical address for mapping memory into userspace.
In-Reply-To: <1171310670.16167.89.camel@stevo-desktop> (Steve Wise's
	message of "Mon, 12 Feb 2007 14:04:30 -0600")
References: <1171308610.16167.69.camel@stevo-desktop>
	<adalkj35esq.fsf@cisco.com> <1171310670.16167.89.camel@stevo-desktop>
Message-ID: <aday7n33zrk.fsf@cisco.com>

 > Because the key generator u32 is in the context now, and the kzalloc()
 > initializes it.  I could have done:  
 > 
 > context->key = 0;
 > 
 > But km -> kz was less typing. ;-)

OK, got it.  Anyway as I said, from a quick read the changes look
sane, with the assumption that they work.


From swise at opengridcomputing.com  Mon Feb 12 12:19:36 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Mon, 12 Feb 2007 14:19:36 -0600
Subject: [openib-general] [PATCH][RFC] iw_cxgb3/2.6.21 - Don't use the
 physical address for mapping memory into userspace.
In-Reply-To: <aday7n33zrk.fsf@cisco.com>
References: <1171308610.16167.69.camel@stevo-desktop>
	<adalkj35esq.fsf@cisco.com> <1171310670.16167.89.camel@stevo-desktop>
	<aday7n33zrk.fsf@cisco.com>
Message-ID: <1171311576.16167.91.camel@stevo-desktop>

On Mon, 2007-02-12 at 12:08 -0800, Roland Dreier wrote:
>  > Because the key generator u32 is in the context now, and the kzalloc()
>  > initializes it.  I could have done:  
>  > 
>  > context->key = 0;
>  > 
>  > But km -> kz was less typing. ;-)
> 
> OK, got it.  Anyway as I said, from a quick read the changes look
> sane, with the assumption that they work.

I tested and it works.  Do you want to pull this in before you push the
driver upstream?  Do I need to repost it?


Thanks,

Steve.


From rdreier at cisco.com  Mon Feb 12 12:20:31 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 12 Feb 2007 12:20:31 -0800
Subject: [openib-general] [PATCH][RFC] iw_cxgb3/2.6.21 - Don't use the
 physical address for mapping memory into userspace.
In-Reply-To: <1171311576.16167.91.camel@stevo-desktop> (Steve Wise's
	message of "Mon, 12 Feb 2007 14:19:36 -0600")
References: <1171308610.16167.69.camel@stevo-desktop>
	<adalkj35esq.fsf@cisco.com> <1171310670.16167.89.camel@stevo-desktop>
	<aday7n33zrk.fsf@cisco.com> <1171311576.16167.91.camel@stevo-desktop>
Message-ID: <adafy9b3z74.fsf@cisco.com>

    Steve> I tested and it works.  Do you want to pull this in before
    Steve> you push the driver upstream?  Do I need to repost it?

I'll grab it and merge it in.  I expect to ask Linus to pull later
today.

 - R.


From rdreier at cisco.com  Mon Feb 12 12:23:29 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 12 Feb 2007 12:23:29 -0800
Subject: [openib-general] [PATCH][RFC] iw_cxgb3/2.6.21 - Don't use the
 physical address for mapping memory into userspace.
In-Reply-To: <1171311576.16167.91.camel@stevo-desktop> (Steve Wise's
	message of "Mon, 12 Feb 2007 14:19:36 -0600")
References: <1171308610.16167.69.camel@stevo-desktop>
	<adalkj35esq.fsf@cisco.com> <1171310670.16167.89.camel@stevo-desktop>
	<aday7n33zrk.fsf@cisco.com> <1171311576.16167.91.camel@stevo-desktop>
Message-ID: <adabqjz3z26.fsf@cisco.com>

Actually, that patch doesn't apply because of the "%llx" warning fixes
I pushed out.  And git-apply also complains about trailing
whitespace.  Can you resend a version that applies to the my
for-2.6.21 branch?

Thanks


From rdreier at cisco.com  Mon Feb 12 12:25:30 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 12 Feb 2007 12:25:30 -0800
Subject: [openib-general] [PATCH 3 of 4] IB/mthca: fix
 non-cache-coherent CPUs with memfree
In-Reply-To: <20070210211508.GD14903@mellanox.co.il> (Michael S.
	Tsirkin's message of "Sat, 10 Feb 2007 23:15:08 +0200")
References: <20070210211508.GD14903@mellanox.co.il>
Message-ID: <adasldb2ked.fsf@cisco.com>

 > +	sg_set_buf(mem, buf, PAGE_SIZE << order);
 > +	BUG_ON(mem->offset);
 > +	sg_dma_len(mem) = PAGE_SIZE << order;

What am I missing?  Any reason to set sg_dma_len() again after sg_set_buf()?


From jgunthorpe at obsidianresearch.com  Mon Feb 12 12:56:34 2007
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Mon, 12 Feb 2007 13:56:34 -0700
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <45D0A27A.2010302@ichips.intel.com>
References: <20070210004820.GS11411@obsidianresearch.com>
	<000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com>
	<20070211230935.GT11411@obsidianresearch.com>
	<45D0A27A.2010302@ichips.intel.com>
Message-ID: <20070212205634.GW11411@obsidianresearch.com>

On Mon, Feb 12, 2007 at 09:23:06AM -0800, Sean Hefty wrote:
> >Ah, I think I missed the key step in your scheme.. You plan to query
> >the local SM for SGID=remote DGID=local? (ie reversed from 'normal'. I
> >was thinking only about the SGID=local DGID=remote query direction)
> 
> I'm not sure that the query needs the GIDs reversed, as long as the path is 
> reversible.  So, the local query would be:
> 
> SGID=local, DGID=remote, reversible=1   (to SA)
> 
> And the remote query would be:
>
> SGID=local, DGID=remote, reversible=1,  (to SA')
> TClass & FlowLabel=from previous query response

1) What does the TClass and FlowLabel returned from SGID=local
   DGID=remote mean?
   Do you use it in the Node1 -> Node2 direction or the Node2 -> Node1 direction
   or both?
1a) If it is Node1 -> Node2 then the local SA has to query SA' to figure
    what FlowLabel to return.
1b) If it is for both directions then somehow SA, SA' and all four
    router ports need to agree on global flowlabels.
2) In the 2nd query, passing SGID=local, DGID=remote is 'reversed' 
   since SGID=local is the wrong subnet for SA'.
   I think defining this to mean something is risky.
2b) A PR query with TClass and FlowLabel present in the query is
    currently expected to return an answer with those fields matching.
    That implies #1b..

So, here is how I see this working..

- There is a single well known 'reversible' flowlabel. When a router
  processes a GRH with that flowlabel it produces a packet that
  has a SLID that is always the same, no matter what router port is
  used (A' or B' in my example). The LRH is also reversible according
  to the rules in IBA.

  A well known value side-steps the global information problem and
  allows the GRH to be reversible.
- Whenever a PR has reversible=1 the result returns the well known flowlabel.
  The router LID is always the single shared SLID.
- To get a more optimal path the following sequence of queries are used:
  to SA: SGID=Node1 DGID=Node2
   [In the background SA asks SA' what flow label to use]
  to SA': SGID=Node1 DGID=Node2 FlowLabel=(from above)
  to SA': SGID=Node2 DGID=Node1 SLID=(dlid from above)
   [In the background SA' asks SA what flow label to use]
  to SA: SGID=Node2 DGID=Node1 FlowLabel=(from above)

  It is almost guaranteed that the FlowLabel will be asymmetric. This
  is to keep the flowlabel space local to each subnet.

  In the background queries SA and SA' also examine the global route
  topology to select an optimal no-spoof needed router LID. The
  background exchange is how the disambiguation problem with
  multiple-router path is solved.
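
  The four-query sequence above can be written down as data, which
  makes the count of exchanges easy to check. This is a rough sketch
  only: every name in it is hypothetical and none of it is an existing
  IB API. Node numbers stand in for GIDs, and "from above" is taken
  here to mean the answer to the preceding query.

```c
#include <assert.h>

/* All names below are hypothetical; they are not part of any IB API. */
enum sa_target { TO_SA, TO_SA_PRIME };

struct pr_query_sketch {
	enum sa_target to;    /* which SA receives the query */
	int sgid;             /* node number standing in for the SGID */
	int dgid;             /* node number standing in for the DGID */
	int flow_label_from;  /* index of the earlier query whose answer
				 supplies FlowLabel, or -1 if none */
	int slid_from;        /* index of the earlier query whose DLID is
				 passed as SLID, or -1 if none */
	int background;       /* 1 if the SA must ask its peer SA for a
				 flow label before it can answer */
};

#define N_QUERIES 4

/* Fill in the sequence for a path from node1 to node2 and back. */
static void build_sequence(int node1, int node2,
			   struct pr_query_sketch q[N_QUERIES])
{
	q[0] = (struct pr_query_sketch){ TO_SA,       node1, node2, -1, -1, 1 };
	q[1] = (struct pr_query_sketch){ TO_SA_PRIME, node1, node2,  0, -1, 0 };
	q[2] = (struct pr_query_sketch){ TO_SA_PRIME, node2, node1, -1,  1, 1 };
	q[3] = (struct pr_query_sketch){ TO_SA,       node2, node1,  2, -1, 0 };
}

/* Client queries plus the background SA-to-SA exchanges they trigger. */
static int count_exchanges(const struct pr_query_sketch q[N_QUERIES])
{
	int i, n = 0;

	for (i = 0; i < N_QUERIES; i++)
		n += 1 + q[i].background;
	return n;
}
```

  Four client queries plus two background SA-to-SA lookups give six
  exchanges in total.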

Implicit in this are five IBA affecting things:
 - that PRs with SGID=non-local mean something specific
 - PRs with DGID=non-local cause the SA to communicate with the remote
   SA to learn the GRH's FlowLabel
   (except in the case where reversible=1)
 - clients can communicate with remote SA's
 - Routers do the SLID spoofing you outlined.
 - SA's and routers collaborate quite closely on how the
   router produces a LRH. In particular the SA controls the SLID
   spoofing

A new query type or maybe some kind of modified multi-path-record
query could be defined by IBA to reduce the 6 exchanges required to
something more efficient.

Does this match what you are thinking?

> >   SA                                                      SA'
> >Node1 --> (LID 1) Router A -------  Router A' (LID A) ---> Node2
> >      |-> (LID 2) Router A                              |
> >      |-> (LID 3) Router B -------  Router B' (LID B) --|
> >
> >Router A and Router B are independent redundant devices, not a route
> >cloud of some sort. B -> A' is not a possible path.
> 
> Since A' and B' connect to the same subnet, B -> A' should be a valid path.

Please don't dismiss this case as it is a simple case of a more
generalized problem. People will want to deploy primary and secondary
routers (like dual star switching) that don't intercommunicate for
reliability. The B -> A' path does not exist because the A and B
routers are separate non-linked devices and not just 4 ports on one
large router. [A more general view would be a router ring architecture
where the clockwise and counterclockwise paths use different
hardware/cables]

There is a lot of complex work in the router and SA side to make this
kind of topology work, but it is critical that the clients use path
queries that can provide enough data to the SA and return enough data
to the client to support this.

> >I can think of the following downsides:
> > 1) Re-reading Michael Krause's email makes me think that defeating
> >    the QP SLID check is contrary to the spirit of IBA
> 
> I don't think we need to defeat the QP SLID check if we want extra routing, 
> but having redundant routers use the same link layer address isn't 
> necessarily a bad thing.

Well, it is one and the same, the SLID is really only used in the QP
SLID check so changing it around only serves to defeat that check.

Jason


From tziporet at mellanox.co.il  Mon Feb 12 13:15:37 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Mon, 12 Feb 2007 23:15:37 +0200
Subject: [openib-general] OFED 1.2 build problem
In-Reply-To: <1171292354.16167.9.camel@stevo-desktop>
References: <1171292354.16167.9.camel@stevo-desktop>
Message-ID: <45D0D8F9.9060908@mellanox.co.il>

Steve Wise wrote:
> Dunno if this has already been resolved?
>
> Building the 20070208-1508 OFED 1.2 kit.
> RHEL3U4 with that distro's kernel.
> Ran build.sh and selected "all".
>
>   
The ipath driver does not have any backport patches. I hope they will have
some today.

Tziporet


From krause at cup.hp.com  Mon Feb 12 13:06:27 2007
From: krause at cup.hp.com (Michael Krause)
Date: Mon, 12 Feb 2007 13:06:27 -0800
Subject: [openib-general] dapl broken for iWARP
In-Reply-To: <C98692FD98048C41885E0B0FACD9DFB803AFA730@exnane01.hq.netapp.com>
References: <C98692FD98048C41885E0B0FACD9DFB803AFA730@exnane01.hq.netapp.com>
Message-ID: <6.2.0.14.2.20070212130325.08f31f10@esmail.cup.hp.com>

At 07:29 AM 2/9/2007, Kanevsky, Arkady wrote:
>Mike,
>this is not a DAPL issue.
>There are 2 ways to deal with it.
>One is for all ULPs to use private data to exchange CM info.
>yes, some ULPs, like SDP do that in hello world message.
>
>Another is to let CM handle it.
>This way ULP does not have to deal with it.
>This is analogous to the IBTA CM IP addressing Annex.
>It ensure backwards compatibility and does not break any existing apps
>which use MPA as specified by IETF.
>
>No need to bother IETF until we have it working.

Given what it took to get MPA specified, I don't see changing the 
specification for this as likely to be welcomed by many.   The ULPs used 
within the IETF are largely able to solve this problem at their login 
exchange, so unless there is some groundswell of IETF ULPs that can't solve 
it as these do, I think it may be a challenge to gain any traction.

Mike

>Thanks,
>
>Arkady Kanevsky                       email: arkady at netapp.com
>Network Appliance Inc.               phone: 781-768-5395
>1601 Trapelo Rd. - Suite 16.        Fax: 781-895-1195
>Waltham, MA 02451                   central phone: 781-768-5300
>
>
> > -----Original Message-----
> > From: Michael Krause [mailto:krause at cup.hp.com]
> > Sent: Thursday, February 08, 2007 4:27 PM
> > To: Kanevsky, Arkady; Steve Wise; Arlin Davis
> > Cc: openib-general
> > Subject: Re: [openib-general] dapl broken for iWARP
> >
> > At 07:43 AM 2/8/2007, Kanevsky, Arkady wrote:
> > >That is correct.
> > >I am working with Krishna on it.
> > >Expect patches soon.
> > >
> > >By the way the problem is not DAPL specific and so is a proposed
> > >solution.
> > >
> > >There are 3 aspects of the solution.
> > >One is APIs. We suggest that we do not augment these.
> > >That is a connection requestor sets its QP RDMA ORD and IRD.
> > >When connection is established user can check the QP RDMA
> > ORD and IRD
> > >to see what he has now to use over the connection.
> > >We may consider to extend QP attributes to support transport
> > specific
> > >parameters passing in the future.
> > >For example, iWARP MPA CRC request.
> > >
> > >Second is the semantic that CM provides.
> > >The proposal is to match IBCM semantic.
> > >That is, the CM guarantees that local IRD is >= remote ORD.
> > >This guarantees that incoming RDMA Read requests will not
> > overwhelm the
> > >QP RDMA Read capabilities.
> > >Again there are no changes to IBCM, only to IWCM.
> > >Notice that as part of this IWCM will pass down to driver
> > and extract
> > >from driver needed info.
> > >
> > >The final part is iWARP CM extension to exchange RDMA ORD, IRD.
> > >This is similar to IBTA Annex for IP Addressing.
> > >The harder part that this will eventually require IETF MPA spec
> > >extension, and the fact that MPA protocol is implemented in
> > RNIC HW by
> > >many vendors, and hence can not be done by IWCM itself.
> >
> > We looked at this quite a bit during the creation of the
> > specification.   All of the targeted usage models exchange this
> > information as part of their "hello" or login exchanges.    As such,
> > the "hum" was to not change MPA to communicate such information and
> > leave it to software to exchange these values through existing
> > mechanisms.   I seriously doubt there will be much support for
> > modifying the MPA specification at this stage since the
> > implementations are largely complete and a modification would have to
> > deal with the legacy interoperability issue which likely would be
> > solved in software any way.  It would be simpler to simply modify the
> > underlying DAPL implementation to exchange the information and keep
> > this hidden from both the application and the RNIC providers.
> >
> > Mike
> >
> >
> > >Thanks,
> > >
> > >Arkady Kanevsky                       email: arkady at netapp.com
> > >Network Appliance Inc.               phone: 781-768-5395
> > >1601 Trapelo Rd. - Suite 16.        Fax: 781-895-1195
> > >Waltham, MA 02451                   central phone: 781-768-5300
> > >
> > >
> > > > -----Original Message-----
> > > > From: Steve Wise [mailto:swise at opengridcomputing.com]
> > > > Sent: Wednesday, February 07, 2007 6:12 PM
> > > > To: Arlin Davis
> > > > Cc: openib-general
> > > > Subject: Re: [openib-general] dapl broken for iWARP
> > > >
> > > > On Wed, 2007-02-07 at 15:05 -0800, Arlin Davis wrote:
> > > > > Steve Wise wrote:
> > > > >
> > > > > >On Wed, 2007-02-07 at 14:02 -0600, Steve Wise wrote:
> > > > > >
> > > > > >
> > > > > >>Arlin,
> > > > > >>
> > > > > >>The OFED dapl code is assuming the responder_resources and
> > > > > >>initiator_depth passed up on a connection request event are from the
> > > > > >>remote peer.  This doesn't happen for iWARP.  In the current iWARP
> > > > > >>specifications, it's up to the application to exchange this
> > > > > >>information somehow. So these are defaulting to 0 on the server side
> > > > > >>of any dapl connection over iWARP.
> > > > > >>
> > > > > >>This is a fairly recent change, I think.  We need to come up with
> > > > > >>some way to deal with this for OFED 1.2 IMO.
> > > > > >>
> > > > > >>
> > > > > Yes, this was changed recently to sync up with the rdma_cm changes
> > > > > that exposed the values.
> > > > >
> > > > > >>
> > > > > >>
> > > > > >
> > > > > >The IWCM could set these to the device max values for instance.
> > > > > >
> > > > > >
> > > > > That would work fine as long as you know the remote settings will be
> > > > > equal or better. The provider just sets the min of local device max
> > > > > values and the remote values provided with the request.
> > > > >
> > > >
> > > > I know Krishna Kumar is working on a solution for exchanging
> > > > this info in private data so the IWCM can "do the right
> > > > thing".  Stay tuned for a patch series to review for this.
> > > > But this functionality is definitely post OFED-1.2.
> > > >
> > > >
> > > > So for the OFED-1.2, I will set these to the device max in the IWCM.
> > > > Assuming the other side is OFED 1.2 DAPL, then it will work fine.
> > > >
> > > > Steve.
> > > >
> > > >
> > > >
> > > > _______________________________________________
> > > > openib-general mailing list
> > > > openib-general at openib.org
> > > > http://openib.org/mailman/listinfo/openib-general
> > > >
> > > > To unsubscribe, please visit
> > > > http://openib.org/mailman/listinfo/openib-general
> > > >
> > >
> >
> >
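
The rule discussed in the thread above (the CM ensures that local IRD >= remote ORD, and the provider takes the minimum of the local device maximum and the value supplied with the request) could be sketched roughly as follows. This is only an illustration: `struct rdma_read_limits` and `negotiate_limits()` are hypothetical names, not part of the IWCM, IBCM, or DAPL code.

```c
#include <stdint.h>

/* Hypothetical, simplified view of the RDMA Read limits on one side. */
struct rdma_read_limits {
	uint8_t ord;	/* outbound RDMA Reads this side may have in flight */
	uint8_t ird;	/* inbound RDMA Reads this side can absorb */
};

static uint8_t min_u8(uint8_t a, uint8_t b)
{
	return a < b ? a : b;
}

/*
 * Compute the values the local side would program into its QP, given
 * the local device maxima and the values requested by the remote peer.
 * If the returned IRD is lower than the peer's requested ORD, the peer
 * is expected to lower its ORD to match once it learns the result.
 */
static struct rdma_read_limits
negotiate_limits(struct rdma_read_limits dev_max,
		 struct rdma_read_limits remote)
{
	struct rdma_read_limits local;

	/* We cannot absorb more reads than the device supports. */
	local.ird = min_u8(dev_max.ird, remote.ord);
	/* We must not issue more reads than the peer can absorb. */
	local.ord = min_u8(dev_max.ord, remote.ird);
	return local;
}
```

With this rule, remote values that arrive as 0 (the iWARP case described in the thread) drive both results to 0, which is why the values have to be exchanged somehow or replaced with the device maximum.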


From krause at cup.hp.com  Mon Feb 12 13:14:28 2007
From: krause at cup.hp.com (Michael Krause)
Date: Mon, 12 Feb 2007 13:14:28 -0800
Subject: [openib-general] Immediate data question
In-Reply-To: <309a667c0702112110h7a79961fv9c3cf46d4392e1d4@mail.gmail.com>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
	<adatzy0qmt3.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net>
	<ada7iuwp5rr.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net>
	<adamz3pfym0.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net>
	<adahctxeds8.fsf@cisco.com>
	<6.2.0.14.2.20070208131610.085c52f8@esmail.cup.hp.com>
	<349DCDA352EACF42A0C49FA6DCEA840362A87E@G3W0634.americas.hpqcorp.net>
	<309a667c0702112110h7a79961fv9c3cf46d4392e1d4@mail.gmail.com>
Message-ID: <6.2.0.14.2.20070212130704.09018a60@esmail.cup.hp.com>

At 09:10 PM 2/11/2007, Devesh Sharma wrote:
>On 2/10/07, Tang, Changqing <changquing.tang at hp.com> wrote:
>> > >
>> > >Not for the receiver, but the sender will be severely slowed down by
>> > >having to wait for the RNR timeouts.
>> >
>> > RNR = Receiver Not Ready so by definition, the data flow
>> > isn't going to
>> > progress until the receiver is ready to receive data.   If a
>> > receive QP
>> > enters RNR for a RC, then it is likely not progressing as
>> > desired.   RNR
>> > was initially put in place to enable a receiver to create
>> > back pressure to the sender without causing a fatal error
>> > condition.  It should rarely be entered and therefore should
>> > have negligible impact on overall performance however when a
>> > RNR occurs, no forward progress will occur so performance is
>> > essentially zero.
>>
>>Mike:
>>         I still do not quite understand this issue. I have two
>>situations that have RNR triggered.
>>
>>1. process A and process B is connected with QP. A first post a send to
>>B, B does not post receive. Then A and B are doing a long time
>>RDMA_WRITE each other, A and B just check memory for the RDMA_WRITE
>>message. Finally B will post a receive. Does the first pending send in A
>>block all the later RDMA_WRITE ?
>According to IBTA spec HCA will process WR entries in strict order in
>which they are posted so the send will block all WR posted after this
>send, Until-unless HCA has multiple processing elements, I think even
>then processing order will be maintained by HCA
>If not, since RNR is triggered

The source HCA is responsible for processing work requests in the order 
they are posted.   If the SEND cannot proceed and receives a RNR, then the 
subsequent RDMA Write should not proceed, i.e. the sequence numbers that 
define the valid window will not progress and given IB requires strong 
ordering within the fabric, nothing sent subsequently should be made 
visible at the sink HCA.   In your example, if A is sending a SEND followed 
by a RDMA Write, the first check should have been that B had provided an 
ACK with a credit indicating that a SEND is allowed.  If B subsequently 
removed access to the buffer that had to be posted to provide that credit, 
then it should trigger a RNR NAK and the subsequent RDMA Writes should not 
be visible at B since there is now effectively a hole in the transmission stream.
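
As a rough illustration of the in-order behaviour described above, here is a toy model of a single RC send queue. It is a sketch only: `process_send_queue()` and the `wr_opcode` values are made-up names, and real HCAs retry after the RNR timer rather than simply stopping.

```c
#include <stddef.h>

enum wr_opcode { WR_SEND, WR_RDMA_WRITE };

/*
 * Toy model: work requests execute strictly in posting order.  A SEND
 * consumes one posted receive at the responder; an RDMA Write does not.
 * Returns how many WRs completed before the queue stalled on an RNR NAK
 * (n if nothing stalled).
 */
static size_t process_send_queue(const enum wr_opcode *wrs, size_t n,
				 size_t posted_receives)
{
	size_t done;

	for (done = 0; done < n; done++) {
		if (wrs[done] == WR_SEND) {
			if (posted_receives == 0)
				break;	/* RNR NAK: later WRs wait behind it */
			posted_receives--;
		}
	}
	return done;
}
```

In this model a SEND at the head of the queue with no receive posted holds back every RDMA Write posted after it, which is the situation in question 1.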

>>periodically till B post receive, does it affect the RDMA_WRITE
>>performance between A and B ?
>>
>>2. extend above to three processes, A connect to B, B connect to C, so B
>>has two QPs, but one CQ. A posts a send to B, B does not post receive,
>>rather B and C are doing a long time RDMA_WRITE, or send/recv. But B
>>must sends RNR periodically to A, right?. So does the pending message
>>from A affects B's overall performance  between B and C ?

Neither IB nor iWARP provides any ordering guarantees between different data 
flows.  This is strictly under application control.  Hence, if a RNR NAK or 
whatever occurs on a RC between A and B, then it has no impact on what 
occurs between A and C or B and C.   It is simply outside the scope of 
either technology to address.

Mike


From swise at opengridcomputing.com  Mon Feb 12 13:15:53 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Mon, 12 Feb 2007 15:15:53 -0600
Subject: [openib-general] [PATCH][RFC] iw_cxgb3/2.6.21 - Don't use the
 physical address for mapping memory into userspace.
In-Reply-To: <adabqjz3z26.fsf@cisco.com>
References: <1171308610.16167.69.camel@stevo-desktop>
	<adalkj35esq.fsf@cisco.com> <1171310670.16167.89.camel@stevo-desktop>
	<aday7n33zrk.fsf@cisco.com> <1171311576.16167.91.camel@stevo-desktop>
	<adabqjz3z26.fsf@cisco.com>
Message-ID: <1171314953.16167.96.camel@stevo-desktop>

On Mon, 2007-02-12 at 12:23 -0800, Roland Dreier wrote:
> Actually, that patch doesn't apply because of the "%llx" warning fixes
> I pushed out.  And git-apply also complains about trailing
> whitespace.  Can you resend a version that applies to my
> for-2.6.21 branch?
> 
> Thanks

Here it is...


Don't use the physical address for mapping memory into userspace.

From: Steve Wise <swise at opengridcomputing.com>

Currently iw_cxgb3 uses the physical address as the key/offset to return
to the user process for mapping kernel memory into userspace.  The user
process then calls mmap() using this key as the offset.  Because the
physical address is 64 bits, this introduces a problem with 32-bit
userspace, which might not be able to pass an arbitrary 64-bit address
back into the kernel (since mmap2() is limited to a 32-bit number of
pages for the offset, which limits it to 44-bit addresses).

Change the mmap logic to use a u32 counter as the offset for mapping.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/iwch_provider.c |   66 +++++++++++++++++----------
 drivers/infiniband/hw/cxgb3/iwch_provider.h |   14 +++---
 drivers/infiniband/hw/cxgb3/iwch_user.h     |    6 +-
 3 files changed, 52 insertions(+), 34 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 549de0a..2e05e94 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -115,7 +115,7 @@ static struct ib_ucontext *iwch_alloc_uc
 	struct iwch_dev *rhp = to_iwch_dev(ibdev);
 
 	PDBG("%s ibdev %p\n", __FUNCTION__, ibdev);
-	context = kmalloc(sizeof(*context), GFP_KERNEL);
+	context = kzalloc(sizeof(*context), GFP_KERNEL);
 	if (!context)
 		return ERR_PTR(-ENOMEM);
 	cxio_init_ucontext(&rhp->rdev, &context->uctx);
@@ -141,13 +141,14 @@ static int iwch_destroy_cq(struct ib_cq 
 }
 
 static struct ib_cq *iwch_create_cq(struct ib_device *ibdev, int entries,
-			     struct ib_ucontext *context,
+			     struct ib_ucontext *ib_context,
 			     struct ib_udata *udata)
 {
 	struct iwch_dev *rhp;
 	struct iwch_cq *chp;
 	struct iwch_create_cq_resp uresp;
 	struct iwch_create_cq_req ureq;
+	struct iwch_ucontext *ucontext = NULL;
 
 	PDBG("%s ib_dev %p entries %d\n", __FUNCTION__, ibdev, entries);
 	rhp = to_iwch_dev(ibdev);
@@ -155,12 +156,15 @@ static struct ib_cq *iwch_create_cq(stru
 	if (!chp)
 		return ERR_PTR(-ENOMEM);
 
-	if (context && !t3a_device(rhp)) {
-		if (ib_copy_from_udata(&ureq, udata, sizeof (ureq))) {
-			kfree(chp);
-			return ERR_PTR(-EFAULT);
+	if (ib_context) {
+		ucontext = to_iwch_ucontext(ib_context);
+		if (!t3a_device(rhp)) {
+			if (ib_copy_from_udata(&ureq, udata, sizeof (ureq))) {
+				kfree(chp);
+				return ERR_PTR(-EFAULT);
+			}
+			chp->user_rptr_addr = (u32 __user *)(unsigned long)ureq.user_rptr_addr;
 		}
-		chp->user_rptr_addr = (u32 __user *)(unsigned long)ureq.user_rptr_addr;
 	}
 
 	if (t3a_device(rhp)) {
@@ -190,7 +194,7 @@ static struct ib_cq *iwch_create_cq(stru
 	init_waitqueue_head(&chp->wait);
 	insert_handle(rhp, &rhp->cqidr, chp, chp->cq.cqid);
 
-	if (context) {
+	if (ucontext) {
 		struct iwch_mm_entry *mm;
 
 		mm = kmalloc(sizeof *mm, GFP_KERNEL);
@@ -200,16 +204,20 @@ static struct ib_cq *iwch_create_cq(stru
 		}
 		uresp.cqid = chp->cq.cqid;
 		uresp.size_log2 = chp->cq.size_log2;
-		uresp.physaddr = virt_to_phys(chp->cq.queue);
+		spin_lock(&ucontext->mmap_lock);
+		uresp.key = ucontext->key;
+		ucontext->key += PAGE_SIZE;
+		spin_unlock(&ucontext->mmap_lock);
 		if (ib_copy_to_udata(udata, &uresp, sizeof (uresp))) {
 			kfree(mm);
 			iwch_destroy_cq(&chp->ibcq);
 			return ERR_PTR(-EFAULT);
 		}
-		mm->addr = uresp.physaddr;
+		mm->key = uresp.key;
+		mm->addr = virt_to_phys(chp->cq.queue);
 		mm->len = PAGE_ALIGN((1UL << uresp.size_log2) *
 					     sizeof (struct t3_cqe));
-		insert_mmap(to_iwch_ucontext(context), mm);
+		insert_mmap(ucontext, mm);
 	}
 	PDBG("created cqid 0x%0x chp %p size 0x%0x, dma_addr 0x%0llx\n",
 	     chp->cq.cqid, chp, (1 << chp->cq.size_log2),
@@ -316,14 +324,14 @@ static int iwch_arm_cq(struct ib_cq *ibc
 static int iwch_mmap(struct ib_ucontext *context, struct vm_area_struct *vma)
 {
 	int len = vma->vm_end - vma->vm_start;
-	u64 pgaddr = vma->vm_pgoff << PAGE_SHIFT;
+	u32 key = vma->vm_pgoff << PAGE_SHIFT;
 	struct cxio_rdev *rdev_p;
 	int ret = 0;
 	struct iwch_mm_entry *mm;
 	struct iwch_ucontext *ucontext;
 
-	PDBG("%s off 0x%lx addr 0x%llx len %d\n", __FUNCTION__, vma->vm_pgoff,
-	     (unsigned long long) pgaddr, len);
+	PDBG("%s pgoff 0x%lx key 0x%x len %d\n", __FUNCTION__, vma->vm_pgoff,
+	     key, len);
 
 	if (vma->vm_start & (PAGE_SIZE-1)) {
 	        return -EINVAL;
@@ -332,13 +340,13 @@ static int iwch_mmap(struct ib_ucontext 
 	rdev_p = &(to_iwch_dev(context->device)->rdev);
 	ucontext = to_iwch_ucontext(context);
 
-	mm = remove_mmap(ucontext, pgaddr, len);
+	mm = remove_mmap(ucontext, key, len);
 	if (!mm)
 		return -EINVAL;
 	kfree(mm);
 
-	if ((pgaddr >= rdev_p->rnic_info.udbell_physbase) &&
-	    (pgaddr < (rdev_p->rnic_info.udbell_physbase +
+	if ((mm->addr >= rdev_p->rnic_info.udbell_physbase) &&
+	    (mm->addr < (rdev_p->rnic_info.udbell_physbase +
 		       rdev_p->rnic_info.udbell_len))) {
 
 		/*
@@ -351,15 +359,17 @@ static int iwch_mmap(struct ib_ucontext 
 		vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
 		vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND;
 		vma->vm_flags &= ~VM_MAYREAD;
-		ret = io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
-				       len, vma->vm_page_prot);
+		ret = io_remap_pfn_range(vma, vma->vm_start,
+					 mm->addr >> PAGE_SHIFT,
+				         len, vma->vm_page_prot);
 	} else {
 
 		/*
 		 * Map WQ or CQ contig dma memory...
 		 */
-		ret = remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
-				       len, vma->vm_page_prot);
+		ret = remap_pfn_range(vma, vma->vm_start,
+				      mm->addr >> PAGE_SHIFT,
+				      len, vma->vm_page_prot);
 	}
 
 	return ret;
@@ -853,18 +863,24 @@ static struct ib_qp *iwch_create_qp(stru
 		uresp.size_log2 = qhp->wq.size_log2;
 		uresp.sq_size_log2 = qhp->wq.sq_size_log2;
 		uresp.rq_size_log2 = qhp->wq.rq_size_log2;
-		uresp.physaddr = virt_to_phys(qhp->wq.queue);
-		uresp.doorbell = qhp->wq.udb;
+		spin_lock(&ucontext->mmap_lock);
+		uresp.key = ucontext->key;
+		ucontext->key += PAGE_SIZE;
+		uresp.db_key = ucontext->key;
+		ucontext->key += PAGE_SIZE;
+		spin_unlock(&ucontext->mmap_lock);
 		if (ib_copy_to_udata(udata, &uresp, sizeof (uresp))) {
 			kfree(mm1);
 			kfree(mm2);
 			iwch_destroy_qp(&qhp->ibqp);
 			return ERR_PTR(-EFAULT);
 		}
-		mm1->addr = uresp.physaddr;
+		mm1->key = uresp.key;
+		mm1->addr = virt_to_phys(qhp->wq.queue);
 		mm1->len = PAGE_ALIGN(wqsize * sizeof (union t3_wr));
 		insert_mmap(ucontext, mm1);
-		mm2->addr = uresp.doorbell & PAGE_MASK;
+		mm2->key = uresp.db_key;
+		mm2->addr = qhp->wq.udb & PAGE_MASK;
 		mm2->len = PAGE_SIZE;
 		insert_mmap(ucontext, mm2);
 	}
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h
index 5680d82..61e3278 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.h
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h
@@ -184,6 +184,7 @@ struct ib_qp *iwch_get_qp(struct ib_devi
 struct iwch_ucontext {
 	struct ib_ucontext ibucontext;
 	struct cxio_ucontext uctx;
+	u32 key;
 	spinlock_t mmap_lock;
 	struct list_head mmaps;
 };
@@ -196,11 +197,12 @@ static inline struct iwch_ucontext *to_i
 struct iwch_mm_entry {
 	struct list_head entry;
 	u64 addr;
+	u32 key;
 	unsigned len;
 };
 
 static inline struct iwch_mm_entry *remove_mmap(struct iwch_ucontext *ucontext,
-						u64 addr, unsigned len)
+						u32 key, unsigned len)
 {
 	struct list_head *pos, *nxt;
 	struct iwch_mm_entry *mm;
@@ -209,11 +211,11 @@ static inline struct iwch_mm_entry *remo
 	list_for_each_safe(pos, nxt, &ucontext->mmaps) {
 
 		mm = list_entry(pos, struct iwch_mm_entry, entry);
-		if (mm->addr == addr && mm->len == len) {
+		if (mm->key == key && mm->len == len) {
 			list_del_init(&mm->entry);
 			spin_unlock(&ucontext->mmap_lock);
-			PDBG("%s addr 0x%llx len %d\n", __FUNCTION__,
-			     (unsigned long long) mm->addr, mm->len);
+			PDBG("%s key 0x%x addr 0x%llx len %d\n", __FUNCTION__,
+			     key, (unsigned long long) mm->addr, mm->len);
 			return mm;
 		}
 	}
@@ -225,8 +227,8 @@ static inline void insert_mmap(struct iw
 			       struct iwch_mm_entry *mm)
 {
 	spin_lock(&ucontext->mmap_lock);
-	PDBG("%s addr 0x%llx len %d\n", __FUNCTION__,
-	     (unsigned long long) mm->addr, mm->len);
+	PDBG("%s key 0x%x addr 0x%llx len %d\n", __FUNCTION__,
+	     mm->key, (unsigned long long) mm->addr, mm->len);
 	list_add_tail(&mm->entry, &ucontext->mmaps);
 	spin_unlock(&ucontext->mmap_lock);
 }
diff --git a/drivers/infiniband/hw/cxgb3/iwch_user.h b/drivers/infiniband/hw/cxgb3/iwch_user.h
index 14e1517..c4e7fbe 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_user.h
+++ b/drivers/infiniband/hw/cxgb3/iwch_user.h
@@ -47,14 +47,14 @@ struct iwch_create_cq_req {
 };
 
 struct iwch_create_cq_resp {
-	__u64 physaddr;
+	__u64 key;
 	__u32 cqid;
 	__u32 size_log2;
 };
 
 struct iwch_create_qp_resp {
-	__u64 physaddr;
-	__u64 doorbell;
+	__u64 key;
+	__u64 db_key;
 	__u32 qpid;
 	__u32 size_log2;
 	__u32 sq_size_log2;
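
For readers skimming the diff, the key scheme can be sketched in plain userspace C as below. This is an illustration under stated assumptions, not the driver code: the `*_sketch` types, `insert_mapping()`, `remove_mapping()` and `mmap2_addressable_bits()` are hypothetical, a fixed array replaces the kernel list, 4 KiB pages are assumed, and the spinlock the driver takes around the counter and list is omitted.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE   4096u	/* assumption: 4 KiB pages */
#define SKETCH_MAX_ENTRIES 8

struct mm_entry_sketch {
	uint32_t key;	/* opaque offset handed to userspace */
	uint64_t addr;	/* physical address, kept on the kernel side */
	unsigned len;
	int in_use;
};

struct ucontext_sketch {
	uint32_t next_key;
	struct mm_entry_sketch entries[SKETCH_MAX_ENTRIES];
};

/* mmap2() takes a 32-bit page count, so it reaches 32 + page_shift bits. */
static unsigned mmap2_addressable_bits(unsigned page_shift)
{
	return 32 + page_shift;
}

/* Record a mapping; the key returned is what userspace passes to mmap(). */
static int insert_mapping(struct ucontext_sketch *uc, uint64_t addr,
			  unsigned len, uint32_t *key_out)
{
	size_t i;

	for (i = 0; i < SKETCH_MAX_ENTRIES; i++) {
		struct mm_entry_sketch *mm = &uc->entries[i];

		if (mm->in_use)
			continue;
		mm->key = uc->next_key;
		mm->addr = addr;
		mm->len = len;
		mm->in_use = 1;
		uc->next_key += SKETCH_PAGE_SIZE;
		*key_out = mm->key;
		return 0;
	}
	return -1;	/* table full */
}

/* Consume the mapping matching key and len; the address is copied out
 * before the entry is released. */
static int remove_mapping(struct ucontext_sketch *uc, uint32_t key,
			  unsigned len, uint64_t *addr_out)
{
	size_t i;

	for (i = 0; i < SKETCH_MAX_ENTRIES; i++) {
		struct mm_entry_sketch *mm = &uc->entries[i];

		if (mm->in_use && mm->key == key && mm->len == len) {
			*addr_out = mm->addr;
			mm->in_use = 0;
			return 0;
		}
	}
	return -1;	/* no such key */
}
```

The sketch copies the address out of the entry before releasing it, so the caller never has to touch the entry again after the lookup.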


From tziporet at mellanox.co.il  Mon Feb 12 13:14:01 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Mon, 12 Feb 2007 23:14:01 +0200
Subject: [openib-general] OFED 1.2 components list - for the meeting
 today
In-Reply-To: <1171307245.31538.434613.camel@hal.voltaire.com>
References: <BAE9DCEF64577A439B3A37F36F9B691C01B288A3@orsmsx418.amr.corp.intel.com>
	<1171307245.31538.434613.camel@hal.voltaire.com>
Message-ID: <45D0D899.4000505@mellanox.co.il>

Hal Rosenstock wrote:
> On Mon, 2007-02-12 at 12:58, Woodruff, Robert J wrote:
>   
>> BTW.
>>
>> Is the ibdiagui code going to be part of this release. 
>> I did not see it in the list below or is it just part of 
>> the openib-diags ?
>>     
>
> It's part of ibutils.
>   
And already part of OFED 1.2
>> I thought that we discussed this as an OFED 1.2 feature.
>> I have someone that is interested in trying it out.
>>     
You can try it now.

Tziporet


From rdreier at cisco.com  Mon Feb 12 13:43:31 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 12 Feb 2007 13:43:31 -0800
Subject: [openib-general] [PATCH][RFC] iw_cxgb3/2.6.21 - Don't use the
 physical address for mapping memory into userspace.
In-Reply-To: <1171314953.16167.96.camel@stevo-desktop> (Steve Wise's
	message of "Mon, 12 Feb 2007 15:15:53 -0600")
References: <1171308610.16167.69.camel@stevo-desktop>
	<adalkj35esq.fsf@cisco.com> <1171310670.16167.89.camel@stevo-desktop>
	<aday7n33zrk.fsf@cisco.com> <1171311576.16167.91.camel@stevo-desktop>
	<adabqjz3z26.fsf@cisco.com> <1171314953.16167.96.camel@stevo-desktop>
Message-ID: <adaabzj2gsc.fsf@cisco.com>

OK, merged into for-2.6.21 and pushed out.


From halr at voltaire.com  Mon Feb 12 14:40:15 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 12 Feb 2007 17:40:15 -0500
Subject: [openib-general] patches to 2.6.19.1 kernel for switch Operation
In-Reply-To: <04ba01c74eb1$e77fd180$1914a8c0@surioffice>
References: <000601c7419f$d4470c60$ff0da8c0@amr.corp.intel.com>
	<1170072757.4555.242192.camel@hal.voltaire.com>
	<039701c7494b$6bd5d860$1914a8c0@surioffice>
	<1171050441.31538.180858.camel@hal.voltaire.com>
	<048101c74c91$e0f54dd0$1914a8c0@surioffice>
	<1171288297.31538.417657.camel@hal.voltaire.com>
	<04ba01c74eb1$e77fd180$1914a8c0@surioffice>
Message-ID: <1171319946.31538.446427.camel@hal.voltaire.com>

Suri,

On Mon, 2007-02-12 at 09:27, Suresh Shelvapille wrote:
> Hal:
> 
> > > Ref: comment on mad.c (ib_mad_recv_done_handler().
> > >
> > > Even if I make the relevant changes to smi.c functions how do I get the
> > > packet to get forwarded, without making additional changes in this
> > function?
> > >
> > > Meaning, when smi_handle_dr_smp_send(),smi_check_forward_dr_smp() are
> > called
> > > and you determine that the packet has to be forwarded instead of
> > consuming
> > > where do you actually do the send? I think this chain is missing!
> > 
> > My initial thought was what I wrote but in looking at this further, as
> > you point out, the SMI routines are only updating the packet and
> > indicating its disposition. The actual sending needs to be elsewhere.
> > I'm not sure what the code ends up looking like with the changes
> > suggested and would just like this to look as clean as possible and use
> > the SMI routines where appropriate here. Does this make sense ?
> > 
> I am not sure I follow this last statement. 

I was trying to say that the send needs to be elsewhere from the SMI
code for the forward case so it may go in the routine where you placed
it. I was also trying to say that I'm not 100% sure what this could look
like until the other changes described are made so this may take two
more iterations rather than one. Is that any clearer?
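
To make the split concrete, here is a minimal sketch of the structure I have in mind, where the SMI-style helper only classifies the packet and the caller does the send. Everything in it is hypothetical (`classify_dr_smp()`, `handle_received_smp()`, the `smp_disposition` values), and the hop rule is a deliberately simplified toy, not the IBA directed-route rules.

```c
#include <stdint.h>

enum smp_disposition { SMP_DISCARD, SMP_CONSUME, SMP_FORWARD };

struct dr_smp_sketch {
	uint8_t hop_ptr;
	uint8_t hop_cnt;
};

typedef void (*smp_send_fn)(const struct dr_smp_sketch *smp, void *ctx);

/* SMI-style helper: decides what should happen, never sends anything.
 * Toy rule: forward while hops remain, consume at the end of the path. */
static enum smp_disposition classify_dr_smp(const struct dr_smp_sketch *smp)
{
	if (smp->hop_ptr > smp->hop_cnt)
		return SMP_DISCARD;
	if (smp->hop_ptr < smp->hop_cnt)
		return SMP_FORWARD;
	return SMP_CONSUME;
}

/* Receive-handler side: the only place where a send is issued. */
static enum smp_disposition handle_received_smp(struct dr_smp_sketch *smp,
						smp_send_fn send, void *ctx)
{
	enum smp_disposition d = classify_dr_smp(smp);

	if (d == SMP_FORWARD) {
		smp->hop_ptr++;
		send(smp, ctx);
	}
	return d;
}

/* Stand-in sender for experiments: counts how often it was called. */
static void count_send(const struct dr_smp_sketch *smp, void *ctx)
{
	(void)smp;
	++*(int *)ctx;
}
```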

-- Hal


From mshefty at ichips.intel.com  Mon Feb 12 14:47:42 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 12 Feb 2007 14:47:42 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <20070212205634.GW11411@obsidianresearch.com>
References: <20070210004820.GS11411@obsidianresearch.com>
	<000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com>
	<20070211230935.GT11411@obsidianresearch.com>
	<45D0A27A.2010302@ichips.intel.com>
	<20070212205634.GW11411@obsidianresearch.com>
Message-ID: <45D0EE8E.4030906@ichips.intel.com>

> 1) What does the TClass and FlowLabel returned from SGID=local
>    DGID=remote mean?
>    Do you use it in the Node1 -> Node2 direction or the Node2 -> Node1 direction
>    or both?

Maybe it would help if we can agree on a set of expectations.  These are what I 
am thinking:

1. An SA should be able to respond to a valid PR query if at least one of the 
GIDs in the path record is local.

2. The LIDs in a PR are relative to the SA's subnet that returned the record.

3. An IB router should not failover transparently to QPs sending traffic through 
that router.

4. A PR from the local SA with reversible=1 indicates that data sent from the 
remote GID to the local GID using the PR TC and FL will route locally using the 
specified LID pair.  This holds whether the PR SGID is local or remote.

5. A PR from a remote SA with reversible=1 indicates that data sent from the 
local GID to the remote GID using the PR TC and FL will route remotely using the 
specified LID pair.  This holds whether the PR SGID is local or remote.

6. A PR with reversible=0 is relative to SA's subnet.  The SGID->DGID data flow 
over the PR TC and FL indicates the SLID->DLID mapping for that subnet.

Do your expectations differ from these?
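
To illustrate what I mean by reversible in expectations 4 and 5, here is a rough sketch. The struct is a simplified stand-in for a path record (the names `path_rec_sketch` and `reverse_path()` are made up for this mail): the return direction swaps the GIDs and the LID pair and keeps the TC and FL.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical subset of the path record fields. */
struct path_rec_sketch {
	uint8_t  sgid[16];
	uint8_t  dgid[16];
	uint16_t slid;
	uint16_t dlid;
	uint8_t  tclass;
	uint32_t flow_label;	/* only the low 20 bits are meaningful */
	bool     reversible;
};

/*
 * Build the return-direction record from a reversible path.  fwd and
 * rev must point to different objects.  Returns false, leaving *rev
 * untouched, when the path is not marked reversible.
 */
static bool reverse_path(const struct path_rec_sketch *fwd,
			 struct path_rec_sketch *rev)
{
	if (!fwd->reversible)
		return false;

	*rev = *fwd;				/* TC and FL carry over */
	memcpy(rev->sgid, fwd->dgid, sizeof(rev->sgid));
	memcpy(rev->dgid, fwd->sgid, sizeof(rev->dgid));
	rev->slid = fwd->dlid;
	rev->dlid = fwd->slid;
	return true;
}
```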

The use of reversible between subnets is what's concerning me.  It may be that 
an SA could not return any paths as reversible between two subnets without using 
some trick like what you mentioned.

These add a requirement on the SA that they must be aware of the routes packets 
take between two GIDs using a given TC and FL, but I don't believe that this 
necessarily forces SA to SA communication.  The SA may only need to exchange 
information with a router...?

> Implicit in this are five IBA affecting things:
>  - that PRs with SGID=non-local mean something specific

I don't think that we're changing any of the meanings of the fields though.

>  - Routers do the SLID spoofing you outlined.

I'm not sure this is something that we do want now.  APM should really handle 
path failover.

> There is a lot of complex work in the router and SA side to make this
> kind of topology work, but it is critical that the clients use path
> queries that can provide enough data to the SA and return enough data
> to the client to support this.

I'm still deciding if the existing path record attribute is sufficient.

- Sean


From rdreier at cisco.com  Mon Feb 12 15:08:53 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 12 Feb 2007 15:08:53 -0800
Subject: [openib-general] [PATCH 3 of 4] IB/mthca: fix
 non-cache-coherent CPUs with memfree
In-Reply-To: <20070210211508.GD14903@mellanox.co.il> (Michael S.
	Tsirkin's message of "Sat, 10 Feb 2007 23:15:08 +0200")
References: <20070210211508.GD14903@mellanox.co.il>
Message-ID: <adahctrrn22.fsf@cisco.com>

Queued for 2.6.21, although I think a further cleanup would be:

 >  	mdev->mr_table.mpt_table = mthca_alloc_icm_table(mdev, init_hca->mpt_base,
 >  							 dev_lim->mpt_entry_sz,
 >  							 mdev->limits.num_mpts,
 > -							 mdev->limits.reserved_mrws, 1);
 > +							 mdev->limits.reserved_mrws,
 > +							 1, 1);

instead of having use_lowmem and use_coherent be separate parameters,
we should probably convert it to a type parameter, and have
MTHCA_ICM_TABLE_HIGHMEM, _LOWMEM and _COHERENT.  That would make these
calls a lot easier to read and get correct.
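
Something along these lines, as a rough sketch only. The enum and helper names below are placeholders rather than actual mthca code, and I am assuming that a coherent table also implies lowmem, as in the `1, 1` call above.

```c
#include <stdbool.h>

/* Placeholder names; the real constants would be MTHCA_ICM_TABLE_*. */
enum icm_table_type_sketch {
	ICM_TABLE_HIGHMEM,
	ICM_TABLE_LOWMEM,
	ICM_TABLE_COHERENT,
};

/* Assumption: anything other than HIGHMEM must come from lowmem. */
static bool icm_type_uses_lowmem(enum icm_table_type_sketch type)
{
	return type != ICM_TABLE_HIGHMEM;
}

/* Only the COHERENT type asks for coherent memory. */
static bool icm_type_uses_coherent(enum icm_table_type_sketch type)
{
	return type == ICM_TABLE_COHERENT;
}
```

A call site would then pass a single named type in place of the trailing `1, 1`, which is much harder to get wrong.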

 - R.


From swise at opengridcomputing.com  Mon Feb 12 15:24:07 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Mon, 12 Feb 2007 17:24:07 -0600
Subject: [openib-general] issues with compilation of ofed 1.2
In-Reply-To: <1170973693.19297.2.camel@firewall.xsintricity.com>
References: <45C9EE31.2040602@voltaire.com>
	<6a122cc00702072302s18c1c4b7i3f1e4a1b3f3d0381@mail.gmail.com>
	<1170973693.19297.2.camel@firewall.xsintricity.com>
Message-ID: <1171322647.28500.41.camel@stevo-desktop>

I still get this error building on rhel5b2 with the latest from the ofa
git trees:

ERROR: The sysfsutils-devel package is required to build libibverbs_devel RPM
[root at vic12 OFED-1.2-stevo]# rpm -qa|grep sysfs
libsysfs-2.0.0-6
libsysfs-devel-2.0.0-6
libsysfs-2.0.0-6
sysfsutils-2.0.0-6
libsysfs-devel-2.0.0-6


I installed all the sysfs rpms I could find.  So is there some
dependency problem here in the OFED build script that is looking for the
wrong rpm in rhel5?

Is there a bug to track this issue?

Steve.


On Thu, 2007-02-08 at 17:28 -0500, Doug Ledford wrote:
> On Thu, 2007-02-08 at 09:02 +0200, Moni Levy wrote:
> > Doug,
> > On 2/7/07, Yosef Etigin <yosefe at voltaire.com> wrote:
> > > 7. On RHAS5 beta 2, the setup requires sysfstuils-devel RPM which is not included in this distro.
> > 
> > Can you please help us with that ?
> 
> The value of the sysfsutils is far overshadowed by the value of libsysfs
> (and libsysfs is far more commonly used).  So, in RHEL5, the rpm package
> names reflect this:
> 
> libsysfs
> sysfsutils (I think, might be libsysfs-utils)
> libsysfs-devel
> 
> It's all still there, just a different name.
> 
> > -- Moni
> > 
> > >
> > > --
> > > Yosef Etigin
> > > Alex Tabachnik
> > >


From krause at cup.hp.com  Mon Feb 12 15:31:15 2007
From: krause at cup.hp.com (Michael Krause)
Date: Mon, 12 Feb 2007 15:31:15 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <20070212205634.GW11411@obsidianresearch.com>
References: <20070210004820.GS11411@obsidianresearch.com>
	<000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com>
	<20070211230935.GT11411@obsidianresearch.com>
	<45D0A27A.2010302@ichips.intel.com>
	<20070212205634.GW11411@obsidianresearch.com>
Message-ID: <6.2.0.14.2.20070212152343.08e8ca48@esmail.cup.hp.com>

At 12:56 PM 2/12/2007, Jason Gunthorpe wrote:
>On Mon, Feb 12, 2007 at 09:23:06AM -0800, Sean Hefty wrote:
> > >Ah, I think I missed the key step in your scheme.. You plan to query
> > >the local SM for SGID=remote DGID=local? (ie reversed from 'normal'. I
> > >was thinking only about the SGID=local DGID=remote query direction)
> >
> > I'm not sure that the query needs the GIDs reversed, as long as the path is
> > reversible.  So, the local query would be:
> >
> > SGID=local, DGID=remote, reversible=1   (to SA)
> >
> > And the remote query would be:
> >
> > SGID=local, DGID=remote, reversible=1,  (to SA')
> > TClass & FlowLabel=from previous query response
>
>1) What does the TClass and FlowLabel returned from SGID=local
>    DGID=remote mean?
>    Do you use it in the Node1 -> Node2 direction or the Node2 -> Node1 direction
>    or both?
>1a) If it is Node1 -> Node2 then the local SA has to query SA' to figure
>     what FlowLabel to return.
>1b) If it is for both directions then somehow SA, SA' and all four
>     router ports need to agree on global flowlabels.
>2) In the 2nd query, passing SGID=local, DGID=remote is 'reversed'
>    since SGID=local is the wrong subnet for SA'.
>    I think defining this to mean something is risky.
>2b) A PR query with TClass and FlowLabel present in the query is
>     currently expected to return an answer with those fields matching.
>     That implies #1b..

TClass is intended to communicate the end-to-end QoS desired.   TClass is 
then mapped to a SL that is local to each subnet.   A flow label is 
intended to be much the same as in the IP world and is left, in essence, to 
routers to manage.    An endnode look up should be to find the address 
vector to the remote.   A look up may return multiple vectors.   The SLID 
would correspond to each local subnet router port that acts as a first-hop 
destination to the remote subnet.    I don't see why the router protocol 
would not simply enable all paths on the local subnet to a given remote 
subnet to be acquired.  All of the work is kept local to the SA / SM in the 
source subnet when determining a remote path to take.   Why is there any 
need to define more than just this?  Define a router protocol to 
communicate each subnet's prefix, TClass, etc. and apply KISS.   A 
management entity that wanted to manage how each subnet provides router 
management in terms of route selection, etc. can be constructed by using 
the existing protocols / tools combined with a new router protocol which 
only does DGID to next hop SLID mapping.
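
As a rough sketch of how small that mapping can be: a table keyed by remote subnet prefix that yields every local router port LID acting as a first hop. The names (`router_route_sketch`, `lookup_next_hops()`) are invented for illustration and do not come from any specification or existing code.

```c
#include <stddef.h>
#include <stdint.h>

/* One learned route: a remote subnet reachable via a local router port. */
struct router_route_sketch {
	uint64_t subnet_prefix;	/* upper 64 bits of the DGID */
	uint16_t router_lid;	/* LID of the first-hop router port */
};

/*
 * Collect the LIDs of all local router ports that lead to the subnet
 * identified by dgid_prefix.  Writes at most max_out entries to out and
 * returns the number written, so a look up may yield several vectors.
 */
static size_t lookup_next_hops(const struct router_route_sketch *table,
			       size_t n, uint64_t dgid_prefix,
			       uint16_t *out, size_t max_out)
{
	size_t i, found = 0;

	for (i = 0; i < n && found < max_out; i++) {
		if (table[i].subnet_prefix == dgid_prefix)
			out[found++] = table[i].router_lid;
	}
	return found;
}
```

Route selection among the returned LIDs stays with the SA / SM in the source subnet.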

Mike


>So, here is how I see this working..
>
>- There is a single well known 'reversible' flowlabel. When a router
>   processes a GRH with that flowlabel it produces a packet that
>   has a SLID that is always the same, no matter what router port is
>   used (A' or B' in my example). The LRH is also reversible according
>   to the rules in IBA.
>
>   A well known value side-steps the global information problem and
>   allows the GRH to be reversible.
>- Whenever a PR has reversible=1 the result returns the well known flowlabel.
>   The router LID is always the single shared SLID.
>- To get a more optimal path the following sequence of queries are used:
>   to SA: SGID=Node1 DGID=Node2
>    [In the background SA asks SA' what flow label to use]
>   to SA': SGID=Node1 DGID=Node2 FlowLabel=(from above)
>   to SA': SGID=Node2 DGID=Node1 SLID=(dlid from above)
>    [In the background SA' asks SA what flow label to use]
>   to SA: SGID=Node2 DGID=Node1 FlowLabel=(from above)
>
>   It is almost guaranteed that the FlowLabel will be asymmetric. This
>   is to keep the flowlabel space local to each subnet.
>
>   In the background queries SA and SA' also examine the global route
>   topology to select an optimal no-spoof needed router LID. The
>   background exchange is how the disambiguation problem with
>   multiple-router path is solved.
>
>Implicit in this are five IBA affecting things:
>  - that PRs with SGID=non-local mean something specific
>  - PRs with DGID=non-local cause the SA to communicate with the remote
>    SA to learn the GRH's FlowLabel
>    (except in the case where reversible=1)
>  - clients can communicate with remote SA's
>  - Routers do the SLID spoofing you outlined.
>  - SA's and routers collaborate quite closely on how the
>    router produces a LRH. In particular the SA controls the SLID
>    spoofing
>
>A new query type or maybe some kind of modified multi-path-record
>query could be defined by IBA to reduce the 6 exchanges required to
>something more efficient.
>
>Does this match what you are thinking?
>
> > >   SA                                                      SA'
> > >Node1 --> (LID 1) Router A -------  Router A' (LID A) ---> Node2
> > >      |-> (LID 2) Router A                              |
> > >      |-> (LID 3) Router B -------  Router B' (LID B) --|
> > >
> > >Router A and Router B are independent redundant devices, not a route
> > >cloud of some sort. B -> A' is not a possible path.
> >
> > Since A' and B' connect to the same subnet, B -> A' should be a valid path.
>
>Please don't dismiss this case as it is a simple case of a more
>generalized problem. People will want to deploy primary and secondary
>routers (like dual star switching) that don't intercommunicate for
>reliability. The B -> A' path does not exist because the A and B
>routers are separate non-linked devices and not just 4 ports on one
>large router. [A more general view would be a router ring architecture
>where the clockwise and counterclockwise paths use different
>hardware/cables]
>
>There is a lot of complex work in the router and SA side to make this
>kind of topology work, but it is critical that the clients use path
>queries that can provide enough data to the SA and return enough data
>to the client to support this.
>
> > >I can think of the following downsides:
> > > 1) Re-reading Michael Krause's email makes me think that defeating
> > >    the QP SLID check is contrary to the spirit of IBA
> >
> > I don't think we need to defeat the QP SLID check if we want extra routing,
> > but having redundant routers use the same link layer address isn't
> > necessarily a bad thing.
>
>Well, it is one and the same, the SLID is really only used in the QP
>SLID check so changing it around only serves to defeat that check.
>
>Jason
>
>_______________________________________________
>openib-general mailing list
>openib-general at openib.org
>http://openib.org/mailman/listinfo/openib-general
>
>To unsubscribe, please visit 
>http://openib.org/mailman/listinfo/openib-general


From swise at opengridcomputing.com  Mon Feb 12 15:35:54 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Mon, 12 Feb 2007 17:35:54 -0600
Subject: [openib-general] issues with compilation of ofed 1.2
In-Reply-To: <1171322647.28500.41.camel@stevo-desktop>
References: <45C9EE31.2040602@voltaire.com>
	<6a122cc00702072302s18c1c4b7i3f1e4a1b3f3d0381@mail.gmail.com>
	<1170973693.19297.2.camel@firewall.xsintricity.com>
	<1171322647.28500.41.camel@stevo-desktop>
Message-ID: <1171323354.28500.48.camel@stevo-desktop>

Ok, I hacked around this by changing the build_env.sh.  

But I think build_env.sh will have to distinguish between rhel5 and all
other redhat releases so it can correctly set the prerequisite rpms...
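
A minimal sketch of the distinction build_env.sh would need, written in
Python for illustration only (the real script is shell). The distro tag
"rhel5" and all function names are hypothetical; the package names come
from Doug's list and from the error message quoted below.

```python
def sysfs_devel_package(distro):
    """Name of the sysfs development RPM to require on a given distro tag."""
    # Assumption: "rhel5" is a hypothetical tag for RHEL5; every other Red Hat
    # release keeps the package name that the OFED error message asks for.
    return "libsysfs-devel" if distro == "rhel5" else "sysfsutils-devel"


def rpm_name(nvr):
    """Strip version and release from a 'name-version-release' string."""
    return nvr.rsplit("-", 2)[0]


def missing_prereqs(distro, installed):
    """Return the required sysfs packages that are absent from `installed`."""
    names = {rpm_name(p) for p in installed}
    required = [sysfs_devel_package(distro)]
    return [p for p in required if p not in names]
```

Fed the `rpm -qa` output below, this reports nothing missing for the
"rhel5" tag, while any other tag still asks for sysfsutils-devel.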

Steve.

On Mon, 2007-02-12 at 17:24 -0600, Steve Wise wrote:
> I still get this error building  on rhel5b2 with the latest from the ofa
> git trees:
> 
> ERROR: The sysfsutils-devel package is required to build libibverbs_devel RPM
> [root at vic12 OFED-1.2-stevo]# rpm -qa|grep sysfs
> libsysfs-2.0.0-6
> libsysfs-devel-2.0.0-6
> libsysfs-2.0.0-6
> sysfsutils-2.0.0-6
> libsysfs-devel-2.0.0-6
> 
> 
> I installed all the sysfs rpms I could find.  So is there some
> dependency problem here in the OFED build script that is looking for the
> wrong rpm in rhel5?
> 
> Is there a bug to track this issue?
> 
> Steve.
> 
> 
> On Thu, 2007-02-08 at 17:28 -0500, Doug Ledford wrote:
> > On Thu, 2007-02-08 at 09:02 +0200, Moni Levy wrote:
> > > Doug,
> > > On 2/7/07, Yosef Etigin <yosefe at voltaire.com> wrote:
> > > > 7. On RHAS5 beta 2, the setup requires sysfsutils-devel RPM which is not included in this distro.
> > > 
> > > Can you please help us with that ?
> > 
> > The value of the sysfsutils is far overshadowed by the value of libsysfs
> > (and libsysfs is far more commonly used).  So, in RHEL5, the rpm package
> > names reflect this:
> > 
> > libsysfs
> > sysfsutils (I think, might be libsysfs-utils)
> > libsysfs-devel
> > 
> > It's all still there, just a different name.
> > 
> > > -- Moni
> > > 
> > > >
> > > > --
> > > > Yosef Etigin
> > > > Alex Tabachnik
> > > >


From mshefty at ichips.intel.com  Mon Feb 12 15:48:24 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 12 Feb 2007 15:48:24 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <6.2.0.14.2.20070212152343.08e8ca48@esmail.cup.hp.com>
References: <20070210004820.GS11411@obsidianresearch.com>
	<000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com>
	<20070211230935.GT11411@obsidianresearch.com>
	<45D0A27A.2010302@ichips.intel.com>
	<20070212205634.GW11411@obsidianresearch.com>
	<6.2.0.14.2.20070212152343.08e8ca48@esmail.cup.hp.com>
Message-ID: <45D0FCC8.4090304@ichips.intel.com>

> An endnode look up should be to find the address 
> vector to the remote.   A look up may return multiple vectors.   The 
> SLID would correspond to each local subnet router port that acts as a 
> first-hop destination to the remote subnet.    I don't see why the 
> router protocol would not simply enable all paths on the local subnet to 
> a given remote subnet be acquired.  All of the work is kept local to the 
> SA / SM in the source subnet when determining a remote path to take.   
> Why is there any need to define more than just this?

For an RC QP, we need at least two sets of LIDs.  In the simplest case, we need 
the SLID/router DLID for the local subnet, and the router SLID/DLID for the 
remote subnet.  The problem is in obtaining the SLID/DLID for the remote subnet.
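
To make the two sets concrete, here is a rough sketch in Python. The
class and field names are made up for illustration and are not taken
from any existing API; the LID values in any use of it would be
placeholders as well.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LidPair:
    slid: int
    dlid: int


@dataclass(frozen=True)
class RoutedRcPath:
    # Local subnet: local node LID -> local router port LID.
    local: LidPair
    # Remote subnet: remote router port LID -> remote node LID.
    # Left as None until it has been obtained somehow.
    remote: Optional[LidPair] = None

    def is_complete(self) -> bool:
        """True once the LID pairs for both subnets are known."""
        return self.remote is not None
```

The local pair can come from the local SA; the open question is how to
fill in `remote`.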

- Sean


From krause at cup.hp.com  Mon Feb 12 15:35:45 2007
From: krause at cup.hp.com (Michael Krause)
Date: Mon, 12 Feb 2007 15:35:45 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <45D0EE8E.4030906@ichips.intel.com>
References: <20070210004820.GS11411@obsidianresearch.com>
	<000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com>
	<20070211230935.GT11411@obsidianresearch.com>
	<45D0A27A.2010302@ichips.intel.com>
	<20070212205634.GW11411@obsidianresearch.com>
	<45D0EE8E.4030906@ichips.intel.com>
Message-ID: <6.2.0.14.2.20070212153253.08e8c7b8@esmail.cup.hp.com>

At 02:47 PM 2/12/2007, Sean Hefty wrote:
> > 1) What does the TClass and FlowLabel returned from SGID=local
> >    DGID=remote mean?
> >    Do you use it in the Node1 -> Node2 direction or the Node2 -> Node1
> >    direction or both?
>
>Maybe it would help if we can agree on a set of expectations.  These are
>what I am thinking:
>
>1. An SA should be able to respond to a valid PR query if at least one of the
>GIDs in the path record is local.
>
>2. The LIDs in a PR are relative to the SA's subnet that returned the record.
>
>3. An IB router should not failover transparently to QPs sending traffic
>through that router.

There is no reason for such a restriction.  APM can work with routers and 
the IB protocol will recover from any out of order packet processing just fine.


>4. A PR from the local SA with reversible=1 indicates that data sent from the
>remote GID to the local GID using the PR TC and FL will route locally using
>the specified LID pair.  This holds whether the PR SGID is local or remote.
>
>5. A PR from a remote SA with reversible=1 indicates that data sent from the
>local GID to the remote GID using the PR TC and FL will route remotely using
>the specified LID pair.  This holds whether the PR SGID is local or remote.
>
>6. A PR with reversible=0 is relative to SA's subnet.  The SGID->DGID data
>flow over the PR TC and FL indicates the SLID->DLID mapping for that subnet.
>
>Do your expectations differ from these?
>
>The use of reversible between subnets is what's concerning me.  It may be
>that an SA could not return any paths as reversible between two subnets
>without using some trick like what you mentioned.
>
>These add a requirement on the SA that they must be aware of the routes
>packets take between two GIDs using a given TC and FL, but I don't believe
>that this necessarily forces SA to SA communication.  The SA may only need to
>exchange information with a router...?

It should not force SA to SA communication.   Such communication is overly 
complex and will be a major issue to control and manage in the end. 
Further, security concerns, partition management, etc. are complex 
enough as they are without adding more fuel to the fire.

> > Implicit in this are five IBA affecting things:
> >  - that PRs with SGID=non-local mean something specific
>
>I don't think that we're changing any of the meanings of the fields though.
>
> >  - Routers do the SLID spoofing you outlined.
>
>I'm not sure this is something that we do want now.  APM should really handle
>path failover.
>
> > There is a lot of complex work in the router and SA side to make this
> > kind of topology work, but it is critical that the clients use path
> > queries that can provide enough data to the SA and return enough data
> > to the client to support this.
>
>I'm still deciding if the existing path record attribute is sufficient.

Our original IB router work I believe drove some of what is in the current 
records so I suspect they are fine as is.

Mike 


From jgunthorpe at obsidianresearch.com  Mon Feb 12 15:54:04 2007
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Mon, 12 Feb 2007 16:54:04 -0700
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <45D0EE8E.4030906@ichips.intel.com>
References: <20070210004820.GS11411@obsidianresearch.com>
	<000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com>
	<20070211230935.GT11411@obsidianresearch.com>
	<45D0A27A.2010302@ichips.intel.com>
	<20070212205634.GW11411@obsidianresearch.com>
	<45D0EE8E.4030906@ichips.intel.com>
Message-ID: <20070212235404.GX11411@obsidianresearch.com>

On Mon, Feb 12, 2007 at 02:47:42PM -0800, Sean Hefty wrote:

> Maybe it would help if we can agree on a set of expectations.  These are 
> what I am thinking:
> 
> 1. An SA should be able to respond to a valid PR query if at least one of 
> the GIDs in the path record is local.
> 
> 2. The LIDs in a PR are relative to the SA's subnet that returned the 
> record.
> 
> 3. An IB router should not failover transparently to QPs sending traffic 
> through that router.

OK to these

> 4. A PR from the local SA with reversible=1 indicates that data sent from 
> the remote GID to the local GID using the PR TC and FL will route locally 
> using the specified LID pair.  This holds whether the PR SGID is local or 
> remote.
 
> 5. A PR from a remote SA with reversible=1 indicates that data sent from 
> the local GID to the remote GID using the PR TC and FL will route remotely 
> using the specified LID pair.  This holds whether the PR SGID is local or 
> remote.

I can't think how to actually implement these restrictions in the
general case without SLID spoofing and the general method I outlined
in my prior email.

*Especially* reversible - which by definition requires the FL and TC
to be the same on both directions of the path!

> 6. A PR with reversible=0 is relative to SA's subnet.  The SGID->DGID data 
> flow over the PR TC and FL indicates the SLID->DLID mapping for that subnet.

Think about this - it is backwards for the UD case. You have specified
that the SGID->DGID direction uses the returned SLID/DLID which are
ensured by the flowlabel in the GRH. But the local side only controls
what it sends. How does this GRH get to the remote side? In UD the
returned GRH from the PR controls the selection of LID on the DGID's
subnet. That is how it must be.

QPs have a specific definition of where the GRH comes from, and for a
local PR query with SGID=myself the GRH programmed into the QP must come
from that query. This is necessary for UD and I don't think it can be
changed around.

Plus, in the multi-router path, the GRH alone does not contain the
information to know which physical router port the flow exits
from. (See prior diagram) - so the SLID spoofing is the only way to
fix things up if the PR queries are left unchanged.

> The use of reversible between subnets is what's concerning me.  It may be 
> that an SA could not return any paths as reversible between two subnets 
> without using some trick like what you mentioned.

I really don't see how it can work any other way right now..
 
> These add a requirement on the SA that they must be aware of the routes 
> packets take between two GIDs using a given TC and FL, but I don't believe 
> that this necessarily forces SA to SA communication.  The SA may only need 
> to exchange information with a router...?

The major problem is that there are multiple router paths that a given
GRH can take that are only fully disambiguated by the router lid at
the sender.
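
As a toy illustration in Python, using the LIDs from the prior diagram
(LIDs 1 and 2 reach Router A and exit at A', LID 3 reaches Router B and
exits at B'). The table and function name are invented for this sketch
and say nothing about how a real router or SA would store this.

```python
# Local router LID -> SLID seen on the remote subnet, per the diagram:
# LIDs 1 and 2 go through Router A (exit LID A), LID 3 through Router B
# (exit LID B).
EXIT_SLID = {1: "A", 2: "A", 3: "B"}


def possible_remote_slids(router_lid=None):
    """Remote-subnet SLIDs a flow may arrive with, given what is known."""
    if router_lid is None:
        # Only the GRH is known: either router could carry the flow.
        return set(EXIT_SLID.values())
    # The sender's choice of router LID pins down the exit port.
    return {EXIT_SLID[router_lid]}
```

With the GRH alone both A and B remain possible; only once the sender's
router LID is known does the answer collapse to a single SLID.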

> > - Routers do the SLID spoofing you outlined.
> 
> I'm not sure this is something that we do want now.  APM should really 
> handle path failover.

It has absolutely nothing to do with failover. This is necessary to
make multiple router paths work at all. It is necessary for reversible
to work with multiple routers at all. 

> >There is a lot of complex work in the router and SA side to make this
> >kind of topology work, but it is critical that the clients use path
> >queries that can provide enough data to the SA and return enough data
> >to the client to support this.
> 
> I'm still deciding if the existing path record attribute is sufficient.

I'm of the opinion that it isn't a good fit. Look at how tortured
things are just because the PR record does not have enough information
to let the SA answer in the best way.

Jason


From jgunthorpe at obsidianresearch.com  Mon Feb 12 16:10:45 2007
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Mon, 12 Feb 2007 17:10:45 -0700
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <6.2.0.14.2.20070212152343.08e8ca48@esmail.cup.hp.com>
References: <20070210004820.GS11411@obsidianresearch.com>
	<000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com>
	<20070211230935.GT11411@obsidianresearch.com>
	<45D0A27A.2010302@ichips.intel.com>
	<20070212205634.GW11411@obsidianresearch.com>
	<6.2.0.14.2.20070212152343.08e8ca48@esmail.cup.hp.com>
Message-ID: <20070213001045.GY11411@obsidianresearch.com>

On Mon, Feb 12, 2007 at 03:31:15PM -0800, Michael Krause wrote:

> TClass is intended to communicate the end-to-end QoS desired.   TClass is 
> then mapped to a SL that is local to each subnet.   A flow label is 
> intended to be much the same as in the IP world and is left, in essence, to 
> routers to manage.    An endnode look up should be to find the address 
> vector to the remote.   A look up may return multiple vectors.   The SLID 
> would correspond to each local subnet router port that acts as a first-hop 
> destination to the remote subnet.    I don't see why the router protocol 
> would not simply enable all paths on the local subnet to a given remote 
> subnet be acquired.  All of the work is kept local to the SA / SM in the 
> source subnet when determining a remote path to take.   Why is there any 
> need to define more than just this?  Define a router protocol to 
> communicate each subnet's prefix, TClass, etc. and apply KISS.   A 
> management entity that wants to manage how each subnet provides router 
> management in terms of route selection, etc. can be constructed by using 
> the existing protocols / tools combined with a new router protocol which 
> only does DGID to next hop SLID mapping.

All of this complexity is due to the RC QP requirement that the SLID
of an incoming LRH match the DLID programmed into the QP.

Translated into a network with routers this means that for a RC flow
to successfully work both the *forward* and *reverse* direction must
traverse the same router *LID* not just *port* on both subnets.

Please see the little ascii diagram I drew in a prior email to
understand my concern.

There is no such restriction in a real IP network. It would be akin to
having a host match the source MAC address in the ethernet frame to
double check that it came from the router port it is sending outgoing
packets to. Which means simple one-sided solutions from IP land don't
work here.

Things work exactly the way you outline today for UD. They don't work
at all for the general case of RC. Get rid of the QP requirement and
things work the way you outline for RC too. Keep it in and you must
use the FlowLabel to force the flows onto the right router LID.

That is why I said previously that the QP matching rules are a
mistake. The best way to solve this is to change C9-54 to only be in
effect if the GRH is not present.
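
A small Python model of the check and of the change I am suggesting. It
covers only the SLID comparison (every other receive check is left out),
and the function name and the `relaxed` flag are made up for this sketch.

```python
def rc_qp_accepts(qp_dlid, lrh_slid, has_grh, relaxed=False):
    """Model only the SLID check applied to an incoming RC packet."""
    if relaxed and has_grh:
        # Suggested change: skip the SLID comparison when a GRH is present.
        return True
    # Current rule: the incoming LRH SLID must equal the DLID in the QP.
    return lrh_slid == qp_dlid
```

With the QP programmed for router LID 1, a reply arriving with SLID 3 is
rejected under the current rule and accepted under the relaxed one, but
only when it carries a GRH.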

CM also introduces the much smaller problem of getting the LIDs to the
passive side - but that cannot be solved without a broad solution to
the RC QP SLID matching problem.

Jason


From rdreier at cisco.com  Mon Feb 12 16:18:23 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 12 Feb 2007 16:18:23 -0800
Subject: [openib-general] [GIT PULL] please pull infiniband.git
Message-ID: <adazm7ioqpc.fsf@cisco.com>

Linus, please pull from

    master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This will add the new cxgb3 RDMA driver for Chelsio T3 NICs, as well
as IPoIB connected mode and various other smaller changes:

Ahmed S. Darwish (1):
      IB/core: Use ARRAY_SIZE macro for mandatory_table

Akinobu Mita (1):
      IB/ehca: Fix memleak on module unloading

David Howells (1):
      IB/mthca: Work around gcc bug on sparc64

Michael S. Tsirkin (6):
      IPoIB: Connected mode experimental support
      IB/mthca: Fix reserved MTTs calculation on mem-free HCAs
      IB/mthca: Give reserved MTTs a separate cache line
      IB/mthca: Fix access to MTT and MPT tables on non-cache-coherent CPUs
      IB/mthca: Merge MR and FMR space on 64-bit systems
      IB/mthca: Always fill MTTs from CPU

Roland Dreier (1):
      IB/mthca: Use correct structure size in call to memset()

Sean Hefty (2):
      RDMA/cma: Increment port number after close to avoid re-use
      IB: Remove redundant "_wq" from workqueue names

Steve Wise (1):
      RDMA/cxgb3: Add driver for Chelsio T3 RNIC

 drivers/infiniband/Kconfig                     |    1 +
 drivers/infiniband/Makefile                    |    1 +
 drivers/infiniband/core/addr.c                 |    2 +-
 drivers/infiniband/core/cma.c                  |   68 +-
 drivers/infiniband/core/device.c               |    3 +-
 drivers/infiniband/hw/cxgb3/Kconfig            |   27 +
 drivers/infiniband/hw/cxgb3/Makefile           |   12 +
 drivers/infiniband/hw/cxgb3/cxio_dbg.c         |  207 +++
 drivers/infiniband/hw/cxgb3/cxio_hal.c         | 1280 +++++++++++++++
 drivers/infiniband/hw/cxgb3/cxio_hal.h         |  201 +++
 drivers/infiniband/hw/cxgb3/cxio_resource.c    |  331 ++++
 drivers/infiniband/hw/cxgb3/cxio_resource.h    |   70 +
 drivers/infiniband/hw/cxgb3/cxio_wr.h          |  685 ++++++++
 drivers/infiniband/hw/cxgb3/iwch.c             |  189 +++
 drivers/infiniband/hw/cxgb3/iwch.h             |  177 ++
 drivers/infiniband/hw/cxgb3/iwch_cm.c          | 2081 ++++++++++++++++++++++++
 drivers/infiniband/hw/cxgb3/iwch_cm.h          |  223 +++
 drivers/infiniband/hw/cxgb3/iwch_cq.c          |  225 +++
 drivers/infiniband/hw/cxgb3/iwch_ev.c          |  231 +++
 drivers/infiniband/hw/cxgb3/iwch_mem.c         |  172 ++
 drivers/infiniband/hw/cxgb3/iwch_provider.c    | 1203 ++++++++++++++
 drivers/infiniband/hw/cxgb3/iwch_provider.h    |  367 +++++
 drivers/infiniband/hw/cxgb3/iwch_qp.c          | 1007 ++++++++++++
 drivers/infiniband/hw/cxgb3/iwch_user.h        |   67 +
 drivers/infiniband/hw/cxgb3/tcb.h              |  632 +++++++
 drivers/infiniband/hw/ehca/ehca_irq.c          |    2 +
 drivers/infiniband/hw/mthca/mthca_cmd.c        |    6 +-
 drivers/infiniband/hw/mthca/mthca_dev.h        |    2 +
 drivers/infiniband/hw/mthca/mthca_main.c       |   40 +-
 drivers/infiniband/hw/mthca/mthca_memfree.c    |  127 ++-
 drivers/infiniband/hw/mthca/mthca_memfree.h    |    9 +-
 drivers/infiniband/hw/mthca/mthca_mr.c         |  110 ++-
 drivers/infiniband/hw/mthca/mthca_profile.c    |    2 +-
 drivers/infiniband/hw/mthca/mthca_provider.c   |   14 +-
 drivers/infiniband/hw/mthca/mthca_provider.h   |    1 +
 drivers/infiniband/hw/mthca/mthca_qp.c         |    2 +-
 drivers/infiniband/hw/mthca/mthca_srq.c        |    9 +-
 drivers/infiniband/ulp/ipoib/Kconfig           |   16 +-
 drivers/infiniband/ulp/ipoib/Makefile          |    1 +
 drivers/infiniband/ulp/ipoib/ipoib.h           |  215 +++
 drivers/infiniband/ulp/ipoib/ipoib_cm.c        | 1237 ++++++++++++++
 drivers/infiniband/ulp/ipoib/ipoib_ib.c        |   29 +-
 drivers/infiniband/ulp/ipoib/ipoib_main.c      |   63 +-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |    4 +-
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c     |   40 +-
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c      |    2 +
 46 files changed, 11279 insertions(+), 114 deletions(-)
 create mode 100644 drivers/infiniband/hw/cxgb3/Kconfig
 create mode 100644 drivers/infiniband/hw/cxgb3/Makefile
 create mode 100644 drivers/infiniband/hw/cxgb3/cxio_dbg.c
 create mode 100644 drivers/infiniband/hw/cxgb3/cxio_hal.c
 create mode 100644 drivers/infiniband/hw/cxgb3/cxio_hal.h
 create mode 100644 drivers/infiniband/hw/cxgb3/cxio_resource.c
 create mode 100644 drivers/infiniband/hw/cxgb3/cxio_resource.h
 create mode 100644 drivers/infiniband/hw/cxgb3/cxio_wr.h
 create mode 100644 drivers/infiniband/hw/cxgb3/iwch.c
 create mode 100644 drivers/infiniband/hw/cxgb3/iwch.h
 create mode 100644 drivers/infiniband/hw/cxgb3/iwch_cm.c
 create mode 100644 drivers/infiniband/hw/cxgb3/iwch_cm.h
 create mode 100644 drivers/infiniband/hw/cxgb3/iwch_cq.c
 create mode 100644 drivers/infiniband/hw/cxgb3/iwch_ev.c
 create mode 100644 drivers/infiniband/hw/cxgb3/iwch_mem.c
 create mode 100644 drivers/infiniband/hw/cxgb3/iwch_provider.c
 create mode 100644 drivers/infiniband/hw/cxgb3/iwch_provider.h
 create mode 100644 drivers/infiniband/hw/cxgb3/iwch_qp.c
 create mode 100644 drivers/infiniband/hw/cxgb3/iwch_user.h
 create mode 100644 drivers/infiniband/hw/cxgb3/tcb.h
 create mode 100644 drivers/infiniband/ulp/ipoib/ipoib_cm.c


From mshefty at ichips.intel.com  Mon Feb 12 16:45:33 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 12 Feb 2007 16:45:33 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <20070212235404.GX11411@obsidianresearch.com>
References: <20070210004820.GS11411@obsidianresearch.com>
	<000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com>
	<20070211230935.GT11411@obsidianresearch.com>
	<45D0A27A.2010302@ichips.intel.com>
	<20070212205634.GW11411@obsidianresearch.com>
	<45D0EE8E.4030906@ichips.intel.com>
	<20070212235404.GX11411@obsidianresearch.com>
Message-ID: <45D10A2D.10104@ichips.intel.com>

>>4. A PR from the local SA with reversible=1 indicates that data sent from 
>>the remote GID to the local GID using the PR TC and FL will route locally 
>>using the specified LID pair.  This holds whether the PR SGID is local or 
>>remote.
> 
>>5. A PR from a remote SA with reversible=1 indicates that data sent from 
>>the local GID to the remote GID using the PR TC and FL will route remotely 
>>using the specified LID pair.  This holds whether the PR SGID is local or 
>>remote.
> 
> I can't think how to actually implement these restrictions in the
> general case without SLID spoofing and the general method I outlined
> in my prior email.

But you agree with the expectations, and what reversible indicates?  Or are you 
claiming that reversible paths between different subnets are undefined, or mean 
something different than specified in 13.5.4?  (E.g. reversible applies only at 
the network level if global routing is used.)

> Think about this - it is backwards for the UD case. You have specified
> that the SGID->DGID direction uses the returned SLID/DLID which are
> ensured by the flowlabel in the GRH. But the local side only controls
> what it sends. How does this GRH get to the remote side? In UD the
> returned GRH from the PR controls the selection of LID on the DGID's
> subnet. That is how it must be.

I'm not following you here.  For UD, query the local SA, then direct the send to 
the router LID.  I would only query the remote SA for RC, in order to get the 
remote LID information to put into the CM REQ.

> The major problem is that there are multiple router paths that a given
> GRH can take that are only fully disambiguated by the router lid at
> the sender.

But doesn't 19.2.4.1 imply that once a router selects a path, it will continue 
to use that same path for similar packets?  So, if we inject a GRH into the 
internetwork from the source router, then isn't a single path followed to the 
remote endpoint?

Relaxing 9.6.1.5 seems like a nice solution to most of the problems, but it also 
seems like one that would fail to work with any existing HCAs.

- Sean


From dledford at redhat.com  Mon Feb 12 17:20:26 2007
From: dledford at redhat.com (Doug Ledford)
Date: Mon, 12 Feb 2007 20:20:26 -0500
Subject: [openib-general] issues with compilation of ofed 1.2
In-Reply-To: <1171323354.28500.48.camel@stevo-desktop>
References: <45C9EE31.2040602@voltaire.com>
	<6a122cc00702072302s18c1c4b7i3f1e4a1b3f3d0381@mail.gmail.com>
	<1170973693.19297.2.camel@firewall.xsintricity.com>
	<1171322647.28500.41.camel@stevo-desktop>
	<1171323354.28500.48.camel@stevo-desktop>
Message-ID: <1171329626.3161.36.camel@fc6.xsintricity.com>

On Mon, 2007-02-12 at 17:35 -0600, Steve Wise wrote:
> Ok, I hacked around this by changing the build_env.sh.  
> 
> But I think build_env.sh will have to distinguish between rhel5 and all
> other redhat releases so it can correctly set the prerequisite rpms...

Yes, it will.

> Steve.
> 
> On Mon, 2007-02-12 at 17:24 -0600, Steve Wise wrote:
> > I still get this error building  on rhel5b2 with the latest from the ofa
> > git trees:
> > 
> > ERROR: The sysfsutils-devel package is required to build libibverbs_devel RPM
> > [root at vic12 OFED-1.2-stevo]# rpm -qa|grep sysfs
> > libsysfs-2.0.0-6
> > libsysfs-devel-2.0.0-6
> > libsysfs-2.0.0-6
> > sysfsutils-2.0.0-6
> > libsysfs-devel-2.0.0-6
> > 
> > 
> > I installed all the sysfs rpms I could find.  So is there some
> > dependency problem here in the OFED build script that is looking for the
> > wrong rpm in rhel5?
> > 
> > Is there a bug to track this issue?
> > 
> > Steve.
> > 
> > 
> > On Thu, 2007-02-08 at 17:28 -0500, Doug Ledford wrote:
> > > On Thu, 2007-02-08 at 09:02 +0200, Moni Levy wrote:
> > > > Doug,
> > > > On 2/7/07, Yosef Etigin <yosefe at voltaire.com> wrote:
> > > > > 7. On RHAS5 beta 2, the setup requires sysfsutils-devel RPM which is not included in this distro.
> > > > 
> > > > Can you please help us with that ?
> > > 
> > > The value of the sysfsutils is far overshadowed by the value of libsysfs
> > > (and libsysfs is far more commonly used).  So, in RHEL5, the rpm package
> > > names reflect this:
> > > 
> > > libsysfs
> > > sysfsutils (I think, might be libsysfs-utils)
> > > libsysfs-devel
> > > 
> > > It's all still there, just a different name.
> > > 
> > > > -- Moni
> > > > 
> > > > >
> > > > > --
> > > > > Yosef Etigin
> > > > > Alex Tabachnik
> > > > >
-- 
Doug Ledford <dledford at redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband

From jgunthorpe at obsidianresearch.com  Mon Feb 12 18:03:30 2007
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Mon, 12 Feb 2007 19:03:30 -0700
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <45D10A2D.10104@ichips.intel.com>
References: <20070210004820.GS11411@obsidianresearch.com>
	<000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com>
	<20070211230935.GT11411@obsidianresearch.com>
	<45D0A27A.2010302@ichips.intel.com>
	<20070212205634.GW11411@obsidianresearch.com>
	<45D0EE8E.4030906@ichips.intel.com>
	<20070212235404.GX11411@obsidianresearch.com>
	<45D10A2D.10104@ichips.intel.com>
Message-ID: <20070213020330.GZ11411@obsidianresearch.com>

On Mon, Feb 12, 2007 at 04:45:33PM -0800, Sean Hefty wrote:
> >>4. A PR from the local SA with reversible=1 indicates that data sent from 
> >>the remote GID to the local GID using the PR TC and FL will route locally 
> >>using the specified LID pair.  This holds whether the PR SGID is local or 
> >>remote.
> >
> >>5. A PR from a remote SA with reversible=1 indicates that data sent from 
> >>the local GID to the remote GID using the PR TC and FL will route 
> >>remotely using the specified LID pair.  This holds whether the PR SGID is 
> >>local or remote.
> >
> >I can't think how to actually implement these restrictions in the
> >general case without SLID spoofing and the general method I outlined
> >in my prior email.
> 
> But you agree with the expectations, and what reversible indicates?  Or are 
> you claiming that reversible paths between different subnets are undefined, 
> or mean something different than specified in 13.5.4?  (E.g. reversible 
> applies only at the network level if global routing is used.)

I think pure reversible paths are a good idea to support on routed
paths - meaning strictly the definition from 13.5.4. That is, a GMP
sender can request a PR with reversible=1 and know that if the
receiver applies 13.5.4 then the reply packet will get back to the
sender. Note: As per the QP LID matching rules the SLID is not
matched for UD - so a reversible PR would not have to guarantee the
return path router SLID on the local side.

What your #4 and #5 are talking about is not just that, but also PR
queries that can unambiguously identify the LID selections of the
router in advance. That is hugely different! IMHO, just because a
reversible path exists and will be used by the router shouldn't be
taken to mean that it is the only one or that the SA can tell you
which of many possible choices it will be.

> >Think about this - it is backwards for the UD case. You have specified
> >that the SGID->DGID direction uses the returned SLID/DLID which are
> >ensured by the flowlabel in the GRH. But the local side only controls
> >what it sends. How does this GRH get to the remote side? In UD the
> >returned GRH from the PR controls the selection of LID on the DGID's
> >subnet. That is how it must be.
> 
> I'm not following you here.  For UD, query the local SA, then direct the 
> send to the router LID.  I would only query the remote SA for RC, in order 
> to get the remote LID information to put into the CM REQ.

I'm talking about the locality of information in the PR.

Eg:
PR query to SA: SGID=Node1 DGID=Node2 ==> Flowlabel=XX SLID=Node1 DLID=1

What direction does FlowLabel=xx refer to? Do you put it in the local
side's QP or do you put it in the CM REQ?

The use model that UD defines says it is to go in the QP, not the CM
REQ. It also more or less requires that the remote SA have a hand in
selecting the FlowLabel since the router on the Node2 subnet is the
one that acts on it.

When I read your mails I get the impression you want to put the
FlowLabel from the local PR in the CM REQ - which makes huge amounts
of sense, but is not really what is set out in IBA I feel. :<

Staying aligned with the UD use model for PRs is why I outlined a
solution that required the local SA to consult the remote SA to get
the FlowLabel.

> >The major problem is that there are multiple router paths that a given
> >GRH can take that are only fully disambiguated by the router lid at
> >the sender.
> 
> But doesn't 19.2.4.1 imply that once a router selects a path, it will 
> continue to use that same path for similar packets?  So, if we inject a GRH 
> into the internetwork from the source router, then isn't a single path 
> followed to the remote endpoint?

Yes. Absolutely. 

I view this problem not as if there is an existing fixed path, but
trying to find a way to support unambiguous identification of that
path when the DGID alone is not enough information.
[Ingress port, DGID, Flowlabel and TClass are the minimum required set
AFAIK]
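
The tuple listed above can be pictured as a lookup key. A minimal sketch
in C, assuming a flat struct whose field names (ingress_port, flow_label
and so on) are illustrative and not taken from any IB header; the 20-bit
mask reflects the width of the FlowLabel field in the GRH:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key for one routed path; all names are illustrative. */
struct path_key {
	uint16_t ingress_port;   /* router port the packet arrives on */
	uint8_t  dgid[16];       /* destination GID from the GRH */
	uint32_t flow_label;     /* GRH FlowLabel, low 20 bits significant */
	uint8_t  tclass;         /* GRH traffic class */
};

/* Two packets follow the same path only if every field of the key agrees. */
static bool path_key_equal(const struct path_key *a, const struct path_key *b)
{
	assert(a != NULL && b != NULL);
	return a->ingress_port == b->ingress_port &&
	       memcmp(a->dgid, b->dgid, sizeof(a->dgid)) == 0 &&
	       (a->flow_label & 0xfffff) == (b->flow_label & 0xfffff) &&
	       a->tclass == b->tclass;
}
```

With a key like this, two packets that share a DGID but enter through
different router ports compare as different paths, which is the
ambiguity being discussed.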

BTW, 19.2.4.1 seems to imply that nothing in the spec is going to
cause a problem for the router's path selection since 'a session is
used in a deliberately vague way'. My reading of 9.6.1.5 makes me
pretty sure it causes a problem due to the LRH.SLID matching - you
agree too, right?

> Relaxing 9.6.1.5 seems like a nice solution to most of the problems, but it 
> also seems like one that would fail to work with any existing HCAs.

I agree. In fact until your mail last week I was operating under the
assumption (reinforced by text like 19.2.4.1) that nothing like
9.6.1.5 existed in the spec.

It wouldn't surprise me if the spec writers intended things to work as
though 9.6.1.5 didn't cause this problem and reworked it. If so then
cards that can't be fixed with a firmware upgrade wouldn't support
multiple routed paths, but would support the simple single router LID
case. That might be acceptable. 

If so then I'd expect also for a SGID=off-subnet query to return the
remote LIDs to make CM work properly with existing conforming
implementations (that use 3 PR queries to get non-reversible paths
;>).

Jason


From krkumar2 at in.ibm.com  Mon Feb 12 19:31:42 2007
From: krkumar2 at in.ibm.com (Krishna Kumar2)
Date: Tue, 13 Feb 2007 09:01:42 +0530
Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in
 cm_conn_req_handler()
In-Reply-To: <1171061217.4525.15.camel@stevo-desktop>
Message-ID: <OF41D19D7B.F1CF5A88-ON65257281.00134665-65257281.00136200@in.ibm.com>

Steve,

I was doing "random%5 == 0" or some such and failing in
iw_conn_req_handler(). There was no other explicit test case other than
running rdma_bw using this hack.
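
The hack described above amounts to a simple fault-injection check. A
minimal userspace sketch, assuming hypothetical helper names
(should_inject_error, fake_conn_req_handler) that stand in for the real
handler and are not part of the iwcm code:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper: report a failure for roughly one value in one_in. */
static int should_inject_error(unsigned int random_value, unsigned int one_in)
{
	assert(one_in > 0);
	return random_value % one_in == 0;
}

/* Stand-in for a handler under test; not the real iw_conn_req_handler(). */
static int fake_conn_req_handler(unsigned int random_value)
{
	if (should_inject_error(random_value, 5))
		return -ENOMEM;   /* simulated failure path */
	return 0;                 /* normal path */
}
```

Feeding the handler a fresh random number on every call (rand() in
userspace, for instance) makes about one connection request in five take
the error path while a tool such as rdma_bw runs.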

thanks,

- KK

Steve Wise <swise at opengridcomputing.com> wrote on 02/10/2007 04:16:57 AM:

>
> > All 4 above cases were tested by injecting random error in
> > iw_conn_req_handler() and running rdma_bw/krping, they were
> > confirmed. I added the BUG_ON() to confirm the earlier check
> > for id_priv->refcount==0 should always be true (and could be
> > removed).
>
> Can you post the test case you're using for this?
>
>
> Steve.
>
>


From krkumar2 at in.ibm.com  Mon Feb 12 21:06:56 2007
From: krkumar2 at in.ibm.com (Krishna Kumar2)
Date: Tue, 13 Feb 2007 10:36:56 +0530
Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in
 cm_conn_req_handler()
In-Reply-To: <1171142795.11017.71.camel@stevo-desktop>
Message-ID: <OF9B1C2B03.E1D678D5-ON65257281.00184650-65257281.001C1A1E@in.ibm.com>

Hi Steve,

Thanks for the explanation. I reviewed the patch and had two
comments:

1. When the set_bit(CALLBACK_DESTROY) was done, the
    refcount could be such that free_cm_id is not called,
    leaving the cm_id with work entries. But there are two
    places where a BUG_ON(!list_empty(work_list)) was added
    (before doing a free_cm_id()), both under the check for
    CALLBACK_DESTROY. Isn't it possible for these BUG_ONs
    to get hit? This is an error case and may not be hit in normal
    testing.

2.
> @@ -647,10 +650,11 @@ static void cm_conn_req_handler(struct i
>     /* Call the client CM handler */
>     ret = cm_id->cm_handler(cm_id, iw_event);
>     if (ret) {
> +      iw_cm_reject(cm_id, NULL, 0);
>        set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags);
>        destroy_cm_id(cm_id);
>        if (atomic_read(&cm_id_priv->refcount)==0)
> -         kfree(cm_id);
> +         free_cm_id(cm_id_priv);
>     }

Though this is not a bug, the code just above this calls
iw_destroy_cm_id() if alloc_work_entries() failed. Is it
possible for the provider to get a reference to this cm_id
during the failed call to the client CM handler ? I didn't
think so, which is why in my original patch I had simply
called iw_destroy_cm_id here.

I had read your explanation about the provider possibly
acquiring a reference, but in this place aren't we calling
iw_conn_req_handler() which in turn cannot go to the
device and cache a reference count ?
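
The ownership rule that point 1 asks about can be modelled in userspace.
A minimal sketch, assuming C11 atomics in place of the kernel's atomic_t
and a struct (cm_id_model) invented for illustration; it mirrors the
shape of rem_ref() in the quoted patch but is not the kernel code:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model only; names echo the patch but this is not kernel code. */
struct cm_id_model {
	atomic_int refcount;
	bool callback_destroy;   /* models IWCM_F_CALLBACK_DESTROY */
	int  pending_work;       /* models entries left on work_list */
	bool freed;              /* set where free_cm_id() would run */
};

/* Returns 1 if this call dropped the last reference, else 0. */
static int model_deref_id(struct cm_id_model *id)
{
	assert(atomic_load(&id->refcount) > 0);
	return atomic_fetch_sub(&id->refcount, 1) == 1;
}

/* Free only when the last reference goes away and the IWCM owns the free. */
static void model_rem_ref(struct cm_id_model *id)
{
	if (model_deref_id(id) && id->callback_destroy) {
		/* the patch places BUG_ON(!list_empty(&work_list)) here */
		assert(id->pending_work == 0);
		id->freed = true;
	}
}
```

In this model the assertion fires exactly when the last reference is
dropped while work entries remain, which is the case the question above
is probing.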

The rest of the patch looks great. I am going to test this
today and will post the results.

Thanks,

- KK

> On Sat, 2007-02-10 at 14:36 -0600, Steve Wise wrote:
> > ugh.
> >
> > There is at least one bug in this patch.  I cannot call iw_cm_reject()
> > inside destroy_cm_id() because both functions grab the iw_cm lock...
> >
> >
>
> This patch puts the iw_cm_reject() calls back in
> cm_conn_req_handler()...
>
>
> ---
>
> iw_cm_id destruction race condition fixes.
>
> From: Steve Wise <swise at opengridcomputing.com>
>
> Several changes:
>
> - iwcm_deref_id() always wakes up if there's another reference.
>
> - clean up race condition in cm_work_handler().
>
> - create static void free_cm_id() which deallocs the work entries and then
>   kfrees the cm_id memory.  This reduces code replication.
>
> - rem_ref() if this is the last reference -and- the IWCM owns freeing the
>   cm_id, then free it.
>
> Signed-off-by: Steve Wise <swise at opengridcomputing.com>
> Signed-off-by: Tom Tucker <tom at opengridcomputing.com>
> ---
>
>  drivers/infiniband/core/iwcm.c |   47
> +++++++++++++++++++++-------------------
>  1 files changed, 25 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c
> index 1039ad5..891d1fa 100644
> --- a/drivers/infiniband/core/iwcm.c
> +++ b/drivers/infiniband/core/iwcm.c
> @@ -146,6 +146,12 @@ static int copy_private_data(struct iw_c
>     return 0;
>  }
>
> +static void free_cm_id(struct iwcm_id_private *cm_id_priv)
> +{
> +   dealloc_work_entries(cm_id_priv);
> +   kfree(cm_id_priv);
> +}
> +
>  /*
>   * Release a reference on cm_id. If the last reference is being
>   * released, enable the waiting thread (in iw_destroy_cm_id) to
> @@ -153,21 +159,14 @@ static int copy_private_data(struct iw_c
>   */
>  static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv)
>  {
> -   int ret = 0;
> -
>     BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
>     if (atomic_dec_and_test(&cm_id_priv->refcount)) {
>        BUG_ON(!list_empty(&cm_id_priv->work_list));
> -      if (waitqueue_active(&cm_id_priv->destroy_comp.wait)) {
> -         BUG_ON(cm_id_priv->state != IW_CM_STATE_DESTROYING);
> -         BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY,
> -               &cm_id_priv->flags));
> -         ret = 1;
> -      }
>        complete(&cm_id_priv->destroy_comp);
> +      return 1;
>     }
>
> -   return ret;
> +   return 0;
>  }
>
>  static void add_ref(struct iw_cm_id *cm_id)
> @@ -181,7 +180,11 @@ static void rem_ref(struct iw_cm_id *cm_
>  {
>     struct iwcm_id_private *cm_id_priv;
>     cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
> -   iwcm_deref_id(cm_id_priv);
> +   if (iwcm_deref_id(cm_id_priv) &&
> +       test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) {
> +      BUG_ON(!list_empty(&cm_id_priv->work_list));
> +      free_cm_id(cm_id_priv);
> +   }
>  }
>
>  static int cm_event_handler(struct iw_cm_id *cm_id, struct
> iw_cm_event *event);
> @@ -355,7 +358,9 @@ static void destroy_cm_id(struct iw_cm_i
>     case IW_CM_STATE_CONN_RECV:
>        /*
>         * App called destroy before/without calling accept after
> -       * receiving connection request event notification.
> +       * receiving connection request event notification or
> +       * returned non zero from the event callback function.
> +       * In either case, must tell the provider to reject.
>         */
>        cm_id_priv->state = IW_CM_STATE_DESTROYING;
>        break;
> @@ -391,9 +396,7 @@ void iw_destroy_cm_id(struct iw_cm_id *c
>
>     wait_for_completion(&cm_id_priv->destroy_comp);
>
> -   dealloc_work_entries(cm_id_priv);
> -
> -   kfree(cm_id_priv);
> +   free_cm_id(cm_id_priv);
>  }
>  EXPORT_SYMBOL(iw_destroy_cm_id);
>
> @@ -647,10 +650,11 @@ static void cm_conn_req_handler(struct i
>     /* Call the client CM handler */
>     ret = cm_id->cm_handler(cm_id, iw_event);
>     if (ret) {
> +      iw_cm_reject(cm_id, NULL, 0);
>        set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags);
>        destroy_cm_id(cm_id);
>        if (atomic_read(&cm_id_priv->refcount)==0)
> -         kfree(cm_id);
> +         free_cm_id(cm_id_priv);
>     }
>
>  out:
> @@ -854,13 +858,12 @@ static void cm_work_handler(struct work_
>           destroy_cm_id(&cm_id_priv->id);
>        }
>        BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
> -      if (iwcm_deref_id(cm_id_priv))
> -         return;
> -
> -      if (atomic_read(&cm_id_priv->refcount)==0 &&
> -          test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) {
> -         dealloc_work_entries(cm_id_priv);
> -         kfree(cm_id_priv);
> +      if (iwcm_deref_id(cm_id_priv)) {
> +         if (test_bit(IWCM_F_CALLBACK_DESTROY,
> +                 &cm_id_priv->flags)) {
> +            BUG_ON(!list_empty(&cm_id_priv->work_list));
> +            free_cm_id(cm_id_priv);
> +         }
>           return;
>        }
>        spin_lock_irqsave(&cm_id_priv->lock, flags);
>
>
>
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit
http://openib.org/mailman/listinfo/openib-general
>


From rdreier at cisco.com  Mon Feb 12 21:45:19 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 12 Feb 2007 21:45:19 -0800
Subject: [openib-general] [PATCH 3 of 4] IB/mthca: fix
 non-cache-coherent CPUs with memfree
In-Reply-To: <20070210211508.GD14903@mellanox.co.il> (Michael S.
	Tsirkin's message of "Sat, 10 Feb 2007 23:15:08 +0200")
References: <20070210211508.GD14903@mellanox.co.il>
Message-ID: <adahctqobkg.fsf@cisco.com>

OK, I already merged this but now I'm thinking it's somewhat buggy:

 > +		if (coherent)
 > +			ret = mthca_alloc_icm_coherent(&dev->pdev->dev,
 > +						       &chunk->mem[chunk->npages],
 > +						       cur_order, gfp_mask);
 > +		else
 > +		       	ret = mthca_alloc_icm_pages(&chunk->mem[chunk->npages],
 > +						    cur_order, gfp_mask);
 >  
 > -			if (++chunk->npages == MTHCA_ICM_CHUNK_LEN) {
 > +		if (!ret) {
 > +			++chunk->npages;
 > +
 > +			if (!coherent && chunk->npages == MTHCA_ICM_CHUNK_LEN) {
 >  				chunk->nsg = pci_map_sg(dev->pdev, chunk->mem,

I don't see anything that ever bumps chunk->nsg if we're allocating a
coherent region and we end up needing more than one allocation to do
it.  Maybe something like this on top of the patch?

diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c b/drivers/infiniband/hw/mthca/mthca_memfree.c
index 0b9d053..48f7c65 100644
--- a/drivers/infiniband/hw/mthca/mthca_memfree.c
+++ b/drivers/infiniband/hw/mthca/mthca_memfree.c
@@ -175,7 +175,9 @@ struct mthca_icm *mthca_alloc_icm(struct mthca_dev *dev, int npages,
 		if (!ret) {
 			++chunk->npages;
 
-			if (!coherent && chunk->npages == MTHCA_ICM_CHUNK_LEN) {
+			if (coherent)
+				++chunk->nsg;
+			else if (chunk->npages == MTHCA_ICM_CHUNK_LEN) {
 				chunk->nsg = pci_map_sg(dev->pdev, chunk->mem,
 							chunk->npages,
 							PCI_DMA_BIDIRECTIONAL);
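
The accounting in the hunk above can be checked with a small userspace
model. A minimal sketch, assuming a stand-in chunk struct, an arbitrary
CHUNK_LEN, and a fake model_map_sg() that maps entries one-to-one; none
of these are the mthca definitions, and the sketch leaves out any final
mapping of a partly filled chunk:

```c
#include <assert.h>
#include <stdbool.h>

#define CHUNK_LEN 4   /* stand-in for MTHCA_ICM_CHUNK_LEN; value is arbitrary */

struct chunk_model {
	int npages;   /* entries filled in mem[] */
	int nsg;      /* entries that carry a valid DMA mapping */
};

/* Stand-in for pci_map_sg(): pretend every entry maps one-to-one. */
static int model_map_sg(int nents)
{
	return nents;
}

/* Account for one successful allocation in the current chunk. */
static void chunk_account(struct chunk_model *c, bool coherent)
{
	assert(c->npages < CHUNK_LEN);
	++c->npages;
	if (coherent)
		++c->nsg;   /* the coherent allocation is already mapped */
	else if (c->npages == CHUNK_LEN)
		c->nsg = model_map_sg(c->npages);   /* map the full chunk at once */
}
```

With the extra branch, a coherent region that needs several allocations
ends up with nsg equal to npages instead of staying at zero.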


From erezz at voltaire.com  Mon Feb 12 22:21:57 2007
From: erezz at voltaire.com (Erez Zilber)
Date: Tue, 13 Feb 2007 08:21:57 +0200
Subject: [openib-general] OFED 1.2 components list - for the meeting
 today
In-Reply-To: <45D098E2.6000804@mellanox.co.il>
References: <45D098E2.6000804@mellanox.co.il>
Message-ID: <45D15905.8010204@voltaire.com>

Tziporet Koren wrote:
> This is the full OFED 1.2 components list that we will review in the meeting 
>
> Tziporet
>
> # Kernel
> ib_verbs (core)
> ib_mthca
> ib_ipoib
> ib_ipath - currently works on 2.6.20 only. Backport patches cannot be applied
> ib_iser
> ib_sdp
> ib_srp
> ib_ehca - PPC only
> cxgb3
> vnic
> rds - currently works on kernel 2.6.20 and 2.6.19
> ib-bonding - RHEL4UP3 & SLES10 
>  
> # User libraries
> libibverbs
> libibcm
> libmthca
> libipathverbs
> libcxgb3
> libsdp
> libehca
> sdpnetstat
> libibcommon
> libibmad
> libibumad
> libopensm
> libosmcomp
> libosmvendor
> librdmacm
> dapl - not working with iWARP
>
> # User utilities
> perftest
> mstflint
> ibutils
> opensm
> qlvnictools
> openib-diags
> srptools
> ipoibtools
> tvflash
>
>   
open-iscsi is missing and should be placed under "User utilities".
> # MPI:
> mvapich
> mvapich2 - Build issue
> openmpi
> mpitests
>
> # OFED specific:
> ofed_docs - taken from 1.1 - not yet updated for 1.2
> ofed_scripts
>
>
>
>   


From erezz at voltaire.com  Mon Feb 12 22:26:04 2007
From: erezz at voltaire.com (Erez Zilber)
Date: Tue, 13 Feb 2007 08:26:04 +0200
Subject: [openib-general] ofa_1_2_kernel 20070212-0200 daily build status
In-Reply-To: <20070212102414.E14F9E60806@openfabrics.org>
References: <20070212102414.E14F9E60806@openfabrics.org>
Message-ID: <45D159FC.50806@voltaire.com>

vlad at lists.openfabrics.org wrote:
> This email was generated automatically, please do not reply
>
>
> Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod 
>   

Vlad,

We talked about adding open-iscsi over iSER to this daily build several
weeks ago. Can you tell me when you will be able to do that? It is really
important for us.

Thanks,
Erez


From karun.sharma at qlogic.com  Mon Feb 12 22:29:23 2007
From: karun.sharma at qlogic.com (Karun Sharma)
Date: Tue, 13 Feb 2007 00:29:23 -0600
Subject: [openib-general] new OFED 1.2 package
References: <45CB938E.5040305@mellanox.co.il>
Message-ID: <C07C40DB2364324799506DE8FF12F8D81861E0@EPEXCH1.qlogic.org>

I am not able to install OFED 1.2 on SLES10 machines (x86_64), even after disabling ipath.
Attached are the logs generated by the install script. I observed some errors with the open-iscsi module. Disabling this module doesn't help either.
 
Thanks
Karun
 

________________________________

From: openib-general-bounces at openib.org on behalf of Tziporet Koren
Sent: Fri 2/9/2007 2:48 AM
To: EWG; OPENIB
Subject: [openib-general] new OFED 1.2 package


New OFED package was uploaded to the OFA server:
http://www.openfabrics.org/builds/ofed-1.2/OFED-1.2-20070208-1508.tgz

Many of the issues reported on the previous version are resolved
(bugzilla will be updated next week).

Since we had lab restructuring we did only basic tests on RHEL up4 and
SLES10 (x86 and x86_64)

All - we are going for our weekend now.
Please report all issues you encounter so we will be able to fix and do
the alpha release on Monday.

Thanks,
Tziporet & Vlad



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070213/da678228/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: error.log
Type: application/octet-stream
Size: 1454707 bytes
Desc: error.log
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070213/da678228/attachment.obj>

From vlad at dev.mellanox.co.il  Mon Feb 12 11:24:21 2007
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Mon, 12 Feb 2007 21:24:21 +0200
Subject: [openib-general] MVAPICH2 SRPM update and install files patch
In-Reply-To: <45CE1C1C.70406@cse.ohio-state.edu>
References: <45CE1C1C.70406@cse.ohio-state.edu>
Message-ID: <1171308261.12725.9.camel@vladsk-laptop>

On Sat, 2007-02-10 at 14:25 -0500, Shaun Rowland wrote:
> I updated the latest MVAPICH2 SRPM:
> 
> https://www.openfabrics.org/~rowland/ofed_1_2/
> 
> I am including a patch to the latest ofed_1_2_scripts git files. Since
> these files are the same as those used in the OFED-1.2-20070208-1508.tgz
> package, this patch can also be applied there. This patch is required to
> use the new MVAPICH2 SRPM file and should not be used with the older
> versions.


Hi Shaun,
Mvapich2 RPM build fails.
Please fix the files list in mvapich2.spec. You need to put the path to
mvapich2 directory instead of prefix.
%{prefix} includes all OFED's files.


mvapich2.spec:
%files
%{_prefix}

-- 
Vladimir Sokolovsky <vlad at dev.mellanox.co.il>
Mellanox Technologies Ltd.


From dotanb at dev.mellanox.co.il  Tue Feb 13 00:17:28 2007
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Tue, 13 Feb 2007 10:17:28 +0200
Subject: [openib-general] [GIT PULL] please pull infiniband.git
In-Reply-To: <adazm7ioqpc.fsf@cisco.com>
References: <adazm7ioqpc.fsf@cisco.com>
Message-ID: <45D17418.3000508@dev.mellanox.co.il>

Hi Roland.

Roland Dreier wrote:
> Linus, please pull from
>
>     master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus
>
> This tree is also available from kernel.org mirrors at:
>
>     git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus
>
> This will add the new cxgb3 RDMA driver for Chelsio T3 NICs, as well
> as IPoIB connected mode and various other smaller changes:
>
> Ahmed S. Darwish (1):
>       IB/core: Use ARRAY_SIZE macro for mandatory_table
>
> Akinobu Mita (1):
>       IB/ehca: Fix memleak on module unloading
>
> David Howells (1):
>       IB/mthca: Work around gcc bug on sparc64
>
> Michael S. Tsirkin (6):
>       IPoIB: Connected mode experimental support
>       IB/mthca: Fix reserved MTTs calculation on mem-free HCAs
>       IB/mthca: Give reserved MTTs a separate cache line
>       IB/mthca: Fix access to MTT and MPT tables on non-cache-coherent CPUs
>       IB/mthca: Merge MR and FMR space on 64-bit systems
>       IB/mthca: Always fill MTTs from CPU
>
> Roland Dreier (1):
>       IB/mthca: Use correct structure size in call to memset()
>
> Sean Hefty (2):
>       RDMA/cma: Increment port number after close to avoid re-use
>       IB: Remove redundant "_wq" from workqueue names
>
> Steve Wise (1):
>       RDMA/cxgb3: Add driver for Chelsio T3 RNIC
>   
What about the patch that I sent on "Allow the following QP state 
transition : reset --> reset"?

thanks
Dotan


From erezz at voltaire.com  Tue Feb 13 01:42:21 2007
From: erezz at voltaire.com (Erez Zilber)
Date: Tue, 13 Feb 2007 11:42:21 +0200
Subject: [openib-general] new OFED 1.2 package
In-Reply-To: <C07C40DB2364324799506DE8FF12F8D81861E0@EPEXCH1.qlogic.org>
References: <45CB938E.5040305@mellanox.co.il>
	<C07C40DB2364324799506DE8FF12F8D81861E0@EPEXCH1.qlogic.org>
Message-ID: <45D187FD.4070500@voltaire.com>

Karun Sharma wrote:
> Not able to install OFED1.2 on SLES10 machines (x86_64) even after
> disabling ipath.
> Attached are the logs generated by install script. Observed some error
> with open-iscsi module. Disabling this module also doesn't help.

I made a fix in open-iscsi build. It should work once it is merged into
the new OFED build. Let me know if it doesn't work.

Erez


From vlad at lists.openfabrics.org  Tue Feb 13 02:23:48 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Tue, 13 Feb 2007 02:23:48 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070213-0200 daily build status
Message-ID: <20070213102349.5FECBE60809@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod 

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.14
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.12
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.17
Passed on ppc64 with linux-2.6.19
Passed on powerpc with linux-2.6.15
Passed on x86_64 with linux-2.6.15
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.17
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.15
Passed on powerpc with linux-2.6.13
Passed on ia64 with linux-2.6.19
Passed on ppc64 with linux-2.6.13
Passed on powerpc with linux-2.6.12
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.12
Passed on ppc64 with linux-2.6.14
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.16

Failed:


From devesh28 at gmail.com  Tue Feb 13 05:37:22 2007
From: devesh28 at gmail.com (Devesh Sharma)
Date: Tue, 13 Feb 2007 19:07:22 +0530
Subject: [openib-general] Immediate data question
In-Reply-To: <309a667c0702112110h7a79961fv9c3cf46d4392e1d4@mail.gmail.com>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
	<349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net>
	<ada7iuwp5rr.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net>
	<adamz3pfym0.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net>
	<adahctxeds8.fsf@cisco.com>
	<6.2.0.14.2.20070208131610.085c52f8@esmail.cup.hp.com>
	<349DCDA352EACF42A0C49FA6DCEA840362A87E@G3W0634.americas.hpqcorp.net>
	<309a667c0702112110h7a79961fv9c3cf46d4392e1d4@mail.gmail.com>
Message-ID: <309a667c0702130537u35745e98y429d3d564fb093e9@mail.gmail.com>

On 2/12/07, Devesh Sharma <devesh28 at gmail.com> wrote:
> On 2/10/07, Tang, Changqing <changquing.tang at hp.com> wrote:
> > > >
> > > >Not for the receiver, but the sender will be severely slowed down by
> > > >having to wait for the RNR timeouts.
> > >
> > > RNR = Receiver Not Ready so by definition, the data flow
> > > isn't going to
> > > progress until the receiver is ready to receive data.   If a
> > > receive QP
> > > enters RNR for a RC, then it is likely not progressing as
> > > desired.   RNR
> > > was initially put in place to enable a receiver to create
> > > back pressure to the sender without causing a fatal error
> > > condition.  It should rarely be entered and therefore should
> > > have negligible impact on overall performance however when a
> > > RNR occurs, no forward progress will occur so performance is
> > > essentially zero.
> >
> > Mike:
> >         I still do not quite understand this issue. I have two
> > situations that have RNR triggered.
> >
> > 1. process A and process B is connected with QP. A first post a send to
> > B, B does not post receive. Then A and B are doing a long time
> > RDMA_WRITE each other, A and B just check memory for the RDMA_WRITE
> > message. Finally B will post a receive. Does the first pending send in A
> > block all the later RDMA_WRITE ?
> According to the IBTA spec, the HCA processes WR entries in the strict
> order in which they are posted, so the send will block all WRs posted
> after it. Even if the HCA has multiple processing elements, I think the
> processing order will still be maintained by the HCA.
>  If not, since RNR is triggered
> > periodically till B post receive, does it affect the RDMA_WRITE
> > performance between A and B ?
> >
> > 2. extend above to three processes, A connect to B, B connect to C, so B
> > has two QPs, but one CQ.A posts a send to B, B does not post receive,
Ordering across QPs is not guaranteed, so sharing one CQ or using
different CQs will not affect anything.
> > rather B and C are doing a long time RDMA_WRITE,or send/recv. But B
If the RDMA WRITE is _on_ B, there is no effect on performance. If the
RDMA WRITE is _on_ C, it _may_ affect performance, since the load is on
the same HCA. Send/Recv _may_ also affect performance, for the same reason.
> > must sends RNR periodically to A, right?. So does the pending message
> > from A affects B's overall performance  between B and C ?
But the RNR NAK does not go on for very long; you may not even be able
to observe this performance hit. The moment the rnr_counter expires, the
connection will be broken!
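
The in-order behaviour described for case 1 can be illustrated with a
toy model of a single send queue. A minimal sketch, assuming invented
names (wr_kind, sq_progress) and ignoring RNR timers and retry limits
entirely:

```c
#include <assert.h>
#include <stddef.h>

enum wr_kind { WR_SEND, WR_RDMA_WRITE };

/*
 * Toy model of one send queue processed strictly in posting order.
 * A SEND consumes a receive posted by the peer; an RDMA WRITE does not.
 * Returns how many WRs complete before the queue stalls.
 */
static size_t sq_progress(const enum wr_kind *wrs, size_t n,
			  unsigned int peer_recvs_posted)
{
	size_t done = 0;

	assert(wrs != NULL || n == 0);
	while (done < n) {
		if (wrs[done] == WR_SEND) {
			if (peer_recvs_posted == 0)
				break;   /* RNR NAK: every later WR waits too */
			peer_recvs_posted--;
		}
		done++;
	}
	return done;
}
```

In this model a SEND with no matching receive stops every RDMA WRITE
queued behind it on the same QP, while a second QP would be modelled as
a separate queue with its own progress.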
> >
> >         Thank you.
> >
> > --CQ
> >
> >
> > >
> > > Mike
> > >
> > >
> > >
> >
>


From jsquyres at cisco.com  Tue Feb 13 06:07:03 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Tue, 13 Feb 2007 09:07:03 -0500
Subject: [openib-general] uDAPL in OFED 1.1 question
Message-ID: <F54F9880-5DE4-4A28-86FC-8744CA7D4110@cisco.com>

I have an OFED 1.1 cluster where something odd is happening in the  
udapl Open MPI plugin (I'm not excluding the possibility that we have  
a bug in the OMPI udapl plugin -- I'm just trying to understand some  
uDAPL behavior).  In some cases, we are getting back the error  
DAT_CONN_QUAL_IN_USE from dat_ep_create().

However, someone more knowledgeable about udapl than me said that the  
spec says that DAT_CONN_QUAL_IN_USE should only be reported back from  
a call to dat_psp_create() or dat_rsp_create().

Can someone tell me exactly what dat_ep_create() returning  
DAT_CONN_QUAL_IN_USE means?

Thanks.

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


From halr at voltaire.com  Tue Feb 13 07:15:04 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 13 Feb 2007 10:15:04 -0500
Subject: [openib-general] OSM QoS policy file
In-Reply-To: <45C72515.8090100@dev.mellanox.co.il>
References: <45C72515.8090100@dev.mellanox.co.il>
Message-ID: <1171379703.22446.15877.camel@hal.voltaire.com>

Hi Yevgeny,

Sorry for the slow response; I've been consumed getting ready for OFED
1.2 alpha.

On Mon, 2007-02-05 at 07:37, Yevgeny Kliteynik wrote:
> Hi Hal.
> 
> I added osm/doc/qos-policy.txt file with the description of the QoS
> policy file, and an example of such file (with more comments inside).
> I'm sure you'll have questions and corrections regarding this file,
> so for now, to make our work easier, I'm not sending it as patch, 
> but just as text. Please review the file.

Thanks for doing this. This helps but I still do have a number of
questions on it as you expected. See below for specifics.

It would be nice to turn this into a DTD when things get closer to
finalizing so XML configs could readily be validated. Can you do this ? 

I'd also like to see a futures/todo list. I think we've discussed a few
topics which fall into this category.

Thanks.

-- Hal

> Thanks
> 
> -- Yevgeny
> 
> =============================================================
> 
> QoS Policy File
> ===============
> 
> The QoS policy file is divided into 4 sub sections:
> 
>  - Port Group: a set of CAs, Routers or Switches that share 
>    the same settings. A port group might be a partition 
>    defined by the partition manager policy in terms of 
>    GUIDs. Future implementations might provide support 
>    for NodeDescription based definition of port groups.

IMO, should this group be a separate schema on which this (and partitions
and perhaps other things) are based?

>  - Fabric Setup: 
>    Defines how the SL2VL and VLArb tables should be setup.
>    This policy definition assumes the computation of target 
>    behavior should be performed outside of OpenSM.

Rather than fabric setup, is this better named QoS Setup (which seems
consistent with the tag used below) or QoS Fabric Setup  ? Also, what is
the relation of this group to the port group ?

>  - QoS-Levels Definition:
>    This section defines the possible sets of parameters for 
>    QoS that a client might be mapped to. Each set holds: SL
>    and optionally: Max MTU, Max Rate, Packet Lifetime and 
>    QoS Class.
> 
>  - Matching Rules:
>    A list of rules that match an incoming PathRecord request
>    to a QoS-Level. The rules are processed in order such that
>    the first match is applied. Each rule is built out of set
>    of match expressions which should all match for the rule
>    to apply. The matching expressions are defined for the 
>    following fields:
>      - SRC and DST to lists of port groups
>      - Service-ID to a list of Service-ID or Service-ID ranges
>      - QoS Class to a list of QoS Class values or ranges
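
The first-match semantics described for the matching rules can be
sketched as follows. A minimal sketch, assuming a simplified rule with
one Service-ID range and one QoS Class range; the struct and function
names are hypothetical and the SRC/DST port-group expressions are left
out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified rule: one Service-ID range, one QoS Class range. */
struct match_rule {
	uint64_t sid_lo, sid_hi;       /* inclusive Service-ID range */
	uint16_t class_lo, class_hi;   /* inclusive QoS Class range */
	int      qos_level;            /* index of the QoS-Level to apply */
};

/* Return the QoS level of the first rule whose expressions all match, or -1. */
static int match_qos_level(const struct match_rule *rules, size_t n,
			   uint64_t service_id, uint16_t qos_class)
{
	assert(rules != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const struct match_rule *r = &rules[i];

		if (service_id >= r->sid_lo && service_id <= r->sid_hi &&
		    qos_class >= r->class_lo && qos_class <= r->class_hi)
			return r->qos_level;   /* first match wins */
	}
	return -1;
}
```

Because the scan stops at the first hit, a narrow rule has to be listed
before a broad catch-all rule for it to take effect.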
> 
> 
> Example of the QoS policy file
> ==============================
> 
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <qos-policy>
>     <!-- Port Groups define sets of ports to be used later in the settings -->
>     <port-groups>
>         <!-- using port GUIDs -->
>         <port-group> 
>             <name>Storage</name> 
>             <!-- <use> is just a description that is used for logging.
>                  Other than that, it is just a commentary -->

I would think the name is for logging. use also ?

>             <use>our SRP storage targets</use>
>             <port-guid>0x1000000000000001</port-guid>
>             <port-guid>0x1000000000000002</port-guid>
>         </port-group>
>         <port-group> 
>             <name>Virtual Servers</name> 
>             <use>node desc and IB port #</use>
>             <!-- The syntax of the port name is as follows: "hostname/CA-num/Pnum".
>                  "hostname" and "CA-num" are compared to the first 2 words of 
>                  NodeDescription, and "Pnum" is a port number on that node. -->
>             <port-name>vs1/HCA-1/P1</port-name>
>             <port-name>vs3/HCA-1/P1</port-name>
>             <port-name>vs3/HCA-2/P1</port-name>

Shouldn't this be CA rather than HCA ?

I think this may also cover routers too.

Also, any support for switches ?

>         </port-group>
>         <!-- using partitions defined in the partition policy -->
>         <port-group> 
>             <name>Partition 1</name> 
>             <use>default settings</use>
>             <partition>Part1</partition>

This would correlate to the partition named Part1 in the partition
configuration. Should pkey based port groups be supported as well ? Just
wondering...

The current partition config indicates a set of port GUIDs and whether
they are full or limited members. As mentioned before, I would prefer
that this heads towards a port grouping schema on which both partitions
and QoS and perhaps other things depend.

>         </port-group>
>         <!-- using node types CA|ROUTER|SWITCH -->
>         <port-group> 
>             <name>Routers</name> 
>             <use>all routers</use>
>             <node-type>ROUTER</node-type> 
>         </port-group>  
>     </port-groups>

This grouping is similar to existing QoS support. For switches, there
are external/physical ports and extended switch port 0 which are
different. Base switch port 0 does not support QoS.

>     <qos-setup>
>         <sl2vl-tables>
>             <!-- scope defines the exact devices and in/out ports the tables apply to
>                  if the same port is matching several rules the last one applies -->
>             <sl2vl-scope> 
>                 <group>Part1</group> 
>                 <!-- *see explanation below the policy file example* -->
>                 <from>*</from> 
>                 <!-- *see explanation below the policy file example* -->
>                 <to>*</to> 
>                 <!-- SL2VL table has to have exactly 16 values (one for each SL) -->
>                 <sl2vl-table>0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7</sl2vl-table>
>             </sl2vl-scope>
>             <sl2vl-scope>
>                 <!-- *see explanation below the policy file example* -->
>                 <across-from>Storage1</across-from>
>                 <!-- *see explanation below the policy file example* -->
>                 <across-to>Storage2</across-to>
>                 <sl2vl-table>0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0</sl2vl-table>
>             </sl2vl-scope>
>         </sl2vl-tables>
> 
>         <!-- define all types of VLArb tables. The length of the tables should 
>              match the physically supported tables by their target ports -->
>         <vlarb-tables>
>             <!-- scope defines the exact ports the VLArb tables apply to -->
>             <vlarb-scope> 
>                 <!-- defining VLArb tables on all the ports that belong to 
>                      port group 'Storage', and on all the ports that are connected
>                      to ports of port group 'Storage' -->
>                 <group>Storage</group>
>                 <!-- "across" means all the ports that are connected to ports 
>                      that belong to the specified port group -->
>                 <across>Storage</across>
>                 <!-- VLArb table holds VL and weight pairs -->
>                 <vlarb-high>0:255,1:127,2:63,3:31,4:15,5:7,6:3,7:1</vlarb-high>
>                 <vlarb-low>8:255,9:127,10:63,11:31,12:15,13:7,14:3</vlarb-low>
>                 <vl-high-limit>10</vl-high-limit>
>             </vlarb-scope>
>         </vlarb-tables>
> 
>     </qos-setup>
> 
>     <qos-levels>
>         <!-- the first one is just setting SL -->
>         <qos-level> 
>             <!-- Serial number (unique ID) of the QoS level -->
>             <sn>1</sn> 
>             <use>for the lowest priority comm</use>
>             <sl>16</sl>

SL 16 is not valid.

>             <pkey>123</pkey>

Will this take hex as well as decimal ?
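
To illustrate both points (the SL range and hex versus decimal), here
is a rough sketch of how a parser could accept either notation and
reject out-of-range values. The function names are hypothetical and
this is not taken from any existing OpenSM parser. It relies only on
the SL being a 4-bit field (0-15) and the P_Key being a 16-bit field.

```python
def parse_number(text):
    """Parse '123' or '0x7b'; base 0 lets int() infer the base from the prefix."""
    return int(text.strip(), 0)


def check_sl(text):
    """Return the SL if it fits the 4-bit field (0-15), else raise ValueError."""
    sl = parse_number(text)
    if not 0 <= sl <= 15:
        raise ValueError(f"SL {sl} is out of range 0-15")
    return sl


def check_pkey(text):
    """Return the P_Key if it fits in 16 bits, else raise ValueError."""
    pkey = parse_number(text)
    if not 0 <= pkey <= 0xFFFF:
        raise ValueError(f"P_Key {pkey:#x} does not fit in 16 bits")
    return pkey


print(check_pkey("123"), check_pkey("0x7b"))  # the same value in both notations
try:
    check_sl("16")
except ValueError as err:
    print(err)
```
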

>             <packet-life>16</packet-life>
>         </qos-level>
>         <!-- the second sets SL and QoS Class -->
>         <qos-level> 
>             <sn>2</sn> 
>             <use>low latency best bandwidth</use>
>             <sl>0</sl> 
>             <qos-class>7</qos-class>
>         </qos-level>
>         <!-- the whole set: SL, QoS Class, MTU-Limit, Rate-Limit, Packet Lifetime -->
>         <qos-level> 
>             <sn>3</sn> 
>             <use>just an example</use>
>             <sl>0</sl> 
>             <qos-class>32</qos-class> 
>             <mtu-limit>1</mtu-limit> 
>             <rate-limit>1</rate-limit>
>             <packet-life>12</packet-life>
>         </qos-level>
>     </qos-levels>
> 
>     <!-- Match rules are scanned in a first-fit manner (like firewall rules table) -->
>     <qos-match-rules>
>         <!-- matching by single criteria: class (list of values and ranges) -->
>         <qos-match-rule> 
>             <qos-level-sn>1</qos-level-sn> <!-- defined in <sn> of <qos-level> -->

Can this be <sn> rather than <qos-level-sn> or can't the keywords be
duplicated ?

>             <use>low latency by class 7-9 or 11</use> <!-- just a description -->
>             <qos-class>7-9,11</qos-class>
>         </qos-match-rule>
>         <!-- show matching by destination group AND service-ids -->
>         <qos-match-rule> 
>             <qos-level-sn>2</qos-level-sn> 
>             <use>Storage targets connection</use>
>             <destination>Storage</destination>

Is destination a port group and used for matching destination GID or LID
on SA PR/MPR lookups ?

>             <service>22,4719-5000</service>
>         </qos-match-rule>
>         <!-- show matching by source group only -->
>         <qos-match-rule> 
>             <qos-level-sn>3</qos-level-sn> 
>             <use>bla bla</use>
>             <source>Storage</source>

Is source a port group and used for matching source GID or LID on SA
PR/MPR lookups ?

>         </qos-match-rule>
>     </qos-match-rules>
> 
> </qos-policy>
> 
> 
> Explanation of some fields
> ==========================
> 
> The meaning of most tags is either intuitive or explained by the
> comments along the file. One section that deserves a special
> explanation is the SL2VL table definition - <sl2vl-scope>.
> 
> In general, VL is a function of in-port (the port that the packet
> has entered through), out-port (the port that the packet is supposed
> to come out from) and the SL.
> In OpenSM, SL2VL table is defined on every port, where this port is 
> an out-port. Hence, on every port, SL2VL table is defined as function 
> of in-port and SL.

Would the syntax work for any SM ?

Are the below tags applicable to more than switches or only switches ?
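
For reference while reading the tags below, this is how I picture the
quoted model: each out-port holds one 16-entry table per in-port, and
the VL is looked up by SL. It is only a sketch with made-up names; the
0-15 range check simply reflects that a VL is a 4-bit value and is my
assumption about what a parser would enforce.

```python
def parse_sl2vl_table(text):
    """Parse an <sl2vl-table> value: exactly 16 VLs, one per SL 0..15."""
    vls = [int(item) for item in text.split(",")]
    if len(vls) != 16:
        raise ValueError(f"expected 16 entries, got {len(vls)}")
    if any(not 0 <= vl <= 15 for vl in vls):
        raise ValueError("each VL must fit in 4 bits (0-15)")
    return vls


class OutPort:
    """SL2VL state of one out-port: a separate table for every in-port."""

    def __init__(self):
        self.tables = {}  # in-port number -> list of 16 VLs

    def set_table(self, in_port, table):
        self.tables[in_port] = list(table)

    def vl_for(self, in_port, sl):
        """VL is a function of in-port and SL on this out-port."""
        return self.tables[in_port][sl]


table = parse_sl2vl_table("0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7")
port = OutPort()
for in_port in (1, 2):          # <from>1,2</from> on this out-port
    port.set_table(in_port, table)
print(port.vl_for(in_port=1, sl=15))
```
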

> <to>n,m</to>

Will it take n-m too (port range) ? Might be more concise for some
configs.

>   This means that of all the ports of the specified port group, define
>   SL2VL tables where to-ports are ports number n and m. Since SL2VL 
>   table is defined per out-port, using <to> effectively means defining
>   SL2VL table on ports n and m.
>   In order to specify that SL2VL table should be defined on all the 
>   ports, an asterisk (*) can be used.
> 
> <from>i,j</from>

Will this take i-j too (port range) ? Might be more concise for some
configs.
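
The <service> and <qos-class> values in the example already use a
list-plus-range notation ("22,4719-5000", "7-9,11"), so the same form
would seem natural for <to> and <from>. A small sketch of such a
parser follows; whether the real parser accepts ranges for ports is
exactly the open question, and the function name and the choice to
expand '*' to ports 1..N are hypothetical.

```python
def parse_port_spec(text, num_ports):
    """Expand '*', 'n,m' or 'n-m' into a sorted list of port numbers."""
    text = text.strip()
    if text == "*":
        # Assumption: the wildcard covers external ports 1..num_ports.
        return list(range(1, num_ports + 1))
    ports = set()
    for item in text.split(","):
        low, dash, high = item.strip().partition("-")
        first = int(low)
        last = int(high) if dash else first
        if first > last:
            raise ValueError(f"bad range: {item!r}")
        ports.update(range(first, last + 1))
    return sorted(ports)


print(parse_port_spec("1,3", 8))
print(parse_port_spec("2-4", 8))
print(parse_port_spec("*", 4))
```
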

>   This means that of all the ports of the specified port group that were
>   not filtered out by the <to> value, define SL2VL table only for entries
>   where from-ports are ports number i and j.
>   In order to specify that SL2VL table should be defined for all the in-ports, 
>   an asterisk (*) can be used.
> 
> To specify that all the SL2VL tables entries should be defined for all 
> the ports of a certain group, use the following:
>     <group>port_group</group> 
>     <from>*</from>
>     <to>*</to>
> 
> <across-to>PortGroupName</across-to>
>   
>   This is a combination of the <across> keyword (that can be found in the VLArb tables
>   definition) and the <to> keyword.
>   <across>PortGroupName</across> means that the ports that we're talking about
>   are all the ports that are connected to ports that belong to PortGroupName.
>   Essentially, <across-to>PortGroupName</across-to> means the following:
>   <to>list_of_all_the_ports_that_are_connected_to_group_PortGroupName</to>
>   
>   Example of usage of <across-to>:
>   A user has a set of 'special' nodes (e.g. storage nodes), and all the
>   traffic to these nodes has to get a specific VL. The solution is to define a port
>   group (e.g. "Storage") that will include all the ports of these nodes, and then
>   to configure SL2VL tables on all the switch ports that are connected to the
>   Storage port group by specifying <across-to>Storage</across-to>
>   
> <across-from>PortGroupName</across-from>
> 
>   Similar to <across-to>, <across-from> is a combination of the <across> and <from>
>   keywords.

Is omission of these keywords treated as a wildcard (*) ?

After initial read of this, I have the following higher level
questions/thoughts:

How are trunk (switch to switch) links handled by the QoS syntax ?

I also need to think more about the across ramifications. Is it really
simpler to use this syntax than to specify the specific ports in
question ?
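
To make the comparison concrete, here is a small sketch of what
resolving <across> amounts to: given the fabric's link map, it yields
the explicit port list that would otherwise have to be written by hand.
The node names, the (node, port) tuples and both helper functions are
made up for illustration.

```python
def make_links(cables):
    """Build a symmetric port-to-peer map from a list of (port_a, port_b) cables."""
    links = {}
    for a, b in cables:
        links[a] = b
        links[b] = a
    return links


def across(links, group_ports):
    """Ports directly connected to any port of the given port group."""
    return {links[port] for port in group_ports if port in links}


# Hypothetical fabric: two storage targets and one host on a single switch.
links = make_links([
    (("storage1", 1), ("sw1", 5)),
    (("storage2", 1), ("sw1", 6)),
    (("host1", 1), ("sw1", 7)),
])
storage = {("storage1", 1), ("storage2", 1)}

# <across>Storage</across> resolves to the switch ports facing the group.
print(sorted(across(links, storage)))
```

The across form follows the cabling automatically when a storage node
moves to a different switch port, whereas an explicit port list would
have to be edited by hand.
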


From swise at opengridcomputing.com  Tue Feb 13 07:30:10 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 13 Feb 2007 09:30:10 -0600
Subject: [openib-general] mvapich2 ofed 1.2 problem
Message-ID: <1171380610.15471.25.camel@stevo-desktop>

Hey Roland, 

Does this stack indicate that libibverbs is accessing a 1.0 provider?
cxgb3 shouldn't be 1.0 right?


Core was generated by `IMB_2.3/src/IMB-MPI1'.
Program terminated with signal 11, Segmentation fault.

...

(gdb) bt
#0  __ibv_alloc_pd (context=0x1) at src/verbs.c:143
#1  0x00002b832d4d4381 in __ibv_alloc_pd_1_0 (context=0x617830)
    at src/compat-1_0.c:572
#2  0x00002b832cfef04e in rdma_cm_init_pd_cq ()
   from /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/lib/libmpich.so
#3  0x00002b832cfef415 in rdma_cm_create_qp ()
   from /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/lib/libmpich.so
#4  0x00002b832cfefa37 in ib_cma_event_handler ()
   from /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/lib/libmpich.so
#5  0x00002b832cfefcc0 in cm_thread ()
   from /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/lib/libmpich.so
#6  0x0000003cd9406305 in start_thread () from /lib64/libpthread.so.0
#7  0x0000003cd88cd66d in clone () from /lib64/libc.so.6
#8  0x0000000000000000 in ?? ()
(gdb)  p *context
Cannot access memory at address 0x1
(gdb) up
#1  0x00002b832d4d4381 in __ibv_alloc_pd_1_0 (context=0x617830)
    at src/compat-1_0.c:572
572     src/compat-1_0.c: No such file or directory.
        in src/compat-1_0.c
(gdb) p *context
$1 = {device = 0x617100, ops = {
    query_device = 0x2b832dcf2bc0 <iwch_query_device>,
    query_port = 0x2b832dcf2ba0 <iwch_query_port>,
    alloc_pd = 0x2b832dcf2b30 <iwch_alloc_pd>,
    dealloc_pd = 0x2b832dcf2af0 <iwch_free_pd>,
    reg_mr = 0x2b832dcf29b0 <iwch_reg_mr>,
    dereg_mr = 0x2b832dcf2c30 <iwch_dereg_mr>,
    create_cq = 0x2b832dcf3050 <iwch_create_cq>,
    poll_cq = 0x2b832dcf1770 <t3b_poll_cq>,
    req_notify_cq = 0x2b832dcf10c0 <iwch_arm_cq>, cq_event = 0,
    resize_cq = 0x2b832dcf2870 <iwch_resize_cq>,
    destroy_cq = 0x2b832dcf2f50 <iwch_destroy_cq>,
    create_srq = 0x2b832dcf2880 <iwch_create_srq>,
    modify_srq = 0x2b832dcf2890 <iwch_modify_srq>, query_srq = 0,
    destroy_srq = 0x2b832dcf28a0 <iwch_destroy_srq>,
    post_srq_recv = 0x2b832dcf28b0 <iwch_post_srq_recv>,
    create_qp = 0x2b832dcf2d30 <iwch_create_qp>, query_qp = 0,
    modify_qp = 0x2b832dcf2900 <iwch_modify_qp>,
    destroy_qp = 0x2b832dcf3200 <iwch_destroy_qp>,
    post_send = 0x2b832dcf1fa0 <t3b_post_send>,
    post_recv = 0x2b832dcf2460 <t3b_post_recv>,
    create_ah = 0x2b832dcf28c0 <iwch_create_ah>,
    destroy_ah = 0x2b832dcf28d0 <iwch_destroy_ah>,
    attach_mcast = 0x2b832dcf28e0 <iwch_attach_mcast>,
    detach_mcast = 0x2b832dcf28f0 <iwch_detach_mcast>}, cmd_fd = 768552128,
  async_fd = 11139, num_comp_vectors = 8, real_context = 0x1}


From rdreier at cisco.com  Tue Feb 13 09:07:05 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 13 Feb 2007 09:07:05 -0800
Subject: [openib-general] mvapich2 ofed 1.2 problem
In-Reply-To: <1171380610.15471.25.camel@stevo-desktop> (Steve Wise's
	message of "Tue, 13 Feb 2007 09:30:10 -0600")
References: <1171380610.15471.25.camel@stevo-desktop>
Message-ID: <adar6sum1fq.fsf@cisco.com>

 > Does this stack indicate that libibverbs is accessing a 1.0 provider?
 > cxgb3 shouldn't be 1.0 right?

 > #1  0x00002b832d4d4381 in __ibv_alloc_pd_1_0 (context=0x617830)
 >     at src/compat-1_0.c:572
 > #2  0x00002b832cfef04e in rdma_cm_init_pd_cq ()
 >    from /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/lib/libmpich.so

This means that the app (or maybe the RDMA CM library?) is linked
against the 1.0 API -- which should work even with cxgb3 actually.
But maybe mvapich is built against the 1.1 API and the RDMA CM is
built against 1.0 or something?

 - R.


From swise at opengridcomputing.com  Tue Feb 13 09:11:26 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 13 Feb 2007 11:11:26 -0600
Subject: [openib-general] mvapich2 ofed 1.2 problem
In-Reply-To: <adar6sum1fq.fsf@cisco.com>
References: <1171380610.15471.25.camel@stevo-desktop>
	<adar6sum1fq.fsf@cisco.com>
Message-ID: <1171386686.15471.36.camel@stevo-desktop>

On Tue, 2007-02-13 at 09:07 -0800, Roland Dreier wrote:
>  > Does this stack indicate that libibverbs is accessing a 1.0 provider?
>  > cxgb3 shouldn't be 1.0 right?
> 
>  > #1  0x00002b832d4d4381 in __ibv_alloc_pd_1_0 (context=0x617830)
>  >     at src/compat-1_0.c:572
>  > #2  0x00002b832cfef04e in rdma_cm_init_pd_cq ()
>  >    from /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/lib/libmpich.so
> 
> This means that the app (or maybe the RDMA CM library?) is linked
> against the 1.0 API -- which should work even with cxgb3 actually.
> But maybe mvapich is built against the 1.1 API and the RDMA CM is
> built against 1.0 or something?
> 

How do I tell?  Can I tell from the .so files?

I can build a non-mpi app against the librdmacm and libibverbs that got
installed and things work ok.  So maybe libmpich is balled up somehow.

Interestingly, the mpi example program, cpi, that gets built with the
rpm works.  It's just MPI programs that I build using mpicc, which
links to libmpich.so, that fail.

Steve.


From vlad at dev.mellanox.co.il  Tue Feb 13 09:19:27 2007
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Tue, 13 Feb 2007 19:19:27 +0200
Subject: [openib-general] new OFED 1.2 package
Message-ID: <1171387167.3978.90.camel@vladsk-laptop>

New OFED package was uploaded to the OFA server:
http://www.openfabrics.org/~vlad/builds/ofed-1.2/OFED-1.2-20070213-1646.tgz


Known issues:
mvapich2 RPM build fails (will be fixed in alpha1).	
sdpnetstat compilation fails in RHEL5


-- 
Vladimir Sokolovsky <vlad at dev.mellanox.co.il>
Mellanox Technologies Ltd.


From rdreier at cisco.com  Tue Feb 13 09:21:13 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 13 Feb 2007 09:21:13 -0800
Subject: [openib-general] mvapich2 ofed 1.2 problem
In-Reply-To: <1171386686.15471.36.camel@stevo-desktop> (Steve Wise's
	message of "Tue, 13 Feb 2007 11:11:26 -0600")
References: <1171380610.15471.25.camel@stevo-desktop>
	<adar6sum1fq.fsf@cisco.com> <1171386686.15471.36.camel@stevo-desktop>
Message-ID: <adazm7ikm7q.fsf@cisco.com>

 > How do I tell?  Can I tell from the .so files?

ldd on the .so and the app would probably give you good info.

I'm pretty sure that mpicc must be linking against a libibverbs 1.0
from somewhere.

 - R.


From swise at opengridcomputing.com  Tue Feb 13 09:36:36 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 13 Feb 2007 11:36:36 -0600
Subject: [openib-general] mvapich2 ofed 1.2 problem
In-Reply-To: <adazm7ikm7q.fsf@cisco.com>
References: <1171380610.15471.25.camel@stevo-desktop>
	<adar6sum1fq.fsf@cisco.com> <1171386686.15471.36.camel@stevo-desktop>
	<adazm7ikm7q.fsf@cisco.com>
Message-ID: <1171388196.15471.47.camel@stevo-desktop>


On Tue, 2007-02-13 at 09:21 -0800, Roland Dreier wrote:
>  > How do I tell?  Can I tell from the .so files?
> 
> ldd on the .so and the app would probably give you good info.
> 
> I'm pretty sure that mpicc must be linking against a libibverbs 1.0
> from somewhere.
> 
>  - R.

By the way, the problem also happens running over mthca/IB with
librdmacm.

mpicc has '-libverbs'
mpicc.conf has '-libverbs' too.


ldd output.  Looks like they are all linking to libibverbs.so.1.  Is
that correct?


[mpi at vic20 ~]$ ldd IMB_2.3/src/IMB-MPI1
        libmpich.so => /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/lib/libmpich.so (0x00002b0d7cefb000)
        librdmacm.so => /usr/local/ofed/lib64/librdmacm.so (0x00002b0d7d1b3000)
        libibverbs.so.1 => /usr/local/ofed/lib64/libibverbs.so.1 (0x00002b0d7d2b8000)
        libibumad.so.1 => /usr/local/ofed/lib64/libibumad.so.1 (0x00002b0d7d3c3000)
        libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003e07000000)
        librt.so.1 => /lib64/tls/librt.so.1 (0x0000003e0ba00000)
        libc.so.6 => /lib64/tls/libc.so.6 (0x0000003e06500000)
        libsysfs.so.1 => /usr/lib64/libsysfs.so.1 (0x0000003e06a00000)
        libdl.so.2 => /lib64/libdl.so.2 (0x0000003e06300000)
        libibcommon.so.1 => /usr/local/ofed/lib64/libibcommon.so.1 (0x00002b0d7d4cf000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003e06100000)

[mpi at vic20 ~]$ ldd /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/lib/libmpich.so
        libc.so.6 => /lib64/tls/libc.so.6 (0x00002b6a6061d000)
        /lib64/ld-linux-x86-64.so.2 (0x0000555555554000)

[mpi at vic20 ~]$ ldd /usr/local/ofed/lib64/librdmacm.so
        libibverbs.so.1 => /usr/local/ofed/lib64/libibverbs.so.1 (0x00002b3ef50de000)
        libsysfs.so.1 => /usr/lib64/libsysfs.so.1 (0x00002b3ef51ea000)
        libc.so.6 => /lib64/tls/libc.so.6 (0x00002b3ef52f6000)
        libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x00002b3ef552a000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00002b3ef5640000)
        /lib64/ld-linux-x86-64.so.2 (0x0000555555554000)

[mpi at vic20 ~]$ ldd /usr/local/ofed/lib64/libcxgb3-rdmav2.so
        libibverbs.so.1 => /usr/local/ofed/lib64/libibverbs.so.1 (0x00002b83c160e000)
        libc.so.6 => /lib64/tls/libc.so.6 (0x00002b83c171a000)
        libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x00002b83c194e000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00002b83c1a63000)
        /lib64/ld-linux-x86-64.so.2 (0x0000555555554000)

[mpi at vic20 ~]$ ldd /usr/local/ofed/lib64/libcxgb3.so
        libibverbs.so.1 => /usr/local/ofed/lib64/libibverbs.so.1 (0x00002ac8e4920000)
        libc.so.6 => /lib64/tls/libc.so.6 (0x00002ac8e4a2c000)
        libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x00002ac8e4c60000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00002ac8e4d75000)
        /lib64/ld-linux-x86-64.so.2 (0x0000555555554000)


From swise at opengridcomputing.com  Tue Feb 13 09:51:24 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 13 Feb 2007 11:51:24 -0600
Subject: [openib-general] mvapich2 ofed 1.2 problem
In-Reply-To: <1171388196.15471.47.camel@stevo-desktop>
References: <1171380610.15471.25.camel@stevo-desktop>
	<adar6sum1fq.fsf@cisco.com> <1171386686.15471.36.camel@stevo-desktop>
	<adazm7ikm7q.fsf@cisco.com> <1171388196.15471.47.camel@stevo-desktop>
Message-ID: <1171389084.15471.56.camel@stevo-desktop>

So this program doesn't work:

> [mpi at vic20 ~]$ ldd IMB_2.3/src/IMB-MPI1
>         libmpich.so => /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/lib/libmpich.so (0x00002b0d7cefb000)
>         librdmacm.so => /usr/local/ofed/lib64/librdmacm.so (0x00002b0d7d1b3000)
>         libibverbs.so.1 => /usr/local/ofed/lib64/libibverbs.so.1 (0x00002b0d7d2b8000)
>         libibumad.so.1 => /usr/local/ofed/lib64/libibumad.so.1 (0x00002b0d7d3c3000)
>         libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003e07000000)
>         librt.so.1 => /lib64/tls/librt.so.1 (0x0000003e0ba00000)
>         libc.so.6 => /lib64/tls/libc.so.6 (0x0000003e06500000)
>         libsysfs.so.1 => /usr/lib64/libsysfs.so.1 (0x0000003e06a00000)
>         libdl.so.2 => /lib64/libdl.so.2 (0x0000003e06300000)
>         libibcommon.so.1 => /usr/local/ofed/lib64/libibcommon.so.1 (0x00002b0d7d4cf000)
>         /lib64/ld-linux-x86-64.so.2 (0x0000003e06100000)
> 

And this one does:

[root at vic20 ~]# ldd /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/examples/cpi
        libm.so.6 => /lib64/tls/libm.so.6 (0x0000003e06800000)
        librdmacm.so => /usr/local/ofed/lib64/librdmacm.so (0x00002b1353b65000)
        libibverbs.so.1 => /usr/local/ofed/lib64/libibverbs.so.1 (0x00002b1353c6a000)
        libibumad.so.1 => /usr/local/ofed/lib64/libibumad.so.1 (0x00002b1353d75000)
        libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003e07000000)
        librt.so.1 => /lib64/tls/librt.so.1 (0x0000003e0ba00000)
        libc.so.6 => /lib64/tls/libc.so.6 (0x0000003e06500000)
        libdl.so.2 => /lib64/libdl.so.2 (0x0000003e06300000)
        libsysfs.so.1 => /usr/lib64/libsysfs.so.1 (0x0000003e06a00000)
        libibcommon.so.1 => /usr/local/ofed/lib64/libibcommon.so.1 (0x00002b1353e81000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003e06100000)


Note the cpi program doesn't dynamically link with libmpich.so.  That
appears to be the difference...
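
For what it's worth, the comparison can be done mechanically. Below is
a small sketch that reduces ldd-style output to library names and
diffs the two sets. The function name is made up, and the two samples
are abbreviated from the listings above.

```python
def ldd_libs(output):
    """Return the set of library file names found in ldd-style output."""
    libs = set()
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        name = line.split()[0]              # text before '=>' or '(0x...)'
        libs.add(name.rsplit("/", 1)[-1])   # drop any directory prefix
    return libs


# Abbreviated samples of the two listings.
IMB = """
libmpich.so => /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/lib/libmpich.so (0x00002b0d7cefb000)
librdmacm.so => /usr/local/ofed/lib64/librdmacm.so (0x00002b0d7d1b3000)
libibverbs.so.1 => /usr/local/ofed/lib64/libibverbs.so.1 (0x00002b0d7d2b8000)
libc.so.6 => /lib64/tls/libc.so.6 (0x0000003e06500000)
/lib64/ld-linux-x86-64.so.2 (0x0000003e06100000)
"""
CPI = """
libm.so.6 => /lib64/tls/libm.so.6 (0x0000003e06800000)
librdmacm.so => /usr/local/ofed/lib64/librdmacm.so (0x00002b1353b65000)
libibverbs.so.1 => /usr/local/ofed/lib64/libibverbs.so.1 (0x00002b1353c6a000)
libc.so.6 => /lib64/tls/libc.so.6 (0x0000003e06500000)
/lib64/ld-linux-x86-64.so.2 (0x0000003e06100000)
"""

print("only in IMB-MPI1:", sorted(ldd_libs(IMB) - ldd_libs(CPI)))
print("only in cpi:     ", sorted(ldd_libs(CPI) - ldd_libs(IMB)))
```
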


From sean.hefty at intel.com  Tue Feb 13 09:53:10 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 13 Feb 2007 09:53:10 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <20070213020330.GZ11411@obsidianresearch.com>
Message-ID: <000501c74f97$d3497090$8698070a@amr.corp.intel.com>

>What your #4 and #5 are talking about is not just that, but also PR
>queries that can unambiguously identify the LID selections of the
>router in advance. That is hugely different! IMHO, just because a
>reversible path exists and will be used by the router shouldn't be
>taken to mean that it is the only one or that the SA can tell you
>which of many possible choices it will be.

Yes - I was trying to define a routed path as reversible with respect to a
connection.  It makes things easier.  :)  This is where we've been
disconnecting.

I was wanting a packet sent from the remote GID to the local GID to come back
over the local DLID/SLID path specified in the path record if reversible is
true.  I give.  This was too strong of an assumption, since the response path
could travel a different DLID/SLID path and still qualify as reversible.

So, it seems that with respect to connections between subnets, path records
should be treated as if they were not reversible.  Using my model then would
require 4 queries...  (I need to read back through the discussion and see if the
different ideas can be condensed/summarized.)

>If so then I'd expect also for a SGID=off-subnet query to return the
>remote LIDs to make CM work properly with existing conforming
>implementations (that use 3 PR queries to get non-reversible paths
>;>).

I think it makes more sense to push interaction with a remote SA to the end node
to give them greater control over the query and avoid the local SA indirection.

- Sean


From panda at cse.ohio-state.edu  Tue Feb 13 09:59:19 2007
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Tue, 13 Feb 2007 12:59:19 -0500 (EST)
Subject: [openib-general] mvapich2 ofed 1.2 problem
In-Reply-To: <1171389084.15471.56.camel@stevo-desktop> from "Steve Wise"
	at Feb 13, 2007 11:51:24 AM
Message-ID: <200702131759.l1DHxJGC027072@xi.cse.ohio-state.edu>

Steve - Shaun will send a detailed reply to you on this issue shortly.

It looks like the patch sent by Shaun to Vlad (on Saturday) was not
applied to the latest OFED install script/build. This might be causing
all these problems. Vlad and Shaun have discussed this issue this
morning. Shaun has sent another updated patch to Vlad today (during
the last hour). Vlad will check and apply it tomorrow. Hopefully, this
will solve all the problems. 

Thanks, 

DK


> So this program doesn't work:
> 
> > [mpi at vic20 ~]$ ldd IMB_2.3/src/IMB-MPI1
> >         libmpich.so => /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/lib/libmpich.so (0x00002b0d7cefb000)
> >         librdmacm.so => /usr/local/ofed/lib64/librdmacm.so (0x00002b0d7d1b3000)
> >         libibverbs.so.1 => /usr/local/ofed/lib64/libibverbs.so.1 (0x00002b0d7d2b8000)
> >         libibumad.so.1 => /usr/local/ofed/lib64/libibumad.so.1 (0x00002b0d7d3c3000)
> >         libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003e07000000)
> >         librt.so.1 => /lib64/tls/librt.so.1 (0x0000003e0ba00000)
> >         libc.so.6 => /lib64/tls/libc.so.6 (0x0000003e06500000)
> >         libsysfs.so.1 => /usr/lib64/libsysfs.so.1 (0x0000003e06a00000)
> >         libdl.so.2 => /lib64/libdl.so.2 (0x0000003e06300000)
> >         libibcommon.so.1 => /usr/local/ofed/lib64/libibcommon.so.1 (0x00002b0d7d4cf000)
> >         /lib64/ld-linux-x86-64.so.2 (0x0000003e06100000)
> > 
> 
> And this one does:
> 
> [root at vic20 ~]# ldd /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/examples/cpi
>         libm.so.6 => /lib64/tls/libm.so.6 (0x0000003e06800000)
>         librdmacm.so => /usr/local/ofed/lib64/librdmacm.so (0x00002b1353b65000)
>         libibverbs.so.1 => /usr/local/ofed/lib64/libibverbs.so.1 (0x00002b1353c6a000)
>         libibumad.so.1 => /usr/local/ofed/lib64/libibumad.so.1 (0x00002b1353d75000)
>         libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003e07000000)
>         librt.so.1 => /lib64/tls/librt.so.1 (0x0000003e0ba00000)
>         libc.so.6 => /lib64/tls/libc.so.6 (0x0000003e06500000)
>         libdl.so.2 => /lib64/libdl.so.2 (0x0000003e06300000)
>         libsysfs.so.1 => /usr/lib64/libsysfs.so.1 (0x0000003e06a00000)
>         libibcommon.so.1 => /usr/local/ofed/lib64/libibcommon.so.1 (0x00002b1353e81000)
>         /lib64/ld-linux-x86-64.so.2 (0x0000003e06100000)
> 
> 
> Note the cpi program doesn't dynamically link with libmpich.so.  That
> appears to be the difference...


From rowland at cse.ohio-state.edu  Tue Feb 13 10:01:47 2007
From: rowland at cse.ohio-state.edu (Shaun Rowland)
Date: Tue, 13 Feb 2007 13:01:47 -0500
Subject: [openib-general] mvapich2 ofed 1.2 problem
In-Reply-To: <adazm7ikm7q.fsf@cisco.com>
References: <1171380610.15471.25.camel@stevo-desktop>
	<adar6sum1fq.fsf@cisco.com> <1171386686.15471.36.camel@stevo-desktop>
	<adazm7ikm7q.fsf@cisco.com>
Message-ID: <45D1FD0B.2080606@cse.ohio-state.edu>

Roland Dreier wrote:
>  > How do I tell?  Can I tell from the .so files?
> 
> ldd on the .so and the app would probably give you good info.
> 
> I'm pretty sure that mpicc must be linking against a libibverbs 1.0
> from somewhere.
> 
>  - R.

When I build using the OFED-1.2-20070208-1508, libibverbs 1.0 is what is
built, at least by looking at the .so file result:

[rowland at z0 ~]$ ls /usr/local/ofed/lib64/ |grep ibverbs
libibverbs.a
libibverbs.so
libibverbs.so.1
libibverbs.so.1.0.0

This seems odd. Is it correct? I have updated the MVAPICH2 SRPM and sent
a new patch for the OFED install scripts. This won't be reflected until
the alpha1 release. Still, does the above seem strange? I noticed this
recently. I see symbols for both versions though:

0000000000005a50 T ibv_detach_mcast@IBVERBS_1.0
00000000000082c0 T ibv_detach_mcast@@IBVERBS_1.1
0000000000000000 A IBVERBS_1.0
0000000000000000 A IBVERBS_1.1

Our code links to these libraries, and by default mpicc
should use what's in /usr/local/ofed/lib[64] in the -L path itself
directly too. Is this an issue in the library? The libmpich.so file
should not be any different when built. We will investigate this.

I can provide a patch against the latest OFED tar.gz to use the
mvapich2-0.9.8-3.src.rpm once I download the release if that would help,
as we have changed some things since the -2 SRPM release. Again, this
should be reflected in the alpha1 release.
-- 
Shaun Rowland	rowland at cse.ohio-state.edu
http://www.cse.ohio-state.edu/~rowland/


From vatsa at veritas.com  Tue Feb 13 10:06:04 2007
From: vatsa at veritas.com (vatsa at veritas.com)
Date: Tue, 13 Feb 2007 10:06:04 -0800
Subject: [openib-general] new OFED 1.2 package
In-Reply-To: <1171387167.3978.90.camel@vladsk-laptop>
References: <1171387167.3978.90.camel@vladsk-laptop>
Message-ID: <45D1FE0C.1050203@veritas.com>

Hi,

Is there a way to get OFED 1.2 binary rpms for RHEL4 Update 4 on x86_64 ?

Thanks,
Sreevatsa
> New OFED package was uploaded to the OFA server:
> http://www.openfabrics.org/~vlad/builds/ofed-1.2/OFED-1.2-20070213-1646.tgz
>
>
>
> Known issues:
> mvapich2 RPM build fails (will be fixed in alpha1).	
> sdpnetstat compilation fails in RHEL5
>
>
>   


From rdreier at cisco.com  Tue Feb 13 10:45:58 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 13 Feb 2007 10:45:58 -0800
Subject: [openib-general] mvapich2 ofed 1.2 problem
In-Reply-To: <45D1FD0B.2080606@cse.ohio-state.edu> (Shaun Rowland's
	message of "Tue, 13 Feb 2007 13:01:47 -0500")
References: <1171380610.15471.25.camel@stevo-desktop>
	<adar6sum1fq.fsf@cisco.com> <1171386686.15471.36.camel@stevo-desktop>
	<adazm7ikm7q.fsf@cisco.com> <45D1FD0B.2080606@cse.ohio-state.edu>
Message-ID: <ada1wktlwux.fsf@cisco.com>

 > When I build using the OFED-1.2-20070208-1508, libibverbs 1.0 is what is
 > built, at least by looking at the .so file result:
 > 
 > [rowland at z0 ~]$ ls /usr/local/ofed/lib64/ |grep ibverbs
 > libibverbs.a
 > libibverbs.so
 > libibverbs.so.1
 > libibverbs.so.1.0.0

The soname hasn't changed because the library is still compatible.
But (I hope at least) OFED has libibverbs 1.1.
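
A quick way to check this from a symbol listing: with GNU symbol
versioning, a name shown as "sym@@VERSION" is the default version that
newly linked binaries bind to, while "sym@VERSION" is an older version
kept for compatibility. The sketch below applies that to the listing
from Shaun's message, written here in the usual nm-style
name@VERSION form. The function name is made up.

```python
def symbol_versions(nm_output, symbol):
    """Map each version of `symbol` to True if it is the default ('@@')."""
    versions = {}
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        name = fields[2]
        if "@@" in name:
            base, version = name.split("@@", 1)
            is_default = True
        elif "@" in name:
            base, version = name.split("@", 1)
            is_default = False
        else:
            continue  # unversioned entry such as the 'A IBVERBS_1.0' lines
        if base == symbol:
            versions[version] = is_default
    return versions


LISTING = """
0000000000005a50 T ibv_detach_mcast@IBVERBS_1.0
00000000000082c0 T ibv_detach_mcast@@IBVERBS_1.1
0000000000000000 A IBVERBS_1.0
0000000000000000 A IBVERBS_1.1
"""

for version, is_default in sorted(symbol_versions(LISTING, "ibv_detach_mcast").items()):
    print(version, "(default)" if is_default else "(compat)")
```

If IBVERBS_1.1 shows up as the default, the library is 1.1 regardless
of the .so.1.0.0 file name.
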


From rdreier at cisco.com  Tue Feb 13 11:05:09 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 13 Feb 2007 11:05:09 -0800
Subject: [openib-general] [GIT PULL] please pull infiniband.git
In-Reply-To: <45D17418.3000508@dev.mellanox.co.il> (Dotan Barak's
	message of "Tue, 13 Feb 2007 10:17:28 +0200")
References: <adazm7ioqpc.fsf@cisco.com> <45D17418.3000508@dev.mellanox.co.il>
Message-ID: <ada64a5khei.fsf@cisco.com>

 > What about the patch that i sent on "Allow the following QP state
 > transition : reset --> reset"?

OK, I'll merge that in the next patch.  It's the kind of patch I'm not
happy about merging, since it bloats the code to handle a corner case
no one is likely to hit in practice, but it is technically correct so
I guess we're forced to merge it.

 - R.


From swise at opengridcomputing.com  Tue Feb 13 11:52:20 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 13 Feb 2007 13:52:20 -0600
Subject: [openib-general] mvapich2 ofed 1.2 problem
In-Reply-To: <45D1FD0B.2080606@cse.ohio-state.edu>
References: <1171380610.15471.25.camel@stevo-desktop>
	<adar6sum1fq.fsf@cisco.com> <1171386686.15471.36.camel@stevo-desktop>
	<adazm7ikm7q.fsf@cisco.com> <45D1FD0B.2080606@cse.ohio-state.edu>
Message-ID: <1171396340.21471.2.camel@stevo-desktop>


On Tue, 2007-02-13 at 13:01 -0500, Shaun Rowland wrote:
> Roland Dreier wrote:
> >  > How do I tell?  Can I tell from the .so files?
> > 
> > ldd on the .so and the app would probably give you good info.
> > 
> > I'm pretty sure that mpicc must be linking against a libibverbs 1.0
> > from somewhere.
> > 
> >  - R.
> 
> When I build using the OFED-1.2-20070208-1508, libibverbs 1.0 is what is
> built, at least by looking at the .so file result:
> 
> [rowland at z0 ~]$ ls /usr/local/ofed/lib64/ |grep ibverbs
> libibverbs.a
> libibverbs.so
> libibverbs.so.1
> libibverbs.so.1.0.0
> 
> This seems odd. Is it correct? I have updated the MVAPICH2 SRPM and sent
> a new patch for the OFED install scripts. This won't be reflected until
> the alpha1 release. Still, does the above seem strange? I noticed this
> recently. I see symbols for both versions though:
> 
> 0000000000005a50 T ibv_detach_mcast@IBVERBS_1.0
> 00000000000082c0 T ibv_detach_mcast@@IBVERBS_1.1
> 0000000000000000 A IBVERBS_1.0
> 0000000000000000 A IBVERBS_1.1
> 
> Our code links to these libraries, and by default mpicc
> should use what's in /usr/local/ofed/lib[64] in the -L path itself
> directly too. Is this an issue in the library? The libmpich.so file
> should not be any different when built. We will investigate this.
> 
> I can provide a patch against the latest OFED tar.gz to use the
> mvapich2-0.9.8-3.src.rpm once I download the release if that would help,
> as we have changed some things since the -2 SRPM release. Again, this
> should be reflected in the alpha1 release.


I was hoping to sniff-test mvapich2 over OFA/iWARP.  So if you can get
something that works I'll try it out.

Steve.
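
To make the nm output quoted above easier to read: in GNU symbol versioning,
a name with "@@" carries the default version that new links bind to, while a
single "@" marks an older version kept for binaries that were linked earlier.
So a library file named libibverbs.so.1.0.0 can still export IBVERBS_1.1
symbols. The following is a minimal C sketch that classifies such names;
classify_symbol and the enum are hypothetical names made up for illustration,
not part of libibverbs or binutils.

```c
#include <assert.h>
#include <string.h>

enum sym_binding { SYM_UNVERSIONED, SYM_NON_DEFAULT, SYM_DEFAULT };

/*
 * Classify a versioned symbol name as printed by nm, for example
 * "ibv_detach_mcast@@IBVERBS_1.1". On return *version points at the
 * version string inside name, or is NULL when the name has no version.
 */
enum sym_binding classify_symbol(const char *name, const char **version)
{
	const char *at;

	assert(name != NULL && version != NULL);
	at = strchr(name, '@');
	if (!at) {
		*version = NULL;
		return SYM_UNVERSIONED;
	}
	if (at[1] == '@') {
		/* "@@": the default version, picked up by newly linked code */
		*version = at + 2;
		return SYM_DEFAULT;
	}
	/* "@": a non-default version kept for previously linked binaries */
	*version = at + 1;
	return SYM_NON_DEFAULT;
}
```

Feeding it the two ibv_detach_mcast lines from the listing would report
IBVERBS_1.1 as the default and IBVERBS_1.0 as the compatibility version.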


From tziporet at mellanox.co.il  Tue Feb 13 12:03:59 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Tue, 13 Feb 2007 22:03:59 +0200
Subject: [openib-general] new OFED 1.2 package
In-Reply-To: <45D1FE0C.1050203@veritas.com>
References: <1171387167.3978.90.camel@vladsk-laptop>
	<45D1FE0C.1050203@veritas.com>
Message-ID: <45D219AF.3090008@mellanox.co.il>

vatsa at veritas.com wrote:
> Hi,
>
> Is there a way to get OFED 1.2 binary rpms for RHEL4 Update 4 on x86_64 ?
>
You should build them on your machines - see the OFED installation guide
(you can also access it from git):
http://staging.openfabrics.org/git/?p=~tziporet/docs.git;a=blob;f=OFED_Installation_Guide.txt;h=3b832cc14ac53c07e1935f5ca3bee750755c437a;hb=f43a950c36d081c939fbb407c64d1fd6d97c1cd7


Tziporet


From swise at opengridcomputing.com  Tue Feb 13 12:10:01 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 13 Feb 2007 14:10:01 -0600
Subject: [openib-general] [PATCH] ofed_1_2/iw_cxgb3 - Free any pending mmaps
 in iwch_dealloc_ucontext().
Message-ID: <1171397401.21471.5.camel@stevo-desktop>

Vlad/Michael,  

This should be pushed into ofed_1_2.  It can wait until after alpha1,
however, if you want.

Steve.

-----


Free any pending mmaps in iwch_dealloc_ucontext().

Signed-off-by: Steve Wise <swise at opengridcomputing.com>

---

 drivers/infiniband/hw/cxgb3/iwch_provider.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index dbb3f71..4a46771 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -98,7 +98,11 @@ static int iwch_dealloc_ucontext(struct 
 {
 	struct iwch_dev *rhp = to_iwch_dev(context->device);
 	struct iwch_ucontext *ucontext = to_iwch_ucontext(context);
+	struct iwch_mm_entry *mm, *tmp;
+
 	PDBG("%s context %p\n", __FUNCTION__, context);
+	list_for_each_entry_safe(mm, tmp, &ucontext->mmaps, entry)
+		kfree(mm);
 	cxio_release_ucontext(&rhp->rdev, &ucontext->uctx);
 	kfree(ucontext);
 	return 0;
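
For readers unfamiliar with the idiom in the patch above:
list_for_each_entry_safe() keeps a second cursor so that the current entry can
be freed inside the loop. Below is a rough userspace sketch of the same idea
on a plain singly linked list. struct mm_entry, mm_push and mm_free_all are
hypothetical stand-ins, not the driver's types, and the kernel list API is not
used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's mmap entry; only the link matters. */
struct mm_entry {
	struct mm_entry *next;
	unsigned long key;
};

/* Push a new entry on the list head; returns 0 on success, -1 on failure. */
int mm_push(struct mm_entry **head, unsigned long key)
{
	struct mm_entry *mm;

	assert(head != NULL);
	mm = malloc(sizeof(*mm));
	if (!mm)
		return -1;
	mm->key = key;
	mm->next = *head;
	*head = mm;
	return 0;
}

/*
 * Free every entry and return how many were freed. The successor is saved
 * in tmp before free(), which is the job the extra cursor does in
 * list_for_each_entry_safe().
 */
size_t mm_free_all(struct mm_entry **head)
{
	struct mm_entry *mm, *tmp;
	size_t freed = 0;

	assert(head != NULL);
	for (mm = *head; mm; mm = tmp) {
		tmp = mm->next;	/* read the link before the node goes away */
		free(mm);
		freed++;
	}
	*head = NULL;
	return freed;
}
```

Reading mm->next after free(mm) would be a use-after-free, which is why the
plain (non-safe) iteration form cannot be used when entries are released in
the loop body.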


From tziporet at mellanox.co.il  Tue Feb 13 12:11:49 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Tue, 13 Feb 2007 22:11:49 +0200
Subject: [openib-general] [openfabrics-ewg] new OFED 1.2 package
In-Reply-To: <1171387167.3978.90.camel@vladsk-laptop>
References: <1171387167.3978.90.camel@vladsk-laptop>
Message-ID: <45D21B85.9070007@mellanox.co.il>

Vladimir Sokolovsky wrote:
> New OFED package was uploaded to the OFA server:
> http://www.openfabrics.org/~vlad/builds/ofed-1.2/OFED-1.2-20070213-1646.tgz
>
>
>
> Known issues:
> mvapich2 RPM build fails (will be fixed in alpha1).	
> sdpnetstat compilation fails in RHEL5
>
>
>   
Hi All,

This is the pre-alpha package for your testing.
Please send us feedback today so we can build the first alpha OFED tomorrow.
If any show-stopper issue for the alpha is found please let us know.

Note that components compilation is blocked on kernels that they do not 
support.

Thanks,
Tziporet


From swise at opengridcomputing.com  Tue Feb 13 12:12:02 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 13 Feb 2007 14:12:02 -0600
Subject: [openib-general] OFED 1.2 dapl and dat.conf
Message-ID: <1171397522.21471.7.camel@stevo-desktop>

Currently, the dapl rpms don't install dat.conf.  I think they probably
should, eh?  Maybe in <prefix>/etc/dat.conf


Steve.


From krause at cup.hp.com  Tue Feb 13 12:46:41 2007
From: krause at cup.hp.com (Michael Krause)
Date: Tue, 13 Feb 2007 12:46:41 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <45D0FCC8.4090304@ichips.intel.com>
References: <20070210004820.GS11411@obsidianresearch.com>
	<000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com>
	<20070211230935.GT11411@obsidianresearch.com>
	<45D0A27A.2010302@ichips.intel.com>
	<20070212205634.GW11411@obsidianresearch.com>
	<6.2.0.14.2.20070212152343.08e8ca48@esmail.cup.hp.com>
	<45D0FCC8.4090304@ichips.intel.com>
Message-ID: <6.2.0.14.2.20070213124447.08e6e600@esmail.cup.hp.com>

At 03:48 PM 2/12/2007, Sean Hefty wrote:
>>An endnode look up should be to find the address vector to the 
>>remote.   A look up may return multiple vectors.   The SLID would 
>>correspond to each local subnet router port that acts as a first-hop 
>>destination to the remote subnet.    I don't see why the router protocol 
>>would not simply enable all paths on the local subnet to a given remote 
>>subnet be acquired.  All of the work is kept local to the SA / SM in the 
>>source subnet when determining a remote path to take.
>>Why is there any need to define more than just this?
>
>For an RC QP, we need at least two sets of LIDs.  In the simplest case, we 
>need the SLID/router DLID for the local subnet, and the router SLID/DLID 
>for the remote subnet.  The problem is in obtaining the SLID/DLID for the 
>remote subnet.

Not quite.   The router protocol should determine the "next hop" LID to be 
used to either reach the destination endnode if in its local subnet or for 
the next router on the path to the remote.   CM only needs to be concerned 
with what is in a local subnet for finding the router or the endnode.  It 
does not need to comprehend the remote subnet(s) LID.   That is for the router
protocol to determine.  CM also must understand the GIDs involved, which the
router will process to figure out its LID mapping to the next hop.

Mike  


From krause at cup.hp.com  Tue Feb 13 12:52:35 2007
From: krause at cup.hp.com (Michael Krause)
Date: Tue, 13 Feb 2007 12:52:35 -0800
Subject: [openib-general] Immediate data question
In-Reply-To: <309a667c0702130537u35745e98y429d3d564fb093e9@mail.gmail.co
 m>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
	<349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net>
	<ada7iuwp5rr.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net>
	<adamz3pfym0.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net>
	<adahctxeds8.fsf@cisco.com>
	<6.2.0.14.2.20070208131610.085c52f8@esmail.cup.hp.com>
	<349DCDA352EACF42A0C49FA6DCEA840362A87E@G3W0634.americas.hpqcorp.net>
	<309a667c0702112110h7a79961fv9c3cf46d4392e1d4@mail.gmail.com>
	<309a667c0702130537u35745e98y429d3d564fb093e9@mail.gmail.com>
Message-ID: <6.2.0.14.2.20070213125130.07f4dbf8@esmail.cup.hp.com>

At 05:37 AM 2/13/2007, Devesh Sharma wrote:
>On 2/12/07, Devesh Sharma <devesh28 at gmail.com> wrote:
>>On 2/10/07, Tang, Changqing <changquing.tang at hp.com> wrote:
>> > > >
>> > > >Not for the receiver, but the sender will be severely slowed down by
>> > > >having to wait for the RNR timeouts.
>> > >
>> > > RNR = Receiver Not Ready so by definition, the data flow
>> > > isn't going to
>> > > progress until the receiver is ready to receive data.   If a
>> > > receive QP
>> > > enters RNR for a RC, then it is likely not progressing as
>> > > desired.   RNR
>> > > was initially put in place to enable a receiver to create
>> > > back pressure to the sender without causing a fatal error
>> > > condition.  It should rarely be entered and therefore should
>> > > have negligible impact on overall performance however when a
>> > > RNR occurs, no forward progress will occur so performance is
>> > > essentially zero.
>> >
>> > Mike:
>> >         I still do not quite understand this issue. I have two
>> > situations that have RNR triggered.
>> >
>> > 1. Process A and process B are connected with a QP. A first posts a send to
>> > B; B does not post a receive. Then A and B do long-running
>> > RDMA_WRITEs to each other, and just check memory for the RDMA_WRITE
>> > message. Finally B posts a receive. Does the first pending send in A
>> > block all the later RDMA_WRITEs?
>>According to the IBTA spec, the HCA processes WR entries in the strict order in
>>which they are posted, so the send will block all WRs posted after it.
>>Even if the HCA has multiple processing elements, I think
>>the processing order will still be maintained by the HCA.
>>  If not, since RNR is triggered
>> > periodically till B post receive, does it affect the RDMA_WRITE
>> > performance between A and B ?
>> >
>> > 2. extend above to three processes, A connect to B, B connect to C, so B
>> > has two QPs, but one CQ.A posts a send to B, B does not post receive,
>Post ordering across QPs is not guaranteed, hence the presence of the same CQ
>or a different CQ will not affect anything.
>> > rather B and C are doing a long time RDMA_WRITE,or send/recv. But B
>If RDMA WRITE _on_ B, no effect on performance. If RDMA WRITE _on_ C,
>_may_ affect the performance, since load is on same HCA. In case of
>Send/Recv again _may_ affect the performance, with the same reason.

Seems orthogonal.  Any time h/w is shared, multiple flows will have an 
impact on one another.  That is why we have the different arbitration 
mechanisms to enable one to control that impact.

>> > must sends RNR periodically to A, right?. So does the pending message
>> > from A affects B's overall performance  between B and C ?
>But RNR NAKs do not continue for very long; you may not even be able to
>observe this performance hit. The moment rnr_counter expires, the
>connection will be broken!

Keep in mind the timeout can be infinite.  RNR NAK are not expected to be 
frequent so their performance impact was considered reasonable.

Mike
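
A simplified model of the retry behaviour discussed above, assuming the usual
3-bit RNR retry count in which the value 7 means "retry indefinitely" (the
infinite case mentioned in the reply). rnr_survives is a hypothetical helper
for illustration; it ignores the RNR timer itself and only counts NAKs.

```c
#include <assert.h>
#include <stdbool.h>

#define RNR_RETRY_INFINITE 7	/* 3-bit field; 7 means retry without limit */

/*
 * Returns true if the requester's connection is still alive after it has
 * received rnr_naks consecutive RNR NAKs for the same request.
 */
bool rnr_survives(unsigned rnr_retry, unsigned rnr_naks)
{
	assert(rnr_retry <= RNR_RETRY_INFINITE);
	if (rnr_retry == RNR_RETRY_INFINITE)
		return true;
	/* Each NAK consumes one retry; a NAK with none left is fatal. */
	return rnr_naks <= rnr_retry;
}
```

With a finite count the sender eventually gives up and the connection breaks;
with the value 7 it keeps retrying until the receiver posts a buffer.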

>> >
>> >         Thank you.
>> >
>> > --CQ
>> >
>> >
>> > >
>> > > Mike
>> > >
>> > >
>> > >
>> >
>> > _______________________________________________
>> > openib-general mailing list
>> > openib-general at openib.org
>> > http://openib.org/mailman/listinfo/openib-general
>> >
>> > To unsubscribe, please visit 
>> http://openib.org/mailman/listinfo/openib-general
>> >
>> >


From krause at cup.hp.com  Tue Feb 13 12:49:57 2007
From: krause at cup.hp.com (Michael Krause)
Date: Tue, 13 Feb 2007 12:49:57 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <20070213001045.GY11411@obsidianresearch.com>
References: <20070210004820.GS11411@obsidianresearch.com>
	<000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com>
	<20070211230935.GT11411@obsidianresearch.com>
	<45D0A27A.2010302@ichips.intel.com>
	<20070212205634.GW11411@obsidianresearch.com>
	<6.2.0.14.2.20070212152343.08e8ca48@esmail.cup.hp.com>
	<20070213001045.GY11411@obsidianresearch.com>
Message-ID: <6.2.0.14.2.20070213124803.08ee1208@esmail.cup.hp.com>

At 04:10 PM 2/12/2007, Jason Gunthorpe wrote:
>On Mon, Feb 12, 2007 at 03:31:15PM -0800, Michael Krause wrote:
>
> > TClass is intended to communicate the end-to-end QoS desired.   TClass is
> > then mapped to a SL that is local to each subnet.   A flow label is
> > intended to be much the same as in the IP world and is left, in essence, to
> > routers to manage.    An endnode look up should be to find the address
> > vector to the remote.   A look up may return multiple vectors.   The SLID
> > would correspond to each local subnet router port that acts as a first-hop
> > destination to the remote subnet.    I don't see why the router protocol
> > would not simply enable all paths on the local subnet to a given remote
> > subnet be acquired.  All of the work is kept local to the SA / SM in the
> > source subnet when determining a remote path to take.   Why is there any
> > need to define more than just this?  Define a router protocol to
> > communicate each subnet's prefix, TClass, etc. and apply KISS.   A
> > management entity that wanted to manage out each subnet provides router
> > management in terms of route selection, etc. can be constructed by using
> > the existing protocols / tools combined with a new router protocol which
> > only does DGID to next hop SLID mapping.
>
>All of this complexity is due to the RC QP requirement that the SLID
>of an incoming LRH match the DLID programmed into the QP.
>
>Translated into a network with routers this means that for a RC flow
>to successfully work both the *forward* and *reverse* direction must
>traverse the same router *LID* not just *port* on both subnets.

That is a given since the LID = path and the same path must be used to ensure
strong ordering is maintained.

>Please see the little ascii diagram I drew in a prior email to
>understand my concern.
>
>There is no such restriction in a real IP network. It would be akin to
>having a host match the source MAC address in the ethernet frame to
>double check that it came from the router port it is sending outgoing
>packets to. Which means simple one-sided solutions from IP land don't
>work here.
>
>Things work exactly the way you outline today for UD. They don't work
>at all for the general case of RC. Get rid of the QP requirement and
>things work the way you outline for RC too. Keep it in and you must
>use the FlowLabel to force the flows onto the right router LID.

The same path must always be used to maintain strong ordering.  This is an
immutable part of IB technology.

>That is why I said previously that the QP matching rules are a
>mistake. The best way to solve this is to change C9-54 to only be in
>effect if the GRH is not present.

I disagree.  We were very explicit in how and why we constructed those rules.

>CM also introduces the much smaller problem of getting the LIDs to the
>passive side - but that cannot be solved without a broad solution to
>the RC QP SLID matching problem.

Mike 


From tziporet at mellanox.co.il  Tue Feb 13 13:05:22 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Tue, 13 Feb 2007 23:05:22 +0200
Subject: [openib-general] OFED 1.2 Feb-12 meeting summary
Message-ID: <45D22812.3030904@mellanox.co.il>

Hi,

This is the OFED 1.2 Feb-12 meeting summary on alpha status:

*Abbreviated minutes / summary:*

    * The alpha release should be done on Wed Feb-14. (A package for
      testing was already published today)
    * Not all components must support all OSes for the alpha.
    * There are going to be 3 weeks for testing the alpha release.
    * Next milestone is the Beta release - on March-7

*Note:* please post all OFED related mails to EWG mailing list too and 
not just the general list.

*Detailed Minutes:*
We reviewed all OFED components and this is the status toward the alpha 
release:
*Kernel*
ib_verbs (core) - ready - need to add CMA patch for iWARP to support
uDAPL - was done today
ib_mthca - ready
ib_ipoib - ready
ib_ipath - currently works on 2.6.20 only. Backport patches will be 
available for the beta
ib_iser - ready
ib_sdp - ready
ib_srp - ready
ib_ehca - PPC only - ready
cxgb3 - ready - backport patch for SLES9 was applied today
vnic - ready
rds - currently works on kernel 2.6.20 and 2.6.19. RHEL and SLES support 
will be added for the beta.
ib-bonding - ready (will work only on RHEL4UP3 & SLES10)
madeye - we forgot to take this module from OFED 1.1. Will be done for 
the beta.

*User libraries*
libibverbs - ready; man pages should be checked in by Roland for the beta.
libibcm - ready
libmthca - ready
libipathverbs - missing the new mode of libibverbs. Will be done for the 
beta
libcxgb3 - ready
libsdp - ready
libehca - ready
libibcommon - ready
libibmad - ready
libibumad - ready
libopensm - ready
libosmcomp - ready
libosmvendor - ready
librdmacm - ready
dapl - ready

*User utilities*
performance tests - ready (for the beta need to check all tests pass 
compilation)
mstflint - ready
ibutils - ready
opensm - ready
qlvnictools - ready
openib-diags - ready
srptools - ready
ipoibtools - ready
tvflash - ready (Roland should open a branch)
sdpnetstat - ready (does not pass compilation on RHEL5)
open-iscsi - ready

*MPI:*
mvapich - ready
mvapich2 - Build issue - must be resolved for the alpha
openmpi - ready
mpitests - ready

*OFED specific:*
ofed_docs - taken from 1.1 - not yet updated for 1.2
ofed_scripts - ready


Tziporet


From tziporet at mellanox.co.il  Tue Feb 13 13:07:19 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Tue, 13 Feb 2007 23:07:19 +0200
Subject: [openib-general] [PATCH] ofed_1-2 IWCM - Set iniator depth and
 responder resources to device max values.
In-Reply-To: <1171297207.16167.24.camel@stevo-desktop>
References: <1171223899.4027.1.camel@linux-q667.site>
	<1171297207.16167.24.camel@stevo-desktop>
Message-ID: <45D22887.7030003@mellanox.co.il>

Steve Wise wrote:
> BTW:  We need this for the alpha1 build or DAPL applications won't work
> over iWARP devices.
>   
>
Was applied today
Tziporet


From mshefty at ichips.intel.com  Tue Feb 13 13:14:09 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 13 Feb 2007 13:14:09 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <6.2.0.14.2.20070213124447.08e6e600@esmail.cup.hp.com>
References: <20070210004820.GS11411@obsidianresearch.com>
	<000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com>
	<20070211230935.GT11411@obsidianresearch.com>
	<45D0A27A.2010302@ichips.intel.com>
	<20070212205634.GW11411@obsidianresearch.com>
	<6.2.0.14.2.20070212152343.08e8ca48@esmail.cup.hp.com>
	<45D0FCC8.4090304@ichips.intel.com>
	<6.2.0.14.2.20070213124447.08e6e600@esmail.cup.hp.com>
Message-ID: <45D22A21.9040708@ichips.intel.com>

> It does not need to comprehend the remote subnet(s) LID.   
> That is the router protocol to determine.  CM also must understand the 
> GIDs involved which the router will process to figure out its LID 
> mapping to the next hop.

The CM REQ carries the remote router LID (primary local port lid - 12.7.11) and 
remote endpoint LID (primary remote port lid - 12.7.21).

- Sean


From robert.j.woodruff at intel.com  Tue Feb 13 13:36:04 2007
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Tue, 13 Feb 2007 13:36:04 -0800
Subject: [openib-general] [openfabrics-ewg] new OFED 1.2 package
In-Reply-To: <45D21B85.9070007@mellanox.co.il>
Message-ID: <BAE9DCEF64577A439B3A37F36F9B691C01B54D33@orsmsx418.amr.corp.intel.com>


I tried to build this on RedHat EL4-U3 and got the following
build error.

make: ***
[_module_/var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bondi
ng] Error 2
make: Leaving directory `/usr/src/kernels/2.6.9-34.EL.root-smp-x86_64'
+ echo ' Building  IB bonding driver failed'
 Building  IB bonding driver failed
+ exit 1 

-----Original Message-----
From: openfabrics-ewg-bounces at openib.org
[mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Tziporet Koren
Sent: Tuesday, February 13, 2007 12:12 PM
To: Vladimir Sokolovsky
Cc: EWG; OPENIB
Subject: Re: [openfabrics-ewg] new OFED 1.2 package

Vladimir Sokolovsky wrote:
> New OFED package was uploaded to the OFA server:
>
http://www.openfabrics.org/~vlad/builds/ofed-1.2/OFED-1.2-20070213-1646.
tgz
>
>
>
> Known issues:
> mvapich2 RPM build fails (will be fixed in alpha1).	
> sdpnetstat compilation fails in RHEL5
>
>
>   
Hi All,

This is the pre-alpha package for your testing.
Please send us feedback today so we can build the first alpha OFED
tomorrow.
If any show-stopper issue for the alpha is found please let us know.

Note that components compilation is blocked on kernels that they do not 
support.

Thanks,
Tziporet

_______________________________________________
openfabrics-ewg mailing list
openfabrics-ewg at openib.org
http://openib.org/mailman/listinfo/openfabrics-ewg


From jeremy.brown at qlogic.com  Tue Feb 13 14:47:48 2007
From: jeremy.brown at qlogic.com (Jeremy Brown)
Date: Tue, 13 Feb 2007 14:47:48 -0800
Subject: [openib-general] [openfabrics-ewg] new OFED 1.2 package
In-Reply-To: <45D21B85.9070007@mellanox.co.il>
References: <1171387167.3978.90.camel@vladsk-laptop>
	<45D21B85.9070007@mellanox.co.il>
Message-ID: <1171406869.17328.16.camel@citrine.pathscale.com>

On Tue, 2007-02-13 at 22:11 +0200, Tziporet Koren wrote:
> This is the pre-alpha package for your testing.
> Please send us feedback today so we can build the first alpha OFED tomorrow.
> If any show-stopper issue for the alpha is found please let us know.
> 
> Note that components compilation is blocked on kernels that they do not 
> support.

While I understand that Fedora is not officially supported in OFED 1.2,
I know that many participants are making an effort to make sure Fedora
(at least FC6) will work. I did attempt a build on a Fedora Core 4
system, and encountered an issue related to the sysfs* name changes.

ERROR: The libsysfs-devel package is required to build libibverbs_devel
RPM

I know that the package is named "sysfsutils-devel" in Fedora Core 3-5,
and "libsysfs-devel" in Fedora Core 6, similar to the RH 4 vs. RH 5
split. Would it be possible to change the definition and use of
$DISTRIBUTION in build_env.sh so the we had "fedora" for FC3-5, and
"fedora6" for FC6, similar to the "redhat" and "redhat5" split? I'm not
married to those names, of course.
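
A sketch of the mapping being asked for, written in C only for illustration
(build_env.sh itself is a shell script). The "fedora" and "fedora6" names are
the ones proposed above; the "redhat" and "redhat5" rows are inferred from the
remark that the split is similar, and sysfs_devel_pkg is a hypothetical
helper, not something that exists in the OFED scripts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Map a distribution tag to the name of its sysfs development package.
 * Returns NULL for a tag this sketch does not know about.
 */
const char *sysfs_devel_pkg(const char *distribution)
{
	assert(distribution != NULL);
	/* Newer releases: the package is called libsysfs-devel. */
	if (!strcmp(distribution, "fedora6") || !strcmp(distribution, "redhat5"))
		return "libsysfs-devel";
	/* Older releases: the package is called sysfsutils-devel. */
	if (!strcmp(distribution, "fedora") || !strcmp(distribution, "redhat"))
		return "sysfsutils-devel";
	return NULL;	/* unknown distribution: the caller decides what to do */
}
```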

Naturally, this shouldn't gate the alpha. :)

Thanks for getting the build ready!

Jeremy Brown


From robert.j.woodruff at intel.com  Tue Feb 13 14:49:21 2007
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Tue, 13 Feb 2007 14:49:21 -0800
Subject: [openib-general] [openfabrics-ewg] new OFED 1.2 package
In-Reply-To: <BAE9DCEF64577A439B3A37F36F9B691C01B54D33@orsmsx418.amr.corp.intel.com>
Message-ID: <BAE9DCEF64577A439B3A37F36F9B691C01B54EC7@orsmsx418.amr.corp.intel.com>

I am also still seeing the issue with the rdma_cm abi_version on RedHat
EL4-U3 (bug number 347). The bug report contains the patch that should fix this.


I_MPI: [0] set_up_devices(): I_MPI_DAPL_IP_ADDR     = NULL
I_MPI: [0] set_up_devices(): I_MPI_DAPL_PORT        = NULL
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
I_MPI: [0] I_MPI_dlopen_dat(): trying to dlopen default -ldat: libdat.so
I_MPI: [0] my_dlopen(): trying to dlopen: libdat.so
I_MPI: [1] I_MPI_dlopen_dat(): trying to dlopen default -ldat: libdat.so


-----Original Message-----
From: openib-general-bounces at openib.org
[mailto:openib-general-bounces at openib.org] On Behalf Of Woodruff, Robert
J
Sent: Tuesday, February 13, 2007 1:36 PM
To: Tziporet Koren; Vladimir Sokolovsky
Cc: EWG; OPENIB
Subject: Re: [openib-general] [openfabrics-ewg] new OFED 1.2 package


I tried to build this on RedHat EL4-U3 and got the following
build error.

make: ***
[_module_/var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bondi
ng] Error 2
make: Leaving directory `/usr/src/kernels/2.6.9-34.EL.root-smp-x86_64'
+ echo ' Building  IB bonding driver failed'
 Building  IB bonding driver failed
+ exit 1 

-----Original Message-----
From: openfabrics-ewg-bounces at openib.org
[mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Tziporet Koren
Sent: Tuesday, February 13, 2007 12:12 PM
To: Vladimir Sokolovsky
Cc: EWG; OPENIB
Subject: Re: [openfabrics-ewg] new OFED 1.2 package

Vladimir Sokolovsky wrote:
> New OFED package was uploaded to the OFA server:
>
http://www.openfabrics.org/~vlad/builds/ofed-1.2/OFED-1.2-20070213-1646.
tgz
>
>
>
> Known issues:
> mvapich2 RPM build fails (will be fixed in alpha1).	
> sdpnetstat compilation fails in RHEL5
>
>
>   
Hi All,

This is the pre-alpha package for your testing.
Please send us feedback today so we can build the first alpha OFED
tomorrow.
If any show-stopper issue for the alpha is found please let us know.

Note that components compilation is blocked on kernels that they do not 
support.

Thanks,
Tziporet



From krause at cup.hp.com  Tue Feb 13 14:48:06 2007
From: krause at cup.hp.com (Michael Krause)
Date: Tue, 13 Feb 2007 14:48:06 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <45D22A21.9040708@ichips.intel.com>
References: <20070210004820.GS11411@obsidianresearch.com>
	<000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com>
	<20070211230935.GT11411@obsidianresearch.com>
	<45D0A27A.2010302@ichips.intel.com>
	<20070212205634.GW11411@obsidianresearch.com>
	<6.2.0.14.2.20070212152343.08e8ca48@esmail.cup.hp.com>
	<45D0FCC8.4090304@ichips.intel.com>
	<6.2.0.14.2.20070213124447.08e6e600@esmail.cup.hp.com>
	<45D22A21.9040708@ichips.intel.com>
Message-ID: <6.2.0.14.2.20070213143635.09393fe0@esmail.cup.hp.com>

At 01:14 PM 2/13/2007, Sean Hefty wrote:
>>It does not need to comprehend the remote subnet(s) LID.
>>That is the router protocol to determine.  CM also must understand the 
>>GIDs involved which the router will process to figure out its LID mapping 
>>to the next hop.
>
>The CM REQ carries the remote router LID (primary local port lid - 
>12.7.11) and remote endpoint LID (primary remote port lid - 12.7.21).

Let me clarify what the specification is saying which is what I'm saying.

A LID is subnet local; on that we can all agree.   The CM REQ contains
either the LID of a local subnet CA or the LID of a local router which will
move the packet to the next hop to the destination.   12.7.11 is basically 
saying that the remote LID is the router's LID of the local subnet's router 
Port.   12.7.21 also refers to the remote LID but in each subnet that is 
either the router Port's LID or the destination CA.

 From an operational flow perspective, CM would:

Query to see if the destination CA is on the local subnet
If yes, then obtain the associated records to find the local LID
If no, then obtain the set of records that contain the local addressing to 
a router Port that will progress connection establishment to the next hop 
on the way to the destination.

While there isn't a router specification any longer, the basic operation is 
very much like that of an IP subnet.   The router protocol establishes a 
set of routes for given subnet prefix and then communicates that to each 
SM/SA so that queries will resolve the optimal router Port.   Chapter 8 
provides clear guidance in this regard.  Chapter 12 is basically stating 
what to plug into various fields with all LIDs being only local to the 
subnet where they are managed.   The primary global knowledge that one must
have across subnets to establish a connection or communication flow is:

- SGID
- DGID
- P_Key
- Q_Key

There really isn't much more than this to comprehend.  The TClass and Flow 
Labels were expected to be provided via the router protocol so the 
management requirements are really query look up.

Mike 


From krause at cup.hp.com  Tue Feb 13 15:10:27 2007
From: krause at cup.hp.com (Michael Krause)
Date: Tue, 13 Feb 2007 15:10:27 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <20070213220255.GA10579@obsidianresearch.com>
References: <20070210004820.GS11411@obsidianresearch.com>
	<000001c74cb8$5e80eef0$3cd4180a@amr.corp.intel.com>
	<20070211230935.GT11411@obsidianresearch.com>
	<45D0A27A.2010302@ichips.intel.com>
	<20070212205634.GW11411@obsidianresearch.com>
	<6.2.0.14.2.20070212152343.08e8ca48@esmail.cup.hp.com>
	<20070213001045.GY11411@obsidianresearch.com>
	<6.2.0.14.2.20070213124803.08ee1208@esmail.cup.hp.com>
	<20070213220255.GA10579@obsidianresearch.com>
Message-ID: <6.2.0.14.2.20070213144835.093938f8@esmail.cup.hp.com>

At 02:02 PM 2/13/2007, Jason Gunthorpe wrote:
>On Tue, Feb 13, 2007 at 12:49:57PM -0800, Michael Krause wrote:
>
> > >Translated into a network with routers this means that for a RC flow
> > >to successfully work both the *forward* and *reverse* direction must
> > >traverse the same router *LID* not just *port* on both subnets.
> >
> > That is a given since the LID = path and the same path must be used to ensure
> > strong ordering is maintained.
>
>I think you are missing what I'm saying. IB within a subnet has the
>path selected by the DLID only.

The actual path selection is a policy decision outside the scope of the 
specification - it appears this is your main concern in that the 
specification does not state "take these N parameters and apply the 
following algorithm to identify a path".   The address vector can be 
comprised of many fields including a LID range.  The actual DLID selected 
is done above as there can be a variety of policies or constraints imposed 
for a given data flow.   I agree that packet switching within a subnet is via a DLID.

>So the construction process for a QP is to choose two end port LIDs, reverse
>them on one side and then query the SA for the forward and reverse SL. 
>That gives you a pair of workable QPs.

SL, LID, etc. are all uploaded into the management database for the SM / SA 
to access and there can be much more robust information loaded as well that 
goes well beyond what the IBTA specified in order to provide additional 
interpretations / information to guide path selection.   A query can return 
multiple records if multi-path has been configured.  Policy above is used 
to construct the CM messages which communicate the preferred path.    The 
CM messages for establishment across subnets should be sufficient in their 
existing content to work independent of how the actual routing is 
accomplished.

>This same procedure doesn't work for routers.
>Consider a case where a router port has LID 1 and an end port has
>LIDs 3,4.
>The end port establishes two RC QPs:
>  #1: SLID=3, DLID=1
>  #2: SLID=4, DLID=1
>Both have the same DGID - how is the router expected to know that QP
>#1 requires one set of LIDs and QP #2 requires a different set?

For all intents and purposes, within a local subnet, a router Port is 
treated the same as CA.  If there are multiple paths between a router Port 
and a given CA Port, i.e. multiple LIDs are configured, then the router is 
supposed to query the SM / SA database and obtain the appropriate records 
and make a decision that remains valid for the lifetime of the data 
flow.   The purpose of the TClass is to enable a local mapping to SL which 
can also be used as input into LID selection.   The flow label is left open 
in its value and was expected to be used much like it is in IP.   People 
considered encoding it or at a minimum, using it as an input parameter to 
identify the associated LID for the flow but that was not agreed to since 
the router vendors at the time wanted it left largely opaque.


>Section 19.2.4.1 seems to make it explicit to me that this is a valid 
>situation.

Yes, 19.2.4.1 supports multi-path within a given subnet.

>To have this work the router must use the flow label to identify the
>correct DLID. SA/CM must be enhanced in some way to let the two sides
>exchange flow labels.

That is a policy decision or something for a TBD router protocol 
specification.   It is not required to use the Flow Label.

>This problem is worse if you have multiple independent redundant
>routers on your subnet, or LMC != 0. Then you now have the problem of
>SLID matching as well as DLID matching.

It is no worse due to the existence of multi-path.   There are many 
variables involved in creating a viable router protocol specification which 
is in part, why the IBTA chose to not complete that work.

>Strong ordering is maintained in all cases because the routers always
>make consistent choices for the LRH.DLID on a session by session
>basis.

Agreed.  The router is responsible for ensuring a consistent path is used 
for a given flow.  That does not preclude multi-path nor does it make 
multi-path more complicated as a result.


> > >That is why I said previously that the QP matching rules are a
> > >mistake. The best way to solve this is to change C9-54 to only be in
> > >effect if the GRH is not present.
> >
> > I disagree.  We were very explicit in how and why we constructed those
> > rules.
>
>Do you know of a solution then?
>
>If C9-54 is a very deliberate design then it must be that the CM
>specification in Chapter 12 is not designed to handle the
>ramifications of C9-54.
>
>I just can't see how to fit both CM and C9-54 together into a workable
>solution.

You are arguing about a router protocol problem that does not exist, or 
perhaps I just don't get it.   We did progress the router specification or 
at least the operating models behind it sufficiently to validate that both 
Chapter 9 and Chapter 12 worked as specified (as well as chapters 8 and 
19).   Yes, there are implementation issues within a router for it to 
perform the appropriate queries on the SM / SA to identify a preferred 
flow's path within a given subnet.   This makes this a local subnet policy 
issue and is orthogonal to the compliance statements in the volume 1 
specification.    If you believe the specification is faulty, then it would 
be best to take this to the IBTA and have an official review done by the 
workgroup teams involved with these chapters.   People are free to 
implement what they choose, which for routers is completely open since there 
isn't a specification.  For the compliance statements, however, assuming 
interoperability is desirable in this regard, the validation tree in the 
specification should be used, and any software implemented on top of such 
hardware should take that into account.    For the most part, what you've 
described is largely an argument about the policy to select a path and that 
is a router domain problem not a packet validation problem.    Within the 
router domain, that is pure policy just like in the IP world.  As long as 
it results in a given flow consistently using the same data path, all is 
good.    At the end of the day, the router implementations will decide 
their policy for determining the optimal path and I doubt there will be a 
one-size-fits-all agreement on the formula that is used to construct that 
decision (albeit, if the SM/SA only returns one path for a given flow, then 
the decision is rather easy).
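
As a rough illustration of the "same flow, same path" requirement above, here
is a minimal C sketch.  Everything in it is hypothetical: struct flow_key,
flow_hash() and pick_dlid() are illustrative names, and hashing on the GIDs,
TClass and flow label is only one possible policy, not something the
specification mandates.  The point is only that a deterministic function of
the flow identity always picks the same LID out of whatever set the SM/SA
returned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow identity; a real router would choose its own inputs. */
struct flow_key {
	uint8_t  sgid[16];
	uint8_t  dgid[16];
	uint8_t  tclass;
	uint32_t flow_label;	/* only the low 20 bits are meaningful */
};

/* FNV-1a over the key fields; any deterministic hash would serve. */
static uint32_t flow_hash(const struct flow_key *k)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < 16; i++) {
		h ^= k->sgid[i];
		h *= 16777619u;
	}
	for (i = 0; i < 16; i++) {
		h ^= k->dgid[i];
		h *= 16777619u;
	}
	h ^= k->tclass;
	h *= 16777619u;
	for (i = 0; i < 3; i++) {
		h ^= (k->flow_label >> (8 * i)) & 0xff;
		h *= 16777619u;
	}
	return h;
}

/* Pick one of the candidate LIDs; the same key always yields the same LID. */
static uint16_t pick_dlid(const struct flow_key *k,
			  const uint16_t *lids, size_t nlids)
{
	assert(nlids > 0);
	return lids[flow_hash(k) % nlids];
}
```

With a single candidate LID the choice is trivial, which matches the remark
that the decision is easy when the SM/SA returns only one path.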

Mike  


From sean.hefty at intel.com  Tue Feb 13 13:17:57 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 13 Feb 2007 13:17:57 -0800
Subject: [openib-general] IB routing discussion summary
Message-ID: <000601c74fb4$6ed83840$8698070a@amr.corp.intel.com>

Here's a first take at summarizing the IB routing discussion.

The following spec references are noted:

9.6.1.5 C9-54. The SLID shall be validated (for connected QPs).
12.7.11. CM REQ Local Port LID - is LID of remote router.
13.5.4: Defines reversible paths.

The main discussion point centered on trying to meet 9.6.1.5 C9-54.  This
requires that the forward and reverse data flows between two QPs traverse the
same router LID on both subnets.  The idea was raised to eliminate this
compliance statement for packets carrying a GRH, but this is viewed as going
against the spirit of IBA.

Ideas were presented around trying to construct an 'inter-subnet path record'
that contained the following:

   - Side A GRH.SGID = active side's Port GID
   - Side A GRH.DGID = passive side's Port GID
   - Side A LRH.SLID = any active side's port LID
   - Side A LRH.DLID = A subnet router
   - Side A LRH.SL   = SL to A subnet router

   - Side B GRH.SGID = Side A GRH.DGID
   - Side B GRH.DGID = Side A GRH.SGID
   - Side B LRH.SLID = any passive side's port LID
   - Side B LRH.DLID = B subnet router
   - Side B LRH.SL   = SL to B subnet router

It is still unclear how such a record can be constructed.  But communication
with remote SAs might be achieved by using a well-known GID suffix.  It's also
unclear whether the fields in a path record are relative to the SA's subnet or
the SGID.

It's anticipated that SAs will need to interact with routers, but in an
unspecified manner.
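
To make the field list above concrete, here is a minimal C sketch of such a
record.  The layout and the names (struct side_lrh, struct side_addr,
struct inter_subnet_path, build_inter_subnet_path()) are assumptions for
illustration only; no wire format or API was defined in the discussion.  It
shows just the one relationship that is stated explicitly: the side B GIDs
are the side A GIDs swapped, while the LRH fields are supplied separately per
subnet.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-subnet LRH fields. */
struct side_lrh {
	uint16_t slid;	/* any LID of the local end port */
	uint16_t dlid;	/* LID of the router port on this subnet */
	uint8_t  sl;	/* SL used to reach that router */
};

/* Hypothetical per-side addressing: GRH GIDs plus LRH fields. */
struct side_addr {
	uint8_t sgid[16];
	uint8_t dgid[16];
	struct side_lrh lrh;
};

struct inter_subnet_path {
	struct side_addr a;	/* active side */
	struct side_addr b;	/* passive side */
};

static void build_inter_subnet_path(struct inter_subnet_path *p,
				    const uint8_t *active_gid,
				    const uint8_t *passive_gid,
				    struct side_lrh a_lrh,
				    struct side_lrh b_lrh)
{
	assert(p && active_gid && passive_gid);

	memcpy(p->a.sgid, active_gid, 16);
	memcpy(p->a.dgid, passive_gid, 16);
	p->a.lrh = a_lrh;

	/* Side B GRH.SGID = Side A GRH.DGID, and vice versa. */
	memcpy(p->b.sgid, p->a.dgid, 16);
	memcpy(p->b.dgid, p->a.sgid, 16);
	p->b.lrh = b_lrh;
}
```

How the two LRH halves would actually be obtained (remote SA query, router
interaction) is exactly the open question noted above and is not modeled here.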


From sean.hefty at intel.com  Tue Feb 13 15:55:19 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 13 Feb 2007 15:55:19 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <6.2.0.14.2.20070213143635.09393fe0@esmail.cup.hp.com>
Message-ID: <000001c74fca$6c765170$8698070a@amr.corp.intel.com>

>A LID is subnet local on that we can all agree.   The CM Req contains
>either the LID of a local subnet CA or the LID of a local router which will
>move the packet to the next hop to the destination.   12.7.11 is basically
>saying that the remote LID is the router's LID of the local subnet's router
>Port.   12.7.21 also refers to the remote LID but in each subnet that is
>either the router Port's LID or the destination CA.

This isn't my interpretation.

12.7.11 Local Port LID:  When local and remote ports are on different subnets,
this field must be the LID of the router that the *passive* side will target for
the return path.

The CM REQ carries the LIDs for the remote (passive side) subnet.  This is what
the passive side needs to configure the QP, not the active side LID information.
(See address vector information for 11.2.4.2 - page 574.)

So, the CM REQ is _sent_ to either the LID of the local subnet CA or the LID of
a local router port, but _contains_ the LIDs from the remote subnet.

- Sean


From swise at opengridcomputing.com  Tue Feb 13 16:01:26 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 13 Feb 2007 18:01:26 -0600
Subject: [openib-general] [openfabrics-ewg] new OFED 1.2 package
In-Reply-To: <45D21B85.9070007@mellanox.co.il>
References: <1171387167.3978.90.camel@vladsk-laptop>
	<45D21B85.9070007@mellanox.co.il>
Message-ID: <1171411286.28495.12.camel@stevo-desktop>

I installed this on RHEL5 beta 2 with that distro's kernel and RHEL4U4
with a kernel.org 2.6.20 kernel.  

I successfully configured cxgb3 and mthca and could icmp-ping over both
interfaces.

I successfully ran rping over both IB and IW.

I successfully ran dapltest/regress.sh over both IB and IW.

I could _not_ get ib_rdma_bw to run in either cma mode or non-cma mode.
The server side exits immediately without an error. ???

I'm blocked on mvapich2/iwarp testing due to the known issues with that
package.

I tried rping over iwarp on the RHEL4U4 distro's kernel and had
problems.  I'm thinking the SLES9SP3 fix that was pulled in might have
problems on other distros (it changed the behavior of xxx_ip_dev_find()
on all backports).   I don't think this is stop-ship for alpha1,
however.  

That's it for today.

Steve.


On Tue, 2007-02-13 at 22:11 +0200, Tziporet Koren wrote:
> Vladimir Sokolovsky wrote:
> > New OFED package was uploaded to the OFA server:
> > http://www.openfabrics.org/~vlad/builds/ofed-1.2/OFED-1.2-20070213-1646.tgz
> >
> >
> >
> > Known issues:
> > mvapich2 RPM build fails (will be fixed in alpha1).	
> > sdpnetstat compilation fails in RHEL5
> >
> >
> >   
> Hi All,
> 
> This is the pre-alpha package for your testing.
> Please send us feedback today so we can build the first alpha OFED tomorrow.
> If any show-stopper issue for the alpha is found please let us know.
> 
> Note that components compilation is blocked on kernels that they do not 
> support.
> 
> Thanks,
> Tziporet
> 
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> 


From ogerlitz at voltaire.com  Tue Feb 13 22:04:01 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Wed, 14 Feb 2007 08:04:01 +0200
Subject: [openib-general] mvapich2 ofed 1.2 problem
In-Reply-To: <adazm7ikm7q.fsf@cisco.com>
References: <1171380610.15471.25.camel@stevo-desktop>
	<adar6sum1fq.fsf@cisco.com> <1171386686.15471.36.camel@stevo-desktop>
	<adazm7ikm7q.fsf@cisco.com>
Message-ID: <45D2A651.6020604@voltaire.com>

Roland Dreier wrote:
>  > How do I tell?  Can I tell from the .so files?
> 
> ldd on the .so and the app would probably give you good info.
> 
> I'm pretty sure that mpicc must be linking against an libibverbs 1.0
> from somewhere.

To be really sure which dynamic libraries were loaded, run

(gdb) info sharedlibrary

within the gdb console

Or.


From dotanb at dev.mellanox.co.il  Wed Feb 14 01:12:00 2007
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Wed, 14 Feb 2007 11:12:00 +0200
Subject: [openib-general] [GIT PULL] please pull infiniband.git
In-Reply-To: <ada64a5khei.fsf@cisco.com>
References: <adazm7ioqpc.fsf@cisco.com>
	<45D17418.3000508@dev.mellanox.co.il> <ada64a5khei.fsf@cisco.com>
Message-ID: <45D2D260.1000306@dev.mellanox.co.il>

Roland Dreier wrote:
>  > What about the patch that i sent on "Allow the following QP state
>  > transition : reset --> reset"?
>
> OK, I'll merge that in the next patch.  It's the kind of patch I'm not
> happy about merging, since it bloats the code to handle a corner case
> no one is likely to hit in practice, but it is technically correct so
> I guess we're forced to merge it.
>   
thanks.

Dotan


From mst at mellanox.co.il  Wed Feb 14 02:01:09 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 14 Feb 2007 12:01:09 +0200
Subject: [openib-general] [PATCH 3 of 4] IB/mthca: fix
 non-cache-coherent CPUs with memfree
In-Reply-To: <adahctqobkg.fsf@cisco.com>
References: <adahctqobkg.fsf@cisco.com>
Message-ID: <20070214100109.GA4434@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree
> 
> OK, I already merged this but now I'm thinking it's somewhat buggy:

Hopefully not.

>  > +		if (coherent)
>  > +			ret = mthca_alloc_icm_coherent(&dev->pdev->dev,
>  > +						       &chunk->mem[chunk->npages],
>  > +						       cur_order, gfp_mask);
>  > +		else
>  > +		       	ret = mthca_alloc_icm_pages(&chunk->mem[chunk->npages],
>  > +						    cur_order, gfp_mask);
>  >  
>  > -			if (++chunk->npages == MTHCA_ICM_CHUNK_LEN) {
>  > +		if (!ret) {
>  > +			++chunk->npages;
>  > +
>  > +			if (!coherent && chunk->npages == MTHCA_ICM_CHUNK_LEN) {
>  >  				chunk->nsg = pci_map_sg(dev->pdev, chunk->mem,
> 
> I don't see anything that ever bumps chunk->nsg if we're allocating a
> coherent region and we end up needing more than one allocation to do
> it.

Yes but this is intentional.

> Maybe something like this on top of the patch?
> 
> diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c b/drivers/infiniband/hw/mthca/mthca_memfree.c
> index 0b9d053..48f7c65 100644
> --- a/drivers/infiniband/hw/mthca/mthca_memfree.c
> +++ b/drivers/infiniband/hw/mthca/mthca_memfree.c
> @@ -175,7 +175,9 @@ struct mthca_icm *mthca_alloc_icm(struct mthca_dev *dev, int npages,
>  		if (!ret) {
>  			++chunk->npages;
>  
> -			if (!coherent && chunk->npages == MTHCA_ICM_CHUNK_LEN) {
> +			if (coherent)
> +				++chunk->nsg;
> +			else if (chunk->npages == MTHCA_ICM_CHUNK_LEN) {
>  				chunk->nsg = pci_map_sg(dev->pdev, chunk->mem,
>  							chunk->npages,
>  							PCI_DMA_BIDIRECTIONAL);

No, I think the code is fine and this patch will break things:
chunk->nsg is needed only for non-coherent memory to call pci_unmap_sg:

	if (chunk->nsg > 0)
		pci_unmap_sg(dev->pdev, chunk->mem,
			     chunk->npages, PCI_DMA_BIDIRECTIONAL);


and we do *not* want to call pci_unmap_sg on consistent memory.
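
The argument can be modeled in a few lines of userspace C.  This is a toy
model, not driver code: struct chunk, add_page() and free_chunk() are
hypothetical stand-ins, CHUNK_LEN stands in for MTHCA_ICM_CHUNK_LEN, and the
DMA mapping calls are reduced to bookkeeping.  It only shows that leaving nsg
at zero for coherent chunks keeps the unmap path from running, while bumping
nsg for coherent pages would trigger it.

```c
#include <assert.h>

#define CHUNK_LEN 4	/* stand-in for MTHCA_ICM_CHUNK_LEN */

/* Toy model of the two counters discussed above. */
struct chunk {
	int npages;
	int nsg;
};

/*
 * Add one page to the chunk.  bump_nsg_if_coherent models the proposed
 * follow-up change; pass 0 for the behaviour of the merged patch.
 */
static void add_page(struct chunk *c, int coherent, int bump_nsg_if_coherent)
{
	assert(c->npages < CHUNK_LEN);
	++c->npages;

	if (coherent) {
		if (bump_nsg_if_coherent)
			++c->nsg;
	} else if (c->npages == CHUNK_LEN) {
		c->nsg = c->npages;	/* stands in for pci_map_sg() */
	}
}

/* Returns 1 if the free path would call pci_unmap_sg(), 0 otherwise. */
static int free_chunk(struct chunk *c)
{
	int would_unmap = c->nsg > 0;

	c->npages = 0;
	c->nsg = 0;
	return would_unmap;
}
```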

-- 
MST


From mst at mellanox.co.il  Wed Feb 14 02:01:51 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 14 Feb 2007 12:01:51 +0200
Subject: [openib-general] [GIT PULL] please pull infiniband.git
In-Reply-To: <ada64a5khei.fsf@cisco.com>
References: <adazm7ioqpc.fsf@cisco.com>
	<45D17418.3000508@dev.mellanox.co.il> <ada64a5khei.fsf@cisco.com>
Message-ID: <20070214100151.GB4434@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [GIT PULL] please pull infiniband.git
> 
>  > What about the patch that i sent on "Allow the following QP state
>  > transition : reset --> reset"?
> 
> OK, I'll merge that in the next patch.  It's the kind of patch I'm not
> happy about merging, since it bloats the code to handle a corner case
> no one is likely to hit in practice, but it is technically correct so
> I guess we're forced to merge it.

I actually think this is a spec bug:
The IB spec requires the following transitions in modify QP:
	Any state to Error
	Any state to Reset

If we are in the Reset state, neither of these makes any sense.

Reset to Error is especially hard to implement, and I think
this will actually create bugs in applications (if an application did
"to error", I think it's expecting completions with error).

I'll try speaking with our representatives at the IBTA to remove this requirement.
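
For illustration, the two rules and the suggested exception can be written as
a pair of predicates.  This is a sketch only: the enum and function names are
hypothetical and are not the verbs API, and only the two "any state" rules
quoted above are modeled, not the full QP state machine.

```c
#include <assert.h>

/* Hypothetical state names for illustration; not the verbs API enum. */
enum qp_state {
	QPS_RESET,
	QPS_INIT,
	QPS_RTR,
	QPS_RTS,
	QPS_SQD,
	QPS_SQE,
	QPS_ERR,
	QPS_COUNT
};

/* The two rules read literally: any state may move to Error or to Reset. */
static int covered_by_any_state_rules(enum qp_state from, enum qp_state to)
{
	assert(from < QPS_COUNT && to < QPS_COUNT);
	return to == QPS_ERR || to == QPS_RESET;
}

/* The same rules with Reset excluded as a starting state. */
static int covered_if_reset_excluded(enum qp_state from, enum qp_state to)
{
	return from != QPS_RESET && covered_by_any_state_rules(from, to);
}
```

Read literally, the rules cover both reset --> reset and reset --> error; the
second predicate drops exactly those two cases.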

-- 
MST


From mst at mellanox.co.il  Wed Feb 14 02:16:43 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 14 Feb 2007 12:16:43 +0200
Subject: [openib-general] [PATCH 3 of 4] IB/mthca: fix
 non-cache-coherent CPUs with memfree
In-Reply-To: <adahctrrn22.fsf@cisco.com>
References: <20070210211508.GD14903@mellanox.co.il> <adahctrrn22.fsf@cisco.com>
Message-ID: <20070214101618.GC4434@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree
> 
> Queued for 2.6.21, although I think a further cleanup would be:
> 
>  >  	mdev->mr_table.mpt_table = mthca_alloc_icm_table(mdev, init_hca->mpt_base,
>  >  							 dev_lim->mpt_entry_sz,
>  >  							 mdev->limits.num_mpts,
>  > -							 mdev->limits.reserved_mrws, 1);
>  > +							 mdev->limits.reserved_mrws,
>  > +							 1, 1);
> 
> instead of having use_lowmem and use_coherent be separate parameters,
> we should probably convert it to a type parameter, and have
> MTHCA_ICM_TABLE_HIGHMEM, _LOWMEM and _COHERENT.  That would make these
> calls a lot easier to read and get correct.

Good idea.
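
A possible shape for that cleanup, as a standalone sketch.  The three
constant names come from the suggestion quoted above; struct icm_flags and
icm_flags_for() are hypothetical helpers, and the assumption that a coherent
table also implies lowmem is inferred only from the quoted call passing 1, 1.

```c
#include <assert.h>

enum mthca_icm_table_type {
	MTHCA_ICM_TABLE_HIGHMEM,
	MTHCA_ICM_TABLE_LOWMEM,
	MTHCA_ICM_TABLE_COHERENT
};

/* Hypothetical: the two booleans the current call sites pass separately. */
struct icm_flags {
	int use_lowmem;
	int use_coherent;
};

/* Map the single type parameter back onto the two existing flags. */
static struct icm_flags icm_flags_for(enum mthca_icm_table_type type)
{
	struct icm_flags f = { 0, 0 };

	switch (type) {
	case MTHCA_ICM_TABLE_HIGHMEM:
		break;
	case MTHCA_ICM_TABLE_LOWMEM:
		f.use_lowmem = 1;
		break;
	case MTHCA_ICM_TABLE_COHERENT:
		/* Assumption: coherent tables are also lowmem (the "1, 1" call). */
		f.use_lowmem = 1;
		f.use_coherent = 1;
		break;
	default:
		assert(!"unknown ICM table type");
	}
	return f;
}
```

A call site would then pass one named constant instead of a pair of bare
integers, which is the readability gain being suggested.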

-- 
MST


From mst at mellanox.co.il  Wed Feb 14 02:22:52 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 14 Feb 2007 12:22:52 +0200
Subject: [openib-general] [PATCH 3 of 4] IB/mthca: fix
 non-cache-coherent CPUs with memfree
In-Reply-To: <adasldb2ked.fsf@cisco.com>
References: <20070210211508.GD14903@mellanox.co.il> <adasldb2ked.fsf@cisco.com>
Message-ID: <20070214102252.GD4434@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree
> 
>  > +	sg_set_buf(mem, buf, PAGE_SIZE << order);
>  > +	BUG_ON(mem->offset);
>  > +	sg_dma_len(mem) = PAGE_SIZE << order;
> 
> What am I missing?  Any reason to set sg_dma_len() again after sg_set_buf()?

How do you mean, again? Does sg_set_buf set dma_length?

In 2.6.20, I see this in include/linux/scatterlist.h:

static inline void sg_set_buf(struct scatterlist *sg, const void *buf,
			      unsigned int buflen)
{
	sg->page = virt_to_page(buf);
	sg->offset = offset_in_page(buf);
	sg->length = buflen;
}
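
The same point as a userspace mock.  This is not the kernel structure:
struct mock_sg, mock_sg_set_buf() and MOCK_PAGE_SIZE are hypothetical
stand-ins that keep only the fields under discussion and drop the page
lookup.  It shows that a helper shaped like the one quoted above leaves a
separate DMA length field untouched, so it has to be set explicitly.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_PAGE_SIZE 4096u

/* Userspace stand-in for the fields discussed; not the kernel struct. */
struct mock_sg {
	unsigned int offset;
	unsigned int length;
	unsigned int dma_length;
};

/* Mirrors the quoted helper minus the page lookup: dma_length is not set. */
static void mock_sg_set_buf(struct mock_sg *sg, const void *buf,
			    unsigned int buflen)
{
	assert(sg);
	sg->offset = (unsigned int)((uintptr_t)buf & (MOCK_PAGE_SIZE - 1));
	sg->length = buflen;
}
```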


-- 
MST


From vlad at lists.openfabrics.org  Wed Feb 14 02:24:45 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Wed, 14 Feb 2007 02:24:45 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070214-0200 daily build status
Message-ID: <20070214102445.A9067E603C3@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod 

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.14
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.16
Passed on powerpc with linux-2.6.18
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.12
Passed on ia64 with linux-2.6.19
Passed on ppc64 with linux-2.6.19
Passed on powerpc with linux-2.6.16
Passed on powerpc with linux-2.6.15
Passed on powerpc with linux-2.6.14
Passed on powerpc with linux-2.6.12
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.12
Passed on ppc64 with linux-2.6.14
Passed on ppc64 with linux-2.6.13
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.15
Passed on ia64 with linux-2.6.15
Passed on ppc64 with linux-2.6.17
Passed on ia64 with linux-2.6.18
Passed on ppc64 with linux-2.6.16
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.14

Failed:


From bugzilla-daemon at lists.openfabrics.org  Wed Feb 14 03:49:22 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Wed, 14 Feb 2007 03:49:22 -0800 (PST)
Subject: [openib-general] [Bug 347] rdma cm backport to EL4 - U3 broken
In-Reply-To: <bug-347-1@https.bugs.openfabrics.org/>
Message-ID: <20070214114922.6E1CCE603C3@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=347


tziporet at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|bugzilla at openib.org         |mst at mellanox.co.il


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at lists.openfabrics.org  Wed Feb 14 03:56:48 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Wed, 14 Feb 2007 03:56:48 -0800 (PST)
Subject: [openib-general] [Bug 322] 2.6.17 backport: reading the rdma-cm abi
 file causes fault.
In-Reply-To: <bug-322-1@https.bugs.openfabrics.org/>
Message-ID: <20070214115648.BFF8AE602FA@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=322


tziporet at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|bugzilla at openib.org         |sean.hefty at intel.com


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at lists.openfabrics.org  Wed Feb 14 04:12:17 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Wed, 14 Feb 2007 04:12:17 -0800 (PST)
Subject: [openib-general] [Bug 318] Registering up to 1.6 GB in one process
 causes a machine crash
In-Reply-To: <bug-318-1@https.bugs.openfabrics.org/>
Message-ID: <20070214121217.54A88E603C3@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=318


dotanb at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|1.1                         |1.2


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at lists.openfabrics.org  Wed Feb 14 04:17:14 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Wed, 14 Feb 2007 04:17:14 -0800 (PST)
Subject: [openib-general] [Bug 315] enabling the rdma_ucm and restarting the
 driver several times causes kernel oops
In-Reply-To: <bug-315-1@https.bugs.openfabrics.org/>
Message-ID: <20070214121714.85163E60804@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=315


dotanb at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|gen2                        |1.2


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at lists.openfabrics.org  Wed Feb 14 04:18:47 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Wed, 14 Feb 2007 04:18:47 -0800 (PST)
Subject: [openib-general] [Bug 318] Registering up to 1.6 GB in one process
 causes a machine crash
In-Reply-To: <bug-318-1@https.bugs.openfabrics.org/>
Message-ID: <20070214121847.6701EE60804@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=318


------- Comment #1 from mst at mellanox.co.il  2007-02-14 04:18 -------
Subject: Re:  Registering up to 1.6 GB in one process causes a machine crash

> Driver Version    : OFED 1.1

Is this still relevant?


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at lists.openfabrics.org  Wed Feb 14 04:19:57 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Wed, 14 Feb 2007 04:19:57 -0800 (PST)
Subject: [openib-general] [Bug 314] libibverbs doesn't support static linkage
In-Reply-To: <bug-314-1@https.bugs.openfabrics.org/>
Message-ID: <20070214121958.04880E603C3@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=314


dotanb at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from dotanb at mellanox.co.il  2007-02-14 04:19 -------
in this mail:
http://openib.org/pipermail/openib-general/2007-January/031009.html

it is explained that the driver that failed the static linkage was an old driver
without the bug fixes for static linkage support.

I checked this issue with the new driver, and everything is working now.


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at lists.openfabrics.org  Wed Feb 14 04:21:38 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Wed, 14 Feb 2007 04:21:38 -0800 (PST)
Subject: [openib-general] [Bug 296] The function ib_init_ah_from_path
 doesn't fill all of the ib_ah_attr attributes
In-Reply-To: <bug-296-1@https.bugs.openfabrics.org/>
Message-ID: <20070214122138.446B5E603C3@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=296


dotanb at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|1.1                         |1.2


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at lists.openfabrics.org  Wed Feb 14 04:22:09 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Wed, 14 Feb 2007 04:22:09 -0800 (PST)
Subject: [openib-general] [Bug 315] enabling the rdma_ucm and restarting the
 driver several times causes kernel oops
In-Reply-To: <bug-315-1@https.bugs.openfabrics.org/>
Message-ID: <20070214122209.DEA92E603C3@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=315


mst at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|bugzilla at openib.org         |sean.hefty at intel.com


------- Comment #2 from mst at mellanox.co.il  2007-02-14 04:22 -------
Sean is the ucma maintainer in both ofed and kernel.org trees,
reassigned to him.

This could be a duplicate of bug 322.


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at lists.openfabrics.org  Wed Feb 14 04:26:38 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Wed, 14 Feb 2007 04:26:38 -0800 (PST)
Subject: [openib-general] [Bug 296] The function ib_init_ah_from_path
 doesn't fill all of the ib_ah_attr attributes
In-Reply-To: <bug-296-1@https.bugs.openfabrics.org/>
Message-ID: <20070214122638.3468CE603C3@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=296


mst at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|bugzilla at openib.org         |sean.hefty at intel.com


------- Comment #1 from mst at mellanox.co.il  2007-02-14 04:26 -------
This is Sean's code, reassign to him.


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From mst at mellanox.co.il  Wed Feb 14 05:29:25 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 14 Feb 2007 15:29:25 +0200
Subject: [openib-general] 32-bit build for ppc64 is required
In-Reply-To: <200702121736.35468.ossrosch@linux.vnet.ibm.com>
References: <200702121736.35468.ossrosch@linux.vnet.ibm.com>
Message-ID: <20070214132925.GG16867@mellanox.co.il>

> Quoting Stefan Roscher <ossrosch at linux.vnet.ibm.com>:
> Subject: 32-bit build for ppc64 is required
> 
> Hi,
> 
> After building the latest OFED build package we noticed that on PPC64 only
> 64-bit libraries were built.
> Because we have customers using older userspace applications which are
> certified for 32-bit, we think additional 32-bit support is a requirement for 64-bit builds.
> 
> If OFED 1.2 supports 32-bit on ppc64, we have to change the install
> directory. I would suggest installing 32-bit binaries into the
> /usr/local/ofed/bin32 directory, so no changes to the current naming conventions
> have to be made. The libraries are installed in the /usr/local/ofed/lib directory.

The standard practice is to install 64 bit libraries under prefix/lib64
and 32 bit libraries under prefix/lib. Why would PPC64 be any different?

I do not think we need 32 bit binaries at all, and there's no other package
I'm aware of that uses "bin32".

Comments?


-- 
MST


From ossrosch at linux.vnet.ibm.com  Wed Feb 14 06:18:55 2007
From: ossrosch at linux.vnet.ibm.com (Stefan Roscher)
Date: Wed, 14 Feb 2007 15:18:55 +0100
Subject: [openib-general] 32-bit build for ppc64 is required
In-Reply-To: <20070214132925.GG16867@mellanox.co.il>
References: <200702121736.35468.ossrosch@linux.vnet.ibm.com>
	<20070214132925.GG16867@mellanox.co.il>
Message-ID: <200702141518.56138.ossrosch@linux.vnet.ibm.com>

On Wednesday 14 February 2007 14:29, Michael S. Tsirkin wrote:
> > Quoting Stefan Roscher <ossrosch at linux.vnet.ibm.com>:
> > Subject: 32-bit build for ppc64 is required
> > 
> > Hi,
> > 
> > After building the latest OFED build package we noticed that on PPC64 only
> > 64-bit libraries were built.
> > Because we have customers using older userspace applications which are
> > certified for 32-bit, we think additional 32-bit support is a requirement for 64-bit builds.
> > 
> > If OFED 1.2 supports 32-bit on ppc64, we have to change the install
> > directory. I would suggest installing 32-bit binaries into the
> > /usr/local/ofed/bin32 directory, so no changes to the current naming conventions
> > have to be made. The libraries are installed in the /usr/local/ofed/lib directory.
> 
> The standard practice is to install 64 bit libraries under prefix/lib64
> and 32 bit libraries under prefix/lib. Why would PPC64 be any different?

I think you misunderstood my post. The directories for 32/64-bit libraries
should be prefix/lib and prefix/lib64 respectively.
But in the current ofed1.2 I saw only the prefix/lib64 directory, i.e. 64-bit libs only.
> 
> I do not think we need 32 bit binaries at all, and there's no other package
> I'm aware of that uses "bin32".

We have customers that still use 32-bit userspace applications. 
It would be beneficial for them if they can obtain 32bit libs and execs from
ofed1.2 in order to run their applications without recompiling them, because
for some 32-bit applications recompiling is not an option.

regards Stefan


From mst at mellanox.co.il  Wed Feb 14 06:29:24 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 14 Feb 2007 16:29:24 +0200
Subject: [openib-general] 32-bit build for ppc64 is required
In-Reply-To: <200702141518.56138.ossrosch@linux.vnet.ibm.com>
References: <200702141518.56138.ossrosch@linux.vnet.ibm.com>
Message-ID: <20070214142924.GC20977@mellanox.co.il>

> Quoting Stefan Roscher <ossrosch at linux.vnet.ibm.com>:
> Subject: Re: 32-bit build for ppc64 is required
> 
> On Wednesday 14 February 2007 14:29, Michael S. Tsirkin wrote:
> > > Quoting Stefan Roscher <ossrosch at linux.vnet.ibm.com>:
> > > Subject: 32-bit build for ppc64 is required
> > > 
> > > Hi,
> > > 
> > > After building the latest OFED build package we noticed that on PPC64 only
> > > 64-bit libraries were built.
> > > Because we have customers using older userspace applications which are
> > > certified for 32-bit, we think additional 32-bit support is a requirement for 64-bit builds.
> > > 
> > > If OFED 1.2 supports 32-bit on ppc64, we have to change the install
> > > directory. I would suggest installing 32-bit binaries into the
> > > /usr/local/ofed/bin32 directory, so no changes to the current naming conventions
> > > have to be made. The libraries are installed in the /usr/local/ofed/lib directory.
> > 
> > The standard practice is to install 64 bit libraries under prefix/lib64
> > and 32 bit libraries under prefix/lib. Why would PPC64 be any different?
> 
> I think you misunderstood my post. The directories for 32/64-bit libraries
> should be prefix/lib and prefix/lib64 respectively.
> But in the current ofed1.2 I saw only the prefix/lib64 directory, i.e. 64-bit libs only.

Well, this is not by design: AFAIK on x86_64 both types of libraries
are installed.

> > I do not think we need 32 bit binaries at all, and there's no other package
> > I'm aware of that uses "bin32".
> 
> We have customers that still use 32-bit userspace applications. 
> It would be beneficial for them if they can obtain 32bit libs and execs from
> ofed1.2 in order to run their applications without recompiling them, because
> for some 32-bit applications recompiling is not an option.

32-bit libraries are needed for users to run 32-bit applications.

But I still do not see how installing 32 bit binaries alongside the 64
bit ones is useful, and I do not think other packages provide this option,
so maybe we shouldn't, either.

-- 
MST


From tziporet at mellanox.co.il  Wed Feb 14 06:17:52 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Wed, 14 Feb 2007 16:17:52 +0200
Subject: [openib-general] how to handle OFEd 1.2 bugs in bugzilla
Message-ID: <45D31A10.8020102@mellanox.co.il>

Hi Scott and all,
I wish to consult with you on how we will treat OFED 1.2 bugs in
bugzilla.

1. Do we want to have 1.2-alpha, 1.2-beta, 1.2-rcX in version, or just
1.2 as we have now?
2. What do we wish to do with bugs that were opened for 1.1 and are
still open?
3. What to do with old bugs that were opened against gen2 in general?
4. What is our methodology for priority and severity setup? (There are
too many blocker bugs still open in OFED 1.1, so either they are not actually
blockers or they were fixed but not updated.)

Thanks,
Tziporet


From bugzilla-daemon at lists.openfabrics.org  Wed Feb 14 06:34:35 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Wed, 14 Feb 2007 06:34:35 -0800 (PST)
Subject: [openib-general] [Bug 289] executing ucmatose on local IPoIB
 address of IB port 2 in kernel 2.6.16.21-0.8-smp fails
In-Reply-To: <bug-289-1@https.bugs.openfabrics.org/>
Message-ID: <20070214143436.0917EE603CE@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=289


dotanb at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from dotanb at mellanox.co.il  2007-02-14 06:34 -------
There was a bug in one of the backports; it has been fixed.


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at lists.openfabrics.org  Wed Feb 14 06:35:00 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Wed, 14 Feb 2007 06:35:00 -0800 (PST)
Subject: [openib-general] [Bug 289] executing ucmatose on local IPoIB
 address of IB port 2 in kernel 2.6.16.21-0.8-smp fails
In-Reply-To: <bug-289-1@https.bugs.openfabrics.org/>
Message-ID: <20070214143500.8A2CAE60802@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=289


dotanb at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From vlad at mellanox.co.il  Wed Feb 14 06:37:29 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Wed, 14 Feb 2007 16:37:29 +0200
Subject: [openib-general] 32-bit build for ppc64 is required
In-Reply-To: <200702141518.56138.ossrosch@linux.vnet.ibm.com>
References: <200702121736.35468.ossrosch@linux.vnet.ibm.com>
	<20070214132925.GG16867@mellanox.co.il>
	<200702141518.56138.ossrosch@linux.vnet.ibm.com>
Message-ID: <1171463849.16240.11.camel@vladsk-laptop>

On Wed, 2007-02-14 at 15:18 +0100, Stefan Roscher wrote:
> On Wednesday 14 February 2007 14:29, Michael S. Tsirkin wrote:
> > > Quoting Stefan Roscher <ossrosch at linux.vnet.ibm.com>:
> > > Subject: 32-bit build for ppc64 is required
> > > 
> > > Hi,
> > > 
> > > after building the latest OFED build package we noticed that on PPC64 only
> > > 64-bit libraries were built.
> > > Because we have customers using older userspace applications which are
> > > certified for 32-bit, we think additional 32-bit support is a requirement for 64-bit builds.
> > > 
> > > If OFED 1.2 supports 32 bit on ppc64, we have to change the install
> > > directory. I would suggest installing 32-bit binaries into the
> > > /usr/local/ofed/bin32 directory, so no changes to the current naming conventions
> > > have to be made. The libraries are installed in the /usr/local/ofed/lib directory.
> > 
> > The standard practice is to install 64 bit libraries under prefix/lib64
> > and 32 bit libraries under prefix/lib. Why would PPC64 be any different?
> 
> I think you misunderstood my post. The directories for 32/64-bit libraries
> should be prefix/lib and prefix/lib64 respectively.
> But with the current ofed1.2 I saw only the prefix/lib64 directory, i.e. 64-bit libs only.
> > 

prefix/lib (32bit libraries) should be created on ppc64 as well.
Check that you have sysfsutils 32bit RPM installed.
I don't have ppc64 here to check.


-- 
Vladimir Sokolovsky <vlad at mellanox.co.il>
Mellanox Technologies Ltd.


From bugzilla-daemon at lists.openfabrics.org  Wed Feb 14 06:49:45 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Wed, 14 Feb 2007 06:49:45 -0800 (PST)
Subject: [openib-general] [Bug 318] Registering up to 1.6 GB in one process
 causes a machine crash
In-Reply-To: <bug-318-1@https.bugs.openfabrics.org/>
Message-ID: <20070214144945.1A694E603CE@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=318


mst at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE


------- Comment #2 from mst at mellanox.co.il  2007-02-14 06:49 -------


*** This bug has been marked as a duplicate of bug 333 ***


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From swise at opengridcomputing.com  Wed Feb 14 06:57:29 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Wed, 14 Feb 2007 08:57:29 -0600
Subject: [openib-general] [Bug 325] RDMA_CM and address translation
 broken on sles9sp3
In-Reply-To: <20070214115444.E46EAE603C3@openfabrics.org>
References: <20070214115444.E46EAE603C3@openfabrics.org>
Message-ID: <1171465049.15208.13.camel@stevo-desktop>


Tziporet, 

I didn't think we were going to apply this patch until Michael tested it
with SDP/IPoIB on various distros. 

Michael, did you get a chance to test it (I'm guessing not since you
were out sick)?  

The reason I'm concerned is that it changes the behavior of
xxx_ip_dev_find() and _all_ backports, and we needed to test it out and
make sure it doesn't regress anything.  If it causes problems on other
backports, the plan was to just fix the sles9sp3 backport and leave the
others alone. 

With the test build vlad published yesterday which has this patch,
rhel4u4 kernel wasn't working for me with iWARP and I'm afraid it might
be due to this patch.  I'm investigating this now.

Steve.


On Wed, 2007-02-14 at 03:54 -0800, bugzilla-daemon at lists.openfabrics.org
wrote:
> https://bugs.openfabrics.org/show_bug.cgi?id=325
> 
> 
> 
> 
> 
> ------- Comment #2 from tziporet at mellanox.co.il  2007-02-14 03:54 -------
> Patch from Steve was applied.
> Please check again on alpha1 package.
> 
> 


From tziporet at mellanox.co.il  Wed Feb 14 07:05:06 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Wed, 14 Feb 2007 17:05:06 +0200
Subject: [openib-general] [Bug 325] RDMA_CM and address translation
 broken on sles9sp3
In-Reply-To: <1171465049.15208.13.camel@stevo-desktop>
References: <20070214115444.E46EAE603C3@openfabrics.org>
	<1171465049.15208.13.camel@stevo-desktop>
Message-ID: <45D32522.5080100@mellanox.co.il>

Steve Wise wrote:
> Tziporet, 
>
> I didn't think we were going to apply this patch until Michael tested it
> with SDP/IPoIB on various distros. 
>
> Michael, did you get a chance to test it (I'm guessing not since you
> were out sick)?  
>
> The reason I'm concerned is that it changes the behavior of
> xxx_ip_dev_find() and _all_ backports, and we needed to test it out and
> make sure it doesn't regress anything.  If it causes problems on other
> backports, the plan was to just fix the sles9sp3 backport and leave the
> others alone. 
>
> With the test build vlad published yesterday which has this patch,
> rhel4u4 kernel wasn't working for me with iWARP and I'm afraid it might
> be due to this patch.  I'm investigating this now.
>
>   
>
We tested this patch with our regression on IB and it worked fine for
both SDP and IPoIB.
Then we applied it.
Please report ASAP if you think there is an issue.

Tziporet


From rdreier at cisco.com  Wed Feb 14 07:32:14 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 14 Feb 2007 07:32:14 -0800
Subject: [openib-general] [PATCH 3 of 4] IB/mthca: fix
 non-cache-coherent CPUs with memfree
In-Reply-To: <20070214102252.GD4434@mellanox.co.il> (Michael S.
	Tsirkin's message of "Wed, 14 Feb 2007 12:22:52 +0200")
References: <20070210211508.GD14903@mellanox.co.il>
	<adasldb2ked.fsf@cisco.com> <20070214102252.GD4434@mellanox.co.il>
Message-ID: <adak5yk69hd.fsf@cisco.com>

 > How do you mean, again? Does sg_set_buf set dma_length?

No, you're right, sorry.

 - R.


From rdreier at cisco.com  Wed Feb 14 07:34:50 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 14 Feb 2007 07:34:50 -0800
Subject: [openib-general] [PATCH 3 of 4] IB/mthca: fix
 non-cache-coherent CPUs with memfree
In-Reply-To: <20070214100109.GA4434@mellanox.co.il> (Michael S.
	Tsirkin's message of "Wed, 14 Feb 2007 12:01:09 +0200")
References: <adahctqobkg.fsf@cisco.com> <20070214100109.GA4434@mellanox.co.il>
Message-ID: <adafy9869d1.fsf@cisco.com>

 > > I don't see anything that ever bumps chunk->nsg if we're allocating a
 > > coherent region and we end up needing more than one allocation to do
 > > it.
 > 
 > Yes but this is intentional.

 > No, I think the code is fine and this patch will break things:
 > chunk->nsg is needed only for non-coherent memory to call pci_unmap_sg:

what about this code in mthca_memfree.h?

	static inline void mthca_icm_next(struct mthca_icm_iter *iter)
	{
		if (++iter->page_idx >= iter->chunk->nsg) {

the call to pci_unmap_sg you're worried about is in
mthca_free_icm_pages(), which can't be called for coherent memory
anyway, so I don't see a problem with that.

So I think my patch is correct and needed.
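
The iterator behaviour discussed above can be modelled in user space. The
following is a minimal sketch, assuming simplified stand-in types: struct
chunk, chunk_iter_next() and count_visited() are hypothetical names, not the
driver's. It only shows how a walk that steps by nsg behaves when nsg is
never bumped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an ICM chunk (not the driver's struct). */
struct chunk {
	int npages;          /* entries actually filled in */
	int nsg;             /* entries the iterator believes are valid */
	struct chunk *next;  /* next chunk, or NULL at the end of the list */
};

struct chunk_iter {
	struct chunk *chunk;
	int page_idx;
};

/*
 * Same stepping rule as the quoted mthca_icm_next(): advance within the
 * chunk and move to the next chunk once page_idx reaches nsg.
 */
static void chunk_iter_next(struct chunk_iter *iter)
{
	if (++iter->page_idx >= iter->chunk->nsg) {
		iter->chunk = iter->chunk->next;
		iter->page_idx = 0;
	}
}

/* Count the entries visited when walking the list from its head. */
static int count_visited(struct chunk *head)
{
	struct chunk_iter iter = { head, 0 };
	int visited = 0;

	assert(head != NULL);
	while (iter.chunk) {
		visited++;
		chunk_iter_next(&iter);
	}
	return visited;
}
```

In this model, a list whose chunks hold 3 and 2 entries is walked in full
only when nsg matches npages; with nsg left at 0 the walk touches a single
entry per chunk.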

 - R.


From mst at mellanox.co.il  Wed Feb 14 07:50:08 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 14 Feb 2007 17:50:08 +0200
Subject: [openib-general] [Bug 325] RDMA_CM and address translation
 broken on sles9sp3
In-Reply-To: <1171465049.15208.13.camel@stevo-desktop>
References: <1171465049.15208.13.camel@stevo-desktop>
Message-ID: <20070214155008.GJ16867@mellanox.co.il>

> Quoting Steve Wise <swise at opengridcomputing.com>:
> Subject: Re: [Bug 325] RDMA_CM and address translation broken on sles9sp3
> 
> 
> Tziporet, 
> 
> I didn't think we were going to apply this patch until Michael tested it
> with SDP/IPoIB on various distros. 
> 
> Michael, did you get a chance to test it (I'm guessing not since you
> were out sick)?  

Right, I'm not at the lab. I assume Vlad tested this before applying.

> The reason I'm concerned is that it changes the behavior of
> xxx_ip_dev_find() and _all_ backports, and we needed to test it out and
> make sure it doesn't regress anything.  If it causes problems on other
> backports, the plan was to just fix the sles9sp3 backport and leave the
> others alone. 
> 
> With the test build vlad published yesterday which has this patch,
> rhel4u4 kernel wasn't working for me with iWARP and I'm afraid it might
> be due to this patch.  I'm investigating this now.

In actual fact, xxx_ip_dev_find is not even *needed* on anything
except 2.6.14, 2.6.15, 2.6.16 and 2.6.17: these are the kernels
which do not export ip_dev_find.
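
As an illustration of that version range, here is a minimal user-space
sketch. KVER() mirrors the encoding of the kernel's KERNEL_VERSION() macro,
and needs_ip_dev_find_backport() is a hypothetical helper name; neither is
taken from the backport patches themselves.

```c
#include <stdbool.h>

/*
 * Same encoding as the kernel's KERNEL_VERSION() macro, redefined here
 * so that the sketch builds outside the kernel tree.
 */
#define KVER(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/*
 * Hypothetical helper: true only for the releases named above
 * (2.6.14 through 2.6.17), where ip_dev_find is not exported.
 */
static bool needs_ip_dev_find_backport(unsigned int code)
{
	return code >= KVER(2, 6, 14) && code < KVER(2, 6, 18);
}
```

Under this assumption a 2.6.9 or 2.6.18 kernel would use the stock
ip_dev_find, and only the 2.6.14 to 2.6.17 backports would carry a
replacement.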


-- 
MST


From mst at mellanox.co.il  Wed Feb 14 08:04:38 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 14 Feb 2007 18:04:38 +0200
Subject: [openib-general] [PATCH 3 of 4] IB/mthca: fix
 non-cache-coherent CPUs with memfree
In-Reply-To: <adafy9869d1.fsf@cisco.com>
References: <adafy9869d1.fsf@cisco.com>
Message-ID: <20070214160438.GK16867@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH 3 of 4] IB/mthca: fix non-cache-coherent CPUs with memfree
> 
>  > > I don't see anything that ever bumps chunk->nsg if we're allocating a
>  > > coherent region and we end up needing more than one allocation to do
>  > > it.
>  > 
>  > Yes but this is intentional.
> 
>  > No, I think the code is fine and this patch will break things:
>  > chunk->nsg is needed only for non-coherent memory to call pci_unmap_sg:
> 
> what about this code in mthca_memfree.h?
> 
> 	static inline void mthca_icm_next(struct mthca_icm_iter *iter)
> 	{
> 		if (++iter->page_idx >= iter->chunk->nsg) {

Correct. Good catch.
	
> the call to pci_unmap_sg you're worried about is in
> mthca_free_icm_pages(), which can't be called for coherent memory
> anyway, so I don't see a problem with that.
> 
> So I think my patch is correct and needed.

Yes, I agree. I'll also put it in OFED.
Thanks!

-- 
MST


From sashak at voltaire.com  Wed Feb 14 08:24:07 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Wed, 14 Feb 2007 18:24:07 +0200
Subject: [openib-general] [PATCH] drivers/infiniband: madeye integration
Message-ID: <20070214162407.GP22807@sashak.voltaire.com>


This integrates madeye debug module into the tree.

Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
---
 drivers/infiniband/Kconfig       |    2 +
 drivers/infiniband/Makefile      |    1 +
 drivers/infiniband/util/Kconfig  |    6 +
 drivers/infiniband/util/Makefile |    3 +
 drivers/infiniband/util/madeye.c |  590 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 602 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/util/Kconfig
 create mode 100644 drivers/infiniband/util/Makefile
 create mode 100644 drivers/infiniband/util/madeye.c

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index 712e5e2..de8e39f 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -50,4 +50,6 @@ source "drivers/infiniband/ulp/sdp/Kconfig"
 
 source "drivers/infiniband/ulp/vnic/Kconfig"
 
+source "drivers/infiniband/util/Kconfig"
+
 endmenu
diff --git a/drivers/infiniband/Makefile b/drivers/infiniband/Makefile
index 57f2616..a7d1dc2 100644
--- a/drivers/infiniband/Makefile
+++ b/drivers/infiniband/Makefile
@@ -9,3 +9,4 @@ obj-$(CONFIG_INFINIBAND_SRP)		+= ulp/srp/
 obj-$(CONFIG_INFINIBAND_ISER)		+= ulp/iser/
 obj-$(CONFIG_INFINIBAND_SDP)		+= ulp/sdp/
 obj-$(CONFIG_INFINIBAND_VNIC)		+= ulp/vnic/
+obj-$(CONFIG_INFINIBAND_MADEYE)		+= util/
diff --git a/drivers/infiniband/util/Kconfig b/drivers/infiniband/util/Kconfig
new file mode 100644
index 0000000..5e98eaa
--- /dev/null
+++ b/drivers/infiniband/util/Kconfig
@@ -0,0 +1,6 @@
+config INFINIBAND_MADEYE
+	tristate "MAD debug viewer for InfiniBand"
+	depends on INFINIBAND
+	---help---
+	  Prints sent and received MADs on QP 0/1 for debugging.
+
diff --git a/drivers/infiniband/util/Makefile b/drivers/infiniband/util/Makefile
new file mode 100644
index 0000000..caf9471
--- /dev/null
+++ b/drivers/infiniband/util/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_INFINIBAND_MADEYE)	+= ib_madeye.o
+
+ib_madeye-y := madeye.o
diff --git a/drivers/infiniband/util/madeye.c b/drivers/infiniband/util/madeye.c
new file mode 100644
index 0000000..2a76d45
--- /dev/null
+++ b/drivers/infiniband/util/madeye.c
@@ -0,0 +1,590 @@
+/*
+ * Copyright (c) 2004, 2005 Intel Corporation.  All rights reserved.
+ * Copyright (c) 2005, 2006 Voltaire Inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * $Id$
+ */
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/err.h>
+
+#include <rdma/ib_mad.h>
+#include <rdma/ib_smi.h>
+#include <rdma/ib_sa.h>
+
+MODULE_AUTHOR("Sean Hefty");
+MODULE_DESCRIPTION("InfiniBand MAD viewer");
+MODULE_LICENSE("Dual BSD/GPL");
+
+static void madeye_remove_one(struct ib_device *device);
+static void madeye_add_one(struct ib_device *device);
+
+static struct ib_client madeye_client = {
+	.name   = "madeye",
+	.add    = madeye_add_one,
+	.remove = madeye_remove_one
+};
+
+struct madeye_port {
+	struct ib_mad_agent *smi_agent;
+	struct ib_mad_agent *gsi_agent;
+};
+
+static int smp = 1;
+static int gmp = 1;
+static int mgmt_class = 0;
+static int attr_id = 0;
+static int data = 0;
+
+module_param(smp, int, 0444);
+module_param(gmp, int, 0444);
+module_param(mgmt_class, int, 0444);
+module_param(attr_id, int, 0444);
+module_param(data, int, 0444);
+
+MODULE_PARM_DESC(smp, "Display all SMPs (default=1)");
+MODULE_PARM_DESC(gmp, "Display all GMPs (default=1)");
+MODULE_PARM_DESC(mgmt_class, "Display all MADs of specified class (default=0)");
+MODULE_PARM_DESC(attr_id, "Display all MADs of specified attribute ID (default=0)");
+MODULE_PARM_DESC(data, "Display data area of MADs (default=0)");
+
+static char * get_class_name(u8 mgmt_class)
+{
+	switch(mgmt_class) {
+	case IB_MGMT_CLASS_SUBN_LID_ROUTED:
+		return "LID routed SMP";
+	case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE:
+		return "Directed route SMP";
+	case IB_MGMT_CLASS_SUBN_ADM:
+		return "Subnet admin.";
+	case IB_MGMT_CLASS_PERF_MGMT:
+		return "Perf. mgmt.";
+	case IB_MGMT_CLASS_BM:
+		return "Baseboard mgmt.";
+	case IB_MGMT_CLASS_DEVICE_MGMT:
+		return "Device mgmt.";
+	case IB_MGMT_CLASS_CM:
+		return "Comm. mgmt.";
+	case IB_MGMT_CLASS_SNMP:
+		return "SNMP";
+	default:
+		return "Unknown vendor/application";
+	}
+}
+
+static char * get_method_name(u8 mgmt_class, u8 method)
+{
+	switch(method) {
+	case IB_MGMT_METHOD_GET:
+		return "Get";
+	case IB_MGMT_METHOD_SET:
+		return "Set";
+	case IB_MGMT_METHOD_GET_RESP:
+		return "Get response";
+	case IB_MGMT_METHOD_SEND:
+		return "Send";
+	case IB_MGMT_METHOD_SEND | IB_MGMT_METHOD_RESP:
+		return "Send response";
+	case IB_MGMT_METHOD_TRAP:
+		return "Trap";
+	case IB_MGMT_METHOD_REPORT:
+		return "Report";
+	case IB_MGMT_METHOD_REPORT_RESP:
+		return "Report response";
+	case IB_MGMT_METHOD_TRAP_REPRESS:
+		return "Trap repress";
+	default:
+		break;
+	}
+
+	switch (mgmt_class) {
+	case IB_MGMT_CLASS_SUBN_ADM:
+		switch (method) {
+		case IB_SA_METHOD_GET_TABLE:
+			return "Get table";
+		case IB_SA_METHOD_GET_TABLE_RESP:
+			return "Get table response";
+		case IB_SA_METHOD_DELETE:
+			return "Delete";
+		case IB_SA_METHOD_DELETE_RESP:
+			return "Delete response";
+		case IB_SA_METHOD_GET_MULTI:
+			return "Get Multi";
+		case IB_SA_METHOD_GET_MULTI_RESP:
+			return "Get Multi response";
+		case IB_SA_METHOD_GET_TRACE_TBL:
+			return "Get Trace Table";
+		default:
+			break;
+		}
+	default:
+		break;
+	}
+
+	return "Unknown";
+}
+
+static void print_status_details(u16 status)
+{
+	if (status & 0x0001)
+		printk("               busy\n");
+	if (status & 0x0002)
+		printk("               redirection required\n");
+	switch((status & 0x001C) >> 2) {
+	case 1:
+		printk("               bad version\n");
+		break;
+	case 2:
+		printk("               method not supported\n");
+		break;
+	case 3:
+		printk("               method/attribute combo not supported\n");
+		break;
+	case 7:
+		printk("               invalid attribute/modifier value\n");
+		break;
+	}
+}
+
+static char * get_sa_attr(__be16 attr)
+{
+	switch(attr) {
+	case IB_SA_ATTR_CLASS_PORTINFO:
+		return "Class Port Info";
+	case IB_SA_ATTR_NOTICE:
+		return "Notice";
+	case IB_SA_ATTR_INFORM_INFO:
+		return "Inform Info";
+	case IB_SA_ATTR_NODE_REC:
+		return "Node Record";
+	case IB_SA_ATTR_PORT_INFO_REC:
+		return "PortInfo Record";
+	case IB_SA_ATTR_SL2VL_REC:
+		return "SL to VL Record";
+	case IB_SA_ATTR_SWITCH_REC:
+		return "Switch Record";
+	case IB_SA_ATTR_LINEAR_FDB_REC:
+		return "Linear FDB Record";
+	case IB_SA_ATTR_RANDOM_FDB_REC:
+		return "Random FDB Record";
+	case IB_SA_ATTR_MCAST_FDB_REC:
+		return "Multicast FDB Record";
+	case IB_SA_ATTR_SM_INFO_REC:
+		return "SM Info Record";
+	case IB_SA_ATTR_LINK_REC:
+		return "Link Record";
+	case IB_SA_ATTR_GUID_INFO_REC:
+		return "Guid Info Record";
+	case IB_SA_ATTR_SERVICE_REC:
+		return "Service Record";
+	case IB_SA_ATTR_PARTITION_REC:
+		return "Partition Record";
+	case IB_SA_ATTR_PATH_REC:
+		return "Path Record";
+	case IB_SA_ATTR_VL_ARB_REC:
+		return "VL Arb Record";
+	case IB_SA_ATTR_MC_MEMBER_REC:
+		return "MC Member Record";
+	case IB_SA_ATTR_TRACE_REC:
+		return "Trace Record";
+	case IB_SA_ATTR_MULTI_PATH_REC:
+		return "Multi Path Record";
+	case IB_SA_ATTR_SERVICE_ASSOC_REC:
+		return "Service Assoc Record";
+	case IB_SA_ATTR_INFORM_INFO_REC:
+		return "Inform Info Record";
+	default:
+		return "";
+	}
+}
+
+static void print_mad_hdr(struct ib_mad_hdr *mad_hdr)
+{
+	printk("MAD version....0x%01x\n", mad_hdr->base_version);
+	printk("Class..........0x%01x (%s)\n", mad_hdr->mgmt_class,
+	       get_class_name(mad_hdr->mgmt_class));
+	printk("Class version..0x%01x\n", mad_hdr->class_version);
+	printk("Method.........0x%01x (%s)\n", mad_hdr->method,
+	       get_method_name(mad_hdr->mgmt_class, mad_hdr->method));
+	printk("Status.........0x%02x\n", be16_to_cpu(mad_hdr->status));
+	if (mad_hdr->status)
+		print_status_details(be16_to_cpu(mad_hdr->status));
+	printk("Class specific.0x%02x\n", be16_to_cpu(mad_hdr->class_specific));
+	printk("Trans ID.......0x%llx\n", mad_hdr->tid);
+	if (mad_hdr->mgmt_class == IB_MGMT_CLASS_SUBN_ADM)
+		printk("Attr ID........0x%02x (%s)\n",
+		       be16_to_cpu(mad_hdr->attr_id),
+		       get_sa_attr(be16_to_cpu(mad_hdr->attr_id)));
+	else
+		printk("Attr ID........0x%02x\n",
+		       be16_to_cpu(mad_hdr->attr_id));
+	printk("Attr modifier..0x%04x\n", be32_to_cpu(mad_hdr->attr_mod));
+}
+
+static char * get_rmpp_type(u8 rmpp_type)
+{
+	switch (rmpp_type) {
+	case IB_MGMT_RMPP_TYPE_DATA:
+		return "Data";
+	case IB_MGMT_RMPP_TYPE_ACK:
+		return "Ack";
+	case IB_MGMT_RMPP_TYPE_STOP:
+		return "Stop";
+	case IB_MGMT_RMPP_TYPE_ABORT:
+		return "Abort";
+	default:
+		return "Unknown";
+	}
+}
+
+static char * get_rmpp_flags(u8 rmpp_flags)
+{
+	if (rmpp_flags & IB_MGMT_RMPP_FLAG_ACTIVE)
+		if (rmpp_flags & IB_MGMT_RMPP_FLAG_FIRST)
+			if (rmpp_flags & IB_MGMT_RMPP_FLAG_LAST)
+				return "Active - First & Last";
+			else
+				return "Active - First";
+		else
+			if (rmpp_flags & IB_MGMT_RMPP_FLAG_LAST)
+				return "Active - Last";
+			else
+				return "Active";
+	else
+		return "Inactive";
+}
+
+static void print_rmpp_hdr(struct ib_rmpp_hdr *rmpp_hdr)
+{
+	printk("RMPP version...0x%01x\n", rmpp_hdr->rmpp_version);
+	printk("RMPP type......0x%01x (%s)\n", rmpp_hdr->rmpp_type,
+	       get_rmpp_type(rmpp_hdr->rmpp_type));
+	printk("RMPP RRespTime.0x%01x\n", ib_get_rmpp_resptime(rmpp_hdr));
+	printk("RMPP flags.....0x%01x (%s)\n", ib_get_rmpp_flags(rmpp_hdr),
+	       get_rmpp_flags(ib_get_rmpp_flags(rmpp_hdr)));
+	printk("RMPP status....0x%01x\n", rmpp_hdr->rmpp_status);
+	printk("Seg number.....0x%04x\n", be32_to_cpu(rmpp_hdr->seg_num));
+	switch (rmpp_hdr->rmpp_type) {
+	case IB_MGMT_RMPP_TYPE_DATA:
+		printk("Payload len....0x%04x\n",
+		       be32_to_cpu(rmpp_hdr->paylen_newwin));
+		break;
+	case IB_MGMT_RMPP_TYPE_ACK:
+		printk("New window.....0x%04x\n",
+		       be32_to_cpu(rmpp_hdr->paylen_newwin));
+		break;
+	default:
+		printk("Data 2.........0x%04x\n",
+		       be32_to_cpu(rmpp_hdr->paylen_newwin));
+		break;
+	}
+}
+
+static char * get_smp_attr(__be16 attr)
+{
+	switch (attr) {
+	case IB_SMP_ATTR_NOTICE:
+		return "notice";
+	case IB_SMP_ATTR_NODE_DESC:
+		return "node description";
+	case IB_SMP_ATTR_NODE_INFO:
+		return "node info";
+	case IB_SMP_ATTR_SWITCH_INFO:
+		return "switch info";
+	case IB_SMP_ATTR_GUID_INFO:
+		return "GUID info";
+	case IB_SMP_ATTR_PORT_INFO:
+		return "port info";
+	case IB_SMP_ATTR_PKEY_TABLE:
+		return "pkey table";
+	case IB_SMP_ATTR_SL_TO_VL_TABLE:
+		return "SL to VL table";
+	case IB_SMP_ATTR_VL_ARB_TABLE:
+		return "VL arbitration table";
+	case IB_SMP_ATTR_LINEAR_FORWARD_TABLE:
+		return "linear forwarding table";
+	case IB_SMP_ATTR_RANDOM_FORWARD_TABLE:
+		return "random forward table";
+	case IB_SMP_ATTR_MCAST_FORWARD_TABLE:
+		return "multicast forward table";
+	case IB_SMP_ATTR_SM_INFO:
+		return "SM info";
+	case IB_SMP_ATTR_VENDOR_DIAG:
+		return "vendor diags";
+	case IB_SMP_ATTR_LED_INFO:
+		return "LED info";
+	default:
+		return "";
+	}
+}
+
+static void print_smp(struct ib_smp *smp)
+{
+	int i;
+
+	printk("MAD version....0x%01x\n", smp->base_version);
+	printk("Class..........0x%01x (%s)\n", smp->mgmt_class,
+	       get_class_name(smp->mgmt_class));
+	printk("Class version..0x%01x\n", smp->class_version);
+	printk("Method.........0x%01x (%s)\n", smp->method,
+	       get_method_name(smp->mgmt_class, smp->method));
+	printk("Status.........0x%02x\n", be16_to_cpu(smp->status));
+	if (smp->status)
+		print_status_details(be16_to_cpu(smp->status));
+	printk("Hop pointer...0x%01x\n", smp->hop_ptr);
+	printk("Hop counter...0x%01x\n", smp->hop_cnt);
+	printk("Trans ID.......0x%llx\n", smp->tid);
+	printk("Attr ID........0x%02x (%s)\n", be16_to_cpu(smp->attr_id),
+		get_smp_attr(smp->attr_id));
+	printk("Attr modifier..0x%04x\n", be32_to_cpu(smp->attr_mod));
+
+	printk("Mkey...........0x%llx\n", be64_to_cpu(smp->mkey));
+	printk("DR SLID........0x%02x\n", be16_to_cpu(smp->dr_slid));
+	printk("DR DLID........0x%02x", be16_to_cpu(smp->dr_dlid));
+
+	if (data) {
+		for (i = 0; i < IB_SMP_DATA_SIZE; i++) {
+			if (i % 16 == 0)
+				printk("\nSMP Data.......");
+			printk("%01x ", smp->data[i]);
+		}
+		for (i = 0; i < IB_SMP_MAX_PATH_HOPS; i++) {
+			if (i % 16 == 0)
+				printk("\nInitial path...");
+			printk("%01x ", smp->initial_path[i]);
+		}
+		for (i = 0; i < IB_SMP_MAX_PATH_HOPS; i++) {
+			if (i % 16 == 0)
+				printk("\nReturn path....");
+			printk("%01x ", smp->return_path[i]);
+		}
+	}
+	printk("\n");
+}
+
+static void snoop_smi_handler(struct ib_mad_agent *mad_agent,
+			      struct ib_mad_send_buf *send_buf,
+			      struct ib_mad_send_wc *mad_send_wc)
+{
+	struct ib_mad_hdr *hdr = send_buf->mad;
+
+	if (!smp && hdr->mgmt_class != mgmt_class)
+		return;
+	if (attr_id && be16_to_cpu(hdr->attr_id) != attr_id)
+		return;
+
+	printk("Madeye:sent SMP\n");
+	print_smp(send_buf->mad);
+}
+
+static void recv_smi_handler(struct ib_mad_agent *mad_agent,
+			     struct ib_mad_recv_wc *mad_recv_wc)
+{
+	if (!smp && mad_recv_wc->recv_buf.mad->mad_hdr.mgmt_class != mgmt_class)
+		return;
+	if (attr_id && be16_to_cpu(mad_recv_wc->recv_buf.mad->mad_hdr.attr_id) != attr_id)
+		return;
+
+	printk("Madeye:recv SMP\n");
+	print_smp((struct ib_smp *)&mad_recv_wc->recv_buf.mad->mad_hdr);
+}
+
+static int is_rmpp_mad(struct ib_mad_hdr *mad_hdr)
+{
+	if (mad_hdr->mgmt_class == IB_MGMT_CLASS_SUBN_ADM) {
+		switch (mad_hdr->method) {
+		case IB_SA_METHOD_GET_TABLE:
+		case IB_SA_METHOD_GET_TABLE_RESP:
+		case IB_SA_METHOD_GET_MULTI_RESP:
+			return 1;
+		default:
+			break;
+		}
+	} else if ((mad_hdr->mgmt_class >= IB_MGMT_CLASS_VENDOR_RANGE2_START) &&
+		   (mad_hdr->mgmt_class <= IB_MGMT_CLASS_VENDOR_RANGE2_END))
+		return 1;
+
+	return 0;
+}
+
+static void snoop_gsi_handler(struct ib_mad_agent *mad_agent,
+			      struct ib_mad_send_buf *send_buf,
+			      struct ib_mad_send_wc *mad_send_wc)
+{
+	struct ib_mad_hdr *hdr = send_buf->mad;
+
+	if (!gmp && hdr->mgmt_class != mgmt_class)
+		return;
+	if (attr_id && be16_to_cpu(hdr->attr_id) != attr_id)
+		return;
+
+	printk("Madeye:sent GMP\n");
+	print_mad_hdr(hdr);
+
+	if (is_rmpp_mad(hdr))
+		print_rmpp_hdr(&((struct ib_rmpp_mad *) hdr)->rmpp_hdr);
+}
+
+static void recv_gsi_handler(struct ib_mad_agent *mad_agent,
+			     struct ib_mad_recv_wc *mad_recv_wc)
+{
+	struct ib_mad_hdr *hdr = &mad_recv_wc->recv_buf.mad->mad_hdr;
+	struct ib_rmpp_mad *mad = NULL;
+	struct ib_sa_mad *sa_mad;
+	struct ib_vendor_mad *vendor_mad;
+	u8 *mad_data;
+	int i, j;
+
+	if (!gmp && hdr->mgmt_class != mgmt_class)
+		return;
+	if (attr_id && be16_to_cpu(hdr->attr_id) != attr_id)
+		return;
+
+	printk("Madeye:recv GMP\n");
+	print_mad_hdr(hdr);
+
+	if (is_rmpp_mad(hdr)) {
+		mad = (struct ib_rmpp_mad *) hdr;
+		print_rmpp_hdr(&mad->rmpp_hdr);
+	}
+
+	if (data) {
+		if (hdr->mgmt_class == IB_MGMT_CLASS_SUBN_ADM) {
+			j = IB_MGMT_SA_DATA;
+			/* Display SA header */
+			if (is_rmpp_mad(hdr) &&
+			    mad->rmpp_hdr.rmpp_type != IB_MGMT_RMPP_TYPE_DATA)
+				return;
+			sa_mad = (struct ib_sa_mad *)
+				 mad_recv_wc->recv_buf.mad;
+			mad_data = sa_mad->data;
+		} else {
+			if (is_rmpp_mad(hdr)) {
+				j = IB_MGMT_VENDOR_DATA;
+				/* Display OUI */
+				vendor_mad = (struct ib_vendor_mad *)
+					     mad_recv_wc->recv_buf.mad;
+				printk("Vendor OUI......%01x %01x %01x\n",
+					vendor_mad->oui[0],
+					vendor_mad->oui[1],
+					vendor_mad->oui[2]);
+				mad_data = vendor_mad->data;
+			} else {
+				j = IB_MGMT_MAD_DATA;
+				mad_data = mad_recv_wc->recv_buf.mad->data;
+			}
+		}
+		for (i = 0; i < j; i++) {
+			if (i % 16 == 0)
+				printk("\nData...........");
+			printk("%01x ", mad_data[i]);
+		}
+		printk("\n");
+	}
+}
+
+static void madeye_add_one(struct ib_device *device)
+{
+	struct madeye_port *port;
+	int reg_flags;
+	u8 i, s, e;
+
+	if (device->node_type == RDMA_NODE_IB_SWITCH) {
+		s = 0;
+		e = 0;
+	} else {
+		s = 1;
+		e = device->phys_port_cnt;
+	}
+
+	port = kmalloc(sizeof *port * (e + 1), GFP_KERNEL);
+	if (!port)
+		goto out;
+
+	reg_flags = IB_MAD_SNOOP_SEND_COMPLETIONS | IB_MAD_SNOOP_RECVS;
+	for (i = s; i <= e; i++) {
+		port[i].smi_agent = ib_register_mad_snoop(device, i,
+							  IB_QPT_SMI,
+							  reg_flags,
+							  snoop_smi_handler,
+							  recv_smi_handler,
+							  &port[i]);
+		port[i].gsi_agent = ib_register_mad_snoop(device, i,
+							  IB_QPT_GSI,
+							  reg_flags,
+							  snoop_gsi_handler,
+							  recv_gsi_handler,
+							  &port[i]);
+	}
+
+out:
+	ib_set_client_data(device, &madeye_client, port);
+}
+
+static void madeye_remove_one(struct ib_device *device)
+{
+	struct madeye_port *port;
+	int i, s, e;
+
+	port = (struct madeye_port *)
+		ib_get_client_data(device, &madeye_client);
+	if (!port)
+		return;
+
+	if (device->node_type == RDMA_NODE_IB_SWITCH) {
+		s = 0;
+		e = 0;
+	} else {
+		s = 1;
+		e = device->phys_port_cnt;
+	}
+
+	for (i = s; i <= e; i++) {
+		if (!IS_ERR(port[i].smi_agent))
+			ib_unregister_mad_agent(port[i].smi_agent);
+		if (!IS_ERR(port[i].gsi_agent))
+			ib_unregister_mad_agent(port[i].gsi_agent);
+	}
+	kfree(port);
+}
+
+static int __init ib_madeye_init(void)
+{
+	return ib_register_client(&madeye_client);
+}
+
+static void __exit ib_madeye_cleanup(void)
+{
+	ib_unregister_client(&madeye_client);
+}
+
+module_init(ib_madeye_init);
+module_exit(ib_madeye_cleanup);
-- 
1.5.0.rc2.g73a2


From Ashish.Batwara at lsi.com  Wed Feb 14 08:12:35 2007
From: Ashish.Batwara at lsi.com (Batwara, Ashish)
Date: Wed, 14 Feb 2007 09:12:35 -0700
Subject: [openib-general] SM assigned GID addresses
Message-ID: <01B9E81EECACE94DBBD0A556E768FB8A013599EC@NAMAIL2.ad.lsil.com>

Hi,
I am referring to Section 4.1.1 of the IB spec, "GID Usage and
Properties" (quoted below). Does anyone know whether the SM uses item 3
below for address assignment, and which vendors support it? Can anybody
point me to the appropriate driver documentation in this area?

Thanks
Ashish


GID USAGE AND PROPERTIES
1) Each endport shall be assigned at least one unicast GID. The first
   unicast GID assigned shall be created using the manufacturer assigned
   EUI-64 identifier. This GID is referred to as GID index 0 and is
   formed by techniques 3(a) and 3(b) described below.
2) The default GID prefix shall be (0xFE80::0). A packet using the
   default GID prefix and either a manufacturer assigned or SM assigned
   EUI-64 must always be accepted by an endnode. A packet containing
   a GRH with a destination GID with this prefix must never be
   forwarded by a router, i.e. it is restricted to the local subnet.
3) A unicast GID shall be created using one or more of the following
   mechanisms:
   a) Concatenation of the default GID prefix with the manufacturer
      assigned EUI-64 identifier associated with an endport. This GID
      is referred to as the default GID.
   b) Concatenation of a subnet manager assigned 64-bit GID prefix
      and the manufacturer assigned EUI-64 identifier associated with
      an endport.
   c) Assignment of a GID by the subnet manager. The subnet manager
      creates a GID by concatenating the GID prefix (default or
      assigned) with a set of locally assigned EUI-64 values (at GID
      index 1 or above).
   Each endport must be assigned at least one unicast GID using (a).
   Additional GIDs may be assigned using (b) and/or (c).
   Note: A subnet shall only have one assigned GID prefix (non default)
   at any given time.
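
For illustration, all three mechanisms in item 3 reduce to the same
concatenation of a 64-bit prefix and a 64-bit EUI-64; they differ only in
who supplies each half. The following is a minimal user-space sketch, not
code from any driver or SM. The names struct gid, make_gid() and
gid_is_link_local() are hypothetical, and the example EUI-64 value in the
note after the block is made up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A GID is 128 bits: a 64-bit prefix followed by a 64-bit EUI-64,
 * both kept here in network (big-endian) byte order. */
struct gid {
	uint8_t raw[16];
};

/* Default (link-local) GID prefix from item 2: 0xFE80::0 */
#define DEFAULT_GID_PREFIX 0xFE80000000000000ULL

/* Store a 64-bit value big-endian, independent of host byte order. */
static void put_be64(uint8_t *dst, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		dst[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Concatenate prefix and EUI-64:
 *   3(a): default prefix     + manufacturer EUI-64
 *   3(b): SM-assigned prefix + manufacturer EUI-64
 *   3(c): either prefix      + SM-assigned (local) EUI-64 */
static struct gid make_gid(uint64_t prefix, uint64_t eui64)
{
	struct gid g;

	put_be64(&g.raw[0], prefix);
	put_be64(&g.raw[8], eui64);
	return g;
}

/* Returns 1 if the GID carries the default prefix, i.e. a router
 * must not forward packets addressed to it (item 2). */
static int gid_is_link_local(const struct gid *g)
{
	static const uint8_t pfx[8] = { 0xfe, 0x80, 0, 0, 0, 0, 0, 0 };

	assert(g != NULL);
	return memcmp(g->raw, pfx, sizeof(pfx)) == 0;
}
```

With this sketch, make_gid(DEFAULT_GID_PREFIX, 0x0002c90300001234ULL)
would correspond to the default GID of 3(a) at GID index 0, while passing
an SM-chosen prefix or an SM-chosen EUI-64 would model 3(b) or 3(c).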


From tziporet at mellanox.co.il  Wed Feb 14 08:25:06 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Wed, 14 Feb 2007 18:25:06 +0200
Subject: [openib-general] OFED 1.2 alpha release
Message-ID: <45D337E2.200@mellanox.co.il>

Hi,

After a two-week delay, we have published OFED 1.2-alpha1 at
http://www.openfabrics.org/builds/ofed-1.2/
File: OFED-1.2-alpha1.tgz
BUILD_ID lists the source location of every package.

Please report any issues in bugzilla https://bugs.openfabrics.org/

Tziporet & Vlad

_*OS support:*_
Novell:
    - SLES 9.0 SP3
    - SLES10
Redhat:
    - Redhat EL4 up4
    - Redhat EL5 beta2 (only partially tested)
kernel.org:
    - 2.6.20
    - 2.6.19

Note: Redhat EL4 up3, Fedora C4, Fedora C6 and SuSE Pro 10 are not part
of the official list.
We keep the backport patches for these OSes and make sure OFED compiles
and loads properly on them, but we will not run a full QA cycle.

_*Systems:*_
    * x86_64
    * x86
    * ia64
    * ppc64 (have not tested user space)

_*Main changes from OFED-1.1:*_

   1. iWARP is now supported with Chelsio T3
   2. New kernel modules: VNIC, RDS, Bonding, SA cache
   3. New packages: MVAPICH2
   4. IPoIB Connected mode
   5. Multicast join from user space
   6. libibverbs 1.1
   7. OpenSM new routing models: FAT tree routing and Taurus routing
   8. GUI tool for network diagnostics
   9. New MPI releases: MVAPICH: version 0.9.9, Open MPI: version 1.2,
      MVAPICH2: version 0.9.8

Detailed list of changes can be found in: 
https://wiki.openfabrics.org/tiki-index.php?page=OFED+1.2+release+plan+and+features

_*Limitations and known issues:*_

   1. ipath driver compilation fails on all systems, except for kernel
      2.6.20
   2. libipathverbs is not working with libibverbs 1.1
   3. SDP netstat is not available on RHEL5 (due to compilation errors)
   4. Routing table problem in SLES10 when using port #2
   5. RDS compiles only on kernel 2.6.18/19/20
   6. MVAPICH2 installation fails on SuSE Pro 10.
   7. mstflint is not working on ppc64
   8. RDS was not tested

_*Missing features that should be completed for the Beta:*_

   1. Add madeye utility
   2. RDS to support SLES10 and RHEL

For details on each module status see: 
https://wiki.openfabrics.org/tiki-index.php?page=Teleconf+02-12-2007



From hnguyen at linux.vnet.ibm.com  Wed Feb 14 08:40:30 2007
From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen)
Date: Wed, 14 Feb 2007 17:40:30 +0100
Subject: [openib-general] [PATCH 2.6.21-rc1 0/5] ehca patch set for
	2.6.21-rc1
Message-ID: <200702141740.30637.hnguyen@linux.vnet.ibm.com>

Hello Roland!
Here is a patch set for ehca with the following changes and bug fixes:
* Rework the irq handler to avoid/reduce missed irq events
* Fix a race condition in find_next_online_cpu() and other potential
  locking issues in the scaling code
* Allow the scaling code to be enabled/disabled via a module parameter
* Replace yield() in ehca_destroy_cq() with wait_for_completion()
* ehca_query_port() now returns LINK_UP for phys_state instead of UNKNOWN
Thanks!
Nam


From hnguyen at linux.vnet.ibm.com  Wed Feb 14 08:40:47 2007
From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen)
Date: Wed, 14 Feb 2007 17:40:47 +0100
Subject: [openib-general] [PATCH 2.6.21-rc1 1/5] ehca: reworked irq handler
 to avoid/reduce missed irq events
Message-ID: <200702141740.48286.hnguyen@linux.vnet.ibm.com>

Hi,
here is a patch for ehca with the reworked irq handler.
Thanks
Nam


Signed-off-by: Hoang-Nam Nguyen <hnguyen at de.ibm.com>
---


 ehca_classes.h |   18 +++--
 ehca_eq.c      |    1 
 ehca_irq.c     |  200 ++++++++++++++++++++++++++++++++++++---------------------
 ehca_irq.h     |    1 
 ehca_main.c    |   24 +++++-
 ipz_pt_fn.h    |    9 ++
 6 files changed, 172 insertions(+), 81 deletions(-)


diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_classes.h infiniband_work/drivers/infiniband/hw/ehca/ehca_classes.h
--- infiniband_orig/drivers/infiniband/hw/ehca/ehca_classes.h   2007-02-11 21:31:06.000000000 +0100
+++ infiniband_work/drivers/infiniband/hw/ehca/ehca_classes.h   2007-02-14 12:53:41.000000000 +0100
@@ -42,8 +42,6 @@
 #ifndef __EHCA_CLASSES_H__
 #define __EHCA_CLASSES_H__

-#include "ehca_classes.h"
-#include "ipz_pt_fn.h"

 struct ehca_module;
 struct ehca_qp;
@@ -54,14 +52,22 @@ struct ehca_mw;
 struct ehca_pd;
 struct ehca_av;

+#include <rdma/ib_verbs.h>
+#include <rdma/ib_user_verbs.h>
+
 #ifdef CONFIG_PPC64
 #include "ehca_classes_pSeries.h"
 #endif
+#include "ipz_pt_fn.h"
+#include "ehca_qes.h"
+#include "ehca_irq.h"

-#include <rdma/ib_verbs.h>
-#include <rdma/ib_user_verbs.h>
+#define EHCA_EQE_CACHE_SIZE 20

-#include "ehca_irq.h"
+struct ehca_eqe_cache_entry {
+       struct ehca_eqe *eqe;
+       struct ehca_cq *cq;
+};

 struct ehca_eq {
        u32 length;
@@ -74,6 +80,8 @@ struct ehca_eq {
        spinlock_t spinlock;
        struct tasklet_struct interrupt_task;
        u32 ist;
+       spinlock_t irq_spinlock;
+       struct ehca_eqe_cache_entry eqe_cache[EHCA_EQE_CACHE_SIZE];
 };

 struct ehca_sport {
diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_eq.c infiniband_work/drivers/infiniband/hw/ehca/ehca_eq.c
--- infiniband_orig/drivers/infiniband/hw/ehca/ehca_eq.c        2007-02-11 21:31:06.000000000 +0100
+++ infiniband_work/drivers/infiniband/hw/ehca/ehca_eq.c        2007-02-14 12:53:40.000000000 +0100
@@ -61,6 +61,7 @@ int ehca_create_eq(struct ehca_shca *shc
        struct ib_device *ib_dev = &shca->ib_device;

        spin_lock_init(&eq->spinlock);
+       spin_lock_init(&eq->irq_spinlock);
        eq->is_initialized = 0;

        if (type != EHCA_EQ && type != EHCA_NEQ) {
diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_irq.c infiniband_work/drivers/infiniband/hw/ehca/ehca_irq.c
--- infiniband_orig/drivers/infiniband/hw/ehca/ehca_irq.c       2007-02-11 21:36:12.000000000 +0100
+++ infiniband_work/drivers/infiniband/hw/ehca/ehca_irq.c       2007-02-14 13:07:54.000000000 +0100
@@ -401,87 +400,143 @@ irqreturn_t ehca_interrupt_eq(int irq, v
        return IRQ_HANDLED;
 }

-void ehca_tasklet_eq(unsigned long data)
-{
-       struct ehca_shca *shca = (struct ehca_shca*)data;
-       struct ehca_eqe *eqe;
-       int int_state;
-       int query_cnt = 0;

-       do {
-               eqe = (struct ehca_eqe *)ehca_poll_eq(shca, &shca->eq);
+static inline void process_eqe(struct ehca_shca *shca, struct ehca_eqe *eqe)
+{
+       u64 eqe_value;
+       u32 token;
+       unsigned long flags;
+       struct ehca_cq *cq;
+       eqe_value = eqe->entry;
+       ehca_dbg(&shca->ib_device, "eqe_value=%lx", eqe_value);
+       if (EHCA_BMASK_GET(EQE_COMPLETION_EVENT, eqe_value)) {
+               ehca_dbg(&shca->ib_device, "... completion event");
+               token = EHCA_BMASK_GET(EQE_CQ_TOKEN, eqe_value);
+               spin_lock_irqsave(&ehca_cq_idr_lock, flags);
+               cq = idr_find(&ehca_cq_idr, token);
+               if (cq == NULL) {
+                       spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
+                       ehca_err(&shca->ib_device,
+                                "Invalid eqe for non-existing cq token=%x",
+                                token);
+                       return;
+               }
+               reset_eq_pending(cq);
+#ifdef CONFIG_INFINIBAND_EHCA_SCALING
+               queue_comp_task(cq);
+               spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
+#else
+               spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
+               comp_event_callback(cq);
+#endif
+       } else {
+               ehca_dbg(&shca->ib_device,
+                        "Got non completion event");
+               parse_identifier(shca, eqe_value);
+       }
+}

-               if ((shca->hw_level >= 2) && eqe)
-                       int_state = 1;
-               else
-                       int_state = 0;
+void ehca_process_eq(struct ehca_shca *shca, int is_irq)
+{
+       struct ehca_eq *eq = &shca->eq;
+       struct ehca_eqe_cache_entry *eqe_cache = eq->eqe_cache;
+       u64 eqe_value;
+       unsigned long flags;
+       int eqe_cnt, i;
+       int eq_empty = 0;

-               while ((int_state == 1) || eqe) {
-                       while (eqe) {
-                               u64 eqe_value = eqe->entry;
-
-                               ehca_dbg(&shca->ib_device,
-                                        "eqe_value=%lx", eqe_value);
-
-                               /* TODO: better structure */
-                               if (EHCA_BMASK_GET(EQE_COMPLETION_EVENT,
-                                                  eqe_value)) {
-                                       unsigned long flags;
-                                       u32 token;
-                                       struct ehca_cq *cq;
-
-                                       ehca_dbg(&shca->ib_device,
-                                                "... completion event");
-                                       token =
-                                               EHCA_BMASK_GET(EQE_CQ_TOKEN,
-                                                              eqe_value);
-                                       spin_lock_irqsave(&ehca_cq_idr_lock,
-                                                         flags);
-                                       cq = idr_find(&ehca_cq_idr, token);
-
-                                       if (cq == NULL) {
-                                               spin_unlock_irqrestore(&ehca_cq_idr_lock,
-                                                                      flags);
-                                               break;
-                                       }
+       spin_lock_irqsave(&eq->irq_spinlock, flags);
+       if (is_irq) {
+               const int max_query_cnt = 100;
+               int query_cnt = 0;
+               int int_state = 1;
+               do {
+                       int_state = hipz_h_query_int_state(
+                               shca->ipz_hca_handle, eq->ist);
+                       query_cnt++;
+                       iosync();
+               } while (int_state && query_cnt < max_query_cnt);
+               if (unlikely((query_cnt == max_query_cnt)))
+                       ehca_dbg(&shca->ib_device, "int_state=%x query_cnt=%x",
+                                int_state, query_cnt);
+       }

-                                       reset_eq_pending(cq);
+       /* read out all eqes */
+       eqe_cnt = 0;
+       do {
+               u32 token;
+               eqe_cache[eqe_cnt].eqe =
+                       (struct ehca_eqe *)ehca_poll_eq(shca, eq);
+               if (!eqe_cache[eqe_cnt].eqe)
+                       break;
+               eqe_value = eqe_cache[eqe_cnt].eqe->entry;
+               if (EHCA_BMASK_GET(EQE_COMPLETION_EVENT, eqe_value)) {
+                       token = EHCA_BMASK_GET(EQE_CQ_TOKEN, eqe_value);
+                       spin_lock(&ehca_cq_idr_lock);
+                       eqe_cache[eqe_cnt].cq = idr_find(&ehca_cq_idr, token);
+                       if (!eqe_cache[eqe_cnt].cq) {
+                               spin_unlock(&ehca_cq_idr_lock);
+                               ehca_err(&shca->ib_device,
+                                        "Invalid eqe for non-existing cq "
+                                        "token=%x", token);
+                               continue;
+                       }
+                       spin_unlock(&ehca_cq_idr_lock);
+               } else
+                       eqe_cache[eqe_cnt].cq = NULL;
+               eqe_cnt++;
+       } while (eqe_cnt < EHCA_EQE_CACHE_SIZE);
+       if (!eqe_cnt) {
+               if (is_irq)
+                       ehca_dbg(&shca->ib_device,
+                                "No eqe found for irq event");
+               goto unlock_irq_spinlock;
+       } else if (!is_irq)
+               ehca_dbg(&shca->ib_device, "deadman found %x eqe", eqe_cnt);
+       if (unlikely(eqe_cnt == EHCA_EQE_CACHE_SIZE))
+               ehca_dbg(&shca->ib_device, "too many eqes for one irq event");
+       /* enable irq for new packets */
+       for (i = 0; i < eqe_cnt; i++) {
+               if (eq->eqe_cache[i].cq)
+                       reset_eq_pending(eq->eqe_cache[i].cq);
+       }
+       /* check eq */
+       spin_lock(&eq->spinlock);
+       eq_empty = (!ipz_eqit_eq_peek_valid(&shca->eq.ipz_queue));
+       spin_unlock(&eq->spinlock);
+       /* call completion handler for cached eqes */
+       for (i = 0; i < eqe_cnt; i++)
+               if (eq->eqe_cache[i].cq) {
 #ifdef CONFIG_INFINIBAND_EHCA_SCALING
-                                       queue_comp_task(cq);
-                                       spin_unlock_irqrestore(&ehca_cq_idr_lock,
-                                                              flags);
+                       spin_lock(&ehca_cq_idr_lock);
+                       queue_comp_task(eq->eqe_cache[i].cq);
+                       spin_unlock(&ehca_cq_idr_lock);
 #else
-                                       spin_unlock_irqrestore(&ehca_cq_idr_lock,
-                                                              flags);
-                                       comp_event_callback(cq);
+                       comp_event_callback(eq->eqe_cache[i].cq);
 #endif
-                               } else {
-                                       ehca_dbg(&shca->ib_device,
-                                                "... non completion event");
-                                       parse_identifier(shca, eqe_value);
-                               }
-                               eqe =
-                                       (struct ehca_eqe *)ehca_poll_eq(shca,
-                                                                   &shca->eq);
-                       }
-
-
-                       if (shca->hw_level >= 2) {
-                               int_state =
-                                   hipz_h_query_int_state(shca->ipz_hca_handle,
-                                                          shca->eq.ist);
-                               query_cnt++;
-                               iosync();
-                               if (query_cnt >= 100) {
-                                       query_cnt = 0;
-                                       int_state = 0;
-                               }
-                       }
-                       eqe = (struct ehca_eqe *)ehca_poll_eq(shca, &shca->eq);
-
+               } else {
+                       ehca_dbg(&shca->ib_device, "Got non completion event");
+                       parse_identifier(shca, eq->eqe_cache[i].eqe->entry);
                }
-       } while (int_state != 0);
+       /* poll eq if not empty */
+       if (eq_empty)
+               goto unlock_irq_spinlock;
+       do {
+               struct ehca_eqe *eqe;
+               eqe = (struct ehca_eqe *)ehca_poll_eq(shca, &shca->eq);
+               if (!eqe)
+                       break;
+               process_eqe(shca, eqe);
+               eqe_cnt++;
+       } while (1);

-       return;
+ unlock_irq_spinlock:
+       spin_unlock_irqrestore(&eq->irq_spinlock, flags);
+}
+
+void ehca_tasklet_eq(unsigned long data)
+{
+       ehca_process_eq((struct ehca_shca*)data, 1);
 }

 #ifdef CONFIG_INFINIBAND_EHCA_SCALING
diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_irq.h infiniband_work/drivers/infiniband/hw/ehca/ehca_irq.h
--- infiniband_orig/drivers/infiniband/hw/ehca/ehca_irq.h       2007-02-11 21:31:06.000000000 +0100
+++ infiniband_work/drivers/infiniband/hw/ehca/ehca_irq.h       2007-02-14 12:53:40.000000000 +0100
@@ -56,6 +56,7 @@ void ehca_tasklet_neq(unsigned long data

 irqreturn_t ehca_interrupt_eq(int irq, void *dev_id);
 void ehca_tasklet_eq(unsigned long data);
+void ehca_process_eq(struct ehca_shca *shca, int is_irq);

 struct ehca_cpu_comp_task {
        wait_queue_head_t wait_queue;
diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_main.c infiniband_work/drivers/infiniband/hw/ehca/ehca_main.c
--- infiniband_orig/drivers/infiniband/hw/ehca/ehca_main.c      2007-02-11 21:31:06.000000000 +0100
+++ infiniband_work/drivers/infiniband/hw/ehca/ehca_main.c      2007-02-14 12:53:41.000000000 +0100
@@ -52,7 +52,7 @@
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_AUTHOR("Christoph Raisch <raisch at de.ibm.com>");
 MODULE_DESCRIPTION("IBM eServer HCA InfiniBand Device Driver");
-MODULE_VERSION("SVNEHCA_0020");
+MODULE_VERSION("SVNEHCA_0021");

 int ehca_open_aqp1     = 0;
 int ehca_debug_level   = 0;
@@ -778,8 +777,24 @@ void ehca_poll_eqs(unsigned long data)

        spin_lock(&shca_list_lock);
        list_for_each_entry(shca, &shca_list, shca_list) {
-               if (shca->eq.is_initialized)
-                       ehca_tasklet_eq((unsigned long)(void*)shca);
+               if (shca->eq.is_initialized) {
+                       /* call deadman proc only if eq ptr does not change */
+                       struct ehca_eq *eq = &shca->eq;
+                       int max = 3;
+                       volatile u64 q_ofs, q_ofs2;
+                       unsigned long flags;
+                       spin_lock_irqsave(&eq->spinlock, flags);
+                       q_ofs = eq->ipz_queue.current_q_offset;
+                       spin_unlock_irqrestore(&eq->spinlock, flags);
+                       do {
+                               spin_lock_irqsave(&eq->spinlock, flags);
+                               q_ofs2 = eq->ipz_queue.current_q_offset;
+                               spin_unlock_irqrestore(&eq->spinlock, flags);
+                               max--;
+                       } while (q_ofs == q_ofs2 && max > 0);
+                       if (q_ofs == q_ofs2)
+                               ehca_process_eq(shca, 0);
+               }
        }
        mod_timer(&poll_eqs_timer, jiffies + HZ);
        spin_unlock(&shca_list_lock);
@@ -790,7 +805,7 @@ int __init ehca_module_init(void)
        int ret;

        printk(KERN_INFO "eHCA Infiniband Device Driver "
-                        "(Rel.: SVNEHCA_0020)\n");
+                        "(Rel.: SVNEHCA_0021)\n");
        idr_init(&ehca_qp_idr);
        idr_init(&ehca_cq_idr);
        spin_lock_init(&ehca_qp_idr_lock);
diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ipz_pt_fn.h infiniband_work/drivers/infiniband/hw/ehca/ipz_pt_fn.h
--- infiniband_orig/drivers/infiniband/hw/ehca/ipz_pt_fn.h      2007-02-11 21:31:06.000000000 +0100
+++ infiniband_work/drivers/infiniband/hw/ehca/ipz_pt_fn.h      2007-02-14 12:53:40.000000000 +0100
@@ -247,6 +247,15 @@ static inline void *ipz_eqit_eq_get_inc_
        return ret;
 }

+static inline void *ipz_eqit_eq_peek_valid(struct ipz_queue *queue)
+{
+       void *ret = ipz_qeit_get(queue);
+       u32 qe = *(u8 *) ret;
+       if ((qe >> 7) != (queue->toggle_state & 1))
+               return NULL;
+       return ret;
+}
+
 /* returns address (GX) of first queue entry */
 static inline u64 ipz_qpt_get_firstpage(struct ipz_qpt *qpt)
 {
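
For readers unfamiliar with the toggle-bit check in
ipz_eqit_eq_peek_valid() above: an entry counts as new only while bit 7
of its first byte matches the consumer's current toggle state, which lets
the consumer detect fresh entries without clearing them. Below is a
minimal user-space sketch of that idea, not the driver's code. The names
struct ring, ring_peek_valid() and ring_get_inc_valid() are hypothetical,
and the rule that the expected bit flips on every wrap-around is an
assumption of this sketch rather than something shown in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_ENTRIES 4

struct ring {
	uint8_t entry[RING_ENTRIES];	/* bit 7 of each entry = valid bit */
	unsigned int head;		/* consumer position */
	unsigned int toggle;		/* value bit 7 must have to be new */
};

/* Peek: return the entry at head if it is valid, without consuming it. */
static const uint8_t *ring_peek_valid(const struct ring *r)
{
	const uint8_t *e;

	assert(r->head < RING_ENTRIES);
	e = &r->entry[r->head];
	if ((*e >> 7) != (r->toggle & 1))
		return NULL;	/* stale entry from the previous lap */
	return e;
}

/* Consume: like peek, but advance head and flip the expected bit
 * whenever the consumer wraps around (assumed behaviour). */
static const uint8_t *ring_get_inc_valid(struct ring *r)
{
	const uint8_t *e = ring_peek_valid(r);

	if (!e)
		return NULL;
	if (++r->head == RING_ENTRIES) {
		r->head = 0;
		r->toggle ^= 1;
	}
	return e;
}
```

The peek variant is what allows ehca_process_eq() to ask "is the queue
empty?" after draining its cache without pulling another entry off the
queue.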


From hnguyen at linux.vnet.ibm.com  Wed Feb 14 08:41:03 2007
From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen)
Date: Wed, 14 Feb 2007 17:41:03 +0100
Subject: [openib-general] [PATCH 2.6.21-rc1 2/5] ehca: fix race
 condition/locking issues in scaling code
Message-ID: <200702141741.03964.hnguyen@linux.vnet.ibm.com>

Hi,
this patch fixes a race condition in find_next_online_cpu() and some
other locking issues in the scaling code.
Thanks
Nam


Signed-off-by: Hoang-Nam Nguyen <hnguyen at de.ibm.com>
---


 ehca_irq.c |   68 +++++++++++++++++++++++++++++--------------------------------
 1 files changed, 33 insertions(+), 35 deletions(-)


diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_irq.c infiniband_work/drivers/infiniband/hw/ehca/ehca_irq.c
--- infiniband_orig/drivers/infiniband/hw/ehca/ehca_irq.c       2007-02-14 14:16:45.000000000 +0100
+++ infiniband_work/drivers/infiniband/hw/ehca/ehca_irq.c       2007-02-14 14:16:35.000000000 +0100
@@ -544,28 +544,30 @@ void ehca_tasklet_eq(unsigned long data)

 static inline int find_next_online_cpu(struct ehca_comp_pool* pool)
 {
-       unsigned long flags_last_cpu;
+       int cpu;
+       unsigned long flags;

+       WARN_ON_ONCE(!in_interrupt());
        if (ehca_debug_level)
                ehca_dmp(&cpu_online_map, sizeof(cpumask_t), "");

-       spin_lock_irqsave(&pool->last_cpu_lock, flags_last_cpu);
-       pool->last_cpu = next_cpu(pool->last_cpu, cpu_online_map);
-       if (pool->last_cpu == NR_CPUS)
-               pool->last_cpu = first_cpu(cpu_online_map);
-       spin_unlock_irqrestore(&pool->last_cpu_lock, flags_last_cpu);
+       spin_lock_irqsave(&pool->last_cpu_lock, flags);
+       cpu = next_cpu(pool->last_cpu, cpu_online_map);
+       if (cpu == NR_CPUS)
+               cpu = first_cpu(cpu_online_map);
+       pool->last_cpu = cpu;
+       spin_unlock_irqrestore(&pool->last_cpu_lock, flags);

-       return pool->last_cpu;
+       return cpu;
 }

 static void __queue_comp_task(struct ehca_cq *__cq,
                              struct ehca_cpu_comp_task *cct)
 {
-       unsigned long flags_cct;
-       unsigned long flags_cq;
+       unsigned long flags;

-       spin_lock_irqsave(&cct->task_lock, flags_cct);
-       spin_lock_irqsave(&__cq->task_lock, flags_cq);
+       spin_lock_irqsave(&cct->task_lock, flags);
+       spin_lock(&__cq->task_lock);

        if (__cq->nr_callbacks == 0) {
                __cq->nr_callbacks++;
@@ -576,8 +578,8 @@ static void __queue_comp_task(struct ehc
        else
                __cq->nr_callbacks++;

-       spin_unlock_irqrestore(&__cq->task_lock, flags_cq);
-       spin_unlock_irqrestore(&cct->task_lock, flags_cct);
+       spin_unlock(&__cq->task_lock);
+       spin_unlock_irqrestore(&cct->task_lock, flags);
 }

 static void queue_comp_task(struct ehca_cq *__cq)
@@ -588,69 +590,69 @@ static void queue_comp_task(struct ehca_

        cpu = get_cpu();
        cpu_id = find_next_online_cpu(pool);
-
        BUG_ON(!cpu_online(cpu_id));

        cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu_id);
+       BUG_ON(!cct);

        if (cct->cq_jobs > 0) {
                cpu_id = find_next_online_cpu(pool);
                cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu_id);
+               BUG_ON(!cct);
        }

        __queue_comp_task(__cq, cct);
-
-       put_cpu();
-
-       return;
 }

 static void run_comp_task(struct ehca_cpu_comp_task* cct)
 {
        struct ehca_cq *cq;
-       unsigned long flags_cct;
-       unsigned long flags_cq;
+       unsigned long flags;

-       spin_lock_irqsave(&cct->task_lock, flags_cct);
+       spin_lock_irqsave(&cct->task_lock, flags);

        while (!list_empty(&cct->cq_list)) {
                cq = list_entry(cct->cq_list.next, struct ehca_cq, entry);
-               spin_unlock_irqrestore(&cct->task_lock, flags_cct);
+               spin_unlock_irqrestore(&cct->task_lock, flags);
                comp_event_callback(cq);
-               spin_lock_irqsave(&cct->task_lock, flags_cct);
+               spin_lock_irqsave(&cct->task_lock, flags);

-               spin_lock_irqsave(&cq->task_lock, flags_cq);
+               spin_lock(&cq->task_lock);
                cq->nr_callbacks--;
                if (cq->nr_callbacks == 0) {
                        list_del_init(cct->cq_list.next);
                        cct->cq_jobs--;
                }
-               spin_unlock_irqrestore(&cq->task_lock, flags_cq);
-
+               spin_unlock(&cq->task_lock);
        }

-       spin_unlock_irqrestore(&cct->task_lock, flags_cct);
-
-       return;
+       spin_unlock_irqrestore(&cct->task_lock, flags);
 }

 static int comp_task(void *__cct)
 {
        struct ehca_cpu_comp_task* cct = __cct;
+       int cql_empty;
        DECLARE_WAITQUEUE(wait, current);

        set_current_state(TASK_INTERRUPTIBLE);
        while(!kthread_should_stop()) {
                add_wait_queue(&cct->wait_queue, &wait);

-               if (list_empty(&cct->cq_list))
+               spin_lock_irq(&cct->task_lock);
+               cql_empty = list_empty(&cct->cq_list);
+               spin_unlock_irq(&cct->task_lock);
+               if (cql_empty)
                        schedule();
                else
                        __set_current_state(TASK_RUNNING);

                remove_wait_queue(&cct->wait_queue, &wait);

-               if (!list_empty(&cct->cq_list))
+               spin_lock_irq(&cct->task_lock);
+               cql_empty = list_empty(&cct->cq_list);
+               spin_unlock_irq(&cct->task_lock);
+               if (!cql_empty)
                        run_comp_task(__cct);

                set_current_state(TASK_INTERRUPTIBLE);
@@ -693,8 +695,6 @@ static void destroy_comp_task(struct ehc

        if (task)
                kthread_stop(task);
-
-       return;
 }

 static void take_over_work(struct ehca_comp_pool *pool,
@@ -815,6 +815,4 @@ void ehca_destroy_comp_pool(void)
        free_percpu(pool->cpu_comp_tasks);
        kfree(pool);
 #endif
-
-       return;
 }
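
The find_next_online_cpu() hunk above fixes a classic pattern: the old
code updated pool->last_cpu under the lock but returned it after dropping
the lock, so a concurrent caller could change the value in between. The
following is a minimal user-space sketch of the corrected pattern, not
the kernel code. A C11 atomic_flag spin lock stands in for
pool->last_cpu_lock, a plain bitmask stands in for cpu_online_map, and
all names (struct pool, next_online(), find_next_online_cpu_sketch()) are
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

#define NCPUS 4

struct pool {
	atomic_flag lock;		/* stand-in for last_cpu_lock */
	int last_cpu;			/* shared round-robin cursor */
	unsigned int online_mask;	/* bit n set = CPU n is online */
};

static void pool_lock(struct pool *p)
{
	while (atomic_flag_test_and_set_explicit(&p->lock,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void pool_unlock(struct pool *p)
{
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Next online CPU strictly after 'cpu', or NCPUS if there is none. */
static int next_online(int cpu, unsigned int mask)
{
	for (cpu++; cpu < NCPUS; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return NCPUS;
}

static int find_next_online_cpu_sketch(struct pool *p)
{
	int cpu;

	assert(p->online_mask != 0);	/* at least one CPU online */
	pool_lock(p);
	cpu = next_online(p->last_cpu, p->online_mask);
	if (cpu == NCPUS)
		cpu = next_online(-1, p->online_mask);	/* wrap to first */
	p->last_cpu = cpu;
	pool_unlock(p);

	/* Return the local copy taken under the lock, never a re-read
	 * of p->last_cpu, which another caller may already have moved. */
	return cpu;
}
```

With CPUs 0, 1 and 3 online and the cursor starting at 0, successive
calls in this sketch would yield 1, 3, 0, 1.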


From hnguyen at linux.vnet.ibm.com  Wed Feb 14 08:41:21 2007
From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen)
Date: Wed, 14 Feb 2007 17:41:21 +0100
Subject: [openib-general] [PATCH 2.6.21-rc1 3/5] ehca: allow en/disabling
 scaling code via module parameter
Message-ID: <200702141741.21722.hnguyen@linux.vnet.ibm.com>

Hi,
here is a patch for ehca that allows users to enable or disable the
scaling code when loading the ib_ehca module.
Thanks
Nam
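
In outline, the change replaces a compile-time #ifdef with a run-time
branch on an integer flag, so one binary can take either path. The
following is a minimal user-space sketch of that dispatch, not the driver
code. scaling_enabled stands in for ehca_scaling_code, and queue_work(),
call_handler() and the two counters are hypothetical stand-ins for
queue_comp_task() and comp_event_callback().

```c
#include <assert.h>

/* Stand-in for the ehca_scaling_code flag; 1 = scaling path. */
static int scaling_enabled = 1;

/* Counters that only exist to make the chosen path observable. */
static int queued;
static int called_directly;

/* Scaling path: hand the completion to a per-CPU worker (modelled
 * here as a counter bump). */
static void queue_work(int cq)
{
	assert(cq >= 0);
	queued++;
}

/* Non-scaling path: run the completion handler in place. */
static void call_handler(int cq)
{
	assert(cq >= 0);
	called_directly++;
}

/* One branch at run time replaces the former #ifdef/#else/#endif. */
static void dispatch_completion(int cq)
{
	if (scaling_enabled)
		queue_work(cq);
	else
		call_handler(cq);
}
```

The cost is one predictable branch per completion event; the gain is that
the choice no longer requires rebuilding the kernel.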


Signed-off-by: Hoang-Nam Nguyen <hnguyen at de.ibm.com>
---


 Kconfig        |    8 --------
 ehca_classes.h |    1 +
 ehca_irq.c     |   47 +++++++++++++++++++++--------------------------
 ehca_main.c    |    4 ++++
 4 files changed, 26 insertions(+), 34 deletions(-)


diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/Kconfig infiniband_work/drivers/infiniband/hw/ehca/Kconfig
--- infiniband_orig/drivers/infiniband/hw/ehca/Kconfig  2007-02-14 14:18:16.000000000 +0100
+++ infiniband_work/drivers/infiniband/hw/ehca/Kconfig  2007-02-14 14:20:52.000000000 +0100
@@ -7,11 +7,3 @@ config INFINIBAND_EHCA
        To compile the driver as a module, choose M here. The module
        will be called ib_ehca.

-config INFINIBAND_EHCA_SCALING
-       bool "Scaling support (EXPERIMENTAL)"
-       depends on IBMEBUS && INFINIBAND_EHCA && HOTPLUG_CPU && EXPERIMENTAL
-       default y
-       ---help---
-       eHCA scaling support schedules the CQ callbacks to different CPUs.
-
-       To enable this feature choose Y here.
diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_classes.h infiniband_work/drivers/infiniband/hw/ehca/ehca_classes.h
--- infiniband_orig/drivers/infiniband/hw/ehca/ehca_classes.h   2007-02-14 14:18:16.000000000 +0100
+++ infiniband_work/drivers/infiniband/hw/ehca/ehca_classes.h   2007-02-14 14:20:17.000000000 +0100
@@ -277,6 +277,7 @@ extern struct idr ehca_cq_idr;
 extern int ehca_static_rate;
 extern int ehca_port_act_time;
 extern int ehca_use_hp_mr;
+extern int ehca_scaling_code;

 struct ipzu_queue_resp {
        u32 qe_size;      /* queue entry size */
diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_irq.c infiniband_work/drivers/infiniband/hw/ehca/ehca_irq.c
--- infiniband_orig/drivers/infiniband/hw/ehca/ehca_irq.c       2007-02-14 14:18:16.000000000 +0100
+++ infiniband_work/drivers/infiniband/hw/ehca/ehca_irq.c       2007-02-14 14:20:17.000000000 +0100
@@ -63,15 +63,11 @@
 #define ERROR_DATA_LENGTH      EHCA_BMASK_IBM(52,63)
 #define ERROR_DATA_TYPE        EHCA_BMASK_IBM(0,7)

-#ifdef CONFIG_INFINIBAND_EHCA_SCALING
-
 static void queue_comp_task(struct ehca_cq *__cq);

 static struct ehca_comp_pool* pool;
 static struct notifier_block comp_pool_callback_nb;

-#endif
-
 static inline void comp_event_callback(struct ehca_cq *cq)
 {
        if (!cq->ib_cq.comp_handler)
@@ -423,13 +419,13 @@ static inline void process_eqe(struct eh
                        return;
                }
                reset_eq_pending(cq);
-#ifdef CONFIG_INFINIBAND_EHCA_SCALING
-               queue_comp_task(cq);
-               spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
-#else
-               spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
-               comp_event_callback(cq);
-#endif
+               if (ehca_scaling_code) {
+                       queue_comp_task(cq);
+                       spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
+               } else {
+                       spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
+                       comp_event_callback(cq);
+               }
        } else {
                ehca_dbg(&shca->ib_device,
                         "Got non completion event");
@@ -508,13 +504,12 @@ void ehca_process_eq(struct ehca_shca *s
        /* call completion handler for cached eqes */
        for (i = 0; i < eqe_cnt; i++)
                if (eq->eqe_cache[i].cq) {
-#ifdef CONFIG_INFINIBAND_EHCA_SCALING
-                       spin_lock(&ehca_cq_idr_lock);
-                       queue_comp_task(eq->eqe_cache[i].cq);
-                       spin_unlock(&ehca_cq_idr_lock);
-#else
-                       comp_event_callback(eq->eqe_cache[i].cq);
-#endif
+                       if (ehca_scaling_code) {
+                               spin_lock(&ehca_cq_idr_lock);
+                               queue_comp_task(eq->eqe_cache[i].cq);
+                               spin_unlock(&ehca_cq_idr_lock);
+                       } else
+                               comp_event_callback(eq->eqe_cache[i].cq);
                } else {
                        ehca_dbg(&shca->ib_device, "Got non completion event");
                        parse_identifier(shca, eq->eqe_cache[i].eqe->entry);
@@ -540,8 +535,6 @@ void ehca_tasklet_eq(unsigned long data)
        ehca_process_eq((struct ehca_shca*)data, 1);
 }

-#ifdef CONFIG_INFINIBAND_EHCA_SCALING
-
 static inline int find_next_online_cpu(struct ehca_comp_pool* pool)
 {
        int cpu;
@@ -764,14 +757,14 @@ static int comp_pool_callback(struct not
        return NOTIFY_OK;
 }

-#endif
-
 int ehca_create_comp_pool(void)
 {
-#ifdef CONFIG_INFINIBAND_EHCA_SCALING
        int cpu;
        struct task_struct *task;

+       if (!ehca_scaling_code)
+               return 0;
+
        pool = kzalloc(sizeof(struct ehca_comp_pool), GFP_KERNEL);
        if (pool == NULL)
                return -ENOMEM;
@@ -796,16 +789,19 @@ int ehca_create_comp_pool(void)
        comp_pool_callback_nb.notifier_call = comp_pool_callback;
        comp_pool_callback_nb.priority =0;
        register_cpu_notifier(&comp_pool_callback_nb);
-#endif
+
+       printk(KERN_INFO "eHCA scaling code enabled\n");

        return 0;
 }

 void ehca_destroy_comp_pool(void)
 {
-#ifdef CONFIG_INFINIBAND_EHCA_SCALING
        int i;

+       if (!ehca_scaling_code)
+               return;
+
        unregister_cpu_notifier(&comp_pool_callback_nb);

        for (i = 0; i < NR_CPUS; i++) {
@@ -814,5 +810,4 @@ void ehca_destroy_comp_pool(void)
        }
        free_percpu(pool->cpu_comp_tasks);
        kfree(pool);
-#endif
 }
diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_main.c infiniband_work/drivers/infiniband/hw/ehca/ehca_main.c
--- infiniband_orig/drivers/infiniband/hw/ehca/ehca_main.c      2007-02-14 14:18:16.000000000 +0100
+++ infiniband_work/drivers/infiniband/hw/ehca/ehca_main.c      2007-02-14 14:20:17.000000000 +0100
@@ -62,6 +62,7 @@ int ehca_use_hp_mr     = 0;
 int ehca_port_act_time = 30;
 int ehca_poll_all_eqs  = 1;
 int ehca_static_rate   = -1;
+int ehca_scaling_code  = 1;

 module_param_named(open_aqp1,     ehca_open_aqp1,     int, 0);
 module_param_named(debug_level,   ehca_debug_level,   int, 0);
@@ -71,6 +72,7 @@ module_param_named(use_hp_mr,     ehca_u
 module_param_named(port_act_time, ehca_port_act_time, int, 0);
 module_param_named(poll_all_eqs,  ehca_poll_all_eqs,  int, 0);
 module_param_named(static_rate,   ehca_static_rate,   int, 0);
+module_param_named(scaling_code,   ehca_scaling_code,   int, 0);

 MODULE_PARM_DESC(open_aqp1,
                 "AQP1 on startup (0: no (default), 1: yes)");
@@ -91,6 +93,8 @@ MODULE_PARM_DESC(poll_all_eqs,
                 " (0: no, 1: yes (default))");
 MODULE_PARM_DESC(static_rate,
                 "set permanent static rate (default: disabled)");
+MODULE_PARM_DESC(scaling_code,
+                "set scaling code (0: disabled, 1: enabled/default)");

 spinlock_t ehca_qp_idr_lock;
 spinlock_t ehca_cq_idr_lock;
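
A minimal userspace sketch of what this patch changes, assuming nothing beyond what the diff shows: the CONFIG_INFINIBAND_EHCA_SCALING compile-time switch becomes a run-time flag, so one binary carries both the queued and the direct completion path. The names scaling_code, queue_task() and direct_callback() are illustrative stand-ins, not the driver's symbols.

```c
#include <assert.h>

/* Hypothetical stand-in for the ehca_scaling_code module parameter. */
static int scaling_code = 1;

static int queued;	/* events handed to a worker (deferred path) */
static int direct;	/* events handled inline (direct path) */

static void queue_task(void)
{
	queued++;
}

static void direct_callback(void)
{
	direct++;
}

/* Both paths are compiled in; the flag picks one at run time. */
static void dispatch_event(void)
{
	if (scaling_code)
		queue_task();
	else
		direct_callback();
}

/* Usage: flip the flag between two events and check which path ran. */
static void dispatch_demo(void)
{
	scaling_code = 1;
	dispatch_event();
	assert(queued == 1 && direct == 0);

	scaling_code = 0;
	dispatch_event();
	assert(queued == 1 && direct == 1);
}
```

In the patch itself the flag is set through the scaling_code module parameter (0: disabled, 1: enabled/default), as the MODULE_PARM_DESC text states.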


From hnguyen at linux.vnet.ibm.com  Wed Feb 14 08:41:35 2007
From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen)
Date: Wed, 14 Feb 2007 17:41:35 +0100
Subject: [openib-general] [PATCH 2.6.21-rc1 4/5] ehca: replace yield() by
 wait_for_completion()
Message-ID: <200702141741.35444.hnguyen@linux.vnet.ibm.com>

Hi,
this patch replaces yield() with wait_for_completion() so that
ehca_destroy_cq() waits for running completion handlers to finish before
destroying the associated completion queue.
Thanks
Nam


Signed-off-by: Hoang-Nam Nguyen <hnguyen at de.ibm.com>
---


 ehca_classes.h |    3 +++
 ehca_cq.c      |    3 ++-
 ehca_irq.c     |    6 +++++-
 3 files changed, 10 insertions(+), 2 deletions(-)


diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_classes.h infiniband_work/drivers/infiniband/hw/ehca/ehca_classes.h
--- infiniband_orig/drivers/infiniband/hw/ehca/ehca_classes.h   2007-02-14 13:52:49.000000000 +0100
+++ infiniband_work/drivers/infiniband/hw/ehca/ehca_classes.h   2007-02-14 13:52:06.000000000 +0100
@@ -52,6 +52,8 @@ struct ehca_mw;
 struct ehca_pd;
 struct ehca_av;

+#include <linux/completion.h>
+
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_user_verbs.h>

@@ -154,6 +156,7 @@ struct ehca_cq {
        struct hlist_head qp_hashtab[QP_HASHTAB_LEN];
        struct list_head entry;
        u32 nr_callbacks;
+       struct completion zero_callbacks;
        spinlock_t task_lock;
        u32 ownpid;
        /* mmap counter for resources mapped into user space */
diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_cq.c infiniband_work/drivers/infiniband/hw/ehca/ehca_cq.c
--- infiniband_orig/drivers/infiniband/hw/ehca/ehca_cq.c        2007-02-14 13:52:49.000000000 +0100
+++ infiniband_work/drivers/infiniband/hw/ehca/ehca_cq.c        2007-02-14 13:52:06.000000000 +0100
@@ -147,6 +147,7 @@ struct ib_cq *ehca_create_cq(struct ib_d
        spin_lock_init(&my_cq->spinlock);
        spin_lock_init(&my_cq->cb_lock);
        spin_lock_init(&my_cq->task_lock);
+       init_completion(&my_cq->zero_callbacks);
        my_cq->ownpid = current->tgid;

        cq = &my_cq->ib_cq;
@@ -332,7 +333,7 @@ int ehca_destroy_cq(struct ib_cq *cq)
        spin_lock_irqsave(&ehca_cq_idr_lock, flags);
        while (my_cq->nr_callbacks) {
                spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
-               yield();
+               wait_for_completion(&my_cq->zero_callbacks);
                spin_lock_irqsave(&ehca_cq_idr_lock, flags);
        }

diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_irq.c infiniband_work/drivers/infiniband/hw/ehca/ehca_irq.c
--- infiniband_orig/drivers/infiniband/hw/ehca/ehca_irq.c       2007-02-14 13:52:49.000000000 +0100
+++ infiniband_work/drivers/infiniband/hw/ehca/ehca_irq.c       2007-02-14 13:52:06.000000000 +0100
@@ -605,6 +605,7 @@ static void run_comp_task(struct ehca_cp
        spin_lock_irqsave(&cct->task_lock, flags);

        while (!list_empty(&cct->cq_list)) {
+               int is_complete = 0;
                cq = list_entry(cct->cq_list.next, struct ehca_cq, entry);
                spin_unlock_irqrestore(&cct->task_lock, flags);
                comp_event_callback(cq);
@@ -612,11 +613,14 @@ static void run_comp_task(struct ehca_cp

                spin_lock(&cq->task_lock);
                cq->nr_callbacks--;
-               if (cq->nr_callbacks == 0) {
+               is_complete = (cq->nr_callbacks == 0);
+               if (is_complete) {
                        list_del_init(cct->cq_list.next);
                        cct->cq_jobs--;
                }
                spin_unlock(&cq->task_lock);
+               if (is_complete) /* wake up waiting destroy_cq() */
+                       complete(&cq->zero_callbacks);
        }

        spin_unlock_irqrestore(&cct->task_lock, flags);
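
The handshake in this patch can be walked through in a single-threaded userspace model. This is only a sketch: completion_model mimics the counting behaviour of the kernel's struct completion (complete() records an event, a waiter consumes one), and all model_* and cq_model_* names are hypothetical. Nothing blocks here; model_try_wait() returns 0 where a real waiter would sleep, and locking is left out.

```c
#include <assert.h>

/* Counting model of a completion: complete() adds one, a waiter takes one. */
struct completion_model {
	unsigned int done;
};

static void model_complete(struct completion_model *c)
{
	c->done++;
}

/* Returns 1 and consumes one event, or 0 where a real waiter would sleep. */
static int model_try_wait(struct completion_model *c)
{
	if (!c->done)
		return 0;
	c->done--;
	return 1;
}

struct cq_model {
	unsigned int nr_callbacks;
	struct completion_model zero_callbacks;
};

/* Handler side, shaped like the run_comp_task() hunk (locking omitted). */
static void cq_model_callback_done(struct cq_model *cq)
{
	int is_complete;

	cq->nr_callbacks--;
	is_complete = (cq->nr_callbacks == 0);
	if (is_complete)	/* wake up the waiting destroy side */
		model_complete(&cq->zero_callbacks);
}

/* Usage: two callbacks outstanding; destroy may proceed only after both. */
static void cq_model_demo(void)
{
	struct cq_model cq = { 2, { 0 } };
	int woke;

	woke = model_try_wait(&cq.zero_callbacks);
	assert(woke == 0);	/* destroy side would sleep */

	cq_model_callback_done(&cq);
	woke = model_try_wait(&cq.zero_callbacks);
	assert(woke == 0);	/* one callback still running */

	cq_model_callback_done(&cq);
	woke = model_try_wait(&cq.zero_callbacks);
	assert(woke == 1 && cq.nr_callbacks == 0);	/* safe to destroy */
}
```

Unlike yield(), the waiter does not keep polling: it is woken only when the handler that takes the count to zero signals it.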


From halr at voltaire.com  Wed Feb 14 08:41:01 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 14 Feb 2007 11:41:01 -0500
Subject: [openib-general] SM assigned GID addresses
In-Reply-To: <01B9E81EECACE94DBBD0A556E768FB8A013599EC@NAMAIL2.ad.lsil.com>
References: <01B9E81EECACE94DBBD0A556E768FB8A013599EC@NAMAIL2.ad.lsil.com>
Message-ID: <1171471259.22446.104290.camel@hal.voltaire.com>

Hi,

On Wed, 2007-02-14 at 11:12, Batwara, Ashish wrote:
> Hi,
> I am referring to Section 4.1.1 of the IB Spec, "GID Usage and
> Properties". Does anyone know whether the SM uses item # 3 below
> for address assignment, and which vendors support # 3?

> Can anybody point me to the appropriate driver documentation in this area?

OpenSM supports setting either the default GID prefix or a configured
GID prefix and GIDs are comprised of this prefix and the endport EUI-64
(at index 0 of GUIDInfo).

OpenSM does not currently support configuring GUIDInfo indices above 0.
I'm also not sure how the stack would deal with this either. Is this a
requirement for some reason ? If so, can you elaborate/explain ?

-- Hal

> Thanks
> Ashish
> 
> 
> GID USAGE AND PROPERTIES
> 1) Each endport shall be assigned at least one unicast GID. The first
>    unicast GID assigned shall be created using the manufacturer assigned
>    EUI-64 identifier. This GID is referred to as GID index 0 and is
>    formed by techniques 3(a) and 3(b) described below.
> 2) The default GID prefix shall be (0xFE80::0). A packet using the
>    default GID prefix and either a manufacturer assigned or SM assigned
>    EUI-64 must always be accepted by an endnode. A packet containing
>    a GRH with a destination GID with this prefix must never be
>    forwarded by a router, i.e. it is restricted to the local subnet.
> 3) A unicast GID shall be created using one or more of the following
>    mechanisms:
>    a) Concatenation of the default GID prefix with the manufacturer
>       assigned EUI-64 identifier associated with an endport. This GID
>       is referred to as the default GID.
>    b) Concatenation of a subnet manager assigned 64-bit GID prefix
>       and the manufacturer assigned EUI-64 identifier associated with
>       an endport.
>    c) Assignment of a GID by the subnet manager. The subnet manager
>       creates a GID by concatenating the GID prefix (default or
>       assigned) with a set of locally assigned EUI-64 values (at GID
>       index 1 or above).
>    Each endport must be assigned at least one unicast GID using (a).
>    Additional GIDs may be assigned using (b) and/or (c).
>    Note: A subnet shall only have one assigned GID prefix (non default)
>    at any given time.
> 
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> 
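
Mechanisms 3(a) and 3(b) quoted above both come down to concatenating a 64-bit prefix with a 64-bit EUI-64. A minimal sketch, assuming a 16-byte big-endian GID layout; struct gid128, make_gid() and the EUI-64 value are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Default GID prefix FE80::0, as given in item 2 of the quoted text. */
#define DEFAULT_GID_PREFIX 0xFE80000000000000ULL

/* A 128-bit GID held in network (big-endian) byte order. */
struct gid128 {
	uint8_t raw[16];
};

/* Store a 64-bit value most significant byte first. */
static void put_be64(uint8_t *dst, uint64_t v)
{
	int i;

	for (i = 0; i < 8; i++)
		dst[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Upper 64 bits: GID prefix. Lower 64 bits: EUI-64. */
static struct gid128 make_gid(uint64_t prefix, uint64_t eui64)
{
	struct gid128 gid;

	put_be64(gid.raw, prefix);
	put_be64(gid.raw + 8, eui64);
	return gid;
}

/* Usage: build a default GID from a made-up EUI-64. */
static void gid_demo(void)
{
	struct gid128 gid = make_gid(DEFAULT_GID_PREFIX, 0x0002C90200001234ULL);

	assert(gid.raw[0] == 0xFE && gid.raw[1] == 0x80);
	assert(gid.raw[8] == 0x00 && gid.raw[15] == 0x34);
}
```

Passing a subnet manager assigned prefix instead of DEFAULT_GID_PREFIX gives case 3(b); case 3(c) would replace the EUI-64 argument with a locally assigned value.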


From hnguyen at linux.vnet.ibm.com  Wed Feb 14 08:41:44 2007
From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen)
Date: Wed, 14 Feb 2007 17:41:44 +0100
Subject: [openib-general] [PATCH 2.6.21-rc1 5/5] ehca: query_port() returns
 LINK_UP instead UNKNOWN
Message-ID: <200702141741.45135.hnguyen@linux.vnet.ibm.com>

Hi,
this patch makes ehca_query_port() report the port's physical state as
LINK_UP. On pSeries, ehca represents a logical HCA whose phys/link state
is always LINK_UP.
Thanks
Nam


Signed-off-by: Hoang-Nam Nguyen <hnguyen at de.ibm.com>
---


 ehca_hca.c |    3 +++
 1 files changed, 3 insertions(+)


diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_hca.c infiniband_work/drivers/infiniband/hw/ehca/ehca_hca.c
--- infiniband_orig/drivers/infiniband/hw/ehca/ehca_hca.c       2007-02-14 13:11:45.000000000 +0100
+++ infiniband_work/drivers/infiniband/hw/ehca/ehca_hca.c       2007-02-14 12:53:52.000000000 +0100
@@ -162,6 +162,9 @@ int ehca_query_port(struct ib_device *ib
        props->active_width    = IB_WIDTH_12X;
        props->active_speed    = 0x1;

+       /* at the moment (logical) link state is always LINK_UP */
+       props->phys_state      = 0x5;
+
 query_port1:
        ehca_free_fw_ctrlblock(rblock);
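
For readers wondering what the literal 0x5 stands for, here is a small lookup sketch of the PortPhysicalState encoding, as far as I know it from the PortInfo attribute in the IB spec. The enum and function names are made up for illustration and are not kernel identifiers.

```c
#include <assert.h>
#include <string.h>

/* PortPhysicalState values; identifiers are hypothetical. */
enum phys_state_model {
	PHYS_SLEEP = 1,
	PHYS_POLLING = 2,
	PHYS_DISABLED = 3,
	PHYS_PORT_CONFIG_TRAINING = 4,
	PHYS_LINK_UP = 5,
	PHYS_LINK_ERROR_RECOVERY = 6,
	PHYS_PHY_TEST = 7
};

/* Map a raw phys_state value to a readable name. */
static const char *phys_state_name(unsigned int state)
{
	switch (state) {
	case PHYS_SLEEP:
		return "Sleep";
	case PHYS_POLLING:
		return "Polling";
	case PHYS_DISABLED:
		return "Disabled";
	case PHYS_PORT_CONFIG_TRAINING:
		return "PortConfigurationTraining";
	case PHYS_LINK_UP:
		return "LinkUp";
	case PHYS_LINK_ERROR_RECOVERY:
		return "LinkErrorRecovery";
	case PHYS_PHY_TEST:
		return "Phy Test";
	default:
		return "Unknown";
	}
}

/* Usage: the value written by the patch decodes to LinkUp. */
static void phys_state_demo(void)
{
	assert(strcmp(phys_state_name(0x5), "LinkUp") == 0);
	assert(strcmp(phys_state_name(0), "Unknown") == 0);
}
```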


From swise at opengridcomputing.com  Wed Feb 14 08:47:05 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Wed, 14 Feb 2007 10:47:05 -0600
Subject: [openib-general] [Bug 325] RDMA_CM and address translation
 broken on sles9sp3
In-Reply-To: <45D32522.5080100@mellanox.co.il>
References: <20070214115444.E46EAE603C3@openfabrics.org>
	<1171465049.15208.13.camel@stevo-desktop>
	<45D32522.5080100@mellanox.co.il>
Message-ID: <1171471625.15208.30.camel@stevo-desktop>

On Wed, 2007-02-14 at 17:05 +0200, Tziporet Koren wrote:
> Steve Wise wrote:
> > Tziporet, 
> >
> > I didn't think we were going to apply this patch until Michael tested it
> > with SDP/IPoIB on various distros. 
> >
> > Michael, did you get a chance to test it (I'm guessing not since you
> > were out sick)?  
> >
> > The reason I'm concerned is that it changes the behavior of
> > xxx_ip_dev_find() and _all_ backports, and we needed to test it out and
> > make sure it doesn't regress anything.  If it causes problems on other
> > backports, the plan was to just fix the sles9sp3 backport and leave the
> > others alone. 
> >
> > With the test build vlad published yesterday which has this patch,
> > rhel4u4 kernel wasn't working for me with iWARP and I'm afraid it might
> > be due to this patch.  I'm investigating this now.
> >
> >   
> >
> We tested this patch with our regression on IB and it worked fine for
> both SDP and IPoIB.
> Then we applied it.
> Please report ASAP if you think there is an issue.
> 

I undid that change on RHEL4U4 and still see my iWARP rping problem, so
it's not related...

Thanks,


Steve.


From HNGUYEN at de.ibm.com  Wed Feb 14 08:55:27 2007
From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen)
Date: Wed, 14 Feb 2007 17:55:27 +0100
Subject: [openib-general] 32-bit build for ppc64 is required
In-Reply-To: <20070214142924.GC20977@mellanox.co.il>
Message-ID: <OFB6E76481.E1134695-ONC1257282.005C703D-C1257282.005CF873@de.ibm.com>

Hi,
> Well, this is not by design: AFAIK on x86_64 both types of libraries
> are installed.
So, it seems to be an issue with the build script. Will talk to Vlad.
> But I still do not see how installing 32 bit binaries alongside the 64
> bit ones is useful, and I do not think other packages provide this option,
> so maybe we shouldn't, either.
Since ofed-1.2 ships 32bit libs, it would help to have at least ibutils
as 32bit as well, so that we can test whether the corresponding 32bit libs
work properly, especially the context switch path.
Thus, please also include 32bit binaries.
Thanks
Nam


From Ashish.Batwara at lsi.com  Wed Feb 14 08:56:27 2007
From: Ashish.Batwara at lsi.com (Batwara, Ashish)
Date: Wed, 14 Feb 2007 09:56:27 -0700
Subject: [openib-general] SM assigned GID addresses
Message-ID: <01B9E81EECACE94DBBD0A556E768FB8A01359A19@NAMAIL2.ad.lsil.com>

Thanks for your reply.
So do you mean that GUIDCap is currently always configured by the SM to
1 for all HCAs, and that it is safe to assume one IB port will have
one or multiple IB addresses that all share the same GUID portion
(based upon the manufacturer-assigned EUI-64)?
We are trying to define the target functionality for IB for our storage
arrays, and are trying to explore how many port addresses we can get
from an initiator standpoint.
How does this IB port GUID map to the SRP initiator ID? Are they the same,
or may an I/O Controller have its own GUID and use the SM prefix to derive
the initiator port ID in the SRP login req?

Thanks
Ashish



From mst at mellanox.co.il  Wed Feb 14 09:05:21 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 14 Feb 2007 19:05:21 +0200
Subject: [openib-general] 32-bit build for ppc64 is required
In-Reply-To: <OFB6E76481.E1134695-ONC1257282.005C703D-C1257282.005CF873@de.ibm.com>
References: <OFB6E76481.E1134695-ONC1257282.005C703D-C1257282.005CF873@de.ibm.com>
Message-ID: <20070214170520.GM16867@mellanox.co.il>

> > Well, this is not by design: AFAIK on x86_64 both types of libraries
> > are installed.
> So, it seems to be an issue with the build script. Will talk to Vlad.
> 
> > But I still do not see how installing 32 bit binaries alongside the 64
> > bit ones is useful, and I do not think other packages provide this option,
> > so maybe we shouldn't, either.
> 
> Since we've 32bit libs with ofed-1.2, it is a benefit to have also at least
> ibutils as 32bit so that we can test if the corresponding 32bit libs work
> properly, especially the context switch path.
> Thus, please include also 32bit binaries.

Still, using non-standard hacks like bin32 does not sound like a good idea.

Maybe an option to *only* make 32 bit userspace might make sense though.
Something like --disable-32bit, --disable-64bit.

This would solve your problem, would it not?

-- 
MST


From HNGUYEN at de.ibm.com  Wed Feb 14 09:02:38 2007
From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen)
Date: Wed, 14 Feb 2007 18:02:38 +0100
Subject: [openib-general] 32-bit build for ppc64 is required
In-Reply-To: <1171463849.16240.11.camel@vladsk-laptop>
Message-ID: <OF227E0669.CB1D9C0A-ONC1257282.005CFC7F-C1257282.005DA0D8@de.ibm.com>

Hi Vlad,
> prefix/lib (32bit libraries) should be created on ppc64 as well.
> Check that you have sysfsutils 32bit RPM installed.
> I don't have ppc64 here to check.
The current ofed-1.2 package does not create them, while ofed-1.1.1 did.
It looks like the fix we made for ofed-1.1.1 has been lost.
If I remember right, the issue was that the 64bit libs were created
first, then copied as a backup. Next, the 32bit libs were created and
the 64bit libs were copied back to the same place as the 32bit libs, i.e.
they overwrote the 32bit libs.
I haven't checked the build script/openib.spec yet...
Regards
Nam


From halr at voltaire.com  Wed Feb 14 09:16:05 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 14 Feb 2007 12:16:05 -0500
Subject: [openib-general] SM assigned GID addresses
In-Reply-To: <01B9E81EECACE94DBBD0A556E768FB8A01359A19@NAMAIL2.ad.lsil.com>
References: <01B9E81EECACE94DBBD0A556E768FB8A01359A19@NAMAIL2.ad.lsil.com>
Message-ID: <1171473359.22446.106282.camel@hal.voltaire.com>

On Wed, 2007-02-14 at 11:56, Batwara, Ashish wrote:
> Thanks for your reply.
> So do you mean to say that current GUIDCap is always configured by SM to
> 1 for all the HCAs, and it is safe to assume that one IB port will have
> only one or multiple IB address but that will have GUID portion common
> in them (Based upon manufacturer's assigned EUI-64 based).

GUIDCap is a RO component in terms of the SM and guaranteed to be at
least 1 for an endport. This comes from the device SMA, not the SM.

-- Hal

> We are trying to define the target functionality for IB for our storage
> arrays, and are trying to explore howmany port addresses that we can get
> from an initiator standpoint.
> How does this IB port GUID maps to SRP initiator ID? Are they same or
> I/O Controller may have its own GUID and can use SM prefix to derive
> initiator port ID in SRP login req?
> 
> Thanks
> Ashish
> 


From HNGUYEN at de.ibm.com  Wed Feb 14 09:25:27 2007
From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen)
Date: Wed, 14 Feb 2007 18:25:27 +0100
Subject: [openib-general] 32-bit build for ppc64 is required
In-Reply-To: <20070214170520.GM16867@mellanox.co.il>
Message-ID: <OF8C394F5B.50344168-ONC1257282.005F4CF5-C1257282.005FB7AA@de.ibm.com>

Hi,
> Still, using non-standard hacks like bin32 does not sound like a good
> idea.
I think the actual issue is that there is no common approach for this
across the various platforms.
> Maybe an option to *only* make 32 bit userspace might make sense though.
> Something like --disable-32bit, --disable-64bit.
> This would solve your problem, would it not?
Does that mean that if I don't specify either of them, I'll get both 32-
and 64bit execs?
If yes, that's fine.
Thanks
Nam


From mst at mellanox.co.il  Wed Feb 14 09:39:27 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 14 Feb 2007 19:39:27 +0200
Subject: [openib-general] 32-bit build for ppc64 is required
In-Reply-To: <OF8C394F5B.50344168-ONC1257282.005F4CF5-C1257282.005FB7AA@de.ibm.com>
References: <OF8C394F5B.50344168-ONC1257282.005F4CF5-C1257282.005FB7AA@de.ibm.com>
Message-ID: <20070214173927.GN16867@mellanox.co.il>

> > Still, using non-standard hacks like bin32 does not sound like a good idea.
>
> I think the actual issue is there is no common approach for this on various
> platforms.
>

On platforms I've seen, there are 2 sets of libraries but only 64 bit executables
provided.

This is what we had for OFED 1.0, OFED 1.1, and I don't see
the reason to change that - adding more executables to install on
production systems will double QA work.


> 
> > Maybe an option to *only* make 32 bit userspace might make sense though.
> > Something like --disable-32bit, --disable-64bit.
> > This would solve your problem, would it not?
>
> Does that mean if I don't specify one of them, I'll get 32- and 64bit
> execs?
> If yes, that's fine.

No, by default we build 2 sets of libraries, and only 64 bit execs.

But for your development purposes (I think you mentioned testing user/kernel
context switch) I think we could add a --disable-64bit flag to configure to
get only 32 bit userspace.

-- 
MST


From dledford at redhat.com  Wed Feb 14 10:29:22 2007
From: dledford at redhat.com (Doug Ledford)
Date: Wed, 14 Feb 2007 13:29:22 -0500
Subject: [openib-general] 32-bit build for ppc64 is required
In-Reply-To: <20070214142924.GC20977@mellanox.co.il>
References: <200702141518.56138.ossrosch@linux.vnet.ibm.com>
	<20070214142924.GC20977@mellanox.co.il>
Message-ID: <1171477762.3161.105.camel@fc6.xsintricity.com>

On Wed, 2007-02-14 at 16:29 +0200, Michael S. Tsirkin wrote:
> > Quoting Stefan Roscher <ossrosch at linux.vnet.ibm.com>:
> > Subject: Re: 32-bit build for ppc64 is required
> > 
> > On Wednesday 14 February 2007 14:29, Michael S. Tsirkin wrote:
> > > > Quoting Stefan Roscher <ossrosch at linux.vnet.ibm.com>:
> > > > Subject: 32-bit build for ppc64 is required
> > > > 
> > > > Hi,
> > > > 
> > > > after building the latest ofed build package we recognized that on PPC64 only
> > > > 64-bit libraries were built.
> > > > Because we have customers using older userspace applications which are
> > > > certified for 32-bit we think additional 32bit support is a requirement for 64bit builds.
> > > > 
> > > > If OFED 1.2 supports 32 bit on ppc64, we have to change the install
> > > > directory. I would suggest to install 32-bit binaries into
> > > > /usr/local/ofed/bin32 directory. So no changes on current naming conventions
> > > > has to be done. The libraries are installed in the /usr/local/ofed/lib directory.
> > > 
> > > The standard practice is to install 64 bit libraries under prefix/lib64
> > > and 32 bit libraries under prefix/lib. Why would PPC64 be any different?
> > 
> > I think you misunderstand my post. The directory for 32/64bit libraries
> > should be prefix/lib and prefix/lib64 respectively.
> > But with current ofed1.2 I saw only the prefix/lib64 directory, i.e. 64bit libs only.
> 
> Well, this is not by design: AFAIK on x86_64 both types of libraries
> are installed.
> 
> > > I do not think we need 32 bit binaries at all, and there's no other package
> > > I'm aware of that uses "bin32".
> > 
> > We have customers that still use 32-bit userspace applications. 
> > It would be beneficial for them if they can obtain 32bit libs and execs from
> > ofed1.2 in order to run their applications without recompiling them, because
> > for some 32-bit applications recompiling is not an option.
> 
> 32 bit libraries are needed for users to run 32 bit applications.
> 
> But I still do not see how installing 32 bit binaries alongside the 64
> bit ones is useful, and I do not think other packages provide this option,
> so maybe we shouldn't, either.

The choice of 32/64 bit default is done on a per arch basis.  With
x86_64/i386, the increased number of CPU registers in 64bit mode
outweighs the increased code bloat that goes along with 64bit mode.  On
PPC, no such register benefit exists for 64bit mode.  As such, 32bit
apps on PPC are faster than the equivalent 64bit apps up to the point at
which a 4GB address space becomes a problem.  Correspondingly, the
default binaries on PPC are 32bit, and only those that *need* to be
64bit are.  While a customer's application may need >4GB address space,
certainly all the ibutils, diags, opensm, etc. do not.  As a result, we
compile all of those utilities as 32bit by default on PPC.  We also ship
all the libs as both 32/64bit so users can select the appropriate
environment for their particular application (with the exception of
dapl, which doesn't support 32bit and for which I filed a bug around the
time of OFED 1.1).

-- 
Doug Ledford <dledford at redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband
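
A quick way to confirm which flavor a given utility was actually built as is to have it report its own pointer width. A minimal sketch; pointer_bits() and build_flavor() are hypothetical helpers, not part of any OFED package.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Width of a data pointer in bits for this build. */
static unsigned int pointer_bits(void)
{
	return (unsigned int)(sizeof(void *) * CHAR_BIT);
}

/* Readable label for the build flavor. */
static const char *build_flavor(void)
{
	return pointer_bits() == 64 ? "64bit" : "32bit";
}

/* Usage: the label always agrees with the measured pointer width. */
static void flavor_demo(void)
{
	if (pointer_bits() == 64)
		assert(strcmp(build_flavor(), "64bit") == 0);
	else
		assert(strcmp(build_flavor(), "32bit") == 0);
}
```

The same source compiled once per ABI gives one binary of each flavor, which is the situation the lib and lib64 directories exist to serve.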

From sashak at voltaire.com  Wed Feb 14 10:42:19 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Wed, 14 Feb 2007 20:42:19 +0200
Subject: [openib-general] [PATCH RFC] opensm: OpenSM Coding Style doc draft
Message-ID: <20070214184219.GB27414@sashak.voltaire.com>


Initial writeup about OpenSM Coding Style recommendations.

Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
---
 osm/doc/opensm-coding-style.txt |   34 ++++++++++++++++++++++++++++++++++
 1 files changed, 34 insertions(+), 0 deletions(-)
 create mode 100644 osm/doc/opensm-coding-style.txt

diff --git a/osm/doc/opensm-coding-style.txt b/osm/doc/opensm-coding-style.txt
new file mode 100644
index 0000000..379042c
--- /dev/null
+++ b/osm/doc/opensm-coding-style.txt
@@ -0,0 +1,34 @@
+This short (hopefully) memo is about to define the coding style
+recommended for OpenSM development.
+
+The goal of this is to make OpenSM code base to be standard in terms of
+the rest of OpenIB management software, OpenIB projects and Linux in
+general. And in this way to make OpenSM more developer friendly and to
+involve more open source programmers to be part of OpenSM development
+process.
+
+The goal of this is not to provide long and boring list of coding style
+paradigms, but rather to define general coding style concept and to
+suggest a way for such a concept to be implemented in the existing
+OpenSM code base.
+
+The OpenSM project is an OpenIB and Linux centric project, so we think
+it is reasonable to use the coding style most popular with OpenIB
+projects (linux/Documentation/CodingStyle) as the starting point rather
+than reinventing one more coding style rule-set.
+
+Some things from there in short: tab character for indentation and space
+character for alignment, K&R style braces, short local and meaningful
+global names, please no confusing Hungarian notation, short functions. And of
+course to be reasonable about all above.
+
+
+Some ideas about existing OpenSM code improvements in terms of the
+Coding style:
+
+* When writing new code, please try to follow the new Coding style.
+* Coding style improvement patches are desired and accepted, but please
+  try to not mix coding style improvement with functional and other
+  changes in one patch.
+* When you are going to improve coding style for existing code, please
+  try to do it for entire file(s).
-- 
1.5.0.rc2.g11a3
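
As an illustration of the points listed in the draft (tabs for indentation, K&R braces, short local names, short functions), here is a small made-up function in that style. struct port_entry and count_active_ports() are hypothetical, not OpenSM code.

```c
#include <assert.h>
#include <stddef.h>

struct port_entry {
	unsigned int lid;
	int is_active;
};

/* Short function, short local names, descriptive global name. */
static size_t count_active_ports(const struct port_entry *ports, size_t n)
{
	size_t i, cnt = 0;

	for (i = 0; i < n; i++) {
		if (ports[i].is_active)
			cnt++;
	}
	return cnt;
}

/* Usage: two of the three sample ports are active. */
static void style_demo(void)
{
	const struct port_entry ports[] = {
		{ 1, 1 }, { 2, 0 }, { 3, 1 },
	};

	assert(count_active_ports(ports, 3) == 2);
}
```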


From dledford at redhat.com  Wed Feb 14 10:36:27 2007
From: dledford at redhat.com (Doug Ledford)
Date: Wed, 14 Feb 2007 13:36:27 -0500
Subject: [openib-general] Open MPI rpmbuild fails in OFED-1.2
In-Reply-To: <7CDAEF93-7E07-45CE-9D66-99F3ED98405B@cisco.com>
References: <1170866522.6223.8.camel@vladsk-laptop>
	<7CDAEF93-7E07-45CE-9D66-99F3ED98405B@cisco.com>
Message-ID: <1171478187.3161.107.camel@fc6.xsintricity.com>

On Fri, 2007-02-09 at 13:38 -0500, Jeff Squyres wrote:
> New SRPM on server that munges the %build section into the %install  
> section.
> 
> Yuck.  :-)

Worse than yuck, it's wrong.  Your SuSE %build section bug is a result
of trying to build against something that isn't installed yet but is
required for the build.  You guys chose to split things up into modules,
and that's fine and the way things should be, but that means you need to
install required packages along the way if you want to build against
them, not try to build against binaries in temporary directories.  Apart
from that though, I can assure you that on RHEL and FC, the %build
section is a requirement if you want valid -debuginfo packages.

I've brought it up at the last two conferences I attended, and I usually
get a brick wall when I do, but the OFED packaging process is broken by
design.  As Shaun brought up, one of the benefits of proper RPM
packaging is reliable, reproducible builds, not to mention that
debugging with gdb is nigh impossible without valid debuginfo rpms; all
of which are vital to supportability.

I'm looking through the alpha1 tarball right now, and I'll comment on it
later in a separate email.  But at first glance, I'll be ripping
everything out and making it sane again.

Which brings up another point that I've mentioned before but on which
nothing has happened: as long as you guys keep making your distribution
use an installation hierarchy that violates the rules for distributions
shipping code, places like Novell or Red Hat have one of two choices:
violate the Filesystem Hierarchy Standard (FHS) in our distributions or
use a different hierarchy than you do.  Obviously, we aren't going to
forgo FHS compliance of our entire product for just this, so we use a
different hierarchy than you.  In the end, this can cause confusion for
customers, as well as inconsistency between where Red Hat, Novell and
you guys place files.  Something needs to be done to standardize
installation directories in an acceptable place, IMO.  /usr/local is
verboten for a distribution to use, and theoretically that should
include you guys, since you are a distribution source.  The only real
reasons people compile your code locally are that you don't provide
binary RPMs or that they want a custom compiler instead of gcc.  It is
not that they are trying out new software they don't necessarily intend
to keep or use, or software so new that no one has formally packaged it
up, which is what /usr/local is for.

> 
> On Feb 7, 2007, at 11:42 AM, Vladimir Sokolovsky wrote:
> 
> > Hi Jeff,
> > Please remove %build macro from the RPM spec file.
> > On SuSE distros it removes RPM_BUILD_ROOT.
> >
> > Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.23343
> > + umask 022
> > + cd /var/tmp/OFEDRPM/BUILD
> > + /bin/rm -rf /var/tmp/OFED
> > ++ dirname /var/tmp/OFED
> > + /bin/mkdir -p /var/tmp
> > + /bin/mkdir /var/tmp/OFED
> > + cd openmpi-1.2b4ofedr13470
> > + fortify_source=1
> > + test '' '!=' ''
> > ...
> >
> > -- 
> > Vladimir Sokolovsky <vlad at dev.mellanox.co.il>
> > Mellanox Technologies Ltd.
> 
> 
-- 
Doug Ledford <dledford at redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband

From sweitzen at cisco.com  Wed Feb 14 10:44:14 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Wed, 14 Feb 2007 10:44:14 -0800
Subject: [openib-general] Open MPI rpmbuild fails in OFED-1.2
In-Reply-To: <1171478187.3161.107.camel@fc6.xsintricity.com>
References: <1170866522.6223.8.camel@vladsk-laptop>
	<7CDAEF93-7E07-45CE-9D66-99F3ED98405B@cisco.com>
	<1171478187.3161.107.camel@fc6.xsintricity.com>
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA302FEFF62@xmb-sjc-216.amer.cisco.com>

Tziporet and Doug, we can discuss this at the OFED conf call on Feb 26,
I suggest we try to improve this area.

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 



From jsquyres at cisco.com  Wed Feb 14 10:51:29 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Wed, 14 Feb 2007 13:51:29 -0500
Subject: [openib-general] Open MPI rpmbuild fails in OFED-1.2
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA302FEFF62@xmb-sjc-216.amer.cisco.com>
References: <1170866522.6223.8.camel@vladsk-laptop>
	<7CDAEF93-7E07-45CE-9D66-99F3ED98405B@cisco.com>
	<1171478187.3161.107.camel@fc6.xsintricity.com>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA302FEFF62@xmb-sjc-216.amer.cisco.com>
Message-ID: <464B6D9D-FA58-46C1-88AD-5D109E98C16B@cisco.com>

On Feb 14, 2007, at 1:44 PM, Scott Weitzenkamp (sweitzen) wrote:

> Tziporet and Doug, we can discuss this at the OFED conf call on Feb 26,
> I suggest we try to improve this area.

I strongly agree with this and all of Doug's points (see my prior
e-mails on this subject :-) ).

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


From hch at infradead.org  Wed Feb 14 10:59:07 2007
From: hch at infradead.org (Christoph Hellwig)
Date: Wed, 14 Feb 2007 18:59:07 +0000
Subject: [openib-general] [PATCH 2.6.21-rc1 1/5] ehca: reworked irq
 handler to avoid/reduce missed irq events
In-Reply-To: <200702141740.48286.hnguyen@linux.vnet.ibm.com>
References: <200702141740.48286.hnguyen@linux.vnet.ibm.com>
Message-ID: <20070214185907.GA15105@infradead.org>

On Wed, Feb 14, 2007 at 05:40:47PM +0100, Hoang-Nam Nguyen wrote:
> Hi,
> here is a patch for ehca with the reworked irq handler.
> Thanks
> Nam

This looks okay to me (and sorry for not replying earlier to your private
mail).


From jeremy.brown at qlogic.com  Wed Feb 14 11:08:38 2007
From: jeremy.brown at qlogic.com (Jeremy Brown)
Date: Wed, 14 Feb 2007 11:08:38 -0800
Subject: [openib-general] [openfabrics-ewg] new OFED 1.2 package
In-Reply-To: <1171406869.17328.16.camel@citrine.pathscale.com>
References: <1171387167.3978.90.camel@vladsk-laptop>
	<45D21B85.9070007@mellanox.co.il>
	<1171406869.17328.16.camel@citrine.pathscale.com>
Message-ID: <1171480118.17328.18.camel@citrine.pathscale.com>

On Tue, 2007-02-13 at 14:47 -0800, Jeremy Brown wrote:
> I know that the package is named "sysfsutils-devel" in Fedora Core 3-5,
> and "libsysfs-devel" in Fedora Core 6, similar to the RH 4 vs. RH 5
> split. Would it be possible to change the definition and use of
> $DISTRIBUTION in build_env.sh so that we had "fedora" for FC3-5, and
> "fedora6" for FC6, similar to the "redhat" and "redhat5" split? I'm not
> married to those names, of course.

I apologize for replying to myself, but I wanted to say that this is
working great in the alpha. Thanks for making the change!

Jeremy Brown


From sashak at voltaire.com  Wed Feb 14 11:20:08 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Wed, 14 Feb 2007 21:20:08 +0200
Subject: [openib-general] ibsim announcement
Message-ID: <20070214192008.GC27414@sashak.voltaire.com>

Hi All,

'ibsim' is the Voltaire InfiniBand Fabric Simulator. The tool was
originally developed at Voltaire and has been used with great success
for IB management software development, debugging and testing. We also
found it very useful for various research work and for routing
algorithm development.

Based on this successful experience with 'ibsim', Voltaire has decided
to make the tool available to everybody and is contributing the 'ibsim'
sources to the OpenIB community.

The ibsim package is now available on the OFA site and can be cloned:

  git clone git://git.openfabrics.org/~sashak/ibsim

There is a README file with build instructions. Kernel support and
recompilation of OpenSM and the diags tools are _not_ required.

Enjoy!

Sasha


From sweitzen at cisco.com  Wed Feb 14 11:12:52 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Wed, 14 Feb 2007 11:12:52 -0800
Subject: [openib-general] how to handle OFEd 1.2 bugs in bugzilla
In-Reply-To: <45D31A10.8020102@mellanox.co.il>
References: <45D31A10.8020102@mellanox.co.il>
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA302FEFFAB@xmb-sjc-216.amer.cisco.com>

Yes, I'd like to add alpha1, etc. version numbers in bugzilla.

For existing bugs, the Reporter and Assignee should try to
communicate/negotiate Priority/Severity.  For bugs in areas that Cisco
supports, I review the bugs and try to ask for desired ones to be fixed.
I was happy with the responses I got for OFED 1.1 from Mellanox and Open
MPI.

If you want a bug scrub, I suggest a distributed one, where someone from
each company scrubs the bugs in areas they are responsible for.

Scott 

> -----Original Message-----
> From: Tziporet Koren [mailto:tziporet at mellanox.co.il] 
> Sent: Wednesday, February 14, 2007 6:18 AM
> To: Scott Weitzenkamp (sweitzen)
> Cc: EWG; OPENIB
> Subject: how to handle OFEd 1.2 bugs in bugzilla
> 
> Hi Scott and all,
> I wish to consult with you on the way we will treat OFED 1.2 bugs in
> bugzilla.
> 
> 1. Do we want to have 1.2-alpha, 1.2-beta, 1.2-rcX in version, or just
>    1.2 as we have now?
> 2. What do we wish to do with bugs that were opened for 1.1 and are
>    still open?
> 3. What to do with old bugs that were opened against gen2 in general?
> 4. What is our methodology for priority and severity setup?  (There are
>    too many blocker bugs still open in OFED 1.1, so either they are not
>    actually blockers or they were fixed but not updated.)
> 
> Thanks,
> Tziporet
> 


From dledford at redhat.com  Wed Feb 14 11:33:24 2007
From: dledford at redhat.com (Doug Ledford)
Date: Wed, 14 Feb 2007 14:33:24 -0500
Subject: [openib-general] Open MPI rpmbuild fails in OFED-1.2
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA302FEFF62@xmb-sjc-216.amer.cisco.com>
References: <1170866522.6223.8.camel@vladsk-laptop>
	<7CDAEF93-7E07-45CE-9D66-99F3ED98405B@cisco.com>
	<1171478187.3161.107.camel@fc6.xsintricity.com>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA302FEFF62@xmb-sjc-216.amer.cisco.com>
Message-ID: <1171481604.3161.110.camel@fc6.xsintricity.com>

On Wed, 2007-02-14 at 10:44 -0800, Scott Weitzenkamp (sweitzen) wrote:
> Tziporet and Doug, we can discuss this at the OFED conf call on Feb 26,
> I suggest we try to improve this area.

OK.  I'll make sure to attend the Feb 26 meeting.

-- 
Doug Ledford <dledford at redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband

From krause at cup.hp.com  Wed Feb 14 11:09:25 2007
From: krause at cup.hp.com (Michael Krause)
Date: Wed, 14 Feb 2007 11:09:25 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <000001c74fca$6c765170$8698070a@amr.corp.intel.com>
References: <6.2.0.14.2.20070213143635.09393fe0@esmail.cup.hp.com>
	<000001c74fca$6c765170$8698070a@amr.corp.intel.com>
Message-ID: <6.2.0.14.2.20070214105420.09493fc8@esmail.cup.hp.com>

At 03:55 PM 2/13/2007, Sean Hefty wrote:
> >A LID is subnet local on that we can all agree.   The CM Req contains
> >either the LID of a local subnet CA or the LID a local router which will
> >move the packet to the next hop to the destination.   12.7.11 is basically
> >saying that the remote LID is the router's LID of the local subnet's router
> >Port.   12.7.21 also refers to the remote LID but in each subnet that is
> >either the router Port's LID or the destination CA.
>
>This isn't my interpretation.
>
>12.7.11 Local Port LID:  When local and remote ports are on different subnets,
>this field must be the LID of the router that the *passive* side will target
>for the return path.
>
>The CM REQ carries the LIDs for the remote (passive side) subnet.  This is
>what the passive side needs to configure the QP, not the active side LID
>information.  (See address vector information for 11.2.4.2 - page 574.)
>
>So, the CM REQ is _sent_ to either the LID of the local subnet CA or the LID
>of a local router port, but _contains_ the LIDs from the remote subnet.

  In volume 1, version 1.2, page 574 it states:

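[inline image of the spec text not preserved in the archive]


12.7.11
[inline image of the spec text not preserved in the archive]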

Both of these statements refer to the local subnet's LID for the router 
port being used by the local CA to communicate to a remote subnet.   The IB 
architecture is built upon the concept that no subnet local information 
knowledge is required beyond the subnet itself to establish communication 
across subnets.   Perhaps the various wordings are a bit confusing but the 
CM protocol should not be concerned with a remote subnet's LID or any 
validation of such remote subnet information.   All it needs to do is 
communicate what is global so that a remote endnode can respond 
correctly.  It is up to the router and the associated router protocol to 
perform any global to subnet local mapping which includes the LID and LRH 
generation.   The router must work with each subnet's SM / SA to provide 
the necessary global to subnet local mappings which are then queried by the 
CM agent to find the appropriate router Port.   There is no requirement in 
the specification to ever communicate across a subnet anything that is 
strictly subnet local.  LID is a strictly subnet local value and is not 
shared.  Again, the passive here refers to the subnet local router LID and 
not the remote subnet's LID.

Mike 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: a3c35b92.jpg
Type: image/jpeg
Size: 43794 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070214/e40b4b5d/attachment.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: a3c35c00.jpg
Type: image/jpeg
Size: 45123 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070214/e40b4b5d/attachment-0001.jpg>

From sweitzen at cisco.com  Wed Feb 14 12:24:39 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Wed, 14 Feb 2007 12:24:39 -0800
Subject: [openib-general] OFED 1.2 alpha release
In-Reply-To: <45D337E2.200@mellanox.co.il>
References: <45D337E2.200@mellanox.co.il>
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA303051A05@xmb-sjc-216.amer.cisco.com>

I don't remember discussing dropping RHEL4 U3, and would like to add it
back to the official list.  IPoIB multicast does not work correctly (bug
266) in RHEL4 U4, thus RHEL4 U3 is the most recent working RHEL release
in this area (unless it has been fixed in U4 errata kernels).  The new
ib-bonding RPM also says it only supports RHEL4 U3 for Red Hat releases.
 
We should probably also plan for SLES10 SP1 support in OFED 1.2.
 
Scott


________________________________

	From: openib-general-bounces at openib.org
[mailto:openib-general-bounces at openib.org] On Behalf Of Tziporet Koren
	Sent: Wednesday, February 14, 2007 8:25 AM
	To: EWG
	Cc: OPENIB
	Subject: [openib-general] OFED 1.2 alpha release
	
	
	Hi,
	
	After a two-week delay, we have published OFED 1.2-alpha1 at
	http://www.openfabrics.org/builds/ofed-1.2/
	File: OFED-1.2-alpha1.tgz
	BUILD_ID contains info on the source location of every package.
	
	Please report any issues in bugzilla: https://bugs.openfabrics.org/
	
	Tziporet & Vlad
	
	OS support:
	Novell:
	    - SLES 9.0 SP3
	    - SLES10
	Redhat:
	    - Redhat EL4 up4
	    - Redhat EL5 beta2 (only partially tested)
	kernel.org:
	    - 2.6.20
	    - 2.6.19
	
	Note: Redhat EL4 up3, Fedora C4, Fedora C6 and SuSE Pro 10 are not
	part of the official list.  We keep the backport patches for these
	OSes and make sure OFED compiles and loads properly on them, but we
	will not do a full QA cycle.
	
	Systems:
	    * x86_64
	    * x86
	    * ia64
	    * ppc64 (have not tested user space)
	
	Main changes from OFED-1.1:
	
	1.	iWARP is now supported with Chelsio T3
	2.	New kernel modules: VNIC, RDS, Bonding, SA cache
	3.	New packages: MVAPICH2
	4.	IPoIB Connected mode
	5.	Multicast join from user space
	6.	libibverbs 1.1
	7.	OpenSM new routing models: FAT tree routing and Taurus routing
	8.	GUI tool for network diagnostic
	9.	New MPI releases: MVAPICH version 0.9.9, Open MPI version 1.2,
		MVAPICH2 version 0.9.8
	
	Detailed list of changes can be found in:
	https://wiki.openfabrics.org/tiki-index.php?page=OFED+1.2+release+plan+and+features
	
	Limitations and known issues:
	
	1.	ipath driver compilation fails on all systems, except for
		kernel 2.6.20
	2.	libipathverbs is not working with libibverbs 1.1
	3.	SDP netstat is not available on RHEL5 (due to compilation
		errors)
	4.	Routing table problem in SLES10 when using port #2
	5.	RDS compiles only on kernel 2.6.18/19/20
	6.	MVAPICH2 installation fails on SuSE Pro 10
	7.	mstflint is not working on ppc64
	8.	RDS was not tested

	Missing features that should be completed for the Beta:
	
	1.	Add madeye utility
	2.	RDS to support SLES10 and RHEL
	
	For details on each module status see:
	https://wiki.openfabrics.org/tiki-index.php?page=Teleconf+02-12-2007
	
	

From ardavis at ichips.intel.com  Wed Feb 14 13:26:38 2007
From: ardavis at ichips.intel.com (Arlin Davis)
Date: Wed, 14 Feb 2007 13:26:38 -0800
Subject: [openib-general] OFED 1.2 dapl and dat.conf
In-Reply-To: <1171397522.21471.7.camel@stevo-desktop>
References: <1171397522.21471.7.camel@stevo-desktop>
Message-ID: <45D37E8E.5050800@ichips.intel.com>

Steve Wise wrote:

>Currently, the dapl rpms don't install dat.conf.  I think they probably
>should, eh?  Maybe in <prefix>/etc/dat.conf
>  
>
My spec file is set up to target sysconfdir, which is typically set to
`$(prefix)/etc':

%{_sysconfdir}/dat.conf

I am not sure how the 1.2 scripts are building the rpms. Maybe Vladimir 
can help explain?

-arlin


From mshefty at ichips.intel.com  Wed Feb 14 13:36:46 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 14 Feb 2007 13:36:46 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <6.2.0.14.2.20070214105420.09493fc8@esmail.cup.hp.com>
References: <6.2.0.14.2.20070213143635.09393fe0@esmail.cup.hp.com>
	<000001c74fca$6c765170$8698070a@amr.corp.intel.com>
	<6.2.0.14.2.20070214105420.09493fc8@esmail.cup.hp.com>
Message-ID: <45D380EE.70300@ichips.intel.com>

Assume that the active and passive sides of a connection request are on 
different subnets and:

Active side - LID 1
Active side router - LID 2
Passive side - LID 93
Passive side router - LID 94

What values are you suggesting are used for:

Active side QP - DLID
Passive side QP - DLID
CM REQ Primary Local Port LID

- Sean


From sean.hefty at intel.com  Wed Feb 14 13:45:40 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Wed, 14 Feb 2007 13:45:40 -0800
Subject: [openib-general] GetTable path record query not returning DGID=SGID
	paths
Message-ID: <000501c75081$7898edc0$ff0da8c0@amr.corp.intel.com>

We're seeing a situation where it appears that the response to a GetTable path
record query is not returning paths where the DGID is the same as the SGID.  The
query is setting the SGID and number of paths.

We're still investigating if this is indeed the case, but does anyone know if
such a query should return paths where DGID=SGID?

- Sean


From krause at cup.hp.com  Wed Feb 14 13:49:26 2007
From: krause at cup.hp.com (Michael Krause)
Date: Wed, 14 Feb 2007 13:49:26 -0800
Subject: [openib-general] IB routing discussion summary
In-Reply-To: <000601c74fb4$6ed83840$8698070a@amr.corp.intel.com>
References: <000601c74fb4$6ed83840$8698070a@amr.corp.intel.com>
Message-ID: <6.2.0.14.2.20070214134413.0944fcf8@esmail.cup.hp.com>


I do not see the need for any of this.   The router protocol should be 
designed to work with each subnet's SM / SA to provide information on what 
GID prefix is on each router Port.  This is used to look up the subnet 
local LRH fields.

The only cross-subnet challenges are global ones, e.g. which P_Key to
use and how to manage P_Keys across subnets, or how TClass should be
interpreted to achieve consistent behavior independent of how the TClass
is mapped to an SL within each subnet.  These were the types of
challenges remaining when we stopped development of the router
specification.  If the IBTA decides to develop a router specification
then it might be best to join that effort and work it out in detail
before attempting to develop the management infrastructure.  Development
may be able to lag the spec slightly in order to validate its technical
directions, without having to wait until 1.0 to say, yep, this looks
good or here is where you need to change the spec.  It is not clear what
can be developed until there is a router specification for the industry
to execute to.

Mike


At 01:17 PM 2/13/2007, Sean Hefty wrote:
>Here's a first take at summarizing the IB routing discussion.
>
>The following spec references are noted:
>
>9.6.1.5 C9-54. The SLID shall be validated (for connected QPs).
>12.7.11. CM REQ Local Port LID - is LID of remote router.
>13.5.4: Defines reversible paths.
>
>The main discussion point centered on trying to meet 9.6.1.5 C9-54.  This
>requires that the forward and reverse data flows between two QPs traverse the
>same router LID on both subnets.  The idea was made to try to eliminate this
>compliance statement for packets carrying a GRH, but this is viewed as going
>against the spirit of IBA.
>
>Ideas were presented around trying to construct an 'inter-subnet path record'
>that contained the following:
>
>    - Side A GRH.SGID = active side's Port GID
>    - Side A GRH.DGID = passive side's Port GID
>    - Side A LRH.SLID = any active side's port LID
>    - Side A LRH.DLID = A subnet router
>    - Side A LRH.SL   = SL to A subnet router
>
>    - Side B GRH.SGID = Side A GRH.DGID
>    - Side B GRH.DGID = Side A GRH.SGID
>    - Side B LRH.SLID = any passive side's port LID
>    - Side B LRH.DLID = B subnet router
>    - Side B LRH.SL   = SL to B subnet router
>
>It is still unclear how such a record can be constructed.  But communication
>with remote SAs might be achieved by using a well-known GID suffix.  It's also
>unclear whether the fields in a path record are relative to the SA's subnet or
>the SGID.
>
>It's anticipated that SAs will need to interact with routers, but in an
>unspecified manner.


From mshefty at ichips.intel.com  Wed Feb 14 14:02:38 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 14 Feb 2007 14:02:38 -0800
Subject: [openib-general] IB routing discussion summary
In-Reply-To: <6.2.0.14.2.20070214134413.0944fcf8@esmail.cup.hp.com>
References: <000601c74fb4$6ed83840$8698070a@amr.corp.intel.com>
	<6.2.0.14.2.20070214134413.0944fcf8@esmail.cup.hp.com>
Message-ID: <45D386FE.5080202@ichips.intel.com>

Mike, are you expecting that routers will modify CM messages as they flow 
between subnets?

- Sean


From rdreier at cisco.com  Wed Feb 14 14:08:03 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 14 Feb 2007 14:08:03 -0800
Subject: [openib-general] [PATCH 2.6.21-rc1 1/5] ehca: reworked irq
 handler to avoid/reduce missed irq events
In-Reply-To: <200702141740.48286.hnguyen@linux.vnet.ibm.com> (Hoang-Nam
	Nguyen's message of "Wed, 14 Feb 2007 17:40:47 +0100")
References: <200702141740.48286.hnguyen@linux.vnet.ibm.com>
Message-ID: <adafy981jgc.fsf@cisco.com>

Looks fine but this patch at least has serious whitespace
damage... please resend a fixed version.

 - R.


From rdreier at cisco.com  Wed Feb 14 14:16:28 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 14 Feb 2007 14:16:28 -0800
Subject: [openib-general] [PATCH] IPoIB: Only allow root to change between
 datagram and connected mode
Message-ID: <ada4ppo1j2b.fsf@cisco.com>

Change the permissions of the "mode" sysfs attribute to be S_IWUSR
instead of S_IWUGO.

Signed-off-by: Roland Dreier <rolandd at cisco.com>
---
FYI -- I'm planning to merge this for 2.6.21.  It doesn't seem
appropriate to allow ordinary users to mess with this sort of config.

 drivers/infiniband/ulp/ipoib/ipoib_cm.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 2d48387..8881a71 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -1138,7 +1138,7 @@ static ssize_t set_mode(struct device *d, struct device_attribute *attr,
 	return -EINVAL;
 }
 
-static DEVICE_ATTR(mode, S_IWUGO | S_IRUGO, show_mode, set_mode);
+static DEVICE_ATTR(mode, S_IWUSR | S_IRUGO, show_mode, set_mode);
 
 int ipoib_cm_add_mode_attr(struct net_device *dev)
 {
-- 
1.4.4.4
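
To make the permission change concrete, here is a small user-space sketch (not part of the patch). The kernel's S_IRUGO and S_IWUGO shorthands are not available in user space, so the SKETCH_ macros below rebuild them from the POSIX bits; those macro names and the helper functions are assumptions made for illustration only.

```c
#include <stdio.h>
#include <sys/stat.h>

/* User-space stand-ins for the kernel shorthands (assumed names). */
#define SKETCH_S_IRUGO (S_IRUSR | S_IRGRP | S_IROTH)
#define SKETCH_S_IWUGO (S_IWUSR | S_IWGRP | S_IWOTH)

/* Mode of the "mode" attribute before the patch: writable by everyone. */
unsigned int mode_before(void)
{
	return SKETCH_S_IWUGO | SKETCH_S_IRUGO;
}

/* Mode after the patch: readable by everyone, writable by the owner only. */
unsigned int mode_after(void)
{
	return S_IWUSR | SKETCH_S_IRUGO;
}

/* Print both modes in octal for comparison. */
void print_modes(void)
{
	printf("before: %04o  after: %04o\n", mode_before(), mode_after());
}
```

Since sysfs attribute files are normally owned by root, dropping the group and other write bits is what restricts the switch to root.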


From krause at cup.hp.com  Wed Feb 14 14:15:50 2007
From: krause at cup.hp.com (Michael Krause)
Date: Wed, 14 Feb 2007 14:15:50 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <45D380EE.70300@ichips.intel.com>
References: <6.2.0.14.2.20070213143635.09393fe0@esmail.cup.hp.com>
	<000001c74fca$6c765170$8698070a@amr.corp.intel.com>
	<6.2.0.14.2.20070214105420.09493fc8@esmail.cup.hp.com>
	<45D380EE.70300@ichips.intel.com>
Message-ID: <6.2.0.14.2.20070214135826.096b7638@esmail.cup.hp.com>

At 01:36 PM 2/14/2007, Sean Hefty wrote:
>Assume that the active and passive sides of a connection request are on 
>different subnets and:
>
>Active side - LID 1
>Active side router - LID 2
>Passive side - LID 93
>Passive side router - LID 94
>
>What values are you suggesting are used for:
>
>Active side QP - DLID
>Passive side QP - DLID
>CM REQ Primary Local Port LID

Subnet A is:
QP Port LID 1
Router A Port LID 2

Subnet B is:
QP Port LID 93
Router B Port LID 94


Process steps:

- Router A populates SM / SA A with the GID prefix it can route.   SM / SA 
A will have configured the router Port with the appropriate local route 
information and hence have assigned it LID 2.

- CM associated with Port LID 1 queries the SM / SA to identify a path to a 
GID Prefix.   SM / SA returns a path record indicating a global route, i.e. 
one that requires a GRH, is available and provides the CM with the 
information targeting router Port LID 2.

- CM creates a REQ and populates the global information to identify the 
remote endnode.  The LRH generated targets Port LID 2.  The GRH is 
generated to target the remote subnet so the router will comprehend how to 
process the packet.

- Router A receives the packet and examines the GRH.   Via its router 
protocol, it has previously identified what router Port will lead to the 
next hop on the path to the destination endnode.

- If the endnode is subnet local, say subnet B, then the router generates a 
LRH with QP LID 93 and emits that on router Port LID 94.

- QP in subnet B receives the CM REQ and validates the LRH.  Given these 
messages are via UD service and not RC / UC, the validation rules for the 
LRH are different.   The CM agent processes the request and returns an 
appropriate response by filling in a GRH that replaces the SGID with the 
DGID and so forth so the addresses are basically reflected back.   The 
response uses QP port LID 93 and targets router Port 94.

- Router B Port 94 receives the response.  It parses the GRH and determines 
the next hop port.   In this example, the response goes out router A Port 2 
and targets QP Port LID 1.  The LRH is generated using these 
fields.  Again, since CM is targeting a UD QP, the LRH validation rules are 
different.

- Once the connection is established, the QP on subnet A will send packets 
to QP on subnet B using a GRH that is processed by the router with each QP 
using a LRH that targets the router port locally attached to its 
subnet.   The router is responsible for generating a LRH to forward to the 
next hop.   These packets are now in a RC / UC data flow so the LRH 
validation is per the sections cited in this e-mail string.

In all cases, the router protocol is responsible for generation of a LRH 
that will work within each subnet.  There is no exchange of subnet local 
information between the subnets.  Each subnet's SM/SA only tracks what is 
local to it as well as what GID prefix can be routed via a given LID.   If 
multiple LIDs can route to a given GID prefix, multiple path records are 
returned.   Which to choose is not specified by the specifications so it 
can be any policy one desires. If the router protocol communicates a "cost" 
to a given path in order to give an indication of appropriateness for a 
given workload, then this should be communicated to the CM agent.
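
As a rough illustration of the per-packet step described above, the sketch below keeps the GRH and replaces the LRH. The struct layouts are reduced to the fields mentioned in this thread, and next_hop and router_forward are hypothetical names; everything else a real router does is left out.

```c
#include <stdint.h>

/* Simplified headers: only the fields discussed in this thread. */
struct lrh { uint16_t slid, dlid; uint8_t sl; };
struct grh { uint8_t sgid[16], dgid[16]; };
struct packet { struct lrh lrh; struct grh grh; };

/* Hypothetical result of the router's lookup for a destination GID prefix. */
struct next_hop {
	uint16_t egress_port_lid; /* LID of the router port on the next subnet */
	uint16_t dest_lid;        /* LID of the next hop or of the final endnode */
	uint8_t  sl;              /* SL chosen by the router's own policy */
};

/* Forward one packet: discard the old LRH, build a new one, keep the GRH. */
struct packet router_forward(struct packet in, struct next_hop hop)
{
	struct packet out = in;   /* the GRH (SGID/DGID) is carried through */

	out.lrh.slid = hop.egress_port_lid;
	out.lrh.dlid = hop.dest_lid;
	out.lrh.sl   = hop.sl;
	return out;
}
```

With the LIDs from the example, a packet arriving with SLID 1 and DLID 2 would leave on subnet B with SLID 94 and DLID 93.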

Mike 


From hch at infradead.org  Wed Feb 14 14:28:24 2007
From: hch at infradead.org (Christoph Hellwig)
Date: Wed, 14 Feb 2007 22:28:24 +0000
Subject: [openib-general] [PATCH 2.6.21-rc1 4/5] ehca: replace yield()
 by wait_for_completion()
In-Reply-To: <200702141741.35444.hnguyen@linux.vnet.ibm.com>
References: <200702141741.35444.hnguyen@linux.vnet.ibm.com>
Message-ID: <20070214222824.GA11579@infradead.org>

> @@ -332,7 +333,7 @@ int ehca_destroy_cq(struct ib_cq *cq)
>         spin_lock_irqsave(&ehca_cq_idr_lock, flags);
>         while (my_cq->nr_callbacks) {
>                 spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
> -               yield();
> +               wait_for_completion(&my_cq->zero_callbacks);
>                 spin_lock_irqsave(&ehca_cq_idr_lock, flags);
>         }

A while loop around wait_for_completion doesn't make all that much sense.
I suspect a simple

	if (my_cq->nr_callbacks)
		wait_for_completion(&my_cq->zero_callbacks);

is what you need.
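
For readers unfamiliar with completions, here is a user-space analogue built on pthreads. It is not the kernel implementation, and the sketch_ names are made up for illustration. The point it tries to show is that a completion latches its state in a counter and already loops internally, so the caller does not need a retry loop around the wait.

```c
#include <stddef.h>
#include <pthread.h>

/* User-space stand-in for a kernel completion (assumed names). */
struct sketch_completion {
	pthread_mutex_t lock;
	pthread_cond_t  cond;
	unsigned int    done;   /* latched count of sketch_complete() calls */
};

void sketch_init(struct sketch_completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = 0;
}

/* Record the event and wake one waiter, if any. */
void sketch_complete(struct sketch_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done++;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Return only once an event has been recorded, then consume it. */
void sketch_wait(struct sketch_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (c->done == 0)            /* spurious wake-ups are handled here */
		pthread_cond_wait(&c->cond, &c->lock);
	c->done--;
	pthread_mutex_unlock(&c->lock);
}
```

Because the event is counted rather than merely signalled, a sketch_complete() that runs before sketch_wait() is not lost.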


From halr at voltaire.com  Wed Feb 14 14:28:09 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 14 Feb 2007 17:28:09 -0500
Subject: [openib-general] GetTable path record query not returning
	DGID=SGID paths
In-Reply-To: <000501c75081$7898edc0$ff0da8c0@amr.corp.intel.com>
References: <000501c75081$7898edc0$ff0da8c0@amr.corp.intel.com>
Message-ID: <1171492082.22446.123948.camel@hal.voltaire.com>

On Wed, 2007-02-14 at 16:45, Sean Hefty wrote:
> We're seeing a situation where it appears that the response to a GetTable path
> record query is not returning paths where the DGID is the same as the SGID.

Is this OpenSM or a vendor SM ?

> The query is setting the SGID and number of paths.

Yes, that's the min required for GetTable request.

> We're still investigating if this is indeed the case, but does anyone know if
> such a query should return paths where DGID=SGID?

I believe it should but I'm not sure there's specific compliance. Such
loopback paths are mentioned though.

-- Hal

> - Sean


From krause at cup.hp.com  Wed Feb 14 14:39:59 2007
From: krause at cup.hp.com (Michael Krause)
Date: Wed, 14 Feb 2007 14:39:59 -0800
Subject: [openib-general] IB routing discussion summary
In-Reply-To: <45D386FE.5080202@ichips.intel.com>
References: <000601c74fb4$6ed83840$8698070a@amr.corp.intel.com>
	<6.2.0.14.2.20070214134413.0944fcf8@esmail.cup.hp.com>
	<45D386FE.5080202@ichips.intel.com>
Message-ID: <6.2.0.14.2.20070214143823.09695658@esmail.cup.hp.com>

At 02:02 PM 2/14/2007, Sean Hefty wrote:
>Mike, are you expecting that routers will modify CM messages as they flow 
>between subnets?

The router parses the GRH, strips the LRH, attaches a new LRH to the next 
hop with the contents of the LRH filled in per its internal 
policies.   Nothing more for the main packet processing.   The router 
interacts with each subnet's SM/SA to ensure the path records can be 
provided to the CM to fill in the right information.

Mike 


From mshefty at ichips.intel.com  Wed Feb 14 15:01:55 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 14 Feb 2007 15:01:55 -0800
Subject: [openib-general] GetTable path record query not returning
 DGID=SGID paths
In-Reply-To: <1171492082.22446.123948.camel@hal.voltaire.com>
References: <000501c75081$7898edc0$ff0da8c0@amr.corp.intel.com>
	<1171492082.22446.123948.camel@hal.voltaire.com>
Message-ID: <45D394E3.5080805@ichips.intel.com>

>>We're seeing a situation where it appears that the response to a GetTable path
>>record query is not returning paths where the DGID is the same as the SGID.
> 
> Is this OpenSM or a vendor SM ?

This is with opensm.  When we're running with the local SA cache, we're seeing 
route resolution (path record lookup) retries, but only for loopback 
connections.  This suggests that we're not getting path records for DGID=SGID.

- Sean


From mshefty at ichips.intel.com  Wed Feb 14 15:16:41 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 14 Feb 2007 15:16:41 -0800
Subject: [openib-general] Problem is routing CM REQ
In-Reply-To: <6.2.0.14.2.20070214135826.096b7638@esmail.cup.hp.com>
References: <6.2.0.14.2.20070213143635.09393fe0@esmail.cup.hp.com>
	<000001c74fca$6c765170$8698070a@amr.corp.intel.com>
	<6.2.0.14.2.20070214105420.09493fc8@esmail.cup.hp.com>
	<45D380EE.70300@ichips.intel.com>
	<6.2.0.14.2.20070214135826.096b7638@esmail.cup.hp.com>
Message-ID: <45D39859.4070700@ichips.intel.com>

I agree with what was in your response, however, this is how I interpret your 
answers:

>> Active side QP - DLID
2

>> Passive side QP - DLID
94

>> CM REQ Primary Local Port LID
no answer given

> - CM creates a REQ and populates the global information to identify the 
> remote endnode.  The LRH generated targets Port LID 2.  The GRH is 
> generated to target the remote subnet so the router will comprehend how 
> to process the packet.

What is carried in the Primary REQ Local Port LID and Primary Remote Port LID 
fields in the REQ?  My claim is that in this example the values are 94 and 93, 
respectively.  The passive side uses these values to configure its QP.  This 
means that the active side requires knowledge of the LIDs that are used on the 
passive side subnet.  If you believe that other values are carried in the REQ, 
what are they, and how are they used?
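
To spell out the claim, here is a minimal sketch of how the passive side would derive its QP addressing from those two REQ fields under this interpretation. The struct and function names are illustrative only and are not the CM's actual data structures.

```c
#include <stdint.h>

/* Only the two REQ fields under discussion (names are illustrative). */
struct cm_req_lids {
	uint16_t primary_local_port_lid;
	uint16_t primary_remote_port_lid;
};

/* LIDs programmed into a QP's address vector. */
struct qp_lids { uint16_t slid, dlid; };

/*
 * Passive side: packets go to the REQ's "local" LID and are sent from
 * the REQ's "remote" LID.
 */
struct qp_lids passive_qp_from_req(struct cm_req_lids req)
{
	struct qp_lids qp = {
		.slid = req.primary_remote_port_lid,
		.dlid = req.primary_local_port_lid,
	};
	return qp;
}
```

With REQ values 94 and 93, the passive QP ends up with DLID 94 (its own subnet's router) and SLID 93, which is why the active side would need to know LIDs from the passive subnet.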

- Sean


From halr at voltaire.com  Wed Feb 14 15:20:20 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 14 Feb 2007 18:20:20 -0500
Subject: [openib-general] GetTable path record query not returning
 DGID=SGID paths
In-Reply-To: <45D394E3.5080805@ichips.intel.com>
References: <000501c75081$7898edc0$ff0da8c0@amr.corp.intel.com>
	<1171492082.22446.123948.camel@hal.voltaire.com>
	<45D394E3.5080805@ichips.intel.com>
Message-ID: <1171495184.22446.126481.camel@hal.voltaire.com>

On Wed, 2007-02-14 at 18:01, Sean Hefty wrote:
> >>We're seeing a situation where it appears that the response to a GetTable path
> >>record query is not returning paths where the DGID is the same as the SGID.
> > 
> > Is this OpenSM or a vendor SM ?
> 
> This is with opensm.  When we're running with the local SA cache, we're seeing 
> route resolution (path record lookup) retries, but only for loopback 
> connections.  This suggests that we're not getting path records for DGID=SGID.

What is the value of NumbPath and how large a subnet is this ? I'm
pretty sure this works; at least it did the last I checked.

-- Hal

> - Sean


From sean.hefty at intel.com  Wed Feb 14 15:42:20 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Wed, 14 Feb 2007 15:42:20 -0800
Subject: [openib-general] GetTable path record query not
 returningDGID=SGID paths
In-Reply-To: <1171495184.22446.126481.camel@hal.voltaire.com>
Message-ID: <000701c75091$c4f59fa0$ff0da8c0@amr.corp.intel.com>

>What is the value of NumbPath and how large a subnet is this ? I'm
>pretty sure this works; at least it did the last I checked.

By default, NumbPath should be 127, but I would have expected a path record even
with it set to 1.  (I don't think we were using different PKeys or anything like
that.)

We haven't looked into this in more detail yet.  This was our observation while
testing on a larger (64 node) cluster this morning that we don't have access to
at the moment.  With the local SA cache running, we were surprised to see any
retries, and when we looked into it more, retries were always for loopback
connections.

Let me look into this more on the host stack side.

- Sean


From rdreier at cisco.com  Wed Feb 14 16:50:06 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 14 Feb 2007 16:50:06 -0800
Subject: [openib-general] [PATCH 2.6.21-rc1 4/5] ehca: replace yield()
 by wait_for_completion()
In-Reply-To: <200702141741.35444.hnguyen@linux.vnet.ibm.com> (Hoang-Nam
	Nguyen's message of "Wed, 14 Feb 2007 17:41:35 +0100")
References: <200702141741.35444.hnguyen@linux.vnet.ibm.com>
Message-ID: <adahctow8g1.fsf@cisco.com>

I agree with Christoph -- the use of wait_for_completion() in a loop
makes no sense.  When you send a new copy of this patch without
whitespace damage, please fix that up too...


From michael.arndt at informatik.tu-chemnitz.de  Wed Feb 14 17:34:32 2007
From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt)
Date: Thu, 15 Feb 2007 02:34:32 +0100
Subject: [openib-general] Unknown SMP Recv
References: <000901c74938$e10b2a30$21606d86@one7>
	<1170689654.4525.201415.camel@hal.voltaire.com>
	<001401c74946$a664a2e0$21606d86@one7>
	<1170695591.4525.207604.camel@hal.voltaire.com>
	<002001c74a33$c2ec1db0$21606d86@one7>
	<1170807564.4525.324195.camel@hal.voltaire.com>
	<001e01c74be2$b4889310$21606d86@one7>
	<1170994529.31538.124584.camel@hal.voltaire.com>
	<000401c74c6d$ce4875f0$21606d86@one7>
	<1171044773.31538.175280.camel@hal.voltaire.com>
	<000401c74c79$74439b50$21606d86@one7>
	<1171051141.2767.7.camel@localhost>
	<001001c74c87$8b653470$21606d86@one7>
	<1171122546.31538.251673.camel@hal.voltaire.com>
Message-ID: <000601c750a1$71746a40$21606d86@one7>

Hi,

I used your changes and they help in some cases, but there are still 
situations where umad_send returns with that error. I'll try to describe 
the situation:

(Node 1) -> (Node 2) -> (Node 3)

Node 1: sends 100 SubnGets to Node 3 (Dr [0][1][1])
Node 2: forwards the 100 SubnGets to Node 3 and also forwards the 100 
SubnGetResps to Node 1
Node 3: responds 100 times

That works fine!! Please don't be surprised that Node 2 gets the packets; 
that's because I changed the SMI.

But if I now start the sender on Node 1 again, so that it sends another 100 
SubnGets, Node 2 produces umad_send errors. The error doesn't occur every 
time. The receives are always OK, and so are the packets.

Below I attach the main code from the router tool on Node 2. I also tried 
allocating a packet for every single receive and send, but that didn't work 
either.

What about the size of the packet? Could there be an error there?

Thanks Michael

while(run){

  bcopy((char*)&fd_ports,(char*)&fd_ports_tmp,sizeof(fd_ports));

  activ = select(highest_fd+1, (fd_set*)&fd_ports_tmp, (fd_set*)0, (fd_set*)0,(struct timeval*)0);

  if (activ < 0 ){
   if (run) printf("Error: select : %i\n",activ);
   run = 0;
  }
  else if (activ == 0) printf("Nothing to do\n");
  else {

   // ++ Alloc MAD ++
     //printf("... Alloc UMAD .......................");
     if (!(umad = umad_alloc(Port_ID_cnt, umad_size() + IB_MAD_SIZE))){
        printf("Error: umad_alloc\n");
        goto Exit;
     }
      //printf("done\n");

     // ++ Alloc SMP Pointer ++
     //printf("... Alloc SMP ........................");
     smp = (struct drsmp**) malloc(Port_ID_cnt * sizeof(struct drsmp*));
     for (i = 0; i < Port_ID_cnt; i++)
        smp[i] = (struct drsmp*) umad_get_mad(umad + (i * (umad_size() + IB_MAD_SIZE)));
     //printf("done\n");


   // ++ Check All Ports where something is to do ++
   for (i = 0; i < Port_ID_cnt; i++) {
    if (  (Port_ID[i] >= 0) && (Agent_ID[i] >= 0) && (FD_ISSET(umad_get_fd(Port_ID[i]),(fd_set*)&fd_ports_tmp))) {

     smplength = IB_MAD_SIZE;
     packet_size = umad_size() + IB_MAD_SIZE;

     printf("... Recv Mad (Port: %i (ID:%i).....",i+1,Port_ID[i]);
     // ++ Receive ++
       if ((ret = umad_recv(Port_ID[i], umad + (i * packet_size), &smplength, timeout_ms_r)) != Agent_ID[i]){
          printf("Error: umad_recv: %s ,Nr: %i\n", drmad_status_str(smp[i]),ret);
      if (optExitRecvFail) run = 0;
     }
     else {
      // ++ Drop Echo ++
      if (smp[i]->initial_path[1] != 0) {

       // ++ Keep TID in Mind with supporting turning algorithm ++
       if ( !( (smp[i]->initial_path[smp[i]->hop_ptr] == i+1)  &&
          (smp[i]->status & DIRECTION)        &&
          (smp[i]->hop_cnt == smp[i]->hop_ptr)     &&
          (smp[i]->initial_path[smp[i]->hop_ptr] != smp[i]->initial_path[smp[i]->hop_ptr - 1]) )
          &&
           ( (Agent_TIDs[i] == -1) || (Agent_TIDs[i] != (own_ntoh64(smp[i]->tid) >> 32))   )
        )
        Agent_TIDs[i] = smp[i]->tid;
       printf("TID: 0x%lx\n",own_ntoh64(Agent_TIDs[i]));

       // ++ Message Logging ++
       if (optMsgLog) {
        fprintf(MsgLogFile,"...............................................................................................\n");
        fprintf(MsgLogFile,"... Recv Mad (Port: %i (ID:%i)...............\n",i+1,Port_ID[i]);
        fprintf(MsgLogFile,"... Recv TID: 0x%lx \n",own_ntoh64(Agent_TIDs[i]));
        dump_dr_smp(smp[i], MsgLogFile);
       }

       // ++ Looking up the Out-Port ++
       Out_Port_index = routing(smp[i],Devices_Info,Devices_cnt);

       if ((Out_Port_index >= 0) && (Port_ID[Out_Port_index] >=0)){
        printf("... Send Mad (Port: %i (ID:%i).....",Out_Port_index+1,Port_ID[Out_Port_index]);

        // ++ Replace TID
        if (Agent_TIDs[Out_Port_index] != -1) smp[i]->tid = (uint64_t) Agent_TIDs[Out_Port_index];

        // ++ Sending ++
        //printf("%i\n",timeout_ms_s); //= (smp[i]->status & DIRECTION)? 0 : 200;
          if ((ret = umad_send(Port_ID[Out_Port_index], Agent_ID[Out_Port_index], umad + (i * packet_size), smplength, (smp[i]->status & DIRECTION)? 0 : timeout_ms_s, 3)) < 0){
           printf("Error: umad_send Nr: %i \n",ret);
         if (optExitSendFail) run = 0;
        }
          else printf("TID: 0x%lx \n",own_ntoh64(Agent_TIDs[Out_Port_index]));

        if (optMsgLog) {
         fprintf(MsgLogFile,"... Send TID: 0x%lx \n",own_ntoh64(Agent_TIDs[Out_Port_index]));
             fprintf(MsgLogFile,"... Send Mad (Port: %i (ID:%i)(%s)(%i)...............\n",Out_Port_index+1,Port_ID[Out_Port_index],(ret >= 0)?"OK":"Fail",(smp[i]->status & DIRECTION)? 0 : timeout_ms_s);
                         fprintf(MsgLogFile,"...............................................................................................\n");
         fflush(MsgLogFile);
                      }
        traversed++;
       }
      }
      else {
       printf("dropped, probably there is missing a response mad\n");
       dropped++;
      }
     }
    }
   }
   if (umad) umad_free(umad);
  }
  printf("... Traversed Packets (%i)(%i) .............................\n",traversed,dropped);
 }


From michael.arndt at informatik.tu-chemnitz.de  Wed Feb 14 18:12:54 2007
From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt)
Date: Thu, 15 Feb 2007 03:12:54 +0100
Subject: [openib-general] Unknown SMP Recv
References: <000901c74938$e10b2a30$21606d86@one7>
	<1170689654.4525.201415.camel@hal.voltaire.com>
	<001401c74946$a664a2e0$21606d86@one7>
	<1170695591.4525.207604.camel@hal.voltaire.com>
	<002001c74a33$c2ec1db0$21606d86@one7>
	<1170807564.4525.324195.camel@hal.voltaire.com>
	<001e01c74be2$b4889310$21606d86@one7>
	<1170994529.31538.124584.camel@hal.voltaire.com>
	<000401c74c6d$ce4875f0$21606d86@one7>
	<1171044773.31538.175280.camel@hal.voltaire.com>
	<000401c74c79$74439b50$21606d86@one7>
	<1171051141.2767.7.camel@localhost>
	<001001c74c87$8b653470$21606d86@one7>
	<1171122546.31538.251673.camel@hal.voltaire.com>
Message-ID: <000801c750a6$cd925120$21606d86@one7>

Hi,

What I forgot to mention is that the write() call inside umad_send returns 
-1 when the error occurs. Maybe that helps.

Thanks Michael 


From nimrodg at mellanox.com  Wed Feb 14 20:16:20 2007
From: nimrodg at mellanox.com (Nimrod Gindi)
Date: Wed, 14 Feb 2007 20:16:20 -0800
Subject: [openib-general] OFED release testing Task force meeting minutes
Message-ID: <1E3DCD1C63492545881FACB6063A57C1C8275A@mtiexch01.mti.com>

Meeting took place on Wednesday - Feb. 7th, 2007    8:30AM (PST)

 
Agenda:

1. Review report summary suggestion 1 (sent by Amit K.- Mellanox).

2. Review report summary suggestion 2 (sent by Moni L.- Voltaire).

3. Review testing matrix report (sent by Jeremy B.- Qlogic).

 
Attending companies: Mellanox, IBM, Qlogic, Voltaire, SystemFabricWorks

 
Discussion Items and Action Items:

1)       Reviewed the different reports.

2)       Minor changes to the structure were suggested and adopted
(see attached combined suggestion for review).

3)       Agreed to review and close on the structure; the content will be
done later.

4)       Agreed Action Items:

a.       AI 1: Amit K (Mellanox) - send update/fixed spread-sheet.

b.       AI 2: Jeremy B (Qlogic) - send update/fixed spread-sheet.

c.       AI 3: Nimrod G. (Mellanox) - send combined suggestion for
review.

 
We agreed to review the above via e-mails before the next meeting.

Follow-up meeting scheduled for 21st February 2007 8:30am PST=11:30am
EST=6:30pm Israel.

 
Nimrod  Gindi

Mellanox Technologies Ltd.

mail  :  nimrodg at mellanox.com

Cell  :  +1-408-750-4801

Office:  +1-347-342-0011

Fax   :  +1-212-987-0275

 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OFED testing report format rev1.xls
Type: application/vnd.ms-excel
Size: 45056 bytes
Desc: OFED testing report format rev1.xls
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070214/d4b8dfc0/attachment.xls>

From rowland at cse.ohio-state.edu  Wed Feb 14 20:31:32 2007
From: rowland at cse.ohio-state.edu (Shaun Rowland)
Date: Wed, 14 Feb 2007 23:31:32 -0500
Subject: [openib-general] mvapich2 ofed 1.2 problem
In-Reply-To: <ada1wktlwux.fsf@cisco.com>
References: <1171380610.15471.25.camel@stevo-desktop>
	<adar6sum1fq.fsf@cisco.com> <1171386686.15471.36.camel@stevo-desktop>
	<adazm7ikm7q.fsf@cisco.com> <45D1FD0B.2080606@cse.ohio-state.edu>
	<ada1wktlwux.fsf@cisco.com>
Message-ID: <45D3E224.9060306@cse.ohio-state.edu>

Roland Dreier wrote:
>  > When I build using the OFED-1.2-20070208-1508, libibverbs 1.0 is what is
>  > built, at least by looking at the .so file result:
>  > 
>  > [rowland at z0 ~]$ ls /usr/local/ofed/lib64/ |grep ibverbs
>  > libibverbs.a
>  > libibverbs.so
>  > libibverbs.so.1
>  > libibverbs.so.1.0.0
> 
> The soname hasn't changed because the library is still compatible.
> But (I hope at least) OFED has libibverbs 1.1.

The soname is libibverbs.so.1, so I guess the longer name would not
matter anyway. Clearly, what I posted shows the IBVERBS 1.1 ABI is
there. I think I have figured out why our code has this problem. The
problem below is similar to the original one posted about.

I did some experimentation with the srq_pingpong libibverbs example
code. First I built it directly with:


gcc -g -c pingpong.c -I/usr/local/ofed/include

gcc -g -c -D_GNU_SOURCE srq_pingpong.c -I/usr/local/ofed/include

gcc -g -o srq_pingpong srq_pingpong.o pingpong.o -L/usr/local/ofed/lib64 -libverbs


This works.  Next I copied srq_pingpong.c to two files:

srq_pingpong_rowland.c
      - just has a main function that calls lib_start().

srq_pingpong_lib_rowland.c
      - main() changed to lib_start().

This moves all of the SRQ pingpong code into a shared library. If I
build this shared library in this way, it works:


gcc -g -fpic -c pingpong.c -I/usr/local/ofed/include

gcc -g -fpic -c -D_GNU_SOURCE srq_pingpong_lib_rowland.c -I/usr/local/ofed/include

gcc -g -shared -Wl,-soname,libsrqtest.so -o libsrqtest.so srq_pingpong_lib_rowland.o pingpong.o -L/usr/local/ofed/lib64 -libverbs

gcc -g -o srq_pingpong_rowland srq_pingpong_rowland.c -L$PWD -lsrqtest


Above I am linking libibverbs directly into my libsrqtest.so
library. This works and the IBVERBS 1.1 ABI is clearly in the
libsrqtest.so file:

[rowland at z1 ibverbs-examples]$ nm libsrqtest.so |grep ibv |head
                  U ibv_ack_cq_events@@IBVERBS_1.1
                  U ibv_alloc_pd@@IBVERBS_1.1
                  U ibv_close_device@@IBVERBS_1.1
                  U ibv_create_comp_channel@@IBVERBS_1.0
                  U ibv_create_cq@@IBVERBS_1.1
                  U ibv_create_qp@@IBVERBS_1.1
                  U ibv_create_srq@@IBVERBS_1.1
                  U ibv_dealloc_pd@@IBVERBS_1.1
                  U ibv_dereg_mr@@IBVERBS_1.1
                  U ibv_destroy_comp_channel@@IBVERBS_1.0

However, if I build in a similar way to MVAPICH2, the resulting program
fails:


gcc -g -fpic -c pingpong.c -I/usr/local/ofed/include

gcc -g -fpic -c -D_GNU_SOURCE srq_pingpong_lib_rowland.c -I/usr/local/ofed/include

gcc -g -shared -Wl,-soname,libsrqtest.so -o libsrqtest.so srq_pingpong_lib_rowland.o pingpong.o

gcc -g -o srq_pingpong_rowland srq_pingpong_rowland.c -L$PWD -L/usr/local/ofed/lib64 -lsrqtest -libverbs


Above I am not linking libibverbs into libsrqtest.so, thus it is
required on the last gcc line. This is how MVAPICH2's libmpich.so file
works, and from past experience, I've seen this before. Running shows:

[rowland at z1 ibverbs-examples]$ gdb ./srq_pingpong_rowland
GNU gdb Red Hat Linux (6.3.0.0-1.132.EL4rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".

(gdb) r
Starting program: /home/7/rowland/z1-test/ibverbs-examples/srq_pingpong_rowland
[Thread debugging using libthread_db enabled]
[New Thread 182896403968 (LWP 29858)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 182896403968 (LWP 29858)]
post_srq_recv_wrapper_1_0 (srq=0x5075b0, wr=0x7fbfff88d0, bad_wr=0x7fbfff88c8)
     at src/compat-1_0.c:312
312     src/compat-1_0.c: No such file or directory.
         in src/compat-1_0.c
(gdb) bt
#0  post_srq_recv_wrapper_1_0 (srq=0x5075b0, wr=0x7fbfff88d0,
     bad_wr=0x7fbfff88c8) at src/compat-1_0.c:312
#1  0x0000002a95559e12 in ibv_post_srq_recv (srq=0x5075b0,
     recv_wr=0x7fbfff88d0, bad_recv_wr=0x7fbfff88c8)
     at /usr/local/ofed/include/infiniband/verbs.h:915
#2  0x0000002a95559dcf in pp_post_recv (ctx=0x5023d0, n=500)
     at srq_pingpong_lib_rowland.c:496
#3  0x0000002a9555a614 in lib_start (argc=1, argv=0x7fbffff7f8)
     at srq_pingpong_lib_rowland.c:696
#4  0x0000000000400608 in main (argc=1, argv=0x7fbffff7f8)
     at srq_pingpong_rowland.c:36
(gdb) quit

It is not clear to me why the difference of either linking libibverbs
into libsrqtest.so or not doing so causes the IBVERBS 1.1 ABI to be used
or not. I looked at the libibverbs code, and the 1.1 ABI is the default.
The libsrqtest.so file in the above case seems to have lost this
information:

[rowland at z1 ibverbs-examples]$ nm libsrqtest.so |grep ibv |head
                  U ibv_ack_cq_events
                  U ibv_alloc_pd
                  U ibv_close_device
                  U ibv_create_comp_channel
                  U ibv_create_cq
                  U ibv_create_qp
                  U ibv_create_srq
                  U ibv_dealloc_pd
                  U ibv_dereg_mr
                  U ibv_destroy_comp_channel

I've never had to deal with an ABI issue like this in shared library
linking/usage. Does it make sense for this to be the case? I think
perhaps it does, but I wanted to ask.

I've placed my test code here if it helps:

http://www.cse.ohio-state.edu/~rowland/ibverbs-examples.tar.gz

I have a fix for our code that I am testing now. It seems to work and
solve the observed problems, but more testing will be required to be
sure there are no issues. If the fix is needed, which seems to be the
case at this point, it will require a new SRPM.
-- 
Shaun Rowland	rowland at cse.ohio-state.edu
http://www.cse.ohio-state.edu/~rowland/


From halr at voltaire.com  Wed Feb 14 20:47:20 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 14 Feb 2007 23:47:20 -0500
Subject: [openib-general] GetTable path record query not
 returningDGID=SGID paths
In-Reply-To: <000701c75091$c4f59fa0$ff0da8c0@amr.corp.intel.com>
References: <000701c75091$c4f59fa0$ff0da8c0@amr.corp.intel.com>
Message-ID: <1171514817.22446.145890.camel@hal.voltaire.com>

On Wed, 2007-02-14 at 18:42, Sean Hefty wrote:
> >What is the value of NumbPath and how large a subnet is this ? I'm
> >pretty sure this works; at least it did the last I checked.
> 
> By default, NumbPath should be 127, but I would have expected a path record even
> with it set to 1.

Yes, you should be getting a PathRecord or more. Are you getting some
error returned instead ?

> (I don't think we were using different PKeys or anything like that.)

Were partitions (other than full default) being used ?

> We haven't looked into this in more detail yet.  This was our observation while
> testing on a larger (64 node) cluster this morning that we don't have access to
> at the moment.  With the local SA cache running, we were surprised to see any
> retries, and when we looked into it more, retries were always for loopback
> connections.
> 
> Let me look into this more on the host stack side.

OK; thanks.

-- Hal

> - Sean


From krkumar2 at in.ibm.com  Wed Feb 14 20:51:01 2007
From: krkumar2 at in.ibm.com (Krishna Kumar2)
Date: Thu, 15 Feb 2007 10:21:01 +0530
Subject: [openib-general] [PATCH] RDMA/iwcm: Bugs in
 cm_conn_req_handler()]
In-Reply-To: <OF15BE0751.6CC9EB03-ON65257281.00126F0E-65257281.00127A6D@LocalDomain>
Message-ID: <OF21D91F53.E5633704-ON65257283.0019C7AF-65257283.001AA507@in.ibm.com>

Steve/Tom,

I tested with rdma_bw and also introduced some failures
"randomly" in handlers, and the tests ran without any
problems.

Acked-by: Krishna Kumar <krkumar2 at in.ibm.com>

thanks,

- KK

> From: Steve Wise <swise at opengridcomputing.com>
> To: Tom Tucker <tom at opengridcomputing.com>
> Cc: Roland Dreier <rdreier at cisco.com>, openib-general at openib.org
> Subject: Re: [openib-general] [PATCH] RDMA/iwcm: Bugs in
> cm_conn_req_handler()
> Date: Sat, 10 Feb 2007 15:26:35 -0600
>
> On Sat, 2007-02-10 at 14:36 -0600, Steve Wise wrote:
> > ugh.
> >
> > There is at least one bug in this patch.  I cannot call iw_cm_reject()
> > inside destroy_cm_id() because both functions grab the iw_cm lock...
> >
> >
>
> This patch puts the iw_cm_reject() calls back in
> cm_conn_req_handler()...
>
>
> ---
>
> iw_cm_id destruction race condition fixes.
>
> From: Steve Wise <swise at opengridcomputing.com>
>
> Several changes:
>
> - iwcm_deref_id() always wakes up if there's another reference.
>
> - clean up race condition in cm_work_handler().
>
> - create static void free_cm_id() which deallocs the work entries and then
>   kfrees the cm_id memory.  This reduces code replication.
>
> - rem_ref() if this is the last reference -and- the IWCM owns freeing the
>   cm_id, then free it.
>
> Signed-off-by: Steve Wise <swise at opengridcomputing.com>
> Signed-off-by: Tom Tucker <tom at opengridcomputing.com>
> ---
>
>  drivers/infiniband/core/iwcm.c |   47 +++++++++++++++++++++-------------------
>  1 files changed, 25 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c
> index 1039ad5..891d1fa 100644
> --- a/drivers/infiniband/core/iwcm.c
> +++ b/drivers/infiniband/core/iwcm.c
> @@ -146,6 +146,12 @@ static int copy_private_data(struct iw_c
>     return 0;
>  }
>
> +static void free_cm_id(struct iwcm_id_private *cm_id_priv)
> +{
> +   dealloc_work_entries(cm_id_priv);
> +   kfree(cm_id_priv);
> +}
> +
>  /*
>   * Release a reference on cm_id. If the last reference is being
>   * released, enable the waiting thread (in iw_destroy_cm_id) to
> @@ -153,21 +159,14 @@ static int copy_private_data(struct iw_c
>   */
>  static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv)
>  {
> -   int ret = 0;
> -
>     BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
>     if (atomic_dec_and_test(&cm_id_priv->refcount)) {
>        BUG_ON(!list_empty(&cm_id_priv->work_list));
> -      if (waitqueue_active(&cm_id_priv->destroy_comp.wait)) {
> -         BUG_ON(cm_id_priv->state != IW_CM_STATE_DESTROYING);
> -         BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY,
> -               &cm_id_priv->flags));
> -         ret = 1;
> -      }
>        complete(&cm_id_priv->destroy_comp);
> +      return 1;
>     }
>
> -   return ret;
> +   return 0;
>  }
>
>  static void add_ref(struct iw_cm_id *cm_id)
> @@ -181,7 +180,11 @@ static void rem_ref(struct iw_cm_id *cm_
>  {
>     struct iwcm_id_private *cm_id_priv;
>     cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
> -   iwcm_deref_id(cm_id_priv);
> +   if (iwcm_deref_id(cm_id_priv) &&
> +       test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) {
> +      BUG_ON(!list_empty(&cm_id_priv->work_list));
> +      free_cm_id(cm_id_priv);
> +   }
>  }
>
>  static int cm_event_handler(struct iw_cm_id *cm_id, struct iw_cm_event *event);
> @@ -355,7 +358,9 @@ static void destroy_cm_id(struct iw_cm_i
>     case IW_CM_STATE_CONN_RECV:
>        /*
>         * App called destroy before/without calling accept after
> -       * receiving connection request event notification.
> +       * receiving connection request event notification or
> +       * returned non zero from the event callback function.
> +       * In either case, must tell the provider to reject.
>         */
>        cm_id_priv->state = IW_CM_STATE_DESTROYING;
>        break;
> @@ -391,9 +396,7 @@ void iw_destroy_cm_id(struct iw_cm_id *c
>
>     wait_for_completion(&cm_id_priv->destroy_comp);
>
> -   dealloc_work_entries(cm_id_priv);
> -
> -   kfree(cm_id_priv);
> +   free_cm_id(cm_id_priv);
>  }
>  EXPORT_SYMBOL(iw_destroy_cm_id);
>
> @@ -647,10 +650,11 @@ static void cm_conn_req_handler(struct i
>     /* Call the client CM handler */
>     ret = cm_id->cm_handler(cm_id, iw_event);
>     if (ret) {
> +      iw_cm_reject(cm_id, NULL, 0);
>        set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags);
>        destroy_cm_id(cm_id);
>        if (atomic_read(&cm_id_priv->refcount)==0)
> -         kfree(cm_id);
> +         free_cm_id(cm_id_priv);
>     }
>
>  out:
> @@ -854,13 +858,12 @@ static void cm_work_handler(struct work_
>           destroy_cm_id(&cm_id_priv->id);
>        }
>        BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
> -      if (iwcm_deref_id(cm_id_priv))
> -         return;
> -
> -      if (atomic_read(&cm_id_priv->refcount)==0 &&
> -          test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) {
> -         dealloc_work_entries(cm_id_priv);
> -         kfree(cm_id_priv);
> +      if (iwcm_deref_id(cm_id_priv)) {
> +         if (test_bit(IWCM_F_CALLBACK_DESTROY,
> +                 &cm_id_priv->flags)) {
> +            BUG_ON(!list_empty(&cm_id_priv->work_list));
> +            free_cm_id(cm_id_priv);
> +         }
>           return;
>        }
>        spin_lock_irqsave(&cm_id_priv->lock, flags);


From devesh28 at gmail.com  Wed Feb 14 21:37:04 2007
From: devesh28 at gmail.com (Devesh Sharma)
Date: Thu, 15 Feb 2007 11:07:04 +0530
Subject: [openib-general] Immediate data question
In-Reply-To: <6.2.0.14.2.20070213125130.07f4dbf8@esmail.cup.hp.com>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
	<349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net>
	<adamz3pfym0.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net>
	<adahctxeds8.fsf@cisco.com>
	<6.2.0.14.2.20070208131610.085c52f8@esmail.cup.hp.com>
	<349DCDA352EACF42A0C49FA6DCEA840362A87E@G3W0634.americas.hpqcorp.net>
	<309a667c0702112110h7a79961fv9c3cf46d4392e1d4@mail.gmail.com>
	<309a667c0702130537u35745e98y429d3d564fb093e9@mail.gmail.com>
	<6.2.0.14.2.20070213125130.07f4dbf8@esmail.cup.hp.com>
Message-ID: <309a667c0702142137p724172f5va93a0ef046a60483@mail.gmail.com>

On 2/14/07, Michael Krause <krause at cup.hp.com> wrote:
> At 05:37 AM 2/13/2007, Devesh Sharma wrote:
> >On 2/12/07, Devesh Sharma <devesh28 at gmail.com> wrote:
> >>On 2/10/07, Tang, Changqing <changquing.tang at hp.com> wrote:
> >> > > >
> >> > > >Not for the receiver, but the sender will be severely slowed down by
> >> > > >having to wait for the RNR timeouts.
> >> > >
> >> > > RNR = Receiver Not Ready so by definition, the data flow
> >> > > isn't going to
> >> > > progress until the receiver is ready to receive data.   If a
> >> > > receive QP
> >> > > enters RNR for a RC, then it is likely not progressing as
> >> > > desired.   RNR
> >> > > was initially put in place to enable a receiver to create
> >> > > back pressure to the sender without causing a fatal error
> >> > > condition.  It should rarely be entered and therefore should
> >> > > have negligible impact on overall performance however when a
> >> > > RNR occurs, no forward progress will occur so performance is
> >> > > essentially zero.
> >> >
> >> > Mike:
> >> >         I still do not quite understand this issue. I have two
> >> > situations that have RNR triggered.
> >> >
> >> > 1. Processes A and B are connected with a QP. A first posts a send to
> >> > B, and B does not post a receive. Then A and B do long-running
> >> > RDMA_WRITEs to each other, just checking memory for the RDMA_WRITE
> >> > message. Finally B posts a receive. Does the first pending send in A
> >> > block all the later RDMA_WRITEs?
> >>According to the IBTA spec, the HCA processes WR entries in the strict
> >>order in which they are posted, so the send will block all WRs posted
> >>after it. Even if the HCA has multiple processing elements, I think it
> >>will still maintain the processing order.
> >>  If not, since RNR is triggered
> >> > periodically till B post receive, does it affect the RDMA_WRITE
> >> > performance between A and B ?
> >> >
> >> > 2. extend above to three processes, A connect to B, B connect to C, so B
> >> > has two QPs, but one CQ.A posts a send to B, B does not post receive,
> >Post ordering across QPs is not guaranteed, so whether they share a CQ
> >or use different CQs will not affect anything.
> >> > rather B and C are doing a long time RDMA_WRITE,or send/recv. But B
> >If RDMA WRITE _on_ B, no effect on performance. If RDMA WRITE _on_ C,
I am sorry, I missed that in both cases the same DMA channel is in use.
> >_may_ affect the performance, since load is on same HCA. In case of
> >Send/Recv it again _may_ affect the performance, for the same reason.
>
> Seems orthogonal.  Any time h/w is shared, multiple flows will have an
> impact on one another.  That is why we have the different arbitration
> mechanisms to enable one to control that impact.
Please, can you explain it more clearly?
>
> >> > must sends RNR periodically to A, right?. So does the pending message
> >> > from A affects B's overall performance  between B and C ?
> >But RNR NAKs do not go on for very long; you possibly will not even be
> >able to observe this performance hit. The moment rnr_counter
> >expires, the connection will be broken!
>
> Keep in mind the timeout can be infinite.  RNR NAK are not expected to be
> frequent so their performance impact was considered reasonable.
Thanks I missed that.
>
> Mike
>
> >> >
> >> >         Thank you.
> >> >
> >> > --CQ
> >> >
> >> >
> >> > >
> >> > > Mike
> >> > >
> >> > >
> >> > >
> >> >
>
>
>


From mst at mellanox.co.il  Wed Feb 14 21:57:51 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 15 Feb 2007 07:57:51 +0200
Subject: [openib-general] 32-bit build for ppc64 is required
In-Reply-To: <1171477762.3161.105.camel@fc6.xsintricity.com>
References: <1171477762.3161.105.camel@fc6.xsintricity.com>
Message-ID: <20070215055751.GA11866@mellanox.co.il>

> Quoting Doug Ledford <dledford at redhat.com>:
> Subject: Re: [openib-general] 32-bit build for ppc64 is required
> 
> On Wed, 2007-02-14 at 16:29 +0200, Michael S. Tsirkin wrote:
> > > Quoting Stefan Roscher <ossrosch at linux.vnet.ibm.com>:
> > > Subject: Re: 32-bit build for ppc64 is required
> > > 
> > > On Wednesday 14 February 2007 14:29, Michael S. Tsirkin wrote:
> > > > > Quoting Stefan Roscher <ossrosch at linux.vnet.ibm.com>:
> > > > > Subject: 32-bit build for ppc64 is required
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > after building the latest OFED build package we noticed that on PPC64 only
> > > > > 64-bit libraries were built.
> > > > > Because we have customers using older userspace applications which are
> > > > > certified for 32-bit, we think additional 32-bit support is a requirement for 64-bit builds.
> > > > > 
> > > > > If OFED 1.2 supports 32-bit on ppc64, we have to change the install
> > > > > directory. I would suggest installing 32-bit binaries into the
> > > > > /usr/local/ofed/bin32 directory, so that no changes to the current naming
> > > > > conventions are needed. The libraries are installed in the /usr/local/ofed/lib directory.
> > > > 
> > > > The standard practice is to install 64 bit libraries under prefix/lib64
> > > > and 32 bit libraries under prefix/lib. Why would PPC64 be any different?
> > > 
> > > I think you misunderstood my post. The directories for 32/64-bit libraries
> > > should be prefix/lib and prefix/lib64 respectively.
> > > But in the current ofed1.2 I saw only the prefix/lib64 directory, i.e. 64-bit libs only.
> > 
> > Well, this is not by design: AFAIK on x86_64 both types of libraries
> > are installed.
> > 
> > > > I do not think we need 32 bit binaries at all, and there's no other package
> > > > I'm aware of that uses "bin32".
> > > 
> > > We have customers that still use 32-bit userspace applications. 
> > > It would be beneficial for them if they can obtain 32bit libs and execs from
> > > ofed1.2 in order to run their applications without recompiling them, because
> > > for some 32-bit applications recompiling is not an option.
> > 
> > 32-bit libraries are needed for users to run 32-bit applications.
> > 
> > But I still do not see how installing 32 bit binaries alongside the 64
> > bit ones is useful, and I do not think other packages provide this option,
> > so maybe we shouldn't, either.
> 
> The choice of 32/64 bit default is done on a per arch basis.  With
> x86_64/i386, the increased number of CPU registers in 64bit mode
> outweighs the increased code bloat that goes along with 64bit mode.  On
> PPC, no such register benefit exists for 64bit mode.  As such, 32bit
> apps on PPC are faster than the equivalent 64bit apps up to the point at
> which a 4GB address space becomes a problem.  Correspondingly, the
> default binaries on PPC are 32bit, and only those that *need* to be
> 64bit are.  While a customer's application may need >4GB address space,
> certainly all the ibutils, diags, opensm, etc. do not.  As a result, we
> compile all of those utilities as 32bit by default on PPC.  We also ship
> all the libs as both 32/64bit so users can select the appropriate
> environment for their particular application (with the exception of
> dapl, which doesn't support 32bit and for which I filed a bug around the
> time of OFED 1.1).

So, what you suggest is - build 2 types of libraries, but on PPC make
binaries 32 bit? That's easy - do others agree to this approach?

Another option is to build binaries as whatever type gcc
generates by default, without extra flags.


-- 
MST


From mst at mellanox.co.il  Wed Feb 14 21:58:58 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 15 Feb 2007 07:58:58 +0200
Subject: [openib-general] [PATCH] IPoIB: Only allow root to change
 between datagram and connected mode
In-Reply-To: <ada4ppo1j2b.fsf@cisco.com>
References: <ada4ppo1j2b.fsf@cisco.com>
Message-ID: <20070215055858.GB11866@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: [PATCH] IPoIB: Only allow root to change between datagram and connected mode
> 
> Change the permissions of the "mode" sysfs attribute to be S_IWUSR
> instead of S_IWUGO.
> 
> Signed-off-by: Roland Dreier <rolandd at cisco.com>
> ---
> FYI -- I'm planning to merge this for 2.6.21.  It doesn't seem
> appropriate to allow ordinary users to mess with this sort of config.

Acked-by: Michael S. Tsirkin <mst at mellanox.co.il>


-- 
MST


From erezz at voltaire.com  Wed Feb 14 22:33:13 2007
From: erezz at voltaire.com (Erez Zilber)
Date: Thu, 15 Feb 2007 08:33:13 +0200
Subject: [openib-general] OFED 1.2 alpha release
In-Reply-To: <45D337E2.200@mellanox.co.il>
References: <45D337E2.200@mellanox.co.il>
Message-ID: <45D3FEA9.9020802@voltaire.com>

Tziporet Koren wrote:
> Hi,
>
> After a two-week delay, we have published OFED 1.2-alpha1 at
> http://www.openfabrics.org/builds/ofed-1.2/
> File: OFED-1.2-alpha1.tgz
> BUILD_ID contains info on all packages sources location.
>
> Please report any issues in bugzilla https://bugs.openfabrics.org/
>
> Tziporet & Vlad
>
> *_OS support:_*
> Novell:
>     - SLES 9.0 SP3
>     - SLES10
> Redhat:
>     - Redhat EL4 up4
>     - Redhat EL5 beta2 (only partially tested)
> kernel.org:
>     - 2.6.20
>     - 2.6.19
>
> Note: Redhat EL4 up3, Fedora C4, Fedora C6 and SuSE Pro 10 are not
> part of the official list.
> We keep the backport patches for these OSes and make sure OFED compiles
> and loads properly, but will not do a full QA cycle.
>
> _*Systems:*_
>     * x86_64
>     * x86
>     * ia64
>     * ppc64 (have not tested user space)
>
> _*Main changes from OFED-1.1:*_
>
>    1. iWARP is now supported with Chelsio T3
>    2. New kernel modules: VNIC, RDS, Bonding, SA cache
>    3. New packages: MVAPICH2
>    4. IPoIB Connected mode
>    5. Multicast join from user space
>    6. libibverbs 1.1
>    7. OpenSM new routing models: FAT tree routing and Taurus routing
>    8. GUI tool for network diagnostic
>    9. New MPI releases: MVAPICH: version 0.9.9, Open MPI: version 1.2,
>       MVAPICH2: version 0.9.8
>
> Detailed list of changes can be found in:
> https://wiki.openfabrics.org/tiki-index.php?page=OFED+1.2+release+plan+and+features
>
> _*Limitations and known issues:*_
>
>    1. ipath driver compilation fails on all systems, except for kernel
>       2.6.20
>    2. libipathverbs  is not working with libibverbs 1.1
>    3. SDP netstat is not available on RHEL5 (due to compilation errors)
>    4. Routing table problem in SLES10 when using port #2
>    5. RDS compiles only on kernel 2.6.18/19/20
>    6. MVAPICH2 installation fails on SuSE Pro 10.
>    7. mstflint is not working on ppc64
>    8. RDS was not tested
>

One more limitation - open-iscsi over iSER is currently supported on
SLES 10, RHEL5 beta 2 & 2.6.20 (naturally). Other distros/kernels are
not supported.

Erez


From ogerlitz at voltaire.com  Wed Feb 14 23:03:41 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 15 Feb 2007 09:03:41 +0200
Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast
 support
In-Reply-To: <45CEFCA8.4000008@voltaire.com>
References: <000701c741a6$16dc4760$ff0da8c0@amr.corp.intel.com>
	<45BF8E17.2010805@ichips.intel.com> <45C37BE9.5040105@ichips.intel.com>
	<45C85B39.4080700@voltaire.com> <45CB3537.8060508@voltaire.com>
	<adaodo4nqz7.fsf@cisco.com> <45CEFCA8.4000008@voltaire.com>
Message-ID: <45D405CD.9020606@voltaire.com>

Or Gerlitz wrote:
> Roland Dreier wrote:
>> I merged the "increment port number" and "remove redundant '_wq'"
>> patches from git.openfabrics.org/~shefty/scm/rdma-dev.git for-roland
>>
>> I plan to review the multicast stuff next week and I hope to merge it
>> for 2.6.21.  Or, have you or anyone else at Voltaire read over the
>> code in addition to using it?  Do you see anything that should be
>> cleaned up?
> 
> OK, I spent some time today on reviewing and playing with the ib_sa: 
> track multicast join/leave requests patch - and have no special 
> comments. I think the two patches are ready for merge, let me know if 
> you have any specific question.

Roland - any progress here?

Or.


From mst at mellanox.co.il  Wed Feb 14 23:15:37 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 15 Feb 2007 09:15:37 +0200
Subject: [openib-general] Fwd: [ANNOUNCE] GIT 1.5.0
Message-ID: <20070215071537.GD11866@mellanox.co.il>

FYI.
I suggest we update git on the openfabrics server to 1.5.0:
"Detached HEAD" feature will be useful for nightly build scripts.
Sasha?

----- Forwarded message from Junio C Hamano <junkio at cox.net> -----

Subject: [ANNOUNCE] GIT 1.5.0
Date: Wed, 14 Feb 2007 05:14:16 +0200
From: Junio C Hamano <junkio at cox.net>


The latest feature release GIT 1.5.0 is available at the usual places:

  http://www.kernel.org/pub/software/scm/git/

  git-1.5.0.tar.{gz,bz2}			(tarball)
  git-htmldocs-1.5.0.tar.{gz,bz2}		(preformatted docs)
  git-manpages-1.5.0.tar.{gz,bz2}		(preformatted docs)
  RPMS/$arch/git-*-1.5.0-1.$arch.rpm	(RPM)

----------------------------------------------------------------

GIT v1.5.0 Release Notes
========================

Old news
--------

This section is for people who are upgrading from ancient
versions of git.  Although all of the changes in this section
happened before the current v1.4.4 release, they are summarized
here in the v1.5.0 release notes for people who skipped earlier
versions.

As of git v1.5.0 there are some optional features that change
the repository to allow data to be stored and transferred more
efficiently.  These features are not enabled by default, as they
will make the repository unusable with older versions of git.
Specifically, the available options are:

 - There is a configuration variable core.legacyheaders that
   changes the format of loose objects so that they are more
   efficient to pack and to send out of the repository over git
   native protocol, since v1.4.2.  However, loose objects
   written in the new format cannot be read by git older than
   that version; people fetching from your repository using
   older clients over dumb transports (e.g. http) using older
   versions of git will also be affected.

 - Since v1.4.3, configuration repack.usedeltabaseoffset allows
   packfile to be created in more space efficient format, which
   cannot be read by git older than that version.

The above two are not enabled by default and you explicitly have
to ask for them, because these two features make repositories
unreadable by older versions of git, and in v1.5.0 we still do
not enable them by default for the same reason.  We will change
this default probably 1 year after 1.4.2's release, when it is
reasonable to expect everybody to have new enough version of
git.

 - 'git pack-refs' appeared in v1.4.4; this command allows tags
   to be accessed much more efficiently than the traditional
   'one-file-per-tag' format.  Older git-native clients can
   still fetch from a repository that packed and pruned refs
   (the server side needs to run the up-to-date version of git),
   but older dumb transports cannot.  Packing of refs is done by
   an explicit user action, either by use of "git pack-refs
   --prune" command or by use of "git gc" command.

 - 'git -p' to paginate anything -- many commands do pagination
   by default on a tty.  Introduced between v1.4.1 and v1.4.2;
   this may surprise old timers.

 - 'git archive' superseded 'git tar-tree' in v1.4.3;

 - 'git cvsserver' was new invention in v1.3.0;

 - 'git repo-config', 'git grep', 'git rebase' and 'gitk' were
   seriously enhanced during v1.4.0 timeperiod.

 - 'gitweb' became part of git.git during v1.4.0 timeperiod and
   seriously modified since then.

 - reflog is a v1.4.0 invention.  This allows you to name a
   revision that a branch used to be at (e.g. "git diff
   master@{yesterday} master" allows you to see changes since
   yesterday's tip of the branch).


Updates in v1.5.0 since v1.4.4 series
-------------------------------------

* Index manipulation

 - git-add is to add contents to the index (aka "staging area"
   for the next commit), whether the file the contents happen to
   be is an existing one or a newly created one.

 - git-add without any argument does not add everything
   anymore.  Use 'git-add .' instead.  Also you can add
   otherwise ignored files with an -f option.

 - git-add tries to be more friendly to users by offering an
   interactive mode ("git-add -i").

 - git-commit <path> used to refuse to commit if <path> was
   different between HEAD and the index (i.e. update-index was
   used on it earlier).  This check was removed.

 - git-rm is much saner and safer.  It is used to remove paths
   from both the index file and the working tree, and makes sure
   you are not losing any local modification before doing so.

 - git-reset <tree> <paths>... can be used to revert index
   entries for selected paths.

 - git-update-index is much less visible.  Many suggestions to
  use the command in git output and documentation have now been
  replaced by simpler commands such as "git add" or "git rm".
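
The index-manipulation changes above can be tried out in a scratch
repository.  The following is only a sketch: the file names, the
identity settings and the temporary directory are made up for
illustration, and it assumes a reasonably recent git in which the
"git add" / "git rm" / "git reset" spellings are available.

```shell
set -e
# Scratch repository in a temporary directory (hypothetical names throughout).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "example@example.invalid"

echo one > a.txt
echo one > b.txt
git add .                      # "git add" with no argument no longer adds everything
git commit -q -m "initial"

echo two > a.txt
git add a.txt                  # stage new contents of an existing file
staged_before=$(git diff --cached --name-only)

git reset -q HEAD -- a.txt     # revert the index entry for one path only
staged_after=$(git diff --cached --name-only)

git rm -q b.txt                # removes the path from the index and the working tree
```

After this runs, a.txt still holds the modified contents in the
working tree but is no longer staged, and b.txt is gone from both the
index and the working tree.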


* Repository layout and objects transfer

 - The data for origin repository is stored in the configuration
   file $GIT_DIR/config, not in $GIT_DIR/remotes/, for newly
   created clones.  The latter is still supported and there is
   no need to convert your existing repository if you are
   already comfortable with your workflow with the layout.

 - git-clone always uses what is known as "separate remote"
   layout for a newly created repository with a working tree.

   A repository with the separate remote layout starts with only
   one default branch, 'master', to be used for your own
   development.  Unlike the traditional layout that copied all
   the upstream branches into your branch namespace (while
   renaming their 'master' to your 'origin'), the new layout
   puts upstream branches into local "remote-tracking branches"
   with their own namespace. These can be referenced with names
   such as "origin/$upstream_branch_name" and are stored in
   .git/refs/remotes rather than .git/refs/heads where normal
   branches are stored.

   This layout keeps your own branch namespace less cluttered,
   avoids name collision with your upstream, makes it possible
   to automatically track new branches created at the remote
   after you clone from it, and makes it easier to interact with
   more than one remote repository (you can use "git remote" to
   add other repositories to track).  There might be some
   surprises:

   * 'git branch' does not show the remote tracking branches.
     It only lists your own branches.  Use '-r' option to view
     the tracking branches.

   * If you are forking off of a branch obtained from the
     upstream, you would have done something like 'git branch
     my-next next', because traditional layout dropped the
     tracking branch 'next' into your own branch namespace.
     With the separate remote layout, you say 'git branch next
     origin/next', which allows you to use the matching name
     'next' for your own branch.  It also allows you to track a
     remote other than 'origin' (i.e. where you initially cloned
     from) and fork off of a branch from there the same way
     (e.g. "git branch mingw j6t/master").

   Repositories initialized with the traditional layout continue
   to work.

 - New branches that appear on the origin side after a clone is
   made are also tracked automatically.  This is done with a
   wildcard refspec "refs/heads/*:refs/remotes/origin/*", which
   older git does not understand, so if you clone with 1.5.0,
   you would need to downgrade remote.*.fetch in the
   configuration file to specify each branch you are interested
   in individually if you plan to fetch into the repository with
   older versions of git (but why would you?).

 - Similarly, wildcard refspec "refs/heads/*:refs/remotes/me/*"
   can be given to "git-push" command to update the tracking
   branches that are used to track the repository you are pushing
   from on the remote side.

 - git-branch and git-show-branch know remote tracking branches
   (use the command line switch "-r" to list only tracked branches).

 - git-push can now be used to delete a remote branch or a tag.
   This requires the updated git on the remote side (use "git
   push <remote> :refs/heads/<branch>" to delete "branch").

 - git-push more aggressively keeps the transferred objects
   packed.  Earlier we recommended monitoring the amount of loose
   objects and repacking regularly, but you should also repack when
   you have accumulated too many small packs this way.  Updated
   git-count-objects helps you with this.

 - git-fetch also more aggressively keeps the transferred objects
   packed.  This behavior of git-push and git-fetch can be
   tweaked with a single configuration transfer.unpacklimit (but
   usually there should not be any need for a user to tweak it).

 - A new command, git-remote, can help you manage your remote
   tracking branch definitions.

 - You may need to specify explicit paths for upload-pack and/or
   receive-pack due to your ssh daemon configuration on the
   other end.  This can now be done via remote.*.uploadpack and
   remote.*.receivepack configuration.
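
The "separate remote" layout described in this section can be observed
with a local clone.  This is a minimal sketch under assumptions: the
directory names "upstream" and "clone", the branch "next" and the
identity settings are invented for the example, and a recent git is
assumed.

```shell
set -e
work=$(mktemp -d)

# A stand-in "upstream" repository with two branches: master and next.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "example@example.invalid"
echo base > file
git add file
git commit -q -m "base"
git branch next

# Clone it; upstream branches land under refs/remotes/origin/.
cd "$work"
git clone -q upstream clone
cd clone
local_branches=$(git for-each-ref --format='%(refname:short)' refs/heads)
git branch -r                                   # lists the remote-tracking branches
refspec=$(git config --get remote.origin.fetch) # the wildcard refspec

# Fork a local branch with the matching name off the tracking branch.
git branch -q next origin/next
```

Right after the clone only 'master' exists in the local branch
namespace; 'next' appears there only once it is created explicitly
from origin/next.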


* Bare repositories

 - Certain commands change their behavior in a bare repository
   (i.e. a repository without associated working tree).  We use
   a fairly conservative heuristic (if $GIT_DIR is ".git", or
   ends with "/.git", the repository is not bare) to decide if a
   repository is bare, but "core.bare" configuration variable
   can be used to override the heuristic when it misidentifies
   your repository.

 - git-fetch used to complain about updating the current branch but
   this is now allowed for a bare repository.  So is the use of
   'git-branch -f' to update the current branch.

 - Porcelain-ish commands that require a working tree refuse to
   work in a bare repository.


* Reflog

 - Reflog records the history from the view point of the local
   repository. In other words, regardless of the real history,
   the reflog shows the history as seen by one particular
   repository (this enables you to ask "what was the current
   revision in _this_ repository, yesterday at 1pm?").  This
   facility is enabled by default for repositories with working
   trees, and can be accessed with the "branch@{time}" and
   "branch@{Nth}" notation.

 - "git show-branch" learned showing the reflog data with the
   new -g option.  "git log" has -g option to view reflog
   entries in a more verbose manner.

 - git-branch knows how to rename branches and moves existing
   reflog data from the old branch to the new one.

 - In addition to the reflog support in v1.4.4 series, HEAD
   reference maintains its own log.  "HEAD@{5.minutes.ago}"
   means the commit you were at 5 minutes ago, which takes
   branch switching into account.  If you want to know where the
   tip of your current branch was at 5 minutes ago, you need to
   explicitly say its name (e.g. "master@{5.minutes.ago}") or
   omit the refname altogether i.e. "@{5.minutes.ago}".

 - The commits referred to by reflog entries are now protected
   against pruning.  The new command "git reflog expire" can be
   used to truncate older reflog entries and entries that refer
   to commits that have been pruned away previously with older
   versions of git.

   Existing repositories that have been using reflog may get
   complaints from fsck-objects and may not be able to run
   git-repack, if you had run git-prune from older git; please
   run "git reflog expire --stale-fix --all" first to remove
   reflog entries that refer to commits that are no longer in
   the repository when that happens.
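
A small way to see the "branch@{Nth}" notation at work is sketched
below.  The repository, file name and identity settings are
assumptions made for the example; reflogs are expected to be enabled
by default because the repository has a working tree.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "example@example.invalid"

echo one > file
git add file
git commit -q -m "first"
first=$(git rev-parse HEAD)

echo two > file
git commit -q -a -m "second"

# master@{1} names where the tip of master was one update ago.
prev=$(git rev-parse 'master@{1}')

# "git log -g" walks the reflog entries of the branch instead of its ancestry.
entries=$(git log -g --format=%H master | wc -l | tr -d ' ')
```

Here master@{1} resolves to the first commit, and the reflog of
master holds one entry per commit made on it.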


* Crufts removal

 - We used to say "old commits are retrievable using reflog and
   'master@{yesterday}' syntax as long as you haven't run
   git-prune".  We no longer have to say the latter half of the
   above sentence, as git-prune does not remove things reachable
   from reflog entries.

 - 'git-prune' by default does not remove _everything_
   unreachable, as there is a one-day grace period built-in.

 - There is a toplevel garbage collector script, 'git-gc', that
   runs periodic cleanup functions, including 'git-repack -a -d',
   'git-reflog expire', 'git-pack-refs --prune', and 'git-rerere
   gc'.

 - The output from fsck ("fsck-objects" is called just "fsck"
   now, but the old name continues to work) was needlessly
   alarming in that it warned about missing objects that are reachable
   only from dangling objects.  This has been corrected and the
   output is much more useful.


* Detached HEAD

 - You can use 'git-checkout' to check out an arbitrary revision
   or a tag as well, instead of named branches.  This will
   dissociate your HEAD from the branch you are currently on.

   A typical use of this feature is to "look around".  E.g.

	$ git checkout v2.6.16
	... compile, test, etc.
	$ git checkout v2.6.17
	... compile, test, etc.

 - After detaching your HEAD, you can go back to an existing
   branch with usual "git checkout $branch".  You can also
   start a new branch at that commit using "git checkout -b
   $newbranch".

 - You can even pull from other repositories, make merges and
   commits while your HEAD is detached.  Also you can use "git
   reset" to jump to arbitrary commit, while still keeping your
   HEAD detached.

   Going back to attached state (i.e. on a particular branch) by
   "git checkout $branch" can lose the current state you arrived
   in these ways, and "git checkout" refuses when the detached
   HEAD is not pointed by any existing ref (an existing branch,
   a remote tracking branch or a tag).  This safety can be
   overridden with "git checkout -f $branch".
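
For scripted use (a nightly build that checks out a tag, for
instance), the detached state can be detected with "git symbolic-ref".
The sketch below is illustrative only: the tags v1 and v2, the file
name and the identity settings are invented, and a recent git is
assumed.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "example@example.invalid"

echo one > file
git add file
git commit -q -m "first"
git tag v1
echo two > file
git commit -q -a -m "second"
git tag v2

# Check out a tag rather than a branch: HEAD is now detached.
git checkout -q v1
if git symbolic-ref -q HEAD >/dev/null; then
    state=attached
else
    state=detached
fi
content_at_v1=$(cat file)      # ... compile, test, etc. would go here

# Go back to a named branch; HEAD is attached again.
git checkout -q master
branch=$(git symbolic-ref --short HEAD)
```

"git symbolic-ref -q HEAD" exits non-zero while HEAD is detached,
which gives a script a simple test without parsing any output.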


* Packed refs

 - Repositories with hundreds of tags have been paying large
   overhead, both in storage and in runtime, due to the
   traditional one-ref-per-file format.  A new command,
   git-pack-refs, can be used to "pack" them in more efficient
   representation (you can let git-gc do this for you).

 - Clones and fetches over dumb transports are now aware of
   packed refs and can download from repositories that use
   them.


* Configuration

 - Configuration related to color settings is consolidated under
   the color.* namespace (older diff.color.*, status.color.* are
   still supported).

 - 'git-repo-config' command is accessible as 'git-config' now.


* Updated features

 - git-describe uses better criteria to pick a base ref.  It
   used to pick the one with the newest timestamp, but now it
   picks the one that is topologically the closest (that is,
   among ancestors of commit C, the ref T that has the shortest
   output from "git-rev-list T..C" is chosen).

 - git-describe gives the number of commits since the base ref
   between the refname and the hash suffix.  E.g. the commit one
   before v2.6.20-rc6 in the kernel repository is:

	v2.6.20-rc5-306-ga21b069

   which tells you that its object name begins with a21b069,
   v2.6.20-rc5 is an ancestor of it (meaning, the commit
   contains everything -rc5 has), and there are 306 commits
   since v2.6.20-rc5.

 - git-describe with --abbrev=0 can be used to show only the
   name of the base ref.

 - git-blame learned a new option, --incremental, that tells it
   to output the blames as they are assigned.  A sample script
   to use it is also included as contrib/blameview.

 - git-blame starts annotating from the working tree by default.
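
The git-describe output format described above (base ref, number of
commits since that ref, then "g" plus the abbreviated object name) can
be taken apart mechanically.  The following is only a minimal sketch:
parse_describe() and struct describe_parts are hypothetical names, not
part of git, and the buffer sizes are arbitrary.  It scans from the
right because the base ref may itself contain '-' characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of git: splits "<ref>-<count>-g<hash>". */
struct describe_parts {
    char ref[128];   /* base ref, e.g. "v2.6.20-rc5" */
    long commits;    /* commits since the base ref, e.g. 306 */
    char hash[64];   /* abbreviated object name, e.g. "a21b069" */
};

/* Returns 0 on success, -1 if the string does not have the expected shape. */
static int parse_describe(const char *s, struct describe_parts *out)
{
    const char *hash_sep, *count_sep;
    char *end;
    size_t ref_len;

    assert(s != NULL && out != NULL);

    /* The last '-' must be followed by 'g' and a non-empty hash. */
    hash_sep = strrchr(s, '-');
    if (!hash_sep || hash_sep == s || hash_sep[1] != 'g' || hash_sep[2] == '\0')
        return -1;

    /* Walk left to the '-' that precedes the commit count. */
    count_sep = hash_sep - 1;
    while (count_sep > s && *count_sep != '-')
        count_sep--;
    if (*count_sep != '-' || count_sep == s)
        return -1;

    /* The count must be a plain decimal number ending at hash_sep. */
    if (!isdigit((unsigned char)count_sep[1]))
        return -1;
    out->commits = strtol(count_sep + 1, &end, 10);
    if (end != hash_sep)
        return -1;

    ref_len = (size_t)(count_sep - s);
    if (ref_len >= sizeof(out->ref) || strlen(hash_sep + 2) >= sizeof(out->hash))
        return -1;
    memcpy(out->ref, s, ref_len);
    out->ref[ref_len] = '\0';
    strcpy(out->hash, hash_sep + 2);
    return 0;
}
```

Applied to the example above, this yields the ref "v2.6.20-rc5", a
count of 306 and the hash prefix "a21b069".  A plain tag name without
the count and hash suffix is rejected by this sketch.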


* Less external dependency

 - We no longer require the "merge" program from the RCS suite.
   All 3-way file-level merges are now done internally.

 - The original implementation of git-merge-recursive which was
   in Python has been removed; we have a C implementation of it
   now.

 - git-shortlog is no longer a Perl script.  It no longer
   requires output piped from git-log; it can accept revision
   parameters directly on the command line.


* I18n

 - We have always encouraged the commit message to be encoded in
   UTF-8, but the users are allowed to use legacy encoding as
   appropriate for their projects.  This will continue to be the
   case.  However, a non UTF-8 commit encoding _must_ be
   explicitly set with i18n.commitencoding in the repository
   where a commit is made; otherwise git-commit-tree will
   complain if the log message does not look like a valid UTF-8
   string.

 - The value of i18n.commitencoding in the originating
   repository is recorded in the commit object on the "encoding"
   header, if it is not UTF-8.  git-log and friends notice this,
   and re-encode the message to the log output encoding when
   displaying, if they are different.  The log output encoding
   is determined by "git log --encoding=<encoding>",
   i18n.logoutputencoding configuration, or i18n.commitencoding
   configuration, in the decreasing order of preference, and
   defaults to UTF-8.

 - Tools for e-mailed patch application now default to -u
   behavior; i.e. they always re-code from the e-mailed encoding
   to the encoding specified with i18n.commitencoding.  This
   unfortunately forces projects that have happily been using a
   legacy encoding without setting i18n.commitencoding to set
   the configuration, but taken together with the other
   improvements, please excuse us for this very minor one-time
   inconvenience.
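
The order of preference for the log output encoding described in this
section can be written down as a small decision function.  This is a
sketch under stated assumptions, not git's actual code: the two
function names are hypothetical, a NULL argument stands for "not set",
and encoding names are compared with a plain strcmp() (so "utf-8" and
"UTF-8" would count as different here).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: pick the log output encoding.  Preference order,
 * highest first: --encoding=<encoding> on the command line,
 * i18n.logoutputencoding, i18n.commitencoding, then the UTF-8 default.
 */
static const char *pick_log_output_encoding(const char *cmdline_encoding,
                                            const char *logoutputencoding,
                                            const char *commitencoding)
{
    if (cmdline_encoding)
        return cmdline_encoding;
    if (logoutputencoding)
        return logoutputencoding;
    if (commitencoding)
        return commitencoding;
    return "UTF-8";
}

/*
 * Hypothetical helper: a commit without an "encoding" header is taken
 * to be UTF-8; re-encoding is needed only when the two encodings differ.
 */
static int needs_reencode(const char *commit_encoding_header,
                          const char *output_encoding)
{
    const char *from = commit_encoding_header ? commit_encoding_header : "UTF-8";

    assert(output_encoding != NULL);
    return strcmp(from, output_encoding) != 0;
}
```

With nothing configured the first function returns "UTF-8", so a commit
that carries an "encoding" header naming a legacy encoding would be
re-encoded for display.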


* e-mailed patches

 - See the above I18n section.

 - git-format-patch now enables --binary without being asked.
   git-am does _not_ default to it, as sending a binary patch
   via e-mail is unusual and is harder to review than textual
   patches, and it is prudent to require the person who is
   applying the patch to explicitly ask for it.

 - The default suffix for git-format-patch output is now ".patch",
   not ".txt".  This can be changed with --suffix=.txt option,
   or setting the config variable "format.suffix" to ".txt".


* Foreign SCM interfaces

  - git-svn now requires the Perl SVN:: libraries; the
    command-line backend was too slow and limited.

  - the 'commit' subcommand of git-svn has been renamed to
    'set-tree', and 'dcommit' is the recommended replacement for
    day-to-day work.

  - A git fast-import backend has been added.


* User support

 - Quite a lot of documentation updates.

 - Bash completion scripts have been updated heavily.

 - Better error messages for often used Porcelainish commands.

 - Git GUI.  This is a simple Tk based graphical interface for
   common Git operations.


* Sliding mmap

 - We used to assume that we could mmap the whole packfile while
   it was in use, but with a large project this consumes a huge
   amount of virtual memory, and truly huge packfiles would not
   fit in the userland address space on 32-bit platforms.  We
   now mmap a huge packfile in pieces to avoid this problem.
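
The idea of mapping a large file in pieces can be illustrated with
plain POSIX mmap().  The sketch below is not git's implementation:
sum_file_windowed() is a hypothetical function that sums the bytes of
a file while never mapping more than one window at a time.  It assumes
the window size is a non-zero multiple of the page size, so that every
mmap() offset stays page-aligned.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical sketch: sum all bytes of a file, mapping at most
 * `window` bytes at once.  Returns 0 on success and stores the sum
 * in *sum, or -1 on error.
 */
static int sum_file_windowed(const char *path, size_t window, uint64_t *sum)
{
    struct stat st;
    off_t off;
    long page = sysconf(_SC_PAGESIZE);
    int fd;

    assert(path != NULL && sum != NULL);
    if (page <= 0 || window == 0 || window % (size_t)page != 0)
        return -1;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    *sum = 0;
    for (off = 0; off < st.st_size; off += (off_t)window) {
        size_t len = window;
        const unsigned char *p;
        size_t i;

        /* The last window may be shorter than the others. */
        if ((off_t)len > st.st_size - off)
            len = (size_t)(st.st_size - off);

        p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        for (i = 0; i < len; i++)
            *sum += p[i];
        /* Unmap before moving on, so address space use stays bounded. */
        munmap((void *)p, len);
    }
    close(fd);
    return 0;
}
```

The address space in use at any moment is bounded by the window size
rather than by the file size, which is the property that matters on
32-bit platforms.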


* Shallow clones

 - There is partial support for 'shallow' repositories that
   keep only recent history.  A 'shallow clone' is created by
   specifying how deep that truncated history should be
   (e.g. "git clone --depth=5 git://some.where/repo.git").

   Currently a shallow repository has a number of limitations:

   - Cloning and fetching _from_ a shallow clone are not
     supported (nor tested -- so they might work by accident but
     they are not expected to).

   - Neither pushing from nor pushing into a shallow clone is
     expected to work.

   - Merging inside a shallow repository would work as long as a
     merge base is found in the recent history, but otherwise it
     will be like merging unrelated histories and may result in
     huge conflicts.

   but this would be more than adequate for people who want to
   look at the history near the tip of a big project with a deep
   history and send patches in e-mail format.


----- End forwarded message -----

-- 
MST


From HNGUYEN at de.ibm.com  Wed Feb 14 23:40:09 2007
From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen)
Date: Thu, 15 Feb 2007 08:40:09 +0100
Subject: [openib-general] 32-bit build for ppc64 is required
In-Reply-To: <20070215055751.GA11866@mellanox.co.il>
Message-ID: <OF9910F87F.2CDEEBEE-ONC1257283.0027E408-C1257283.002A2129@de.ibm.com>

> So, what you suggest is - build 2 types of libraries, but on PPC make
> binaries 32 bit? That's easy - do others agree to this approach?
No, for execs please create 32bit and 64bit on PPC.
> Another option is to build binaries with whatever type of binary
> gcc without extra flags generates by default.
On PPC we really need the ability to build both versions. The reason is
simply that there're customers who want them. Why don't offer both
options, and each component owner can decide her/his default? And the
customers can pick the one(s) they like.
I see your point regarding QA effort. Is it really twice? My assumption
might be wrong: I guess we have to assure/test the 32/64bit compatibility
anyway eg. 32bit client talks to 64bit server.
If we have 32bit execs only for development resp. testing, why don't we
also give them to customers in order to do basic test or diagnosis of
their setup?
Regards
Nam


From mst at mellanox.co.il  Thu Feb 15 00:15:53 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 15 Feb 2007 10:15:53 +0200
Subject: [openib-general] 32-bit build for ppc64 is required
In-Reply-To: <OF9910F87F.2CDEEBEE-ONC1257283.0027E408-C1257283.002A2129@de.ibm.com>
References: <20070215055751.GA11866@mellanox.co.il>
	<OF9910F87F.2CDEEBEE-ONC1257283.0027E408-C1257283.002A2129@de.ibm.com>
Message-ID: <20070215081529.GG11866@mellanox.co.il>

> Quoting Hoang-Nam Nguyen <HNGUYEN at de.ibm.com>:
> Subject: Re: 32-bit build for ppc64 is required
> 
> > So, what you suggest is - build 2 types of libraries, but on PPC make
> > binaries 32 bit? That's easy - do others agree to this approach?
> No, for execs please create 32bit and 64bit on PPC.
>
> > Another option is to build binaries with whatever type of binary
> > gcc without extra flags generates by default.
> On PPC we really need the ability to build both versions. The reason is
> simply that there're customers who want them. Why don't offer both
> options, and each component owner can decide her/his default?

I don't think this can be elegantly dealt with on a per-component basis.
But if some component owner has an opinion on this, do speak up - note
this affects binaries only, not libraries such as libehca.

> And the
> customers can pick the one(s) they like.
> I see your point regarding QA effort. Is it really twice?

Probably more - I'm reasonably sure most scripts written so far
assume stuff is installed in prefix/bin, so testing harness etc
would need to be changed.

And how to make sure the *correct* set of binaries was actually
QA'd?

> My assumption
> might be wrong: I guess we have to assure/test the 32/64bit compatibility
> anyway eg. 32bit client talks to 64bit server.
> If we have 32bit execs only for development resp. testing, why don't we
> also give them to customers in order to do basic test or diagnosis of
> their setup?

Because of the confusion this would create.
For shared libraries, the 32/64 bit issues seem to be automatically figured out
by ld.so, but there's no such solution for binaries.

-- 
MST


From ogerlitz at voltaire.com  Thu Feb 15 01:25:13 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 15 Feb 2007 11:25:13 +0200
Subject: [openib-general] IPv6 multicast address per NIC
Message-ID: <45D426F9.6060807@voltaire.com>

Hi,

I see that when IPv6 is enabled in the kernel, the stack joins for a 
--dedicated-- multicast group per each interface. Can anyone here supply 
me with a pointer to where this is defined, doing a quick look on rfc 
3307 did not provide an answer.

Or.

Below is the maddr show on a node with two partitions on ib0, note that 
the --pkey-- is not presented in the link addresses since IPoIB fill 
that in its own copy (i don't mind send a patch to fix that if anyone 
here think it is helpful).

$ ip maddr show

> 41:     ib0
>         link  00:ff:ff:ff:ff:12:40:1b:00:00:00:00:00:00:00:00:00:00:00:01
>         link  00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:01:ff:98:00:6d
>         link  00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:00:00:00:00:01
>         inet  224.0.0.1
>         inet6 ff02::1:ff98:6d
>         inet6 ff02::1
> 45:     ib0.8001
>         link  00:ff:ff:ff:ff:12:40:1b:00:00:00:00:00:00:00:00:00:00:00:01
>         link  00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:01:ff:98:00:6d
>         link  00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:00:00:00:00:01
>         inet  224.0.0.1
>         inet6 ff02::1:ff98:6d
>         inet6 ff02::1
> 46:     ib0.8003
>         link  00:ff:ff:ff:ff:12:40:1b:00:00:00:00:00:00:00:00:00:00:00:01
>         link  00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:01:ff:98:00:6d
>         link  00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:00:00:00:00:01
>         inet  224.0.0.1
>         inet6 ff02::1:ff98:6d
>         inet6 ff02::1
> 


From tziporet at mellanox.co.il  Thu Feb 15 01:43:02 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Thu, 15 Feb 2007 11:43:02 +0200
Subject: [openib-general] [openfabrics-ewg]  OFED 1.2 alpha release
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA303051A05@xmb-sjc-216.amer.cisco.com>
References: <45D337E2.200@mellanox.co.il>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA303051A05@xmb-sjc-216.amer.cisco.com>
Message-ID: <45D42B26.10709@mellanox.co.il>

Scott Weitzenkamp (sweitzen) wrote:
> I don't remember discussing dropping RHEL4 U3, and would like to add 
> it back to the official list.  IPoIB multicast does not work correctly 
> (bug 266) in RHEL4 U4, thus RHEL4 U3 is the most recent working RHEL 
> release in this area (unless it has been fixed in U4 errata kernels).  
> The new ib-bonding RPM also says it only supports RHEL4 U3 for Red Hat 
> releases.
It's OK with me to add RHEL4 U3 to the official list.
All other partners - please approve.
Regarding RHEL4 U4 and IPoIB bug - Or just prepared a patch that should 
fix it. We will merge it and test for the beta.
>  
> We should probably also plan for SLES10 SP1 support in OFED 1.2.
>  
This is the plan. We still don't have the backport patches for this 
kernel but they should be added for the beta.

Tziporet


From vlad at lists.openfabrics.org  Thu Feb 15 02:24:24 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Thu, 15 Feb 2007 02:24:24 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070215-0200 daily build status
Message-ID: <20070215102425.2BAB0E6080F@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod 

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.13
Passed on powerpc with linux-2.6.19
Passed on ppc64 with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.17
Passed on powerpc with linux-2.6.17
Passed on powerpc with linux-2.6.15
Passed on ia64 with linux-2.6.12
Passed on powerpc with linux-2.6.14
Passed on ia64 with linux-2.6.14
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.17
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.16
Passed on powerpc with linux-2.6.12
Passed on ppc64 with linux-2.6.13
Passed on ppc64 with linux-2.6.12
Passed on ppc64 with linux-2.6.14
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.15
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.17

Failed:


From HNGUYEN at de.ibm.com  Thu Feb 15 03:41:51 2007
From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen)
Date: Thu, 15 Feb 2007 12:41:51 +0100
Subject: [openib-general] 32-bit build for ppc64 is required
In-Reply-To: <20070215081529.GG11866@mellanox.co.il>
Message-ID: <OFA2918CBA.71952EE8-ONC1257283.0038B7C3-C1257283.004041ED@de.ibm.com>

> > And the
> > customers can pick the one(s) they like.
> > I see your point regarding QA effort. Is it really twice?
> Probably more - I'm reasonably sure most scripts written so far
> assume stuff is installed in prefix/bin, so testing harness etc
> would need to be changed.
> And how to make sure the *correct* set of binaries was actually
> QA'd?
As far as I understood each component owner is responsible for
QA of her/his component and supported platforms. For PPC we cover that.
> > If we have 32bit execs only for development resp. testing, why don't we
> > also give them to customers in order to do basic test or diagnosis of
> > their setup?
> Because of the confusion this would create.
> For shared libraries, the 32/64 bit issues seem to be automatically figured out
> by ld.so, but there's no such solution for binaries.
This is true, as we discussed for ofed-1.1. See also
http://openib.org/pipermail/openfabrics-ewg/2006-October/001831.html
I agree that if there is a standard directory structure for
binaries, let's go for it. If there is none, let's agree on
one approach: either bin32 and bin, or appl and appl64, or...
To me it's worse if customers have to fix or write build scripts
themselves in order to build 32bit binaries.

Nam


From halr at voltaire.com  Thu Feb 15 04:02:20 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 15 Feb 2007 07:02:20 -0500
Subject: [openib-general] IPv6 multicast address per NIC
In-Reply-To: <45D426F9.6060807@voltaire.com>
References: <45D426F9.6060807@voltaire.com>
Message-ID: <1171540918.22446.171829.camel@hal.voltaire.com>

Or,

On Thu, 2007-02-15 at 04:25, Or Gerlitz wrote:
> Hi,
> 
> I see that when IPv6 is enabled in the kernel, the stack joins for a 
> --dedicated-- multicast group per each interface. Can anyone here supply 
> me with a pointer to where this is defined, doing a quick look on rfc 
> 3307 did not provide an answer.

You are referring to the solicited-node multicast address (see RFC
4291). There have been several different threads on issues relating to
this on this list over time.

-- Hal

> Or.
> 
> Below is the maddr show on a node with two partitions on ib0, note that 
> the --pkey-- is not presented in the link addresses since IPoIB fill 
> that in its own copy (i don't mind send a patch to fix that if anyone 
> here think it is helpful).
> 
> $ ip maddr show
> 
> > 41:     ib0
> >         link  00:ff:ff:ff:ff:12:40:1b:00:00:00:00:00:00:00:00:00:00:00:01
> >         link  00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:01:ff:98:00:6d
> >         link  00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:00:00:00:00:01
> >         inet  224.0.0.1
> >         inet6 ff02::1:ff98:6d
> >         inet6 ff02::1
> > 45:     ib0.8001
> >         link  00:ff:ff:ff:ff:12:40:1b:00:00:00:00:00:00:00:00:00:00:00:01
> >         link  00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:01:ff:98:00:6d
> >         link  00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:00:00:00:00:01
> >         inet  224.0.0.1
> >         inet6 ff02::1:ff98:6d
> >         inet6 ff02::1
> > 46:     ib0.8003
> >         link  00:ff:ff:ff:ff:12:40:1b:00:00:00:00:00:00:00:00:00:00:00:01
> >         link  00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:01:ff:98:00:6d
> >         link  00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:00:00:00:00:01
> >         inet  224.0.0.1
> >         inet6 ff02::1:ff98:6d
> >         inet6 ff02::1
> > 
> 


From ogerlitz at voltaire.com  Thu Feb 15 05:08:28 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 15 Feb 2007 15:08:28 +0200
Subject: [openib-general] IPv6 multicast address per NIC
In-Reply-To: <1171540918.22446.171829.camel@hal.voltaire.com>
References: <45D426F9.6060807@voltaire.com>
	<1171540918.22446.171829.camel@hal.voltaire.com>
Message-ID: <45D45B4C.10702@voltaire.com>

Hal Rosenstock wrote:
> Or,
> 
> On Thu, 2007-02-15 at 04:25, Or Gerlitz wrote:
>> Hi,
>>
>> I see that when IPv6 is enabled in the kernel, the stack joins for a 
>> --dedicated-- multicast group per each interface. Can anyone here supply 
>> me with a pointer to where this is defined, doing a quick look on rfc 
>> 3307 did not provide an answer.
> 
> You are referring to the solicited-node multicast address (see RFC
> 4291). There have been several different threads on issues relating to
> this on this list over time.

thanks for the pointer, i will look into that.

Or.


From swise at opengridcomputing.com  Thu Feb 15 06:09:36 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 15 Feb 2007 08:09:36 -0600
Subject: [openib-general] [PATCH] 2.6.21 iwcm - iw_cm_id destruction race
 condition fixes.
Message-ID: <1171548576.12187.2.camel@stevo-desktop>


From: Steve Wise <swise at opengridcomputing.com>

iwcm iw_cm_id destruction race condition fixes.

Several changes:

- iwcm_deref_id() always wakes up if there's another reference.

- clean up race condition in cm_work_handler().

- create static void free_cm_id() which deallocs the work entries and then
  kfrees the cm_id memory.  This reduces code replication.

- rem_ref() if this is the last reference -and- the IWCM owns freeing the
  cm_id, then free it.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
Signed-off-by: Tom Tucker <tom at opengridcomputing.com>
Acked-by: Krishna Kumar <krkumar2 at in.ibm.com>
---

 drivers/infiniband/core/iwcm.c |   47 +++++++++++++++++++++-------------------
 1 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c
index 1039ad5..891d1fa 100644
--- a/drivers/infiniband/core/iwcm.c
+++ b/drivers/infiniband/core/iwcm.c
@@ -146,6 +146,12 @@ static int copy_private_data(struct iw_c
 	return 0;
 }
 
+static void free_cm_id(struct iwcm_id_private *cm_id_priv)
+{
+	dealloc_work_entries(cm_id_priv);
+	kfree(cm_id_priv);
+}
+
 /*
  * Release a reference on cm_id. If the last reference is being
  * released, enable the waiting thread (in iw_destroy_cm_id) to
@@ -153,21 +159,14 @@ static int copy_private_data(struct iw_c
  */
 static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv)
 {
-	int ret = 0;
-
 	BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
 	if (atomic_dec_and_test(&cm_id_priv->refcount)) {
 		BUG_ON(!list_empty(&cm_id_priv->work_list));
-		if (waitqueue_active(&cm_id_priv->destroy_comp.wait)) {
-			BUG_ON(cm_id_priv->state != IW_CM_STATE_DESTROYING);
-			BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY,
-					&cm_id_priv->flags));
-			ret = 1;
-		}
 		complete(&cm_id_priv->destroy_comp);
+		return 1;
 	}
 
-	return ret;
+	return 0;
 }
 
 static void add_ref(struct iw_cm_id *cm_id)
@@ -181,7 +180,11 @@ static void rem_ref(struct iw_cm_id *cm_
 {
 	struct iwcm_id_private *cm_id_priv;
 	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
-	iwcm_deref_id(cm_id_priv);
+	if (iwcm_deref_id(cm_id_priv) &&
+	    test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) {
+		BUG_ON(!list_empty(&cm_id_priv->work_list));
+		free_cm_id(cm_id_priv);
+	}
 }
 
 static int cm_event_handler(struct iw_cm_id *cm_id, struct iw_cm_event *event);
@@ -355,7 +358,9 @@ static void destroy_cm_id(struct iw_cm_i
 	case IW_CM_STATE_CONN_RECV:
 		/*
 		 * App called destroy before/without calling accept after
-		 * receiving connection request event notification.
+		 * receiving connection request event notification or
+		 * returned non zero from the event callback function.
+		 * In either case, must tell the provider to reject.
 		 */
 		cm_id_priv->state = IW_CM_STATE_DESTROYING;
 		break;
@@ -391,9 +396,7 @@ void iw_destroy_cm_id(struct iw_cm_id *c
 
 	wait_for_completion(&cm_id_priv->destroy_comp);
 
-	dealloc_work_entries(cm_id_priv);
-
-	kfree(cm_id_priv);
+	free_cm_id(cm_id_priv);
 }
 EXPORT_SYMBOL(iw_destroy_cm_id);
 
@@ -647,10 +650,11 @@ static void cm_conn_req_handler(struct i
 	/* Call the client CM handler */
 	ret = cm_id->cm_handler(cm_id, iw_event);
 	if (ret) {
+		iw_cm_reject(cm_id, NULL, 0);
 		set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags);
 		destroy_cm_id(cm_id);
 		if (atomic_read(&cm_id_priv->refcount)==0)
-			kfree(cm_id);
+			free_cm_id(cm_id_priv);
 	}
 
 out:
@@ -854,13 +858,12 @@ static void cm_work_handler(struct work_
 			destroy_cm_id(&cm_id_priv->id);
 		}
 		BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
-		if (iwcm_deref_id(cm_id_priv))
-			return;
-
-		if (atomic_read(&cm_id_priv->refcount)==0 &&
-		    test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) {
-			dealloc_work_entries(cm_id_priv);
-			kfree(cm_id_priv);
+		if (iwcm_deref_id(cm_id_priv)) {
+			if (test_bit(IWCM_F_CALLBACK_DESTROY,
+				     &cm_id_priv->flags)) {
+				BUG_ON(!list_empty(&cm_id_priv->work_list));
+				free_cm_id(cm_id_priv);
+			}
 			return;
 		}
 		spin_lock_irqsave(&cm_id_priv->lock, flags);
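
The ownership rule this patch introduces (drop a reference; if it was
the last one and the callback-destroy flag is set, the dropper frees
the object) can be modelled outside the kernel.  What follows is only
a userspace sketch using C11 atomics, not the iwcm code: struct obj
and the obj_* functions are hypothetical stand-ins, the "freed"
pointer is a test hook, and the completion/wakeup side of
iwcm_deref_id() is left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified stand-in for struct iwcm_id_private; names are illustrative. */
struct obj {
    atomic_int refcount;
    bool callback_destroy;  /* plays the role of IWCM_F_CALLBACK_DESTROY */
    int *freed;             /* test hook: set to 1 when the object is freed */
};

static struct obj *obj_new(bool callback_destroy, int *freed)
{
    struct obj *o = malloc(sizeof(*o));

    if (!o)
        return NULL;
    atomic_init(&o->refcount, 1);
    o->callback_destroy = callback_destroy;
    o->freed = freed;
    return o;
}

static void obj_get(struct obj *o)
{
    atomic_fetch_add(&o->refcount, 1);
}

/* Single place that releases the object, like free_cm_id() in the patch. */
static void obj_free(struct obj *o)
{
    if (o->freed)
        *o->freed = 1;
    free(o);
}

/* Drop one reference; return 1 only if it was the last one. */
static int obj_deref(struct obj *o)
{
    assert(atomic_load(&o->refcount) > 0);
    /* atomic_fetch_sub() returns the value held before the decrement. */
    return atomic_fetch_sub(&o->refcount, 1) == 1;
}

/* Mirrors the patched rem_ref(): free only on last ref AND if we own freeing. */
static void obj_put(struct obj *o)
{
    if (obj_deref(o) && o->callback_destroy)
        obj_free(o);
}
```

When the flag is not set, obj_put() leaves the memory alone and some
other party (in the patch, the thread waiting in iw_destroy_cm_id())
remains responsible for calling the free function.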


From swise at opengridcomputing.com  Thu Feb 15 06:49:02 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 15 Feb 2007 08:49:02 -0600
Subject: [openib-general] [PATCH] 2.6.21 iw_cxgb3 Fail posts synchronously
 when in TERMINATE state.
Message-ID: <1171550942.13282.5.camel@stevo-desktop>

From: Steve Wise <swise at opengridcomputing.com>

Fail posts synchronously when in TERMINATE state.

For T3B devices, mark user qp in error once we transition
to TERMINATE.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/iwch_qp.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c
index e066727..da13a38 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_qp.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c
@@ -846,6 +846,8 @@ int iwch_modify_qp(struct iwch_dev *rhp,
 			break;
 		case IWCH_QP_STATE_TERMINATE:
 			qhp->attr.state = IWCH_QP_STATE_TERMINATE;
+			if (t3b_device(qhp->rhp))
+				cxio_set_wq_in_error(&qhp->wq);
 			if (!internal)
 				terminate = 1;
 			break;


From swise at opengridcomputing.com  Thu Feb 15 06:50:38 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 15 Feb 2007 08:50:38 -0600
Subject: [openib-general] [PATCH] ofed_1_2 iw_cxgb3 Fail posts synchronously
 when in TERMINATE state.
Message-ID: <1171551038.13282.6.camel@stevo-desktop>


Fail posts synchronously when in TERMINATE state.

For T3B devices, mark user qp in error once we transition
to TERMINATE.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/iwch_qp.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c
index ad044bd..9cc8b5e 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_qp.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c
@@ -846,6 +846,8 @@ int iwch_modify_qp(struct iwch_dev *rhp,
 			break;
 		case IWCH_QP_STATE_TERMINATE:
 			qhp->attr.state = IWCH_QP_STATE_TERMINATE;
+			if (t3b_device(qhp->rhp))
+				cxio_set_wq_in_error(&qhp->wq);
 			if (!internal)
 				terminate = 1;
 			break;


From todd.rimmer at qlogic.com  Thu Feb 15 07:20:59 2007
From: todd.rimmer at qlogic.com (Todd Rimmer)
Date: Thu, 15 Feb 2007 09:20:59 -0600
Subject: [openib-general] IPv6 multicast address per NIC
In-Reply-To: <45D426F9.6060807@voltaire.com>
Message-ID: <4FB1BCCAE6CAED44A1DC005B1DE061191A5FAF@EPEXCH2.qlogic.org>

> From: Or Gerlitz
> Sent: Thursday, February 15, 2007 4:25 AM
> To: Roland Dreier; Hal Rosenstock; openib
> Subject: [openib-general] IPv6 multicast address per NIC
> 
> Hi,
> 
> I see that when IPv6 is enabled in the kernel, the stack joins for a
> --dedicated-- multicast group per each interface. Can anyone here supply
> me with a pointer to where this is defined, doing a quick look on rfc
> 3307 did not provide an answer.
> 
RFC 2373 defined an IPv6 Solicited Node multicast address which is based
on the IPv6 address of the Node.  Each node joins its own solicited-node
multicast address (in addition to the assorted multicast addresses for
all nodes, all routers, etc).

From RFC 2373:
      Solicited-Node Address:  FF02:0:0:0:0:1:FFXX:XXXX

   The above multicast address is computed as a function of a node's
   unicast and anycast addresses.  The solicited-node multicast address
   is formed by taking the low-order 24 bits of the address (unicast or
   anycast) and appending those bits to the prefix
   FF02:0:0:0:0:1:FF00::/104 resulting in a multicast address in the
   range

      FF02:0:0:0:0:1:FF00:0000

   to

      FF02:0:0:0:0:1:FFFF:FFFF

   For example, the solicited node multicast address corresponding to
   the IPv6 address 4037::01:800:200E:8C6C is FF02::1:FF0E:8C6C.  IPv6
   addresses that differ only in the high-order bits, e.g. due to
   multiple high-order prefixes associated with different aggregations,
   will map to the same solicited-node address thereby reducing the
   number of multicast addresses a node must join.

   A node is required to compute and join the associated Solicited-Node
   multicast addresses for every unicast and anycast address it is
   assigned.
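
The computation quoted above is short enough to sketch in C.  The
function name solicited_node() is hypothetical; the only assumption
is the standard sockets struct in6_addr, with bytes in network order.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/*
 * Hypothetical helper: derive the solicited-node multicast address
 * (FF02:0:0:0:0:1:FFXX:XXXX) from a unicast or anycast IPv6 address.
 */
static struct in6_addr solicited_node(const struct in6_addr *addr)
{
    struct in6_addr out;

    assert(addr != NULL);
    memset(&out, 0, sizeof(out));
    out.s6_addr[0]  = 0xff;  /* FF02: link-local scope multicast */
    out.s6_addr[1]  = 0x02;
    out.s6_addr[11] = 0x01;  /* sixth 16-bit group is 0001 */
    out.s6_addr[12] = 0xff;  /* fixed high byte of the seventh group */
    /* Append the low-order 24 bits (last three bytes) of the address. */
    memcpy(&out.s6_addr[13], &addr->s6_addr[13], 3);
    return out;
}
```

Fed the RFC's example address 4037::01:800:200E:8C6C, this produces
FF02::1:FF0E:8C6C.  Any address whose last three bytes are 98:00:6d
maps to ff02::1:ff98:6d, the group shown in the "ip maddr show" output
earlier in this thread.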

Todd Rimmer


From hnguyen at linux.vnet.ibm.com  Thu Feb 15 07:28:35 2007
From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen)
Date: Thu, 15 Feb 2007 16:28:35 +0100
Subject: [openib-general] [PATCH 2.6.21-rc1 1/5] ehca: reworked irq
 handler to avoid/reduce missed irq events
In-Reply-To: <adafy981jgc.fsf@cisco.com>
References: <200702141740.48286.hnguyen@linux.vnet.ibm.com>
	<adafy981jgc.fsf@cisco.com>
Message-ID: <200702151628.35483.hnguyen@linux.vnet.ibm.com>

> Looks fine but this patch at least has serious whitespace
> damage... please resend a fixed version.
Sorry for this. Resending the patches 1-5.

Nam


From dledford at redhat.com  Thu Feb 15 07:12:15 2007
From: dledford at redhat.com (Doug Ledford)
Date: Thu, 15 Feb 2007 10:12:15 -0500
Subject: [openib-general] 32-bit build for ppc64 is required
In-Reply-To: <20070215055751.GA11866@mellanox.co.il>
References: <1171477762.3161.105.camel@fc6.xsintricity.com>
	<20070215055751.GA11866@mellanox.co.il>
Message-ID: <1171552335.3161.128.camel@fc6.xsintricity.com>

On Thu, 2007-02-15 at 07:57 +0200, Michael S. Tsirkin wrote:

> > The choice of 32/64 bit default is done on a per arch basis.  With
> > x86_64/i386, the increased number of CPU registers in 64bit mode
> > outweighs the increased code bloat that goes along with 64bit mode.  On
> > PPC, no such register benefit exists for 64bit mode.  As such, 32bit
> > apps on PPC are faster than the equivalent 64bit apps up to the point at
> > which a 4GB address space becomes a problem.  Correspondingly, the
> > default binaries on PPC are 32bit, and only those that *need* to be
> > 64bit are.  While a customer's application may need >4GB address space,
> > certainly all the ibutils, diags, opensm, etc. do not.  As a result, we
> > compile all of those utilities as 32bit by default on PPC.  We also ship
> > all the libs as both 32/64bit so users can select the appropriate
> > environment for their particular application (with the exception of
> > dapl, which doesn't support 32bit and for which I filed a bug around the
> > time of OFED 1.1).
> 
> So, what you suggest is - build 2 types of libraries, but on PPC make
> binaries 32 bit? That's easy - do others agree to this approach?

Yep, that's what we do.

> Another option is to build binaries with whatever type of binary
> gcc without extra flags generates by default.

Usually this should work, but I don't rely on that since we also support
s390/s390x (although not with Infiniband, but the OpenMPI alternative
that we shipped with RHEL4, lam, gets compiled on s390/s390x) and that
pair is a bit of an odd mix and I don't have one setting here at my
house where I work, so it's hard for me to confirm that just leaving
things to happen by default works as anticipated.  If they would ever
make an s390 that uses less than a gigawatt of power and heats less than
a large sized convention center, that could change... ;-)

-- 
Doug Ledford <dledford at redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband

From rdreier at cisco.com  Thu Feb 15 07:42:21 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 15 Feb 2007 07:42:21 -0800
Subject: [openib-general] 32-bit build for ppc64 is required
In-Reply-To: <1171552335.3161.128.camel@fc6.xsintricity.com> (Doug
	Ledford's message of "Thu, 15 Feb 2007 10:12:15 -0500")
References: <1171477762.3161.105.camel@fc6.xsintricity.com>
	<20070215055751.GA11866@mellanox.co.il>
	<1171552335.3161.128.camel@fc6.xsintricity.com>
Message-ID: <adar6srv34y.fsf@cisco.com>

 > Usually this should work, but I don't rely on that since we also support
 > s390/s390x (although not with Infiniband, but the OpenMPI alternative
 > that we shipped with RHEL4, lam, gets compiled on s390/s390x) and that
 > pair is a bit of an odd mix and I don't have one setting here at my
 > house where I work, so it's hard for me to confirm that just leaving
 > things to happen by default works as anticipated.  If they would ever
 > make an s390 that uses less than a gigawatt of power and heats less than
 > a large sized convention center, that could change... ;-)

http://www.conmicro.cx/hercules/


From dledford at redhat.com  Thu Feb 15 07:35:39 2007
From: dledford at redhat.com (Doug Ledford)
Date: Thu, 15 Feb 2007 10:35:39 -0500
Subject: [openib-general] 32-bit build for ppc64 is required
In-Reply-To: <OF9910F87F.2CDEEBEE-ONC1257283.0027E408-C1257283.002A2129@de.ibm.com>
References: <OF9910F87F.2CDEEBEE-ONC1257283.0027E408-C1257283.002A2129@de.ibm.com>
Message-ID: <1171553739.3161.141.camel@fc6.xsintricity.com>

On Thu, 2007-02-15 at 08:40 +0100, Hoang-Nam Nguyen wrote:
> > So, what you suggest is - build 2 types of libraries, but on PPC make
> > binaries 32 bit? That's easy - do others agree to this approach?
> No, for execs please create 32bit and 64bit on PPC.
> > Another option is to build binaries with whatever type of binary
> > gcc without extra flags generates by default.
> On PPC we really need the ability to build both versions. The reason is
> simply that there're customers who want them.

Customers ask for all sorts of silly things here and there.  Sometimes
you just need to say "No".

>  Why don't offer both
> options, and each component owner can decide her/his default? And the
> customers can pick the one(s) they like.

Generally speaking, because doing that costs money.  So, there needs to
be a valid reason for the customer to pick one or the other in order to
justify the extra spending.  If there isn't, then it's time to educate
the customer as to *why* there's no reason to do both sets of binaries.

In this case, it's that generally speaking, no fabric is large enough
that the provided utilities have any need of a >4GB address space.
Additionally, the utilities need not be the same bit size as the
customer's applications since they are separate processes.  A 64bit
customer app can happily call a 32bit utility and the return code from
that utility will still be valid.
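
To make the point above concrete, here is a minimal user-space sketch (not taken from any OFED utility) of a caller running an external tool and reading its exit status. The helper name run_utility() is hypothetical. The child is a separate process, so nothing in the caller depends on whether the binary it executes is 32-bit or 64-bit.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run an external utility and return its exit status (0-255).
 * Returns -1 if fork()/waitpid() failed or the child was killed by a
 * signal, and 127 if the utility could not be exec'ed.
 * The child is a separate process, so its word size (32- or 64-bit)
 * is independent of the caller's.
 */
static int run_utility(char *const argv[])
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127);	/* only reached if exec failed */
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The exit status travels through the kernel as a small integer, which is why it reads the same from either side of a 32/64-bit boundary.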

Now, last I knew, we don't ship anything that is a general RDMA
application for use with custom applications other than opensm, and that
follows a standard packet format that prevents 32/64bit issues from
arising (modulo bugs).  Things like rping aren't intended to be used on
one side of a connection while the customer's application sits on the
other.

> I see your point regarding QA effort. Is it really twice? My assumption
> might be wrong: I guess we have to assure/test the 32/64bit compatibility
> anyway eg. 32bit client talks to 64bit server.
> If we have 32bit execs only for development resp. testing, why don't we
> also give them to customers in order to do basic test or diagnosis of
> their setup?

32/64 bit mpitests would suffice for testing that I think (and is
generally a good test anyway).

-- 
Doug Ledford <dledford at redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband

From swise at opengridcomputing.com  Thu Feb 15 07:55:19 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 15 Feb 2007 09:55:19 -0600
Subject: [openib-general] remap_page_range() in older kernels
Message-ID: <1171554919.13282.17.camel@stevo-desktop>

Roland, 

Do you remember any issues with using remap_page_range() in older
kernels for mapping memory allocated in the kernel back to a user
process?  

I'm testing cxgb3 in ofed 1.2 on rhel4u4, which uses a 2.6.9-based kernel,
and cxgb3 kernel-bypass isn't working because my WQ and CQ memory isn't
getting correctly mapped into the user process.

I've confirmed that the mapping is wrong by scribbling in the memory
just after it's allocated in the kernel (via dma_alloc_coherent()), then
reading it back in the library after mapping it.  The process isn't
reading the correct scribbles...

For the ofed 1.2 backport, we've redefined remap_pfn_range() to:

static inline int
remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
                unsigned long pfn, unsigned long size, pgprot_t prot)
{
        return remap_page_range(vma, addr, pfn << PAGE_SHIFT, size, prot);
}


Any of this ring a bell?  Any ideas?

Thanks,

Steve.


From yipeeyipeeyipeeyipee at yahoo.com  Thu Feb 15 07:53:11 2007
From: yipeeyipeeyipeeyipee at yahoo.com (yipeeyipeeyipeeyipee)
Date: Thu, 15 Feb 2007 15:53:11 +0000 (UTC)
Subject: [openib-general] bad port physstate
Message-ID: <loom.20070215T164121-318@post.gmane.org>

Hi,

It seems I've stumbled into some sort of bug in the PortInfo MAD query.
I have several PCs connected to an IB switch.
On one of the machines I have an OpenIB installation, and on one PC I
continuously run a management utility that sweeps the fabric (using
ibnetdiscover from management/diags/ibnetdiscover/). At some point after
another, slow-booting PC boots, ibnetdiscover fails during its fabric sweep:
the IB_ATTR_PORT_INFO query to the sweeping node's IB port fails, returning a
physstate == 6 (LinkErrorRecovery).
When I check the /sys/class/infiniband/mthca0/ports/1/state I get "4: ACTIVE".
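
For readers comparing the two numbers above: the PortInfo attribute carries the physical state and the logical port state as separate fields, so they are decoded with different tables. The sketch below lists the values as I understand them from the InfiniBand specification; the function names are hypothetical and are not part of ibnetdiscover or libibmad.

```c
#include <stddef.h>

/* PortInfo PortPhysicalState values (index 0 is not a defined state). */
static const char *phys_state_name(unsigned int v)
{
	static const char *const names[] = {
		NULL, "Sleep", "Polling", "Disabled",
		"PortConfigurationTraining", "LinkUp",
		"LinkErrorRecovery", "PhyTest",
	};

	if (v >= sizeof(names) / sizeof(names[0]) || !names[v])
		return "<unknown>";
	return names[v];
}

/* PortInfo PortState (logical link state) values, spelled as sysfs prints them. */
static const char *port_state_name(unsigned int v)
{
	static const char *const names[] = {
		"NOP", "DOWN", "INIT", "ARMED", "ACTIVE", "ACTIVE_DEFER",
	};

	if (v >= sizeof(names) / sizeof(names[0]))
		return "<unknown>";
	return names[v];
}
```

With the values from the report, phys_state_name(6) gives "LinkErrorRecovery" and port_state_name(4) gives "ACTIVE". The sketch only decodes the two fields; it does not say whether that combination points to a bug.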

Is there some known issue with PortInfo MAD queries? Could this be somehow
related to mixed SDR/DDR switch and HCAs? Maybe someone here knows how to
work around this issue?

Thanks


From hnguyen at linux.vnet.ibm.com  Thu Feb 15 08:06:33 2007
From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen)
Date: Thu, 15 Feb 2007 17:06:33 +0100
Subject: [openib-general] [PATCH 2.6.21-rc1 1/5] ehca: reworked irq handler
 to avoid/reduce missed irq events
Message-ID: <200702151706.33773.hnguyen@linux.vnet.ibm.com>

reworked irq handler to avoid/reduce missed irq events


Signed-off-by: Hoang-Nam Nguyen <hnguyen at de.ibm.com>
---


 ehca_classes.h |   18 +++-
 ehca_eq.c      |    1
 ehca_irq.c     |  214 +++++++++++++++++++++++++++++++++++----------------------
 ehca_irq.h     |    1
 ehca_main.c    |   28 +++++--
 ipz_pt_fn.h    |   11 ++
 6 files changed, 182 insertions(+), 91 deletions(-)


diff --git a/drivers/infiniband/hw/ehca/ehca_classes.h b/drivers/infiniband/hw/ehca/ehca_classes.h
index cf95ee4..f08ad6f 100644
--- a/drivers/infiniband/hw/ehca/ehca_classes.h
+++ b/drivers/infiniband/hw/ehca/ehca_classes.h
@@ -42,8 +42,6 @@
 #ifndef __EHCA_CLASSES_H__
 #define __EHCA_CLASSES_H__
 
-#include "ehca_classes.h"
-#include "ipz_pt_fn.h"
 
 struct ehca_module;
 struct ehca_qp;
@@ -54,14 +52,22 @@ struct ehca_mw;
 struct ehca_pd;
 struct ehca_av;
 
+#include <rdma/ib_verbs.h>
+#include <rdma/ib_user_verbs.h>
+
 #ifdef CONFIG_PPC64
 #include "ehca_classes_pSeries.h"
 #endif
+#include "ipz_pt_fn.h"
+#include "ehca_qes.h"
+#include "ehca_irq.h"
 
-#include <rdma/ib_verbs.h>
-#include <rdma/ib_user_verbs.h>
+#define EHCA_EQE_CACHE_SIZE 20
 
-#include "ehca_irq.h"
+struct ehca_eqe_cache_entry {
+	struct ehca_eqe *eqe;
+	struct ehca_cq *cq;
+};
 
 struct ehca_eq {
 	u32 length;
@@ -74,6 +80,8 @@ struct ehca_eq {
 	spinlock_t spinlock;
 	struct tasklet_struct interrupt_task;
 	u32 ist;
+	spinlock_t irq_spinlock;
+	struct ehca_eqe_cache_entry eqe_cache[EHCA_EQE_CACHE_SIZE];
 };
 
 struct ehca_sport {
diff --git a/drivers/infiniband/hw/ehca/ehca_eq.c b/drivers/infiniband/hw/ehca/ehca_eq.c
index 5281dec..33c822e 100644
--- a/drivers/infiniband/hw/ehca/ehca_eq.c
+++ b/drivers/infiniband/hw/ehca/ehca_eq.c
@@ -61,6 +61,7 @@ int ehca_create_eq(struct ehca_shca *shc
 	struct ib_device *ib_dev = &shca->ib_device;
 
 	spin_lock_init(&eq->spinlock);
+	spin_lock_init(&eq->irq_spinlock);
 	eq->is_initialized = 0;
 
 	if (type != EHCA_EQ && type != EHCA_NEQ) {
diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c b/drivers/infiniband/hw/ehca/ehca_irq.c
index 6c4f9f9..b923b5d 100644
--- a/drivers/infiniband/hw/ehca/ehca_irq.c
+++ b/drivers/infiniband/hw/ehca/ehca_irq.c
@@ -206,7 +206,7 @@ static void qp_event_callback(struct ehc
 }
 
 static void cq_event_callback(struct ehca_shca *shca,
-					  u64 eqe)
+			      u64 eqe)
 {
 	struct ehca_cq *cq;
 	unsigned long flags;
@@ -318,7 +318,7 @@ static void parse_ec(struct ehca_shca *s
 			  "disruptive port %x configuration change", port);
 
 		ehca_info(&shca->ib_device,
-			 "port %x is inactive.", port);
+			  "port %x is inactive.", port);
 		event.device = &shca->ib_device;
 		event.event = IB_EVENT_PORT_ERR;
 		event.element.port_num = port;
@@ -326,7 +326,7 @@ static void parse_ec(struct ehca_shca *s
 		ib_dispatch_event(&event);
 
 		ehca_info(&shca->ib_device,
-			 "port %x is active.", port);
+			  "port %x is active.", port);
 		event.device = &shca->ib_device;
 		event.event = IB_EVENT_PORT_ACTIVE;
 		event.element.port_num = port;
@@ -401,87 +401,143 @@ irqreturn_t ehca_interrupt_eq(int irq, v
 	return IRQ_HANDLED;
 }
 
-void ehca_tasklet_eq(unsigned long data)
-{
-	struct ehca_shca *shca = (struct ehca_shca*)data;
-	struct ehca_eqe *eqe;
-	int int_state;
-	int query_cnt = 0;
 
-	do {
-		eqe = (struct ehca_eqe *)ehca_poll_eq(shca, &shca->eq);
+static inline void process_eqe(struct ehca_shca *shca, struct ehca_eqe *eqe)
+{
+	u64 eqe_value;
+	u32 token;
+	unsigned long flags;
+	struct ehca_cq *cq;
+	eqe_value = eqe->entry;
+	ehca_dbg(&shca->ib_device, "eqe_value=%lx", eqe_value);
+	if (EHCA_BMASK_GET(EQE_COMPLETION_EVENT, eqe_value)) {
+		ehca_dbg(&shca->ib_device, "... completion event");
+		token = EHCA_BMASK_GET(EQE_CQ_TOKEN, eqe_value);
+		spin_lock_irqsave(&ehca_cq_idr_lock, flags);
+		cq = idr_find(&ehca_cq_idr, token);
+		if (cq == NULL) {
+			spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
+			ehca_err(&shca->ib_device,
+				 "Invalid eqe for non-existing cq token=%x",
+				 token);
+			return;
+		}
+		reset_eq_pending(cq);
+#ifdef CONFIG_INFINIBAND_EHCA_SCALING
+		queue_comp_task(cq);
+		spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
+#else
+		spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
+		comp_event_callback(cq);
+#endif
+	} else {
+		ehca_dbg(&shca->ib_device,
+			 "Got non completion event");
+		parse_identifier(shca, eqe_value);
+	}
+}
 
-		if ((shca->hw_level >= 2) && eqe)
-			int_state = 1;
-		else
-			int_state = 0;
+void ehca_process_eq(struct ehca_shca *shca, int is_irq)
+{
+	struct ehca_eq *eq = &shca->eq;
+	struct ehca_eqe_cache_entry *eqe_cache = eq->eqe_cache;
+	u64 eqe_value;
+	unsigned long flags;
+	int eqe_cnt, i;
+	int eq_empty = 0;
 
-		while ((int_state == 1) || eqe) {
-			while (eqe) {
-				u64 eqe_value = eqe->entry;
-
-				ehca_dbg(&shca->ib_device,
-					 "eqe_value=%lx", eqe_value);
-
-				/* TODO: better structure */
-				if (EHCA_BMASK_GET(EQE_COMPLETION_EVENT,
-						   eqe_value)) {
-					unsigned long flags;
-					u32 token;
-					struct ehca_cq *cq;
-
-					ehca_dbg(&shca->ib_device,
-						 "... completion event");
-					token =
-						EHCA_BMASK_GET(EQE_CQ_TOKEN,
-							       eqe_value);
-					spin_lock_irqsave(&ehca_cq_idr_lock,
-							  flags);
-					cq = idr_find(&ehca_cq_idr, token);
-
-					if (cq == NULL) {
-						spin_unlock_irqrestore(&ehca_cq_idr_lock,
-								       flags);
-						break;
-					}
+	spin_lock_irqsave(&eq->irq_spinlock, flags);
+	if (is_irq) {
+		const int max_query_cnt = 100;
+		int query_cnt = 0;
+		int int_state = 1;
+		do {
+			int_state = hipz_h_query_int_state(
+				shca->ipz_hca_handle, eq->ist);
+			query_cnt++;
+			iosync();
+		} while (int_state && query_cnt < max_query_cnt);
+		if (unlikely((query_cnt == max_query_cnt)))
+			ehca_dbg(&shca->ib_device, "int_state=%x query_cnt=%x",
+				 int_state, query_cnt);
+	}
 
-					reset_eq_pending(cq);
+	/* read out all eqes */
+	eqe_cnt = 0;
+	do {
+		u32 token;
+		eqe_cache[eqe_cnt].eqe =
+			(struct ehca_eqe *)ehca_poll_eq(shca, eq);
+		if (!eqe_cache[eqe_cnt].eqe)
+			break;
+		eqe_value = eqe_cache[eqe_cnt].eqe->entry;
+		if (EHCA_BMASK_GET(EQE_COMPLETION_EVENT, eqe_value)) {
+			token = EHCA_BMASK_GET(EQE_CQ_TOKEN, eqe_value);
+			spin_lock(&ehca_cq_idr_lock);
+			eqe_cache[eqe_cnt].cq = idr_find(&ehca_cq_idr, token);
+			if (!eqe_cache[eqe_cnt].cq) {
+				spin_unlock(&ehca_cq_idr_lock);
+				ehca_err(&shca->ib_device,
+					 "Invalid eqe for non-existing cq "
+					 "token=%x", token);
+				continue;
+			}
+			spin_unlock(&ehca_cq_idr_lock);
+		} else
+			eqe_cache[eqe_cnt].cq = NULL;
+		eqe_cnt++;
+	} while (eqe_cnt < EHCA_EQE_CACHE_SIZE);
+	if (!eqe_cnt) {
+		if (is_irq)
+			ehca_dbg(&shca->ib_device,
+				 "No eqe found for irq event");
+		goto unlock_irq_spinlock;
+	} else if (!is_irq)
+		ehca_dbg(&shca->ib_device, "deadman found %x eqe", eqe_cnt);
+	if (unlikely(eqe_cnt == EHCA_EQE_CACHE_SIZE))
+		ehca_dbg(&shca->ib_device, "too many eqes for one irq event");
+	/* enable irq for new packets */
+	for (i = 0; i < eqe_cnt; i++) {
+		if (eq->eqe_cache[i].cq)
+			reset_eq_pending(eq->eqe_cache[i].cq);
+	}
+	/* check eq */
+	spin_lock(&eq->spinlock);
+	eq_empty = (!ipz_eqit_eq_peek_valid(&shca->eq.ipz_queue));
+	spin_unlock(&eq->spinlock);
+	/* call completion handler for cached eqes */
+	for (i = 0; i < eqe_cnt; i++)
+		if (eq->eqe_cache[i].cq) {
 #ifdef CONFIG_INFINIBAND_EHCA_SCALING
-					queue_comp_task(cq);
-					spin_unlock_irqrestore(&ehca_cq_idr_lock,
-							       flags);
+			spin_lock(&ehca_cq_idr_lock);
+			queue_comp_task(eq->eqe_cache[i].cq);
+			spin_unlock(&ehca_cq_idr_lock);
 #else
-					spin_unlock_irqrestore(&ehca_cq_idr_lock,
-							       flags);
-					comp_event_callback(cq);
+			comp_event_callback(eq->eqe_cache[i].cq);
 #endif
-				} else {
-					ehca_dbg(&shca->ib_device,
-						 "... non completion event");
-					parse_identifier(shca, eqe_value);
-				}
-				eqe =
-					(struct ehca_eqe *)ehca_poll_eq(shca,
-								    &shca->eq);
-			}
-
-			if (shca->hw_level >= 2) {
-				int_state =
-				    hipz_h_query_int_state(shca->ipz_hca_handle,
-							   shca->eq.ist);
-				query_cnt++;
-				iosync();
-				if (query_cnt >= 100) {
-					query_cnt = 0;
-					int_state = 0;
-				}
-			}
-			eqe = (struct ehca_eqe *)ehca_poll_eq(shca, &shca->eq);
-
+		} else {
+			ehca_dbg(&shca->ib_device, "Got non completion event");
+			parse_identifier(shca, eq->eqe_cache[i].eqe->entry);
 		}
-	} while (int_state != 0);
+	/* poll eq if not empty */
+	if (eq_empty)
+		goto unlock_irq_spinlock;
+	do {
+		struct ehca_eqe *eqe;
+		eqe = (struct ehca_eqe *)ehca_poll_eq(shca, &shca->eq);
+		if (!eqe)
+			break;
+		process_eqe(shca, eqe);
+		eqe_cnt++;
+	} while (1);
 
-	return;
+unlock_irq_spinlock:
+	spin_unlock_irqrestore(&eq->irq_spinlock, flags);
+}
+
+void ehca_tasklet_eq(unsigned long data)
+{
+	ehca_process_eq((struct ehca_shca*)data, 1);
 }
 
 #ifdef CONFIG_INFINIBAND_EHCA_SCALING
@@ -654,11 +710,11 @@ static void take_over_work(struct ehca_c
 	list_splice_init(&cct->cq_list, &list);
 
 	while(!list_empty(&list)) {
-	       cq = list_entry(cct->cq_list.next, struct ehca_cq, entry);
+		cq = list_entry(cct->cq_list.next, struct ehca_cq, entry);
 
-	       list_del(&cq->entry);
-	       __queue_comp_task(cq, per_cpu_ptr(pool->cpu_comp_tasks,
-						 smp_processor_id()));
+		list_del(&cq->entry);
+		__queue_comp_task(cq, per_cpu_ptr(pool->cpu_comp_tasks,
+						  smp_processor_id()));
 	}
 
 	spin_unlock_irqrestore(&cct->task_lock, flags_cct);
diff --git a/drivers/infiniband/hw/ehca/ehca_irq.h b/drivers/infiniband/hw/ehca/ehca_irq.h
index be579cc..6ed06ee 100644
--- a/drivers/infiniband/hw/ehca/ehca_irq.h
+++ b/drivers/infiniband/hw/ehca/ehca_irq.h
@@ -56,6 +56,7 @@ void ehca_tasklet_neq(unsigned long data
 
 irqreturn_t ehca_interrupt_eq(int irq, void *dev_id);
 void ehca_tasklet_eq(unsigned long data);
+void ehca_process_eq(struct ehca_shca *shca, int is_irq);
 
 struct ehca_cpu_comp_task {
 	wait_queue_head_t wait_queue;
diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c
index 1155bcf..5790534 100644
--- a/drivers/infiniband/hw/ehca/ehca_main.c
+++ b/drivers/infiniband/hw/ehca/ehca_main.c
@@ -52,7 +52,7 @@
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_AUTHOR("Christoph Raisch <raisch at de.ibm.com>");
 MODULE_DESCRIPTION("IBM eServer HCA InfiniBand Device Driver");
-MODULE_VERSION("SVNEHCA_0020");
+MODULE_VERSION("SVNEHCA_0021");
 
 int ehca_open_aqp1     = 0;
 int ehca_debug_level   = 0;
@@ -432,8 +432,8 @@ static int ehca_destroy_aqp1(struct ehca
 
 static ssize_t ehca_show_debug_level(struct device_driver *ddp, char *buf)
 {
-	return  snprintf(buf, PAGE_SIZE, "%d\n",
-			 ehca_debug_level);
+	return snprintf(buf, PAGE_SIZE, "%d\n",
+			ehca_debug_level);
 }
 
 static ssize_t ehca_store_debug_level(struct device_driver *ddp,
@@ -778,8 +778,24 @@ void ehca_poll_eqs(unsigned long data)
 
 	spin_lock(&shca_list_lock);
 	list_for_each_entry(shca, &shca_list, shca_list) {
-		if (shca->eq.is_initialized)
-			ehca_tasklet_eq((unsigned long)(void*)shca);
+		if (shca->eq.is_initialized) {
+			/* call deadman proc only if eq ptr does not change */
+			struct ehca_eq *eq = &shca->eq;
+			int max = 3;
+			volatile u64 q_ofs, q_ofs2;
+			u64 flags;
+			spin_lock_irqsave(&eq->spinlock, flags);
+			q_ofs = eq->ipz_queue.current_q_offset;
+			spin_unlock_irqrestore(&eq->spinlock, flags);
+			do {
+				spin_lock_irqsave(&eq->spinlock, flags);
+				q_ofs2 = eq->ipz_queue.current_q_offset;
+				spin_unlock_irqrestore(&eq->spinlock, flags);
+				max--;
+			} while (q_ofs == q_ofs2 && max > 0);
+			if (q_ofs == q_ofs2)
+				ehca_process_eq(shca, 0);
+		}
 	}
 	mod_timer(&poll_eqs_timer, jiffies + HZ);
 	spin_unlock(&shca_list_lock);
@@ -790,7 +806,7 @@ int __init ehca_module_init(void)
 	int ret;
 
 	printk(KERN_INFO "eHCA Infiniband Device Driver "
-	                 "(Rel.: SVNEHCA_0020)\n");
+	       "(Rel.: SVNEHCA_0021)\n");
 	idr_init(&ehca_qp_idr);
 	idr_init(&ehca_cq_idr);
 	spin_lock_init(&ehca_qp_idr_lock);
diff --git a/drivers/infiniband/hw/ehca/ipz_pt_fn.h b/drivers/infiniband/hw/ehca/ipz_pt_fn.h
index dc3bda2..8199c45 100644
--- a/drivers/infiniband/hw/ehca/ipz_pt_fn.h
+++ b/drivers/infiniband/hw/ehca/ipz_pt_fn.h
@@ -79,7 +79,7 @@ static inline void *ipz_qeit_calc(struct
 	if (q_offset >= queue->queue_length)
 		return NULL;
 	current_page = (queue->queue_pages)[q_offset >> EHCA_PAGESHIFT];
-	return  &current_page->entries[q_offset & (EHCA_PAGESIZE - 1)];
+	return &current_page->entries[q_offset & (EHCA_PAGESIZE - 1)];
 }
 
 /*
@@ -247,6 +247,15 @@ static inline void *ipz_eqit_eq_get_inc_
 	return ret;
 }
 
+static inline void *ipz_eqit_eq_peek_valid(struct ipz_queue *queue)
+{
+	void *ret = ipz_qeit_get(queue);
+	u32 qe = *(u8 *) ret;
+	if ((qe >> 7) != (queue->toggle_state & 1))
+		return NULL;
+	return ret;
+}
+
 /* returns address (GX) of first queue entry */
 static inline u64 ipz_qpt_get_firstpage(struct ipz_qpt *qpt)
 {
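
The new ipz_eqit_eq_peek_valid() helper in the patch above tests the top bit of the current queue entry against the queue's toggle state without advancing the queue. Below is a simplified user-space sketch of that idea, assuming a ring whose expected toggle value flips on every wrap. The struct, the ring size and the function names are hypothetical and much simpler than the real ipz_queue.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_ENTRIES 4	/* hypothetical size, for illustration only */

struct ring {
	uint8_t entries[RING_ENTRIES];	/* bit 7 of each entry is the toggle bit */
	unsigned int cur;		/* consumer index */
	unsigned int toggle_state;	/* toggle value that marks a fresh entry */
};

/* Return the current entry if its toggle bit marks it fresh, else NULL. */
static uint8_t *ring_peek_valid(struct ring *r)
{
	uint8_t *e = &r->entries[r->cur];

	if ((unsigned int)(*e >> 7) != (r->toggle_state & 1))
		return NULL;
	return e;
}

/* Like ring_peek_valid(), but also consume the entry. */
static uint8_t *ring_get_valid(struct ring *r)
{
	uint8_t *e = ring_peek_valid(r);

	if (!e)
		return NULL;
	if (++r->cur == RING_ENTRIES) {
		r->cur = 0;
		r->toggle_state ^= 1;	/* expected toggle flips on each wrap */
	}
	return e;
}
```

Because peeking leaves the consumer index alone, a caller can ask whether more work is pending and decide later whether to drain it, which is how ehca_process_eq() appears to use eq_empty in the patch.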


From hnguyen at linux.vnet.ibm.com  Thu Feb 15 08:07:30 2007
From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen)
Date: Thu, 15 Feb 2007 17:07:30 +0100
Subject: [openib-general] [PATCH 2.6.21-rc1 2/5] ehca: fix race
 condition/locking issues in scaling code
Message-ID: <200702151707.31021.hnguyen@linux.vnet.ibm.com>

fix a race condition in find_next_online_cpu() and some
other locking issues in scaling code


Signed-off-by: Hoang-Nam Nguyen <hnguyen at de.ibm.com>
---


 ehca_irq.c |   68 +++++++++++++++++++++++++++++--------------------------------
 1 files changed, 33 insertions(+), 35 deletions(-)


diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c b/drivers/infiniband/hw/ehca/ehca_irq.c
index b923b5d..9679b07 100644
--- a/drivers/infiniband/hw/ehca/ehca_irq.c
+++ b/drivers/infiniband/hw/ehca/ehca_irq.c
@@ -544,28 +544,30 @@ void ehca_tasklet_eq(unsigned long data)
 
 static inline int find_next_online_cpu(struct ehca_comp_pool* pool)
 {
-	unsigned long flags_last_cpu;
+	int cpu;
+	unsigned long flags;
 
+	WARN_ON_ONCE(!in_interrupt());
 	if (ehca_debug_level)
 		ehca_dmp(&cpu_online_map, sizeof(cpumask_t), "");
 
-	spin_lock_irqsave(&pool->last_cpu_lock, flags_last_cpu);
-	pool->last_cpu = next_cpu(pool->last_cpu, cpu_online_map);
-	if (pool->last_cpu == NR_CPUS)
-		pool->last_cpu = first_cpu(cpu_online_map);
-	spin_unlock_irqrestore(&pool->last_cpu_lock, flags_last_cpu);
+	spin_lock_irqsave(&pool->last_cpu_lock, flags);
+	cpu = next_cpu(pool->last_cpu, cpu_online_map);
+	if (cpu == NR_CPUS)
+		cpu = first_cpu(cpu_online_map);
+	pool->last_cpu = cpu;
+	spin_unlock_irqrestore(&pool->last_cpu_lock, flags);
 
-	return pool->last_cpu;
+	return cpu;
 }
 
 static void __queue_comp_task(struct ehca_cq *__cq,
 			      struct ehca_cpu_comp_task *cct)
 {
-	unsigned long flags_cct;
-	unsigned long flags_cq;
+	unsigned long flags;
 
-	spin_lock_irqsave(&cct->task_lock, flags_cct);
-	spin_lock_irqsave(&__cq->task_lock, flags_cq);
+	spin_lock_irqsave(&cct->task_lock, flags);
+	spin_lock(&__cq->task_lock);
 
 	if (__cq->nr_callbacks == 0) {
 		__cq->nr_callbacks++;
@@ -576,8 +578,8 @@ static void __queue_comp_task(struct ehc
 	else
 		__cq->nr_callbacks++;
 
-	spin_unlock_irqrestore(&__cq->task_lock, flags_cq);
-	spin_unlock_irqrestore(&cct->task_lock, flags_cct);
+	spin_unlock(&__cq->task_lock);
+	spin_unlock_irqrestore(&cct->task_lock, flags);
 }
 
 static void queue_comp_task(struct ehca_cq *__cq)
@@ -588,69 +590,69 @@ static void queue_comp_task(struct ehca_
 
 	cpu = get_cpu();
 	cpu_id = find_next_online_cpu(pool);
-
 	BUG_ON(!cpu_online(cpu_id));
 
 	cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu_id);
+	BUG_ON(!cct);
 
 	if (cct->cq_jobs > 0) {
 		cpu_id = find_next_online_cpu(pool);
 		cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu_id);
+		BUG_ON(!cct);
 	}
 
 	__queue_comp_task(__cq, cct);
-
-	put_cpu();
-
-	return;
 }
 
 static void run_comp_task(struct ehca_cpu_comp_task* cct)
 {
 	struct ehca_cq *cq;
-	unsigned long flags_cct;
-	unsigned long flags_cq;
+	unsigned long flags;
 
-	spin_lock_irqsave(&cct->task_lock, flags_cct);
+	spin_lock_irqsave(&cct->task_lock, flags);
 
 	while (!list_empty(&cct->cq_list)) {
 		cq = list_entry(cct->cq_list.next, struct ehca_cq, entry);
-		spin_unlock_irqrestore(&cct->task_lock, flags_cct);
+		spin_unlock_irqrestore(&cct->task_lock, flags);
 		comp_event_callback(cq);
-		spin_lock_irqsave(&cct->task_lock, flags_cct);
+		spin_lock_irqsave(&cct->task_lock, flags);
 
-		spin_lock_irqsave(&cq->task_lock, flags_cq);
+		spin_lock(&cq->task_lock);
 		cq->nr_callbacks--;
 		if (cq->nr_callbacks == 0) {
 			list_del_init(cct->cq_list.next);
 			cct->cq_jobs--;
 		}
-		spin_unlock_irqrestore(&cq->task_lock, flags_cq);
-
+		spin_unlock(&cq->task_lock);
 	}
 
-	spin_unlock_irqrestore(&cct->task_lock, flags_cct);
-
-	return;
+	spin_unlock_irqrestore(&cct->task_lock, flags);
 }
 
 static int comp_task(void *__cct)
 {
 	struct ehca_cpu_comp_task* cct = __cct;
+	int cql_empty;
 	DECLARE_WAITQUEUE(wait, current);
 
 	set_current_state(TASK_INTERRUPTIBLE);
 	while(!kthread_should_stop()) {
 		add_wait_queue(&cct->wait_queue, &wait);
 
-		if (list_empty(&cct->cq_list))
+		spin_lock_irq(&cct->task_lock);
+		cql_empty = list_empty(&cct->cq_list);
+		spin_unlock_irq(&cct->task_lock);
+		if (cql_empty)
 			schedule();
 		else
 			__set_current_state(TASK_RUNNING);
 
 		remove_wait_queue(&cct->wait_queue, &wait);
 
-		if (!list_empty(&cct->cq_list))
+		spin_lock_irq(&cct->task_lock);
+		cql_empty = list_empty(&cct->cq_list);
+		spin_unlock_irq(&cct->task_lock);
+		if (!cql_empty)
 			run_comp_task(__cct);
 
 		set_current_state(TASK_INTERRUPTIBLE);
@@ -693,8 +695,6 @@ static void destroy_comp_task(struct ehc
 
 	if (task)
 		kthread_stop(task);
-
-	return;
 }
 
 static void take_over_work(struct ehca_comp_pool *pool,
@@ -815,6 +815,4 @@ void ehca_destroy_comp_pool(void)
 	free_percpu(pool->cpu_comp_tasks);
 	kfree(pool);
 #endif
-
-	return;
 }
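
As I read the find_next_online_cpu() hunk above, the old code returned pool->last_cpu after dropping the lock, so a concurrent caller could change the value between the unlock and the return. The user-space sketch below contrasts the two shapes. A pthread mutex stands in for the spinlock, and NCPUS, struct pool and both function names are hypothetical. It also assumes every CPU index is online, which the real code does not.

```c
#include <pthread.h>

#define NCPUS 4		/* hypothetical CPU count; all assumed online */

struct pool {
	pthread_mutex_t lock;
	int last_cpu;
};

/* Racy shape: re-reads the shared field after dropping the lock. */
static int next_cpu_racy(struct pool *p)
{
	pthread_mutex_lock(&p->lock);
	p->last_cpu = (p->last_cpu + 1) % NCPUS;
	pthread_mutex_unlock(&p->lock);
	return p->last_cpu;	/* another thread may have advanced it by now */
}

/* Fixed shape: compute into a local under the lock and return the local. */
static int next_cpu_fixed(struct pool *p)
{
	int cpu;

	pthread_mutex_lock(&p->lock);
	cpu = (p->last_cpu + 1) % NCPUS;
	p->last_cpu = cpu;
	pthread_mutex_unlock(&p->lock);
	return cpu;
}
```

Single-threaded, both variants return the same sequence. The difference shows up only when two callers interleave, where the racy variant can hand both of them the same index.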


From hnguyen at linux.vnet.ibm.com  Thu Feb 15 08:08:33 2007
From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen)
Date: Thu, 15 Feb 2007 17:08:33 +0100
Subject: [openib-general] [PATCH 2.6.21-rc1 3/5] ehca: allow en/disabling
 scaling code via module parameter
Message-ID: <200702151708.33781.hnguyen@linux.vnet.ibm.com>

allow users to en/disable scaling code via the scaling_code module
parameter when loading ib_ehca module


Signed-off-by: Hoang-Nam Nguyen <hnguyen at de.ibm.com>
---


 Kconfig        |    8 --------
 ehca_classes.h |    1 +
 ehca_irq.c     |   47 +++++++++++++++++++++--------------------------
 ehca_main.c    |    4 ++++
 4 files changed, 26 insertions(+), 34 deletions(-)


diff --git a/drivers/infiniband/hw/ehca/Kconfig b/drivers/infiniband/hw/ehca/Kconfig
index 727b10d..1a85459 100644
--- a/drivers/infiniband/hw/ehca/Kconfig
+++ b/drivers/infiniband/hw/ehca/Kconfig
@@ -7,11 +7,3 @@ config INFINIBAND_EHCA
 	To compile the driver as a module, choose M here. The module
 	will be called ib_ehca.
 
-config INFINIBAND_EHCA_SCALING
-	bool "Scaling support (EXPERIMENTAL)"
-	depends on IBMEBUS && INFINIBAND_EHCA && HOTPLUG_CPU && EXPERIMENTAL
-	default y
-	---help---
-	eHCA scaling support schedules the CQ callbacks to different CPUs.
-
-	To enable this feature choose Y here.
diff --git a/drivers/infiniband/hw/ehca/ehca_classes.h b/drivers/infiniband/hw/ehca/ehca_classes.h
index f08ad6f..40404c9 100644
--- a/drivers/infiniband/hw/ehca/ehca_classes.h
+++ b/drivers/infiniband/hw/ehca/ehca_classes.h
@@ -277,6 +277,7 @@ extern struct idr ehca_cq_idr;
 extern int ehca_static_rate;
 extern int ehca_port_act_time;
 extern int ehca_use_hp_mr;
+extern int ehca_scaling_code;
 
 struct ipzu_queue_resp {
 	u32 qe_size;      /* queue entry size */
diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c b/drivers/infiniband/hw/ehca/ehca_irq.c
index 9679b07..3ec53c6 100644
--- a/drivers/infiniband/hw/ehca/ehca_irq.c
+++ b/drivers/infiniband/hw/ehca/ehca_irq.c
@@ -63,15 +63,11 @@
 #define ERROR_DATA_LENGTH      EHCA_BMASK_IBM(52,63)
 #define ERROR_DATA_TYPE        EHCA_BMASK_IBM(0,7)
 
-#ifdef CONFIG_INFINIBAND_EHCA_SCALING
-
 static void queue_comp_task(struct ehca_cq *__cq);
 
 static struct ehca_comp_pool* pool;
 static struct notifier_block comp_pool_callback_nb;
 
-#endif
-
 static inline void comp_event_callback(struct ehca_cq *cq)
 {
 	if (!cq->ib_cq.comp_handler)
@@ -423,13 +419,13 @@ static inline void process_eqe(struct eh
 			return;
 		}
 		reset_eq_pending(cq);
-#ifdef CONFIG_INFINIBAND_EHCA_SCALING
-		queue_comp_task(cq);
-		spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
-#else
-		spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
-		comp_event_callback(cq);
-#endif
+		if (ehca_scaling_code) {
+			queue_comp_task(cq);
+			spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
+		} else {
+			spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
+			comp_event_callback(cq);
+		}
 	} else {
 		ehca_dbg(&shca->ib_device,
 			 "Got non completion event");
@@ -508,13 +504,12 @@ void ehca_process_eq(struct ehca_shca *s
 	/* call completion handler for cached eqes */
 	for (i = 0; i < eqe_cnt; i++)
 		if (eq->eqe_cache[i].cq) {
-#ifdef CONFIG_INFINIBAND_EHCA_SCALING
-			spin_lock(&ehca_cq_idr_lock);
-			queue_comp_task(eq->eqe_cache[i].cq);
-			spin_unlock(&ehca_cq_idr_lock);
-#else
-			comp_event_callback(eq->eqe_cache[i].cq);
-#endif
+			if (ehca_scaling_code) {
+				spin_lock(&ehca_cq_idr_lock);
+				queue_comp_task(eq->eqe_cache[i].cq);
+				spin_unlock(&ehca_cq_idr_lock);
+			} else
+				comp_event_callback(eq->eqe_cache[i].cq);
 		} else {
 			ehca_dbg(&shca->ib_device, "Got non completion event");
 			parse_identifier(shca, eq->eqe_cache[i].eqe->entry);
@@ -540,8 +535,6 @@ void ehca_tasklet_eq(unsigned long data)
 	ehca_process_eq((struct ehca_shca*)data, 1);
 }
 
-#ifdef CONFIG_INFINIBAND_EHCA_SCALING
-
 static inline int find_next_online_cpu(struct ehca_comp_pool* pool)
 {
 	int cpu;
@@ -764,14 +757,14 @@ static int comp_pool_callback(struct not
 	return NOTIFY_OK;
 }
 
-#endif
-
 int ehca_create_comp_pool(void)
 {
-#ifdef CONFIG_INFINIBAND_EHCA_SCALING
 	int cpu;
 	struct task_struct *task;
 
+	if (!ehca_scaling_code)
+		return 0;
+
 	pool = kzalloc(sizeof(struct ehca_comp_pool), GFP_KERNEL);
 	if (pool == NULL)
 		return -ENOMEM;
@@ -796,16 +789,19 @@ int ehca_create_comp_pool(void)
 	comp_pool_callback_nb.notifier_call = comp_pool_callback;
 	comp_pool_callback_nb.priority =0;
 	register_cpu_notifier(&comp_pool_callback_nb);
-#endif
+
+	printk(KERN_INFO "eHCA scaling code enabled\n");
 
 	return 0;
 }
 
 void ehca_destroy_comp_pool(void)
 {
-#ifdef CONFIG_INFINIBAND_EHCA_SCALING
 	int i;
 
+	if (!ehca_scaling_code)
+		return;
+
 	unregister_cpu_notifier(&comp_pool_callback_nb);
 
 	for (i = 0; i < NR_CPUS; i++) {
@@ -814,5 +810,4 @@ void ehca_destroy_comp_pool(void)
 	}
 	free_percpu(pool->cpu_comp_tasks);
 	kfree(pool);
-#endif
 }
diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c
index 5790534..c183512 100644
--- a/drivers/infiniband/hw/ehca/ehca_main.c
+++ b/drivers/infiniband/hw/ehca/ehca_main.c
@@ -62,6 +62,7 @@ int ehca_use_hp_mr     = 0;
 int ehca_port_act_time = 30;
 int ehca_poll_all_eqs  = 1;
 int ehca_static_rate   = -1;
+int ehca_scaling_code  = 1;
 
 module_param_named(open_aqp1,     ehca_open_aqp1,     int, 0);
 module_param_named(debug_level,   ehca_debug_level,   int, 0);
@@ -71,6 +72,7 @@ module_param_named(use_hp_mr,     ehca_u
 module_param_named(port_act_time, ehca_port_act_time, int, 0);
 module_param_named(poll_all_eqs,  ehca_poll_all_eqs,  int, 0);
 module_param_named(static_rate,   ehca_static_rate,   int, 0);
+module_param_named(scaling_code,   ehca_scaling_code,   int, 0);
 
 MODULE_PARM_DESC(open_aqp1,
 		 "AQP1 on startup (0: no (default), 1: yes)");
@@ -91,6 +93,8 @@ MODULE_PARM_DESC(poll_all_eqs,
 		 " (0: no, 1: yes (default))");
 MODULE_PARM_DESC(static_rate,
 		 "set permanent static rate (default: disabled)");
+MODULE_PARM_DESC(scaling_code,
+		 "set scaling code (0: disabled, 1: enabled/default)");
 
 spinlock_t ehca_qp_idr_lock;
 spinlock_t ehca_cq_idr_lock;


From hnguyen at linux.vnet.ibm.com  Thu Feb 15 08:09:44 2007
From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen)
Date: Thu, 15 Feb 2007 17:09:44 +0100
Subject: [openib-general] [PATCH 2.6.21-rc1 4/5] ehca: replace yield() by
 wait_for_completion()
Message-ID: <200702151709.45323.hnguyen@linux.vnet.ibm.com>

remove yield() and use wait_for_completion() to wait for running
completion handlers to finish before destroying the associated completion queue


Signed-off-by: Hoang-Nam Nguyen <hnguyen at de.ibm.com>
---


 ehca_classes.h |    3 +++
 ehca_cq.c      |    5 +++--
 ehca_irq.c     |    6 +++++-
 3 files changed, 11 insertions(+), 3 deletions(-)


diff --git a/drivers/infiniband/hw/ehca/ehca_classes.h b/drivers/infiniband/hw/ehca/ehca_classes.h
index 40404c9..d8ce0c8 100644
--- a/drivers/infiniband/hw/ehca/ehca_classes.h
+++ b/drivers/infiniband/hw/ehca/ehca_classes.h
@@ -52,6 +52,8 @@ struct ehca_mw;
 struct ehca_pd;
 struct ehca_av;
 
+#include <linux/completion.h>
+
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_user_verbs.h>
 
@@ -154,6 +156,7 @@ struct ehca_cq {
 	struct hlist_head qp_hashtab[QP_HASHTAB_LEN];
 	struct list_head entry;
 	u32 nr_callbacks;
+	struct completion zero_callbacks;
 	spinlock_t task_lock;
 	u32 ownpid;
 	/* mmap counter for resources mapped into user space */
diff --git a/drivers/infiniband/hw/ehca/ehca_cq.c b/drivers/infiniband/hw/ehca/ehca_cq.c
index 9291a86..906bd5b 100644
--- a/drivers/infiniband/hw/ehca/ehca_cq.c
+++ b/drivers/infiniband/hw/ehca/ehca_cq.c
@@ -147,6 +147,7 @@ struct ib_cq *ehca_create_cq(struct ib_d
 	spin_lock_init(&my_cq->spinlock);
 	spin_lock_init(&my_cq->cb_lock);
 	spin_lock_init(&my_cq->task_lock);
+	init_completion(&my_cq->zero_callbacks);
 	my_cq->ownpid = current->tgid;
 
 	cq = &my_cq->ib_cq;
@@ -330,9 +331,9 @@ int ehca_destroy_cq(struct ib_cq *cq)
 	}
 
 	spin_lock_irqsave(&ehca_cq_idr_lock, flags);
-	while (my_cq->nr_callbacks) {
+	if (my_cq->nr_callbacks) {
 		spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
-		yield();
+		wait_for_completion(&my_cq->zero_callbacks);
 		spin_lock_irqsave(&ehca_cq_idr_lock, flags);
 	}
 
diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c b/drivers/infiniband/hw/ehca/ehca_irq.c
index 3ec53c6..7db39b7 100644
--- a/drivers/infiniband/hw/ehca/ehca_irq.c
+++ b/drivers/infiniband/hw/ehca/ehca_irq.c
@@ -605,6 +605,7 @@ static void run_comp_task(struct ehca_cp
 	spin_lock_irqsave(&cct->task_lock, flags);
 
 	while (!list_empty(&cct->cq_list)) {
+		int is_complete = 0;
 		cq = list_entry(cct->cq_list.next, struct ehca_cq, entry);
 		spin_unlock_irqrestore(&cct->task_lock, flags);
 		comp_event_callback(cq);
@@ -612,11 +613,14 @@ static void run_comp_task(struct ehca_cp
 
 		spin_lock(&cq->task_lock);
 		cq->nr_callbacks--;
-		if (cq->nr_callbacks == 0) {
+		is_complete = (cq->nr_callbacks == 0);
+		if (is_complete) {
 			list_del_init(cct->cq_list.next);
 			cct->cq_jobs--;
 		}
 		spin_unlock(&cq->task_lock);
+		if (is_complete) /* wake up waiting destroy_cq() */
+			complete(&cq->zero_callbacks);
 	}
 
 	spin_unlock_irqrestore(&cct->task_lock, flags);


From hnguyen at linux.vnet.ibm.com  Thu Feb 15 08:10:06 2007
From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen)
Date: Thu, 15 Feb 2007 17:10:06 +0100
Subject: [openib-general] [PATCH 2.6.21-rc1 5/5] ehca: query_port() returns
 LINK_UP instead UNKNOWN
Message-ID: <200702151710.06432.hnguyen@linux.vnet.ibm.com>

Set the port physical state returned by ehca_query_port() to LINK_UP.
On pSeries, ehca actually represents a logical HCA whose phys/link state
is always LINK_UP.


Signed-off-by: Hoang-Nam Nguyen <hnguyen at de.ibm.com>
---


 ehca_hca.c |    3 +++
 1 files changed, 3 insertions(+)


diff --git a/drivers/infiniband/hw/ehca/ehca_hca.c b/drivers/infiniband/hw/ehca/ehca_hca.c
index b7be950..30eb45d 100644
--- a/drivers/infiniband/hw/ehca/ehca_hca.c
+++ b/drivers/infiniband/hw/ehca/ehca_hca.c
@@ -162,6 +162,9 @@ int ehca_query_port(struct ib_device *ib
 	props->active_width    = IB_WIDTH_12X;
 	props->active_speed    = 0x1;
 
+	/* at the moment (logical) link state is always LINK_UP */
+	props->phys_state      = 0x5;
+
 query_port1:
 	ehca_free_fw_ctrlblock(rblock);
 

From changquing.tang at hp.com  Thu Feb 15 08:13:21 2007
From: changquing.tang at hp.com (Tang, Changqing)
Date: Thu, 15 Feb 2007 16:13:21 -0000
Subject: [openib-general] How heavy to resize a CQ ?
In-Reply-To: <adahctxeds8.fsf@cisco.com>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com><adaveigvg7q.fsf@cisco.com><349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net><adatzy0qmt3.fsf@cisco.com><349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net><ada7iuwp5rr.fsf@cisco.com><349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net><adamz3pfym0.fsf@cisco.com><349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net>
	<adahctxeds8.fsf@cisco.com>
Message-ID: <349DCDA352EACF42A0C49FA6DCEA84037050CE@G3W0634.americas.hpqcorp.net>


Roland or other driver developers:

	In a dynamic-process application, we don't know how many
connections a process will make when we create the CQ, so we don't know
the CQ size. What we do is increase the CQ size when a new connection
is made and decrease it when a connection is destroyed. My question
is: is ibv_resize_cq() a lightweight function call? Do we have to
drain the CQ before we resize it?
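
Whatever the per-call cost of ibv_resize_cq() turns out to be, one way to
call it less often is to resize in power-of-two steps with some hysteresis
instead of on every connect and disconnect. The following is only a sketch
of such a policy: ENTRIES_PER_CONN, MIN_CQ_DEPTH, cq_depth_for() and
cq_needs_resize() are hypothetical names and values chosen for
illustration, and the actual resize would still go through
ibv_resize_cq(cq, cqe).

```c
#include <stdint.h>

/* Hypothetical tuning constants; not taken from libibverbs or any driver. */
#define ENTRIES_PER_CONN 64u	/* assumed completions in flight per connection */
#define MIN_CQ_DEPTH     256u	/* assumed floor so small jobs never resize */

/* Round v up to the next power of two; saturates at 2^31. */
static uint32_t round_up_pow2(uint32_t v)
{
	uint32_t p = 1;

	while (p < v && p < 0x80000000u)
		p <<= 1;
	return p;
}

/* Depth the CQ should have for nconn connections (nconn assumed small
 * enough that the multiplication does not overflow). */
static uint32_t cq_depth_for(uint32_t nconn)
{
	uint32_t need = nconn * ENTRIES_PER_CONN;

	if (need < MIN_CQ_DEPTH)
		need = MIN_CQ_DEPTH;
	return round_up_pow2(need);
}

/*
 * Returns 1 and stores the new depth if the CQ should be resized.
 * Growth happens as soon as the target exceeds the current depth;
 * shrinking only when the target is at most a quarter of it.
 */
static int cq_needs_resize(uint32_t cur_depth, uint32_t nconn,
			   uint32_t *new_depth)
{
	uint32_t target = cq_depth_for(nconn);

	if (target > cur_depth || target <= cur_depth / 4) {
		*new_depth = target;
		return 1;
	}
	return 0;
}
```

With these assumed constants, going from 4 to 5 connections on a 256-entry
CQ triggers one resize to 512 entries, and dropping back to 4 connections
does not trigger a shrink.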

	Thanks

--CQ


> -----Original Message-----
> From: Roland Dreier [mailto:rdreier at cisco.com] 
> Sent: Wednesday, February 07, 2007 5:42 PM
> To: Tang, Changqing
> Cc: Michael S. Tsirkin; openib-general at openib.org
> Subject: Re: Immediate data question
> 
>     Changqing> What I mean is that, is there any performance penalty
>     Changqing> for receiver's overall performance if RNR happens
>     Changqing> continuously on one of the QP ?
> 
> Not for the receiver, but the sender will be severely slowed 
> down by having to wait for the RNR timeouts.
> 


From bclements at SBSPlanet.com  Thu Feb 15 08:16:55 2007
From: bclements at SBSPlanet.com (Clements, Brent)
Date: Thu, 15 Feb 2007 11:16:55 -0500
Subject: [openib-general] What is the expected performance of IPoIB using
	DDR equipment?
Message-ID: <DF4A736A3399E248B90C7AD8E41D9B125E3470@sbsms01.SBSNET.INC>

I've searched the web but I cannot find the answer to the following
question:

 
What is the expected (not theoretical) IPoIB throughput performance when
using DDR switches and DDR HCAs?

 
Thanks!

 

From halr at voltaire.com  Thu Feb 15 08:14:34 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 15 Feb 2007 11:14:34 -0500
Subject: [openib-general] bad port physstate
In-Reply-To: <loom.20070215T164121-318@post.gmane.org>
References: <loom.20070215T164121-318@post.gmane.org>
Message-ID: <1171556073.22446.185292.camel@hal.voltaire.com>

On Thu, 2007-02-15 at 10:53, yipeeyipeeyipeeyipee wrote:
> Hi,
> 
> It seems like I've stumbled into some sort of bug in the port info mad query.
> I have several pc's connected to an IB switch.
> On one of the machines I have an OpenIB installation, and on one pc I
> continuously run a management utility that sweeps the fabric (using
> ibnetdiscover from management/diags/ibnetdiscover/). At one point in time after
> another slow-booting pc boots, ibnetdiscover fails during its fabric sweep and
> the IB_ATTR_PORT_INFO query to the sweeping node's ib port fails returning a
> physstate == 6 (LinkErrorRecovery).
> When I check the /sys/class/infiniband/mthca0/ports/1/state I get "4: ACTIVE".

That's because the initial smpquery (by ibnetdiscover) sees the
LinkErrorRecovery PortPhysicalState. The port then comes up at the
physical level, and the SM moves it through the port states to
active. When you look again locally (via
/sys/class/infiniband/mthca0/ports/1/state), it has been made active,
and I would expect an smpquery of portinfo on this port, or
ibnetdiscover, to show this now.
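
For reference, the two fields being compared here are separate PortInfo
components, and the numbers in the report decode as follows. This is a
small lookup sketch using the state encodings from the IBTA
specification; the function names are made up for illustration and are
not the diags API.

```c
/* PortInfo:PortState (logical link state) encodings. */
static const char *port_state_name(unsigned int state)
{
	switch (state) {
	case 1: return "Down";
	case 2: return "Initialize";
	case 3: return "Armed";
	case 4: return "Active";
	default: return "Unknown";
	}
}

/* PortInfo:PortPhysicalState encodings. */
static const char *port_phys_state_name(unsigned int phys)
{
	switch (phys) {
	case 1: return "Sleep";
	case 2: return "Polling";
	case 3: return "Disabled";
	case 4: return "PortConfigurationTraining";
	case 5: return "LinkUp";
	case 6: return "LinkErrorRecovery";
	case 7: return "PhyTest";
	default: return "Unknown";
	}
}
```

So physstate == 6 in the MAD reply and "4: ACTIVE" in sysfs are readings
of two different fields taken at two different times, which is why they
can disagree without either being wrong.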

> Is there some known issue with port info mad queries? Could this be somehow
> related to mixed SDR/DDR switch and hcas? Maybe someone here knows how to
> workaround this issue?

Sounds like the way it is supposed to work to me.

-- Hal

> Thanks


From mst at mellanox.co.il  Thu Feb 15 08:29:42 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 15 Feb 2007 18:29:42 +0200
Subject: [openib-general] 32-bit build for ppc64 is required
In-Reply-To: <1171552335.3161.128.camel@fc6.xsintricity.com>
References: <1171477762.3161.105.camel@fc6.xsintricity.com>
	<20070215055751.GA11866@mellanox.co.il>
	<1171552335.3161.128.camel@fc6.xsintricity.com>
Message-ID: <20070215162942.GB15185@mellanox.co.il>

> Quoting Doug Ledford <dledford at redhat.com>:
> Subject: Re: 32-bit build for ppc64 is required
> 
> On Thu, 2007-02-15 at 07:57 +0200, Michael S. Tsirkin wrote:
> 
> > > The choice of 32/64 bit default is done on a per arch basis.  With
> > > x86_64/i386, the increased number of CPU registers in 64bit mode
> > > outweighs the increased code bloat that goes along with 64bit mode.  On
> > > PPC, no such register benefit exists for 64bit mode.  As such, 32bit
> > > apps on PPC are faster than the equivalent 64bit apps up to the point at
> > > which a 4GB address space becomes a problem.  Correspondingly, the
> > > default binaries on PPC are 32bit, and only those that *need* to be
> > > 64bit are.  While a customer's application may need >4GB address space,
> > > certainly all the ibutils, diags, opensm, etc. do not.  As a result, we
> > > compile all of those utilities as 32bit by default on PPC.  We also ship
> > > all the libs as both 32/64bit so users can select the appropriate
> > > environment for their particular application (with the exception of
> > > dapl, which doesn't support 32bit and for which I filed a bug around the
> > > time of OFED 1.1).
> > 
> > So, what you suggest is - build 2 types of libraries, but on PPC make
> > binaries 32 bit? That's easy - do others agree to this approach?
> 
> Yep, that's what we do.

Care to post a patch to Vlad's scripts?

-- 
MST


From vlad at mellanox.co.il  Thu Feb 15 08:29:30 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Thu, 15 Feb 2007 18:29:30 +0200
Subject: [openib-general] [PATCH] ofed_1_2 iw_cxgb3 Fail posts
 synchronously when in TERMINATE state.
In-Reply-To: <1171551038.13282.6.camel@stevo-desktop>
References: <1171551038.13282.6.camel@stevo-desktop>
Message-ID: <1171556970.16477.0.camel@vladsk-laptop>

On Thu, 2007-02-15 at 08:50 -0600, Steve Wise wrote:
> Fail posts synchronously when in TERMINATE state.
> 
> For T3B devices, mark user qp in error once we transition
> to TERMINATE.
> 
> Signed-off-by: Steve Wise <swise at opengridcomputing.com>
> ---

Applied.

-- 
Vladimir Sokolovsky <vlad at mellanox.co.il>
Mellanox Technologies Ltd.


From swise at opengridcomputing.com  Thu Feb 15 08:59:45 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 15 Feb 2007 10:59:45 -0600
Subject: [openib-general] mvapich2 ofed 1.2 problem
In-Reply-To: <45D3E224.9060306@cse.ohio-state.edu>
References: <1171380610.15471.25.camel@stevo-desktop>
	<adar6sum1fq.fsf@cisco.com> <1171386686.15471.36.camel@stevo-desktop>
	<adazm7ikm7q.fsf@cisco.com> <45D1FD0B.2080606@cse.ohio-state.edu>
	<ada1wktlwux.fsf@cisco.com> <45D3E224.9060306@cse.ohio-state.edu>
Message-ID: <1171558785.13282.29.camel@stevo-desktop>

Shaun,

Lemme know if you have an mvapich2 kit that I can test with iwarp...

Thanks,

Steve.


On Wed, 2007-02-14 at 23:31 -0500, Shaun Rowland wrote:
> Roland Dreier wrote:
> >  > When I build using the OFED-1.2-20070208-1508, libibverbs 1.0 is what is
> >  > built, at least by looking at the .so file result:
> >  > 
> >  > [rowland at z0 ~]$ ls /usr/local/ofed/lib64/ |grep ibverbs
> >  > libibverbs.a
> >  > libibverbs.so
> >  > libibverbs.so.1
> >  > libibverbs.so.1.0.0
> > 
> > The soname hasn't changed because the library is still compatible.
> > But (I hope at least) OFED has libibverbs 1.1.
> 
> The soname is libibverbs.so.1, so I guess the longer name would not
> matter anyway. Clearly, what I posted shows the IBVERBS 1.1 ABI is
> there. I think I have figured out why our code has this problem. The
> problem below is similar to the original one posted about.
> 
> I did some experimentation with the srq_pingpong libibverbs example
> code. First I built it directly with:
> 
> 
> gcc -g -c pingpong.c -I/usr/local/ofed/include
> 
> gcc -g -c -D_GNU_SOURCE srq_pingpong.c -I/usr/local/ofed/include
> 
> gcc -g -o srq_pingpong srq_pingpong.o pingpong.o -L/usr/local/ofed/lib64 -libverbs
> 
> 
> This works.  Next I copied srq_pingpong.c to two files:
> 
> srq_pingpong_rowland.c
>       - just has a main function that calls lib_start().
> 
> srq_pingpong_lib_rowland.c
>       - main() changed to lib_start().
> 
> This moves all of the SRQ pingpong code into a shared library. If I
> build this shared library in this way, it works:
> 
> 
> gcc -g -fpic -c pingpong.c -I/usr/local/ofed/include
> 
> gcc -g -fpic -c -D_GNU_SOURCE srq_pingpong_lib_rowland.c -I/usr/local/ofed/include
> 
> gcc -g -shared -Wl,-soname,libsrqtest.so -o libsrqtest.so srq_pingpong_lib_rowland.o pingpong.o -L/usr/local/ofed/lib64 -libverbs
> 
> gcc -g -o srq_pingpong_rowland srq_pingpong_rowland.c -L$PWD -lsrqtest
> 
> 
> Above I am linking libibverbs directly into my libsrqtest.so
> library. This works and the IBVERBS 1.1 ABI is clearly in the
> libsrqtest.so file:
> 
> [rowland at z1 ibverbs-examples]$ nm libsrqtest.so |grep ibv |head
>                   U ibv_ack_cq_events@@IBVERBS_1.1
>                   U ibv_alloc_pd@@IBVERBS_1.1
>                   U ibv_close_device@@IBVERBS_1.1
>                   U ibv_create_comp_channel@@IBVERBS_1.0
>                   U ibv_create_cq@@IBVERBS_1.1
>                   U ibv_create_qp@@IBVERBS_1.1
>                   U ibv_create_srq@@IBVERBS_1.1
>                   U ibv_dealloc_pd@@IBVERBS_1.1
>                   U ibv_dereg_mr@@IBVERBS_1.1
>                   U ibv_destroy_comp_channel@@IBVERBS_1.0
> 
> However, if I build in a similar way to MVAPICH2, the resulting program
> fails:
> 
> 
> gcc -g -fpic -c pingpong.c -I/usr/local/ofed/include
> 
> gcc -g -fpic -c -D_GNU_SOURCE srq_pingpong_lib_rowland.c -I/usr/local/ofed/include
> 
> gcc -g -shared -Wl,-soname,libsrqtest.so -o libsrqtest.so srq_pingpong_lib_rowland.o pingpong.o
> 
> gcc -g -o srq_pingpong_rowland srq_pingpong_rowland.c -L$PWD -L/usr/local/ofed/lib64 -lsrqtest -libverbs
> 
> 
> Above I am not linking libibverbs into libsrqtest.so, thus it is
> required on the last gcc line. This is how MVAPICH2's libmpich.so file
> works, and from past experience, I've seen this before. Running shows:
> 
> [rowland at z1 ibverbs-examples]$ gdb ./srq_pingpong_rowland
> GNU gdb Red Hat Linux (6.3.0.0-1.132.EL4rh)
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain 
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu"...Using host 
> libthread_db library "/lib64/tls/libthread_db.so.1".
> 
> (gdb) r
> Starting program: 
> /home/7/rowland/z1-test/ibverbs-examples/srq_pingpong_rowland
> [Thread debugging using libthread_db enabled]
> [New Thread 182896403968 (LWP 29858)]
> 
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 182896403968 (LWP 29858)]
> post_srq_recv_wrapper_1_0 (srq=0x5075b0, wr=0x7fbfff88d0, 
> bad_wr=0x7fbfff88c8)
>      at src/compat-1_0.c:312
> 312     src/compat-1_0.c: No such file or directory.
>          in src/compat-1_0.c
> (gdb) bt
> #0  post_srq_recv_wrapper_1_0 (srq=0x5075b0, wr=0x7fbfff88d0,
>      bad_wr=0x7fbfff88c8) at src/compat-1_0.c:312
> #1  0x0000002a95559e12 in ibv_post_srq_recv (srq=0x5075b0,
>      recv_wr=0x7fbfff88d0, bad_recv_wr=0x7fbfff88c8)
>      at /usr/local/ofed/include/infiniband/verbs.h:915
> #2  0x0000002a95559dcf in pp_post_recv (ctx=0x5023d0, n=500)
>      at srq_pingpong_lib_rowland.c:496
> #3  0x0000002a9555a614 in lib_start (argc=1, argv=0x7fbffff7f8)
>      at srq_pingpong_lib_rowland.c:696
> #4  0x0000000000400608 in main (argc=1, argv=0x7fbffff7f8)
>      at srq_pingpong_rowland.c:36
> (gdb) quit
> 
> It is not clear to me why the difference of either linking libibverbs
> into libsrqtest.so or not doing so causes the IBVERBS 1.1 ABI to be used
> or not. I looked at the libibverbs code, and the 1.1 ABI is the default.
> The libsrqtest.so file in the above case seems to have lost this
> information:
> 
> [rowland at z1 ibverbs-examples]$ nm libsrqtest.so |grep ibv |head
>                   U ibv_ack_cq_events
>                   U ibv_alloc_pd
>                   U ibv_close_device
>                   U ibv_create_comp_channel
>                   U ibv_create_cq
>                   U ibv_create_qp
>                   U ibv_create_srq
>                   U ibv_dealloc_pd
>                   U ibv_dereg_mr
>                   U ibv_destroy_comp_channel
> 
> I've never had to deal with an ABI issue like this in shared library
> linking/usage. Does it make sense for this to be the case? I think
> perhaps it does, but I wanted to ask.
> 
> I've placed my test code here if it helps:
> 
> http://www.cse.ohio-state.edu/~rowland/ibverbs-examples.tar.gz
> 
> I have a fix for our code that I am testing now. It seems to work and
> solve the observed problems, but more testing will be required to be
> sure there are no issues. This will require a new SRPM if the fix is
> required, which it seems at this point.


From swise at opengridcomputing.com  Thu Feb 15 09:12:06 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 15 Feb 2007 11:12:06 -0600
Subject: [openib-general] bug 355 - problems building modules that depend on
 the ofed 1.2 modules
Message-ID: <1171559526.13282.41.camel@stevo-desktop>

All,

I've run into the following problem.  Bug 335 opened to track this...

I install the alpha1 ofed 1.2 rpms on a RHEL5b2 system with its
2.6.18-1.2747.el5 kernel.

Then I build a module outside of the kernel that uses the IB verbs and
RDMA CM kernel interface.  (krping).  This module builds and loads ok on
a stock 2.6.20 system with ofed1.2 installed, but it fails to load on
the rhel5b2 system with a version symbol problem.  Here is a snippet of
the errors:

rdma_krping: disagrees about version of symbol ib_create_cq
rdma_krping: Unknown symbol ib_create_cq
rdma_krping: disagrees about version of symbol rdma_resolve_addr
rdma_krping: Unknown symbol rdma_resolve_addr
rdma_krping: disagrees about version of symbol ib_dereg_mr
rdma_krping: Unknown symbol ib_dereg_mr

I'm wondering if maybe the ofed modules are _not_ being built with src
versioning even if the kernel has it turned on?

We see similar problems with NFS-RDMA trying to use OFED 1.2 modules.
And the NFS-RDMA works with OFED 1.1 modules, so I _think_ something is
whacked with the OFED 1.2 build process.

Here is the Makefile I'm using to build rdma_krping (borrowed from
Intel's e1000 kit):

[root at vic12 krping]# cat Makefile
KSRC=/lib/modules/`uname -r`/source
KOBJ=/lib/modules/`uname -r`/build
CFLAGS += -DLINUX -D__KERNEL__ -DMODULE -O2 -pipe -Wall
CFLAGS += -I/usr/local/ofed/src/ofa_kernel-1.2/include -I$(KSRC)/include -I.
CFLAGS += $(shell [ -f $(KSRC)/include/linux/modversions.h ] && \
            echo "-DMODVERSIONS -DEXPORT_SYMTAB \
                  -include $(KSRC)/include/linux/modversions.h")

CFLAGS += $(CFLAGS_EXTRA)

obj-m += rdma_krping.o
rdma_krping-y                   := getopt.o krping.o

default:
        make -C $(KSRC) O=$(KOBJ) SUBDIRS=$(shell pwd) modules

clean:
        rm -f *.o
        rm -f *.ko
        rm -f rdma_krping.mod.c
        rm -f Module.symvers
[root at vic12 krping]#


From dledford at redhat.com  Thu Feb 15 09:11:17 2007
From: dledford at redhat.com (Doug Ledford)
Date: Thu, 15 Feb 2007 12:11:17 -0500
Subject: [openib-general] 32-bit build for ppc64 is required
In-Reply-To: <20070215162942.GB15185@mellanox.co.il>
References: <1171477762.3161.105.camel@fc6.xsintricity.com>
	<20070215055751.GA11866@mellanox.co.il>
	<1171552335.3161.128.camel@fc6.xsintricity.com>
	<20070215162942.GB15185@mellanox.co.il>
Message-ID: <1171559478.3161.155.camel@fc6.xsintricity.com>

On Thu, 2007-02-15 at 18:29 +0200, Michael S. Tsirkin wrote:
> > Quoting Doug Ledford <dledford at redhat.com>:
> > Subject: Re: 32-bit build for ppc64 is required
> > 
> > On Thu, 2007-02-15 at 07:57 +0200, Michael S. Tsirkin wrote:
> > 
> > > > The choice of 32/64 bit default is done on a per arch basis.  With
> > > > x86_64/i386, the increased number of CPU registers in 64bit mode
> > > > outweighs the increased code bloat that goes along with 64bit mode.  On
> > > > PPC, no such register benefit exists for 64bit mode.  As such, 32bit
> > > > apps on PPC are faster than the equivalent 64bit apps up to the point at
> > > > which a 4GB address space becomes a problem.  Correspondingly, the
> > > > default binaries on PPC are 32bit, and only those that *need* to be
> > > > 64bit are.  While a customer's application may need >4GB address space,
> > > > certainly all the ibutils, diags, opensm, etc. do not.  As a result, we
> > > > compile all of those utilities as 32bit by default on PPC.  We also ship
> > > > all the libs as both 32/64bit so users can select the appropriate
> > > > environment for their particular application (with the exception of
> > > > dapl, which doesn't support 32bit and for which I filed a bug around the
> > > > time of OFED 1.1).
> > > 
> > > So, what you suggest is - build 2 types of libraries, but on PPC make
> > > binaries 32 bit? That's easy - do others agree to this approach?
> > 
> > Yep, that's what we do.
> 
> Care to post a patch to Vlad's scripts?

Yuk.  I suppose I could write one, but I don't (and can't) use any of
the OFED supplied build scripts in our build system, so it's hard for me
to test since our build system is the only way I have to access
ppc/ppc64 hardware.

-- 
Doug Ledford <dledford at redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband

From mst at mellanox.co.il  Thu Feb 15 09:43:09 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 15 Feb 2007 19:43:09 +0200
Subject: [openib-general] 32-bit build for ppc64 is required
In-Reply-To: <1171559478.3161.155.camel@fc6.xsintricity.com>
References: <1171559478.3161.155.camel@fc6.xsintricity.com>
Message-ID: <20070215174309.GD15185@mellanox.co.il>

> Quoting Doug Ledford <dledford at redhat.com>:
> Subject: Re: 32-bit build for ppc64 is required
> 
> On Thu, 2007-02-15 at 18:29 +0200, Michael S. Tsirkin wrote:
> > > Quoting Doug Ledford <dledford at redhat.com>:
> > > Subject: Re: 32-bit build for ppc64 is required
> > > 
> > > On Thu, 2007-02-15 at 07:57 +0200, Michael S. Tsirkin wrote:
> > > 
> > > > > The choice of 32/64 bit default is done on a per arch basis.  With
> > > > > x86_64/i386, the increased number of CPU registers in 64bit mode
> > > > > outweighs the increased code bloat that goes along with 64bit mode.  On
> > > > > PPC, no such register benefit exists for 64bit mode.  As such, 32bit
> > > > > apps on PPC are faster than the equivalent 64bit apps up to the point at
> > > > > which a 4GB address space becomes a problem.  Correspondingly, the
> > > > > default binaries on PPC are 32bit, and only those that *need* to be
> > > > > 64bit are.  While a customer's application may need >4GB address space,
> > > > > certainly all the ibutils, diags, opensm, etc. do not.  As a result, we
> > > > > compile all of those utilities as 32bit by default on PPC.  We also ship
> > > > > all the libs as both 32/64bit so users can select the appropriate
> > > > > environment for their particular application (with the exception of
> > > > > dapl, which doesn't support 32bit and for which I filed a bug around the
> > > > > time of OFED 1.1).
> > > > 
> > > > So, what you suggest is - build 2 types of libraries, but on PPC make
> > > > binaries 32 bit? That's easy - do others agree to this approach?
> > > 
> > > Yep, that's what we do.
> > 
> > Care to post a patch to Vlad's scripts?
> 
> Yuk.  I suppose I could write one, but I don't (and can't) use any of
> the OFED supplied build scripts in our build system, so it's hard for me
> to test since our build system is the only way I have to access
> ppc/ppc64 hardware.

Oh, well.
Other takers?


-- 
MST


From krause at cup.hp.com  Thu Feb 15 09:42:37 2007
From: krause at cup.hp.com (Michael Krause)
Date: Thu, 15 Feb 2007 09:42:37 -0800
Subject: [openib-general] Immediate data question
In-Reply-To: <309a667c0702142137p724172f5va93a0ef046a60483@mail.gmail.com>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
	<349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net>
	<adamz3pfym0.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net>
	<adahctxeds8.fsf@cisco.com>
	<6.2.0.14.2.20070208131610.085c52f8@esmail.cup.hp.com>
	<349DCDA352EACF42A0C49FA6DCEA840362A87E@G3W0634.americas.hpqcorp.net>
	<309a667c0702112110h7a79961fv9c3cf46d4392e1d4@mail.gmail.com>
	<309a667c0702130537u35745e98y429d3d564fb093e9@mail.gmail.com>
	<6.2.0.14.2.20070213125130.07f4dbf8@esmail.cup.hp.com>
	<309a667c0702142137p724172f5va93a0ef046a60483@mail.gmail.com>
Message-ID: <6.2.0.14.2.20070215071309.0979bed8@esmail.cup.hp.com>

At 09:37 PM 2/14/2007, Devesh Sharma wrote:
>On 2/14/07, Michael Krause <krause at cup.hp.com> wrote:
>>At 05:37 AM 2/13/2007, Devesh Sharma wrote:
>> >On 2/12/07, Devesh Sharma <devesh28 at gmail.com> wrote:
>> >>On 2/10/07, Tang, Changqing <changquing.tang at hp.com> wrote:
>> >> > > >
>> >> > > >Not for the receiver, but the sender will be severely slowed down by
>> >> > > >having to wait for the RNR timeouts.
>> >> > >
>> >> > > RNR = Receiver Not Ready so by definition, the data flow
>> >> > > isn't going to
>> >> > > progress until the receiver is ready to receive data.   If a
>> >> > > receive QP
>> >> > > enters RNR for a RC, then it is likely not progressing as
>> >> > > desired.   RNR
>> >> > > was initially put in place to enable a receiver to create
>> >> > > back pressure to the sender without causing a fatal error
>> >> > > condition.  It should rarely be entered and therefore should
>> >> > > have negligible impact on overall performance however when a
>> >> > > RNR occurs, no forward progress will occur so performance is
>> >> > > essentially zero.
>> >> >
>> >> > Mike:
>> >> >         I still do not quite understand this issue. I have two
>> >> > situations that have RNR triggered.
>> >> >
>> >> > 1. process A and process B is connected with QP. A first post a send to
>> >> > B, B does not post receive. Then A and B are doing a long time
>> >> > RDMA_WRITE each other, A and B just check memory for the RDMA_WRITE
>> >> > message. Finally B will post a receive. Does the first pending send 
>> in A
>> >> > block all the later RDMA_WRITE ?
>> >>According to IBTA spec HCA will process WR entries in strict order in
>> >>which they are posted so the send will block all WR posted after this
>> >>send, Until-unless HCA has multiple processing elements, I think even
>> >>then processing order will be maintained by HCA
>> >>  If not, since RNR is triggered
>> >> > periodically till B post receive, does it affect the RDMA_WRITE
>> >> > performance between A and B ?
>> >> >
>> >> > 2. extend above to three processes, A connect to B, B connect to C, 
>> so B
>> >> > has two QPs, but one CQ.A posts a send to B, B does not post receive,
>> >Post ordering across QPs is not guaranteed, hence using the same CQ
>> >or a different CQ will not affect anything.
>> >> > rather B and C are doing a long time RDMA_WRITE,or send/recv. But B
>> >If RDMA WRITE _on_ B, no effect on performance. If RDMA WRITE _on_ C,
>I am sorry I have missed that in both cases same DMA channel is in use.
>> >_may_ affect the performance, since load is on same HCA. In case of
>> >Send/Recv again _may_ affect the performance, with the same reason.
>>
>>Seems orthogonal.  Any time h/w is shared, multiple flows will have an
>>impact on one another.  That is why we have the different arbitration
>>mechanisms to enable one to control that impact.
>Please, can you explain it more clearly?

Most I/O devices are shared by multiple applications / kernel 
subsystems.   Hence, the device acts as a serialization point for what goes 
on the wire / link.   Sharing = resource contention and in order to add any 
structure to that contention, a number of technologies provide arbitration 
options.   In the case of IB, the arbitration is confined to VL arbitration 
where a given data flow is assigned to a VL and that VL is serviced at some 
particular rate.   A number of years ago I wrote up how one might also 
provide QP arbitration (not part of the IBTA specifications) and I 
understand some implementations have incorporated that or a variation of 
the mechanisms into their products.

In addition to IB link contention, there is also PCI link / bus 
contention.   For PCIe, given most designs did not want to waste resources 
on multiple VC, there really isn't any standard arbitration 
mechanism.   However, many devices, especially a device like a HCA or a 
RNIC, already have the concept of separate resource domains, e.g. QP, and 
they provide a mechanism to control how the QP's DMA requests or 
interrupt requests are scheduled to the PCIe link.


>> >> > must sends RNR periodically to A, right?. So does the pending message
>> >> > from A affects B's overall performance  between B and C ?
>> >But RNR NAK is not for very long time.....possibly this performance
>> >hit you will not be able to observe even. The moment rnr_counter
>> >expires connection will be broken!
>>
>>Keep in mind the timeout can be infinite.  RNR NAK are not expected to be
>>frequent so their performance impact was considered reasonable.
>Thanks I missed that.

It is a subtlety within the specification that is easy to miss.
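
To make the subtlety concrete: the RNR retry count is a 3-bit field, and
the value 7 means "retry indefinitely" rather than "retry seven times".
The helper below is a simplified model of that rule only, not code from
any HCA or from libibverbs; rnr_retries_exhausted() is a hypothetical
name.

```c
#define RNR_RETRY_INFINITE 7u	/* 3-bit field; 7 is the "no limit" encoding */

/*
 * Returns 1 if a sender configured with rnr_retry gives up after
 * receiving `naks` consecutive RNR NAKs for the same request,
 * 0 if it keeps retrying.
 */
static int rnr_retries_exhausted(unsigned int rnr_retry, unsigned int naks)
{
	if (rnr_retry == RNR_RETRY_INFINITE)
		return 0;	/* never breaks the connection on RNR */
	return naks > rnr_retry;
}
```

With rnr_retry set to 3, the fourth RNR NAK breaks the connection; with 7,
the sender waits out every RNR timeout for as long as the receiver takes
to post a receive.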

Mike 


From rdreier at cisco.com  Thu Feb 15 09:48:14 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 15 Feb 2007 09:48:14 -0800
Subject: [openib-general] [PATCH 2.6.21-rc1 5/5] ehca: query_port()
 returns LINK_UP instead UNKNOWN
In-Reply-To: <200702151710.06432.hnguyen@linux.vnet.ibm.com> (Hoang-Nam
	Nguyen's message of "Thu, 15 Feb 2007 17:10:06 +0100")
References: <200702151710.06432.hnguyen@linux.vnet.ibm.com>
Message-ID: <ada4ppntiqp.fsf@cisco.com>

Thanks, queued 1, 2, 3 and 5 for 2.6.21.


From dledford at redhat.com  Thu Feb 15 09:49:43 2007
From: dledford at redhat.com (Doug Ledford)
Date: Thu, 15 Feb 2007 12:49:43 -0500
Subject: [openib-general] OFED 1.2 dapl and dat.conf
In-Reply-To: <45D37E8E.5050800@ichips.intel.com>
References: <1171397522.21471.7.camel@stevo-desktop>
	<45D37E8E.5050800@ichips.intel.com>
Message-ID: <1171561783.3161.165.camel@fc6.xsintricity.com>

On Wed, 2007-02-14 at 13:26 -0800, Arlin Davis wrote:
> Steve Wise wrote:
> 
> >Currently, the dapl rpms don't install dat.conf.  I think they probably
> >should, eh?  Maybe in <prefix>/etc/dat.conf
> >  
> >
> my specfile is setup to target sysconfdir which is typically set to 
> `$(prefix)/etc'
> 
> %{_sysconfdir}/dat.conf
> 
> I am not sure how the 1.2 scripts are building the rpms. Maybe Vladimir 
> can help explain?

Note that this setup is problematic on multilib arches.  Since the
dat.conf file hard codes a library path that's different for 32bit/64bit
arches, installing both a 32bit and 64bit dapl library is impossible
without munging things.

For RHEL4U5/RHEL5 I changed the dat library to read dat<bits>.conf and
have two separate conf files.  A probably better approach would be to
change the library to use a relative library name that it looks up
starting from the library's own directory.  Hence if the dapl library is
in /usr/lib, it looks in /usr/lib.  Doing that would allow the
32bit/64bit libraries to share the same config file.
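
A rough sketch of that relative lookup, assuming the dat library can learn
its own on-disk path (for example via something like dladdr()) and that
dat.conf then carries only a bare provider file name. provider_path() and
the file names in the usage note are hypothetical, not part of the dat or
dapl sources.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "<directory of own_path>/<relname>" in out.
 * Returns 0 on success, -1 if own_path has no directory part, relname is
 * already absolute, or the result does not fit in outlen bytes.
 */
static int provider_path(const char *own_path, const char *relname,
			 char *out, size_t outlen)
{
	const char *slash = strrchr(own_path, '/');
	int n;

	if (!slash || relname[0] == '/')
		return -1;
	n = snprintf(out, outlen, "%.*s/%s",
		     (int)(slash - own_path), own_path, relname);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Given an own path of /usr/lib64/libdat.so and a dat.conf entry naming
libdaplcma.so, this yields /usr/lib64/libdaplcma.so, while the 32bit copy
in /usr/lib resolves the same entry to /usr/lib/libdaplcma.so.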

-- 
Doug Ledford <dledford at redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband

From rdreier at cisco.com  Thu Feb 15 09:57:48 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 15 Feb 2007 09:57:48 -0800
Subject: [openib-general] [PATCH 2.6.21-rc1 4/5] ehca: replace yield()
 by wait_for_completion()
In-Reply-To: <200702151709.45323.hnguyen@linux.vnet.ibm.com> (Hoang-Nam
	Nguyen's message of "Thu, 15 Feb 2007 17:09:44 +0100")
References: <200702151709.45323.hnguyen@linux.vnet.ibm.com>
Message-ID: <adazm7fs3qb.fsf@cisco.com>

Looking at this one more time, I think it actually may be buggy:

 > @@ -147,6 +147,7 @@ struct ib_cq *ehca_create_cq(struct ib_d
 >  	spin_lock_init(&my_cq->spinlock);
 >  	spin_lock_init(&my_cq->cb_lock);
 >  	spin_lock_init(&my_cq->task_lock);
 > +	init_completion(&my_cq->zero_callbacks);

So you initialize the zero_callbacks completion once, at
ehca_create_cq().

But then 

 > @@ -612,11 +613,14 @@ static void run_comp_task(struct ehca_cp
 >  
 >  		spin_lock(&cq->task_lock);
 >  		cq->nr_callbacks--;
 > -		if (cq->nr_callbacks == 0) {
 > +		is_complete = (cq->nr_callbacks == 0);
 > +		if (is_complete) {
 >  			list_del_init(cct->cq_list.next);
 >  			cct->cq_jobs--;
 >  		}
 >  		spin_unlock(&cq->task_lock);
 > +		if (is_complete) /* wake up waiting destroy_cq() */
 > +			complete(&cq->zero_callbacks);
 >  	}

every time nr_callbacks drops to 0, you complete the zero_callbacks
completion.  So the first time a callback runs, you will complete
zero_callbacks, which will let wait_for_completion() finish even if
you later increment nr_callbacks again.

Also this

 > -	while (my_cq->nr_callbacks) {
 > +	if (my_cq->nr_callbacks) {
 >  		spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
 > -		yield();
 > +		wait_for_completion(&my_cq->zero_callbacks);
 >  		spin_lock_irqsave(&ehca_cq_idr_lock, flags);
 >  	}

looks rather unsafe -- I don't see any common locking protecting both
this test of nr_callbacks and the setting of nr_callbacks in the ehca
irq handling... so I don't see anything protecting you from seeing
nr_callbacks==0 and not going into the if() (or while() -- the old
code has the same problem I think) but then doing ++nr_callbacks
somewhere else.  In fact since you do the idr_remove() and
hipz_h_destroy_cq() *after* you make sure no callbacks are running,
this seems like it could happen easily.

So I'm holding off on applying this for now.  Please think it over and
either tell me the current patch is OK, or fix it up.  There's not
really too much urgency because a change like this is something I
would be comfortable merging between 2.6.21-rc1 and -rc2.

 - R.


From rdreier at cisco.com  Thu Feb 15 09:59:29 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 15 Feb 2007 09:59:29 -0800
Subject: [openib-general] How heavy to resize a CQ ?
In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA84037050CE@G3W0634.americas.hpqcorp.net>
	(Changqing Tang's message of "Thu, 15 Feb 2007 16:13:21 -0000")
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
	<adaveigvg7q.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net>
	<adatzy0qmt3.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net>
	<ada7iuwp5rr.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net>
	<adamz3pfym0.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net>
	<adahctxeds8.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA84037050CE@G3W0634.americas.hpqcorp.net>
Message-ID: <adaodnvs3ni.fsf@cisco.com>

 > 	 In dynamic process application, we don't know how many
 > connections a process will make when we create the CQ, so we don't know
 > the CQ size, what we do is to increase the CQ size when a new connection
 > is made, and decrease the CQ size when a connection is destroyed. My
 > question is, is ibv_resize_cq() a lightweight function call ?  Do we
 > have to drain the CQ before we resize the CQ ?

I would say that resizing a CQ is not lightweight -- I've never
benchmarked it but it's probably comparable to creating a CQ or
something like that.  There is no requirement to drain the CQ or
anything like that before resizing it -- you can resize it any time,
even if it is currently getting completions or being polled.
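
If the resize cost turns out to matter, one option (just a sketch of a
possible policy, not something I've measured) is to grow the CQ
geometrically rather than once per connection, so that
ibv_resize_cq() is called rarely.  The helper below is hypothetical
and only computes the size to ask for; max_cqe is assumed to come from
the device attributes.

```c
#include <assert.h>

/*
 * Hypothetical sizing policy: return the CQ capacity to request so that
 * `needed` entries fit, doubling from `current` and capping at `max_cqe`.
 * Returns `current` unchanged when it is already large enough.  If the
 * result is still smaller than `needed`, the device limit was hit and the
 * caller has to handle that.
 */
static int next_cq_size(int current, int needed, int max_cqe)
{
    assert(current > 0 && needed > 0 && max_cqe >= current);

    int size = current;
    while (size < needed && size < max_cqe) {
        if (size > max_cqe / 2)
            size = max_cqe;      /* avoid overshooting the device limit */
        else
            size *= 2;
    }
    return size;
}

/*
 * Intended use (not compiled here, needs libibverbs and a real CQ):
 *
 *     int new_size = next_cq_size(cq->cqe, needed, max_cqe);
 *     if (new_size != cq->cqe)
 *         ibv_resize_cq(cq, new_size);
 */
```

This only covers growing; shrinking when connections go away could be
skipped or done lazily under the same reasoning.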

 - R.


From dledford at redhat.com  Thu Feb 15 09:56:34 2007
From: dledford at redhat.com (Doug Ledford)
Date: Thu, 15 Feb 2007 12:56:34 -0500
Subject: [openib-general] mvapich2 ofed 1.2 problem
In-Reply-To: <45D3E224.9060306@cse.ohio-state.edu>
References: <1171380610.15471.25.camel@stevo-desktop>
	<adar6sum1fq.fsf@cisco.com> <1171386686.15471.36.camel@stevo-desktop>
	<adazm7ikm7q.fsf@cisco.com> <45D1FD0B.2080606@cse.ohio-state.edu>
	<ada1wktlwux.fsf@cisco.com> <45D3E224.9060306@cse.ohio-state.edu>
Message-ID: <1171562194.3161.169.camel@fc6.xsintricity.com>

On Wed, 2007-02-14 at 23:31 -0500, Shaun Rowland wrote:

> It is not clear to me why the difference of either linking libibverbs
> into libsrqtest.so or not doing so causes the IBVERBS 1.1 ABI to be used
> or not. I looked at the libibverbs code, and the 1.1 ABI is the default.
> The libsrqtest.so file in the above case seems to have lost this
> information:
> 
> [rowland at z1 ibverbs-examples]$ nm libsrqtest.so |grep ibv |head
>                   U ibv_ack_cq_events
>                   U ibv_alloc_pd
>                   U ibv_close_device
>                   U ibv_create_comp_channel
>                   U ibv_create_cq
>                   U ibv_create_qp
>                   U ibv_create_srq
>                   U ibv_dealloc_pd
>                   U ibv_dereg_mr
>                   U ibv_destroy_comp_channel

It didn't lose the information, it never had it.  When you link both
libs against the application binary, the linker is resolving linkups and
writing that into the resulting application binary output, but unless
it's allowed to write into the libsrqtest.so binary and modify *its*
link table, that particular versioning information can't be written.
Obviously, if every gcc compile that touched a shared library as a
source object file also attempted to write back to that source object
file, people would be very surprised when their attempt to link an
application failed due to permission problems on the shared library.

> I've never had to deal with an ABI issue like this in shared library
> linking/usage. Does it make sense for this to be the case? I think
> perhaps it does, but I wanted to ask.

Yes.  If you want symbol information in a shared lib that uses other
shared libs, then they have to be linked at .so creation time, not at
application creation time.

> I've placed my test code here if it helps:
> 
> http://www.cse.ohio-state.edu/~rowland/ibverbs-examples.tar.gz
> 
> I have a fix for our code that I am testing now. It seems to work and
> solve the observed problems, but more testing will be required to be
> sure there are no issues. This will require a new SRPM if the fix is
> required, which it seems at this point.
-- 
Doug Ledford <dledford at redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband

From rowland at cse.ohio-state.edu  Thu Feb 15 10:09:26 2007
From: rowland at cse.ohio-state.edu (Shaun Rowland)
Date: Thu, 15 Feb 2007 13:09:26 -0500
Subject: [openib-general] mvapich2 ofed 1.2 problem
In-Reply-To: <1171562194.3161.169.camel@fc6.xsintricity.com>
References: <1171380610.15471.25.camel@stevo-desktop>
	<adar6sum1fq.fsf@cisco.com> <1171386686.15471.36.camel@stevo-desktop>
	<adazm7ikm7q.fsf@cisco.com> <45D1FD0B.2080606@cse.ohio-state.edu>
	<ada1wktlwux.fsf@cisco.com> <45D3E224.9060306@cse.ohio-state.edu>
	<1171562194.3161.169.camel@fc6.xsintricity.com>
Message-ID: <45D4A1D6.2030409@cse.ohio-state.edu>

Doug Ledford wrote:

> It didn't lose the information, it never had it.  When you link both
> libs against the application binary, the linker is resolving linkups and
> writing that into the resulting application binary output, but unless
> it's allowed to write into the libsrqtest.so binary and modify *its*
> link table, that particular versioning information can't be written.

I thought that this might be the case, but I had never run into this
before. Thanks for clearing that up.

> Obviously, if every gcc compile that touched a shared library as a
> source object file also attempted to write back to that source object
> file, people would be very surprised when their attempt to link an
> application failed due to permission problems on the shared library.

Yes. I thought perhaps it would use the default ABI when the symbols
were resolved when making the binary, but as I said, I've never seen
this issue before. Clearly, it is working as you've described, and that
one thought I had seems not to make sense. Even when I tried making my
shared library the way I thought it should be done the first time, I
linked libibverbs into it at shared library creation time. Only when I
saw the difference did I try waiting until building the application.

> Yes.  If you want symbol information in a shared lib that uses other
> shared libs, then they have to be linked at .so creation time, not at
> application creation time.

I can make this happen.  I am testing it now.
-- 
Shaun Rowland	rowland at cse.ohio-state.edu
http://www.cse.ohio-state.edu/~rowland/


From Kapil.Dukle at med.ge.com  Thu Feb 15 10:22:56 2007
From: Kapil.Dukle at med.ge.com (Dukle, Kapil (GE Healthcare))
Date: Thu, 15 Feb 2007 13:22:56 -0500
Subject: [openib-general] IB diagnostic tool : ibping
Message-ID: <DE4D96C8DFF3B94BACC3B6FE3B7D140102F21C4A@CINMLVEM11.e2k.ad.ge.com>

Hi all,
 
I came across a list of tools for displaying information about IB nodes
and testing connectivity/performance between nodes (e.g. ibping, ibstat,
etc.).
The list can be found here:
https://wiki.openfabrics.org/tiki-index.php?page=Diagnostics
 
Is there any link online to the manual pages for these commands? The
link on the page points to a server that is no longer maintained.
 
I'm trying to ping myself using ibping, and it fails without showing the
reason. What could be the problem?
 
[xxx at xxx ~]$ ibstat
CA 'mthca0'
        CA type: MT25208 (MT23108 compat mode)
        Number of ports: 2
        Firmware version: 4.7.400
        Hardware version: a0
        Node GUID: 0x0003ba00010027e4
        System image GUID: 0x0003ba00010027e7
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 10
                Base lid: 2
                LMC: 0
                SM lid: 1
                Capability mask: 0x02510a68
                Port GUID: 0x0003ba00010027e5
        Port 2:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510a68
                Port GUID: 0x0003ba00010027e6
[xxx at xxx ~]$ su
Password: 
[root at xxx]# ibping -v -G 0x0003ba00010027e5
ibwarn: [6207] ibping: Ping..
ibwarn: [6207] main: ibping to Lid 0x2 failed
ibwarn: [6207] ibping: Ping..
ibwarn: [6207] main: ibping to Lid 0x2 failed
ibwarn: [6207] ibping: Ping..
ibwarn: [6207] report: out due signal 2

From changquing.tang at hp.com  Thu Feb 15 10:29:01 2007
From: changquing.tang at hp.com (Tang, Changqing)
Date: Thu, 15 Feb 2007 18:29:01 -0000
Subject: [openib-general] How heavy to resize a CQ ?
In-Reply-To: <adaodnvs3ni.fsf@cisco.com>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com><adaveigvg7q.fsf@cisco.com><349DCDA352EACF42A0C49FA6DCEA840350AAC4@G3W0634.americas.hpqcorp.net><adatzy0qmt3.fsf@cisco.com><349DCDA352EACF42A0C49FA6DCEA840350B1B5@G3W0634.americas.hpqcorp.net><ada7iuwp5rr.fsf@cisco.com><349DCDA352EACF42A0C49FA6DCEA84035DF880@G3W0634.americas.hpqcorp.net><adamz3pfym0.fsf@cisco.com><349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net><adahctxeds8.fsf@cisco.com><349DCDA352EACF42A0C49FA6DCEA84037050CE@G3W0634.americas.hpqcorp.net>
	<adaodnvs3ni.fsf@cisco.com>
Message-ID: <349DCDA352EACF42A0C49FA6DCEA8403705365@G3W0634.americas.hpqcorp.net>


Thanks for your good point.  --CQ 



From boris at mellanox.com  Thu Feb 15 10:35:38 2007
From: boris at mellanox.com (Boris Shpolyansky)
Date: Thu, 15 Feb 2007 10:35:38 -0800
Subject: [openib-general] IB diagnostic tool : ibping
Message-ID: <1E3DCD1C63492545881FACB6063A57C16E449E@mtiexch01.mti.com>

Try 'man ibping' on the machine where you have OFED installed.
Also 'ibping -h' will list all available flags (without explanation).
 
For the ibping command in particular, you need to start a server first:
    ibping -S
and then run the client side.
 
Hope this helps.
 
Boris Shpolyansky
Sr. Member of Technical Staff
Applications
Mellanox Technologies Inc.
2900 Stender Way
Santa Clara, CA 95054
Tel.: (408) 916 0014
Fax: (408) 970 3403
Cell: (408) 834 9365
www.mellanox.com
 


From Kapil.Dukle at med.ge.com  Thu Feb 15 11:14:36 2007
From: Kapil.Dukle at med.ge.com (Dukle, Kapil (GE Healthcare))
Date: Thu, 15 Feb 2007 14:14:36 -0500
Subject: [openib-general] IB diagnostic tool : ibping
In-Reply-To: <1E3DCD1C63492545881FACB6063A57C16E449E@mtiexch01.mti.com>
Message-ID: <DE4D96C8DFF3B94BACC3B6FE3B7D140102F21D0C@CINMLVEM11.e2k.ad.ge.com>

Hi,
 
There is no manual page for ibping on the system. 
 
[root at xxx]# man ibping
No manual entry for ibping
[root at xxx]# ibping -h
Usage: ibping [-d(ebug) -e(rr_show) -v(erbose) -G(uid) -s smlid
-V(ersion) -C ca_name -P ca_port -t(imeout) timeout_ms -c ping_count
-f(lood) -o oui -S(erver)] <dest lid|guid>
[root at xxx]
 
I will try using the "-S" option to start the server as Boris suggested.
 
Thanks,
Kapil


From HNGUYEN at de.ibm.com  Thu Feb 15 11:30:13 2007
From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen)
Date: Thu, 15 Feb 2007 20:30:13 +0100
Subject: [openib-general] 32-bit build for ppc64 is required
In-Reply-To: <20070215174309.GD15185@mellanox.co.il>
Message-ID: <OFFBAF19C0.B25E1F90-ONC1257283.006ACFE5-C1257283.006B231F@de.ibm.com>

> > Yuk.  I suppose I could write one, but I don't (and can't) use any of
> > the OFED supplied build scripts in our build system, so it's hard for
> > me to test since our build system is the only way I have to access
> > ppc/ppc64 hardware.
> Oh, well.
> Other takers?
OK, I can't really say no.  I haven't looked at the scripts yet, but I will
in the next couple of days!
Nam


From mshefty at ichips.intel.com  Thu Feb 15 11:39:49 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 15 Feb 2007 11:39:49 -0800
Subject: [openib-general] IB routing discussion summary
In-Reply-To: <000601c74fb4$6ed83840$8698070a@amr.corp.intel.com>
References: <000601c74fb4$6ed83840$8698070a@amr.corp.intel.com>
Message-ID: <45D4B705.5020805@ichips.intel.com>

> Ideas were presented around trying to construct an 'inter-subnet path record'
> that contained the following:
> 
>    - Side A GRH.SGID = active side's Port GID
>    - Side A GRH.DGID = passive side's Port GID
>    - Side A LRH.SLID = any active side's port LID
>    - Side A LRH.DLID = A subnet router
>    - Side A LRH.SL   = SL to A subnet router
> 
>    - Side B GRH.SGID = Side A GRH.DGID
>    - Side B GRH.DGID = Side A GRH.SGID
>    - Side B LRH.SLID = any passive side's port LID
>    - Side B LRH.DLID = B subnet router
>    - Side B LRH.SL   = SL to B subnet router

Until I'm convinced that the above isn't needed, I've been trying to
brainstorm ways to obtain this information.

0. Have the SA return pairs of PathRecords for inter-subnet queries.

But, since this simply punts the problem to the SA, my other thought is to 
define the following:

1. Inter-subnet PathRecord/MultiPathRecord Get/GetTable requests require both an 
SGID and DGID, one of which must be subnet local to the processing SA.
2. PathRecord/MultiPathRecord Get/GetTable request fields are relative to
the subnet specified by the SGID.
3. PathRecord GetResp/GetTableResp response fields are relative to the
subnet local to the processing SA.
4. SAs are addressable by a well-known GID suffix.

I think this may allow establishing inter-subnet connections.  As an example of
its usage:

a. Active side issues a PathRecord query to the local SA with SGID=local,
DGID=remote.
b. SA responds with PathRecord(s).
c. Active side selects local PathRecord P1.
d. Active side issues a PathRecord query to the remote SA using PathRecord P1 to
format the request: SGID, DGID, SLID, DLID, TC, FL, SL, etc.
e. The remote SA responds with PathRecord(s).  The SA must ensure that packets 
injected into the internetwork using P1 will route to the returned records.
f. Active side selects remote PathRecord P2.
g. Active side validates that remote packets injected using P2 route to P1.

At this point, the active side should have path information that can be used to
configure the QPs for a connection.

Assuming that this will work, what I don't like about it is the validation at 
step g.  This adds a third query that I don't see a way to eliminate.  If the 
check fails, the client restarts at step c.
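
As a partial illustration only: the quoted record layout implies that
the GRH GIDs of the two sides mirror each other, and that much can be
checked locally without another query.  The types and names below are
made up for the sketch and are not the ib_sa structures; the check
says nothing about whether packets sent on P2 actually route back
along P1, which is what step g really needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for a GID and the GID part of a path record. */
struct gid_model {
    uint8_t raw[16];
};

struct path_model {
    struct gid_model sgid;
    struct gid_model dgid;
};

/*
 * Side B GRH.SGID must equal Side A GRH.DGID, and Side B GRH.DGID must
 * equal Side A GRH.SGID.  Returns true when p2 mirrors p1 in that sense.
 */
static bool gids_mirror(const struct path_model *p1,
                        const struct path_model *p2)
{
    assert(p1 != NULL && p2 != NULL);
    return memcmp(p1->sgid.raw, p2->dgid.raw, sizeof p1->sgid.raw) == 0 &&
           memcmp(p1->dgid.raw, p2->sgid.raw, sizeof p1->dgid.raw) == 0;
}
```

A client could run this on each P1/P2 pair before spending the third
query on the routing validation.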

- Sean


From halr at voltaire.com  Thu Feb 15 11:46:39 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 15 Feb 2007 14:46:39 -0500
Subject: [openib-general] IB diagnostic tool : ibping
In-Reply-To: <DE4D96C8DFF3B94BACC3B6FE3B7D140102F21D0C@CINMLVEM11.e2k.ad.ge.com>
References: <DE4D96C8DFF3B94BACC3B6FE3B7D140102F21D0C@CINMLVEM11.e2k.ad.ge.com>
Message-ID: <1171568778.22446.195441.camel@hal.voltaire.com>

On Thu, 2007-02-15 at 14:14, Dukle, Kapil (GE Healthcare) wrote:
> Hi,
>  
> There is no manual page for ibping on the system. 
>  
> [root at xxx]# man ibping
> No manual entry for ibping
> [root at xxx]# ibping -h
> Usage: ibping [-d(ebug) -e(rr_show) -v(erbose) -G(uid) -s smlid
> -V(ersion) -C ca_name -P ca_port -t(imeout) timeout_ms -c ping_count
> -f(lood) -o oui -S(erver)] <dest lid|guid>
> [root at xxx]
>  
> I will try using the "-S" option to start the server as Boris
> suggested.

Which version of OFED are you running? 1.0? The man pages were included
in 1.1, as they are in 1.2.  Attached is the latest man page, but nothing
has changed.

-- Hal

-------------- next part --------------
.TH IBPING 8 "August 11, 2006" "OpenIB" "OpenIB Diagnostics"

.SH NAME
ibping \- ping an InfiniBand address

.SH SYNOPSIS
.B ibping
[\-d(ebug)] [\-e(rr_show)] [\-v(erbose)] [\-G(uid)] [\-C ca_name] [\-P ca_port] [\-s smlid] [\-t(imeout) timeout_ms] [\-V(ersion)] [\-c ping_count] [\-f(lood)] [\-o oui] [\-S(erver)] [\-h(elp)] <dest lid | guid>

.SH DESCRIPTION
.PP
ibping uses vendor mads to validate connectivity between IB nodes.
On exit, (IP) ping-like output is shown. ibping is run as client/server.
The default is to run as client. Note also that a default ping server is
implemented within the kernel.

.SH OPTIONS

.PP
.TP
\fB\-c\fR
stop after count packets
.TP
\fB\-f\fR, \fB\-\-flood\fR
flood destination: send packets back to back without delay
.TP
\fB\-o\fR, \fB\-\-oui\fR
use specified OUI number to multiplex vendor mads
.TP
\fB\-S\fR, \fB\-\-Server\fR
start in server mode (do not return)

.SH COMMON OPTIONS

Most OpenIB diagnostics take the following common flags. The exact list of 
supported flags per utility can be found in the usage message and can be shown
using the util_name -h syntax.

# Debugging flags
.PP
\-d      raise the IB debugging level.
        May be used several times (-ddd or -d -d -d).
.PP
\-e      show send and receive errors (timeouts and others)
.PP
\-h      show the usage message
.PP
\-v      increase the application verbosity level.
        May be used several times (-vv or -v -v -v)
.PP
\-V      show the version info.

# Addressing flags
.PP
\-G      use GUID address argument. In most cases, it is the Port GUID.
        Example:
        "0x08f1040023"
.PP
\-s <smlid>      use 'smlid' as the target lid for SM/SA queries.

# Other common flags:
.PP
\-C <ca_name>    use the specified ca_name.
.PP
\-P <ca_port>    use the specified ca_port.
.PP
\-t <timeout_ms> override the default timeout for the solicited mads.

Multiple CA/Multiple Port Support

When no IB device or port is specified, the port to use is selected
by the following criteria:
.PP
1. the first port that is ACTIVE.
.PP
2. if not found, the first port that is UP (physical link up).

If a port and/or CA name is specified, an attempt is made to fulfill the
user request; it fails if that is not possible.

.SH AUTHOR
.TP
Hal Rosenstock
.RI < halr at voltaire.com >

From swise at opengridcomputing.com  Thu Feb 15 11:54:22 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 15 Feb 2007 13:54:22 -0600
Subject: [openib-general] [PATCH] iw_cxgb3 Fix copyrights in the iw_cxgb3
	driver.
Message-ID: <1171569262.13282.59.camel@stevo-desktop>


Fix copyrights in the iw_cxgb3 driver.

Remove the Open Grid Computing copyright.  It shouldn't be there.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/cxio_dbg.c      |    1 -
 drivers/infiniband/hw/cxgb3/cxio_hal.c      |    1 -
 drivers/infiniband/hw/cxgb3/cxio_hal.h      |    1 -
 drivers/infiniband/hw/cxgb3/cxio_resource.c |    1 -
 drivers/infiniband/hw/cxgb3/cxio_resource.h |    1 -
 drivers/infiniband/hw/cxgb3/cxio_wr.h       |    1 -
 drivers/infiniband/hw/cxgb3/iwch.c          |    1 -
 drivers/infiniband/hw/cxgb3/iwch.h          |    1 -
 drivers/infiniband/hw/cxgb3/iwch_cm.c       |    1 -
 drivers/infiniband/hw/cxgb3/iwch_cm.h       |    1 -
 drivers/infiniband/hw/cxgb3/iwch_cq.c       |    1 -
 drivers/infiniband/hw/cxgb3/iwch_ev.c       |    1 -
 drivers/infiniband/hw/cxgb3/iwch_mem.c      |    1 -
 drivers/infiniband/hw/cxgb3/iwch_provider.c |    1 -
 drivers/infiniband/hw/cxgb3/iwch_provider.h |    1 -
 drivers/infiniband/hw/cxgb3/iwch_qp.c       |    1 -
 drivers/infiniband/hw/cxgb3/iwch_user.h     |    1 -
 17 files changed, 0 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/cxio_dbg.c b/drivers/infiniband/hw/cxgb3/cxio_dbg.c
index 5a7306f..75f7b16 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_dbg.c
+++ b/drivers/infiniband/hw/cxgb3/cxio_dbg.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.c b/drivers/infiniband/hw/cxgb3/cxio_hal.c
index 82fa720..114ac3b 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_hal.c
+++ b/drivers/infiniband/hw/cxgb3/cxio_hal.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.h b/drivers/infiniband/hw/cxgb3/cxio_hal.h
index 1b97e80..8ab04a7 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_hal.h
+++ b/drivers/infiniband/hw/cxgb3/cxio_hal.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/cxio_resource.c b/drivers/infiniband/hw/cxgb3/cxio_resource.c
index 997aa32..65bf577 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_resource.c
+++ b/drivers/infiniband/hw/cxgb3/cxio_resource.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/cxio_resource.h b/drivers/infiniband/hw/cxgb3/cxio_resource.h
index a6bbe83..a2703a3 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_resource.h
+++ b/drivers/infiniband/hw/cxgb3/cxio_resource.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/cxio_wr.h b/drivers/infiniband/hw/cxgb3/cxio_wr.h
index 103fc42..90d7b89 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_wr.h
+++ b/drivers/infiniband/hw/cxgb3/cxio_wr.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch.c b/drivers/infiniband/hw/cxgb3/iwch.c
index 4611afa..0315c9d 100644
--- a/drivers/infiniband/hw/cxgb3/iwch.c
+++ b/drivers/infiniband/hw/cxgb3/iwch.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch.h b/drivers/infiniband/hw/cxgb3/iwch.h
index 6517ef8..caf4e60 100644
--- a/drivers/infiniband/hw/cxgb3/iwch.h
+++ b/drivers/infiniband/hw/cxgb3/iwch.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c
index a522b1b..e5442e3 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cm.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.h b/drivers/infiniband/hw/cxgb3/iwch_cm.h
index 7c810d9..0c6f281 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cm.h
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_cq.c b/drivers/infiniband/hw/cxgb3/iwch_cq.c
index 98b3bdb..d7624c1 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cq.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cq.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_ev.c b/drivers/infiniband/hw/cxgb3/iwch_ev.c
index a6efa8f..54362af 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_ev.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_ev.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_mem.c b/drivers/infiniband/hw/cxgb3/iwch_mem.c
index 2b6cd53..a6c2c4b 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_mem.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_mem.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 6861087..2aef122 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h
index 61e3278..2af3e93 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.h
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c
index da13a38..4dda2f6 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_qp.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_user.h b/drivers/infiniband/hw/cxgb3/iwch_user.h
index c4e7fbe..cb7086f 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_user.h
+++ b/drivers/infiniband/hw/cxgb3/iwch_user.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU


From HNGUYEN at de.ibm.com  Thu Feb 15 11:54:43 2007
From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen)
Date: Thu, 15 Feb 2007 20:54:43 +0100
Subject: [openib-general] [PATCH 2.6.21-rc1 4/5] ehca: replace yield()
 by wait_for_completion()
In-Reply-To: <adazm7fs3qb.fsf@cisco.com>
Message-ID: <OFAA6668FF.C2E1DD58-ONC1257283.006C7A81-C1257283.006D6172@de.ibm.com>

Hi,
> So I'm holding off on applying this for now.  Please think it over and
> either tell me the current patch is OK, or fix it up.  There's not
> really too much urgency because a change like this is something I
> would be comfortable merging between 2.6.21-rc1 and -rc2.
You're absolutely right. Let's target rc2.
Thanks for this good catch!
Nam


From swise at opengridcomputing.com  Thu Feb 15 11:59:55 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 15 Feb 2007 13:59:55 -0600
Subject: [openib-general] [PATCH 1/2] ofed_1_2 Fix copyrights in the cxgb3
	driver.
Message-ID: <1171569595.13282.60.camel@stevo-desktop>

Fix copyrights in the cxgb3 driver.

Remove the Open Grid Computing copyright.  It shouldn't be there.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 drivers/net/cxgb3/cxgb3_defs.h    |    1 -
 drivers/net/cxgb3/cxgb3_offload.c |    1 -
 drivers/net/cxgb3/cxgb3_offload.h |    1 -
 drivers/net/cxgb3/l2t.c           |    1 -
 drivers/net/cxgb3/l2t.h           |    1 -
 drivers/net/cxgb3/t3cdev.h        |    1 -
 6 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/drivers/net/cxgb3/cxgb3_defs.h b/drivers/net/cxgb3/cxgb3_defs.h
old mode 100755
new mode 100644
index 16e0049..e14862b
--- a/drivers/net/cxgb3/cxgb3_defs.h
+++ b/drivers/net/cxgb3/cxgb3_defs.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006-2007 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006-2007 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/net/cxgb3/cxgb3_offload.c b/drivers/net/cxgb3/cxgb3_offload.c
old mode 100755
new mode 100644
index c3a02d6..46e9068
--- a/drivers/net/cxgb3/cxgb3_offload.c
+++ b/drivers/net/cxgb3/cxgb3_offload.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006-2007 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006-2007 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/net/cxgb3/cxgb3_offload.h b/drivers/net/cxgb3/cxgb3_offload.h
old mode 100755
new mode 100644
index 0e6beb6..f15446a
--- a/drivers/net/cxgb3/cxgb3_offload.h
+++ b/drivers/net/cxgb3/cxgb3_offload.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006-2007 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006-2007 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/net/cxgb3/l2t.c b/drivers/net/cxgb3/l2t.c
old mode 100755
new mode 100644
index 3c0cb85..d660af7
--- a/drivers/net/cxgb3/l2t.c
+++ b/drivers/net/cxgb3/l2t.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2003-2007 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006-2007 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/net/cxgb3/l2t.h b/drivers/net/cxgb3/l2t.h
old mode 100755
new mode 100644
index ba5d2cb..d790013
--- a/drivers/net/cxgb3/l2t.h
+++ b/drivers/net/cxgb3/l2t.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2003-2007 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006-2007 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/net/cxgb3/t3cdev.h b/drivers/net/cxgb3/t3cdev.h
old mode 100755
new mode 100644
index 9af3bcd..fa4099b
--- a/drivers/net/cxgb3/t3cdev.h
+++ b/drivers/net/cxgb3/t3cdev.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (C) 2006-2007 Chelsio Communications.  All rights reserved.
- * Copyright (C) 2006-2007 Open Grid Computing, Inc.  All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU


From swise at opengridcomputing.com  Thu Feb 15 12:00:21 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 15 Feb 2007 14:00:21 -0600
Subject: [openib-general] [PATCH 2/2] ofed_1_2 Fix copyrights in the
	iw_cxgb3 driver.
Message-ID: <1171569621.13282.62.camel@stevo-desktop>


Fix copyrights in the iw_cxgb3 driver.

Remove the Open Grid Computing copyright.  It shouldn't be there.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/core/cxio_dbg.c      |    1 -
 drivers/infiniband/hw/cxgb3/core/cxio_hal.c      |    1 -
 drivers/infiniband/hw/cxgb3/core/cxio_hal.h      |    1 -
 drivers/infiniband/hw/cxgb3/core/cxio_resource.c |    1 -
 drivers/infiniband/hw/cxgb3/core/cxio_resource.h |    1 -
 drivers/infiniband/hw/cxgb3/core/cxio_wr.h       |    1 -
 drivers/infiniband/hw/cxgb3/iwch.c               |    1 -
 drivers/infiniband/hw/cxgb3/iwch.h               |    1 -
 drivers/infiniband/hw/cxgb3/iwch_cm.c            |    1 -
 drivers/infiniband/hw/cxgb3/iwch_cm.h            |    1 -
 drivers/infiniband/hw/cxgb3/iwch_cq.c            |    1 -
 drivers/infiniband/hw/cxgb3/iwch_ev.c            |    1 -
 drivers/infiniband/hw/cxgb3/iwch_mem.c           |    1 -
 drivers/infiniband/hw/cxgb3/iwch_provider.c      |    1 -
 drivers/infiniband/hw/cxgb3/iwch_provider.h      |    1 -
 drivers/infiniband/hw/cxgb3/iwch_qp.c            |    1 -
 drivers/infiniband/hw/cxgb3/iwch_user.h          |    1 -
 17 files changed, 0 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c b/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c
index dfaa704..d6b6c97 100644
--- a/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c
+++ b/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c
index 5e31816..229edd5 100644
--- a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c
+++ b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.h b/drivers/infiniband/hw/cxgb3/core/cxio_hal.h
index e5e702d..1553bda 100644
--- a/drivers/infiniband/hw/cxgb3/core/cxio_hal.h
+++ b/drivers/infiniband/hw/cxgb3/core/cxio_hal.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_resource.c b/drivers/infiniband/hw/cxgb3/core/cxio_resource.c
index d1d8722..cf78050 100644
--- a/drivers/infiniband/hw/cxgb3/core/cxio_resource.c
+++ b/drivers/infiniband/hw/cxgb3/core/cxio_resource.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_resource.h b/drivers/infiniband/hw/cxgb3/core/cxio_resource.h
index a6bbe83..a2703a3 100644
--- a/drivers/infiniband/hw/cxgb3/core/cxio_resource.h
+++ b/drivers/infiniband/hw/cxgb3/core/cxio_resource.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_wr.h b/drivers/infiniband/hw/cxgb3/core/cxio_wr.h
index 234a084..6c7ac55 100644
--- a/drivers/infiniband/hw/cxgb3/core/cxio_wr.h
+++ b/drivers/infiniband/hw/cxgb3/core/cxio_wr.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch.c b/drivers/infiniband/hw/cxgb3/iwch.c
index 0c95f2c..de44c57 100644
--- a/drivers/infiniband/hw/cxgb3/iwch.c
+++ b/drivers/infiniband/hw/cxgb3/iwch.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch.h b/drivers/infiniband/hw/cxgb3/iwch.h
index 8b11198..8d9390b 100644
--- a/drivers/infiniband/hw/cxgb3/iwch.h
+++ b/drivers/infiniband/hw/cxgb3/iwch.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c
index 3237fc8..21fadbe 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cm.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.h b/drivers/infiniband/hw/cxgb3/iwch_cm.h
index 893f9d0..855f1ef 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cm.h
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_cq.c b/drivers/infiniband/hw/cxgb3/iwch_cq.c
index ff09509..225fcfa 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cq.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cq.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_ev.c b/drivers/infiniband/hw/cxgb3/iwch_ev.c
index 646f612..f4cd5ec 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_ev.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_ev.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_mem.c b/drivers/infiniband/hw/cxgb3/iwch_mem.c
index 5909ec5..335e9a4 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_mem.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_mem.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 4a46771..3f64dbf 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h
index d9d94e3..7322773 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.h
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c
index 9cc8b5e..e1e35d9 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_qp.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/drivers/infiniband/hw/cxgb3/iwch_user.h b/drivers/infiniband/hw/cxgb3/iwch_user.h
index e8ff061..bf0a2f6 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_user.h
+++ b/drivers/infiniband/hw/cxgb3/iwch_user.h
@@ -1,6 +1,5 @@
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
- * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU


From rdreier at cisco.com  Thu Feb 15 12:08:32 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 15 Feb 2007 12:08:32 -0800
Subject: [openib-general] remap_page_range() in older kernels
In-Reply-To: <1171554919.13282.17.camel@stevo-desktop> (Steve Wise's
	message of "Thu, 15 Feb 2007 09:55:19 -0600")
References: <1171554919.13282.17.camel@stevo-desktop>
Message-ID: <adar6srp4jj.fsf@cisco.com>

 > Do you remember any issues with using remap_page_range() in older
 > kernels for mapping memory allocated in the kernel back to a user
 > process?  

No, I would have thought it should work just like remap_pfn_range() in
later kernels.

 - R.


From swise at opengridcomputing.com  Thu Feb 15 12:19:48 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 15 Feb 2007 14:19:48 -0600
Subject: [openib-general] remap_page_range() in older kernels
In-Reply-To: <adar6srp4jj.fsf@cisco.com>
References: <1171554919.13282.17.camel@stevo-desktop>
	<adar6srp4jj.fsf@cisco.com>
Message-ID: <1171570788.13282.69.camel@stevo-desktop>

On Thu, 2007-02-15 at 12:08 -0800, Roland Dreier wrote:
>  > Do you remember any issues with using remap_page_range() in older
>  > kernels for mapping memory allocated in the kernel back to a user
>  > process?  
> 
> No, I would have thought it should work just like remap_pfn_range() in
> later kernels.
> 
>  - R.

Me too.  But it definitely isn't working for cxgb3.

Sigh...


From krause at cup.hp.com  Thu Feb 15 12:44:02 2007
From: krause at cup.hp.com (Michael Krause)
Date: Thu, 15 Feb 2007 12:44:02 -0800
Subject: [openib-general] IB routing discussion summary
In-Reply-To: <45D4B705.5020805@ichips.intel.com>
References: <000601c74fb4$6ed83840$8698070a@amr.corp.intel.com>
	<45D4B705.5020805@ichips.intel.com>
Message-ID: <6.2.0.14.2.20070215123631.09692088@esmail.cup.hp.com>

At 11:39 AM 2/15/2007, Sean Hefty wrote:
>>Ideas were presented around trying to construct an 'inter-subnet path record'
>>that contained the following:
>>    - Side A GRH.SGID = active side's Port GID
>>    - Side A GRH.DGID = passive side's Port GID
>>    - Side A LRH.SLID = any active side's port LID
>>    - Side A LRH.DLID = A subnet router
>>    - Side A LRH.SL   = SL to A subnet router
>>    - Side B GRH.SGID = Side A GRH.DGID
>>    - Side B GRH.DGID = Side A GRH.SGID
>>    - Side B LRH.SLID = any passive side's port LID
>>    - Side B LRH.DLID = B subnet router
>>    - Side B LRH.SL   = SL to B subnet router
>
>Until I can become convinced that the above isn't needed, I've been trying 
>to brainstorm ways to obtain this information.
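
For readers following the field list above, the proposed 'inter-subnet path
record' can be pictured as two per-side address sets whose GIDs mirror each
other.  The C below is only a sketch of that shape: the struct and function
names are hypothetical, the field widths are assumptions (128-bit GIDs,
16-bit LIDs), and nothing here is an existing SA or CM definition.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-side addressing, named after the list in the mail. */
struct side_addr_sketch {
	uint8_t  sgid[16];	/* GRH.SGID */
	uint8_t  dgid[16];	/* GRH.DGID */
	uint16_t slid;		/* LRH.SLID: a port LID on this side */
	uint16_t dlid;		/* LRH.DLID: this side's subnet router */
	uint8_t  sl;		/* LRH.SL: SL to this side's subnet router */
};

/* Hypothetical pairing of the two sides. */
struct inter_subnet_path_sketch {
	struct side_addr_sketch a;	/* active side */
	struct side_addr_sketch b;	/* passive side */
};

/* Side B's GRH GIDs are side A's, swapped; the LRH fields stay subnet local. */
static void mirror_gids_sketch(struct inter_subnet_path_sketch *p)
{
	assert(p != NULL);
	memcpy(p->b.sgid, p->a.dgid, sizeof p->b.sgid);
	memcpy(p->b.dgid, p->a.sgid, sizeof p->b.dgid);
}
```

Only the GIDs can be derived from the other side.  The LID and SL fields of
each side are the part that has to come from somewhere, which is the question
under discussion.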

Is this first an IBTA problem to solve if you believe there is a 
problem?  I believe the track you are on is incorrect: any attempt to
surface subnet-local information across subnets will create unnecessary
complexity and therefore make such solutions less practical to execute
within the industry.  I've tried to illustrate the role of the router, how
the flows work, etc.  I believe these to be correct, and they are reflected
not only in the existing specifications but also in the prior router
specification work and thinking.  They also parallel the IP world quite
nicely, which should lend credence to the view that subnet-local
information does not need to be exchanged between subnets.  I contend CM
does not require anything that is
subnet local other than to target a given router port which should be 
derived from local SM/SA only information.  I will further state that SA-SA 
communication sans perhaps a P_Key / Q_Key service lookup should be avoided 
wherever possible.

I strongly urge you to take this problem to the IBTA where any issues 
regarding specification interpretation can be sorted out and an official 
position taken.   This will yield a faster and more successful 
investigation into whether there is a problem and if so, how best to solve it.

Mike


>0. Have the SA return pairs of PathRecords for inter-subnet queries.
>
>But, since this simply punts the problem to the SA, my other thought is to 
>define the following:
>
>1. Inter-subnet PathRecord/MultiPathRecord Get/GetTable requests require 
>both an SGID and DGID, one of which must be subnet local to the processing SA.
>2. PathRecord/MultiPathRecord Get/GetTable request fields are relative to
>the subnet specified by the SGID.
>3. PathRecord GetResp/GetTableResp response fields are relative to the
>subnet local to the processing SA.
>4. SAs are addressable by a well-known GID suffix.
>
>I think this may allow establishing inter-subnet connections.  As an 
>example of
>its usage:
>
>a. Active side issues a PathRecord query to the local SA with SGID=local,
>DGID=remote.
>b. SA responds with PathRecord(s).
>c. Active side selects local PathRecord P1.
>d. Active side issues a PathRecord query to the remote SA using PathRecord 
>P1 to
>format the request: SGID, DGID, SLID, DLID, TC, FL, SL, etc.
>e. The remote SA responds with PathRecord(s).  The SA must ensure that 
>packets injected into the internetwork using P1 will route to the returned 
>records.
>f. Active side selects remote PathRecord P2.
>g. Active side validates that remote packets injected using P2 route to P1.
>
>At this point, the active side should have path information that can be 
>used to
>configure the QPs for a connection.
>
>Assuming that this will work, what I don't like about it is the validation 
>at step g.  This adds a third query that I don't see a way to 
>eliminate.  If the check fails, the client restarts at step c.
>
>- Sean


From rdreier at cisco.com  Thu Feb 15 13:10:39 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 15 Feb 2007 13:10:39 -0800
Subject: [openib-general] SA multicast patches
In-Reply-To: <000101c74a29$6f796610$e598070a@amr.corp.intel.com> (Sean
	Hefty's message of "Tue, 6 Feb 2007 12:00:22 -0800")
References: <000101c74a29$6f796610$e598070a@amr.corp.intel.com>
Message-ID: <adatzxnnn3k.fsf@cisco.com>

So I'm reading this over, and the following code looks kind of odd to me:

 > +int ib_sa_get_mcmember_rec(struct ib_device *device, u8 port_num,
 > +			   union ib_gid *mgid, struct ib_sa_mcmember_rec *rec)
 > 
 > ...
 > 
 > +	} else {
 > +		memset(rec, 0, sizeof *rec);
 > +		ib_get_cached_gid(device, port_num, 0, &rec->port_gid);
 > +		rec->pkey = 0xFFFF;
 > +		get_random_bytes(&rec->qkey, sizeof rec->qkey);
 > +		rec->join_state = 1;
 > +	}

Where is this particular hard-coded P_Key value coming from?  And how
about the Q_Key -- why is a random one being chosen?  Does it matter
that this is setting the privileged bit of the Q_Key at random?

The only place this code seems to be used is in
cma_join_ib_multicast(), which overwrites all the values that get set
here anyway.  (Except it leaves the Q_Key if the portspace is not UDP??)
Would it be more sensible to leave the P_Key and Q_Key initialized to
0 here, and let the caller handle it?  I don't see how the multicast
tracking module can pick a sensible default here.

Also, should we check the return value of ib_get_cached_gid()?
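
To make the suggestion concrete, here is a rough userspace sketch of the
alternative: zero the record, leave P_Key and Q_Key at 0 for the caller, and
propagate a failed GID lookup.  Every type and function below is a stand-in
invented for illustration (none of it is the kernel's ib_sa or ib_cache API),
and the Q_Key helper assumes the usual convention that the most significant
bit of a Q_Key marks it as controlled (privileged).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in types for illustration; these are NOT the kernel definitions. */
union gid_sketch {
	uint8_t raw[16];
};

struct mcmember_rec_sketch {
	union gid_sketch port_gid;
	uint32_t qkey;
	uint16_t pkey;
	uint8_t  join_state;
};

/* Hypothetical stand-in for a cached GID lookup: 0 on success, negative on error. */
static int get_cached_gid_sketch(int port_num, union gid_sketch *gid)
{
	if (port_num < 1)
		return -1;
	memset(gid->raw, 0, sizeof gid->raw);
	gid->raw[15] = (uint8_t)port_num;
	return 0;
}

/* A random 32-bit Q_Key has this bit set about half the time. */
static int qkey_has_privileged_bit_sketch(uint32_t qkey)
{
	return (qkey & 0x80000000u) != 0;
}

/* Default record: P_Key and Q_Key stay 0 so the caller has to choose them. */
static int default_mcmember_rec_sketch(int port_num,
				       struct mcmember_rec_sketch *rec)
{
	int ret;

	assert(rec != NULL);
	memset(rec, 0, sizeof *rec);
	ret = get_cached_gid_sketch(port_num, &rec->port_gid);
	if (ret)
		return ret;	/* do not hand back a half-filled record */
	rec->join_state = 1;
	return 0;
}
```

With this shape a caller such as cma_join_ib_multicast() would fill in the
P_Key and Q_Key itself, which it appears to do for most cases already.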

 - R.


From rdreier at cisco.com  Thu Feb 15 13:47:12 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 15 Feb 2007 13:47:12 -0800
Subject: [openib-general] [PATCH] 2.6.21 iwcm - iw_cm_id destruction
 race condition fixes.
In-Reply-To: <1171548576.12187.2.camel@stevo-desktop> (Steve Wise's
	message of "Thu, 15 Feb 2007 08:09:36 -0600")
References: <1171548576.12187.2.camel@stevo-desktop>
Message-ID: <adawt2jozz3.fsf@cisco.com>

thanks, applied


From rdreier at cisco.com  Thu Feb 15 13:48:57 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 15 Feb 2007 13:48:57 -0800
Subject: [openib-general] [PATCH] 2.6.21 iw_cxgb3 Fail posts
 synchronously when in TERMINATE state.
In-Reply-To: <1171550942.13282.5.camel@stevo-desktop> (Steve Wise's
	message of "Thu, 15 Feb 2007 08:49:02 -0600")
References: <1171550942.13282.5.camel@stevo-desktop>
Message-ID: <adasld7ozw6.fsf@cisco.com>

thanks, applied.


From rdreier at cisco.com  Thu Feb 15 13:50:30 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 15 Feb 2007 13:50:30 -0800
Subject: [openib-general] [PATCH] iw_cxgb3 Fix copyrights in the
	iw_cxgb3 driver.
In-Reply-To: <1171569262.13282.59.camel@stevo-desktop> (Steve Wise's
	message of "Thu, 15 Feb 2007 13:54:22 -0600")
References: <1171569262.13282.59.camel@stevo-desktop>
Message-ID: <adaodnvoztl.fsf@cisco.com>

thanks, applied


From mshefty at ichips.intel.com  Thu Feb 15 14:05:24 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 15 Feb 2007 14:05:24 -0800
Subject: [openib-general] IB routing discussion summary
In-Reply-To: <6.2.0.14.2.20070215123631.09692088@esmail.cup.hp.com>
References: <000601c74fb4$6ed83840$8698070a@amr.corp.intel.com>
	<45D4B705.5020805@ichips.intel.com>
	<6.2.0.14.2.20070215123631.09692088@esmail.cup.hp.com>
Message-ID: <45D4D924.8070507@ichips.intel.com>

> Is this first an IBTA problem to solve if you believe there is a problem?

Based on my interpretation, I do not believe that there's an error in the 
architecture.  It seems consistent.  Additional clarification of what PathRecord 
fields mean when the GIDs are on different subnets may be needed, and a change 
to the architecture may make things easier to implement, but that's a separate 
matter.

> I contend CM does not require anything that is subnet local other than to
> target a given router port which should be derived from local SM/SA only 

Then please state how the passive side obtains the information (e.g. SLID/DLID) 
it needs in order to configure its QP.  I claim that information is carried in 
the CM REQ.

The alternatives that I see are:

1. The passive side extracts the data from the LRH that carries the CM REQ.
2. The passive side issues its own local path record query.

Will you please clarify where this information comes from?

> I will further state that SA-SA communication sans perhaps a
> P_Key / Q_Key service lookup should be avoided wherever possible.

I agree - which is why my proposal avoided SA-SA communication.  I see nothing 
in the architecture that prohibits a node from querying an SA that is not on its 
local subnet.

- Sean


From purdy at sgi.com  Thu Feb 15 14:08:24 2007
From: purdy at sgi.com (Dale Purdy)
Date: Thu, 15 Feb 2007 16:08:24 -0600
Subject: [openib-general] sl2vl tables
Message-ID: <Pine.SGI.4.58.0702151600360.61280@cantor.americas.sgi.com>

We are experimenting with OFED 1.2 (alpha1) and have dumped the SL2VL
tables for both a switch port and an HCA port using the smpquery
command:

switch:
# SL2VL table: Lid 103
#                 SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
ports: in  0, out  1: | 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3|
ports: in  1, out  1: | 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3|
...

HCA:
# SL2VL table: Lid 37
#                 SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
ports: in  0, out  0: | 3| 2| 1| 0| 3| 2| 1| 0| 3| 2| 1| 0| 3| 2| 1| 0|

I would have expected the behavior that the switch describes - a one
to one mapping mod the supported number of VLs.  But I can't explain
why the HCA VLs are in reverse order to the SL.  If this were a host
endian issue I would have expected both to behave the same.  Can
someone explain what is going on?
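
The mapping expected here (SL to VL, one to one, wrapped at the number of
supported data VLs) can be written down as a small sketch. The function and
macro names are made up for illustration and are not part of opensm or
smpquery; with 4 VLs it reproduces the switch rows above, and with 8 VLs it
gives a 0..7, 0..7 pattern.

```c
#include <assert.h>
#include <stdint.h>

#define IB_NUM_SL 16	/* service levels 0..15 */

/*
 * Hypothetical helper: fill table[sl] with "sl mod num_vls", i.e. the
 * one-to-one mapping wrapped at the number of data VLs.
 */
void sl2vl_fill_modulo(uint8_t table[IB_NUM_SL], unsigned int num_vls)
{
	/* VL15 is reserved for management traffic, so at most 15 data VLs. */
	assert(num_vls >= 1 && num_vls <= 15);

	for (unsigned int sl = 0; sl < IB_NUM_SL; sl++)
		table[sl] = (uint8_t)(sl % num_vls);
}
```

A table that differs from this, such as the reversed HCA row, was filled in
some other way or was never programmed at all.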

Dale


From mst at mellanox.co.il  Thu Feb 15 14:16:13 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 16 Feb 2007 00:16:13 +0200
Subject: [openib-general] [PATCH for-2.6.21] IB/ipoib: error handling thinko
	fix
Message-ID: <20070215221613.GB26227@mellanox.co.il>

ipoib_cm_alloc_rx_skb might be called from IRQ context, so it must use
dev_kfree_skb_any, not kfree_skb

Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

---

This one's obvious, isn't it?

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 8ee6f06..e388a41 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -148,7 +148,7 @@ partial_error:
 	for (; i >= 0; --i)
 		ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE, DMA_FROM_DEVICE);
 
-	kfree_skb(skb);
+	dev_kfree_skb_any(skb);
 	return -ENOMEM;
 }
 

-- 
MST


From mshefty at ichips.intel.com  Thu Feb 15 14:39:47 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 15 Feb 2007 14:39:47 -0800
Subject: [openib-general] SA multicast patches
In-Reply-To: <adatzxnnn3k.fsf@cisco.com>
References: <000101c74a29$6f796610$e598070a@amr.corp.intel.com>
	<adatzxnnn3k.fsf@cisco.com>
Message-ID: <45D4E133.3000302@ichips.intel.com>

>  > +		memset(rec, 0, sizeof *rec);
>  > +		ib_get_cached_gid(device, port_num, 0, &rec->port_gid);
>  > +		rec->pkey = 0xFFFF;
>  > +		get_random_bytes(&rec->qkey, sizeof rec->qkey);
>  > +		rec->join_state = 1;
>  > +	}
> 
> Where is this particular hard-coded P_Key value coming from?  And how
> about the Q_Key -- why is a random one being chosen?  Does it matter
> that this is setting the privileged bit of the Q_Key at random?

The idea behind this part of the call was to return the user an MCMemberRecord 
that they can use to create a new multicast group.  Maybe it would be better to 
just drop this functionality and fail any lookups for mgid 0, but to answer your 
questions:

The pkey is the default partition, full membership pkey.  I believe all nodes 
will have either 0xffff or 0x7fff as their pkey.  We could probably call 
ib_get_cached_pkey() instead and just use the first entry in the table.

We don't want to set the privileged bit of the q_key, so that's wrong.  Good 
catch.
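
As a rough illustration of the two points above, here is a sketch in plain C
with host-order values. It assumes the full-membership flag is the high-order
bit of the 16-bit P_Key (which is what distinguishes 0xffff from 0x7fff) and
that the privileged bit is the high-order bit of the 32-bit Q_Key. The helper
names are invented for this example and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER_BIT	0x8000u		/* high bit of the 16-bit P_Key */
#define QKEY_PRIVILEGED_BIT	0x80000000u	/* high bit of the 32-bit Q_Key */

/* 0xffff is a full member of the default partition, 0x7fff a limited one. */
int pkey_is_full_member(uint16_t pkey)
{
	return (pkey & PKEY_FULL_MEMBER_BIT) != 0;
}

/* Turn arbitrary random bits into a Q_Key with the privileged bit clear. */
uint32_t qkey_from_random(uint32_t raw)
{
	uint32_t qkey = raw & ~QKEY_PRIVILEGED_BIT;

	assert((qkey & QKEY_PRIVILEGED_BIT) == 0);
	return qkey;
}
```

In the patch itself the masking would have to be applied after
get_random_bytes() and in the byte order the record field uses.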

> The only place this code seems to be used is in
> cma_join_ib_multicast(), which overwrites all the values that get set
> here anyway.  (Except it leaves the Q_Key if the portspace is not UDP??)
> Would it be more sensible to leave the P_Key and Q_Key initialized to
> 0 here, and let the caller handle it?  I don't see how the multicast
> tracking module can pick a sensible default here.

The user can overwrite any of the values that they don't like as defaults before 
sending the actual join.

> Also, should we check the return value of ib_get_cached_gid()?

That is probably best.  There shouldn't be much harm if the call fails; the 
MCMemberRecord will be invalid, and the future join request will fail.

- Sean


From halr at voltaire.com  Thu Feb 15 14:39:36 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 15 Feb 2007 17:39:36 -0500
Subject: [openib-general] sl2vl tables
In-Reply-To: <Pine.SGI.4.58.0702151600360.61280@cantor.americas.sgi.com>
References: <Pine.SGI.4.58.0702151600360.61280@cantor.americas.sgi.com>
Message-ID: <1171579140.22446.204899.camel@hal.voltaire.com>

On Thu, 2007-02-15 at 17:08, Dale Purdy wrote:
> We are experimenting with OFED 1.2 (alpha1) and have dumped the SL2VL
> tables for both a switch port and an HCA port using the smpquery
> command:
> 
> switch:
> # SL2VL table: Lid 103
> #                 SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
> ports: in  0, out  1: | 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3|
> ports: in  1, out  1: | 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3|
> ...
> 
> HCA:
> # SL2VL table: Lid 37
> #                 SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
> ports: in  0, out  0: | 3| 2| 1| 0| 3| 2| 1| 0| 3| 2| 1| 0| 3| 2| 1| 0|
> 
> I would have expected the behavior that the switch describes - a one
> to one mapping mod the supported number of VLs.  But I can't explain
> why the HCA VLs are in reverse order to the SL.  If this were a host
> endian issue I would have expected both to behave the same.  Can
> someone explain what is going on?

Is this on powerup of HCA node or after some SM potentially programs the
HCA (for this) ?

I typically see (for HCAs):
# SL2VL table: Lid 10
#                 SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
ports: in  0, out  0: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|

-- Hal

> Dale
> 


From purdy at sgi.com  Thu Feb 15 14:48:31 2007
From: purdy at sgi.com (Dale Purdy)
Date: Thu, 15 Feb 2007 16:48:31 -0600
Subject: [openib-general] sl2vl tables
In-Reply-To: <1171579140.22446.204899.camel@hal.voltaire.com>
References: <Pine.SGI.4.58.0702151600360.61280@cantor.americas.sgi.com>
	<1171579140.22446.204899.camel@hal.voltaire.com>
Message-ID: <Pine.SGI.4.58.0702151646030.61280@cantor.americas.sgi.com>

We are experimenting with LASH.  It appears that the SL2VL tables
don't get initialized unless QoS is enabled on the opensm command line
(-Q).  Enabling this seems to rectify the problem.  So it would appear
that LASH needs to enable this also.

Dale

On Thu, 15 Feb 2007, Hal Rosenstock wrote:

> On Thu, 2007-02-15 at 17:08, Dale Purdy wrote:
> > We are experimenting with OFED 1.2 (alpha1) and have dumped the SL2VL
> > tables for both a switch port and an HCA port using the smpquery
> > command:
> >
> > switch:
> > # SL2VL table: Lid 103
> > #                 SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
> > ports: in  0, out  1: | 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3|
> > ports: in  1, out  1: | 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3|
> > ...
> >
> > HCA:
> > # SL2VL table: Lid 37
> > #                 SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
> > ports: in  0, out  0: | 3| 2| 1| 0| 3| 2| 1| 0| 3| 2| 1| 0| 3| 2| 1| 0|
> >
> > I would have expected the behavior that the switch describes - a one
> > to one mapping mod the supported number of VLs.  But I can't explain
> > why the HCA VLs are in reverse order to the SL.  If this were a host
> > endian issue I would have expected both to behave the same.  Can
> > someone explain what is going on?
>
> Is this on powerup of HCA node or after some SM potentially programs the
> HCA (for this) ?
>
> I typically see (for HCAs):
> # SL2VL table: Lid 10
> #                 SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
> ports: in  0, out  0: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
>
> -- Hal
>
> > Dale
> >


From halr at voltaire.com  Thu Feb 15 15:17:56 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 15 Feb 2007 18:17:56 -0500
Subject: [openib-general] sl2vl tables
In-Reply-To: <Pine.SGI.4.58.0702151646030.61280@cantor.americas.sgi.com>
References: <Pine.SGI.4.58.0702151600360.61280@cantor.americas.sgi.com>
	<1171579140.22446.204899.camel@hal.voltaire.com>
	<Pine.SGI.4.58.0702151646030.61280@cantor.americas.sgi.com>
Message-ID: <1171581435.22446.206644.camel@hal.voltaire.com>

On Thu, 2007-02-15 at 17:48, Dale Purdy wrote:
> We are experimenting with LASH.  It appears that the SL2VL tables
> don't get initialized unless QoS is enabled on the opensm command line
> (-Q).  Enabling this seems to rectify the problem.  So it would appear
> that LASH needs to enable this also.

You can run with -Q or set no_qos to FALSE in the opensm.opts file.

I wonder whether we should tie LASH to this so this isn't needed.

-- Hal

> Dale
> 
> On Thu, 15 Feb 2007, Hal Rosenstock wrote:
> 
> > On Thu, 2007-02-15 at 17:08, Dale Purdy wrote:
> > > We are experimenting with OFED 1.2 (alpha1) and have dumped the SL2VL
> > > tables for both a switch port and an HCA port using the smpquery
> > > command:
> > >
> > > switch:
> > > # SL2VL table: Lid 103
> > > #                 SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
> > > ports: in  0, out  1: | 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3|
> > > ports: in  1, out  1: | 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3| 0| 1| 2| 3|
> > > ...
> > >
> > > HCA:
> > > # SL2VL table: Lid 37
> > > #                 SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
> > > ports: in  0, out  0: | 3| 2| 1| 0| 3| 2| 1| 0| 3| 2| 1| 0| 3| 2| 1| 0|
> > >
> > > I would have expected the behavior that the switch describes - a one
> > > to one mapping mod the supported number of VLs.  But I can't explain
> > > why the HCA VLs are in reverse order to the SL.  If this were a host
> > > endian issue I would have expected both to behave the same.  Can
> > > someone explain what is going on?
> >
> > Is this on powerup of HCA node or after some SM potentially programs the
> > HCA (for this) ?
> >
> > I typically see (for HCAs):
> > # SL2VL table: Lid 10
> > #                 SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
> > ports: in  0, out  0: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> >
> > -- Hal
> >
> > > Dale
> > >


From purdy at sgi.com  Thu Feb 15 15:24:01 2007
From: purdy at sgi.com (Dale Purdy)
Date: Thu, 15 Feb 2007 17:24:01 -0600
Subject: [openib-general] sl2vl tables
In-Reply-To: <1171581435.22446.206644.camel@hal.voltaire.com>
References: <Pine.SGI.4.58.0702151600360.61280@cantor.americas.sgi.com>
	<1171579140.22446.204899.camel@hal.voltaire.com>
	<Pine.SGI.4.58.0702151646030.61280@cantor.americas.sgi.com>
	<1171581435.22446.206644.camel@hal.voltaire.com>
Message-ID: <Pine.SGI.4.58.0702151723330.61280@cantor.americas.sgi.com>

On Thu, 15 Feb 2007, Hal Rosenstock wrote:

> On Thu, 2007-02-15 at 17:48, Dale Purdy wrote:
> > We are experimenting with LASH.  It appears that the SL2VL tables
> > don't get initialized unless QoS is enabled on the opensm command line
> > (-Q).  Enabling this seems to rectify the problem.  So it would appear
> > that LASH needs to enable this also.
>
> You can run with -Q or set no_qos to FALSE in the opensm.opts file.
>
> I wonder whether we should tie LASH to this so this isn't needed.
>

I think that would be a good idea.

Dale


From lareliquia.angulo at gmail.com  Thu Feb 15 18:37:35 2007
From: lareliquia.angulo at gmail.com (lareliquia.angulo at gmail.com)
Date: Fri, 16 Feb 2007 03:37:35 +0100
Subject: [openib-general] Propuesta
Message-ID: <41149-22007251623734640@nanuk-1806bbde9>

First of all, I apologize for the inconvenience.
I would like to thank you for giving me some of your time.
I ask for your help: if you could publish an article, some kind of reference, or simply mention it to people you know, I would be very grateful.


Francisco Angulo (Madrid, 1976)
 He studied computer science, is an inventor, and is a great enthusiast of technological advances, although with some well-considered reservations.
He says he has a strong commitment to environmental conservation, which drives him to look for approaches that contribute in a practical way to the sustainable development of our societies. He has recently designed and patented several ecological engines that he hopes will be useful toward this goal.
On the artistic side, Francisco Angulo is a writer and digital designer, and his work starts out strongly, showing singular aspects full of suggestive vitality, humor, intrigue, and imagination.


http://www.sirlebert.com/xalfdm/pol40.mp3


Here is the beginning of my novel LA RELIQUIA:
"Brown Eyes" spent long hours watching me; I don't know what she saw in me, but she loved to sit on the grass in front of me and look at me closely; the truth is that I loved contemplating her. She was short, under a meter and a half, and slim; her skin was dark, and she usually covered it with animal hides to protect herself from the cold. She also wore different ornaments in her hair depending on the time of year: in spring she used to braid in some flowers, and in winter some dyed colored ribbons. She also liked to hang some ornament around her neck as a necklace, normally a thin strip of leather, and, as a jewel, a shell or a small clay figure that she modeled with her own hands. She belonged to a tribe that had settled near my position, in some shallow caves that they used as a home. "Brown Eyes" had an intense gaze and observed everything with curiosity, trying to understand the world around her, as if everything were part of a magical world; she noticed the movement the wind made in the treetops, held insects in her hand, careful not to harm them, and, after contemplating them and trying to understand what they were, returned them to the ground. She also loved to watch the birds and imitate them; she used to amuse herself running in circles around me, stretching out her arms and moving them up and down as if she were a bird.
In spring, tall grass grew in the small meadow to the left, a meadow of tall green grass full of dandelions. "Brown Eyes" loved to jump on the green, and with her jumps everything filled with dandelion seeds, which were carried off by the gentle spring breeze. That beautiful creature was tireless and could spend hours jumping and playing at catching the seeds that fluttered in the wind. When they rose, "Brown Eyes" stopped jumping and stood still, face turned upward, eyes closed, waiting in silence. Then some began to descend gently and fell on her face, caressing it. I would have liked to feel that sensation, to feel the soft seeds falling on me like feathers. Sometimes one got into her nose and made her sneeze; I found that very funny, because "Brown Eyes" looked very surprised, as if wondering what had happened.
Except on rainy days, she always came to see me; it was something I looked forward to, and when the day dawned sunny I waited for her until I saw her appear, climbing the slope that led up to my position; she usually came up humming some melody and skipping as she walked.
Please sign my guest book!
www.lareliquia.es 

The article:

I wrote this article several years ago, but it is only now that we are beginning to see its repercussions.

The corn tortilla: "food for North American vehicles"

     Replacing diesel fuel with biodiesel is completely unviable, since current consumption would require "sunflower or corn" crops covering more than � of the national territory, and it is even less feasible in countries such as Japan. Obtaining fuel from organic waste by means of bacteria has the advantage of allowing a harvest every 3 days, whereas a "sunflower or corn" crop takes a year.
The bacteria feed mainly on water and need no care, which is not the case for "sunflower or corn" crops. Moreover, growing "sunflower or corn" uses many substances that are harmful to the environment, from nitrates and fertilizers to all kinds of chemical products for spraying, which ends up seriously damaging our environment. Using organic waste, by contrast, gives us one more advantage: it disposes of garbage in a completely ecological way, since the whole process of producing the new biodiesel uses no chemicals at all.


www.lareliquia.es 
Best regards,

Francisco Angulo


From rowland at cse.ohio-state.edu  Thu Feb 15 19:50:46 2007
From: rowland at cse.ohio-state.edu (Shaun Rowland)
Date: Thu, 15 Feb 2007 22:50:46 -0500
Subject: [openib-general] mvapich2 ofed 1.2 problem
In-Reply-To: <1171558785.13282.29.camel@stevo-desktop>
References: <1171380610.15471.25.camel@stevo-desktop>
	<adar6sum1fq.fsf@cisco.com> <1171386686.15471.36.camel@stevo-desktop>
	<adazm7ikm7q.fsf@cisco.com> <45D1FD0B.2080606@cse.ohio-state.edu>
	<ada1wktlwux.fsf@cisco.com> <45D3E224.9060306@cse.ohio-state.edu>
	<1171558785.13282.29.camel@stevo-desktop>
Message-ID: <45D52A16.1080803@cse.ohio-state.edu>

Steve Wise wrote:
> Shaun,
> 
> Lemme know if you have an mvapich2 kit that I can test with iwarp...

Hi Steve. I've updated our SRPM:

https://www.openfabrics.org/~rowland/ofed_1_2/

The latest is mvapich2-0.9.8-4.src.rpm. This version should solve the
shared library linking issues. This can be built outside of the OFED 1.2
alpha1 release with the information in the README file or can replace
the previous SRPM in the OFED-1.2-alpha1/SRPMS/ directory. To use iWARP,
use the OFA build of the SRPM and set MV2_ENABLE_IWARP_MODE=1 in your
environment.
-- 
Shaun Rowland	rowland at cse.ohio-state.edu
http://www.cse.ohio-state.edu/~rowland/


From xma at us.ibm.com  Thu Feb 15 20:41:11 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Thu, 15 Feb 2007 20:41:11 -0800
Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs
	overflow
In-Reply-To: <adak5yk69hd.fsf@cisco.com>
Message-ID: <OF29D0C259.70E22877-ON87257284.0017AEC5-88257283.0071A3BB@us.ibm.com>


Hello, Roland,

We have a customer issue regarding IPv6oIB. A subnet supports only a
limited number of MCGs. When multiple IPv6 addresses are assigned to one
interface, each IPv6 address has its own solicited-node address (depending
on its group ID), so a large subnet ends up with a very large number of
MCGs. If the IPv6 solicited-node addresses exceed the number of MCGs the
subnet supports, IPv6 neighbour discovery breaks. This does not happen on
Ethernet, because a send-only sender there does not need to join any
group.
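
For readers less familiar with neighbour discovery, the following sketch
shows why the number of groups grows with the number of addresses. It assumes
the standard IPv6 rule that the solicited-node multicast address is
ff02::1:ffXX:XXXX, where XX:XXXX are the low-order 24 bits of the unicast
address. The function name is made up for this example and is not IPoIB code.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/*
 * Illustrative helper: derive the solicited-node multicast address from
 * the low-order 24 bits of a unicast IPv6 address.
 */
void solicited_node_addr(const struct in6_addr *unicast,
			 struct in6_addr *mcast)
{
	/* First 104 bits of ff02:0:0:0:0:1:ff00::/104 */
	static const unsigned char prefix[13] = {
		0xff, 0x02, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x01, 0xff
	};

	assert(unicast != NULL && mcast != NULL);
	memcpy(mcast->s6_addr, prefix, sizeof prefix);
	/* Last 24 bits come straight from the unicast address. */
	memcpy(mcast->s6_addr + 13, unicast->s6_addr + 13, 3);
}
```

Two addresses on the same interface that differ in their low 24 bits map to
two different groups, and on IPoIB each of those groups needs its own MCG.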

The IPoIB RFCs cover only MCGs beyond this subnet: for an MCG that is not
in this subnet, the packet is directed to all routers or to broadcast (for
IPv6 this should be the all-hosts address). They do not cover MCG overflow
within this subnet. (Currently this is not implemented in OpenFabrics.)

I have written an initial patch that addresses the MCG overflow problem by
redirecting the solicited-node address to the all-hosts node address, so
IPv6 neighbour discovery works no matter how many IPv6 addresses are in the
subnet. The patch is triggered only when IPv6 is enabled and the MCGs
overflow, so there is almost no performance penalty.

The patch seems to work, although it is still under validation and test. I
would like to hear your opinion on how to address this problem, and on
whether I am heading in the right direction.

Thanks
Shirley Ma

From jgunthorpe at obsidianresearch.com  Thu Feb 15 21:42:27 2007
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Thu, 15 Feb 2007 22:42:27 -0700
Subject: [openib-general] IB routing discussion summary
In-Reply-To: <45D4B705.5020805@ichips.intel.com>
References: <000601c74fb4$6ed83840$8698070a@amr.corp.intel.com>
	<45D4B705.5020805@ichips.intel.com>
Message-ID: <20070216054227.GB10942@obsidianresearch.com>

On Thu, Feb 15, 2007 at 11:39:49AM -0800, Sean Hefty wrote:

> I think this may allow establishing inter-subnet connections.  As an
> example of its usage:
 
I think you are right, this does contain enough information.

> a. Active side issues a PathRecord query to the local SA with SGID=local,
> DGID=remote.
> b. SA responds with PathRecord(s).
> c. Active side selects local PathRecord P1.
> d. Active side issues a PathRecord query to the remote SA using PathRecord P1
> to format the request: SGID, DGID, SLID, DLID, TC, FL, SL, etc.
> e. The remote SA responds with PathRecord(s).  The SA must ensure that packets
> injected into the internetwork using P1 will route to the returned records.
> f. Active side selects remote PathRecord P2.
> g. Active side validates that remote packets injected using P2 route to P1.

Let me add something:
   - Side A GRH.SGID = P2.DGID
   - Side A GRH.DGID = P1.DGID
   - Side A GRH.TC/FL = P2.TC/FL
   - Side A LRH.SLID = P1.SLID
   - Side A LRH.DLID = P1.DLID
   - Side A LRH.SL   = P1.SL

   - Side B GRH.SGID = P1.DGID
   - Side B GRH.DGID = P2.DGID
   - Side B GRH.TC/FL = P1.TC/FL
   - Side B LRH.SLID = P2.SLID
   - Side B LRH.DLID = P2.DLID
   - Side B LRH.SL   = P2.SL
[Side A information programs the active side's QP. Side B information
goes into the REQ and is sent to the passive side, which then uses it
directly to program its QP. The passive side never does a PR query.]
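
The two field lists above are symmetric, which a short sketch makes visible.
The struct layouts and the name fill_side() are invented for illustration;
they only carry the fields named in the lists and are not the kernel's
ib_sa_path_rec or any CM structure.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct gid { uint8_t raw[16]; };

/* Only the PathRecord fields the lists above refer to. */
struct path_rec {
	struct gid sgid, dgid;
	uint16_t slid, dlid;
	uint8_t sl, tclass;
	uint32_t flow_label;
};

/* Address fields one side programs into its QP. */
struct qp_addr {
	struct gid grh_sgid, grh_dgid;
	uint8_t grh_tclass;
	uint32_t grh_flow_label;
	uint16_t lrh_slid, lrh_dlid;
	uint8_t lrh_sl;
};

/*
 * local:  PathRecord from this side's own subnet (supplies LRH fields
 *         and GRH.DGID)
 * remote: PathRecord from the peer's subnet (supplies GRH.SGID and TC/FL)
 */
void fill_side(struct qp_addr *out, const struct path_rec *local,
	       const struct path_rec *remote)
{
	memset(out, 0, sizeof *out);
	out->grh_sgid = remote->dgid;
	out->grh_dgid = local->dgid;
	out->grh_tclass = remote->tclass;
	out->grh_flow_label = remote->flow_label;
	out->lrh_slid = local->slid;
	out->lrh_dlid = local->dlid;
	out->lrh_sl = local->sl;
}
```

Side A is then fill_side(&a, &P1, &P2) and Side B is fill_side(&b, &P2, &P1),
so one routine covers both lists with the PathRecords swapped.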

Inverting the TC/FL source can avoid requirement g. When P1 is
produced it also generates a TC/FL that ensures the LIDs match for the
reverse direction. When P2 is created it does the same.

Here is the interesting bit: If the SAs support 'multiple router
paths' then they must have a small bit of global information which is
that packets entering router LID x,y,z on subnet Y appear on local
subnet router port A. Enough information is in the P2 request
to lookup in this table to learn the input port.

In forming P2 the SA will use that bit of global information to
restrict the router LIDs to one that ends up on port A. Once it
selects a router LID and CA LID it produces a TC/FL that ensures those
LIDs are selected by the local router port A.

Choosing the DGIDs as I did can allow for some multipathing via GID if
an implementation goes that way. [Though IMHO, doing this creates new
ugly problems, and doesn't solve the SLID selection issue.]

This changes the definition of a PR, the returned GRH fields are the
fields that the *remote* must send to produce the LIDs in the PR.

If you define TC/FL like this then I don't know what happens
when UD makes a PR query and uses the returned TC/FL in the local QP
configuration... It may actually be better to have a new query type
and have SA take care of it.

In this solution new semantics for PR are defined, it requires the SA
magic GID, and it co-opts the flowlabel/tc to solve the QP LID
matching issue. Michael is probably right and we should find a way to
ask IBTA for implementation guidance. Hopefully that can be done
swiftly..

Jason


From krkumar2 at in.ibm.com  Thu Feb 15 22:29:02 2007
From: krkumar2 at in.ibm.com (Krishna Kumar)
Date: Fri, 16 Feb 2007 11:59:02 +0530
Subject: [openib-general] [RFC] [PATCH] iWARP Connection parameters
	negotiation
Message-ID: <20070216062902.10631.98327.sendpatchset@K50wks273871wss.in.ibm.com>

Hi all,

In the Nov 16-17th 2006 OpenFabrics Developer Summit, the
following presentation by Tom talked of negotiating
parameters at the time of establishing a connection :

http://www.openfabrics.org/conference/nov2006sc/OFA-Newstuff-SC06.ppt

See "IRD/ORD Negotiation, Option #1" 

This is an RFC patch that implements the same (after
interacting with Tom, Steve and Arkady). The working
of this functionality is described by Tom below (in one
of his mails to me) :

"Yes, basically what has to happen is that the IRD/ORD
information needs to be exchanged in the private data as
part of connect/accept. The parameters should be pre-pended
followed by the user specified private data. The IWCM strips
the ORD/IRD header off before delivering the private data to
the consumer. BTW, if you look at the IB code you can see
how they do this already for port number, etc...

"The options specified are really whether to be permissive
when the remote peer won't honor your request. My opinion
is that the policy should be permissive, which is to say
that if you ask for 8 outgoing, but the peer responds with
4 incoming, you set up the connection and the local node
adjusts his ORD down to 4 to match the peer's IRD response.
Does this make sense?"

Please provide your comments/suggestions/(flames). Patch
applies to 2.6.20.

Thanks,

- KK

diff -ruNp org/drivers/infiniband/core/cma.c new/drivers/infiniband/core/cma.c
--- org/drivers/infiniband/core/cma.c	2007-02-16 11:01:08.000000000 +0530
+++ new/drivers/infiniband/core/cma.c	2007-02-16 11:01:12.000000000 +0530
@@ -2059,19 +2059,81 @@ out:
 	return ret;
 }
 
+/*
+ * set_connp_fields() : Set various fields of iw_param based on connect or
+ * accept parameters (connp). Also, allocates a private header if
+ * config_privhdr is true.
+ * 
+ * Returns a private header on success and -errno converted-to-pointer on
+ * failure. A null private header indicates that config_privhdr was false.
+ */
+static struct iwcm_priv_hdr *set_connp_fields(struct iw_cm_conn_param *iw_param,
+					      struct rdma_conn_param *connp,
+					      int config_privhdr)
+{
+	struct iwcm_priv_hdr *priv_hdr;
+
+	iw_param->ord = connp->initiator_depth;
+	iw_param->ird = connp->responder_resources;
+
+	if (!config_privhdr) {
+		/*
+		 * We are not configured to send private header (for active),
+		 * or the peer didn't send a private header (for passive); in
+		 * both cases, do not use private header.
+		 */
+		priv_hdr = NULL;	/* Success */
+		iw_param->private_data = connp->private_data;
+		iw_param->private_data_len = connp->private_data_len;
+		goto out;
+	}
+
+	if ((typeof (connp->private_data_len))
+	    (connp->private_data_len + sizeof *priv_hdr) <
+	     connp->private_data_len) {
+		/* Overflow - private_data_len + priv_hdr is too large */
+		/* xxx.KK - Help - there is a better way to check overflow
+		 * than this - some macro that is already inbuilt.
+		 */
+		priv_hdr = ERR_PTR(-EOVERFLOW);
+		goto out;
+	}
+
+	/* Allocate memory for both private data and iwcm_priv_hdr */
+	iw_param->private_data_len = connp->private_data_len + sizeof *priv_hdr;
+	iw_param->private_data = kmalloc(iw_param->private_data_len,
+					 GFP_KERNEL);
+	if (!iw_param->private_data) {
+		priv_hdr = ERR_PTR(-ENOMEM);
+		goto out;
+	}
+
+	/* Prepend iwcm_priv_hdr header before actual private data */
+	priv_hdr = (struct iwcm_priv_hdr *) iw_param->private_data;
+	priv_hdr->ord = cpu_to_be32(iw_param->ord);
+	priv_hdr->ird = cpu_to_be32(iw_param->ird);
+	if (connp->private_data_len)
+		memcpy(priv_hdr->private_data, connp->private_data,
+		       connp->private_data_len);
+
+out:
+	return priv_hdr;
+}
+
+extern int iw_send_private_header;
+
 static int cma_connect_iw(struct rdma_id_private *id_priv,
 			  struct rdma_conn_param *conn_param)
 {
 	struct iw_cm_id *cm_id;
 	struct sockaddr_in* sin;
-	int ret;
 	struct iw_cm_conn_param iw_param;
+	struct iwcm_priv_hdr *priv_hdr = NULL;
+	int ret;
 
 	cm_id = iw_create_cm_id(id_priv->id.device, cma_iw_handler, id_priv);
-	if (IS_ERR(cm_id)) {
-		ret = PTR_ERR(cm_id);
-		goto out;
-	}
+	if (IS_ERR(cm_id))
+		return PTR_ERR(cm_id);
 
 	id_priv->cm_id.iw = cm_id;
 
@@ -2085,17 +2147,28 @@ static int cma_connect_iw(struct rdma_id
 	if (ret)
 		goto out;
 
-	iw_param.ord = conn_param->initiator_depth;
-	iw_param.ird = conn_param->responder_resources;
-	iw_param.private_data = conn_param->private_data;
-	iw_param.private_data_len = conn_param->private_data_len;
+	/* Initialize iw_param fields */
+	priv_hdr = set_connp_fields(&iw_param, conn_param,
+				    iw_send_private_header);
+	if (IS_ERR(priv_hdr)) {
+		ret = PTR_ERR(priv_hdr);
+		goto out;
+	}
+
+	/*
+	 * Save iwcm_priv_hdr till we get a connect response to negotiate.
+	 * priv_hdr can be NULL, indicating that negotiation is disabled.
+	 */
+	cm_id->priv_hdr = priv_hdr;
+
 	if (id_priv->id.qp)
 		iw_param.qpn = id_priv->qp_num;
 	else
 		iw_param.qpn = conn_param->qp_num;
 	ret = iw_cm_connect(cm_id, &iw_param);
 out:
-	if (ret && !IS_ERR(cm_id)) {
+	if (ret) {
+		/* cm_id->priv_hdr is freed up in iwcm_deref_id() */
 		iw_destroy_cm_id(cm_id);
 		id_priv->cm_id.iw = NULL;
 	}
@@ -2185,23 +2258,51 @@ out:
 static int cma_accept_iw(struct rdma_id_private *id_priv,
 		  struct rdma_conn_param *conn_param)
 {
+	struct iw_cm_id *cm_id;
+	struct iwcm_priv_hdr *priv_hdr;
 	struct iw_cm_conn_param iw_param;
 	int ret;
 
+	cm_id = id_priv->cm_id.iw;
+
+	/* Initialize iw_param fields */
+	priv_hdr = set_connp_fields(&iw_param, conn_param,
+				    (cm_id->priv_hdr ?  1 : 0) &
+				     iw_send_private_header);
+	if (IS_ERR(priv_hdr)) {
+		ret = PTR_ERR(priv_hdr);
+		goto out;
+	}
+
 	ret = cma_modify_qp_rtr(&id_priv->id);
 	if (ret)
-		return ret;
+		goto out;
 
-	iw_param.ord = conn_param->initiator_depth;
-	iw_param.ird = conn_param->responder_resources;
-	iw_param.private_data = conn_param->private_data;
-	iw_param.private_data_len = conn_param->private_data_len;
-	if (id_priv->id.qp) {
+	if (id_priv->id.qp)
 		iw_param.qpn = id_priv->qp_num;
-	} else
+	else
 		iw_param.qpn = conn_param->qp_num;
 
-	return iw_cm_accept(id_priv->cm_id.iw, &iw_param);
+	ret = iw_cm_accept(cm_id, &iw_param);
+
+out:
+	/*
+	 * Free the just allocated priv_hdr+private data. priv_hdr could
+	 * be NULL if negotiation is not configured, or if the active side
+	 * didn't use private header.
+	 */
+	if (!IS_ERR(priv_hdr))
+		kfree(priv_hdr);
+
+	/*
+	 * Free priv_hdr that was allocated in parse_connection_params().
+	 */
+	if (cm_id->priv_hdr) {
+		kfree(cm_id->priv_hdr);
+		cm_id->priv_hdr = NULL;
+	}
+
+	return ret;
 }
 
 static int cma_send_sidr_rep(struct rdma_id_private *id_priv,
diff -ruNp org/drivers/infiniband/core/iwcm.c new/drivers/infiniband/core/iwcm.c
--- org/drivers/infiniband/core/iwcm.c	2007-02-16 11:01:08.000000000 +0530
+++ new/drivers/infiniband/core/iwcm.c	2007-02-16 11:01:12.000000000 +0530
@@ -54,6 +54,12 @@ MODULE_AUTHOR("Tom Tucker");
 MODULE_DESCRIPTION("iWARP CM");
 MODULE_LICENSE("Dual BSD/GPL");
 
+int iw_send_private_header __read_mostly = 0;
+module_param_named(iw_send_private_header, iw_send_private_header, int, 0644);
+MODULE_PARM_DESC(iw_send_private_header,
+			"Enable private iwcm connection negotiation header");
+EXPORT_SYMBOL(iw_send_private_header);
+
 static struct workqueue_struct *iwcm_wq;
 struct iwcm_work {
 	struct work_struct work;
@@ -158,6 +164,7 @@ static int iwcm_deref_id(struct iwcm_id_
 	BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
 	if (atomic_dec_and_test(&cm_id_priv->refcount)) {
 		BUG_ON(!list_empty(&cm_id_priv->work_list));
+		kfree(cm_id_priv->id.priv_hdr);
 		if (waitqueue_active(&cm_id_priv->destroy_comp.wait)) {
 			BUG_ON(cm_id_priv->state != IW_CM_STATE_DESTROYING);
 			BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY,
@@ -470,6 +477,72 @@ int iw_cm_reject(struct iw_cm_id *cm_id,
 }
 EXPORT_SYMBOL(iw_cm_reject);
 
+/* 
+ * iw_negotiate_qp_conn_params() : gets incoming and outgoing ird/ord
+ * parameters and modifies incoming ird/ord if required as per Sequences
+ * #7 and #8 noted in Section 6.6.1.1; and Sequences #9 and #10 noted in
+ * 6.6.1.2 of the hilland draft.
+ *
+ * This routine also modifies the caller's ird/ord so that accept is
+ * provided with this smaller ird/ord.
+ *
+ * This routine is called by :
+ * 	- iw_cm_accept() for accepting an incoming connection, and
+ *	- parse_connection_params() IW_CM_EVENT_CONNECT_REPLY case where
+ * 	  the client gets response to the connect request;
+ * and since both these callers have process context, it is OK to call
+ * ib_modify_qp (which can sleep).
+ *
+ * Returns 0 if negotiation was done successfully; and < 0 if no negotiation
+ * was required or if modify_qp() failed (which is not a catastrophic error).
+ */
+static int iw_negotiate_qp_conn_params(struct ib_qp *qp, int *l_ird, int *l_ord,
+				       int r_ird, int r_ord)
+{
+	int qp_mask = 0;	/* mask of attributes to be modified */
+	int ret = -1;		/* no negotiation required */
+
+	if (*l_ord > r_ird) {
+		/* 
+		 * Local Outgoing is bigger than Peer Incoming, reduce
+		 * my Outgoing.
+		 */
+		pr_debug("%s: Reducing outgoing from %d to %d\n", __FUNCTION__,
+			 *l_ord, r_ird);
+		*l_ord = r_ird;
+		qp_mask = IB_QP_MAX_QP_RD_ATOMIC;
+	}
+
+	if (*l_ird > r_ord) {
+		/*
+		 * Local Incoming is greater than Peer Outgoing, reduce
+		 * my Incoming.
+		 */
+		pr_debug("%s: Reducing incoming from %d to %d\n", __FUNCTION__,
+			 *l_ird, r_ord);
+		*l_ird = r_ord;
+		qp_mask |= IB_QP_MAX_DEST_RD_ATOMIC;
+	}
+
+	if (qp_mask) {
+		struct ib_qp_attr qp_attr;
+
+		qp_attr.max_rd_atomic = *l_ord;
+		qp_attr.max_dest_rd_atomic = *l_ird;
+		ret = ib_modify_qp(qp, &qp_attr, qp_mask);
+		pr_debug("%s: modify qp with qp_mask:%x returns %d\n",
+			 __FUNCTION__, qp_mask, ret);
+		/* xxx.KK : This does NOTHING in amso driver, as
+		 * c2_qp_set_read_limits() is used to set rdma limits, this
+		 * seems to be a limitation of driver as modify_qp is
+		 * supposed to do the same. See mthca_modify_qp which
+		 * modifies the limits. Maybe chelsio driver supports this.
+		 */
+	}
+
+	return ret;
+}
+
 /*
  * CM_ID <-- ESTABLISHED
  *
@@ -483,6 +556,7 @@ int iw_cm_accept(struct iw_cm_id *cm_id,
 	struct iwcm_id_private *cm_id_priv;
 	struct ib_qp *qp;
 	unsigned long flags;
+	int r_ird, r_ord;
 	int ret;
 
 	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
@@ -505,6 +579,44 @@ int iw_cm_accept(struct iw_cm_id *cm_id,
 	cm_id_priv->qp = qp;
 	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
 
+	if (!iw_send_private_header || !cm_id->priv_hdr) {
+		/*
+		 * Not configured to send extra header, or peer didn't use
+		 * private header.
+		 */
+		goto accept;
+	}
+
+	/*
+	 * Retrieve client's connection parameters that were earlier saved
+	 * from an incoming client connect request, and negotiate with the
+	 * accept parameters. iw_param contains local connection values
+	 * while cm_id->priv_hdr contains peer connection values.
+	 */
+
+	r_ord = be32_to_cpu(cm_id->priv_hdr->ord);
+	r_ird = be32_to_cpu(cm_id->priv_hdr->ird);
+
+	kfree(cm_id->priv_hdr);
+	cm_id->priv_hdr = NULL;
+
+	ret = iw_negotiate_qp_conn_params(qp, &iw_param->ird,
+					  &iw_param->ord, r_ird, r_ord);
+	if (!ret) {
+		struct iwcm_priv_hdr *priv_hdr;
+
+		/*
+		 * iw_param's ird/ord now contains new values, update the
+		 * prepended header's ird/ord fields to reflect this change
+		 * so as to let the other side know of these updated values.
+		 */
+		priv_hdr = (struct iwcm_priv_hdr *) iw_param->private_data;
+
+		priv_hdr->ord = cpu_to_be32(iw_param->ord);
+		priv_hdr->ird = cpu_to_be32(iw_param->ird);
+	}
+
+accept:
 	ret = cm_id->device->iwcm->accept(cm_id, iw_param);
 	if (ret) {
 		/* An error on accept precludes provider events */
@@ -584,6 +696,126 @@ int iw_cm_connect(struct iw_cm_id *cm_id
 EXPORT_SYMBOL(iw_cm_connect);
 
 /*
+ * parse_connection_params() : parse connection parameters for both
+ * active side requests and passive side responses, and negotiate
+ * suitable values for ird/ord.
+ * 
+ * Returns 0 on success and -errno on failure (via modify_qp). Also
+ * returns the actual iw_event on success by stripping the extra
+ * header that was added by the remote peer.
+ */
+static int parse_connection_params(struct iwcm_id_private *cm_id_priv,
+	struct iw_cm_event *iw_event, struct iw_cm_event *new_iw_event)
+{
+	int ret = 0;
+	int l_ird, l_ord, r_ird, r_ord;
+	struct iw_cm_id *cm_id;
+	struct iwcm_priv_hdr *local_priv_hdr, *remote_priv_hdr;
+
+	cm_id = &cm_id_priv->id;
+	local_priv_hdr = cm_id->priv_hdr;
+	*new_iw_event = *iw_event;
+
+	if (!iw_send_private_header) {
+		/* Not configured to send extra header */
+		BUG_ON(local_priv_hdr);
+		goto out;
+	}
+
+	remote_priv_hdr = iw_event->private_data;
+	if (!iw_has_private_header(remote_priv_hdr, iw_event)) {
+		/*
+		 * Remote side has not sent a iw private header, we use the
+		 * old protocol.
+		 */
+		if (local_priv_hdr) {
+			/*
+			 * This is the active side code path. We had already
+			 * sent a private header when doing the connect, but
+			 * the server does not implement private header, so
+			 * we should fail the connect response otherwise we
+			 * end up having passed wrong private data to the
+			 * application on the server.
+			 */
+			kfree(local_priv_hdr);
+			cm_id->priv_hdr = NULL;
+			ret = -EAGAIN;
+		}
+		goto out;
+	}
+
+	switch (iw_event->event) {
+	case IW_CM_EVENT_CONNECT_REQUEST:
+		/*
+		 * This is Server code - a connect request was received.
+		 * Allocate a iwcm_priv_hdr that can be used later for
+		 * negotiation when accept() is performed.
+		 */
+		BUG_ON(local_priv_hdr);
+
+		/*
+		 * Save only the header (and not the private data passed
+		 * in the connect()). By the time an accept is done, we
+		 * will have the local params which we can then compare
+		 * with the remote params that we are just saving (the
+		 * peer's real private data is passed to the app later in
+		 * the flow when we call id->cm_handler(new_iw_event)).
+		 */
+		cm_id->priv_hdr = kmalloc(sizeof *cm_id->priv_hdr, GFP_KERNEL);
+		if (!cm_id->priv_hdr) {
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		/*
+		 * Save all the remote connection parameters. These will be
+		 * later used for negotiation when an accept is performed.
+		 * Save in Big-Endian format.
+		 */
+		cm_id->priv_hdr->ord = remote_priv_hdr->ord;
+		cm_id->priv_hdr->ird = remote_priv_hdr->ird;
+		break;
+
+	case IW_CM_EVENT_CONNECT_REPLY:
+		/*
+		 * This is Client code - a connect response was received,
+		 * use local connection parameters saved earlier and the
+		 * remote connection parameters to negotiate.
+		 */
+
+		BUG_ON(!local_priv_hdr);
+
+
+		l_ord = be32_to_cpu(local_priv_hdr->ord);
+		l_ird = be32_to_cpu(local_priv_hdr->ird);
+		r_ord = be32_to_cpu(remote_priv_hdr->ord);
+		r_ird = be32_to_cpu(remote_priv_hdr->ird);
+
+		kfree(local_priv_hdr); 
+		cm_id->priv_hdr = NULL;
+
+		BUG_ON(!cm_id_priv->qp);
+		(void) iw_negotiate_qp_conn_params(cm_id_priv->qp, &l_ird,
+						   &l_ord, r_ird, r_ord);
+		break;
+
+	default:
+		/* Should never get here */
+		BUG();
+		break;
+	}
+
+	/*
+	 * Reset the new event values to point to the real private_data/len.
+	 */
+	new_iw_event->private_data_len -= sizeof(struct iwcm_priv_hdr);
+	new_iw_event->private_data = remote_priv_hdr->private_data;
+
+out:
+	return ret;
+}
+
+/*
  * Passive Side: new CM_ID <-- CONN_RECV
  *
  * Handles an inbound connect request. The function creates a new
@@ -604,6 +836,7 @@ static void cm_conn_req_handler(struct i
 	unsigned long flags;
 	struct iw_cm_id *cm_id;
 	struct iwcm_id_private *cm_id_priv;
+	struct iw_cm_event new_iw_event;
 	int ret;
 
 	/*
@@ -644,8 +877,13 @@ static void cm_conn_req_handler(struct i
 		goto out;
 	}
 
-	/* Call the client CM handler */
-	ret = cm_id->cm_handler(cm_id, iw_event);
+	/* Save the incoming connection parameters in cm_id->priv_hdr */
+	ret = parse_connection_params(cm_id_priv, iw_event, &new_iw_event);
+	if (!ret) {
+		/* Call the client CM handler */
+		ret = cm_id->cm_handler(cm_id, &new_iw_event);
+	}
+
 	if (ret) {
 		set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags);
 		destroy_cm_id(cm_id);
@@ -704,6 +942,7 @@ static int cm_conn_rep_handler(struct iw
 			       struct iw_cm_event *iw_event)
 {
 	unsigned long flags;
+	struct iw_cm_event new_iw_event;
 	int ret;
 
 	spin_lock_irqsave(&cm_id_priv->lock, flags);
@@ -724,7 +963,15 @@ static int cm_conn_rep_handler(struct iw
 		cm_id_priv->state = IW_CM_STATE_IDLE;
 	}
 	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
-	ret = cm_id_priv->id.cm_handler(&cm_id_priv->id, iw_event);
+
+	/*
+	 * iw_event contains peer connection parameters, while
+	 * cm_id->priv_hdr has local connection parameters. Use these
+	 * to negotiate mutually agreeable connection parameters.
+	 */
+	ret = parse_connection_params(cm_id_priv, iw_event, &new_iw_event);
+	if (!ret)
+		ret = cm_id_priv->id.cm_handler(&cm_id_priv->id, &new_iw_event);
 
 	if (iw_event->private_data_len)
 		kfree(iw_event->private_data);
diff -ruNp org/include/rdma/iw_cm.h new/include/rdma/iw_cm.h
--- org/include/rdma/iw_cm.h	2007-02-16 11:01:08.000000000 +0530
+++ new/include/rdma/iw_cm.h	2007-02-16 11:01:12.000000000 +0530
@@ -86,6 +86,32 @@ typedef int (*iw_cm_handler)(struct iw_c
 typedef int (*iw_event_handler)(struct iw_cm_id *cm_id,
 				 struct iw_cm_event *event);
 
+/*
+ * Header prepended before the actual private data passed during
+ * connection establishment in connect/accept calls.
+ */
+struct iwcm_priv_hdr {
+	u32 ord;		/* Outbound RDMA Read Queue Depth */
+	u32 ird;		/* Inbound RDMA Read Queue Depth */
+				/* Other negotiation params come here */
+	char private_data[0];	/* copy the real private data here */
+} __attribute__ ((packed));
+
+/*
+ * Returns true if priv_hdr seemingly points to a real private header
+ * structure.  Conditions to determine that a private header is present
+ * are :
+ * 	- non NULL pointer.
+ *	- event's private data is at least equal to private header size.
+ * Note : This can return false positives.
+ */
+static inline int iw_has_private_header(struct iwcm_priv_hdr *priv_hdr,
+					struct iw_cm_event *event)
+{
+	return (priv_hdr &&
+		event->private_data_len >= sizeof (struct iwcm_priv_hdr));
+}
+
 struct iw_cm_id {
 	iw_cm_handler		cm_handler;      /* client callback function */
 	void		        *context;	 /* client cb context */
@@ -93,6 +119,7 @@ struct iw_cm_id {
 	struct sockaddr_in      local_addr;
 	struct sockaddr_in	remote_addr;
 	void			*provider_data;	 /* provider private data */
+	struct iwcm_priv_hdr	*priv_hdr;	 /* Extra header added */
 	iw_event_handler        event_handler;   /* cb for provider
 						    events */
 	/* Used by provider to add and remove refs on IW cm_id */
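
For anyone following the patch above without a kernel tree handy, here is a minimal userspace sketch of the two ideas it combines: prepending a big-endian ord/ird header to the private data, and clamping the local values to what the peer advertises. It is not part of the patch. The helper names (put_be32, get_be32, hdr_pack, negotiate) and HDR_LEN are made up for illustration; only the header layout (ord, then ird, then the real private data) is taken from struct iwcm_priv_hdr.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed wire layout, mirroring struct iwcm_priv_hdr in the patch:
 * 4 bytes ord, 4 bytes ird (both big endian), then the private data. */
#define HDR_LEN 8

static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

static uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Prepend ord/ird to the caller's private data.
 * Returns the total length, or 0 if the output buffer is too small. */
static size_t hdr_pack(uint8_t *out, size_t out_len, uint32_t ord,
		       uint32_t ird, const void *data, size_t data_len)
{
	if (out_len < HDR_LEN + data_len)
		return 0;
	put_be32(out, ord);
	put_be32(out + 4, ird);
	if (data_len)
		memcpy(out + HDR_LEN, data, data_len);
	return HDR_LEN + data_len;
}

/* Reduce the local ord to the peer's ird and the local ird to the
 * peer's ord where the local value is larger.
 * Returns 1 if anything was reduced, 0 otherwise. */
static int negotiate(uint32_t *l_ird, uint32_t *l_ord,
		     uint32_t r_ird, uint32_t r_ord)
{
	int changed = 0;

	if (*l_ord > r_ird) {
		*l_ord = r_ird;
		changed = 1;
	}
	if (*l_ird > r_ord) {
		*l_ird = r_ord;
		changed = 1;
	}
	return changed;
}
```

The receiver reads the first HDR_LEN bytes with get_be32(), feeds them to negotiate(), and hands only the bytes after the header to the application, which is roughly what parse_connection_params() does with new_iw_event.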


From vlad at lists.openfabrics.org  Fri Feb 16 02:23:37 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Fri, 16 Feb 2007 02:23:37 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070216-0200 daily build status
Message-ID: <20070216102337.F18D4E603C0@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod 

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.14
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.19
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.18
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.13
Passed on ppc64 with linux-2.6.19
Passed on x86_64 with linux-2.6.12
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.15
Passed on ppc64 with linux-2.6.18
Passed on x86_64 with linux-2.6.16
Passed on powerpc with linux-2.6.16
Passed on powerpc with linux-2.6.12
Passed on powerpc with linux-2.6.15
Passed on ppc64 with linux-2.6.17
Passed on ia64 with linux-2.6.18
Passed on powerpc with linux-2.6.13
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.12
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.14
Passed on ppc64 with linux-2.6.13
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.12

Failed:


From dy.manju at gmail.com  Fri Feb 16 05:02:04 2007
From: dy.manju at gmail.com (manju y)
Date: Fri, 16 Feb 2007 18:32:04 +0530
Subject: [openib-general] SRP-FMR
Message-ID: <ba89687d0702160502w4a40f3c9o39e77d8ab47c4e58@mail.gmail.com>

Hi All

I am a newbie to SRP. Can anyone please explain it, or provide some links
that would help me understand it?


Thanks
Manju

From dy.manju at gmail.com  Fri Feb 16 05:29:00 2007
From: dy.manju at gmail.com (manju y)
Date: Fri, 16 Feb 2007 18:59:00 +0530
Subject: [openib-general] fast memory registration
Message-ID: <ba89687d0702160529l76590f87l95642982f05d59f5@mail.gmail.com>

Hi All

I am a newbie to SRP. Can anyone please explain, or provide some links
to help me understand, the FMR (fast memory registration) concept used in
ib_srp.c?

Thanks
Manju


From swise at opengridcomputing.com  Fri Feb 16 07:06:19 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 16 Feb 2007 09:06:19 -0600
Subject: [openib-general] mvapich2 ofed 1.2 problem
In-Reply-To: <45D52A16.1080803@cse.ohio-state.edu>
References: <1171380610.15471.25.camel@stevo-desktop>
	<adar6sum1fq.fsf@cisco.com> <1171386686.15471.36.camel@stevo-desktop>
	<adazm7ikm7q.fsf@cisco.com> <45D1FD0B.2080606@cse.ohio-state.edu>
	<ada1wktlwux.fsf@cisco.com> <45D3E224.9060306@cse.ohio-state.edu>
	<1171558785.13282.29.camel@stevo-desktop>
	<45D52A16.1080803@cse.ohio-state.edu>
Message-ID: <1171638379.1066.6.camel@stevo-desktop>

Ok I'll try it out today!

Thanks,

Steve.


On Thu, 2007-02-15 at 22:50 -0500, Shaun Rowland wrote:
> Steve Wise wrote:
> > Shaun,
> > 
> > Lemme know if you have an mvapich2 kit that I can test with iwarp...
> 
> Hi Steve. I've updated our SRPM:
> 
> https://www.openfabrics.org/~rowland/ofed_1_2/
> 
> The latest is mvapich2-0.9.8-4.src.rpm. This version should solve the
> shared library linking issues. This can be built outside of the OFED 1.2
> alpha1 release with the information in the README file or can replace
> the previous SRPM in the OFED-1.2-alpha1/SRPMS/ directory. To use iWARP,
> use the OFA build of the SRPM and set MV2_ENABLE_IWARP_MODE=1 in your
> environment.


From brettmcmillian at swbell.net  Fri Feb 16 08:48:34 2007
From: brettmcmillian at swbell.net (Brett McMillian)
Date: Fri, 16 Feb 2007 08:48:34 -0800 (PST)
Subject: [openib-general] krping.c changes
Message-ID: <568620.47287.qm@web81513.mail.mud.yahoo.com>

I wasn't sure who I should email about this, but I recently got krping to work between an Opteron and a PPC G5.  To make it work, I had to make the following changes to krping.c so that the address, key, and length are sent across the network as big endian; otherwise they go out in machine-dependent byte order.


static void krping_format_send(struct krping_cb *cb, u64 buf, 
                   struct ib_mr *mr)
{
    struct krping_rdma_info *info = &cb->send_buf;

-    info->buf = buf;
-    info->rkey = mr->rkey;
-    info->size = cb->size;
+    info->buf = cpu_to_be64(buf);
+    info->rkey = cpu_to_be32(mr->rkey);
+    info->size = cpu_to_be32(cb->size);

    DEBUG_LOG("RDMA addr %llx rkey %x len %d\n",
          info->buf, info->rkey, info->size);
}


static int server_recv(struct krping_cb *cb, struct ib_wc *wc)
{
    if (wc->byte_len != sizeof(cb->recv_buf)) {
        printk(KERN_ERR PFX "Received bogus data, size %d\n", 
               wc->byte_len);
        return -1;
    }
-    cb->remote_rkey = cb->recv_buf.rkey;
-    cb->remote_addr = cb->recv_buf.buf;
-    cb->remote_len  = cb->recv_buf.size;
+    cb->remote_rkey = be32_to_cpu(cb->recv_buf.rkey);
+    cb->remote_addr = be64_to_cpu(cb->recv_buf.buf);
+    cb->remote_len  = be32_to_cpu(cb->recv_buf.size);
    DEBUG_LOG("Received rkey %x addr %llx len %d from peer\n",
          cb->remote_rkey, cb->remote_addr, cb->remote_len);

    if (cb->state <= CONNECTED || cb->state == RDMA_WRITE_COMPLETE)
        cb->state = RDMA_READ_ADV;
    else
        cb->state = RDMA_WRITE_ADV;

    return 0;
}
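
The same idea can be shown outside the kernel. Below is a small sketch, assuming a 16-byte wire record of address, rkey and size in that order; it serializes byte by byte so the result does not depend on host endianness. The struct and function names are hypothetical stand-ins, not the krping definitions.

```c
#include <stdint.h>

/* Assumed wire format: 8-byte address, 4-byte rkey, 4-byte size,
 * all big endian. */
#define RDMA_INFO_WIRE_LEN 16

/* Hypothetical host-order view of the advertised buffer. */
struct rdma_info {
	uint64_t buf;
	uint32_t rkey;
	uint32_t size;
};

/* Write the record most-significant byte first, whatever the host is. */
static void rdma_info_encode(uint8_t out[RDMA_INFO_WIRE_LEN],
			     const struct rdma_info *in)
{
	int i;

	for (i = 0; i < 8; i++)
		out[i] = (uint8_t)(in->buf >> (56 - 8 * i));
	for (i = 0; i < 4; i++) {
		out[8 + i] = (uint8_t)(in->rkey >> (24 - 8 * i));
		out[12 + i] = (uint8_t)(in->size >> (24 - 8 * i));
	}
}

/* Rebuild host-order values from the big-endian wire bytes. */
static void rdma_info_decode(struct rdma_info *out,
			     const uint8_t in[RDMA_INFO_WIRE_LEN])
{
	int i;

	out->buf = 0;
	out->rkey = 0;
	out->size = 0;
	for (i = 0; i < 8; i++)
		out->buf = (out->buf << 8) | in[i];
	for (i = 0; i < 4; i++) {
		out->rkey = (out->rkey << 8) | in[8 + i];
		out->size = (out->size << 8) | in[12 + i];
	}
}
```

In the kernel the cpu_to_be64()/cpu_to_be32() and be64_to_cpu()/be32_to_cpu() helpers used in the diff do this conversion in one call; the loops here only spell out what ends up on the wire.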

Brett McMillian

From rdreier at cisco.com  Fri Feb 16 09:00:48 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 16 Feb 2007 09:00:48 -0800
Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs
	overflow
In-Reply-To: <OF29D0C259.70E22877-ON87257284.0017AEC5-88257283.0071A3BB@us.ibm.com>
	(Shirley Ma's message of "Thu, 15 Feb 2007 20:41:11 -0800")
References: <OF29D0C259.70E22877-ON87257284.0017AEC5-88257283.0071A3BB@us.ibm.com>
Message-ID: <adasld6kpfj.fsf@cisco.com>

 > We have a customer issue regarding IPv6oIB. In the subnet, there are
 > limited number of MCGs supported. So when there are multiple IPv6 addresses
 > are assigned to one interface, each IPv6 address will have one unique
 > solicited-node address (depends on their groupID). Then in a large subnet,
 > we will have tons of MCGs. If IPv6 solicited node addresses exceed the
 > number of MCGs in this subnet, then IPv6 neighbour discovery will be
 > broken, this won't happen in Ethernet since sendonly doesn't require sender
 > to be joined any MCG.

 > I have done an initial patch to address MCG overflow problem and redirect
 > the solicited-node address to all hosts node address, thus IPv6 neighbour
 > discovery will work no matter how many IPv6 addresses in this subnet. This
 > patch is only triggered with IPv6 enabled and MCG overflows, so there is
 > almost no performance penalty.

I really don't like this approach, since it can break things in very
subtle ways (eg suppose one node fails to join its solicited node
group, but then a later node wants to talk to it and succeeds in
joining the solicited node group as a send-only member -- since the
first node is not a member then it will never see the ND messages).

I much prefer to fix the SM not to impose too-low limits on the number
of MCGs.  Supporting O(# nodes) MCGs is really not a very onerous
requirement on the SM.

 - R.


From rdreier at cisco.com  Fri Feb 16 09:06:33 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 16 Feb 2007 09:06:33 -0800
Subject: [openib-general] [PATCH for-2.6.21] IB/ipoib: error handling
	thinko fix
In-Reply-To: <20070215221613.GB26227@mellanox.co.il> (Michael S.
	Tsirkin's message of "Fri, 16 Feb 2007 00:16:13 +0200")
References: <20070215221613.GB26227@mellanox.co.il>
Message-ID: <ada64a2kp5y.fsf@cisco.com>

Thanks, queued for 2.6.21.


From swise at opengridcomputing.com  Fri Feb 16 09:15:46 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 16 Feb 2007 11:15:46 -0600
Subject: [openib-general] mvapich2 ofed 1.2 problem
In-Reply-To: <45D52A16.1080803@cse.ohio-state.edu>
References: <1171380610.15471.25.camel@stevo-desktop>
	<adar6sum1fq.fsf@cisco.com> <1171386686.15471.36.camel@stevo-desktop>
	<adazm7ikm7q.fsf@cisco.com> <45D1FD0B.2080606@cse.ohio-state.edu>
	<ada1wktlwux.fsf@cisco.com> <45D3E224.9060306@cse.ohio-state.edu>
	<1171558785.13282.29.camel@stevo-desktop>
	<45D52A16.1080803@cse.ohio-state.edu>
Message-ID: <1171646146.10345.3.camel@stevo-desktop>

Good news!  

This SRPM works with alpha1.  I'm able to run the IMB benchmarks on a 4
node iwarp cluster!

Thanks,

Steve.


On Thu, 2007-02-15 at 22:50 -0500, Shaun Rowland wrote:
> Steve Wise wrote:
> > Shaun,
> > 
> > Lemme know if you have an mvapich2 kit that I can test with iwarp...
> 
> Hi Steve. I've updated our SRPM:
> 
> https://www.openfabrics.org/~rowland/ofed_1_2/
> 
> The latest is mvapich2-0.9.8-4.src.rpm. This version should solve the
> shared library linking issues. This can be built outside of the OFED 1.2
> alpha1 release with the information in the README file or can replace
> the previous SRPM in the OFED-1.2-alpha1/SRPMS/ directory. To use iWARP,
> use the OFA build of the SRPM and set MV2_ENABLE_IWARP_MODE=1 in your
> environment.


From halr at voltaire.com  Fri Feb 16 09:19:43 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 16 Feb 2007 12:19:43 -0500
Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs
 overflow
In-Reply-To: <adasld6kpfj.fsf@cisco.com>
References: <OF29D0C259.70E22877-ON87257284.0017AEC5-88257283.0071A3BB@us.ibm.com>
	<adasld6kpfj.fsf@cisco.com>
Message-ID: <1171646341.22446.266964.camel@hal.voltaire.com>

On Fri, 2007-02-16 at 12:00, Roland Dreier wrote:
>  > We have a customer issue regarding IPv6oIB. In the subnet, there are
>  > limited number of MCGs supported. So when there are multiple IPv6 addresses
>  > are assigned to one interface, each IPv6 address will have one unique
>  > solicited-node address (depends on their groupID). Then in a large subnet,
>  > we will have tons of MCGs. If IPv6 solicited node addresses exceed the
>  > number of MCGs in this subnet, then IPv6 neighbour discovery will be
>  > broken, this won't happen in Ethernet since sendonly doesn't require sender
>  > to be joined any MCG.
> 
>  > I have done an initial patch to address MCG overflow problem and redirect
>  > the solicited-node address to all hosts node address, thus IPv6 neighbour
>  > discovery will work no matter how many IPv6 addresses in this subnet. This
>  > patch is only triggered with IPv6 enabled and MCG overflows, so there is
>  > almost no performance penalty.
> 
> I really don't like this approach, since it can break things in very
> subtle ways (eg suppose one node fails to join its solicited node
> group, but then a later node wants to talk to it and succeeds in
> joining the solicited node group as a send-only member -- since the
> first node is not a member then it will never see the ND messages).
> 
> I much prefer to fix the SM not to impose too-low limits on the number
> of MCGs.  Supporting O(# nodes) MCGs is really not a very onerous
> requirement on the SM.

Is this a MFT size issue or SM issue or both ?

-- Hal

>  - R.


From xma at us.ibm.com  Fri Feb 16 09:22:15 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Fri, 16 Feb 2007 09:22:15 -0800
Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs
	overflow
In-Reply-To: <adasld6kpfj.fsf@cisco.com>
Message-ID: <OF1D268498.7F381FBE-ON87257284.005EFB37-88257284.005F6B3D@us.ibm.com>


Roland,

Thanks for your quick response.

Even if the SM supports 1000 MCGs, that is still not sufficient for a
250-node cluster where each node has 4 links for IPv6, even without any
scope/global IPv6 address configured (250*4 + a few default MCGs). There
will be an MCG overflow problem anyway in IPv6oIB.
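
To make the count concrete, here is a rough sketch of how a solicited-node multicast address is derived, following RFC 4291: the fixed prefix ff02::1:ff00:0/104 plus the low 24 bits of the unicast address. The function name is made up for illustration and this is not taken from the IPoIB code.

```c
#include <stdint.h>
#include <string.h>

/* Build the solicited-node multicast address (ff02::1:ffXX:XXXX) for a
 * 16-byte IPv6 unicast address in network byte order. */
static void solicited_node(uint8_t out[16], const uint8_t addr[16])
{
	/* First 104 bits are fixed: ff02:0:0:0:0:1:ff */
	static const uint8_t prefix[13] = {
		0xff, 0x02, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x01, 0xff
	};

	memcpy(out, prefix, sizeof prefix);
	/* Last 24 bits are copied from the unicast address. */
	memcpy(out + 13, addr + 13, 3);
}
```

Addresses that differ in their low 24 bits land in different groups, so 250 nodes with 4 link-local addresses each can need on the order of 1000 solicited-node groups before any global addresses are added.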

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638


Roland Dreier <rdreier at cisco.com> wrote on 02/16/2007 09:00 AM:

To: Shirley Ma/Beaverton/IBM at IBMUS
cc: "Michael S. Tsirkin" <mst at mellanox.co.il>, openib-general at openib.org
Subject: Re: IPv6oIB neighbour discover broken when MCGs overflow

 > We have a customer issue regarding IPv6oIB. In the subnet, there are
 > limited number of MCGs supported. So when there are multiple IPv6 addresses
 > are assigned to one interface, each IPv6 address will have one unique
 > solicited-node address (depends on their groupID). Then in a large subnet,
 > we will have tons of MCGs. If IPv6 solicited node addresses exceed the
 > number of MCGs in this subnet, then IPv6 neighbour discovery will be
 > broken, this won't happen in Ethernet since sendonly doesn't require sender
 > to be joined any MCG.

 > I have done an initial patch to address MCG overflow problem and redirect
 > the solicited-node address to all hosts node address, thus IPv6 neighbour
 > discovery will work no matter how many IPv6 addresses in this subnet. This
 > patch is only triggered with IPv6 enabled and MCG overflows, so there is
 > almost no performance penalty.

I really don't like this approach, since it can break things in very
subtle ways (eg suppose one node fails to join its solicited node
group, but then a later node wants to talk to it and succeeds in
joining the solicited node group as a send-only member -- since the
first node is not a member then it will never see the ND messages).

I much prefer to fix the SM not to impose too-low limits on the number
of MCGs.  Supporting O(# nodes) MCGs is really not a very onerous
requirement on the SM.

 - R.

From rdreier at cisco.com  Fri Feb 16 09:25:30 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 16 Feb 2007 09:25:30 -0800
Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs
	overflow
In-Reply-To: <OF1D268498.7F381FBE-ON87257284.005EFB37-88257284.005F6B3D@us.ibm.com>
	(Shirley Ma's message of "Fri, 16 Feb 2007 09:22:15 -0800")
References: <OF1D268498.7F381FBE-ON87257284.005EFB37-88257284.005F6B3D@us.ibm.com>
Message-ID: <adaabzej9px.fsf@cisco.com>

 > Even SM supports 1000 MCGs, it's still not sufficient for 250 nodes
 > cluster, each node have 4 links for IPv6 without any scope/global IPv6
 > address configured.(250*4+ a few default MCGs) There will be a MCG overflow
 > problem anyway in IPv6oIB.

But what's the problem with supporting 1000 or even 10000 MCGs?

 - R.


From rdreier at cisco.com  Fri Feb 16 09:27:14 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 16 Feb 2007 09:27:14 -0800
Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs
 overflow
In-Reply-To: <1171646341.22446.266964.camel@hal.voltaire.com> (Hal
	Rosenstock's message of "16 Feb 2007 12:19:43 -0500")
References: <OF29D0C259.70E22877-ON87257284.0017AEC5-88257283.0071A3BB@us.ibm.com>
	<adasld6kpfj.fsf@cisco.com>
	<1171646341.22446.266964.camel@hal.voltaire.com>
Message-ID: <aday7myhv2l.fsf@cisco.com>

 > > I much prefer to fix the SM not to impose too-low limits on the number
 > > of MCGs.  Supporting O(# nodes) MCGs is really not a very onerous
 > > requirement on the SM.
 > 
 > Is this a MFT size issue or SM issue or both ?

Well as we discussed before, the size of the MFT is really independent
of the # of MCGs supported.  It's up to the SM how to allocate MLIDs,
and as long as all the switches in the fabric support at least one
MLID, then any number of MCGs can be managed by the SM.  So I would
say this is entirely an SM issue.

 - R.


From halr at voltaire.com  Fri Feb 16 09:32:37 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 16 Feb 2007 12:32:37 -0500
Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs
 overflow
In-Reply-To: <aday7myhv2l.fsf@cisco.com>
References: <OF29D0C259.70E22877-ON87257284.0017AEC5-88257283.0071A3BB@us.ibm.com>
	<adasld6kpfj.fsf@cisco.com>
	<1171646341.22446.266964.camel@hal.voltaire.com>
	<aday7myhv2l.fsf@cisco.com>
Message-ID: <1171647108.22446.267617.camel@hal.voltaire.com>

On Fri, 2007-02-16 at 12:27, Roland Dreier wrote:
>  > > I much prefer to fix the SM not to impose too-low limits on the number
>  > > of MCGs.  Supporting O(# nodes) MCGs is really not a very onerous
>  > > requirement on the SM.
>  > 
>  > Is this a MFT size issue or SM issue or both ?
> 
> Well as we discussed before, the size of the MFT is really independent
> of the # of MCGs supported.  It's up to the SM how to allocate MLIDs,
> and as long as all the switches in the fabric support at least one
> MLID, then any number of MCGs can be managed by the SM.

Almost but not quite.

> So I would say this is entirely an SM issue.

I thought that mapping multiple MCGs to the same MLID requires that a
set of the (group) parameters are the same. Is that the case for these
IPv6 groups ? Is the only variable in those parameters the PKey ?

I certainly agree that the SM can do a better job than simple 1:1
mapping.

-- Hal

>  - R.


From xma at us.ibm.com  Fri Feb 16 09:38:14 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Fri, 16 Feb 2007 09:38:14 -0800
Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs
	overflow
In-Reply-To: <OF1D268498.7F381FBE-ON87257284.005EFB37-88257284.005F6B3D@LocalDomain>
Message-ID: <OF6238C073.C141E3AC-ON87257284.00609389-88257284.0060E20C@us.ibm.com>


Roland,

>I really don't like this approach, since it can break things in very
>subtle ways (eg suppose one node fails to join its solicited node
>group, but then a later node wants to talk to it and succeeds in
>joining the solicited node group as a send-only member -- since the
>first node is not a member then it will never see the ND messages).

For a successful join, ND sends to the node directly; for a failed join,
ND sends to the all-hosts address. So ND works whether or not the join
succeeds; that's what the patch does.

Thanks
Shirley Ma

From rdreier at cisco.com  Fri Feb 16 09:43:17 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 16 Feb 2007 09:43:17 -0800
Subject: [openib-general] SA multicast patches
In-Reply-To: <45D4E133.3000302@ichips.intel.com> (Sean Hefty's message
	of "Thu, 15 Feb 2007 14:39:47 -0800")
References: <000101c74a29$6f796610$e598070a@amr.corp.intel.com>
	<adatzxnnn3k.fsf@cisco.com> <45D4E133.3000302@ichips.intel.com>
Message-ID: <adabqjuhubu.fsf@cisco.com>

 > The pkey is the default partition, full membership pkey.  I believe
 > all nodes will have either 0xffff or 0x7fff as their pkey.  We could
 > probably call ib_get_cached_pkey() instead and just use the first
 > entry in the table.

Well the consumer has to know what P_Key to use since it must match
the QP that will be used to send/receive.  So I would suggest not
trying to guess in the low-level multicast.c code, and rely on the
consumer to set it properly.

 > We don't want to set the privileged bit of the q_key, so that's
 > wrong.  Good catch.

OK, I'll replace the code with something like random32() & 0x7fffffff
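
A minimal userspace sketch of that masking step, assuming only what is
said above: the high-order bit of the Q_Key is the privileged bit and it
must stay clear. The helper name make_unprivileged_qkey() is hypothetical
and the random source is left to the caller; this is not the code that
went into the patch.

```c
#include <stdint.h>

/* The high-order bit of a Q_Key marks it as privileged. */
#define QKEY_PRIVILEGED_BIT 0x80000000u

/*
 * Hypothetical helper: turn any 32-bit value (for example a random
 * number) into a Q_Key with the privileged bit cleared.
 * Equivalent to raw & 0x7fffffff.
 */
uint32_t make_unprivileged_qkey(uint32_t raw)
{
	return raw & ~QKEY_PRIVILEGED_BIT;
}
```

Values that already have the top bit clear pass through unchanged, so
the mask only costs one bit of randomness.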

One other question about the PS_IPOIB stuff:

 > +static int cma_set_qkey(struct ib_device *device, u8 port_num,
 > +			enum rdma_port_space ps,
 > +			struct rdma_dev_addr *dev_addr, u32 *qkey)
 > +{
 > +	struct ib_sa_mcmember_rec rec;
 > +	int ret = 0;
 > +
 > +	switch (ps) {
 > +	case RDMA_PS_UDP:
 > +		*qkey = RDMA_UDP_QKEY;
 > +		break;
 > +	case RDMA_PS_IPOIB:
 > +		ib_addr_get_mgid(dev_addr, &rec.mgid);
 > +		ret = ib_sa_get_mcmember_rec(device, port_num, &rec.mgid, &rec);
 > +		*qkey = be32_to_cpu(rec.qkey);
 > +		break;

Does this work if userspace tries to join a new IPoIB MCG that the
kernel driver hasn't joined yet?  From reading the code it seems that
ib_sa_get_mcmember_rec() would fail with -EADDRNOTAVAIL and so the
whole join request would fail.

Am I reading this correctly?  Is it supposed to work?  I would think
that it would be nice to be able to receive on IPoIB MCGs not also
being received by the kernel.

 - R.


From mshefty at ichips.intel.com  Fri Feb 16 09:44:09 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 16 Feb 2007 09:44:09 -0800
Subject: [openib-general] krping.c changes
In-Reply-To: <568620.47287.qm@web81513.mail.mud.yahoo.com>
References: <568620.47287.qm@web81513.mail.mud.yahoo.com>
Message-ID: <45D5ED69.6090909@ichips.intel.com>

Brett McMillian wrote:
> I wasn't sure who I should email about this, but I recently got krping 
> to work between an Opteron and a PPC G5.  However, in order for krping 
> to work I had to make the following changes to krping.c to ensure the 
> address, key, and length were being sent across the network as big 
> endian, otherwise they were in machine dependent byte order.

I maintain a copy of this in the test-apps branch of my rdma-dev.git tree.  I'll 
update my tree with this change.  Can you add a signed-off-by line to the patch?
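
For readers without the patch at hand, the kind of change described
above can be sketched as follows. This is an illustration only: the
struct layout and the names wire_buf_info, pack_buf_info() and
unpack_buf_info() are hypothetical and not taken from krping.c. The only
thing assumed from the report is that the address, key, and length go on
the wire as big endian regardless of host byte order.

```c
#include <stdint.h>

/* Hypothetical wire layout, for illustration; not krping's struct. */
struct wire_buf_info {
	uint8_t addr[8];	/* 64-bit buffer address, big endian */
	uint8_t rkey[4];	/* 32-bit key, big endian */
	uint8_t len[4];		/* 32-bit length, big endian */
};

/* Store a 32-bit value most-significant byte first. */
void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Load a 32-bit value stored most-significant byte first. */
uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Fill the wire struct; the result is the same on any host. */
void pack_buf_info(struct wire_buf_info *w, uint64_t addr,
		   uint32_t rkey, uint32_t len)
{
	put_be32(w->addr, (uint32_t)(addr >> 32));
	put_be32(w->addr + 4, (uint32_t)addr);
	put_be32(w->rkey, rkey);
	put_be32(w->len, len);
}

/* Read the wire struct back into host-order values. */
void unpack_buf_info(const struct wire_buf_info *w, uint64_t *addr,
		     uint32_t *rkey, uint32_t *len)
{
	*addr = ((uint64_t)get_be32(w->addr) << 32) | get_be32(w->addr + 4);
	*rkey = get_be32(w->rkey);
	*len = get_be32(w->len);
}
```

Packing byte by byte avoids depending on the host's endianness at all,
which is why an Opteron and a PPC G5 would agree on the result.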

- Sean


From rdreier at cisco.com  Fri Feb 16 09:47:51 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 16 Feb 2007 09:47:51 -0800
Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs
 overflow
In-Reply-To: <1171647108.22446.267617.camel@hal.voltaire.com> (Hal
	Rosenstock's message of "16 Feb 2007 12:32:37 -0500")
References: <OF29D0C259.70E22877-ON87257284.0017AEC5-88257283.0071A3BB@us.ibm.com>
	<adasld6kpfj.fsf@cisco.com>
	<1171646341.22446.266964.camel@hal.voltaire.com>
	<aday7myhv2l.fsf@cisco.com>
	<1171647108.22446.267617.camel@hal.voltaire.com>
Message-ID: <adazm7egfjs.fsf@cisco.com>

 > I thought that mapping multiple MCGs to the same MLID requires that a
 > set of the (group) parameters are the same. Is that the case for these
 > IPv6 groups ? Is the only variable in those parameters the PKey ?

I don't see why any group parameters need to be the same -- I'm
probably missing something, but which parameters in particular did you
have in mind?

 - R.


From rdreier at cisco.com  Fri Feb 16 09:49:24 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 16 Feb 2007 09:49:24 -0800
Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs
	overflow
In-Reply-To: <OF6238C073.C141E3AC-ON87257284.00609389-88257284.0060E20C@us.ibm.com>
	(Shirley Ma's message of "Fri, 16 Feb 2007 09:38:14 -0800")
References: <OF6238C073.C141E3AC-ON87257284.00609389-88257284.0060E20C@us.ibm.com>
Message-ID: <adavei2gfh7.fsf@cisco.com>

 > For the successful join, ND sends to the node directly, for the failure
 > join, ND sends to all hosts addr. So ND will work no matter whether the
 > join OK or not, that's the patch does.

But what if the full-member join fails on node A for node A's
solicited node group, but then node B succeeds in joining that group
as a send-only member (perhaps because some other nodes have dropped
off the fabric in the meantime).  Then node B will send the ND message
on a MCG that A is not a member of.

 - R.


From rdreier at cisco.com  Fri Feb 16 09:53:49 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 16 Feb 2007 09:53:49 -0800
Subject: [openib-general] SA multicast patches
In-Reply-To: <adabqjuhubu.fsf@cisco.com> (Roland Dreier's message of
	"Fri, 16 Feb 2007 09:43:17 -0800")
References: <000101c74a29$6f796610$e598070a@amr.corp.intel.com>
	<adatzxnnn3k.fsf@cisco.com> <45D4E133.3000302@ichips.intel.com>
	<adabqjuhubu.fsf@cisco.com>
Message-ID: <adak5yigf9u.fsf@cisco.com>

OK, another question about the multicast.c code:

 > +static struct mcast_group *mcast_find(struct mcast_port *port,
 > +				      union ib_gid *mgid)
 > +{
 > +	struct rb_node *node = port->table.rb_node;
 > +	struct mcast_group *group;
 > +	int ret;
 > +
 > +	while (node) {
 > +		group = rb_entry(node, struct mcast_group, node);
 > +		ret = memcmp(mgid->raw, group->rec.mgid.raw, sizeof *mgid);
 > +		if (!ret)
 > +			return group;
 > +
 > +		if (ret < 0)
 > +			node = node->rb_left;
 > +		else
 > +			node = node->rb_right;
 > +	}
 > +	return NULL;
 > +}
 > +
 > +static struct mcast_group *mcast_insert(struct mcast_port *port,
 > +					struct mcast_group *group,
 > +					int allow_duplicates)
 > +{
 > +	struct rb_node **link = &port->table.rb_node;
 > +	struct rb_node *parent = NULL;
 > +	struct mcast_group *cur_group;
 > +	int ret;
 > +
 > +	while (*link) {
 > +		parent = *link;
 > +		cur_group = rb_entry(parent, struct mcast_group, node);
 > +
 > +		ret = memcmp(group->rec.mgid.raw, cur_group->rec.mgid.raw,
 > +			     sizeof group->rec.mgid);
 > +		if (ret < 0)
 > +			link = &(*link)->rb_left;
 > +		else if (ret > 0)
 > +			link = &(*link)->rb_right;
 > +		else if (allow_duplicates)
 > +			link = &(*link)->rb_left;
 > +		else
 > +			return cur_group;
 > +	}
 > +	rb_link_node(&group->node, parent, link);
 > +	rb_insert_color(&group->node, &port->table);
 > +	return NULL;
 > +}

How does it work to put duplicates into the RB tree?  It seems
especially strange that the lookup code does:

 > +		if (ret < 0)
 > +			node = node->rb_left;
 > +		else
 > +			node = node->rb_right;

so if ret == 0 (ie the two GIDs being tested are the same) then it
continues to traverse to the right, while the insert code does:

 > +		else if (allow_duplicates)
 > +			link = &(*link)->rb_left;

which seems to put duplicates to the left always.

Also I'd be really worried that the rebalancing code freaks out when
duplicate keys are inserted in the tree.

 - R.


From halr at voltaire.com  Fri Feb 16 09:54:18 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 16 Feb 2007 12:54:18 -0500
Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs
 overflow
In-Reply-To: <adazm7egfjs.fsf@cisco.com>
References: <OF29D0C259.70E22877-ON87257284.0017AEC5-88257283.0071A3BB@us.ibm.com>
	<adasld6kpfj.fsf@cisco.com>
	<1171646341.22446.266964.camel@hal.voltaire.com>
	<aday7myhv2l.fsf@cisco.com>
	<1171647108.22446.267617.camel@hal.voltaire.com>
	<adazm7egfjs.fsf@cisco.com>
Message-ID: <1171648438.22446.268637.camel@hal.voltaire.com>

On Fri, 2007-02-16 at 12:47, Roland Dreier wrote:
>  > I thought that mapping multiple MCGs to the same MLID requires that a
>  > set of the (group) parameters are the same. Is that the case for these
>  > IPv6 groups ? Is the only variable in those parameters the PKey ?
> 
> I don't see why any group parameters need to be the same -- I'm
> probably missing something, but which parameters in particular did you
> have in mind?

For starters, I think that rate, MTU, and SL (and maybe PKey too) need
to be the same. There may be others too if I stare at the spec for a
while...

-- Hal

>  - R.


From sean.hefty at intel.com  Fri Feb 16 10:02:10 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Fri, 16 Feb 2007 10:02:10 -0800
Subject: [openib-general] SA multicast patches
In-Reply-To: <adabqjuhubu.fsf@cisco.com>
Message-ID: <000001c751f4$943af3b0$e598070a@amr.corp.intel.com>

>Well the consumer has to know what P_Key to use since it must match
>the QP that will be used to send/receive.  So I would suggest not
>trying to guess in the low-level multicast.c code, and rely on the
>consumer to set it properly.

I'm fine leaving it at 0.

For now, I think the safest thing to do is just remove the entire 'else' portion
from the function and return an error if the MGID is 0.  Neither of the places
that call into ib_sa_get_mcmember_rec() should pass in an MGID of 0.  (I'm
testing this now to verify.)  See below for its use:

> > +	case RDMA_PS_IPOIB:
> > +		ib_addr_get_mgid(dev_addr, &rec.mgid);
> > +		ret = ib_sa_get_mcmember_rec(device, port_num, &rec.mgid, &rec);
> > +		*qkey = be32_to_cpu(rec.qkey);
> > +		break;
>
>Does this work if userspace tries to join a new IPoIB MCG that the
>kernel driver hasn't joined yet?  From reading the code it seems that
>ib_sa_get_mcmember_rec() would fail with -EADDRNOTAVAIL and so the
>whole join request would fail.

In short, yes.

ib_addr_get_mgid() is returning the MGID for the ipoib broadcast group, so ipoib
must have joined that group.  The code then looks up the MCMemberRecord for the
broadcast group, and extracts the qkey for that group to use when joining the
new group.

- Sean


From xma at us.ibm.com  Fri Feb 16 10:05:32 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Fri, 16 Feb 2007 10:05:32 -0800
Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs
	overflow
In-Reply-To: <adavei2gfh7.fsf@cisco.com>
Message-ID: <OFB2E451E1.A24CF99F-ON87257284.00633207-88257284.006361BA@us.ibm.com>


Roland Dreier <rdreier at cisco.com> wrote on 02/16/2007 09:49:24 AM:

>  > For the successful join, ND sends to the node directly, for the failure
>  > join, ND sends to all hosts addr. So ND will work no matter whether the
>  > join OK or not, that's the patch does.
>
> But what if the full-member join fails on node A for node A's
> solicited node group, but then node B succeeds in joining that group
> as a send-only member (perhaps because some other nodes have dropped
> off the fabric in the meantime).  Then node B will send the ND message
> on a MCG that A is not a member of.
>
>  - R.

Yes. B can send ND to A, and A responds without being a member so IPv6 ND
works. Are there any security or other problems here?

Thanks
Shirley Ma

From rdreier at cisco.com  Fri Feb 16 10:07:43 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 16 Feb 2007 10:07:43 -0800
Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs
 overflow
In-Reply-To: <1171648438.22446.268637.camel@hal.voltaire.com> (Hal
	Rosenstock's message of "16 Feb 2007 12:54:18 -0500")
References: <OF29D0C259.70E22877-ON87257284.0017AEC5-88257283.0071A3BB@us.ibm.com>
	<adasld6kpfj.fsf@cisco.com>
	<1171646341.22446.266964.camel@hal.voltaire.com>
	<aday7myhv2l.fsf@cisco.com>
	<1171647108.22446.267617.camel@hal.voltaire.com>
	<adazm7egfjs.fsf@cisco.com>
	<1171648438.22446.268637.camel@hal.voltaire.com>
Message-ID: <ada4ppmgemo.fsf@cisco.com>

 > For starters, I think that rate, MTU, and SL (and maybe PKey too) need
 > to be the same. There may be others too if I stare at the spec for a
 > while...

Can you expand on why?  For example I definitely can send to the same
MLID with different SLs.  Of course MTU and rate need to match up but
I don't see that as a real restriction -- the SM needs to allow for
least-common-denominator values anyway, since the least-capable node
on the fabric might join an existing group.

I don't see why one MCG with an MTU of 2048 and one MCG with an MTU of
1024 can't share the same MLID, as long as the underlying fabric is
capable of supporting an MTU of 2048.  Actually, I wonder what the
spec says about what switches should do if they're asked to forward
packets with too-big MTUs?  Maybe it all works out anyway.

 - R.


From rdreier at cisco.com  Fri Feb 16 10:10:55 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 16 Feb 2007 10:10:55 -0800
Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs
	overflow
In-Reply-To: <OFB2E451E1.A24CF99F-ON87257284.00633207-88257284.006361BA@us.ibm.com>
	(Shirley Ma's message of "Fri, 16 Feb 2007 10:05:32 -0800")
References: <OFB2E451E1.A24CF99F-ON87257284.00633207-88257284.006361BA@us.ibm.com>
Message-ID: <adasld6ezww.fsf@cisco.com>

 > > But what if the full-member join fails on node A for node A's
 > > solicited node group, but then node B succeeds in joining that group
 > > as a send-only member (perhaps because some other nodes have dropped
 > > off the fabric in the meantime).  Then node B will send the ND message
 > > on a MCG that A is not a member of.

 > Yes. B can send ND to A, and A responds without being a member so IPv6 ND
 > works. Is there any security or other problems here?

Node A is not a member of the group B is sending on, so SM does not
have to set up any routes for the messages to even reach node A.  So
it doesn't see the messages and doesn't respond to ND.

 - R.


From xma at us.ibm.com  Fri Feb 16 10:23:39 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Fri, 16 Feb 2007 10:23:39 -0800
Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs
	overflow
In-Reply-To: <adasld6ezww.fsf@cisco.com>
Message-ID: <OFE74BD281.9AB643B8-ON87257284.0064D8E0-88257284.00650A34@us.ibm.com>


Roland Dreier <rdreier at cisco.com> wrote on 02/16/2007 10:10:55 AM:

>  > > But what if the full-member join fails on node A for node A's
>  > > solicited node group, but then node B succeeds in joining that group
>  > > as a send-only member (perhaps because some other nodes have dropped
>  > > off the fabric in the meantime).  Then node B will send the ND message
>  > > on a MCG that A is not a member of.
>
>  > Yes. B can send ND to A, and A responds without being a member so IPv6 ND
>  > works. Is there any security or other problems here?
>
> Node A is not a member of the group B is sending on, so SM does not
> have to set up any routes for the messages to even reach node A.  So
> it doesn't see the messages and doesn't respond to ND.
>
>  - R.

Two MCGs must be established before the IPoIB link comes up: one is the
broadcast group for IPv4, the other is the all-hosts multicast group for
IPv6. So node A is a member of the all-hosts group; the patch directs ND
sends to all hosts, so node A responds.

Thanks
Shirley Ma



From xma at us.ibm.com  Fri Feb 16 10:31:54 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Fri, 16 Feb 2007 10:31:54 -0800
Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs
	overflow
In-Reply-To: <adaabzej9px.fsf@cisco.com>
Message-ID: <OFFF34D992.BD9F7FFA-ON87257284.00653F2D-88257284.0065CBC3@us.ibm.com>


Roland Dreier <rdreier at cisco.com> wrote on 02/16/2007 09:25:30 AM:

>  > Even if the SM supports 1000 MCGs, that's still not sufficient for a
>  > 250-node cluster where each node has 4 links for IPv6 without any
>  > scope/global IPv6 address configured (250*4 + a few default MCGs).
>  > There will be an MCG overflow problem anyway in IPv6oIB.
>
> But what's the problem with supporting 1000 or even 10000 MCGs?
>
>  - R.

I am not sure whether I understand your question. I will try to answer it;
please let me know if I am wrong.

Each IPv6 link-local address creates a unique solicited-node multicast
address, which creates a unique full-member IB MCG. Each other IPv6
address also creates a solicited-node multicast address; whether it is
unique depends on the group ID. So once the IPv6 module is loaded in the
kernel (or is built into the kernel in the future), the SM will see more
than 1000 MCGs when the IPoIB links come up, and some nodes will fail to
join their MCGs. IPv6 ND is then broken for the nodes whose joins failed.
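
To make the arithmetic concrete, here is a small sketch of how a
solicited-node multicast address is formed (the ff02::1:ff00:0/104
prefix plus the low 24 bits of the unicast address), together with the
250*4 count quoted above. The function names are hypothetical, the count
ignores the few default MCGs, and the mapping from the IPv6 multicast
address to an IB MGID is deliberately left out.

```c
#include <stdint.h>
#include <string.h>

/*
 * Build the solicited-node multicast address for a unicast IPv6
 * address: ff02:0:0:0:0:1:ffXX:XXXX, where XX:XXXX are the low
 * 24 bits of the unicast address.
 */
void solicited_node_addr(const uint8_t unicast[16], uint8_t out[16])
{
	static const uint8_t prefix[13] = {
		0xff, 0x02,			/* ff02 */
		0, 0, 0, 0, 0, 0, 0, 0,		/* four zero groups */
		0x00, 0x01,			/* 0001 */
		0xff				/* ffXX: top byte */
	};

	memcpy(out, prefix, sizeof prefix);
	memcpy(out + 13, unicast + 13, 3);	/* low 24 bits */
}

/*
 * Lower bound on MCGs when every link-local address yields its own
 * solicited-node group; default groups come on top of this.
 */
unsigned int min_solicited_node_mcgs(unsigned int nodes,
				     unsigned int links_per_node)
{
	return nodes * links_per_node;
}
```

Since link-local addresses are normally derived from distinct interface
identifiers, their low 24 bits rarely collide, which is why the count
grows roughly as one group per link.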


Shirley Ma

From mshefty at ichips.intel.com  Fri Feb 16 11:12:07 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 16 Feb 2007 11:12:07 -0800
Subject: [openib-general] SA multicast patches
In-Reply-To: <adak5yigf9u.fsf@cisco.com>
References: <000101c74a29$6f796610$e598070a@amr.corp.intel.com>
	<adatzxnnn3k.fsf@cisco.com> <45D4E133.3000302@ichips.intel.com>
	<adabqjuhubu.fsf@cisco.com> <adak5yigf9u.fsf@cisco.com>
Message-ID: <45D60207.6080406@ichips.intel.com>

Roland Dreier wrote:
> OK, another question about the multicast.c code:
> 
>  > +static struct mcast_group *mcast_find(struct mcast_port *port,
>  > +				      union ib_gid *mgid)
>  > +{
>  > +	struct rb_node *node = port->table.rb_node;
>  > +	struct mcast_group *group;
>  > +	int ret;
>  > +
>  > +	while (node) {
>  > +		group = rb_entry(node, struct mcast_group, node);
>  > +		ret = memcmp(mgid->raw, group->rec.mgid.raw, sizeof *mgid);
>  > +		if (!ret)
>  > +			return group;
>  > +
>  > +		if (ret < 0)
>  > +			node = node->rb_left;
>  > +		else
>  > +			node = node->rb_right;
>  > +	}
>  > +	return NULL;
>  > +}
>  > +
>  > +static struct mcast_group *mcast_insert(struct mcast_port *port,
>  > +					struct mcast_group *group,
>  > +					int allow_duplicates)
>  > +{
>  > +	struct rb_node **link = &port->table.rb_node;
>  > +	struct rb_node *parent = NULL;
>  > +	struct mcast_group *cur_group;
>  > +	int ret;
>  > +
>  > +	while (*link) {
>  > +		parent = *link;
>  > +		cur_group = rb_entry(parent, struct mcast_group, node);
>  > +
>  > +		ret = memcmp(group->rec.mgid.raw, cur_group->rec.mgid.raw,
>  > +			     sizeof group->rec.mgid);
>  > +		if (ret < 0)
>  > +			link = &(*link)->rb_left;
>  > +		else if (ret > 0)
>  > +			link = &(*link)->rb_right;
>  > +		else if (allow_duplicates)
>  > +			link = &(*link)->rb_left;
>  > +		else
>  > +			return cur_group;
>  > +	}
>  > +	rb_link_node(&group->node, parent, link);
>  > +	rb_insert_color(&group->node, &port->table);
>  > +	return NULL;
>  > +}
> 
> How does it work to put duplicates into the RB tree?  It seems
> especially strange that the lookup code does:

The only duplicates that should appear in the tree are for MGID 0.  After a join 
for MGID 0 completes, the group is removed from the tree and re-inserted based 
on the MGID that was assigned by the SA.

All multicast groups need to be tracked, which is why even groups with MGID 0 
are inserted into the tree.

>  > +		if (ret < 0)
>  > +			node = node->rb_left;
>  > +		else
>  > +			node = node->rb_right;
> 
> so if ret == 0 (ie the two GIDs being tested are the same) then it
> continues to traverse to the right, while the insert code does:

Immediately above this code, the group is returned if ret == 0.  Calling 
mcast_find() for MGID 0 isn't useful, so the code avoids doing this, but I think 
that it would work.  The caller would just get an arbitrary group.

> Also I'd be really worried that the rebalancing code freaks out when
> duplicate keys are inserted in the tree.

I would guess that the rebalancing code is based on left/right branching, and 
isn't aware of the actual key values.  Having duplicate keys would work fine 
then, with the restriction that code searching for a duplicated key would get an 
unpredictable match that is based on the current tree layout.
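
A simplified illustration of that behaviour, using a plain unbalanced
binary search tree in userspace C rather than the kernel rbtree. The
names bst_insert(), bst_find() and bst_count() are made up for this
sketch and it does no rebalancing, so it only shows the insert and
lookup rules from the quoted patch, not what rb_insert_color() does.

```c
#include <stddef.h>
#include <string.h>

struct node {
	unsigned char key[16];		/* stands in for the MGID */
	int id;				/* tells duplicates apart */
	struct node *left, *right;
};

/*
 * Same branching rule as the quoted mcast_insert(): equal keys go left
 * when allow_duplicates is set, otherwise the existing node is
 * returned and nothing is inserted.  Returns NULL after inserting.
 */
struct node *bst_insert(struct node **root, struct node *n,
			int allow_duplicates)
{
	struct node **link = root;
	int ret;

	while (*link) {
		ret = memcmp(n->key, (*link)->key, sizeof n->key);
		if (ret < 0)
			link = &(*link)->left;
		else if (ret > 0)
			link = &(*link)->right;
		else if (allow_duplicates)
			link = &(*link)->left;
		else
			return *link;
	}
	n->left = n->right = NULL;
	*link = n;
	return NULL;
}

/* Lookup stops at the first equal key met on the way down. */
struct node *bst_find(struct node *root, const unsigned char *key)
{
	int ret;

	while (root) {
		ret = memcmp(key, root->key, sizeof root->key);
		if (!ret)
			return root;
		root = ret < 0 ? root->left : root->right;
	}
	return NULL;
}

/* A full walk still reaches every node, duplicates included. */
size_t bst_count(const struct node *root)
{
	if (!root)
		return 0;
	return 1 + bst_count(root->left) + bst_count(root->right);
}
```

In this sketch a lookup of a duplicated key returns whichever duplicate
sits highest in the tree, while a full walk still visits all of them.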

- Sean


From halr at voltaire.com  Fri Feb 16 10:47:58 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 16 Feb 2007 13:47:58 -0500
Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs
 overflow
In-Reply-To: <ada4ppmgemo.fsf@cisco.com>
References: <OF29D0C259.70E22877-ON87257284.0017AEC5-88257283.0071A3BB@us.ibm.com>
	<adasld6kpfj.fsf@cisco.com>
	<1171646341.22446.266964.camel@hal.voltaire.com>
	<aday7myhv2l.fsf@cisco.com>
	<1171647108.22446.267617.camel@hal.voltaire.com>
	<adazm7egfjs.fsf@cisco.com>
	<1171648438.22446.268637.camel@hal.voltaire.com>
	<ada4ppmgemo.fsf@cisco.com>
Message-ID: <1171651661.22446.271399.camel@hal.voltaire.com>

On Fri, 2007-02-16 at 13:07, Roland Dreier wrote:
>  > For starters, I think that rate, MTU, and SL (and maybe PKey too) need
>  > to be the same. There may be others too if I stare at the spec for a
>  > while...
> 
> Can you expand on why?  For example I definitely can send to the same
> MLID with different SLs.  

Sure but I think this complicates the SL2VL tables in the subnet to
accommodate this. I think a similar thing is true for PKeys. So to me
this is an SM complexity issue when mapping multiple MGRPs to same MLID.

> Of course MTU and rate need to match up but
> I don't see that as a real restriction -- the SM needs to allow for
> least-common-denominator values anyway, since the least-capable node
> on the fabric might join an existing group.

In theory, the least capable node could join any group but is this
reality in operation ?

Different groups could have different LCDs so this would make things
less granular (one rather than multiple LCDs). This seems less
constraining to me.

> I don't see why one MCG with an MTU of 2048 and one MCG with an MTU of
> 1024 can't share the same MLID, as long as the underlying fabric is
> capable of supporting an MTU of 2048.

From a pure MTU standpoint, the (only) downside of this is that the
group with MTU 1024 could send larger packets.

> Actually, I wonder what the
> spec says about what switches should do if they're asked to forward
> packets with too-big MTUs?  Maybe it all works out anyway.

They get dropped on the output port as packet length > NeighborMTU.
That's part of what PortXmitDiscards counts.

Bottom line: I'm not sure anything precludes what you are saying (I do
need to look at the spec more in terms of this), but I do think there
are different levels of complexity in SM implementation depending on how
much flexibility in mapping multiple MGRPs to the same MLID is
"desired".

-- Hal

>  - R.


From sean.hefty at intel.com  Fri Feb 16 11:48:49 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Fri, 16 Feb 2007 11:48:49 -0800
Subject: [openib-general] SA multicast patches
In-Reply-To: <000001c751f4$943af3b0$e598070a@amr.corp.intel.com>
Message-ID: <000101c75203$7ace2be0$e598070a@amr.corp.intel.com>

>For now, I think the safest thing to do is just remove the entire 'else'
>portion
>from the function and return an error if the MGID is 0.  Neither of the places
>that call into ib_sa_get_mcmember_rec() should pass in an MGID of 0.  (I'm
>testing this now to verify.)

I'm not sure if you'll need this, but I've updated the two multicast patches in
my for-roland branch based on a couple of comments.  All changes are minor.

* Converted a list_del/list_add combo to list_move.
* Changed a couple of kzalloc calls to kmalloc.
* Modified ib_sa_get_mcmember_rec to no longer return default MCMemberRecord
settings.

The for-roland branch is based on the tip of Linus' tree from two days ago, but
I re-tested the changes against 2.6.20.

If there's an easier way for you to handle these types of updates, just let me
know.

- Sean


From rdreier at cisco.com  Fri Feb 16 13:31:29 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 16 Feb 2007 13:31:29 -0800
Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs
 overflow
In-Reply-To: <1171651661.22446.271399.camel@hal.voltaire.com> (Hal
	Rosenstock's message of "16 Feb 2007 13:47:58 -0500")
References: <OF29D0C259.70E22877-ON87257284.0017AEC5-88257283.0071A3BB@us.ibm.com>
	<adasld6kpfj.fsf@cisco.com>
	<1171646341.22446.266964.camel@hal.voltaire.com>
	<aday7myhv2l.fsf@cisco.com>
	<1171647108.22446.267617.camel@hal.voltaire.com>
	<adazm7egfjs.fsf@cisco.com>
	<1171648438.22446.268637.camel@hal.voltaire.com>
	<ada4ppmgemo.fsf@cisco.com>
	<1171651661.22446.271399.camel@hal.voltaire.com>
Message-ID: <ada64a1g572.fsf@cisco.com>

 > Sure but I think this complicates the SL2VL tables in the subnet to
 > accommodate this. I think a similar thing is true for PKeys. So to me
 > this is an SM complexity issue when mapping multiple MGRPs to same MLID.

I'm still confused.  Aren't SL2VL and P_Key tables completely
orthogonal from forwarding tables?  Obviously there's no problem using
multiple different SLs or P_Keys to reach the same endport using the
same LID, so I don't understand why MLIDs would be different.

 - R.


From rdreier at cisco.com  Fri Feb 16 13:47:23 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 16 Feb 2007 13:47:23 -0800
Subject: [openib-general] SA multicast patches
In-Reply-To: <45D60207.6080406@ichips.intel.com> (Sean Hefty's message
	of "Fri, 16 Feb 2007 11:12:07 -0800")
References: <000101c74a29$6f796610$e598070a@amr.corp.intel.com>
	<adatzxnnn3k.fsf@cisco.com> <45D4E133.3000302@ichips.intel.com>
	<adabqjuhubu.fsf@cisco.com> <adak5yigf9u.fsf@cisco.com>
	<45D60207.6080406@ichips.intel.com>
Message-ID: <adabqjtepw4.fsf@cisco.com>

 > All multicast groups need to be tracked, which is why even groups with
 > MGID 0 are inserted into the tree.

OK...

 > Immediately above this code, the group is returned if ret == 0.

Right, I missed that.  But...

 > Calling mcast_find() for MGID 0 isn't useful, so the code avoids doing
 > this, but I think that it would work.  The caller would just get an
 > arbitrary group.

Now this is confusing -- you say the code avoids looking up MGID 0 in
the rbtree.  So why do you have to insert those groups in the tree and
have the allow_duplicates flag etc?  If you're never going to look
up the group, I assume you have some other way of finding it and so
you don't actually have to insert MGID 0 groups after all... right?

Or is it that you want to be able to iterate through the whole rbtree
and get the MGID 0 groups too?

 - R.


From rdreier at cisco.com  Fri Feb 16 13:55:06 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 16 Feb 2007 13:55:06 -0800
Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs
	overflow
In-Reply-To: <OFE74BD281.9AB643B8-ON87257284.0064D8E0-88257284.00650A34@us.ibm.com>
	(Shirley Ma's message of "Fri, 16 Feb 2007 10:23:39 -0800")
References: <OFE74BD281.9AB643B8-ON87257284.0064D8E0-88257284.00650A34@us.ibm.com>
Message-ID: <adaodntdayt.fsf@cisco.com>

 > Two MCGs groups must be establised before IPoIB link up, one is broadcast
 > for IPv4, one is all hosts multicast for IPv6. So Node A is a member of all
 > hosts address, the patch directs ND sends to all hosts, so node A responses
 > it.

I'm still confused.  How do you interoperate with other RFC-compliant
nodes (they might not have your patch or might not even be running
Linux) that send ND messages to the solicited node group?  If node A
has your patch and doesn't try to join its own solicited node group,
then another node that doesn't know to send ND messages to the all
nodes group will not be able to find it.

 - R.


From mshefty at ichips.intel.com  Fri Feb 16 13:56:38 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 16 Feb 2007 13:56:38 -0800
Subject: [openib-general] SA multicast patches
In-Reply-To: <adabqjtepw4.fsf@cisco.com>
References: <000101c74a29$6f796610$e598070a@amr.corp.intel.com>
	<adatzxnnn3k.fsf@cisco.com> <45D4E133.3000302@ichips.intel.com>
	<adabqjuhubu.fsf@cisco.com> <adak5yigf9u.fsf@cisco.com>
	<45D60207.6080406@ichips.intel.com> <adabqjtepw4.fsf@cisco.com>
Message-ID: <45D62896.9060008@ichips.intel.com>

> Or is it that you want to be able to iterate through the whole rbtree
> and get the MGID 0 groups too?

This is it - see mcast_groups_lost().  That call transitions all multicast 
groups into an error state, and reports to the user that the group information 
may have been lost by the SA.  (We can't trust that a successful join response 
is still valid, even if it is reported after we receive a fatal event.)

- Sean


From xma at us.ibm.com  Fri Feb 16 14:25:28 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Fri, 16 Feb 2007 14:25:28 -0800
Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs
	overflow
In-Reply-To: <adaodntdayt.fsf@cisco.com>
Message-ID: <OFDBADD977.0ACF1F23-ON87257284.007971CF-88257284.007B2E04@us.ibm.com>


Roland Dreier <rdreier at cisco.com> wrote on 02/16/2007 01:55:06 PM:

>  > Two MCGs groups must be establised before IPoIB link up, one is broadcast
>  > for IPv4, one is all hosts multicast for IPv6. So Node A is a member of all
>  > hosts address, the patch directs ND sends to all hosts, so node A responses
>  > it.
>
> I'm still confused.  How do you interoperate with other RFC-compliant
> nodes (they might not have your patch or might not even be running
> Linux) that send ND messages to the solicited node group?  If node A
> has your patch and doesn't try to join its own solicited node group,
> then another node that doesn't know to send ND messages to the all
> nodes group will not be able to find it.
>
>  - R.

All nodes in the subnet join the all-hosts multicast group by default. What
the patch does differently is that, when a join fails, ND is sent to the
all-hosts multicast group instead of to a particular solicited-node
multicast address. The node with the destination solicited-node multicast
address will respond to it, so the network does not lose connectivity when
MCGs overflow. There is no interoperability issue here between patched and
unpatched nodes, or between Linux and non-Linux nodes. I don't think the
IPoIB RFC covers this corner case, so there is no RFC-compliance problem
here. I will discuss this with the author.

Thanks
Shirley Ma

From rdreier at cisco.com  Fri Feb 16 14:36:50 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 16 Feb 2007 14:36:50 -0800
Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast
 support
In-Reply-To: <000101c74a29$6f796610$e598070a@amr.corp.intel.com> (Sean
	Hefty's message of "Tue, 6 Feb 2007 12:00:22 -0800")
References: <000101c74a29$6f796610$e598070a@amr.corp.intel.com>
Message-ID: <adar6spbugt.fsf@cisco.com>

OK, I pulled this in to my for-2.6.21 branch and I will ask Linus to
pull later today.

Thanks.

 - R.


From ardavis at ichips.intel.com  Fri Feb 16 14:38:51 2007
From: ardavis at ichips.intel.com (Arlin Davis)
Date: Fri, 16 Feb 2007 14:38:51 -0800
Subject: [openib-general] OFED 1.2 dapl and dat.conf
In-Reply-To: <1171561783.3161.165.camel@fc6.xsintricity.com>
References: <1171397522.21471.7.camel@stevo-desktop>
	<45D37E8E.5050800@ichips.intel.com>
	<1171561783.3161.165.camel@fc6.xsintricity.com>
Message-ID: <45D6327B.4060606@ichips.intel.com>

Doug Ledford wrote:

>On Wed, 2007-02-14 at 13:26 -0800, Arlin Davis wrote:
>
>>Steve Wise wrote:
>>
>>>Currently, the dapl rpms don't install dat.conf.  I think they probably
>>>should, eh?  Maybe in <prefix>/etc/dat.conf
>>
>>my specfile is set up to target sysconfdir which is typically set to 
>>`$(prefix)/etc'
>>
>>%{_sysconfdir}/dat.conf
>>
>>I am not sure how the 1.2 scripts are building the rpms. Maybe Vladimir 
>>can help explain?
>
>Note that this setup is problematic on multilib arches.  Since the
>dat.conf file hard codes a library path that's different for 32bit/64bit
>arches, installing both a 32bit and 64bit dapl library is impossible
>without munging things.
>
>For RHEL4U5/RHEL5 I changed the dat library to read dat<bits>.conf and
>have two separate conf files.  A probably better approach would be to
>change the library to use a relative library name that it looks for
>starting from the library's own directory.  Hence if the dapl library is
>in /usr/lib, it looks in /usr/lib.  Doing that would allow the
>32bit/64bit libraries to share the same config file.
>
This is a good idea. I will take a look at dladdr options to set 
appropriate starting path for dapl libraries when absolute paths are not 
specified.
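
A minimal sketch of the relative-name lookup discussed above, not taken from
the dapl sources. It assumes the path of the calling library is already known
(for example from the dli_fname field that dladdr() fills in) and only shows
the path-joining step. The function name resolve_provider_path and the
library names in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of dapl): build the path of a provider
 * library named in dat.conf.  Absolute names are used as-is; relative
 * names are resolved against the directory that holds the calling
 * library.  self_path is assumed to have been obtained beforehand,
 * e.g. from dladdr()'s dli_fname.
 * Returns 0 on success, -1 if the result does not fit in out.
 */
int resolve_provider_path(const char *self_path, const char *name,
                          char *out, size_t out_len)
{
    const char *slash;
    int n;

    assert(self_path != NULL && name != NULL && out != NULL);

    if (name[0] == '/') {
        /* Absolute path in dat.conf: keep today's behaviour. */
        n = snprintf(out, out_len, "%s", name);
    } else {
        slash = strrchr(self_path, '/');
        if (slash == NULL)
            /* No directory component: leave the name for the loader. */
            n = snprintf(out, out_len, "%s", name);
        else
            /* Directory of the calling library + "/" + relative name. */
            n = snprintf(out, out_len, "%.*s/%s",
                         (int)(slash - self_path), self_path, name);
    }
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

With this, a 64bit library loaded from /usr/lib64/libdat.so would resolve a
relative entry such as libdaplcma.so to /usr/lib64/libdaplcma.so, while the
32bit copy in /usr/lib would resolve the same dat.conf entry to /usr/lib.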

James, do you see any issues with this approach?

Vladimir, can you tell me how the OFED 1.2 install scripts are handling 
the dat.conf?

-arlin


From mshefty at ichips.intel.com  Fri Feb 16 14:41:39 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 16 Feb 2007 14:41:39 -0800
Subject: [openib-general] please pull for 2.6.21: fix + add IB multicast
 support
In-Reply-To: <adar6spbugt.fsf@cisco.com>
References: <000101c74a29$6f796610$e598070a@amr.corp.intel.com>
	<adar6spbugt.fsf@cisco.com>
Message-ID: <45D63323.1000404@ichips.intel.com>

Roland Dreier wrote:
> OK, I pulled this in to my for-2.6.21 branch and I will ask Linus to
> pull later today.

Thanks for the review.


From halr at voltaire.com  Fri Feb 16 15:09:26 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 16 Feb 2007 18:09:26 -0500
Subject: [openib-general] IPv6oIB neighbour discover broken when MCGs
 overflow
In-Reply-To: <ada64a1g572.fsf@cisco.com>
References: <OF29D0C259.70E22877-ON87257284.0017AEC5-88257283.0071A3BB@us.ibm.com>
	<adasld6kpfj.fsf@cisco.com>
	<1171646341.22446.266964.camel@hal.voltaire.com>
	<aday7myhv2l.fsf@cisco.com>
	<1171647108.22446.267617.camel@hal.voltaire.com>
	<adazm7egfjs.fsf@cisco.com>
	<1171648438.22446.268637.camel@hal.voltaire.com>
	<ada4ppmgemo.fsf@cisco.com>
	<1171651661.22446.271399.camel@hal.voltaire.com>
	<ada64a1g572.fsf@cisco.com>
Message-ID: <1171667328.22446.285104.camel@hal.voltaire.com>

On Fri, 2007-02-16 at 16:31, Roland Dreier wrote:
>  > Sure but I think this complicates the SL2VL tables in the subnet to
>  > accommodate this. I think a similar thing is true for PKeys. So to me
>  > this is an SM complexity issue when mapping multiple MGRPs to same MLID.
> 
> I'm still confused.  Aren't SL2VL and P_Key tables completely
> orthogonal from forwarding tables?

Sure they are separate mechanisms but by overloading MLIDs I think that
this ends up making them interdependent. I think it complicates the
use/configuration of those mechanisms depending on how flexible this is.
Either of those mechanisms could filter out the packet long before it
ever gets to some destination.

> Obviously there's no problem using
> multiple different SLs or P_Keys to reach the same endport using the
> same LID, so I don't understand why MLIDs would be different.

In terms of PKeys and overloaded MLIDs, p. 149 line 11 states:

"When a multicast LID is overloaded, the multicast groups sharing the
same MLID must have the same P_Key. This simplification is required to
allow switches and routers that implement optional P_Key enforcement for
multicast operations."

-- Hal

>  - R.


From rdreier at cisco.com  Fri Feb 16 15:32:46 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 16 Feb 2007 15:32:46 -0800
Subject: [openib-general] [PATCH] IB/core: Set static rate in
	ib_init_ah_from_path()
Message-ID: <adasld5adb5.fsf@cisco.com>

Guys, any reason not to merge this?  It's step one of the cleanups
from Jason's patch to make IPoIB work with global routes...

The static rate from the path record should be put into the address
vector -- a long time ago the rate in the address attributes needed to
be a relative rate, which required more munging, but now that the
conversion from absolute to relative is done in the low-level driver,
it's easy for ib_init_ah_from_path() to put the absolute rate in.

Cc: Jason Gunthorpe <jgunthorpe at obsidianresearch.com>
Cc: Sean Hefty <sean.hefty at intel.com>
Signed-off-by: Roland Dreier <rolandd at cisco.com>
---
 drivers/infiniband/core/sa_query.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index d7d4a53..68db633 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -471,6 +471,7 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
 	ah_attr->sl = rec->sl;
 	ah_attr->src_path_bits = be16_to_cpu(rec->slid) & 0x7f;
 	ah_attr->port_num = port_num;
+	ah_attr->static_rate = rec->rate;
 
 	if (rec->hop_limit > 1) {
 		ah_attr->ah_flags = IB_AH_GRH;
-- 
1.4.4.4


From sean.hefty at intel.com  Fri Feb 16 15:35:38 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Fri, 16 Feb 2007 15:35:38 -0800
Subject: [openib-general] [PATCH] IB/core: Set static rate in
	ib_init_ah_from_path()
In-Reply-To: <adasld5adb5.fsf@cisco.com>
Message-ID: <000401c75223$29e86ea0$e598070a@amr.corp.intel.com>

>Guys, any reason not to merge this?  It's step one of the cleanups
>from Jason's patch to make IPoIB work with global routes...

I would like to see this merged in.

- Sean


From rdreier at cisco.com  Fri Feb 16 15:48:19 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 16 Feb 2007 15:48:19 -0800
Subject: [openib-general] [GIT PULL] please pull infiniband.git
Message-ID: <adahctlacl8.fsf@cisco.com>

Linus, please pull from

    master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This adds IB multicast tracking, to allow userspace to use multicast
groups in a sane way, an ehca interrupt handling fixup, and a few
other minor things.  I don't think there is anything major left, so we
should be good for 2.6.21-rc1 after this pull.

Dotan Barak (1):
      IB/mthca: Allow the QP state transition RESET->RESET

Hoang-Nam Nguyen (4):
      IB/ehca: Rework irq handler
      IB/ehca: Fix race condition/locking issues in scaling code
      IB/ehca: Allow en/disabling scaling code via module parameter
      IB/ehca: Change query_port() to return LINK_UP instead UNKNOWN

Michael S. Tsirkin (1):
      IPoIB: CM error handling thinko fix

Roland Dreier (5):
      IB/mthca: Fix allocation of ICM chunks in coherent memory
      IPoIB: Only allow root to change between datagram and connected mode
      IB/core: Fix sparse warnings about shadowed declarations
      IB/ipath: Make ipath_map_sg() static
      IB/core: Set static rate in ib_init_ah_from_path()

Sean Hefty (2):
      IB/sa: Track multicast join/leave requests
      RDMA/cma: Add multicast communication support

Steve Wise (3):
      RDMA/iwcm: iw_cm_id destruction race fixes
      RDMA/cxgb3: Fail posts synchronously when in TERMINATE state
      RDMA/cxgb3: Remove Open Grid Computing copyrights in iw_cxgb3 driver

 drivers/infiniband/core/Makefile               |    2 +-
 drivers/infiniband/core/cma.c                  |  359 +++++++++--
 drivers/infiniband/core/fmr_pool.c             |    4 +-
 drivers/infiniband/core/iwcm.c                 |   47 +-
 drivers/infiniband/core/multicast.c            |  837 ++++++++++++++++++++++++
 drivers/infiniband/core/sa.h                   |   66 ++
 drivers/infiniband/core/sa_query.c             |   30 +-
 drivers/infiniband/core/sysfs.c                |    2 -
 drivers/infiniband/core/ucma.c                 |  204 ++++++-
 drivers/infiniband/hw/cxgb3/cxio_dbg.c         |    1 -
 drivers/infiniband/hw/cxgb3/cxio_hal.c         |    1 -
 drivers/infiniband/hw/cxgb3/cxio_hal.h         |    1 -
 drivers/infiniband/hw/cxgb3/cxio_resource.c    |    1 -
 drivers/infiniband/hw/cxgb3/cxio_resource.h    |    1 -
 drivers/infiniband/hw/cxgb3/cxio_wr.h          |    1 -
 drivers/infiniband/hw/cxgb3/iwch.c             |    1 -
 drivers/infiniband/hw/cxgb3/iwch.h             |    1 -
 drivers/infiniband/hw/cxgb3/iwch_cm.c          |    1 -
 drivers/infiniband/hw/cxgb3/iwch_cm.h          |    1 -
 drivers/infiniband/hw/cxgb3/iwch_cq.c          |    1 -
 drivers/infiniband/hw/cxgb3/iwch_ev.c          |    1 -
 drivers/infiniband/hw/cxgb3/iwch_mem.c         |    1 -
 drivers/infiniband/hw/cxgb3/iwch_provider.c    |    1 -
 drivers/infiniband/hw/cxgb3/iwch_provider.h    |    1 -
 drivers/infiniband/hw/cxgb3/iwch_qp.c          |    3 +-
 drivers/infiniband/hw/cxgb3/iwch_user.h        |    1 -
 drivers/infiniband/hw/ehca/Kconfig             |    8 -
 drivers/infiniband/hw/ehca/ehca_classes.h      |   19 +-
 drivers/infiniband/hw/ehca/ehca_eq.c           |    1 +
 drivers/infiniband/hw/ehca/ehca_hca.c          |    3 +
 drivers/infiniband/hw/ehca/ehca_irq.c          |  307 +++++----
 drivers/infiniband/hw/ehca/ehca_irq.h          |    1 +
 drivers/infiniband/hw/ehca/ehca_main.c         |   32 +-
 drivers/infiniband/hw/ehca/ipz_pt_fn.h         |   11 +-
 drivers/infiniband/hw/ipath/ipath_dma.c        |    4 +-
 drivers/infiniband/hw/mthca/mthca_memfree.c    |    4 +-
 drivers/infiniband/hw/mthca/mthca_qp.c         |    5 +
 drivers/infiniband/ulp/ipoib/ipoib_cm.c        |    4 +-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |  195 ++----
 include/rdma/ib_addr.h                         |    6 +
 include/rdma/ib_sa.h                           |  159 ++---
 include/rdma/rdma_cm.h                         |   21 +-
 include/rdma/rdma_cm_ib.h                      |    4 +-
 include/rdma/rdma_user_cm.h                    |   13 +-
 44 files changed, 1889 insertions(+), 478 deletions(-)
 create mode 100644 drivers/infiniband/core/multicast.c
 create mode 100644 drivers/infiniband/core/sa.h


From vlad at lists.openfabrics.org  Sat Feb 17 02:24:35 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Sat, 17 Feb 2007 02:24:35 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070217-0200 daily build status
Message-ID: <20070217102436.CC885E6080C@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod 

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.19
Passed on x86_64 with linux-2.6.19
Passed on ia64 with linux-2.6.18
Passed on x86_64 with linux-2.6.20
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.15
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.16
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.14
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.19
Passed on ia64 with linux-2.6.19
Passed on ppc64 with linux-2.6.12
Passed on ppc64 with linux-2.6.17
Passed on powerpc with linux-2.6.13
Passed on powerpc with linux-2.6.16
Passed on powerpc with linux-2.6.12
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.14
Passed on ppc64 with linux-2.6.18
Passed on powerpc with linux-2.6.15
Passed on ia64 with linux-2.6.12
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.13
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.17

Failed:


From halr at voltaire.com  Sat Feb 17 06:00:21 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 17 Feb 2007 09:00:21 -0500
Subject: [openib-general] Unknown SMP Recv
In-Reply-To: <000801c750a6$cd925120$21606d86@one7>
References: <000901c74938$e10b2a30$21606d86@one7>
	<1170689654.4525.201415.camel@hal.voltaire.com>
	<001401c74946$a664a2e0$21606d86@one7>
	<1170695591.4525.207604.camel@hal.voltaire.com>
	<002001c74a33$c2ec1db0$21606d86@one7>
	<1170807564.4525.324195.camel@hal.voltaire.com>
	<001e01c74be2$b4889310$21606d86@one7>
	<1170994529.31538.124584.camel@hal.voltaire.com>
	<000401c74c6d$ce4875f0$21606d86@one7>
	<1171044773.31538.175280.camel@hal.voltaire.com>
	<000401c74c79$74439b50$21606d86@one7>
	<1171051141.2767.7.camel@localhost>
	<001001c74c87$8b653470$21606d86@one7>
	<1171122546.31538.251673.camel@hal.voltaire.com>
	<000801c750a6$cd925120$21606d86@one7>
Message-ID: <1171720820.4380.40177.camel@hal.voltaire.com>

On Wed, 2007-02-14 at 21:12, Michael Arndt wrote:
> Hi,
> 
> what I forgot was that the write function in umad_send returns with -1 if 
> the error occurs.

That looks like EPERM. Not sure why write would return this. The only
thing I see that might return this is handle_outgoing_dr_smp on some
errors but I didn't chase this all the way through. 

-- Hal

>  Maybe that helps.
> 
> Thanks Michael 
> 


From michael.arndt at informatik.tu-chemnitz.de  Sat Feb 17 06:43:21 2007
From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt)
Date: Sat, 17 Feb 2007 15:43:21 +0100
Subject: [openib-general] Unknown SMP Recv
References: <000901c74938$e10b2a30$21606d86@one7>
	<1170689654.4525.201415.camel@hal.voltaire.com>
	<001401c74946$a664a2e0$21606d86@one7>
	<1170695591.4525.207604.camel@hal.voltaire.com>
	<002001c74a33$c2ec1db0$21606d86@one7>
	<1170807564.4525.324195.camel@hal.voltaire.com>
	<001e01c74be2$b4889310$21606d86@one7>
	<1170994529.31538.124584.camel@hal.voltaire.com>
	<000401c74c6d$ce4875f0$21606d86@one7>
	<1171044773.31538.175280.camel@hal.voltaire.com>
	<000401c74c79$74439b50$21606d86@one7>
	<1171051141.2767.7.camel@localhost>
	<001001c74c87$8b653470$21606d86@one7>
	<1171122546.31538.251673.camel@hal.voltaire.com>
	<000801c750a6$cd925120$21606d86@one7>
	<1171720820.4380.40177.camel@hal.voltaire.com>
Message-ID: <000401c752a1$f8d8f990$21606d86@one7>

Hi,

I have solved the problem on my own. There was an error in the TID-changing
logic in my own code, which caused sends with duplicated TIDs and
triggered the error. Now it works fine.

Thank you for your efforts

Michael 


From dotanb at dev.mellanox.co.il  Sat Feb 17 09:16:09 2007
From: dotanb at dev.mellanox.co.il (dotanb at dev.mellanox.co.il)
Date: Sat, 17 Feb 2007 19:16:09 +0200 (IST)
Subject: [openib-general] [PATCH] IB/core: Set static rate in
 ib_init_ah_from_path()
In-Reply-To: <000401c75223$29e86ea0$e598070a@amr.corp.intel.com>
References: <000401c75223$29e86ea0$e598070a@amr.corp.intel.com>
Message-ID: <1431.85.65.224.140.1171732569.squirrel@dev.mellanox.co.il>

Hi guys.

>>Guys, any reason not to merge this?  It's step one of the cleanups
>>from Jason's patch to make IPoIB work with global routes...
>
> I would like to see this merged in.
>
> - Sean
In issue number 296, which I opened several months ago in Bugzilla, I
reported two missing attributes: the first is static_rate, and the
second is src_path_bits, which is not being filled in correctly.

Can someone look at this issue?

thanks
Dotan


From yipeeyipeeyipeeyipee at yahoo.com  Sat Feb 17 23:33:53 2007
From: yipeeyipeeyipeeyipee at yahoo.com (yipeeyipeeyipeeyipee)
Date: Sun, 18 Feb 2007 07:33:53 +0000 (UTC)
Subject: [openib-general] bad port physstate
References: <loom.20070215T164121-318@post.gmane.org>
	<1171556073.22446.185292.camel@hal.voltaire.com>
Message-ID: <loom.20070218T083150-541@post.gmane.org>

Hal Rosenstock <halr <at> voltaire.com> writes:
[snip]

> I would expect an smpquery of portinfo of this or ibnetdiscover would
> now show this.

nope. After I start getting the link state error it doesn't recover and I keep
getting the same error.


From sweitzen at cisco.com  Sun Feb 18 00:42:18 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Sun, 18 Feb 2007 00:42:18 -0800
Subject: [openib-general] MVAPICH2 working with OFED 1.2 alpha1 and IB?
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA3030525F6@xmb-sjc-216.amer.cisco.com>

I get this on both RHEL4 and SLES10 trying to run any programs over IB
with MVAPICH2:
 
$ /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/bin/mpiexec -n 2 `pwd`/osu_latency.x
 
rank 0 in job 6  svbu-qa1850-1_35332   caused collective abort of all ranks
  exit status of rank 0: killed by signal 9
 
Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 

From vlad at lists.openfabrics.org  Sun Feb 18 02:24:19 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Sun, 18 Feb 2007 02:24:19 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070218-0200 daily build status
Message-ID: <20070218102419.5140AE6080C@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod 

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.12
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.16
Passed on ppc64 with linux-2.6.19
Passed on powerpc with linux-2.6.17
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.15
Passed on ia64 with linux-2.6.19
Passed on powerpc with linux-2.6.12
Passed on ia64 with linux-2.6.16
Passed on ppc64 with linux-2.6.18
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.12
Passed on ppc64 with linux-2.6.13
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.14
Passed on ia64 with linux-2.6.18
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.15
Passed on powerpc with linux-2.6.15
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.13
Passed on ppc64 with linux-2.6.17
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.15

Failed:


From halr at voltaire.com  Sun Feb 18 03:41:21 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 18 Feb 2007 06:41:21 -0500
Subject: [openib-general] bad port physstate
In-Reply-To: <loom.20070218T083150-541@post.gmane.org>
References: <loom.20070215T164121-318@post.gmane.org>
	<1171556073.22446.185292.camel@hal.voltaire.com>
	<loom.20070218T083150-541@post.gmane.org>
Message-ID: <1171798880.4380.118535.camel@hal.voltaire.com>

On Sun, 2007-02-18 at 02:33, yipeeyipeeyipeeyipee wrote:
> Hal Rosenstock <halr <at> voltaire.com> writes:
> [snip]
> 
> > I would expect an smpquery of portinfo of this or ibnetdiscover would
> > now show this.
> 
> nope. After I start getting the link state error it doesn't recover and I keep
> getting the same error.

Try swapping cables on that port to a known good one and see if this
helps.

-- Hal



From panda at cse.ohio-state.edu  Sun Feb 18 06:32:32 2007
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Sun, 18 Feb 2007 09:32:32 -0500 (EST)
Subject: [openib-general] MVAPICH2 working with OFED 1.2 alpha1 and IB?
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA3030525F6@xmb-sjc-216.amer.cisco.com>
	from "Scott Weitzenkamp (sweitzen)" at Feb 18, 2007 12:42:18 AM
Message-ID: <200702181432.l1IEWWqn024394@xi.cse.ohio-state.edu>

> I get this on both RHEL4 and SLES10 trying to run any programs over IB
> with MVAPICH2:
>
> $ /usr/local/ofed/mpi/gcc/mvapich2-0.9.8-3/bin/mpiexec -n 2
> `pwd`/osu_latency.x
>
> rank 0 in job 6  svbu-qa1850-1_35332   caused collective abort of all
> ranks
>   exit status of rank 0: killed by signal 9

It looks like you are using an older version of the SRPM:
mvapich2-0.9.8-3. This version had some shared library issues with the
ofed 1.2 build. The latest MVAPICH2 SRPM version is
mvapich2-0.9.8-4. Shaun posted the following e-mail on Feb 15th.

Please use this latest version and let us know whether the problem
still persists.

Thanks, 

DK

=============================================

Steve Wise wrote:
> Shaun,
> 
> Lemme know if you have an mvapich2 kit that I can test with iwarp...

Hi Steve. I've updated our SRPM:

https://www.openfabrics.org/~rowland/ofed_1_2/

The latest is mvapich2-0.9.8-4.src.rpm. This version should solve the
shared library linking issues. This can be built outside of the OFED 1.2
alpha1 release with the information in the README file or can replace
the previous SRPM in the OFED-1.2-alpha1/SRPMS/ directory. To use iWARP,
use the OFA build of the SRPM and set MV2_ENABLE_IWARP_MODE=1 in your
environment.
-- 
Shaun Rowland   rowland at cse.ohio-state.edu
http://www.cse.ohio-state.edu/~rowland/


From sashak at voltaire.com  Sun Feb 18 07:50:06 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sun, 18 Feb 2007 17:50:06 +0200
Subject: [openib-general] Fwd: [ANNOUNCE] GIT 1.5.0
In-Reply-To: <20070215071537.GD11866@mellanox.co.il>
References: <20070215071537.GD11866@mellanox.co.il>
Message-ID: <20070218155006.GS27414@sashak.voltaire.com>

On 09:15 Thu 15 Feb     , Michael S. Tsirkin wrote:
> FYI.
> I suggest we update git on the openfabrics server to 1.5.0:
> "Detached HEAD" feature will be useful for nightly build scripts.
> Sasha?

The git-1.5.0 feature list looks fine to me, but let's hold off on the
upgrade for a couple of days until 1.5.0.1 is out.

Sasha


From tziporet at mellanox.co.il  Sun Feb 18 08:29:12 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Sun, 18 Feb 2007 18:29:12 +0200
Subject: [openib-general] how to handle OFEd 1.2 bugs in bugzilla
Message-ID: <6C2C79E72C305246B504CBA17B5500C9A0DE2A@mtlexch01.mtl.com>

All,

Please clean up the bugs that you opened for OFED 1.1/1.0 so we can work
with bugzilla in an efficient manner for OFED 1.2.

For bugs that were found in previous OFED releases and are still
relevant for OFED 1.2, please change the product version so I will see
them when I look at OFED 1.2 bugs.

Thanks,
Tziporet 

-----Original Message-----
From: Scott Weitzenkamp (sweitzen) [mailto:sweitzen at cisco.com] 
Sent: Wednesday, February 14, 2007 9:13 PM
To: Tziporet Koren; Scott Weitzenkamp (sweitzen)
Cc: EWG; OPENIB
Subject: RE: how to handle OFEd 1.2 bugs in bugzilla

Yes, I'd like to add alpha1, etc. version numbers in bugzilla.

For existing bugs, the Reporter and Assignee should try to
communicate/negotiate Priority/Severity.  For bugs in areas that Cisco
supports, I review the bugs and try to ask for desired ones to be fixed.
I was happy with the responses I got for OFED 1.1 from Mellanox and Open
MPI.

If you want a bug scrub, I suggest a distributed one, where someone from
each company scrubs the bugs in areas they are responsible for.

Scott 

> -----Original Message-----
> From: Tziporet Koren [mailto:tziporet at mellanox.co.il] 
> Sent: Wednesday, February 14, 2007 6:18 AM
> To: Scott Weitzenkamp (sweitzen)
> Cc: EWG; OPENIB
> Subject: how to handle OFEd 1.2 bugs in bugzilla
> 
> Hi Scott and all,
> I wish to consult with you on the way we will treat OFED 1.2 bugs in 
> bugzilla.
> 
> 1. Do we want to have 1.2-alpha, 1.2-beta, 1.2-rcX in version, or just 
> 1.2 as we have now?
> 2. What do we wish to do with bugs that were opened for 1.1 and are 
> still open?
> 3. What to do with old bugs that were opened against gen2 in general?
> 4. What is our methodology for priority and severity setup? 
> (There are too many blocker bugs still open in OFED 1.1, so either they
> are not actually blockers or they were fixed but not updated.)
> 
> Thanks,
> Tziporet
> 


From rdreier at cisco.com  Sun Feb 18 09:37:13 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Sun, 18 Feb 2007 09:37:13 -0800
Subject: [openib-general] [PATCH] IB/core: Set static rate in
 ib_init_ah_from_path()
References: <000401c75223$29e86ea0$e598070a@amr.corp.intel.com>
	<1431.85.65.224.140.1171732569.squirrel@dev.mellanox.co.il>
Message-ID: <adar6sn74fq.fsf@cisco.com>

 > In issue number 296 that i opened several months ago in the Bugzilla, i
 > reported about two missing attributes: the first one is the static_rate,
 > and the second one is the src_path_bits which is not being filled right.

The patch I posted fixes the static rate, right?

You'll need to explain what you mean about src_path_bits, because at
first glance the code looks OK to me.

 - R.


From sweitzen at cisco.com  Sun Feb 18 11:39:34 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Sun, 18 Feb 2007 11:39:34 -0800
Subject: [openib-general] MVAPICH2 working with OFED 1.2 alpha1 and IB?
In-Reply-To: <200702181432.l1IEWWqn024394@xi.cse.ohio-state.edu>
References: <A15335FBE9BD2449AF2C9EF3D1EB8EA3030525F6@xmb-sjc-216.amer.cisco.com>
	from "Scott Weitzenkamp (sweitzen)" at Feb 18, 2007 12:42:18 AM
	<200702181432.l1IEWWqn024394@xi.cse.ohio-state.edu>
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA303052619@xmb-sjc-216.amer.cisco.com>

> It looks like you are using an older version of the SRPM:
> mvapich2-0.9.8-3. This version had some shared library issues with the
> ofed 1.2 build. The latest MVAPICH2 SRPM version is
> mvapich2-0.9.8-4. Shaun posted the following e-mail on Feb 15th.
> 
> Please use this latest version and let us know whether the problem
> still persists.

This fixed it, thanks.

Scott


From kaiser at lfbs.RWTH-Aachen.DE  Sun Feb 18 14:41:29 2007
From: kaiser at lfbs.RWTH-Aachen.DE (Christian Kaiser)
Date: Sun, 18 Feb 2007 23:41:29 +0100
Subject: [openib-general] uDAPL: RDMA Write example
Message-ID: <45D8D619.9020904@lfbs.rwth-aachen.de>

Hello,

I'm trying to find a small sample program that uses RDMA Write instead 
of Send/Recv. There is not a single uDAPL example program in the sources, 
and I could not find one on the net either.
Could someone please help me find something useful?

Thanks!
Christian


From ogerlitz at voltaire.com  Sun Feb 18 21:12:47 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Mon, 19 Feb 2007 07:12:47 +0200
Subject: [openib-general] uDAPL: RDMA Write example
In-Reply-To: <45D8D619.9020904@lfbs.rwth-aachen.de>
References: <45D8D619.9020904@lfbs.rwth-aachen.de>
Message-ID: <45D931CF.4060601@voltaire.com>

Christian Kaiser wrote:
> I'm trying to find a small sample program, that uses RDMA Write instead 
> of Send/Recv. In the sources there is no single uDAPL example program 
> and on the net neither.
> Could someone please help me to find something useful?

see http://dapl.svn.sourceforge.net/viewvc/dapl/trunk/test/dapltest

Anyway, can you comment on what using uDAPL buys you that you don't get 
from coding to the verbs (libibverbs) and rdmacm (librdmacm)?

Or.


From ogerlitz at voltaire.com  Sun Feb 18 22:40:19 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Mon, 19 Feb 2007 08:40:19 +0200 (IST)
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to work
 with partial membership pkey
Message-ID: <Pine.LNX.4.64.0702190838050.26497@zuben>

Hi Sean,

This fixes a bug that prevented librdmacm apps from running on a node
that is only a partial member of a partition. The patch takes the same
approach as the kernel ib_find_cached_pkey implementation.

If you approve this, I suggest pushing it into OFED 1.2 as well, as a bug fix.

Or.

----------------------------------------------------------------------
The pkey extracted by the RDMA CM from the IPoIB device hardware address always
has the full membership bit set. However, when looking in the pkey table the
search must mask out the full membership bit.

Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>
Signed-off-by: Olga Shern <olgas at voltaire.com>

diff --git a/src/cma.c b/src/cma.c
index c5f8cd9..9c24c6a 100644
--- a/src/cma.c
+++ b/src/cma.c
@@ -661,7 +661,7 @@ static int ucma_find_pkey(struct cma_dev

 	for (i = 0, ret = 0; !ret; i++) {
 		ret = ibv_query_pkey(cma_dev->verbs, port_num, i, &chk_pkey);
-		if (!ret && pkey == chk_pkey) {
+		if ((!ret && pkey  == chk_pkey) || (!ret && htons(ntohs(pkey) & 0x7fff)  == chk_pkey)) {
 			*pkey_index = (uint16_t) i;
 			return 0;
 		}


From ogerlitz at voltaire.com  Mon Feb 19 01:29:36 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Mon, 19 Feb 2007 11:29:36 +0200
Subject: [openib-general] [openfabrics-ewg]  OFED 1.2 alpha release
In-Reply-To: <45D42B26.10709@mellanox.co.il>
References: <45D337E2.200@mellanox.co.il>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA303051A05@xmb-sjc-216.amer.cisco.com>
	<45D42B26.10709@mellanox.co.il>
Message-ID: <45D96E00.4080108@voltaire.com>

Tziporet Koren wrote:
> Regarding RHEL4 U4 and IPoIB bug - Or just prepared a patch that should 
> fix it. We will merge it and test for the beta.

The patch will only fix the bug for RDMA CM multicast consumers. Unlike 
IPoIB, which gets the L2 multicast address (wrong in the RHEL4 U4 case) 
from the stack, the RDMA CM has the multicast IP address and is able to 
compute the correct L2 address itself. This is confusing, I know...


From bugzilla-daemon at lists.openfabrics.org  Mon Feb 19 01:50:30 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Mon, 19 Feb 2007 01:50:30 -0800 (PST)
Subject: [openib-general] [Bug 371] New: IPoIB HA not working properly with
	OFED1.2-alpha
Message-ID: <bug-371-1@https.bugs.openfabrics.org/>

https://bugs.openfabrics.org/show_bug.cgi?id=371

           Summary: IPoIB HA not working properly with OFED1.2-alpha
           Product: OpenFabrics Linux
           Version: 1.2alpha1
          Platform: X86-64
        OS/Version: RHEL 4
            Status: NEW
          Severity: major
          Priority: P2
         Component: IPoIB
        AssignedTo: bugzilla at openib.org
        ReportedBy: karun.sharma at qlogic.com


I configured IPoIB HA with the OFED1.2-alpha release and it is not working
for me. I configured IPoIB HA on a RHEL4up4 machine with both ports up. Before
configuring IPoIB HA, both IB interfaces were able to ping the other machine.

Then I executed the ipoib_ha.pl script and configured ib0 as the primary and
ib1 as the secondary interface. The IP address of the ib1 interface was
removed, and up to this point things seemed to be working fine.

The problem starts when I pull the IB cable connecting port1. I can see the
ib0 interface going down and the ib1 interface taking the IP address of ib0,
but ping doesn't work after that. Even if I reinsert the cable in port1, ping
does not work. I have attached some logs below.

################################################################
[root@ss27 ~]# ibv_devinfo
hca_id: mthca0
        fw_ver:                         5.1.400
        node_guid:                      0006:6a00:9800:6b90
        sys_image_guid:                 0006:6a00:9800:6b90
        vendor_id:                      0x066a
        vendor_part_id:                 25218
        hw_ver:                         0xA0
        board_id:                       SS_0000000002
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                2048 (4)
                        active_mtu:             2048 (4)
                        sm_lid:                 6
                        port_lid:               2
                        port_lmc:               0x00
                port:   2
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                2048 (4)
                        active_mtu:             2048 (4)
                        sm_lid:                 6
                        port_lid:               3
                        port_lmc:               0x00
[root@ss27 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:A0:D1:E4:53:DA  
          inet addr:172.20.50.227  Bcast:172.20.50.255  Mask:255.255.255.0
          inet6 addr: fe80::2a0:d1ff:fee4:53da/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:125 errors:0 dropped:0 overruns:0 frame:0
          TX packets:115 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:17236 (16.8 KiB)  TX bytes:15347 (14.9 KiB)
          Interrupt:201 
ib0       Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  
          inet addr:172.20.51.227  Bcast:172.20.51.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:128 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
ib1       Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  
          inet addr:172.20.52.227  Bcast:172.20.52.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:128 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:1543 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1543 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:1648528 (1.5 MiB)  TX bytes:1648528 (1.5 MiB)
[root@ss27 ~]# ping 172.20.51.226 -c 1
PING 172.20.51.226 (172.20.51.226) 56(84) bytes of data.
64 bytes from 172.20.51.226: icmp_seq=0 ttl=64 time=1.44 ms
--- 172.20.51.226 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.442/1.442/1.442/0.000 ms, pipe 2
[root@ss27 ~]# ping 172.20.52.226 -c 1
PING 172.20.52.226 (172.20.52.226) 56(84) bytes of data.
64 bytes from 172.20.52.226: icmp_seq=0 ttl=64 time=1.67 ms
--- 172.20.52.226 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.671/1.671/1.671/0.000 ms, pipe 2
[root@ss27 ~]# 

[root@ss27 ~]# ipoib_ha.pl -p ib0 -s ib1 --with-arping -vv
get_cfg: Got /etc/sysconfig/network-scripts/ifcfg-ib0

Date:Mon Feb 19 02:32:22 2007
ib0:
======================================
BOOTPROTO = static
status = 
HA = 0
DEVICE = ib0
NETMASK = 255.255.255.0
BROADCAST = 172.20.51.255
IPADDR = 172.20.51.227
NETWORK = 172.20.51.0
ONBOOT = yes
pkey = ffff

Date:Mon Feb 19 02:32:22 2007
Bond:
======================================
BOOTPROTO = static
status = 
HA = 0
DEVICE = ib0
NETMASK = 255.255.255.0
BROADCAST = 172.20.51.255
IPADDR = 172.20.51.227
NETWORK = 172.20.51.0
ONBOOT = yes
pkey = ffff

Date:Mon Feb 19 02:32:23 2007
Got NO-CARRIER event on ib0.
Interface ib0 is down.
Currently Active : ib0
Other device: ib1 is UP
migrate_conf: Migrating from ib0 to ib1
Date:Mon Feb 19 02:33:37 2007
Date:Mon Feb 19 02:33:37 2007
set_up_bond: Going to set up ib1 with 172.20.51.227
set_up_bond: Arping ib1 172.20.51.227.
Got CARRIER-ON event on ib1.
Got CARRIER-ON event on ib1.
Got CARRIER-ON event on ib1.
Got NO-CARRIER event on ib0.
Interface ib0 is down.
Currently Active : ib1
Got CARRIER-ON event on ib1.
Got CARRIER-ON event on ib0.
Got CARRIER-ON event on ib0.
Got NO-CARRIER event on ib1.
Interface ib1 is down.
Currently Active : ib1
Other device: ib0 is UP
migrate_conf: Migrating from ib1 to ib0
Date:Mon Feb 19 02:35:48 2007
Date:Mon Feb 19 02:35:48 2007
set_up_bond: Going to set up ib0 with 172.20.51.227
set_up_bond: Arping ib0 172.20.51.227.
Got CARRIER-ON event on ib0.
Got CARRIER-ON event on ib0.
Got CARRIER-ON event on ib1.

[root@ss27 ~]# 
#######################################################


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From vlad at lists.openfabrics.org  Mon Feb 19 02:24:13 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Mon, 19 Feb 2007 02:24:13 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070219-0200 daily build status
Message-ID: <20070219102414.4122EE6080C@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod 

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.19
Passed on powerpc with linux-2.6.17
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.17
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.13
Passed on ppc64 with linux-2.6.19
Passed on powerpc with linux-2.6.15
Passed on ia64 with linux-2.6.19
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.15
Passed on powerpc with linux-2.6.12
Passed on ppc64 with linux-2.6.16
Passed on powerpc with linux-2.6.13
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.13
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.14
Passed on ppc64 with linux-2.6.12
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.13

Failed:


From grossmann at hlrs.de  Mon Feb 19 03:37:41 2007
From: grossmann at hlrs.de (Thomas =?iso-8859-1?q?Gro=DFmann?=)
Date: Mon, 19 Feb 2007 12:37:41 +0100
Subject: [openib-general] Problem with SRP with 512 byte sector size
 with > 2 TB LUNs
In-Reply-To: <adaabzpluix.fsf@cisco.com>
References: <200702071203.45309.grossmann@hlrs.de> <adaabzpluix.fsf@cisco.com>
Message-ID: <200702191237.42016.grossmann@hlrs.de>

Hello,

I also contacted DDN about that problem and am still waiting for 
a response. I cannot test this DDN target over fibre channel, because
it can only be connected over IB. I have the same impression: the DDN 
target somehow does not handle READ CAPACITY(16) properly. 

Best regards,
Thomas Großmann

On Wednesday 07 February 2007 18:58, you wrote:
>  > Is it possible to add LUNs with > 2 TB and 512 byte sectors ?
>  > Why does the READ CAPACITY(16) command fail ?
>
> It seems that the DDN target is not reporting good information -- I
> don't see anything obviously wrong in what the kernel is doing (now
> that SRP sends a READ CAPACITY command).  Do you know if the same type
> of config works over fibre channel?
>
>  - R.

-- 
 Thomas Großmann                
 High Performance Computing Center Stuttgart (HLRS)                                      
  
 Allmandring 30                                                
 70550 Stuttgart, Germany   

 E-Mail: grossmann at hlrs.de                                                              
 Phone: ++49-711-685-65529
 Fax  : ++49-711-685-65832
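
For reference, the arithmetic behind the 2 TB limit can be sketched as
follows. READ CAPACITY(10) reports the last LBA in a 32-bit field, and the
value 0xFFFFFFFF tells the initiator to retry with READ CAPACITY(16). The
function name needs_read_capacity_16 is hypothetical; this is a worked
example, not the kernel's code.

```c
#include <stdint.h>

/* Returns 1 when a LUN of capacity_bytes with the given block size cannot
 * be described by READ CAPACITY(10), i.e. its last LBA does not fit below
 * the 0xFFFFFFFF "use the 16-byte command" marker. Returns 0 otherwise. */
int needs_read_capacity_16(uint64_t capacity_bytes, uint32_t block_size)
{
    uint64_t blocks, last_lba;

    if (block_size == 0 || capacity_bytes < block_size)
        return 0;                 /* nothing meaningful to report */
    blocks = capacity_bytes / block_size;
    last_lba = blocks - 1;
    return last_lba >= 0xFFFFFFFFull;
}
```

With 512-byte sectors, 2 TiB is exactly 2^32 blocks, so the last LBA is
already 0xFFFFFFFF and the 16-byte command is needed. The same LUN size
with 4096-byte sectors still fits in the 10-byte command.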


From bugzilla-daemon at lists.openfabrics.org  Mon Feb 19 03:56:10 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Mon, 19 Feb 2007 03:56:10 -0800 (PST)
Subject: [openib-general] [Bug 371] IPoIB HA not working properly with
	OFED1.2-alpha
In-Reply-To: <bug-371-1@https.bugs.openfabrics.org/>
Message-ID: <20070219115610.33273E60810@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=371


------- Comment #1 from karun.sharma at qlogic.com  2007-02-19 03:56 -------
It is working fine on SLES10 systems.


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From monil at voltaire.com  Mon Feb 19 04:00:27 2007
From: monil at voltaire.com (Moni Levy)
Date: Mon, 19 Feb 2007 14:00:27 +0200
Subject: [openib-general] [PATCH] IB/ipoib: Fix ipoib handling for pkey
	reordering
Message-ID: <45D9915B.6070202@voltaire.com>

This issue was found during partitioning & SM fail over testing. The fix was tested for 24 hours with pkey reshuffling every few seconds. The patch applies to Roland's master branch.

SM reconfiguration or failover can shuffle the values in the port pkey table. The current implementation queries the index of the pkey only once, when it creates the device QP and moves it into the working state, and hence does not handle this scenario. Fix this by using the PKEY_CHANGE event as a trigger to reconfigure the device QP.

Signed-off-by: Moni Levy <monil at voltaire.com>
---
 ipoib.h       |    2 ++
 ipoib_ib.c    |   22 ++++++++++++++++++++--
 ipoib_main.c  |    1 +
 ipoib_verbs.c |    4 +++-
 4 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 07deee8..ed854e8 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -139,6 +139,7 @@ struct ipoib_dev_priv {
 	struct delayed_work pkey_task;
 	struct delayed_work mcast_task;
 	struct work_struct flush_task;
+	struct work_struct flush_restart_qp_task;
 	struct work_struct restart_task;
 	struct delayed_work ah_reap_task;
 
@@ -261,6 +262,7 @@ struct ipoib_dev_priv *ipoib_intf_alloc(
 
 int ipoib_ib_dev_init(struct net_device *dev, struct ib_device *ca, int port);
 void ipoib_ib_dev_flush(struct work_struct *work);
+void ipoib_ib_dev_flush_restart_qp(struct work_struct *work);
 void ipoib_ib_dev_cleanup(struct net_device *dev);
 
 int ipoib_ib_dev_open(struct net_device *dev);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 59d9594..5e2ada9 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -611,7 +611,7 @@ int ipoib_ib_dev_init(struct net_device 
 	return 0;
 }
 
-void ipoib_ib_dev_flush(struct work_struct *work)
+static void __ipoib_ib_dev_flush(struct work_struct *work, int restart_qp)
 {
 	struct ipoib_dev_priv *cpriv, *priv =
 		container_of(work, struct ipoib_dev_priv, flush_task);
@@ -630,6 +630,12 @@ void ipoib_ib_dev_flush(struct work_stru
 	ipoib_dbg(priv, "flushing\n");
 
 	ipoib_ib_dev_down(dev, 0);
+	
+	if (restart_qp) {
+		ipoib_dbg(priv, "restarting the device QP\n");
+		ipoib_ib_dev_stop(dev);
+		ipoib_ib_dev_open(dev);
+	}
 
 	/*
 	 * The device could have been brought down between the start and when
@@ -644,11 +650,23 @@ void ipoib_ib_dev_flush(struct work_stru
 
 	/* Flush any child interfaces too */
 	list_for_each_entry(cpriv, &priv->child_intfs, list)
-		ipoib_ib_dev_flush(&cpriv->flush_task);
+		__ipoib_ib_dev_flush(&cpriv->flush_task, restart_qp);
 
 	mutex_unlock(&priv->vlan_mutex);
 }
 
+void ipoib_ib_dev_flush(struct work_struct *work)
+{
+	/* Plain flush: the QP is restarted only on a PKEY change event */
+	__ipoib_ib_dev_flush(work, 0);
+}
+
+void ipoib_ib_dev_flush_restart_qp(struct work_struct *work)
+{
+	/* PKEY change event: flush and also restart the QP */
+	__ipoib_ib_dev_flush(work, 1);
+}
+
 void ipoib_ib_dev_cleanup(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 705eb1d..da46b79 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -942,6 +942,7 @@ static void ipoib_setup(struct net_devic
 	INIT_DELAYED_WORK(&priv->pkey_task,    ipoib_pkey_poll);
 	INIT_DELAYED_WORK(&priv->mcast_task,   ipoib_mcast_join_task);
 	INIT_WORK(&priv->flush_task,   ipoib_ib_dev_flush);
+	INIT_WORK(&priv->flush_restart_qp_task, ipoib_ib_dev_flush_restart_qp);
 	INIT_WORK(&priv->restart_task, ipoib_mcast_restart_task);
 	INIT_DELAYED_WORK(&priv->ah_reap_task, ipoib_reap_ah);
 }
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
index 7b717c6..c249915 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
@@ -252,12 +252,14 @@ void ipoib_event(struct ib_event_handler
 		container_of(handler, struct ipoib_dev_priv, event_handler);
 
 	if (record->event == IB_EVENT_PORT_ERR    ||
-	    record->event == IB_EVENT_PKEY_CHANGE ||
 	    record->event == IB_EVENT_PORT_ACTIVE ||
 	    record->event == IB_EVENT_LID_CHANGE  ||
 	    record->event == IB_EVENT_SM_CHANGE   ||
 	    record->event == IB_EVENT_CLIENT_REREGISTER) {
 		ipoib_dbg(priv, "Port state change event\n");
 		queue_work(ipoib_workqueue, &priv->flush_task);
+	} else if (record->event == IB_EVENT_PKEY_CHANGE) {
+		ipoib_dbg(priv, "PKEY change event\n");
+		queue_work(ipoib_workqueue, &priv->flush_restart_qp_task);
 	}
 }
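
The idea behind the patch above can be sketched outside the driver: the
QP is programmed with an index into the pkey table, so when the table is
reshuffled the index has to be looked up again. This is a minimal sketch
with hypothetical names (pkey_table, demo_qp, demo_qp_on_pkey_change); it
does not use any ipoib or verbs calls.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a port P_Key table. */
struct pkey_table {
    const uint16_t *entries;
    size_t len;
};

/* Hypothetical QP state: the hardware uses the index, not the value. */
struct demo_qp {
    uint16_t pkey;
    int pkey_index;
};

/* Return the index of pkey in the table, or -1 if it is absent. */
int find_pkey_index(const struct pkey_table *tbl, uint16_t pkey)
{
    size_t i;

    for (i = 0; i < tbl->len; i++)
        if (tbl->entries[i] == pkey)
            return (int)i;
    return -1;
}

/* Called on a (simulated) PKEY_CHANGE event: resolve the index again.
 * Returns 0 on success, -1 if the pkey is no longer in the table. */
int demo_qp_on_pkey_change(struct demo_qp *qp, const struct pkey_table *tbl)
{
    int idx = find_pkey_index(tbl, qp->pkey);

    if (idx < 0)
        return -1;
    qp->pkey_index = idx;
    return 0;
}
```

A QP that resolved its index only at creation time would keep pointing
at whatever value now occupies the old slot after a reshuffle.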


From monil at voltaire.com  Mon Feb 19 04:15:39 2007
From: monil at voltaire.com (Moni Levy)
Date: Mon, 19 Feb 2007 14:15:39 +0200
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to
 work with partial membership pkey
In-Reply-To: <Pine.LNX.4.64.0702190838050.26497@zuben>
References: <Pine.LNX.4.64.0702190838050.26497@zuben>
Message-ID: <6a122cc00702190415p7de43bam97348447d807ac1f@mail.gmail.com>

Or,
On 2/19/07, Or Gerlitz <ogerlitz at voltaire.com> wrote:
> Hi Sean,
>
> this fixes a bug which did not allow to run librdmacm apps over a node
> which is partial member of a partition. The patch takes the approach of the
> kernel ib_find_cached_pkey implementation.
>
> If you approve this, i suggest pushing it also into OFED 1.2 as a bug fix.
>
> Or.
>
> ----------------------------------------------------------------------
> The pkey extracted by the RDMA CM from the IPoIB device hardware address always
> has the full membership bit set. However, when looking in the pkey table the
> search must mask out the full membership bit.
>
> Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>
> Signed-off-by: Olga Shern <olgas at voltaire.com>
>
> diff --git a/src/cma.c b/src/cma.c
> index c5f8cd9..9c24c6a 100644
> --- a/src/cma.c
> +++ b/src/cma.c
> @@ -661,7 +661,7 @@ static int ucma_find_pkey(struct cma_dev
>
>         for (i = 0, ret = 0; !ret; i++) {
>                 ret = ibv_query_pkey(cma_dev->verbs, port_num, i, &chk_pkey);
> -               if (!ret && pkey == chk_pkey) {
> +               if ((!ret && pkey  == chk_pkey) || (!ret && htons(ntohs(pkey) & 0x7fff)  == chk_pkey)) {

What about just using:
    if (!ret && (pkey | htons(0x8000)) == (chk_pkey | htons(0x8000))) {

Even if not, there is no need to check ret twice in the limited
membership case.

-- Moni
>                         *pkey_index = (uint16_t) i;
>                         return 0;
>                 }
>
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>
>
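
As a sketch of the masked comparison discussed in this thread: in C, ==
binds tighter than |, so each masked operand needs its own parentheses,
and since ibv_query_pkey() returns the P_Key in network byte order it is
simplest to convert before masking. The helper name pkey_same_partition
is hypothetical, and the assumption that both values arrive in network
byte order is inferred from the htons()/ntohs() calls in the patch.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Both arguments are assumed to be in network byte order. In host order,
 * bit 15 is the full-membership bit; two P_Keys name the same partition
 * when their low 15 bits are equal. Returns 1 on a match, 0 otherwise. */
int pkey_same_partition(uint16_t pkey_be, uint16_t chk_pkey_be)
{
    return (ntohs(pkey_be) & 0x7fff) == (ntohs(chk_pkey_be) & 0x7fff);
}
```

This treats a full-member key such as 0xffff and its limited-member form
0x7fff as the same partition, with a single test of each table entry.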


From ossrosch at linux.vnet.ibm.com  Mon Feb 19 04:38:33 2007
From: ossrosch at linux.vnet.ibm.com (Stefan Roscher)
Date: Mon, 19 Feb 2007 13:38:33 +0100
Subject: [openib-general] 32-bit build for ppc64 is required
In-Reply-To: <OFFBAF19C0.B25E1F90-ONC1257283.006ACFE5-C1257283.006B231F@de.ibm.com>
References: <OFFBAF19C0.B25E1F90-ONC1257283.006ACFE5-C1257283.006B231F@de.ibm.com>
Message-ID: <200702191338.34623.ossrosch@linux.vnet.ibm.com>

On Thursday 15 February 2007 20:30, Hoang-Nam Nguyen wrote:
> > > Yuk.  I suppose I could write one, but I don't (and can't) use any of
> > > the OFED supplied build scripts in our build system, so it's hard for
> me
> > > to test since our build system is the only way I have to access
> > > ppc/ppc64 hardware.
> > Oh, well.
> > Other takers?
> OK, I've no choice to say no. Haven't look at the scripts yet. But will do
> in next couple of days!
> Nam
> 
> 
Hi,
Did I interpret the conclusion of this thread correctly?
Nam will create a patch against the OFED1.2 build
scripts, which provides 32 and 64 bit binaries for ppc.
Do you agree?
regards Stefan


From mplee at sandia.gov  Mon Feb 19 07:43:23 2007
From: mplee at sandia.gov (Lee, Michael Paichi)
Date: Mon, 19 Feb 2007 08:43:23 -0700
Subject: [openib-general] Address List Change for Friday, 2/23/2007
Message-ID: <3D84A59A1AD3584DA02AEAD240E8863F0366947F@ES22SNLNT.srn.sandia.gov>

We're in the process of migrating the maillists from the old openib.org server to the new lists.openfabrics.org machine.  The list openib-promoters will be moved this Friday, February 23, 2007.  The new address for the maillist will be general at lists.openfabrics.org.  

What this means is that messages will come from general at lists.openfabrics.org.  Conversely, replies should be made to this address as well.  Messages will also have a new subject line prefix of [OFA General].  If you have configured your e-mail client to filter based on maillist address or subject headers, you may need to make some adjustments for filtering.

However, for the sake of transition, messages sent to the previous maillist address on the old server will forward to the new server.  This forward will remain in place until the old server is taken offline and final DNS changes are made.  We expect the old server to go offline sometime in early March.

The web archives will also be migrated to the new web address shortly, http://lists.openfabrics.org.

If you have any questions, please don't hesitate to contact me at mplee at sandia.gov.

Regards,
Michael Lee

From mplee at sandia.gov  Mon Feb 19 08:01:21 2007
From: mplee at sandia.gov (Lee, Michael Paichi)
Date: Mon, 19 Feb 2007 09:01:21 -0700
Subject: [openib-general] Minor correction regarding list migration
Message-ID: <3D84A59A1AD3584DA02AEAD240E8863F03669480@ES22SNLNT.srn.sandia.gov>

Sorry for the follow-up, but I made a minor error in the previous e-mail.  The reference to "openib-promoters" should have been "openib-general."

So just to reiterate:

openib-general will become general at lists.openfabrics.org this Friday, 2/23/2007

Thanks,
Michael

From caitlinb at broadcom.com  Mon Feb 19 08:51:38 2007
From: caitlinb at broadcom.com (Caitlin Bestler)
Date: Mon, 19 Feb 2007 08:51:38 -0800
Subject: [openib-general] uDAPL: RDMA Write example
In-Reply-To: <45D8D619.9020904@lfbs.rwth-aachen.de>
Message-ID: <54AD0F12E08D1541B826BE97C98F99F101091E5E@NT-SJCA-0751.brcm.ad.broadcom.com>

openib-general-bounces at openib.org wrote:
> Hello,
> 
> I'm trying to find a small sample program, that uses RDMA
> Write instead of Send/Recv. In the sources there is no single
> uDAPL example program and on the net neither.
> Could someone please help me to find something useful?
> 
> Thanks!
> Christian
> 
With uDAPL, you don't use RDMA Write "instead of" Send/Recv, you use
it in addition to Send/Recv. The Send/Recv is still required for
synchronization.
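
The pattern can be modelled in plain C without any DAT calls. This is a
toy model only: model_rdma_write stands in for dat_ep_post_rdma_write(),
model_send_notify stands in for the Send that follows it, and all names
and the struct layout are hypothetical. It assumes the Send is delivered
after the written data has been placed.

```c
#include <stddef.h>
#include <string.h>

/* Toy peer: a remotely writable buffer plus a notification slot. */
struct peer {
    char rdma_buf[64];     /* stands in for a registered buffer */
    size_t notified_len;   /* set by the "Send"; 0 means no notice yet */
};

/* Stand-in for an RDMA Write: the data lands in the peer buffer, but the
 * peer gets no completion and cannot tell that anything happened. */
void model_rdma_write(struct peer *p, const char *data, size_t len)
{
    if (len > sizeof(p->rdma_buf))
        len = sizeof(p->rdma_buf);
    memcpy(p->rdma_buf, data, len);
}

/* Stand-in for the Send that follows: it completes a posted Recv on the
 * peer and tells it how many bytes in the buffer are valid. */
void model_send_notify(struct peer *p, size_t len)
{
    p->notified_len = len;
}

/* The peer reads the buffer only after the notification has arrived.
 * Returns the number of bytes copied out, or 0 if not yet notified. */
size_t model_peer_consume(const struct peer *p, char *out, size_t out_len)
{
    size_t n = p->notified_len;

    if (n == 0)
        return 0;
    if (n > sizeof(p->rdma_buf))
        n = sizeof(p->rdma_buf);
    if (n > out_len)
        n = out_len;
    memcpy(out, p->rdma_buf, n);
    return n;
}
```

The Send carries almost no payload; its job is to give the receiving
side a completion it can wait on before touching the written data.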


From afriedle at open-mpi.org  Mon Feb 19 08:58:26 2007
From: afriedle at open-mpi.org (Andrew Friedley)
Date: Mon, 19 Feb 2007 11:58:26 -0500
Subject: [openib-general] OFA 1.2 tarball creation
Message-ID: <45D9D732.8070100@open-mpi.org>

How exactly is various developers' source code pulled together to create 
the nightly OFA tarballs at www.openfabrics.org/builds (could this be 
put on the wiki somewhere?)?  I went looking to see if some of Sean's 
work on RDMA CM had made it into these tarballs, and am not seeing code 
with the patches I'm looking for.

The exact patch I'm after was 'rdma_cm: allow joins to return a unique 
address'.  I remember seeing this patch on the ofed_1_2 branch in Sean's 
rdma-dev git repository about two weeks ago, though I don't see the 
ofed_1_2 branch anymore (the patch does exist on the multicast branch). 
  Sean, was this patch supposed to make it to the nightly 1.2 tarballs?

I'm trying to avoid having to figure out how to build source from git 
into suitable RPM's for RHEL4; documentation on that would be great too :)

Andrew


From arlin.r.davis at intel.com  Mon Feb 19 10:32:27 2007
From: arlin.r.davis at intel.com (Arlin Davis)
Date: Mon, 19 Feb 2007 10:32:27 -0800
Subject: [openib-general] Fork issues with simple MPI program
Message-ID: <000001c75454$523660f0$eed4180a@amr.corp.intel.com>


We are seeing some fork issues with a simple MPI program (attached) running on 2.6.16+ kernels and
OFED 1.1. We have tried both Intel MPI and mvapich2 with the same results:

t_fork> mpiexec -n 2 t_system_fork                                                            
parent process
[0] started child process with pid=31552
send desc error
parent process
[0] Abort: [] Got completion with error 1, vendor code=69, dest rank=1
 at line 540 in file ibv_channel_manager.c
[1] I am child process with pid=25437
[1] started child process with pid=25437
[0] I am child process with pid=31552
child process
[1] finished pid=25437
child process
[0] finished pid=31552

rank 0 in job 2  svlmpicl400_32925   caused collective abort of all ranks
  exit status of rank 0: return code 252

If you run mvapich2 for uDAPL, it hangs before the second MPI_Barrier(), just like Intel MPI. If you use
the I_MPI_RDMA_USE_EVD_FALLBACK=1 option with Intel MPI, you get the following error, similar to
mvapich2:

parent process
parent process
[0] I am child process with pid=9596
[0] started child process with pid=9596
[1] I am child process with pid=11477
[1] started child process with pid=11477
[0][rdma_iba.c:1007] Intel MPI fatal error: DTO operation completed with error. status=0x2. cookie=0x1
[1][rdma_iba.c:1007] Intel MPI fatal error: DTO operation completed with error. status=0x2. cookie=0x1
child process
[1] finished pid=11477
child process
[0] finished pid=9596
rank 0 in job 8  cst-19_54707   caused collective abort of all ranks
  exit status of rank 0: return code 255

Any insight would be greatly appreciated. It was our assumption that the parent process can continue
to use IB resources after the fixes went into 2.6.16 and OFED 1.1. Is this true? 

Thanks,

-arlin
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: t_system_fork.c
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070219/96dabe76/attachment.c>

From kaiser at lfbs.RWTH-Aachen.DE  Mon Feb 19 10:50:33 2007
From: kaiser at lfbs.RWTH-Aachen.DE (Christian Kaiser)
Date: Mon, 19 Feb 2007 19:50:33 +0100
Subject: [openib-general] uDAPL: RDMA Write example
In-Reply-To: <45D931CF.4060601@voltaire.com>
References: <45D8D619.9020904@lfbs.rwth-aachen.de>
	<45D931CF.4060601@voltaire.com>
Message-ID: <45D9F179.7010103@lfbs.rwth-aachen.de>

We are working on support for a new uDAPL provider, so we don't use 
verbs and rdmacm.

Your example is OK but not what I was looking for. I am searching for a 
really small program that does RDMA with uDAPL in a few lines (I know 
a few lines is impossible, but a few hundred lines would do). The dapltest 
suite is not really small.

Christian

Or Gerlitz schrieb:
> Christian Kaiser wrote:
>   
>> I'm trying to find a small sample program, that uses RDMA Write instead 
>> of Send/Recv. In the sources there is no single uDAPL example program 
>> and on the net neither.
>> Could someone please help me to find something useful?
>>     
>
> see http://dapl.svn.sourceforge.net/viewvc/dapl/trunk/test/dapltest
>
> Anyway, can you comment what using udapl buys you which you don't get 
> from coding to the verbs (libibverbs) and rdmacm (librdmacm) ???
>
> Or.


From kaiser at lfbs.RWTH-Aachen.DE  Mon Feb 19 10:58:09 2007
From: kaiser at lfbs.RWTH-Aachen.DE (Christian Kaiser)
Date: Mon, 19 Feb 2007 19:58:09 +0100
Subject: [openib-general] uDAPL: RDMA Write example
In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F101091E5E@NT-SJCA-0751.brcm.ad.broadcom.com>
References: <54AD0F12E08D1541B826BE97C98F99F101091E5E@NT-SJCA-0751.brcm.ad.broadcom.com>
Message-ID: <45D9F341.7010001@lfbs.rwth-aachen.de>

Caitlin Bestler schrieb:
> openib-general-bounces at openib.org wrote:
>   
>> Hello,
>>
>> I'm trying to find a small sample program, that uses RDMA
>> Write instead of Send/Recv. In the sources there is no single
>> uDAPL example program and on the net neither.
>> Could someone please help me to find something useful?
>>
>> Thanks!
>> Christian
>>
>>     
> With uDAPL, you don't use RDMA Write "instead of" Send/Recv, you use
> it in addition to Send/Recv. The Send/Recv is still required for
> synchronization.
>
>   
So you put a Send/Recv before and after the dat_ep_post_rdma_write()?
I tried it once with a zero-byte Send/Recv, but I had the impression that 
it doesn't work. Do I have to do a one-byte Send/Recv instead?


From greg.lindahl at qlogic.com  Mon Feb 19 11:28:08 2007
From: greg.lindahl at qlogic.com (Greg Lindahl)
Date: Mon, 19 Feb 2007 11:28:08 -0800
Subject: [openib-general] Address List Change for Friday, 2/23/2007
In-Reply-To: <3D84A59A1AD3584DA02AEAD240E8863F0366947F@ES22SNLNT.srn.sandia.gov>
References: <3D84A59A1AD3584DA02AEAD240E8863F0366947F@ES22SNLNT.srn.sandia.gov>
Message-ID: <20070219192808.GA6801@localhost.localdomain>

I see that the EWG list is now calling itself the Engineering Working
Group, has it been renamed from the Enterprise Working Group? If so,
did the nature of the list change? Or was it a typo?

-- greg


From jsquyres at cisco.com  Mon Feb 19 12:06:15 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Mon, 19 Feb 2007 15:06:15 -0500
Subject: [openib-general] [ewg] Re:  Address List Change for Friday,
 2/23/2007
In-Reply-To: <20070219192808.GA6801@localhost.localdomain>
References: <3D84A59A1AD3584DA02AEAD240E8863F0366947F@ES22SNLNT.srn.sandia.gov>
	<20070219192808.GA6801@localhost.localdomain>
Message-ID: <C562A057-FAF1-4F0B-8268-03195A58F380@cisco.com>

Heh.  Probably a typo in the transition to the new server.

Michael -- can you fix?


On Feb 19, 2007, at 2:28 PM, Greg Lindahl wrote:

> I see that the EWG list is now calling itself the Engineering Working
> Group, has it been renamed from the Enterprise Working Group? If so,
> did the nature of the list change? Or was it a typo?
>
> -- greg
>
> _______________________________________________
> ewg mailing list
> ewg at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


From mplee at sandia.gov  Mon Feb 19 12:08:09 2007
From: mplee at sandia.gov (Lee, Michael Paichi)
Date: Mon, 19 Feb 2007 13:08:09 -0700
Subject: [openib-general] [ewg] Re:  Address List Change for Friday,
	2/23/2007
References: <3D84A59A1AD3584DA02AEAD240E8863F0366947F@ES22SNLNT.srn.sandia.gov>
	<20070219192808.GA6801@localhost.localdomain>
Message-ID: <3D84A59A1AD3584DA02AEAD240E8863F03669483@ES22SNLNT.srn.sandia.gov>

Greg,

Yes, it was a typo.  It's been taken care of now.

Michael


-----Original Message-----
From: ewg-bounces at lists.openfabrics.org on behalf of Greg Lindahl
Sent: Mon 2/19/2007 11:28 AM
To: openib-general at openib.org; ewg at lists.openfabrics.org
Subject: [ewg] Re: [openib-general] Address List Change for Friday, 2/23/2007
 
I see that the EWG list is now calling itself the Engineering Working
Group, has it been renamed from the Enterprise Working Group? If so,
did the nature of the list change? Or was it a typo?

-- greg




From scarter at ornl.gov  Mon Feb 19 12:53:36 2007
From: scarter at ornl.gov (Steven Carter)
Date: Mon, 19 Feb 2007 15:53:36 -0500
Subject: [openib-general] Port error rate detection
Message-ID: <45DA0E50.7010002@ornl.gov>

I have a Nagios module that alerts on connectivity, port errors, and 
speed/width problems.  I would like to give it the ability to change the 
severity of the alert depending on whether errors are merely present or 
are increasing faster than a specified rate.  The intent is to 
equip the module to keep the state of the last query and possibly some 
history, but I wanted to make sure that I was not re-inventing the wheel 
first.  Is there an attribute or utility that I am overlooking that will 
help me do this?

Thanks,

Steven.
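
The state-keeping approach described above can be sketched as follows.
This is only an illustration of the idea, not an existing utility: the
names err_sample and classify_port, the three severity levels, and the
treatment of a counter that went backwards as a reset are all
assumptions.

```c
#include <stdint.h>

enum severity { SEV_OK, SEV_WARNING, SEV_CRITICAL };

/* Hypothetical saved state from one poll of a port error counter. */
struct err_sample {
    uint64_t count;      /* error counter value */
    uint64_t time_sec;   /* when it was read, in seconds */
};

/* Classify a port from two successive readings:
 *   no errors at all                -> SEV_OK
 *   errors present, slow/no growth  -> SEV_WARNING
 *   growth above max_per_sec        -> SEV_CRITICAL
 * A counter lower than the previous reading is treated as a reset, so
 * the current value is used as the increase. */
enum severity classify_port(const struct err_sample *prev,
                            const struct err_sample *cur,
                            double max_per_sec)
{
    uint64_t delta, dt;

    if (cur->count == 0)
        return SEV_OK;
    if (cur->time_sec <= prev->time_sec)
        return SEV_WARNING;   /* no usable interval; errors are present */
    dt = cur->time_sec - prev->time_sec;
    delta = cur->count >= prev->count ? cur->count - prev->count
                                      : cur->count;
    if ((double)delta / (double)dt > max_per_sec)
        return SEV_CRITICAL;
    return SEV_WARNING;
}
```

Only the previous sample has to be stored between polls; a longer
history would be needed only to smooth out short bursts.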


From sashak at voltaire.com  Mon Feb 19 13:46:30 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Mon, 19 Feb 2007 23:46:30 +0200
Subject: [openib-general] [PATCH] osm_vendor_ibumad: termination crash fix
Message-ID: <20070219214630.GW27414@sashak.voltaire.com>


When OpenSM is terminated, the umad_receiver thread is still running even
after the structures are destroyed and freed, which causes random (but easily
reproducible) crashes. The reason is that osm_vendor_delete() does not
take care of thread termination. This patch adds receiver thread
cancellation (using pthread_cancel() and pthread_join()) and takes care to
leave all mutexes unlocked upon termination. There is also a minor
consolidation of the termination code: the osm_vendor_close_port() function.
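
The shutdown order described here can be sketched in isolation: cancel
the receiver, join it, and only then destroy what it was using, with a
cleanup handler so that a cancelled thread never leaves the mutex locked.
This is a minimal sketch, not the OpenSM code; demo_vendor, demo_receiver
and the other names are hypothetical.

```c
#include <pthread.h>
#include <sched.h>

/* Hypothetical stand-in for the vendor object and its receiver thread. */
struct demo_vendor {
    pthread_mutex_t cb_mutex;
    pthread_t tid;
    unsigned long polls;
};

static void unlock_mutex(void *arg)
{
    pthread_mutex_unlock(arg);
}

static void *demo_receiver(void *arg)
{
    struct demo_vendor *v = arg;

    for (;;) {
        pthread_mutex_lock(&v->cb_mutex);
        /* If cancellation is acted on while the lock is held, the
         * cleanup handler releases it. */
        pthread_cleanup_push(unlock_mutex, &v->cb_mutex);
        v->polls++;
        pthread_testcancel();      /* explicit cancellation point */
        pthread_cleanup_pop(1);    /* normal path: unlock the mutex */
        sched_yield();
    }
    return NULL;
}

/* Returns 0 on success, -1 on failure. */
int demo_vendor_start(struct demo_vendor *v)
{
    v->polls = 0;
    if (pthread_mutex_init(&v->cb_mutex, NULL))
        return -1;
    if (pthread_create(&v->tid, NULL, demo_receiver, v)) {
        pthread_mutex_destroy(&v->cb_mutex);
        return -1;
    }
    return 0;
}

/* Stop the receiver first, then tear down what it was using. Returns 0
 * if the thread ended by cancellation and left the mutex unlocked. */
int demo_vendor_stop(struct demo_vendor *v)
{
    void *res;

    pthread_cancel(v->tid);
    if (pthread_join(v->tid, &res) || res != PTHREAD_CANCELED)
        return -1;
    if (pthread_mutex_trylock(&v->cb_mutex))
        return -1;                 /* still locked: cleanup did not run */
    pthread_mutex_unlock(&v->cb_mutex);
    pthread_mutex_destroy(&v->cb_mutex);
    return 0;
}
```

Joining before freeing is what closes the window in which the receiver
could still touch freed structures.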

Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
---
 osm/include/vendor/osm_vendor_ibumad.h |    6 +-
 osm/libvendor/osm_vendor_ibumad.c      |  157 +++++++++++++++-----------------
 osm/libvendor/osm_vendor_ibumad_sa.c   |    3 +-
 3 files changed, 77 insertions(+), 89 deletions(-)

diff --git a/osm/include/vendor/osm_vendor_ibumad.h b/osm/include/vendor/osm_vendor_ibumad.h
index 4cbd59f..f6e3d69 100644
--- a/osm/include/vendor/osm_vendor_ibumad.h
+++ b/osm/include/vendor/osm_vendor_ibumad.h
@@ -39,7 +39,6 @@
 
 #include <iba/ib_types.h>
 #include <complib/cl_qlist.h>
-#include <complib/cl_thread.h>
 #include <opensm/osm_base.h>
 #include <opensm/osm_log.h>
 
@@ -87,7 +86,6 @@ typedef struct _osm_ca_info
 	ib_net64_t		guid;
 	size_t			attr_size;
 	ib_ca_attr_t		*p_attr;
-
 } osm_ca_info_t;
 /*
 * FIELDS
@@ -170,8 +168,8 @@ typedef	struct _osm_vendor
 	vendor_match_tbl_t mtbl;
 	umad_ca_t umad_ca;
 	umad_port_t umad_port;
-	cl_spinlock_t cb_lock;
-	cl_spinlock_t match_tbl_lock;
+	pthread_mutex_t cb_mutex;
+	pthread_mutex_t match_tbl_mutex;
 	int umad_port_id;
 	void *receiver;
 	int issmfd;
diff --git a/osm/libvendor/osm_vendor_ibumad.c b/osm/libvendor/osm_vendor_ibumad.c
index 35f127a..7320738 100644
--- a/osm/libvendor/osm_vendor_ibumad.c
+++ b/osm/libvendor/osm_vendor_ibumad.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2004-2007 Voltaire, Inc. All rights reserved.
  * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
  * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
  *
@@ -58,13 +58,11 @@
 
 #include <unistd.h>
 #include <stdlib.h>
-#include <signal.h>
 #include <fcntl.h>
 #include <errno.h>
 
 #include <iba/ib_types.h>
 #include <complib/cl_qlist.h>
-#include <complib/cl_thread.h>
 #include <complib/cl_math.h>
 #include <complib/cl_debug.h>
 #include <opensm/osm_madw.h>
@@ -97,12 +95,13 @@ typedef struct _osm_umad_bind_info
 
 typedef struct _umad_receiver
 {
-	cl_event_t		signal;
-	cl_thread_t		receiver;
-	osm_vendor_t		*p_vend;
-	osm_log_t		*p_log;
+	pthread_t tid;
+	osm_vendor_t	*p_vend;
+	osm_log_t	*p_log;
 } umad_receiver_t;
 
+static void osm_vendor_close_port(osm_vendor_t* const p_vend);
+
 static void
 clear_madw(osm_vendor_t *p_vend)
 {
@@ -110,7 +109,7 @@ clear_madw(osm_vendor_t *p_vend)
 	ib_net64_t old_tid;
 
 	OSM_LOG_ENTER( p_vend->p_log, clear_madw );
-	cl_spinlock_acquire( &p_vend->match_tbl_lock );
+	pthread_mutex_lock(&p_vend->match_tbl_mutex);
 	for (m = p_vend->mtbl.tbl, e = m + p_vend->mtbl.max; m < e; m++) {
 		if (m->tid) {
 			old_m = m;
@@ -119,7 +118,7 @@ clear_madw(osm_vendor_t *p_vend)
         		osm_mad_pool_put(
 				((osm_umad_bind_info_t *)((osm_madw_t *)m->v)->h_bind)->p_mad_pool,
 				m->v);
-			cl_spinlock_release( &p_vend->match_tbl_lock );
+			pthread_mutex_unlock(&p_vend->match_tbl_mutex);
 			osm_log(p_vend->p_log, OSM_LOG_ERROR,
 				"clear_madw: ERR 5401: "
 				"evicting entry %p (tid was 0x%"PRIx64")\n",
@@ -127,7 +126,7 @@ clear_madw(osm_vendor_t *p_vend)
 			goto Exit;
 		}
 	}
-	cl_spinlock_release( &p_vend->match_tbl_lock );
+	pthread_mutex_unlock(&p_vend->match_tbl_mutex);
 
 Exit:
 	OSM_LOG_EXIT( p_vend->p_log );
@@ -147,18 +146,18 @@ get_madw(osm_vendor_t *p_vend, ib_net64_t *tid)
 	if (mtid == 0)
 		return 0;
 
-	cl_spinlock_acquire( &p_vend->match_tbl_lock );
+	pthread_mutex_lock(&p_vend->match_tbl_mutex);
 	for (m = p_vend->mtbl.tbl, e = m + p_vend->mtbl.max; m < e; m++) {
 		if (m->tid == mtid) {
 			m->tid = 0;
 			*tid = mtid;
 			res = m->v;
-			cl_spinlock_release( &p_vend->match_tbl_lock );
+			pthread_mutex_unlock(&p_vend->match_tbl_mutex);
 			return res;
 		}
 	}
 
-	cl_spinlock_release( &p_vend->match_tbl_lock );
+	pthread_mutex_unlock(&p_vend->match_tbl_mutex);
 	return 0;
 }
 
@@ -171,13 +170,13 @@ put_madw(osm_vendor_t *p_vend, osm_madw_t *p_madw, ib_net64_t *tid)
 	ib_net64_t old_tid;
 	uint32_t oldest = ~0;
 
-	cl_spinlock_acquire( &p_vend->match_tbl_lock );
+	pthread_mutex_lock(&p_vend->match_tbl_mutex);
 	for (m = p_vend->mtbl.tbl, e = m + p_vend->mtbl.max; m < e; m++) {
 		if (m->tid == 0) {
 			m->tid = *tid;
 			m->v = p_madw;
 			m->version = cl_atomic_inc((atomic32_t *)&p_vend->mtbl.last_version);
-			cl_spinlock_release( &p_vend->match_tbl_lock );
+			pthread_mutex_unlock(&p_vend->match_tbl_mutex);
 			return;
 		}
 		if (oldest > m->version) {
@@ -191,13 +190,13 @@ put_madw(osm_vendor_t *p_vend, osm_madw_t *p_madw, ib_net64_t *tid)
 	p_req_madw = old_lru->v;
 	p_bind = p_req_madw->h_bind;
 	p_req_madw->status = IB_CANCELED;
-	cl_spinlock_acquire( &p_vend->cb_lock );
+	pthread_mutex_lock(&p_vend->cb_mutex);
 	(*p_bind->send_err_callback)(p_bind->client_context, old_lru->v);
-	cl_spinlock_release( &p_vend->cb_lock );
+	pthread_mutex_unlock(&p_vend->cb_mutex);
 	lru->tid = *tid;
 	lru->v = p_madw;
 	lru->version = cl_atomic_inc((atomic32_t *)&p_vend->mtbl.last_version);
-	cl_spinlock_release( &p_vend->match_tbl_lock );
+	pthread_mutex_unlock(&p_vend->match_tbl_mutex);
 	osm_log(p_vend->p_log, OSM_LOG_ERROR,
 		"put_madw: ERR 5402: "
 		"evicting entry %p (tid was 0x%"PRIx64")\n", old_lru, old_tid);
@@ -237,7 +236,12 @@ swap_mad_bufs(osm_madw_t *p_madw, void *umad)
 	return old;
 }
 
-void
+static void unlock_mutex(void *arg)
+{
+	pthread_mutex_unlock(arg);
+}
+
+void *
 umad_receiver(void *p_ptr)
 {
 	umad_receiver_t* const p_ur = (umad_receiver_t *)p_ptr;
@@ -356,9 +360,10 @@ umad_receiver(void *p_ptr)
 			} else {
 				p_req_madw->status = IB_TIMEOUT;
 				/* cb frees req_madw */
-				cl_spinlock_acquire( &p_vend->cb_lock );
+				pthread_mutex_lock(&p_vend->cb_mutex);
+				pthread_cleanup_push(unlock_mutex, &p_vend->cb_mutex);
 				(*p_bind->send_err_callback)(p_bind->client_context, p_req_madw);
-				cl_spinlock_release( &p_vend->cb_lock );
+				pthread_cleanup_pop(1);
 			}
 
 			osm_mad_pool_put(p_bind->p_mad_pool, p_madw);
@@ -398,47 +403,37 @@ umad_receiver(void *p_ptr)
 #endif
 
 		/* call the CB */
-		cl_spinlock_acquire( &p_vend->cb_lock );
+		pthread_mutex_lock(&p_vend->cb_mutex);
+		pthread_cleanup_push(unlock_mutex, &p_vend->cb_mutex);
 		(*p_bind->mad_recv_callback)(p_madw, p_bind->client_context, p_req_madw);
-		cl_spinlock_release( &p_vend->cb_lock );
+		pthread_cleanup_pop(1);
 	}
 
 	OSM_LOG_EXIT( p_vend->p_log );
-	return;
+	return NULL;
 }
 
-static int
-umad_receiver_init(osm_vendor_t *p_vend)
+static int umad_receiver_start(osm_vendor_t *p_vend)
 {
 	umad_receiver_t *p_ur = p_vend->receiver;
-	int r = -1;
-
-	OSM_LOG_ENTER( p_vend->p_log, umad_receiver_init );
 
 	p_ur->p_vend = p_vend;
 	p_ur->p_log = p_vend->p_log;
 
-	cl_event_construct(&p_ur->signal);
-	cl_thread_construct(&p_ur->receiver);
-
-  	if (cl_event_init(&p_ur->signal, FALSE))
-		goto Exit;
-
-	/*
-	 * Initialize the thread after all other dependent objects
-	 * have been initialized.
-	 */
-	if (cl_thread_init( &p_ur->receiver, umad_receiver, p_ur,
-			    "umad receiver" ))
-	    goto Exit;
-
-	r = 0;	/* success */
+	if (pthread_create(&p_ur->tid, NULL, umad_receiver, p_ur) != 0)
+		return -1;
 
-Exit:
-	OSM_LOG_EXIT( p_vend->p_log );
-	return r;
+	return 0;
 }
 
+static void umad_receiver_stop(umad_receiver_t *p_ur)
+{
+	pthread_cancel(p_ur->tid);
+	pthread_join(p_ur->tid, NULL);
+	p_ur->tid = 0;
+	p_ur->p_vend = NULL;
+	p_ur->p_log = NULL;
+}
 /**********************************************************************
  **********************************************************************/
 ib_api_status_t
@@ -454,23 +449,11 @@ osm_vendor_init(
 	p_vend->p_log = p_log;
 	p_vend->timeout = timeout;
 	p_vend->max_retries = OSM_DEFAULT_RETRY_COUNT;
-	cl_spinlock_construct( &p_vend->cb_lock );
-	cl_spinlock_construct( &p_vend->match_tbl_lock );
+	pthread_mutex_init(&p_vend->cb_mutex, NULL);
+	pthread_mutex_init(&p_vend->match_tbl_mutex, NULL);
 	p_vend->umad_port_id = -1;
 	p_vend->issmfd = -1;
 
-	if ((r = cl_spinlock_init( &p_vend->cb_lock ))) {
-		osm_log(p_vend->p_log, OSM_LOG_ERROR,
-			"osm_vendor_init: ERR 5435: Error initializing cb spinlock\n");
-		goto Exit;
-	}
-
-	if ((r = cl_spinlock_init( &p_vend->match_tbl_lock ))) {
-		osm_log(p_vend->p_log, OSM_LOG_ERROR,
-			"osm_vendor_init: ERR 5434: Error initializing match tbl spinlock\n");
-		goto Exit;
-	}
-	
 	/*
 	 * Open our instance of UMAD.
 	 */
@@ -541,29 +524,14 @@ void
 osm_vendor_delete(
   IN osm_vendor_t** const pp_vend )
 {
-	umad_receiver_t *p_ur;
-	int agent_id;
-
-	if ((*pp_vend)->umad_port_id >= 0) {
-		/* unregister UMAD agents */
-		for (agent_id = 0; agent_id < UMAD_CA_MAX_AGENTS; agent_id++)
-			if ( (*pp_vend)->agents[agent_id] )
-				umad_unregister((*pp_vend)->umad_port_id,
-						agent_id );
-		umad_close_port((*pp_vend)->umad_port_id);
-		(*pp_vend)->umad_port_id = -1;
-	}
+	osm_vendor_close_port(*pp_vend);
 
 	clear_madw( *pp_vend );
 	/* make sure all ports are closed */
 	umad_done();
 
-	/* umad receiver thread ? */
-	p_ur = (*pp_vend)->receiver;
-	if (p_ur)
-		cl_event_destroy( &p_ur->signal );
-	cl_spinlock_destroy( &(*pp_vend)->cb_lock );
-	cl_spinlock_destroy( &(*pp_vend)->match_tbl_lock );
+	pthread_mutex_destroy(&(*pp_vend)->cb_mutex);
+	pthread_mutex_destroy(&(*pp_vend)->match_tbl_mutex);
 	free( *pp_vend );
 	*pp_vend = NULL;
 }
@@ -780,7 +748,7 @@ osm_vendor_open_port(
 		p_vend->umad_port_id = umad_port_id = -1;
 		goto Exit;
 	}
-	if (umad_receiver_init(p_vend) != 0) {
+	if (umad_receiver_start(p_vend) != 0) {
 		osm_log( p_vend->p_log, OSM_LOG_ERROR,
 			 "osm_vendor_open_port: ERR 5420: "
 			 "umad_receiver_init failed\n" );
@@ -793,6 +761,27 @@ Exit:
 	return umad_port_id;
 }
 
+static void osm_vendor_close_port(osm_vendor_t* const p_vend)
+{
+	umad_receiver_t *p_ur;
+	int i;
+
+	p_ur = p_vend->receiver;
+	p_vend->receiver = NULL;
+	if (p_ur) {
+		umad_receiver_stop(p_ur);
+		free(p_ur);
+	}
+
+	if (p_vend->umad_port_id >= 0) {
+		for (i = 0; i < UMAD_CA_MAX_AGENTS; i++)
+			if (p_vend->agents[i])
+				umad_unregister(p_vend->umad_port_id, i);
+		umad_close_port(p_vend->umad_port_id);
+		p_vend->umad_port_id = -1;
+	}
+}
+
 static int set_bit(int nr, void *method_mask)
 {
 	int mask, retval;
@@ -985,10 +974,10 @@ osm_vendor_unbind(
 
 	OSM_LOG_ENTER( p_vend->p_log, osm_vendor_unbind );
 
-	cl_spinlock_acquire( &p_vend->cb_lock );
+	pthread_mutex_lock(&p_vend->cb_mutex);
 	p_bind->mad_recv_callback = __osm_vendor_recv_dummy_cb;
 	p_bind->send_err_callback = __osm_vendor_send_err_dummy_cb;
-	cl_spinlock_release( &p_vend->cb_lock );
+	pthread_mutex_unlock(&p_vend->cb_mutex);
 
 	OSM_LOG_EXIT( p_vend->p_log);
 }
@@ -1154,9 +1143,9 @@ Resp:
 			"Send p_madw = %p of size %d failed %d (%m)\n", 
 			p_madw, sent_mad_size, ret);
 		p_madw->status = IB_ERROR;
-		cl_spinlock_acquire( &p_vend->cb_lock );
+		pthread_mutex_lock(&p_vend->cb_mutex);
 		(*p_bind->send_err_callback)(p_bind->client_context, p_madw);	/* cb frees madw */
-		cl_spinlock_release( &p_vend->cb_lock );
+		pthread_mutex_unlock(&p_vend->cb_mutex);
 		goto Exit;
 	}
 
diff --git a/osm/libvendor/osm_vendor_ibumad_sa.c b/osm/libvendor/osm_vendor_ibumad_sa.c
index e3978ef..a110e81 100644
--- a/osm/libvendor/osm_vendor_ibumad_sa.c
+++ b/osm/libvendor/osm_vendor_ibumad_sa.c
@@ -39,9 +39,10 @@
 
 #include <stdlib.h>
 #include <string.h>
+#include <sys/time.h>
 #include <vendor/osm_vendor_api.h>
 #include <vendor/osm_vendor_sa_api.h>
-#include <sys/time.h>
+#include <complib/cl_event.h>
 
 #define MAX_PORTS 64
 
-- 
1.5.0.1.40.gb40d


From sashak at voltaire.com  Mon Feb 19 13:55:39 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Mon, 19 Feb 2007 23:55:39 +0200
Subject: [openib-general] Fwd: [ANNOUNCE] GIT 1.5.0
In-Reply-To: <20070218155006.GS27414@sashak.voltaire.com>
References: <20070215071537.GD11866@mellanox.co.il>
	<20070218155006.GS27414@sashak.voltaire.com>
Message-ID: <20070219215539.GX27414@sashak.voltaire.com>

On 17:50 Sun 18 Feb     , Sasha Khapyorsky wrote:
> On 09:15 Thu 15 Feb     , Michael S. Tsirkin wrote:
> > FYI.
> > I suggest we update git on the openfabrics server to 1.5.0:
> > "Detached HEAD" feature will be useful for nightly build scripts.
> > Sasha?
> 
> The git-1.5.0 feature list looks fine to me. But let's wait a couple of
> days for 1.5.0.1 before upgrading.

Upgraded to git-1.5.0.1.

Sasha


From sean.hefty at intel.com  Mon Feb 19 14:38:55 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Mon, 19 Feb 2007 14:38:55 -0800
Subject: [openib-general] OFA 1.2 tarball creation
In-Reply-To: <45D9D732.8070100@open-mpi.org>
Message-ID: <000001c75476$bcf2eea0$6dcd180a@amr.corp.intel.com>

>How exactly is various developers' source code pulled together to create
>the nightly OFA tarballs at www.openfabrics.org/builds (could this be
>put on the wiki somewhere?)?  I went looking to see if some of Sean's
>work on RDMA CM had made it into these tarballs, and am not seeing code
>with the patches I'm looking for.

I do not know how OFED creates their tarballs or manages their source.

>The exact patch I'm after was 'rdma_cm: allow joins to return a unique
>address'.  I remember seeing this patch on the ofed_1_2 branch in Sean's
>rdma-dev git repository about two weeks ago, though I don't see the
>ofed_1_2 branch anymore (the patch does exist on the multicast branch).
>  Sean, was this patch supposed to make it to the nightly 1.2 tarballs?

Assuming that ~vlad/ofed_1_2.git is the OFED kernel tree, then this patch does
not appear to be included.  I was asked by OFED to publish an ofed_1_2 branch,
which I did, but I do not know if it was used in constructing the OFED tree.
The patch was intended to go into OFED.

- Sean


From sashak at voltaire.com  Mon Feb 19 15:01:39 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Tue, 20 Feb 2007 01:01:39 +0200
Subject: [openib-general] [PATCH] complib: thread_pool rework
Message-ID: <20070219230139.GZ27414@sashak.voltaire.com>


This reworks complib's thread_pool implementation (used by the opensm
dispatcher). It prevents merging of event signals and termination races,
eliminates the use of the broken cl_atomic stuff, and reduces memory
allocations and code complexity.
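
The core idea, counting events under a mutex so that back-to-back
signals are not merged, can be sketched as follows. This is a
standalone illustration, not the complib code: struct pool, the done
condition variable, the handled counter, and run_pool() exist only for
the demo, and incrementing handled stands in for the real callback
(which the patch invokes after dropping the mutex).

```c
#include <assert.h>
#include <pthread.h>

#define NWORKERS 2

struct pool {
	pthread_mutex_t mutex;
	pthread_cond_t cond;	/* wakes a worker when an event is posted */
	pthread_cond_t done;	/* demo only: wakes the poster after handling */
	unsigned events;	/* posted but not yet consumed */
	unsigned handled;	/* demo only: events consumed so far */
};

static void unlock_pool(void *arg)
{
	pthread_mutex_unlock(&((struct pool *)arg)->mutex);
}

static void *worker(void *arg)
{
	struct pool *p = arg;

	for (;;) {
		pthread_mutex_lock(&p->mutex);
		pthread_cleanup_push(unlock_pool, p);
		while (!p->events)	/* pthread_cond_wait() is a cancellation point */
			pthread_cond_wait(&p->cond, &p->mutex);
		p->events--;
		p->handled++;		/* stands in for the callback */
		pthread_cond_signal(&p->done);
		pthread_cleanup_pop(1);
	}
	return NULL;
}

static void pool_signal(struct pool *p)
{
	pthread_mutex_lock(&p->mutex);
	p->events++;			/* each signal is counted, never merged */
	pthread_cond_signal(&p->cond);
	pthread_mutex_unlock(&p->mutex);
}

/* Post n signals back to back and return how many were handled. */
static unsigned run_pool(unsigned n)
{
	struct pool p = { .events = 0 };
	pthread_t tid[NWORKERS];
	unsigned i, started, handled = 0;

	pthread_mutex_init(&p.mutex, NULL);
	pthread_cond_init(&p.cond, NULL);
	pthread_cond_init(&p.done, NULL);

	for (started = 0; started < NWORKERS; started++)
		if (pthread_create(&tid[started], NULL, worker, &p) != 0)
			break;

	if (started > 0) {
		for (i = 0; i < n; i++)
			pool_signal(&p);
		pthread_mutex_lock(&p.mutex);
		while (p.handled < n)
			pthread_cond_wait(&p.done, &p.mutex);
		handled = p.handled;
		pthread_mutex_unlock(&p.mutex);
		assert(p.events == 0);
	}

	for (i = 0; i < started; i++)
		pthread_cancel(tid[i]);
	for (i = 0; i < started; i++)
		pthread_join(tid[i], NULL);

	pthread_cond_destroy(&p.done);
	pthread_cond_destroy(&p.cond);
	pthread_mutex_destroy(&p.mutex);
	return handled;
}
```

Calling run_pool(5) posts five signals in a row, and all five are
handled even if no worker has started waiting yet. An auto-reset event
could collapse several of them into a single wakeup.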

Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
---
 osm/complib/cl_async_proc.c         |    1 -
 osm/complib/cl_dispatcher.c         |    2 +-
 osm/complib/cl_thread.c             |   13 --
 osm/complib/cl_threadpool.c         |  208 +++++++++++-----------------------
 osm/complib/libosmcomp.map          |    1 -
 osm/include/complib/cl_thread.h     |   16 ---
 osm/include/complib/cl_threadpool.h |   84 ++++----------
 osm/osmtest/osmt_multicast.c        |    1 +
 8 files changed, 92 insertions(+), 234 deletions(-)

diff --git a/osm/complib/cl_async_proc.c b/osm/complib/cl_async_proc.c
index 51561af..7ac96bb 100644
--- a/osm/complib/cl_async_proc.c
+++ b/osm/complib/cl_async_proc.c
@@ -55,7 +55,6 @@ cl_async_proc_construct(
 
 	cl_qlist_init( &p_async_proc->item_queue );
 	cl_spinlock_construct( &p_async_proc->lock );
-	cl_thread_pool_construct( &p_async_proc->thread_pool );
 }
 
 cl_status_t
diff --git a/osm/complib/cl_dispatcher.c b/osm/complib/cl_dispatcher.c
index a7c0ac7..4a1960c 100644
--- a/osm/complib/cl_dispatcher.c
+++ b/osm/complib/cl_dispatcher.c
@@ -49,6 +49,7 @@
 
 #include <stdlib.h>
 #include <complib/cl_dispatcher.h>
+#include <complib/cl_thread.h>
 #include <complib/cl_timer.h>
 
 /* give some guidance when we build our cl_pool of messages */
@@ -132,7 +133,6 @@ cl_disp_construct(
 
   cl_qlist_init( &p_disp->reg_list );
   cl_ptr_vector_construct( &p_disp->reg_vec );
-  cl_thread_pool_construct( &p_disp->worker_threads );
   cl_qlist_init( &p_disp->msg_fifo );
   cl_spinlock_construct( &p_disp->lock );
   cl_qpool_construct( &p_disp->msg_pool );
diff --git a/osm/complib/cl_thread.c b/osm/complib/cl_thread.c
index f131480..eecc7d6 100644
--- a/osm/complib/cl_thread.c
+++ b/osm/complib/cl_thread.c
@@ -39,7 +39,6 @@
 
 #include <stdio.h>
 #include <unistd.h>
-#include <sys/sysinfo.h>
 #include <complib/cl_thread.h>
 
 /*
@@ -129,18 +128,6 @@ cl_thread_stall(
 	usleep( pause_us );
 }
 
-uint32_t
-cl_proc_count( void )
-{
-	uint32_t ret;
-
-	ret = get_nprocs();
-	if( !ret)
-		return 1;/* Workaround for PPC where get_nprocs() returns 0 */
-
-	return ret;
-}
-
 boolean_t
 cl_is_current_thread(
 	IN	const cl_thread_t* const	p_thread )
diff --git a/osm/complib/cl_threadpool.c b/osm/complib/cl_threadpool.c
index ff8bf90..ca4e261 100644
--- a/osm/complib/cl_threadpool.c
+++ b/osm/complib/cl_threadpool.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2004-2007 Voltaire, Inc. All rights reserved.
  * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
  * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
  *
@@ -49,134 +49,85 @@
 
 #include <stdlib.h>
 #include <string.h>
+#include <pthread.h>
+#include <sys/sysinfo.h>
 #include <complib/cl_threadpool.h>
-#include <complib/cl_atomic.h>
 
-void
-__cl_thread_pool_routine(
-	IN	void* const	context )
+static int proc_count( void )
 {
-	cl_status_t			status = CL_SUCCESS;
-	cl_thread_pool_t	*p_thread_pool = (cl_thread_pool_t*)context;
-
-	/* Continue looping until signalled to end. */
-	while( !p_thread_pool->exit )
-	{
-		/* Wait for the specified event to occur. */
-		status = cl_event_wait_on( &p_thread_pool->wakeup_event, 
-							EVENT_NO_TIMEOUT, TRUE );
-
-		/* See if we've been signalled to end execution. */
-		if( (p_thread_pool->exit) || (status == CL_NOT_DONE) )
-			break;
-
-		/* The event has been signalled.  Invoke the callback. */
-		(*p_thread_pool->pfn_callback)( (void*)p_thread_pool->context );
-	}
+	int ret = get_nprocs();
+	if (!ret)
+		return 1;/* Workaround for PPC where get_nprocs() returns 0 */
+	return ret;
+}
 
-	/*
-	 * Decrement the running count to notify the destroying thread
-	 * that the event was received and processed.
-	 */
-	cl_atomic_dec( &p_thread_pool->running_count );
-	cl_event_signal( &p_thread_pool->destroy_event );
+static void cleanup_mutex(void *arg)
+{
+	pthread_mutex_unlock(&((cl_thread_pool_t *)arg)->mutex);
 }
 
-void
-cl_thread_pool_construct(
-	IN	cl_thread_pool_t* const	p_thread_pool )
+static void *thread_pool_routine(void* context)
 {
-	CL_ASSERT( p_thread_pool);
+	cl_thread_pool_t *p_thread_pool = (cl_thread_pool_t*)context;
+
+	do {
+		pthread_mutex_lock(&p_thread_pool->mutex);
+		pthread_cleanup_push(cleanup_mutex, p_thread_pool);
+		while(!p_thread_pool->events)
+			pthread_cond_wait(&p_thread_pool->cond,
+					  &p_thread_pool->mutex);
+		p_thread_pool->events--;
+		pthread_cleanup_pop(1);
+		/* The event has been signalled.  Invoke the callback. */
+		(*p_thread_pool->pfn_callback)(p_thread_pool->context);
+	} while (1);
 
-	memset( p_thread_pool, 0, sizeof(cl_thread_pool_t) );
-	cl_event_construct( &p_thread_pool->wakeup_event );
-	cl_event_construct( &p_thread_pool->destroy_event );
-	cl_list_construct( &p_thread_pool->thread_list );
-	p_thread_pool->state = CL_UNINITIALIZED;
+	return NULL;
 }
 
 cl_status_t
 cl_thread_pool_init(
-	IN	cl_thread_pool_t* const		p_thread_pool,
-	IN	uint32_t					count,
-	IN	cl_pfn_thread_callback_t	pfn_callback,
-	IN	const void* const			context,
-	IN	const char* const			name )
+	IN cl_thread_pool_t* const p_thread_pool,
+	IN unsigned count,
+	IN void	(*pfn_callback)(void*),
+	IN void *context,
+	IN const char* const name )
 {
-	cl_status_t	status;
-	cl_thread_t	*p_thread;
-	uint32_t	i;
+	int i;
 
 	CL_ASSERT( p_thread_pool );
 	CL_ASSERT( pfn_callback );
 
-	cl_thread_pool_construct( p_thread_pool );
+	memset(p_thread_pool, 0, sizeof(*p_thread_pool));
 
-	if( !count )
-		count = cl_proc_count();
+	if(!count)
+		count = proc_count();
 
-	status = cl_list_init( &p_thread_pool->thread_list, count );
-	if( status != CL_SUCCESS )
-	{
-		cl_thread_pool_destroy( p_thread_pool );
-		return( status );
-	}
+	pthread_mutex_init(&p_thread_pool->mutex, NULL);
+	pthread_cond_init(&p_thread_pool->cond, NULL);
 
-	/* Initialize the event that the threads wait on. */
-	status = cl_event_init( &p_thread_pool->wakeup_event, FALSE );
-	if( status != CL_SUCCESS )
-	{
-		cl_thread_pool_destroy( p_thread_pool );
-		return( status );
-	}
+	p_thread_pool->events = 0;
 
-	/* Initialize the event used to destroy the threadpool. */
-	status = cl_event_init( &p_thread_pool->destroy_event, FALSE );
-	if( status != CL_SUCCESS )
-	{
+	p_thread_pool->pfn_callback = pfn_callback;
+	p_thread_pool->context = context;
+
+	p_thread_pool->tid = calloc(count, sizeof(*p_thread_pool->tid));
+	if (!p_thread_pool->tid) {
 		cl_thread_pool_destroy( p_thread_pool );
-		return( status );
+		return CL_INSUFFICIENT_MEMORY;
 	}
 
-	p_thread_pool->pfn_callback = pfn_callback;
-	p_thread_pool->context = context;
+	p_thread_pool->running_count = count;
 
 	for( i = 0; i < count; i++ )
 	{
-		/* Create a new thread. */
-		p_thread = (cl_thread_t*)malloc( sizeof(cl_thread_t) );
-		if( !p_thread )
-		{
+		if (pthread_create(&p_thread_pool->tid[i], NULL,
+				   thread_pool_routine, p_thread_pool) != 0) {
 			cl_thread_pool_destroy( p_thread_pool );
-			return( CL_INSUFFICIENT_MEMORY );
+			return CL_INSUFFICIENT_RESOURCES;
 		}
-
-		cl_thread_construct( p_thread );
-
-		/*
-		 * Add it to the list.  This is guaranteed to work since we
-		 * initialized the list to hold at least the number of threads we want
-		 * to store there.
-		 */
-		status = cl_list_insert_head( &p_thread_pool->thread_list, p_thread );
-		CL_ASSERT( status == CL_SUCCESS );
-
-		/* Start the thread. */
-		status = cl_thread_init( p_thread, __cl_thread_pool_routine,
-			p_thread_pool, name );
-		if( status != CL_SUCCESS )
-		{
-			cl_thread_pool_destroy( p_thread_pool );
-			return( status );
-		}
-
-		/*
-		 * Increment the running count to insure that a destroying thread
-		 * will signal all the threads.
-		 */
-		cl_atomic_inc( &p_thread_pool->running_count );
 	}
-	p_thread_pool->state = CL_INITIALIZED;
+
 	return( CL_SUCCESS );
 }
 
@@ -184,59 +135,34 @@ void
 cl_thread_pool_destroy(
 	IN	cl_thread_pool_t* const	p_thread_pool )
 {
-	cl_thread_t		*p_thread;
+	int i;
 
 	CL_ASSERT( p_thread_pool );
-	CL_ASSERT( cl_is_state_valid( p_thread_pool->state ) );
 
-	/* Indicate to all threads that they need to exit. */
-	p_thread_pool->exit = TRUE;
+	for (i = 0 ; i < p_thread_pool->running_count; i++)
+		if (p_thread_pool->tid[i])
+			pthread_cancel(p_thread_pool->tid[i]);
 
-	/*
-	 * Signal the threads until they have all exited.  Signalling
-	 * once for each thread is not guaranteed to work since two events
-	 * could release only a single thread, depending on the rate at which
-	 * the events are set and how the thread scheduler processes notifications.
-	 */
+	for (i = 0 ; i < p_thread_pool->running_count; i++)
+		if (p_thread_pool->tid[i])
+			pthread_join(p_thread_pool->tid[i], NULL);
 
-	while( p_thread_pool->running_count )
-	{
-     cl_event_signal( &p_thread_pool->wakeup_event );
-     /*
-      * Wait for the destroy event to occur, indicating that the thread
-      * has exited.
-      */
-     cl_event_wait_on( &p_thread_pool->destroy_event,
-                       EVENT_NO_TIMEOUT, TRUE );
-   }
-
-	/*
-	 * Stop each thread one at a time.  Note that this cannot be done in the
-	 * above for loop because signal will wake up an unknown thread.
-	 */
-	if( cl_is_list_inited( &p_thread_pool->thread_list ) )
-	{
-		while( !cl_is_list_empty( &p_thread_pool->thread_list ) )
-		{
-			p_thread =
-				(cl_thread_t*)cl_list_remove_head( &p_thread_pool->thread_list );
-			cl_thread_destroy( p_thread );
-			free( p_thread );
-		}
-	}
+	p_thread_pool->running_count = 0;
+	pthread_cond_destroy(&p_thread_pool->cond);
+	pthread_mutex_destroy(&p_thread_pool->mutex);
 
-	cl_event_destroy( &p_thread_pool->destroy_event );
-	cl_event_destroy( &p_thread_pool->wakeup_event );
-	cl_list_destroy( &p_thread_pool->thread_list );
-	p_thread_pool->state = CL_UNINITIALIZED;
+	p_thread_pool->events = 0;
 }
 
 cl_status_t
 cl_thread_pool_signal(
 	IN	cl_thread_pool_t* const	p_thread_pool )
 {
+	int ret;
 	CL_ASSERT( p_thread_pool );
-	CL_ASSERT( p_thread_pool->state == CL_INITIALIZED );
-
-	return( cl_event_signal( &p_thread_pool->wakeup_event ) );
+	pthread_mutex_lock(&p_thread_pool->mutex);
+	p_thread_pool->events++;
+	ret = pthread_cond_signal(&p_thread_pool->cond);
+	pthread_mutex_unlock(&p_thread_pool->mutex);
+	return ret;
 }
diff --git a/osm/complib/libosmcomp.map b/osm/complib/libosmcomp.map
index e2e58b1..3b8c040 100644
--- a/osm/complib/libosmcomp.map
+++ b/osm/complib/libosmcomp.map
@@ -138,7 +138,6 @@ OSMCOMP_1.1 {
 		cl_thread_destroy;
 		cl_thread_suspend;
 		cl_thread_stall;
-		cl_proc_count;
 		cl_is_current_thread;
 		__cl_thread_pool_routine;
 		cl_thread_pool_construct;
diff --git a/osm/include/complib/cl_thread.h b/osm/include/complib/cl_thread.h
index 4752278..9635e22 100644
--- a/osm/include/complib/cl_thread.h
+++ b/osm/include/complib/cl_thread.h
@@ -312,22 +312,6 @@ cl_thread_stall(
 *	Thread, cl_thread_suspend
 *********/
 
-/****f* Component Library: Thread/cl_proc_count
-* NAME
-*	cl_proc_count
-*
-* DESCRIPTION
-*	The cl_proc_count function returns the number of processors in the system.
-*
-* SYNOPSIS
-*/
-uint32_t
-cl_proc_count( void );
-/*
-* RETURN VALUE
-*	Returns the number of processors in the system.
-*********/
-
 /****i* Component Library: Thread/cl_is_current_thread
 * NAME
 *	cl_is_current_thread
diff --git a/osm/include/complib/cl_threadpool.h b/osm/include/complib/cl_threadpool.h
index aa1e066..30b5f86 100644
--- a/osm/include/complib/cl_threadpool.h
+++ b/osm/include/complib/cl_threadpool.h
@@ -46,9 +46,8 @@
 #ifndef _CL_THREAD_POOL_H_
 #define _CL_THREAD_POOL_H_
 
-#include <complib/cl_list.h>
-#include <complib/cl_thread.h>
-#include <complib/cl_event.h>
+#include <pthread.h>
+#include <complib/cl_types.h>
 
 #ifdef __cplusplus
 #  define BEGIN_C_DECLS extern "C" {
@@ -100,15 +99,13 @@ BEGIN_C_DECLS
 */
 typedef struct _cl_thread_pool
 {
-	cl_pfn_thread_callback_t	pfn_callback;
-	const void					*context;
-	cl_list_t					thread_list;
-	cl_event_t					wakeup_event;
-	cl_event_t					destroy_event;
-	boolean_t					exit;
-	cl_state_t					state;
-	atomic32_t					running_count;
-
+	void (*pfn_callback)(void*);
+	void *context;
+	unsigned running_count;
+	unsigned events;
+	pthread_cond_t cond;
+	pthread_mutex_t mutex;
+	pthread_t *tid;
 } cl_thread_pool_t;
 /*
 * FIELDS
@@ -118,58 +115,23 @@ typedef struct _cl_thread_pool
 *	context
 *		Context to pass to the thread callback function.
 *
-*	thread_list
-*		List of threads managed by the thread pool.
-*
-*	event
-*		Event used to signal threads to wake up and do work.
-*
-*	destroy_event
-*		Event used to signal threads to exit.
-*
-*	exit
-*		Flag used to indicates threads to exit.
-*
-*	state
-*		State of the thread pool.
-*
 *	running_count
 *		Number of threads running.
 *
-* SEE ALSO
-*	Thread Pool
-*********/
-
-/****f* Component Library: Thread Pool/cl_thread_pool_construct
-* NAME
-*	cl_thread_pool_construct
+*	events
+*		events counter
 *
-* DESCRIPTION
-*	The cl_thread_pool_construct function initializes the state of a
-*	thread pool.
+*	mutex
+*		mutex for cond variable protection
 *
-* SYNOPSIS
-*/
-void
-cl_thread_pool_construct(
-	IN	cl_thread_pool_t* const	p_thread_pool );
-/*
-* PARAMETERS
-*	p_thread_pool
-*		[in] Pointer to a thread pool structure.
+*	cond
+*		condition variable used to signal an event to a thread
 *
-* RETURN VALUE
-*	This function does not return a value.
-*
-* NOTES
-*	Allows calling cl_thread_pool_destroy without first calling
-*	cl_thread_pool_init.
-*
-*	Calling cl_thread_pool_construct is a prerequisite to calling any other
-*	thread pool function except cl_thread_pool_init.
+*	tid
+*		array of allocated thread ids.
 *
 * SEE ALSO
-*	Thread Pool, cl_thread_pool_init, cl_thread_pool_destroy
+*	Thread Pool
 *********/
 
 /****f* Component Library: Thread Pool/cl_thread_pool_init
@@ -184,11 +146,11 @@ cl_thread_pool_construct(
 */
 cl_status_t
 cl_thread_pool_init(
-	IN	cl_thread_pool_t* const		p_thread_pool,
-	IN	uint32_t					thread_count,
-	IN	cl_pfn_thread_callback_t	pfn_callback,
-	IN	const void* const			context,
-	IN	const char* const			name );
+	IN cl_thread_pool_t* const p_thread_pool,
+	IN unsigned count,
+	IN void	(*pfn_callback)(void*),
+	IN void *context,
+	IN const char* const name );
 /*
 * PARAMETERS
 *	p_thread_pool
diff --git a/osm/osmtest/osmt_multicast.c b/osm/osmtest/osmt_multicast.c
index d5519eb..724a0bb 100644
--- a/osm/osmtest/osmt_multicast.c
+++ b/osm/osmtest/osmt_multicast.c
@@ -51,6 +51,7 @@
 #include <string.h>
 #include <complib/cl_debug.h>
 #include <complib/cl_map.h>
+#include <complib/cl_list.h>
 #include "osmtest.h"
 
 /**********************************************************************
-- 
1.5.0.1.40.gb40d


From sashak at voltaire.com  Mon Feb 19 15:04:41 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Tue, 20 Feb 2007 01:04:41 +0200
Subject: [openib-general] [PATCH] osm/libvendor: compilation fixes
In-Reply-To: <20070219214630.GW27414@sashak.voltaire.com>
References: <20070219214630.GW27414@sashak.voltaire.com>
Message-ID: <20070219230441.GA27414@sashak.voltaire.com>


This adds the header file inclusions needed to prevent compilation failures.

Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
---

Those compilation failures were detected during the ibutils/ibmgtsim build.

 osm/libvendor/osm_vendor_mlx_sa.c  |    1 +
 osm/libvendor/osm_vendor_mlx_sim.c |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/osm/libvendor/osm_vendor_mlx_sa.c b/osm/libvendor/osm_vendor_mlx_sa.c
index ab37adb..37fa618 100644
--- a/osm/libvendor/osm_vendor_mlx_sa.c
+++ b/osm/libvendor/osm_vendor_mlx_sa.c
@@ -43,6 +43,7 @@
 #include <string.h>
 #include <complib/cl_debug.h>
 #include <complib/cl_timer.h>
+#include <complib/cl_event.h>
 #include <vendor/osm_vendor_api.h>
 #include <vendor/osm_vendor_sa_api.h>
 
diff --git a/osm/libvendor/osm_vendor_mlx_sim.c b/osm/libvendor/osm_vendor_mlx_sim.c
index d3e6eeb..bcd2bdc 100644
--- a/osm/libvendor/osm_vendor_mlx_sim.c
+++ b/osm/libvendor/osm_vendor_mlx_sim.c
@@ -57,6 +57,7 @@
 #include <vendor/osm_vendor_mlx_transport.h>
 #include <vendor/osm_vendor_mlx_dispatcher.h>
 #include <vendor/osm_vendor_mlx_svc.h>
+#include <complib/cl_thread.h>
 
 /* the simulator messages definition */
 #include <ibmgtsim/ibms_client_api.h>
-- 
1.5.0.1.40.gb40d


From sashak at voltaire.com  Mon Feb 19 15:06:22 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Tue, 20 Feb 2007 01:06:22 +0200
Subject: [openib-general] [PATCH] ibutils/ibis: compilation fixes
In-Reply-To: <20070219214630.GW27414@sashak.voltaire.com>
References: <20070219214630.GW27414@sashak.voltaire.com>
Message-ID: <20070219230622.GB27414@sashak.voltaire.com>


This adds the needed header file inclusions.

Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
---

Those compilation failures were detected during the ibutils/ibmgtsim build.

 ibis/src/ibbbm.h             |    1 +
 ibis/src/ibis_gsi_mad_ctrl.c |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/ibis/src/ibbbm.h b/ibis/src/ibbbm.h
index d998179..026a49c 100644
--- a/ibis/src/ibbbm.h
+++ b/ibis/src/ibbbm.h
@@ -50,6 +50,7 @@
 #include <complib/cl_qmap.h>
 #include <complib/cl_passivelock.h>
 #include <complib/cl_debug.h>
+#include <complib/cl_event.h>
 #include <iba/ib_types.h>
 #include <opensm/osm_madw.h>
 #include <opensm/osm_log.h>
diff --git a/ibis/src/ibis_gsi_mad_ctrl.c b/ibis/src/ibis_gsi_mad_ctrl.c
index a147642..3c7ea86 100644
--- a/ibis/src/ibis_gsi_mad_ctrl.c
+++ b/ibis/src/ibis_gsi_mad_ctrl.c
@@ -48,6 +48,7 @@
 #include <complib/cl_passivelock.h>
 #include <complib/cl_debug.h>
 #include <complib/cl_map.h>
+#include <complib/cl_event.h>
 #include <iba/ib_types.h>
 #include "ibis_gsi_mad_ctrl.h"
 #include "ibis.h"
-- 
1.5.0.1.40.gb40d


From sashak at voltaire.com  Mon Feb 19 15:07:57 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Tue, 20 Feb 2007 01:07:57 +0200
Subject: [openib-general] [PATCH] complib: remove unused stuff
In-Reply-To: <20070219230139.GZ27414@sashak.voltaire.com>
References: <20070219230139.GZ27414@sashak.voltaire.com>
Message-ID: <20070219230757.GC27414@sashak.voltaire.com>


This removes some unused complib code: cl_memory, cl_async_proc and
cl_perf.

Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
---
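
For readers who want the idea without reading the removed file: the debug
mode in cl_memory.c wraps every allocation as
|<magic start>|<req size>|<buffer ....>|<magic end>| and verifies both
magic markers later. Below is a minimal standalone sketch of that
technique. It is not the removed code; the names guard_malloc, guard_check
and guard_free are hypothetical and it uses only the C standard library.
Only the magic byte values and the 0xA5 poison byte are taken from the
original.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names. Layout: |magic start|size|user buffer|magic end| */
#define GUARD_SIZE 4
static const uint8_t GUARD_START[GUARD_SIZE] = { 0x12, 0x34, 0x56, 0x78 };
static const uint8_t GUARD_END[GUARD_SIZE] = { 0x87, 0x65, 0x43, 0x21 };

/* Allocate size bytes wrapped in guards; returns the user buffer. */
static void *guard_malloc(size_t size)
{
	const size_t overhead = 2 * GUARD_SIZE + sizeof(size);
	uint8_t *raw;

	if (size > SIZE_MAX - overhead)
		return NULL;	/* total size would overflow */
	raw = malloc(size + overhead);
	if (!raw)
		return NULL;

	memcpy(raw, GUARD_START, GUARD_SIZE);
	memcpy(raw + GUARD_SIZE, &size, sizeof(size));
	/* Poison the user buffer so use of uninitialized data stands out. */
	memset(raw + GUARD_SIZE + sizeof(size), 0xA5, size);
	memcpy(raw + GUARD_SIZE + sizeof(size) + size, GUARD_END, GUARD_SIZE);

	return raw + GUARD_SIZE + sizeof(size);
}

/* Returns 1 if both guards around p are intact, 0 otherwise. */
static int guard_check(const void *p)
{
	const uint8_t *raw = (const uint8_t *)p - sizeof(size_t) - GUARD_SIZE;
	size_t size;

	if (memcmp(raw, GUARD_START, GUARD_SIZE))
		return 0;
	/* The start guard is intact, so the stored size can be trusted. */
	memcpy(&size, raw + GUARD_SIZE, sizeof(size));
	return memcmp(raw + GUARD_SIZE + sizeof(size) + size,
		      GUARD_END, GUARD_SIZE) == 0;
}

/* Verify the guards, then release the whole wrapped block. */
static void guard_free(void *p)
{
	if (!p)
		return;
	assert(guard_check(p));
	free((uint8_t *)p - sizeof(size_t) - GUARD_SIZE);
}
```

A one-byte overrun past the end of the buffer changes the first byte of
the end marker, so guard_check() returns 0 for that block. The size is
copied with memcpy() rather than read through a cast because the header
is only 4-byte aligned relative to the malloc() result.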
 osm/complib/Makefile.am             |   11 +-
 osm/complib/cl_async_proc.c         |  147 --------
 osm/complib/cl_memory.c             |  515 -------------------------
 osm/complib/cl_memory_osd.c         |   93 -----
 osm/complib/cl_perf.c               |  273 --------------
 osm/complib/libosmcomp.map          |   33 --
 osm/include/Makefile.am             |    4 -
 osm/include/complib/cl_async_proc.h |  334 -----------------
 osm/include/complib/cl_memory.h     |  663 --------------------------------
 osm/include/complib/cl_memtrack.h   |   96 -----
 osm/include/complib/cl_perf.h       |  708 -----------------------------------
 11 files changed, 4 insertions(+), 2873 deletions(-)
 delete mode 100644 osm/complib/cl_async_proc.c
 delete mode 100644 osm/complib/cl_memory.c
 delete mode 100644 osm/complib/cl_memory_osd.c
 delete mode 100644 osm/complib/cl_perf.c
 delete mode 100644 osm/include/complib/cl_async_proc.h
 delete mode 100644 osm/include/complib/cl_memory.h
 delete mode 100644 osm/include/complib/cl_memtrack.h
 delete mode 100644 osm/include/complib/cl_perf.h

diff --git a/osm/complib/Makefile.am b/osm/complib/Makefile.am
index 7bdf34b..be26bb7 100644
--- a/osm/complib/Makefile.am
+++ b/osm/complib/Makefile.am
@@ -17,10 +17,10 @@ else
     libosmcomp_version_script =
 endif
 
-libosmcomp_la_SOURCES = cl_async_proc.c cl_complib.c \
+libosmcomp_la_SOURCES = cl_complib.c \
 			cl_dispatcher.c cl_event.c cl_event_wheel.c \
-			cl_list.c cl_log.c cl_map.c cl_memory.c \
-			cl_memory_osd.c cl_perf.c cl_pool.c \
+			cl_list.c cl_log.c cl_map.c \
+			cl_pool.c \
 			cl_ptr_vector.c \
 			cl_spinlock.c cl_statustext.c \
 			cl_thread.c cl_threadpool.c \
@@ -32,7 +32,7 @@ libosmcomp_la_DEPENDENCIES = $(srcdir)/libosmcomp.map
 
 libosmcompincludedir = $(includedir)/infiniband/complib
 
-libosmcompinclude_HEADERS = $(srcdir)/../include/complib/cl_async_proc.h \
+libosmcompinclude_HEADERS = \
 	$(srcdir)/../include/complib/cl_atomic.h \
 	$(srcdir)/../include/complib/cl_atomic_osd.h \
 	$(srcdir)/../include/complib/cl_byteswap.h \
@@ -49,12 +49,9 @@ libosmcompinclude_HEADERS = $(srcdir)/../include/complib/cl_async_proc.h \
 	$(srcdir)/../include/complib/cl_log.h \
 	$(srcdir)/../include/complib/cl_map.h \
 	$(srcdir)/../include/complib/cl_math.h \
-	$(srcdir)/../include/complib/cl_memory.h \
-	$(srcdir)/../include/complib/cl_memtrack.h \
 	$(srcdir)/../include/complib/cl_packoff.h \
 	$(srcdir)/../include/complib/cl_packon.h \
 	$(srcdir)/../include/complib/cl_passivelock.h \
-	$(srcdir)/../include/complib/cl_perf.h \
 	$(srcdir)/../include/complib/cl_pool.h \
 	$(srcdir)/../include/complib/cl_ptr_vector.h \
 	$(srcdir)/../include/complib/cl_qcomppool.h \
diff --git a/osm/complib/cl_async_proc.c b/osm/complib/cl_async_proc.c
deleted file mode 100644
index 7ac96bb..0000000
--- a/osm/complib/cl_async_proc.c
+++ /dev/null
@@ -1,147 +0,0 @@
-/*
- * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved.
- * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
- * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- *     Redistribution and use in source and binary forms, with or
- *     without modification, are permitted provided that the following
- *     conditions are met:
- *
- *      - Redistributions of source code must retain the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer.
- *
- *      - Redistributions in binary form must reproduce the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer in the documentation and/or other materials
- *        provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- *
- */
-
-#if HAVE_CONFIG_H
-#  include <config.h>
-#endif /* HAVE_CONFIG_H */
-
-#include <complib/cl_async_proc.h>
-
-#define CL_ASYNC_PROC_MIN	16
-#define CL_ASYNC_PROC_GROWSIZE	16
-
-/* Worker function declaration. */
-static void
-__cl_async_proc_worker(
-	IN	void* const	context );
-
-void
-cl_async_proc_construct(
-	IN	cl_async_proc_t* const	p_async_proc )
-{
-	CL_ASSERT( p_async_proc );
-
-	cl_qlist_init( &p_async_proc->item_queue );
-	cl_spinlock_construct( &p_async_proc->lock );
-}
-
-cl_status_t
-cl_async_proc_init(
-	IN	cl_async_proc_t* const	p_async_proc,
-	IN	const uint32_t			thread_count,
-	IN	const char* const		name )
-{
-	cl_status_t		status;
-
-	CL_ASSERT( p_async_proc );
-
-	cl_async_proc_construct( p_async_proc );
-
-	status = cl_spinlock_init( &p_async_proc->lock );
-	if( status != CL_SUCCESS )
-	{
-		cl_async_proc_destroy( p_async_proc );
-		return( status );
-	}
-
-	status = cl_thread_pool_init( &p_async_proc->thread_pool, thread_count,
-		__cl_async_proc_worker, p_async_proc, name );
-	if( status != CL_SUCCESS )
-		cl_async_proc_destroy( p_async_proc );
-
-	return( status );
-}
-
-void
-cl_async_proc_destroy(
-	IN	cl_async_proc_t* const	p_async_proc )
-{
-	/* Destroy the thread pool first so that the threads stop. */
-	cl_thread_pool_destroy( &p_async_proc->thread_pool );
-
-	/* Flush all queued callbacks. */
-	__cl_async_proc_worker( p_async_proc );
-
-	/* Destroy the spinlock. */
-	cl_spinlock_destroy( &p_async_proc->lock );
-}
-
-void
-cl_async_proc_queue(
-	IN	cl_async_proc_t* const		p_async_proc,
-	IN	cl_async_proc_item_t* const	p_item )
-{
-	CL_ASSERT( p_async_proc );
-	CL_ASSERT( p_item->pfn_callback );
-
-	/* Enqueue this item for processing. */
-	cl_spinlock_acquire( &p_async_proc->lock );
-	cl_qlist_insert_tail( &p_async_proc->item_queue,
-		&p_item->pool_item.list_item );
-	cl_spinlock_release( &p_async_proc->lock );
-
-	/* Signal the thread pool to wake up. */
-	cl_thread_pool_signal( &p_async_proc->thread_pool );
-}
-
-static void
-__cl_async_proc_worker(
-	IN	void* const	context)
-{
-	cl_async_proc_t			*p_async_proc = (cl_async_proc_t*)context;
-	cl_list_item_t			*p_list_item;
-	cl_async_proc_item_t	*p_item;
-
-	/* Process items from the head of the queue until it is empty. */
-	cl_spinlock_acquire( &p_async_proc->lock );
-	p_list_item = cl_qlist_remove_head( &p_async_proc->item_queue );
-	while( p_list_item != cl_qlist_end( &p_async_proc->item_queue ) )
-	{
-		/* Release the lock during the user's callback. */
-		cl_spinlock_release( &p_async_proc->lock );
-
-		/* Invoke the user callback. */
-		p_item = (cl_async_proc_item_t*)p_list_item;
-		p_item->pfn_callback( p_item );
-
-		/* Acquire the lock again to continue processing. */
-		cl_spinlock_acquire( &p_async_proc->lock );
-		/* Get the next item in the queue. */
-		p_list_item = cl_qlist_remove_head( &p_async_proc->item_queue );
-	}
-
-	/* The queue is empty.  Release the lock and return. */
-	cl_spinlock_release( &p_async_proc->lock );
-}
diff --git a/osm/complib/cl_memory.c b/osm/complib/cl_memory.c
deleted file mode 100644
index daf7fe1..0000000
--- a/osm/complib/cl_memory.c
+++ /dev/null
@@ -1,515 +0,0 @@
-/*
- * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved.
- * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
- * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- *     Redistribution and use in source and binary forms, with or
- *     without modification, are permitted provided that the following
- *     conditions are met:
- *
- *      - Redistributions of source code must retain the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer.
- *
- *      - Redistributions in binary form must reproduce the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer in the documentation and/or other materials
- *        provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- *
- */
-
-/*
- * Abstract:
- *	Implementation of memory allocation tracking functions.
- *
- * Environment:
- *	All
- *
- * $Revision: 1.4 $
- */
-
-#if HAVE_CONFIG_H
-#  include <config.h>
-#endif /* HAVE_CONFIG_H */
-
-#include <string.h>
-#include <complib/cl_memtrack.h>
-#define  _MEM_DEBUG_MODE_ 0
-#ifdef _MEM_DEBUG_MODE_
-/* 
-   In the mem debug mode we will be wrapping up the allocated buffer
-   with magic constants and the required size and then check during free.
-   
-   The memory layout will be:
-   |<magic start>|<req size>|<buffer ....>|<magic end>|
-   
-*/
-
-#define _MEM_DEBUG_MAGIC_SIZE_  4
-#define _MEM_DEBUG_EXTRA_SIZE_  sizeof(size) + 8
-static uint8_t _MEM_DEBUG_MAGIC_START_[4] = {0x12, 0x34, 0x56, 0x78, };
-static uint8_t _MEM_DEBUG_MAGIC_END_[4] =   {0x87, 0x65, 0x43, 0x21, };
-#endif
-
-cl_mem_tracker_t		*gp_mem_tracker = NULL;
-
-/*
- * Allocates memory.
- */
-void*
-__cl_malloc_priv(
-	IN	const size_t	size );
-
-/*
- * Deallocates memory.
- */
-void
-__cl_free_priv(
-	IN	void* const	p_memory );
-
-/*
- * Allocate and initialize the memory tracker object.
- */
-static inline void
-__cl_mem_track_start( void )
-{
-	cl_status_t			status;
-
-	if( gp_mem_tracker )
-		return;
-
-	/* Allocate the memory tracker object. */
-	gp_mem_tracker = (cl_mem_tracker_t*)
-		__cl_malloc_priv( sizeof(cl_mem_tracker_t) );
-
-	if( !gp_mem_tracker )
-		return;
-
-	/* Initialize the free list. */
-	cl_qlist_init( &gp_mem_tracker->free_hdr_list );
-	/* Initialize the allocation list. */
-	cl_qlist_init( &gp_mem_tracker->alloc_list );
-
-	/* Initialize the spin lock to protect list operations. */
-	status = cl_spinlock_init( &gp_mem_tracker->lock );
-	if( status != CL_SUCCESS )
-	{
-		__cl_free_priv( gp_mem_tracker );
-		return;
-	}
-
-	cl_msg_out( "\n\n\n*** Memory tracker object address = %p ***\n\n\n",
-		gp_mem_tracker );
-}
-
-/*
- * Clean up memory tracking.
- */
-static inline void
-__cl_mem_track_stop( void )
-{
-	cl_list_item_t	*p_list_item;
-
-	if( !gp_mem_tracker )
-		return;
-
-	if( !cl_is_qlist_empty( &gp_mem_tracker->alloc_list ) )
-	{
-		/* There are still items in the list.  Print them out. */
-		cl_mem_display();
-	}
-
-	/* Free all allocated headers. */
-	cl_spinlock_acquire( &gp_mem_tracker->lock );
-	while( !cl_is_qlist_empty( &gp_mem_tracker->alloc_list ) )
-	{
-		p_list_item = cl_qlist_remove_head( &gp_mem_tracker->alloc_list );
-		__cl_free_priv(
-			PARENT_STRUCT( p_list_item, cl_malloc_hdr_t, list_item ) );
-	}
-
-	while( !cl_is_qlist_empty( &gp_mem_tracker->free_hdr_list ) )
-	{
-		p_list_item = cl_qlist_remove_head( &gp_mem_tracker->free_hdr_list );
-		__cl_free_priv(
-			PARENT_STRUCT( p_list_item, cl_malloc_hdr_t, list_item ) );
-	}
-	cl_spinlock_release( &gp_mem_tracker->lock );
-
-	/* Destory all objects in the memory tracker object. */
-	cl_spinlock_destroy( &gp_mem_tracker->lock );
-
-	/* Free the memory allocated for the memory tracker object. */
-	__cl_free_priv( gp_mem_tracker );
-}
-
-/*
- * Enables memory allocation tracking.
- */
-void
-__cl_mem_track(
-	IN	const boolean_t	start )
-{
-	if( start )
-		__cl_mem_track_start();
-	else
-		__cl_mem_track_stop();
-}
-
-/*
- * Display memory usage.
- */
-void
-cl_mem_display( void )
-{
-	cl_list_item_t		*p_list_item;
-	cl_malloc_hdr_t		*p_hdr;
-
-	if( !gp_mem_tracker )
-		return;
-
-	cl_spinlock_acquire( &gp_mem_tracker->lock );
-	cl_msg_out( "\n\n\n*** Memory Usage ***\n" );
-	p_list_item = cl_qlist_head( &gp_mem_tracker->alloc_list );
-	while( p_list_item != cl_qlist_end( &gp_mem_tracker->alloc_list ) )
-	{
-		/*
-		 * Get the pointer to the header.  Note that the object member of the
-		 * list item will be used to store the pointer to the user's memory.
-		 */
-		p_hdr = PARENT_STRUCT( p_list_item, cl_malloc_hdr_t, list_item );
-
-		cl_msg_out( "\tMemory block at %p allocated in file %s line %d\n",
-			p_hdr->p_mem, p_hdr->file_name, p_hdr->line_num );
-
-		p_list_item = cl_qlist_next( p_list_item );
-	}
-	cl_msg_out( "*** End of Memory Usage ***\n\n" );
-	cl_spinlock_release( &gp_mem_tracker->lock );
-}
-
-/*
- * Check the memory using the magic bits to see if anything corrupted
- * our memory.
- */
-boolean_t
-cl_mem_check( void )
-{
-   boolean_t res = TRUE;
-
-#ifdef _MEM_DEBUG_MODE_
-   {
- 	cl_list_item_t		*p_list_item;
-	cl_malloc_hdr_t		*p_hdr;
-   size_t size;
-   void *p_mem;
-
-	if( !gp_mem_tracker )
-		return res;
-
-	cl_spinlock_acquire( &gp_mem_tracker->lock );
-   /*	cl_msg_out( "\n\n\n*** Memory Checker ***\n" ); */
-	p_list_item = cl_qlist_head( &gp_mem_tracker->alloc_list );
-	while( p_list_item != cl_qlist_end( &gp_mem_tracker->alloc_list ) )
-	{
-     /*
-      * Get the pointer to the header.  Note that the object member of the
-      * list item will be used to store the pointer to the user's memory.
-      */
-     p_hdr = PARENT_STRUCT( p_list_item, cl_malloc_hdr_t, list_item );
-     
-     /*     cl_msg_out( "\tMemory block at %p allocated in file %s line %d\n",
-            p_hdr->p_mem, p_hdr->file_name, p_hdr->line_num ); */
-     
-     /* calc the start */
-     p_mem = (char*)p_hdr->p_mem - sizeof(size) - _MEM_DEBUG_MAGIC_SIZE_;
-     /* check the header magic: */
-     if (memcmp(p_mem, &_MEM_DEBUG_MAGIC_START_, _MEM_DEBUG_MAGIC_SIZE_))
-     {
-       cl_msg_out("\n *** cl_mem_check ERROR: BAD Magic Start in free of memory:%p file:%s line:%d\n", 
-                  p_hdr->p_mem , p_hdr->file_name, p_hdr->line_num
-                  );
-       res = FALSE;
-     }
-     else 
-     {
-       /* obtain the size from the header */
-       memcpy(&size, (char*)p_mem + _MEM_DEBUG_MAGIC_SIZE_, sizeof(size));
-       
-       if (memcmp((char*)p_mem + sizeof(size) + _MEM_DEBUG_MAGIC_SIZE_ + size, 
-                  &_MEM_DEBUG_MAGIC_END_, _MEM_DEBUG_MAGIC_SIZE_))
-       {
-         cl_msg_out("\n *** cl_mem_check ERROR: BAD Magic End in free of memory:%p file:%s line:%d\n", 
-                    p_hdr->p_mem , p_hdr->file_name, p_hdr->line_num
-                    );
-         res = FALSE;
-       }
-     }
-
-     p_list_item = cl_qlist_next( p_list_item );
-	}
-   /*	cl_msg_out( "*** End of Memory Checker ***\n\n" ); */
-	cl_spinlock_release( &gp_mem_tracker->lock );
-   }
-#endif
-   return res;
-}
-
-/*
- * Allocates memory and stores information about the allocation in a list.
- * The contents of the list can be printed out by calling the function
- * "MemoryReportUsage".  Memory allocation will succeed even if the list
- * cannot be created.
- */
-void*
-__cl_malloc_trk(
-	IN	const char* const	p_file_name,
-	IN	const int32_t		line_num,
-	IN	const size_t		size )
-{
-	cl_malloc_hdr_t	*p_hdr;
-	cl_list_item_t	*p_list_item;
-	void			*p_mem;
-	char			temp_buf[FILE_NAME_LENGTH];
-	int32_t			temp_line;
-
-#ifdef _MEM_DEBUG_MODE_
-      /* If we are running in MEM_DEBUG_MODE then 
-         the cl_mem_check will be called on every run */
-      if (cl_mem_check() == FALSE) 
-      {
-        cl_msg_out( "*** MEMORY ERROR !!! ***\n" );
-        CL_ASSERT(0);
-      }
-#endif
-
-	/*
-	 * Allocate the memory first, so that we give the user's allocation
-	 * priority over the the header allocation.
-	 */
-#ifndef _MEM_DEBUG_MODE_
-   p_mem = __cl_malloc_priv( size );
-	if( !p_mem )
-		return( NULL );
-#else
-   p_mem = __cl_malloc_priv( size + sizeof(size) + 32 );
-	if( !p_mem )
-		return( NULL );
-   /* now poisen */
-   memset(p_mem, 0xA5, size + _MEM_DEBUG_EXTRA_SIZE_);
-   /* special layout */
-   memcpy(p_mem, &_MEM_DEBUG_MAGIC_START_, _MEM_DEBUG_MAGIC_SIZE_);
-   memcpy((char*)p_mem + _MEM_DEBUG_MAGIC_SIZE_, &size, sizeof(size));
-   memcpy((char*)p_mem + sizeof(size) + size + _MEM_DEBUG_MAGIC_SIZE_,
-          &_MEM_DEBUG_MAGIC_END_, _MEM_DEBUG_MAGIC_SIZE_);
-   p_mem = (char*)p_mem +  _MEM_DEBUG_MAGIC_SIZE_ + sizeof(size);
-#endif
-
-	if( !gp_mem_tracker )
-		return( p_mem );
-
-	/*
-	 * Make copies of the file name and line number in case those
-	 * parameters are in paged pool.
-	 */
-	temp_line = line_num;
-	strncpy( temp_buf, p_file_name, FILE_NAME_LENGTH );
-	/* Make sure the string is null terminated. */
-	temp_buf[FILE_NAME_LENGTH - 1] = '\0';
-
-	cl_spinlock_acquire( &gp_mem_tracker->lock );
-
-	/* Get a header from the free header list. */
-	p_list_item = cl_qlist_remove_head( &gp_mem_tracker->free_hdr_list );
-	if( p_list_item != cl_qlist_end( &gp_mem_tracker->free_hdr_list ) )
-	{
-		/* Set the header pointer to the header retrieved from the list. */
-		p_hdr = PARENT_STRUCT( p_list_item, cl_malloc_hdr_t, list_item );
-	}
-	else
-	{
-		/* We failed to get a free header.  Allocate one. */
-		p_hdr = __cl_malloc_priv( sizeof(cl_malloc_hdr_t) );
-		if( !p_hdr )
-		{
-			/* We failed to allocate the header.  Return the user's memory. */
-			cl_spinlock_release( &gp_mem_tracker->lock );
-			return( p_mem );
-		}
-	}
-	memcpy( p_hdr->file_name, temp_buf, FILE_NAME_LENGTH );
-	p_hdr->line_num = temp_line;
-	/*
-	 * We store the pointer to the memory returned to the user.  This allows
-	 * searching the list of allocated memory even if the buffer allocated is
-	 * not in the list without dereferencing memory we do not own.
-	 */
-	p_hdr->p_mem = p_mem;
-
-	/* Insert the header structure into our allocation list. */
-	cl_qlist_insert_tail( &gp_mem_tracker->alloc_list, &p_hdr->list_item );
-	cl_spinlock_release( &gp_mem_tracker->lock );
-
-	return( p_mem );
-}
-
-/*
- * Allocate non-tracked memory.
- */
-void*
-__cl_malloc_ntrk(
-	IN	const size_t	size )
-{
-	return( __cl_malloc_priv( size ) );
-}
-
-void*
-__cl_zalloc_trk(
-	IN	const char* const	p_file_name,
-	IN	const int32_t		line_num,
-	IN	const size_t		size )
-{
-	void	*p_buffer;
-
-	p_buffer = __cl_malloc_trk( p_file_name, line_num, size );
-	if( p_buffer )
-		memset( p_buffer, 0, size );
-
-	return( p_buffer );
-}
-
-void*
-__cl_zalloc_ntrk(
-	IN	const size_t	size )
-{
-	void	*p_buffer;
-
-	p_buffer = __cl_malloc_priv( size );
-	if( p_buffer )
-		memset( p_buffer, 0, size );
-
-	return( p_buffer );
-}
-
-static cl_status_t
-__cl_find_mem(
-	IN	const cl_list_item_t* const p_list_item,
-	IN	void* const					p_memory )
-{
-	cl_malloc_hdr_t		*p_hdr;
-
-	/* Get the pointer to the header. */
-	p_hdr = PARENT_STRUCT( p_list_item, cl_malloc_hdr_t, list_item );
-
-	if( p_memory == p_hdr->p_mem )
-		return( CL_SUCCESS );
-
-	return( CL_NOT_FOUND );
-}
-
-void
-__cl_free_trk(
-  IN	const char* const	p_file_name,
-  IN	const int32_t		line_num,  
-  IN	void* const	      p_memory )
-{
-	cl_malloc_hdr_t		*p_hdr;
-	cl_list_item_t		*p_list_item;
-
-#ifdef _MEM_DEBUG_MODE_
-      /* If we are running in MEM_DEBUG_MODE then 
-         the cl_mem_check will be called on every run */
-      if (cl_mem_check() == FALSE) 
-      {
-        cl_msg_out( "*** MEMORY ERROR !!! ***\n" );
-        CL_ASSERT(0);
-      }
-#endif
-
-	if( gp_mem_tracker )
-	{
-		cl_spinlock_acquire( &gp_mem_tracker->lock );
-
-		/*
-		 * Removes an item from the allocation tracking list given a pointer
-		 * To the user's data and returns the pointer to header referencing the
-		 * allocated memory block.
-		 */
-		p_list_item = cl_qlist_find_from_tail( &gp_mem_tracker->alloc_list,
-			__cl_find_mem, p_memory );
-
-		if( p_list_item != cl_qlist_end(&gp_mem_tracker->alloc_list) )
-		{
-			/* Get the pointer to the header. */
-			p_hdr = PARENT_STRUCT( p_list_item, cl_malloc_hdr_t, list_item );
-			/* Remove the item from the list. */
-			cl_qlist_remove_item( &gp_mem_tracker->alloc_list, p_list_item );
-
-			/* Return the header to the free header list. */
-			cl_qlist_insert_head( &gp_mem_tracker->free_hdr_list,
-				&p_hdr->list_item );
-		} else {
-        	cl_msg_out("\n *** cl_free ERROR: free of non tracked memory:%p file:%s line:%d\n", 
-                    p_memory , p_file_name, line_num
-                    );
-      }
-		cl_spinlock_release( &gp_mem_tracker->lock );
-	}
-
-#ifdef _MEM_DEBUG_MODE_
-   {
-     size_t size;
-     void *p_mem;
-
-     /* calc the start */
-     p_mem = (char*)p_memory - sizeof(size) - _MEM_DEBUG_MAGIC_SIZE_;
-     /* check the header magic: */
-     if (memcmp(p_mem, &_MEM_DEBUG_MAGIC_START_, _MEM_DEBUG_MAGIC_SIZE_))
-     {
-       cl_msg_out("\n *** cl_free ERROR: BAD Magic Start in free of memory:%p file:%s line:%d\n", 
-                  p_memory , p_file_name, line_num
-                  );
-     } 
-     else 
-     {
-       /* obtain the size from the header */
-       memcpy(&size, (char*)p_mem + _MEM_DEBUG_MAGIC_SIZE_, sizeof(size));
-       
-       if (memcmp((char*)p_mem + sizeof(size) + _MEM_DEBUG_MAGIC_SIZE_ + size, 
-                  &_MEM_DEBUG_MAGIC_END_, _MEM_DEBUG_MAGIC_SIZE_))
-       {
-         cl_msg_out("\n *** cl_free ERROR: BAD Magic End in free of memory:%p file:%s line:%d\n", 
-                    p_memory , p_file_name, line_num
-                    );
-       }
-       /* now poisen */
-       memset(p_mem, 0x5A, size + _MEM_DEBUG_EXTRA_SIZE_);
-     }
-     __cl_free_priv( p_mem );
-   }
-#else
-	__cl_free_priv( p_memory );
-#endif
-}
-
-void
-__cl_free_ntrk(
-	IN	void* const	p_memory )
-{
-	__cl_free_priv( p_memory );
-}
diff --git a/osm/complib/cl_memory_osd.c b/osm/complib/cl_memory_osd.c
deleted file mode 100644
index ac2658b..0000000
--- a/osm/complib/cl_memory_osd.c
+++ /dev/null
@@ -1,93 +0,0 @@
-/*
- * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved.
- * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
- * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- *     Redistribution and use in source and binary forms, with or
- *     without modification, are permitted provided that the following
- *     conditions are met:
- *
- *      - Redistributions of source code must retain the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer.
- *
- *      - Redistributions in binary form must reproduce the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer in the documentation and/or other materials
- *        provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- *
- */
-
-/*
- * Abstract:
- *	Implementation of memory manipulation functions for Linux user mode.
- *
- * Environment:
- *	Linux User Mode
- *
- * $Revision: 1.3 $
- */
-
-#if HAVE_CONFIG_H
-#  include <config.h>
-#endif /* HAVE_CONFIG_H */
-
-#include <complib/cl_memory.h>
-#include <stdlib.h>
-
-void*
-__cl_malloc_priv(
-	IN	const size_t	size )
-{
-	return malloc( size );
-}
-
-void
-__cl_free_priv(
-	IN	void* const	p_memory )
-{
-	free( p_memory );
-}
-
-void
-cl_memset(
-	IN	void* const		p_memory,
-	IN	const uint8_t	fill,
-	IN	const size_t	count )
-{
-	memset( p_memory, fill, count );
-}
-
-void*
-cl_memcpy(
-	IN	void* const			p_dest,
-	IN	const void* const	p_src,
-	IN	const size_t		count )
-{
-	return( memcpy( p_dest, p_src, count ) );
-}
-
-int32_t
-cl_memcmp(
-	IN	const void* const	p_mem,
-	IN	const void* const	p_ref,
-	IN	const size_t		count )
-{
-	return( memcmp( p_mem, p_ref, count ) );
-}
-
diff --git a/osm/complib/cl_perf.c b/osm/complib/cl_perf.c
deleted file mode 100644
index 9450bb1..0000000
--- a/osm/complib/cl_perf.c
+++ /dev/null
@@ -1,273 +0,0 @@
-/*
- * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved.
- * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
- * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- *     Redistribution and use in source and binary forms, with or
- *     without modification, are permitted provided that the following
- *     conditions are met:
- *
- *      - Redistributions of source code must retain the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer.
- *
- *      - Redistributions in binary form must reproduce the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer in the documentation and/or other materials
- *        provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- *
- */
-
-/*
- * Abstract:
- *	Implementation of performance tracking.
- *
- * Environment:
- *	All supported environments.
- *
- * $Revision: 1.3 $
- */
-
-#if HAVE_CONFIG_H
-#  include <config.h>
-#endif /* HAVE_CONFIG_H */
-
-#include <stdlib.h>
-#include <string.h>
-
-/*
- * Always turn on performance tracking when building this file to allow the
- * performance counter functions to be built into the component library.
- * Users control their use of the functions by defining the PERF_TRACK_ON
- * keyword themselves before including cl_perf.h to enable the macros to
- * resolve to the internal functions.
- */
-#define PERF_TRACK_ON
-
-#include <complib/cl_perf.h>
-#include <complib/cl_debug.h>
-
-uint64_t
-__cl_perf_run_calibration(
-	IN	cl_perf_t* const p_perf );
-
-/*
- * Initialize the state of the performance tracker.
- */
-void
-__cl_perf_construct(
-	IN	cl_perf_t* const	p_perf )
-{
-	memset( p_perf, 0, sizeof(cl_perf_t) );
-	p_perf->state = CL_UNINITIALIZED;
-}
-
-/*
- * Initialize the performance tracker.
- */
-cl_status_t
-__cl_perf_init(
-	IN	cl_perf_t* const	p_perf,
-	IN	const uintn_t		num_counters )
-{
-	cl_status_t		status;
-	cl_spinlock_t	lock;
-	uintn_t			i;
-	static uint64_t	locked_calibration_time = 0;
-	static uint64_t	normal_calibration_time;
-
-	CL_ASSERT( p_perf );
-	CL_ASSERT( !p_perf->size && num_counters );
-
-	/* Construct the performance tracker. */
-	__cl_perf_construct( p_perf );
-
-	/* Allocate an array of counters. */
-	p_perf->size = num_counters;
-	p_perf->data_array = (cl_perf_data_t*)
-		malloc( sizeof(cl_perf_data_t) * num_counters );
-
-	if( !p_perf->data_array )
-		return( CL_INSUFFICIENT_MEMORY );
-	else
-		memset( p_perf->data_array, 0,
-			sizeof(cl_perf_data_t) * num_counters );
-
-	/* Initialize the user's counters. */
-	for( i = 0; i < num_counters; i++ )
-	{
-		p_perf->data_array[i].min_time = ((uint64_t)~0);
-		cl_spinlock_construct( &p_perf->data_array[i].lock );
-	}
-
-	for( i = 0; i < num_counters; i++ )
-	{
-		status = cl_spinlock_init( &p_perf->data_array[i].lock );
-		if( status != CL_SUCCESS )
-		{
-			__cl_perf_destroy( p_perf, FALSE );
-			return( status );
-		}
-	}
-
-	/*
-	 * Run the calibration only if it has not been run yet.  Subsequent
-	 * calls will use the results from the first calibration.
-	 */
-	if( !locked_calibration_time )
-	{
-		/*
-		 * Perform the calibration under lock to prevent thread context
-		 * switches.
-		 */
-		cl_spinlock_construct( &lock );
-		status = cl_spinlock_init( &lock );
-		if( status != CL_SUCCESS )
-		{
-			__cl_perf_destroy( p_perf, FALSE );
-			return( status );
-		}
-
-		/* Measure the impact when running at elevated thread priority. */
-		cl_spinlock_acquire( &lock );
-		locked_calibration_time = __cl_perf_run_calibration( p_perf );
-		cl_spinlock_release( &lock );
-		cl_spinlock_destroy( &lock );
-
-		/* Measure the impact when runnin at normal thread priority. */
-		normal_calibration_time = __cl_perf_run_calibration( p_perf );
-	}
-
-	/* Reset the user's performance counter. */
-	p_perf->normal_calibration_time = locked_calibration_time;
-	p_perf->locked_calibration_time = normal_calibration_time;
-	p_perf->data_array[0].count = 0;
-	p_perf->data_array[0].total_time = 0;
-	p_perf->data_array[0].min_time = ((uint64_t)~0);
-
-	p_perf->state = CL_INITIALIZED;
-
-	return( CL_SUCCESS );
-}
-
-/*
- * Measure the time to take performance counters.
- */
-uint64_t
-__cl_perf_run_calibration(
-	IN	cl_perf_t* const	p_perf )
-{
-	uint64_t		start_time;
-	uintn_t			i;
-	PERF_DECLARE( 0 );
-
-	/* Start timing. */
-	start_time = cl_get_time_stamp();
-
-	/*
-	 * Get the performance counter repeatedly in a loop.  Use the first
-	 * user counter as our test counter.
-	 */
-	for( i = 0; i < PERF_CALIBRATION_TESTS; i++ )
-	{
-		cl_perf_start( 0 );
-		cl_perf_stop( p_perf, 0 );
-	}
-
-	/* Calculate the total time for the calibration. */
-	return( cl_get_time_stamp() - start_time );
-}
-
-/*
- * Destroy the performance tracker.
- */
-void
-__cl_perf_destroy(
-	IN	cl_perf_t* const	p_perf,
-	IN	const boolean_t		display )
-{
-	uintn_t	i;
-
-	CL_ASSERT( cl_is_state_valid( p_perf->state ) );
-
-	if( !p_perf->data_array )
-		return;
-
-	/* Display the performance data as requested. */
-	if( display && p_perf->state == CL_INITIALIZED )
-		__cl_perf_display( p_perf );
-
-	/* Destroy the user's counters. */
-	for( i = 0; i < p_perf->size; i++ )
-		cl_spinlock_destroy( &p_perf->data_array[i].lock );
-
-	free( p_perf->data_array );
-	p_perf->data_array = NULL;
-
-	p_perf->state = CL_UNINITIALIZED;
-}
-
-/*
- * Reset the performance counters.
- */
-void
-__cl_perf_reset(
-	IN	cl_perf_t* const		p_perf )
-{
-	uintn_t	i;
-
-	for( i = 0; i < p_perf->size; i++ )
-	{
-		cl_spinlock_acquire( &p_perf->data_array[i].lock );
-		p_perf->data_array[i].min_time = ((uint64_t)~0);
-		p_perf->data_array[i].total_time = 0;
-		p_perf->data_array[i].count = 0;
-		cl_spinlock_release( &p_perf->data_array[i].lock );
-	}
-}
-
-/*
- * Display the captured performance data.
- */
-void
-__cl_perf_display(
-	IN	const cl_perf_t* const	p_perf )
-{
-	uintn_t	i;
-
-	CL_ASSERT( p_perf );
-	CL_ASSERT( p_perf->state == CL_INITIALIZED );
-
-	cl_msg_out( "\n\n\nCL Perf:\tPerformance Data\n" );
-
-	cl_msg_out( "CL Perf:\tCounter Calibration Time\n" );
-	cl_msg_out( "CL Perf:\tLocked TotalTime\tNormal TotalTime\tTest Count\n" );
-	cl_msg_out( "CL Perf:\t%"PRIu64"\t%"PRIu64"\t%u\n",
-		p_perf->locked_calibration_time, p_perf->normal_calibration_time,
-		PERF_CALIBRATION_TESTS );
-
-	cl_msg_out( "CL Perf:\tUser Performance Counters\n" );
-	cl_msg_out( "CL Perf:\tIndex\tTotalTime\tMinTime\tCount\n" );
-	for( i = 0; i < p_perf->size; i++ )
-	{
-		cl_msg_out( "CL Perf:\t%lu\t%"PRIu64"\t%"PRIu64"\t%"PRIu64"\n",
-			i, p_perf->data_array[i].total_time,
-			p_perf->data_array[i].min_time, p_perf->data_array[i].count );
-	}
-	cl_msg_out( "CL Perf:\tEnd of User Performance Counters\n" );
-}
diff --git a/osm/complib/libosmcomp.map b/osm/complib/libosmcomp.map
index 3b8c040..9d9588b 100644
--- a/osm/complib/libosmcomp.map
+++ b/osm/complib/libosmcomp.map
@@ -1,9 +1,5 @@
 OSMCOMP_1.1 {
 	global:
-		cl_async_proc_construct;
-		cl_async_proc_init;
-		cl_async_proc_destroy;
-		cl_async_proc_queue;
 		complib_init;
 		complib_exit;
 		cl_is_debug;
@@ -75,28 +71,6 @@ OSMCOMP_1.1 {
 		cl_fmap_remove;
 		cl_fmap_merge;
 		cl_fmap_delta;
-		__cl_malloc_priv;
-		__cl_free_priv;
-		__cl_mem_track;
-		cl_mem_display;
-		cl_mem_check;
-		__cl_malloc_trk;
-		__cl_malloc_ntrk;
-		__cl_zalloc_trk;
-		__cl_zalloc_ntrk;
-		__cl_find_mem;
-		__cl_free_trk;
-		__cl_free_ntrk;
-		cl_memset;
-		cl_memcpy;
-		cl_memcmp;
-		__cl_perf_run_calibration;
-		__cl_perf_construct;
-		__cl_perf_init;
-		__cl_perf_run_calibration;
-		__cl_perf_destroy;
-		__cl_perf_reset;
-		__cl_perf_display;
 		cl_qcpool_construct;
 		cl_qcpool_init;
 		cl_qcpool_destroy;
@@ -171,13 +145,6 @@ OSMCOMP_1.1 {
 		cl_vector_find_from_end;
 		cl_atomic_spinlock;
 		cl_atomic_dec;
-		cl_free;
-		cl_malloc;
-		cl_perf_construct;
-		cl_perf_destroy;
-		cl_perf_display;
-		cl_perf_init;
-		cl_perf_reset;
 		cl_zalloc;
 		ib_error_str;
 		ib_async_event_str;
diff --git a/osm/include/Makefile.am b/osm/include/Makefile.am
index 5efc11a..cf1b0e7 100644
--- a/osm/include/Makefile.am
+++ b/osm/include/Makefile.am
@@ -105,7 +105,6 @@ EXTRA_DIST = \
 	$(srcdir)/complib/cl_qlockpool.h \
 	$(srcdir)/complib/cl_event_wheel.h \
 	$(srcdir)/complib/cl_thread.h \
-	$(srcdir)/complib/cl_memory.h \
 	$(srcdir)/complib/cl_packoff.h \
 	$(srcdir)/complib/cl_pool.h \
 	$(srcdir)/complib/cl_types_osd.h \
@@ -118,12 +117,9 @@ EXTRA_DIST = \
 	$(srcdir)/complib/cl_dispatcher.h \
 	$(srcdir)/complib/cl_spinlock_osd.h \
 	$(srcdir)/complib/cl_debug_osd.h \
-	$(srcdir)/complib/cl_perf.h \
 	$(srcdir)/complib/cl_qmap.h \
 	$(srcdir)/complib/cl_byteswap.h \
-	$(srcdir)/complib/cl_async_proc.h \
 	$(srcdir)/complib/cl_threadpool.h \
-	$(srcdir)/complib/cl_memtrack.h \
 	$(srcdir)/complib/cl_types.h \
 	$(srcdir)/complib/cl_fleximap.h \
 	$(srcdir)/complib/cl_qcomppool.h \
diff --git a/osm/include/complib/cl_async_proc.h b/osm/include/complib/cl_async_proc.h
deleted file mode 100644
index 8d6a71f..0000000
--- a/osm/include/complib/cl_async_proc.h
+++ /dev/null
@@ -1,334 +0,0 @@
-/*
- * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved.
- * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
- * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- *     Redistribution and use in source and binary forms, with or
- *     without modification, are permitted provided that the following
- *     conditions are met:
- *
- *      - Redistributions of source code must retain the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer.
- *
- *      - Redistributions in binary form must reproduce the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer in the documentation and/or other materials
- *        provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- *
- */
-
-/*
- * Abstract:
- *	Declaration of the asynchronous processing module.
- *
- * Environment:
- *	All
- *
- * $Revision: 1.3 $
- */
-
-#ifndef _CL_ASYNC_PROC_H_
-#define _CL_ASYNC_PROC_H_
-
-#include <complib/cl_qlist.h>
-#include <complib/cl_qpool.h>
-#include <complib/cl_threadpool.h>
-#include <complib/cl_spinlock.h>
-
-#ifdef __cplusplus
-#  define BEGIN_C_DECLS extern "C" {
-#  define END_C_DECLS   }
-#else /* !__cplusplus */
-#  define BEGIN_C_DECLS
-#  define END_C_DECLS
-#endif /* __cplusplus */
-
-BEGIN_C_DECLS
-
-/****h* Component Library/Asynchronous Processor
-* NAME
-*	Asynchronous Processor
-*
-* DESCRIPTION
-*	The asynchronous processor provides threads for executing queued callbacks.
-*
-*	The threads in the asynchronous processor wait for callbacks to be queued.
-*
-*	The asynchronous processor functions operate on a cl_async_proc_t structure
-*	which should be treated as opaque and manipulated only through the provided
-*	functions.
-*
-* SEE ALSO
-*	Structures:
-*		cl_async_proc_t, cl_async_proc_item_t
-*
-*	Initialization:
-*		cl_async_proc_construct, cl_async_proc_init, cl_async_proc_destroy
-*
-*	Manipulation:
-*		cl_async_proc_queue
-*********/
-
-/****s* Component Library: Asynchronous Processor/cl_async_proc_t
-* NAME
-*	cl_async_proc_t
-*
-* DESCRIPTION
-*	Asynchronous processor structure.
-*
-*	The cl_async_proc_t structure should be treated as opaque, and should be
-*	manipulated only through the provided functions.
-*
-* SYNOPSIS
-*/
-typedef struct _cl_async_proc
-{
-	cl_thread_pool_t	thread_pool;
-	cl_qlist_t			item_queue;
-	cl_spinlock_t		lock;
-
-} cl_async_proc_t;
-/*
-* FIELDS
-*	item_pool
-*		Pool of items storing the callback function and contexts to be invoked
-*		by the asynchronous processor's threads.
-*
-*	thread_pool
-*		Thread pool that will invoke the callbacks.
-*
-*	item_queue
-*		Queue of items that the threads should process.
-*
-*	lock
-*		Lock used to synchronize access to the item pool and queue.
-*
-* SEE ALSO
-*	Asynchronous Processor
-*********/
-
-/*
- * Declare the structure so we can reference it in the following function
- * prototype.
- */
-typedef struct _cl_async_proc_item	*__p_cl_async_proc_item_t;
-
-/****d* Component Library: Asynchronous Processor/cl_pfn_async_proc_cb_t
-* NAME
-*	cl_pfn_async_proc_cb_t
-*
-* DESCRIPTION
-*	The cl_pfn_async_proc_cb_t function type defines the prototype for
-*	callbacks queued to and invoked by the asynchronous processor.
-*
-* SYNOPSIS
-*/
-typedef void
-(*cl_pfn_async_proc_cb_t)(
-	IN	struct _cl_async_proc_item	*p_item );
-/*
-* PARAMETERS
-*	p_item
-*		Pointer to the cl_async_proc_item_t structure that was queued in
-*		a call to cl_async_proc_queue.
-*
-* NOTES
-*	This function type is provided as function prototype reference for the
-*	function provided by users as a parameter to the cl_async_proc_queue
-*	function.
-*
-* SEE ALSO
-*	Asynchronous Processor, cl_async_proc_item_t
-*********/
-
-/****s* Component Library: Asynchronous Processor/cl_async_proc_item_t
-* NAME
-*	cl_async_proc_item_t
-*
-* DESCRIPTION
-*	Asynchronous processor item structure passed to the cl_async_proc_queue
-*	function to queue a callback for execution.
-*
-* SYNOPSIS
-*/
-typedef struct _cl_async_proc_item
-{
-	cl_pool_item_t			pool_item;
-	cl_pfn_async_proc_cb_t	pfn_callback;
-
-} cl_async_proc_item_t;
-/*
-* FIELDS
-*	pool_item
-*		Pool item for queuing the item to be invoked by the asynchronous
-*		processor's threads.  This field is defined as a pool item to
-*		allow items to be managed by a pool.
-*
-*	pfn_callback
-*		Pointer to a callback function to invoke when the item is dequeued.
-*
-* SEE ALSO
-*	Asynchronous Processor, cl_async_proc_queue, cl_pfn_async_proc_cb_t
-*********/
-
-/****f* Component Library: Asynchronous Processor/cl_async_proc_construct
-* NAME
-*	cl_async_proc_construct
-*
-* DESCRIPTION
-*	The cl_async_proc_construct function initializes the state of a
-*	thread pool.
-*
-* SYNOPSIS
-*/
-void
-cl_async_proc_construct(
-	IN	cl_async_proc_t* const	p_async_proc );
-/*
-* PARAMETERS
-*	p_async_proc
-*		[in] Pointer to an asynchronous processor structure.
-*
-* RETURN VALUE
-*	This function does not return a value.
-*
-* NOTES
-*	Allows calling cl_async_proc_destroy without first calling
-*	cl_async_proc_init.
-*
-*	Calling cl_async_proc_construct is a prerequisite to calling any other
-*	thread pool function except cl_async_proc_init.
-*
-* SEE ALSO
-*	Asynchronous Processor, cl_async_proc_init, cl_async_proc_destroy
-*********/
-
-/****f* Component Library: Asynchronous Processor/cl_async_proc_init
-* NAME
-*	cl_async_proc_init
-*
-* DESCRIPTION
-*	The cl_async_proc_init function initialized an asynchronous processor
-*	for use.
-*
-* SYNOPSIS
-*/
-cl_status_t
-cl_async_proc_init(
-	IN	cl_async_proc_t* const	p_async_proc,
-	IN	const uint32_t			thread_count,
-	IN	const char* const		name );
-/*
-* PARAMETERS
-*	p_async_proc
-*		[in] Pointer to an asynchronous processor structure to initialize.
-*
-*	thread_count
-*		[in] Number of threads to be managed by the asynchronous processor.
-*
-*	name
-*		[in] Name to associate with the threads.  The name may be up to 16
-*		characters, including a terminating null character.  All threads
-*		created in the asynchronous processor have the same name.
-*
-* RETURN VALUES
-*	CL_SUCCESS if the asynchronous processor creation succeeded.
-*
-*	CL_INSUFFICIENT_MEMORY if there was not enough memory to inititalize
-*	the asynchronous processor.
-*
-*	CL_ERROR if the threads could not be created.
-*
-* NOTES
-*	cl_async_proc_init creates and starts the specified number of threads.
-*	If thread_count is zero, the asynchronous processor creates as many
-*	threads as there are processors in the system.
-*
-* SEE ALSO
-*	Asynchronous Processor, cl_async_proc_construct, cl_async_proc_destroy,
-*	cl_async_proc_queue
-*********/
-
-/****f* Component Library: Asynchronous Processor/cl_async_proc_destroy
-* NAME
-*	cl_async_proc_destroy
-*
-* DESCRIPTION
-*	The cl_async_proc_destroy function performs any necessary cleanup
-*	for a thread pool.
-*
-* SYNOPSIS
-*/
-void
-cl_async_proc_destroy(
-	IN	cl_async_proc_t* const	p_async_proc );
-/*
-* PARAMETERS
-*	p_async_proc
-*		[in] Pointer to an asynchronous processor structure to destroy.
-*
-* RETURN VALUE
-*	This function does not return a value.
-*
-* NOTES
-*	This function blocks until all threads exit, and must therefore not
-*	be called from any of the asynchronous processor's threads. Because of
-*	its blocking nature, callers of cl_async_proc_destroy must ensure that
-*	entering a wait state is valid from the calling thread context.
-*
-*	This function should only be called after a call to
-*	cl_async_proc_construct or cl_async_proc_init.
-*
-* SEE ALSO
-*	Asynchronous Processor, cl_async_proc_construct, cl_async_proc_init
-*********/
-
-/****f* Component Library: Asynchronous Processor/cl_async_proc_queue
-* NAME
-*	cl_async_proc_queue
-*
-* DESCRIPTION
-*	The cl_async_proc_queue function queues a callback to an asynchronous
-*	processor.
-*
-* SYNOPSIS
-*/
-void
-cl_async_proc_queue(
-	IN	cl_async_proc_t* const		p_async_proc,
-	IN	cl_async_proc_item_t* const	p_item );
-/*
-* PARAMETERS
-*	p_async_proc
-*		[in] Pointer to an asynchronous processor structure to initialize.
-*
-*	p_item
-*		[in] Pointer to an asynchronous processor item to queue for execution.
-*		The callback and context fields of the item must be valid.
-*
-* RETURN VALUES
-*	This function does not return a value.
-*
-* SEE ALSO
-*	Asynchronous Processor, cl_async_proc_init, cl_pfn_async_proc_cb_t
-*********/
-
-END_C_DECLS
-
-#endif	/* !defined(_CL_ASYNC_PROC_H_) */
diff --git a/osm/include/complib/cl_memory.h b/osm/include/complib/cl_memory.h
deleted file mode 100644
index 9a8580b..0000000
--- a/osm/include/complib/cl_memory.h
+++ /dev/null
@@ -1,663 +0,0 @@
-/*
- * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved.
- * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
- * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- *     Redistribution and use in source and binary forms, with or
- *     without modification, are permitted provided that the following
- *     conditions are met:
- *
- *      - Redistributions of source code must retain the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer.
- *
- *      - Redistributions in binary form must reproduce the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer in the documentation and/or other materials
- *        provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- *
- */
-
-/*
- * Abstract:
- *	Declaration of generic memory allocation calls.
- *
- * Environment:
- *	All
- *
- * $Revision: 1.4 $
- */
-
-#ifndef _CL_MEMORY_H_
-#define _CL_MEMORY_H_
-
-#include <complib/cl_types.h>
-
-#ifdef __cplusplus
-#  define BEGIN_C_DECLS extern "C" {
-#  define END_C_DECLS   }
-#else /* !__cplusplus */
-#  define BEGIN_C_DECLS
-#  define END_C_DECLS
-#endif /* __cplusplus */
-
-BEGIN_C_DECLS
-
-/****h* Public/Memory Management
-* NAME
-*	Memory Management
-*
-* DESCRIPTION
-*	The memory management functionality provides memory manipulation
-*	functions as well as powerful debugging tools.
-*
-*	The Allocation Tracking functionality provides a means for tracking memory
-*	allocations in order to detect memory leaks.
-*
-*	Memory allocation tracking stores the file name and line number where
-*	allocations occur. Gathering this information does have an adverse impact
-*	on performance, and memory tracking should therefore not be enabled in
-*	release builds of software.
-*
-*	Memory tracking is compiled into the debug version of the library,
-*	and can be enabled for the release version as well. To Enable memory
-*	tracking in a release build of the public layer, users should define
-*	the MEM_TRACK_ON keyword for compilation.
-*********/
-
-/****i* Public: Memory Management/__cl_mem_track
-* NAME
-*	__cl_mem_track
-*
-* DESCRIPTION
-*	The __cl_mem_track function enables or disables memory allocation tracking.
-*
-* SYNOPSIS
-*/
-void __attribute__((deprecated))
-__cl_mem_track(
-	IN	const boolean_t	start );
-/*
-* PARAMETERS
-*	start
-*		[in] Specifies whether to start or stop memory tracking.
-*
-* RETURN VALUE
-*	This function does not return a value.
-*
-* NOTES
-*	This function performs all necessary initialization for tracking
-*	allocations.  Users should never call this function, as it is called by
-*	the component library framework.
-*
-*	If the Start parameter is set to TRUE, the function starts tracking memory
-*	usage if not already started. When set to FALSE, memory tracking is stoped
-*	and all remaining allocations are displayed to the applicable debugger, if
-*	any.
-*
-*	Starting memory tracking when it is already started has no effect.
-*	Likewise, stoping memory tracking when it is already stopped has no effect.
-*
-* SEE ALSO
-*	Memory Management, cl_mem_display
-**********/
-
-/****f* Public: Memory Management/cl_mem_display
-* NAME
-*	cl_mem_display
-*
-* DESCRIPTION
-*	The cl_mem_display function displays all tracked memory allocations to
-*	the applicable debugger.
-*
-* SYNOPSIS
-*/
-void __attribute__((deprecated))
-cl_mem_display( void );
-/*
-* RETURN VALUE
-*	This function does not return a value.
-*
-* NOTES
-*	Each tracked memory allocation is displayed along with the file name and
-*	line number that allocated it.
-*
-*	Output is sent to the platform's debugging target, which may be the
-*	system log file.
-*
-* SEE ALSO
-*	Memory Management
-**********/
-
-/****f* Public: Memory Management/cl_mem_check
-* NAME
-*	cl_mem_check
-*
-* DESCRIPTION
-*	The cl_mem_check function checks all tracked memory allocations to
-*	the applicable debugger.
-*
-* SYNOPSIS
-*/
-boolean_t __attribute__((deprecated))
-cl_mem_check( void );
-/*
-* RETURN VALUE
-*	TRUE if no errors were found. FALSE - otherwise.
-*
-* NOTES
-*	Each tracked memory allocation is displayed along with the file name and
-*	line number that allocated it.
-*
-*	Output is sent to the platform's debugging target, which may be the
-*	system log file.
-*
-* SEE ALSO
-*	Memory Management
-**********/
-
-/****i* Public: Memory Management/__cl_malloc_trk
-* NAME
-*	__cl_malloc_trk
-*
-* DESCRIPTION
-*	The __cl_malloc_trk function allocates and tracks a block of memory.
-*
-* SYNOPSIS
-*/
-void __attribute__((deprecated)) *
-__cl_malloc_trk(
-	IN	const char* const	p_file_name,
-	IN	const int32_t		line_num,
-	IN	const size_t		size );
-/*
-* PARAMETERS
-*	p_file_name
-*		[in] Name of the source file initiating the allocation.
-*
-*	line_num
-*		[in] Line number in the specified file where the allocation is
-*		initiated
-*
-*	size
-*		[in] Size of the requested allocation.
-*
-* RETURN VALUES
-*	Pointer to allocated memory if successful.
-*
-*	NULL otherwise.
-*
-* NOTES
-*	Allocated memory follows alignment rules specific to the different
-*	environments.
-*	This function is should not be called directly.  The cl_malloc macro will
-*	redirect users to this function when memory tracking is enabled.
-*
-* SEE ALSO
-*	Memory Management, __cl_malloc_ntrk, __cl_zalloc_trk, __cl_free_trk
-**********/
-
-/****i* Public: Memory Management/__cl_zalloc_trk
-* NAME
-*	__cl_zalloc_trk
-*
-* DESCRIPTION
-*	The __cl_zalloc_trk function allocates and tracks a block of memory
-*	initialized to zero.
-*
-* SYNOPSIS
-*/
-void __attribute__((deprecated)) *
-__cl_zalloc_trk(
-	IN	const char* const	p_file_name,
-	IN	const int32_t		line_num,
-	IN	const size_t		bytes );
-/*
-* PARAMETERS
-*	p_file_name
-*		[in] Name of the source file initiating the allocation.
-*
-*	line_num
-*		[in] Line number in the specified file where the allocation is
-*		initiated
-*
-*	size
-*		[in] Size of the requested allocation.
-*
-* RETURN VALUES
-*	Pointer to allocated memory if successful.
-*
-*	NULL otherwise.
-*
-* NOTES
-*	Allocated memory follows alignment rules specific to the different
-*	environments.
-*	This function should not be called directly.  The cl_zalloc macro will
-*	redirect users to this function when memory tracking is enabled.
-*
-* SEE ALSO
-*	Memory Management, __cl_zalloc_ntrk, __cl_malloc_trk, __cl_free_trk
-**********/
-
-/****i* Public: Memory Management/__cl_malloc_ntrk
-* NAME
-*	__cl_malloc_ntrk
-*
-* DESCRIPTION
-*	The __cl_malloc_ntrk function allocates a block of memory.
-*
-* SYNOPSIS
-*/
-void __attribute__((deprecated)) *
-__cl_malloc_ntrk(
-	IN	const size_t		size );
-/*
-* PARAMETERS
-*	size
-*		[in] Size of the requested allocation.
-*
-* RETURN VALUES
-*	Pointer to allocated memory if successful.
-*
-*	NULL otherwise.
-*
-* NOTES
-*	Allocated memory follows alignment rules specific to the different
-*	environments.
-*	This function is should not be called directly.  The cl_malloc macro will
-*	redirect users to this function when memory tracking is not enabled.
-*
-* SEE ALSO
-*	Memory Management, __cl_malloc_trk, __cl_zalloc_ntrk, __cl_free_ntrk
-**********/
-
-/****i* Public: Memory Management/__cl_zalloc_ntrk
-* NAME
-*	__cl_zalloc_ntrk
-*
-* DESCRIPTION
-*	The __cl_zalloc_ntrk function allocates a block of memory
-*	initialized to zero.
-*
-* SYNOPSIS
-*/
-void __attribute__((deprecated)) *
-__cl_zalloc_ntrk(
-	IN	const size_t		bytes );
-/*
-* PARAMETERS
-*	size
-*		[in] Size of the requested allocation.
-*
-* RETURN VALUES
-*	Pointer to allocated memory if successful.
-*
-*	NULL otherwise.
-*
-* NOTES
-*	Allocated memory follows alignment rules specific to the different
-*	environments.
-*	This function should not be called directly.  The cl_zalloc macro will
-*	redirect users to this function when memory tracking is not enabled.
-*
-* SEE ALSO
-*	Memory Management, __cl_zalloc_trk, __cl_malloc_ntrk, __cl_free_ntrk
-**********/
-
-/****i* Public: Memory Management/__cl_free_trk
-* NAME
-*	__cl_free_trk
-*
-* DESCRIPTION
-*	The __cl_free_trk function deallocates a block of tracked memory.
-*
-* SYNOPSIS
-*/
-void __attribute__((deprecated))
-__cl_free_trk(
-  IN	const char* const	p_file_name,
-  IN	const int32_t		line_num,  
-  IN	void* const	p_memory );
-/*
-* PARAMETERS
-*	p_memory
-*		[in] Pointer to a memory block.
-*
-*	p_file_name
-*		[in] Name of the source file initiating the allocation.
-*
-*	line_num
-*		[in] Line number in the specified file where the allocation is
-*		initiated
-*
-* RETURN VALUE
-*	This function does not return a value.
-*
-* NOTES
-*	The p_memory parameter is the pointer returned by a previous call to
-*	__cl_malloc_trk, or __cl_zalloc_trk.
-*
-*	__cl_free_trk has no effect if p_memory is NULL.
-*
-*	This function should not be called directly.  The cl_free macro will
-*	redirect users to this function when memory tracking is enabled.
-*
-* SEE ALSO
-*	Memory Management, __cl_free_ntrk, __cl_malloc_trk, __cl_zalloc_trk
-**********/
-
-/****i* Public: Memory Management/__cl_free_ntrk
-* NAME
-*	__cl_free_ntrk
-*
-* DESCRIPTION
-*	The __cl_free_ntrk function deallocates a block of memory.
-*
-* SYNOPSIS
-*/
-void __attribute__((deprecated))
-__cl_free_ntrk(
-	IN	void* const	p_memory );
-/*
-* PARAMETERS
-*	p_memory
-*		[in] Pointer to a memory block.
-*
-* RETURN VALUE
-*	This function does not return a value.
-*
-* NOTES
-*	The p_memory parameter is the pointer returned by a previous call to
-*	__cl_malloc_ntrk, or __cl_zalloc_ntrk.
-*
-*	__cl_free_ntrk has no effect if p_memory is NULL.
-*
-*	This function should not be called directly.  The cl_free macro will
-*	redirect users to this function when memory tracking is not enabled.
-*
-* SEE ALSO
-*	Memory Management, __cl_free_ntrk, __cl_malloc_trk, __cl_zalloc_trk
-**********/
-
-/****f* Public: Memory Management/cl_malloc
-* NAME
-*	cl_malloc
-*
-* DESCRIPTION
-*	The cl_malloc function allocates a block of memory.
-*
-* SYNOPSIS
-*/
-void __attribute__((deprecated)) *
-cl_malloc(
-	IN	const size_t	size );
-/*
-* PARAMETERS
-*	size
-*		[in] Size of the requested allocation.
-*
-* RETURN VALUES
-*	Pointer to allocated memory if successful.
-*
-*	NULL otherwise.
-*
-* NOTES
-*	Allocated memory follows alignment rules specific to the different
-*	environments.
-*
-* SEE ALSO
-*	Memory Management, cl_free, cl_zalloc, cl_memset, cl_memclr, cl_memcpy, cl_memcmp
-**********/
-
-/****f* Public: Memory Management/cl_zalloc
-* NAME
-*	cl_zalloc
-*
-* DESCRIPTION
-*	The cl_zalloc function allocates a block of memory initialized to zero.
-*
-* SYNOPSIS
-*/
-void __attribute__((deprecated)) *
-cl_zalloc(
-	IN	const size_t	size );
-/*
-* PARAMETERS
-*	size
-*		[in] Size of the requested allocation.
-*
-* RETURN VALUES
-*	Pointer to allocated memory if successful.
-*
-*	NULL otherwise.
-*
-* NOTES
-*	Allocated memory follows alignment rules specific to the different
-*	environments.
-*
-* SEE ALSO
-*	Memory Management, cl_free, cl_malloc, cl_memset, cl_memclr, cl_memcpy, cl_memcmp
-**********/
-
-/****f* Public: Memory Management/cl_free
-* NAME
-*	cl_free
-*
-* DESCRIPTION
-*	The cl_free function deallocates a block of memory.
-*
-* SYNOPSIS
-*/
-void __attribute__((deprecated))
-cl_free(
-	IN	void* const	p_memory );
-/*
-* PARAMETERS
-*	p_memory
-*		[in] Pointer to a memory block.
-*
-* RETURN VALUE
-*	This function does not return a value.
-*
-* NOTES
-*	The p_memory parameter is the pointer returned by a previous call to
-*	cl_malloc, or cl_zalloc.
-*
-*	cl_free has no effect if p_memory is NULL.
-*
-* SEE ALSO
-*	Memory Management, cl_alloc, cl_zalloc
-**********/
-
-/****f* Public: Memory Management/cl_memset
-* NAME
-*	cl_memset
-*
-* DESCRIPTION
-*	The cl_memset function sets every byte in a memory range to a given value.
-*
-* SYNOPSIS
-*/
-void __attribute__((deprecated)) 
-cl_memset(
-	IN	void* const		p_memory,
-	IN	const uint8_t	fill,
-	IN	const size_t	count );
-/*
-* PARAMETERS
-*	p_memory
-*		[in] Pointer to a memory block.
-*
-*	fill
-*		[in] Byte value with which to fill the memory.
-*
-*	count
-*		[in] Number of bytes to set.
-*
-* RETURN VALUE
-*	This function does not return a value.
-*
-* SEE ALSO
-*	Memory Management, cl_memclr, cl_memcpy, cl_memcmp
-**********/
-
-/****f* Public: Memory Management/cl_memclr
-* NAME
-*	cl_memclr
-*
-* DESCRIPTION
-*	The cl_memclr function sets every byte in a memory range to zero.
-*
-* SYNOPSIS
-*/
-static inline void __attribute__((deprecated))
-cl_memclr(
-	IN	void* const		p_memory,
-	IN	const size_t	count )
-{
-	memset( p_memory, 0, count );
-}
-/*
-* PARAMETERS
-*	p_memory
-*		[in] Pointer to a memory block.
-*
-*	count
-*		[in] Number of bytes to set.
-*
-* RETURN VALUE
-*	This function does not return a value.
-*
-* SEE ALSO
-*	Memory Management, cl_memset, cl_memcpy, cl_memcmp
-**********/
-
-/****f* Public: Memory Management/cl_memcpy
-* NAME
-*	cl_memcpy
-*
-* DESCRIPTION
-*	The cl_memcpy function copies a given number of bytes from
-*	one buffer to another.
-*
-* SYNOPSIS
-*/
-void __attribute__((deprecated)) *
-cl_memcpy(
-	IN	void* const			p_dest,
-	IN	const void* const	p_src,
-	IN	const size_t		count );
-/*
-* PARAMETERS
-*	p_dest
-*		[in] Pointer to the buffer being copied to.
-*
-*	p_src
-*		[in] Pointer to the buffer being copied from.
-*
-*	count
-*		[in] Number of bytes to copy from the source buffer to the
-*		destination buffer.
-*
-* RETURN VALUE
-*	This function does not return a value.
-*
-* SEE ALSO
-*	Memory Management, cl_memset, cl_memclr, cl_memcmp
-**********/
-
-/****f* Public: Memory Management/cl_memcmp
-* NAME
-*	cl_memcmp
-*
-* DESCRIPTION
-*	The cl_memcmp function compares two memory buffers.
-*
-* SYNOPSIS
-*/
-int32_t  __attribute__((deprecated))
-cl_memcmp(
-	IN	const void* const	p_mem,
-	IN	const void* const	p_ref,
-	IN	const size_t		count );
-/*
-* PARAMETERS
-*	p_mem
-*		[in] Pointer to a memory block being compared.
-*
-*	p_ref
-*		[in] Pointer to the reference memory block to compare against.
-*
-*	count
-*		[in] Number of bytes to compare.
-*
-* RETURN VALUES
-*	Returns less than zero if p_mem is less than p_ref.
-*
-*	Returns greater than zero if p_mem is greater than p_ref.
-*
-*	Returns zero if the two memory regions are the identical.
-*
-* SEE ALSO
-*	Memory Management, cl_memset, cl_memclr, cl_memcpy
-**********/
-
-#if defined( CL_NO_TRACK_MEM ) && defined( CL_TRACK_MEM )
-	#error Conflict: Cannot define both CL_NO_TRACK_MEM and CL_TRACK_MEM.
-#endif
-
-/*
- * Turn on memory allocation tracking in debug builds if not explicitly
- * disabled or already turned on.
- */
-#if defined( _DEBUG_ ) && \
-	!defined( CL_NO_TRACK_MEM ) && \
-	!defined( CL_TRACK_MEM )
-	#define CL_TRACK_MEM
-#endif
-
-/*
- * Define allocation macro.
- */
-#if defined( CL_TRACK_MEM )
-
-#define cl_malloc( a )	\
-	__cl_malloc_trk( __FILE__, __LINE__, a )
-
-#define cl_zalloc( a )	\
-	__cl_zalloc_trk( __FILE__, __LINE__, a )
-
-#define cl_free( a )	\
-	__cl_free_trk( __FILE__, __LINE__, a )
-
-#else	/* !defined( CL_TRACK_MEM ) */
-
-#define cl_malloc( a )	\
-	__cl_malloc_ntrk( a )
-
-#define cl_zalloc( a )	\
-	__cl_zalloc_ntrk( a )
-
-#define cl_free( a )	\
-	__cl_free_ntrk( a )
-
-#endif	/* defined( CL_TRACK_MEM ) */
-
-END_C_DECLS
-
-#endif /* _CL_MEMORY_H_ */
diff --git a/osm/include/complib/cl_memtrack.h b/osm/include/complib/cl_memtrack.h
deleted file mode 100644
index 9e97136..0000000
--- a/osm/include/complib/cl_memtrack.h
+++ /dev/null
@@ -1,96 +0,0 @@
-/*
- * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved.
- * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
- * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- *     Redistribution and use in source and binary forms, with or
- *     without modification, are permitted provided that the following
- *     conditions are met:
- *
- *      - Redistributions of source code must retain the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer.
- *
- *      - Redistributions in binary form must reproduce the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer in the documentation and/or other materials
- *        provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- *
- */
-
-/*
- * Abstract:
- *	Definitions of Data-Structures for memory allocation tracking functions.
- *
- * Environment:
- *	All
- *
- * $Revision: 1.3 $
- */
-
-
-#ifndef _CL_MEMTRACK_H_
-#define _CL_MEMTRACK_H_
-
-#include <complib/cl_types.h>
-#include <complib/cl_memory.h>
-#include <complib/cl_debug.h>
-#include <complib/cl_qlist.h>
-#include <complib/cl_spinlock.h>
-
-#ifdef __cplusplus
-#  define BEGIN_C_DECLS extern "C" {
-#  define END_C_DECLS   }
-#else /* !__cplusplus */
-#  define BEGIN_C_DECLS
-#  define END_C_DECLS
-#endif /* __cplusplus */
-
-BEGIN_C_DECLS
-
-/* Structure to track memory allocations. */
-typedef struct _cl_mem_tracker
-{
-	/* List for tracking memory allocations. */
-	cl_qlist_t		alloc_list;
-
-	/* Lock for synchronization. */
-	cl_spinlock_t	lock;
-
-	/* List to manage free headers. */
-	cl_qlist_t		free_hdr_list;
-
-} cl_mem_tracker_t __attribute__((deprecated));
-
-#define FILE_NAME_LENGTH	64
-
-/* Header for all memory allocations. */
-typedef struct _cl_malloc_hdr
-{
-	cl_list_item_t		list_item;
-	void				*p_mem;
-	char				file_name[FILE_NAME_LENGTH];
-	int32_t				line_num;
-
-} cl_malloc_hdr_t __attribute__((deprecated));
-
-extern cl_mem_tracker_t		*gp_mem_tracker;
-
-END_C_DECLS
-
-#endif	/* _CL_MEMTRACK_H_ */
diff --git a/osm/include/complib/cl_perf.h b/osm/include/complib/cl_perf.h
deleted file mode 100644
index 522f23f..0000000
--- a/osm/include/complib/cl_perf.h
+++ /dev/null
@@ -1,708 +0,0 @@
-/*
- * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved.
- * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
- * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- *     Redistribution and use in source and binary forms, with or
- *     without modification, are permitted provided that the following
- *     conditions are met:
- *
- *      - Redistributions of source code must retain the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer.
- *
- *      - Redistributions in binary form must reproduce the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer in the documentation and/or other materials
- *        provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- *
- */
-
-/*
- * Abstract:
- *	Declaration of performance tracking.
- *
- * Environment:
- *	All
- *
- * $Revision: 1.3 $
- */
-
-#ifndef _CL_PERF_H_
-#define _CL_PERF_H_
-
-#include <complib/cl_types.h>
-#include <complib/cl_spinlock.h>
-#include <complib/cl_timer.h>
-
-#ifdef __cplusplus
-#  define BEGIN_C_DECLS extern "C" {
-#  define END_C_DECLS   }
-#else /* !__cplusplus */
-#  define BEGIN_C_DECLS
-#  define END_C_DECLS
-#endif /* __cplusplus */
-
-BEGIN_C_DECLS
-
-/****h* Component Library/Performance Counters
-* NAME
-*	Performance Counters
-*
-* DESCRIPTION
-*	The performance counters allows timing operations to benchmark
-*	software performance and help identify potential bottlenecks.
-*
-*	All performance counters are NULL macros when disabled, preventing them
-*	from adversly affecting performance in builds where the counters are not
-*	used.
-*
-*	Each counter records elapsed time in micro-seconds, minimum time elapsed,
-*	and total number of samples.
-*
-*	Each counter is independently protected by a spinlock, allowing use of
-*	the counters in multi-processor environments.
-*
-*	The impact of serializing access to performance counters is measured,
-*	allowing measurements to be corrected as necessary.
-*
-* NOTES
-*	Performance counters do impact performance, and should only be enabled
-*	when gathering data.  Counters can be enabled or disabled on a per-user
-*	basis at compile time.  To enable the counters, users should define
-*	the PERF_TRACK_ON keyword before including the cl_perf.h file.
-*	Undefining the PERF_TRACK_ON keyword disables the performance counters.
-*	When disabled, all performance tracking calls resolve to no-ops.
-*
-*	When using performance counters, it is the user's responsibility to
-*	maintain the counter indexes.  It is recomended that users define an
-*	enumerated type to use for counter indexes.  It improves readability
-*	and simplifies maintenance by reducing the work necessary in managing
-*	the counter indexes.
-*
-* SEE ALSO
-*	Structures:
-*		cl_perf_t
-*
-*	Initialization:
-*		cl_perf_construct, cl_perf_init, cl_perf_destroy
-*
-*	Manipulation
-*		cl_perf_reset, cl_perf_display, cl_perf_start, cl_perf_update,
-*		cl_perf_log, cl_perf_stop
-*
-*	Macros:
-*		PERF_DECLARE, PERF_DECLARE_START
-*********/
-
-/*
- * Number of times the counter calibration test is executed.  This is used
- * to determine the average time to use a performance counter.
- */
-#define PERF_CALIBRATION_TESTS		100000
-
-/****i* Component Library: Performance Counters/cl_perf_data_t
-* NAME
-*	cl_perf_data_t
-*
-* DESCRIPTION
-*	The cl_perf_data_t structure is used to tracking information
-*	for a single counter.
-*
-* SYNOPSIS
-*/
-typedef struct _cl_perf_data
-{
-	uint64_t		count;
-	uint64_t		total_time;
-	uint64_t		min_time;
-	cl_spinlock_t	lock;
-
-} cl_perf_data_t;
-/*
-* FIELDS
-*	count
-*		Number of samples in the counter.
-*
-*	total_time
-*		Total time for all samples, in microseconds.
-*
-*	min_time
-*		Minimum time for any sample in the counter, in microseconds.
-*
-*	lock
-*		Spinlock to serialize counter updates.
-*
-* SEE ALSO
-*	Performance Counters
-*********/
-
-/****i* Component Library: Performance Counters/cl_perf_t
-* NAME
-*	cl_perf_t
-*
-* DESCRIPTION
-*	The cl_perf_t structure serves as a container for a group of performance
-*	counters and related calibration data.
-*
-*	This structure should be treated as opaque and be manipulated only through
-*	the provided functions.
-*
-* SYNOPSIS
-*/
-typedef struct _cl_perf
-{
-	cl_perf_data_t	*data_array;
-	uintn_t			size;
-	uint64_t		locked_calibration_time;
-	uint64_t		normal_calibration_time;
-	cl_state_t		state;
-
-} cl_perf_t;
-/*
-* FIELDS
-*	data_array
-*		Pointer to the array of performance counters.
-*
-*	size
-*		Number of counters in the counter array.
-*
-*	locked_calibration_time
-*		Time needed to update counters while holding a spinlock.
-*
-*	normal_calibration_time
-*		Time needed to update counters while not holding a spinlock.
-*
-*	state
-*		State of the performance counter provider.
-*
-* SEE ALSO
-*	Performance Counters, cl_perf_data_t
-*********/
-
-/****f* Component Library: Performance Counters/cl_perf_construct
-* NAME
-*	cl_perf_construct
-*
-* DESCRIPTION
-*	The cl_perf_construct macro constructs a performance
-*	tracking container.
-*
-* SYNOPSIS
-*/
-void
-cl_perf_construct(
-	IN	cl_perf_t* const	p_perf );
-/*
-* PARAMETERS
-*	p_perf
-*		[in] Pointer to a performance counter container to construct.
-*
-* RETURN VALUE
-*	This function does not return a value.
-*
-* NOTES
-*	cl_perf_construct allows calling cl_perf_destroy without first calling
-*	cl_perf_init.
-*
-*	Calling cl_perf_construct is a prerequisite to calling any other
-*	perfromance counter function except cl_perf_init.
-*
-*	This function is implemented as a macro and has no effect when
-*	performance counters are disabled.
-*
-* SEE ALSO
-*	Performance Counters, cl_perf_init, cl_perf_destroy
-*********/
-
-/****f* Component Library: Performance Counters/cl_perf_init
-* NAME
-*	cl_perf_init
-*
-* DESCRIPTION
-*	The cl_perf_init function initializes a performance counter container
-*	for use.
-*
-* SYNOPSIS
-*/
-cl_status_t
-cl_perf_init(
-	IN	cl_perf_t* const	p_perf,
-	IN	const uintn_t		num_counters );
-/*
-* PARAMETERS
-*	p_perf
-*		[in] Pointer to a performance counter container to initalize.
-*
-*	num_cntrs
-*		[in] Number of counters to allocate in the container.
-*
-* RETURN VALUES
-*	CL_SUCCESS if initialization was successful.
-*
-*	CL_INSUFFICIENT_MEMORY if there was not enough memory to initialize
-*	the container.
-*
-*	CL_ERROR if an error was encountered initializing the locks for the
-*	performance counters.
-*
-* NOTES
-*	This function allocates all memory required for the requested number of
-*	counters and initializes all locks protecting those counters.  After a
-*	successful initialization, cl_perf_init calibrates the counters and
-*	resets their value.
-*
-*	This function is implemented as a macro and has no effect when
-*	performance counters are disabled.
-*
-* SEE ALSO
-*	Performance Counters, cl_perf_construct, cl_perf_destroy, cl_perf_display
-*********/
-
-/****f* Component Library: Performance Counters/cl_perf_destroy
-* NAME
-*	cl_perf_destroy
-*
-* DESCRIPTION
-*	The cl_perf_destroy function destroys a performance tracking container.
-*
-* SYNOPSIS
-*/
-void
-cl_perf_destroy(
-	IN	cl_perf_t* const	p_perf,
-	IN	const boolean_t		display );
-/*
-* PARAMETERS
-*	p_perf
-*		[in] Pointer to a performance counter container to destroy.
-*
-*	display
-*		[in] If TRUE, causes the performance counters to be displayed.
-*
-* RETURN VALUE
-*	This function does not return a value.
-*
-* NOTES
-*	cl_perf_destroy frees all resources allocated in a call to cl_perf_init.
-*	If the display parameter is set to TRUE, displays all counter values
-*	before deallocating resources.
-*
-*	This function should only be called after a call to cl_perf_construct
-*	or cl_perf_init.
-*
-*	This function is implemented as a macro and has no effect when
-*	performance counters are disabled.
-*
-* SEE ALSO
-*	Performance Counters, cl_perf_construct, cl_perf_init
-*********/
-
-/****f* Component Library: Performance Counters/cl_perf_reset
-* NAME
-*	cl_perf_reset
-*
-* DESCRIPTION
-*	The cl_perf_reset function resets the counters contained in
-*	a performance tracking container.
-*
-* SYNOPSIS
-*/
-void
-cl_perf_reset(
-	IN	cl_perf_t* const	p_perf );
-/*
-* PARAMETERS
-*	p_perf
-*		[in] Pointer to a performance counter container whose counters
-*		to reset.
-*
-* RETURN VALUE
-*	This function does not return a value.
-*
-* NOTES
-*	This function is implemented as a macro and has no effect when
-*	performance counters are disabled.
-*
-* SEE ALSO
-*	Performance Counters
-*********/
-
-/****f* Component Library: Performance Counters/cl_perf_display
-* NAME
-*	cl_perf_display
-*
-* DESCRIPTION
-*	The cl_perf_display function displays the current performance
-*	counter values.
-*
-* SYNOPSIS
-*/
-void
-cl_perf_display(
-	IN	const cl_perf_t* const	p_perf );
-/*
-* PARAMETERS
-*	p_perf
-*		[in] Pointer to a performance counter container whose counter
-*		values to display.
-*
-* RETURN VALUE
-*	This function does not return a value.
-*
-* NOTES
-*	This function is implemented as a macro and has no effect when
-*	performance counters are disabled.
-*
-* SEE ALSO
-*	Performance Counters, cl_perf_init
-*********/
-
-/****d* Component Library: Performance Counters/PERF_DECLARE
-* NAME
-*	PERF_DECLARE
-*
-* DESCRIPTION
-*	The PERF_DECLARE macro declares a performance counter variable used
-*	to store the starting time of a timing sequence.
-*
-* SYNOPSIS
-*	PERF_DECLARE( index )
-*
-* PARAMETERS
-*	index
-*		[in] Index of the performance counter for which to use this
-*		variable.
-*
-* NOTES
-*	Variables should generally be declared on the stack to support
-*	multi-threading.  In cases where a counter needs to be used to
-*	time operations accross multiple functions, care must be taken to
-*	ensure that the start time stored in this variable is not overwritten
-*	before the related performance counter has been updated.
-*
-*	This macro has no effect when performance counters are disabled.
-*
-* SEE ALSO
-*	Performance Counters, PERF_DECLARE_START, cl_perf_start, cl_perf_log,
-*	cl_perf_stop
-*********/
-
-/****d* Component Library: Performance Counters/PERF_DECLARE_START
-* NAME
-*	PERF_DECLARE_START
-*
-* DESCRIPTION
-*	The PERF_DECLARE_START macro declares a performance counter variable
-*	and sets it to the starting time of a timed sequence.
-*
-* SYNOPSIS
-*	PERF_DECLARE_START( index )
-*
-* PARAMETERS
-*	index
-*		[in] Index of the performance counter for which to use this
-*		variable.
-*
-* NOTES
-*	Variables should generally be declared on the stack to support
-*	multi-threading.
-*
-*	This macro has no effect when performance counters are disabled.
-*
-* SEE ALSO
-*	Performance Counters, PERF_DECLARE, cl_perf_start, cl_perf_log,
-*	cl_perf_stop
-*********/
-
-/****d* Component Library: Performance Counters/cl_perf_start
-* NAME
-*	cl_perf_start
-*
-* DESCRIPTION
-*	The cl_perf_start macro sets the starting value of a timed sequence.
-*
-* SYNOPSIS
-*/
-void
-cl_perf_start(
-	IN	const uintn_t index );
-/*
-* PARAMETERS
-*	index
-*		[in] Index of the performance counter to set.
-*
-* NOTES
-*	This macro has no effect when performance counters are disabled.
-*
-* SEE ALSO
-*	Performance Counters, PERF_DECLARE, PERF_DECLARE_START, cl_perf_log,
-*	cl_perf_update, cl_perf_stop
-*********/
-
-/****d* Component Library: Performance Counters/cl_perf_update
-* NAME
-*	cl_perf_update
-*
-* DESCRIPTION
-*	The cl_perf_update macro adds a timing sample based on a provided start
-*	time to a counter in a performance counter container.
-*
-* SYNOPSIS
-*/
-void
-cl_perf_update(
-	IN	cl_perf_t* const	p_perf,
-	IN	const uintn_t		index,
-	IN	const uint64_t		start_time );
-/*
-* PARAMETERS
-*	p_perf
-*		[in] Pointer to a performance counter container to whose counter
-*		the sample should be added.
-*
-*	index
-*		[in] Number of the performance counter to update with a new sample.
-*
-*	start_time
-*		[in] Timestamp to use as the start time for the timing sample.
-*
-* RETURN VALUE
-*	This function does not return a value.
-*
-* NOTES
-*	This macro has no effect when performance counters are disabled.
-*
-* SEE ALSO
-*	Performance Counters, PERF_DECLARE, PERF_DECLARE_START, cl_perf_start,
-*	cl_perf_lob, cl_perf_stop
-*********/
-
-/****d* Component Library: Performance Counters/cl_perf_log
-* NAME
-*	cl_perf_log
-*
-* DESCRIPTION
-*	The cl_perf_log macro adds a given timing sample to a
-*	counter in a performance counter container.
-*
-* SYNOPSIS
-*/
-void
-cl_perf_log(
-	IN	cl_perf_t* const	p_perf,
-	IN	const uintn_t		index,
-	IN	const uint64_t		pc_total_time );
-/*
-* PARAMETERS
-*	p_perf
-*		[in] Pointer to a performance counter container to whose counter
-*		the sample should be added.
-*
-*	index
-*		[in] Number of the performance counter to update with a new sample.
-*
-*	pc_total_time
-*		[in] Total elapsed time for the sample being added.
-*
-* RETURN VALUE
-*	This function does not return a value.
-*
-* NOTES
-*	This macro has no effect when performance counters are disabled.
-*
-* SEE ALSO
-*	Performance Counters, PERF_DECLARE, PERF_DECLARE_START, cl_perf_start,
-*	cl_perf_update, cl_perf_stop
-*********/
-
-/****d* Component Library: Performance Counters/cl_perf_stop
-* NAME
-*	cl_perf_stop
-*
-* DESCRIPTION
-*	The cl_perf_log macro updates a counter in a performance counter
-*	container with a new timing sample.
-*
-* SYNOPSIS
-*/
-void
-cl_perf_stop(
-	IN	cl_perf_t* const	p_perf,
-	IN	const uintn_t		index );
-/*
-* PARAMETERS
-*	p_perf
-*		[in] Pointer to a performance counter container to whose counter
-*		a sample should be added.
-*
-*	index
-*		[in] Number of the performance counter to update with a new sample.
-*
-* RETURN VALUE
-*	This function does not return a value.
-*
-* NOTES
-*	The ending time stamp is taken and elapsed time calculated before updating
-*	the specified counter.
-*
-*	This macro has no effect when performance counters are disabled.
-*
-* SEE ALSO
-*	Performance Counters, PERF_DECLARE, PERF_DECLARE_START, cl_perf_start,
-*	cl_perf_log
-*********/
-
-/*
- * PERF_TRACK_ON must be defined by the user before including this file to
- * enable performance tracking.  To disable tracking, users should undefine
- * PERF_TRACK_ON.
- */
-#if defined( PERF_TRACK_ON )
-/*
- * Enable performance tracking.
- */
-
-#define cl_perf_construct( p_perf ) \
-	__cl_perf_construct( p_perf )
-#define cl_perf_init( p_perf, num_counters ) \
-	__cl_perf_init( p_perf, num_counters )
-#define cl_perf_destroy( p_perf, display ) \
-	__cl_perf_destroy( p_perf, display )
-#define cl_perf_reset( p_perf ) \
-	__cl_perf_reset( p_perf )
-#define cl_perf_display( p_perf ) \
-	__cl_perf_display( p_perf )
-#define PERF_DECLARE( index ) \
-	uint64_t Pc##index
-#define PERF_DECLARE_START( index ) \
-	uint64 Pc##index = cl_get_time_stamp()
-#define cl_perf_start( index ) \
-	(Pc##index = cl_get_time_stamp())
-#define cl_perf_log( p_perf, index, pc_total_time ) \
-{\
-	/* Update the performance data.  This requires synchronization. */ \
-	cl_spinlock_acquire( &((cl_perf_t*)p_perf)->data_array[index].lock ); \
-	\
-	((cl_perf_t*)p_perf)->data_array[index].total_time += pc_total_time; \
-	((cl_perf_t*)p_perf)->data_array[index].count++; \
-	if( pc_total_time < ((cl_perf_t*)p_perf)->data_array[index].min_time ) \
-		((cl_perf_t*)p_perf)->data_array[index].min_time = pc_total_time; \
-	\
-	cl_spinlock_release( &((cl_perf_t*)p_perf)->data_array[index].lock );  \
-}
-#define cl_perf_update( p_perf, index, start_time )	\
-{\
-	/* Get the ending time stamp, and calculate the total time. */ \
-	uint64_t pc_total_time = cl_get_time_stamp() - start_time;\
-	/* Using stack variable for start time, stop and log  */ \
-	cl_perf_log( p_perf, index, pc_total_time ); \
-}
-
-#define cl_perf_stop( p_perf, index ) \
-{\
-	cl_perf_update( p_perf, index, Pc##index );\
-}
-
-#define cl_get_perf_values( p_perf, index, p_total, p_min, p_count )	\
-{\
-	*p_total = p_perf->data_array[index].total_time;	\
-	*p_min = p_perf->data_array[index].min_time;		\
-	*p_count = p_perf->data_array[index].count;			\
-}
-
-#define cl_get_perf_calibration( p_perf, p_locked_time, p_normal_time )	\
-{\
-	*p_locked_time = p_perf->locked_calibration_time;	\
-	*p_normal_time = p_perf->normal_calibration_time;	\
-}
-
-#define cl_get_perf_string( p_perf, i )	\
-"CL Perf:\t%lu\t%"PRIu64"\t%"PRIu64"\t%"PRIu64"\n",	\
-			i, p_perf->data_array[i].total_time,	\
-			p_perf->data_array[i].min_time, p_perf->data_array[i].count
-
-#else	/* PERF_TRACK_ON */
-/*
- * Disable performance tracking.
- */
-
-#define cl_perf_construct( p_perf )
-#define cl_perf_init( p_perf, num_cntrs )		CL_SUCCESS
-#define cl_perf_destroy( p_perf, display )
-#define cl_perf_reset( p_perf )
-#define cl_perf_display( p_perf )
-#define PERF_DECLARE( index )
-#define PERF_DECLARE_START( index )
-#define cl_perf_start( index )
-#define cl_perf_log( p_perf, index, pc_total_time )
-#define cl_perf_upadate( p_perf, index, start_time )
-#define cl_perf_stop( p_perf, index )
-#define cl_get_perf_values( p_perf, index, p_total, p_min, p_count )
-#define cl_get_perf_calibration( p_perf, p_locked_time, p_normal_time )
-#endif	/* PERF_TRACK_ON */
-
-/*
- * Internal performance tracking functions.  Users should never call these
- * functions directly.  Instead, use the macros defined above to resolve
- * to these functions when PERF_TRACK_ON is defined, which allows disabling
- * performance tracking.
- */
-
-/*
- * Initialize the state of the performance tracking structure.
- */
-void
-__cl_perf_construct(
-	IN	cl_perf_t* const		p_perf );
-
-/*
- * Size the performance tracking information and initialize all
- * related structures.
- */
-cl_status_t
-__cl_perf_init(
-	IN	cl_perf_t* const		p_perf,
-	IN	const uintn_t			num_counters );
-
-/*
- * Destroy the performance tracking data.
- */
-void
-__cl_perf_destroy(
-	IN	cl_perf_t* const		p_perf,
-	IN	const boolean_t			display );
-
-/*
- * Reset the performance tracking data.
- */
-void
-__cl_perf_reset(
-	IN	cl_perf_t* const		p_perf );
-
-/*
- * Display the current performance tracking data.
- */
-void
-__cl_perf_display(
-	IN	const cl_perf_t* const	p_perf );
-
-
-END_C_DECLS
-
-#endif	/* _CL_PERF_H_ */
-- 
1.5.0.1.40.gb40d


From swise at opengridcomputing.com  Mon Feb 19 15:24:20 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Mon, 19 Feb 2007 17:24:20 -0600
Subject: [openib-general] OFA 1.2 tarball creation
In-Reply-To: <000001c75476$bcf2eea0$6dcd180a@amr.corp.intel.com>
References: <000001c75476$bcf2eea0$6dcd180a@amr.corp.intel.com>
Message-ID: <1171927460.8180.70.camel@stevo-desktop>

On Mon, 2007-02-19 at 14:38 -0800, Sean Hefty wrote:
> >How exactly is various developers' source code pulled together to create
> >the nightly OFA tarballs at www.openfabrics.org/builds (could this be
> >put on the wiki somewhere?)?  I went looking to see if some of Sean's
> >work on RDMA CM had made it into these tarballs, and am not seeing code
> >with the patches I'm looking for.
> 
> I do not know how OFED creates their tarballs or manages their source.
> 

> >The exact patch I'm after was 'rdma_cm: allow joins to return a unique
> >address'.  I remember seeing this patch on the ofed_1_2 branch in Sean's
> >rdma-dev git repository about two weeks ago, though I don't see the
> >ofed_1_2 branch anymore (the patch does exist on the multicast branch).
> >  Sean, was this patch supposed to make it to the nightly 1.2 tarballs?
> 
> Assuming that ~vlad/ofed_1_2.git is the OFED kernel tree, then this patch does
> not appear to be included.  I was asked by OFED to publish an ofed_1_2 branch,
> which I did, but I do not know if it was used in constructing the OFED tree.
> The patch was intended to go into OFED.
> 

The ofed_1_2 tree has the 2.6.20 drivers/modules in drivers/infiniband.
They are, I think, the stock 2.6.20 drivers and modules.  If there are
fixes to any driver post 2.6.20, then patches get created in the
kernel_patches/fixes directory.  These are applied as part of the
configuration process when the tree is being built.  Look in there to
see if your change is in the form of a patch file.

So you can't necessarily look at ofed_1_2/drivers/infiniband/core for
the exact code, because it may get modified/patched as part of the
configure done on the kernel tree during build/installation.  Note that,
in addition to the patches from kernel_patches/fixes, there are other
patches applied based on the kernel version (backports).
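
A minimal sketch of that lookup, assuming a local checkout of the
ofed_1_2 kernel tree with the kernel_patches/fixes layout described
above.  The function name find_fix_patches and the checkout path are
hypothetical and only for illustration; this is not part of the OFED
build scripts.

```python
from pathlib import Path


def find_fix_patches(tree_root, needle):
    """Return sorted names of files in kernel_patches/fixes mentioning needle.

    tree_root is assumed to be a local checkout of the OFED kernel tree
    (hypothetical path); needle is any text to look for, such as a patch
    subject line.
    """
    fixes_dir = Path(tree_root) / "kernel_patches" / "fixes"
    if not fixes_dir.is_dir():
        # No fixes directory in this checkout: nothing to report.
        return []
    matches = []
    for patch in sorted(fixes_dir.iterdir()):
        if not patch.is_file():
            continue
        # Patch files may contain odd bytes; replace them instead of failing.
        text = patch.read_text(encoding="utf-8", errors="replace")
        # Case-insensitive substring match on the whole file.
        if needle.lower() in text.lower():
            matches.append(patch.name)
    return matches
```

For example, find_fix_patches("ofed_1_2", "allow joins to return a unique
address") would list any fix patch whose text carries that subject.  An
empty result only says the change is not in kernel_patches/fixes; it may
still come in through the per-kernel backport patches.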

Also, Sean:  If you have changes that you want in OFED 1.2, you need to
explicitly post patches to Vlad and cc the group.  By process, they
don't pull your kernel tree to get the latest.  This is different from
the user libs, which they pull each time they create a user package.

Vlad/Michael, correct me if I'm wrong here.  This all took me some time
to understand and perhaps documenting this would help folks...


Steve.


> - Sean


From afriedle at open-mpi.org  Mon Feb 19 18:41:49 2007
From: afriedle at open-mpi.org (Andrew Friedley)
Date: Mon, 19 Feb 2007 18:41:49 -0800
Subject: [openib-general] OFA 1.2 tarball creation
In-Reply-To: <1171927460.8180.70.camel@stevo-desktop>
References: <000001c75476$bcf2eea0$6dcd180a@amr.corp.intel.com>
	<1171927460.8180.70.camel@stevo-desktop>
Message-ID: <45DA5FED.709@open-mpi.org>

Steve Wise wrote:
> The ofed_1_2 tree has the 2.6.20 drivers/modules in drivers/infiniband.
> They are, I think, the stock 2.6.20 drivers and modules.  If there are
> fixes to any driver post 2.6.20, then patches get created in the
> kernel_patches/fixes directory.  These are applied as part of the
> configuration process when the tree is being built.  Look in there to
> see if your change is in the form of a patch file.
> 
> So you can't necessarily look at ofed_1_2/drivers/infiniband/core for
> the exact code, because it may get modified/patched as part of the
> configure done on the kernel tree during build/installation.  Note that,
> in addition to the patches from kernel_patches/fixes, there are other
> patches applied based on the kernel version (backports).

Thanks for the information.  I found what I'm after in 
'merged_sean_rdma_dev_ofed_1_2.patch' -- should have found that on my 
own.  It's good to know this is going into the builds though, that 
wasn't obvious to me.

Andrew


From bunk at stusta.de  Mon Feb 19 16:02:11 2007
From: bunk at stusta.de (Adrian Bunk)
Date: Tue, 20 Feb 2007 01:02:11 +0100
Subject: [openib-general] [2.6 patch] drivers/infiniband/hw/cxgb3/: possible
	cleanups
Message-ID: <20070220000211.GZ13958@stusta.de>

This patch contains the following possible cleanups:
- don't mark static functions in C files as inline - gcc should know
  best whether inlining makes sense
- never compile the unused cxio_dbg.c
- make the following needlessly global functions static:
  - cxio_hal.c: cxio_hal_clear_qp_ctx()
  - iwch_provider.c: iwch_get_qp()
- #if 0 the following unused global functions:
  - cxio_hal.c: cxio_allocate_stag()
  - cxio_resource.c: cxio_hal_get_rhdl()
  - cxio_resource.c: cxio_hal_put_rhdl()

Signed-off-by: Adrian Bunk <bunk at stusta.de>

---

 drivers/infiniband/hw/cxgb3/Makefile        |    1 
 drivers/infiniband/hw/cxgb3/cxio_hal.c      |   22 +++++++--------
 drivers/infiniband/hw/cxgb3/cxio_hal.h      |    5 ---
 drivers/infiniband/hw/cxgb3/cxio_resource.c |    8 ++++-
 drivers/infiniband/hw/cxgb3/iwch_cm.c       |    5 +--
 drivers/infiniband/hw/cxgb3/iwch_provider.c |    2 -
 drivers/infiniband/hw/cxgb3/iwch_provider.h |    1 
 drivers/infiniband/hw/cxgb3/iwch_qp.c       |   29 ++++++++------------
 8 files changed, 33 insertions(+), 40 deletions(-)

--- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/Makefile.old	2007-02-17 17:21:03.000000000 +0100
+++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/Makefile	2007-02-17 17:21:08.000000000 +0100
@@ -8,5 +8,4 @@
 
 ifdef CONFIG_INFINIBAND_CXGB3_DEBUG
 EXTRA_CFLAGS += -DDEBUG
-iw_cxgb3-y += cxio_dbg.o
 endif
--- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_hal.h.old	2007-02-17 17:22:53.000000000 +0100
+++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_hal.h	2007-02-17 17:25:08.000000000 +0100
@@ -144,7 +144,6 @@
 void cxio_rdev_close(struct cxio_rdev *rdev);
 int cxio_hal_cq_op(struct cxio_rdev *rdev, struct t3_cq *cq,
 		   enum t3_cq_opcode op, u32 credit);
-int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev, u32 qpid);
 int cxio_create_cq(struct cxio_rdev *rdev, struct t3_cq *cq);
 int cxio_destroy_cq(struct cxio_rdev *rdev, struct t3_cq *cq);
 int cxio_resize_cq(struct cxio_rdev *rdev, struct t3_cq *cq);
@@ -155,8 +154,6 @@
 int cxio_destroy_qp(struct cxio_rdev *rdev, struct t3_wq *wq,
 		    struct cxio_ucontext *uctx);
 int cxio_peek_cq(struct t3_wq *wr, struct t3_cq *cq, int opcode);
-int cxio_allocate_stag(struct cxio_rdev *rdev, u32 * stag, u32 pdid,
-		       enum tpt_mem_perm perm, u32 * pbl_size, u32 * pbl_addr);
 int cxio_register_phys_mem(struct cxio_rdev *rdev, u32 * stag, u32 pdid,
 			   enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len,
 			   u8 page_size, __be64 *pbl, u32 *pbl_size,
@@ -172,8 +169,6 @@
 int cxio_rdma_init(struct cxio_rdev *rdev, struct t3_rdma_init_attr *attr);
 void cxio_register_ev_cb(cxio_hal_ev_callback_func_t ev_cb);
 void cxio_unregister_ev_cb(cxio_hal_ev_callback_func_t ev_cb);
-u32 cxio_hal_get_rhdl(void);
-void cxio_hal_put_rhdl(u32 rhdl);
 u32 cxio_hal_get_pdid(struct cxio_hal_resource *rscp);
 void cxio_hal_put_pdid(struct cxio_hal_resource *rscp, u32 pdid);
 int __init cxio_hal_init(void);
--- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_hal.c.old	2007-02-17 17:23:11.000000000 +0100
+++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_hal.c	2007-02-17 17:36:40.000000000 +0100
@@ -46,7 +46,7 @@
 static LIST_HEAD(rdev_list);
 static cxio_hal_ev_callback_func_t cxio_ev_cb = NULL;
 
-static inline struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name)
+static struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name)
 {
 	struct cxio_rdev *rdev;
 
@@ -56,8 +56,7 @@
 	return NULL;
 }
 
-static inline struct cxio_rdev *cxio_hal_find_rdev_by_t3cdev(struct t3cdev
-							     *tdev)
+static struct cxio_rdev *cxio_hal_find_rdev_by_t3cdev(struct t3cdev *tdev)
 {
 	struct cxio_rdev *rdev;
 
@@ -119,7 +118,7 @@
 	return 0;
 }
 
-static inline int cxio_hal_clear_cq_ctx(struct cxio_rdev *rdev_p, u32 cqid)
+static int cxio_hal_clear_cq_ctx(struct cxio_rdev *rdev_p, u32 cqid)
 {
 	struct rdma_cq_setup setup;
 	setup.id = cqid;
@@ -131,7 +130,7 @@
 	return (rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_SETUP, &setup));
 }
 
-int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev_p, u32 qpid)
+static int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev_p, u32 qpid)
 {
 	u64 sge_cmd;
 	struct t3_modify_qp_wr *wqe;
@@ -426,7 +425,7 @@
 	}
 }
 
-static inline int cqe_completes_wr(struct t3_cqe *cqe, struct t3_wq *wq)
+static int cqe_completes_wr(struct t3_cqe *cqe, struct t3_wq *wq)
 {
 	if (CQE_OPCODE(*cqe) == T3_TERMINATE)
 		return 0;
@@ -761,6 +760,7 @@
 	return err;
 }
 
+#if 0
 /* IN : stag key, pdid, pbl_size
  * Out: stag index, actaul pbl_size, and pbl_addr allocated.
  */
@@ -771,6 +771,7 @@
 	return (__cxio_tpt_op(rdev_p, 0, stag, 0, pdid, TPT_NON_SHARED_MR,
 			      perm, 0, 0ULL, 0, 0, NULL, pbl_size, pbl_addr));
 }
+#endif  /*  0  */
 
 int cxio_register_phys_mem(struct cxio_rdev *rdev_p, u32 *stag, u32 pdid,
 			   enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len,
@@ -1030,7 +1031,7 @@
 	cxio_hal_destroy_rhdl_resource();
 }
 
-static inline void flush_completed_wrs(struct t3_wq *wq, struct t3_cq *cq)
+static void flush_completed_wrs(struct t3_wq *wq, struct t3_cq *cq)
 {
 	struct t3_swsq *sqp;
 	__u32 ptr = wq->sq_rptr;
@@ -1059,9 +1060,8 @@
 			break;
 }
 
-static inline void create_read_req_cqe(struct t3_wq *wq,
-				       struct t3_cqe *hw_cqe,
-				       struct t3_cqe *read_cqe)
+static void create_read_req_cqe(struct t3_wq *wq, struct t3_cqe *hw_cqe,
+				struct t3_cqe *read_cqe)
 {
 	read_cqe->u.scqe.wrid_hi = wq->oldest_read->sq_wptr;
 	read_cqe->len = wq->oldest_read->read_len;
@@ -1074,7 +1074,7 @@
 /*
  * Return a ptr to the next read wr in the SWSQ or NULL.
  */
-static inline void advance_oldest_read(struct t3_wq *wq)
+static void advance_oldest_read(struct t3_wq *wq)
 {
 
 	u32 rptr = wq->oldest_read - wq->sq + 1;
--- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_resource.c.old	2007-02-17 17:24:42.000000000 +0100
+++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_resource.c	2007-02-17 17:27:17.000000000 +0100
@@ -180,7 +180,7 @@
 /*
  * returns 0 if no resource available
  */
-static inline u32 cxio_hal_get_resource(struct kfifo *fifo)
+static u32 cxio_hal_get_resource(struct kfifo *fifo)
 {
 	u32 entry;
 	if (kfifo_get(fifo, (unsigned char *) &entry, sizeof(u32)))
@@ -189,11 +189,13 @@
 		return 0;	/* fifo emptry */
 }
 
-static inline void cxio_hal_put_resource(struct kfifo *fifo, u32 entry)
+static void cxio_hal_put_resource(struct kfifo *fifo, u32 entry)
 {
 	BUG_ON(kfifo_put(fifo, (unsigned char *) &entry, sizeof(u32)) == 0);
 }
 
+#if 0
+
 u32 cxio_hal_get_rhdl(void)
 {
 	return cxio_hal_get_resource(rhdl_fifo);
@@ -204,6 +206,8 @@
 	cxio_hal_put_resource(rhdl_fifo, rhdl);
 }
 
+#endif  /*  0  */
+
 u32 cxio_hal_get_stag(struct cxio_hal_resource *rscp)
 {
 	return cxio_hal_get_resource(rscp->tpt_fifo);
--- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.h.old	2007-02-17 17:25:35.000000000 +0100
+++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.h	2007-02-17 17:25:41.000000000 +0100
@@ -179,7 +179,6 @@
 
 void iwch_qp_add_ref(struct ib_qp *qp);
 void iwch_qp_rem_ref(struct ib_qp *qp);
-struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn);
 
 struct iwch_ucontext {
 	struct ib_ucontext ibucontext;
--- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.c.old	2007-02-17 17:25:50.000000000 +0100
+++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.c	2007-02-17 17:25:57.000000000 +0100
@@ -949,7 +949,7 @@
 	        wake_up(&(to_iwch_qp(qp)->wait));
 }
 
-struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn)
+static struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn)
 {
 	PDBG("%s ib_dev %p qpn 0x%x\n", __FUNCTION__, dev, qpn);
 	return (struct ib_qp *)get_qhp(to_iwch_dev(dev), qpn);
--- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_qp.c.old	2007-02-17 17:27:31.000000000 +0100
+++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_qp.c	2007-02-17 17:38:07.000000000 +0100
@@ -37,8 +37,8 @@
 
 #define NO_SUPPORT -1
 
-static inline int iwch_build_rdma_send(union t3_wr *wqe, struct ib_send_wr *wr,
-				       u8 * flit_cnt)
+static int iwch_build_rdma_send(union t3_wr *wqe, struct ib_send_wr *wr,
+				u8 * flit_cnt)
 {
 	int i;
 	u32 plen;
@@ -97,8 +97,8 @@
 	return 0;
 }
 
-static inline int iwch_build_rdma_write(union t3_wr *wqe, struct ib_send_wr *wr,
-					u8 *flit_cnt)
+static int iwch_build_rdma_write(union t3_wr *wqe, struct ib_send_wr *wr,
+				 u8 *flit_cnt)
 {
 	int i;
 	u32 plen;
@@ -138,8 +138,8 @@
 	return 0;
 }
 
-static inline int iwch_build_rdma_read(union t3_wr *wqe, struct ib_send_wr *wr,
-				       u8 *flit_cnt)
+static int iwch_build_rdma_read(union t3_wr *wqe, struct ib_send_wr *wr,
+				u8 *flit_cnt)
 {
 	if (wr->num_sge > 1)
 		return -EINVAL;
@@ -159,9 +159,8 @@
 /*
  * TBD: this is going to be moved to firmware. Missing pdid/qpid check for now.
  */
-static inline int iwch_sgl2pbl_map(struct iwch_dev *rhp,
-				   struct ib_sge *sg_list, u32 num_sgle,
-				   u32 * pbl_addr, u8 * page_size)
+static int iwch_sgl2pbl_map(struct iwch_dev *rhp, struct ib_sge *sg_list,
+			    u32 num_sgle, u32 * pbl_addr, u8 * page_size)
 {
 	int i;
 	struct iwch_mr *mhp;
@@ -207,9 +206,8 @@
 	return 0;
 }
 
-static inline int iwch_build_rdma_recv(struct iwch_dev *rhp,
-						    union t3_wr *wqe,
-						    struct ib_recv_wr *wr)
+static int iwch_build_rdma_recv(struct iwch_dev *rhp, union t3_wr *wqe,
+				struct ib_recv_wr *wr)
 {
 	int i, err = 0;
 	u32 pbl_addr[4];
@@ -474,8 +472,7 @@
 	return err;
 }
 
-static inline void build_term_codes(int t3err, u8 *layer_type, u8 *ecode,
-				    int tagged)
+static void build_term_codes(int t3err, u8 *layer_type, u8 *ecode, int tagged)
 {
 	switch (t3err) {
 	case TPT_ERR_STAG:
@@ -673,7 +670,7 @@
 	spin_lock_irqsave(&qhp->lock, *flag);
 }
 
-static inline void flush_qp(struct iwch_qp *qhp, unsigned long *flag)
+static void flush_qp(struct iwch_qp *qhp, unsigned long *flag)
 {
 	if (t3b_device(qhp->rhp))
 		cxio_set_wq_in_error(&qhp->wq);
@@ -685,7 +682,7 @@
 /*
  * Return non zero if at least one RECV was pre-posted.
  */
-static inline int rqes_posted(struct iwch_qp *qhp)
+static int rqes_posted(struct iwch_qp *qhp)
 {
 	return fw_riwrh_opcode((struct fw_riwrh *)qhp->wq.queue) == T3_WR_RCV;
 }
--- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_cm.c.old	2007-02-17 17:27:53.000000000 +0100
+++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_cm.c	2007-02-17 17:38:23.000000000 +0100
@@ -210,8 +210,7 @@
 	return state;
 }
 
-static inline void __state_set(struct iwch_ep_common *epc,
-			       enum iwch_ep_state new)
+static void __state_set(struct iwch_ep_common *epc, enum iwch_ep_state new)
 {
 	epc->state = new;
 }
@@ -1460,7 +1459,7 @@
 /*
  * Returns whether an ABORT_REQ_RSS message is a negative advice.
  */
-static inline int is_neg_adv_abort(unsigned int status)
+static int is_neg_adv_abort(unsigned int status)
 {
 	return status == CPL_ERR_RTX_NEG_ADVICE ||
 	       status == CPL_ERR_PERSIST_NEG_ADVICE;


From bunk at stusta.de  Mon Feb 19 16:02:13 2007
From: bunk at stusta.de (Adrian Bunk)
Date: Tue, 20 Feb 2007 01:02:13 +0100
Subject: [openib-general] [2.6 patch] infiniband/hw/mthca/mthca_mr.c: make 2
 functions static
Message-ID: <20070220000213.GA13958@stusta.de>

This patch makes two needlessly global functions static.

Signed-off-by: Adrian Bunk <bunk at stusta.de>

---

 drivers/infiniband/hw/mthca/mthca_mr.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

--- linux-2.6.20-mm1/drivers/infiniband/hw/mthca/mthca_mr.c.old	2007-02-17 17:41:39.000000000 +0100
+++ linux-2.6.20-mm1/drivers/infiniband/hw/mthca/mthca_mr.c	2007-02-17 17:42:22.000000000 +0100
@@ -310,8 +310,9 @@
 	return mthca_is_memfree(dev) ? (PAGE_SIZE / sizeof (u64)) : 0x7ffffff;
 }
 
-void mthca_tavor_write_mtt_seg(struct mthca_dev *dev, struct mthca_mtt *mtt,
-			      int start_index, u64 *buffer_list, int list_len)
+static void mthca_tavor_write_mtt_seg(struct mthca_dev *dev,
+				      struct mthca_mtt *mtt, int start_index,
+				      u64 *buffer_list, int list_len)
 {
 	u64 __iomem *mtts;
 	int i;
@@ -323,8 +324,9 @@
 				  mtts + i);
 }
 
-void mthca_arbel_write_mtt_seg(struct mthca_dev *dev, struct mthca_mtt *mtt,
-			      int start_index, u64 *buffer_list, int list_len)
+static void mthca_arbel_write_mtt_seg(struct mthca_dev *dev,
+				      struct mthca_mtt *mtt, int start_index,
+				      u64 *buffer_list, int list_len)
 {
 	__be64 *mtts;
 	dma_addr_t dma_handle;


From arlin.r.davis at intel.com  Mon Feb 19 16:09:07 2007
From: arlin.r.davis at intel.com (Davis, Arlin R)
Date: Mon, 19 Feb 2007 16:09:07 -0800
Subject: [openib-general]  uDAPL: RDMA Write example
Message-ID: <B0095134066CC94FBC80973103FFA1FE031CE077@orsmsx416.amr.corp.intel.com>

Christian,

 
dtest is a simple dapl test with snd/rcv and rdma write/read examples.

 
http://www.openfabrics.org/git/?p=~ardavis/dapl.git;a=blob;f=test/dtest/dtest.c

 
-arlin


From rdreier at cisco.com  Mon Feb 19 16:11:27 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 19 Feb 2007 16:11:27 -0800
Subject: [openib-general] [2.6 patch] infiniband/hw/mthca/mthca_mr.c:
 make 2 functions static
In-Reply-To: <20070220000213.GA13958@stusta.de> (Adrian Bunk's message
	of "Tue, 20 Feb 2007 01:02:13 +0100")
References: <20070220000213.GA13958@stusta.de>
Message-ID: <adaps85y9g0.fsf@cisco.com>

Queued for my next merge, thanks.


From monil at voltaire.com  Mon Feb 19 22:15:04 2007
From: monil at voltaire.com (Moni Levy)
Date: Tue, 20 Feb 2007 08:15:04 +0200
Subject: [openib-general] [PATCH] IB/ipoib: Fix ipoib handling for pkey
 reordering
In-Reply-To: <45D9915B.6070202@voltaire.com>
References: <45D9915B.6070202@voltaire.com>
Message-ID: <6a122cc00702192215s6ef799abud5ebd27951dbab8b@mail.gmail.com>

On 2/19/07, Moni Levy <monil at voltaire.com> wrote:
> This issue was found during partitioning & SM fail over testing. The fix was tested for 24
> hours with pkey reshuffling every few seconds. The patch applies to Roland's master
> branch.

I found an issue with that patch; I'll post an updated one soon.

-- Moni

>
> SM reconfiguration or failover possibly causes a shuffling of the values in the port pkey
> table. The current implementation only queries for the index of the pkey once, when it
> creates the device QP and after that moves it into working state, and hence does not
> address this scenario. Fix this by using the PKEY_CHANGE event as a trigger to
> reconfigure the device QP.
>
> Signed-off-by: Moni Levy <monil at voltaire.com>
> ---
>  ipoib.h       |    2 ++
>  ipoib_ib.c    |   22 ++++++++++++++++++++--
>  ipoib_main.c  |    1 +
>  ipoib_verbs.c |    4 +++-
>  4 files changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
> index 07deee8..ed854e8 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib.h
> +++ b/drivers/infiniband/ulp/ipoib/ipoib.h
> @@ -139,6 +139,7 @@ struct ipoib_dev_priv {
>         struct delayed_work pkey_task;
>         struct delayed_work mcast_task;
>         struct work_struct flush_task;
> +       struct work_struct flush_restart_qp_task;
>         struct work_struct restart_task;
>         struct delayed_work ah_reap_task;
>
> @@ -261,6 +262,7 @@ struct ipoib_dev_priv *ipoib_intf_alloc(
>
>  int ipoib_ib_dev_init(struct net_device *dev, struct ib_device *ca, int port);
>  void ipoib_ib_dev_flush(struct work_struct *work);
> +void ipoib_ib_dev_flush_restart_qp(struct work_struct *work);
>  void ipoib_ib_dev_cleanup(struct net_device *dev);
>
>  int ipoib_ib_dev_open(struct net_device *dev);
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> index 59d9594..5e2ada9 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> @@ -611,7 +611,7 @@ int ipoib_ib_dev_init(struct net_device
>         return 0;
>  }
>
> -void ipoib_ib_dev_flush(struct work_struct *work)
> +static void __ipoib_ib_dev_flush(struct work_struct *work, int restart_qp)
>  {
>         struct ipoib_dev_priv *cpriv, *priv =
>                 container_of(work, struct ipoib_dev_priv, flush_task);
> @@ -630,6 +630,12 @@ void ipoib_ib_dev_flush(struct work_stru
>         ipoib_dbg(priv, "flushing\n");
>
>         ipoib_ib_dev_down(dev, 0);
> +
> +       if (restart_qp) {
> +               ipoib_dbg(priv, "restarting the device QP\n");
> +               ipoib_ib_dev_stop(dev);
> +               ipoib_ib_dev_open(dev);
> +       }
>
>         /*
>          * The device could have been brought down between the start and when
> @@ -644,11 +650,23 @@ void ipoib_ib_dev_flush(struct work_stru
>
>         /* Flush any child interfaces too */
>         list_for_each_entry(cpriv, &priv->child_intfs, list)
> -               ipoib_ib_dev_flush(&cpriv->flush_task);
> +               __ipoib_ib_dev_flush(&cpriv->flush_task, restart_qp);
>
>         mutex_unlock(&priv->vlan_mutex);
>  }
>
> +void ipoib_ib_dev_flush(struct work_struct *work)
> +{
> +       /* We only restart the QP in case of PKEY change event */
> +       __ipoib_ib_dev_flush(work, 0);
> +}
> +
> +void ipoib_ib_dev_flush_restart_qp(struct work_struct *work)
> +{
> +       /* We only restart the QP in case of PKEY change event */
> +       __ipoib_ib_dev_flush(work, 1);
> +}
> +
>  void ipoib_ib_dev_cleanup(struct net_device *dev)
>  {
>         struct ipoib_dev_priv *priv = netdev_priv(dev);
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> index 705eb1d..da46b79 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> @@ -942,6 +942,7 @@ static void ipoib_setup(struct net_devic
>         INIT_DELAYED_WORK(&priv->pkey_task,    ipoib_pkey_poll);
>         INIT_DELAYED_WORK(&priv->mcast_task,   ipoib_mcast_join_task);
>         INIT_WORK(&priv->flush_task,   ipoib_ib_dev_flush);
> +       INIT_WORK(&priv->flush_restart_qp_task, ipoib_ib_dev_flush_restart_qp);
>         INIT_WORK(&priv->restart_task, ipoib_mcast_restart_task);
>         INIT_DELAYED_WORK(&priv->ah_reap_task, ipoib_reap_ah);
>  }
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
> index 7b717c6..c249915 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
> @@ -252,12 +252,14 @@ void ipoib_event(struct ib_event_handler
>                 container_of(handler, struct ipoib_dev_priv, event_handler);
>
>         if (record->event == IB_EVENT_PORT_ERR    ||
> -           record->event == IB_EVENT_PKEY_CHANGE ||
>             record->event == IB_EVENT_PORT_ACTIVE ||
>             record->event == IB_EVENT_LID_CHANGE  ||
>             record->event == IB_EVENT_SM_CHANGE   ||
>             record->event == IB_EVENT_CLIENT_REREGISTER) {
>                 ipoib_dbg(priv, "Port state change event\n");
>                 queue_work(ipoib_workqueue, &priv->flush_task);
> +       } else if (record->event == IB_EVENT_PKEY_CHANGE) {
> +               ipoib_dbg(priv, "PKEY change event\n");
> +               queue_work(ipoib_workqueue, &priv->flush_restart_qp_task);
>         }
>  }
>
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>
>
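
The stale-index problem described in the patch description above can be
illustrated with a small userspace sketch. This is not the kernel code and
not the ipoib implementation; find_pkey_index() and pkey_index_is_stale()
are hypothetical helpers, and the table contents are made-up values chosen
only to show that an index looked up once stops matching after the table is
reshuffled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the index of pkey in table, or -1 if it is not present. */
static int find_pkey_index(const uint16_t *table, size_t len, uint16_t pkey)
{
	size_t i;

	assert(table != NULL);
	for (i = 0; i < len; i++)
		if (table[i] == pkey)
			return (int)i;
	return -1;
}

/*
 * Return non-zero if an index cached earlier no longer refers to pkey,
 * i.e. the table was reshuffled (or the pkey removed) since the lookup.
 */
static int pkey_index_is_stale(const uint16_t *table, size_t len,
			       int cached_index, uint16_t pkey)
{
	return find_pkey_index(table, len, pkey) != cached_index;
}
```

A QP that was configured with the cached index would need to be
reconfigured whenever pkey_index_is_stale() reports a mismatch, which is
the role the PKEY_CHANGE event plays in the patch.
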


From ogerlitz at voltaire.com  Tue Feb 20 00:40:29 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Tue, 20 Feb 2007 10:40:29 +0200
Subject: [openib-general] Fork issues with simple MPI program
In-Reply-To: <000001c75454$523660f0$eed4180a@amr.corp.intel.com>
References: <000001c75454$523660f0$eed4180a@amr.corp.intel.com>
Message-ID: <45DAB3FD.8060606@voltaire.com>

Arlin Davis wrote:
> Any insight would be greatly appreciated. It was our assumption that the parent process can continue
> to use IB resources after the fixes went into 2.6.16 and OFED 1.1. Is this true? 

As has been discussed on this list on a few occasions: contrary to popular
belief, fork support was introduced in libibverbs 1.1, whereas OFED 1.1
contains libibverbs 1.0.

Or.


From vlad at lists.openfabrics.org  Tue Feb 20 02:24:10 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Tue, 20 Feb 2007 02:24:10 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070220-0200 daily build status
Message-ID: <20070220102411.8D72EE6080C@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod 

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.16
Passed on ppc64 with linux-2.6.19
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.19
Passed on powerpc with linux-2.6.19
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.12
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.13
Passed on ia64 with linux-2.6.12
Passed on powerpc with linux-2.6.14
Passed on powerpc with linux-2.6.15
Passed on ppc64 with linux-2.6.15
Passed on powerpc with linux-2.6.16
Passed on powerpc with linux-2.6.12
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.13
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.14
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.12
Passed on ppc64 with linux-2.6.18
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.14

Failed:


From halr at voltaire.com  Tue Feb 20 05:27:00 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Feb 2007 08:27:00 -0500
Subject: [openib-general] Port error rate detection
In-Reply-To: <45DA0E50.7010002@ornl.gov>
References: <45DA0E50.7010002@ornl.gov>
Message-ID: <1171978018.4380.298013.camel@hal.voltaire.com>

On Mon, 2007-02-19 at 15:53, Steven Carter wrote:
> I have a Nagios module that alerts on connectivity, port errors, 
> speed/width problems.  I would like to give it the ability to change the 
> severity of the alert depending on whether errors are just present or if 
> they are increasing faster than a specified rate.  The intent is to 
> equip the module to keep the state of the last query and possibly 
> history, but I wanted to make sure that I was not re-inventing the wheel 
> first.  Is there an attribute or utility that I am overlooking that will 
> help me do this?

Not currently (to my knowledge). The rate-thresholding aspect is
similar to what will be supported in the proposed PerfManager.

-- Hal

> Thanks,
> 
> Steven.
> 


From jlentini at netapp.com  Tue Feb 20 06:29:09 2007
From: jlentini at netapp.com (James Lentini)
Date: Tue, 20 Feb 2007 09:29:09 -0500 (EST)
Subject: [openib-general] OFED 1.2 dapl and dat.conf
In-Reply-To: <45D6327B.4060606@ichips.intel.com>
References: <1171397522.21471.7.camel@stevo-desktop>
	<45D37E8E.5050800@ichips.intel.com>
	<1171561783.3161.165.camel@fc6.xsintricity.com>
	<45D6327B.4060606@ichips.intel.com>
Message-ID: <Pine.LNX.4.64.0702200926500.14025@jlentini-linux.nane.netapp.com>


On Fri, 16 Feb 2007, Arlin Davis wrote:

> Doug Ledford wrote:
> 
> > On Wed, 2007-02-14 at 13:26 -0800, Arlin Davis wrote:
> >  
> > > Steve Wise wrote:
> > > 
> > >    
> > > > Currently, the dapl rpms don't install dat.conf.  I think they probably
> > > > should, eh?  Maybe in <prefix>/etc/dat.conf
> > > > 
> > > > 
> > > >      
> > > my specfile is set up to target sysconfdir, which is typically set to
> > > `$(prefix)/etc'
> > > 
> > > %{_sysconfdir}/dat.conf
> > > 
> > > I am not sure how the 1.2 scripts are building the rpms. Maybe Vladimir
> > > can help explain?
> > >    
> > 
> > Note that this setup is problematic on multilib arches.  Since the
> > dat.conf file hard codes a library path that's different for 32bit/64bit
> > arches, installing both a 32bit and 64bit dapl library is impossible
> > without munging things.
> > 
> > For RHEL4U5/RHEL5 I changed the dat library to read dat<bits>.conf and
> > have two separate conf files.  A probably better approach would be to
> > change the library to use a relative library name that it looks for
> > starting from the library's own directory.  Hence if the dapl library is
> > in /usr/lib, it looks in /usr/lib.  Doing that would allow the
> > 32bit/64bit libraries to share the same config file.
> > 
> >  
> This is a good idea. I will take a look at dladdr options to set 
> appropriate starting path for dapl libraries when absolute paths are 
> not specified.
> 
> James, do you see any issues with this approach?

Nope. The dat registry should be able to handle provider libraries at 
any location in the file namespace (provided they are accessible of 
course).
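
For what it's worth, the relative-name lookup could be sketched roughly as
below. This is only an illustration under assumptions, not the dat registry
code: resolve_provider_path() is a hypothetical helper, and own_lib_path
stands for the file name of the dat library itself, which on glibc can be
obtained from dladdr() via the dli_fname field of Dl_info. Absolute
provider names are passed through unchanged; relative names are joined to
the directory the library was loaded from.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the path used to load a provider library.
 *
 * own_lib_path: path of the library doing the lookup (hypothetical input,
 *               e.g. what dladdr() reports in dli_fname)
 * provider:     library name taken from the config file
 *
 * Returns 0 on success, -1 if the result does not fit in out.
 */
static int resolve_provider_path(const char *own_lib_path,
				 const char *provider,
				 char *out, size_t out_len)
{
	const char *slash;
	int n;

	assert(own_lib_path && provider && out && out_len > 0);

	if (provider[0] == '/') {
		/* Absolute path in the config file: use it as is. */
		n = snprintf(out, out_len, "%s", provider);
	} else {
		slash = strrchr(own_lib_path, '/');
		if (slash == NULL)
			/* No directory known: leave it to the loader. */
			n = snprintf(out, out_len, "%s", provider);
		else
			/* Directory of our own library + provider name. */
			n = snprintf(out, out_len, "%.*s/%s",
				     (int)(slash - own_lib_path),
				     own_lib_path, provider);
	}
	return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

With something like this, a 32bit library living in /usr/lib and a 64bit
one in /usr/lib64 would each resolve the same relative entry to a provider
in their own directory, so one config file could serve both.
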

> Vladimir, can you tell me how the OFED 1.2 install scripts are 
> handling the dat.conf?
> 
> -arlin
> 


From swise at opengridcomputing.com  Tue Feb 20 06:43:06 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 20 Feb 2007 08:43:06 -0600
Subject: [openib-general] [2.6 patch] drivers/infiniband/hw/cxgb3/:
	possible cleanups
In-Reply-To: <20070220000211.GZ13958@stusta.de>
References: <20070220000211.GZ13958@stusta.de>
Message-ID: <1171982587.2101.0.camel@stevo-desktop>

On Tue, 2007-02-20 at 01:02 +0100, Adrian Bunk wrote:
> This patch contains the following possible cleanups:
> - don't mark static functions in C files as inline - gcc should know
>   best whether inlining makes sense
> - never compile the unused cxio_dbg.c
> - make the following needlessly global functions static:
>   - cxio_hal.c: cxio_hal_clear_qp_ctx()
>   - iwch_provider.c: iwch_get_qp()
> - #if 0 the following unused global functions:
>   - cxio_hal.c: cxio_allocate_stag()
>   - cxio_resource.c: cxio_hal_get_rhdl()
>   - cxio_resource.c: cxio_hal_put_rhdl()
> 

You could just remove the code instead of #if 0...


> Signed-off-by: Adrian Bunk <bunk at stusta.de>
> 
> ---
> 
>  drivers/infiniband/hw/cxgb3/Makefile        |    1 
>  drivers/infiniband/hw/cxgb3/cxio_hal.c      |   22 +++++++--------
>  drivers/infiniband/hw/cxgb3/cxio_hal.h      |    5 ---
>  drivers/infiniband/hw/cxgb3/cxio_resource.c |    8 ++++-
>  drivers/infiniband/hw/cxgb3/iwch_cm.c       |    5 +--
>  drivers/infiniband/hw/cxgb3/iwch_provider.c |    2 -
>  drivers/infiniband/hw/cxgb3/iwch_provider.h |    1 
>  drivers/infiniband/hw/cxgb3/iwch_qp.c       |   29 ++++++++------------
>  8 files changed, 33 insertions(+), 40 deletions(-)
> 
> --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/Makefile.old	2007-02-17 17:21:03.000000000 +0100
> +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/Makefile	2007-02-17 17:21:08.000000000 +0100
> @@ -8,5 +8,4 @@
>  
>  ifdef CONFIG_INFINIBAND_CXGB3_DEBUG
>  EXTRA_CFLAGS += -DDEBUG
> -iw_cxgb3-y += cxio_dbg.o
>  endif
> --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_hal.h.old	2007-02-17 17:22:53.000000000 +0100
> +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_hal.h	2007-02-17 17:25:08.000000000 +0100
> @@ -144,7 +144,6 @@
>  void cxio_rdev_close(struct cxio_rdev *rdev);
>  int cxio_hal_cq_op(struct cxio_rdev *rdev, struct t3_cq *cq,
>  		   enum t3_cq_opcode op, u32 credit);
> -int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev, u32 qpid);
>  int cxio_create_cq(struct cxio_rdev *rdev, struct t3_cq *cq);
>  int cxio_destroy_cq(struct cxio_rdev *rdev, struct t3_cq *cq);
>  int cxio_resize_cq(struct cxio_rdev *rdev, struct t3_cq *cq);
> @@ -155,8 +154,6 @@
>  int cxio_destroy_qp(struct cxio_rdev *rdev, struct t3_wq *wq,
>  		    struct cxio_ucontext *uctx);
>  int cxio_peek_cq(struct t3_wq *wr, struct t3_cq *cq, int opcode);
> -int cxio_allocate_stag(struct cxio_rdev *rdev, u32 * stag, u32 pdid,
> -		       enum tpt_mem_perm perm, u32 * pbl_size, u32 * pbl_addr);
>  int cxio_register_phys_mem(struct cxio_rdev *rdev, u32 * stag, u32 pdid,
>  			   enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len,
>  			   u8 page_size, __be64 *pbl, u32 *pbl_size,
> @@ -172,8 +169,6 @@
>  int cxio_rdma_init(struct cxio_rdev *rdev, struct t3_rdma_init_attr *attr);
>  void cxio_register_ev_cb(cxio_hal_ev_callback_func_t ev_cb);
>  void cxio_unregister_ev_cb(cxio_hal_ev_callback_func_t ev_cb);
> -u32 cxio_hal_get_rhdl(void);
> -void cxio_hal_put_rhdl(u32 rhdl);
>  u32 cxio_hal_get_pdid(struct cxio_hal_resource *rscp);
>  void cxio_hal_put_pdid(struct cxio_hal_resource *rscp, u32 pdid);
>  int __init cxio_hal_init(void);
> --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_hal.c.old	2007-02-17 17:23:11.000000000 +0100
> +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_hal.c	2007-02-17 17:36:40.000000000 +0100
> @@ -46,7 +46,7 @@
>  static LIST_HEAD(rdev_list);
>  static cxio_hal_ev_callback_func_t cxio_ev_cb = NULL;
>  
> -static inline struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name)
> +static struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name)
>  {
>  	struct cxio_rdev *rdev;
>  
> @@ -56,8 +56,7 @@
>  	return NULL;
>  }
>  
> -static inline struct cxio_rdev *cxio_hal_find_rdev_by_t3cdev(struct t3cdev
> -							     *tdev)
> +static struct cxio_rdev *cxio_hal_find_rdev_by_t3cdev(struct t3cdev *tdev)
>  {
>  	struct cxio_rdev *rdev;
>  
> @@ -119,7 +118,7 @@
>  	return 0;
>  }
>  
> -static inline int cxio_hal_clear_cq_ctx(struct cxio_rdev *rdev_p, u32 cqid)
> +static int cxio_hal_clear_cq_ctx(struct cxio_rdev *rdev_p, u32 cqid)
>  {
>  	struct rdma_cq_setup setup;
>  	setup.id = cqid;
> @@ -131,7 +130,7 @@
>  	return (rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_SETUP, &setup));
>  }
>  
> -int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev_p, u32 qpid)
> +static int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev_p, u32 qpid)
>  {
>  	u64 sge_cmd;
>  	struct t3_modify_qp_wr *wqe;
> @@ -426,7 +425,7 @@
>  	}
>  }
>  
> -static inline int cqe_completes_wr(struct t3_cqe *cqe, struct t3_wq *wq)
> +static int cqe_completes_wr(struct t3_cqe *cqe, struct t3_wq *wq)
>  {
>  	if (CQE_OPCODE(*cqe) == T3_TERMINATE)
>  		return 0;
> @@ -761,6 +760,7 @@
>  	return err;
>  }
>  
> +#if 0
>  /* IN : stag key, pdid, pbl_size
>   * Out: stag index, actaul pbl_size, and pbl_addr allocated.
>   */
> @@ -771,6 +771,7 @@
>  	return (__cxio_tpt_op(rdev_p, 0, stag, 0, pdid, TPT_NON_SHARED_MR,
>  			      perm, 0, 0ULL, 0, 0, NULL, pbl_size, pbl_addr));
>  }
> +#endif  /*  0  */
>  
>  int cxio_register_phys_mem(struct cxio_rdev *rdev_p, u32 *stag, u32 pdid,
>  			   enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len,
> @@ -1030,7 +1031,7 @@
>  	cxio_hal_destroy_rhdl_resource();
>  }
>  
> -static inline void flush_completed_wrs(struct t3_wq *wq, struct t3_cq *cq)
> +static void flush_completed_wrs(struct t3_wq *wq, struct t3_cq *cq)
>  {
>  	struct t3_swsq *sqp;
>  	__u32 ptr = wq->sq_rptr;
> @@ -1059,9 +1060,8 @@
>  			break;
>  }
>  
> -static inline void create_read_req_cqe(struct t3_wq *wq,
> -				       struct t3_cqe *hw_cqe,
> -				       struct t3_cqe *read_cqe)
> +static void create_read_req_cqe(struct t3_wq *wq, struct t3_cqe *hw_cqe,
> +				struct t3_cqe *read_cqe)
>  {
>  	read_cqe->u.scqe.wrid_hi = wq->oldest_read->sq_wptr;
>  	read_cqe->len = wq->oldest_read->read_len;
> @@ -1074,7 +1074,7 @@
>  /*
>   * Return a ptr to the next read wr in the SWSQ or NULL.
>   */
> -static inline void advance_oldest_read(struct t3_wq *wq)
> +static void advance_oldest_read(struct t3_wq *wq)
>  {
>  
>  	u32 rptr = wq->oldest_read - wq->sq + 1;
> --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_resource.c.old	2007-02-17 17:24:42.000000000 +0100
> +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_resource.c	2007-02-17 17:27:17.000000000 +0100
> @@ -180,7 +180,7 @@
>  /*
>   * returns 0 if no resource available
>   */
> -static inline u32 cxio_hal_get_resource(struct kfifo *fifo)
> +static u32 cxio_hal_get_resource(struct kfifo *fifo)
>  {
>  	u32 entry;
>  	if (kfifo_get(fifo, (unsigned char *) &entry, sizeof(u32)))
> @@ -189,11 +189,13 @@
>  		return 0;	/* fifo emptry */
>  }
>  
> -static inline void cxio_hal_put_resource(struct kfifo *fifo, u32 entry)
> +static void cxio_hal_put_resource(struct kfifo *fifo, u32 entry)
>  {
>  	BUG_ON(kfifo_put(fifo, (unsigned char *) &entry, sizeof(u32)) == 0);
>  }
>  
> +#if 0
> +
>  u32 cxio_hal_get_rhdl(void)
>  {
>  	return cxio_hal_get_resource(rhdl_fifo);
> @@ -204,6 +206,8 @@
>  	cxio_hal_put_resource(rhdl_fifo, rhdl);
>  }
>  
> +#endif  /*  0  */
> +
>  u32 cxio_hal_get_stag(struct cxio_hal_resource *rscp)
>  {
>  	return cxio_hal_get_resource(rscp->tpt_fifo);
> --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.h.old	2007-02-17 17:25:35.000000000 +0100
> +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.h	2007-02-17 17:25:41.000000000 +0100
> @@ -179,7 +179,6 @@
>  
>  void iwch_qp_add_ref(struct ib_qp *qp);
>  void iwch_qp_rem_ref(struct ib_qp *qp);
> -struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn);
>  
>  struct iwch_ucontext {
>  	struct ib_ucontext ibucontext;
> --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.c.old	2007-02-17 17:25:50.000000000 +0100
> +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.c	2007-02-17 17:25:57.000000000 +0100
> @@ -949,7 +949,7 @@
>  	        wake_up(&(to_iwch_qp(qp)->wait));
>  }
>  
> -struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn)
> +static struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn)
>  {
>  	PDBG("%s ib_dev %p qpn 0x%x\n", __FUNCTION__, dev, qpn);
>  	return (struct ib_qp *)get_qhp(to_iwch_dev(dev), qpn);
> --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_qp.c.old	2007-02-17 17:27:31.000000000 +0100
> +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_qp.c	2007-02-17 17:38:07.000000000 +0100
> @@ -37,8 +37,8 @@
>  
>  #define NO_SUPPORT -1
>  
> -static inline int iwch_build_rdma_send(union t3_wr *wqe, struct ib_send_wr *wr,
> -				       u8 * flit_cnt)
> +static int iwch_build_rdma_send(union t3_wr *wqe, struct ib_send_wr *wr,
> +				u8 * flit_cnt)
>  {
>  	int i;
>  	u32 plen;
> @@ -97,8 +97,8 @@
>  	return 0;
>  }
>  
> -static inline int iwch_build_rdma_write(union t3_wr *wqe, struct ib_send_wr *wr,
> -					u8 *flit_cnt)
> +static int iwch_build_rdma_write(union t3_wr *wqe, struct ib_send_wr *wr,
> +				 u8 *flit_cnt)
>  {
>  	int i;
>  	u32 plen;
> @@ -138,8 +138,8 @@
>  	return 0;
>  }
>  
> -static inline int iwch_build_rdma_read(union t3_wr *wqe, struct ib_send_wr *wr,
> -				       u8 *flit_cnt)
> +static int iwch_build_rdma_read(union t3_wr *wqe, struct ib_send_wr *wr,
> +				u8 *flit_cnt)
>  {
>  	if (wr->num_sge > 1)
>  		return -EINVAL;
> @@ -159,9 +159,8 @@
>  /*
>   * TBD: this is going to be moved to firmware. Missing pdid/qpid check for now.
>   */
> -static inline int iwch_sgl2pbl_map(struct iwch_dev *rhp,
> -				   struct ib_sge *sg_list, u32 num_sgle,
> -				   u32 * pbl_addr, u8 * page_size)
> +static int iwch_sgl2pbl_map(struct iwch_dev *rhp, struct ib_sge *sg_list,
> +			    u32 num_sgle, u32 * pbl_addr, u8 * page_size)
>  {
>  	int i;
>  	struct iwch_mr *mhp;
> @@ -207,9 +206,8 @@
>  	return 0;
>  }
>  
> -static inline int iwch_build_rdma_recv(struct iwch_dev *rhp,
> -						    union t3_wr *wqe,
> -						    struct ib_recv_wr *wr)
> +static int iwch_build_rdma_recv(struct iwch_dev *rhp, union t3_wr *wqe,
> +				struct ib_recv_wr *wr)
>  {
>  	int i, err = 0;
>  	u32 pbl_addr[4];
> @@ -474,8 +472,7 @@
>  	return err;
>  }
>  
> -static inline void build_term_codes(int t3err, u8 *layer_type, u8 *ecode,
> -				    int tagged)
> +static void build_term_codes(int t3err, u8 *layer_type, u8 *ecode, int tagged)
>  {
>  	switch (t3err) {
>  	case TPT_ERR_STAG:
> @@ -673,7 +670,7 @@
>  	spin_lock_irqsave(&qhp->lock, *flag);
>  }
>  
> -static inline void flush_qp(struct iwch_qp *qhp, unsigned long *flag)
> +static void flush_qp(struct iwch_qp *qhp, unsigned long *flag)
>  {
>  	if (t3b_device(qhp->rhp))
>  		cxio_set_wq_in_error(&qhp->wq);
> @@ -685,7 +682,7 @@
>  /*
>   * Return non zero if at least one RECV was pre-posted.
>   */
> -static inline int rqes_posted(struct iwch_qp *qhp)
> +static int rqes_posted(struct iwch_qp *qhp)
>  {
>  	return fw_riwrh_opcode((struct fw_riwrh *)qhp->wq.queue) == T3_WR_RCV;
>  }
> --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_cm.c.old	2007-02-17 17:27:53.000000000 +0100
> +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_cm.c	2007-02-17 17:38:23.000000000 +0100
> @@ -210,8 +210,7 @@
>  	return state;
>  }
>  
> -static inline void __state_set(struct iwch_ep_common *epc,
> -			       enum iwch_ep_state new)
> +static void __state_set(struct iwch_ep_common *epc, enum iwch_ep_state new)
>  {
>  	epc->state = new;
>  }
> @@ -1460,7 +1459,7 @@
>  /*
>   * Returns whether an ABORT_REQ_RSS message is a negative advice.
>   */
> -static inline int is_neg_adv_abort(unsigned int status)
> +static int is_neg_adv_abort(unsigned int status)
>  {
>  	return status == CPL_ERR_RTX_NEG_ADVICE ||
>  	       status == CPL_ERR_PERSIST_NEG_ADVICE;
> 


From scarter at ornl.gov  Tue Feb 20 06:44:59 2007
From: scarter at ornl.gov (Steven Carter)
Date: Tue, 20 Feb 2007 09:44:59 -0500
Subject: [openib-general] Port error rate detection
In-Reply-To: <1171978018.4380.298013.camel@hal.voltaire.com>
References: <45DA0E50.7010002@ornl.gov>
	<1171978018.4380.298013.camel@hal.voltaire.com>
Message-ID: <45DB096B.2060306@ornl.gov>

Hal Rosenstock wrote:
> On Mon, 2007-02-19 at 15:53, Steven Carter wrote:
>   
>> I have a Nagios module that alerts on connectivity, port errors, 
>> speed/width problems.  I would like to give it the ability to change the 
>> severity of the alert depending on whether errors are just present or if 
>> they are increasing faster than a specified rate.  The intent is to 
>> equip the module to keep the state of the last query and possibly 
>> history, but I wanted to make sure that I was not re-inventing the wheel 
>> first.  Is there an attribute or utility that I am overlooking that will 
>> help me do this?
>>     
>
> Not currently (to my knowledge). The rate-thresholding aspect is
> similar to what will be supported in the proposed PerfManager.
>   
I noticed that in your RFC.  How are you planning on presenting the data 
to other agents (e.g. Nagios, Openview, MRTG, etc.)?  One comment that I 
should have made on your RFC is that I wonder if it is necessary to 
include the data analysis/reduction part.  Just having a central 
location that collects the values and presents it via SNMP is extremely 
useful since there are a plethora of monitoring apps (free and 
commercial) that do what you are proposing.  That way, a network 
manager can leverage existing tools currently used for monitoring 
Ethernet Nodes, Hosts, etc.  You can still include a last change 
attribute with each counter so that simple utilities (like the one that 
I am writing) can get an idea of how quickly errors are occurring.
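
To make the idea concrete, here is a minimal sketch of the kind of check a
simple utility could do with two samples of a counter. It is only an
illustration: struct counter_sample, error_rate() and classify() are
hypothetical names, the timestamps are assumed to be in seconds, and the
threshold is whatever rate the administrator picks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum alert_level { ALERT_OK, ALERT_WARNING, ALERT_CRITICAL };

struct counter_sample {
	uint64_t errors;	/* cumulative error count read from the port */
	uint64_t time_sec;	/* when the sample was taken, in seconds */
};

/*
 * Errors per second between two samples.  Returns 0 if no time has
 * elapsed or the counter went backwards (e.g. it was reset in between).
 */
static double error_rate(const struct counter_sample *prev,
			 const struct counter_sample *cur)
{
	assert(prev != NULL && cur != NULL);
	if (cur->time_sec <= prev->time_sec || cur->errors < prev->errors)
		return 0.0;
	return (double)(cur->errors - prev->errors) /
	       (double)(cur->time_sec - prev->time_sec);
}

/*
 * OK if no errors are present, WARNING if errors are present but not
 * growing faster than max_rate, CRITICAL if they are.
 */
static enum alert_level classify(const struct counter_sample *prev,
				 const struct counter_sample *cur,
				 double max_rate)
{
	if (cur->errors == 0)
		return ALERT_OK;
	if (error_rate(prev, cur) > max_rate)
		return ALERT_CRITICAL;
	return ALERT_WARNING;
}
```

The only state the module has to keep between runs is the previous
counter_sample for each port.
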

Steven.

> -- Hal
>
>   
>> Thanks,
>>
>> Steven.
>>


From halr at voltaire.com  Tue Feb 20 06:43:01 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Feb 2007 09:43:01 -0500
Subject: [openib-general] [PATCH] osm_vendor_ibumad: termination crash
	fix
In-Reply-To: <20070219214630.GW27414@sashak.voltaire.com>
References: <20070219214630.GW27414@sashak.voltaire.com>
Message-ID: <1171982581.4380.302584.camel@hal.voltaire.com>

On Mon, 2007-02-19 at 16:46, Sasha Khapyorsky wrote:
> When OpenSM is terminated, the umad_receiver thread is still running even
> after the structures are destroyed and freed, which causes random (but easily
> reproducible) crashes. The reason is that osm_vendor_delete() does not
> take care of thread termination. This patch adds receiver thread
> cancellation (using pthread_cancel() and pthread_join()) and takes care to
> keep all mutexes unlocked upon termination. There is also a minor
> termination code consolidation: the osm_vendor_port_close() function.
> 
> Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>

Good find.

Thanks! Applied (to both master and ofed_1_2).
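
For readers who have not seen the pattern, the sequence described in the
patch (cancel the receiver, join it, and only then tear down the shared
state, with no mutex left locked) can be sketched as follows.  This is a
generic pthreads illustration, not the osm_vendor_ibumad code: the struct,
the function names and the 1 ms sleep standing in for a blocking receive
are all assumptions.

```c
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical receiver state; stands in for the vendor structures. */
struct receiver {
    pthread_t       tid;
    pthread_mutex_t lock;
    unsigned long   polls;
};

/* Cleanup handler: runs if the thread is cancelled while holding the lock. */
static void unlock_on_cancel(void *arg)
{
    pthread_mutex_unlock((pthread_mutex_t *)arg);
}

static void *receiver_main(void *arg)
{
    struct receiver *r = arg;
    struct timespec nap = { 0, 1000000 };   /* 1 ms */

    for (;;) {
        pthread_mutex_lock(&r->lock);
        pthread_cleanup_push(unlock_on_cancel, &r->lock);
        r->polls++;
        pthread_testcancel();       /* may be cancelled with the lock held */
        pthread_cleanup_pop(1);     /* normal path: run the handler, unlock */
        nanosleep(&nap, NULL);      /* stand-in for a blocking receive */
    }
    return NULL;
}

int receiver_start(struct receiver *r)
{
    assert(r);
    r->polls = 0;
    if (pthread_mutex_init(&r->lock, NULL))
        return -1;
    if (pthread_create(&r->tid, NULL, receiver_main, r)) {
        pthread_mutex_destroy(&r->lock);
        return -1;
    }
    return 0;
}

/* Cancel, wait for the thread to be gone, and only then destroy state. */
int receiver_stop(struct receiver *r)
{
    void *res;

    assert(r);
    pthread_cancel(r->tid);
    if (pthread_join(r->tid, &res))
        return -1;
    if (pthread_mutex_destroy(&r->lock))
        return -1;
    return res == PTHREAD_CANCELED ? 0 : -1;
}
```

The cleanup handler is what keeps the mutex unlocked when cancellation
hits inside the critical section; without the join, the destroy could
race with a thread that is still running.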

-- Hal


From dledford at redhat.com  Tue Feb 20 06:48:08 2007
From: dledford at redhat.com (Doug Ledford)
Date: Tue, 20 Feb 2007 09:48:08 -0500
Subject: [openib-general] OFED 1.2 dapl and dat.conf
In-Reply-To: <Pine.LNX.4.64.0702200926500.14025@jlentini-linux.nane.netapp.com>
References: <1171397522.21471.7.camel@stevo-desktop>
	<45D37E8E.5050800@ichips.intel.com>
	<1171561783.3161.165.camel@fc6.xsintricity.com>
	<45D6327B.4060606@ichips.intel.com>
	<Pine.LNX.4.64.0702200926500.14025@jlentini-linux.nane.netapp.com>
Message-ID: <1171982888.3161.283.camel@fc6.xsintricity.com>

On Tue, 2007-02-20 at 09:29 -0500, James Lentini wrote:
> 
> On Fri, 16 Feb 2007, Arlin Davis wrote:
> 
> > Doug Ledford wrote:
> > 
> > > On Wed, 2007-02-14 at 13:26 -0800, Arlin Davis wrote:
> > >  
> > > > Steve Wise wrote:
> > > > 
> > > >    
> > > > > Currently, the dapl rpms don't install dat.conf.  I think they probably
> > > > > should, eh?  Maybe in <prefix>/etc/dat.conf
> > > > > 
> > > > > 
> > > > >      
> > > > my specfile is setup to target sysconfdir which is typically set to
> > > > `$(prefix)/etc'
> > > > 
> > > > %{_sysconfdir}/dat.conf
> > > > 
> > > > I am not sure how the 1.2 scripts are building the rpms. Maybe Vladimir
> > > > can help explain?
> > > >    
> > > 
> > > Note that this setup is problematic on multilib arches.  Since the
> > > dat.conf file hard codes a library path that's different for 32bit/64bit
> > > arches, installing both a 32bit and 64bit dapl library is impossible
> > > without munging things.
> > > 
> > > For RHEL4U5/RHEL5 I changed the dat library to read dat<bits>.conf and
> > > have two separate conf files.  A probably better approach would be to
> > > change the library to use a relative library name that it looks for
> > > > starting from the library's own directory.  Hence if the dapl library is
> > > in /usr/lib, it looks in /usr/lib.  Doing that would allow the
> > > 32bit/64bit libraries to share the same config file.
> > > 
> > >  
> > This is a good idea. I will take a look at dladdr options to set 
> > appropriate starting path for dapl libraries when absolute paths are 
> > not specified.
> > 
> > James, do you see any issues with this approach?
> 
> Nope. The dat registry should be able to handle provider libraries at 
> any location in the file namespace (provided they are accessible of 
> course).

Yep.  Although if you want the 64bit and 32bit dat.conf to be identical,
then the best bet would be something like putting the main library
in /usr/lib or /usr/lib64 and then doing a relative path from there to
the provider libs, such as dapl/provider/libname.so.  That way, the same
filespec in the dat.conf file will find either the 64bit or 32bit
provider lib depending on whether the 64bit or 32bit main library is the
one searching for it.
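
To make the idea concrete, here is a small sketch of the path resolution
only.  It assumes the registry already knows the full path of its own
shared object (for example from the dli_fname field that dladdr() fills
in, as Arlin suggested); the function name and the library file name used
in the example are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Resolve a provider entry from dat.conf.
 * Absolute entries are copied unchanged; relative entries are resolved
 * against the directory that holds the registry library itself.
 * Returns 0 on success, -1 if the result does not fit in out.
 */
int resolve_provider(const char *registry_lib, const char *entry,
                     char *out, size_t outlen)
{
    const char *slash;
    int n;

    assert(registry_lib && entry && out);
    if (entry[0] == '/') {
        n = snprintf(out, outlen, "%s", entry);
    } else {
        slash = strrchr(registry_lib, '/');
        if (!slash)   /* no directory known: pass the entry through as-is */
            n = snprintf(out, outlen, "%s", entry);
        else
            n = snprintf(out, outlen, "%.*s/%s",
                         (int)(slash - registry_lib), registry_lib, entry);
    }
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With a registry library at /usr/lib64/libdat.so the entry
dapl/provider/libname.so resolves to /usr/lib64/dapl/provider/libname.so,
and with /usr/lib/libdat.so it resolves to
/usr/lib/dapl/provider/libname.so, so one dat.conf can serve both.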

> > Vladimir, can you tell me how the OFED 1.2 install scripts are 
> > handling the dat.conf?
> > 
> > -arlin
> > 
-- 
Doug Ledford <dledford at redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband

From halr at voltaire.com  Tue Feb 20 06:47:52 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Feb 2007 09:47:52 -0500
Subject: [openib-general] Port error rate detection
In-Reply-To: <45DB096B.2060306@ornl.gov>
References: <45DA0E50.7010002@ornl.gov>
	<1171978018.4380.298013.camel@hal.voltaire.com>
	<45DB096B.2060306@ornl.gov>
Message-ID: <1171982868.4380.302888.camel@hal.voltaire.com>

On Tue, 2007-02-20 at 09:44, Steven Carter wrote:
> Hal Rosenstock wrote:
> > On Mon, 2007-02-19 at 15:53, Steven Carter wrote:
> >   
> >> I have a Nagios module that alerts on connectivity, port errors, 
> >> speed/width problems.  I would like to give it the ability to change the 
> >> severity of the alert depending on whether errors are just present or if 
> >> they are increasing faster than a specified rate.  The intent is to 
> >> equip the module to keep the state of the last query and possibly 
> >> history, but I wanted to make sure that I was not re-inventing the wheel 
> >> first.  Is there an attribute or utility that I am overlooking that will 
> >> help me do this?
> >>     
> >
> > Not currently (to my knowledge). The thresholding of rate aspect is
> > > similar to what will be supported in the proposed PerfManager.
> >   
> I noticed that in your RFC.  How are you planning on presenting the data 
> to other agents (e.g. Nagios, Openview, MRTG, etc.)?  One comment that I 
> should have made on your RFC is that I wonder if it is necessary to 
> include the data analysis/reduction part.

I think it is because there is too much data to push up the tree to one
manager.

> Just having a central location that collects the values and presents it via SNMP is extremely 
> useful since there are a plethora of monitoring apps (free and 
> commercial) that  do what you are proposing.

In general, this information can be exported via SNMP or whatever the
management infrastructure is.

BTW, are there SNMP MIBs for all of this information? To my knowledge,
some of these were started but never completed. Also, the MIBs were
geared at the agents rather than the managers (in the PerfMgt arena).

-- Hal

> That way, a network manager can leverage existing tools currently used for monitoring 
> Ethernet Nodes, Hosts, etc.  You can still include a last change 
> attribute with each counter so that simple utilities (like the one that 
> I am writing) can get an idea of how quickly errors are occurring.

> Steven.
> 
> > -- Hal
> >
> >   
> >> Thanks,
> >>
> >> Steven.
> >>
> >>
> >>     
> >
> >   
> 


From vlad at mellanox.co.il  Tue Feb 20 07:05:48 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Tue, 20 Feb 2007 17:05:48 +0200
Subject: [openib-general] OFED 1.2 dapl and dat.conf
In-Reply-To: <Pine.LNX.4.64.0702200926500.14025@jlentini-linux.nane.netapp.com>
References: <1171397522.21471.7.camel@stevo-desktop>
	<45D37E8E.5050800@ichips.intel.com>
	<1171561783.3161.165.camel@fc6.xsintricity.com>
	<45D6327B.4060606@ichips.intel.com>
	<Pine.LNX.4.64.0702200926500.14025@jlentini-linux.nane.netapp.com>
Message-ID: <1171983948.4051.13.camel@vladsk-laptop>


> > Vladimir, can you tell me how the OFED 1.2 install scripts are 
> > handling the dat.conf?
> > 
> > -arlin
> > 

dat.conf is updated by the rpmbuild process:
/usr/lib is replaced by %{_libdir} (<prefix>/lib for x86, ppc, ia64 and <prefix>/lib64 otherwise).


-- 
Vladimir Sokolovsky <vlad at mellanox.co.il>
Mellanox Technologies Ltd.


From halr at voltaire.com  Tue Feb 20 07:06:51 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Feb 2007 10:06:51 -0500
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to
 work with partial membership pkey
In-Reply-To: <Pine.LNX.4.64.0702190838050.26497@zuben>
References: <Pine.LNX.4.64.0702190838050.26497@zuben>
Message-ID: <1171984010.4380.304008.camel@hal.voltaire.com>

On Mon, 2007-02-19 at 01:40, Or Gerlitz wrote:
> Hi Sean,
> 
> this fixes a bug which did not allow to run librdmacm apps over a node
> which is partial member of a partition. The patch takes the approach of the
> kernel ib_find_cached_pkey implementation.
> 
> If you approve this, i suggest pushing it also into OFED 1.2 as a bug fix.
> 
> Or.
> 
> ----------------------------------------------------------------------
> The pkey extracted by the RDMA CM from the IPoIB device hardware address always
> has the full membership bit set. However, when looking in the pkey table the
> search must mask out the full membership bit.
> 
> Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>
> Signed-off-by: Olga Shern <olgas at voltaire.com>
> 
> diff --git a/src/cma.c b/src/cma.c
> index c5f8cd9..9c24c6a 100644
> --- a/src/cma.c
> +++ b/src/cma.c
> @@ -661,7 +661,7 @@ static int ucma_find_pkey(struct cma_dev
> 
>  	for (i = 0, ret = 0; !ret; i++) {
>  		ret = ibv_query_pkey(cma_dev->verbs, port_num, i, &chk_pkey);
> -		if (!ret && pkey == chk_pkey) {
> +		if ((!ret && pkey  == chk_pkey) || (!ret && htons(ntohs(pkey) & 0x7fff)  == chk_pkey)) {

Is this true for both RC and UD QPs? I thought that at least the UD QPs
were being used for multicast, in which case wouldn't full membership be
required?

-- Hal

>  			*pkey_index = (uint16_t) i;
>  			return 0;
>  		}
> 
> 


From ogerlitz at voltaire.com  Tue Feb 20 07:38:29 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Tue, 20 Feb 2007 17:38:29 +0200
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to
 work with partial membership pkey
In-Reply-To: <1171984010.4380.304008.camel@hal.voltaire.com>
References: <Pine.LNX.4.64.0702190838050.26497@zuben>
	<1171984010.4380.304008.camel@hal.voltaire.com>
Message-ID: <45DB15F5.4090406@voltaire.com>

Hal Rosenstock wrote:

>> The pkey extracted by the RDMA CM from the IPoIB device hardware address always
>> has the full membership bit set. However, when looking in the pkey table the
>> search must mask out the full membership bit.

> Is this true for both RC and UD QPs ? I thought that at least the UD QPs
> were being used for multicast in which case  wouldn't full member be
> required for this ?

Yes. It's a little bit confusing: partial and full members of an IPoIB IB 
partition use the same MGID. When an IPoIB MGID is constructed, the pkey 
placed by the driver is --always-- the full membership one. However, on 
a node with partial membership, what's plugged into the QP is the pkey 
index of the partial instance...

In the kernel all this is nicely hidden from the IB ULPs in 
ib_find_cached_pkey().
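
A minimal sketch of such a lookup, in host byte order, is below.  It
ignores the membership bit (the top bit of the 16-bit pkey) on both sides
of the comparison, which is one way to read "mask out the full membership
bit"; the function name and the plain-array table are assumptions made
for the example, not the librdmacm or kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Low 15 bits: partition number.  Top bit: full (1) or partial (0) member. */
#define PKEY_BASE_MASK 0x7fffu

/*
 * Find the first table entry in the same partition as pkey, regardless
 * of membership type.  Returns 0 and stores the index, or -1 if absent.
 */
int find_pkey_index(const uint16_t *table, int len, uint16_t pkey,
                    uint16_t *index)
{
    int i;

    assert(table && index);
    for (i = 0; i < len; i++) {
        if ((table[i] & PKEY_BASE_MASK) == (pkey & PKEY_BASE_MASK)) {
            *index = (uint16_t)i;
            return 0;
        }
    }
    return -1;
}
```

So a search for 0x8001 (full member, as taken from the IPoIB hardware
address) still finds a table that only holds 0x0001 (partial member).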

Or.


From vlad at mellanox.co.il  Tue Feb 20 07:44:49 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Tue, 20 Feb 2007 17:44:49 +0200
Subject: [openib-general] OFED 1.2 dapl and dat.conf
In-Reply-To: <1171984868.3161.293.camel@fc6.xsintricity.com>
References: <1171397522.21471.7.camel@stevo-desktop>
	<45D37E8E.5050800@ichips.intel.com>
	<1171561783.3161.165.camel@fc6.xsintricity.com>
	<45D6327B.4060606@ichips.intel.com>
	<Pine.LNX.4.64.0702200926500.14025@jlentini-linux.nane.netapp.com>
	<1171983948.4051.13.camel@vladsk-laptop>
	<1171984868.3161.293.camel@fc6.xsintricity.com>
Message-ID: <1171986289.4051.24.camel@vladsk-laptop>

On Tue, 2007-02-20 at 10:21 -0500, Doug Ledford wrote:
> On Tue, 2007-02-20 at 17:05 +0200, Vladimir Sokolovsky wrote:
> > > > Vladimir, can you tell me how the OFED 1.2 install scripts are 
> > > > handling the dat.conf?
> > > > 
> > > > -arlin
> > > > 
> > 
> > dat.conf updated by rpmbuild process:
> > /usr/lib is replaced by %{_libdir} (<prefix>/lib for x86, ppc, ia64 and <prefix>/lib64 otherwise).
> 
> Which creates a multilib regression, aka when you install both the i386
> and x86_64 versions of the dapl rpm, they both contain a dat.conf file
> at the same location in the filesystem, but with different contents.
> Whether you get the 32bit or 64bit version of the dat.conf file depends
> on which is installed later.  Correspondingly, whichever version of the
> library was installed first will be rendered inoperative by this problem
> as it will be either a 32 or 64bit library that is searching for a
> provider library, and the one it finds will be the opposite arch type of
> itself, thereby preventing the dapl library from doing a dlopen on the
> file.  Therefore, whatever version of the dapl library is installed
> first will no longer be able to find any valid provider libraries.  This
> is considered an error condition by our automated package testing tools
> and we are not allowed to ship a package in this state.
> 
I can create /etc/dat32.conf and /etc/dat64.conf.

Currently, OFED does not separate 32-bit and 64-bit RPMs.
That is, on x86_64, for example, if the 32-bit library compilation succeeds,
then both the 32-bit and 64-bit libraries will be part of the same RPM.


-- 
Vladimir Sokolovsky <vlad at mellanox.co.il>
Mellanox Technologies Ltd.


From halr at voltaire.com  Tue Feb 20 07:42:40 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Feb 2007 10:42:40 -0500
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to
 work with partial membership pkey
In-Reply-To: <45DB15F5.4090406@voltaire.com>
References: <Pine.LNX.4.64.0702190838050.26497@zuben>
	<1171984010.4380.304008.camel@hal.voltaire.com>
	<45DB15F5.4090406@voltaire.com>
Message-ID: <1171986159.4380.306117.camel@hal.voltaire.com>

On Tue, 2007-02-20 at 10:38, Or Gerlitz wrote:
> Hal Rosenstock wrote:
> 
> >> The pkey extracted by the RDMA CM from the IPoIB device hardware address always
> >> has the full membership bit set. However, when looking in the pkey table the
> >> search must mask out the full membership bit.
> 
> > Is this true for both RC and UD QPs ? I thought that at least the UD QPs
> > were being used for multicast in which case  wouldn't full member be
> > required for this ?
> 
> Yes. It's a little bit confusing: partial and full members of an IPoIB IB 
> partition use the same MGID. When an IPoIB MGID is constructed, the pkey 
> placed by the driver is --always-- the full membership one. However, on 
> a node with partial membership, what's plugged into the QP is the pkey 
> index of the partial instance...

So in this case, do both the full and partial keys need configuring for
that port?

-- Hal

> In the kernel all this is nicely hidden from the IB ULPs in 
> ib_find_cached_pkey().
> 
> Or.
> 


From dledford at redhat.com  Tue Feb 20 07:55:27 2007
From: dledford at redhat.com (Doug Ledford)
Date: Tue, 20 Feb 2007 10:55:27 -0500
Subject: [openib-general] OFED 1.2 dapl and dat.conf
In-Reply-To: <1171986289.4051.24.camel@vladsk-laptop>
References: <1171397522.21471.7.camel@stevo-desktop>
	<45D37E8E.5050800@ichips.intel.com>
	<1171561783.3161.165.camel@fc6.xsintricity.com>
	<45D6327B.4060606@ichips.intel.com>
	<Pine.LNX.4.64.0702200926500.14025@jlentini-linux.nane.netapp.com>
	<1171983948.4051.13.camel@vladsk-laptop>
	<1171984868.3161.293.camel@fc6.xsintricity.com>
	<1171986289.4051.24.camel@vladsk-laptop>
Message-ID: <1171986927.3161.297.camel@fc6.xsintricity.com>

On Tue, 2007-02-20 at 17:44 +0200, Vladimir Sokolovsky wrote:
> On Tue, 2007-02-20 at 10:21 -0500, Doug Ledford wrote:
> > On Tue, 2007-02-20 at 17:05 +0200, Vladimir Sokolovsky wrote:
> > > > > Vladimir, can you tell me how the OFED 1.2 install scripts are 
> > > > > handling the dat.conf?
> > > > > 
> > > > > -arlin
> > > > > 
> > > 
> > > dat.conf updated by rpmbuild process:
> > > /usr/lib is replaced by %{_libdir} (<prefix>/lib for x86, ppc, ia64 and <prefix>/lib64 otherwise).
> > 
> > Which creates a multilib regression, aka when you install both the i386
> > and x86_64 versions of the dapl rpm, they both contain a dat.conf file
> > at the same location in the filesystem, but with different contents.
> > Whether you get the 32bit or 64bit version of the dat.conf file depends
> > on which is installed later.  Correspondingly, whichever version of the
> > library was installed first will be rendered inoperative by this problem
> > as it will be either a 32 or 64bit library that is searching for a
> > provider library, and the one it finds will be the opposite arch type of
> > itself, thereby preventing the dapl library from doing a dlopen on the
> > file.  Therefore, whatever version of the dapl library is installed
> > first will no longer be able to find any valid provider libraries.  This
> > is considered an error condition by our automated package testing tools
> > and we are not allowed to ship a package in this state.
> > 
> I can create /etc/dat32.conf and /etc/dat64.conf.

That's pretty much what I did for our next release, but it's crude.  The
other solution we've been discussing would be far preferable.

> Currently, in the OFED there is no separation to 32 and 64 bit RPMs.
> That is on x86_64, for example, if 32bit libraries compilation succeeded
> then both 32 and 64bit libraries will be a part of the same RPM.

Assuming you actually built both 32 and 64bit dapl libraries, them being
in the same rpm wouldn't solve the problem that the generated dat.conf
would only be correct for one or the other, not for both.  And if you
want to create a dat32.conf and dat64.conf, then you need to munge the
dapl source code so that during a 64bit build it looks for dat64.conf
and in a 32bit build it looks for dat32.conf.  However, you need to
munge the source code in such a way as to have it be the same on both
32bit and 64bit builds, aka something like:

#ifdef __i386__
default_dapl_file = "/usr/local/ofed/etc/dat32.conf";
#else
default_dapl_file = "/usr/local/ofed/etc/dat64.conf";
#endif

If you make the mistake I made, which was to do one patch on 32bit
arches and a different patch on 64bit arches, then the source code
between the 32 and 64bit arches differs, and guess what, that throws a
multilib regression as well because it breaks debuginfo packages :-/

I'll fix that in our next release.
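
One way to keep the source identical on every arch, offered only as a
sketch, is to derive the file name from the pointer width instead of
testing a particular arch macro.  The function name is hypothetical and
the directory is passed in by the caller rather than hard coded.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/*
 * Build "<etc_dir>/dat32.conf" or "<etc_dir>/dat64.conf" depending on
 * the pointer width of the build.  The same source is used everywhere.
 * Returns 0 on success, -1 if the result does not fit in out.
 */
int default_dat_conf(const char *etc_dir, char *out, size_t outlen)
{
    unsigned bits = (unsigned)(sizeof(void *) * CHAR_BIT);
    int n;

    assert(etc_dir && out);
    n = snprintf(out, outlen, "%s/dat%u.conf", etc_dir, bits);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Unlike the __i386__ test above, this does not send a 32bit build on some
other arch (ppc, say) to the 64bit file.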


-- 
Doug Ledford <dledford at redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband

From sean.hefty at intel.com  Tue Feb 20 10:12:28 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 20 Feb 2007 10:12:28 -0800
Subject: [openib-general] OFA 1.2 tarball creation
In-Reply-To: <1171927460.8180.70.camel@stevo-desktop>
Message-ID: <000501c7551a$ae3fb5b0$8698070a@amr.corp.intel.com>

>The ofed_1_2 tree has the 2.6.20 drivers/modules in drivers/infiniband.
>They are, I think, the stock 2.6.20 drivers and modules.  If there are
>fixes to any driver post 2.6.20, then patches get created in
>kernel_patches/fixes directory.  These are applied as part of the
>configuration process when the tree is being built.   Look in there to
>see if your change is in the form of a patch file.

The patch is part of a merged_sean_rdma_dev_ofed_1_2.patch file, so it looks
like it is in OFED 1.2.

- Sean


From mst at mellanox.co.il  Tue Feb 20 10:17:55 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 20 Feb 2007 20:17:55 +0200
Subject: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message
	bandwidth
Message-ID: <20070220181755.GC11825@mellanox.co.il>

Avoid overhead of freeing/reallocating and mapping/unmapping for dma
for pages that have not been written to by hardware.

Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

---

This gives >10% boost in BW for message sizes up to 32K. Please queue for 2.6.21.

before:

# ./netperf-2.4.2/src/netperf -f M -H 11.4.3.68 -c -C -- -m 32000
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.4.3.68 (11.4.3.68) port 0 AF_INET : demo
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    MBytes  /s  % S      % S      us/KB   us/KB

 87380  16384  32000    10.00       716.23   26.22    23.94    1.430   1.306


after:

# ./netperf-2.4.2/src/netperf -f M -H 11.4.3.68 -c -C -- -m 32000
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.4.3.68 (11.4.3.68) port 0 AF_INET : demo
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    MBytes  /s  % S      % S      us/KB   us/KB

 87380  16384  32000    10.00       888.67   24.13    25.08    1.061   1.102
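
The core of the change is the count of receive pages the hardware
actually wrote to: only those need to be unmapped and replaced, the rest
are carried over to the new skb.  The arithmetic can be sketched in plain
C as below; the function and parameter names are illustrative, with
head_size and page_size standing in for IPOIB_CM_HEAD_SIZE and PAGE_SIZE.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of page-sized fragments touched by a message of byte_len bytes
 * when the first head_size bytes land in the linear head buffer.
 */
size_t rx_pages_written(size_t byte_len, size_t head_size, size_t page_size)
{
    size_t beyond_head;

    assert(page_size > 0);
    beyond_head = byte_len > head_size ? byte_len - head_size : 0;
    return (beyond_head + page_size - 1) / page_size;   /* round up */
}
```

A message that fits entirely in the head buffer touches no pages, so in
that case no page has to be freed, reallocated or remapped.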


diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 8ee6f06..a23c8e3 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -68,14 +68,14 @@ struct ipoib_cm_id {
 static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id,
 			       struct ib_cm_event *event);
 
-static void ipoib_cm_dma_unmap_rx(struct ipoib_dev_priv *priv,
+static void ipoib_cm_dma_unmap_rx(struct ipoib_dev_priv *priv, int frags,
 				  u64 mapping[IPOIB_CM_RX_SG])
 {
 	int i;
 
 	ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_CM_HEAD_SIZE, DMA_FROM_DEVICE);
 
-	for (i = 0; i < IPOIB_CM_RX_SG - 1; ++i)
+	for (i = 0; i < frags; ++i)
 		ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE, DMA_FROM_DEVICE);
 }
 
@@ -93,7 +93,8 @@ static int ipoib_cm_post_receive(struct net_device *dev, int id)
 	ret = ib_post_srq_recv(priv->cm.srq, &priv->cm.rx_wr, &bad_wr);
 	if (unlikely(ret)) {
 		ipoib_warn(priv, "post srq failed for buf %d (%d)\n", id, ret);
-		ipoib_cm_dma_unmap_rx(priv, priv->cm.srq_ring[id].mapping);
+		ipoib_cm_dma_unmap_rx(priv, IPOIB_CM_RX_SG - 1,
+				      priv->cm.srq_ring[id].mapping);
 		dev_kfree_skb_any(priv->cm.srq_ring[id].skb);
 		priv->cm.srq_ring[id].skb = NULL;
 	}
@@ -101,8 +102,8 @@ static int ipoib_cm_post_receive(struct net_device *dev, int id)
 	return ret;
 }
 
-static int ipoib_cm_alloc_rx_skb(struct net_device *dev, int id,
-				 u64 mapping[IPOIB_CM_RX_SG])
+static struct sk_buff *ipoib_cm_alloc_rx_skb(struct net_device *dev, int id, int frags,
+					     u64 mapping[IPOIB_CM_RX_SG])
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct sk_buff *skb;
@@ -110,7 +111,7 @@ static int ipoib_cm_alloc_rx_skb(struct net_device *dev, int id,
 
 	skb = dev_alloc_skb(IPOIB_CM_HEAD_SIZE + 12);
 	if (unlikely(!skb))
-		return -ENOMEM;
+		return NULL;
 
 	/*
 	 * IPoIB adds a 4 byte header. So we need 12 more bytes to align the
@@ -122,10 +123,10 @@ static int ipoib_cm_alloc_rx_skb(struct net_device *dev, int id,
 				       DMA_FROM_DEVICE);
 	if (unlikely(ib_dma_mapping_error(priv->ca, mapping[0]))) {
 		dev_kfree_skb_any(skb);
-		return -EIO;
+		return NULL;
 	}
 
-	for (i = 0; i < IPOIB_CM_RX_SG - 1; i++) {
+	for (i = 0; i < frags; i++) {
 		struct page *page = alloc_page(GFP_ATOMIC);
 
 		if (!page)
@@ -139,7 +140,7 @@ static int ipoib_cm_alloc_rx_skb(struct net_device *dev, int id,
 	}
 
 	priv->cm.srq_ring[id].skb = skb;
-	return 0;
+	return skb;
 
 partial_error:
 
@@ -148,8 +149,8 @@ partial_error:
 	for (; i >= 0; --i)
 		ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE, DMA_FROM_DEVICE);
 
-	dev_kfree_skb_any(skb);
-	return -ENOMEM;
+	dev_kfree_skb_any(skb);
+	return NULL;
 }
 
 static struct ib_qp *ipoib_cm_create_rx_qp(struct net_device *dev,
@@ -312,7 +313,7 @@ static int ipoib_cm_rx_handler(struct ib_cm_id *cm_id,
 }
 /* Adjust length of skb with fragments to match received data */
 static void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space,
-			  unsigned int length)
+			  unsigned int length, struct sk_buff *toskb)
 {
 	int i, num_frags;
 	unsigned int size;
@@ -329,7 +330,7 @@ static void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space,
 
 		if (length == 0) {
 			/* don't need this page */
-			__free_page(frag->page);
+			skb_fill_page_desc(toskb, i, frag->page, 0, PAGE_SIZE);
 			--skb_shinfo(skb)->nr_frags;
 		} else {
 			size = min(length, (unsigned) PAGE_SIZE);
@@ -347,10 +348,11 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	unsigned int wr_id = wc->wr_id & ~IPOIB_CM_OP_SRQ;
-	struct sk_buff *skb;
+	struct sk_buff *skb, *newskb;
 	struct ipoib_cm_rx *p;
 	unsigned long flags;
 	u64 mapping[IPOIB_CM_RX_SG];
+	int frags;
 
 	ipoib_dbg_data(priv, "cm recv completion: id %d, op %d, status: %d\n",
 		       wr_id, wc->opcode, wc->status);
@@ -386,7 +388,11 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 		}
 	}
 
-	if (unlikely(ipoib_cm_alloc_rx_skb(dev, wr_id, mapping))) {
+	frags = PAGE_ALIGN(wc->byte_len - min(wc->byte_len,
+					      (unsigned)IPOIB_CM_HEAD_SIZE)) / PAGE_SIZE;
+
+	newskb = ipoib_cm_alloc_rx_skb(dev, wr_id, frags, mapping);
+	if (unlikely(!newskb)) {
 		/*
 		 * If we can't allocate a new RX buffer, dump
 		 * this packet and reuse the old buffer.
@@ -396,13 +402,13 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 		goto repost;
 	}
 
-	ipoib_cm_dma_unmap_rx(priv, priv->cm.srq_ring[wr_id].mapping);
-	memcpy(priv->cm.srq_ring[wr_id].mapping, mapping, sizeof mapping);
+	ipoib_cm_dma_unmap_rx(priv, frags, priv->cm.srq_ring[wr_id].mapping);
+	memcpy(priv->cm.srq_ring[wr_id].mapping, mapping, (frags + 1) * sizeof *mapping);
 
 	ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n",
 		       wc->byte_len, wc->slid);
 
-	skb_put_frags(skb, IPOIB_CM_HEAD_SIZE, wc->byte_len);
+	skb_put_frags(skb, IPOIB_CM_HEAD_SIZE, wc->byte_len, newskb);
 
 	skb->protocol = ((struct ipoib_header *) skb->data)->proto;
 	skb->mac.raw = skb->data;
@@ -1196,7 +1202,8 @@ int ipoib_cm_dev_init(struct net_device *dev)
 	priv->cm.rx_wr.num_sge = IPOIB_CM_RX_SG;
 
 	for (i = 0; i < ipoib_recvq_size; ++i) {
-		if (ipoib_cm_alloc_rx_skb(dev, i, priv->cm.srq_ring[i].mapping)) {
+		if (!ipoib_cm_alloc_rx_skb(dev, i, IPOIB_CM_RX_SG - 1,
+					   priv->cm.srq_ring[i].mapping)) {
 			ipoib_warn(priv, "failed to allocate receive buffer %d\n", i);
 			ipoib_cm_dev_cleanup(dev);
 			return -ENOMEM;
@@ -1231,7 +1238,8 @@ void ipoib_cm_dev_cleanup(struct net_device *dev)
 		return;
 	for (i = 0; i < ipoib_recvq_size; ++i)
 		if (priv->cm.srq_ring[i].skb) {
-			ipoib_cm_dma_unmap_rx(priv, priv->cm.srq_ring[i].mapping);
+			ipoib_cm_dma_unmap_rx(priv, IPOIB_CM_RX_SG - 1,
+					      priv->cm.srq_ring[i].mapping);
 			dev_kfree_skb_any(priv->cm.srq_ring[i].skb);
 			priv->cm.srq_ring[i].skb = NULL;
 		}

-- 
MST


From halr at voltaire.com  Tue Feb 20 10:21:38 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Feb 2007 13:21:38 -0500
Subject: [openib-general] [Fwd: Re: win related [was: Re: [PATCH 1/2]
 opensm: sigusr1: syslog() fixes]]
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C9C41DD5@mtlexch01.mtl.com>
References: <6C2C79E72C305246B504CBA17B5500C9C41DD5@mtlexch01.mtl.com>
Message-ID: <1171995697.4380.315840.camel@hal.voltaire.com>

Hi Tzachi,

On Thu, 2007-02-08 at 16:24, Tzachi Dar wrote:
> See below.

I would like to get back to trying to close on this discussion.

> Thanks
> Tzachi 
> 
> > -----Original Message-----
> > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] 
> > Sent: Thursday, February 08, 2007 9:47 PM
> > To: Tzachi Dar
> > Cc: Yossi Leybovich; Gilad Shainer; Yevgeny Kliteynik; 
> > OPENIB; Michael S. Tsirkin; Hal Rosenstock
> > Subject: Re: [Fwd: Re: win related [was: Re: [PATCH 1/2] 
> > opensm: sigusr1: syslog() fixes]]
> > 
> > On 20:31 Thu 08 Feb     , Tzachi Dar wrote:
> > > The windows open IB has decided on using a BSD only license. 
> > > The common implementation of pthreads as far as I know is 
> > LGPL, which 
> > > means that it can not be used in open IB.
> > 
> > Why not? AFAIK it works perfectly (see (5,6 and Preamble)):
> > http://www.gnu.org/copyleft/lesser.html
> > 
> > And of course there are tons of examples when BSD software 
> > links against LGPLed glibc.
> 
> I can of course write you an answer that will be more than 5 pages long
> on why *I* don't think that using GPL software is bad for everyone, but
> I guess that my opinion doesn't really matter, so I won't do it.
> The page that you have referenced is from the GNU org, and even there it
> is hard to say that they are trying to encourage you to use the LGPL
> license. In any case, the main point is that when open IB windows was
> formed there was a general decision that it will use the BSD license. If
> we start having components with the LGPL, this will break that decision,
> and therefore this requires some voting of the open IB organization.

I may be missing your point, but is there something in the Windows
OpenIB/OpenFabrics license that precludes using Windows OpenIB licensed
code (e.g. BSD like license) in concert with non OpenIB code (like LGPL)?
Isn't that essentially what using the Windows pthreads DLL with OpenSM
would be like? As I understand it, this does not require a license
change, and nothing in the OpenIB Windows charter prevents it or needs
changing.

> > > The only two ways that I see around this are 1) Change the 
> > license of 
> > > open IB windows which might be a complicated thing. 2) Find an 
> > > implementation of pthreads that is BSD.
> > 
> > BTW, just wondering... What is relation between windows open 
> > IB and OFA (and OFA's "dual-license rule")?
> Well, the way I see it one can take code from the Linux part under the
> BSD license and use it in the windows part. The other way around seems
> fine to me, but some say that since the windows BSD license requires
> that some text will always remain there, the other way around is not
> possible. As I'm not an expert in that area I don't know who is right.

I don't see how this affects what is being discussed about OpenSM. In
all the cases I'm aware of, the portability is from Linux to Windows and
not the other way around.

-- Hal

> > Sasha
> > 
> > > 
> > > Thanks
> > > Tzachi
> > > 
> > > > -----Original Message-----
> > > > From: Sasha Khapyorsky [mailto:sashak at voltaire.com]
> > > > Sent: Thursday, February 08, 2007 7:46 PM
> > > > To: Tzachi Dar; Yossi Leybovich
> > > > Cc: Yevgeny Kliteynik; OPENIB; Michael S. Tsirkin; Hal Rosenstock
> > > > Subject: Re: [Fwd: Re: win related [was: Re: [PATCH 1/2]
> > > > opensm: sigusr1: syslog() fixes]]
> > > > 
> > > > On 11:24 Sun 21 Jan     , Yevgeny Kliteynik wrote:
> > > > > Tzachi, Yossi, please join the thread.
> > > > > What do you think about distributing a copy of the pthread DLL 
> > > > > with opensm?
> > > > 
> > > > Any news here? Thanks.
> > > > 
> > > > Sasha
> > > > 
> > > > > 
> > > > > -- Yevgeny.
> > > > > 
> > > > > -------- Original Message --------
> > > > > Subject: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: 
> > > > > syslog() fixes]
> > > > > Date: Fri, 19 Jan 2007 00:20:32 +0200
> > > > > From: Sasha Khapyorsky <sashak at voltaire.com>
> > > > > To: Michael S. Tsirkin <mst at mellanox.co.il>
> > > > > CC: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>,
> > > > >     OPENIB <openib-general at openib.org>
> > > > > References: <20070118194403.GA23783 at sashak.voltaire.com>
> > > > > <20070118215023.GP9890 at mellanox.co.il>
> > > > > 
> > > > > On 23:50 Thu 18 Jan     , Michael S. Tsirkin wrote:
> > > > > > > Quoting Sasha Khapyorsky <sashak at voltaire.com>:
> > > > > > > Subject: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1:
> > > > > > > syslog() fixes]
> > > > > > > 
> > > > > > > On 07:00 Thu 18 Jan     , Michael S. Tsirkin wrote:
> > > > > > > > > What about pure opensource -
> > > > > > > > > http://sourceware.org/pthreads-win32/? It is licensed
> > > > > > > > > under LGPL, I see on the net many positive reports about
> > > > > > > > > stability and usability.
> > > > > > > > 
> > > > > > > > I used it to do a windows port of linux complib at some
> > > > > > > > point and opensm seemed to work fine with it. What it was
> > > > > > > > lacking at that point was support for 64 bit applications,
> > > > > > > > and for some reason (which is still unclear to me) there was
> > > > > > > > a strong desire to run opensm in 64 bit mode.
> > > > > > > > Seems to have been fixed now, BTW.
> > > > > > > 
> > > > > > > So this seems to be a good option for OpenSM on Windows. Right?
> > > > > > 
> > > > > > No idea. Distributing a copy of the pthread DLL with opensm does
> > > > > > not look like a problem. But is it worth it?
> > > > > 
> > > > > Sure, it makes windows porting much more transparent and lets us
> > > > > use standard *nix stuff w/out #ifndef WIN32. Another (generic)
> > > > > benefit is that posix is more standard and powerful than wrappers
> > > > > like complib.
> > > > > 
> > > > > Sasha
> > > > > 
> > > > 
> > 


From halr at voltaire.com  Tue Feb 20 10:37:45 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Feb 2007 13:37:45 -0500
Subject: [openib-general] [Fwd: Re: [Fwd: Re: win related [was: Re: [PATCH
 1/2] opensm: sigusr1: syslog() fixes]]]
Message-ID: <1171996664.4380.316818.camel@hal.voltaire.com>

Also, looping in the OpenFabrics Windows email list on this.

-- Hal

-----Forwarded Message-----

From: Hal Rosenstock <halr at voltaire.com>
To: Tzachi Dar <tzachid at mellanox.co.il>
Cc: OPENIB <openib-general at openib.org>, Gilad Shainer <Shainer at Mellanox.com>
Subject: Re: [openib-general] [Fwd: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: syslog() fixes]]
Date: 20 Feb 2007 13:21:38 -0500

Hi Tzachi,

On Thu, 2007-02-08 at 16:24, Tzachi Dar wrote:
> See below.

I would like to get back to trying to close on this discussion.

> Thanks
> Tzachi 
> 
> > -----Original Message-----
> > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] 
> > Sent: Thursday, February 08, 2007 9:47 PM
> > To: Tzachi Dar
> > Cc: Yossi Leybovich; Gilad Shainer; Yevgeny Kliteynik; 
> > OPENIB; Michael S. Tsirkin; Hal Rosenstock
> > Subject: Re: [Fwd: Re: win related [was: Re: [PATCH 1/2] 
> > opensm: sigusr1: syslog() fixes]]
> > 
> > On 20:31 Thu 08 Feb     , Tzachi Dar wrote:
> > > The windows open IB has decided on using a BSD only license.
> > > The common implementation of pthreads, as far as I know, is LGPL,
> > > which means that it cannot be used in open IB.
> > 
> > Why not? AFAIK it works perfectly (see sections 5, 6 and the Preamble):
> > http://www.gnu.org/copyleft/lesser.html
> > 
> > And of course there are tons of examples of BSD software
> > linking against LGPLed glibc.
> 
> I can of course write you an answer more than 5 pages long on why *I*
> don't think that using GPL software is good for everyone, but I guess
> that my opinion doesn't really matter, so I won't do it.
> The page that you have referenced is from the GNU org, and even there
> it is hard to say that they are trying to encourage you to use the LGPL
> license. In any case, the main point is that when open IB windows was
> formed there was a general decision that it would use a BSD license. If
> we start having components under the LGPL, this will break that
> decision, and therefore this requires a vote of the open IB
> organization.

I may be missing your point, but is there something in the Windows
OpenIB/OpenFabrics license that precludes using Windows OpenIB licensed
code (e.g. BSD-like license) in concert with non-OpenIB code (like LGPL)?
Isn't that essentially what using the Windows pthreads DLL with OpenSM
would be like? As I understand it, I don't think this requires a license
change, or that anything in the OpenIB Windows charter prevents this or
needs changing.

> > > The only two ways that I see around this are: 1) change the license
> > > of open IB windows, which might be a complicated thing; 2) find an
> > > implementation of pthreads that is BSD.
> > 
> > BTW, just wondering... What is the relation between windows open
> > IB and OFA (and OFA's "dual-license rule")?
> Well, the way I see it, one can take code from the Linux part under the
> BSD license and use it in the windows part. The other way around seems
> fine to me, but some say that since the windows BSD license requires
> that some text always remain there, the other way around is not
> possible. As I'm not an expert in that area, I don't know who is right.

I don't see how this affects what is being discussed about OpenSM. In
all the cases I'm aware of, the portability is from Linux to Windows and
not the other way around.

-- Hal

> > Sasha
> > 
> > > 
> > > Thanks
> > > Tzachi
> > > 
> > > > -----Original Message-----
> > > > From: Sasha Khapyorsky [mailto:sashak at voltaire.com]
> > > > Sent: Thursday, February 08, 2007 7:46 PM
> > > > To: Tzachi Dar; Yossi Leybovich
> > > > Cc: Yevgeny Kliteynik; OPENIB; Michael S. Tsirkin; Hal Rosenstock
> > > > Subject: Re: [Fwd: Re: win related [was: Re: [PATCH 1/2]
> > > > opensm: sigusr1: syslog() fixes]]
> > > > 
> > > > On 11:24 Sun 21 Jan     , Yevgeny Kliteynik wrote:
> > > > > Tzachi, Yossi, please join the thread.
> > > > > What do you think about distributing a copy of the pthread DLL 
> > > > > with opensm?
> > > > 
> > > > Any news here? Thanks.
> > > > 
> > > > Sasha
> > > > 
> > > > > 
> > > > > -- Yevgeny.
> > > > > 
> > > > > -------- Original Message --------
> > > > > Subject: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1: 
> > > > > syslog() fixes]
> > > > > Date: Fri, 19 Jan 2007 00:20:32 +0200
> > > > > From: Sasha Khapyorsky <sashak at voltaire.com>
> > > > > To: Michael S. Tsirkin <mst at mellanox.co.il>
> > > > > CC: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>,
> > > > >     OPENIB <openib-general at openib.org>
> > > > > References: <20070118194403.GA23783 at sashak.voltaire.com>
> > > > > <20070118215023.GP9890 at mellanox.co.il>
> > > > > 
> > > > > On 23:50 Thu 18 Jan     , Michael S. Tsirkin wrote:
> > > > > > > Quoting Sasha Khapyorsky <sashak at voltaire.com>:
> > > > > > > Subject: Re: win related [was: Re: [PATCH 1/2] opensm: sigusr1:
> > > > > > > syslog() fixes]
> > > > > > > 
> > > > > > > On 07:00 Thu 18 Jan     , Michael S. Tsirkin wrote:
> > > > > > > > > What about pure opensource -
> > > > > > > > > http://sourceware.org/pthreads-win32/? It is licensed
> > > > > > > > > under LGPL, I see on the net many positive reports about
> > > > > > > > > stability and usability.
> > > > > > > > 
> > > > > > > > I used it to do a windows port of linux complib at some
> > > > > > > > point and opensm seemed to work fine with it. What it was
> > > > > > > > lacking at that point was support for 64 bit applications,
> > > > > > > > and for some reason (which is still unclear to me) there was
> > > > > > > > a strong desire to run opensm in 64 bit mode.
> > > > > > > > Seems to have been fixed now, BTW.
> > > > > > > 
> > > > > > > So this seems to be a good option for OpenSM on Windows. Right?
> > > > > > 
> > > > > > No idea. Distributing a copy of the pthread DLL with opensm does
> > > > > > not look like a problem. But is it worth it?
> > > > > 
> > > > > Sure, it makes windows porting much more transparent and lets us
> > > > > use standard *nix stuff w/out #ifndef WIN32. Another (generic)
> > > > > benefit is that posix is more standard and powerful than wrappers
> > > > > like complib.
> > > > > 
> > > > > Sasha
> > > > > 
> > > > 
> > 


_______________________________________________
openib-general mailing list
openib-general at openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


From halr at voltaire.com  Tue Feb 20 10:42:38 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Feb 2007 13:42:38 -0500
Subject: [openib-general] Port error rate detection
In-Reply-To: <45DB12FB.7010501@ornl.gov>
References: <45DA0E50.7010002@ornl.gov>
	<1171978018.4380.298013.camel@hal.voltaire.com>
	<45DB096B.2060306@ornl.gov>
	<1171982868.4380.302888.camel@hal.voltaire.com>
	<45DB12FB.7010501@ornl.gov>
Message-ID: <1171996956.4380.317120.camel@hal.voltaire.com>

On Tue, 2007-02-20 at 10:25, Steven Carter wrote:
> Hal Rosenstock wrote:
> > On Tue, 2007-02-20 at 09:44, Steven Carter wrote:
> >   
> >> Hal Rosenstock wrote:
> >>     
> >>> On Mon, 2007-02-19 at 15:53, Steven Carter wrote:
> >>>   
> >>>       
> >>>> I have a Nagios module that alerts on connectivity, port errors, 
> >>>> speed/width problems.  I would like to give it the ability to change the 
> >>>> severity of the alert depending on whether errors are just present or if 
> >>>> they are increasing faster than a specified rate.  The intent is to 
> >>>> equip the module to keep the state of the last query and possibly 
> >>>> history, but I wanted to make sure that I was not re-inventing the wheel 
> >>>> first.  Is there an attribute or utility that I am overlooking that will 
> >>>> help me do this?
> >>>>     
> >>>>         
> >>> Not currently (to my knowledge). The thresholding of rate aspect is
> >>> similar to what will be supported in the proposed PerfManager.
> >>>   
> >>>       
> >> I noticed that in your RFC.  How are you planning on presenting the data 
> >> to other agents (e.g. Nagios, Openview, MRTG, etc.)?  One comment that I 
> >> should have made on your RFC is that I wonder if it is necessary to 
> >> include the data analysis/reduction part.
> >>     
> >
> > I think it is because there is too much data to push up the tree to one
> > manager.
> >   
> I agree, but does the data need to be pushed to one node?  If you go 
> with a distributed approach  where information is aggregated per network 
> device (switch or group of switches), 

The proposal includes a distributed approach.

> then a third-party monitoring 
> server can collect and present it in the same way that it does for an 
> Ethernet network.  That way, you do not need to pass information up to a 
> central node.  You can just have a third party monitoring application 
> collect and present the information.  I guess it just depends on how 
> much you want to leverage existing monitoring solutions and/or how much 
> capability you want inherent in the OFA software.

Third party monitoring agents can hook in at the intermediate nodes in
the collection hierarchy if that is what is desired.

> >> Just having a central location that collects the values and presents
> >> them via SNMP is extremely useful since there are a plethora of
> >> monitoring apps (free and commercial) that do what you are proposing.
> >>     
> I should have said 'a location' and not 'a central location'.  Since 
> most monitoring applications support multiple agents, it is not 
> necessary to aggregate the information into one place.
> >
> > In general, this information can be exported via SNMP or whatever the
> > management infrastructure is.
> >
> > BTW, are there SNMP MIBs for all of this information ? To my knowledge,
> > some of these were started but never completed. Also, the MIBs were
> > geared at the agents rather than the managers (in the PerfMgt arena).
> >   
> There are standard MIBs (e.g. mib-2's ifTable) that can present most of 
> the useful information (in/out octets, errors, etc.)

Not most of the useful IB information.

> , but I would suspect that you would have to supplement that with a
> private MIB, as most other technologies/vendors have.

Yes, as this may be data out of a non IBTA specified manager, it is
likely a private MIB unless one goes for all the agent (PMA) data. There
was a proposed MIB for the PMA at the IETF IPoIB WG.

-- Hal

> Steven.
> 
> > -- Hal
> >
> >   
> >> That way, a network manager can leverage existing tools currently
> >> used for monitoring Ethernet nodes, hosts, etc.  You can still
> >> include a last-change attribute with each counter so that simple
> >> utilities (like the one that I am writing) can get an idea of how
> >> quickly errors are occurring.
> >>     
> >
> >   
> >> Steven.
> >>
> >>     
> >>> -- Hal
> >>>
> >>>   
> >>>       
> >>>> Thanks,
> >>>>
> >>>> Steven.
> >>>>
> >>>>     
> >>>>         
> >>>   
> >>>       
> >
> >   
> 
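
For illustration, the rate-based alerting Steven describes at the top of
this thread (a lower severity when errors are merely present, a higher one
when they grow faster than a specified rate) could be sketched as follows.
This is only a minimal sketch and not code from the thread: the Sample
type, the function names, and the max_rate threshold are hypothetical, and
it assumes the plugin keeps the previous counter reading between runs. The
return values follow the usual Nagios plugin exit codes (0 OK, 1 WARNING,
2 CRITICAL).

```python
from dataclasses import dataclass
from typing import Optional

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


@dataclass
class Sample:
    """One reading of a cumulative port error counter (hypothetical type)."""
    timestamp: float  # seconds since some fixed epoch
    errors: int       # cumulative error count at that time


def error_rate(prev: Sample, curr: Sample) -> float:
    """Return errors per second between two readings."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        # No time has passed (or clock went backwards): no usable rate.
        return 0.0
    if curr.errors >= prev.errors:
        delta = curr.errors - prev.errors
    else:
        # Assumption: a smaller value means the counter was reset,
        # so everything currently in it accumulated since the reset.
        delta = curr.errors
    return delta / elapsed


def severity(prev: Optional[Sample], curr: Sample, max_rate: float) -> int:
    """Map the current reading (and the previous one, if any) to a severity."""
    if curr.errors == 0:
        return OK
    if prev is not None and error_rate(prev, curr) > max_rate:
        return CRITICAL  # errors are increasing faster than allowed
    return WARNING       # errors are present but not growing quickly
```

With readings of 10 errors at t=0 and 70 errors at t=60, the rate is one
error per second, so a max_rate of 0.5 yields CRITICAL and a max_rate of
2.0 yields WARNING. Counters that saturate at a maximum value rather than
reset would need separate handling that this sketch does not attempt.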


From ftillier at windows.microsoft.com  Tue Feb 20 10:56:33 2007
From: ftillier at windows.microsoft.com (Fab Tillier)
Date: Tue, 20 Feb 2007 10:56:33 -0800
Subject: [openib-general] [ofw] [Fwd: Re: [Fwd: Re: win related [was:
 Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]]
In-Reply-To: <1171996664.4380.316818.camel@hal.voltaire.com>
References: <1171996664.4380.316818.camel@hal.voltaire.com>
Message-ID: <D01673A583414F43A090DFFAB9AF6BE303D06B6B@WIN-MSG-20.wingroup.windeploy.ntdev.microsoft.com>

Submissions to the OFW project are supposed to be bound by the
contributor's agreement:

http://windows.openib.org/openib/contribute.aspx

Contributing code under anything but a BSD license violates condition 1,
though there shouldn't be issues with dual licenses as long as one of
the available licenses is a BSD license.

In any case, we're not talking about putting the pthreads library in
source or binary form in the OFW SVN, right?  We're just talking about
having OpenSM link to the pthreads library that is out-of-tree.  So the
question is whether there are any licensing issues with having BSD
code include an out-of-tree LGPL file that would affect the ability to
retain the BSD license on the OpenSM files.  I can see this causing
problems for builds, as people would need to find/install the pthreads
library before OpenSM would build successfully.

-Fab

_______________________________________________
ofw mailing list
ofw at lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw


From krause at cup.hp.com  Tue Feb 20 10:56:20 2007
From: krause at cup.hp.com (Michael Krause)
Date: Tue, 20 Feb 2007 10:56:20 -0800
Subject: [openib-general] IB routing discussion summary
In-Reply-To: <45D4D924.8070507@ichips.intel.com>
References: <000601c74fb4$6ed83840$8698070a@amr.corp.intel.com>
	<45D4B705.5020805@ichips.intel.com>
	<6.2.0.14.2.20070215123631.09692088@esmail.cup.hp.com>
	<45D4D924.8070507@ichips.intel.com>
Message-ID: <6.2.0.14.2.20070220103929.02953a20@esmail.cup.hp.com>

At 02:05 PM 2/15/2007, Sean Hefty wrote:
>>Is this first an IBTA problem to solve if you believe there is a problem?
>
>Based on my interpretation, I do not believe that there's an error in the 
>architecture.  It seems consistent.  Additional clarification of what 
>PathRecord fields mean when the GIDs are on different subnets may be 
>needed, and a change to the architecture may make things easier to 
>implement, but that's a separate matter.
>
>>I contend CM does not require anything that is subnet local other than to
>>target a given router port which should be derived from local SM/SA only
>
>Then please state how the passive side obtains the information (e.g. 
>SLID/DLID) it needs in order to configure its QP.  I claim that 
>information is carried in the CM REQ.

It should not be carried in the CM REQ.  The SLID / DLID of the router 
ports should be derived through local subnet SA / SM query.  When a CM REQ 
traverses one or more subnets there will be potentially many SLID / DLID 
involved in the communication.   Each router should be populating its 
routing tables in order to build the new LRH attached to the GRH / CM REQ 
that it is forwarding to the next hop.


>The alternatives that I see are:
>
>1. The passive side extracts the data from the LRH that carries the CM REQ.
>2. The passive side issues its own local path record query.
>
>Will you please clarify where this information comes from?

The router protocol determines the path to the next hop.   As noted in prior 
e-mails, the router works in conjunction with the SM/SA to populate its 
database so that any CM or other query for a path record to get to / from 
the router can be derived and optimized based on local policy, e.g. QoS, 
within each subnet.


>>I will further state that SA-SA communication sans perhaps a
>>P_Key / Q_Key service lookup should be avoided wherever possible.
>
>I agree - which is why my proposal avoided SA-SA communication.  I see 
>nothing in the architecture that prohibits a node from querying an SA that 
>is not on its local subnet.

I'd need to go back and check, but the architecture is predicated on the
SM and SA being strictly local, and for security purposes their
communication should remain local.  Higher level management entities built
to communicate with the SM and SA are responsible for cross-subnet
communication without exposing the SA or SM to direct interaction.  P_Key
and Q_Key management across subnets is an example of such communication
that would not be exposed to the SA and SM.

Mike


From halr at voltaire.com  Tue Feb 20 10:56:38 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Feb 2007 13:56:38 -0500
Subject: [openib-general] [ofw] [Fwd: Re: [Fwd: Re: win related [was:
 Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]]
In-Reply-To: <D01673A583414F43A090DFFAB9AF6BE303D06B6B@WIN-MSG-20.wingroup.windeploy.ntdev.microsoft.com>
References: <1171996664.4380.316818.camel@hal.voltaire.com>
	<D01673A583414F43A090DFFAB9AF6BE303D06B6B@WIN-MSG-20.wingroup.windeploy.ntdev.microsoft.com>
Message-ID: <1171997797.4380.318016.camel@hal.voltaire.com>

On Tue, 2007-02-20 at 13:56, Fab Tillier wrote:
> Submissions to the OFW project are supposed to be bound by the
> contributor's agreement:
> 
> http://windows.openib.org/openib/contribute.aspx
> 
> Contributing code under anything but a BSD license violates condition 1,
> though there shouldn't be issues with dual licenses as long as one of
> the available licenses is a BSD license.
> 
> In any case, we're not talking about putting the pthreads library in
> source or binary form in the OFW SVN, right? 

Right (we're not).

>  We're just talking about
> having OpenSM link to the pthreads library that is out-of-tree.

Yes.

>   So the
> question is whether there are any licensing issues with having BSD
> code include an out-of-tree LGPL file that would affect the ability to
> retain the BSD license on the OpenSM files.

I don't think this is an issue as there are other instances of this
being done (outside of OpenIB).

> I can see this causing
> problems for builds, as people would need to find/install the pthreads
> library before OpenSM would build successfully.

Could installation documentation for OpenSM on Windows minimize this
issue?

-- Hal

> -Fab
> 
> -----Original Message-----
> From: ofw-bounces at lists.openfabrics.org
> [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Hal Rosenstock
> Sent: Tuesday, February 20, 2007 10:38 AM
> To: ofw at lists.openfabrics.org
> Cc: Gilad Shainer; OPENIB
> Subject: [ofw] [Fwd: Re: [openib-general] [Fwd: Re: win related [was:
> Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]]
> 
> Also, looping in the OpenFabrics Windows email list on this.
> 
> -- Hal
> 
> -----Forwarded Message-----
> 
> From: Hal Rosenstock <halr at voltaire.com>
> To: Tzachi Dar <tzachid at mellanox.co.il>
> Cc: OPENIB <openib-general at openib.org>, Gilad Shainer
> <Shainer at Mellanox.com>
> Subject: Re: [openib-general] [Fwd: Re: win related [was: Re: [PATCH
> 1/2] opensm: sigusr1: syslog() fixes]]
> Date: 20 Feb 2007 13:21:38 -0500
> 
> Hi Tzachi,
> 
> On Thu, 2007-02-08 at 16:24, Tzachi Dar wrote:
> > See bellow.
> 
> I would like to get back to trying to close on this discussion.
> 
> > Thanks
> > Tzachi 
> > 
> > > -----Original Message-----
> > > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] 
> > > Sent: Thursday, February 08, 2007 9:47 PM
> > > To: Tzachi Dar
> > > Cc: Yossi Leybovich; Gilad Shainer; Yevgeny Kliteynik; 
> > > OPENIB; Michael S. Tsirkin; Hal Rosenstock
> > > Subject: Re: [Fwd: Re: win related [was: Re: [PATCH 1/2] 
> > > opensm: sigusr1: syslog() fixes]]
> > > 
> > > On 20:31 Thu 08 Feb     , Tzachi Dar wrote:
> > > > The windows open IB has decided on using a BSD only license. 
> > > > The common implementation of pthreads as far as I know is 
> > > LGPL, which 
> > > > means that it can not be used in open IB.
> > > 
> > > Why not? AFAIK it works perfectly (see (5,6 and Preamble)):
> > > http://www.gnu.org/copyleft/lesser.html
> > > 
> > > And of course there are tons of examples when BSD software 
> > > links against LGPLed glibc.
> > 
> > I can of course write you an answer that will be more than 5 pages
> long
> > of why *I* don't think that 
> > Using GPL software is bad for everyone, but I guess that my opinion
> > doesn't really meter, so I
> > Won't do it.
> > The page that you have referenced is of the GNU org, and even there it
> > is hard to say that they
> > are trying to encourage you to use the LGPL license. In any case, the
> > main point is that 
> > When open IB windows was formed there was a general decision that it
> > will use BSD license. If we
> > Start having components with the LGPL this will break that decision,
> and
> > therefore this requires
> > some voting of the open IB organization.
> 
> I may be missing your point but is there something in the Windows
> OpenIB/OpenFabrics license that precludes using Windows OpenIB licensed
> code (e.g. BSD like license) in concert with non OpenIB code (like LGPL)
> ? Isn't that essentially what using the Windows pthreads DLL with OpenSM
> would be like ? As I understand it, I don't think this requires a
> license change or anything in the OpenIB Windows charter prevents this
> or needs changing.
> 
> > > > The only two ways that I see around this are 1) Change the 
> > > license of 
> > > > open IB windows which might be a complicated thing. 2) Find an 
> > > > implementation of pthreads that is BSD.
> > > 
> > > BTW, just wondering... What is relation between windows open 
> > > IB and OFA (and OFA's "dual-license rule")?
> > Well, the way I see it one can take code from the Linux part under the
> > BSD licance and use it in 
> > The windows part. The otherway around seems fine to me but some say
> that
> > since the windows BSD liscance
> > Reqires that some text will always remain there, the other way around
> is
> > not possibale. As I'm not an 
> > Expert in that erea I don't know who is right.
> 
> I don't see how this affects what is being discussed about OpenSM. In
> all the cases I'm aware of, the portability is from Linux to Windows and
> not the other way around.
> 
> -- Hal
> 
> > > Sasha
> > > 
> > > > 
> > > > Thanks
> > > > Tzachi
> > > > 
> > > > > -----Original Message-----
> > > > > From: Sasha Khapyorsky [mailto:sashak at voltaire.com]
> > > > > Sent: Thursday, February 08, 2007 7:46 PM
> > > > > To: Tzachi Dar; Yossi Leybovich
> > > > > Cc: Yevgeny Kliteynik; OPENIB; Michael S. Tsirkin; Hal
> Rosenstock
> > > > > Subject: Re: [Fwd: Re: win related [was: Re: [PATCH 1/2]
> > > > > opensm: sigusr1: syslog() fixes]]
> > > > > 
> > > > > On 11:24 Sun 21 Jan     , Yevgeny Kliteynik wrote:
> > > > > > Tzachi, Yossi, please join the thread.
> > > > > > What do you think about distributing a copy of the pthread DLL
> 
> > > > > > with opensm?
> > > > > 
> > > > > Any news here? Thanks.
> > > > > 
> > > > > Sasha
> > > > > 
> > > > > > 
> > > > > > -- Yevgeny.
> > > > > > 
> > > > > > -------- Original Message --------
> > > > > > Subject: Re: win related [was: Re: [PATCH 1/2] opensm:
> sigusr1: 
> > > > > > syslog() fixes]
> > > > > > Date: Fri, 19 Jan 2007 00:20:32 +0200
> > > > > > From: Sasha Khapyorsky <sashak at voltaire.com>
> > > > > > To: Michael S. Tsirkin <mst at mellanox.co.il>
> > > > > > CC: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>,        
> > > > > OPENIB <openib-general at openib.org>
> > > > > > References: <20070118194403.GA23783 at sashak.voltaire.com>
> > > > > > <20070118215023.GP9890 at mellanox.co.il>
> > > > > > 
> > > > > > On 23:50 Thu 18 Jan     , Michael S. Tsirkin wrote:
> > > > > > > > Quoting Sasha Khapyorsky <sashak at voltaire.com>:
> > > > > > > > Subject: Re: win related [was: Re: [PATCH 1/2] 
> > > opensm: sigusr1: 
> > > > > > > > syslog() fixes]
> > > > > > > > 
> > > > > > > > On 07:00 Thu 18 Jan     , Michael S. Tsirkin wrote:
> > > > > > > > > > What about pure opensource - 
> > > > > > > > > > http://sourceware.org/pthreads-win32/? It is licensed 
> > > > > > > > > > under LGPL, I see on the net many positive reports
> about
> > > > > stability and usability.
> > > > > > > > > 
> > > > > > > > > > I used it to do a windows port of linux complib at some
> > > > > > > > > > point and opensm seemed to work fine with it. What it was
> > > > > > > > > > lacking at that point was support for 64 bit applications,
> > > > > > > > > > and for some reason (which is still unclear to me) there
> > > > > > > > > > was a strong desire to run opensm in 64 bit mode.
> > > > > > > > > > Seems to have been fixed now, BTW.
> > > > > > > > 
> > > > > > > > So this seems to be good option for OpenSM on 
> > > Windows. Right?
> > > > > > > 
> > > > > > > > No idea. Distributing a copy of the pthread DLL with opensm
> > > > > > > > does not look like a problem. But is it worth it?
> > > > > > 
> > > > > > > Sure, it makes windows porting much more transparent and
> > > > > > > lets us use standard *nix stuff w/out #ifndef WIN32. Another
> > > > > > > (generic) benefit is that posix is more standard and powerful
> > > > > > > than wrappers like complib.
> > > > > > 
> > > > > > Sasha
> > > > > > 
> > > > > 
> > > 
> 
> 
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general
> 
> 
> _______________________________________________
> ofw mailing list
> ofw at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw


From arlin.r.davis at intel.com  Tue Feb 20 11:06:17 2007
From: arlin.r.davis at intel.com (Arlin Davis)
Date: Tue, 20 Feb 2007 11:06:17 -0800
Subject: [openib-general] Fork issues with simple MPI program
In-Reply-To: <45DAB3FD.8060606@voltaire.com>
Message-ID: <000001c75522$334f6a00$4297070a@amr.corp.intel.com>


>Arlin Davis wrote:
>> Any insight would be greatly appreciated. It was our assumption that the parent process can
>continue
>> to use IB resources after the fixes went into 2.6.16 and OFED 1.1. Is this true?
>
>As was discussed on this list on a few occasions: contrary to popular
>belief, fork support was introduced in libibverbs 1.1, whereas OFED 1.1
>contains libibverbs 1.0.
 
OFED 1.2 alpha (libibverbs 1.1) on 2.6.20 fails the same way. Does the following disclaimer still
apply?

"Fork support from kernel 2.6.12 and above is available provided
that applications do not use threads. The fork() is supported as long
as parent process does not run before child exits or calls exec().
The former can be achieved by calling wait(childpid); the latter can be
achieved by application-specific means.  The POSIX system() call is
supported."


From tziporet at mellanox.co.il  Tue Feb 20 11:54:35 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Tue, 20 Feb 2007 21:54:35 +0200
Subject: [openib-general] Fork issues with simple MPI program
In-Reply-To: <45DAB3FD.8060606@voltaire.com>
References: <000001c75454$523660f0$eed4180a@amr.corp.intel.com>
	<45DAB3FD.8060606@voltaire.com>
Message-ID: <45DB51FB.5090500@mellanox.co.il>

Or Gerlitz wrote:
> Arlin Davis wrote:
>   
>> Any insight would be greatly appreciated. It was our assumption that the parent process can continue
>> to use IB resources after the fixes went into 2.6.16 and OFED 1.1. Is this true? 
>>     
>
> As was discussed on this list on a few occasions: contrary to popular
> belief, fork support was introduced in libibverbs 1.1, whereas OFED 1.1
> contains libibverbs 1.0.
>
> Or.
>
>
>   
The only fork support in OFED 1.1 is system() or fork & exec.
Note that the support in OFED 1.2 (actually changes in libibverbs 1.1) 
needs some change in the application.

Tziporet


From tziporet at mellanox.co.il  Tue Feb 20 12:01:32 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Tue, 20 Feb 2007 22:01:32 +0200
Subject: [openib-general] Fork issues with simple MPI program
In-Reply-To: <000001c75522$334f6a00$4297070a@amr.corp.intel.com>
References: <000001c75522$334f6a00$4297070a@amr.corp.intel.com>
Message-ID: <45DB539C.3050905@mellanox.co.il>

Arlin Davis wrote:
>  
> OFED 1.2 alpha (libibverbs 1.1) on 2.6.20 fails the same way. Does the following disclaimer still
> apply?
>
> "Fork support from kernel 2.6.12 and above is available provided
> that applications do not use threads. The fork() is supported as long
> as parent process does not run before child exits or calls exec().
> The former can be achieved by calling wait(childpid); the latter can be
> achieved by application-specific means.  The POSIX system() call is
> supported."
>
>   
As replied before - if you want full fork support you need to change the 
application. Look at the verbs header for details.

Tziporet


From rdreier at cisco.com  Tue Feb 20 12:24:37 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 20 Feb 2007 12:24:37 -0800
Subject: [openib-general] Fork issues with simple MPI program
In-Reply-To: <45DB539C.3050905@mellanox.co.il> (Tziporet Koren's message
	of "Tue, 20 Feb 2007 22:01:32 +0200")
References: <000001c75522$334f6a00$4297070a@amr.corp.intel.com>
	<45DB539C.3050905@mellanox.co.il>
Message-ID: <adak5ycpofu.fsf@cisco.com>

 > As replied before - if you want full fork support you need to change the 
 > application. Look at the verbs header for details.

Or you could try setting the IBV_FORK_SAFE environment variable before
running your application.  I guess for MPI jobs you need to make sure
that environment variable is propagated to every process.


From arlin.r.davis at intel.com  Tue Feb 20 12:40:31 2007
From: arlin.r.davis at intel.com (Arlin Davis)
Date: Tue, 20 Feb 2007 12:40:31 -0800
Subject: [openib-general] Fork issues with simple MPI program
In-Reply-To: <adak5ycpofu.fsf@cisco.com>
Message-ID: <000101c7552f$5cf14c90$4297070a@amr.corp.intel.com>


>
>Or you could try setting the IBV_FORK_SAFE environment variable before
>running your application.  I guess for MPI jobs you need to make sure
>that environment variable is propagated to every process.

Ahh! That's what I was looking for. Thanks!

This information is scattered around in various email threads, header files, and code. Can someone
please add relevant text to the OFED 1.2 release notes or a Wiki page?


From tziporet at mellanox.co.il  Tue Feb 20 12:57:02 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Tue, 20 Feb 2007 22:57:02 +0200
Subject: [openib-general] Fork issues with simple MPI program
In-Reply-To: <000101c7552f$5cf14c90$4297070a@amr.corp.intel.com>
References: <000101c7552f$5cf14c90$4297070a@amr.corp.intel.com>
Message-ID: <45DB609E.6020701@mellanox.co.il>

Arlin Davis wrote:
>> Or you could try setting the IBV_FORK_SAFE environment variable before
>> running your application.  I guess for MPI jobs you need to make sure
>> that environment variable is propagated to every process.
>>     
>
> Ahh! That's what I was looking for. Thanks!
>
> This information is scattered around in various email threads, header files, and code. Can someone
> please add relevant text to the OFED 1.2 release notes or a Wiki page?
>   
Roland,
If you can send me the details (since you implemented it) I will add it 
to the Wiki

Thanks,
Tziporet


From rdreier at cisco.com  Tue Feb 20 12:58:03 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 20 Feb 2007 12:58:03 -0800
Subject: [openib-general] Fork issues with simple MPI program
In-Reply-To: <45DB609E.6020701@mellanox.co.il> (Tziporet Koren's message
	of "Tue, 20 Feb 2007 22:57:02 +0200")
References: <000101c7552f$5cf14c90$4297070a@amr.corp.intel.com>
	<45DB609E.6020701@mellanox.co.il>
Message-ID: <adar6sko8bo.fsf@cisco.com>

 > If you can send me the details (since you implemented it) I will add
 > it to the Wiki

An application that wants fork() to work with libibverbs should
either call ibv_fork_init() before doing anything else with
libibverbs, or else a user can set the IBV_FORK_SAFE or
RDMAV_FORK_SAFE environment variable to get the same effect.  There is
some overhead to making fork() work so it is not enabled by default.
This is described in the ibv_fork_init manpage in the latest
libibverbs git tree.

 - R.
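
A minimal sketch of the environment-variable path described above, assuming only what the message states: fork safety is requested when either IBV_FORK_SAFE or RDMAV_FORK_SAFE is set. The helper name fork_safety_requested() is hypothetical and is not a libibverbs function; checking only for the presence of the variable (not its value) is also an assumption. An application that controls its own startup would simply call ibv_fork_init() before any other verbs call.

```c
#include <stdlib.h>

/*
 * Hypothetical helper (not part of libibverbs): returns 1 if the user
 * asked for fork safety through either environment variable, else 0.
 * Assumption: only the presence of the variable matters, not its value.
 */
static int fork_safety_requested(void)
{
    return getenv("RDMAV_FORK_SAFE") != NULL ||
           getenv("IBV_FORK_SAFE") != NULL;
}
```

A launcher would export one of the two variables before starting the job; as noted earlier in the thread, for MPI the variable has to reach every process.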


From ftillier at windows.microsoft.com  Tue Feb 20 13:08:40 2007
From: ftillier at windows.microsoft.com (Fab Tillier)
Date: Tue, 20 Feb 2007 13:08:40 -0800
Subject: [openib-general] [ofw] [Fwd: Re: [Fwd: Re: win related
 [was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]]
In-Reply-To: <1171997797.4380.318016.camel@hal.voltaire.com>
References: <1171996664.4380.316818.camel@hal.voltaire.com>
	<D01673A583414F43A090DFFAB9AF6BE303D06B6B@WIN-MSG-20.wingroup.windeploy.ntdev.microsoft.com>
	<1171997797.4380.318016.camel@hal.voltaire.com>
Message-ID: <D01673A583414F43A090DFFAB9AF6BE303D06CE7@WIN-MSG-20.wingroup.windeploy.ntdev.microsoft.com>


-----Original Message-----
From: Hal Rosenstock [mailto:halr at voltaire.com] 
Sent: Tuesday, February 20, 2007 10:57 AM

On Tue, 2007-02-20 at 13:56, Fab Tillier wrote:
> Submissions to the OFW project are supposed to be bound by the
> contributor's agreement:
> 
> I can see this causing
> problems for builds, as people would need to find/install the pthreads
> library before OpenSM would build successfully.

Could install documentation for OpenSM on Windows minimize this as an
issue ?

[ftillier] This isn't just an install issue - it's a build issue.
Anyone that wants to build OpenSM will need to find/download/install the
pthreads library so that the build will succeed.  If linking statically,
the resulting executable will not require any special installation.
It's only an install issue if you link dynamically to pthreads.

-Fab


From halr at voltaire.com  Tue Feb 20 13:43:00 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Feb 2007 16:43:00 -0500
Subject: [openib-general] [ofw] [Fwd: Re: [Fwd: Re: win related
 [was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]]
In-Reply-To: <D01673A583414F43A090DFFAB9AF6BE303D06CE7@WIN-MSG-20.wingroup.windeploy.ntdev.microsoft.com>
References: <1171996664.4380.316818.camel@hal.voltaire.com>
	<D01673A583414F43A090DFFAB9AF6BE303D06B6B@WIN-MSG-20.wingroup.windeploy.ntdev.microsoft.com>
	<1171997797.4380.318016.camel@hal.voltaire.com>
	<D01673A583414F43A090DFFAB9AF6BE303D06CE7@WIN-MSG-20.wingroup.windeploy.ntdev.microsoft.com>
Message-ID: <1172007778.4380.328202.camel@hal.voltaire.com>

On Tue, 2007-02-20 at 16:08, Fab Tillier wrote:
> -----Original Message-----
> From: Hal Rosenstock [mailto:halr at voltaire.com] 
> Sent: Tuesday, February 20, 2007 10:57 AM
> 
> On Tue, 2007-02-20 at 13:56, Fab Tillier wrote:
> > Submissions to the OFW project are supposed to be bound by the
> > contributor's agreement:
> > 
> > I can see this causing
> > problems for builds, as people would need to find/install the pthreads
> > library before OpenSM would build successfully.
> 
> Could install documentation for OpenSM on Windows minimize this as an
> issue ?
> 
> [ftillier] This isn't just an install issue - it's a build issue.
> Anyone that wants to build OpenSM will need to find/download/install the
> pthreads library so that the build will succeed.  If linking statically,
> the resulting executable will not require any special installation.
> It's only an install issue if you link dynamically to pthreads.

OK; then build and install. How big an issue is this ?

I thought DLLs were dynamically linked but I'm a Windows plebe. 

-- Hal

> -Fab


From ftillier at windows.microsoft.com  Tue Feb 20 13:56:11 2007
From: ftillier at windows.microsoft.com (Fab Tillier)
Date: Tue, 20 Feb 2007 13:56:11 -0800
Subject: [openib-general] [ofw] [Fwd: Re: [Fwd: Re: win
 related[was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]]
In-Reply-To: <1172007778.4380.328202.camel@hal.voltaire.com>
References: <1171996664.4380.316818.camel@hal.voltaire.com><D01673A583414F43A090DFFAB9AF6BE303D06B6B@WIN-MSG-20.wingroup.windeploy.ntdev.microsoft.com><1171997797.4380.318016.camel@hal.voltaire.com><D01673A583414F43A090DFFAB9AF6BE303D06CE7@WIN-MSG-20.wingroup.windeploy.ntdev.microsoft.com>
	<1172007778.4380.328202.camel@hal.voltaire.com>
Message-ID: <D01673A583414F43A090DFFAB9AF6BE303D06D63@WIN-MSG-20.wingroup.windeploy.ntdev.microsoft.com>

-----Original Message-----
From: ofw-bounces at lists.openfabrics.org
[mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Hal Rosenstock
Sent: Tuesday, February 20, 2007 1:43 PM

On Tue, 2007-02-20 at 16:08, Fab Tillier wrote:
> -----Original Message-----
> From: Hal Rosenstock [mailto:halr at voltaire.com] 
> Sent: Tuesday, February 20, 2007 10:57 AM
> 
> On Tue, 2007-02-20 at 13:56, Fab Tillier wrote:
> [ftillier] This isn't just an install issue - it's a build issue.
> Anyone that wants to build OpenSM will need to find/download/install
> the pthreads library so that the build will succeed.  If linking
> statically, the resulting executable will not require any special
> installation.
> It's only an install issue if you link dynamically to pthreads.

OK; then build and install. How big an issue is this ?

I thought DLLs were dynamically linked but I'm a Windows plebe. 

[ftillier] When you build, the linker needs the import library for
pthreads so that the functions get resolved as being imported from the
pthreads DLL.  The dependency on the pthreads DLL is then created and
the DLL will be loaded dynamically, assuming it can be found in the
path.

So for the build process, you need to have the pthreads library
available to the build tool (path to the lib).  This requires installing
the pthreads developer package or however it's done.

If you statically link the pthreads lib, rather than dynamically link,
then all the pthreads goodies go directly into the executable and you
remove the dependency on an external DLL.  The build process
requirements are no different than for the dynamically linked case.

There is also the possibility to remove the link-time dependency by
calling GetProcAddress to explicitly resolve the pthreads entrypoints.
This method still requires having the DLL loaded on the user's systems.

Personally, I would rather see static linkage to the pthreads library so
that only the builds are affected (something only 'experts' will be
doing), while not affecting the common user.

-Fab


From halr at voltaire.com  Tue Feb 20 14:23:49 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Feb 2007 17:23:49 -0500
Subject: [openib-general] [PATCH] osm/libvendor: compilation fixes
In-Reply-To: <20070219230441.GA27414@sashak.voltaire.com>
References: <20070219214630.GW27414@sashak.voltaire.com>
	<20070219230441.GA27414@sashak.voltaire.com>
Message-ID: <1172010229.4380.330691.camel@hal.voltaire.com>

On Mon, 2007-02-19 at 18:04, Sasha Khapyorsky wrote:
> This adds needed header files inclusion to prevent compilation failures.
> 
> Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
> ---

Thanks. Applied (to both master and ofed_1_2).

-- Hal


From arlin.r.davis at intel.com  Tue Feb 20 14:29:40 2007
From: arlin.r.davis at intel.com (Davis, Arlin R)
Date: Tue, 20 Feb 2007 14:29:40 -0800
Subject: [openib-general] Fork issues with simple MPI program
In-Reply-To: <adar6sko8bo.fsf@cisco.com>
Message-ID: <B0095134066CC94FBC80973103FFA1FE0322522A@orsmsx416.amr.corp.intel.com>


>An application that wants fork() to work with libibverbs should
>either call ibv_fork_init() before doing anything else with
>libibverbs, or else a user can set the IBV_FORK_SAFE or
>RDMAV_FORK_SAFE environment variable to get the same effect.  There is
>some overhead to making fork() work so it is not enabled by default.
>This is described in the ibv_fork_init manpage in the latest
>libibverbs git tree.


Does this require 2.6.16 or better kernel support?


From rdreier at cisco.com  Tue Feb 20 14:33:02 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 20 Feb 2007 14:33:02 -0800
Subject: [openib-general] Fork issues with simple MPI program
In-Reply-To: <B0095134066CC94FBC80973103FFA1FE0322522A@orsmsx416.amr.corp.intel.com>
	(Arlin R. Davis's message of "Tue, 20 Feb 2007 14:29:40 -0800")
References: <B0095134066CC94FBC80973103FFA1FE0322522A@orsmsx416.amr.corp.intel.com>
Message-ID: <adaodnoh335.fsf@cisco.com>

 > Does this require 2.6.16 or better kernel support?

The kernel must support the MADV_DONTFORK flag to madvise(), not sure
when exactly that was merged but 2.6.16 or so sounds right.

ibv_fork_init() will return an error if the kernel support is missing
and fork safety won't actually work.  And if you use the environment
variable a warning will be printed if ibv_fork_init() fails.

 - R.
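
One way to check the kernel requirement mentioned above without going through libibverbs is to probe madvise() directly. This is only a sketch for Linux: the function name kernel_supports_dontfork() is made up for illustration, and it says nothing about how ibv_fork_init() performs its own check internally.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Probe for MADV_DONTFORK: map one anonymous page, try the advice, unmap.
 * Returns 1 if madvise() accepted the flag, 0 if it was rejected,
 * and -1 if the probe page could not be mapped at all.
 */
static int kernel_supports_dontfork(void)
{
    long page = sysconf(_SC_PAGESIZE);
    void *p;
    int rc;

    if (page <= 0)
        return -1;
    p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    rc = madvise(p, (size_t)page, MADV_DONTFORK);
    munmap(p, (size_t)page);
    return rc == 0 ? 1 : 0;
}
```

A return value of 0 would correspond to the case described above where fork safety cannot actually work on the running kernel.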


From rdreier at cisco.com  Tue Feb 20 16:12:29 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 20 Feb 2007 16:12:29 -0800
Subject: [openib-general] I created a git tree for the libibverbs man
	pages
In-Reply-To: <45BF756B.1060500@dev.mellanox.co.il> (Dotan Barak's
	message of "Tue, 30 Jan 2007 18:42:19 +0200")
References: <45BF63A1.6090402@dev.mellanox.co.il>
	<adaireotr9h.fsf@cisco.com> <45BF756B.1060500@dev.mellanox.co.il>
Message-ID: <aday7msfjwy.fsf@cisco.com>

I merged all these manpages into my libibverbs tree and pushed the
result out to kernel.org.

Please send any future updates as diffs against the libibverbs tree.

Thanks,
  Roland


From greg at kroah.com  Tue Feb 20 17:50:34 2007
From: greg at kroah.com (Greg KH)
Date: Tue, 20 Feb 2007 17:50:34 -0800
Subject: [openib-general] [patch 09/18] IB/mad: Fix race between cancel and
 receive completion
In-Reply-To: <20070221014927.GA3684@kroah.com>
References: <20070221014413.282048309@mini.kroah.org>
Message-ID: <20070221015034.GJ3684@kroah.com>

-stable review patch.  If anyone has any objections, please let us know.

------------------
From: Roland Dreier <rdreier at cisco.com>

When ib_cancel_mad() is called, it puts the canceled send on a list
and schedules a "flushed" callback from process context.  However,
this leaves a window where a receive completion could be processed
before the send is fully flushed.

This is fine, except that ib_find_send_mad() will find the MAD and
return it to the receive processing, which results in the sender
getting both a successful receive and a "flushed" send completion for
the same request.  Understandably, this confuses the sender, which is
expecting only one of these two callbacks, and leads to grief such as
a use-after-free in IPoIB.

Fix this by changing ib_find_send_mad() to return a send struct only
if the status is still successful (and not "flushed").  The search of
the send_list already had this check, so this patch just adds the same
check to the search of the wait_list.

Signed-off-by: Roland Dreier <rolandd at cisco.com>
Signed-off-by: Chris Wright <chrisw at sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh at suse.de>
---

---
 drivers/infiniband/core/mad.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-2.6.18.7.orig/drivers/infiniband/core/mad.c
+++ linux-2.6.18.7/drivers/infiniband/core/mad.c
@@ -1750,7 +1750,7 @@ ib_find_send_mad(struct ib_mad_agent_pri
 		     */
 		    (is_direct(wc->recv_buf.mad->mad_hdr.mgmt_class) ||
 		     rcv_has_same_gid(mad_agent_priv, wr, wc)))
-			return wr;
+			return (wr->status == IB_WC_SUCCESS) ? wr : NULL;
 	}
 
 	/*

--
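
The effect of the one-line change can be modelled outside the kernel. The sketch below is a simplified stand-in, not the code in drivers/infiniband/core/mad.c: the types, the array-based list, and the names find_send(), WC_SUCCESS and WC_FLUSH_ERR are all hypothetical. It only illustrates the rule the patch enforces, namely that a canceled send is never handed back to receive processing.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types; all names are hypothetical. */
enum wc_status { WC_SUCCESS = 0, WC_FLUSH_ERR = 1 };

struct send_wr {
    unsigned long long tid;   /* transaction ID used to match a response */
    enum wc_status status;    /* WC_FLUSH_ERR once the send was canceled */
};

/*
 * Return the pending send matching tid, but only if it has not been
 * canceled. A canceled entry is treated as "no match", so the caller
 * never delivers a receive for a request that will also be flushed.
 */
static struct send_wr *find_send(struct send_wr *list, size_t n,
                                 unsigned long long tid)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (list[i].tid == tid)
            return list[i].status == WC_SUCCESS ? &list[i] : NULL;
    }
    return NULL;
}
```

With this rule the sender sees exactly one of the two callbacks for a given request: the receive if the response arrived before the cancel, or the flushed send completion otherwise.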


From devesh28 at gmail.com  Tue Feb 20 21:21:42 2007
From: devesh28 at gmail.com (Devesh Sharma)
Date: Wed, 21 Feb 2007 10:51:42 +0530
Subject: [openib-general] Immediate data question
In-Reply-To: <6.2.0.14.2.20070215071309.0979bed8@esmail.cup.hp.com>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
	<349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net>
	<adahctxeds8.fsf@cisco.com>
	<6.2.0.14.2.20070208131610.085c52f8@esmail.cup.hp.com>
	<349DCDA352EACF42A0C49FA6DCEA840362A87E@G3W0634.americas.hpqcorp.net>
	<309a667c0702112110h7a79961fv9c3cf46d4392e1d4@mail.gmail.com>
	<309a667c0702130537u35745e98y429d3d564fb093e9@mail.gmail.com>
	<6.2.0.14.2.20070213125130.07f4dbf8@esmail.cup.hp.com>
	<309a667c0702142137p724172f5va93a0ef046a60483@mail.gmail.com>
	<6.2.0.14.2.20070215071309.0979bed8@esmail.cup.hp.com>
Message-ID: <309a667c0702202121p52747748ic891a9d21a02e3d7@mail.gmail.com>

On 2/15/07, Michael Krause <krause at cup.hp.com> wrote:
> At 09:37 PM 2/14/2007, Devesh Sharma wrote:
> >On 2/14/07, Michael Krause <krause at cup.hp.com> wrote:
> >>At 05:37 AM 2/13/2007, Devesh Sharma wrote:
> >> >On 2/12/07, Devesh Sharma <devesh28 at gmail.com> wrote:
> >> >>On 2/10/07, Tang, Changqing <changquing.tang at hp.com> wrote:
> >> >> > > >
> >> >> > > >Not for the receiver, but the sender will be severely slowed down by
> >> >> > > >having to wait for the RNR timeouts.
> >> >> > >
> >> >> > > RNR = Receiver Not Ready so by definition, the data flow
> >> >> > > isn't going to
> >> >> > > progress until the receiver is ready to receive data.   If a
> >> >> > > receive QP
> >> >> > > enters RNR for a RC, then it is likely not progressing as
> >> >> > > desired.   RNR
> >> >> > > was initially put in place to enable a receiver to create
> >> >> > > back pressure to the sender without causing a fatal error
> >> >> > > condition.  It should rarely be entered and therefore should
> >> >> > > have negligible impact on overall performance however when a
> >> >> > > RNR occurs, no forward progress will occur so performance is
> >> >> > > essentially zero.
> >> >> >
> >> >> > Mike:
> >> >> >         I still do not quite understand this issue. I have two
> >> >> > situations that have RNR triggered.
> >> >> >
> >> >> > 1. process A and process B is connected with QP. A first post a send to
> >> >> > B, B does not post receive. Then A and B are doing a long time
> >> >> > RDMA_WRITE each other, A and B just check memory for the RDMA_WRITE
> >> >> > message. Finally B will post a receive. Does the first pending send
> >> in A
> >> >> > block all the later RDMA_WRITE ?
> >> >>According to IBTA spec HCA will process WR entries in strict order in
> >> >>which they are posted so the send will block all WR posted after this
> >> >>send, Until-unless HCA has multiple processing elements, I think even
> >> >>then processing order will be maintained by HCA
> >> >>  If not, since RNR is triggered
> >> >> > periodically till B post receive, does it affect the RDMA_WRITE
> >> >> > performance between A and B ?
> >> >> >
> >> >> > 2. extend above to three processes, A connect to B, B connect to C,
> >> so B
> >> >> > has two QPs, but one CQ.A posts a send to B, B does not post receive,
> >> >Post ordering across QPs is not guaranteed, hence the presence of the
> >> >same CQ or a different CQ will not affect anything.
> >> >> > rather B and C are doing a long time RDMA_WRITE,or send/recv. But B
> >> >If RDMA WRITE _on_ B, no effect on performance. If RDMA WRITE _on_ C,
> >I am sorry I have missed that in both cases same DMA channel is in use.
> >> >_may_ affect the performance, since load is on same HCA. In case of
> >> >Send/Recv again _may_ affect the performance, with the same reason.
> >>
> >>Seems orthogonal.  Any time h/w is shared, multiple flows will have an
> >>impact on one another.  That is why we have the different arbitration
> >>mechanisms to enable one to control that impact.
> >Please, can you explain it more clearly?
>
> Most I/O devices are shared by multiple applications / kernel
> subsystems.   Hence, the device acts as a serialization point for what goes
> on the wire / link.   Sharing = resource contention and in order to add any
> structure to that contention, a number of technologies provide arbitration
> options.   In the case of IB, the arbitration is confined to VL arbitration
> where a given data flow is assigned to a VL and that VL is serviced at some
> particular rate.   A number of years ago I wrote up how one might also
> provide QP arbitration (not part of the IBTA specifications) and I
> understand some implementations have incorporated that or a variation of
> the mechanisms into their products.
Thanks, Mike, for a nice explanation, and sorry for the late reply.
Now I get it: Chang is trying to find the performance hit due to
RNR NAK. A performance hit due to device sharing is going to be there
anyhow, so "load on same HCA" is not the proper explanation.
Am I correct now?
>
> In addition to IB link contention, there is also PCI link / bus
> contention.   For PCIe, given most designs did not want to waste resources
> on multiple VC, there really isn't any standard arbitration
> mechanism.   However, many devices, especially a device like a HCA or a
> RNIC, already have the concept of separate resource domains, e.g. QP, and
> they provide a mechanism to associate how the QP's DMA requests or
> interrupts requests are scheduled to the PCIe link.
>
>
> >> >> > must sends RNR periodically to A, right?. So does the pending message
> >> >> > from A affects B's overall performance  between B and C ?
> >> >But RNR NAK is not for very long time.....possibly this performance
> >> >hit you will not be able to observe even. The moment rnr_counter
> >> >expires connection will be broken!
> >>
> >>Keep in mind the timeout can be infinite.  RNR NAK are not expected to be
> >>frequent so their performance impact was considered reasonable.
> >Thanks I missed that.
>
> It is a subtlety within the specification that is easy to miss.
>
> Mike
>
>
>
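
The "timeout can be infinite" point can be made concrete with a small model. This is a sketch under stated assumptions: in the verbs API the QP attribute rnr_retry takes values 0 to 7, with 7 meaning "retry indefinitely"; the function name rnr_should_retry() and the constant RNR_RETRY_INFINITE are invented for illustration and do not come from any library.

```c
#define RNR_RETRY_INFINITE 7u  /* assumed special value: retry forever */

/*
 * Hypothetical model of the requester's RNR handling: given the configured
 * rnr_retry value and the number of RNR NAKs seen so far for one request,
 * return 1 if the requester retries again, 0 if the QP goes to error.
 */
static int rnr_should_retry(unsigned rnr_retry, unsigned naks_seen)
{
    if (rnr_retry == RNR_RETRY_INFINITE)
        return 1;
    return naks_seen <= rnr_retry;
}
```

With a finite count the connection breaks once the retries are used up, as Devesh describes; with the infinite setting the sender keeps waiting until the receiver posts a buffer.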


From sweitzen at cisco.com  Tue Feb 20 21:52:43 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Tue, 20 Feb 2007 21:52:43 -0800
Subject: [openib-general] fix SDP bug 108 for OFED 1.2 beta?
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA3030AD43A@xmb-sjc-216.amer.cisco.com>

Tziporet and Michael, ever since the SDP rewrite in OFED 1.0 rc5, SDP
throughput drops with message size > 64KB, see attached graph.  Can you
please fix this for OFED 1.2 beta?
 
Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070220/2d1d6a10/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sdp2sdp.TCP_STREAM.000.tput_log.pdf
Type: application/octet-stream
Size: 2700 bytes
Desc: sdp2sdp.TCP_STREAM.000.tput_log.pdf
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070220/2d1d6a10/attachment.obj>

From ogerlitz at voltaire.com  Tue Feb 20 22:43:43 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Wed, 21 Feb 2007 08:43:43 +0200
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to
 work with partial membership pkey
In-Reply-To: <1171986159.4380.306117.camel@hal.voltaire.com>
References: <Pine.LNX.4.64.0702190838050.26497@zuben>
	<1171984010.4380.304008.camel@hal.voltaire.com>
	<45DB15F5.4090406@voltaire.com>
	<1171986159.4380.306117.camel@hal.voltaire.com>
Message-ID: <45DBEA1F.5090901@voltaire.com>

Hal Rosenstock wrote:
> On Tue, 2007-02-20 at 10:38, Or Gerlitz wrote:

>> Yes. It's a little bit confusing: partial and full members of an IPoIB IB 
>> partition use the same MGID. When an IPoIB MGID is constructed, the pkey 
>> placed by the driver is --always-- the full membership one. However, on 
>> a node with partial membership, what's plugged into the QP is the pkey 
>> index of the partial instance...

> So in this case, do both the full and partial keys need configuring for
> that port ?

No. The SM configures --either-- the full or the partial pkey.

However, no matter what the SM configures, the core & ipoib code act as 
if the full pkey were there. This is a nice simplification and it works well.

Or.
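
The "act as if the full pkey were there" behaviour comes down to one bit. A minimal sketch, assuming the usual InfiniBand convention that the high-order bit of the 16-bit P_Key is the membership bit (1 = full, 0 = partial) and the low 15 bits name the partition; the helper names are hypothetical and are not taken from the core or ipoib code.

```c
#include <stdint.h>

#define PKEY_FULL_MEMBER 0x8000u  /* assumed: top bit of the 16-bit P_Key */

/* Treat any P_Key as its full-membership form. */
static uint16_t pkey_as_full(uint16_t pkey)
{
    return (uint16_t)(pkey | PKEY_FULL_MEMBER);
}

/* Two P_Keys name the same partition if their low 15 bits agree. */
static int pkey_same_partition(uint16_t a, uint16_t b)
{
    return (a & 0x7fffu) == (b & 0x7fffu);
}
```

Building the MGID from pkey_as_full() gives the same group for full and partial members, while the QP still uses whichever table index the SM actually configured.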


From rdreier at cisco.com  Tue Feb 20 23:16:53 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 20 Feb 2007 23:16:53 -0800
Subject: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small
 message bandwidth
In-Reply-To: <20070220181755.GC11825@mellanox.co.il> (Michael S.
	Tsirkin's message of "Tue, 20 Feb 2007 20:17:55 +0200")
References: <20070220181755.GC11825@mellanox.co.il>
Message-ID: <adaejokdlp6.fsf@cisco.com>

Thanks, queued for 2.6.21.  With this patch I see small-packet latency
down almost all the way back to what datagram mode gives -- on a pair
of fast woodcrest systems I see latencies for netpipe tcp 1 byte
messages like

  datagram     13.xx
  original CM  17.xx
  patched CM   14.xx

so there is still a measurable difference but it is much less now.

 - R.


From rdreier at cisco.com  Tue Feb 20 23:19:28 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 20 Feb 2007 23:19:28 -0800
Subject: [openib-general] [2.6 patch] drivers/infiniband/hw/cxgb3/:
	possible cleanups
In-Reply-To: <1171982587.2101.0.camel@stevo-desktop> (Steve Wise's
	message of "Tue, 20 Feb 2007 08:43:06 -0600")
References: <20070220000211.GZ13958@stusta.de>
	<1171982587.2101.0.camel@stevo-desktop>
Message-ID: <ada3b50dlkv.fsf@cisco.com>

 > You could just remove the code instead of #if 0...

Steve, can you decide what the right thing to do with these changes is
and send me the result (or just tell me to apply Adrian's patch
as-is)?

Thanks,
  Roland


From ogerlitz at voltaire.com  Wed Feb 21 00:48:47 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Wed, 21 Feb 2007 10:48:47 +0200
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to
 work with partial membership pkey
In-Reply-To: <1171986159.4380.306117.camel@hal.voltaire.com>
References: <Pine.LNX.4.64.0702190838050.26497@zuben>
	<1171984010.4380.304008.camel@hal.voltaire.com>
	<45DB15F5.4090406@voltaire.com>
	<1171986159.4380.306117.camel@hal.voltaire.com>
Message-ID: <45DC076F.4060607@voltaire.com>

>> Yes. It's a little bit confusing: partial and full members of an IPoIB IB 
>> partition use the same MGID. When an IPoIB MGID is constructed, the pkey 
>> placed by the driver is --always-- the full membership one. However, on 
>> a node with partial membership, what's plugged into the QP is the pkey 
>> index of the partial instance...

> So in this case, do both the full and partial keys need configuring for
> that port ?

No. The SM configures --either-- the full or the partial pkey.

However, no matter what the SM configures, the core & ipoib code act as 
if the full pkey were there. This is a nice simplification and it works well.

Or.


From tzachid at mellanox.co.il  Wed Feb 21 00:47:36 2007
From: tzachid at mellanox.co.il (Tzachi Dar)
Date: Wed, 21 Feb 2007 10:47:36 +0200
Subject: [openib-general] [ofw] [Fwd: Re: [Fwd: Re:
 winrelated[was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]]
Message-ID: <6C2C79E72C305246B504CBA17B5500C9EBAFB0@mtlexch01.mtl.com>

OK, Hal let's try to close this.

The windows openib project was agreed by everyone to be BSD only.
The fact that it is BSD means that any partner (or non-partner) of the
community can download the code and use it however they want.
This includes:
	1) Running the code as is.
	2) Making changes to the code and contributing them back.
	3) Making changes to the code and *NOT* giving them back to the
	   community.

Starting to depend on GPL (or LGPL) code means that the freedom of the
users to do (3) is broken.
Mellanox thinks that this needs a wider agreement of the open-IB
consortium, which we don't have.

More than that, the ideas that were introduced here about sending users
to other places to find the pthread implementation are also not that
great, as this starts to make the life of our users harder. Also, it is
not clear who will give support once there are problems, and who is
responsible for making sure the license of the library won't change.

So, I hope this closes the subject of using LGPL software in open-IB.

By the way, which implementation of pthreads were you thinking of? I
noticed that the first implementation Google returns was only tested on
a uniprocessor system (http://sourceware.org/pthreads-win32/news.html).
(This is really amazing; I thought those servers went off the market a
long time ago.)

To be more practical:
Can you give us a better view of what you are trying to achieve? As far
as I know, OpenSM uses the complib APIs to handle threads. Implementing
these functions on Windows is usually trivial.
Do you intend to rewrite OpenSM so that it uses pthreads, or do you
intend to do a find/replace that swaps the complib functions for pthreads
ones? If it is the latter, then one can simply implement the pthread
functions using trivial win32 calls.

And another question: What is the functionality that you are currently
missing? Can this functionality be added?

Thanks
Tzachi

By the way, rumors I have heard say that Voltaire doesn't always give
its code back to the community, but these are just rumors, right?

> -----Original Message-----
> From: ofw-bounces at lists.openfabrics.org 
> [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Fab Tillier
> Sent: Tuesday, February 20, 2007 11:56 PM
> To: Hal Rosenstock
> Cc: ofw at lists.openfabrics.org; Gilad Shainer; OPENIB
> Subject: RE: [ofw] [Fwd: Re: [openib-general] [Fwd: Re: 
> winrelated[was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]]
> 
> -----Original Message-----
> From: ofw-bounces at lists.openfabrics.org
> [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Hal Rosenstock
> Sent: Tuesday, February 20, 2007 1:43 PM
> 
> On Tue, 2007-02-20 at 16:08, Fab Tillier wrote:
> > -----Original Message-----
> > From: Hal Rosenstock [mailto:halr at voltaire.com]
> > Sent: Tuesday, February 20, 2007 10:57 AM
> > 
> > On Tue, 2007-02-20 at 13:56, Fab Tillier wrote:
> > [ftillier] This isn't just an install issue - it's a build issue.
> > Anyone that wants to build OpenSM will need to find/download/install
> the
> > pthreads library so that the build will succeed.  If linking
> statically,
> > the resulting executable will not require any special installation.
> > It's only an install issue if you link dynamically to pthreads.
> 
> OK; then build and install. How big an issue is this ?
> 
> I thought DLLs were dynamically linked but I'm a Windows plebe. 
> 
> [ftillier] When you build, the linker needs the import 
> library for pthreads so that the functions get resolved as 
> being imported from the pthreads DLL.  The dependency on the 
> pthreads DLL is then created and the DLL will be loaded 
> dynamically, assuming it can be found in the path.
> 
> So for the build process, you need to have the pthreads 
> library available to the build tool (path to the lib).  This 
> requires installing the pthreads developer package or however 
> it's done.
> 
> If you statically link the pthreads lib, rather than 
> dynamically link, then all the pthreads goodies go directly 
> into the executable and you remove the dependency on an 
> external DLL.  The build process requirements are no 
> different than for the dynamically linked case.
> 
> There is also the possibility to remove the link-time 
> dependency by calling GetProcAddress to explicitly resolve 
> the pthreads entrypoints.
> This method still requires having the DLL loaded on the 
> user's systems.
> 
> Personally, I would rather see static linkage to the pthreads 
> library so that only the builds are affected (something only 
> 'experts' will be doing), while not affecting the common user.
> 
> -Fab
> _______________________________________________
> ofw mailing list
> ofw at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
> 


From vlad at lists.openfabrics.org  Wed Feb 21 02:26:03 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Wed, 21 Feb 2007 02:26:03 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070221-0200 daily build status
Message-ID: <20070221102603.B5502E60804@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod 

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.15
Passed on ppc64 with linux-2.6.18
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.17
Passed on ia64 with linux-2.6.19
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.14
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.18
Passed on ia64 with linux-2.6.18
Passed on powerpc with linux-2.6.12
Passed on powerpc with linux-2.6.15
Passed on ppc64 with linux-2.6.15
Passed on powerpc with linux-2.6.13
Passed on powerpc with linux-2.6.14
Passed on ia64 with linux-2.6.15
Passed on ppc64 with linux-2.6.14
Passed on ia64 with linux-2.6.13
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.12
Passed on ppc64 with linux-2.6.19
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.17
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.13
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.12

Failed:


From bunk at stusta.de  Wed Feb 21 02:52:49 2007
From: bunk at stusta.de (Adrian Bunk)
Date: Wed, 21 Feb 2007 11:52:49 +0100
Subject: [openib-general] [2.6 patch] drivers/infiniband/hw/cxgb3/: cleanups
In-Reply-To: <1171982587.2101.0.camel@stevo-desktop>
References: <20070220000211.GZ13958@stusta.de>
	<1171982587.2101.0.camel@stevo-desktop>
Message-ID: <20070221105249.GC13958@stusta.de>

On Tue, Feb 20, 2007 at 08:43:06AM -0600, Steve Wise wrote:
> On Tue, 2007-02-20 at 01:02 +0100, Adrian Bunk wrote:
> > This patch contains the following possible cleanups:
> > - don't mark static functions in C files as inline - gcc should know
> >   best whether inlining makes sense
> > - never compile the unused cxio_dbg.c
> > - make the following needlessly global functions static:
> >   - cxio_hal.c: cxio_hal_clear_qp_ctx()
> >   - iwch_provider.c: iwch_get_qp()
> > - #if 0 the following unused global functions:
> >   - cxio_hal.c: cxio_allocate_stag()
> >   - cxio_resource.c: cxio_hal_get_rhdl()
> >   - cxio_resource.c: cxio_hal_put_rhdl()
> > 
> 
> You could just remove the code instead of #if 0...
>...

Updated patch below.

cu
Adrian


<--  snip  -->


This patch contains the following possible cleanups:
- don't mark static functions in C files as inline - gcc should know
  best whether inlining makes sense
- never compile the unused cxio_dbg.c
- make the following needlessly global functions static:
  - cxio_hal.c: cxio_hal_clear_qp_ctx()
  - iwch_provider.c: iwch_get_qp()
- remove the following unused global functions:
  - cxio_hal.c: cxio_allocate_stag()
  - cxio_resource.c: cxio_hal_get_rhdl()
  - cxio_resource.c: cxio_hal_put_rhdl()

Signed-off-by: Adrian Bunk <bunk at stusta.de>

---

 drivers/infiniband/hw/cxgb3/Makefile        |    1 
 drivers/infiniband/hw/cxgb3/cxio_hal.c      |   31 +++++---------------
 drivers/infiniband/hw/cxgb3/cxio_hal.h      |    5 ---
 drivers/infiniband/hw/cxgb3/cxio_resource.c |   14 +--------
 drivers/infiniband/hw/cxgb3/iwch_cm.c       |    5 +--
 drivers/infiniband/hw/cxgb3/iwch_provider.c |    2 -
 drivers/infiniband/hw/cxgb3/iwch_provider.h |    1 
 drivers/infiniband/hw/cxgb3/iwch_qp.c       |   29 ++++++++----------
 8 files changed, 27 insertions(+), 61 deletions(-)

--- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/Makefile.old	2007-02-17 17:21:03.000000000 +0100
+++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/Makefile	2007-02-17 17:21:08.000000000 +0100
@@ -8,5 +8,4 @@
 
 ifdef CONFIG_INFINIBAND_CXGB3_DEBUG
 EXTRA_CFLAGS += -DDEBUG
-iw_cxgb3-y += cxio_dbg.o
 endif
--- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_hal.h.old	2007-02-17 17:22:53.000000000 +0100
+++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_hal.h	2007-02-17 17:25:08.000000000 +0100
@@ -144,7 +144,6 @@
 void cxio_rdev_close(struct cxio_rdev *rdev);
 int cxio_hal_cq_op(struct cxio_rdev *rdev, struct t3_cq *cq,
 		   enum t3_cq_opcode op, u32 credit);
-int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev, u32 qpid);
 int cxio_create_cq(struct cxio_rdev *rdev, struct t3_cq *cq);
 int cxio_destroy_cq(struct cxio_rdev *rdev, struct t3_cq *cq);
 int cxio_resize_cq(struct cxio_rdev *rdev, struct t3_cq *cq);
@@ -155,8 +154,6 @@
 int cxio_destroy_qp(struct cxio_rdev *rdev, struct t3_wq *wq,
 		    struct cxio_ucontext *uctx);
 int cxio_peek_cq(struct t3_wq *wr, struct t3_cq *cq, int opcode);
-int cxio_allocate_stag(struct cxio_rdev *rdev, u32 * stag, u32 pdid,
-		       enum tpt_mem_perm perm, u32 * pbl_size, u32 * pbl_addr);
 int cxio_register_phys_mem(struct cxio_rdev *rdev, u32 * stag, u32 pdid,
 			   enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len,
 			   u8 page_size, __be64 *pbl, u32 *pbl_size,
@@ -172,8 +169,6 @@
 int cxio_rdma_init(struct cxio_rdev *rdev, struct t3_rdma_init_attr *attr);
 void cxio_register_ev_cb(cxio_hal_ev_callback_func_t ev_cb);
 void cxio_unregister_ev_cb(cxio_hal_ev_callback_func_t ev_cb);
-u32 cxio_hal_get_rhdl(void);
-void cxio_hal_put_rhdl(u32 rhdl);
 u32 cxio_hal_get_pdid(struct cxio_hal_resource *rscp);
 void cxio_hal_put_pdid(struct cxio_hal_resource *rscp, u32 pdid);
 int __init cxio_hal_init(void);
--- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.h.old	2007-02-17 17:25:35.000000000 +0100
+++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.h	2007-02-17 17:25:41.000000000 +0100
@@ -179,7 +179,6 @@
 
 void iwch_qp_add_ref(struct ib_qp *qp);
 void iwch_qp_rem_ref(struct ib_qp *qp);
-struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn);
 
 struct iwch_ucontext {
 	struct ib_ucontext ibucontext;
--- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.c.old	2007-02-17 17:25:50.000000000 +0100
+++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.c	2007-02-17 17:25:57.000000000 +0100
@@ -949,7 +949,7 @@
 	        wake_up(&(to_iwch_qp(qp)->wait));
 }
 
-struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn)
+static struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn)
 {
 	PDBG("%s ib_dev %p qpn 0x%x\n", __FUNCTION__, dev, qpn);
 	return (struct ib_qp *)get_qhp(to_iwch_dev(dev), qpn);
--- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_qp.c.old	2007-02-17 17:27:31.000000000 +0100
+++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_qp.c	2007-02-17 17:38:07.000000000 +0100
@@ -37,8 +37,8 @@
 
 #define NO_SUPPORT -1
 
-static inline int iwch_build_rdma_send(union t3_wr *wqe, struct ib_send_wr *wr,
-				       u8 * flit_cnt)
+static int iwch_build_rdma_send(union t3_wr *wqe, struct ib_send_wr *wr,
+				u8 * flit_cnt)
 {
 	int i;
 	u32 plen;
@@ -97,8 +97,8 @@
 	return 0;
 }
 
-static inline int iwch_build_rdma_write(union t3_wr *wqe, struct ib_send_wr *wr,
-					u8 *flit_cnt)
+static int iwch_build_rdma_write(union t3_wr *wqe, struct ib_send_wr *wr,
+				 u8 *flit_cnt)
 {
 	int i;
 	u32 plen;
@@ -138,8 +138,8 @@
 	return 0;
 }
 
-static inline int iwch_build_rdma_read(union t3_wr *wqe, struct ib_send_wr *wr,
-				       u8 *flit_cnt)
+static int iwch_build_rdma_read(union t3_wr *wqe, struct ib_send_wr *wr,
+				u8 *flit_cnt)
 {
 	if (wr->num_sge > 1)
 		return -EINVAL;
@@ -159,9 +159,8 @@
 /*
  * TBD: this is going to be moved to firmware. Missing pdid/qpid check for now.
  */
-static inline int iwch_sgl2pbl_map(struct iwch_dev *rhp,
-				   struct ib_sge *sg_list, u32 num_sgle,
-				   u32 * pbl_addr, u8 * page_size)
+static int iwch_sgl2pbl_map(struct iwch_dev *rhp, struct ib_sge *sg_list,
+			    u32 num_sgle, u32 * pbl_addr, u8 * page_size)
 {
 	int i;
 	struct iwch_mr *mhp;
@@ -207,9 +206,8 @@
 	return 0;
 }
 
-static inline int iwch_build_rdma_recv(struct iwch_dev *rhp,
-						    union t3_wr *wqe,
-						    struct ib_recv_wr *wr)
+static int iwch_build_rdma_recv(struct iwch_dev *rhp, union t3_wr *wqe,
+				struct ib_recv_wr *wr)
 {
 	int i, err = 0;
 	u32 pbl_addr[4];
@@ -474,8 +472,7 @@
 	return err;
 }
 
-static inline void build_term_codes(int t3err, u8 *layer_type, u8 *ecode,
-				    int tagged)
+static void build_term_codes(int t3err, u8 *layer_type, u8 *ecode, int tagged)
 {
 	switch (t3err) {
 	case TPT_ERR_STAG:
@@ -673,7 +670,7 @@
 	spin_lock_irqsave(&qhp->lock, *flag);
 }
 
-static inline void flush_qp(struct iwch_qp *qhp, unsigned long *flag)
+static void flush_qp(struct iwch_qp *qhp, unsigned long *flag)
 {
 	if (t3b_device(qhp->rhp))
 		cxio_set_wq_in_error(&qhp->wq);
@@ -685,7 +682,7 @@
 /*
  * Return non zero if at least one RECV was pre-posted.
  */
-static inline int rqes_posted(struct iwch_qp *qhp)
+static int rqes_posted(struct iwch_qp *qhp)
 {
 	return fw_riwrh_opcode((struct fw_riwrh *)qhp->wq.queue) == T3_WR_RCV;
 }
--- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_cm.c.old	2007-02-17 17:27:53.000000000 +0100
+++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_cm.c	2007-02-17 17:38:23.000000000 +0100
@@ -210,8 +210,7 @@
 	return state;
 }
 
-static inline void __state_set(struct iwch_ep_common *epc,
-			       enum iwch_ep_state new)
+static void __state_set(struct iwch_ep_common *epc, enum iwch_ep_state new)
 {
 	epc->state = new;
 }
@@ -1460,7 +1459,7 @@
 /*
  * Returns whether an ABORT_REQ_RSS message is a negative advice.
  */
-static inline int is_neg_adv_abort(unsigned int status)
+static int is_neg_adv_abort(unsigned int status)
 {
 	return status == CPL_ERR_RTX_NEG_ADVICE ||
 	       status == CPL_ERR_PERSIST_NEG_ADVICE;
--- linux-2.6.20-mm2/drivers/infiniband/hw/cxgb3/cxio_resource.c.old	2007-02-20 23:22:29.000000000 +0100
+++ linux-2.6.20-mm2/drivers/infiniband/hw/cxgb3/cxio_resource.c	2007-02-20 23:12:04.000000000 +0100
@@ -179,7 +179,7 @@
 /*
  * returns 0 if no resource available
  */
-static inline u32 cxio_hal_get_resource(struct kfifo *fifo)
+static u32 cxio_hal_get_resource(struct kfifo *fifo)
 {
 	u32 entry;
 	if (kfifo_get(fifo, (unsigned char *) &entry, sizeof(u32)))
@@ -188,21 +188,11 @@
 		return 0;	/* fifo emptry */
 }
 
-static inline void cxio_hal_put_resource(struct kfifo *fifo, u32 entry)
+static void cxio_hal_put_resource(struct kfifo *fifo, u32 entry)
 {
 	BUG_ON(kfifo_put(fifo, (unsigned char *) &entry, sizeof(u32)) == 0);
 }
 
-u32 cxio_hal_get_rhdl(void)
-{
-	return cxio_hal_get_resource(rhdl_fifo);
-}
-
-void cxio_hal_put_rhdl(u32 rhdl)
-{
-	cxio_hal_put_resource(rhdl_fifo, rhdl);
-}
-
 u32 cxio_hal_get_stag(struct cxio_hal_resource *rscp)
 {
 	return cxio_hal_get_resource(rscp->tpt_fifo);
--- linux-2.6.20-mm2/drivers/infiniband/hw/cxgb3/cxio_hal.c.old	2007-02-20 23:22:42.000000000 +0100
+++ linux-2.6.20-mm2/drivers/infiniband/hw/cxgb3/cxio_hal.c	2007-02-20 23:12:43.000000000 +0100
@@ -45,7 +45,7 @@
 static LIST_HEAD(rdev_list);
 static cxio_hal_ev_callback_func_t cxio_ev_cb = NULL;
 
-static inline struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name)
+static struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name)
 {
 	struct cxio_rdev *rdev;
 
@@ -55,8 +55,7 @@
 	return NULL;
 }
 
-static inline struct cxio_rdev *cxio_hal_find_rdev_by_t3cdev(struct t3cdev
-							     *tdev)
+static struct cxio_rdev *cxio_hal_find_rdev_by_t3cdev(struct t3cdev *tdev)
 {
 	struct cxio_rdev *rdev;
 
@@ -118,7 +117,7 @@
 	return 0;
 }
 
-static inline int cxio_hal_clear_cq_ctx(struct cxio_rdev *rdev_p, u32 cqid)
+static int cxio_hal_clear_cq_ctx(struct cxio_rdev *rdev_p, u32 cqid)
 {
 	struct rdma_cq_setup setup;
 	setup.id = cqid;
@@ -130,7 +129,7 @@
 	return (rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_SETUP, &setup));
 }
 
-int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev_p, u32 qpid)
+static int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev_p, u32 qpid)
 {
 	u64 sge_cmd;
 	struct t3_modify_qp_wr *wqe;
@@ -425,7 +424,7 @@
 	}
 }
 
-static inline int cqe_completes_wr(struct t3_cqe *cqe, struct t3_wq *wq)
+static int cqe_completes_wr(struct t3_cqe *cqe, struct t3_wq *wq)
 {
 	if (CQE_OPCODE(*cqe) == T3_TERMINATE)
 		return 0;
@@ -760,17 +759,6 @@
 	return err;
 }
 
-/* IN : stag key, pdid, pbl_size
- * Out: stag index, actaul pbl_size, and pbl_addr allocated.
- */
-int cxio_allocate_stag(struct cxio_rdev *rdev_p, u32 * stag, u32 pdid,
-		       enum tpt_mem_perm perm, u32 * pbl_size, u32 * pbl_addr)
-{
-	*stag = T3_STAG_UNSET;
-	return (__cxio_tpt_op(rdev_p, 0, stag, 0, pdid, TPT_NON_SHARED_MR,
-			      perm, 0, 0ULL, 0, 0, NULL, pbl_size, pbl_addr));
-}
-
 int cxio_register_phys_mem(struct cxio_rdev *rdev_p, u32 *stag, u32 pdid,
 			   enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len,
 			   u8 page_size, __be64 *pbl, u32 *pbl_size,
@@ -1029,7 +1017,7 @@
 	cxio_hal_destroy_rhdl_resource();
 }
 
-static inline void flush_completed_wrs(struct t3_wq *wq, struct t3_cq *cq)
+static void flush_completed_wrs(struct t3_wq *wq, struct t3_cq *cq)
 {
 	struct t3_swsq *sqp;
 	__u32 ptr = wq->sq_rptr;
@@ -1058,9 +1046,8 @@
 			break;
 }
 
-static inline void create_read_req_cqe(struct t3_wq *wq,
-				       struct t3_cqe *hw_cqe,
-				       struct t3_cqe *read_cqe)
+static void create_read_req_cqe(struct t3_wq *wq, struct t3_cqe *hw_cqe,
+				struct t3_cqe *read_cqe)
 {
 	read_cqe->u.scqe.wrid_hi = wq->oldest_read->sq_wptr;
 	read_cqe->len = wq->oldest_read->read_len;
@@ -1073,7 +1060,7 @@
 /*
  * Return a ptr to the next read wr in the SWSQ or NULL.
  */
-static inline void advance_oldest_read(struct t3_wq *wq)
+static void advance_oldest_read(struct t3_wq *wq)
 {
 
 	u32 rptr = wq->oldest_read - wq->sq + 1;


From halr at voltaire.com  Wed Feb 21 03:46:10 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 21 Feb 2007 06:46:10 -0500
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to
 work with partial membership pkey
In-Reply-To: <45DBEA1F.5090901@voltaire.com>
References: <Pine.LNX.4.64.0702190838050.26497@zuben>
	<1171984010.4380.304008.camel@hal.voltaire.com>
	<45DB15F5.4090406@voltaire.com>
	<1171986159.4380.306117.camel@hal.voltaire.com>
	<45DBEA1F.5090901@voltaire.com>
Message-ID: <1172058368.4380.379947.camel@hal.voltaire.com>

On Wed, 2007-02-21 at 01:43, Or Gerlitz wrote:
> Hal Rosenstock wrote:
> > On Tue, 2007-02-20 at 10:38, Or Gerlitz wrote:
> 
> >> Yes. It's a little bit confusing: partial and full members of an IPoIB IB 
> >> partition use the same MGID. When an IPoIB MGID is constructed, the pkey 
> >> placed by the driver is --always-- the full membership one. However, on 
> >> a node with partial membership, what's plugged into the QP is the pkey 
> >> index of the partial instance...
> 
> > So in this case, do both the full and partial keys need configuring for
> > that port ?
> 
> No. The SM configures --either-- the full or the partial pkey.

That's what I was afraid of :-(

> However, no matter what the SM configures, the core & ipoib code act as if
> the full pkey is there. This is a nice simplification and it works well.

I believe it is a spec (compliance) violation for the port to be a
partial member and join as a full member.

-- Hal

> Or.
> 


From ogerlitz at voltaire.com  Wed Feb 21 04:35:34 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Wed, 21 Feb 2007 14:35:34 +0200
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to
 work with partial membership pkey
In-Reply-To: <1172058368.4380.379947.camel@hal.voltaire.com>
References: <Pine.LNX.4.64.0702190838050.26497@zuben>
	<1171984010.4380.304008.camel@hal.voltaire.com>
	<45DB15F5.4090406@voltaire.com>
	<1171986159.4380.306117.camel@hal.voltaire.com>
	<45DBEA1F.5090901@voltaire.com>
	<1172058368.4380.379947.camel@hal.voltaire.com>
Message-ID: <45DC3C96.8040100@voltaire.com>

>> However, no matter what the SM configures, the core & ipoib code act as if
>> the full pkey is there. This is a nice simplification and it works well.

> I believe it is a spec (compliance) violation for the port to be a
> partial member and join as a full member.

Since partial members can't talk among themselves, there is no reason to 
form a multicast group containing --only-- ports that can --not-- talk 
to each other... So if the spec does not allow this (having a partial 
member joining with the full member pkey) - it's a spec bug...

Or.
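
[Editor's sketch, not from the original mail.] The claim that partial
members cannot talk among themselves follows from the P_Key matching rule:
two keys match only if their low 15 bits are equal and at least one of
them has the membership bit (bit 15) set. A minimal sketch of that rule,
with hypothetical helper names, might look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PKEY_MEMBERSHIP_BIT 0x8000u	/* set = full member, clear = partial */
#define PKEY_BASE_MASK      0x7fffu	/* low 15 bits identify the partition */

/* Hypothetical helper: true if the key denotes full membership. */
static bool pkey_is_full(uint16_t pkey)
{
	return (pkey & PKEY_MEMBERSHIP_BIT) != 0;
}

/*
 * Hypothetical helper: true if a packet carrying key 'a' is accepted by a
 * port holding key 'b'. The base values must match and at least one side
 * must be a full member.
 */
static bool pkeys_can_talk(uint16_t a, uint16_t b)
{
	if ((a & PKEY_BASE_MASK) != (b & PKEY_BASE_MASK))
		return false;
	return pkey_is_full(a) || pkey_is_full(b);
}

/* Worked cases for the partition with base key 0x1234. */
static void pkey_matching_examples(void)
{
	assert(pkeys_can_talk(0x9234, 0x9234));		/* full <-> full */
	assert(pkeys_can_talk(0x9234, 0x1234));		/* full <-> partial */
	assert(!pkeys_can_talk(0x1234, 0x1234));	/* partial <-> partial */
	assert(!pkeys_can_talk(0x9234, 0x9235));	/* different partitions */
}
```

Under this rule a group made up only of partial members would have no pair
of ports able to exchange traffic, which is the situation the paragraph
above argues is pointless.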


From ogerlitz at voltaire.com  Wed Feb 21 04:52:57 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Wed, 21 Feb 2007 14:52:57 +0200
Subject: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small
 message bandwidth
In-Reply-To: <20070220181755.GC11825@mellanox.co.il>
References: <20070220181755.GC11825@mellanox.co.il>
Message-ID: <45DC40A9.507@voltaire.com>

Michael S. Tsirkin wrote:
> Avoid overhead of freeing/reallocating and mapping/unmapping for dma
> for pages that have not been written to by hardware.

> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> index 8ee6f06..a23c8e3 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> @@ -68,14 +68,14 @@ struct ipoib_cm_id {
>  static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id,
>  			       struct ib_cm_event *event);
>  
> -static void ipoib_cm_dma_unmap_rx(struct ipoib_dev_priv *priv,
> +static void ipoib_cm_dma_unmap_rx(struct ipoib_dev_priv *priv, int frags,
>  				  u64 mapping[IPOIB_CM_RX_SG])
>  {
>  	int i;
>  
>  	ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_CM_HEAD_SIZE, DMA_FROM_DEVICE);
>  
> -	for (i = 0; i < IPOIB_CM_RX_SG - 1; ++i)
> +	for (i = 0; i < frags; ++i)
>  		ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE, DMA_FROM_DEVICE);
>  }

I understand that in ipoib_cm_alloc_rx_skb you call dma_map_page on 
IPOIB_CM_RX_SG pages where here you call dma_unmap_single only $frags 
times, correct? Does this mean you are trashing the IOMMU etc. of 
the system?

Or.


From mst at mellanox.co.il  Wed Feb 21 05:22:00 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 21 Feb 2007 15:22:00 +0200
Subject: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small
 message bandwidth
In-Reply-To: <45DC40A9.507@voltaire.com>
References: <45DC40A9.507@voltaire.com>
Message-ID: <20070221132159.GC7711@mellanox.co.il>

> Quoting r. Or Gerlitz <ogerlitz at voltaire.com>:
> Subject: Re: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth
> 
> Michael S. Tsirkin wrote:
> > Avoid overhead of freeing/reallocating and mapping/unmapping for dma
> > for pages that have not been written to by hardware.
> 
> > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > index 8ee6f06..a23c8e3 100644
> > --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > @@ -68,14 +68,14 @@ struct ipoib_cm_id {
> >  static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id,
> >  			       struct ib_cm_event *event);
> >  
> > -static void ipoib_cm_dma_unmap_rx(struct ipoib_dev_priv *priv,
> > +static void ipoib_cm_dma_unmap_rx(struct ipoib_dev_priv *priv, int frags,
> >  				  u64 mapping[IPOIB_CM_RX_SG])
> >  {
> >  	int i;
> >  
> >  	ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_CM_HEAD_SIZE, DMA_FROM_DEVICE);
> >  
> > -	for (i = 0; i < IPOIB_CM_RX_SG - 1; ++i)
> > +	for (i = 0; i < frags; ++i)
> >  		ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE, DMA_FROM_DEVICE);
> >  }
> 
> I understand that in ipoib_cm_alloc_rx_skb you call dma_map_page on 
> IPOIB_CM_RX_SG pages where here you call dma_unmap_single only $frags 
> times, correct?

No.

> does this mean you are trashing the IOMMU etc etc of 
> the system?

I don't think so.


-- 
MST


From halr at voltaire.com  Wed Feb 21 05:20:23 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 21 Feb 2007 08:20:23 -0500
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to
 work with partial membership pkey
In-Reply-To: <45DC3C96.8040100@voltaire.com>
References: <Pine.LNX.4.64.0702190838050.26497@zuben>
	<1171984010.4380.304008.camel@hal.voltaire.com>
	<45DB15F5.4090406@voltaire.com>
	<1171986159.4380.306117.camel@hal.voltaire.com>
	<45DBEA1F.5090901@voltaire.com>
	<1172058368.4380.379947.camel@hal.voltaire.com>
	<45DC3C96.8040100@voltaire.com>
Message-ID: <1172064021.4380.385825.camel@hal.voltaire.com>

On Wed, 2007-02-21 at 07:35, Or Gerlitz wrote:
> >> However, no matter what the SM configures, the core & ipoib code act as if
> >> the full pkey is there. This is a nice simplification and it works well.
> 
> > I believe it is a spec (compliance) violation for the port to be a
> > partial member and join as a full member.
> 
> Since partial members can't talk among themselves, there is no reason to 
> form a multicast group containing --only-- ports that can --not-- talk 
> to each other... So if the spec does not allow this (having a partial 
> member joining with the full member pkey) - it's a spec bug...

I think there are two issues here then:
1. If this is the case, getting the spec changed to accommodate this use
case.
2. I believe that OpenIB code is supposed to be spec compliant.

-- Hal

> Or.
> 
> 


From jsquyres at cisco.com  Wed Feb 21 06:12:26 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Wed, 21 Feb 2007 09:12:26 -0500
Subject: [openib-general] Fwd:  Address List Change for Friday, 2/23/2007
References: <3D84A59A1AD3584DA02AEAD240E8863F0366947F@ES22SNLNT.srn.sandia.gov>
Message-ID: <00D1093C-4EE4-4580-A57F-2B302E694C06@cisco.com>

FYI.  In case you missed it the first time: THIS LIST IS CHANGING ON  
FRIDAY 2/23/2007 (2 days from now).  Please update your addressbooks!

See the notice below for the details.


Begin forwarded message:

> From: "Lee, Michael Paichi" <mplee at sandia.gov>
> Date: February 19, 2007 10:43:23 AM EST
> To: openib-general at openib.org
> Subject: [openib-general] Address List Change for Friday, 2/23/2007
>
> We're in the process of migrating the maillists from the old  
> openib.org server to the new lists.openfabrics.org machine.  The  
> list openib-general will be moved this Friday, February 23, 2007.   
> The new address for the maillist will be  
> general at lists.openfabrics.org.
>
> What this means is that messages will come from  
> general at lists.openfabrics.org.  Conversely, replies should be made  
> to this address as well.  Messages will also have a new subject  
> line prefix of [OFA General].  If you have configured your e-mail  
> client to filter based on maillist address or subject headers, you  
> may need to make some adjustments for filtering.
>
> However, for the sake of transition, messages sent to the previous  
> maillist address on the old server will forward to the new server.   
> This forward will remain in place until the old server is taken  
> offline and final DNS changes are made.  We expect the old server  
> to go offline sometime in early March.
>
> The web archives will also be migrated to the new web address  
> shortly, http://lists.openfabrics.org.
>
> If you have any questions, please don't hesitate to contact me at  
> mplee at sandia.gov.
>
> Regards,
> Michael Lee
>
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/ 
> openib-general


-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


From swise at opengridcomputing.com  Wed Feb 21 06:31:45 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Wed, 21 Feb 2007 08:31:45 -0600
Subject: [openib-general] [2.6 patch] drivers/infiniband/hw/cxgb3/:
	cleanups
In-Reply-To: <20070221105249.GC13958@stusta.de>
References: <20070220000211.GZ13958@stusta.de>
	<1171982587.2101.0.camel@stevo-desktop>
	<20070221105249.GC13958@stusta.de>
Message-ID: <1172068305.21243.2.camel@stevo-desktop>

Thanks Adrian!

Acked-by: Steve Wise <swise at opengridcomputing.com>


On Wed, 2007-02-21 at 11:52 +0100, Adrian Bunk wrote:
> On Tue, Feb 20, 2007 at 08:43:06AM -0600, Steve Wise wrote:
> > On Tue, 2007-02-20 at 01:02 +0100, Adrian Bunk wrote:
> > > This patch contains the following possible cleanups:
> > > - don't mark static functions in C files as inline - gcc should know
> > >   best whether inlining makes sense
> > > - never compile the unused cxio_dbg.c
> > > - make the following needlessly global functions static:
> > >   - cxio_hal.c: cxio_hal_clear_qp_ctx()
> > >   - iwch_provider.c: iwch_get_qp()
> > > - #if 0 the following unused global functions:
> > >   - cxio_hal.c: cxio_allocate_stag()
> > >   - cxio_resource.c: cxio_hal_get_rhdl()
> > >   - cxio_resource.c: cxio_hal_put_rhdl()
> > > 
> > 
> > You could just remove the code instead of #if 0...
> >...
> 
> Updated patch below.
> 
> cu
> Adrian
> 
> 
> <--  snip  -->
> 
> 
> This patch contains the following possible cleanups:
> - don't mark static functions in C files as inline - gcc should know
>   best whether inlining makes sense
> - never compile the unused cxio_dbg.c
> - make the following needlessly global functions static:
>   - cxio_hal.c: cxio_hal_clear_qp_ctx()
>   - iwch_provider.c: iwch_get_qp()
> - remove the following unused global functions:
>   - cxio_hal.c: cxio_allocate_stag()
>   - cxio_resource.c: cxio_hal_get_rhdl()
>   - cxio_resource.c: cxio_hal_put_rhdl()
> 
> Signed-off-by: Adrian Bunk <bunk at stusta.de>
> 
> ---
> 
>  drivers/infiniband/hw/cxgb3/Makefile        |    1 
>  drivers/infiniband/hw/cxgb3/cxio_hal.c      |   31 +++++---------------
>  drivers/infiniband/hw/cxgb3/cxio_hal.h      |    5 ---
>  drivers/infiniband/hw/cxgb3/cxio_resource.c |   14 +--------
>  drivers/infiniband/hw/cxgb3/iwch_cm.c       |    5 +--
>  drivers/infiniband/hw/cxgb3/iwch_provider.c |    2 -
>  drivers/infiniband/hw/cxgb3/iwch_provider.h |    1 
>  drivers/infiniband/hw/cxgb3/iwch_qp.c       |   29 ++++++++----------
>  8 files changed, 27 insertions(+), 61 deletions(-)
> 
> --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/Makefile.old	2007-02-17 17:21:03.000000000 +0100
> +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/Makefile	2007-02-17 17:21:08.000000000 +0100
> @@ -8,5 +8,4 @@
>  
>  ifdef CONFIG_INFINIBAND_CXGB3_DEBUG
>  EXTRA_CFLAGS += -DDEBUG
> -iw_cxgb3-y += cxio_dbg.o
>  endif
> --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_hal.h.old	2007-02-17 17:22:53.000000000 +0100
> +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_hal.h	2007-02-17 17:25:08.000000000 +0100
> @@ -144,7 +144,6 @@
>  void cxio_rdev_close(struct cxio_rdev *rdev);
>  int cxio_hal_cq_op(struct cxio_rdev *rdev, struct t3_cq *cq,
>  		   enum t3_cq_opcode op, u32 credit);
> -int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev, u32 qpid);
>  int cxio_create_cq(struct cxio_rdev *rdev, struct t3_cq *cq);
>  int cxio_destroy_cq(struct cxio_rdev *rdev, struct t3_cq *cq);
>  int cxio_resize_cq(struct cxio_rdev *rdev, struct t3_cq *cq);
> @@ -155,8 +154,6 @@
>  int cxio_destroy_qp(struct cxio_rdev *rdev, struct t3_wq *wq,
>  		    struct cxio_ucontext *uctx);
>  int cxio_peek_cq(struct t3_wq *wr, struct t3_cq *cq, int opcode);
> -int cxio_allocate_stag(struct cxio_rdev *rdev, u32 * stag, u32 pdid,
> -		       enum tpt_mem_perm perm, u32 * pbl_size, u32 * pbl_addr);
>  int cxio_register_phys_mem(struct cxio_rdev *rdev, u32 * stag, u32 pdid,
>  			   enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len,
>  			   u8 page_size, __be64 *pbl, u32 *pbl_size,
> @@ -172,8 +169,6 @@
>  int cxio_rdma_init(struct cxio_rdev *rdev, struct t3_rdma_init_attr *attr);
>  void cxio_register_ev_cb(cxio_hal_ev_callback_func_t ev_cb);
>  void cxio_unregister_ev_cb(cxio_hal_ev_callback_func_t ev_cb);
> -u32 cxio_hal_get_rhdl(void);
> -void cxio_hal_put_rhdl(u32 rhdl);
>  u32 cxio_hal_get_pdid(struct cxio_hal_resource *rscp);
>  void cxio_hal_put_pdid(struct cxio_hal_resource *rscp, u32 pdid);
>  int __init cxio_hal_init(void);
> --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.h.old	2007-02-17 17:25:35.000000000 +0100
> +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.h	2007-02-17 17:25:41.000000000 +0100
> @@ -179,7 +179,6 @@
>  
>  void iwch_qp_add_ref(struct ib_qp *qp);
>  void iwch_qp_rem_ref(struct ib_qp *qp);
> -struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn);
>  
>  struct iwch_ucontext {
>  	struct ib_ucontext ibucontext;
> --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.c.old	2007-02-17 17:25:50.000000000 +0100
> +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.c	2007-02-17 17:25:57.000000000 +0100
> @@ -949,7 +949,7 @@
>  	        wake_up(&(to_iwch_qp(qp)->wait));
>  }
>  
> -struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn)
> +static struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn)
>  {
>  	PDBG("%s ib_dev %p qpn 0x%x\n", __FUNCTION__, dev, qpn);
>  	return (struct ib_qp *)get_qhp(to_iwch_dev(dev), qpn);
> --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_qp.c.old	2007-02-17 17:27:31.000000000 +0100
> +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_qp.c	2007-02-17 17:38:07.000000000 +0100
> @@ -37,8 +37,8 @@
>  
>  #define NO_SUPPORT -1
>  
> -static inline int iwch_build_rdma_send(union t3_wr *wqe, struct ib_send_wr *wr,
> -				       u8 * flit_cnt)
> +static int iwch_build_rdma_send(union t3_wr *wqe, struct ib_send_wr *wr,
> +				u8 * flit_cnt)
>  {
>  	int i;
>  	u32 plen;
> @@ -97,8 +97,8 @@
>  	return 0;
>  }
>  
> -static inline int iwch_build_rdma_write(union t3_wr *wqe, struct ib_send_wr *wr,
> -					u8 *flit_cnt)
> +static int iwch_build_rdma_write(union t3_wr *wqe, struct ib_send_wr *wr,
> +				 u8 *flit_cnt)
>  {
>  	int i;
>  	u32 plen;
> @@ -138,8 +138,8 @@
>  	return 0;
>  }
>  
> -static inline int iwch_build_rdma_read(union t3_wr *wqe, struct ib_send_wr *wr,
> -				       u8 *flit_cnt)
> +static int iwch_build_rdma_read(union t3_wr *wqe, struct ib_send_wr *wr,
> +				u8 *flit_cnt)
>  {
>  	if (wr->num_sge > 1)
>  		return -EINVAL;
> @@ -159,9 +159,8 @@
>  /*
>   * TBD: this is going to be moved to firmware. Missing pdid/qpid check for now.
>   */
> -static inline int iwch_sgl2pbl_map(struct iwch_dev *rhp,
> -				   struct ib_sge *sg_list, u32 num_sgle,
> -				   u32 * pbl_addr, u8 * page_size)
> +static int iwch_sgl2pbl_map(struct iwch_dev *rhp, struct ib_sge *sg_list,
> +			    u32 num_sgle, u32 * pbl_addr, u8 * page_size)
>  {
>  	int i;
>  	struct iwch_mr *mhp;
> @@ -207,9 +206,8 @@
>  	return 0;
>  }
>  
> -static inline int iwch_build_rdma_recv(struct iwch_dev *rhp,
> -						    union t3_wr *wqe,
> -						    struct ib_recv_wr *wr)
> +static int iwch_build_rdma_recv(struct iwch_dev *rhp, union t3_wr *wqe,
> +				struct ib_recv_wr *wr)
>  {
>  	int i, err = 0;
>  	u32 pbl_addr[4];
> @@ -474,8 +472,7 @@
>  	return err;
>  }
>  
> -static inline void build_term_codes(int t3err, u8 *layer_type, u8 *ecode,
> -				    int tagged)
> +static void build_term_codes(int t3err, u8 *layer_type, u8 *ecode, int tagged)
>  {
>  	switch (t3err) {
>  	case TPT_ERR_STAG:
> @@ -673,7 +670,7 @@
>  	spin_lock_irqsave(&qhp->lock, *flag);
>  }
>  
> -static inline void flush_qp(struct iwch_qp *qhp, unsigned long *flag)
> +static void flush_qp(struct iwch_qp *qhp, unsigned long *flag)
>  {
>  	if (t3b_device(qhp->rhp))
>  		cxio_set_wq_in_error(&qhp->wq);
> @@ -685,7 +682,7 @@
>  /*
>   * Return non zero if at least one RECV was pre-posted.
>   */
> -static inline int rqes_posted(struct iwch_qp *qhp)
> +static int rqes_posted(struct iwch_qp *qhp)
>  {
>  	return fw_riwrh_opcode((struct fw_riwrh *)qhp->wq.queue) == T3_WR_RCV;
>  }
> --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_cm.c.old	2007-02-17 17:27:53.000000000 +0100
> +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_cm.c	2007-02-17 17:38:23.000000000 +0100
> @@ -210,8 +210,7 @@
>  	return state;
>  }
>  
> -static inline void __state_set(struct iwch_ep_common *epc,
> -			       enum iwch_ep_state new)
> +static void __state_set(struct iwch_ep_common *epc, enum iwch_ep_state new)
>  {
>  	epc->state = new;
>  }
> @@ -1460,7 +1459,7 @@
>  /*
>   * Returns whether an ABORT_REQ_RSS message is a negative advice.
>   */
> -static inline int is_neg_adv_abort(unsigned int status)
> +static int is_neg_adv_abort(unsigned int status)
>  {
>  	return status == CPL_ERR_RTX_NEG_ADVICE ||
>  	       status == CPL_ERR_PERSIST_NEG_ADVICE;
> --- linux-2.6.20-mm2/drivers/infiniband/hw/cxgb3/cxio_resource.c.old	2007-02-20 23:22:29.000000000 +0100
> +++ linux-2.6.20-mm2/drivers/infiniband/hw/cxgb3/cxio_resource.c	2007-02-20 23:12:04.000000000 +0100
> @@ -179,7 +179,7 @@
>  /*
>   * returns 0 if no resource available
>   */
> -static inline u32 cxio_hal_get_resource(struct kfifo *fifo)
> +static u32 cxio_hal_get_resource(struct kfifo *fifo)
>  {
>  	u32 entry;
>  	if (kfifo_get(fifo, (unsigned char *) &entry, sizeof(u32)))
> @@ -188,21 +188,11 @@
>  		return 0;	/* fifo emptry */
>  }
>  
> -static inline void cxio_hal_put_resource(struct kfifo *fifo, u32 entry)
> +static void cxio_hal_put_resource(struct kfifo *fifo, u32 entry)
>  {
>  	BUG_ON(kfifo_put(fifo, (unsigned char *) &entry, sizeof(u32)) == 0);
>  }
>  
> -u32 cxio_hal_get_rhdl(void)
> -{
> -	return cxio_hal_get_resource(rhdl_fifo);
> -}
> -
> -void cxio_hal_put_rhdl(u32 rhdl)
> -{
> -	cxio_hal_put_resource(rhdl_fifo, rhdl);
> -}
> -
>  u32 cxio_hal_get_stag(struct cxio_hal_resource *rscp)
>  {
>  	return cxio_hal_get_resource(rscp->tpt_fifo);
> --- linux-2.6.20-mm2/drivers/infiniband/hw/cxgb3/cxio_hal.c.old	2007-02-20 23:22:42.000000000 +0100
> +++ linux-2.6.20-mm2/drivers/infiniband/hw/cxgb3/cxio_hal.c	2007-02-20 23:12:43.000000000 +0100
> @@ -45,7 +45,7 @@
>  static LIST_HEAD(rdev_list);
>  static cxio_hal_ev_callback_func_t cxio_ev_cb = NULL;
>  
> -static inline struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name)
> +static struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name)
>  {
>  	struct cxio_rdev *rdev;
>  
> @@ -55,8 +55,7 @@
>  	return NULL;
>  }
>  
> -static inline struct cxio_rdev *cxio_hal_find_rdev_by_t3cdev(struct t3cdev
> -							     *tdev)
> +static struct cxio_rdev *cxio_hal_find_rdev_by_t3cdev(struct t3cdev *tdev)
>  {
>  	struct cxio_rdev *rdev;
>  
> @@ -118,7 +117,7 @@
>  	return 0;
>  }
>  
> -static inline int cxio_hal_clear_cq_ctx(struct cxio_rdev *rdev_p, u32 cqid)
> +static int cxio_hal_clear_cq_ctx(struct cxio_rdev *rdev_p, u32 cqid)
>  {
>  	struct rdma_cq_setup setup;
>  	setup.id = cqid;
> @@ -130,7 +129,7 @@
>  	return (rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_SETUP, &setup));
>  }
>  
> -int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev_p, u32 qpid)
> +static int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev_p, u32 qpid)
>  {
>  	u64 sge_cmd;
>  	struct t3_modify_qp_wr *wqe;
> @@ -425,7 +424,7 @@
>  	}
>  }
>  
> -static inline int cqe_completes_wr(struct t3_cqe *cqe, struct t3_wq *wq)
> +static int cqe_completes_wr(struct t3_cqe *cqe, struct t3_wq *wq)
>  {
>  	if (CQE_OPCODE(*cqe) == T3_TERMINATE)
>  		return 0;
> @@ -760,17 +759,6 @@
>  	return err;
>  }
>  
> -/* IN : stag key, pdid, pbl_size
> - * Out: stag index, actaul pbl_size, and pbl_addr allocated.
> - */
> -int cxio_allocate_stag(struct cxio_rdev *rdev_p, u32 * stag, u32 pdid,
> -		       enum tpt_mem_perm perm, u32 * pbl_size, u32 * pbl_addr)
> -{
> -	*stag = T3_STAG_UNSET;
> -	return (__cxio_tpt_op(rdev_p, 0, stag, 0, pdid, TPT_NON_SHARED_MR,
> -			      perm, 0, 0ULL, 0, 0, NULL, pbl_size, pbl_addr));
> -}
> -
>  int cxio_register_phys_mem(struct cxio_rdev *rdev_p, u32 *stag, u32 pdid,
>  			   enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len,
>  			   u8 page_size, __be64 *pbl, u32 *pbl_size,
> @@ -1029,7 +1017,7 @@
>  	cxio_hal_destroy_rhdl_resource();
>  }
>  
> -static inline void flush_completed_wrs(struct t3_wq *wq, struct t3_cq *cq)
> +static void flush_completed_wrs(struct t3_wq *wq, struct t3_cq *cq)
>  {
>  	struct t3_swsq *sqp;
>  	__u32 ptr = wq->sq_rptr;
> @@ -1058,9 +1046,8 @@
>  			break;
>  }
>  
> -static inline void create_read_req_cqe(struct t3_wq *wq,
> -				       struct t3_cqe *hw_cqe,
> -				       struct t3_cqe *read_cqe)
> +static void create_read_req_cqe(struct t3_wq *wq, struct t3_cqe *hw_cqe,
> +				struct t3_cqe *read_cqe)
>  {
>  	read_cqe->u.scqe.wrid_hi = wq->oldest_read->sq_wptr;
>  	read_cqe->len = wq->oldest_read->read_len;
> @@ -1073,7 +1060,7 @@
>  /*
>   * Return a ptr to the next read wr in the SWSQ or NULL.
>   */
> -static inline void advance_oldest_read(struct t3_wq *wq)
> +static void advance_oldest_read(struct t3_wq *wq)
>  {
>  
>  	u32 rptr = wq->oldest_read - wq->sq + 1;
> 


From mst at mellanox.co.il  Wed Feb 21 06:21:17 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 21 Feb 2007 16:21:17 +0200
Subject: [openib-general] Fwd: Address List Change for Friday, 2/23/2007
In-Reply-To: <00D1093C-4EE4-4580-A57F-2B302E694C06@cisco.com>
References: <3D84A59A1AD3584DA02AEAD240E8863F0366947F@ES22SNLNT.srn.sandia.gov>
	<00D1093C-4EE4-4580-A57F-2B302E694C06@cisco.com>
Message-ID: <20070221142117.GD13024@mellanox.co.il>

Could an example message please be sent *today* to the new list,
so that client rules can be updated?

I can't access my inbox on Friday or Saturday, and this change
will cause problems and message loss for me unless I can prepare
beforehand.


Quoting Jeff Squyres <jsquyres at cisco.com>:
Subject: Fwd: Address List Change for Friday, 2/23/2007

FYI.  In case you missed it the first time: THIS LIST IS CHANGING ON  
FRIDAY 2/23/2007 (2 days from now).  Please update your addressbooks!

See the notice below for the details.


Begin forwarded message:

> From: "Lee, Michael Paichi" <mplee at sandia.gov>
> Date: February 19, 2007 10:43:23 AM EST
> To: openib-general at openib.org
> Subject: [openib-general] Address List Change for Friday, 2/23/2007
>
> We're in the process of migrating the maillists from the old  
> openib.org server to the new lists.openfabrics.org machine.  The  
> list openib-general will be moved this Friday, February 23, 2007.   
> The new address for the maillist will be  
> general at lists.openfabrics.org.
>
> What this means is that messages will come from  
> general at lists.openfabrics.org.  Conversely, replies should be made  
> to this address as well.  Messages will also have a new subject  
> line prefix of [OFA General].  If you have configured your e-mail  
> client to filter based on maillist address or subject headers, you  
> may need to make some adjustments for filtering.
>
> However, for the sake of transition, messages sent to the previous  
> maillist address on the old server will forward to the new server.   
> This forward will remain in place until the old server is taken  
> offline and final DNS changes are made.  We expect the old server  
> to go offline sometime in early March.
>
> The web archives will also be migrated to the new web address  
> shortly, http://lists.openfabrics.org.
>
> If you have any questions, please don't hesitate to contact me at  
> mplee at sandia.gov.
>
> Regards,
> Michael Lee
>
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/ 
> openib-general


-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


-- 
MST


From halr at voltaire.com  Wed Feb 21 06:31:58 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 21 Feb 2007 09:31:58 -0500
Subject: [openib-general] [ofw] [Fwd: Re: [Fwd: Re:
 winrelated[was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]]
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C9EBAFB0@mtlexch01.mtl.com>
References: <6C2C79E72C305246B504CBA17B5500C9EBAFB0@mtlexch01.mtl.com>
Message-ID: <1172068314.4380.390208.camel@hal.voltaire.com>

Tzachi,

On Wed, 2007-02-21 at 03:47, Tzachi Dar wrote:
> OK, Hal let's try to close this.

Thanks.

> The windows openib project was agreed by everyone to be BSD only.
> The fact that it is BSD means that any partner (or non partner) of the
> Community can download the code and use it, the way he wants.
> This includes:
> 	1) Running the code as is.
> 	2) Making changes to the code and contributing them back.
> 	3) Making changes to the code and *NOT* giving them back to the
> community.
> 
> Starting to depend on GPL (or LGPL) code means that the freedom of the
> users to do (3) is broken.
> Mellanox thinks that this needs a wider agreement of the open-IB
> consortium, which we don't have.

The package in question is licensed with LGPL. I don't think that LGPL
precludes usage #3 (although GPL precludes usage #3). See
http://www.gnu.org/licenses/lgpl.html particularly #5 and #6.

> More than that, the ideas that were introduced here about sending users
> to other places in order for
> them to find the pthread implementation are also not that great as this
> starts to make the life of our users harder.

Is this a major hurdle ? Is it substantially harder ?

> Also it is not clear who will give support once there are problems,

I would think it is from wherever they get OpenSM support. That support
may need to interact with this project on some basis. Is this different
from Linux (and pthreads) ? I agree that it is a change from the
existing model.

> and who is responsible that the license of the library won't change.

I'm not sure how to answer this one but I don't think the license can
just change. I guess if it did, we would need to deal with this when
that occurred. Are you aware of some impending change here ?

> So, I hope this closes the subject of using LGPL software in open-IB.

I don't think we're there yet...

> By the way, what implementation of pthreads were you thinking of? I have
> noticed that the first implementation that Google brings was only tested
> on uni-processor system.
> (http://sourceware.org/pthreads-win32/news.html). (this is really
> amazing, I thought that these servers were out of the market a long time
> ago).

I think that is old information, and this also supports 64-bit
architectures.

> To be more practical:
> Can you give us a better view of what you are trying to achieve? In
> other words, as far as I know 
> Opensm is using complib apis to handle threads. The implementation of
> these functions on windows is usually trivial.
> Do you intend to make a re-write of opensm so that it will use pthreads
> or do you intend to make a find/replace
> and replace the complib functions with pthreads ones? If we are talking
> about the second, then one can simply implement the pthread functions
> using trivial win32 calls.
> 
> And another question: What is the functionality that you are currently
> missing? Can this functionality be added?

There will be another posting addressing these questions.

-- Hal

> Thanks
> Tzachi
> 
> By the way, rumors I have heard say that Voltaire doesn't always give
> its code back to the community, but these are just rumors, right?

> > -----Original Message-----
> > From: ofw-bounces at lists.openfabrics.org 
> > [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Fab Tillier
> > Sent: Tuesday, February 20, 2007 11:56 PM
> > To: Hal Rosenstock
> > Cc: ofw at lists.openfabrics.org; Gilad Shainer; OPENIB
> > Subject: RE: [ofw] [Fwd: Re: [openib-general] [Fwd: Re: 
> > winrelated[was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]]
> > 
> > -----Original Message-----
> > From: ofw-bounces at lists.openfabrics.org
> > [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Hal Rosenstock
> > Sent: Tuesday, February 20, 2007 1:43 PM
> > 
> > On Tue, 2007-02-20 at 16:08, Fab Tillier wrote:
> > > -----Original Message-----
> > > From: Hal Rosenstock [mailto:halr at voltaire.com]
> > > Sent: Tuesday, February 20, 2007 10:57 AM
> > > 
> > > On Tue, 2007-02-20 at 13:56, Fab Tillier wrote:
> > > [ftillier] This isn't just an install issue - it's a build issue.
> > > Anyone that wants to build OpenSM will need to find/download/install
> > the
> > > pthreads library so that the build will succeed.  If linking
> > statically,
> > > the resulting executable will not require any special installation.
> > > It's only an install issue if you link dynamically to pthreads.
> > 
> > OK; then build and install. How big an issue is this ?
> > 
> > I thought DLLs were dynamically linked but I'm a Windows plebe. 
> > 
> > [ftillier] When you build, the linker needs the import 
> > library for pthreads so that the functions get resolved as 
> > being imported from the pthreads DLL.  The dependency on the 
> > pthreads DLL is then created and the DLL will be loaded 
> > dynamically, assuming it can be found in the path.
> > 
> > So for the build process, you need to have the pthreads 
> > library available to the build tool (path to the lib).  This 
> > requires installing the pthreads developer package or however 
> > it's done.
> > 
> > If you statically link the pthreads lib, rather than 
> > dynamically link, then all the pthreads goodies go directly 
> > into the executable and you remove the dependency on an 
> > external DLL.  The build process requirements are no 
> > different than for the dynamically linked case.
> > 
> > There is also the possibility to remove the link-time 
> > dependency by calling GetProcAddress to explicitly resolve 
> > the pthreads entrypoints.
> > This method still requires having the DLL loaded on the 
> > user's systems.
> > 
> > Personally, I would rather see static linkage to the pthreads
> > library so that only the builds are affected (something only 
> > 'experts' will be doing), while not affecting the common user.
> > 
> > -Fab
> > _______________________________________________
> > ofw mailing list
> > ofw at lists.openfabrics.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
> > 


From halr at voltaire.com  Wed Feb 21 06:38:40 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 21 Feb 2007 09:38:40 -0500
Subject: [openib-general] [ofw] [Fwd: Re: [Fwd: Re:
 winrelated[was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]]
In-Reply-To: <1172068314.4380.390208.camel@hal.voltaire.com>
References: <6C2C79E72C305246B504CBA17B5500C9EBAFB0@mtlexch01.mtl.com>
	<1172068314.4380.390208.camel@hal.voltaire.com>
Message-ID: <1172068719.4380.390591.camel@hal.voltaire.com>

On Wed, 2007-02-21 at 09:31, Hal Rosenstock wrote:
> Tzachi,
> 
> On Wed, 2007-02-21 at 03:47, Tzachi Dar wrote:
> > OK, Hal let's try to close this.
> 
> Thanks.
> 
> > The windows openib project was agreed by everyone to be BSD only.
> > The fact that it is BSD means that any partner (or non partner) of the
> > Community can download the code and use it, the way he wants.
> > This includes:
> > 	1) Running the code as is.
> > 	2) Making changes to the code and contributing them back.
> > 	3) Making changes to the code and *NOT* giving them back to the
> > community.
> > 
> > Starting to depend on GPL (or LGPL) code means that the freedom of the
> > users to do (3) is broken.
> > Mellanox thinks that this needs a wider agreement of the open-IB
> > consortium, which we don't have.
> 
> The package in question is licensed with LGPL. I don't think that LGPL
> precludes usage #3 (although GPL precludes usage #3). See
> http://www.gnu.org/licenses/lgpl.html particularly #5 and #6.
> 
> > More than that, the ideas that were introduced here about sending users
> > to other places in order for
> > them to find the pthread implementation are also not that great as this
> > starts to make the life of our users harder.
> 
> Is this a major hurdle ? Is it substantially harder ?
> 
> > Also it is not clear who will give support once there are problems,
> 
> I would think it is from wherever they get OpenSM support. That support
> may need to interact with this project on some basis. Is this different
> from Linux (and pthreads) ? I agree that it is a change from the
> existing model.
> 
> > and who is responsible that the license of the library won't change.
> 
> I'm not sure how to answer this one but I don't think the license can
> just change. I guess if it did, we would need to deal with this when
> that occurred. Are you aware of some impending change here ?
> 
> > So, I hope this closes the subject of using LGPL software in open-IB.
> 
> I don't think we're there yet...
> 
> > By the way, what implementation of pthreads were you thinking of? I have
> > noticed that the first implementation that Google brings was only tested
> > on uni-processor system.
> > (http://sourceware.org/pthreads-win32/news.html). (this is really
> > amazing, I thought that these servers were out of the market a long time
> > ago).
> 
> I think that is old information, and this also supports 64-bit
> architectures.

I think I found what you were referring to:

http://sources.redhat.com/pthreads-win32/news.html

RELEASE 2.8.0
-------------
Testing and verification
------------------------
This release has not yet been tested on SMP architechtures. All tests
pass on a uni-processor system.

RELEASE 2.7.0
-------------
Testing and verification
------------------------
This release has been tested (passed the test suite) on both
uni-processor and multi-processor systems.

Release 2.8.0 is relatively new (2006-12-22).

> > To be more practical:
> > Can you give us a better view of what you are trying to achieve? In
> > other words, as far as I know 
> > Opensm is using complib apis to handle threads. The implementation of
> > these functions on windows is usually trivial.
> > Do you intend to make a re-write of opensm so that it will use pthreads
> > or do you intend to make a find/replace
> > and replace the complib functions with pthreads ones? If we are talking
> > about the second, then one can simply implement the pthread functions
> > using trivial win32 calls.
> > 
> > And another question: What is the functionality that you are currently
> > missing? Can this functionality be added?
> 
> There will be another posting addressing these questions.
> 
> -- Hal
> 
> > Thanks
> > Tzachi
> > 
> > By the way, rumors I have heard say that Voltaire doesn't always give
> > its code back to the community, but these are just rumors, right?
> 
> > > -----Original Message-----
> > > From: ofw-bounces at lists.openfabrics.org 
> > > [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Fab Tillier
> > > Sent: Tuesday, February 20, 2007 11:56 PM
> > > To: Hal Rosenstock
> > > Cc: ofw at lists.openfabrics.org; Gilad Shainer; OPENIB
> > > Subject: RE: [ofw] [Fwd: Re: [openib-general] [Fwd: Re: 
> > > winrelated[was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]]
> > > 
> > > -----Original Message-----
> > > From: ofw-bounces at lists.openfabrics.org
> > > [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Hal Rosenstock
> > > Sent: Tuesday, February 20, 2007 1:43 PM
> > > 
> > > On Tue, 2007-02-20 at 16:08, Fab Tillier wrote:
> > > > -----Original Message-----
> > > > From: Hal Rosenstock [mailto:halr at voltaire.com]
> > > > Sent: Tuesday, February 20, 2007 10:57 AM
> > > > 
> > > > On Tue, 2007-02-20 at 13:56, Fab Tillier wrote:
> > > > [ftillier] This isn't just an install issue - it's a build issue.
> > > > Anyone that wants to build OpenSM will need to find/download/install
> > > the
> > > > pthreads library so that the build will succeed.  If linking
> > > statically,
> > > > the resulting executable will not require any special installation.
> > > > It's only an install issue if you link dynamically to pthreads.
> > > 
> > > OK; then build and install. How big an issue is this ?
> > > 
> > > I thought DLLs were dynamically linked but I'm a Windows plebe. 
> > > 
> > > [ftillier] When you build, the linker needs the import 
> > > library for pthreads so that the functions get resolved as 
> > > being imported from the pthreads DLL.  The dependency on the 
> > > pthreads DLL is then created and the DLL will be loaded 
> > > dynamically, assuming it can be found in the path.
> > > 
> > > So for the build process, you need to have the pthreads 
> > > library available to the build tool (path to the lib).  This 
> > > requires installing the pthreads developer package or however 
> > > it's done.
> > > 
> > > If you statically link the pthreads lib, rather than 
> > > dynamically link, then all the pthreads goodies go directly 
> > > into the executable and you remove the dependency on an 
> > > external DLL.  The build process requirements are no 
> > > different than for the dynamically linked case.
> > > 
> > > There is also the possibility to remove the link-time 
> > > dependency by calling GetProcAddress to explicitly resolve 
> > > the pthreads entrypoints.
> > > This method still requires having the DLL loaded on the 
> > > user's systems.
> > > 
> > > Personally, I would rather see static linkage to the pthreads
> > > library so that only the builds are affected (something only 
> > > 'experts' will be doing), while not affecting the common user.
> > > 
> > > -Fab


From sashak at voltaire.com  Wed Feb 21 07:25:55 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Wed, 21 Feb 2007 17:25:55 +0200
Subject: [openib-general] [ofw] [Fwd: Re: [Fwd: Re:
 winrelated[was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]]
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C9EBAFB0@mtlexch01.mtl.com>
References: <6C2C79E72C305246B504CBA17B5500C9EBAFB0@mtlexch01.mtl.com>
Message-ID: <20070221152555.GK27414@sashak.voltaire.com>

On 10:47 Wed 21 Feb     , Tzachi Dar wrote:
> OK, Hal let's try to close this.
> 
> The windows openib project was agreed by everyone to be BSD only.
> The fact that it is BSD means that any partner (or non partner) of the
> Community can download the code and use it, the way he wants.
> This includes:
> 	1) Running the code as is.
> 	2) Making changes to the code and contributing them back.
> 	3) Making changes to the code and *NOT* giving them back to the
> community.
> 
> Starting to depend on GPL (or LGPL) code means that the freedom of the
> users to do (3) is broken.

Indeed it would be broken with GPL, but that is _not_ the case with LGPL.
Here is the relevant fragment of the LGPL:

5. A program that contains no derivative of any portion of the Library,
but is designed to work with the Library by being compiled or linked
with it, is called a "work that uses the Library". Such a work, in
isolation, is not a derivative work of the Library, and therefore falls
outside the scope of this License.

> By the way, what implementation of pthreads were you thinking of? I have
> noticed that the first implementation that Google brings was only tested
> on uni-processor system.
> (http://sourceware.org/pthreads-win32/news.html).

Release 2.7.0 of pthreads-w32 was tested on SMP too (as stated at
http://sourceware.org/pthreads-win32/news.html).

> 
> To be more practical:
> Can you give us a better view of what you are trying to achieve? In
> other words, as far as I know 
> Opensm is using complib apis to handle threads.

Right, and it has very limited and sometimes broken functionality.

> The implementation of
> these functions on windows is usually trivial.
> Do you intend to make a re-write of opensm so that it will use pthreads
> or do you intend to make a find/replace
> and replace the complib functions with pthreads ones? If we are talking
> about the second, then one can simply implement the pthread functions
> using trivial win32 calls.

I'm fine with this idea (additional functionality will be needed,
however). I would suppose that using a ready-to-use pthread library is
simpler, but it is up to you - I guess any working pthread implementation
should be fine for us. Hal?

> And another question: What is the functionality that you are currently
> missing?

Mainly condition variables (pthread_cond_wait(),
pthread_cond_timedwait()), proper thread cancellation primitives
(including thread cleanup), and probably some other things later.

> Can this functionality be added?

Probably, but AFAIK Windows doesn't have a pthread_cond_wait() equivalent,
so I don't know.
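
For reference, the pattern in question looks roughly like the sketch
below. It is only an illustration of a timed wait on a condition
variable using standard POSIX calls; the struct event type and the
event_*() helper names are hypothetical and are not taken from OpenSM
or complib.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical example type, not taken from OpenSM or complib. */
struct event {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int signaled;		/* predicate, protected by lock */
};

static void event_init(struct event *ev)
{
	pthread_mutex_init(&ev->lock, NULL);
	pthread_cond_init(&ev->cond, NULL);
	ev->signaled = 0;
}

static void event_signal(struct event *ev)
{
	pthread_mutex_lock(&ev->lock);
	ev->signaled = 1;
	pthread_cond_broadcast(&ev->cond);
	pthread_mutex_unlock(&ev->lock);
}

/*
 * Wait until the event is signaled or timeout_ms milliseconds elapse.
 * Returns 0 if signaled, ETIMEDOUT if the deadline passed first.
 */
static int event_wait(struct event *ev, long timeout_ms)
{
	struct timespec deadline;
	int rc = 0;

	/* pthread_cond_timedwait() takes an absolute deadline. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&ev->lock);
	/* Re-check the predicate in a loop: wakeups may be spurious. */
	while (!ev->signaled && rc == 0)
		rc = pthread_cond_timedwait(&ev->cond, &ev->lock, &deadline);
	if (ev->signaled)
		rc = 0;
	pthread_mutex_unlock(&ev->lock);
	return rc;
}
```

The point is that the mutex is released and the thread is put to sleep
atomically inside pthread_cond_timedwait(); that is the part a
replacement on the Windows side would have to provide.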


> 
> Thanks
> Tzachi
> 
> By the way, rumors I have heard say that Voltaire doesn't always give
> its code back to the community, but these are just rumors, right?

Hey Tzachi, I will not spend time investigating "rumors you have
heard". If you have something to say, just say it; I don't think
anybody should have to deal with "rumors".

Sasha


From jsquyres at cisco.com  Wed Feb 21 07:28:15 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Wed, 21 Feb 2007 10:28:15 -0500
Subject: [openib-general] Address List Change for Friday, 2/23/2007
In-Reply-To: <20070221142117.GD13024@mellanox.co.il>
References: <3D84A59A1AD3584DA02AEAD240E8863F0366947F@ES22SNLNT.srn.sandia.gov>
	<00D1093C-4EE4-4580-A57F-2B302E694C06@cisco.com>
	<20070221142117.GD13024@mellanox.co.il>
Message-ID: <924EA79E-8FE2-49A5-85AB-84B7749D535C@cisco.com>

Can you look at the other lists that have migrated for examples?   
(e.g., ewg)

It may be complex to send an actual example message *before* the list  
moves.


On Feb 21, 2007, at 9:21 AM, Michael S. Tsirkin wrote:

> Could an example message please be sent *today* to the new list,
> so that client rules can be updated?
>
> I can't access my inbox on Friday or Saturday, and this change
> will cause problems and message loss for me unless I can prepare
> beforehand.
>
>
> Quoting Jeff Squyres <jsquyres at cisco.com>:
> Subject: Fwd: Address List Change for Friday, 2/23/2007
>
> FYI.  In case you missed it the first time: THIS LIST IS CHANGING ON
> FRIDAY 2/23/2007 (2 days from now).  Please update your addressbooks!
>
> See the notice below for the details.
>
>
>
> Begin forwarded message:
>
>> From: "Lee, Michael Paichi" <mplee at sandia.gov>
>> Date: February 19, 2007 10:43:23 AM EST
>> To: openib-general at openib.org
>> Subject: [openib-general] Address List Change for Friday, 2/23/2007
>>
>> We're in the process of migrating the maillists from the old
>> openib.org server to the new lists.openfabrics.org machine.  The
>> list openib-general will be moved this Friday, February 23, 2007.
>> The new address for the maillist will be
>> general at lists.openfabrics.org.
>>
>> What this means is that messages will come from
>> general at lists.openfabrics.org.  Conversely, replies should be made
>> to this address as well.  Messages will also have a new subject
>> line prefix of [OFA General].  If you have configured your e-mail
>> client to filter based on maillist address or subject headers, you
>> may need to make some adjustments for filtering.
>>
>> However, for the sake of transition, messages sent to the previous
>> maillist address on the old server will forward to the new server.
>> This forward will remain in place until the old server is taken
>> offline and final DNS changes are made.  We expect the old server
>> to go offline sometime in early March.
>>
>> The web archives will also be migrated to the new web address
>> shortly, http://lists.openfabrics.org.
>>
>> If you have any questions, please don't hesitate to contact me at
>> mplee at sandia.gov.
>>
>> Regards,
>> Michael Lee
>>
>> _______________________________________________
>> openib-general mailing list
>> openib-general at openib.org
>> http://openib.org/mailman/listinfo/openib-general
>>
>> To unsubscribe, please visit http://openib.org/mailman/listinfo/
>> openib-general
>
>
> -- 
> Jeff Squyres
> Server Virtualization Business Unit
> Cisco Systems
>
>
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/ 
> openib-general
>
> -- 
> MST


-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


From mst at mellanox.co.il  Wed Feb 21 07:34:02 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 21 Feb 2007 17:34:02 +0200
Subject: [openib-general] Address List Change for Friday, 2/23/2007
In-Reply-To: <924EA79E-8FE2-49A5-85AB-84B7749D535C@cisco.com>
References: <924EA79E-8FE2-49A5-85AB-84B7749D535C@cisco.com>
Message-ID: <20070221153402.GB17761@mellanox.co.il>

> Quoting Jeff Squyres <jsquyres at cisco.com>:
> Subject: Re: Address List Change for Friday, 2/23/2007
> 
> Can you look at the other lists that have migrated for examples?   
> (e.g., ewg)

If I look at other lists, there's no guarantee the rule will catch
the actual message.

> 
> It may be complex to send an actual example message *before* the list  
> moves.

In this case, maybe the migration can be done in the middle of the week?

-- 
MST


From jsquyres at cisco.com  Wed Feb 21 08:08:59 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Wed, 21 Feb 2007 11:08:59 -0500
Subject: [openib-general] Address List Change for Friday, 2/23/2007
In-Reply-To: <20070221153402.GB17761@mellanox.co.il>
References: <924EA79E-8FE2-49A5-85AB-84B7749D535C@cisco.com>
	<20070221153402.GB17761@mellanox.co.il>
Message-ID: <F65CFD32-1CD6-49AD-A91D-38CBED4236C6@cisco.com>

On Feb 21, 2007, at 10:34 AM, Michael S. Tsirkin wrote:

>> Can you look at the other lists that have migrated for examples?
>> (e.g., ewg)
>
> If I look at other lists, there's no guarantee the rule will catch
> the actual message.

Can't you just paste in the new address of the list in your existing  
rules?  I must be missing something.

>> It may be complex to send an actual example message *before* the list
>> moves.
>
> In this case, maybe the migration can be done in the middle of the  
> week?

I'll let Michael Lee answer; we're currently driving off his goodwill  
and his schedule.

I guess I didn't see why this was complex -- if a few mails get  
misplaced over the weekend because cutting-n-pasting the new e-mail  
address into existing rules somehow didn't work, is there a huge  
problem?
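
A minimal sketch of the kind of client rule being discussed, assuming the
only things that change are the list address and the subject prefix given
in the migration notice. The function name is_list_mail and the choice of
headers to inspect (To, Cc) are illustrative assumptions, not something the
new server is known to guarantee. Addresses are written with "@" here; the
archive shows them as "at".

```python
from email.message import EmailMessage

# Old and new identifiers, both taken from the migration notice.
LIST_ADDRESSES = (
    "openib-general@openib.org",
    "general@lists.openfabrics.org",
)
SUBJECT_PREFIXES = ("[openib-general]", "[OFA General]")

# Headers a client rule might inspect (an assumption for this sketch).
ADDRESS_HEADERS = ("To", "Cc")


def is_list_mail(msg):
    """Return True if msg looks like traffic from the old or the new list."""
    for header in ADDRESS_HEADERS:
        for value in msg.get_all(header, []):
            lowered = str(value).lower()
            if any(addr in lowered for addr in LIST_ADDRESSES):
                return True
    # Fall back to the subject prefix, which also survives "Re:" replies.
    subject = str(msg.get("Subject", "")).lower()
    return any(prefix.lower() in subject for prefix in SUBJECT_PREFIXES)
```

Matching on either the old or the new value means the rule keeps working
while the old address is still being forwarded.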

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


From sashak at voltaire.com  Wed Feb 21 08:43:19 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Wed, 21 Feb 2007 18:43:19 +0200
Subject: [openib-general] [ofw] [Fwd: Re: [Fwd: Re:
 winrelated[was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]]
In-Reply-To: <1172068314.4380.390208.camel@hal.voltaire.com>
References: <6C2C79E72C305246B504CBA17B5500C9EBAFB0@mtlexch01.mtl.com>
	<1172068314.4380.390208.camel@hal.voltaire.com>
Message-ID: <20070221164319.GO27414@sashak.voltaire.com>

On 09:31 Wed 21 Feb     , Hal Rosenstock wrote:
> 
> > and who is responsible that the license of the library won't change.
> 
> I'm not sure how to answer this one but I don't think the license can
> just change.

A license change does not work "backward", only "forward". So if
some version was released under the LGPL, that version will still be
usable under the LGPL.

Sasha


From tzachid at mellanox.co.il  Wed Feb 21 08:56:47 2007
From: tzachid at mellanox.co.il (Tzachi Dar)
Date: Wed, 21 Feb 2007 18:56:47 +0200
Subject: [openib-general] [ofw] [Fwd: Re: [Fwd:
 Re:winrelated[was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]]
Message-ID: <6C2C79E72C305246B504CBA17B5500C9EBB1F2@mtlexch01.mtl.com>

What you are saying is true, but there is a problem with that:
if the community decides on a different license, then probably most
people will move to it. The rest of the people will stay with old
software that is not supported at all.

There are two examples that one can give here:
1) Think of people who started to write code under the GPL V1. Can
they still find support for that today? Are there still projects being
developed? The version was changed and everyone had to accept it.
2) The second example (from recent times) is of course Novell. (I must
say here that (1) I'm not a lawyer, (2) I'm not an expert on the case,
and (3) I really don't want to start an argument about Novell.) Novell
was using Linux code under the GPL. It did things that were not in the
spirit of the GPL but probably didn't break it. Now there is a movement
to change the GPL so that Novell will not be able to use it any more. I
really don't know who is right or who is wrong here, but if we can avoid
being in that place, that is better.

To be on the practical side, I have read the introduction to pthreads in
the past and from what I saw it was relatively easy to implement that on
Win32. I want to look at the functions that were mentioned before in
this thread and see if that is still the case.

Let me get back to you on this at the beginning of next week.

Thanks
Tzachi

> -----Original Message-----
> From: ofw-bounces at lists.openfabrics.org 
> [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Sasha 
> Khapyorsky
> Sent: Wednesday, February 21, 2007 6:43 PM
> To: Hal Rosenstock
> Cc: ofw at lists.openfabrics.org; Gilad Shainer; OPENIB; Fab Tillier
> Subject: Re: [openib-general] [ofw] [Fwd: Re: [Fwd: 
> Re:winrelated[was:Re:[PATCH 1/2] opensm: sigusr1: syslog() fixes]]]
> 
> On 09:31 Wed 21 Feb     , Hal Rosenstock wrote:
> > 
> > > and who is responsible that the license of the library 
> won't change.
> > 
> > I'm not sure how to answer this one but I don't think the 
> license can 
> > just change.
> 
> The license changing will not work "backward", only 
> "forward". So if some version was released under LGPL this 
> version still be usable under LGPL.
> 
> Sasha
> _______________________________________________
> ofw mailing list
> ofw at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
> 


From or.gerlitz at gmail.com  Wed Feb 21 09:05:50 2007
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Wed, 21 Feb 2007 19:05:50 +0200
Subject: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small
 message bandwidth
In-Reply-To: <20070221132159.GC7711@mellanox.co.il>
References: <45DC40A9.507@voltaire.com> <20070221132159.GC7711@mellanox.co.il>
Message-ID: <15ddcffd0702210905v4bddbd06n656679c4985d0bf2@mail.gmail.com>

On 2/21/07, Michael S. Tsirkin <mst at mellanox.co.il> wrote:
>> Quoting r. Or Gerlitz <ogerlitz at voltaire.com>:

>> I understand that in ipoib_cm_alloc_rx_skb you call dma_map_page on
>> IPOIB_CM_RX_SG pages where here you call dma_unmap_single only $frags
>> times, correct?

> No.

OK, let's keep this simple: does the number of calls to dma_map_xxx in
the ipoib cm post recv flow equal the number of calls to dma_unmap_xxx
in the ipoib cm recv completion handling?

Or.


From msalisbury at interactivesupercomputing.com  Wed Feb 21 09:08:05 2007
From: msalisbury at interactivesupercomputing.com (Mark Salisbury)
Date: Wed, 21 Feb 2007 12:08:05 -0500
Subject: [openib-general] <NOOB> initial setup problems
Message-ID: <200702211208.05613.msalisbury@interactivesupercomputing.com>

I am trying to set up ofed-1.1 on Mellanox HW using Intel MPI.

When I try to run an MPI hello world equivalent, I get most of the way
through startup and then it bombs out.

I am unable to find any info about "unexpected DAPL event 4008".

here is the output of an example run:
running mpdallexit on raki1
LAUNCHED mpd on raki1  via
RUNNING: mpd on raki1
LAUNCHED mpd on raki2  via  raki1
LAUNCHED mpd on raki4  via  raki1
RUNNING: mpd on raki4
RUNNING: mpd on raki2
I_MPI: [0] check_one_device(): attributes for device:
I_MPI: [0] check_one_device(): NEEDS_LDAT                                MAYBE
I_MPI: [0] check_one_device(): HAS_COLLECTIVES                           (null)
I_MPI: [0] check_one_device(): I_MPI_LIBRARY_VERSION                     3.0
I_MPI: [0] check_one_device(): I_MPI_VERSION_DATE_OF_BUILD               Fri Sep 15 14:32:24 MSD 2006
I_MPI: [0] check_one_device(): I_MPI_VERSION_PKGNAME_UNTARRED            mpi_src.32.svsmpi004.20060915
I_MPI: [0] check_one_device(): I_MPI_VERSION_MY_CMD_NAME_CVS_ID          ./BUILD_MPI.sh version: BUILD_MPI.sh,v 1.62 2006/09/15 08:43:15 Exp $
I_MPI: [0] check_one_device(): I_MPI_VERSION_MY_CMD_LINE                 ./BUILD_MPI.sh -pkg_name mpi_src.32.svsmpi004.20060915.tar.gz -explode -explode_dirname mpi2.32e.svsmpi020.20060915 -all -copyout -noinstall
I_MPI: [0] check_one_device(): I_MPI_VERSION_MACHINENAME                 svsmpi020
I_MPI: [0] check_one_device(): I_MPI_DEVICE_VERSION                      3.0.20060915
I_MPI: [0] check_one_device(): I_MPI_GCC_VERSION                         3.4.4 20050721 (Red Hat 3.4.4-2)
I_MPI: [0] set_up_devices(): I_MPI_DAPL_PROVIDER    = NULL
I_MPI: [0] set_up_devices(): I_MPI_DAPL_HOST_SUFFIX = NULL
I_MPI: [0] set_up_devices(): I_MPI_DAPL_HOST        = NULL
I_MPI: [0] set_up_devices(): I_MPI_DAPL_IP_ADDR     = NULL
I_MPI: [0] set_up_devices(): I_MPI_DAPL_PORT        = NULL
I_MPI: [0] check_one_device(): attributes for device:
I_MPI: [0] check_one_device(): NEEDS_LDAT                                MAYBE
I_MPI: [0] check_one_device(): HAS_COLLECTIVES                           (null)
I_MPI: [0] check_one_device(): I_MPI_LIBRARY_VERSION                     3.0
I_MPI: [0] check_one_device(): I_MPI_VERSION_DATE_OF_BUILD               Fri Sep 15 14:32:24 MSD 2006
I_MPI: [0] check_one_device(): I_MPI_VERSION_PKGNAME_UNTARRED            mpi_src.32.svsmpi004.20060915
I_MPI: [0] check_one_device(): I_MPI_VERSION_MY_CMD_NAME_CVS_ID          ./BUILD_MPI.sh version: BUILD_MPI.sh,v 1.62 2006/09/15 08:43:15 Exp $
I_MPI: [0] check_one_device(): I_MPI_VERSION_MY_CMD_LINE                 ./BUILD_MPI.sh -pkg_name mpi_src.32.svsmpi004.20060915.tar.gz -explode -explode_dirname mpi2.32e.svsmpi020.20060915 -all -copyout -noinstall
I_MPI: [0] check_one_device(): I_MPI_VERSION_MACHINENAME                 svsmpi020
I_MPI: [0] check_one_device(): I_MPI_DEVICE_VERSION                      3.0.20060915
I_MPI: [0] check_one_device(): I_MPI_GCC_VERSION                         3.4.4 20050721 (Red Hat 3.4.4-2)
I_MPI: [0] set_up_devices(): I_MPI_DAPL_PROVIDER    = NULL
I_MPI: [0] set_up_devices(): I_MPI_DAPL_HOST_SUFFIX = NULL
I_MPI: [0] set_up_devices(): I_MPI_DAPL_HOST        = NULL
I_MPI: [0] set_up_devices(): I_MPI_DAPL_IP_ADDR     = NULL
I_MPI: [0] set_up_devices(): I_MPI_DAPL_PORT        = NULL
I_MPI: [0] check_one_device(): attributes for device:
I_MPI: [0] check_one_device(): NEEDS_LDAT                                MAYBE
I_MPI: [0] check_one_device(): HAS_COLLECTIVES                           (null)
I_MPI: [0] check_one_device(): I_MPI_LIBRARY_VERSION                     3.0
I_MPI: [0] check_one_device(): I_MPI_VERSION_DATE_OF_BUILD               Fri Sep 15 14:32:24 MSD 2006
I_MPI: [0] check_one_device(): I_MPI_VERSION_PKGNAME_UNTARRED            mpi_src.32.svsmpi004.20060915
I_MPI: [0] check_one_device(): I_MPI_VERSION_MY_CMD_NAME_CVS_ID          ./BUILD_MPI.sh version: BUILD_MPI.sh,v 1.62 2006/09/15 08:43:15 Exp $
I_MPI: [0] check_one_device(): I_MPI_VERSION_MY_CMD_LINE                 ./BUILD_MPI.sh -pkg_name mpi_src.32.svsmpi004.20060915.tar.gz -explode -explode_dirname mpi2.32e.svsmpi020.20060915 -all -copyout -noinstall
I_MPI: [0] check_one_device(): I_MPI_VERSION_MACHINENAME                 svsmpi020
I_MPI: [0] check_one_device(): I_MPI_DEVICE_VERSION                      3.0.20060915
I_MPI: [0] check_one_device(): I_MPI_GCC_VERSION                         3.4.4 20050721 (Red Hat 3.4.4-2)
I_MPI: [0] set_up_devices(): I_MPI_DAPL_PROVIDER    = NULL
I_MPI: [0] set_up_devices(): I_MPI_DAPL_HOST_SUFFIX = NULL
I_MPI: [0] set_up_devices(): I_MPI_DAPL_HOST        = NULL
I_MPI: [0] set_up_devices(): I_MPI_DAPL_IP_ADDR     = NULL
I_MPI: [0] set_up_devices(): I_MPI_DAPL_PORT        = NULL
I_MPI: [0] I_MPI_dlopen_dat(): trying to dlopen default -ldat: libdat.so
I_MPI: [2] I_MPI_dlopen_dat(): I_MPI: [0] my_dlopen(): trying to dlopen: libdat.sotrying to dlopen default -ldat: libdat.so
I_MPI: [2] my_dlopen(): trying to dlopen: libdat.so

I_MPI: [1] I_MPI_dlopen_dat(): trying to dlopen default -ldat: libdat.so
I_MPI: [1] my_dlopen(): trying to dlopen: libdat.so
I_MPI: [0] MPIDI_CH3I_RDMA_init(): will use DAPL provider from registry: OpenIB-cma
I_MPI: [1] MPIDI_CH3I_RDMA_init(): I_MPI: [2] MPIDI_CH3I_RDMA_init(): will use DAPL provider from registry: OpenIB-cma
will use DAPL provider from registry: OpenIB-cma
I_MPI: [0] MPIDI_CH3_Init(): will use rdma configuration
I_MPI: [0] MPI_Init: The process (pid=17898) started on raki1
Greetings from process 17898(0)
I_MPI: [1] MPIDI_CH3_Init(): will use rdma configuration
I_MPI: [1] MPI_Init: The process (pid=16216) started on raki2
I_MPI: [2] MPIDI_CH3_Init(): will use rdma configuration
I_MPI: [2] MPI_Init: The process (pid=16330) started on raki4
[2:raki4] unexpected DAPL event 4008 from 0:raki1
[1:raki2] unexpected DAPL event 4008 from 0:raki1
rank 2 in job 1  raki1_37392   caused collective abort of all ranks
  exit status of rank 2: return code 254
rank 1 in job 1  raki1_37392   caused collective abort of all ranks
  exit status of rank 1: return code 254


From sean.hefty at intel.com  Wed Feb 21 09:09:35 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Wed, 21 Feb 2007 09:09:35 -0800
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to
 work with partial membership pkey
In-Reply-To: <45DBEA1F.5090901@voltaire.com>
Message-ID: <000101c755db$0f9ec290$8698070a@amr.corp.intel.com>

>However, no matter what the SM configures, the core & ipoib code act as
>if the full pkey is there. This is a nice simplification and it works well.

Is the problem here really in the librdmacm or in the core/ipoib software?

(I looked at the patch, but haven't looked into the full reason why it's
needed.)

- Sean


From mst at mellanox.co.il  Wed Feb 21 09:14:20 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 21 Feb 2007 19:14:20 +0200
Subject: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small
 message bandwidth
In-Reply-To: <15ddcffd0702210905v4bddbd06n656679c4985d0bf2@mail.gmail.com>
References: <45DC40A9.507@voltaire.com> <20070221132159.GC7711@mellanox.co.il>
	<15ddcffd0702210905v4bddbd06n656679c4985d0bf2@mail.gmail.com>
Message-ID: <20070221171420.GB22672@mellanox.co.il>

> Quoting r. Or Gerlitz <or.gerlitz at gmail.com>:
> Subject: Re: [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth
> 
> On 2/21/07, Michael S. Tsirkin <mst at mellanox.co.il> wrote:
> >> Quoting r. Or Gerlitz <ogerlitz at voltaire.com>:
> 
> >> I understand that in ipoib_cm_alloc_rx_skb you call dma_map_page on
> >> IPOIB_CM_RX_SG pages where here you call dma_unmap_single only $frags
> >> times, correct?
> 
> > No.
> 
> OK, let's keep this simple: does the number of calls to dma_map_xxx in
> the ipoib cm post recv flow equal the number of calls to dma_unmap_xxx
> in the ipoib cm recv completion handling?

AFAIK yes.

-- 
MST


From krause at cup.hp.com  Wed Feb 21 09:25:11 2007
From: krause at cup.hp.com (Michael Krause)
Date: Wed, 21 Feb 2007 09:25:11 -0800
Subject: [openib-general] Immediate data question
In-Reply-To: <309a667c0702202121p52747748ic891a9d21a02e3d7@mail.gmail.com>
References: <6C2C79E72C305246B504CBA17B5500C905DC04@mtlexch01.mtl.com>
	<349DCDA352EACF42A0C49FA6DCEA84035DFAFF@G3W0634.americas.hpqcorp.net>
	<adahctxeds8.fsf@cisco.com>
	<6.2.0.14.2.20070208131610.085c52f8@esmail.cup.hp.com>
	<349DCDA352EACF42A0C49FA6DCEA840362A87E@G3W0634.americas.hpqcorp.net>
	<309a667c0702112110h7a79961fv9c3cf46d4392e1d4@mail.gmail.com>
	<309a667c0702130537u35745e98y429d3d564fb093e9@mail.gmail.com>
	<6.2.0.14.2.20070213125130.07f4dbf8@esmail.cup.hp.com>
	<309a667c0702142137p724172f5va93a0ef046a60483@mail.gmail.com>
	<6.2.0.14.2.20070215071309.0979bed8@esmail.cup.hp.com>
	<309a667c0702202121p52747748ic891a9d21a02e3d7@mail.gmail.com>
Message-ID: <6.2.0.14.2.20070221092429.02cc6380@esmail.cup.hp.com>

At 09:21 PM 2/20/2007, Devesh Sharma wrote:
>On 2/15/07, Michael Krause <krause at cup.hp.com> wrote:
>>At 09:37 PM 2/14/2007, Devesh Sharma wrote:
>> >On 2/14/07, Michael Krause <krause at cup.hp.com> wrote:
>> >>At 05:37 AM 2/13/2007, Devesh Sharma wrote:
>> >> >On 2/12/07, Devesh Sharma <devesh28 at gmail.com> wrote:
>> >> >>On 2/10/07, Tang, Changqing <changquing.tang at hp.com> wrote:
>> >> >> > > >
>> >> >> > > >Not for the receiver, but the sender will be severely slowed down by
>> >> >> > > >having to wait for the RNR timeouts.
>> >> >> > >
>> >> >> > > RNR = Receiver Not Ready so by definition, the data flow
>> >> >> > > isn't going to
>> >> >> > > progress until the receiver is ready to receive data.   If a
>> >> >> > > receive QP
>> >> >> > > enters RNR for a RC, then it is likely not progressing as
>> >> >> > > desired.   RNR
>> >> >> > > was initially put in place to enable a receiver to create
>> >> >> > > back pressure to the sender without causing a fatal error
>> >> >> > > condition.  It should rarely be entered and therefore should
>> >> >> > > have negligible impact on overall performance however when a
>> >> >> > > RNR occurs, no forward progress will occur so performance is
>> >> >> > > essentially zero.
>> >> >> >
>> >> >> > Mike:
>> >> >> >         I still do not quite understand this issue. I have two
>> >> >> > situations that have RNR triggered.
>> >> >> >
>> >> >> > 1. process A and process B is connected with QP. A first post a send to
>> >> >> > B, B does not post receive. Then A and B are doing a long time
>> >> >> > RDMA_WRITE each other, A and B just check memory for the RDMA_WRITE
>> >> >> > message. Finally B will post a receive. Does the first pending send in A
>> >> >> > block all the later RDMA_WRITE ?
>> >> >>According to IBTA spec HCA will process WR entries in strict order in
>> >> >>which they are posted so the send will block all WR posted after this
>> >> >>send, Until-unless HCA has multiple processing elements, I think even
>> >> >>then processing order will be maintained by HCA
>> >> >>  If not, since RNR is triggered
>> >> >> > periodically till B post receive, does it affect the RDMA_WRITE
>> >> >> > performance between A and B ?
>> >> >> >
>> >> >> > 2. extend above to three processes, A connect to B, B connect to C, so B
>> >> >> > has two QPs, but one CQ. A posts a send to B, B does not post receive,
>> >> >post ordering accross QP is not guaranteed hence presence of same CQ
>> >> >or different CQ will not affect any thing.
>> >> >> > rather B and C are doing a long time RDMA_WRITE,or send/recv. But B
>> >> >If RDMA WRITE _on_ B, no effect on performance. If RDMA WRITE _on_ C,
>> >I am sorry I have missed that in both cases same DMA channel is in use.
>> >> >_may_ affect the performance, since load is on same HCA. In case of
>> >> >Send/Recv again _may_ affect the performance, with the same reason.
>> >>
>> >>Seems orthogonal.  Any time h/w is shared, multiple flows will have an
>> >>impact on one another.  That is why we have the different arbitration
>> >>mechanisms to enable one to control that impact.
>> >Please, can you explain it more clearly?
>>
>>Most I/O devices are shared by multiple applications / kernel
>>subsystems.   Hence, the device acts as a serialization point for what goes
>>on the wire / link.   Sharing = resource contention and in order to add any
>>structure to that contention, a number of technologies provide arbitration
>>options.   In the case of IB, the arbitration is confined to VL arbitration
>>where a given data flow is assigned to a VL and that VL is serviced at some
>>particular rate.   A number of years ago I wrote up how one might also
>>provide QP arbitration (not part of the IBTA specifications) and I
>>understand some implementations have incorporated that or a variation of
>>the mechanisms into their products.
>Thanks mike for a nice explanation. I am sorry for the late reply,
>Now I got it, here Chang is trying to find out performance hit due to
>RNR NAK, performance hit due to device sharing is any how going to be
>there so "load on same HCA" is not the proper explanation.
>Am I correct now?

Yes.   You need to separate RNR NAK performance impacts as distinct from 
the multiple application sharing impacts.
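
The distinction can be illustrated with a toy model. This is only a sketch
under stated assumptions: it models the in-order processing of work
requests on one QP as described earlier in the thread, treats an RNR NAK as
"the head SEND does not complete on this tick", and deliberately ignores
RNR timers, retry counts and any contention for shared HCA or link
resources. ToyQP and run are hypothetical names, not verbs API calls.

```python
from collections import deque


class ToyQP:
    """Toy send queue: work requests complete strictly in posting order."""

    def __init__(self, name):
        self.name = name
        self.send_queue = deque()
        self.completed = []

    def post(self, opcode):
        self.send_queue.append(opcode)

    def step(self, receiver_has_recv_posted):
        """Try to complete the head work request; return True on progress."""
        if not self.send_queue:
            return False
        head = self.send_queue[0]
        # A SEND needs a receive buffer at the peer. Without one the peer
        # answers RNR NAK, and everything queued behind the SEND waits.
        if head == "SEND" and not receiver_has_recv_posted:
            return False
        self.completed.append(self.send_queue.popleft())
        return True


def run(steps, recv_posted_at):
    """Return per-tick (completed on A->B, completed on B->C) counts."""
    a_to_b = ToyQP("A->B")
    b_to_c = ToyQP("B->C")
    a_to_b.post("SEND")
    for _ in range(3):
        a_to_b.post("RDMA_WRITE")
        b_to_c.post("RDMA_WRITE")
    history = []
    for tick in range(steps):
        a_to_b.step(receiver_has_recv_posted=tick >= recv_posted_at)
        b_to_c.step(receiver_has_recv_posted=True)
        history.append((len(a_to_b.completed), len(b_to_c.completed)))
    return history
```

With run(6, 3), the A->B queue completes nothing until B posts its receive
at tick 3, while the B->C queue drains independently. Any slowdown of B->C
in a real system would come from shared hardware, which this model leaves
out on purpose.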

Mike


>>In addition to IB link contention, there is also PCI link / bus
>>contention.   For PCIe, given most designs did not want to waste resources
>>on multiple VC, there really isn't any standard arbitration
>>mechanism.   However, many devices, especially a device like a HCA or a
>>RNIC, already have the concept of separate resource domains, e.g. QP, and
>>they provide a mechanism to associate how the QP's DMA requests or
>>interrupts requests are scheduled to the PCIe link.
>>
>>
>> >> >> > must sends RNR periodically to A, right?. So does the pending message
>> >> >> > from A affects B's overall performance  between B and C ?
>> >> >But RNR NAK is not for very long time.....possibly this performance
>> >> >hit you will not be able to observe even. The moment rnr_counter
>> >> >expires connection will be broken!
>> >>
>> >>Keep in mind the timeout can be infinite.  RNR NAK are not expected to be
>> >>frequent so their performance impact was considered reasonable.
>> >Thanks I missed that.
>>
>>It is a subtlety within the specification that is easy to miss.
>>
>>Mike
>>
>>


From vlad at dev.mellanox.co.il  Wed Feb 21 09:56:45 2007
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Wed, 21 Feb 2007 19:56:45 +0200
Subject: [openib-general] OFED-1.2-20070221-1741.tgz package is available
Message-ID: <1172080605.5256.35.camel@vladsk-laptop>

New OFED build is available:

http://www.openfabrics.org/builds/ofed-1.2/OFED-1.2-20070221-1741.tgz


Bugzilla is updated with fixed issues:
https://bugs.openfabrics.org/


-- 
Vladimir Sokolovsky <vlad at dev.mellanox.co.il>
Mellanox Technologies Ltd.


From swise at opengridcomputing.com  Wed Feb 21 10:02:28 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Wed, 21 Feb 2007 12:02:28 -0600
Subject: [openib-general] OFED-1.2-20070221-1741.tgz package is available
In-Reply-To: <1172080605.5256.35.camel@vladsk-laptop>
References: <1172080605.5256.35.camel@vladsk-laptop>
Message-ID: <1172080948.27101.15.camel@stevo-desktop>

Hey Vlad,

What about bugs: 355 and 357?


On Wed, 2007-02-21 at 19:56 +0200, Vladimir Sokolovsky wrote:
> New OFED build is available:
> 
> http://www.openfabrics.org/builds/ofed-1.2/OFED-1.2-20070221-1741.tgz
> 
> 
> Bugzilla is updated with fixed issues:
> https://bugs.openfabrics.org/
> 
> 


From mst at mellanox.co.il  Wed Feb 21 10:14:43 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 21 Feb 2007 20:14:43 +0200
Subject: [openib-general] [ewg] Re: OFED-1.2-20070221-1741.tgz package
 is available
In-Reply-To: <1172080948.27101.15.camel@stevo-desktop>
References: <1172080605.5256.35.camel@vladsk-laptop>
	<1172080948.27101.15.camel@stevo-desktop>
Message-ID: <20070221181443.GB27239@mellanox.co.il>

Steve, can't you post a patch for 357?

Quoting Steve Wise <swise at opengridcomputing.com>:
Subject: [ewg] Re: [openib-general] OFED-1.2-20070221-1741.tgz package is available

Hey Vlad,

What about bugs: 355 and 357?


On Wed, 2007-02-21 at 19:56 +0200, Vladimir Sokolovsky wrote:
> New OFED build is available:
> 
> http://www.openfabrics.org/builds/ofed-1.2/OFED-1.2-20070221-1741.tgz
> 
> 
> Bugzilla is updated with fixed issues:
> https://bugs.openfabrics.org/
> 
> 


_______________________________________________
ewg mailing list
ewg at lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg

-- 
MST


From vlad at dev.mellanox.co.il  Wed Feb 21 10:16:01 2007
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Wed, 21 Feb 2007 20:16:01 +0200
Subject: [openib-general] OFED-1.2-20070221-1741.tgz package is available
In-Reply-To: <1172080948.27101.15.camel@stevo-desktop>
References: <1172080605.5256.35.camel@vladsk-laptop>
	<1172080948.27101.15.camel@stevo-desktop>
Message-ID: <1172081761.5256.45.camel@vladsk-laptop>

On Wed, 2007-02-21 at 12:02 -0600, Steve Wise wrote:
> Hey Vlad,
> 
> What about bugs: 355 and 357?
> 
> 
Bug: 355 (problems building modules that depend on OFED 1.2 modules)

In order to build kernel modules depending on OFED's modules you need to
take Modules.symvers file from <prefix>/src/openib/Modules.symvers (part
of kernel-ib-devel RPM) and copy this to modules subdir and then compile
your module.

Currently I see that <prefix>/src/openib/Modules.symvers is empty. I
will check this issue.

For now you can use the attached script to create Modules.symvers file.
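
The copy step above can be sketched as follows. This is not the attached
script; it is a hypothetical helper (stage_symvers is an invented name)
that copies <prefix>/src/openib/Modules.symvers into the directory of the
module being built and refuses to continue when the file is missing or
empty, which is the symptom noted above. The install prefix is whatever was
chosen at OFED install time.

```python
import shutil
from pathlib import Path


def stage_symvers(prefix, module_dir):
    """Copy <prefix>/src/openib/Modules.symvers into module_dir.

    Raises FileNotFoundError if the file is missing and ValueError if it
    is empty, since an empty file cannot resolve OFED symbols.
    """
    source = Path(prefix) / "src" / "openib" / "Modules.symvers"
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found")
    if source.stat().st_size == 0:
        raise ValueError(f"{source} is empty; regenerate it before building")
    target = Path(module_dir) / "Modules.symvers"
    shutil.copyfile(source, target)
    return target
```

After the copy, the external module is compiled in module_dir as usual.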

Bug: 357 (cxgb3 can't be selected on sles9sp3)

cxgb3 driver compilation failed on sles9sp3 in previous OFED build.
Then it was disabled in build_env.sh script in order to prevent OFED
installation failure.
Did you fix this compilation issue?


-- 
Vladimir Sokolovsky <vlad at dev.mellanox.co.il>
Mellanox Technologies Ltd.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: create_Module.symvers.sh
Type: application/x-shellscript
Size: 1031 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070221/e8d4641d/attachment.bin>

From swise at opengridcomputing.com  Wed Feb 21 10:34:27 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Wed, 21 Feb 2007 12:34:27 -0600
Subject: [openib-general] [ewg] Re: OFED-1.2-20070221-1741.tgz package
 is available
In-Reply-To: <20070221181443.GB27239@mellanox.co.il>
References: <1172080605.5256.35.camel@vladsk-laptop>
	<1172080948.27101.15.camel@stevo-desktop>
	<20070221181443.GB27239@mellanox.co.il>
Message-ID: <1172082867.27101.25.camel@stevo-desktop>

On Wed, 2007-02-21 at 20:14 +0200, Michael S. Tsirkin wrote:
> Steve, can't you post a patch for 357?
> 

I could, but I'm not sure what is _not_ supported for SLES9SP3.  The
script currently only allows mthca, sdp, and ipoib. cxgb3 should be
allowed.  But probably most other drivers too...

I can provide a patch to allow cxgb3 if that's what you want...


Steve.


From robert.j.woodruff at intel.com  Wed Feb 21 10:49:45 2007
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Wed, 21 Feb 2007 10:49:45 -0800
Subject: [openib-general] Git on hosting.openfabrics.org server seems broken
Message-ID: <BAE9DCEF64577A439B3A37F36F9B691C01C344B0@orsmsx418.amr.corp.intel.com>


It appears that when I clone anyone's git tree
locally on the hosting.openfabrics.org server,
it only clones the master branch and I get none of the other branches.
The only difference between what I do now and what I did before
is that the git version on the server is now 1.5.0; before it was
git version 1.4.4.3.

Has anyone tried to do a clone of a git tree on the
server lately? Can you see the git branches of the cloned
tree with git-branch?

woody


From sashak at voltaire.com  Wed Feb 21 11:09:22 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Wed, 21 Feb 2007 21:09:22 +0200
Subject: [openib-general] Git on hosting.openfabrics.org server seems
	broken
In-Reply-To: <BAE9DCEF64577A439B3A37F36F9B691C01C344B0@orsmsx418.amr.corp.intel.com>
References: <BAE9DCEF64577A439B3A37F36F9B691C01C344B0@orsmsx418.amr.corp.intel.com>
Message-ID: <20070221190922.GU27414@sashak.voltaire.com>

On 10:49 Wed 21 Feb     , Woodruff, Robert J wrote:
> 
> It appears that when I clone anyone's git tree
> locally on the hosting.openfabrics.org server,
> it only clones the master branch and I get none of the branches.
> The only difference in what I do now from what I did before 
> is that the git version on the server is now 1.5.0., and before it was 
> git version 1.4.4.3. 

The default branch layout changed slightly in 1.5.0. Look at:

http://lkml.org/lkml/2007/2/13/426

(but I think it should depend on your local git version, no?)

> Has anyone tried to do a clone of a git tree on the 
> server lately ? Can you see the git branches of the cloned
> tree with git-branch.

git-branch -r
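
To make the layout change concrete, here is a simplified model of which
refs a fresh clone ends up with. It is a sketch, not git's own logic: it
assumes the remote has a master branch, that the remote is named origin,
and that the older default copied remote heads into refs/heads (with the
remote's master tracked as a local "origin" branch). The branch names used
with it are made up.

```python
def refs_after_clone(remote_branches, separate_remotes, origin="origin"):
    """Model which refs a fresh clone ends up with (simplified)."""
    # Both layouts create a local master to work on.
    refs = ["refs/heads/master"]
    for branch in remote_branches:
        if separate_remotes:
            # 1.5.0 default: remote heads live under refs/remotes/<origin>/.
            refs.append(f"refs/remotes/{origin}/{branch}")
        elif branch == "master":
            # Older default: the remote's master is tracked as "origin".
            refs.append(f"refs/heads/{origin}")
        else:
            refs.append(f"refs/heads/{branch}")
    return refs


def visible_with_plain_git_branch(refs):
    """Plain git-branch lists refs/heads/* only; -r lists refs/remotes/*."""
    prefix = "refs/heads/"
    return [r[len(prefix):] for r in refs if r.startswith(prefix)]
```

Under this model a 1.5.0-style clone shows only master with plain
git-branch, and the other branches appear with the -r option.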

Sasha


From robert.j.woodruff at intel.com  Wed Feb 21 11:09:23 2007
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Wed, 21 Feb 2007 11:09:23 -0800
Subject: [openib-general] Git on hosting.openfabrics.org server seems
	broken
In-Reply-To: <20070221190922.GU27414@sashak.voltaire.com>
Message-ID: <BAE9DCEF64577A439B3A37F36F9B691C01C34535@orsmsx418.amr.corp.intel.com>

Aaarg!!! Not only is git terse and difficult to use,
but once you finally learn the commands, they 
change them on you in the next version. 



From sean.hefty at intel.com  Wed Feb 21 11:49:49 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Wed, 21 Feb 2007 11:49:49 -0800
Subject: [openib-general] IB routing discussion summary
In-Reply-To: <6.2.0.14.2.20070220103929.02953a20@esmail.cup.hp.com>
Message-ID: <000201c755f1$727618d0$8698070a@amr.corp.intel.com>

I sent a message on this topic to the IBTA several days ago, but I am still
awaiting details (likely early next week).

>It should not be carried in the CM REQ.  The SLID / DLID of the router
>ports should be derived through local subnet SA / SM query.  When a CM REQ
>traverses one or more subnets there will be potentially many SLID / DLID
>involved in the communication.   Each router should be populating its
>routing tables in order to build the new LRH attached to the GRH / CM REQ
>that it is forwarding to the next hop.

I'm referring to configuration of the QP, not the operation of the routers.

To establish a connection, the passive side QP needs to transition from Init to
RTR.  As part of that transition, the modify QP verb needs as input the
Destination LID of its local router.  It sounds like you expect the passive side
to perform an SA query to obtain its own local routing information, which would
essentially invalidate the data carried in the primary and alternate path fields
in the CM REQ.

From reading 12.7.11, 13.5.1, and 17.4, I do not believe that such a requirement
was expected to be placed on the passive side of a connection.  The initial
response I received agreed with this.

>I'd need to go back but the architecture is predicated that the SM and SA
>are strictly local and for security purposes their communication should
>remain local.  Higher level management entities built to communicate with
>SM and SA are responsible for cross subnet communications without exposing
>the SA or SM to direct interaction.  P_Key and Q_Key management across
>subnets is an example of such communication across subnets that would not
>be exposed to the SA and SM.

My initial thoughts are that this sounds like a good idea.  It's not eliminating
the need for interacting with a remote SA, so much as it abstracts it to another
entity.

My hope is that we can reach an agreement on the CM REQ.  Depending on that, we
still need to determine whether the existing SA attributes are sufficient to
allow forming inter-subnet connections and, if they are, whether such attributes
can be obtained.

- Sean


From or.gerlitz at gmail.com  Wed Feb 21 12:34:11 2007
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Wed, 21 Feb 2007 22:34:11 +0200
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to
 work with partial membership pkey
In-Reply-To: <000101c755db$0f9ec290$8698070a@amr.corp.intel.com>
References: <45DBEA1F.5090901@voltaire.com>
	<000101c755db$0f9ec290$8698070a@amr.corp.intel.com>
Message-ID: <15ddcffd0702211234j36b00a99i944e77ee0837d8c3@mail.gmail.com>

On 2/21/07, Sean Hefty <sean.hefty at intel.com> wrote:
>>However, no matter what the SM configures, the core & ipoib code act as
>>the full pkey is there. This is nice simplification and it works well.

> Is the problem here really in the librdmacm or in the core/ipoib software?

There is no problem. As I have explained in this thread, ipoib and the
core abstract away from the user the actual value of the MSb of the
pkey, that is, whether it is a full or partial membership pkey.
IPoIB does it by OR-ing 0x8000 into the pkey it uses, and the core does
it in ib_find_cached_pkey() which, when provided a pkey, returns the
index of $pkey or of $pkey & 0x7fff, whichever one of them is
there. The only missing piece is for librdmacm to play this game as
well, and the patch does this.
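
A minimal sketch of the behaviour described above, for illustration only. It
is not the kernel code: find_pkey_index() and pkey_as_full_member() are
hypothetical names, and the sketch assumes the lookup simply ignores the MSb
on both sides of the comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER 0x8000u /* MSb: set = full member, clear = partial */
#define PKEY_BASE_MASK   0x7fffu /* low 15 bits identify the partition */

/* What IPoIB is described as doing: always use the full-member form. */
static uint16_t pkey_as_full_member(uint16_t pkey)
{
	return (uint16_t)(pkey | PKEY_FULL_MEMBER);
}

/*
 * Hypothetical stand-in for the cached lookup: return the index of the
 * first table entry in the same partition as pkey, whatever its
 * membership bit, or -1 if the partition is not in the table.
 */
static int find_pkey_index(const uint16_t *table, size_t len, uint16_t pkey)
{
	assert(table != NULL || len == 0);

	for (size_t i = 0; i < len; i++) {
		if ((table[i] & PKEY_BASE_MASK) == (pkey & PKEY_BASE_MASK))
			return (int)i;
	}
	return -1;
}
```

With a table holding only 0x7fff at index 0, a search for 0xffff still
returns index 0 under this assumption, which is the case the test recipe
later in this message exercises.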

> (I looked at the patch, but haven't looked into the full reason why it's
> needed.)

Start by checking me: tell the SM to configure 0x7fff instead of
0xffff on one of your nodes as the pkey at index 0, then see that
ping works but librdmacm RC utils such as rping or ib_rdma_bw -c
do not. Then apply the patch and check again.

Or.


From sean.hefty at intel.com  Wed Feb 21 12:42:38 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Wed, 21 Feb 2007 12:42:38 -0800
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to
 work with partial membership pkey
In-Reply-To: <15ddcffd0702211234j36b00a99i944e77ee0837d8c3@mail.gmail.com>
Message-ID: <000301c755f8$d2f265e0$8698070a@amr.corp.intel.com>

>There is no problem. As i have explained over this thread the ipoib
>and the core abstract away from the user the actual value of the MSb
>of the pkey, that is whether it is a full or partial membership pkey.

But *why* does the kernel code do this, and should it?

- Sean


From swise at opengridcomputing.com  Wed Feb 21 12:45:39 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Wed, 21 Feb 2007 14:45:39 -0600
Subject: [openib-general] [PATCH 2.6.21] iw_cxgb3: Stop the EP Timer on BAD
	CLOSE.
Message-ID: <1172090739.27101.39.camel@stevo-desktop>

Stop the ep timer in ec_status() if the status indicates a
bad close.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/iwch_cm.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c
index e5442e3..d00e5dd 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cm.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c
@@ -1635,6 +1635,7 @@ static int ec_status(struct t3cdev *tdev
 
 		printk(KERN_ERR MOD "%s BAD CLOSE - Aborting tid %u\n",
 		       __FUNCTION__, ep->hwtid);
+		stop_ep_timer(ep);
 		attrs.next_state = IWCH_QP_STATE_ERROR;
 		iwch_modify_qp(ep->com.qp->rhp,
 			       ep->com.qp, IWCH_QP_ATTR_NEXT_STATE,


From or.gerlitz at gmail.com  Wed Feb 21 12:45:44 2007
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Wed, 21 Feb 2007 22:45:44 +0200
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to
 work with partial membership pkey
In-Reply-To: <1172064021.4380.385825.camel@hal.voltaire.com>
References: <Pine.LNX.4.64.0702190838050.26497@zuben>
	<1171984010.4380.304008.camel@hal.voltaire.com>
	<45DB15F5.4090406@voltaire.com>
	<1171986159.4380.306117.camel@hal.voltaire.com>
	<45DBEA1F.5090901@voltaire.com>
	<1172058368.4380.379947.camel@hal.voltaire.com>
	<45DC3C96.8040100@voltaire.com>
	<1172064021.4380.385825.camel@hal.voltaire.com>
Message-ID: <15ddcffd0702211245w2686b97bhcaf7e86aaa3dedf5@mail.gmail.com>

On 21 Feb 2007 08:20:23 -0500, Hal Rosenstock <halr at voltaire.com> wrote:
> On Wed, 2007-02-21 at 07:35, Or Gerlitz wrote:

> > > I believe it is a spec (compliance) violation for the port to be a
> > > partial member and join as a full member.

> > Since partial members can't talk among themselves, there is no reason to
> > form a multicast group containing --only-- ports that can --not-- talk
> > to each other... So if the spec does not allow this (having a partial
> > member joining with the full member pkey) - it a spec bug...

> I think there are two issues here then:
> 1. If this is the case, getting the spec changed to accomodate this use case
> 2. I believe that OpenIB code is supposed to be spec compliant.

If the IPoIB spec does not allow both partial and full members of a
partition to share a broadcast domain (e.g. the IPv4 broadcast group
associated with the full membership pkey) or any other multicast
group, burn it (or at least the relevant section).

The OpenIB code is supposed to work and, as was done with the RDMA CM
header, the implementation should not wait for the spec to be written
or changed.

Or.


From swise at opengridcomputing.com  Wed Feb 21 12:46:40 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Wed, 21 Feb 2007 14:46:40 -0600
Subject: [openib-general] [PATCH ofed_1_2] iw_cxgb3: Stop the EP Timer on
	BAD CLOSE.
Message-ID: <1172090800.27101.40.camel@stevo-desktop>


Stop the ep timer in ec_status() if the status indicates a
bad close.

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 drivers/infiniband/hw/cxgb3/iwch_cm.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c
index e5442e3..d00e5dd 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cm.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c
@@ -1635,6 +1635,7 @@ static int ec_status(struct t3cdev *tdev
 
 		printk(KERN_ERR MOD "%s BAD CLOSE - Aborting tid %u\n",
 		       __FUNCTION__, ep->hwtid);
+		stop_ep_timer(ep);
 		attrs.next_state = IWCH_QP_STATE_ERROR;
 		iwch_modify_qp(ep->com.qp->rhp,
 			       ep->com.qp, IWCH_QP_ATTR_NEXT_STATE,


From swise at opengridcomputing.com  Wed Feb 21 12:48:21 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Wed, 21 Feb 2007 14:48:21 -0600
Subject: [openib-general] [PATCH 1/2] ofed_1_2 Fix copyrights in the
 cxgb3 driver.
In-Reply-To: <1171569595.13282.60.camel@stevo-desktop>
References: <1171569595.13282.60.camel@stevo-desktop>
Message-ID: <1172090901.27101.42.camel@stevo-desktop>

Vlad,

Please apply this to ofed_1_2.

Thanks,

Steve.


On Thu, 2007-02-15 at 13:59 -0600, Steve Wise wrote:
> Fix copyrights in the cxgb3 driver.
> 
> Remove the Open Grid Computing copyright.  It shouldn't be there.
> 
> Signed-off-by: Steve Wise <swise at opengridcomputing.com>
> ---
> 
>  drivers/net/cxgb3/cxgb3_defs.h    |    1 -
>  drivers/net/cxgb3/cxgb3_offload.c |    1 -
>  drivers/net/cxgb3/cxgb3_offload.h |    1 -
>  drivers/net/cxgb3/l2t.c           |    1 -
>  drivers/net/cxgb3/l2t.h           |    1 -
>  drivers/net/cxgb3/t3cdev.h        |    1 -
>  6 files changed, 0 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/cxgb3/cxgb3_defs.h b/drivers/net/cxgb3/cxgb3_defs.h
> old mode 100755
> new mode 100644
> index 16e0049..e14862b
> --- a/drivers/net/cxgb3/cxgb3_defs.h
> +++ b/drivers/net/cxgb3/cxgb3_defs.h
> @@ -1,6 +1,5 @@
>  /*
>   * Copyright (c) 2006-2007 Chelsio, Inc. All rights reserved.
> - * Copyright (c) 2006-2007 Open Grid Computing, Inc. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> diff --git a/drivers/net/cxgb3/cxgb3_offload.c b/drivers/net/cxgb3/cxgb3_offload.c
> old mode 100755
> new mode 100644
> index c3a02d6..46e9068
> --- a/drivers/net/cxgb3/cxgb3_offload.c
> +++ b/drivers/net/cxgb3/cxgb3_offload.c
> @@ -1,6 +1,5 @@
>  /*
>   * Copyright (c) 2006-2007 Chelsio, Inc. All rights reserved.
> - * Copyright (c) 2006-2007 Open Grid Computing, Inc. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> diff --git a/drivers/net/cxgb3/cxgb3_offload.h b/drivers/net/cxgb3/cxgb3_offload.h
> old mode 100755
> new mode 100644
> index 0e6beb6..f15446a
> --- a/drivers/net/cxgb3/cxgb3_offload.h
> +++ b/drivers/net/cxgb3/cxgb3_offload.h
> @@ -1,6 +1,5 @@
>  /*
>   * Copyright (c) 2006-2007 Chelsio, Inc. All rights reserved.
> - * Copyright (c) 2006-2007 Open Grid Computing, Inc. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> diff --git a/drivers/net/cxgb3/l2t.c b/drivers/net/cxgb3/l2t.c
> old mode 100755
> new mode 100644
> index 3c0cb85..d660af7
> --- a/drivers/net/cxgb3/l2t.c
> +++ b/drivers/net/cxgb3/l2t.c
> @@ -1,6 +1,5 @@
>  /*
>   * Copyright (c) 2003-2007 Chelsio, Inc. All rights reserved.
> - * Copyright (c) 2006-2007 Open Grid Computing, Inc. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> diff --git a/drivers/net/cxgb3/l2t.h b/drivers/net/cxgb3/l2t.h
> old mode 100755
> new mode 100644
> index ba5d2cb..d790013
> --- a/drivers/net/cxgb3/l2t.h
> +++ b/drivers/net/cxgb3/l2t.h
> @@ -1,6 +1,5 @@
>  /*
>   * Copyright (c) 2003-2007 Chelsio, Inc. All rights reserved.
> - * Copyright (c) 2006-2007 Open Grid Computing, Inc. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> diff --git a/drivers/net/cxgb3/t3cdev.h b/drivers/net/cxgb3/t3cdev.h
> old mode 100755
> new mode 100644
> index 9af3bcd..fa4099b
> --- a/drivers/net/cxgb3/t3cdev.h
> +++ b/drivers/net/cxgb3/t3cdev.h
> @@ -1,6 +1,5 @@
>  /*
>   * Copyright (C) 2006-2007 Chelsio Communications.  All rights reserved.
> - * Copyright (C) 2006-2007 Open Grid Computing, Inc.  All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> 
> 
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> 


From swise at opengridcomputing.com  Wed Feb 21 12:48:37 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Wed, 21 Feb 2007 14:48:37 -0600
Subject: [openib-general] [PATCH 2/2] ofed_1_2 Fix copyrights in the
 iw_cxgb3 driver.
In-Reply-To: <1171569621.13282.62.camel@stevo-desktop>
References: <1171569621.13282.62.camel@stevo-desktop>
Message-ID: <1172090917.27101.44.camel@stevo-desktop>

And this one too...

Thanks,

Steve.


On Thu, 2007-02-15 at 14:00 -0600, Steve Wise wrote:
> Fix copyrights in the iw_cxgb3 driver.
> 
> Remove the Open Grid Computing copyright.  It shouldn't be there.
> 
> Signed-off-by: Steve Wise <swise at opengridcomputing.com>
> ---
> 
>  drivers/infiniband/hw/cxgb3/core/cxio_dbg.c      |    1 -
>  drivers/infiniband/hw/cxgb3/core/cxio_hal.c      |    1 -
>  drivers/infiniband/hw/cxgb3/core/cxio_hal.h      |    1 -
>  drivers/infiniband/hw/cxgb3/core/cxio_resource.c |    1 -
>  drivers/infiniband/hw/cxgb3/core/cxio_resource.h |    1 -
>  drivers/infiniband/hw/cxgb3/core/cxio_wr.h       |    1 -
>  drivers/infiniband/hw/cxgb3/iwch.c               |    1 -
>  drivers/infiniband/hw/cxgb3/iwch.h               |    1 -
>  drivers/infiniband/hw/cxgb3/iwch_cm.c            |    1 -
>  drivers/infiniband/hw/cxgb3/iwch_cm.h            |    1 -
>  drivers/infiniband/hw/cxgb3/iwch_cq.c            |    1 -
>  drivers/infiniband/hw/cxgb3/iwch_ev.c            |    1 -
>  drivers/infiniband/hw/cxgb3/iwch_mem.c           |    1 -
>  drivers/infiniband/hw/cxgb3/iwch_provider.c      |    1 -
>  drivers/infiniband/hw/cxgb3/iwch_provider.h      |    1 -
>  drivers/infiniband/hw/cxgb3/iwch_qp.c            |    1 -
>  drivers/infiniband/hw/cxgb3/iwch_user.h          |    1 -
>  17 files changed, 0 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c b/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c
> index dfaa704..d6b6c97 100644
> --- a/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c
> +++ b/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c
> @@ -1,6 +1,5 @@
>  /*
>   * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
> - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c
> index 5e31816..229edd5 100644
> --- a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c
> +++ b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c
> @@ -1,6 +1,5 @@
>  /*
>   * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
> - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.h b/drivers/infiniband/hw/cxgb3/core/cxio_hal.h
> index e5e702d..1553bda 100644
> --- a/drivers/infiniband/hw/cxgb3/core/cxio_hal.h
> +++ b/drivers/infiniband/hw/cxgb3/core/cxio_hal.h
> @@ -1,6 +1,5 @@
>  /*
>   * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
> - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_resource.c b/drivers/infiniband/hw/cxgb3/core/cxio_resource.c
> index d1d8722..cf78050 100644
> --- a/drivers/infiniband/hw/cxgb3/core/cxio_resource.c
> +++ b/drivers/infiniband/hw/cxgb3/core/cxio_resource.c
> @@ -1,6 +1,5 @@
>  /*
>   * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
> - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_resource.h b/drivers/infiniband/hw/cxgb3/core/cxio_resource.h
> index a6bbe83..a2703a3 100644
> --- a/drivers/infiniband/hw/cxgb3/core/cxio_resource.h
> +++ b/drivers/infiniband/hw/cxgb3/core/cxio_resource.h
> @@ -1,6 +1,5 @@
>  /*
>   * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
> - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_wr.h b/drivers/infiniband/hw/cxgb3/core/cxio_wr.h
> index 234a084..6c7ac55 100644
> --- a/drivers/infiniband/hw/cxgb3/core/cxio_wr.h
> +++ b/drivers/infiniband/hw/cxgb3/core/cxio_wr.h
> @@ -1,6 +1,5 @@
>  /*
>   * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
> - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> diff --git a/drivers/infiniband/hw/cxgb3/iwch.c b/drivers/infiniband/hw/cxgb3/iwch.c
> index 0c95f2c..de44c57 100644
> --- a/drivers/infiniband/hw/cxgb3/iwch.c
> +++ b/drivers/infiniband/hw/cxgb3/iwch.c
> @@ -1,6 +1,5 @@
>  /*
>   * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
> - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> diff --git a/drivers/infiniband/hw/cxgb3/iwch.h b/drivers/infiniband/hw/cxgb3/iwch.h
> index 8b11198..8d9390b 100644
> --- a/drivers/infiniband/hw/cxgb3/iwch.h
> +++ b/drivers/infiniband/hw/cxgb3/iwch.h
> @@ -1,6 +1,5 @@
>  /*
>   * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
> - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c
> index 3237fc8..21fadbe 100644
> --- a/drivers/infiniband/hw/cxgb3/iwch_cm.c
> +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c
> @@ -1,6 +1,5 @@
>  /*
>   * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
> - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.h b/drivers/infiniband/hw/cxgb3/iwch_cm.h
> index 893f9d0..855f1ef 100644
> --- a/drivers/infiniband/hw/cxgb3/iwch_cm.h
> +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.h
> @@ -1,6 +1,5 @@
>  /*
>   * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
> - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> diff --git a/drivers/infiniband/hw/cxgb3/iwch_cq.c b/drivers/infiniband/hw/cxgb3/iwch_cq.c
> index ff09509..225fcfa 100644
> --- a/drivers/infiniband/hw/cxgb3/iwch_cq.c
> +++ b/drivers/infiniband/hw/cxgb3/iwch_cq.c
> @@ -1,6 +1,5 @@
>  /*
>   * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
> - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> diff --git a/drivers/infiniband/hw/cxgb3/iwch_ev.c b/drivers/infiniband/hw/cxgb3/iwch_ev.c
> index 646f612..f4cd5ec 100644
> --- a/drivers/infiniband/hw/cxgb3/iwch_ev.c
> +++ b/drivers/infiniband/hw/cxgb3/iwch_ev.c
> @@ -1,6 +1,5 @@
>  /*
>   * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
> - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> diff --git a/drivers/infiniband/hw/cxgb3/iwch_mem.c b/drivers/infiniband/hw/cxgb3/iwch_mem.c
> index 5909ec5..335e9a4 100644
> --- a/drivers/infiniband/hw/cxgb3/iwch_mem.c
> +++ b/drivers/infiniband/hw/cxgb3/iwch_mem.c
> @@ -1,6 +1,5 @@
>  /*
>   * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
> - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
> index 4a46771..3f64dbf 100644
> --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
> +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
> @@ -1,6 +1,5 @@
>  /*
>   * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
> - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h
> index d9d94e3..7322773 100644
> --- a/drivers/infiniband/hw/cxgb3/iwch_provider.h
> +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h
> @@ -1,6 +1,5 @@
>  /*
>   * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
> - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c
> index 9cc8b5e..e1e35d9 100644
> --- a/drivers/infiniband/hw/cxgb3/iwch_qp.c
> +++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c
> @@ -1,6 +1,5 @@
>  /*
>   * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
> - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> diff --git a/drivers/infiniband/hw/cxgb3/iwch_user.h b/drivers/infiniband/hw/cxgb3/iwch_user.h
> index e8ff061..bf0a2f6 100644
> --- a/drivers/infiniband/hw/cxgb3/iwch_user.h
> +++ b/drivers/infiniband/hw/cxgb3/iwch_user.h
> @@ -1,6 +1,5 @@
>  /*
>   * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
> - * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU


From or.gerlitz at gmail.com  Wed Feb 21 12:50:26 2007
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Wed, 21 Feb 2007 22:50:26 +0200
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to
 work with partial membership pkey
In-Reply-To: <000301c755f8$d2f265e0$8698070a@amr.corp.intel.com>
References: <15ddcffd0702211234j36b00a99i944e77ee0837d8c3@mail.gmail.com>
	<000301c755f8$d2f265e0$8698070a@amr.corp.intel.com>
Message-ID: <15ddcffd0702211250u49ceaa6bj4d607f9cfe802cdc@mail.gmail.com>

On 2/21/07, Sean Hefty <sean.hefty at intel.com> wrote:
> >There is no problem. As i have explained over this thread the ipoib
> >and the core abstract away from the user the actual value of the MSb
> >of the pkey, that is whether it is a full or partial membership pkey.
>
> But *why* does the kernel code do this, and should it?

It does this since it makes life simple and robust.

Note that since the HCA validates the pkey in the incoming packet, no
matter what the IB SW does, partial members of a partition can't
talk to each other. So the approach taken by the core/ipoib code was
to just ignore the MSb in places where the code looks for the pkey
--index-- and use the full member pkey when forming MGIDs. This seems
fine to me.
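
The check the HCA applies to an incoming packet can be sketched as below.
This is an illustration of the matching rule as I understand it from the IB
spec (same low 15 bits, and at least one side a full member), not code from
any driver; pkeys_match() and the self-check function are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Two P_Keys allow communication when they name the same partition
 * (low 15 bits equal) and at least one of them is a full member
 * (MSb set). Two partial members of one partition never match.
 */
static bool pkeys_match(uint16_t a, uint16_t b)
{
	bool same_partition = ((a ^ b) & 0x7fff) == 0;
	bool one_is_full = ((a | b) & 0x8000) != 0;

	return same_partition && one_is_full;
}

/* Small self-check covering the cases discussed in this thread. */
static void pkeys_match_selfcheck(void)
{
	assert(pkeys_match(0xffff, 0x7fff));  /* full <-> partial: ok */
	assert(pkeys_match(0xffff, 0xffff));  /* full <-> full: ok */
	assert(!pkeys_match(0x7fff, 0x7fff)); /* partial <-> partial: no */
	assert(!pkeys_match(0x8001, 0xffff)); /* different partitions: no */
}
```

Because the partial/partial case is rejected here regardless of what the
software above it selects, ignoring the MSb during the index lookup does not
let two partial members reach each other.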

Or.


From halr at voltaire.com  Wed Feb 21 14:29:21 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 21 Feb 2007 17:29:21 -0500
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to
 work with partial membership pkey
In-Reply-To: <15ddcffd0702211245w2686b97bhcaf7e86aaa3dedf5@mail.gmail.com>
References: <Pine.LNX.4.64.0702190838050.26497@zuben>
	<1171984010.4380.304008.camel@hal.voltaire.com>
	<45DB15F5.4090406@voltaire.com>
	<1171986159.4380.306117.camel@hal.voltaire.com>
	<45DBEA1F.5090901@voltaire.com>
	<1172058368.4380.379947.camel@hal.voltaire.com>
	<45DC3C96.8040100@voltaire.com>
	<1172064021.4380.385825.camel@hal.voltaire.com>
	<15ddcffd0702211245w2686b97bhcaf7e86aaa3dedf5@mail.gmail.com>
Message-ID: <1172096957.4380.418140.camel@hal.voltaire.com>

On Wed, 2007-02-21 at 15:45, Or Gerlitz wrote:
> On 21 Feb 2007 08:20:23 -0500, Hal Rosenstock <halr at voltaire.com> wrote:
> > On Wed, 2007-02-21 at 07:35, Or Gerlitz wrote:
> 
> > > > I believe it is a spec (compliance) violation for the port to be a
> > > > partial member and join as a full member.
> 
> > > Since partial members can't talk among themselves, there is no reason to
> > > form a multicast group containing --only-- ports that can --not-- talk
> > > to each other... So if the spec does not allow this (having a partial
> > > member joining with the full member pkey) - it a spec bug...
> 
> > I think there are two issues here then:
> > 1. If this is the case, getting the spec changed to accomodate this use case
> > 2. I believe that OpenIB code is supposed to be spec compliant.
> 
> If the IPoIB spec does not allow both partial and full members of a
> partition to share a broadcast domain (eg the IPv4 broadcast group
> associated with the full membership pkey) or any other multicast
> group, burn it (or at least the relevant section).

I was referring to the IB spec, not an IPoIB RFC.

> The OpenIB code supposed to work and as done with the RDMA CM header,
> the implementation should not wait for spec to be written or changed.

Really? Maybe I'm mistaken, but I didn't think that OpenIB/OpenFabrics
wanted to issue code which is not IBA spec compliant.

-- Hal

> Or.


From mshefty at ichips.intel.com  Wed Feb 21 14:36:24 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 21 Feb 2007 14:36:24 -0800
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to
 work with partial membership pkey
In-Reply-To: <15ddcffd0702211250u49ceaa6bj4d607f9cfe802cdc@mail.gmail.com>
References: <15ddcffd0702211234j36b00a99i944e77ee0837d8c3@mail.gmail.com>
	<000301c755f8$d2f265e0$8698070a@amr.corp.intel.com>
	<15ddcffd0702211250u49ceaa6bj4d607f9cfe802cdc@mail.gmail.com>
Message-ID: <45DCC968.30208@ichips.intel.com>

> It does this since its makes life simple and robust.

Is an SM prevented from loading two PKeys into an HCA's PKey table that differ 
by only the membership bit?

I can't think of any reason to do such a thing, but if it happened, which index 
was selected could limit which nodes you could communicate with.

> Note that since the HCA validates the pkey in the in coming packet, no
> matter what the IB SW would do, partial members of a partition can't
> talk to each other. So the approach taken by the core/ipoib code was
> to just ignore the MSb in places where the code looks for the pkey
> --index-- and use the full member pkey when forming MGIDs. This seems
> fine to me.

My concern is that ib_find_cached_pkey() returns an index to a pkey that wasn't 
the one in the search.  Can this lead to a QP being configured in such a way 
that communication with a remote QP would silently fail?

I realize that a user could call ib_get_cached_pkey and see if the returned 
value matches the one in the original search, but this is a non-obvious way to 
check for a mismatch.
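
To make the concern concrete, here is a sketch of a lookup that reports the
mismatch explicitly instead of leaving the caller to read the value back.
Everything here is hypothetical (the enum, lookup_pkey(), and the choice to
prefer an exact match when both forms are in the table); it is not an
existing kernel or librdmacm interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result of a membership-aware lookup. */
enum pkey_lookup {
	PKEY_NOT_FOUND,
	PKEY_EXACT,              /* table entry equals the requested pkey */
	PKEY_MEMBERSHIP_DIFFERS  /* same partition, different MSb */
};

/*
 * Search table for pkey. An exact match wins; otherwise the first entry
 * in the same partition is returned and flagged, so the caller knows the
 * index refers to a pkey other than the one searched for.
 */
static enum pkey_lookup lookup_pkey(const uint16_t *table, size_t len,
				    uint16_t pkey, size_t *index)
{
	int have_fallback = 0;
	size_t fallback = 0;

	assert(table != NULL && index != NULL);

	for (size_t i = 0; i < len; i++) {
		if ((table[i] & 0x7fff) != (pkey & 0x7fff))
			continue;
		if (table[i] == pkey) {
			*index = i;
			return PKEY_EXACT;
		}
		if (!have_fallback) {
			have_fallback = 1;
			fallback = i;
		}
	}

	if (have_fallback) {
		*index = fallback;
		return PKEY_MEMBERSHIP_DIFFERS;
	}
	return PKEY_NOT_FOUND;
}
```

A caller configuring a QP could then decide what to do on
PKEY_MEMBERSHIP_DIFFERS rather than discovering the difference through a
silent failure.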

I'm not against this patch, but I want to make sure that I understand the 
issues, so we're not creating a work-around solution.  The patch is against the 
librdmacm, yet there's nothing that I see in the librdmacm that makes me think 
it's behaving incorrectly.

- Sean


From tom at opengridcomputing.com  Wed Feb 21 14:40:21 2007
From: tom at opengridcomputing.com (Tom Tucker)
Date: Wed, 21 Feb 2007 16:40:21 -0600
Subject: [openib-general] OFED-1.2-20070221-1741.tgz package is available
In-Reply-To: <1172081761.5256.45.camel@vladsk-laptop>
References: <1172080605.5256.35.camel@vladsk-laptop>
	<1172080948.27101.15.camel@stevo-desktop>
	<1172081761.5256.45.camel@vladsk-laptop>
Message-ID: <1172097621.5994.13.camel@trinity.ogc.int>

Vlad:

On Wed, 2007-02-21 at 20:16 +0200, Vladimir Sokolovsky wrote:
> On Wed, 2007-02-21 at 12:02 -0600, Steve Wise wrote:
> > Hey Vlad,
> > 
> > What about bugs: 355 and 357?
> > 
> > 
> Bug: 355 (problems building modules that depend on OFED 1.2 modules)
> 
> In order to build kernel modules depending on OFED's modules you need to
> take Modules.symvers file from <prefix>/src/openib/Modules.symvers (part
> of kernel-ib-devel RPM) and copy this to modules subdir and then compile
> your module.

Won't this blow away all the version information for the non-IB symbols?


> Currently I see that <prefix>/src/openib/Modules.symvers is empty. I
> will check this issue.
> 
> For now you can use the attached script to create Modules.symvers file.
> 
> Bug: 357 (cxgb3 can't be selected on sles9sp3)
> 
> cxgb3 driver compilation failed on sles9sp3 in previous OFED build.
> Then it was disabled in build_env.sh script in order to prevent OFED
> installation failure.
> Did you fix this compilation issue?
> 
> 
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


From mshefty at ichips.intel.com  Wed Feb 21 15:05:58 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 21 Feb 2007 15:05:58 -0800
Subject: [openib-general] GetTable path record query not
 returningDGID=SGID paths
In-Reply-To: <1171514817.22446.145890.camel@hal.voltaire.com>
References: <000701c75091$c4f59fa0$ff0da8c0@amr.corp.intel.com>
	<1171514817.22446.145890.camel@hal.voltaire.com>
Message-ID: <45DCD056.50108@ichips.intel.com>

>>We haven't looked into this in more detail yet.  This was our observation while
>>testing on a larger (64 node) cluster this morning that we don't have access to
>>at the moment.  With the local SA cache running, we were surprised to see any
>>retries, and when we looked into it more, retries were always for loopback
>>connections.

Our investigation showed a couple of things.  When we pulled our systems off 
into a small cluster and ran opensm, things were fine.  The cache was working as 
normal, and we did get loopback paths from opensm.

On our development cluster, the cache was never getting any path records.  It 
would issue a GetTable query, and the SM would respond.  The response had a 
status of 0 (success), but never returned any path records.  I believe that the 
SM node is running OFED 1.1.1.

I don't have the ability to modify the kernel on the larger 64-node cluster that 
we were testing on to see what is going on there.

- Sean


From halr at voltaire.com  Wed Feb 21 14:53:22 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 21 Feb 2007 17:53:22 -0500
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to
 work with partial membership pkey
In-Reply-To: <45DCC968.30208@ichips.intel.com>
References: <15ddcffd0702211234j36b00a99i944e77ee0837d8c3@mail.gmail.com>
	<000301c755f8$d2f265e0$8698070a@amr.corp.intel.com>
	<15ddcffd0702211250u49ceaa6bj4d607f9cfe802cdc@mail.gmail.com>
	<45DCC968.30208@ichips.intel.com>
Message-ID: <1172098401.4380.419534.camel@hal.voltaire.com>

On Wed, 2007-02-21 at 17:36, Sean Hefty wrote:
> > It does this since it makes life simple and robust.
> 
> Is an SM prevented from loading two PKeys into an HCA's PKey table that differ 
> by only the membership bit?

Nope.

> I can't think of any reason to do such a thing,

Me neither. It would be a configuration error of sorts.

> but depending on which index was 
> selected could limit which nodes you could communicate with.

> > Note that since the HCA validates the pkey in the in coming packet, no
> > matter what the IB SW would do, partial members of a partition can't
> > talk to each other. So the approach taken by the core/ipoib code was
> > to just ignore the MSb in places where the code looks for the pkey
> > --index-- and use the full member pkey when forming MGIDs. This seems
> > fine to me.
> 
> My concern is that ib_find_cached_pkey() returns an index to a pkey that wasn't 
> the one in the search.  Can this lead to a QP being configured in such a way 
> that communication with a remote QP would silently fail?
> 
> I realize that a user could call ib_get_cached_pkey and see if the returned 
> value matches the one in the original search, but this is a non-obvious way to 
> check for a mismatch.
> 
> I'm not against this patch, but I want to make sure that I understand the 
> issues, so we're not creating a work-around solution.  The patch is against the 
> librdmacm, yet there's nothing that I see in the librdmacm that makes me think 
> it's behaving incorrectly.

I'm not sure it's this patch in particular, but it appears that there may
be some non-compliant behavior being exercised, IMO.

-- Hal

> - Sean


From halr at voltaire.com  Wed Feb 21 15:22:34 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 21 Feb 2007 18:22:34 -0500
Subject: [openib-general] GetTable path record query not
 returningDGID=SGID paths
In-Reply-To: <45DCD056.50108@ichips.intel.com>
References: <000701c75091$c4f59fa0$ff0da8c0@amr.corp.intel.com>
	<1171514817.22446.145890.camel@hal.voltaire.com>
	<45DCD056.50108@ichips.intel.com>
Message-ID: <1172100153.4380.421309.camel@hal.voltaire.com>

On Wed, 2007-02-21 at 18:05, Sean Hefty wrote:
> >>We haven't looked into this in more detail yet.  This was our observation while
> >>testing on a larger (64 node) cluster this morning that we don't have access to
> >>at the moment.  With the local SA cache running, we were surprised to see any
> >>retries, and when we looked into it more, retries were always for loopback
> >>connections.
> 
> Our investigation showed a couple of things.  When we pulled our systems off 
> into a small cluster and ran opensm, things were fine.  The cache was working as 
> normal, and we did get loopback paths from opensm.
> 
> On our development cluster, the cache was never getting any path records.  It 
> would issue a GetTable query, and the SM would respond.  The response had a 
> status of 0 (success), but never returned any path records.  I believe that the 
> SM node is running OFED 1.1.1.

I'm unaware of any changes in this area of OpenSM which would cause this
but maybe I'm forgetting something. Can you run opensm with -V and send
the logs to me? This should be instructive.

-- Hal

> I don't have the ability to modify the kernel on the larger 64-node cluster that 
> we were testing on to see what is going on there.
> 
> - Sean


From halr at voltaire.com  Wed Feb 21 15:32:18 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 21 Feb 2007 18:32:18 -0500
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to
 work with partial membership pkey
In-Reply-To: <1172098401.4380.419534.camel@hal.voltaire.com>
References: <15ddcffd0702211234j36b00a99i944e77ee0837d8c3@mail.gmail.com>
	<000301c755f8$d2f265e0$8698070a@amr.corp.intel.com>
	<15ddcffd0702211250u49ceaa6bj4d607f9cfe802cdc@mail.gmail.com>
	<45DCC968.30208@ichips.intel.com>
	<1172098401.4380.419534.camel@hal.voltaire.com>
Message-ID: <1172100738.4380.421860.camel@hal.voltaire.com>

On Wed, 2007-02-21 at 17:53, Hal Rosenstock wrote:
> On Wed, 2007-02-21 at 17:36, Sean Hefty wrote:
> > > It does this since its makes life simple and robust.
> > 
> > Is an SM prevented from loading two PKeys into an HCA's PKey table that differ 
> > by only the membership bit?
> 
> Nope.
> 
> > I can't think of any reason to do such a thing,
> 
> Me neither. It would be a configuration error of sorts.

It is vendor dependent whether the SM would allow this. As Sasha points
out, this cannot be done with OpenSM (at least currently).

-- Hal

> > but depending on which index was 
> > selected could limit which nodes you could communicate with.
> 
> > > Note that since the HCA validates the pkey in the in coming packet, no
> > > matter what the IB SW would do, partial members of a partition can't
> > > talk to each other. So the approach taken by the core/ipoib code was
> > > to just ignore the MSb in places where the code looks for the pkey
> > > --index-- and use the full member pkey when forming MGIDs. This seems
> > > fine to me.
> > 
> > My concern is that ib_find_cached_pkey() returns an index to a pkey that wasn't 
> > the one in the search.  Can this lead to a QP being configured in such a way 
> > that communication with a remote QP would silently fail?
> > 
> > I realize that a user could call ib_get_cached_pkey and see if the returned 
> > value matches the one in the original search, but this is a non-obvious way to 
> > check for a mismatch.
> > 
> > I'm not against this patch, but I want to make sure that I understand the 
> > issues, so we're not creating a work-around solution.  The patch is against the 
> > librdmacm, yet there's nothing that I see in the librdmacm that makes me think 
> > it's behaving incorrectly.
> 
> I'm not sure it's this patch in particular but it appears that there may
> be some non compliant behavior being exercised IMO.
> 
> -- Hal
> 
> > - Sean


From changquing.tang at hp.com  Wed Feb 21 15:48:59 2007
From: changquing.tang at hp.com (Tang, Changqing)
Date: Wed, 21 Feb 2007 23:48:59 -0000
Subject: [openib-general] I created a git tree for the libibverbs man
 pages
In-Reply-To: <aday7msfjwy.fsf@cisco.com>
References: <45BF63A1.6090402@dev.mellanox.co.il>
	<adaireotr9h.fsf@cisco.com> <45BF756B.1060500@dev.mellanox.co.il>
	<aday7msfjwy.fsf@cisco.com>
Message-ID: <349DCDA352EACF42A0C49FA6DCEA84037D91A1@G3W0634.americas.hpqcorp.net>


Hi, Roland:
	What is the maximum number of cards the OFED driver/library can
support on a single node?
	Thanks.

--CQ

> -----Original Message-----
> From: openib-general-bounces at openib.org 
> [mailto:openib-general-bounces at openib.org] On Behalf Of Roland Dreier
> Sent: Tuesday, February 20, 2007 6:12 PM
> To: Dotan Barak
> Cc: openib-general
> Subject: Re: [openib-general] I created a git tree for the 
> libibverbs man pages
> 
> I merged all these manpages into my libibverbs tree and 
> pushed the result out to kernel.org.
> 
> Please send any future updates as diffs against the libibverbs tree.
> 
> Thanks,
>   Roland


From rdreier at cisco.com  Wed Feb 21 16:55:31 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 21 Feb 2007 16:55:31 -0800
Subject: [openib-general] I created a git tree for the libibverbs man
 pages
In-Reply-To: <349DCDA352EACF42A0C49FA6DCEA84037D91A1@G3W0634.americas.hpqcorp.net>
	(Changqing Tang's message of "Wed, 21 Feb 2007 23:48:59 -0000")
References: <45BF63A1.6090402@dev.mellanox.co.il>
	<adaireotr9h.fsf@cisco.com> <45BF756B.1060500@dev.mellanox.co.il>
	<aday7msfjwy.fsf@cisco.com>
	<349DCDA352EACF42A0C49FA6DCEA84037D91A1@G3W0634.americas.hpqcorp.net>
Message-ID: <adaslcz57uk.fsf@cisco.com>

 > 	What is the Max # of cards OFED driver/library can support on a
 > single node ?  

The lowest limit I know of is the # of device minors available for
/dev/infiniband/uverbs files, which is 32.  How many devices are you
interested in supporting?

This limit could probably be increased without too much trouble, but I
doubt any realistic system will run into it anyway.
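
To illustrate why a fixed pool of device minors caps the device count, here is a minimal sketch of a 32-slot minor allocator. It assumes a single 32-bit bitmap; the names MAX_DEVICES, dev_map, alloc_minor() and free_minor() are hypothetical and are not the actual uverbs code:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DEVICES 32 /* assumed pool size, matching the limit quoted above */

static uint32_t dev_map; /* bit i set means minor i is in use */

/* Hand out the lowest free minor, or -1 once all MAX_DEVICES are taken. */
static int alloc_minor(void)
{
    for (int i = 0; i < MAX_DEVICES; i++) {
        if (!(dev_map & (UINT32_C(1) << i))) {
            dev_map |= UINT32_C(1) << i;
            return i;
        }
    }
    return -1;
}

/* Return a minor to the pool so a later device can reuse it. */
static void free_minor(int minor)
{
    if (minor >= 0 && minor < MAX_DEVICES)
        dev_map &= ~(UINT32_C(1) << minor);
}
```

In this model the 33rd device simply fails to get a minor, and raising the limit means widening the bitmap and the reserved minor range.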

 - R.


From changquing.tang at hp.com  Wed Feb 21 17:17:27 2007
From: changquing.tang at hp.com (Tang, Changqing)
Date: Thu, 22 Feb 2007 01:17:27 -0000
Subject: [openib-general] I created a git tree for the libibverbs man
 pages
In-Reply-To: <adaslcz57uk.fsf@cisco.com>
References: <45BF63A1.6090402@dev.mellanox.co.il>
	<adaireotr9h.fsf@cisco.com><45BF756B.1060500@dev.mellanox.co.il>
	<aday7msfjwy.fsf@cisco.com><349DCDA352EACF42A0C49FA6DCEA84037D91A1@G3W0634.americas.hpqcorp.net>
	<adaslcz57uk.fsf@cisco.com>
Message-ID: <349DCDA352EACF42A0C49FA6DCEA84037D9238@G3W0634.americas.hpqcorp.net>


Supporting up to 32 cards on a node is enough for quite a while. I
just wanted to check whether only 4 or 8 were supported.

Thanks.

--CQ

> -----Original Message-----
> From: Roland Dreier [mailto:rdreier at cisco.com] 
> Sent: Wednesday, February 21, 2007 6:56 PM
> To: Tang, Changqing
> Cc: Dotan Barak; openib-general
> Subject: Re: [openib-general] I created a git tree for the 
> libibverbs man pages
> 
>  > 	What is the Max # of cards OFED driver/library can support on a
>  > single node ?  
> 
> The lowest limit I know of is the # of device minors 
> available for /dev/infiniband/uverbs files, which is 32.  How 
> many devices are you interested in supporting?
> 
> This limit could probably be increased without too much 
> trouble, but I doubt any realistic system will run into it anyway.
> 
>  - R.
> 


From akepner at sgi.com  Wed Feb 21 17:21:11 2007
From: akepner at sgi.com (akepner at sgi.com)
Date: Wed, 21 Feb 2007 17:21:11 -0800
Subject: [openib-general] [RFC/BUG] DMA vs. CQ race
Message-ID: <20070222012111.GB3352@sgi.com>


In:

http://openib.org/pipermail/openib-general/2006-December/030251.html

I described a potential race between DMA and CQ updates on
Altix systems. At that time the bug hadn't been observed,
but was expected to be possible on "large" NUMA systems.

A first-cut at a patch was sent out, some very reasonable
objections were raised, and the thread fizzled out.

Since that time we've been able to produce the bug, and show
that the patch I sent fixes the problem. (OK, the patch I sent
with the addition of a small but important patchlet.)

The biggest concern with the earlier patch seemed to be
backward compatibility. There was a stab at addressing
that in http://tinyurl.com/2x3s52, but no commentary.
(Too ugly for words?)

Any suggestions as to how to proceed? Should I just code
something up in order to have a concrete target to discuss?
Or are there any new thoughts based on the previous emails?

-- 
Arthur


From dotanb at dev.mellanox.co.il  Wed Feb 21 23:08:35 2007
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Thu, 22 Feb 2007 09:08:35 +0200
Subject: [openib-general] I created a git tree for the libibverbs man
	pages
In-Reply-To: <aday7msfjwy.fsf@cisco.com>
References: <45BF63A1.6090402@dev.mellanox.co.il>
	<adaireotr9h.fsf@cisco.com> <45BF756B.1060500@dev.mellanox.co.il>
	<aday7msfjwy.fsf@cisco.com>
Message-ID: <45DD4173.70701@dev.mellanox.co.il>

Roland Dreier wrote:
> I merged all these manpages into my libibverbs tree and pushed the
> result out to kernel.org.
>
> Please send any future updates as diffs against the libibverbs tree.
>
> Thanks,
>   Roland
>   
That is great news, thanks.

Before the OFED 1.2 release I plan to send you a patch to fix some issues.

Thanks again
Dotan


From sweitzen at cisco.com  Wed Feb 21 23:11:50 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Wed, 21 Feb 2007 23:11:50 -0800
Subject: [openib-general] anyone have OFED 1.2 alpha1 compiling on ppc64
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA3030AD952@xmb-sjc-216.amer.cisco.com>

I tried both RHEL4 and SLES10 usinstall.sh, and got this.  I filed bug
379; has anyone else tried ppc64?
 
 gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include/infiniband -I./../libibcommon/include/infiniband -Wall -m64 -g -O2 -MT libibumad_la-umad.lo -MD -MP -MF .deps/libibumad_la-umad.Tpo -c src/umad.c  -fPIC -DPIC -o .libs/libibumad_la-umad.o
In file included from src/umad.c:50:
./include/infiniband/umad.h:37:31: infiniband/common.h: No such file or directory
src/umad.c: In function `port_alloc':
src/umad.c:94: warning: implicit declaration of function `IBWARN'
src/umad.c: In function `get_port':
src/umad.c:160: warning: implicit declaration of function `snprintf'
src/umad.c:163: warning: implicit declaration of function `sys_read_uint'
src/umad.c:177: warning: implicit declaration of function `sys_read_uint64'
src/umad.c:182: warning: implicit declaration of function `sys_read_gid'
src/umad.c: In function `get_ca':
src/umad.c:354: warning: implicit declaration of function `sys_read_string'
src/umad.c:363: warning: implicit declaration of function `sys_read_guid'
make[3]: *** [libibumad_la-umad.lo] Error 1
make[3]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/management/libibumad'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/management/libibumad'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/management/libibumad'
make: *** [subdirs] Error 1
make: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/management'

 
Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 

From ogerlitz at voltaire.com  Wed Feb 21 23:28:48 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 22 Feb 2007 09:28:48 +0200
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to
 work with partial membership pkey
In-Reply-To: <1172096957.4380.418140.camel@hal.voltaire.com>
References: <Pine.LNX.4.64.0702190838050.26497@zuben>
	<1171984010.4380.304008.camel@hal.voltaire.com>
	<45DB15F5.4090406@voltaire.com>
	<1171986159.4380.306117.camel@hal.voltaire.com>
	<45DBEA1F.5090901@voltaire.com>
	<1172058368.4380.379947.camel@hal.voltaire.com>
	<45DC3C96.8040100@voltaire.com>
	<1172064021.4380.385825.camel@hal.voltaire.com>
	<15ddcffd0702211245w2686b97bhcaf7e86aaa3dedf5@mail.gmail.com>
	<1172096957.4380.418140.camel@hal.voltaire.com>
Message-ID: <45DD4630.3070101@voltaire.com>

Hal Rosenstock wrote:
> On Wed, 2007-02-21 at 15:45, Or Gerlitz wrote:
>> On 21 Feb 2007 08:20:23 -0500, Hal Rosenstock <halr at voltaire.com> wrote:

>> If the IPoIB spec does not allow both partial and full members of a
>> partition to share a broadcast domain (eg the IPv4 broadcast group
>> associated with the full membership pkey) or any other multicast
>> group, burn it (or at least the relevant section).

> I was referring to the IB spec, not an IPoIB RFC.

Can you provide a pointer?

>> The OpenIB code supposed to work and as done with the RDMA CM header,
>> the implementation should not wait for spec to be written or changed.

> Really ? Maybe I'm mistaken but I didn't think that OpenIB/OpenFabrics
> wanted to issue code which is not IBA spec compliant.

The code resides in the Linux kernel, period. Linux is not under the 
control of this or that organization, period, period. Linux uses a 
hierarchical maintainership structure where Roland, Sean and yourself are 
listed as the maintainers, which means you are able to promote and/or 
block this or that agenda, go for it!

Or.


From ogerlitz at voltaire.com  Thu Feb 22 00:04:14 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 22 Feb 2007 10:04:14 +0200
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to
 work with partial membership pkey
In-Reply-To: <45DCC968.30208@ichips.intel.com>
References: <15ddcffd0702211234j36b00a99i944e77ee0837d8c3@mail.gmail.com>
	<000301c755f8$d2f265e0$8698070a@amr.corp.intel.com>
	<15ddcffd0702211250u49ceaa6bj4d607f9cfe802cdc@mail.gmail.com>
	<45DCC968.30208@ichips.intel.com>
Message-ID: <45DD4E7E.50009@voltaire.com>

Sean Hefty wrote:
>> Note that since the HCA validates the pkey in the in coming packet, no
>> matter what the IB SW would do, partial members of a partition can't
>> talk to each other. So the approach taken by the core/ipoib code was
>> to just ignore the MSb in places where the code looks for the pkey
>> --index-- and use the full member pkey when forming MGIDs. This seems
>> fine to me.

> My concern is that ib_find_cached_pkey() returns an index to a pkey that wasn't 
> the one in the search.  Can this lead to a QP being configured in such a way 
> that communication with a remote QP would silently fail?

My understanding is that when an IPoIB broadcast domain contains both 
partial and full members (*), attempts to communicate between two partial 
members would silently fail. Is this silence something you think we 
should work to change?

(*) e.g. when you have a bunch of clients and a server or a bunch of 
servers, and you don't want to allow --clients-- to communicate among 
themselves
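
The membership rule behind this can be sketched as follows. This is an illustration of the P_Key matching rule as I understand it (low 15 bits must be equal, and at least one side must be a full member), not code from the core or ipoib; pkeys_can_communicate() is a hypothetical helper, and the special case of an invalid zero base value is left out:

```c
#include <assert.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER 0x8000u /* MSb set: full member of the partition */
#define PKEY_BASE_MASK   0x7fffu /* low 15 bits: partition number */

/*
 * Two P_Keys match when they name the same partition and at least one
 * of them is a full member. Two partial members of the same partition
 * therefore cannot talk to each other.
 */
static int pkeys_can_communicate(uint16_t a, uint16_t b)
{
    if ((a & PKEY_BASE_MASK) != (b & PKEY_BASE_MASK))
        return 0; /* different partitions */
    return ((a | b) & PKEY_FULL_MEMBER) != 0;
}
```

In the clients-and-servers case, a server holding 0x8001 can reach every client holding 0x0001, while two clients holding 0x0001 are rejected by this check.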

> I'm not against this patch, but I want to make sure that I understand the 
> issues, so we're not creating a work-around solution.  The patch is against the 
> librdmacm, yet there's nothing that I see in the librdmacm that makes me think 
> it's behaving incorrectly.

My thinking is that if at the end of this thread we are willing to move 
forward without changing ib_find_cached_pkey() then this patch should be 
merged.

Or.


From mst at mellanox.co.il  Thu Feb 22 01:00:06 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 22 Feb 2007 11:00:06 +0200
Subject: [openib-general] anyone have OFED 1.2 alpha1 compiling on ppc64
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA3030AD952@xmb-sjc-216.amer.cisco.com>
References: <A15335FBE9BD2449AF2C9EF3D1EB8EA3030AD952@xmb-sjc-216.amer.cisco.com>
Message-ID: <20070222090006.GA9727@mellanox.co.il>

> Quoting Scott Weitzenkamp (sweitzen) <sweitzen at cisco.com>:
> Subject: anyone have OFED 1.2 alpha1 compiling on ppc64
> 
> I tried both RHEL4 and SLES10 usinstall.sh, and get this.  I filed bug 379,
> anyone else tried ppc64?

Scott, could you please upload the kernel sources and .config files to staging?
If you do, we'll be able to add these to the nightly cross-build environment.
	
-- 
MST


From vlad at lists.openfabrics.org  Thu Feb 22 02:26:53 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Thu, 22 Feb 2007 02:26:53 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070222-0200 daily build status
Message-ID: <20070222102653.E48CAE6080D@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod 

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.19
Passed on powerpc with linux-2.6.19
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.18
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.15
Passed on ia64 with linux-2.6.15
Passed on x86_64 with linux-2.6.14
Passed on ppc64 with linux-2.6.18
Passed on powerpc with linux-2.6.12
Passed on powerpc with linux-2.6.14
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on ppc64 with linux-2.6.12
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.19
Passed on ppc64 with linux-2.6.16
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.14
Passed on powerpc with linux-2.6.15
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.14
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.16
Passed on ppc64 with linux-2.6.13
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.17
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.18
Passed on x86_64 with linux-2.6.9-42.ELsmp

Failed:
Build failed on ia64 with linux-2.6.16.21-0.8-default
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.16.21-0.8-default_ia64_check/include/rdma/ib_verbs.h:1590: error: implicit declaration of function ‘sg_dma_len’
/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: At top level:
/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:61: warning: initialization from incompatible pointer type
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.9-22.ELsmp
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: ‘ADVERTISE_PAUSE_CAP’ undeclared (first use in this function)
/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: (Each undeclared identifier is reported only once
/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: for each function it appears in.)
/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:170: error: ‘ADVERTISE_PAUSE_ASYM’ undeclared (first use in this function)
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.o] Error 1
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.9-22.ELsmp_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-22.ELsmp'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.9-34.ELsmp
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘add_adapter’:
/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1062: error: ‘adapter_list_lock’ undeclared (first use in this function)
/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘remove_adapter’:
/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1069: error: ‘adapter_list_lock’ undeclared (first use in this function)
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.o] Error 1
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070222-0200_linux-2.6.9-34.ELsmp_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-34.ELsmp'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
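
For the 2.6.9-22.ELsmp failure above, one common way to handle a constant missing from older kernel headers is a guarded fallback definition. The sketch below is an assumption about how such a compat shim could look, not the fix that was applied to cxgb3: the values are the standard MII advertisement register bits for pause (0x0400) and asymmetric pause (0x0800), and mii_pause_bits() is a hypothetical helper added only to make the example testable:

```c
#include <assert.h>

/* Fallbacks for kernels whose linux/mii.h predates these definitions. */
#ifndef ADVERTISE_PAUSE_CAP
#define ADVERTISE_PAUSE_CAP  0x0400 /* advertise symmetric pause */
#endif
#ifndef ADVERTISE_PAUSE_ASYM
#define ADVERTISE_PAUSE_ASYM 0x0800 /* advertise asymmetric pause */
#endif

/* Hypothetical helper: build the pause bits of an MII advertisement word. */
static unsigned int mii_pause_bits(int pause, int asym_pause)
{
    unsigned int val = 0;

    if (pause)
        val |= ADVERTISE_PAUSE_CAP;
    if (asym_pause)
        val |= ADVERTISE_PAUSE_ASYM;
    return val;
}
```

Because the definitions are guarded with #ifndef, they stay out of the way on kernels that already provide them.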


From vlad at lists.openfabrics.org  Thu Feb 22 03:16:15 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Thu, 22 Feb 2007 03:16:15 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070222-0251 daily build status
Message-ID: <20070222111615.7B4B4E603B1@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod 

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.12
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.19
Passed on powerpc with linux-2.6.19
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.12
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.15
Passed on ia64 with linux-2.6.13
Passed on ppc64 with linux-2.6.12
Passed on ia64 with linux-2.6.18
Passed on powerpc with linux-2.6.16
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.14
Passed on powerpc with linux-2.6.12
Passed on ia64 with linux-2.6.15
Passed on powerpc with linux-2.6.13
Passed on ia64 with linux-2.6.16
Passed on ppc64 with linux-2.6.19
Passed on powerpc with linux-2.6.15
Passed on ppc64 with linux-2.6.18
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on ppc64 with linux-2.6.17
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on ia64 with linux-2.6.12
Passed on ppc64 with linux-2.6.15
Passed on ia64 with linux-2.6.19
Passed on ppc64 with linux-2.6.13
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.14
Passed on ppc64 with linux-2.6.16

Failed:
Build failed on ia64 with linux-2.6.16.21-0.8-default
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.16.21-0.8-default_ia64_check/include/rdma/ib_verbs.h:1590: error: implicit declaration of function 'sg_dma_len'
/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: At top level:
/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:61: warning: initialization from incompatible pointer type
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.16.21-0.8-default_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.9-34.ELsmp
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function 'add_adapter':
/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1062: error: 'adapter_list_lock' undeclared (first use in this function)
/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function 'remove_adapter':
/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1069: error: 'adapter_list_lock' undeclared (first use in this function)
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.o] Error 1
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.9-34.ELsmp_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-34.ELsmp'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.9-22.ELsmp
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: 'ADVERTISE_PAUSE_CAP' undeclared (first use in this function)
/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: (Each undeclared identifier is reported only once
/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: for each function it appears in.)
/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:170: error: 'ADVERTISE_PAUSE_ASYM' undeclared (first use in this function)
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.o] Error 1
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070222-0251_linux-2.6.9-22.ELsmp_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-22.ELsmp'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------


From vlad at mellanox.co.il  Thu Feb 22 04:09:29 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Thu, 22 Feb 2007 14:09:29 +0200
Subject: [openib-general] [PATCH 1/2] ofed_1_2 Fix copyrights in the
 cxgb3 driver.
In-Reply-To: <1171569595.13282.60.camel@stevo-desktop>
References: <1171569595.13282.60.camel@stevo-desktop>
Message-ID: <1172146169.18306.0.camel@vladsk-laptop>

On Thu, 2007-02-15 at 13:59 -0600, Steve Wise wrote:
> Fix copyrights in the cxgb3 driver.
> 
> Remove the Open Grid Computing copyright.  It shouldn't be there.
> 
> Signed-off-by: Steve Wise <swise at opengridcomputing.com>
> ---

Applied.

-- 
Vladimir Sokolovsky <vlad at mellanox.co.il>
Mellanox Technologies Ltd.


From vlad at mellanox.co.il  Thu Feb 22 04:09:47 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Thu, 22 Feb 2007 14:09:47 +0200
Subject: [openib-general] [PATCH 2/2] ofed_1_2 Fix copyrights in the
 iw_cxgb3 driver.
In-Reply-To: <1171569621.13282.62.camel@stevo-desktop>
References: <1171569621.13282.62.camel@stevo-desktop>
Message-ID: <1172146187.18306.2.camel@vladsk-laptop>

On Thu, 2007-02-15 at 14:00 -0600, Steve Wise wrote:
> Fix copyrights in the iw_cxgb3 driver.
> 
> Remove the Open Grid Computing copyright.  It shouldn't be there.
> 
> Signed-off-by: Steve Wise <swise at opengridcomputing.com>
> ---

Applied.

-- 
Vladimir Sokolovsky <vlad at mellanox.co.il>
Mellanox Technologies Ltd.


From halr at voltaire.com  Thu Feb 22 04:04:12 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 22 Feb 2007 07:04:12 -0500
Subject: [openib-general] [ewg] anyone have OFED 1.2 alpha1 compiling on
	ppc64
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA3030AD952@xmb-sjc-216.amer.cisco.com>
References: <A15335FBE9BD2449AF2C9EF3D1EB8EA3030AD952@xmb-sjc-216.amer.cisco.com>
Message-ID: <1172145850.4380.466231.camel@hal.voltaire.com>

On Thu, 2007-02-22 at 02:11, Scott Weitzenkamp (sweitzen) wrote:
> I tried both RHEL4 and SLES10 using install.sh, and get this.  I filed bug
> 379, anyone else tried ppc64?
>  
>  gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include/infiniband -I./../libibcommon/include/infiniband -Wall -m64 -g -O2 -MT libibumad_la-umad.lo -MD -MP -MF .deps/libibumad_la-umad.Tpo -c src/umad.c  -fPIC -DPIC -o .libs/libibumad_la-umad.o
> In file included from src/umad.c:50:
> ./include/infiniband/umad.h:37:31: infiniband/common.h: No such file or directory
> src/umad.c: In function `port_alloc':
> src/umad.c:94: warning: implicit declaration of function `IBWARN'
> src/umad.c: In function `get_port':
> src/umad.c:160: warning: implicit declaration of function `snprintf'
> src/umad.c:163: warning: implicit declaration of function `sys_read_uint'
> src/umad.c:177: warning: implicit declaration of function `sys_read_uint64'
> src/umad.c:182: warning: implicit declaration of function `sys_read_gid'
> src/umad.c: In function `get_ca':
> src/umad.c:354: warning: implicit declaration of function `sys_read_string'
> src/umad.c:363: warning: implicit declaration of function `sys_read_guid'
> make[3]: *** [libibumad_la-umad.lo] Error 1
> make[3]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/management/libibumad'
> make[2]: *** [all-recursive] Error 1
> make[2]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/management/libibumad'
> make[1]: *** [all] Error 2
> make[1]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/management/libibumad'
> make: *** [subdirs] Error 1
> make: Leaving directory `/var/tmp/OFEDRPM/BUILD/ofa_user-1.2/src/userspace/management'

That missing header (common.h) is in libibcommon. Somehow, libibcommon
is not installed. libibumad depends on libibcommon. Is this a
build/install script issue with OFED 1.2 ? Vlad ?

-- Hal

> Scott Weitzenkamp
> SQA and Release Manager
> Server Virtualization Business Unit
> Cisco Systems
>  
> 
> ______________________________________________________________________
> 
> _______________________________________________
> ewg mailing list
> ewg at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


From vlad at mellanox.co.il  Thu Feb 22 04:13:53 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Thu, 22 Feb 2007 14:13:53 +0200
Subject: [openib-general] [PATCH ofed_1_2] iw_cxgb3: Stop the EP Timer
	on BAD CLOSE.
In-Reply-To: <1172090800.27101.40.camel@stevo-desktop>
References: <1172090800.27101.40.camel@stevo-desktop>
Message-ID: <1172146433.18306.4.camel@vladsk-laptop>

On Wed, 2007-02-21 at 14:46 -0600, Steve Wise wrote:
> Stop the ep timer in ec_status() if the status indicates a
> bad close.
> 
> Signed-off-by: Steve Wise <swise at opengridcomputing.com>
> ---
> 
>  drivers/infiniband/hw/cxgb3/iwch_cm.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c
> index e5442e3..d00e5dd 100644
> --- a/drivers/infiniband/hw/cxgb3/iwch_cm.c
> +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c
> @@ -1635,6 +1635,7 @@ static int ec_status(struct t3cdev *tdev
>  
>  		printk(KERN_ERR MOD "%s BAD CLOSE - Aborting tid %u\n",
>  		       __FUNCTION__, ep->hwtid);
> +		stop_ep_timer(ep);
>  		attrs.next_state = IWCH_QP_STATE_ERROR;
>  		iwch_modify_qp(ep->com.qp->rhp,
>  			       ep->com.qp, IWCH_QP_ATTR_NEXT_STATE,
> 

Applied.


-- 
Vladimir Sokolovsky <vlad at mellanox.co.il>
Mellanox Technologies Ltd.


From jackm at dev.mellanox.co.il  Thu Feb 22 04:32:00 2007
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Thu, 22 Feb 2007 14:32:00 +0200
Subject: [openib-general] libibverbs: can't compile more than once due to
 man3 symbolic links
Message-ID: <200702221432.00546.jackm@dev.mellanox.co.il>

The code below was just added to libibverbs/Makefile.am

install-data-hook:
        cd $(DESTDIR)$(mandir)/man3 && \
        $(LN_S) ibv_get_async_event.3 ibv_ack_async_event.3 && \
	....

This creates a problem when re-compiling/re-installing libibverbs --
the "ln -s" ( = $(LN_S) ) fails because the symbolic links still exist
in the man/man3 directory.

I rummaged around the libtool documentation, and there is no pre-defined
macro which does "ln -fs" (which would just overwrite the current links).

Any ideas on how to fix this problem cleanly (i.e., without violating spirit
of libtool/automake)?

- Jack


From jsquyres at cisco.com  Thu Feb 22 04:38:03 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Thu, 22 Feb 2007 07:38:03 -0500
Subject: [openib-general] libibverbs: can't compile more than once due
 to man3 symbolic links
In-Reply-To: <200702221432.00546.jackm@dev.mellanox.co.il>
References: <200702221432.00546.jackm@dev.mellanox.co.il>
Message-ID: <AFEF87B8-BCC7-4668-82C4-63B539CE9C50@cisco.com>

Is there a reason not to use

man_MANS = ibv_get_async_event.3 ....

?


On Feb 22, 2007, at 7:32 AM, Jack Morgenstein wrote:

> The code below was just added to libibverbs/Makefile.am
>
> install-data-hook:
>         cd $(DESTDIR)$(mandir)/man3 && \
>         $(LN_S) ibv_get_async_event.3 ibv_ack_async_event.3 && \
> 	....
>
> This creates a problem when re-compiling/re-installing libibverbs --
> the "ln -s" ( = $(LN_S) ) fails because the symbolic links still exist
> in the man/man3 directory.
>
> I rummaged around the libtool documentation, and there is no pre-defined
> macro which does "ln -fs" (which would just overwrite the current links).
>
> Any ideas on how to fix this problem cleanly (i.e., without violating spirit
> of libtool/automake)?
>
> - Jack
>
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/ 
> openib-general


-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


From mst at mellanox.co.il  Thu Feb 22 04:40:12 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 22 Feb 2007 14:40:12 +0200
Subject: [openib-general] libibverbs: can't compile more than once due
 to man3 symbolic links
In-Reply-To: <200702221432.00546.jackm@dev.mellanox.co.il>
References: <200702221432.00546.jackm@dev.mellanox.co.il>
Message-ID: <20070222124012.GD9727@mellanox.co.il>

> Quoting Jack Morgenstein <jackm at dev.mellanox.co.il>:
> Subject: libibverbs: can't compile more than once due to man3 symbolic links
> 
> The code below was just added to libibverbs/Makefile.am
> 
> install-data-hook:
>         cd $(DESTDIR)$(mandir)/man3 && \
>         $(LN_S) ibv_get_async_event.3 ibv_ack_async_event.3 && \
> 	....
> 
> This creates a problem when re-compiling/re-installing libibverbs --
> the "ln -s" ( = $(LN_S) ) fails because the symbolic links still exist
> in the man/man3 directory.
> 
> I rummaged around the libtool documentation, and there is no pre-defined
> macro which does "ln -fs" (which would just overwrite the current links).
> 
> Any ideas on how to fix this problem cleanly (i.e., without violating spirit
> of libtool/automake)?

Probably just add
	$(RM) ibv_ack_async_event.3 && \
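
A minimal sketch of that remove-then-link idea in plain shell, outside of
automake. The relink helper and the scratch directory are illustrative
assumptions, not part of libibverbs; the point is only that "rm -f" succeeds
whether or not the link exists, so the step can be repeated on reinstall.

```shell
#!/bin/sh
# Hypothetical helper: make NAME a symlink to TARGET inside DIR.
# Safe to run repeatedly, unlike a bare "ln -s".
relink() {
    dir=$1 target=$2 name=$3
    # rm -f returns success even when NAME is absent, so the && chain
    # continues on a first install as well as on a reinstall.
    ( cd "$dir" && rm -f "$name" && ln -s "$target" "$name" )
}

# Demo in a scratch directory standing in for $(DESTDIR)$(mandir)/man3.
demo=$(mktemp -d)
: > "$demo/ibv_get_async_event.3"
relink "$demo" ibv_get_async_event.3 ibv_ack_async_event.3
# Second run: a bare "ln -s" would fail here with "File exists".
relink "$demo" ibv_get_async_event.3 ibv_ack_async_event.3
rm -rf "$demo"
```

In Makefile.am the same pairing would be written with $(RM) and $(LN_S), one
pair per man page alias.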

-- 
MST


From jsquyres at cisco.com  Thu Feb 22 04:49:16 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Thu, 22 Feb 2007 07:49:16 -0500
Subject: [openib-general] libibverbs: can't compile more than once due
 to man3 symbolic links
In-Reply-To: <AFEF87B8-BCC7-4668-82C4-63B539CE9C50@cisco.com>
References: <200702221432.00546.jackm@dev.mellanox.co.il>
	<AFEF87B8-BCC7-4668-82C4-63B539CE9C50@cisco.com>
Message-ID: <37A54161-C56C-42A5-91C1-D5335A3FC546@cisco.com>

Blah -- disregard; I read the mail too quickly and didn't look at the  
actual Makefile.am to see what you were really asking.

FWIW, the "install" app, by default, removes things before copying in  
the new target.  So putting a manual "rm -f" in here, while klunky,  
has precedent and will make it work.


On Feb 22, 2007, at 7:38 AM, Jeff Squyres wrote:

> Is there a reason not to use
>
> man_MANS = ibv_get_async_event.3 ....
>
> ?
>
>
> On Feb 22, 2007, at 7:32 AM, Jack Morgenstein wrote:
>
>> The code below was just added to libibverbs/Makefile.am
>>
>> install-data-hook:
>>         cd $(DESTDIR)$(mandir)/man3 && \
>>         $(LN_S) ibv_get_async_event.3 ibv_ack_async_event.3 && \
>> 	....
>>
>> This creates a problem when re-compiling/re-installing libibverbs --
>> the "ln -s" ( = $(LN_S) ) fails because the symbolic links still exist
>> in the man/man3 directory.
>>
>> I rummaged around the libtool documentation, and there is no pre-defined
>> macro which does "ln -fs" (which would just overwrite the current links).
>>
>> Any ideas on how to fix this problem cleanly (i.e., without violating spirit
>> of libtool/automake)?
>>
>> - Jack
>
>
> -- 
> Jeff Squyres
> Server Virtualization Business Unit
> Cisco Systems
>


-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


From jackm at dev.mellanox.co.il  Thu Feb 22 04:57:17 2007
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Thu, 22 Feb 2007 14:57:17 +0200
Subject: [openib-general] [PATCH] libibverbs: can't compile more than once
 due to man3 symbolic links
In-Reply-To: <200702221432.00546.jackm@dev.mellanox.co.il>
References: <200702221432.00546.jackm@dev.mellanox.co.il>
Message-ID: <200702221457.17675.jackm@dev.mellanox.co.il>

The following patch removes manpage symbolic links so that
they may be relinked in the install.

Suggested by Michael Tsirkin.

Signed-off-by: Jack Morgenstein <jackm at mellanox.co.il>

diff --git a/Makefile.am b/Makefile.am
index 5d2383e..455041e 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -70,6 +70,18 @@ dist-hook: libibverbs.spec
 
 install-data-hook:
 	cd $(DESTDIR)$(mandir)/man3 && \
+	$(RM) ibv_ack_async_event.3 && \
+	$(RM) ibv_ack_cq_events.3 && \
+	$(RM) ibv_close_device.3 && \
+	$(RM) ibv_dealloc_pd.3 && \
+	$(RM) ibv_dereg_mr.3 && \
+	$(RM) ibv_destroy_ah.3 && \
+	$(RM) ibv_destroy_comp_channel.3 && \
+	$(RM) ibv_destroy_cq.3 && \
+	$(RM) ibv_destroy_qp.3 && \
+	$(RM) ibv_destroy_srq.3 && \
+	$(RM) ibv_detach_mcast.3 && \
+	$(RM) ibv_free_device_list.3 && \
 	$(LN_S) ibv_get_async_event.3 ibv_ack_async_event.3 && \
 	$(LN_S) ibv_get_cq_event.3 ibv_ack_cq_events.3 && \
 	$(LN_S) ibv_open_device.3 ibv_close_device.3 && \


From swise at opengridcomputing.com  Thu Feb 22 06:13:17 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 22 Feb 2007 08:13:17 -0600
Subject: [openib-general] [PATCH 0/7] cxgb3 - Chelsio T3 1G/10G driver
	updates
In-Reply-To: <45DD8559.7090106@chelsio.com>
References: <45DD8559.7090106@chelsio.com>
Message-ID: <1172153597.23995.9.camel@stevo-desktop>

Divy,

Do these need to be pulled into OFED 1.2 as well?

Steve.


On Thu, 2007-02-22 at 03:58 -0800, Divy Le Ray wrote:
> Jeff,
> 
> I'm sending a series of incremental patches updating
> the cxgb3 driver. These patches are built against Linus' git tree.
> 
> Cheers,
> Divy
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


From halr at voltaire.com  Thu Feb 22 06:45:02 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 22 Feb 2007 09:45:02 -0500
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to
 work with partial membership pkey
In-Reply-To: <45DD4630.3070101@voltaire.com>
References: <Pine.LNX.4.64.0702190838050.26497@zuben>
	<1171984010.4380.304008.camel@hal.voltaire.com>
	<45DB15F5.4090406@voltaire.com>
	<1171986159.4380.306117.camel@hal.voltaire.com>
	<45DBEA1F.5090901@voltaire.com>
	<1172058368.4380.379947.camel@hal.voltaire.com>
	<45DC3C96.8040100@voltaire.com>
	<1172064021.4380.385825.camel@hal.voltaire.com>
	<15ddcffd0702211245w2686b97bhcaf7e86aaa3dedf5@mail.gmail.com>
	<1172096957.4380.418140.camel@hal.voltaire.com>
	<45DD4630.3070101@voltaire.com>
Message-ID: <1172155499.4380.475947.camel@hal.voltaire.com>

On Thu, 2007-02-22 at 02:28, Or Gerlitz wrote:
> Hal Rosenstock wrote:
> > On Wed, 2007-02-21 at 15:45, Or Gerlitz wrote:
> >> On 21 Feb 2007 08:20:23 -0500, Hal Rosenstock <halr at voltaire.com> wrote:
> 
> >> If the IPoIB spec does not allow both partial and full members of a
> >> partition to share a broadcast domain (eg the IPv4 broadcast group
> >> associated with the full membership pkey) or any other multicast
> >> group, burn it (or at least the relevant section).
> 
> > I was referring to the IB spec, not an IPoIB RFC.
> 
> Can you provide a pointer?

See MCMemberRecord:P_Key description in table 210 (p. 908).

> >> The OpenIB code is supposed to work and, as was done with the RDMA CM header,
> >> the implementation should not wait for the spec to be written or changed.
> 
> > Really ? Maybe I'm mistaken but I didn't think that OpenIB/OpenFabrics
> > wanted to issue code which is not IBA spec compliant.
> 
> The code resides in the Linux kernel, period. Linux is not under the 
> control of this or that organization, period, period. Linux uses a 
> hierarchical maintainership structure where Roland, Sean and yourself are 
> listed as the maintainers, which means you are able to promote and/or 
> block this or that agenda, go for it!

OpenIB claims IBA compliance (currently mostly v1.2). Is there any
good reason that we shouldn't continue to adhere to this?

-- Hal

> Or.


From halr at voltaire.com  Thu Feb 22 06:45:56 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 22 Feb 2007 09:45:56 -0500
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to
 work with partial membership pkey
In-Reply-To: <45DD4E7E.50009@voltaire.com>
References: <15ddcffd0702211234j36b00a99i944e77ee0837d8c3@mail.gmail.com>
	<000301c755f8$d2f265e0$8698070a@amr.corp.intel.com>
	<15ddcffd0702211250u49ceaa6bj4d607f9cfe802cdc@mail.gmail.com>
	<45DCC968.30208@ichips.intel.com> <45DD4E7E.50009@voltaire.com>
Message-ID: <1172155552.4380.475949.camel@hal.voltaire.com>

On Thu, 2007-02-22 at 03:04, Or Gerlitz wrote:
> Sean Hefty wrote:
> >> Note that since the HCA validates the pkey in the incoming packet, no
> >> matter what the IB SW would do, partial members of a partition can't
> >> talk to each other. So the approach taken by the core/ipoib code was
> >> to just ignore the MSb in places where the code looks for the pkey
> >> --index-- and use the full member pkey when forming MGIDs. This seems
> >> fine to me.
> 
> > My concern is that ib_find_cached_pkey() returns an index to a pkey that wasn't 
> > the one in the search.  Can this lead to a QP being configured in such a way 
> > that communication with a remote QP would silently fail?
> 
> My understanding is that when an IPoIB broadcast domain contains both 
> partial and full members (*) attempts to communicate between two partial 
> members would silently fail,

An IB multicast group _cannot_ have partial members so this never should
get far enough to where two limited members would be unable to
communicate.

-- Hal

> is this silence something you think we 
> should work to change?
> 
> (*) e.g., when you have a bunch of clients and a server or a bunch of servers 
> and you don't want to allow --clients-- to communicate among themselves)
> 
> > I'm not against this patch, but I want to make sure that I understand the 
> > issues, so we're not creating a work-around solution.  The patch is against the 
> > librdmacm, yet there's nothing that I see in the librdmacm that makes me think 
> > it's behaving incorrectly.
> 
> My thinking is that if at the end of this thread we are willing to move 
> forward without changing ib_find_cached_pkey() then this patch should be 
> merged.
> 
> Or.


From ogerlitz at voltaire.com  Thu Feb 22 07:07:13 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 22 Feb 2007 17:07:13 +0200 (IST)
Subject: [openib-general] librdmacm examples not working under OFED 1.2 alpha
Message-ID: <Pine.LNX.4.64.0702221633570.16476@zuben>

I have tested RH4 U4 and to some extent also RH5 beta and see the following:

under RH4 U4
============

- rping: addr and route resolution passing, client getting reject on conn req

- udaddy: working fine on both UDP and IPOIB port spaces

- mckey: not applicable on RH4 U4 till my patch with ip_ib_mc_map is merged

with both udaddy and rping, librdmacm reports:

	librdmacm: couldn't read ABI version.
	librdmacm: assuming: 4

under RH5
=========

basically, the same: rping does not work, udaddy works on both port spaces.
Also was able to check mckey and it works fine on both port spaces.
The ABI error print is not seen.

The rping client/server logs are below,

Or.

rping client
============

root at Adi6 ~]# rping -c -v -d -a 193.168.80.175
ipaddr (193.168.80.175)
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
created cm_id 0x505f10
cma_event type 0 cma_id 0x505f10 (parent)
cma_event type 2 cma_id 0x505f10 (parent)
rdma_resolve_addr - rdma_resolve_route successful
created pd 0x507830
created channel 0x506260
created cq 0x507880
created qp 0x507990
rping_setup_buffers called on cb 0x505010
allocated & registered buffers...
cq_thread started.
cq completion failed status 5
wait for CONNECTED state 10
connect error -1
rping_free_buffers called on cb 0x505010
cma_event type 8 cma_id 0x505f10 (parent)
cma event 8, error 0

rping server
===========
root at Adi5 ~]# rping -s -d -v -S 100 -C 100
verbose
size 100
count 100
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
created cm_id 0x505f00
rdma_bind_addr successful
rdma_listen


From swise at opengridcomputing.com  Thu Feb 22 07:12:17 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 22 Feb 2007 09:12:17 -0600
Subject: [openib-general] librdmacm examples not working under OFED 1.2
 alpha
In-Reply-To: <Pine.LNX.4.64.0702221633570.16476@zuben>
References: <Pine.LNX.4.64.0702221633570.16476@zuben>
Message-ID: <1172157137.26393.3.camel@stevo-desktop>

What device?

On Thu, 2007-02-22 at 17:07 +0200, Or Gerlitz wrote:
> I have tested RH4 U4 and to some extent also RH5 beta and see the following:
> 
> under RH4 U4
> ============
> 
> - rping: addr and route resolution passing, client getting reject on conn req
> 
> - udaddy: working fine on both UDP and IPOIB port spaces
> 
> - mckey: not applicable on RH4 U4 till my patch with ip_ib_mc_map is merged
> 
> with both udaddy and rping, librdmacm reports:
> 
> 	librdmacm: couldn't read ABI version.
> 	librdmacm: assuming: 4
> 
> under RH5
> =========
> 
> basically, the same: rping does not work, udaddy works on both port spaces.
> Also was able to check mckey and it works fine on both port spaces.
> The ABI error print is not seen.
> 
> The rping client/server logs are below,
> 
> Or.
> 
> rping client
> ============
> 
> root at Adi6 ~]# rping -c -v -d -a 193.168.80.175
> ipaddr (193.168.80.175)
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> created cm_id 0x505f10
> cma_event type 0 cma_id 0x505f10 (parent)
> cma_event type 2 cma_id 0x505f10 (parent)
> rdma_resolve_addr - rdma_resolve_route successful
> created pd 0x507830
> created channel 0x506260
> created cq 0x507880
> created qp 0x507990
> rping_setup_buffers called on cb 0x505010
> allocated & registered buffers...
> cq_thread started.
> cq completion failed status 5
> wait for CONNECTED state 10
> connect error -1
> rping_free_buffers called on cb 0x505010
> cma_event type 8 cma_id 0x505f10 (parent)
> cma event 8, error 0
> 
> rping server
> ===========
> root at Adi5 ~]# rping -s -d -v -S 100 -C 100
> verbose
> size 100
> count 100
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> created cm_id 0x505f00
> rdma_bind_addr successful
> rdma_listen


From ogerlitz at voltaire.com  Thu Feb 22 07:13:21 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 22 Feb 2007 17:13:21 +0200
Subject: [openib-general] librdmacm examples not working under OFED 1.2
 alpha
In-Reply-To: <1172157137.26393.3.camel@stevo-desktop>
References: <Pine.LNX.4.64.0702221633570.16476@zuben>
	<1172157137.26393.3.camel@stevo-desktop>
Message-ID: <45DDB311.30802@voltaire.com>

Steve Wise wrote:
> What device?

mthca


From mst at mellanox.co.il  Thu Feb 22 07:16:06 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 22 Feb 2007 17:16:06 +0200
Subject: [openib-general] librdmacm examples not working under OFED 1.2
	alpha
In-Reply-To: <Pine.LNX.4.64.0702221633570.16476@zuben>
References: <Pine.LNX.4.64.0702221633570.16476@zuben>
Message-ID: <20070222151606.GD29559@mellanox.co.il>

> 
> 	librdmacm: couldn't read ABI version.
> 	librdmacm: assuming: 4

I think there was a kernel patch from Woody to address this.
Woody?

-- 
MST


From vlad at dev.mellanox.co.il  Thu Feb 22 08:13:28 2007
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Thu, 22 Feb 2007 18:13:28 +0200
Subject: [openib-general] OFED-1.2-20070221-1741.tgz package is available
In-Reply-To: <1172097621.5994.13.camel@trinity.ogc.int>
References: <1172080605.5256.35.camel@vladsk-laptop>
	<1172080948.27101.15.camel@stevo-desktop>
	<1172081761.5256.45.camel@vladsk-laptop>
	<1172097621.5994.13.camel@trinity.ogc.int>
Message-ID: <1172160808.29968.2.camel@vladsk-laptop>

On Wed, 2007-02-21 at 16:40 -0600, Tom Tucker wrote:
> > Bug: 355 (problems building modules that depend on OFED 1.2 modules)
> > 
> > In order to build kernel modules depending on OFED's modules you need to
> > take Modules.symvers file from <prefix>/src/openib/Modules.symvers (part
> > of kernel-ib-devel RPM) and copy this to modules subdir and then compile
> > your module.
> 
> Won't this blow away all the version information for the non-IB symbols?
> 

See Documentation/kbuild/modules.txt (under kernel sources):

--- 7.3 Symbols from another external module
	...
        Use an extra Module.symvers file
                When an external module is built, a Module.symvers file is
                generated containing all exported symbols which are not
                defined in the kernel.
                To get access to symbols from module 'bar', one can copy the
                Module.symvers file from the compilation of the 'bar' module
                to the directory where the 'foo' module is built.
                During the module build, kbuild will read the Module.symvers
                file in the directory of the external module and when the
                build is finished, a new Module.symvers file is created
                containing the sum of all symbols defined and not part of the
                kernel.
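
A rough shell sketch of the copy step described above, for illustration only.
The stage_symvers helper is hypothetical, and the paths in the usage comment
are placeholders rather than confirmed OFED install locations; the make
invocation is the usual kbuild form for an external module.

```shell
#!/bin/sh
# Hypothetical helper: stage another module's symbol-version file next to
# the sources of the external module that is about to be built.
#   $1 = symvers file produced by the exporting build
#   $2 = directory of the external module
stage_symvers() {
    src=$1 moddir=$2
    [ -f "$src" ] || { echo "missing $src" >&2; return 1; }
    # kbuild looks for a file named Module.symvers in the module directory.
    cp "$src" "$moddir/Module.symvers"
}

# Assumed usage (placeholder paths):
#   stage_symvers "$OFED_PREFIX/src/openib/Module.symvers" "$PWD" &&
#   make -C "/lib/modules/$(uname -r)/build" M="$PWD" modules
```

Per the text quoted above, the copied file carries only the symbols exported
by the external modules, and kbuild writes a fresh Module.symvers when the
build finishes.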


-- 
Vladimir Sokolovsky <vlad at dev.mellanox.co.il>
Mellanox Technologies Ltd.


From mplee at sandia.gov  Thu Feb 22 08:44:25 2007
From: mplee at sandia.gov (Lee, Michael Paichi)
Date: Thu, 22 Feb 2007 09:44:25 -0700
Subject: [openib-general] Address List Change Now Scheduled for Wednesday,
	2/28/2007
In-Reply-To: <F65CFD32-1CD6-49AD-A91D-38CBED4236C6@cisco.com>
References: <924EA79E-8FE2-49A5-85AB-84B7749D535C@cisco.com>
	<20070221153402.GB17761@mellanox.co.il>
	<F65CFD32-1CD6-49AD-A91D-38CBED4236C6@cisco.com>
Message-ID: <3D84A59A1AD3584DA02AEAD240E8863F03BC471E@ES22SNLNT.srn.sandia.gov>

The list will now be migrated on Wednesday, 2/28/2007.

List address:         general at lists.openfabrics.org
Updated change-date:  Wednesday, 2/28/2007

Michael


-----Original Message-----
From: Jeff Squyres [mailto:jsquyres at cisco.com] 
Sent: Wednesday, February 21, 2007 8:09 AM
To: Michael S. Tsirkin
Cc: OpenFabrics General; Lee, Michael Paichi
Subject: Re: Address List Change for Friday, 2/23/2007

On Feb 21, 2007, at 10:34 AM, Michael S. Tsirkin wrote:

>> Can you look at the other lists that have migrated for examples?
>> (e.g., ewg)
>
> If I look at other lists, there's no guarantee the rule will catch the
> actual message.

Can't you just paste in the new address of the list in your existing
rules?  I must be missing something.

>> It may be complex to send an actual example message *before* the list
>> moves.
>
> In this case, maybe the migration can be done in the middle of the 
> week?

I'll let Michael Lee answer; we're currently driving off his goodwill
and his schedule.

I guess I didn't see why this was complex -- if a few mails get
misplaced over the weekend because cutting-n-pasting the new e-mail
address into existing rules somehow didn't work, is there a huge
problem?

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


From sweitzen at cisco.com  Thu Feb 22 09:36:07 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Thu, 22 Feb 2007 09:36:07 -0800
Subject: [openib-general] [ewg] anyone have OFED 1.2 alpha1 compiling on
	ppc64
In-Reply-To: <1172145850.4380.466231.camel@hal.voltaire.com>
References: <A15335FBE9BD2449AF2C9EF3D1EB8EA3030AD952@xmb-sjc-216.amer.cisco.com>
	<1172145850.4380.466231.camel@hal.voltaire.com>
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA3030ADA60@xmb-sjc-216.amer.cisco.com>

> That missing header (common.h) is in libibcommon. Somehow, libibcommon
> is not installed. libibumad depends on libibcommon. Is this a
> build/install script issue with OFED 1.2 ? Vlad ?
> 
> -- Hal

I tried install.sh again, this time telling it to build libibcommon
instead of relying on dependencies, and get this:

+ install -m 0755 /var/tmp/OFED/usr/local/ofed/bin32/mread /var/tmp/OFED/usr/local/ofed/bin
install: cannot stat `/var/tmp/OFED/usr/local/ofed/bin32/mread': No such file or directory
I believe mread has been renamed to mstmread.

# ls /var/tmp/OFED/usr/local/ofed/bin32
mstflint  mstmread  mstmwrite  mstregdump  mstvpd

Scott


From sean.hefty at intel.com  Thu Feb 22 09:47:42 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Thu, 22 Feb 2007 09:47:42 -0800
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to
 work with partial membership pkey
In-Reply-To: <45DD4E7E.50009@voltaire.com>
Message-ID: <000201c756a9$8d650c60$8698070a@amr.corp.intel.com>

>My understanding is that when an IPoIB broadcast domain contains both
>partial and full members (*) attempts to communicate between two partial
>members would silently fail. Is this silence something you think we
>should work to change?

I'm looking at this from a different view than just ipoib multicast groups.  For
example, can two users of the ib_cm successfully establish a connection, but not
actually be able to transfer data between each other?  This seems possible,
though unlikely.  This is the type of silent failure I'm referring to.

Without this patch, two clients that try to connect using the librdmacm will
fail.  That failure is reported to the user.  With this patch, the connection
would be created, but I don't think that it guarantees that communication can
actually occur.  I don't want to mask a configuration issue.
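
To make the two checks concrete, here is a small shell sketch. It assumes the
usual P_Key layout (bit 15 is the membership bit, bits 0-14 name the
partition) and the rule mentioned earlier in the thread that two limited
members cannot talk to each other. The function names are made up for
illustration and are not kernel or librdmacm APIs.

```shell
#!/bin/sh
# Lookup-style match: same partition, membership bit ignored.
pkey_same_partition() {
    [ $(( $1 & 0x7fff )) -eq $(( $2 & 0x7fff )) ]
}

# Packet-level check: same partition AND at least one side is a full member.
pkey_can_communicate() {
    pkey_same_partition "$1" "$2" && [ $(( ($1 | $2) & 0x8000 )) -ne 0 ]
}

# A limited key (0x7fff) matches a full key (0xffff) at lookup time ...
pkey_same_partition 0x7fff 0xffff && echo "lookup: match"
# ... but two limited members of that partition still cannot exchange data.
pkey_can_communicate 0x7fff 0x7fff || echo "limited-to-limited: dropped"
```

The gap between the first check and the second is where a connection could be
set up and then fail to carry traffic.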

>My thinking is that if at the end of this thread we are willing to move
>forward without changing ib_find_cached_pkey() then this patch should be
>merged.

I'm still unsure about where the cause of this problem lies.  It may be that the
kernel rdma_cm or rdma_ucm needs to change if we decide the ib_find_cached_pkey
is correct.

- Sean


From sean.hefty at intel.com  Thu Feb 22 09:56:04 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Thu, 22 Feb 2007 09:56:04 -0800
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to
 work with partial membership pkey
In-Reply-To: <1172155552.4380.475949.camel@hal.voltaire.com>
Message-ID: <000301c756aa$b89e0020$8698070a@amr.corp.intel.com>

>An IB multicast group _cannot_ have partial members so this never should
>get far enough to where two limited members would be unable to
>communicate.

Can someone help my understanding here?  Is ipoib joining a multicast group
using the full membership PKey, even if the node that it joins from only has the
limited membership PKey configured?  And the code in ib_find_cached_pkey helps
enable this?

- Sean


From jackm at dev.mellanox.co.il  Thu Feb 22 10:16:43 2007
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Thu, 22 Feb 2007 20:16:43 +0200
Subject: [openib-general] [PATCH v2] libibverbs: can't compile more than
 once due to man3 symbolic links
In-Reply-To: <200702221457.17675.jackm@dev.mellanox.co.il>
References: <200702221432.00546.jackm@dev.mellanox.co.il>
	<200702221457.17675.jackm@dev.mellanox.co.il>
Message-ID: <200702222016.43559.jackm@dev.mellanox.co.il>

Missed 2 lines in the patch.

Below is the correct patch:

---
The following patch removes manpage symbolic links so that
they may be relinked in the install.

Suggested by Michael Tsirkin.

Signed-off-by: Jack Morgenstein <jackm at mellanox.co.il>

diff --git a/Makefile.am b/Makefile.am
index 5d2383e..705b184 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -70,6 +70,20 @@ dist-hook: libibverbs.spec
 
 install-data-hook:
 	cd $(DESTDIR)$(mandir)/man3 && \
+	$(RM) ibv_ack_async_event.3 && \
+	$(RM) ibv_ack_cq_events.3 && \
+	$(RM) ibv_close_device.3 && \
+	$(RM) ibv_dealloc_pd.3 && \
+	$(RM) ibv_dereg_mr.3 && \
+	$(RM) ibv_destroy_ah.3 && \
+	$(RM) ibv_destroy_comp_channel.3 && \
+	$(RM) ibv_destroy_cq.3 && \
+	$(RM) ibv_destroy_qp.3 && \
+	$(RM) ibv_destroy_srq.3 && \
+	$(RM) ibv_detach_mcast.3 && \
+	$(RM) ibv_free_device_list.3 && \
+	$(RM) ibv_init_ah_from_wc.3 && \
+	$(RM) mult_to_ibv_rate.3 && \
 	$(LN_S) ibv_get_async_event.3 ibv_ack_async_event.3 && \
 	$(LN_S) ibv_get_cq_event.3 ibv_ack_cq_events.3 && \
 	$(LN_S) ibv_open_device.3 ibv_close_device.3 && \
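
The same remove-then-link idea, sketched in C for anyone doing this outside of
make.  relink() is a hypothetical helper, not part of libibverbs; like $(RM)
(normally "rm -f"), it treats a missing link as success:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Create (or recreate) a symbolic link at link_path pointing to target.
 * Removing any existing entry first keeps a repeated install from failing
 * with EEXIST.  Returns 0 on success, -1 on error with errno set.
 */
int relink(const char *target, const char *link_path)
{
    assert(target != NULL && link_path != NULL);

    /* A missing link is fine: the first install has nothing to remove. */
    if (unlink(link_path) != 0 && errno != ENOENT)
        return -1;

    return symlink(target, link_path);
}
```

Calling relink() twice with the same arguments succeeds both times, which is
the behaviour the install hook needs.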


From Jesse.Butler at Sun.COM  Thu Feb 22 10:27:12 2007
From: Jesse.Butler at Sun.COM (Jesse Butler)
Date: Thu, 22 Feb 2007 13:27:12 -0500
Subject: [openib-general] mthca adjust_key()
Message-ID: <45DDE080.60507@sun.com>


Could anyone tell me why this routine in mthca is necessary?  There 
aren't any comments to explain it; I'm wondering if this is a workaround 
for Sinai of some kind?

static inline u32 adjust_key(struct mthca_dev *dev, u32 key)
{
    if (dev->mthca_flags & MTHCA_FLAG_SINAI_OPT)
        return ((key << 20) & 0x800000) | (key & 0x7fffff);
    else
        return key;
}

Thanks in advance,
Jesse


From rdreier at cisco.com  Thu Feb 22 10:30:00 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 22 Feb 2007 10:30:00 -0800
Subject: [openib-general] [PATCH v2] libibverbs: can't compile more than
 once due to man3 symbolic links
In-Reply-To: <200702222016.43559.jackm@dev.mellanox.co.il> (Jack
	Morgenstein's message of "Thu, 22 Feb 2007 20:16:43 +0200")
References: <200702221432.00546.jackm@dev.mellanox.co.il>
	<200702221457.17675.jackm@dev.mellanox.co.il>
	<200702222016.43559.jackm@dev.mellanox.co.il>
Message-ID: <ada8xeq11w7.fsf@cisco.com>

Thanks, I applied this and pushed it out.


From rdreier at cisco.com  Thu Feb 22 10:34:16 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 22 Feb 2007 10:34:16 -0800
Subject: [openib-general] [RFC/BUG] DMA vs. CQ race
In-Reply-To: <20070222012111.GB3352@sgi.com> (akepner@sgi.com's message
	of "Wed, 21 Feb 2007 17:21:11 -0800")
References: <20070222012111.GB3352@sgi.com>
Message-ID: <ada4ppe11p3.fsf@cisco.com>

 > A first-cut at a patch was sent out, some very reasonable
 > objections were raised, and the thread fizzled out.

Sorry, I meant to respond again, but I never got around to it.

 > The biggest concern with the earlier patch seemed to be
 > backward compatibility. There was a stab at addressing
 > that in http://tinyurl.com/2x3s52, but no commentary.
 > (Too ugly for words?)

I think you went off into the weeds there, but I'll respond to that
earlier email in detail.

 > Any suggestions as to how to proceed? Should I just code
 > something up in order to have a concrete target to discuss?
 > Or are there any new thoughts based on the previous emails?

I actually have a vague plan for a somewhat cleaner way to get this
fix.  For a variety of reasons, I am planning on changing the way the
kernel handles memory registration so that low-level drivers have more
control over what happens.  This would allow us to follow Gleb's
suggestion to use register MR to create and map the kernel's buffer
and avoid some of the error path ugliness.  So I would prefer to map
the coherent memory that way.

However this will take a while to come to fruition, since it is kind
of a background task for me.  How severe is this issue?  In other
words, when you produced the problem, was it a synthetic test, or a
workload that someone might actually want to run?

 - R.


From rdreier at cisco.com  Thu Feb 22 10:40:35 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 22 Feb 2007 10:40:35 -0800
Subject: [openib-general] mthca adjust_key()
In-Reply-To: <45DDE080.60507@sun.com> (Jesse Butler's message of
	"Thu, 22 Feb 2007 13:27:12 -0500")
References: <45DDE080.60507@sun.com>
Message-ID: <adaps82yr18.fsf@cisco.com>

 > Could anyone tell me why this routine in mthca is necessary?  There 
 > aren't any comments to explain it; I'm wondering if this is a workaround 
 > for Sinai of some kind?
 > 
 > static inline u32 adjust_key(struct mthca_dev *dev, u32 key)
 > {
 >     if (dev->mthca_flags & MTHCA_FLAG_SINAI_OPT)
 >         return ((key << 20) & 0x800000) | (key & 0x7fffff);
 >     else
 >         return key;
 > }

It's a performance optimization for Sinai.
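
What the expression does at the bit level can be checked in isolation: it
keeps bits 0..22 of the key, copies bit 3 into bit 23, and clears bits 24..31.
A standalone sketch (adjust_key_sinai is a stand-in name for the Sinai branch
only; why this layout helps the hardware is not covered here):

```c
#include <assert.h>
#include <stdint.h>

#define KEY_LOW_MASK 0x7fffffu  /* bits 0..22, passed through unchanged */
#define KEY_BIT23    0x800000u  /* bit 23, filled from bit 3 of the input */

/* Same arithmetic as the MTHCA_FLAG_SINAI_OPT branch of adjust_key(). */
uint32_t adjust_key_sinai(uint32_t key)
{
    uint32_t out = ((key << 20) & KEY_BIT23) | (key & KEY_LOW_MASK);

    /* Both masks lie below bit 24, so the top byte is always zero. */
    assert((out >> 24) == 0);
    return out;
}
```

For example, an input of 0x8 (only bit 3 set) comes out as 0x800008, while an
input of 0x7 comes out unchanged.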

 - R.


From akepner at sgi.com  Thu Feb 22 10:32:08 2007
From: akepner at sgi.com (akepner at sgi.com)
Date: Thu, 22 Feb 2007 10:32:08 -0800
Subject: [openib-general] [RFC/BUG] DMA vs. CQ race
In-Reply-To: <ada4ppe11p3.fsf@cisco.com>
References: <20070222012111.GB3352@sgi.com> <ada4ppe11p3.fsf@cisco.com>
Message-ID: <20070222183208.GC3352@sgi.com>

On Thu, Feb 22, 2007 at 10:34:16AM -0800, Roland Dreier wrote:
> 
> I actually have a vague plan for a somewhat cleaner way to get this
> fix.  For a variety of reasons, I am planning on changing the way the
> kernel handles memory registration so that low-level drivers have more
> control over what happens.  This would allow us to follow Gleb's
> suggestion to use register MR to create and map the kernel's buffer
> and avoid some of the error path ugliness.  So I would prefer to map
> the coherent memory that way.

OK, I look forward to seeing what you have in mind.

> 
> However this will take a while to come to fruition, since it is kind
> of a background task for me.  How severe is this issue?  In other
> words, when you produced the problem, was it a synthetic test, or a
> workload that someone might actually want to run?
> 

We found this accidentally, running a normal MPI job, on a 
"normally sized" machine (i.e., tens, not hundreds of 
processors.) It appears to be more easily produced than 
we'd expected, and we consider it to be a severe problem.

-- 
Arthur


From rdreier at cisco.com  Thu Feb 22 11:07:05 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 22 Feb 2007 11:07:05 -0800
Subject: [openib-general] [RFC/BUG] libibverbs: DMA vs. CQ race
In-Reply-To: <Pine.LNX.4.61.0702021326050.26058@localhost.localdomain> (
	akepner@sgi.com's message of "Fri, 2 Feb 2007 13:34:15 -0800 (PST)")
References: <Pine.LNX.4.61.0612131626250.24974@localhost.localdomain>
	<ada8xhaq5ze.fsf@cisco.com>
	<Pine.LNX.4.61.0701281444230.32767@localhost.localdomain>
	<adabqkhy04v.fsf@cisco.com>
	<Pine.LNX.4.61.0702021326050.26058@localhost.localdomain>
Message-ID: <adamz36ypt2.fsf@cisco.com>

 > Assuming that something along the lines of the previous patch
 > is used, we need to address userspace/kernel compatibility.
 > 
 > The existing abi versioning doesn't seem to be exactly what
 > we want to use, though, because we want to change a verb's
 > semantics to work around a bug. (Changing the abi_version
 > may be an inevitable result, though.)
 > 
 > How about adding "semantic flags" to the mthca_* commands
 > (mthca_create_cq, etc.)? Userspace could read the contents of
 > a new sysfs file which, if found, would indicate the flags
 > that the kernel understands. Then it could pass the flags, if
 > it chooses, to get the kernel to use the desired semantics.

This is not really the design philosophy that we've used in defining
the user-kernel interfaces for IB verbs.  Rather than having
complexity in the kernel to handle both old and new ways of doing
things, the approach we've used for cases like this is the following:

 - specify new fixed ABI (in this case, mthca abi_version 2)
 - update library to handle old and new ABI (in this case, update
   libmthca to use mthca kernel abi 1 or 2 depending on what it
   detects at runtime)
 - update kernel to implement new ABI, and remove old ABI from kernel
   (in this case, update kernel mthca driver to abi_version 2)

The net effect of this is that updated userspace works fine with any
kernel, but updating the kernel will require updating userspace
libraries too.  However the important point is that once userspace is
updated, it's still possible to boot into old kernels and have things
work without downgrading userspace.
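
To illustrate the library side of this scheme, here is a minimal sketch of
runtime ABI selection.  It assumes the library has already read the kernel's
abi_version by some means; struct abi_ops and select_abi_ops are hypothetical
names, not libmthca code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-ABI operations table; not taken from libmthca. */
struct abi_ops {
    int abi_version;
    const char *(*describe)(void);
};

static const char *describe_v1(void) { return "old kernel interface (ABI 1)"; }
static const char *describe_v2(void) { return "new kernel interface (ABI 2)"; }

/* The library carries code for every ABI it knows how to speak. */
static const struct abi_ops supported_abis[] = {
    { 1, describe_v1 },
    { 2, describe_v2 },
};

/* Return the ops for the ABI the running kernel reports, or NULL if unknown. */
const struct abi_ops *select_abi_ops(int kernel_abi)
{
    size_t n = sizeof(supported_abis) / sizeof(supported_abis[0]);
    size_t i;

    for (i = 0; i < n; ++i) {
        if (supported_abis[i].abi_version == kernel_abi) {
            assert(supported_abis[i].describe != NULL);
            return &supported_abis[i];
        }
    }
    return NULL;  /* kernel is newer than this library: refuse to load */
}
```

All of the old/new handling lives in the library; the kernel only ever
implements one ABI, which is why an updated library keeps working when booting
back into an older kernel.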

If we really wanted to export some flags from mthca back to libmthca,
I guess it would be possible to bump the abi version and add a flags
field to the response to the alloc_ucontext command, but in this case
I don't see a reason to worry about it.

 - R.


From bugzilla-daemon at lists.openfabrics.org  Thu Feb 22 11:10:00 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Thu, 22 Feb 2007 11:10:00 -0800 (PST)
Subject: [openib-general] [Bug 384] New: OFED 1.2 alpha1 ib-bonding won't
 compile on RHEL4 U3 ppc64
Message-ID: <bug-384-1@https.bugs.openfabrics.org/>

https://bugs.openfabrics.org/show_bug.cgi?id=384

           Summary: OFED 1.2 alpha1 ib-bonding won't compile on RHEL4 U3
                    ppc64
           Product: OpenFabrics Linux
           Version: 1.2alpha1
          Platform: PPC64
        OS/Version: RHEL 4
            Status: NEW
          Severity: normal
          Priority: P3
         Component: IPoIB
        AssignedTo: bugzilla at openib.org
        ReportedBy: sweitzen at cisco.com


+ cd linux/drivers/net/bonding/
++ pwd
+ make -C /lib/modules/2.6.9-34.EL/build M=/var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding
make: Entering directory `/usr/src/kernels/2.6.9-34.EL-ppc64'
  LD      /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/built-in.o
  CC [M]  /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.o
In file included from /var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bond_main.c:78:
/var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h: In function `bond_set_slave_inactive_flags':
/var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h:260: error: `IFF_SLAVE_INACTIVE' undeclared (first use in this function)
/var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h:260: error: (Each undeclared identifier is reported only once
/var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h:260: error: for each function it appears in.)
/var/tmp/OFEDRPM/BUILD/ib-bonding-0.9.0/linux/drivers/net/bonding/bonding.h:262: error: `IFF_SLAVE_NEEDARP' undeclared (first use in this function)
....


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From rdreier at cisco.com  Thu Feb 22 11:10:27 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 22 Feb 2007 11:10:27 -0800
Subject: [openib-general] [RFC/BUG] DMA vs. CQ race
In-Reply-To: <20070222183208.GC3352@sgi.com> (akepner@sgi.com's message
	of "Thu, 22 Feb 2007 10:32:08 -0800")
References: <20070222012111.GB3352@sgi.com> <ada4ppe11p3.fsf@cisco.com>
	<20070222183208.GC3352@sgi.com>
Message-ID: <adabqjmypng.fsf@cisco.com>

 > We found this accidentally, running a normal MPI job, on a 
 > "normally sized" machine (i.e., tens, not hundreds of 
 > processors.) It appears to be more easily produced than 
 > we'd expected, and we consider it to be a severe problem.

Hmm, OK.  Then I will do my best to make sure we get a fix for this
into 2.6.22.

 - R.


From vlad at mellanox.co.il  Thu Feb 22 12:28:16 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Thu, 22 Feb 2007 22:28:16 +0200
Subject: [openib-general] [ewg] anyone have OFED 1.2 alpha1 compiling on
	ppc64
Message-ID: <6C2C79E72C305246B504CBA17B5500C922B36D@mtlexch01.mtl.com>

Hi Scott,
Try OFED-1.2-20070221-1741. This issue was fixed in this package.

Regards,
Vladimir

-----Original Message-----
From: Scott Weitzenkamp (sweitzen) [mailto:sweitzen at cisco.com] 
Sent: Thursday, February 22, 2007 7:36 PM
To: Hal Rosenstock
Cc: Openfabrics-ewg at openib.org; OPENIB; Vladimir Sokolovsky
Subject: RE: [ewg] anyone have OFED 1.2 alpha1 compiling on ppc64


> That missing header (common.h) is in libibcommon. Somehow, libibcommon
> is not installed. libibumad depends on libibcommon. Is this a
> build/install script issue with OFED 1.2? Vlad?
> 
> -- Hal

I tried install.sh again, this time telling it to build libibcommon
instead of relying on dependencies, and get this:

+ install -m 0755 /var/tmp/OFED/usr/local/ofed/bin32/mread /var/tmp/OFED/usr/local/ofed/bin
install: cannot stat `/var/tmp/OFED/usr/local/ofed/bin32/mread': No such file or directory

I believe mread has been renamed to mstmread.

# ls /var/tmp/OFED/usr/local/ofed/bin32
mstflint  mstmread  mstmwrite  mstregdump  mstvpd

Scott


From or.gerlitz at gmail.com  Thu Feb 22 13:09:25 2007
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Thu, 22 Feb 2007 23:09:25 +0200
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to
 work with partial membership pkey
In-Reply-To: <000301c756aa$b89e0020$8698070a@amr.corp.intel.com>
References: <1172155552.4380.475949.camel@hal.voltaire.com>
	<000301c756aa$b89e0020$8698070a@amr.corp.intel.com>
Message-ID: <15ddcffd0702221309q4633a36cg8a7bb5ff69d78776@mail.gmail.com>

On 2/22/07, Sean Hefty <sean.hefty at intel.com> wrote:
> >An IB multicast group _cannot_ have partial members so this never should
> >get far enough to where two limited members would be unable to
> >communicate.

> Can someone help my understanding here?  Is ipoib joining a multicast group
> using the full membership PKey, even if the node that it joins from only has the
> limited membership PKey configured? And the code in ib_find_cached_pkey helps
> enable this?

Yep. The ipoib create_child function ORs 0x8000 into the device pkey
that was provided by the user. IPoIB then uses the device pkey when
forming MGIDs and when doing modify QP to INIT. Indeed, the way
ib_find_cached_pkey() is implemented makes the latter use trivial.
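
A rough sketch of the two steps described above, assuming the cached lookup
compares only the low 15 bits of the pkey (which is what lets the index lookup
succeed regardless of the membership bit).  find_pkey_index and to_full_member
are hypothetical names, not the kernel's:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER 0x8000u  /* membership bit */
#define PKEY_BASE_MASK   0x7fffu  /* partition number */

/* Mirrors the "OR in 0x8000" step applied to the user-supplied pkey. */
uint16_t to_full_member(uint16_t pkey)
{
    return (uint16_t)(pkey | PKEY_FULL_MEMBER);
}

/*
 * Return the index of the first table entry in the same partition as pkey,
 * ignoring the membership bit on both sides, or -1 if there is none.
 */
int find_pkey_index(const uint16_t *table, size_t len, uint16_t pkey)
{
    size_t i;

    assert(table != NULL);
    for (i = 0; i < len; ++i)
        if ((table[i] & PKEY_BASE_MASK) == (pkey & PKEY_BASE_MASK))
            return (int)i;
    return -1;
}
```

So a port whose table holds only the limited key 0x0001 still yields an index
when asked for 0x8001, which is the behaviour under discussion in this thread.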

Or.


From or.gerlitz at gmail.com  Thu Feb 22 13:15:36 2007
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Thu, 22 Feb 2007 23:15:36 +0200
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to
 work with partial membership pkey
In-Reply-To: <000201c756a9$8d650c60$8698070a@amr.corp.intel.com>
References: <45DD4E7E.50009@voltaire.com>
	<000201c756a9$8d650c60$8698070a@amr.corp.intel.com>
Message-ID: <15ddcffd0702221315h1da6b16cre9ce95e5c65790bc@mail.gmail.com>

On 2/22/07, Sean Hefty <sean.hefty at intel.com> wrote:
> >My understanding is that when an IPoIB broadcast domain contains both
> >partial and full members (*) attempts to communicate between two partial
> >members would silently fail. Is this silence something you think we
> >should work to change?
>
> I'm looking at this from a different view than just ipoib multicast groups.  For
> example, can two users of the ib_cm successfully establish a connection, but not
> actually be able to transfer data between each other?  This seems possible,
> though unlikely.  This is the type of silent failure I'm referring to.

I don't think this is possible since the active CM uses the pkey index
of the pkey provided in REQ.path to send the REQ mad, same for the
passive CM - it uses the index in its table of REQ.path.pkey. So if
the CMs are able to talk over QP1 using this pkey index the CM
consumers can talk over their RC (REQ) / UD (SIDR REQ) QPs. And both
the CM and its consumers would use the same index - the one returned
from the ib_get_cached_pkey

Or.


From mst at mellanox.co.il  Thu Feb 22 13:14:42 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 22 Feb 2007 23:14:42 +0200
Subject: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small
 message bandwidth
In-Reply-To: <adaejokdlp6.fsf@cisco.com>
References: <20070220181755.GC11825@mellanox.co.il> <adaejokdlp6.fsf@cisco.com>
Message-ID: <20070222211442.GB9143@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth
> 
> Thanks, queued for 2.6.21.  With this patch I see small-packet latency
> down almost all the way back to what datagram mode gives -- on a pair
> of fast woodcrest systems I see latencies for netpipe tcp 1 byte
> messages like
> 
>   datagram     13.xx
>   original CM  17.xx
>   patched CM   14.xx
> 
> so there is still a measurable difference but it is much less now.

Hmm. An old system I tried here has a much higher latency, but
does not seem to exhibit a latency difference between datagram and CM.

1. Is there something special you do when you run the benchmark (msi, taskset, ...)?
2. On a wild guess that the issue here is higher interrupt rate with CM,
   is there a chance you could test the following patch posted by me earlier?
   http://www.mail-archive.com/openib-general at openib.org/msg29290.html

Thanks,

-- 
MST


From rdreier at cisco.com  Thu Feb 22 13:21:17 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 22 Feb 2007 13:21:17 -0800
Subject: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small
 message bandwidth
In-Reply-To: <20070222211442.GB9143@mellanox.co.il> (Michael S.
	Tsirkin's message of "Thu, 22 Feb 2007 23:14:42 +0200")
References: <20070220181755.GC11825@mellanox.co.il>
	<adaejokdlp6.fsf@cisco.com> <20070222211442.GB9143@mellanox.co.il>
Message-ID: <adamz35x50y.fsf@cisco.com>

 > 1. Is there something special you do when you run the benchmark (msi, taskset, ...)?

Yes, I am using MSI-X, and I pin the interrupt handler to one CPU
(CPU#0 in my particular case).  Then I use taskset to pin the NPtcp
process to a CPU in a different package (CPU#2 in my system).

BTW with these same systems, I am getting up to ~1150 MB/sec of
throughput with DDR mem-free Arbel, as measured with NPtcp.

 > 2. On a wild guess that the issue here is higher interrupt rate with CM,
 >    is there a chance you could test the following patch posted by me earlier?
 >    http://www.mail-archive.com/openib-general at openib.org/msg29290.html

OK, I'll try that when I get a chance.

 - R.


From sweitzen at cisco.com  Thu Feb 22 13:25:57 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Thu, 22 Feb 2007 13:25:57 -0800
Subject: [openib-general] anyone have OFED 1.2 alpha1 compiling on ppc64
In-Reply-To: <20070222090006.GA9727@mellanox.co.il>
References: <A15335FBE9BD2449AF2C9EF3D1EB8EA3030AD952@xmb-sjc-216.amer.cisco.com>
	<20070222090006.GA9727@mellanox.co.il>
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA3030ADC00@xmb-sjc-216.amer.cisco.com>

How do I upload sources?

> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] 
> Sent: Thursday, February 22, 2007 1:00 AM
> To: Scott Weitzenkamp (sweitzen)
> Cc: Openfabrics-ewg at openib.org; OPENIB
> Subject: Re: anyone have OFED 1.2 alpha1 compiling on ppc64
> 
> > Quoting Scott Weitzenkamp (sweitzen) <sweitzen at cisco.com>:
> > Subject: anyone have OFED 1.2 alpha1 compiling on ppc64
> > 
> > I tried both RHEL4 and SLES10 using install.sh, and get this.
> > I filed bug 379, anyone else tried ppc64?
> 
> Scott, could you pls upload the kernel sources and .config
> files to staging?
> If you do, we'll be able to add these to the nightly cross-build
> environment.
> 	
> -- 
> MST
> 


From rdreier at cisco.com  Thu Feb 22 13:34:13 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 22 Feb 2007 13:34:13 -0800
Subject: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small
 message bandwidth
In-Reply-To: <adamz35x50y.fsf@cisco.com> (Roland Dreier's message of
	"Thu, 22 Feb 2007 13:21:17 -0800")
References: <20070220181755.GC11825@mellanox.co.il>
	<adaejokdlp6.fsf@cisco.com> <20070222211442.GB9143@mellanox.co.il>
	<adamz35x50y.fsf@cisco.com>
Message-ID: <adaslcxvpuy.fsf@cisco.com>

OK, I applied the following patch (I had to change one line of your
patch to get it to apply, because the small-message patch changed the
context so one hunk didn't apply).

Anyway I don't see any difference in small message latency or large
message throughput.  (Actually latency seems slightly worse, but I
think the change is within my normal variability, so I don't think
the difference is significant.)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 2594db2..20d7ad4 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -98,9 +98,9 @@ enum {
 
 #define	IPOIB_OP_RECV   (1ul << 31)
 #ifdef CONFIG_INFINIBAND_IPOIB_CM
-#define	IPOIB_CM_OP_SRQ (1ul << 30)
+#define	IPOIB_OP_CM     (1ul << 30)
 #else
-#define	IPOIB_CM_OP_SRQ (0)
+#define	IPOIB_OP_CM     (0)
 #endif
 
 /* structs */
@@ -143,7 +143,6 @@ struct ipoib_cm_rx {
 
 struct ipoib_cm_tx {
 	struct ib_cm_id     *id;
-	struct ib_cq        *cq;
 	struct ib_qp        *qp;
 	struct list_head     list;
 	struct net_device   *dev;
@@ -232,6 +231,7 @@ struct ipoib_dev_priv {
 	unsigned             tx_tail;
 	struct ib_sge        tx_sge;
 	struct ib_send_wr    tx_wr;
+	unsigned             tx_outstanding;
 
 	struct ib_wc ibwc[IPOIB_NUM_WC];
 
@@ -438,6 +438,7 @@ void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx);
 void ipoib_cm_skb_too_long(struct net_device* dev, struct sk_buff *skb,
 			   unsigned int mtu);
 void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc);
+void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc);
 #else
 
 struct ipoib_cm_tx;
@@ -526,6 +527,9 @@ static inline void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *w
 {
 }
 
+static inline void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
+{
+}
 #endif
 
 #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 3484e8b..9515ef6 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -82,7 +82,7 @@ static int ipoib_cm_post_receive(struct net_device *dev, int id)
 	struct ib_recv_wr *bad_wr;
 	int i, ret;
 
-	priv->cm.rx_wr.wr_id = id | IPOIB_CM_OP_SRQ;
+	priv->cm.rx_wr.wr_id = id | IPOIB_OP_CM | IPOIB_OP_RECV;
 
 	for (i = 0; i < IPOIB_CM_RX_SG; ++i)
 		priv->cm.rx_sge[i].addr = priv->cm.srq_ring[id].mapping[i];
@@ -344,7 +344,7 @@ static void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space,
 void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
-	unsigned int wr_id = wc->wr_id & ~IPOIB_CM_OP_SRQ;
+	unsigned int wr_id = wc->wr_id & ~(IPOIB_OP_CM | IPOIB_OP_RECV);
 	struct sk_buff *skb, *newskb;
 	struct ipoib_cm_rx *p;
 	unsigned long flags;
@@ -436,7 +436,7 @@ static inline int post_send(struct ipoib_dev_priv *priv,
 	priv->tx_sge.addr             = addr;
 	priv->tx_sge.length           = len;
 
-	priv->tx_wr.wr_id 	      = wr_id;
+	priv->tx_wr.wr_id 	      = wr_id | IPOIB_OP_CM;
 
 	return ib_post_send(tx->qp, &priv->tx_wr, &bad_wr);
 }
@@ -487,20 +487,19 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_
 		dev->trans_start = jiffies;
 		++tx->tx_head;
 
-		if (tx->tx_head - tx->tx_tail == ipoib_sendq_size) {
+		if (++priv->tx_outstanding == ipoib_sendq_size) {
 			ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net queue\n",
 				  tx->qp->qp_num);
 			netif_stop_queue(dev);
-			set_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags);
 		}
 	}
 }
 
-static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx,
-				  struct ib_wc *wc)
+void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
-	unsigned int wr_id = wc->wr_id;
+	struct ipoib_cm_tx *tx = wc->qp->qp_context;
+	unsigned int wr_id = wc->wr_id & ~IPOIB_OP_CM;
 	struct ipoib_tx_buf *tx_req;
 	unsigned long flags;
 
@@ -525,11 +524,10 @@ static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx
 
 	spin_lock_irqsave(&priv->tx_lock, flags);
 	++tx->tx_tail;
-	if (unlikely(test_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags)) &&
-	    tx->tx_head - tx->tx_tail <= ipoib_sendq_size >> 1) {
-		clear_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags);
+	if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) &&
+	    netif_queue_stopped(dev) &&
+	    test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
 		netif_wake_queue(dev);
-	}
 
 	if (wc->status != IB_WC_SUCCESS &&
 	    wc->status != IB_WC_WR_FLUSH_ERR) {
@@ -552,11 +550,6 @@ static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx
 			tx->neigh = NULL;
 		}
 
-		/* queue would be re-started anyway when TX is destroyed,
-		 * but it makes sense to do it ASAP here. */
-		if (test_and_clear_bit(IPOIB_FLAG_NETIF_STOPPED, &tx->flags))
-			netif_wake_queue(dev);
-
 		if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) {
 			list_move(&tx->list, &priv->cm.reap_list);
 			queue_work(ipoib_workqueue, &priv->cm.reap_task);
@@ -570,19 +563,6 @@ static void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ipoib_cm_tx *tx
 	spin_unlock_irqrestore(&priv->tx_lock, flags);
 }
 
-static void ipoib_cm_tx_completion(struct ib_cq *cq, void *tx_ptr)
-{
-	struct ipoib_cm_tx *tx = tx_ptr;
-	int n, i;
-
-	ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
-	do {
-		n = ib_poll_cq(cq, IPOIB_NUM_WC, tx->ibwc);
-		for (i = 0; i < n; ++i)
-			ipoib_cm_handle_tx_wc(tx->dev, tx, tx->ibwc + i);
-	} while (n == IPOIB_NUM_WC);
-}
-
 int ipoib_cm_dev_open(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
@@ -702,17 +682,18 @@ static int ipoib_cm_rep_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 	return 0;
 }
 
-static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ib_cq *cq)
+static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ipoib_cm_tx *tx)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct ib_qp_init_attr attr = {};
 	attr.recv_cq = priv->cq;
+	attr.send_cq = priv->cq;
 	attr.srq = priv->cm.srq;
 	attr.cap.max_send_wr = ipoib_sendq_size;
 	attr.cap.max_send_sge = 1;
 	attr.sq_sig_type = IB_SIGNAL_ALL_WR;
 	attr.qp_type = IB_QPT_RC;
-	attr.send_cq = cq;
+	attr.qp_context = tx;
 	return ib_create_qp(priv->pd, &attr);
 }
 
@@ -792,21 +773,7 @@ static int ipoib_cm_tx_init(struct ipoib_cm_tx *p, u32 qpn,
 		goto err_tx;
 	}
 
-	p->cq = ib_create_cq(priv->ca, ipoib_cm_tx_completion, NULL, p,
-			     ipoib_sendq_size + 1);
-	if (IS_ERR(p->cq)) {
-		ret = PTR_ERR(p->cq);
-		ipoib_warn(priv, "failed to allocate tx cq: %d\n", ret);
-		goto err_cq;
-	}
-
-	ret = ib_req_notify_cq(p->cq, IB_CQ_NEXT_COMP);
-	if (ret) {
-		ipoib_warn(priv, "failed to request completion notification: %d\n", ret);
-		goto err_req_notify;
-	}
-
-	p->qp = ipoib_cm_create_tx_qp(p->dev, p->cq);
+	p->qp = ipoib_cm_create_tx_qp(p->dev, p);
 	if (IS_ERR(p->qp)) {
 		ret = PTR_ERR(p->qp);
 		ipoib_warn(priv, "failed to allocate tx qp: %d\n", ret);
@@ -843,12 +810,8 @@ err_modify:
 err_id:
 	p->id = NULL;
 	ib_destroy_qp(p->qp);
-err_req_notify:
 err_qp:
 	p->qp = NULL;
-	ib_destroy_cq(p->cq);
-err_cq:
-	p->cq = NULL;
 err_tx:
 	return ret;
 }
@@ -857,6 +820,7 @@ static void ipoib_cm_tx_destroy(struct ipoib_cm_tx *p)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(p->dev);
 	struct ipoib_tx_buf *tx_req;
+	unsigned long flags;
 
 	ipoib_dbg(priv, "Destroy active connection 0x%x head 0x%x tail 0x%x\n",
 		  p->qp ? p->qp->qp_num : 0, p->tx_head, p->tx_tail);
@@ -867,12 +831,6 @@ static void ipoib_cm_tx_destroy(struct ipoib_cm_tx *p)
 	if (p->qp)
 		ib_destroy_qp(p->qp);
 
-	if (p->cq)
-		ib_destroy_cq(p->cq);
-
-	if (test_bit(IPOIB_FLAG_NETIF_STOPPED, &p->flags))
-		netif_wake_queue(p->dev);
-
 	if (p->tx_ring) {
 		while ((int) p->tx_tail - (int) p->tx_head < 0) {
 			tx_req = &p->tx_ring[p->tx_tail & (ipoib_sendq_size - 1)];
@@ -880,6 +838,12 @@ static void ipoib_cm_tx_destroy(struct ipoib_cm_tx *p)
 					 DMA_TO_DEVICE);
 			dev_kfree_skb_any(tx_req->skb);
 			++p->tx_tail;
+			spin_lock_irqsave(&priv->tx_lock, flags);
+			if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) &&
+			    netif_queue_stopped(p->dev) &&
+			    test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
+				netif_wake_queue(p->dev);
+			spin_unlock_irqrestore(&priv->tx_lock, flags);
 		}
 
 		kfree(p->tx_ring);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index f2aa923..19a3d3e 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -266,11 +266,10 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
 
 	spin_lock_irqsave(&priv->tx_lock, flags);
 	++priv->tx_tail;
-	if (unlikely(test_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags)) &&
-	    priv->tx_head - priv->tx_tail <= ipoib_sendq_size >> 1) {
-		clear_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags);
+	if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) &&
+	    netif_queue_stopped(dev) &&
+	    test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
 		netif_wake_queue(dev);
-	}
 	spin_unlock_irqrestore(&priv->tx_lock, flags);
 
 	if (wc->status != IB_WC_SUCCESS &&
@@ -282,12 +281,17 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
 
 static void ipoib_ib_handle_wc(struct net_device *dev, struct ib_wc *wc)
 {
-	if (wc->wr_id & IPOIB_CM_OP_SRQ)
-		ipoib_cm_handle_rx_wc(dev, wc);
-	else if (wc->wr_id & IPOIB_OP_RECV)
-		ipoib_ib_handle_rx_wc(dev, wc);
-	else
-		ipoib_ib_handle_tx_wc(dev, wc);
+	if (wc->wr_id & IPOIB_OP_CM) {
+		if (wc->wr_id & IPOIB_OP_RECV)
+			ipoib_cm_handle_rx_wc(dev, wc);
+		else
+			ipoib_cm_handle_tx_wc(dev, wc);
+	} else {
+		if (wc->wr_id & IPOIB_OP_RECV)
+			ipoib_ib_handle_rx_wc(dev, wc);
+		else
+			ipoib_ib_handle_tx_wc(dev, wc);
+	}
 }
 
 void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr)
@@ -370,10 +374,9 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb,
 		address->last_send = priv->tx_head;
 		++priv->tx_head;
 
-		if (priv->tx_head - priv->tx_tail == ipoib_sendq_size) {
+		if (++priv->tx_outstanding == ipoib_sendq_size) {
 			ipoib_dbg(priv, "TX ring full, stopping kernel net queue\n");
 			netif_stop_queue(dev);
-			set_bit(IPOIB_FLAG_NETIF_STOPPED, &priv->flags);
 		}
 	}
 }
@@ -549,6 +552,7 @@ int ipoib_ib_dev_stop(struct net_device *dev)
 						    DMA_TO_DEVICE);
 				dev_kfree_skb_any(tx_req->skb);
 				++priv->tx_tail;
+				--priv->tx_outstanding;
 			}
 
 			for (i = 0; i < ipoib_recvq_size; ++i) {
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 18d27fd..ece1a0c 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -900,7 +900,7 @@ int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port)
 		goto out_rx_ring_cleanup;
 	}
 
-	/* priv->tx_head & tx_tail are already 0 */
+	/* priv->tx_head, tx_tail & tx_outstanding are already 0 */
 
 	if (ipoib_ib_dev_init(dev, ca, port))
 		goto out_tx_ring_cleanup;


From mst at mellanox.co.il  Thu Feb 22 13:38:05 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 22 Feb 2007 23:38:05 +0200
Subject: [openib-general] anyone have OFED 1.2 alpha1 compiling on ppc64
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA3030ADC00@xmb-sjc-216.amer.cisco.com>
References: <A15335FBE9BD2449AF2C9EF3D1EB8EA3030AD952@xmb-sjc-216.amer.cisco.com>
	<20070222090006.GA9727@mellanox.co.il>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA3030ADC00@xmb-sjc-216.amer.cisco.com>
Message-ID: <20070222213805.GC9143@mellanox.co.il>

Don't you have an account at ssh.openfabrics.org?
If yes, just put kernel sources and the .config under your home directory

Quoting r. Scott Weitzenkamp (sweitzen) <sweitzen at cisco.com>:
Subject: Re: anyone have OFED 1.2 alpha1 compiling on ppc64

How do I upload sources?

> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] 
> Sent: Thursday, February 22, 2007 1:00 AM
> To: Scott Weitzenkamp (sweitzen)
> Cc: Openfabrics-ewg at openib.org; OPENIB
> Subject: Re: anyone have OFED 1.2 alpha1 compiling on ppc64
> 
> > Quoting Scott Weitzenkamp (sweitzen) <sweitzen at cisco.com>:
> > Subject: anyone have OFED 1.2 alpha1 compiling on ppc64
> > 
> > I tried both RHEL4 and SLES10 using install.sh, and get this.  I filed
> > bug 379, anyone else tried ppc64?
> 
> Scott, could you pls upload the kernel sources and .config
> files to staging?
> If you do, we'll be able to add these to the nightly cross-build
> environment.
> 	
> -- 
> MST
> 


-- 
MST


From mst at mellanox.co.il  Thu Feb 22 13:42:24 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 22 Feb 2007 23:42:24 +0200
Subject: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small
 message bandwidth
In-Reply-To: <adaslcxvpuy.fsf@cisco.com>
References: <20070220181755.GC11825@mellanox.co.il>
	<adaejokdlp6.fsf@cisco.com> <20070222211442.GB9143@mellanox.co.il>
	<adamz35x50y.fsf@cisco.com> <adaslcxvpuy.fsf@cisco.com>
Message-ID: <20070222214223.GD9143@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth
> 
> OK, I applied the following patch (I had to change one line of your
> patch to get it to apply because the small-message changed the context
> so one chunk didn't apply).
> 
> Anyway I don't see any difference in small message latency or large
> message throughput.  (Actually latency seems slightly worse but I
> think the change is within my normal variability so I don't think
> the difference is significant)

OK, thanks for testing this.
I need to spend more time on reproducing this issue, and profiling.
I'll add this to my todo list.


-- 
MST


From sweitzen at cisco.com  Thu Feb 22 13:53:02 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Thu, 22 Feb 2007 13:53:02 -0800
Subject: [openib-general] anyone have OFED 1.2 alpha1 compiling on ppc64
In-Reply-To: <20070222213805.GC9143@mellanox.co.il>
References: <A15335FBE9BD2449AF2C9EF3D1EB8EA3030AD952@xmb-sjc-216.amer.cisco.com>
	<20070222090006.GA9727@mellanox.co.il>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA3030ADC00@xmb-sjc-216.amer.cisco.com>
	<20070222213805.GC9143@mellanox.co.il>
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA3030ADC26@xmb-sjc-216.amer.cisco.com>

> Don't you have an account at ssh.openfabrics.org?

Can an admin please give me an account?

Scott


From mst at mellanox.co.il  Thu Feb 22 14:00:18 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 23 Feb 2007 00:00:18 +0200
Subject: [openib-general] anyone have OFED 1.2 alpha1 compiling on ppc64
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA3030ADC26@xmb-sjc-216.amer.cisco.com>
References: <A15335FBE9BD2449AF2C9EF3D1EB8EA3030ADC26@xmb-sjc-216.amer.cisco.com>
Message-ID: <20070222220018.GB4542@mellanox.co.il>

> Quoting Scott Weitzenkamp (sweitzen) <sweitzen at cisco.com>:
> Subject: RE: anyone have OFED 1.2 alpha1 compiling on ppc64
> 
> > Don't you have an account at ssh.openfabrics.org?
> 
> Can an admin please give me an account?

I'm not an admin but I think you want to post your ssh
public key.

-- 
MST


From mshefty at ichips.intel.com  Thu Feb 22 14:18:43 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 22 Feb 2007 14:18:43 -0800
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to
 work with partial membership pkey
In-Reply-To: <15ddcffd0702221309q4633a36cg8a7bb5ff69d78776@mail.gmail.com>
References: <1172155552.4380.475949.camel@hal.voltaire.com>
	<000301c756aa$b89e0020$8698070a@amr.corp.intel.com>
	<15ddcffd0702221309q4633a36cg8a7bb5ff69d78776@mail.gmail.com>
Message-ID: <45DE16C3.5020809@ichips.intel.com>

>>Can someone help my understanding here?  Is ipoib joining a multicast group
>>using the full membership PKey, even if the node that it joins from only has the
>>limited membership PKey configured? And the code in ib_find_cached_pkey helps
>>enable this?
> 
> Yep. The ipoib create_child function ORs 0x8000 into the device pkey
> which was provided by the user. Now, IPoIB uses the device pkey when
> forming MGIDs and when doing modify qp to init. Indeed, the way
> ib_find_cached_pkey() is implemented makes the latter use trivial.

Doesn't this allow ipoib to join a multicast group for which it may not be able 
to communicate with all members?  For the broadcast group, this seems like an 
error to me.  Can ipoib work in such a configuration?  If all nodes were 
assigned a partial membership PKey, none of them could communicate, but no 
errors would be generated anywhere.

Joining a multicast group requires specifying the full membership PKey.  I don't 
see anything in the spec that explicitly prohibits joining the group from a node 
with only a partial membership PKey, but at first glance, this seems like a 
subnet configuration issue.  Is there some use of this I'm overlooking?
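
To make the concern concrete, here is a minimal sketch of the P_Key matching
rule as I read it in the IB spec: the low 15 bits must agree and at least one
side must be a full member.  The macro and function names below are made up
for illustration; none of this is taken from ipoib or ib_core.

```c
#include <stdbool.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER 0x8000u  /* membership bit (bit 15) */
#define PKEY_BASE_MASK   0x7fffu  /* low 15 bits identify the partition */

/*
 * Illustrative helper (hypothetical name): two P_Keys allow communication
 * when they name the same partition and at least one of them is a full
 * member.  Two limited members of the same partition do not match.
 */
static bool pkeys_can_communicate(uint16_t a, uint16_t b)
{
	if ((a & PKEY_BASE_MASK) != (b & PKEY_BASE_MASK))
		return false;
	return ((a | b) & PKEY_FULL_MEMBER) != 0;
}
```

With this rule, 0x7fff against 0x7fff fails while 0xffff against 0x7fff
succeeds, which is the "all nodes partial, nobody can talk" case described
above.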

- Sean


From divy at chelsio.com  Thu Feb 22 14:21:53 2007
From: divy at chelsio.com (Divy Le Ray)
Date: Thu, 22 Feb 2007 14:21:53 -0800
Subject: [openib-general] [PATCH 0/7] cxgb3 - Chelsio T3 1G/10G driver
	updates
In-Reply-To: <1172153597.23995.9.camel@stevo-desktop>
References: <45DD8559.7090106@chelsio.com>
	<1172153597.23995.9.camel@stevo-desktop>
Message-ID: <45DE1781.5000407@chelsio.com>

Steve Wise wrote:
> Divy,
>
> Do these need to be pulled into OFED 1.2 as well?
>   

Hi Steve,

Yes, I believe so.

Cheers,
Divy
> Steve.
>
>
> On Thu, 2007-02-22 at 03:58 -0800, Divy Le Ray wrote:
>   
>> Jeff,
>>
>> I'm sending a series of incremental patches updating
>> the cxgb3 driver. These patches are built against Linus' git tree.
>>
>> Cheers,
>> Divy
>> -
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo at vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>>     
>
>   


From mst at mellanox.co.il  Thu Feb 22 15:19:17 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 23 Feb 2007 01:19:17 +0200
Subject: [openib-general] IPOIB NAPI
In-Reply-To: <adar6x7x5rq.fsf@cisco.com>
References: <adar6x7x5rq.fsf@cisco.com>
Message-ID: <20070222231917.GC9059@mellanox.co.il>

>  > An API idea:
>  > how about instead testing missed_events, we add a flag:
>  > 
>  > IB_CQ_TEST (or a longer name IB_CQ_REPORT_MISSED_EVENTS?)
>  > and change ib_req_notify_cq to return int which will keep
>  > the missed_events value, only if this flag is set?
>  > 
>  > This has 3 advantages
>  > - Less churn updating all users to new API - they just ignore return value -
>  >   and still almost no overhead for them as they don't set IB_CQ_TEST
>  > - For all users we have to push less values on stack - note compiler can't
>  >   get rid of them as we are calling function through a pointer
>  > - For users that do
>  >   missed_events = ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP | IB_CQ_TEST)
>  >   we get the result in register.
> 
> Yes, I like this.  So ib_req_notify_cq() gets a return value that is
> negative if an error occurred, 0 if everything is fine, or positive if
> a missed event might have happened.
> 
> I think I prefer the longer name IB_CQ_REPORT_MISSED_EVENTS -- at
> least there's a chance at guessing what it means even if you don't
> read the documentation.

By the way, how about extending the userspace API in a similar
fashion?

missed_events = ibv_req_notify_cq(priv->cq, IBV_CQ_NEXT_COMP |
				  IBV_CQ_REPORT_MISSED_EVENTS)
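
A rough sketch of how a caller might use such a return value to switch from
polling back to event mode without losing a completion.  This is a toy
user-space model only: the mock_* names, the flag values and the "late" field
are all invented for illustration, and no real verbs call is made.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values; only the names echo the proposal above. */
enum {
	MOCK_CQ_NEXT_COMP            = 1 << 0,
	MOCK_CQ_REPORT_MISSED_EVENTS = 1 << 1,
};

/* Toy completion queue standing in for a real CQ. */
struct mock_cq {
	int  pending;  /* completions visible to poll right now */
	int  late;     /* completions landing after the last poll, before arming */
	bool armed;    /* true once notification has been requested */
};

/* Consume one completion; returns 1 if one was taken, 0 if the CQ is empty. */
static int mock_poll_cq(struct mock_cq *cq)
{
	if (cq->pending == 0)
		return 0;
	--cq->pending;
	return 1;
}

/*
 * Arm the CQ.  With MOCK_CQ_REPORT_MISSED_EVENTS set, return a positive
 * value if completions are already waiting, 0 otherwise.
 */
static int mock_req_notify_cq(struct mock_cq *cq, int flags)
{
	cq->pending += cq->late;  /* simulate the poll/arm race */
	cq->late = 0;
	cq->armed = true;
	if ((flags & MOCK_CQ_REPORT_MISSED_EVENTS) && cq->pending > 0)
		return 1;
	return 0;
}

/* Poll until empty, arm, and poll again if arming reports a missed event. */
static int drain_and_rearm(struct mock_cq *cq)
{
	int handled = 0;
	int missed;

	assert(cq != NULL);
	do {
		while (mock_poll_cq(cq))
			++handled;
		missed = mock_req_notify_cq(cq, MOCK_CQ_NEXT_COMP |
					    MOCK_CQ_REPORT_MISSED_EVENTS);
	} while (missed > 0);

	return missed < 0 ? missed : handled;
}
```

The point of the loop is that a completion arriving between the last poll and
the arm is picked up by the second pass instead of waiting for the next
event.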


-- 
MST


From rdreier at cisco.com  Thu Feb 22 15:21:11 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 22 Feb 2007 15:21:11 -0800
Subject: [openib-general] IPOIB NAPI
In-Reply-To: <20070222231917.GC9059@mellanox.co.il> (Michael S.
	Tsirkin's message of "Fri, 23 Feb 2007 01:19:17 +0200")
References: <adar6x7x5rq.fsf@cisco.com> <20070222231917.GC9059@mellanox.co.il>
Message-ID: <adahctdu6c8.fsf@cisco.com>

 > By the way, how about extending the userspace API in a similar
 > fashion?
 > 
 > missed_events = ibv_req_notify_cq(priv->cq, IBV_CQ_NEXT_COMP |
 > 				  IBV_CQ_REPORT_MISSED_EVENTS)

It would require a kernel-user ABI bump.  Is it worth it?

 - R.


From sean.hefty at intel.com  Thu Feb 22 15:35:23 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Thu, 22 Feb 2007 15:35:23 -0800
Subject: [openib-general] ipoib & the partial pkey,
 was: librdmacm: fix bug causing failure to work with partial
 membership pkey
In-Reply-To: <45DE16C3.5020809@ichips.intel.com>
Message-ID: <000401c756da$1f9387d0$8698070a@amr.corp.intel.com>

>Doesn't this allow ipoib to join a multicast group for which it may not be able
>to communicate with all members?  For the broadcast group, this seems like an
>error to me.  Can ipoib work in such a configuration?  If all nodes were
>assigned a partial membership PKey, none of them could communicate, but no
>errors would be generated anywhere.

I looked into this more...

RFC 4391 states (middle of page 5):

For a node to join a partition, one of its ports must be assigned the relevant
P_Key by the SM [RFC4392].

Jumping to RFC 4392 (top of page 4):

at the time of creating an IB multicast group, multiple values such as the
P_Key, Q_Key, Service Level, Hop Limit, Flow ID, TClass, MTU, etc.  have to be
specified.  These values should be such that all potential members of the IB
multicast group are able to communicate with one another when using them.

and page 14:

Note that this IB_join to the broadcast group is a FullMember join.  If any of
the ports or the switches linking the port to the rest of the IPoIB subnet
cannot support the parameters (e.g., path MTU or P_Key) associated with the
broadcast group, then the IB_join request will fail and the requesting port will
not become part of the IPoIB subnet

My initial interpretation of these statements leads me to believe that the pkey check
in ib_find_cached_pkey should not mask out the upper bit, which would prevent
ipoib from joining a multicast group until it has been configured with the full
membership pkey for the broadcast group.  Does this seem reasonable?
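
For illustration, the difference between the two lookup behaviours could be
sketched as below.  find_pkey() and its ignore_membership argument are
hypothetical stand-ins, not the actual ib_find_cached_pkey() code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical table lookup.  When ignore_membership is nonzero the
 * membership bit (0x8000) is masked out before comparing, so a limited
 * entry satisfies a search for the full-member key.  When it is zero the
 * full 16-bit value must match.
 * Returns the table index, or -1 if no entry matches.
 */
static int find_pkey(const uint16_t *table, size_t len, uint16_t pkey,
		     int ignore_membership)
{
	uint16_t mask = ignore_membership ? 0x7fff : 0xffff;
	size_t i;

	assert(table != NULL || len == 0);
	for (i = 0; i < len; ++i)
		if ((table[i] & mask) == (pkey & mask))
			return (int) i;
	return -1;
}
```

With a table holding only 0x7fff, a search for 0xffff succeeds in the masked
mode and fails in the strict mode, which is the behaviour change suggested
above.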

- Sean


From mst at mellanox.co.il  Thu Feb 22 15:46:24 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 23 Feb 2007 01:46:24 +0200
Subject: [openib-general] IPOIB NAPI
In-Reply-To: <adahctdu6c8.fsf@cisco.com>
References: <adahctdu6c8.fsf@cisco.com>
Message-ID: <20070222234624.GB4447@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: IPOIB NAPI
> 
>  > By the way, how about extending the userspace API in a similar
>  > fashion?
>  > 
>  > missed_events = ibv_req_notify_cq(priv->cq, IBV_CQ_NEXT_COMP |
>  > 				  IBV_CQ_REPORT_MISSED_EVENTS)
> 
> It would require a kernel-user ABI bump. Is it worth it?

I hear some people asking for it: I imagine reasons are same as NAPI -
race-free, clean API to switch from polling to event mode -
rather than a minor optimization.

-- 
MST


From mst at mellanox.co.il  Thu Feb 22 15:57:24 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 23 Feb 2007 01:57:24 +0200
Subject: [openib-general] [PATCH] libmthca: optimize calls to htonl with
 constant parameter
Message-ID: <20070222235724.GC4447@mellanox.co.il>

GCC seems to be unable to propagate constants across calls to htonl.
So it turns out to be worth the while to replace htonl with
a hand-written macro in case of constant parameter.

Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>
Signed-off-by: Ishai Rabinovitz <ishai at mellanox.co.il>

---

Roland, I'm looking at micro-optimizing libmthca/mthca some more.
The following optimization is minor, but it seems quite safe.
What do you think? Tested with gcc 4.0.3.

diff --git a/src/cq.c b/src/cq.c
index 0aeb7a9..9428f74 100644
--- a/src/cq.c
+++ b/src/cq.c
@@ -275,7 +275,7 @@ static int handle_error_cqe(struct mthca_cq *cq,
 	 * doorbell count field.  In that case we always free the CQE.
 	 */
 	if (mthca_is_memfree(cq->ibv_cq.context) ||
-	    !(new_wqe & htonl(0x3f)) || (!cqe->db_cnt && dbd))
+	    !(new_wqe & CONSTANT_HTONL(0x3f)) || (!cqe->db_cnt && dbd))
 		return 0;
 
 	cqe->db_cnt   = htons(ntohs(cqe->db_cnt) - dbd);
diff --git a/src/mthca.h b/src/mthca.h
index 1f31bc3..798029f 100644
--- a/src/mthca.h
+++ b/src/mthca.h
@@ -112,6 +112,20 @@ enum {
 	MTHCA_OPCODE_INVALID        = 0xff
 };
 
+/* GCC does not seem to be able to do constant propagation
+ * across htonl/ntohl calls */
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+#define CONSTANT_HTONL(x)                            \
+	        ((((unsigned)x) >> 24)             | \
+		 ((((unsigned)x) >> 8) & 0xff00)   | \
+		 ((((unsigned)x) << 8) & 0xff0000) | \
+		 (((unsigned)x) << 24))
+#elif __BYTE_ORDER == __BIG_ENDIAN
+#define CONSTANT_HTONL(x) (x)
+#else
+#define CONSTANT_HTONL(x) htonl(x)
+#endif
+
 struct mthca_ah_page;
 
 struct mthca_device {
diff --git a/src/qp.c b/src/qp.c
index f2483e9..85d3385 100644
--- a/src/qp.c
+++ b/src/qp.c
@@ -138,10 +138,10 @@ int mthca_tavor_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
 		((struct mthca_next_seg *) wqe)->ee_nds = 0;
 		((struct mthca_next_seg *) wqe)->flags =
 			((wr->send_flags & IBV_SEND_SIGNALED) ?
-			 htonl(MTHCA_NEXT_CQ_UPDATE) : 0) |
+			 CONSTANT_HTONL(MTHCA_NEXT_CQ_UPDATE) : 0) |
 			((wr->send_flags & IBV_SEND_SOLICITED) ?
-			 htonl(MTHCA_NEXT_SOLICIT) : 0)   |
-			htonl(1);
+			 CONSTANT_HTONL(MTHCA_NEXT_SOLICIT) : 0)   |
+			CONSTANT_HTONL(1);
 		if (wr->opcode == IBV_WR_SEND_WITH_IMM ||
 		    wr->opcode == IBV_WR_RDMA_WRITE_WITH_IMM)
 			((struct mthca_next_seg *) wqe)->imm = wr->imm_data;
@@ -359,9 +359,9 @@ int mthca_tavor_post_recv(struct ibv_qp *ibqp, struct ibv_recv_wr *wr,
 
 		((struct mthca_next_seg *) wqe)->nda_op = 0;
 		((struct mthca_next_seg *) wqe)->ee_nds =
-			htonl(MTHCA_NEXT_DBD);
+			CONSTANT_HTONL(MTHCA_NEXT_DBD);
 		((struct mthca_next_seg *) wqe)->flags =
-			htonl(MTHCA_NEXT_CQ_UPDATE);
+			CONSTANT_HTONL(MTHCA_NEXT_CQ_UPDATE);
 
 		wqe += sizeof (struct mthca_next_seg);
 		size = sizeof (struct mthca_next_seg) / 16;
@@ -505,10 +505,10 @@ int mthca_arbel_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
 
 		((struct mthca_next_seg *) wqe)->flags =
 			((wr->send_flags & IBV_SEND_SIGNALED) ?
-			 htonl(MTHCA_NEXT_CQ_UPDATE) : 0) |
+			 CONSTANT_HTONL(MTHCA_NEXT_CQ_UPDATE) : 0) |
 			((wr->send_flags & IBV_SEND_SOLICITED) ?
-			 htonl(MTHCA_NEXT_SOLICIT) : 0)   |
-			htonl(1);
+			 CONSTANT_HTONL(MTHCA_NEXT_SOLICIT) : 0)   |
+			CONSTANT_HTONL(1);
 		if (wr->opcode == IBV_WR_SEND_WITH_IMM ||
 		    wr->opcode == IBV_WR_RDMA_WRITE_WITH_IMM)
 			((struct mthca_next_seg *) wqe)->imm = wr->imm_data;
@@ -750,7 +750,7 @@ int mthca_arbel_post_recv(struct ibv_qp *ibqp, struct ibv_recv_wr *wr,
 
 		if (i < qp->rq.max_gs) {
 			((struct mthca_data_seg *) wqe)->byte_count = 0;
-			((struct mthca_data_seg *) wqe)->lkey = htonl(MTHCA_INVAL_LKEY);
+			((struct mthca_data_seg *) wqe)->lkey = CONSTANT_HTONL(MTHCA_INVAL_LKEY);
 			((struct mthca_data_seg *) wqe)->addr = 0;
 		}
 
@@ -872,7 +872,7 @@ int mthca_alloc_qp_buf(struct ibv_pd *pd, struct ibv_qp_cap *cap,
 			for (scatter = (void *) (next + 1);
 			     (void *) scatter < (void *) next + (1 << qp->rq.wqe_shift);
 			     ++scatter)
-				scatter->lkey = htonl(MTHCA_INVAL_LKEY);
+				scatter->lkey = CONSTANT_HTONL(MTHCA_INVAL_LKEY);
 		}
 
 		for (i = 0; i < qp->sq.max; ++i) {
@@ -956,10 +956,10 @@ int mthca_free_err_wqe(struct mthca_qp *qp, int is_send,
 	else
 		next = get_recv_wqe(qp, index);
 
-	*dbd = !!(next->ee_nds & htonl(MTHCA_NEXT_DBD));
-	if (next->ee_nds & htonl(0x3f))
-		*new_wqe = (next->nda_op & htonl(~0x3f)) |
-			(next->ee_nds & htonl(0x3f));
+	*dbd = !!(next->ee_nds & CONSTANT_HTONL(MTHCA_NEXT_DBD));
+	if (next->ee_nds & CONSTANT_HTONL(0x3f))
+		*new_wqe = (next->nda_op & CONSTANT_HTONL(~0x3f)) |
+			(next->ee_nds & CONSTANT_HTONL(0x3f));
 	else
 		*new_wqe = 0;
 
diff --git a/src/srq.c b/src/srq.c
index f9fc006..e27c8dc 100644
--- a/src/srq.c
+++ b/src/srq.c
@@ -142,7 +142,7 @@ int mthca_tavor_post_srq_recv(struct ibv_srq *ibsrq,
 
 		if (i < srq->max_gs) {
 			((struct mthca_data_seg *) wqe)->byte_count = 0;
-			((struct mthca_data_seg *) wqe)->lkey = htonl(MTHCA_INVAL_LKEY);
+			((struct mthca_data_seg *) wqe)->lkey = CONSTANT_HTONL(MTHCA_INVAL_LKEY);
 			((struct mthca_data_seg *) wqe)->addr = 0;
 		}
 
@@ -150,7 +150,7 @@ int mthca_tavor_post_srq_recv(struct ibv_srq *ibsrq,
 			htonl((ind << srq->wqe_shift) | 1);
 		wmb();
 		((struct mthca_next_seg *) prev_wqe)->ee_nds =
-			htonl(MTHCA_NEXT_DBD);
+			CONSTANT_HTONL(MTHCA_NEXT_DBD);
 
 		srq->wrid[ind]  = wr->wr_id;
 		srq->first_free = next_ind;
@@ -247,7 +247,7 @@ int mthca_arbel_post_srq_recv(struct ibv_srq *ibsrq,
 
 		if (i < srq->max_gs) {
 			((struct mthca_data_seg *) wqe)->byte_count = 0;
-			((struct mthca_data_seg *) wqe)->lkey = htonl(MTHCA_INVAL_LKEY);
+			((struct mthca_data_seg *) wqe)->lkey = CONSTANT_HTONL(MTHCA_INVAL_LKEY);
 			((struct mthca_data_seg *) wqe)->addr = 0;
 		}
 
@@ -313,7 +313,7 @@ int mthca_alloc_srq_buf(struct ibv_pd *pd, struct ibv_srq_attr *attr,
 		for (scatter = wqe + sizeof (struct mthca_next_seg);
 		     (void *) scatter < wqe + (1 << srq->wqe_shift);
 		     ++scatter)
-			scatter->lkey = htonl(MTHCA_INVAL_LKEY);
+			scatter->lkey = CONSTANT_HTONL(MTHCA_INVAL_LKEY);
 	}
 
 	srq->first_free = 0;

-- 
MST


From sean.hefty at intel.com  Thu Feb 22 16:59:07 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Thu, 22 Feb 2007 16:59:07 -0800
Subject: [openib-general] [PATCH] 2.6.21-rc1: please pull rdma-dev.git
	for-roland
Message-ID: <000501c756e5$d26d0c90$8698070a@amr.corp.intel.com>

Roland,

Please consider the following minor fixes for 2.6.21:

rdma_cm: remove unused node_guid from cma_device structure.
ib_cm: remove ca_guid from cm_device structure.
rdma_cm: request reversible paths only.
ib_core: Set hop limit in ib_init_ah_from_wc correctly.

The patches are in git.openfabrics.org/~shefty/rdma-dev.git,
for-roland branch, which is based on 2.6.21-rc1.

Signed-off-by: Sean Hefty <sean.hefty at intel.com>
---
commit 28e218621d36cf9da42f07af08775769eb289fc0
Author: Sean Hefty <sean.hefty at intel.com>
Date:   Thu Feb 22 11:37:44 2007 -0800

    rdma_cm: remove unused node_guid from cma_device structure.

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index bb27ce9..d441815 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -77,7 +77,6 @@ static int next_port;
 struct cma_device {
 	struct list_head	list;
 	struct ib_device	*device;
-	__be64			node_guid;
 	struct completion	comp;
 	atomic_t		refcount;
 	struct list_head	id_list;
@@ -2674,7 +2673,6 @@ static void cma_add_one(struct ib_device *device)
 		return;
 
 	cma_dev->device = device;
-	cma_dev->node_guid = device->node_guid;
 
 	init_completion(&cma_dev->comp);
 	atomic_set(&cma_dev->refcount, 1);

commit 6de97f2a3373357d720b1653dfc0aac6d40b7506
Author: Sean Hefty <sean.hefty at intel.com>
Date:   Thu Feb 22 11:37:38 2007 -0800

    ib_cm: remove ca_guid from cm_device structure.
    
    The cm_device references an ib_device, which contains the node_guid.

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index d446998..842cd0b 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -88,7 +88,6 @@ struct cm_port {
 struct cm_device {
 	struct list_head list;
 	struct ib_device *device;
-	__be64 ca_guid;
 	struct cm_port port[0];
 };
 
@@ -739,8 +738,8 @@ retest:
 		ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
 		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
 		ib_send_cm_rej(cm_id, IB_CM_REJ_TIMEOUT,
-			       &cm_id_priv->av.port->cm_dev->ca_guid,
-			       sizeof cm_id_priv->av.port->cm_dev->ca_guid,
+			       &cm_id_priv->id.device->node_guid,
+			       sizeof cm_id_priv->id.device->node_guid,
 			       NULL, 0);
 		break;
 	case IB_CM_REQ_RCVD:
@@ -883,7 +882,7 @@ static void cm_format_req(struct cm_req_msg *req_msg,
 
 	req_msg->local_comm_id = cm_id_priv->id.local_id;
 	req_msg->service_id = param->service_id;
-	req_msg->local_ca_guid = cm_id_priv->av.port->cm_dev->ca_guid;
+	req_msg->local_ca_guid = cm_id_priv->id.device->node_guid;
 	cm_req_set_local_qpn(req_msg, cpu_to_be32(param->qp_num));
 	cm_req_set_resp_res(req_msg, param->responder_resources);
 	cm_req_set_init_depth(req_msg, param->initiator_depth);
@@ -1442,7 +1441,7 @@ static void cm_format_rep(struct cm_rep_msg *rep_msg,
 	cm_rep_set_flow_ctrl(rep_msg, param->flow_control);
 	cm_rep_set_rnr_retry_count(rep_msg, param->rnr_retry_count);
 	cm_rep_set_srq(rep_msg, param->srq);
-	rep_msg->local_ca_guid = cm_id_priv->av.port->cm_dev->ca_guid;
+	rep_msg->local_ca_guid = cm_id_priv->id.device->node_guid;
 
 	if (param->private_data && param->private_data_len)
 		memcpy(rep_msg->private_data, param->private_data,
@@ -3385,7 +3384,6 @@ static void cm_add_one(struct ib_device *device)
 		return;
 
 	cm_dev->device = device;
-	cm_dev->ca_guid = device->node_guid;
 
 	set_bit(IB_MGMT_METHOD_SEND, reg_req.method_mask);
 	for (i = 1; i <= device->phys_port_cnt; i++) {

commit 87680047dd09ca4a4e8ec575dad215c92cf45ed3
Author: Sean Hefty <sean.hefty at intel.com>
Date:   Wed Feb 21 16:40:44 2007 -0800

    rdma_cm: request reversible paths only
    
    The rdma_cm requires that path records be reversible.  Set the reversible
    bit when issuing a path record query.

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index f8d69b3..bb27ce9 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1492,11 +1492,13 @@ static int cma_query_ib_route(struct rdma_id_private *id_priv, int timeout_ms,
 	ib_addr_get_dgid(addr, &path_rec.dgid);
 	path_rec.pkey = cpu_to_be16(ib_addr_get_pkey(addr));
 	path_rec.numb_path = 1;
+	path_rec.reversible = 1;
 
 	id_priv->query_id = ib_sa_path_rec_get(&sa_client, id_priv->id.device,
 				id_priv->id.port_num, &path_rec,
 				IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID |
-				IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH,
+				IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH |
+				IB_SA_PATH_REC_REVERSIBLE,
 				timeout_ms, GFP_KERNEL,
 				cma_query_handler, work, &id_priv->query);
 

commit 30947e5b7db42184d66746ac1187d4abbf89018d
Author: Sean Hefty <sean.hefty at intel.com>
Date:   Wed Feb 21 16:37:31 2007 -0800

    ib_core: Set hop limit in ib_init_ah_from_wc correctly.
    
    The hop_limit value in the ah_attr should be 0xFF, not the value read
    from the received GRH (which should be 0).  See 13.5.4.4 in the 1.2 IB spec.

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 8b5dd36..ccdf93d 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -167,7 +167,7 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
 		ah_attr->grh.sgid_index = (u8) gid_index;
 		flow_class = be32_to_cpu(grh->version_tclass_flow);
 		ah_attr->grh.flow_label = flow_class & 0xFFFFF;
-		ah_attr->grh.hop_limit = grh->hop_limit;
+		ah_attr->grh.hop_limit = 0xFF;
 		ah_attr->grh.traffic_class = (flow_class >> 20) & 0xFF;
 	}
 	return 0;
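
As a side note on the last patch, the field handling can be sketched in user
space as follows.  The struct and function names are invented for this
example; the bit layout (4-bit version, 8-bit traffic class, 20-bit flow
label) follows the masks and shifts used in the hunk above, and hop_limit is
set to 0xFF instead of being copied from the received header.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Illustrative container, not a kernel structure. */
struct reply_grh_fields {
	uint8_t  version;
	uint8_t  traffic_class;
	uint32_t flow_label;
	uint8_t  hop_limit;
};

/*
 * Decode the first 32-bit word of a GRH (network byte order) and fill in
 * the fields a reply address handle would use.
 */
static struct reply_grh_fields decode_reply_fields(uint32_t version_tclass_flow)
{
	uint32_t w = ntohl(version_tclass_flow);
	struct reply_grh_fields f;

	f.version       = (w >> 28) & 0xF;
	f.traffic_class = (w >> 20) & 0xFF;
	f.flow_label    = w & 0xFFFFF;
	f.hop_limit     = 0xFF;  /* fixed value, not taken from the wire */
	return f;
}
```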


From rdreier at cisco.com  Thu Feb 22 17:10:40 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 22 Feb 2007 17:10:40 -0800
Subject: [openib-general] [PATCH] 2.6.21-rc1: please pull rdma-dev.git
	for-roland
In-Reply-To: <000501c756e5$d26d0c90$8698070a@amr.corp.intel.com> (Sean
	Hefty's message of "Thu, 22 Feb 2007 16:59:07 -0800")
References: <000501c756e5$d26d0c90$8698070a@amr.corp.intel.com>
Message-ID: <adaejohk7an.fsf@cisco.com>

These all look fine, I'll queue them up.

> Signed-off-by: Sean Hefty <sean.hefty at intel.com>

I notice that the actual patches you committed don't have your
sign-off in the git changelog.  I assume this is a mistake so I'll add
it back in...


From rdreier at cisco.com  Thu Feb 22 17:15:09 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 22 Feb 2007 17:15:09 -0800
Subject: [openib-general] [PATCH] 2.6.21-rc1: please pull rdma-dev.git
 for-roland
In-Reply-To: <adaejohk7an.fsf@cisco.com> (Roland Dreier's message of
	"Thu, 22 Feb 2007 17:10:40 -0800")
References: <000501c756e5$d26d0c90$8698070a@amr.corp.intel.com>
	<adaejohk7an.fsf@cisco.com>
Message-ID: <ada3b4xk736.fsf@cisco.com>

 > I notice that the actual patches you committed don't have your
 > sign-off in the git changelog.  I assume this is a mistake so I'll add
 > it back in...

which means I can't just pull your branch.  But that's OK, still doing
git format-patch, edit patches, git am is pretty easy.


From rdreier at cisco.com  Thu Feb 22 17:51:30 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 22 Feb 2007 17:51:30 -0800
Subject: [openib-general] [PATCH] 2.6.21-rc1: please pull rdma-dev.git
	for-roland
In-Reply-To: <000501c756e5$d26d0c90$8698070a@amr.corp.intel.com> (Sean
	Hefty's message of "Thu, 22 Feb 2007 16:59:07 -0800")
References: <000501c756e5$d26d0c90$8698070a@amr.corp.intel.com>
Message-ID: <adak5y9iqu5.fsf@cisco.com>

 > The patches are in git.openfabrics.org/~shefty/rdma-dev.git,
 > for-roland branch, which is based on 2.6.21-rc1.

One other request: please include a URL that I can just copy and
paste, so I don't actually have to read and parse complete sentences.
Something like:

the patches are in

    git://git.openfabrics.org/~shefty/rdma-dev.git for-roland

 - R.


From rdreier at cisco.com  Thu Feb 22 17:55:36 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 22 Feb 2007 17:55:36 -0800
Subject: [openib-general] [PATCH] 2.6.21-rc1: please pull rdma-dev.git
 for-roland
In-Reply-To: <adak5y9iqu5.fsf@cisco.com> (Roland Dreier's message of
	"Thu, 22 Feb 2007 17:51:30 -0800")
References: <000501c756e5$d26d0c90$8698070a@amr.corp.intel.com>
	<adak5y9iqu5.fsf@cisco.com>
Message-ID: <ada8xepiqnb.fsf@cisco.com>

Anyway, all 4 queued up in my for-2.6.21 branch


From sean.hefty at intel.com  Thu Feb 22 21:10:32 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Thu, 22 Feb 2007 21:10:32 -0800
Subject: [openib-general] [PATCH] 2.6.21-rc1: please pull rdma-dev.git
	for-roland
In-Reply-To: <adak5y9iqu5.fsf@cisco.com>
Message-ID: <000401c75708$f1cf2d70$bcd5180a@amr.corp.intel.com>

>the patches are in
>
>    git://git.openfabrics.org/~shefty/rdma-dev.git for-roland

I will do that in the future.

And yes, the sign off line was just a mistake.  Thanks for fixing that.

- Sean


From rdreier at cisco.com  Thu Feb 22 22:09:28 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 22 Feb 2007 22:09:28 -0800
Subject: [openib-general] [PATCH] libmthca: optimize calls to htonl with
 constant parameter
In-Reply-To: <20070222235724.GC4447@mellanox.co.il> (Michael S.
	Tsirkin's message of "Fri, 23 Feb 2007 01:57:24 +0200")
References: <20070222235724.GC4447@mellanox.co.il>
Message-ID: <ada3b4xh0br.fsf@cisco.com>

 > GCC seems to be unable to propagate constants across calls to htonl.
 > So it turns out to be worth the while to replace htonl with
 > a hand-written macro in case of constant parameter.

I'm wondering why this helps you.  On my system (which has Debian's
old glibc 2.3.6, certainly nothing particularly fancy), I see in
my <netinet/in.h>:

	/* Get machine dependent optimized versions of byte swapping functions.  */
	#include <bits/byteswap.h>
	
	#ifdef __OPTIMIZE__
	/* We can optimize calls to the conversion functions.  Either nothing has
	   to be done or we are using directly the byte-swapping functions which
	   often can be inlined.  */
	# if __BYTE_ORDER == __BIG_ENDIAN
	//...
	# else
	#  if __BYTE_ORDER == __LITTLE_ENDIAN
	#   define ntohl(x)	__bswap_32 (x)

and so on (and gcc defines __OPTIMIZE__ if you pass it any -O flag
including -Os).  And in <bits/byteswap.h> I have

	/* Swap bytes in 32 bit value.  */
	#define __bswap_constant_32(x) \
	     ((((x) & 0xff000000) >> 24) | (((x) & 0x00ff0000) >>  8) |		      \
	      (((x) & 0x0000ff00) <<  8) | (((x) & 0x000000ff) << 24))

and variations of __bswap_32() that look roughly like

	#  define __bswap_32(x) \
	     (__extension__							      \
	      ({ register unsigned int __v, __x = (x);				      \
		 if (__builtin_constant_p (__x))				      \
		   __v = __bswap_constant_32 (__x);				      \
		 else								      \

and so on.  (The point of all this being that for constants, htonl()
should expand to roughly the same thing as your CONSTANT_HTONL() --
the only difference is that you don't have the & for the << 24 and >>
24 parts, which I guess just has the potential to bite us if someone
did something like CONSTANT_HTONL(1L) on a 64-bit system).
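
If a macro form were wanted anyway, a variant with explicit masks and a fixed
32-bit type might look like the sketch below.  MASKED_CONSTANT_HTONL and
example_flag_be are names made up for this example, and the byte-order test
assumes the __BYTE_ORDER__ macros predefined by gcc and clang.

```c
#include <arpa/inet.h>
#include <stdint.h>

/*
 * Hypothetical variant, not the macro from the patch: the argument is
 * parenthesized, truncated to 32 bits and masked before every shift, so a
 * wider argument such as 1L cannot leak high bits into the result.
 */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define MASKED_CONSTANT_HTONL(x)                           \
	((uint32_t) ((((uint32_t) (x) & 0xff000000u) >> 24) | \
		     (((uint32_t) (x) & 0x00ff0000u) >>  8) | \
		     (((uint32_t) (x) & 0x0000ff00u) <<  8) | \
		     (((uint32_t) (x) & 0x000000ffu) << 24)))
#else
#define MASKED_CONSTANT_HTONL(x) ((uint32_t) (x))
#endif

/* The result is an integer constant expression, so it can initialize statics. */
static const uint32_t example_flag_be = MASKED_CONSTANT_HTONL(1u << 7);
```

For constant arguments this should give the same value as htonl(), so it is
only a readability or portability choice, not a performance one.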

As a quick test I compiled the code

	#include <netinet/in.h>
	
	enum {
		Y = 5
	};
	
	uint32_t foo(uint32_t x)
	{
		return x | htonl(Y);
	}
	
with gcc -c -O and the disassembly of foo() looks like

	0000000000000000 <foo>:
	   0:	89 f8                	mov    %edi,%eax
	   2:	0d 00 00 00 05       	or     $0x5000000,%eax
	   7:	c3                   	retq   

and so everything works exactly the way we would want.  (32-bit i386
also just does or with a constant too).

In fact for libmthca I just checked that the preprocessor output of
places like the following (which your patch converts)

			((wr->send_flags & IBV_SEND_SIGNALED) ?
			 htonl(MTHCA_NEXT_CQ_UPDATE) : 0) |

is

   ((wr->send_flags & IBV_SEND_SIGNALED) ?
    (__extension__ ({ register unsigned int __v, __x = (MTHCA_NEXT_CQ_UPDATE); if (__builtin_constant_p (__x)) __v = ((((__x) & 0xff000000) >> 24) | (((__x) & 0x00ff0000) >> 8) | (((__x) & 0x0000ff00) << 8) | (((__x) & 0x000000ff) << 24)); else __asm__ ("bswap %0" : "=r" (__v) : "0" (__x)); __v; })) : 0) |

And if I compare the generated assembly for libmthca with and without
your patch (on both x86-64 and i386), I don't see any significant
difference (the size is exactly the same, I just see things like the
compiler using eax and edx in the opposite order and trivial things
like that).

So what is different in your setup that causes this patch to make a
difference for you?

(BTW, one thing I did notice while looking at the i386 assembly is
that one micro-optimization that might make sense is to use something
like __attribute__((regparm(3))) for internal function calls within
libibverbs and libmthca on i386, since otherwise we waste instructions
pushing stuff on the stack for no reason other than compliance with
the crufty old i386 ABI.  Something like a FASTCALL macro in
<infiniband/arch.h> perhaps... if anyone really cares about 32-bit
i386 performance any more)
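
A minimal sketch of what such a macro could look like, assuming gcc on 32-bit
x86.  The FASTCALL definition and the ring_next() helper are hypothetical;
nothing like this exists in <infiniband/arch.h> today.

```c
#include <assert.h>

/*
 * Pass up to three integer arguments in registers on 32-bit x86 when
 * building with gcc; expand to nothing everywhere else.
 */
#if defined(__GNUC__) && defined(__i386__)
#define FASTCALL __attribute__((regparm(3)))
#else
#define FASTCALL
#endif

/* Made-up internal helper: advance an index in a power-of-two sized ring. */
FASTCALL unsigned ring_next(unsigned index, unsigned size);

FASTCALL unsigned ring_next(unsigned index, unsigned size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return (index + 1) & (size - 1);
}
```

The attribute has to appear on the declaration every caller sees, so it would
only be safe on functions that are not part of the exported library ABI.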

 - R.


From jgunthorpe at obsidianresearch.com  Thu Feb 22 23:00:55 2007
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Fri, 23 Feb 2007 00:00:55 -0700
Subject: [openib-general] [PATCH] libmthca: optimize calls to htonl with
 constant parameter
In-Reply-To: <ada3b4xh0br.fsf@cisco.com>
References: <20070222235724.GC4447@mellanox.co.il> <ada3b4xh0br.fsf@cisco.com>
Message-ID: <20070223070055.GC25553@obsidianresearch.com>

On Thu, Feb 22, 2007 at 10:09:28PM -0800, Roland Dreier wrote:

> (BTW, one thing I did notice while looking at the i386 assembly is
> that one micro-optimization that might make sense is to use something
> like __attribute__((regparm(3))) for internal function calls within
> libibverbs and libmthca on i386, since otherwise we waste instructions
> pushing stuff on the stack for no reason other than compliance with
> the crufty old i386 ABI.  Something like a FASTCALL macro in
> <infiniband/arch.h> perhaps... if anyone really cares about 32-bit
> i386 performance any more)

Newer gccs have the -fwhole-program --combine options that address
this and more. One of the things that happens is that all internal
functions are made 'static' and all compilation units are optimized in
one go.

gcc will optimize calling convention and a lot of other things for
static functions. That should provide an across the board
micro-improvement even on x86-64.

Jason


From vlad at lists.openfabrics.org  Fri Feb 23 02:28:23 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Fri, 23 Feb 2007 02:28:23 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070223-0200 daily build status
Message-ID: <20070223102823.7EAFEE607F3@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod 

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.19
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.19
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.15
Passed on ppc64 with linux-2.6.19
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.13
Passed on powerpc with linux-2.6.17
Passed on ppc64 with linux-2.6.16
Passed on ia64 with linux-2.6.19
Passed on ppc64 with linux-2.6.18
Passed on x86_64 with linux-2.6.14
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.13
Passed on powerpc with linux-2.6.15
Passed on ppc64 with linux-2.6.15
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.12
Passed on ppc64 with linux-2.6.17
Passed on powerpc with linux-2.6.12
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on powerpc with linux-2.6.16
Passed on ia64 with linux-2.6.16
Passed on ppc64 with linux-2.6.13
Passed on powerpc with linux-2.6.13
Passed on ia64 with linux-2.6.18
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on ppc64 with linux-2.6.14
Passed on ia64 with linux-2.6.15
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.14
Passed on x86_64 with linux-2.6.18-1.2798.fc6

Failed:
Build failed on ia64 with linux-2.6.16.21-0.8-default
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.16.21-0.8-default_ia64_check/include/rdma/ib_verbs.h:1590: error: implicit declaration of function ‘sg_dma_len’
/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: At top level:
/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:61: warning: initialization from incompatible pointer type
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.9-22.ELsmp
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: ‘ADVERTISE_PAUSE_CAP’ undeclared (first use in this function)
/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: (Each undeclared identifier is reported only once
/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: for each function it appears in.)
/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:170: error: ‘ADVERTISE_PAUSE_ASYM’ undeclared (first use in this function)
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.o] Error 1
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.9-22.ELsmp_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-22.ELsmp'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.9-34.ELsmp
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘add_adapter’:
/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1061: error: ‘adapter_list_lock’ undeclared (first use in this function)
/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘remove_adapter’:
/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1068: error: ‘adapter_list_lock’ undeclared (first use in this function)
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.o] Error 1
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070223-0200_linux-2.6.9-34.ELsmp_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-34.ELsmp'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------


From mst at mellanox.co.il  Fri Feb 23 03:24:15 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 23 Feb 2007 13:24:15 +0200
Subject: [openib-general] [PATCH] libmthca: optimize calls to htonl with
 constant parameter
In-Reply-To: <ada3b4xh0br.fsf@cisco.com>
References: <20070222235724.GC4447@mellanox.co.il> <ada3b4xh0br.fsf@cisco.com>
Message-ID: <20070223112415.GB4415@mellanox.co.il>

> So what is different in your setup that causes this patch to make a
> difference for you?

Hmm. I agree it is somewhat strange.

Below is a simple test that attempts to compare htonl, CONSTANT_HTONL,
and an array-driven implementation. The code line is taken directly from htonl.
Could you compile and run it please?

I see:

$ gcc -O2 1.c
$ ./a.out
test1
122396.00 usec
test2
10517799.00 usec
test3
104099.00 usec

which seems to imply CONSTANT_HTONL is much faster.

Ideas?

-------------------------------


#include <stdio.h>
#include <sys/time.h>
#include <time.h>

#include <endian.h>

#define SIZE 255

enum ibv_send_flags {
        IBV_SEND_FENCE          = 1 << 0,
        IBV_SEND_SIGNALED       = 1 << 1,
        IBV_SEND_SOLICITED      = 1 << 2,
        IBV_SEND_INLINE         = 1 << 3
};

enum {
        MTHCA_NEXT_DBD       = 1 << 7,
        MTHCA_NEXT_FENCE     = 1 << 6,
        MTHCA_NEXT_CQ_UPDATE = 1 << 3,
        MTHCA_NEXT_EVENT_GEN = 1 << 2,
        MTHCA_NEXT_SOLICIT   = 1 << 1,
};

int ar[SIZE];


void init_ar()
{
	ar[0]=htonl(1);
	ar[IBV_SEND_SIGNALED]=htonl(MTHCA_NEXT_CQ_UPDATE|1);
	ar[IBV_SEND_SOLICITED]=htonl(MTHCA_NEXT_SOLICIT|1);
	ar[IBV_SEND_SIGNALED|IBV_SEND_SOLICITED]=htonl(MTHCA_NEXT_CQ_UPDATE|MTHCA_NEXT_SOLICIT|1);
}


int test1(int x) 
{
	return ar[x & (IBV_SEND_SIGNALED | IBV_SEND_SOLICITED)];
}


int test2(int x) 
{
	return 
		((x & IBV_SEND_SIGNALED)  ? htonl(MTHCA_NEXT_CQ_UPDATE) : 0) |
		((x & IBV_SEND_SOLICITED)  ? htonl(MTHCA_NEXT_SOLICIT) : 0) |
		htonl(1);
}

#if __BYTE_ORDER == __LITTLE_ENDIAN
#define CONSTANT_HTONL(x) \
	((x >> 24) | ((x >> 8) & 0xff00) | ((x << 8) & 0xff0000) | (x << 24))
#elif __BYTE_ORDER == __BIG_ENDIAN
#define CONSTANT_HTONL(x) (x)
#else
#define CONSTANT_HTONL(x) htonl(x)
#endif

int test3(int x) 
{
	return 
		((x & IBV_SEND_SIGNALED)  ? CONSTANT_HTONL(MTHCA_NEXT_CQ_UPDATE) : 0) |
		((x & IBV_SEND_SOLICITED) ? CONSTANT_HTONL(MTHCA_NEXT_SOLICIT) : 0) |
		CONSTANT_HTONL(1);
}

struct timeval           start, end;

void timestart(void)
{
	if (gettimeofday(&start, NULL)) {
		perror("gettimeofday");
		return;
	}

}


void timeend(void)
{
	if (gettimeofday(&end, NULL)) {
		perror("gettimeofday");
		return;
	}

	{
		float usec = (end.tv_sec - start.tv_sec) * 1000000 +
			(end.tv_usec - start.tv_usec);

		printf("%.2f usec\n", usec);
	}

}

main() 
{
	int i;

	init_ar();

	printf("test1\n");

	timestart();

	for (i=0; i<100000000; ++i) {
		(void) test1(IBV_SEND_SIGNALED);
		(void) test1(0);
		(void) test1(IBV_SEND_SIGNALED | IBV_SEND_SOLICITED);
		(void) test1(IBV_SEND_SOLICITED);
	}
	timeend();

	printf("test2\n");
	timestart();

	for (i=0; i<100000000; ++i) {
		(void) test2(IBV_SEND_SIGNALED);
		(void) test2(0);
		(void) test2(IBV_SEND_SIGNALED | IBV_SEND_SOLICITED);
		(void) test2(IBV_SEND_SOLICITED);
	}
	timeend();
	printf("test3\n");
	timestart();

	for (i=0; i<100000000; ++i) {
		(void) test3(IBV_SEND_SIGNALED);
		(void) test3(0);
		(void) test3(IBV_SEND_SIGNALED | IBV_SEND_SOLICITED);
		(void) test3(IBV_SEND_SOLICITED);
	}
	timeend();
}

-- 
MST


From mst at mellanox.co.il  Fri Feb 23 03:36:43 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 23 Feb 2007 13:36:43 +0200
Subject: [openib-general] [PATCH] libmthca: optimize calls to htonl with
 constant parameter
In-Reply-To: <20070223112415.GB4415@mellanox.co.il>
References: <20070222235724.GC4447@mellanox.co.il>
	<ada3b4xh0br.fsf@cisco.com> <20070223112415.GB4415@mellanox.co.il>
Message-ID: <20070223113643.GC4415@mellanox.co.il>

> Quoting Michael S. Tsirkin <mst at mellanox.co.il>:
> Subject: Re: [PATCH] libmthca: optimize calls to htonl with constant parameter
> 
> > So what is different in your setup that causes this patch to make a
> > difference for you?
> 
> Hmm. I agree it is somewhat strange.
> 
> Below is a simple test that attempts to compare htonl, CONSTANT_HTONL,
> and an array-driven implementation. The code line is taken directly from htonl.
> Could you compile and run it please?

OK, this was stupid, the test was missing 
#include <netinet/in.h>
so htonl was expanded by a gcc intrinsic which seems to work worse
than the macro tricks present in netinet/in.h.

I guess this include got killed on the test system somehow,
and this explains why I saw a difference in libmthca.
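
A minimal sketch of the comparison with the header in place, assuming
a glibc system (for <endian.h>).  The function names flags_htonl,
flags_constant and variants_agree are hypothetical, and the macro
arguments are parenthesized and cast to uint32_t, which the original
test did not do.

```c
#include <arpa/inet.h>	/* declares htonl(); <netinet/in.h> also works */
#include <endian.h>
#include <stdint.h>

enum { IBV_SEND_SIGNALED = 1 << 1, IBV_SEND_SOLICITED = 1 << 2 };
enum { MTHCA_NEXT_CQ_UPDATE = 1 << 3, MTHCA_NEXT_SOLICIT = 1 << 1 };

#if __BYTE_ORDER == __LITTLE_ENDIAN
#define CONSTANT_HTONL(x) \
	((((uint32_t)(x)) >> 24) | ((((uint32_t)(x)) >> 8) & 0xff00u) | \
	 ((((uint32_t)(x)) << 8) & 0xff0000u) | (((uint32_t)(x)) << 24))
#elif __BYTE_ORDER == __BIG_ENDIAN
#define CONSTANT_HTONL(x) ((uint32_t)(x))
#else
#define CONSTANT_HTONL(x) htonl(x)
#endif

/* Flag conversion using the library htonl(). */
uint32_t flags_htonl(int x)
{
	return ((x & IBV_SEND_SIGNALED)  ? htonl(MTHCA_NEXT_CQ_UPDATE) : 0) |
	       ((x & IBV_SEND_SOLICITED) ? htonl(MTHCA_NEXT_SOLICIT) : 0) |
	       htonl(1);
}

/* Same conversion using the hand-written constant macro. */
uint32_t flags_constant(int x)
{
	return ((x & IBV_SEND_SIGNALED)  ? CONSTANT_HTONL(MTHCA_NEXT_CQ_UPDATE) : 0) |
	       ((x & IBV_SEND_SOLICITED) ? CONSTANT_HTONL(MTHCA_NEXT_SOLICIT) : 0) |
	       CONSTANT_HTONL(1);
}

/* Returns 1 if both variants agree for all four flag combinations. */
int variants_agree(void)
{
	static const int cases[] = {
		0, IBV_SEND_SIGNALED, IBV_SEND_SOLICITED,
		IBV_SEND_SIGNALED | IBV_SEND_SOLICITED
	};
	unsigned i;

	for (i = 0; i < sizeof cases / sizeof cases[0]; ++i)
		if (flags_htonl(cases[i]) != flags_constant(cases[i]))
			return 0;
	return 1;
}
```

Building with gcc -Wall is one way to catch this kind of omission,
since it warns about implicitly declared functions.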

-- 
MST


From halr at voltaire.com  Fri Feb 23 03:49:04 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 23 Feb 2007 06:49:04 -0500
Subject: [openib-general] ipoib & the partial pkey,
 was: librdmacm: fix bug causing failure to work with partial
 membership pkey
In-Reply-To: <1172230425.4102.1248.camel@hal.voltaire.com>
References: <000401c756da$1f9387d0$8698070a@amr.corp.intel.com>
	<1172230425.4102.1248.camel@hal.voltaire.com>
Message-ID: <1172231343.4102.2202.camel@hal.voltaire.com>

On Thu, 2007-02-22 at 18:35, Sean Hefty wrote:
> >Doesn't this allow ipoib to join a multicast group for which it may not be able
> >to communicate with all members?  For the broadcast group, this seems like an
> >error to me.  Can ipoib work in such a configuration?  If all nodes were
> >assigned a partial membership PKey, none of them could communicate, but no
> >errors would be generated anywhere.
> 
> I looked into this more...
> 
> RFC 4391 states (middle of page 5):
> 
> For a node to join a partition, one of its ports must be assigned the relevant
> P_Key by the SM [RFC4392].
> 
> Jumping to RFC 4392 (top of page 4):
> 
> at the time of creating an IB multicast group, multiple values such as the
> P_Key, Q_Key, Service Level, Hop Limit, Flow ID, TClass, MTU, etc.  have to be
> specified.  These values should be such that all potential members of the IB
> multicast group are able to communicate with one another when using them.

Seems to me that for P_Key this would mean full membership.

> and page 14:
> 
> Note that this IB_join to the broadcast group is a FullMember join.

FullMember here is referring to MCMemberRecord:JoinState rather than
partition membership.

-- Hal

> If any of
> the ports or the switches linking the port to the rest of the IPoIB subnet
> cannot support the parameters (e.g., path MTU or P_Key) associated with the
> broadcast group, then the IB_join request will fail and the requesting port will
> not become part of the IPoIB subnet
> 
> My initial interpretation of these statements led me to believe that the pkey check
> in ib_find_cached_pkey should not mask out the upper bit, which would prevent
> ipoib from joining a multicast group until it has been configured with the full
> membership pkey for the broadcast group.  Does this seem reasonable?
> 
> - Sean


From halr at voltaire.com  Fri Feb 23 04:13:59 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 23 Feb 2007 07:13:59 -0500
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to
 work with partial membership pkey
In-Reply-To: <1172230422.4102.1246.camel@hal.voltaire.com>
References: <1172155552.4380.475949.camel@hal.voltaire.com>
	<000301c756aa$b89e0020$8698070a@amr.corp.intel.com>
	<15ddcffd0702221309q4633a36cg8a7bb5ff69d78776@mail.gmail.com>
	<45DE16C3.5020809@ichips.intel.com>
	<1172230422.4102.1246.camel@hal.voltaire.com>
Message-ID: <1172232836.4102.3709.camel@hal.voltaire.com>

On Thu, 2007-02-22 at 17:18, Sean Hefty wrote:
> >>Can someone help my understanding here?  Is ipoib joining a multicast group
> >>using the full membership PKey, even if the node that it joins from only has the
> >>limited membership PKey configured? And the code in ib_find_cached_pkey helps
> >>enable this?
> > 
> > Yep. The ipoib create_child  function Or-s 0x8000  to the device pkey
> > which was provided by the user. Now, IPoIB uses the device pkey when
> > forming MGIDs and when doing modify qp to init. Indeed the way
> > ib_find_cached_pkey() is implemented makes the latter use trivial.
> 
> Doesn't this allow ipoib to join a multicast group for which it may not be able 
> to communicate with all members?

Yes, if the join were to succeed, which appears to me to be noncompliant
behavior.

> For the broadcast group, this seems like an error to me.

Why for just the broadcast group ? Isn't it any IPoIB MC group for which
this would be done ? (See below as to what the IBA spec says).

> Can ipoib work in such a configuration?  If all nodes were 
> assigned a partial membership PKey, none of them could communicate, but no 
> errors would be generated anywhere.
> 
> Joining a multicast group requires specifying the full membership PKey.  I don't 
> see anything in the spec that explicitly prohibits joining the group from a node 
> with only a partial membership PKey,

What about the description of P_Key in MCMemberRecord (table 210 on p.
908 which is compliance) which states:

"All members of the multicast group shall have full membership in the
partition indicated by the partition key."

-- Hal

>  but at first glance, this seems like a 
> subnet configuration issue.  Is there some use of this I'm overlooking?
> 
> - Sean


From halr at voltaire.com  Fri Feb 23 03:33:45 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 23 Feb 2007 06:33:45 -0500
Subject: [openib-general] [PATCH] librdmacm: fix bug causing failure to
 work with partial membership pkey
In-Reply-To: <45DE16C3.5020809@ichips.intel.com>
References: <1172155552.4380.475949.camel@hal.voltaire.com>
	<000301c756aa$b89e0020$8698070a@amr.corp.intel.com>
	<15ddcffd0702221309q4633a36cg8a7bb5ff69d78776@mail.gmail.com>
	<45DE16C3.5020809@ichips.intel.com>
Message-ID: <1172230422.4102.1246.camel@hal.voltaire.com>

On Thu, 2007-02-22 at 17:18, Sean Hefty wrote:
> >>Can someone help my understanding here?  Is ipoib joining a multicast group
> >>using the full membership PKey, even if the node that it joins from only has the
> >>limited membership PKey configured? And the code in ib_find_cached_pkey helps
> >>enable this?
> > 
> > Yep. The ipoib create_child  function Or-s 0x8000  to the device pkey
> > which was provided by the user. Now, IPoIB uses the device pkey when
> > forming MGIDs and when doing modify qp to init. Indeed the way
> > ib_find_cached_pkey() is implemented makes the latter use trivial.
> 
> Doesn't this allow ipoib to join a multicast group for which it may not be able 
> to communicate with all members?

Yes, if the join were to succeed, which appears to be noncompliant
behavior.

> For the broadcast group, this seems like an error to me.

Why for just the broadcast group ? Isn't it any IPoIB MC group for which
this would be done ? (See below as to what the IBA spec says).

> Can ipoib work in such a configuration?  If all nodes were 
> assigned a partial membership PKey, none of them could communicate, but no 
> errors would be generated anywhere.
> 
> Joining a multicast group requires specifying the full membership PKey.  I don't 
> see anything in the spec that explicitly prohibits joining the group from a node 
> with only a partial membership PKey,

What about the description of P_Key in MCMemberRecord (table 210 on p.
908 which is compliance) which states:

"All members of the multicast group shall have full membership in the
partition indicated by the partition key."

-- Hal

>  but at first glance, this seems like a 
> subnet configuration issue.  Is there some use of this I'm overlooking?
> 
> - Sean


From rdreier at cisco.com  Fri Feb 23 07:32:51 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 23 Feb 2007 07:32:51 -0800
Subject: [openib-general] [PATCH] libmthca: optimize calls to htonl with
 constant parameter
References: <20070222235724.GC4447@mellanox.co.il> <ada3b4xh0br.fsf@cisco.com>
	<20070223070055.GC25553@obsidianresearch.com>
Message-ID: <aday7moevoc.fsf@cisco.com>

 > Newer gccs have the -fwhole-program --combine options that address
 > this and more. One of the things that happens is that all internal
 > functions are made 'static' and all compilation units are optimized in
 > one go.

Good point... but is there any sane way to use that feature with
automake and libtool?  I know that the autotools are a pain but I
really don't want to reimplement the useful stuff they give us, and I
don't know of any really practical replacement...

 - R.


From sean.hefty at intel.com  Fri Feb 23 12:15:09 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Fri, 23 Feb 2007 12:15:09 -0800
Subject: [openib-general] [PATCH] for OFED 1.2
Message-ID: <000001c75787$50ff0440$ff0da8c0@amr.corp.intel.com>

I would like these fixes in OFED 1.2 as well.  What git tree / branch do I
generate a patch against?

- Sean

---

rdma_cm: remove unused node_guid from cma_device structure.
ib_cm: remove ca_guid from cm_device structure.
rdma_cm: request reversible paths only.
ib_core: Set hop limit in ib_init_ah_from_wc correctly.

The patches are in:

	git://git.openfabrics.org/~shefty/rdma-dev.git for-roland

(sign-off line was added to the actual commit messages)

Signed-off-by: Sean Hefty <sean.hefty at intel.com>
---
commit 28e218621d36cf9da42f07af08775769eb289fc0
Author: Sean Hefty <sean.hefty at intel.com>
Date:   Thu Feb 22 11:37:44 2007 -0800

    rdma_cm: remove unused node_guid from cma_device structure.

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index bb27ce9..d441815 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -77,7 +77,6 @@ static int next_port;
 struct cma_device {
 	struct list_head	list;
 	struct ib_device	*device;
-	__be64			node_guid;
 	struct completion	comp;
 	atomic_t		refcount;
 	struct list_head	id_list;
@@ -2674,7 +2673,6 @@ static void cma_add_one(struct ib_device *device)
 		return;
 
 	cma_dev->device = device;
-	cma_dev->node_guid = device->node_guid;
 
 	init_completion(&cma_dev->comp);
 	atomic_set(&cma_dev->refcount, 1);

commit 6de97f2a3373357d720b1653dfc0aac6d40b7506
Author: Sean Hefty <sean.hefty at intel.com>
Date:   Thu Feb 22 11:37:38 2007 -0800

    ib_cm: remove ca_guid from cm_device structure.
    
    The cm_device references an ib_device, which contains the node_guid.

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index d446998..842cd0b 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -88,7 +88,6 @@ struct cm_port {
 struct cm_device {
 	struct list_head list;
 	struct ib_device *device;
-	__be64 ca_guid;
 	struct cm_port port[0];
 };
 
@@ -739,8 +738,8 @@ retest:
 		ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
 		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
 		ib_send_cm_rej(cm_id, IB_CM_REJ_TIMEOUT,
-			       &cm_id_priv->av.port->cm_dev->ca_guid,
-			       sizeof cm_id_priv->av.port->cm_dev->ca_guid,
+			       &cm_id_priv->id.device->node_guid,
+			       sizeof cm_id_priv->id.device->node_guid,
 			       NULL, 0);
 		break;
 	case IB_CM_REQ_RCVD:
@@ -883,7 +882,7 @@ static void cm_format_req(struct cm_req_msg *req_msg,
 
 	req_msg->local_comm_id = cm_id_priv->id.local_id;
 	req_msg->service_id = param->service_id;
-	req_msg->local_ca_guid = cm_id_priv->av.port->cm_dev->ca_guid;
+	req_msg->local_ca_guid = cm_id_priv->id.device->node_guid;
 	cm_req_set_local_qpn(req_msg, cpu_to_be32(param->qp_num));
 	cm_req_set_resp_res(req_msg, param->responder_resources);
 	cm_req_set_init_depth(req_msg, param->initiator_depth);
@@ -1442,7 +1441,7 @@ static void cm_format_rep(struct cm_rep_msg *rep_msg,
 	cm_rep_set_flow_ctrl(rep_msg, param->flow_control);
 	cm_rep_set_rnr_retry_count(rep_msg, param->rnr_retry_count);
 	cm_rep_set_srq(rep_msg, param->srq);
-	rep_msg->local_ca_guid = cm_id_priv->av.port->cm_dev->ca_guid;
+	rep_msg->local_ca_guid = cm_id_priv->id.device->node_guid;
 
 	if (param->private_data && param->private_data_len)
 		memcpy(rep_msg->private_data, param->private_data,
@@ -3385,7 +3384,6 @@ static void cm_add_one(struct ib_device *device)
 		return;
 
 	cm_dev->device = device;
-	cm_dev->ca_guid = device->node_guid;
 
 	set_bit(IB_MGMT_METHOD_SEND, reg_req.method_mask);
 	for (i = 1; i <= device->phys_port_cnt; i++) {

commit 87680047dd09ca4a4e8ec575dad215c92cf45ed3
Author: Sean Hefty <sean.hefty at intel.com>
Date:   Wed Feb 21 16:40:44 2007 -0800

    rdma_cm: request reversible paths only
    
    The rdma_cm requires that path records be reversible.  Set the reversible
    bit when issuing a path record query.

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index f8d69b3..bb27ce9 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1492,11 +1492,13 @@ static int cma_query_ib_route(struct rdma_id_private *id_priv, int timeout_ms,
 	ib_addr_get_dgid(addr, &path_rec.dgid);
 	path_rec.pkey = cpu_to_be16(ib_addr_get_pkey(addr));
 	path_rec.numb_path = 1;
+	path_rec.reversible = 1;
 
 	id_priv->query_id = ib_sa_path_rec_get(&sa_client, id_priv->id.device,
 				id_priv->id.port_num, &path_rec,
 				IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID |
-				IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH,
+				IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH |
+				IB_SA_PATH_REC_REVERSIBLE,
 				timeout_ms, GFP_KERNEL,
 				cma_query_handler, work, &id_priv->query);
 

commit 30947e5b7db42184d66746ac1187d4abbf89018d
Author: Sean Hefty <sean.hefty at intel.com>
Date:   Wed Feb 21 16:37:31 2007 -0800

    ib_core: Set hop limit in ib_init_ah_from_wc correctly.
    
    The hop_limit value in the ah_attr should be 0xFF, not the value read
    from the received GRH (which should be 0).  See 13.5.4.4 in the 1.2 IB spec.

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 8b5dd36..ccdf93d 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -167,7 +167,7 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
 		ah_attr->grh.sgid_index = (u8) gid_index;
 		flow_class = be32_to_cpu(grh->version_tclass_flow);
 		ah_attr->grh.flow_label = flow_class & 0xFFFFF;
-		ah_attr->grh.hop_limit = grh->hop_limit;
+		ah_attr->grh.hop_limit = 0xFF;
 		ah_attr->grh.traffic_class = (flow_class >> 20) & 0xFF;
 	}
 	return 0;


From rdreier at cisco.com  Fri Feb 23 13:11:34 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 23 Feb 2007 13:11:34 -0800
Subject: [openib-general] [2.6 patch] drivers/infiniband/hw/cxgb3/:
	cleanups
In-Reply-To: <1172068305.21243.2.camel@stevo-desktop> (Steve Wise's
	message of "Wed, 21 Feb 2007 08:31:45 -0600")
References: <20070220000211.GZ13958@stusta.de>
	<1171982587.2101.0.camel@stevo-desktop>
	<20070221105249.GC13958@stusta.de>
	<1172068305.21243.2.camel@stevo-desktop>
Message-ID: <adafy8wd1fd.fsf@cisco.com>

thanks, queued for 2.6.21


From rdreier at cisco.com  Fri Feb 23 13:13:00 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 23 Feb 2007 13:13:00 -0800
Subject: [openib-general] [PATCH 2.6.21] iw_cxgb3: Stop the EP Timer on
	BAD CLOSE.
In-Reply-To: <1172090739.27101.39.camel@stevo-desktop> (Steve Wise's
	message of "Wed, 21 Feb 2007 14:45:39 -0600")
References: <1172090739.27101.39.camel@stevo-desktop>
Message-ID: <adabqjkd1cz.fsf@cisco.com>

thanks, queued for 2.6.21


From arlin.r.davis at intel.com  Fri Feb 23 15:06:09 2007
From: arlin.r.davis at intel.com (Arlin Davis)
Date: Fri, 23 Feb 2007 15:06:09 -0800
Subject: [openib-general] [PATCH] uDAPL - include dapltest and dtest in build
Message-ID: <000001c7579f$347974a0$ff0da8c0@amr.corp.intel.com>

This uDAPL patch adds both dapltest and dtest utilities, including manual pages, to the DAPL project
build. The dapltest required some modifications to build on x86_64.

James, please review.

Signed-off-by: Arlin Davis <ardavis at ichips.intel.com>

diff --git a/Makefile.am b/Makefile.am
index 1190f20..e2bf4dc 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -179,7 +179,9 @@ libdatinclude_HEADERS = dat/include/dat/dat.h \
 			dat/include/dat/udat.h \
 			dat/include/dat/udat_redirection.h \
 			dat/include/dat/udat_vendor_specific.h 
-			
+		
+man_MANS = man/dtest.1 man/dapltest.1 
+    	
 EXTRA_DIST = dat/common/dat_dictionary.h \
 	     dat/common/dat_dr.h \
 	     dat/common/dat_init.h \
@@ -228,8 +230,10 @@ EXTRA_DIST = dat/common/dat_dictionary.h \
 	     dat/udat/libdat.map \
 	     doc/dat.conf \
 	     dapl/udapl/libdaplcma.map \
-	     dapl/udapl/libdaplscm.map \
-	     libdat.spec.in 
+	     libdat.spec.in \
+	     $(man_MANS)
 	     
 dist-hook: libdat.spec 
 	cp libdat.spec $(distdir)
+	
+SUBDIRS = . test/dtest test/dapltest
diff --git a/configure.in b/configure.in
index bf5ec09..324bfa1 100644
--- a/configure.in
+++ b/configure.in
@@ -1,11 +1,11 @@
 dnl Process this file with autoconf to produce a configure script.
 
 AC_PREREQ(2.57)
-AC_INIT(dapl, 1.2.0, dapl-devel at lists.sourceforge.net)
+AC_INIT(dapl, 1.2.1, openib-general at openib.org)
 AC_CONFIG_SRCDIR([dat/udat/udat.c])
 AC_CONFIG_AUX_DIR(config)
 AM_CONFIG_HEADER(config.h)
-AM_INIT_AUTOMAKE(dapl, 1.2.0)
+AM_INIT_AUTOMAKE(dapl, 1.2.1)
 
 AM_PROG_LIBTOOL
 
@@ -60,5 +60,6 @@ AC_CACHE_CHECK(whether this is an RHEL system, ac_cv_rhel,
     fi)
 AM_CONDITIONAL(OS_RHEL, test "$ac_cv_rhel" = "yes")
 
-AC_CONFIG_FILES([Makefile libdat.spec])
+AC_CONFIG_FILES([Makefile test/dtest/Makefile test/dapltest/Makefile libdat.spec])
+
 AC_OUTPUT
diff --git a/man/dapltest.1 b/man/dapltest.1
new file mode 100644
index 0000000..8ff4493
--- /dev/null
+++ b/man/dapltest.1
@@ -0,0 +1,390 @@
+." Text automatically generated by txt2man
+.TH dapltest 1 "February 23, 2007" "uDAPL 1.2" "USER COMMANDS"
+
+.SH NAME
+\fB
+\fBdapltest \fP- test for the Direct Access Programming Library (DAPL)
+\fB
+.SH DESCRIPTION
+
+Dapltest is a set of tests developed to exercise, characterize,
+and verify the DAPL interfaces during development and porting.
+At least two instantiations of the test must be run. One acts
+as the server, fielding requests and spawning server-side test
+threads as needed. Other client invocations connect to the server
+and issue test requests. The server side of the test, once invoked,
+listens continuously for client connection requests, until quit or 
+killed. Upon receipt of a connection request, the connection is 
+established, the server and client sides swap version numbers to 
+verify that they are able to communicate, and the client sends 
+the test request to the server. If the version numbers match, 
+and the test request is well-formed, the server spawns the threads
+needed to run the test before awaiting further connections.
+.SH USAGE
+
+dapltest [ -f script_file_name ]
+[ -T S|Q|T|P|L ] [ -D device_name ] [ -d ] [ -R HT|LL|EC|PM|BE ]
+.PP
+With no arguments, dapltest runs as a server using default values,
+and loops accepting requests from clients.
+
+The -f option allows all arguments to be placed in a file, to ease
+test automation.
+
+The following arguments are common to all tests:
+.TP
+.B
+[ -T S|Q|T|P|L ]
+Test function to be performed:
+.RS
+.TP
+.B
+S
+- server loop
+.TP
+.B
+Q
+- quit, client requests that server
+wait for any outstanding tests to
+complete, then clean up and exit
+.TP
+.B
+T
+- transaction test, transfers data between 
+client and server
+.TP
+.B
+P
+- performance test, times DTO operations
+.TP
+.B
+L
+- limit test, exhausts various resources,
+runs in client w/o server interaction
+Default: S
+.RE
+.TP
+.B
+[ -D device_name ]
+Specifies the interface adapter name as documented in 
+the /etc/dat.conf static configuration file. This name 
+corresponds to the provider library to open. 
+Default: none
+.TP
+.B
+[ -d ]
+Enables extra debug verbosity, primarily tracing
+of the various DAPL operations as they progress.
+Repeating this parameter increases debug spew.
+Errors encountered result in the test spewing some
+explanatory text and stopping; this flag provides
+more detail about what led up to the error.
+Default: zero
+.TP
+.B
+[ -R BE ]
+Indicate the quality of service (QoS) desired.
+Choices are:
+.RS
+.TP
+.B
+HT
+- high throughput
+.TP
+.B
+LL
+- low latency
+.TP
+.B
+EC
+- economy (neither HT nor LL)
+.TP
+.B
+PM
+- premium
+.TP
+.B
+BE
+- best effort
+Default: BE
+.RE
+.RE
+.PP
+.B
+Usage - Quit test client
+.PP
+.nf
+.fam C
+    dapltest [Common_Args] [ -s server_name ]
+
+    Quit testing (-T Q) connects to the server to ask it to clean up and
+    exit (after it waits for any outstanding test runs to complete).
+    In addition to being more polite than simply killing the server,
+    this test exercises the DAPL object teardown code paths.
+    There is only one argument other than those supported by all tests:
+
+    -s server_name      Specifies the name of the server interface.
+                        No default.
+
+
+.fam T
+.fi
+.B
+Usage - Transaction test client
+.PP
+.nf
+.fam C
+    dapltest [Common_Args] [ -s server_name ]
+             [ -t threads ] [ -w endpoints ] [ -i iterations ] [ -Q ] 
+             [ -V ] [ -P ] OPclient OPserver [ op3, 
+
+    Transaction testing (-T T) transfers a variable amount of data between 
+    client and server.  The data transfer can be described as a sequence of 
+    individual operations; that entire sequence is transferred 'iterations' 
+    times by each thread over all of its endpoint(s).
+
+    The following parameters determine the behavior of the transaction test:
+
+    -s server_name      Specifies the name or IP address of the server interface.
+                        No default.
+
+    [ -t threads ]      Specify the number of threads to be used.
+                        Default: 1
+
+    [ -w endpoints ]    Specify the number of connected endpoints per thread.
+                        Default: 1
+
+    [ -i iterations ]   Specify the number of times the entire sequence
+                        of data transfers will be made over each endpoint.
+                        Default: 1000
+
+    [ -Q ]              Funnel completion events into a CNO.
+                        Default: use EVDs
+
+    [ -V ]              Validate the data being transferred.
+                        Default: ignore the data
+
+    [ -P ]              Turn on DTO completion polling
+                        Default: off
+
+    OP1 OP2 [ OP3, \.\.\. ]
+                        A single transaction (OPx) consists of:
+
+                        server|client   Indicates who initiates the
+                                        data transfer.
+
+                        SR|RR|RW        Indicates the type of transfer:
+                                        SR  send/recv
+                                        RR  RDMA read
+                                        RW  RDMA write
+                        Defaults: none
+
+                        [ seg_size [ num_segs ] ]
+                                        Indicates the amount and format
+                                        of the data to be transferred.
+                                        Default:  4096  1
+                                                  (i.e., 1 4KB buffer)
+
+                        [ -f ]          For SR transfers only, indicates
+                                        that a client's send transfer
+                                        completion should be reaped when
+                                        the next recv completion is reaped.
+                                        Sends and receives must be paired
+                                        (one client, one server, and in that
+                                        order) for this option to be used.
+
+    Restrictions:  
+
+    Due to the flow control algorithm used by the transaction test, there 
+    must be at least one SR OP for both the client and the server.  
+
+    Requesting data validation (-V) causes the test to automatically append 
+    three OPs to those specified. These additional operations provide 
+    synchronization points during each iteration, at which all user-specified 
+    transaction buffers are checked. These three appended operations satisfy 
+    the "one SR in each direction" requirement.
+
+    The transaction OP list is printed out if -d is supplied.
+
+.fam T
+.fi
+.B
+Usage - Performance test client
+.PP
+.nf
+.fam C
+    dapltest [Common_Args] -s server_name [ -m p|b ]
+             [ -i iterations ] [ -p pipeline ] OP
+
+    Performance testing (-T P) times the transfer of an operation.
+    The operation is posted 'iterations' times.
+
+    The following parameters determine the behavior of the performance test:
+
+    -s server_name      Specifies the name or IP address of the server interface.
+                        No default.
+
+    -m b|p              Used to choose either blocking (b) or polling (p)
+                        Default: blocking (b)
+
+    [ -i iterations ]   Specify the number of times the entire sequence
+                        of data transfers will be made over each endpoint.
+                        Default: 1000
+
+    [ -p pipeline ]     Specify the pipeline length; valid arguments are in
+                        the range [0,MAX_SEND_DTOS]. If a value greater than 
+                        MAX_SEND_DTOS is requested the value will be
+                        adjusted down to MAX_SEND_DTOS.
+                        Default: MAX_SEND_DTOS
+                        
+    OP                  Specifies the operation as follows:
+
+                        RR|RW           Indicates the type of transfer:
+                                        RR  RDMA read
+                                        RW  RDMA write
+                                        Defaults: none
+
+                        [ seg_size [ num_segs ] ]
+                                        Indicates the amount and format
+                                        of the data to be transferred.
+                                        Default:  4096  1
+                                                  (i.e., 1 4KB buffer)
+.fam T
+.RE
+.RE
+.PP
+.B
+Usage - Limit test client
+.PP
+.nf
+.fam C
+    Limit testing (-T L) neither requires nor connects to any server
+    instance.  The client runs one or more tests which attempt to
+    exhaust various resources to determine DAPL limits and exercise
+    DAPL error paths.  If no arguments are given, all tests are run.
+
+    Limit testing creates the sequence of DAT objects needed to
+    move data back and forth, attempting to find the limits supported
+    for the DAPL object requested.  For example, if the LMR creation
+    limit is being examined, the test will create a set of
+    {IA, PZ, CNO, EVD, EP} before trying to run dat_lmr_create() to
+    failure using that set of DAPL objects.  The 'width' parameter
+    can be used to control how many of these parallel DAPL object
+    sets are created before beating upon the requested constructor.
+    Use of -m limits the number of dat_*_create() calls that will
+    be attempted, which can be helpful if the DAPL in use supports
+    essentially unlimited numbers of some objects.
+
+    The limit test arguments are:
+
+    [ -m maximum ]      Specify the maximum number of dat_*_create()
+                        attempts.
+                        Default: run to object creation failure
+
+    [ -w width ]        Specify the number of DAPL object sets to
+                        create while initializing.
+                        Default: 1
+
+    [ limit_ia ]        Attempt to exhaust dat_ia_open()
+
+    [ limit_pz ]        Attempt to exhaust dat_pz_create()
+
+    [ limit_cno ]       Attempt to exhaust dat_cno_create()
+
+    [ limit_evd ]       Attempt to exhaust dat_evd_create()
+
+    [ limit_ep ]        Attempt to exhaust dat_ep_create()
+
+    [ limit_rsp ]       Attempt to exhaust dat_rsp_create()
+
+    [ limit_psp ]       Attempt to exhaust dat_psp_create()
+
+    [ limit_lmr ]       Attempt to exhaust dat_lmr_create(4KB)
+
+    [ limit_rpost ]     Attempt to exhaust dat_ep_post_recv(4KB)
+
+    [ limit_size_lmr ]  Probe maximum size dat_lmr_create()
+
+.nf
+.fam C
+                        Default: run all tests
+
+
+.fam T
+.fi
+.SH EXAMPLES
+
+dapltest -T S -d -D OpenIB-cma
+.PP
+.nf
+.fam C
+                        Starts a server process with debug verbosity.
+
+.fam T
+.fi
+dapltest -T T -d -s host1-ib0 -D OpenIB-cma -i 100 client SR 4096 2 server SR 4096 2
+.PP
+.nf
+.fam C
+                        Runs a transaction test, with both sides
+                        sending one buffer with two 4KB segments,
+                        one hundred times.
+
+.fam T
+.fi
+dapltest -T P -d -s host1-ib0 -D OpenIB-cma -i 100 SR 4096 2
+.PP
+.nf
+.fam C
+                        Runs a performance test, with the client 
+                        sending one buffer with two 4KB segments,
+                        one hundred times.
+
+.fam T
+.fi
+dapltest -T Q -s host1-ib0 -D OpenIB-cma
+.PP
+.nf
+.fam C
+                        Asks the server to clean up and exit.
+
+.fam T
+.fi
+dapltest -T L -D OpenIB-cma -d -w 16 -m 1000
+.PP
+.nf
+.fam C
+                        Runs all of the limit tests, setting up
+                        16 complete sets of DAPL objects, and
+                        creating at most a thousand instances
+                        when trying to exhaust resources.
+
+.fam T
+.fi
+dapltest -T T -V -d -t 2 -w 4 -i 55555 -s linux3 -D OpenIB-cma  
+client RW 4096 1 server RW 2048 4 
+client SR 1024 4 server SR 4096 2 
+client SR 1024 3 -f server SR 2048 1 -f
+.PP
+.nf
+.fam C
+                        Runs a more complicated transaction test,
+                        with two threads using four EPs each,
+                        sending a more complicated buffer pattern
+                        for a larger number of iterations,
+                        validating the data received.
+
+
+.fam T
+.fi
+.RE
+.TP
+.B
+BUGS
+(and  To Do List)
+.PP
+.nf
+.fam C
+    Use of CNOs (-Q) is not yet supported.
+
+    Further limit tests could be added.
diff --git a/man/dtest.1 b/man/dtest.1
new file mode 100755
index 0000000..1e227e5
--- /dev/null
+++ b/man/dtest.1
@@ -0,0 +1,78 @@
+.TH dtest 1 "February 23, 2007" "uDAPL 1.2" "USER COMMANDS"
+
+.SH NAME
+dtest \- simple uDAPL send/receive and RDMA test
+
+.SH SYNOPSIS
+.B dtest
+[\-P provider] [\-b buf size] [\-B burst count] [\-v] [\-c] [\-p] [\-d]\fB [-s]\fR
+
+.B dtest
+[\-P provider] [\-b buf size] [\-B burst count] [\-v] [\-c] [\-p] [\-d]\fB [-h HOSTNAME]\fR
+
+.SH DESCRIPTION
+.PP
+dtest is a simple test used to exercise and verify the uDAPL interfaces.
+At least two instances of the test must be run: one acts as the server
+and the other as the client. The server side of the test, once invoked, listens
+for connection requests until it times out or is killed. Upon receipt of a
+connection request, the connection is established, and the server and client
+sides exchange the information necessary to perform RDMA writes and reads.
+
+.SH OPTIONS
+
+.PP
+.TP
+\fB\-P\fR=\fIPROVIDER\fR
+use \fIPROVIDER\fR to specify uDAPL interface using /etc/dat.conf (default OpenIB-cma)
+.TP
+\fB\-b\fR=\fIBUFFER_SIZE\fR
+use buffer size \fIBUFFER_SIZE\fR for RDMA (default 64)
+.TP
+\fB\-B\fR=\fIBURST_COUNT\fR
+use burst count \fIBURST_COUNT\fR for iterations (default 10)
+.TP
+\fB\-v\fR, verbose output (default off)
+.TP
+\fB\-c\fR, use consumer notification events (default off)
+.TP
+\fB\-p\fR, use polling (default wait for event)
+.TP
+\fB\-d\fR, delay in seconds before close (default off)
+.TP
+\fB\-s\fR, run as server (default - run as server)
+.TP
+\fB\-h\fR=\fIHOSTNAME\fR
+use \fIHOSTNAME\fR to specify server hostname or IP address (default - none)
+
+.SH EXAMPLES
+
+dtest -P OpenIB-cma -v -s
+.PP
+.nf
+.fam C
+     Starts a server process with debug verbosity using provider OpenIB-cma.
+
+.fam T
+.fi
+dtest -P OpenIB-cma -h server1-ib0 
+.PP
+.nf
+.fam C
+     Starts a client process, using OpenIB-cma provider to connect to hostname server1-ib0.
+
+.fam T
+
+.SH SEE ALSO
+.BR dapltest(1)
+
+.SH AUTHORS
+.TP
+Arlin Davis
+.RI < ardavis at ichips.intel.com >
+
+.SH BUGS 
+
+
+
+
diff --git a/test/dapltest/Makefile.am b/test/dapltest/Makefile.am
new file mode 100755
index 0000000..0c83924
--- /dev/null
+++ b/test/dapltest/Makefile.am
@@ -0,0 +1,56 @@
+INCLUDES =  -I include \
+	    -I mdep/linux
+         
+bin_PROGRAMS = dapltest
+
+dapltest_SOURCES =				\
+	cmd/dapl_main.c				\
+	cmd/dapl_params.c			\
+	cmd/dapl_fft_cmd.c			\
+	cmd/dapl_getopt.c			\
+	cmd/dapl_limit_cmd.c			\
+	cmd/dapl_netaddr.c			\
+	cmd/dapl_performance_cmd.c		\
+	cmd/dapl_qos_util.c			\
+	cmd/dapl_quit_cmd.c			\
+	cmd/dapl_server_cmd.c			\
+	cmd/dapl_transaction_cmd.c		\
+	test/dapl_bpool.c			\
+	test/dapl_client.c			\
+	test/dapl_client_info.c			\
+	test/dapl_cnxn.c			\
+	test/dapl_execute.c			\
+	test/dapl_fft_connmgt.c			\
+	test/dapl_fft_endpoint.c		\
+	test/dapl_fft_hwconn.c			\
+	test/dapl_fft_mem.c			\
+	test/dapl_fft_pz.c			\
+	test/dapl_fft_queryinfo.c		\
+	test/dapl_fft_test.c			\
+	test/dapl_fft_util.c			\
+	test/dapl_limit.c			\
+	test/dapl_memlist.c			\
+	test/dapl_performance_client.c		\
+	test/dapl_performance_server.c		\
+	test/dapl_performance_stats.c		\
+	test/dapl_performance_util.c		\
+	test/dapl_quit_util.c			\
+	test/dapl_server.c			\
+	test/dapl_server_info.c			\
+	test/dapl_test_data.c			\
+	test/dapl_test_util.c			\
+	test/dapl_thread.c			\
+	test/dapl_transaction_stats.c		\
+	test/dapl_transaction_test.c		\
+	test/dapl_transaction_util.c		\
+	test/dapl_util.c			\
+	common/dapl_endian.c			\
+	common/dapl_global.c			\
+	common/dapl_performance_cmd_util.c	\
+	common/dapl_quit_cmd_util.c		\
+	common/dapl_transaction_cmd_util.c	\
+	udapl/udapl_tdep.c			\
+	mdep/linux/dapl_mdep_user.c
+	
+dapltest_LDADD = $(srcdir)/../../dat/udat/libdat.la
+dapltest_LDFLAGS = -lpthread  
diff --git a/test/dapltest/configure.in b/test/dapltest/configure.in
new file mode 100755
index 0000000..ebdd59d
--- /dev/null
+++ b/test/dapltest/configure.in
@@ -0,0 +1,26 @@
+dnl Process this file with autoconf to produce a configure script.
+
+AC_PREREQ(2.57)
+AC_INIT(dapltest, 1.2.1, dapl-devel at lists.sourceforge.net)
+AC_CONFIG_SRCDIR([$top_srcdir/dapl/test/dapltest/cmd/dapl_main.c])
+AC_CONFIG_AUX_DIR(config)
+AM_CONFIG_HEADER(config.h)
+AM_INIT_AUTOMAKE(dapltest, 1.2.1)
+
+AM_PROG_LIBTOOL
+
+dnl Checks for programs
+AC_PROG_CC
+
+dnl Checks for libraries
+if test "$disable_libcheck" != "yes"
+then
+AC_CHECK_LIB(pthread, pthread_attr_init, [],
+    AC_MSG_ERROR([pthread_attr_init() not found, dapltest requires pthreads]))
+fi
+
+dnl Checks for header files.
+
+AC_CONFIG_FILES([Makefile])
+
+AC_OUTPUT
diff --git a/test/dapltest/mdep/linux/dapl_mdep_user.h b/test/dapltest/mdep/linux/dapl_mdep_user.h
index 981783d..c05dd30 100644
--- a/test/dapltest/mdep/linux/dapl_mdep_user.h
+++ b/test/dapltest/mdep/linux/dapl_mdep_user.h
@@ -138,10 +138,16 @@ DT_Mdep_GetTimeStamp ( void )
     } while (tbu0 != tbu1);
     return (((unsigned long long)tbu0) << 32) | tbl;
 #else
+#if defined(__x86_64__)
+      unsigned int __a,__d; 
+      asm volatile("rdtsc" : "=a" (__a), "=d" (__d)); 
+      return ((unsigned long)__a) | (((unsigned long)__d)<<32);
+#else
 #error "Non-Pentium and Non-PPC Linux - unimplemented"
 #endif
 #endif
 #endif
+#endif
 }
 
 /*
diff --git a/test/dtest/Makefile.am b/test/dtest/Makefile.am
new file mode 100755
index 0000000..ac9837b
--- /dev/null
+++ b/test/dtest/Makefile.am
@@ -0,0 +1,4 @@
+bin_PROGRAMS = dtest
+dtest_SOURCES = dtest.c
+dtest_LDADD = $(srcdir)/../../dat/udat/libdat.la
+
diff --git a/test/dtest/configure.in b/test/dtest/configure.in
new file mode 100755
index 0000000..822df5e
--- /dev/null
+++ b/test/dtest/configure.in
@@ -0,0 +1,21 @@
+dnl Process this file with autoconf to produce a configure script.
+
+AC_PREREQ(2.57)
+AC_INIT(dtest, 1.2.1, dapl-devel at lists.sourceforge.net)
+AC_CONFIG_SRCDIR([$top_srcdir/dapl/test/dtest/dtest.c])
+AC_CONFIG_AUX_DIR(config)
+AM_CONFIG_HEADER(config.h)
+AM_INIT_AUTOMAKE(dtest, 1.2.1)
+
+AM_PROG_LIBTOOL
+
+dnl Checks for programs
+AC_PROG_CC
+
+dnl Checks for libraries
+
+dnl Checks for header files.
+
+AC_CONFIG_FILES([Makefile])
+
+AC_OUTPUT
diff --git a/test/dtest/makefile b/test/dtest/makefile
deleted file mode 100644
index 858d77f..0000000
--- a/test/dtest/makefile
+++ /dev/null
@@ -1,16 +0,0 @@
-CC         = gcc
-CFLAGS     = -O2 -g
-
-DAT_INC = ../../dat/include
-DAT_LIB = /usr/local/lib
-
-all: dtest 
-
-clean:
-	rm -f *.o;touch *.c;rm -f dtest
-
-dtest: ./dtest.c
-	$(CC) $(CFLAGS) ./dtest.c -o dtest \
-	-DDAPL_PROVIDER='"OpenIB-cma"' \
-	-I $(DAT_INC) -L $(DAT_LIB) -ldat
-


From vlad at lists.openfabrics.org  Sat Feb 24 02:28:03 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Sat, 24 Feb 2007 02:28:03 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070224-0200 daily build status
Message-ID: <20070224102804.37709E607FD@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod 

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on ppc64 with linux-2.6.12
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.19
Passed on ia64 with linux-2.6.19
Passed on x86_64 with linux-2.6.18
Passed on ia64 with linux-2.6.15
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.17
Passed on ia64 with linux-2.6.12
Passed on ppc64 with linux-2.6.19
Passed on ia64 with linux-2.6.14
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.12
Passed on ia64 with linux-2.6.16
Passed on ppc64 with linux-2.6.18
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.13
Passed on x86_64 with linux-2.6.16
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.15
Passed on ppc64 with linux-2.6.14
Passed on ppc64 with linux-2.6.15
Passed on powerpc with linux-2.6.13
Passed on powerpc with linux-2.6.15
Passed on powerpc with linux-2.6.14
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.16
Passed on powerpc with linux-2.6.12
Passed on ppc64 with linux-2.6.13
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on ppc64 with linux-2.6.17
Passed on x86_64 with linux-2.6.18-1.2798.fc6

Failed:
Build failed on ia64 with linux-2.6.16.21-0.8-default
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.16.21-0.8-default_ia64_check/include/rdma/ib_verbs.h:1590: error: implicit declaration of function ‘sg_dma_len’
/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: At top level:
/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:61: warning: initialization from incompatible pointer type
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.9-34.ELsmp
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘add_adapter’:
/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1061: error: ‘adapter_list_lock’ undeclared (first use in this function)
/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘remove_adapter’:
/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1068: error: ‘adapter_list_lock’ undeclared (first use in this function)
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.o] Error 1
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.9-34.ELsmp_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-34.ELsmp'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.9-22.ELsmp
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: ‘ADVERTISE_PAUSE_CAP’ undeclared (first use in this function)
/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: (Each undeclared identifier is reported only once
/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: for each function it appears in.)
/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:170: error: ‘ADVERTISE_PAUSE_ASYM’ undeclared (first use in this function)
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.o] Error 1
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070224-0200_linux-2.6.9-22.ELsmp_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-22.ELsmp'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------


From cppbala at yahoo.com  Sat Feb 24 06:33:08 2007
From: cppbala at yahoo.com (Bala)
Date: Sat, 24 Feb 2007 06:33:08 -0800 (PST)
Subject: [openib-general] ib0 interface up but can't ping
Message-ID: <327816.12892.qm@web35105.mail.mud.yahoo.com>

Hi All,
       we are trying to setup IB on HP c-class blade
system(x86_64) having 4X DDR IB Mezzanine HCA with
CentOS-4.

  1. Downloaded ofed-1.1 
  2. build the RPMS and installed two machines
  3. set the IP in icfg-ib0 and rebooted
  4. can see the ib0 interface up on both the machines
  5. but can't ping the other machine using IB
interface

pls let me know what we are missing or what is
required
to ping the other machines using IB??

  also we have tried with RHEL-4 and ofed-1.1 after 
installing the RPMS and rebooting the servers, we can
see the interface but can't set the IP it always 
complains that other machines using the IP.

pls let us know how we can over come this error

thanks in advance,
-bala-




From halr at voltaire.com  Sat Feb 24 07:28:50 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 24 Feb 2007 10:28:50 -0500
Subject: [openib-general] ib0 interface up but can't ping
In-Reply-To: <327816.12892.qm@web35105.mail.mud.yahoo.com>
References: <327816.12892.qm@web35105.mail.mud.yahoo.com>
Message-ID: <1172330920.4102.100648.camel@hal.voltaire.com>

On Sat, 2007-02-24 at 09:33, Bala wrote:
> Hi All,
>        we are trying to setup IB on HP c-class blade
> system(x86_64) having 4X DDR IB Mezzanine HCA with
> CentOS-4.
> 
>   1. Downloaded ofed-1.1 
>   2. build the RPMS and installed two machines
>   3. set the IP in icfg-ib0 and rebooted
>   4. can see the ib0 interface up on both the machines
>   5. but can't ping the other machine using IB
> interface
> 
> pls let me know what we are missing or what is
> required
> to ping the other machines using IB??
> 
>   also we have tried with RHEL-4 and ofed-1.1 after 
> installing the RPMS and rebooting the servers, we can
> see the interface but can't set the IP it always 
> complains that other machines using the IP.
> 
> pls let us know how we can over come this error

Are the ports in active state ?

-- Hal

> thanks in advance,
> -bala-


From dotanb at dev.mellanox.co.il  Sat Feb 24 09:54:02 2007
From: dotanb at dev.mellanox.co.il (dotanb at dev.mellanox.co.il)
Date: Sat, 24 Feb 2007 19:54:02 +0200 (IST)
Subject: [openib-general] ib0 interface up but can't ping
In-Reply-To: <1172330920.4102.100648.camel@hal.voltaire.com>
References: <327816.12892.qm@web35105.mail.mud.yahoo.com>
	<1172330920.4102.100648.camel@hal.voltaire.com>
Message-ID: <2199.85.65.223.188.1172339642.squirrel@dev.mellanox.co.il>

> On Sat, 2007-02-24 at 09:33, Bala wrote:
>> Hi All,
>>        we are trying to setup IB on HP c-class blade
>> system(x86_64) having 4X DDR IB Mezzanine HCA with
>> CentOS-4.
>>
>>   1. Downloaded ofed-1.1
>>   2. build the RPMS and installed two machines
>>   3. set the IP in icfg-ib0 and rebooted
>>   4. can see the ib0 interface up on both the machines
>>   5. but can't ping the other machine using IB
>> interface
>>
>> pls let me know what we are missing or what is
>> required
>> to ping the other machines using IB??
>>
>>   also we have tried with RHEL-4 and ofed-1.1 after
>> installing the RPMS and rebooting the servers, we can
>> see the interface but can't set the IP it always
>> complains that other machines using the IP.
>>
>> pls let us know how we can over come this error
>
> Are the ports in active state ?
>
OpenSM (or any other SM) must be active in order to use IPoIB.
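
To tie this back to Hal's question: with no SM running, a port whose link is physically up normally sits in INIT and only moves to ACTIVE once an SM has configured it. A rough sketch of checking that from C is below. It assumes the usual sysfs layout, where /sys/class/infiniband/<device>/ports/<port>/state holds a string such as "4: ACTIVE"; the function names are made up for illustration and are not part of any OFED library.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse a sysfs port-state string such as "4: ACTIVE\n".
 * Returns the numeric state (1 DOWN, 2 INIT, 3 ARMED, 4 ACTIVE)
 * or -1 if the text does not start with "<number>:". */
int ib_parse_port_state(const char *text)
{
    char *end;
    long value;

    if (text == NULL)
        return -1;
    value = strtol(text, &end, 10);
    if (end == text || *end != ':' || value < 0 || value > 255)
        return -1;
    return (int)value;
}

/* Read a port state file and report whether the port is ACTIVE.
 * Returns 0 if the file cannot be read or the state is anything else. */
int ib_port_is_active(const char *state_path)
{
    char buf[64];
    FILE *fp = fopen(state_path, "r");
    int active = 0;

    if (fp == NULL)
        return 0;
    if (fgets(buf, sizeof(buf), fp) != NULL)
        active = (ib_parse_port_state(buf) == 4);
    fclose(fp);
    return active;
}
```

For example, ib_port_is_active("/sys/class/infiniband/mthca0/ports/1/state") would be the call for the first port of a device named mthca0 (the device name is only a guess for this HCA). If it reports INIT on both machines, starting OpenSM on one of them is the likely fix.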

Dotan


From sashak at voltaire.com  Sat Feb 24 12:13:42 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sat, 24 Feb 2007 22:13:42 +0200
Subject: [openib-general] [PATCH] opensm: updn performance improvements
Message-ID: <20070224201342.GB9147@sashak.voltaire.com>


There are various performance improvements for up/down routing engine:
- updn_node object which is referenced by switch's priv pointer
- ranking for switches only
- replace time consuming cl_list by cl_qlist
- reuse already collected up/down related information (in updn_node
  structure) instead of rediscovering
- eliminate many inner loops
- mask time consuming logging
- eliminate using two lists with BFS
- minor cleanups

Now up/down looks 5-6 times faster.

Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
---
 osm/opensm/osm_ucast_updn.c |  743 +++++++++++++++----------------------------
 1 files changed, 257 insertions(+), 486 deletions(-)
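
For reviewers who want the idea without reading the whole diff, here is a minimal standalone sketch of the approach described above: one intrusive queue in place of the two allocated lists, and a rank field kept in the per-switch record in place of a GUID lookup table. All names (sketch_node, sketch_rank_from, and so on) are hypothetical, and the sketch does not use complib or any OpenSM type; it only illustrates the technique and is not the code in this patch.

```c
#include <limits.h>
#include <stddef.h>

#define SKETCH_MAX_PORTS 4

/* Hypothetical stand-in for a per-switch record such as updn_node. */
struct sketch_node {
    struct sketch_node *next;                   /* intrusive queue link */
    struct sketch_node *peer[SKETCH_MAX_PORTS]; /* neighbours, NULL if unused */
    unsigned rank;                              /* UINT_MAX until ranked */
    int queued;                                 /* 1 while on the queue */
};

struct sketch_queue {
    struct sketch_node *head, *tail;
};

/* Append a node to the tail; no allocation is needed. */
static void sketch_push(struct sketch_queue *q, struct sketch_node *n)
{
    n->next = NULL;
    n->queued = 1;
    if (q->tail)
        q->tail->next = n;
    else
        q->head = n;
    q->tail = n;
}

/* Remove and return the head node, or NULL if the queue is empty. */
static struct sketch_node *sketch_pop(struct sketch_queue *q)
{
    struct sketch_node *n = q->head;

    if (n) {
        q->head = n->next;
        if (!q->head)
            q->tail = NULL;
        n->queued = 0;
    }
    return n;
}

/* Lower the rank if the new value is smaller; return 1 when it changed. */
static int sketch_update_rank(struct sketch_node *n, unsigned rank)
{
    if (n->rank > rank) {
        n->rank = rank;
        return 1;
    }
    return 0;
}

/* Rank every node reachable from root with a single-queue BFS.
 * Every node must start with rank == UINT_MAX. */
void sketch_rank_from(struct sketch_node *root)
{
    struct sketch_queue q = { NULL, NULL };
    struct sketch_node *n;
    int i;

    if (!sketch_update_rank(root, 0))
        return;
    sketch_push(&q, root);
    while ((n = sketch_pop(&q)) != NULL) {
        for (i = 0; i < SKETCH_MAX_PORTS; i++) {
            struct sketch_node *p = n->peer[i];

            /* Revisit a neighbour only when its rank improved. */
            if (p && sketch_update_rank(p, n->rank + 1) && !p->queued)
                sketch_push(&q, p);
        }
    }
}
```

Calling sketch_rank_from() once per root node gives each switch the distance to its nearest root, since a later root only lowers ranks that it can improve.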

diff --git a/osm/opensm/osm_ucast_updn.c b/osm/opensm/osm_ucast_updn.c
index 8b86958..e8282f4 100644
--- a/osm/opensm/osm_ucast_updn.c
+++ b/osm/opensm/osm_ucast_updn.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2004-2007 Voltaire, Inc. All rights reserved.
  * Copyright (c) 2002-2006 Mellanox Technologies LTD. All rights reserved.
  * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
  *
@@ -40,8 +40,6 @@
  *
  * Environment:
  *      Linux User Mode
- *
- * $Revision: 1.0 $
  */
 
 #if HAVE_CONFIG_H
@@ -61,25 +59,10 @@
 /* direction */
 typedef enum _updn_switch_dir
 {
-    UP = 0,
-    DOWN
+  UP = 0,
+  DOWN
 } updn_switch_dir_t;
 
-/* This enum respresent available states in the UPDN algorithm */
-typedef enum _updn_state
-{
-    UPDN_INIT = 0,
-    UPDN_RANK,
-    UPDN_MIN_HOP_CALC,
-} updn_state_t;
-
-/* Rank value of this node */
-typedef struct _updn_rank
-{
-  cl_map_item_t map_item;
-  uint8_t rank;
-} updn_rank_t;
-
 /* Histogram element - the number of occurences of the same hop value */
 typedef struct _updn_hist
 {
@@ -87,12 +70,6 @@ typedef struct _updn_hist
   uint32_t bar_value;
 } updn_hist_t;
 
-typedef struct _updn_next_step
-{
-  updn_switch_dir_t state;
-  osm_switch_t *p_sw;
-} updn_next_step_t;
-
 /* guids list */
 typedef struct _updn_input
 {
@@ -100,17 +77,26 @@ typedef struct _updn_input
   uint64_t *guid_list;
 } updn_input_t;
 
+struct updn_node {
+  cl_list_item_t list;
+  osm_switch_t *sw;
+  updn_switch_dir_t dir;
+  unsigned rank;
+  unsigned is_root;
+  unsigned visited;
+};
+
 /* updn structure */
 typedef struct _updn
 {
-  updn_state_t   state;
   boolean_t      auto_detect_root_nodes;
-  cl_qmap_t      guid_rank_tbl;
   updn_input_t   updn_ucast_reg_inputs;
-  cl_list_t *    p_root_nodes;
+  cl_list_t     *p_root_nodes;
   osm_opensm_t  *p_osm;
 } updn_t;
 
+#define NOISE_L(log, fmt, arg...)
+
 /* ///////////////////////////////// */
 /*  Statics                          */
 /* ///////////////////////////////// */
@@ -122,27 +108,17 @@ static void __osm_updn_find_root_nodes_by_min_hop(OUT updn_t *p_updn);
    remote ports */
 static updn_switch_dir_t
 __updn_get_dir(
-  IN updn_t *p_updn,
-  IN uint8_t cur_rank,
-  IN uint8_t rem_rank,
+  IN unsigned cur_rank,
+  IN unsigned rem_rank,
   IN uint64_t cur_guid,
-  IN uint64_t rem_guid )
+  IN uint64_t rem_guid,
+  IN unsigned cur_is_root,
+  IN unsigned rem_is_root )
 {
-  uint32_t i = 0, max_num_guids = p_updn->updn_ucast_reg_inputs.num_guids;
-  uint64_t *p_guid = p_updn->updn_ucast_reg_inputs.guid_list;
-  boolean_t cur_is_root = FALSE, rem_is_root = FALSE;
-
   /* HACK: comes to solve root nodes connection, in a classic subnet root nodes do not connect
-     directly, but in case they are we assign to root node an UP direction to allow UPDN discover
+     directly, but in case they are we assign to root node an UP direction to allow UPDN to discover
      the subnet correctly (and not from the point of view of the last root node).
   */
-  for ( i = 0; i < max_num_guids; i++ )
-  {
-    if (cur_guid == p_guid[i])
-      cur_is_root = TRUE;
-    if (rem_guid == p_guid[i])
-      rem_is_root = TRUE;
-  }
   if (cur_is_root && rem_is_root)
     return UP;
 
@@ -162,58 +138,18 @@ __updn_get_dir(
 
 /**********************************************************************
  **********************************************************************/
-/* This function creates a new element of updn_next_step_t type then return its
-   pointer, Null if malloc has failed */
-static updn_next_step_t*
-__updn_create_updn_next_step_t(
-  IN updn_switch_dir_t state,
-  IN osm_switch_t* const p_sw )
-{
-  updn_next_step_t *p_next_step;
-
-  p_next_step = (updn_next_step_t*) malloc(sizeof(*p_next_step));
-  if (p_next_step)
-  {
-    memset(p_next_step, 0, sizeof(*p_next_step));
-    p_next_step->state = state;
-    p_next_step->p_sw = p_sw;
-  }
-
-  return p_next_step;
-}
-
-/**********************************************************************
- **********************************************************************/
-/* This function updates an element in the qmap list by guid index and rank value */
+/* This function updates rank value for a node */
 /* Return 0 if no need to further update 1 if brought a new value */
 static int
 __updn_update_rank(
-  IN cl_qmap_t *p_guid_rank_tbl,
-  IN ib_net64_t guid,
-  IN uint8_t rank )
+  IN struct updn_node *u,
+  IN unsigned rank )
 {
-  updn_rank_t *p_updn_rank;
-
-  p_updn_rank = (updn_rank_t*) cl_qmap_get(p_guid_rank_tbl, guid);
-  if (p_updn_rank == (updn_rank_t*) cl_qmap_end(p_guid_rank_tbl))
+  if (u->rank > rank)
   {
-    p_updn_rank = (updn_rank_t*) malloc(sizeof(updn_rank_t));
-
-    CL_ASSERT(p_updn_rank);
-
-    p_updn_rank->rank = rank;
-
-    cl_qmap_insert(p_guid_rank_tbl, guid, &p_updn_rank->map_item);
+    u->rank = rank;
     return 1;
   }
-  else
-  {
-    if (p_updn_rank->rank > rank)
-    {
-      p_updn_rank->rank = rank;
-      return 1;
-    }
-  }
   return 0;
 }
 
@@ -223,20 +159,18 @@ __updn_update_rank(
  **********************************************************************/
 static int
 __updn_bfs_by_node(
-  IN updn_t *p_updn,
+  IN osm_log_t *p_log,
   IN osm_subn_t *p_subn,
-  IN osm_port_t *p_port,
-  IN cl_qmap_t *p_guid_rank_tbl )
+  IN osm_port_t *p_port )
 {
   /* Init local vars */
   osm_switch_t *p_self_node = NULL;
   uint8_t pn, pn_rem;
   osm_physp_t *p_physp, *p_remote_physp;
-  cl_list_t *p_currList, *p_nextList;
+  cl_qlist_t list;
   uint16_t root_lid;
-  updn_next_step_t *p_updn_switch, *p_tmp;
+  struct updn_node *u;
   updn_switch_dir_t next_dir, current_dir;
-  osm_log_t *p_log = &p_updn->p_osm->log;
 
   OSM_LOG_ENTER( p_log, __updn_bfs_by_node );
 
@@ -248,21 +182,6 @@ __updn_bfs_by_node(
     return 1;
   }
 
-  /* Init the list pointers */
-  p_nextList = (cl_list_t*)malloc(sizeof(cl_list_t));
-  if (!p_nextList)
-  {
-    osm_log( p_log, OSM_LOG_ERROR,
-             "__updn_bfs_by_node: ERR AA14: "
-             "No memory for p_nextList\n" );
-    OSM_LOG_EXIT( p_log );
-    return 1;
-  }
-
-  cl_list_construct( p_nextList );
-  cl_list_init( p_nextList, 10 );
-  p_currList = p_nextList;
-
   /* The Root BFS - lid  */
   root_lid = cl_ntoh16(osm_physp_get_base_lid( p_physp ));
   /* printf ("-V- BFS through lid : 0x%x\n", root_lid); */
@@ -273,7 +192,7 @@ __updn_bfs_by_node(
   if (p_port->p_node->sw)
   {
     p_self_node = p_port->p_node->sw;
-    /* Update its Min Hop Table */
+    /* Update it's Min Hop Table */
     osm_log( p_log, OSM_LOG_DEBUG,
              "__updn_bfs_by_node: "
              "Update Min Hop Table of GUID 0x%" PRIx64 "\n",
@@ -282,7 +201,7 @@ __updn_bfs_by_node(
   }
   else
   {
-    /* This is a CA or router - need to take its remote port */
+    /* This is a CA or router - need to take it's remote port */
     p_remote_physp = p_physp->p_remote_physp;
     /*
       make sure that the following occur:
@@ -304,7 +223,7 @@ __updn_bfs_by_node(
       else
       {
         p_self_node = p_remote_physp->p_node->sw;
-        /* Update its Min Hop Table */
+        /* Update it's Min Hop Table */
         /* NOTE : Check if there is a function which prints the Min Hop Table */
         osm_log( p_log, OSM_LOG_DEBUG,
                  "__updn_bfs_by_node: "
@@ -322,201 +241,111 @@ __updn_bfs_by_node(
            "Starting from switch - port GUID 0x%" PRIx64 "\n",
            cl_ntoh64(p_self_node->p_node->node_info.port_guid) );
 
-  /* Update list with the updn_next_step_t new element */
-  /* NOTE : When inserting an item which is a pointer to a struct, does remove
-     action also free its memory */
-  if (!(p_tmp=__updn_create_updn_next_step_t(UP, p_self_node)))
-  {
-    osm_log( p_log, OSM_LOG_ERROR,
-             "__updn_bfs_by_node:  ERR AA08: "
-             "Could not create updn_next_step_t\n" );
-    return 1;
-  }
+  /* Update current list with the new element */
+  u = p_self_node->priv;
+  u->dir = UP;
 
-  cl_list_insert_tail(p_currList, p_tmp);
+  cl_qlist_init(&list);
+  cl_qlist_insert_tail(&list, &u->list);
 
   /* BFS the list till no next element */
-  osm_log( p_log, OSM_LOG_VERBOSE,
-           "__updn_bfs_by_node: "
-           "BFS the subnet [\n" );
-
-  while (!cl_is_list_empty(p_currList))
+  while (!cl_is_qlist_empty(&list))
   {
-    osm_log( p_log, OSM_LOG_DEBUG,
+    ib_net64_t remote_guid, current_guid;
+
+    NOISE_L( p_log, OSM_LOG_DEBUG,
              "__updn_bfs_by_node: "
              "Starting a new iteration with %zu elements in current list\n",
-             cl_list_count(p_currList) );
-    /* Init the switch directed list */
-    p_nextList = (cl_list_t*)malloc(sizeof(cl_list_t));
-    if (!p_nextList)
-    {
-      osm_log( p_log, OSM_LOG_ERROR,
-               "__updn_bfs_by_node: ERR AA15: "
-               "No memory for p_nextList\n" );
-      OSM_LOG_EXIT( p_log );
-      return 1;
-    }
+             cl_qlist_count(&list) );
 
-    cl_list_construct( p_nextList );
-    cl_list_init( p_nextList, 10 );
-    /* Go over all current list items till it's empty */
-    /* printf ("-V- In inner while\n"); */
-    p_updn_switch = (updn_next_step_t*)cl_list_remove_head( p_currList );
-    /* While there is a pointer to updn struct we continue to BFS */
-    while (p_updn_switch)
+    u = (struct updn_node *)cl_qlist_remove_head(&list);
+    u->visited = 0; /* cleanup */
+    current_dir = u->dir;
+    current_guid = osm_node_get_node_guid(u->sw->p_node);
+    NOISE_L( p_log, OSM_LOG_DEBUG,
+             "__updn_bfs_by_node: "
+             "Visiting port GUID 0x%" PRIx64 "\n",
+             cl_ntoh64(current_guid) );
+    /* Go over all ports of the switch and find unvisited remote nodes */
+    for ( pn = 0; pn < osm_switch_get_num_ports(u->sw); pn++ )
     {
-      current_dir = p_updn_switch->state;
-      osm_log( p_log, OSM_LOG_DEBUG,
+      osm_node_t *p_remote_node;
+      struct updn_node *rem_u;
+      uint8_t current_min_hop, remote_min_hop, set_hop_return_value;
+      osm_switch_t *p_remote_sw;
+
+      p_remote_node = osm_node_get_remote_node(u->sw->p_node, pn, &pn_rem);
+      /* If no remote node OR remote node is not a SWITCH
+         continue to next pn */
+      if( !p_remote_node || !p_remote_node->sw )
+        continue;
+      /* Fetch remote guid only after validation of remote node */
+      remote_guid = osm_node_get_node_guid(p_remote_node);
+      p_remote_sw = p_remote_node->sw;
+      rem_u = p_remote_sw->priv;
+      /* Decide which direction to mark it (UP/DOWN) */
+      next_dir = __updn_get_dir(u->rank, rem_u->rank,
+                                current_guid, remote_guid,
+                                u->is_root, rem_u->is_root);
+
+      NOISE_L( p_log, OSM_LOG_DEBUG,
                "__updn_bfs_by_node: "
-               "Visiting port GUID 0x%" PRIx64 "\n",
-               cl_ntoh64(p_updn_switch->p_sw->p_node->node_info.port_guid) );
-      /* Go over all ports of the switch and find unvisited remote nodes */
-      for ( pn = 0; pn < osm_switch_get_num_ports(p_updn_switch->p_sw); pn++ )
+               "move from 0x%016" PRIx64 " rank: %u "
+               "to 0x%016" PRIx64" rank: %u\n",
+               cl_ntoh64(current_guid), u->rank,
+               cl_ntoh64(remote_guid), rem_u->rank );
+      /* Check if this is a legal step : the only illegal step is going
+         from DOWN to UP */
+      if ((current_dir == DOWN) && (next_dir == UP))
       {
-        /* printf("-V- Inner for in port num 0x%X\n", pn); */
-        osm_node_t *p_remote_node;
-        cl_list_iterator_t updn_switch_iterator;
-        boolean_t HasVisited = FALSE;
-        ib_net64_t remote_guid,current_guid;
-        updn_rank_t *p_rem_rank, *p_cur_rank;
-        uint8_t current_min_hop, remote_min_hop, set_hop_return_value;
-        osm_switch_t *p_remote_sw;
-
-        current_guid = osm_node_get_node_guid(p_updn_switch->p_sw->p_node);
-        p_remote_node = osm_node_get_remote_node( p_updn_switch->p_sw->p_node,
-                                                  pn, &pn_rem );
-        /* If no remote node OR remote node is not a SWITCH
-           continue to next pn */
-        if( !p_remote_node ||
-            (osm_node_get_type(p_remote_node) != IB_NODE_TYPE_SWITCH) )
-          continue;
-        /* Fetch remote guid only after validation of remote node */
-        remote_guid = osm_node_get_node_guid(p_remote_node);
-        /* printf ("-V- Current guid : 0x%" PRIx64 " Remote guid : 0x%" PRIx64 "\n", */
-        /* cl_ntoh64(current_guid), cl_ntoh64(remote_guid)); */
-        p_remote_sw = p_remote_node->sw;
-        p_rem_rank = (updn_rank_t*)cl_qmap_get(p_guid_rank_tbl, remote_guid);
-        p_cur_rank = (updn_rank_t*)cl_qmap_get(p_guid_rank_tbl, current_guid);
-        /* Decide which direction to mark it (UP/DOWN) */
-        next_dir = __updn_get_dir (p_updn, p_cur_rank->rank, p_rem_rank->rank,
-                                   current_guid, remote_guid);
-
         osm_log( p_log, OSM_LOG_DEBUG,
                  "__updn_bfs_by_node: "
-                 "move from 0x%016" PRIx64 " rank: %u "
-                 "to 0x%016" PRIx64" rank: %u\n",
-                 cl_ntoh64(current_guid), p_cur_rank->rank,
-                 cl_ntoh64(remote_guid), p_rem_rank->rank );
-        /* Check if this is a legal step : the only illegal step is going
-           from DOWN to UP */
-        if ((current_dir == DOWN) && (next_dir == UP))
+                 "Avoiding move from 0x%016" PRIx64 " to 0x%016" PRIx64"\n",
+                 cl_ntoh64(current_guid), cl_ntoh64(remote_guid) );
+        /* Illegal step */
+        continue;
+      }
+      /* Set MinHop value for the current lid */
+      current_min_hop = osm_switch_get_least_hops(u->sw, root_lid);
+      /* Check hop count if better insert into NextState list && update
+         the remote node Min Hop Table */
+      remote_min_hop = osm_switch_get_hop_count(p_remote_sw, root_lid, pn_rem);
+      if (current_min_hop + 1 < remote_min_hop)
+      {
+        NOISE_L( p_log, OSM_LOG_DEBUG,
+                 "__updn_bfs_by_node (less): "
+                 "Setting Min Hop Table of switch: 0x%" PRIx64
+                 "\n\t\tCurrent hop count is: %d, next hop count: %d"
+                 "\n\tlid to set: 0x%x"
+                 "\n\tport number: 0x%X"
+                 "\n\thops number: %d\n",
+                 cl_ntoh64(remote_guid), remote_min_hop,current_min_hop + 1,
+                 root_lid, pn_rem, current_min_hop + 1 );
+        set_hop_return_value = osm_switch_set_hops(p_remote_sw, root_lid, pn_rem, current_min_hop + 1);
+        if (set_hop_return_value)
         {
-          osm_log( p_log, OSM_LOG_DEBUG,
-                   "__updn_bfs_by_node: "
-                   "Avoiding move from 0x%016" PRIx64 " to 0x%016" PRIx64"\n",
-                   cl_ntoh64(current_guid), cl_ntoh64(remote_guid) );
-          /* Illegal step */
-          continue;
+          osm_log( p_log, OSM_LOG_ERROR,
+                   "__updn_bfs_by_node (less) ERR AA01: "
+                   "Invalid value returned from set min hop is: %d\n",
+                   set_hop_return_value );
         }
-        /* Set MinHop value for the current lid */
-        current_min_hop = osm_switch_get_least_hops(p_updn_switch->p_sw,root_lid);
-        /* Check hop count if better insert into NextState list && update
-           the remote node Min Hop Table */
-        remote_min_hop = osm_switch_get_hop_count(p_remote_sw, root_lid, pn_rem);
-        if (current_min_hop + 1 < remote_min_hop)
-        {
-          osm_log( p_log, OSM_LOG_DEBUG,
-                   "__updn_bfs_by_node (less): "
-                   "Setting Min Hop Table of switch: 0x%" PRIx64
-                   "\n\t\tCurrent hop count is: %d, next hop count: %d"
-                   "\n\tlid to set: 0x%x"
-                   "\n\tport number: 0x%X"
-                   " \n\thops number: %d\n",
-                   cl_ntoh64(remote_guid), remote_min_hop,current_min_hop + 1,
-                   root_lid, pn_rem, current_min_hop + 1 );
-          set_hop_return_value = osm_switch_set_hops(p_remote_sw, root_lid, pn_rem, current_min_hop + 1);
-          if (set_hop_return_value)
-          {
-            osm_log( p_log, OSM_LOG_ERROR,
-                     "__updn_bfs_by_node (less) ERR AA01: "
-                     "Invalid value returned from set min hop is: %d\n",
-                     set_hop_return_value );
-          }
-          /* Check if remote port is allready has been visited */
-          updn_switch_iterator = cl_list_head(p_nextList);
-          while( updn_switch_iterator != cl_list_end(p_nextList) )
-          {
-            updn_next_step_t *p_updn;
-            p_updn = (updn_next_step_t*)cl_list_obj(updn_switch_iterator);
-            /* Mark HasVisited only if:
-               1. Same node guid
-               2. Same direction
-            */
-            if ((p_updn->p_sw->p_node == p_remote_node) && (p_updn->state == next_dir))
-              HasVisited = TRUE;
-            updn_switch_iterator = cl_list_next(updn_switch_iterator);
-          }
-          if (!HasVisited)
-          {
-            /* Insert updn_switch item into the next list */
-            if(!(p_tmp=__updn_create_updn_next_step_t(next_dir, p_remote_sw)))
-            {
-              osm_log( p_log, OSM_LOG_ERROR,
-                       "__updn_bfs_by_node: ERR AA11: "
-                       "Could not create updn_next_step_t\n" );
-              return 1;
-            }
-            osm_log( p_log, OSM_LOG_DEBUG,
-                     "__updn_bfs_by_node: "
-                     "Inserting new element to the next list: guid=0x%" PRIx64 " %s\n",
-                     cl_ntoh64(p_tmp->p_sw->p_node->node_info.port_guid),
-                     (p_tmp->state == UP ? "UP" : "DOWN")
-                     );
-            cl_list_insert_tail(p_nextList, p_tmp);
-          }
-          /* If the same value only update entry - at the min hop table */
-        } else if (current_min_hop + 1 == osm_switch_get_hop_count(p_remote_sw,
-                                                                   root_lid,
-                                                                   pn_rem))
+        /* Check if remote port has already been visited */
+        if (!rem_u->visited)
         {
-          osm_log( p_log, OSM_LOG_DEBUG,
-                   "__updn_bfs_by_node (equal): "
-                   "Setting Min Hop Table of switch: 0x%" PRIx64
-                   "\n\t\tCurrent hop count is: %d, next hop count: %d"
-                   "\n\tlid to set: 0x%x"
-                   "\n\tport number: 0x%X"
-                   "\n\thops number: %d\n",
-                   cl_ntoh64(remote_guid),
-                   osm_switch_get_hop_count(p_remote_sw, root_lid, pn_rem),
-                   current_min_hop + 1, root_lid, pn_rem, current_min_hop + 1 );
-          set_hop_return_value = osm_switch_set_hops(p_remote_sw, root_lid, pn_rem, current_min_hop + 1);
-
-          if (set_hop_return_value)
-          {
-            osm_log( p_log, OSM_LOG_ERROR,
-                     "__updn_bfs_by_node (less) ERR AA12: "
-                     "Invalid value returned from set min hop is: %d\n",
-                     set_hop_return_value );
-          }
+          /* Insert updn_switch item into the next list */
+          rem_u->dir = next_dir;
+          rem_u->visited = 1;
+          NOISE_L( p_log, OSM_LOG_DEBUG,
+                   "__updn_bfs_by_node: "
+                   "Inserting new element to the next list: guid=0x%" PRIx64 " %s\n",
+                   cl_ntoh64(rem_u->sw->p_node->node_info.port_guid),
+                   (rem_u->dir == UP ? "UP" : "DOWN"));
+          cl_qlist_insert_tail(&list, &rem_u->list);
         }
       }
-      free (p_updn_switch);
-      p_updn_switch = (updn_next_step_t*)cl_list_remove_head( p_currList );
     }
-    /* Cleanup p_currList */
-    cl_list_destroy( p_currList );
-    free (p_currList);
-
-    /* Reassign p_currList to p_nextList */
-    p_currList = p_nextList;
   }
-  /* Cleanup p_currList - Had the pointer to cl_list_t */
-  cl_list_destroy( p_currList );
-  free (p_currList);
 
-  osm_log( p_log, OSM_LOG_VERBOSE,
-           "__updn_bfs_by_node: "
-           "BFS the subnet ]\n" );
   OSM_LOG_EXIT( p_log );
   return 0;
 }
@@ -527,23 +356,8 @@ static void
 updn_destroy(
   IN updn_t* const p_updn )
 {
-  cl_map_item_t *p_map_item;
   uint64_t *p_guid_list_item;
 
-  /* Destroy the updn struct */
-  p_map_item = cl_qmap_head( &p_updn->guid_rank_tbl);
-  while( p_map_item != cl_qmap_end( &p_updn->guid_rank_tbl ))
-  {
-    osm_log ( &p_updn->p_osm->log, OSM_LOG_DEBUG,
-              "updn_destroy: "
-              "guid = 0x%" PRIx64 " rank = %u\n",
-              cl_ntoh64(cl_qmap_key(p_map_item)),
-              ((updn_rank_t *)p_map_item)->rank );
-    cl_qmap_remove_item( &p_updn->guid_rank_tbl, p_map_item );
-    free( (updn_rank_t *)p_map_item);
-    p_map_item = cl_qmap_head( &p_updn->guid_rank_tbl );
-  }
-
   /* free the array of guids */
   if (p_updn->updn_ucast_reg_inputs.guid_list)
     free(p_updn->updn_ucast_reg_inputs.guid_list);
@@ -592,8 +406,6 @@ updn_init(
   OSM_LOG_ENTER( &p_osm->log, updn_init );
 
   p_updn->p_osm = p_osm;
-  p_updn->state = UPDN_INIT;
-  cl_qmap_init( &p_updn->guid_rank_tbl );
   p_list = (cl_list_t*)malloc(sizeof(cl_list_t));
   if (!p_list)
   {
@@ -691,171 +503,99 @@ updn_subn_rank(
   IN updn_t* p_updn )
 {
   /* Init local vars */
-  osm_port_t *p_root_port = NULL;
-  uint16_t tbl_size;
+  osm_switch_t *p_sw;
   uint8_t rank = base_rank;
-  osm_physp_t *p_physp, *p_remote_physp, *p_physp_temp;
-  cl_list_t *p_currList,*p_nextList;
+  osm_physp_t *p_physp, *p_remote_physp;
+  cl_qlist_t list;
   cl_status_t did_cause_update;
+  struct updn_node *u, *remote_u;
   uint8_t num_ports, port_num;
   osm_log_t *p_log = &p_updn->p_osm->log;
 
   OSM_LOG_ENTER( p_log, updn_subn_rank );
 
-  osm_log( p_log, OSM_LOG_VERBOSE,
-           "updn_subn_rank: "
-           "Ranking starts from GUID 0x%" PRIx64 "\n", root_guid );
-
-  /* Init the list pointers */
-  p_nextList = (cl_list_t*)malloc(sizeof(cl_list_t));
-  if (!p_nextList)
+  p_sw = osm_get_switch_by_guid(&p_updn->p_osm->subn, cl_hton64(root_guid));
+  if(!p_sw)
   {
     osm_log( p_log, OSM_LOG_ERROR,
-             "updn_subn_rank: ERR AA15: "
-             "No memory for p_nextList\n" );
+             "updn_subn_rank: ERR AA05: "
+             "Wrong switch GUID 0x%" PRIx64 "\n", root_guid );
     OSM_LOG_EXIT( p_log );
     return 1;
   }
 
-  cl_list_construct( p_nextList );
-  cl_list_init( p_nextList, 10 );
-  p_currList = p_nextList;
+  osm_log( p_log, OSM_LOG_VERBOSE,
+           "updn_subn_rank: "
+           "Ranking starts from GUID 0x%" PRIx64 "\n", root_guid );
 
-  /* Check valid subnet & guid */
-  tbl_size = (uint16_t)(cl_qmap_count(&p_updn->p_osm->subn.port_guid_tbl));
-  if (tbl_size == 0)
-  {
-    osm_log( p_log, OSM_LOG_ERROR,
-             "updn_subn_rank: ERR AA04: "
-             "Port guid table is empty, cannot perform ranking\n" );
-    OSM_LOG_EXIT( p_log );
-    return 1;
-  }
+  u = p_sw->priv;
+  u->is_root = 1;
 
-  p_root_port = (osm_port_t*) cl_qmap_get(&p_updn->p_osm->subn.port_guid_tbl,
-                                          cl_ntoh64(root_guid));
-  if( p_root_port == (osm_port_t*)cl_qmap_end( &p_updn->p_osm->subn.port_guid_tbl ) )
-  {
-    osm_log( p_log, OSM_LOG_ERROR,
-             "updn_subn_rank: ERR AA05: "
-             "Wrong guid value: 0x%" PRIx64 "\n", root_guid );
-    OSM_LOG_EXIT( p_log );
-    return 1;
-  }
-
-  /* Rank the first chosen guid anyway since its the base rank */
+  /* Rank the first guid chosen anyway since it's the base rank */
   osm_log( p_log, OSM_LOG_DEBUG,
            "updn_subn_rank: "
            "Ranking port GUID 0x%" PRIx64 "\n", root_guid );
 
-  __updn_update_rank(&p_updn->guid_rank_tbl, cl_ntoh64(root_guid), rank);
-  /*
-    HACK: We are assuming SM is running on HCA, so when getting the default
-    port we'll get the port connected to the rest of the subnet. If SM is
-    running on SWITCH - we should try to get a dr path from all switch ports.
-  */
-  p_physp = osm_port_get_default_phys_ptr( p_root_port );
-  CL_ASSERT( p_physp );
-  CL_ASSERT( osm_physp_is_valid( p_physp ) );
-  /* We can safely add the node to the list */
-  cl_list_insert_tail(p_nextList, p_physp);
-  /* Assign pointer to the list for BFS */
-  p_currList = p_nextList;
-
-  /* BFS the list till its empty */
-  osm_log( p_log, OSM_LOG_VERBOSE,
-           "updn_subn_rank: "
-           "BFS the subnet [\n" );
+  __updn_update_rank(u, rank);
+
+  cl_qlist_init(&list);
+  cl_qlist_insert_tail(&list, &u->list);
 
-  while (!cl_is_list_empty(p_currList))
+  /* BFS the list till it's empty */
+  while (!cl_is_qlist_empty(&list))
   {
     rank++;
-    p_nextList = (cl_list_t*)malloc(sizeof(cl_list_t));
-    if (!p_nextList)
-    {
-      osm_log( p_log, OSM_LOG_ERROR,
-               "updn_subn_rank: ERR AA16: "
-               "No memory for p_nextList\n" );
-      OSM_LOG_EXIT( p_log );
-      return 1;
-    }
 
-    cl_list_construct( p_nextList );
-    cl_list_init( p_nextList, 10 );
-    p_physp = (osm_physp_t*)cl_list_remove_head( p_currList );
-    /* Go over all remote nodes and rank them (if not allready visited) till
-       no elemtent in the list p_currList */
-    while ( p_physp != NULL )
+    u = (struct updn_node *)cl_qlist_remove_head(&list);
+    /* Go over all remote nodes and rank them (if not already visited) */
+    p_sw = u->sw;
+    num_ports = osm_switch_get_num_ports(p_sw);
+    osm_log( p_log, OSM_LOG_DEBUG,
+             "updn_subn_rank: "
+             "Handling switch GUID 0x%" PRIx64 "\n",
+             cl_ntoh64(osm_node_get_node_guid(p_sw->p_node)) );
+    for (port_num = 1; port_num < num_ports; port_num++)
     {
-      num_ports = osm_node_get_num_physp( p_physp->p_node );
-      osm_log( p_log, OSM_LOG_DEBUG,
-               "updn_subn_rank: "
-               "Handling port GUID 0x%" PRIx64 "\n",
-               cl_ntoh64(p_physp->port_guid) );
-      for (port_num = 1; port_num < num_ports; port_num++)
+      ib_net64_t port_guid;
+
+      /* Current port fetched in order to get remote side */
+      p_physp = osm_node_get_physp_ptr( p_sw->p_node, port_num );
+      p_remote_physp = p_physp->p_remote_physp;
+
+      /*
+        make sure that all the following occur on p_remote_physp:
+        1. The port isn't NULL
+        2. The port is a valid port
+        3. It is a switch
+      */
+      if ( p_remote_physp &&
+           osm_physp_is_valid( p_remote_physp ) &&
+           p_remote_physp->p_node->sw )
       {
-        ib_net64_t port_guid;
-
-        /* Current port fetched in order to get remote side */
-        p_physp_temp = osm_node_get_physp_ptr( p_physp->p_node, port_num );
-        p_remote_physp = p_physp_temp->p_remote_physp;
-
-        /*
-          make sure that all the following occur on p_remote_physp:
-          1. The port isn't NULL
-          2. The port is a valid port
-        */
-        if ( p_remote_physp &&
-             osm_physp_is_valid ( p_remote_physp ))
-        {
-          port_guid = p_remote_physp->port_guid;
-          osm_log( p_log, OSM_LOG_DEBUG,
-                   "updn_subn_rank: "
-                   "Visiting remote port GUID 0x%" PRIx64 "\n",
-                   cl_ntoh64(port_guid) );
-          /* Was it visited ?
-             Only if the pointer equal to cl_qmap_end its not
-             found in the list */
-          osm_log( p_log, OSM_LOG_DEBUG,
-                   "updn_subn_rank: "
-                   "Ranking port GUID 0x%" PRIx64 "\n", cl_ntoh64(port_guid) );
-          did_cause_update = __updn_update_rank(&p_updn->guid_rank_tbl, port_guid, rank);
-
-          osm_log( p_log, OSM_LOG_VERBOSE,
-                   "updn_subn_rank: "
-                   "Rank of port GUID 0x%" PRIx64 " = %u\n", cl_ntoh64(port_guid),
-                   ((updn_rank_t*)cl_qmap_get(&p_updn->guid_rank_tbl, port_guid))->rank
-                   );
-
-          if (did_cause_update)
-          {
-            cl_list_insert_tail(p_nextList, p_remote_physp);
-          }
-        }
+        remote_u = p_remote_physp->p_node->sw->priv;
+        port_guid = p_remote_physp->port_guid;
+        NOISE_L( p_log, OSM_LOG_DEBUG,
+                 "updn_subn_rank: "
+                 "Ranking port GUID 0x%" PRIx64 "\n", cl_ntoh64(port_guid) );
+        did_cause_update = __updn_update_rank(remote_u, rank);
+
+        osm_log( p_log, OSM_LOG_DEBUG,
+                 "updn_subn_rank: "
+                 "Rank of port GUID 0x%" PRIx64 " = %u\n",
+                 cl_ntoh64(port_guid),
+                 remote_u->rank );
+
+        if (did_cause_update)
+          cl_qlist_insert_tail(&list, &remote_u->list);
       }
-      /* Propagte through the next item in the p_currList */
-      p_physp = (osm_physp_t*)cl_list_remove_head( p_currList );
     }
-    /* First free the allocation of cl_list pointer then reallocate */
-    cl_list_destroy( p_currList );
-    free(p_currList);
-    /* p_currList is empty - need to assign it to p_nextList */
-    p_currList = p_nextList;
   }
 
-  osm_log( p_log, OSM_LOG_VERBOSE,
-           "updn_subn_rank: "
-           "BFS the subnet ]\n" );
-
-  cl_list_destroy( p_currList );
-  free(p_currList);
-
   /* Print Summary of ranking */
   osm_log( p_log, OSM_LOG_VERBOSE,
            "updn_subn_rank: "
            "Rank Info :\n\t Root Guid = 0x%" PRIx64 "\n\t Max Node Rank = %d\n",
-           cl_ntoh64(p_root_port->guid), rank );
-  p_updn->state = UPDN_RANK;
+           root_guid, rank );
   OSM_LOG_EXIT( p_log );
   return 0;
 }
@@ -875,25 +615,6 @@ __osm_subn_set_up_down_min_hop_table(
 
   OSM_LOG_ENTER( p_log, __osm_subn_set_up_down_min_hop_table );
 
-  if (p_updn->state == UPDN_INIT)
-  {
-    osm_log( p_log, OSM_LOG_ERROR,
-             "__osm_subn_set_up_down_min_hop_table: ERR AA06: "
-             "Calculating Min Hop only allowed after ranking\n" );
-    OSM_LOG_EXIT( p_log );
-    return 1;
-  }
-
-  /* Check if its a non switched subnet .. */
-  if ( cl_is_qmap_empty( &p_subn->sw_guid_tbl ) )
-  {
-    osm_log( p_log, OSM_LOG_ERROR,
-             "__osm_subn_set_up_down_min_hop_table: ERR AA10: "
-             "This is a non switched subnet, cannot perform UPDN algorithm\n" );
-    OSM_LOG_EXIT( p_log );
-    return 1;
-  }
-
   /* Go over all the switches in the subnet - for each init their Min Hop
      Table */
   osm_log( p_log, OSM_LOG_VERBOSE,
@@ -927,8 +648,7 @@ __osm_subn_set_up_down_min_hop_table(
              "__osm_subn_set_up_down_min_hop_table: "
              "BFS through port GUID 0x%" PRIx64 "\n",
              cl_ntoh64(port_guid) );
-    if(__updn_bfs_by_node(p_updn, p_subn, p_port,
-                          &p_updn->guid_rank_tbl))
+    if(__updn_bfs_by_node(p_log, p_subn, p_port))
     {
       OSM_LOG_EXIT( p_log );
       return 1;
@@ -952,7 +672,6 @@ __osm_subn_calc_up_down_min_hop_table(
   IN updn_t* p_updn )
 {
   uint8_t idx = 0;
-  cl_map_item_t *p_map_item;
   int status;
 
   OSM_LOG_ENTER( &p_updn->p_osm->log, osm_subn_calc_up_down_min_hop_table );
@@ -965,7 +684,18 @@ __osm_subn_calc_up_down_min_hop_table(
     osm_log( &p_updn->p_osm->log, OSM_LOG_ERROR,
              "__osm_subn_calc_up_down_min_hop_table: ERR AA0A: "
              "No guids were given or number of guids is 0\n" );
-    return 1;
+    status = -1;
+    goto _exit;
+  }
+
+  /* Check if it's not a switched subnet */
+  if ( cl_is_qmap_empty( &p_updn->p_osm->subn.sw_guid_tbl ) )
+  {
+    osm_log( &p_updn->p_osm->log, OSM_LOG_ERROR,
+             "__osm_subn_calc_up_down_min_hop_table: ERR AA0B: "
+             "This is not a switched subnet, cannot perform UPDN algorithm\n" );
+    status = -1;
+    goto _exit;
   }
 
   for (idx = 0; idx < num_guids; idx++)
@@ -980,27 +710,16 @@ __osm_subn_calc_up_down_min_hop_table(
 
   status = __osm_subn_set_up_down_min_hop_table(p_updn);
 
-  /* Cleanup updn rank tbl */
-  p_map_item = cl_qmap_head( &p_updn->guid_rank_tbl);
-  while( p_map_item != cl_qmap_end( &p_updn->guid_rank_tbl ))
-  {
-    osm_log( &p_updn->p_osm->log, OSM_LOG_DEBUG,
-             "__osm_subn_calc_up_down_min_hop_table: "
-             "guid = 0x%" PRIx64 " rank = %u\n",
-             cl_ntoh64(cl_qmap_key(p_map_item)),
-             ((updn_rank_t *)p_map_item)->rank );
-    cl_qmap_remove_item( &p_updn->guid_rank_tbl, p_map_item);
-    free( (updn_rank_t *)p_map_item);
-    p_map_item = cl_qmap_head( &p_updn->guid_rank_tbl);
-  }
-
+ _exit:
   OSM_LOG_EXIT( &p_updn->p_osm->log );
   return status;
 }
 
 /**********************************************************************
  **********************************************************************/
-static void expand_lid_matrices_for_lmc(osm_subn_t *p_subn)
+static void
+expand_lid_matrices_for_lmc(
+  osm_subn_t *p_subn )
 {
   cl_map_item_t *p_next_port, *p_next_sw;
   osm_port_t *p_port;
@@ -1009,7 +728,8 @@ static void expand_lid_matrices_for_lmc(osm_subn_t *p_subn)
   uint8_t port, num_ports;
 
   p_next_port = cl_qmap_head( &p_subn->port_guid_tbl );
-  while (p_next_port != cl_qmap_end(&p_subn->port_guid_tbl)) {
+  while (p_next_port != cl_qmap_end(&p_subn->port_guid_tbl))
+  {
     p_port = (osm_port_t *)p_next_port;
     p_next_port = cl_qmap_next(p_next_port);
     if (p_port->p_node->sw &&
@@ -1019,7 +739,8 @@ static void expand_lid_matrices_for_lmc(osm_subn_t *p_subn)
     if (!min_lid || min_lid == max_lid)
       continue;
     p_next_sw = cl_qmap_head(&p_subn->sw_guid_tbl);
-    while (p_next_sw != cl_qmap_end(&p_subn->sw_guid_tbl)) {
+    while (p_next_sw != cl_qmap_end(&p_subn->sw_guid_tbl))
+    {
       p_sw = (osm_switch_t *)p_next_sw;
       p_next_sw = cl_qmap_next(p_next_sw);
       num_ports = osm_switch_get_num_ports(p_sw);
@@ -1034,20 +755,62 @@ static void expand_lid_matrices_for_lmc(osm_subn_t *p_subn)
 
 /**********************************************************************
  **********************************************************************/
+static struct updn_node *
+create_updn_node(
+  osm_switch_t *sw )
+{
+  struct updn_node *u;
+
+  u = malloc(sizeof(*u));
+  if (!u)
+    return NULL;
+  memset(u, 0, sizeof(*u));
+  u->sw = sw;
+  u->rank = 0xffffffff;
+  return u;
+}
+
+static void
+delete_updn_node(
+  struct updn_node *u )
+{
+  u->sw->priv = NULL;
+  free(u);
+}
+
+/**********************************************************************
+ **********************************************************************/
 /* UPDN callback function */
 static int
 __osm_updn_call(
   void *ctx )
 {
   updn_t *p_updn = ctx;
+  cl_map_item_t *p_item;
+  osm_switch_t *p_sw;
 
   OSM_LOG_ENTER( &p_updn->p_osm->log, __osm_updn_call );
 
+  p_item = cl_qmap_head(&p_updn->p_osm->subn.sw_guid_tbl);
+  while(p_item != cl_qmap_end(&p_updn->p_osm->subn.sw_guid_tbl))
+  {
+    p_sw = (osm_switch_t *)p_item;
+    p_item = cl_qmap_next(p_item);
+    p_sw->priv = create_updn_node(p_sw);
+    if (!p_sw->priv)
+    {
+      osm_log( &(p_updn->p_osm->log), OSM_LOG_ERROR,
+               "__osm_updn_call: ERR AA0C: "
+               "cannot create updn node\n" );
+      OSM_LOG_EXIT( &p_updn->p_osm->log );
+      return -1;
+    }
+  }
+
   /* First auto detect root nodes - if required */
   if ( p_updn->auto_detect_root_nodes )
   {
     osm_ucast_mgr_build_lid_matrices( &p_updn->p_osm->sm.ucast_mgr );
-    /* printf ("-V- b4 osm_updn_find_root_nodes_by_min_hop\n"); */
     __osm_updn_find_root_nodes_by_min_hop( p_updn );
   }
   /* printf ("-V- after osm_updn_find_root_nodes_by_min_hop\n"); */
@@ -1066,8 +829,16 @@ __osm_updn_call(
   else
     osm_log( &p_updn->p_osm->log, OSM_LOG_INFO,
              "__osm_updn_call: "
-             "disable UPDN algorithm, no root nodes were found\n" );
+             "disabling UPDN algorithm, no root nodes were found\n" );
   
+  p_item = cl_qmap_head(&p_updn->p_osm->subn.sw_guid_tbl);
+  while(p_item != cl_qmap_end(&p_updn->p_osm->subn.sw_guid_tbl))
+  {
+    p_sw = (osm_switch_t *)p_item;
+    p_item = cl_qmap_next(p_item);
+    delete_updn_node(p_sw->priv);
+  }
+
   OSM_LOG_EXIT( &p_updn->p_osm->log );
   return 0;
 }
@@ -1137,7 +908,7 @@ __osm_updn_find_root_nodes_by_min_hop(
 
   osm_log( &p_osm->log, OSM_LOG_DEBUG,
            "__osm_updn_find_root_nodes_by_min_hop: "
-           "current number of ports in the subnet is %d\n",
+           "Current number of ports in the subnet is %d\n",
            cl_qmap_count(&p_osm->subn.port_guid_tbl) );
   /* Init the required vars */
   cl_qmap_init( &min_hop_hist );
@@ -1159,7 +930,7 @@ __osm_updn_find_root_nodes_by_min_hop(
   /* Find the Maximum number of CAs (and routers) for histogram normalization */
   osm_log( &p_osm->log, OSM_LOG_VERBOSE,
            "__osm_updn_find_root_nodes_by_min_hop: "
-           "Find the number of CAs and store them in cl_list\n" );
+           "Finding the number of CAs and storing them in cl_map\n" );
   p_next_port = (osm_port_t*)cl_qmap_head( &p_osm->subn.port_guid_tbl );
   while( p_next_port != (osm_port_t*)cl_qmap_end( &p_osm->subn.port_guid_tbl ) ) {
     p_port = p_next_port;
@@ -1177,13 +948,13 @@ __osm_updn_find_root_nodes_by_min_hop(
       cl_map_insert( &ca_by_lid_map, self_lid_ho, (void *)0x1);
       osm_log( &p_osm->log, OSM_LOG_DEBUG,
                "__osm_updn_find_root_nodes_by_min_hop: "
-               "Inserting into array GUID 0x%" PRIx64 ", Lid: 0x%X\n",
+               "Inserting GUID 0x%" PRIx64 ", Lid: 0x%X into array\n",
                cl_ntoh64(osm_port_get_guid(p_port)), self_lid_ho );
     }
   }
   osm_log( &p_osm->log, OSM_LOG_DEBUG,
            "__osm_updn_find_root_nodes_by_min_hop: "
-           "Found %u CA, %u SW in the subnet\n", numCas, numSws );
+           "Found %u CAs, %u SWs in the subnet\n", numCas, numSws );
   p_next_sw = (osm_switch_t*)cl_qmap_head( &p_osm->subn.sw_guid_tbl );
   osm_log( &p_osm->log, OSM_LOG_VERBOSE,
            "__osm_updn_find_root_nodes_by_min_hop: "
@@ -1201,7 +972,7 @@ __osm_updn_find_root_nodes_by_min_hop(
     p_next_sw = (osm_switch_t*)cl_qmap_next( &p_sw->map_item );
 
     /* Clear Min Hop Table && FWD Tbls - This should caused opensm to
-       rebuild its FWD tables , post setting Min Hop Tables */
+       rebuild it's FWD tables, post setting Min Hop Tables */
     max_lid_ho = osm_switch_get_max_lid_ho(p_sw);
     /* Get base lid of switch by retrieving port 0 lid of node pointer */
     self_lid_ho = cl_ntoh16( osm_node_get_base_lid( p_sw->p_node, 0 ) );
@@ -1285,7 +1056,7 @@ __osm_updn_find_root_nodes_by_min_hop(
                numHopBarsOverThd1, numHopBarsOverThd2 );
     }
 
-    /* destroy the qmap table and all its content - no longer needed */
+    /* destroy the qmap table and all it's content - no longer needed */
     osm_log( &p_osm->log, OSM_LOG_DEBUG,
              "__osm_updn_find_root_nodes_by_min_hop: "
              "Cleanup: delete histogram "
-- 
1.5.0.1.26.gf5a92


From mst at mellanox.co.il  Sat Feb 24 21:38:20 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 25 Feb 2007 07:38:20 +0200
Subject: [openib-general] [PATCH] libmthca: optimize calls to htonl with
 constant parameter
In-Reply-To: <aday7moevoc.fsf@cisco.com>
References: <20070222235724.GC4447@mellanox.co.il> <ada3b4xh0br.fsf@cisco.com>
	<20070223070055.GC25553@obsidianresearch.com>
	<aday7moevoc.fsf@cisco.com>
Message-ID: <20070225053820.GB6177@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH] libmthca: optimize calls to htonl with constant parameter
> 
>  > Newer gccs have the -fwhole-program --combine options that address
>  > this and more. One of the things that happens is that all internal
>  > functions are made 'static' and all compilation units are optimized in
>  > one go.
> 
> Good point... but is there any sane way to use that feature with
> automake and libtool?  I know that the autotools are a pain but I
> really don't want to reimplement the useful stuff they give us, and I
> don't know of any really practical replacement...

Once KDE4 is out, I expect that most systems will start shipping cmake.
Maybe it'll be practical to switch to that then.

-- 
MST


From vlad at mellanox.co.il  Sun Feb 25 01:00:57 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Sun, 25 Feb 2007 11:00:57 +0200
Subject: [openib-general] [PATCH] for OFED 1.2
In-Reply-To: <000001c75787$50ff0440$ff0da8c0@amr.corp.intel.com>
References: <000001c75787$50ff0440$ff0da8c0@amr.corp.intel.com>
Message-ID: <1172394057.12388.3.camel@vladsk-laptop>

On Fri, 2007-02-23 at 12:15 -0800, Sean Hefty wrote:
> I would like these fixes in OFED 1.2 as well.  What git tree / branch do I
> generate a patch against?
> 
> - Sean

git://git.openfabrics.org/~vlad/ofed_1_2/.git
branch: ofed_1_2

- Vladimir

> 
> ---
> 
> rdma_cm: remove unused node_guid from cma_device structure.
> ib_cm: remove ca_guid from cm_device structure.
> rdma_cm: request reversible paths only.
> ib_core: Set hop limit in ib_init_ah_from_wc correctly.
> 
> The patches are in:
> 
> 	git://git.openfabrics.org/~shefty/rdma-dev.git for-roland
> 
> (sign-off line was added to the actual commit messages)
> 
> Signed-off-by: Sean Hefty <sean.hefty at intel.com>
> ---
> commit 28e218621d36cf9da42f07af08775769eb289fc0
> Author: Sean Hefty <sean.hefty at intel.com>
> Date:   Thu Feb 22 11:37:44 2007 -0800
> 
>     rdma_cm: remove unused node_guid from cma_device structure.
> 
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index bb27ce9..d441815 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -77,7 +77,6 @@ static int next_port;
>  struct cma_device {
>  	struct list_head	list;
>  	struct ib_device	*device;
> -	__be64			node_guid;
>  	struct completion	comp;
>  	atomic_t		refcount;
>  	struct list_head	id_list;
> @@ -2674,7 +2673,6 @@ static void cma_add_one(struct ib_device *device)
>  		return;
>  
>  	cma_dev->device = device;
> -	cma_dev->node_guid = device->node_guid;
>  
>  	init_completion(&cma_dev->comp);
>  	atomic_set(&cma_dev->refcount, 1);
> 
> commit 6de97f2a3373357d720b1653dfc0aac6d40b7506
> Author: Sean Hefty <sean.hefty at intel.com>
> Date:   Thu Feb 22 11:37:38 2007 -0800
> 
>     ib_cm: remove ca_guid from cm_device structure.
>     
>     The cm_device references an ib_device, which contains the node_guid.
> 
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index d446998..842cd0b 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -88,7 +88,6 @@ struct cm_port {
>  struct cm_device {
>  	struct list_head list;
>  	struct ib_device *device;
> -	__be64 ca_guid;
>  	struct cm_port port[0];
>  };
>  
> @@ -739,8 +738,8 @@ retest:
>  		ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
>  		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
>  		ib_send_cm_rej(cm_id, IB_CM_REJ_TIMEOUT,
> -			       &cm_id_priv->av.port->cm_dev->ca_guid,
> -			       sizeof cm_id_priv->av.port->cm_dev->ca_guid,
> +			       &cm_id_priv->id.device->node_guid,
> +			       sizeof cm_id_priv->id.device->node_guid,
>  			       NULL, 0);
>  		break;
>  	case IB_CM_REQ_RCVD:
> @@ -883,7 +882,7 @@ static void cm_format_req(struct cm_req_msg *req_msg,
>  
>  	req_msg->local_comm_id = cm_id_priv->id.local_id;
>  	req_msg->service_id = param->service_id;
> -	req_msg->local_ca_guid = cm_id_priv->av.port->cm_dev->ca_guid;
> +	req_msg->local_ca_guid = cm_id_priv->id.device->node_guid;
>  	cm_req_set_local_qpn(req_msg, cpu_to_be32(param->qp_num));
>  	cm_req_set_resp_res(req_msg, param->responder_resources);
>  	cm_req_set_init_depth(req_msg, param->initiator_depth);
> @@ -1442,7 +1441,7 @@ static void cm_format_rep(struct cm_rep_msg *rep_msg,
>  	cm_rep_set_flow_ctrl(rep_msg, param->flow_control);
>  	cm_rep_set_rnr_retry_count(rep_msg, param->rnr_retry_count);
>  	cm_rep_set_srq(rep_msg, param->srq);
> -	rep_msg->local_ca_guid = cm_id_priv->av.port->cm_dev->ca_guid;
> +	rep_msg->local_ca_guid = cm_id_priv->id.device->node_guid;
>  
>  	if (param->private_data && param->private_data_len)
>  		memcpy(rep_msg->private_data, param->private_data,
> @@ -3385,7 +3384,6 @@ static void cm_add_one(struct ib_device *device)
>  		return;
>  
>  	cm_dev->device = device;
> -	cm_dev->ca_guid = device->node_guid;
>  
>  	set_bit(IB_MGMT_METHOD_SEND, reg_req.method_mask);
>  	for (i = 1; i <= device->phys_port_cnt; i++) {
> 
> commit 87680047dd09ca4a4e8ec575dad215c92cf45ed3
> Author: Sean Hefty <sean.hefty at intel.com>
> Date:   Wed Feb 21 16:40:44 2007 -0800
> 
>     rdma_cm: request reversible paths only
>     
>     The rdma_cm requires that path records be reversible.  Set the reversible
>     bit when issuing a path record query.
> 
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index f8d69b3..bb27ce9 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -1492,11 +1492,13 @@ static int cma_query_ib_route(struct rdma_id_private
> *id_priv, int timeout_ms,
>  	ib_addr_get_dgid(addr, &path_rec.dgid);
>  	path_rec.pkey = cpu_to_be16(ib_addr_get_pkey(addr));
>  	path_rec.numb_path = 1;
> +	path_rec.reversible = 1;
>  
>  	id_priv->query_id = ib_sa_path_rec_get(&sa_client, id_priv->id.device,
>  				id_priv->id.port_num, &path_rec,
>  				IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID |
> -				IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH,
> +				IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH |
> +				IB_SA_PATH_REC_REVERSIBLE,
>  				timeout_ms, GFP_KERNEL,
>  				cma_query_handler, work, &id_priv->query);
>  
> 
> commit 30947e5b7db42184d66746ac1187d4abbf89018d
> Author: Sean Hefty <sean.hefty at intel.com>
> Date:   Wed Feb 21 16:37:31 2007 -0800
> 
>     ib_core: Set hop limit in ib_init_ah_from_wc correctly.
>     
>     The hop_limit value in the ah_attr should be 0xFF, not the value read
>     from the received GRH (which should be 0).  See 13.5.4.4 in the 1.2 IB spec.
> 
> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> index 8b5dd36..ccdf93d 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -167,7 +167,7 @@ int ib_init_ah_from_wc(struct ib_device *device, u8
> port_num, struct ib_wc *wc,
>  		ah_attr->grh.sgid_index = (u8) gid_index;
>  		flow_class = be32_to_cpu(grh->version_tclass_flow);
>  		ah_attr->grh.flow_label = flow_class & 0xFFFFF;
> -		ah_attr->grh.hop_limit = grh->hop_limit;
> +		ah_attr->grh.hop_limit = 0xFF;
>  		ah_attr->grh.traffic_class = (flow_class >> 20) & 0xFF;
>  	}
>  	return 0;

From vlad at lists.openfabrics.org  Sun Feb 25 02:26:07 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Sun, 25 Feb 2007 02:26:07 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070225-0200 daily build status
Message-ID: <20070225102608.251A3E607F6@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod --with-rds-mod 

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.15
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on ppc64 with linux-2.6.12
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.12
Passed on powerpc with linux-2.6.17
Passed on powerpc with linux-2.6.16
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.14
Passed on ppc64 with linux-2.6.19
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.13
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on powerpc with linux-2.6.12
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.16
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on ppc64 with linux-2.6.17
Passed on powerpc with linux-2.6.15
Passed on ppc64 with linux-2.6.14
Passed on x86_64 with linux-2.6.18-1.2798.fc6

Failed:
Build failed on ia64 with linux-2.6.19
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.19_ia64_check/include/rdma/ib_verbs.h:1590: error: implicit declaration of function ‘sg_dma_len’
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.19_ia64_check/drivers/infiniband/core/addr.c: At top level:
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.19_ia64_check/drivers/infiniband/core/addr.c:62: warning: initialization from incompatible pointer type
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.19_ia64_check/drivers/infiniband/core/addr.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.19_ia64_check/drivers/infiniband/core] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.19_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.19_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.19'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on ia64 with linux-2.6.18
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.18_ia64_check/include/rdma/ib_verbs.h:1590: error: implicit declaration of function ‘sg_dma_len’
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.18_ia64_check/drivers/infiniband/core/addr.c: At top level:
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.18_ia64_check/drivers/infiniband/core/addr.c:62: warning: initialization from incompatible pointer type
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.18_ia64_check/drivers/infiniband/core/addr.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.18_ia64_check/drivers/infiniband/core] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.18_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.18_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.18'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on ia64 with linux-2.6.12
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.12_ia64_check/include/rdma/ib_verbs.h:1590: error: implicit declaration of function ‘sg_dma_len’
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.12_ia64_check/drivers/infiniband/core/addr.c: At top level:
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.12_ia64_check/drivers/infiniband/core/addr.c:61: warning: initialization from incompatible pointer type
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.12_ia64_check/drivers/infiniband/core/addr.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.12_ia64_check/drivers/infiniband/core] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.12_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.12_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.12'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on ia64 with linux-2.6.13
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.13_ia64_check/include/rdma/ib_verbs.h:1590: error: implicit declaration of function ‘sg_dma_len’
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.13_ia64_check/drivers/infiniband/core/addr.c: At top level:
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.13_ia64_check/drivers/infiniband/core/addr.c:61: warning: initialization from incompatible pointer type
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.13_ia64_check/drivers/infiniband/core/addr.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.13_ia64_check/drivers/infiniband/core] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.13_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.13_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.13'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on ia64 with linux-2.6.15
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.15_ia64_check/include/rdma/ib_verbs.h:1590: error: implicit declaration of function ‘sg_dma_len’
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.15_ia64_check/drivers/infiniband/core/addr.c: At top level:
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.15_ia64_check/drivers/infiniband/core/addr.c:61: warning: initialization from incompatible pointer type
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.15_ia64_check/drivers/infiniband/core/addr.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.15_ia64_check/drivers/infiniband/core] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.15_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.15_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.15'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on ia64 with linux-2.6.14
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.14_ia64_check/include/rdma/ib_verbs.h:1590: error: implicit declaration of function ‘sg_dma_len’
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.14_ia64_check/drivers/infiniband/core/addr.c: At top level:
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.14_ia64_check/drivers/infiniband/core/addr.c:61: warning: initialization from incompatible pointer type
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.14_ia64_check/drivers/infiniband/core/addr.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.14_ia64_check/drivers/infiniband/core] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.14_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.14_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.14'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on ia64 with linux-2.6.17
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.17_ia64_check/include/rdma/ib_verbs.h:1590: error: implicit declaration of function ‘sg_dma_len’
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.17_ia64_check/drivers/infiniband/core/addr.c: At top level:
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.17_ia64_check/drivers/infiniband/core/addr.c:61: warning: initialization from incompatible pointer type
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.17_ia64_check/drivers/infiniband/core/addr.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.17_ia64_check/drivers/infiniband/core] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.17_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.17_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.17'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on ia64 with linux-2.6.16
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.16_ia64_check/include/rdma/ib_verbs.h:1590: error: implicit declaration of function ‘sg_dma_len’
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.16_ia64_check/drivers/infiniband/core/addr.c: At top level:
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.16_ia64_check/drivers/infiniband/core/addr.c:61: warning: initialization from incompatible pointer type
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.16_ia64_check/drivers/infiniband/core/addr.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.16_ia64_check/drivers/infiniband/core] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.16_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.16_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on ia64 with linux-2.6.16.21-0.8-default
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.16.21-0.8-default_ia64_check/include/rdma/ib_verbs.h:1590: error: implicit declaration of function ‘sg_dma_len’
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c: At top level:
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.c:61: warning: initialization from incompatible pointer type
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core/addr.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/core] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.9-22.ELsmp
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: ‘ADVERTISE_PAUSE_CAP’ undeclared (first use in this function)
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: (Each undeclared identifier is reported only once
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: for each function it appears in.)
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:170: error: ‘ADVERTISE_PAUSE_ASYM’ undeclared (first use in this function)
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.o] Error 1
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.9-22.ELsmp_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-22.ELsmp'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.9-34.ELsmp
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘add_adapter’:
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1061: error: ‘adapter_list_lock’ undeclared (first use in this function)
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘remove_adapter’:
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1068: error: ‘adapter_list_lock’ undeclared (first use in this function)
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.o] Error 1
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.9-34.ELsmp_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-34.ELsmp'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.5-7.244-smp
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c: In function 'rds_exit':
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c:467: error: implicit declaration of function 'proto_unregister'
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c: In function 'rds_init':
/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c:516: error: implicit declaration of function 'proto_register'
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.o] Error 1
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070225-0200_linux-2.6.5-7.244-smp_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.5-7.244-smp'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------


From ogerlitz at voltaire.com  Sun Feb 25 02:48:30 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Sun, 25 Feb 2007 12:48:30 +0200
Subject: [openib-general] ipoib & the partial pkey
In-Reply-To: <000401c756da$1f9387d0$8698070a@amr.corp.intel.com>
References: <000401c756da$1f9387d0$8698070a@amr.corp.intel.com>
Message-ID: <45E1697E.6050007@voltaire.com>

Sean Hefty wrote:
> I looked into this more...
> RFC 4391 states (middle of page 5):
> For a node to join a partition, one of its ports must be assigned the relevant
> P_Key by the SM [RFC4392].

> Jumping to RFC 4392 (top of page 4):

Just so we agree on the quote: it is from section 4 of RFC 4392
(page 14), e.g. in http://www.ietf.org/rfc/rfc4392.txt

> at the time of creating an IB multicast group, multiple values such as the
> P_Key, Q_Key, Service Level, Hop Limit, Flow ID, TClass, MTU, etc.  have to be
> specified.  These values should be such that all potential members of the IB
> multicast group are able to communicate with one another when using them.

OK, I suggest removing this spec limitation, as it does not allow the
use case of a server using a partition in which inter-client
communication is not allowed.

Actually, since it prevents people from using partial-membership
partitioning with IPoIB (every IPoIB device needs to join the
broadcast group), it is probably a spec bug and not a deliberate
limitation.

A simple real-life example is an I/O target: the system admin wants IB
block and/or file storage traffic to use a partition, but does not
want initiators to communicate among themselves on this partition.

To achieve that the SM is configured to assign the partial pkey to the 
initiator nodes and the full pkey to the target ports.
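
To illustrate why this works, here is a rough sketch of the P_Key
matching rule as I understand it from the IB spec. It is not code from
the stack, and the names pkey_can_communicate and pkey_example are made
up for this mail. Two P_Keys match when their low 15 bits are equal and
at least one of them has the membership bit (bit 15) set, so two
partial members never match each other:

```c
#include <assert.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER 0x8000u  /* bit 15: full membership */
#define PKEY_BASE_MASK   0x7FFFu  /* low 15 bits: partition number */

/* Hypothetical helper: returns 1 if ports holding these P_Keys can talk. */
static int pkey_can_communicate(uint16_t a, uint16_t b)
{
	if ((a & PKEY_BASE_MASK) != (b & PKEY_BASE_MASK))
		return 0;               /* different partitions */
	return ((a | b) & PKEY_FULL_MEMBER) != 0;
}

/* Storage example: the target is a full member, initiators are partial. */
static void pkey_example(void)
{
	uint16_t target = 0x8001, initiator = 0x0001;

	assert(pkey_can_communicate(target, initiator));     /* allowed */
	assert(!pkey_can_communicate(initiator, initiator)); /* blocked */
}
```

The partition number 0x0001 is just an example value.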

The current implementation of IPoIB and core perfectly (and 
transparently...) supports that.

Or.


From mst at mellanox.co.il  Sun Feb 25 04:22:11 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 25 Feb 2007 14:22:11 +0200
Subject: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small
 message bandwidth
In-Reply-To: <adaslcxvpuy.fsf@cisco.com>
References: <adaslcxvpuy.fsf@cisco.com>
Message-ID: <20070225122211.GD5331@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small message bandwidth
> 
> OK, I applied the following patch (I had to change one line of your
> patch to get it to apply because the small-message changed the context
> so one chunk didn't apply).
> 
> Anyway I don't see any difference in small message latency or large
> message throughput.  (Actually latency seems slightly worse but I
> think the change is within my normal variability so I don't think
> the difference is significant)

OK.
I wonder whether unrolling the loop in skb_put_frags might be helpful.
Could you please try the following? Does this affect latency for you?
(I don't see any difference between UD and CM either with or without
 this patch).


Try to improve small message latency some more by unrolling more loops.

Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

---

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index a389854..a8895b4 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -311,38 +311,6 @@ static int ipoib_cm_rx_handler(struct ib_cm_id *cm_id,
 		return 0;
 	}
 }
-/* Adjust length of skb with fragments to match received data */
-static void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space,
-			  unsigned int length, struct sk_buff *toskb)
-{
-	int i, num_frags;
-	unsigned int size;
-
-	/* put header into skb */
-	size = min(length, hdr_space);
-	skb->tail += size;
-	skb->len += size;
-	length -= size;
-
-	num_frags = skb_shinfo(skb)->nr_frags;
-	for (i = 0; i < num_frags; i++) {
-		skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
-
-		if (length == 0) {
-			/* don't need this page */
-			skb_fill_page_desc(toskb, i, frag->page, 0, PAGE_SIZE);
-			--skb_shinfo(skb)->nr_frags;
-		} else {
-			size = min(length, (unsigned) PAGE_SIZE);
-
-			frag->size = size;
-			skb->data_len += size;
-			skb->truesize += size;
-			skb->len += size;
-			length -= size;
-		}
-	}
-}
 
 void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 {
@@ -352,7 +320,7 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 	struct ipoib_cm_rx *p;
 	unsigned long flags;
 	u64 mapping[IPOIB_CM_RX_SG];
-	int frags;
+	unsigned head_size, frag_size, frags;
 
 	ipoib_dbg_data(priv, "cm recv completion: id %d, op %d, status: %d\n",
 		       wr_id, wc->opcode, wc->status);
@@ -388,8 +356,9 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 		}
 	}
 
-	frags = PAGE_ALIGN(wc->byte_len - min(wc->byte_len,
-					      (unsigned)IPOIB_CM_HEAD_SIZE)) / PAGE_SIZE;
+	head_size = min(wc->byte_len, (unsigned)IPOIB_CM_HEAD_SIZE);
+	frag_size = wc->byte_len - head_size;
+	frags = PAGE_ALIGN(frag_size) / PAGE_SIZE;
 
 	newskb = ipoib_cm_alloc_rx_skb(dev, wr_id, frags, mapping);
 	if (unlikely(!newskb)) {
@@ -408,7 +377,18 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 	ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n",
 		       wc->byte_len, wc->slid);
 
-	skb_put_frags(skb, IPOIB_CM_HEAD_SIZE, wc->byte_len, newskb);
+	memcpy(&skb_shinfo(newskb)->frags[frags], &skb_shinfo(skb)->frags[frags],
+	       (IPOIB_CM_RX_SG - 1 - frags) * sizeof(skb_frag_t));
+	skb_shinfo(newskb)->nr_frags = IPOIB_CM_RX_SG - 1;
+
+	skb_shinfo(skb)->nr_frags = frags;
+	skb->tail += head_size;
+	skb->len += wc->byte_len;
+	skb->data_len += frag_size;
+	skb->truesize += frag_size;
+	if (frags)
+		skb_shinfo(skb)->frags[frags - 1].size =
+			(frag_size - 1) % PAGE_SIZE + 1;
 
 	skb->protocol = ((struct ipoib_header *) skb->data)->proto;
 	skb->mac.raw = skb->data;
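
To sanity-check the arithmetic in the last hunk, here is a stand-alone
user-space sketch. EX_PAGE_SIZE, EX_HEAD_SIZE, struct rx_split and
split_rx_len are made up for illustration (the real values come from
PAGE_SIZE and IPOIB_CM_HEAD_SIZE), and it only reproduces the size
computations, not the skb handling:

```c
#include <assert.h>

#define EX_PAGE_SIZE 4096u  /* assumed page size */
#define EX_HEAD_SIZE 256u   /* assumed stand-in for IPOIB_CM_HEAD_SIZE */

struct rx_split {
	unsigned head_size;  /* bytes landing in the linear part */
	unsigned frag_size;  /* bytes landing in page fragments */
	unsigned frags;      /* number of page fragments used */
	unsigned last_frag;  /* bytes in the last used fragment, 0 if none */
};

static struct rx_split split_rx_len(unsigned byte_len)
{
	struct rx_split s;

	s.head_size = byte_len < EX_HEAD_SIZE ? byte_len : EX_HEAD_SIZE;
	s.frag_size = byte_len - s.head_size;
	/* same result as PAGE_ALIGN(frag_size) / PAGE_SIZE */
	s.frags = (s.frag_size + EX_PAGE_SIZE - 1) / EX_PAGE_SIZE;
	/* (n - 1) % P + 1 maps exact multiples of P to P rather than 0 */
	s.last_frag = s.frags ? (s.frag_size - 1) % EX_PAGE_SIZE + 1 : 0;

	assert(s.head_size + s.frag_size == byte_len);
	return s;
}
```

With these assumed sizes, a 4352-byte message uses one fragment of 4096
bytes, and a 4353-byte message uses two fragments with 1 byte in the
last one. Only the last fragment's size is written in the patch; the
sketch assumes the earlier fragments already hold a full page.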


-- 
MST


From dotanb at dev.mellanox.co.il  Sun Feb 25 06:03:28 2007
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Sun, 25 Feb 2007 16:03:28 +0200
Subject: [openib-general] [PATCH] IB/core: Set static rate in
 ib_init_ah_from_path()
In-Reply-To: <adar6sn74fq.fsf@cisco.com>
References: <000401c75223$29e86ea0$e598070a@amr.corp.intel.com>
	<1431.85.65.224.140.1171732569.squirrel@dev.mellanox.co.il>
	<adar6sn74fq.fsf@cisco.com>
Message-ID: <45E19730.7010008@dev.mellanox.co.il>

Hi and sorry about the delay in the reply.

Roland Dreier wrote:
>  > In issue number 296, which I opened several months ago in Bugzilla, I
>  > reported two missing attributes: the first is static_rate, and the
>  > second is src_path_bits, which is not being filled in correctly.
>
> The patch I posted fixes the static rate, right?
>
> You'll need to explain what you mean about src_path_bits, because at
> first glance the code looks OK to me.
>   
Here is the code that handles the src_path_bits:

int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
                         struct ib_sa_path_rec *rec,
                         struct ib_ah_attr *ah_attr)
{
        int ret;
        u16 gid_index;

        memset(ah_attr, 0, sizeof *ah_attr);
        ah_attr->dlid = be16_to_cpu(rec->dlid);
        ah_attr->sl = rec->sl;
        ah_attr->src_path_bits = be16_to_cpu(rec->slid) & 0x7f;


I have a feeling that this function doesn't handle src_path_bits as it
should, because it ignores the LMC value of the SLID (I think that if the
LMC is < 8, the wrong bits may be set in src_path_bits).

I think no one has noticed a failure in this code so far because few
users run with LMC > 0 in their subnet, and most of the code that calls
this function passes the base port LID.
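
To illustrate the concern, here is a minimal sketch, assuming the path bits
are simply the low LMC bits of the LID. The helper names
path_bits_fixed_mask and path_bits_from_lmc are hypothetical and are not
part of the kernel; the first mirrors the "& 0x7f" mask quoted above, the
second masks according to the LMC.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the mask in the quoted code: always keeps the low 7 bits. */
static uint8_t path_bits_fixed_mask(uint16_t slid)
{
	return slid & 0x7f;
}

/*
 * Hypothetical alternative (an assumption, not the kernel's code):
 * keep only the low `lmc` bits of the LID. LMC is a 3-bit value, 0..7.
 */
static uint8_t path_bits_from_lmc(uint16_t slid, uint8_t lmc)
{
	assert(lmc <= 7);
	return slid & ((1u << lmc) - 1);
}
```

With an SLID of 0x45 and an LMC of 2, the fixed mask yields 0x45 while the
LMC-aware mask yields 0x1; the two only agree when the LMC is 7.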


thanks
Dotan


From kliteyn at dev.mellanox.co.il  Sun Feb 25 06:23:30 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Sun, 25 Feb 2007 16:23:30 +0200
Subject: [openib-general] [PATCH] osm: Flushing log file after OSM_SYS_LOG
	message
Message-ID: <45E19BE2.2070704@dev.mellanox.co.il>

Hi Hal,

The OSM log should be flushed when an OSM_SYS_LOG message is
printed. We had this behavior once, but somehow it has disappeared.

This fix has to go both to trunk and to 1.2.

Thanks,

--Yevgeny

Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
---
 osm/opensm/osm_log.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/osm/opensm/osm_log.c b/osm/opensm/osm_log.c
index d76031d..f95ed85 100644
--- a/osm/opensm/osm_log.c
+++ b/osm/opensm/osm_log.c
@@ -204,7 +204,8 @@ osm_log(
 #endif
  
     /*  flush log */
-    if (ret > 0 && (p_log->flush || (verbosity & OSM_LOG_ERROR)) &&
+    if ( ret > 0 && 
+        (p_log->flush || (verbosity & OSM_LOG_ERROR) || (verbosity & OSM_LOG_SYS)) &&
         fflush( p_log->out_port ) < 0)
       ret = -1;
 
-- 
1.4.4.1.GIT


From vlad at lists.openfabrics.org  Sun Feb 25 08:06:32 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Sun, 25 Feb 2007 08:06:32 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070225-0736 daily build status
Message-ID: <20070225160632.A0A9AE603C8@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod --with-rds-mod 

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.14
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.14
Passed on ppc64 with linux-2.6.14
Passed on powerpc with linux-2.6.13
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.15
Passed on ia64 with linux-2.6.15
Passed on powerpc with linux-2.6.17
Passed on ppc64 with linux-2.6.12
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.13
Passed on ia64 with linux-2.6.16
Passed on powerpc with linux-2.6.14
Passed on powerpc with linux-2.6.12
Passed on ppc64 with linux-2.6.19
Passed on powerpc with linux-2.6.15
Passed on ia64 with linux-2.6.18
Passed on ppc64 with linux-2.6.17
Passed on powerpc with linux-2.6.16
Passed on ia64 with linux-2.6.14
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on ppc64 with linux-2.6.16
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.17
Passed on ppc64 with linux-2.6.18
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.18-1.2798.fc6

Failed:
Build failed on x86_64 with linux-2.6.9-34.ELsmp
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function 'add_adapter':
/home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1061: error: 'adapter_list_lock' undeclared (first use in this function)
/home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function 'remove_adapter':
/home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1068: error: 'adapter_list_lock' undeclared (first use in this function)
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.o] Error 1
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.9-34.ELsmp_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-34.ELsmp'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.9-22.ELsmp
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: 'ADVERTISE_PAUSE_CAP' undeclared (first use in this function)
/home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: (Each undeclared identifier is reported only once
/home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: for each function it appears in.)
/home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:170: error: 'ADVERTISE_PAUSE_ASYM' undeclared (first use in this function)
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.o] Error 1
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.9-22.ELsmp_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-22.ELsmp'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.5-7.244-smp
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c: In function 'rds_exit':
/home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c:467: error: implicit declaration of function 'proto_unregister'
/home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c: In function 'rds_init':
/home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c:516: error: implicit declaration of function 'proto_register'
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.o] Error 1
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.5-7.244-smp_x86_64_check/net/rds] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070225-0736_linux-2.6.5-7.244-smp_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.5-7.244-smp'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------


From mst at mellanox.co.il  Sun Feb 25 09:08:16 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 25 Feb 2007 19:08:16 +0200
Subject: [openib-general] [openfabrics-ewg] new OFED 1.2 package
In-Reply-To: <BAE9DCEF64577A439B3A37F36F9B691C01B54EC7@orsmsx418.amr.corp.intel.com>
References: <BAE9DCEF64577A439B3A37F36F9B691C01B54D33@orsmsx418.amr.corp.intel.com>
	<BAE9DCEF64577A439B3A37F36F9B691C01B54EC7@orsmsx418.amr.corp.intel.com>
Message-ID: <20070225170816.GA18630@mellanox.co.il>

> Quoting r. Woodruff, Robert J <robert.j.woodruff at intel.com>:
> Subject: Re: [openfabrics-ewg] new OFED 1.2 package
> 
> I am also still seeing the issue with the rdma_cm abi_version on RedHat
> EL4-U3, bug number 347. The bug report contains the patch that should
> fix this.

OK, here's a somewhat cleaned-up patch.  However, I have a question: shouldn't
the module cleanup function also remove ucma_class and class_attr_abi_version,
which were created at module initialization?
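
The question is about keeping init and cleanup symmetric. As a rough
user-space sketch (the flags and function names below are hypothetical
stand-ins, not kernel APIs), a cleanup routine that mirrors init would undo
each step in reverse order, and the init error path would do the same for
the steps already completed:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the misc device, the class and the attribute. */
static bool misc_registered, class_created, attr_created;

/* Set to force a failure at the attribute step (for illustration only). */
static bool fail_attr_step;

static int ucma_sketch_init(void)
{
	misc_registered = true;		/* stands in for misc_register() */
	class_created = true;		/* stands in for class_create() */

	if (fail_attr_step)		/* stands in for class_create_file() failing */
		goto err_class;
	attr_created = true;
	return 0;

err_class:
	/* Undo the completed steps, newest first, and report the failure. */
	class_created = false;
	misc_registered = false;
	return -1;
}

static void ucma_sketch_cleanup(void)
{
	/* Only valid after a successful init. */
	assert(misc_registered && class_created && attr_created);

	/* Tear down everything init created, newest first. */
	attr_created = false;
	class_created = false;
	misc_registered = false;
}
```

This only models the ordering; which kernel calls would be needed for the
class and its attribute is left open here.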

----

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index e2e8d32..e9e024e 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -847,13 +847,12 @@ static struct miscdevice ucma_misc = {
 	.fops	= &ucma_fops,
 };
 
-static ssize_t show_abi_version(struct device *dev,
-				struct device_attribute *attr,
-				char *buf)
+static struct class *ucma_class;
+static ssize_t show_abi_version(struct class *class_dev, char *buf)
 {
 	return sprintf(buf, "%d\n", RDMA_USER_CM_ABI_VERSION);
 }
-static DEVICE_ATTR(abi_version, S_IRUGO, show_abi_version, NULL);
+static CLASS_ATTR(abi_version, S_IRUGO, show_abi_version, NULL);
 
 static int __init ucma_init(void)
 {
@@ -863,7 +862,13 @@ static int __init ucma_init(void)
 	if (ret)
 		return ret;
 
-	ret = device_create_file(ucma_misc.this_device, &dev_attr_abi_version);
+	ucma_class = class_create(THIS_MODULE, "infiniband_ucma");
+	if (IS_ERR(ucma_class)) {
+		printk(KERN_ERR "rdma_ucm: couldn't create class infiniband_ucma\n");
+		goto err;
+	}
+
+	ret = class_create_file(ucma_class, &class_attr_abi_version);
 	if (ret) {
 		printk(KERN_ERR "rdma_ucm: couldn't create abi_version attr\n");
 		goto err;
@@ -876,7 +881,6 @@ err:
 
 static void __exit ucma_cleanup(void)
 {
-	device_remove_file(ucma_misc.this_device, &dev_attr_abi_version);
 	misc_deregister(&ucma_misc);
 	idr_destroy(&ctx_idr);
 }


-- 
MST


From sashak at voltaire.com  Sun Feb 25 11:58:37 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sun, 25 Feb 2007 21:58:37 +0200
Subject: [openib-general] [PATCH] osm: Flushing log file after
 OSM_SYS_LOG message
In-Reply-To: <45E19BE2.2070704@dev.mellanox.co.il>
References: <45E19BE2.2070704@dev.mellanox.co.il>
Message-ID: <20070225195837.GC11957@sashak.voltaire.com>

On 16:23 Sun 25 Feb     , Yevgeny Kliteynik wrote:
> Hi Hal,
> 
> The OSM log should be flushed when an OSM_SYS_LOG message is
> printed. We had this behavior once, but somehow it has disappeared.
> 
> This fix has to go both to trunk and to 1.2.
> 
> Thanks,
> 
> --Yevgeny
> 
> Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
> ---
>  osm/opensm/osm_log.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/osm/opensm/osm_log.c b/osm/opensm/osm_log.c
> index d76031d..f95ed85 100644
> --- a/osm/opensm/osm_log.c
> +++ b/osm/opensm/osm_log.c
> @@ -204,7 +204,8 @@ osm_log(
>  #endif
>   
>      /*  flush log */
> -    if (ret > 0 && (p_log->flush || (verbosity & OSM_LOG_ERROR)) &&
> +    if ( ret > 0 && 
> +        (p_log->flush || (verbosity & OSM_LOG_ERROR) || (verbosity & OSM_LOG_SYS)) &&

verbosity & (OSM_LOG_ERROR|OSM_LOG_SYS)?
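
The suggestion relies on "(v & A) || (v & B)" being true exactly when
"v & (A|B)" is nonzero. A minimal sketch of that equivalence follows; the
SKETCH_LOG_* values and the function names are assumptions made for
illustration, not the definitions from osm_log.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for illustration only. */
#define SKETCH_LOG_ERROR 0x01
#define SKETCH_LOG_SYS   0x80

/* Form used in the patch: two separate tests joined with ||. */
static bool needs_flush_separate(uint8_t verbosity)
{
	return (verbosity & SKETCH_LOG_ERROR) || (verbosity & SKETCH_LOG_SYS);
}

/* Suggested form: one test against the combined mask. */
static bool needs_flush_combined(uint8_t verbosity)
{
	return (verbosity & (SKETCH_LOG_ERROR | SKETCH_LOG_SYS)) != 0;
}

/* Exhaustively compare both forms over every 8-bit verbosity value. */
static void check_forms_agree(void)
{
	for (unsigned v = 0; v <= 0xff; v++)
		assert(needs_flush_separate((uint8_t)v) ==
		       needs_flush_combined((uint8_t)v));
}
```

Both forms give the same answer for every input, so the combined mask is
purely a readability change.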

Sasha


From sashak at voltaire.com  Sun Feb 25 13:48:45 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sun, 25 Feb 2007 23:48:45 +0200
Subject: [openib-general] [PATCH] opensm: faster min hops
Message-ID: <20070225214845.GF11957@sashak.voltaire.com>


After analyzing gprof output, I noticed that the current lmx (the switch's
LID matrix) implementation is extremely slow. This simple reimplementation
of the hops matrix makes the LID matrix build process two times faster.
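
For readers who want the idea without reading the whole diff, here is a
stand-alone sketch of the data structure, under stated assumptions: one
lazily allocated row per LID, one byte per port, and index 0 of each row
caching the minimum hop count. The hops_sketch_* names and SKETCH_NO_PATH
are hypothetical; the value 0xff is assumed to match OSM_NO_PATH.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_NO_PATH 0xff	/* assumed stand-in for OSM_NO_PATH */

struct hops_sketch {
	unsigned num_ports;	/* entries per row; index 0 caches the row minimum */
	unsigned max_lid;	/* highest valid LID index */
	uint8_t **hops;		/* hops[lid] stays NULL until a hop count is set */
};

static int hops_sketch_init(struct hops_sketch *sw, unsigned num_ports,
			    unsigned max_lid)
{
	sw->num_ports = num_ports;
	sw->max_lid = max_lid;
	/* Zeroed pointers are treated as NULL, as the patch's memset does. */
	sw->hops = calloc(max_lid + 1, sizeof(sw->hops[0]));
	return sw->hops ? 0 : -1;
}

static int hops_sketch_set(struct hops_sketch *sw, unsigned lid,
			   unsigned port, uint8_t n)
{
	assert(sw && sw->hops);
	if (lid > sw->max_lid || port >= sw->num_ports)
		return -1;
	if (!sw->hops[lid]) {
		/* First write for this LID: allocate the row, mark all "no path". */
		sw->hops[lid] = malloc(sw->num_ports);
		if (!sw->hops[lid])
			return -1;
		memset(sw->hops[lid], SKETCH_NO_PATH, sw->num_ports);
	}
	sw->hops[lid][port] = n;
	if (sw->hops[lid][0] > n)	/* keep the cached minimum up to date */
		sw->hops[lid][0] = n;
	return 0;
}

static uint8_t hops_sketch_least(const struct hops_sketch *sw, unsigned lid)
{
	if (lid > sw->max_lid || !sw->hops[lid])
		return SKETCH_NO_PATH;
	return sw->hops[lid][0];	/* constant-time lookup of the minimum */
}

static void hops_sketch_free(struct hops_sketch *sw)
{
	for (unsigned i = 0; i <= sw->max_lid; i++)
		free(sw->hops[i]);	/* free(NULL) is a no-op */
	free(sw->hops);
	sw->hops = NULL;
}
```

The design choice this illustrates is that the least-hops query becomes a
single array read instead of a scan over all ports.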

Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
---
 osm/include/opensm/osm_port_profile.h |    1 -
 osm/include/opensm/osm_router.h       |    1 -
 osm/include/opensm/osm_switch.h       |  182 ++++++++-------------------------
 osm/opensm/osm_switch.c               |  115 ++++++++++++++-------
 osm/opensm/osm_ucast_ftree.c          |    3 -
 osm/opensm/osm_ucast_mgr.c            |   16 +--
 osm/opensm/osm_ucast_updn.c           |    2 +-
 7 files changed, 124 insertions(+), 196 deletions(-)

diff --git a/osm/include/opensm/osm_port_profile.h b/osm/include/opensm/osm_port_profile.h
index 952393d..a07b057 100644
--- a/osm/include/opensm/osm_port_profile.h
+++ b/osm/include/opensm/osm_port_profile.h
@@ -55,7 +55,6 @@
 #include <opensm/osm_subnet.h>
 #include <opensm/osm_node.h>
 #include <opensm/osm_port.h>
-#include <opensm/osm_matrix.h>
 #include <opensm/osm_fwd_tbl.h>
 #include <opensm/osm_mcast_tbl.h>
 
diff --git a/osm/include/opensm/osm_router.h b/osm/include/opensm/osm_router.h
index 168ce77..63c7566 100644
--- a/osm/include/opensm/osm_router.h
+++ b/osm/include/opensm/osm_router.h
@@ -52,7 +52,6 @@
 #include <opensm/osm_madw.h>
 #include <opensm/osm_node.h>
 #include <opensm/osm_port.h>
-#include <opensm/osm_matrix.h>
 #include <opensm/osm_fwd_tbl.h>
 #include <opensm/osm_mcast_tbl.h>
 #include <opensm/osm_port_profile.h>
diff --git a/osm/include/opensm/osm_switch.h b/osm/include/opensm/osm_switch.h
index 053b18a..19381f8 100644
--- a/osm/include/opensm/osm_switch.h
+++ b/osm/include/opensm/osm_switch.h
@@ -53,7 +53,6 @@
 #include <opensm/osm_madw.h>
 #include <opensm/osm_node.h>
 #include <opensm/osm_port.h>
-#include <opensm/osm_matrix.h>
 #include <opensm/osm_fwd_tbl.h>
 #include <opensm/osm_mcast_tbl.h>
 #include <opensm/osm_port_profile.h>
@@ -105,10 +104,12 @@ typedef struct _osm_switch
 	cl_map_item_t				map_item;
 	osm_node_t				*p_node;
 	ib_switch_info_t			switch_info;
-	osm_fwd_tbl_t				fwd_tbl;
-	osm_lid_matrix_t			lmx;
 	uint16_t				max_lid_ho;
+	unsigned				num_ports;
+	unsigned				num_hops;
+	uint8_t					**hops;
 	osm_port_profile_t			*p_prof;
+	osm_fwd_tbl_t				fwd_tbl;
 	osm_mcast_tbl_t				mcast_tbl;
 	uint32_t				discovery_count;
 	void					*priv;
@@ -124,19 +125,25 @@ typedef struct _osm_switch
 *	switch_info
 *		IBA defined SwitchInfo structure for this switch.
 *
-*	fwd_tbl
-*		This switch's forwarding table.
+*	max_lid_ho
+*		Max LID that is accessible from this switch.
+*
+*	num_ports
+*		Number of ports for this switch.
 *
-*	lmx
+*	num_hops
+*		Size of hops table for this switch.
+* 
+*	hops
 *		LID Matrix for this switch containing the hop count
 *		to every LID from every port.
 *
-*	max_lid_ho
-*		Max LID that is accessible from this switch.
-* 
-*	p_pro
+*	p_prof
 *		Pointer to array of Port Profile objects for this switch.
 *
+*	fwd_tbl
+*		This switch's forwarding table.
+*
 *	mcast_tbl
 *		Multicast forwarding table for this switch.
 *
@@ -149,70 +156,9 @@ typedef struct _osm_switch
 *	Switch object
 *********/
 
-/****f* OpenSM: Switch/osm_switch_construct
+/****f* OpenSM: Switch/osm_switch_delete
 * NAME
-*	osm_switch_construct
-*
-* DESCRIPTION
-*	This function constructs a Switch object.
-*
-* SYNOPSIS
-*/
-void
-osm_switch_construct(
-	IN osm_switch_t* const p_sw );
-/*
-* PARAMETERS
-*	p_sw
-*		[in] Pointer to a Switch object to construct.
-*
-* RETURN VALUE
-*	This function does not return a value.
-*
-* NOTES
-*	Allows calling osm_switch_init, and osm_switch_destroy.
-*
-*	Calling osm_switch_construct is a prerequisite to calling any other
-*	method except osm_switch_init.
-*
-* SEE ALSO
-*	Switch object, osm_switch_init, osm_switch_destroy
-*********/
-
-/****f* OpenSM: Switch/osm_switch_destroy
-* NAME
-*	osm_switch_destroy
-*
-* DESCRIPTION
-*	The osm_switch_destroy function destroys the object, releasing
-*	all resources.
-*
-* SYNOPSIS
-*/
-void
-osm_switch_destroy(
-	IN osm_switch_t* const p_sw );
-/*
-* PARAMETERS
-*	p_sw
-*		[in] Pointer to the object to destroy.
-*
-* RETURN VALUE
-*	None.
-*
-* NOTES
-*	Performs any necessary cleanup of the specified object.
-*	Further operations should not be attempted on the destroyed object.
-*	This function should only be called after a call to osm_switch_construct
-*	or osm_switch_init.
-*
-* SEE ALSO
-*	Switch object, osm_switch_construct, osm_switch_init
-*********/
-
-/****f* OpenSM: Switch/osm_switch_destroy
-* NAME
-*	osm_switch_destroy
+*	osm_switch_delete
 *
 * DESCRIPTION
 *	Destroys and deallocates the object.
@@ -236,42 +182,6 @@ osm_switch_delete(
 *	Switch object, osm_switch_construct, osm_switch_init
 *********/
 
-/****f* OpenSM: Switch/osm_switch_init
-* NAME
-*	osm_switch_init
-*
-* DESCRIPTION
-*	The osm_switch_init function initializes a Switch object for use.
-*
-* SYNOPSIS
-*/
-ib_api_status_t
-osm_switch_init(
-	IN osm_switch_t* const p_sw,
-	IN osm_node_t* const p_node,
-	IN const osm_madw_t* const p_madw );
-/*
-* PARAMETERS
-*	p_sw
-*		[in] Pointer to an osm_switch_t object to initialize.
-*
-*	p_node
-*		[in] Pointer to the node object of this switch
-*
-*	p_madw
-*		[in] Pointer to the MAD Wrapper containing the switch's
-*		SwitchInfo attribute.
-*
-* RETURN VALUES
-*	IB_SUCCESS if the Switch object was initialized successfully.
-*
-* NOTES
-*	Allows calling other node methods.
-*
-* SEE ALSO
-*	Switch object, osm_switch_construct, osm_switch_destroy
-*********/
-
 /****f* OpenSM: Switch/osm_switch_new
 * NAME
 *	osm_switch_new
@@ -317,8 +227,9 @@ static inline boolean_t
 osm_switch_is_leaf_lid(
 	IN const osm_switch_t* const p_sw,
 	IN const uint16_t lid_ho )
-{
-	return( osm_lid_matrix_get_least_hops( &p_sw->lmx, lid_ho ) <= 1 );
+{	
+	return (lid_ho > p_sw->max_lid_ho || !p_sw->hops[lid_ho]) ? FALSE :
+		(p_sw->hops[lid_ho][0] <= 1);
 }
 /*
 * PARAMETERS
@@ -353,7 +264,8 @@ osm_switch_get_hop_count(
 	IN const uint16_t lid_ho,
 	IN const uint8_t port_num )
 {
-	return( osm_lid_matrix_get( &p_sw->lmx, lid_ho, port_num ) );
+	return (lid_ho > p_sw->max_lid_ho || !p_sw->hops[lid_ho]) ?
+		OSM_NO_PATH : p_sw->hops[lid_ho][port_num];
 }
 /*
 * PARAMETERS
@@ -411,15 +323,12 @@ osm_switch_get_fwd_tbl_ptr(
 *
 * SYNOPSIS
 */
-static inline cl_status_t
+cl_status_t
 osm_switch_set_hops(
 	IN osm_switch_t* const p_sw,
 	IN const uint16_t lid_ho,
 	IN const uint8_t port_num,
-	IN const uint8_t num_hops )
-{
-	return( osm_lid_matrix_set( &p_sw->lmx, lid_ho, port_num, num_hops ) );
-}
+	IN const uint8_t num_hops );
 /*
 * PARAMETERS
 *	p_sw
@@ -442,35 +351,23 @@ osm_switch_set_hops(
 * SEE ALSO
 *********/
 
-/****f* OpenSM: Switch/osm_switch_set_min_lid_size
+/****f* OpenSM: Switch/osm_switch_hops_clear
 * NAME
-*	osm_switch_set_min_lid_size
+*	osm_switch_hops_clear
 *
 * DESCRIPTION
-*	Sets the size of the switch's routing table to at least accomodate the
-*	specified LID value (host ordered)
+*	Cleanup existing hops tables (lid matrix)
 *
 * SYNOPSIS
 */
-static inline cl_status_t
-osm_switch_set_min_lid_size(
-	IN osm_switch_t* const p_sw,
-	IN const uint16_t lid_ho )
-{
-	return( osm_lid_matrix_set_min_lid_size( &p_sw->lmx, lid_ho ) );
-}
+void
+osm_switch_hops_clear(
+	IN osm_switch_t *p_sw );
 /*
 * PARAMETERS
 *	p_sw
 *		[in] Pointer to a Switch object.
 *
-*	lid_ho
-*		[in] LID value (host order) for which to set the count.
-*
-* RETURN VALUES
-*	Sets the size of the switch's routing table to at least accomodate the
-*	specified LID value (host ordered)
-*
 * NOTES
 *
 * SEE ALSO
@@ -491,7 +388,8 @@ osm_switch_get_least_hops(
 	IN const osm_switch_t* const p_sw,
 	IN const uint16_t lid_ho )
 {
-	return( osm_lid_matrix_get_least_hops( &p_sw->lmx, lid_ho ) );
+	return (lid_ho > p_sw->max_lid_ho || !p_sw->hops[lid_ho]) ?
+		OSM_NO_PATH : p_sw->hops[lid_ho][0];
 }
 /*
 * PARAMETERS
@@ -768,9 +666,7 @@ static inline uint16_t
 osm_switch_get_max_lid_ho(
 	IN const osm_switch_t* const p_sw )
 {
-	if (p_sw->max_lid_ho != 0)
-		return p_sw->max_lid_ho;
-	return( osm_lid_matrix_get_max_lid_ho( &p_sw->lmx ) );
+	return p_sw->max_lid_ho;
 }
 /*
 * PARAMETERS
@@ -799,7 +695,7 @@ static inline uint8_t
 osm_switch_get_num_ports(
 	IN const osm_switch_t* const p_sw )
 {
-	return( osm_lid_matrix_get_num_ports( &p_sw->lmx ) );
+	return p_sw->num_ports;
 }
 /*
 * PARAMETERS
@@ -1348,12 +1244,16 @@ osm_switch_path_count_get(
 */
 void
 osm_switch_prepare_path_rebuild(
-	IN osm_switch_t* const p_sw );
+	IN osm_switch_t* p_sw,
+	IN uint16_t max_lids );
 /*
 * PARAMETERS
 *	p_sw
 *		[in] Pointer to the Switch object.
 *
+*	max_lids
+*		[in] Max number of lids in the subnet.
+*
 * RETURN VALUE
 *	None.
 *
diff --git a/osm/opensm/osm_switch.c b/osm/opensm/osm_switch.c
index 8e7728b..7c57398 100644
--- a/osm/opensm/osm_switch.c
+++ b/osm/opensm/osm_switch.c
@@ -55,20 +55,34 @@
 #include <iba/ib_types.h>
 #include <opensm/osm_switch.h>
 
+cl_status_t
 /**********************************************************************
  **********************************************************************/
-void
-osm_switch_construct(
-  IN osm_switch_t* const p_sw )
+osm_switch_set_hops(
+  IN osm_switch_t* const p_sw,
+  IN const uint16_t lid_ho,
+  IN const uint8_t port_num,
+  IN const uint8_t num_hops )
 {
-  CL_ASSERT( p_sw );
-  memset( p_sw, 0, sizeof(*p_sw) );
-  osm_lid_matrix_construct( &p_sw->lmx );
+  if (lid_ho > p_sw->max_lid_ho)
+    return -1;
+  if (!p_sw->hops[lid_ho]) {
+    p_sw->hops[lid_ho] = malloc(p_sw->num_ports);
+    if (!p_sw->hops[lid_ho])
+      return -1;
+    memset(p_sw->hops[lid_ho], 0xff, p_sw->num_ports);
+  }
+
+  p_sw->hops[lid_ho][port_num] = num_hops;
+  if (p_sw->hops[lid_ho][0] > num_hops)
+    p_sw->hops[lid_ho][0] = num_hops;
+
+  return 0;
 }
 
 /**********************************************************************
  **********************************************************************/
-ib_api_status_t
+static ib_api_status_t
 osm_switch_init(
   IN osm_switch_t* const p_sw,
   IN osm_node_t* const p_node,
@@ -80,12 +94,6 @@ osm_switch_init(
   uint8_t               num_ports;
   uint32_t              port_num;
 
-  CL_ASSERT( p_sw );
-  CL_ASSERT( p_madw );
-  CL_ASSERT( p_node );
-
-  osm_switch_construct( p_sw );
-
   p_smp = osm_madw_get_smp_ptr( p_madw );
   p_si = (ib_switch_info_t*)ib_smp_get_payload_ptr( p_smp );
   num_ports = osm_node_get_num_physp( p_node );
@@ -94,10 +102,7 @@ osm_switch_init(
 
   p_sw->p_node = p_node;
   p_sw->switch_info = *p_si;
-
-  status = osm_lid_matrix_init( &p_sw->lmx, num_ports );
-  if( status != IB_SUCCESS )
-    goto Exit;
+  p_sw->num_ports = num_ports;
 
   status = osm_fwd_tbl_init( &p_sw->fwd_tbl, p_si );
   if( status != IB_SUCCESS )
@@ -127,23 +132,20 @@ osm_switch_init(
 /**********************************************************************
  **********************************************************************/
 void
-osm_switch_destroy(
-  IN osm_switch_t* const p_sw )
+osm_switch_delete(
+  IN OUT osm_switch_t** const pp_sw )
 {
-  /* free memory to avoid leaks */
+  osm_switch_t *p_sw = *pp_sw;
+  unsigned i;
   osm_mcast_tbl_destroy( &p_sw->mcast_tbl );
   free( p_sw->p_prof );
   osm_fwd_tbl_destroy( &p_sw->fwd_tbl );
-  osm_lid_matrix_destroy( &p_sw->lmx );
-}
-
-/**********************************************************************
- **********************************************************************/
-void
-osm_switch_delete(
-  IN OUT osm_switch_t** const pp_sw )
-{
-  osm_switch_destroy( *pp_sw );
+  if (p_sw->hops) {
+    for (i = 0 ; i < p_sw->num_hops ; i++)
+      if (p_sw->hops[i])
+        free(p_sw->hops[i]);
+    free(p_sw->hops);
+  }
   free( *pp_sw );
   *pp_sw = NULL;
 }
@@ -158,6 +160,9 @@ osm_switch_new(
   ib_api_status_t status;
   osm_switch_t *p_sw;
 
+  CL_ASSERT( p_madw );
+  CL_ASSERT( p_node );
+
   p_sw = (osm_switch_t*)malloc( sizeof(*p_sw) );
   if( p_sw )
   {
@@ -322,6 +327,9 @@ osm_switch_recommend_path(
     }
   }
 
+  if (osm_node_get_base_lid(p_sw->p_node, 0) == cl_hton16(lid_ho))
+    return 0;
+
   /*
     This algorithm selects a port based on a static load balanced
     selection across equal hop-count ports.
@@ -337,7 +345,7 @@ osm_switch_recommend_path(
   */
 
   /* port number starts with zero and num_ports is 1 + num phys ports */
-  for ( port_num = 0; port_num < num_ports; port_num++ )
+  for ( port_num = 1; port_num < num_ports; port_num++ )
   {
     if ( osm_switch_get_hop_count( p_sw, lid_ho, port_num ) == least_hops)
     {
@@ -466,16 +474,45 @@ osm_switch_recommend_path(
 /**********************************************************************
  **********************************************************************/
 void
-osm_switch_prepare_path_rebuild(
-  IN osm_switch_t* const p_sw )
+osm_switch_hops_clear(
+  IN osm_switch_t *p_sw )
 {
-  uint8_t port_num;
-  uint8_t num_ports;
+  unsigned i;
+  for (i = 0 ; i < p_sw->num_hops ; i++)
+    if (p_sw->hops[i])
+      memset(p_sw->hops[i], 0xff, p_sw->num_ports);
+}
 
-  num_ports = osm_switch_get_num_ports( p_sw );
-  osm_lid_matrix_clear( &p_sw->lmx );
-  for( port_num = 0; port_num < num_ports; port_num++ )
-    osm_port_prof_construct( &p_sw->p_prof[port_num] );
+/**********************************************************************
+ **********************************************************************/
+void
+osm_switch_prepare_path_rebuild(
+  IN osm_switch_t* p_sw,
+  IN uint16_t max_lids )
+{
+  unsigned i;
+
+  for( i = 0; i < p_sw->num_ports; i++ )
+    osm_port_prof_construct( &p_sw->p_prof[i] );
+  if (!p_sw->hops) {
+    p_sw->hops = malloc((max_lids + 1)*sizeof(p_sw->hops[0]));
+    if (!p_sw->hops)
+      return;
+    memset(p_sw->hops, 0, (max_lids + 1)*sizeof(p_sw->hops[0]));
+    p_sw->num_hops = max_lids + 1;
+  }
+  else if (max_lids + 1 > p_sw->num_hops) {
+    uint8_t **new_hops = malloc((max_lids + 1)*sizeof(new_hops[0]));
+    if (!new_hops)
+      return;
+    memcpy(new_hops, p_sw->hops, p_sw->num_hops*sizeof(new_hops[0]));
+    memset(new_hops + p_sw->num_hops, 0,
+           (max_lids + 1 - p_sw->num_hops)*sizeof(new_hops[0]));
+    free(p_sw->hops);
+    p_sw->hops = new_hops;
+    p_sw->num_hops = max_lids + 1;
+  }
+  p_sw->max_lid_ho = max_lids;
 }
 
 /**********************************************************************
diff --git a/osm/opensm/osm_ucast_ftree.c b/osm/opensm/osm_ucast_ftree.c
index 21aa4a8..61db1d7 100644
--- a/osm/opensm/osm_ucast_ftree.c
+++ b/osm/opensm/osm_ucast_ftree.c
@@ -782,9 +782,6 @@ __osm_ftree_sw_set_hops(
    IN  uint8_t          port_num,
    IN  uint8_t          hops)
 {
-   /* make sure the lid matrix has enough room */
-   osm_switch_set_min_lid_size(p_sw->p_osm_sw, max_lid_ho);
-
    /* set local min hop table(LID) */
    return osm_switch_set_hops(p_sw->p_osm_sw,
                               lid_ho,
diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c
index 306c795..93cafae 100644
--- a/osm/opensm/osm_ucast_mgr.c
+++ b/osm/opensm/osm_ucast_mgr.c
@@ -407,11 +407,13 @@ static void __osm_ucast_mgr_dump_tables(osm_ucast_mgr_t *p_mgr)
  Starting a rebuild, so notify the switch so it can clear tables, etc...
 **********************************************************************/
 static void
-__osm_ucast_mgr_clean_switch(
+__osm_ucast_mgr_setup_switch(
   IN cl_map_item_t* const  p_map_item,
-  IN void* context )
+  IN void* cxt )
 {
-  osm_switch_prepare_path_rebuild((osm_switch_t *)p_map_item);
+  uint16_t lids = cl_ptr_vector_get_size(&((osm_subn_t *)cxt)->port_lid_tbl);
+  osm_switch_prepare_path_rebuild((osm_switch_t *)p_map_item,
+                                  lids ? lids - 1 : 0);
 }
 
 /**********************************************************************
@@ -519,12 +521,6 @@ __osm_ucast_mgr_process_neighbor(
   */
   max_lid_ho = osm_switch_get_max_lid_ho( p_remote_sw );
 
-  /*
-    Make sure the local lid matrix has enough room to hold
-    all the LID info coming from the remote LID matrix.
-  */
-  osm_switch_set_min_lid_size( p_sw, max_lid_ho );
-
   hops = OSM_NO_PATH;
   for( lid_ho = 1; lid_ho <= max_lid_ho; lid_ho++ )
   {
@@ -1221,7 +1217,7 @@ osm_ucast_mgr_process(
     goto Exit;
 
   p_mgr->any_change = FALSE;
-  cl_qmap_apply_func(p_sw_guid_tbl, __osm_ucast_mgr_clean_switch, NULL);
+  cl_qmap_apply_func(p_sw_guid_tbl, __osm_ucast_mgr_setup_switch, p_mgr->p_subn);
 
   if (!p_routing_eng->build_lid_matrices ||
       p_routing_eng->build_lid_matrices(p_routing_eng->context) != 0)
diff --git a/osm/opensm/osm_ucast_updn.c b/osm/opensm/osm_ucast_updn.c
index e8282f4..950bcb4 100644
--- a/osm/opensm/osm_ucast_updn.c
+++ b/osm/opensm/osm_ucast_updn.c
@@ -627,7 +627,7 @@ __osm_subn_set_up_down_min_hop_table(
     p_sw = p_next_sw;
     p_next_sw = (osm_switch_t*)cl_qmap_next( &p_sw->map_item );
     /* Clear Min Hop Table */
-    osm_lid_matrix_clear(&(p_sw->lmx));
+    osm_switch_hops_clear(p_sw);
   }
 
   osm_log( p_log, OSM_LOG_VERBOSE,
-- 
1.5.0.1.26.gf5a92


From sashak at voltaire.com  Sun Feb 25 14:19:43 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Mon, 26 Feb 2007 00:19:43 +0200
Subject: [openib-general] [PATCH] opensm: remove osm_matrix.* files
In-Reply-To: <20070225214845.GF11957@sashak.voltaire.com>
References: <20070225214845.GF11957@sashak.voltaire.com>
Message-ID: <20070225221943.GG11957@sashak.voltaire.com>


Following the previously submitted min hops reimplementation, this removes
the now-unused osm_matrix.* files.

Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
---
 osm/include/Makefile.am         |    1 -
 osm/include/opensm/osm_matrix.h |  456 ---------------------------------------
 osm/opensm/Makefile.am          |    2 +-
 osm/opensm/osm_matrix.c         |  156 -------------
 4 files changed, 1 insertions(+), 614 deletions(-)
 delete mode 100644 osm/include/opensm/osm_matrix.h
 delete mode 100644 osm/opensm/osm_matrix.c

diff --git a/osm/include/Makefile.am b/osm/include/Makefile.am
index cf1b0e7..57b5296 100644
--- a/osm/include/Makefile.am
+++ b/osm/include/Makefile.am
@@ -17,7 +17,6 @@ EXTRA_DIST = \
 	$(srcdir)/opensm/osm_madw.h \
 	$(srcdir)/opensm/osm_subnet.h \
 	$(srcdir)/opensm/osm_sweep_fail_ctrl.h \
-	$(srcdir)/opensm/osm_matrix.h \
 	$(srcdir)/opensm/osm_sa_lft_record.h \
 	$(srcdir)/opensm/osm_sa_mft_record.h \
 	$(srcdir)/opensm/osm_resp.h \
diff --git a/osm/include/opensm/osm_matrix.h b/osm/include/opensm/osm_matrix.h
deleted file mode 100644
index 65db20a..0000000
--- a/osm/include/opensm/osm_matrix.h
+++ /dev/null
@@ -1,456 +0,0 @@
-/*
- * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved.
- * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
- * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- *     Redistribution and use in source and binary forms, with or
- *     without modification, are permitted provided that the following
- *     conditions are met:
- *
- *      - Redistributions of source code must retain the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer.
- *
- *      - Redistributions in binary form must reproduce the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer in the documentation and/or other materials
- *        provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- *
- */
-
-/*
- * Abstract:
- * 	Declaration of osm_lid_matrix_t.
- *	This object represents a two dimensional array of port numbers
- *	and LID values.
- *	This object is part of the OpenSM family of objects.
- *
- * Environment:
- * 	Linux User Mode
- *
- * $Revision: 1.5 $
- */
-
-#ifndef _OSM_MATRIX_H_
-#define _OSM_MATRIX_H_
-
-#include <iba/ib_types.h>
-#include <complib/cl_vector.h>
-#include <opensm/osm_base.h>
-
-#ifdef __cplusplus
-#  define BEGIN_C_DECLS extern "C" {
-#  define END_C_DECLS   }
-#else /* !__cplusplus */
-#  define BEGIN_C_DECLS
-#  define END_C_DECLS
-#endif /* __cplusplus */
-
-BEGIN_C_DECLS
-
-/****h* OpenSM/LID Matrix
-* NAME
-*	LID Matrix
-*
-* DESCRIPTION
-*	The LID Matrix object encapsulates the information needed by the
-*	OpenSM to manage fabric routes.  It is a two dimensional array
-*	index by LID value and Port Number.  Each element contains the
-*	number of hops from that Port Number to the LID.
-*	Every Switch object contains a LID Matrix.
-*
-*	The LID Matrix is not thread safe, thus callers must provide
-*	serialization.
-*
-*	This object should be treated as opaque and should be
-*	manipulated only through the provided functions.
-*
-* AUTHOR
-*	Steve King, Intel
-*
-*********/
-
-/****s* OpenSM: LID Matrix/osm_lid_matrix_t
-* NAME
-*	osm_lid_matrix_t
-*
-* DESCRIPTION
-*
-*	The LID Matrix object encapsulates the information needed by the
-*	OpenSM to manage fabric routes.  It is a two dimensional array
-*	indexed by LID value and Port Number.  Each element contains the
-*	number of hops from that Port Number to the LID.
-*	Every Switch object contains a LID Matrix.
-*
-*	The LID Matrix is not thread safe, thus callers must provide
-*	serialization.
-*
-*	The num_ports index into the matrix serves a special purpose, in that it
-*	contains the shortest hop path for that LID through any port.
-*
-*	This object should be treated as opaque and should be
-*	manipulated only through the provided functions.
-*
-* SYNOPSIS
-*/
-typedef struct _osm_lid_matrix_t
-{
-	cl_vector_t			lid_vec;
-	uint8_t				num_ports;
-} osm_lid_matrix_t;
-/*
-* FIELDS
-*	lid_vec
-*		Vector (indexed by LID) of port arrays (indexed by port number)
-*
-*	num_ports
-*		Number of ports at each entry in the LID vector.
-*
-* SEE ALSO
-*********/
-
-/****f* OpenSM: LID Matrix/osm_lid_matrix_construct
-* NAME
-*	osm_lid_matrix_construct
-*
-* DESCRIPTION
-*	This function constructs a LID Matrix object.
-*
-* SYNOPSIS
-*/
-static inline void
-osm_lid_matrix_construct(
-	IN osm_lid_matrix_t* const p_lmx )
-{
-	p_lmx->num_ports = 0;
-	cl_vector_construct( &p_lmx->lid_vec );
-}
-/*
-* PARAMETERS
-*	p_lmx
-*		[in] Pointer to a LID Matrix object to construct.
-*
-* RETURN VALUE
-*	This function does not return a value.
-*
-* NOTES
-*	Allows calling osm_lid_matrix_init, osm_lid_matrix_destroy
-*
-*	Calling osm_lid_matrix_construct is a prerequisite to calling any other
-*	method except osm_lid_matrix_init.
-*
-* SEE ALSO
-*	LID Matrix object, osm_lid_matrix_init, osm_lid_matrix_destroy
-*********/
-
-/****f* OpenSM: LID Matrix/osm_lid_matrix_destroy
-* NAME
-*	osm_lid_matrix_destroy
-*
-* DESCRIPTION
-*	The osm_lid_matrix_destroy function destroys a node, releasing
-*	all resources.
-*
-* SYNOPSIS
-*/
-void osm_lid_matrix_destroy(
-	IN osm_lid_matrix_t* const p_lmx );
-/*
-* PARAMETERS
-*	p_lmx
-*		[in] Pointer to a LID Matrix object to destroy.
-*
-* RETURN VALUE
-*	This function does not return a value.
-*
-* NOTES
-*	Performs any necessary cleanup of the specified LID Matrix object.
-*	Further operations should not be attempted on the destroyed object.
-*	This function should only be called after a call to osm_lid_matrix_construct or
-*	osm_lid_matrix_init.
-*
-* SEE ALSO
-*	LID Matrix object, osm_lid_matrix_construct, osm_lid_matrix_init
-*********/
-
-/****f* OpenSM: LID Matrix/osm_lid_matrix_init
-* NAME
-*	osm_lid_matrix_init
-*
-* DESCRIPTION
-*	Initializes a LID Matrix object for use.
-*
-* SYNOPSIS
-*/
-ib_api_status_t
-osm_lid_matrix_init(
-	IN osm_lid_matrix_t* const p_lmx,
-	IN const uint8_t num_ports );
-/*
-* PARAMETERS
-*	p_lmx
-*		[in] Pointer to an osm_lid_matrix_t object to initialize.
-*
-*	num_ports
-*		[in] Number of ports at each LID index.  This value is fixed
-*		at initialization time.
-*
-* RETURN VALUES
-*	IB_SUCCESS on success
-*
-* NOTES
-*
-* SEE ALSO
-*********/
-
-/****f* OpenSM: LID Matrix/osm_lid_matrix_get
-* NAME
-*	osm_lid_matrix_get
-*
-* DESCRIPTION
-*	Returns the hop count at the specified LID/Port intersection.
-*
-* SYNOPSIS
-*/
-static inline uint8_t
-osm_lid_matrix_get(
-	IN const osm_lid_matrix_t* const p_lmx,
-	IN const uint16_t lid_ho,
-	IN const uint8_t port_num )
-{
-	CL_ASSERT( port_num < p_lmx->num_ports );
-
-	if ( lid_ho >= cl_vector_get_size( &p_lmx->lid_vec ) )
-		return OSM_NO_PATH;
-
-	return( ((uint8_t *)cl_vector_get_ptr(
-			&p_lmx->lid_vec, lid_ho ))[port_num] );
-}
-/*
-* PARAMETERS
-*	p_lmx
-*		[in] Pointer to an osm_lid_matrix_t object.
-*
-*	lid_ho
-*		[in] LID value (host order) for which to return the hop count
-*
-*	port_num
-*		[in] Port number in the switch
-*
-* RETURN VALUES
-*	Returns the hop count at the specified LID/Port intersection.
-*
-* NOTES
-*
-* SEE ALSO
-*********/
-
-/****f* OpenSM: LID Matrix/osm_lid_matrix_get_max_lid_ho
-* NAME
-*	osm_lid_matrix_get_max_lid_ho
-*
-* DESCRIPTION
-*	Returns the maximum LID (host order) value contained
-*	in the matrix.
-*
-* SYNOPSIS
-*/
-static inline uint16_t
-osm_lid_matrix_get_max_lid_ho(
-	IN const osm_lid_matrix_t* const p_lmx )
-{
-	return cl_vector_get_size( &p_lmx->lid_vec ) ?
-		(uint16_t)(cl_vector_get_size( &p_lmx->lid_vec ) - 1) : 0;
-}
-/*
-* PARAMETERS
-*	p_lmx
-*		[in] Pointer to an osm_lid_matrix_t object.
-*
-* RETURN VALUES
-*	Returns the maximum LID (host order) value contained
-*	in the matrix.
-*
-* NOTES
-*
-* SEE ALSO
-*********/
-
-/****f* OpenSM: LID Matrix/osm_lid_matrix_get_num_ports
-* NAME
-*	osm_lid_matrix_get_num_ports
-*
-* DESCRIPTION
-*	Returns the number of ports in this lid matrix.
-*
-* SYNOPSIS
-*/
-static inline uint8_t
-osm_lid_matrix_get_num_ports(
-	IN const osm_lid_matrix_t* const p_lmx )
-{
-	return( p_lmx->num_ports );
-}
-/*
-* PARAMETERS
-*	p_lmx
-*		[in] Pointer to an osm_lid_matrix_t object.
-*
-* RETURN VALUES
-*	Returns the number of ports in this lid matrix.
-*
-* NOTES
-*
-* SEE ALSO
-*********/
-
-/****f* OpenSM: LID Matrix/osm_lid_matrix_get_least_hops
-* NAME
-*	osm_lid_matrix_get_least_hops
-*
-* DESCRIPTION
-*	Returns the least number of hops for specified lid
-*
-* SYNOPSIS
-*/
-static inline uint8_t
-osm_lid_matrix_get_least_hops(
-	IN const osm_lid_matrix_t* const p_lmx,
-	IN const uint16_t lid_ho )
-{
-	if( lid_ho > osm_lid_matrix_get_max_lid_ho( p_lmx ) )
-		return( OSM_NO_PATH );
-
-	return( ((uint8_t *)cl_vector_get_ptr(
-			&p_lmx->lid_vec, lid_ho ))[p_lmx->num_ports] );
-}
-/*
-* PARAMETERS
-*	p_lmx
-*		[in] Pointer to an osm_lid_matrix_t object.
-*
-*	lid_ho
-*		[in] LID (host order) for which to retrieve the shortest hop count.
-*
-* RETURN VALUES
-*	Returns the least number of hops for specified lid
-*
-* NOTES
-*
-* SEE ALSO
-*********/
-
-/****f* OpenSM: LID Matrix/osm_lid_matrix_set
-* NAME
-*	osm_lid_matrix_set
-*
-* DESCRIPTION
-*	Sets the hop count at the specified LID/Port intersection.
-*
-* SYNOPSIS
-*/
-cl_status_t
-osm_lid_matrix_set(
-	IN osm_lid_matrix_t* const p_lmx,
-	IN const uint16_t lid_ho,
-	IN const uint8_t port_num,
-	IN const uint8_t val );
-/*
-* PARAMETERS
-*	p_lmx
-*		[in] Pointer to an osm_lid_matrix_t object.
-*
-*	lid_ho
-*		[in] LID value (host order) to index into the vector.
-*
-*	port_num
-*		[in] port number index into the vector entry.
-*
-*	val
-*		[in] value (number of hops) to assign to this entry.
-*
-* RETURN VALUES
-*	Returns the hop count at the specified LID/Port intersection.
-*
-* NOTES
-*
-* SEE ALSO
-*********/
-
-/****f* OpenSM: LID Matrix/osm_lid_matrix_set_min_lid_size
-* NAME
-*	osm_lid_matrix_set_min_lid_size
-*
-* DESCRIPTION
-*	Sets the size of the matrix to at least accomodate the
-*	specified LID value (host ordered)
-*
-* SYNOPSIS
-*/
-static inline cl_status_t
-osm_lid_matrix_set_min_lid_size(
-	IN osm_lid_matrix_t* const p_lmx,
-	IN const uint16_t lid_ho )
-{
-	return( cl_vector_set_min_size( &p_lmx->lid_vec, lid_ho + 1 ) );
-}
-/*
-* PARAMETERS
-*	p_lmx
-*		[in] Pointer to an osm_lid_matrix_t object.
-*
-*	lid_ho
-*		[in] Minimum LID value (host order) to accomodate.
-*
-* RETURN VALUES
-*	Sets the size of the matrix to at least accomodate the
-*	specified LID value (host ordered)
-*
-* NOTES
-*
-* SEE ALSO
-*********/
-
-/****f* OpenSM: LID Matrix/osm_lid_matrix_clear
-* NAME
-*	osm_lid_matrix_clear
-*
-* DESCRIPTION
-*	Clears a LID Matrix object in anticipation of a rebuild.
-*
-* SYNOPSIS
-*/
-void
-osm_lid_matrix_clear(
-	IN osm_lid_matrix_t* const p_lmx );
-/*
-* PARAMETERS
-*	p_lmx
-*		[in] Pointer to an osm_lid_matrix_t object to clear.
-*
-* RETURN VALUES
-*	None.
-*
-* NOTES
-*
-* SEE ALSO
-*********/
-
-END_C_DECLS
-
-#endif		/* _OSM_MATRIX_H_ */
diff --git a/osm/opensm/Makefile.am b/osm/opensm/Makefile.am
index 15af336..01e1423 100644
--- a/osm/opensm/Makefile.am
+++ b/osm/opensm/Makefile.am
@@ -31,7 +31,7 @@ opensm_SOURCES = main.c osm_console.c osm_db_files.c \
 		 osm_db_pack.c osm_drop_mgr.c osm_fwd_tbl.c \
 		 osm_inform.c osm_lid_mgr.c osm_lin_fwd_rcv.c \
 		 osm_lin_fwd_tbl.c osm_link_mgr.c \
-		 osm_matrix.c osm_mcast_fwd_rcv.c \
+		 osm_mcast_fwd_rcv.c \
 		 osm_mcast_mgr.c osm_mcast_tbl.c osm_mcm_info.c \
 		 osm_mcm_port.c osm_mtree.c osm_multicast.c osm_node.c \
 		 osm_node_desc_rcv.c osm_node_info_rcv.c \
diff --git a/osm/opensm/osm_matrix.c b/osm/opensm/osm_matrix.c
deleted file mode 100644
index 7202922..0000000
--- a/osm/opensm/osm_matrix.c
+++ /dev/null
@@ -1,156 +0,0 @@
-/*
- * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved.
- * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
- * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- *     Redistribution and use in source and binary forms, with or
- *     without modification, are permitted provided that the following
- *     conditions are met:
- *
- *      - Redistributions of source code must retain the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer.
- *
- *      - Redistributions in binary form must reproduce the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer in the documentation and/or other materials
- *        provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- *
- */
-
-/*
- * Abstract:
- *    Implementation of osm_lid_matrix_t.
- * This file implements the LID Matrix object.
- *
- * Environment:
- *    Linux User Mode
- *
- * $Revision: 1.7 $
- */
-
-#if HAVE_CONFIG_H
-#  include <config.h>
-#endif /* HAVE_CONFIG_H */
-
-#include <string.h>
-#include <opensm/osm_matrix.h>
-
-/**********************************************************************
- **********************************************************************/
-void
-osm_lid_matrix_destroy(
-  IN osm_lid_matrix_t* const p_lmx )
-{
-  cl_vector_destroy( &p_lmx->lid_vec );
-}
-
-/**********************************************************************
- Initializer function called by cl_vector
-**********************************************************************/
-cl_status_t
-__osm_lid_matrix_vec_init(
-  IN void* const p_elem,
-  IN void* context )
-{
-  osm_lid_matrix_t* const p_lmx = (osm_lid_matrix_t*)context;
-
-  memset( p_elem, OSM_NO_PATH, p_lmx->num_ports + 1);
-  return( CL_SUCCESS );
-}
-
-/**********************************************************************
- Initializer function called by cl_vector
-**********************************************************************/
-void
-__osm_lid_matrix_vec_clear(
-  IN const size_t index,
-  IN void* const p_elem,
-  IN void* context )
-{
-  osm_lid_matrix_t* const p_lmx = (osm_lid_matrix_t*)context;
-
-  UNUSED_PARAM( index );
-  memset( p_elem, OSM_NO_PATH, p_lmx->num_ports + 1);
-}
-
-/**********************************************************************
- **********************************************************************/
-void
-osm_lid_matrix_clear(
-  IN osm_lid_matrix_t* const p_lmx )
-{
-  cl_vector_apply_func( &p_lmx->lid_vec,
-                        __osm_lid_matrix_vec_clear, p_lmx );
-}
-
-/**********************************************************************
- **********************************************************************/
-ib_api_status_t
-osm_lid_matrix_init(
-  IN osm_lid_matrix_t* const p_lmx,
-  IN const uint8_t num_ports )
-{
-  cl_vector_t *p_vec;
-  cl_status_t status;
-
-  CL_ASSERT( p_lmx );
-  CL_ASSERT( num_ports );
-
-  p_lmx->num_ports = num_ports;
-
-  p_vec = &p_lmx->lid_vec;
-  /*
-    Initialize the vector for the number of ports plus an
-    extra entry to hold the "least-hops" count for that LID.
-  */
-  status = cl_vector_init( p_vec,
-                           0,             /* min_size, */
-                           1,             /* grow_size */
-                           sizeof(uint8_t)*(num_ports + 1), /*  element size */
-                           __osm_lid_matrix_vec_init, /*  init function */
-                           NULL,           /*  destory func */
-                           p_lmx          /*  context */
-                           );
-
-  return( status );
-}
-
-/**********************************************************************
- **********************************************************************/
-cl_status_t
-osm_lid_matrix_set(
-  IN osm_lid_matrix_t* const p_lmx,
-  IN const uint16_t lid_ho,
-  IN const uint8_t port_num,
-  IN const uint8_t val )
-{
-  uint8_t *p_port_array;
-  cl_status_t status;
-
-  CL_ASSERT( port_num < p_lmx->num_ports );
-  status = cl_vector_set_min_size( &p_lmx->lid_vec, lid_ho + 1 );
-  if( status == CL_SUCCESS )
-  {
-    p_port_array = (uint8_t *)cl_vector_get_ptr( &p_lmx->lid_vec, lid_ho );
-    p_port_array[port_num] = val;
-    if( p_port_array[p_lmx->num_ports] > val )
-      p_port_array[p_lmx->num_ports] = val;
-  }
-  return( status );
-}
-- 
1.5.0.1.26.gf5a92


From sashak at voltaire.com  Sun Feb 25 15:47:05 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Mon, 26 Feb 2007 01:47:05 +0200
Subject: [openib-general] [PATCH] opensm: remove some unneeded osm_switch
	functions
In-Reply-To: <20070225214845.GF11957@sashak.voltaire.com>
References: <20070225214845.GF11957@sashak.voltaire.com>
Message-ID: <20070225234705.GH11957@sashak.voltaire.com>


Following the simplification introduced earlier, this patch removes the
single-field access functions from osm_switch and converts their callers
to direct field access.

Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
---
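The conversion is mechanical. As a rough illustration (not code from the
tree), the sketch below uses a hypothetical, trimmed-down demo_switch_t in
place of osm_switch_t; demo_switch_get_num_ports() and demo_clamp_len() are
made-up names that mirror the kind of wrapper being removed and the way a
caller such as add_lid_hops() looks afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for osm_switch_t. */
typedef struct demo_switch {
	uint8_t num_ports;
	uint16_t max_lid_ho;
	uint32_t discovery_count;
} demo_switch_t;

/* Before: a wrapper whose only job is to return one field. */
static inline uint8_t demo_switch_get_num_ports(const demo_switch_t *p_sw)
{
	return p_sw->num_ports;
}

/* After: the caller reads the field directly. The behaviour is the
 * same; only the indirection through the wrapper goes away. */
uint8_t demo_clamp_len(const demo_switch_t *p_sw, uint8_t len)
{
	assert(p_sw != NULL);
	if (len > p_sw->num_ports)
		len = p_sw->num_ports;
	return len;
}
```
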
 osm/include/opensm/osm_switch.h |  176 ---------------------------------------
 osm/opensm/osm_mcast_mgr.c      |   27 ++----
 osm/opensm/osm_mtree.c          |    4 +-
 osm/opensm/osm_node_info_rcv.c  |    2 +-
 osm/opensm/osm_state_mgr.c      |    4 +-
 osm/opensm/osm_sw_info_rcv.c    |   14 ++--
 osm/opensm/osm_switch.c         |    6 +-
 osm/opensm/osm_ucast_file.c     |   11 +--
 osm/opensm/osm_ucast_ftree.c    |   41 +++++-----
 osm/opensm/osm_ucast_lash.c     |    4 +-
 osm/opensm/osm_ucast_mgr.c      |   34 ++++----
 osm/opensm/osm_ucast_updn.c     |    8 +-
 12 files changed, 72 insertions(+), 259 deletions(-)

diff --git a/osm/include/opensm/osm_switch.h b/osm/include/opensm/osm_switch.h
index 19381f8..4e0d46d 100644
--- a/osm/include/opensm/osm_switch.h
+++ b/osm/include/opensm/osm_switch.h
@@ -623,93 +623,6 @@ osm_switch_get_max_block_id_in_use(
 *	Switch object
 *********/
 
-/****f* OpenSM: Switch/osm_switch_get_node_ptr
-* NAME
-*	osm_switch_get_node_ptr
-*
-* DESCRIPTION
-*	Returns a pointer to the Node object for this switch.
-*
-* SYNOPSIS
-*/
-static inline osm_node_t*
-osm_switch_get_node_ptr(
-	IN const osm_switch_t* const p_sw )
-{
-	return( p_sw->p_node );
-}
-/*
-* PARAMETERS
-*	p_sw
-*		[in] Pointer to an osm_switch_t object.
-*
-* RETURN VALUES
-*	Returns a pointer to the Node object for this switch.
-*	
-* NOTES
-*
-* SEE ALSO
-*	Switch object
-*********/
-
-/****f* OpenSM: Switch/osm_switch_get_max_lid_ho
-* NAME
-*	osm_switch_get_max_lid_ho
-*
-* DESCRIPTION
-*	Returns the maximum LID (host order) value contained
-*	in the switch routing tables.
-*
-* SYNOPSIS
-*/
-static inline uint16_t
-osm_switch_get_max_lid_ho(
-	IN const osm_switch_t* const p_sw )
-{
-	return p_sw->max_lid_ho;
-}
-/*
-* PARAMETERS
-*	p_sw
-*		[in] Pointer to a switch object.
-*
-* RETURN VALUES
-*	Returns the maximum LID (host order) value contained
-*	in the switch routing tables.
-*
-* NOTES
-*
-* SEE ALSO
-*********/
-
-/****f* OpenSM: Switch/osm_switch_get_num_ports
-* NAME
-*	osm_switch_get_num_ports
-*
-* DESCRIPTION
-*	Returns the number of ports in this switch.
-*
-* SYNOPSIS
-*/
-static inline uint8_t
-osm_switch_get_num_ports(
-	IN const osm_switch_t* const p_sw )
-{
-	return p_sw->num_ports;
-}
-/*
-* PARAMETERS
-*	p_sw
-*		[in] Pointer to an osm_switch_t object.
-*
-* RETURN VALUES
-*	Returns the number of ports in this switch.
-*
-* NOTES
-*
-* SEE ALSO
-*********/
-
 /****f* OpenSM: Switch/osm_switch_get_fwd_tbl_block
 * NAME
 *	osm_switch_get_fwd_tbl_block
@@ -1330,95 +1243,6 @@ osm_switch_is_in_mcast_tree(
 * SEE ALSO
 *********/
 
-/****f* OpenSM: Node/osm_switch_discovery_count_get
-* NAME
-*	osm_switch_discovery_count_get
-*
-* DESCRIPTION
-*	Returns a pointer to the physical port object at the
-*	specified local port number.
-*
-* SYNOPSIS
-*/
-static inline uint32_t
-osm_switch_discovery_count_get(
-	IN const osm_switch_t* const p_switch )
-{
-	return( p_switch->discovery_count );
-}
-/*
-* PARAMETERS
-*	p_switch
-*		[in] Pointer to an osm_switch_t object.
-*
-* RETURN VALUES
-*	Returns the discovery count for this node.
-*
-* NOTES
-*
-* SEE ALSO
-*	Node object
-*********/
-
-/****f* OpenSM: Node/osm_switch_discovery_count_reset
-* NAME
-*	osm_switch_discovery_count_reset
-*
-* DESCRIPTION
-*	Resets the discovery count for this node to zero.
-*	This operation should be performed at the start of a sweep.
-*
-* SYNOPSIS
-*/
-static inline void
-osm_switch_discovery_count_reset(
-	IN osm_switch_t* const p_switch )
-{
-	p_switch->discovery_count = 0;
-}
-/*
-* PARAMETERS
-*	p_switch
-*		[in] Pointer to an osm_switch_t object.
-*
-* RETURN VALUES
-*	None.
-*
-* NOTES
-*
-* SEE ALSO
-*	Node object
-*********/
-
-/****f* OpenSM: Node/osm_switch_discovery_count_inc
-* NAME
-*	osm_switch_discovery_count_inc
-*
-* DESCRIPTION
-*	Increments the discovery count for this node.
-*
-* SYNOPSIS
-*/
-static inline void
-osm_switch_discovery_count_inc(
-	IN osm_switch_t* const p_switch )
-{
-	p_switch->discovery_count++;
-}
-/*
-* PARAMETERS
-*	p_switch
-*		[in] Pointer to an osm_switch_t object.
-*
-* RETURN VALUES
-*	None.
-*
-* NOTES
-*
-* SEE ALSO
-*	Node object
-*********/
-
 END_C_DECLS
 
 #endif /* _OSM_SWITCH_H_ */
diff --git a/osm/opensm/osm_mcast_mgr.c b/osm/opensm/osm_mcast_mgr.c
index a5ad024..cf8ae7d 100644
--- a/osm/opensm/osm_mcast_mgr.c
+++ b/osm/opensm/osm_mcast_mgr.c
@@ -319,9 +319,7 @@ __osm_mcast_mgr_find_optimal_switch(
 
     if( osm_log_is_active( p_mgr->p_log, OSM_LOG_DEBUG ) )
     {
-      sw_guid_ho = cl_ntoh64( osm_node_get_node_guid(
-                                osm_switch_get_node_ptr( p_sw ) ) );
-
+      sw_guid_ho = cl_ntoh64( osm_node_get_node_guid(p_sw->p_node) );
       osm_log( p_mgr->p_log, OSM_LOG_DEBUG,
                "__osm_mcast_mgr_find_optimal_switch: "
                "Switch 0x%016" PRIx64 ", hops = %f\n",
@@ -339,9 +337,7 @@ __osm_mcast_mgr_find_optimal_switch(
   {
     if( p_best_sw )
     {
-      sw_guid_ho = cl_ntoh64( osm_node_get_node_guid(
-                                osm_switch_get_node_ptr( p_best_sw ) ) );
-
+      sw_guid_ho = cl_ntoh64( osm_node_get_node_guid(p_best_sw->p_node) );
       osm_log( p_mgr->p_log, OSM_LOG_VERBOSE,
                "__osm_mcast_mgr_find_optimal_switch: "
                "Best switch is 0x%" PRIx64 ", hops = %f\n",
@@ -459,7 +455,7 @@ __osm_mcast_mgr_set_tbl(
 
   CL_ASSERT( p_sw );
 
-  p_node = osm_switch_get_node_ptr( p_sw );
+  p_node = p_sw->p_node;
 
   CL_ASSERT( p_node );
 
@@ -571,9 +567,7 @@ __osm_mcast_mgr_subdivide(
         multicast and the multicast tree must branch at this
         switch.
       */
-      uint64_t node_guid_ho = cl_ntoh64( osm_node_get_node_guid(
-                                           osm_switch_get_node_ptr( p_sw ) ) );
-
+      uint64_t node_guid_ho = cl_ntoh64( osm_node_get_node_guid(p_sw->p_node) );
       osm_log( p_mgr->p_log, OSM_LOG_ERROR,
                "__osm_mcast_mgr_subdivide: ERR 0A03: "
                "Error routing MLID 0x%X through switch 0x%" PRIx64 "\n"
@@ -587,9 +581,7 @@ __osm_mcast_mgr_subdivide(
 
     if( port_num > array_size )
     {
-      uint64_t node_guid_ho = cl_ntoh64( osm_node_get_node_guid(
-                                           osm_switch_get_node_ptr( p_sw ) ) );
-
+      uint64_t node_guid_ho = cl_ntoh64( osm_node_get_node_guid(p_sw->p_node) );
       osm_log( p_mgr->p_log, OSM_LOG_ERROR,
                "__osm_mcast_mgr_subdivide: ERR 0A04: "
                "Error routing MLID 0x%X through switch 0x%" PRIx64 "\n"
@@ -669,7 +661,7 @@ __osm_mcast_mgr_branch(
   CL_ASSERT( p_list );
   CL_ASSERT( p_max_depth );
 
-  node_guid = osm_node_get_node_guid(  osm_switch_get_node_ptr( p_sw ) );
+  node_guid = osm_node_get_node_guid( p_sw->p_node );
   node_guid_ho = cl_ntoh64( node_guid );
   mlid_ho = cl_ntoh16( osm_mgrp_get_mlid( p_mgrp ) );
 
@@ -823,7 +815,7 @@ __osm_mcast_mgr_branch(
          needed to add the port to the table */
       continue;
 
-    p_node = osm_switch_get_node_ptr( p_sw );
+    p_node = p_sw->p_node;
     p_remote_node = osm_node_get_remote_node( p_node, i, NULL );
 
     if( osm_node_get_type( p_remote_node ) == IB_NODE_TYPE_SWITCH )
@@ -1033,8 +1025,7 @@ osm_mcast_mgr_set_table(
     osm_log( p_mgr->p_log, OSM_LOG_VERBOSE,
              "osm_mcast_mgr_set_table: "
              "Configuring MLID 0x%X on switch 0x%" PRIx64 "\n",
-             mlid_ho, osm_node_get_node_guid(
-               osm_switch_get_node_ptr( p_sw ) ) );
+             mlid_ho, osm_node_get_node_guid(p_sw->p_node) );
   }
 
   /*
@@ -1389,7 +1380,7 @@ mcast_mgr_dump_sw_routes(
   if( !osm_log_is_active( p_mgr->p_log, OSM_LOG_ROUTING ) )
     goto Exit;
 
-  p_node = osm_switch_get_node_ptr( p_sw );
+  p_node = p_sw->p_node;
 
   p_tbl = osm_switch_get_mcast_tbl_ptr( p_sw );
 
diff --git a/osm/opensm/osm_mtree.c b/osm/opensm/osm_mtree.c
index a98df2f..14bfa36 100644
--- a/osm/opensm/osm_mtree.c
+++ b/osm/opensm/osm_mtree.c
@@ -68,7 +68,7 @@ osm_mtree_node_init(
   osm_mtree_node_construct( p_mtn );
 
   p_mtn->p_sw = (osm_switch_t*)p_sw;
-  p_mtn->max_children = osm_switch_get_num_ports( p_sw );
+  p_mtn->max_children = p_sw->num_ports;
 
   for( i = 0; i < p_mtn->max_children; i++ )
     p_mtn->child_array[i] = NULL;
@@ -83,7 +83,7 @@ osm_mtree_node_new(
   osm_mtree_node_t *p_mtn;
 
   p_mtn = malloc( sizeof(osm_mtree_node_t) +
-                  sizeof(void*) * (osm_switch_get_num_ports( p_sw ) - 1) );
+                  sizeof(void*) * (p_sw->num_ports - 1) );
 
   if( p_mtn != NULL )
     osm_mtree_node_init( p_mtn, p_sw );
diff --git a/osm/opensm/osm_node_info_rcv.c b/osm/opensm/osm_node_info_rcv.c
index 5cbd3b7..3053df5 100644
--- a/osm/opensm/osm_node_info_rcv.c
+++ b/osm/opensm/osm_node_info_rcv.c
@@ -657,7 +657,7 @@ __osm_ni_rcv_process_existing_switch(
   else
   {
     /* Make sure we have SwitchInfo on this node */
-    if( !p_node->sw || osm_switch_discovery_count_get( p_node->sw ) == 0 )
+    if( !p_node->sw || p_node->sw->discovery_count == 0 )
     {
       /* we don't have the SwitchInfo - retry to get it */
       osm_log( p_rcv->p_log, OSM_LOG_DEBUG,
diff --git a/osm/opensm/osm_state_mgr.c b/osm/opensm/osm_state_mgr.c
index 2905857..61de8d2 100644
--- a/osm/opensm/osm_state_mgr.c
+++ b/osm/opensm/osm_state_mgr.c
@@ -566,7 +566,7 @@ __osm_state_mgr_reset_switch_count(
                cl_ntoh64( osm_node_get_node_guid( p_sw->p_node ) ) );
    }
 
-   osm_switch_discovery_count_reset( p_sw );
+   p_sw->discovery_count = 0;
 }
 
 /**********************************************************************
@@ -585,7 +585,7 @@ __osm_state_mgr_get_sw_info(
 
    OSM_LOG_ENTER( p_mgr->p_log, __osm_state_mgr_get_sw_info );
 
-   p_node = osm_switch_get_node_ptr( p_sw );
+   p_node = p_sw->p_node;
    p_dr_path = osm_node_get_any_dr_path_ptr( p_node );
 
    memset( &context, 0, sizeof( context ) );
diff --git a/osm/opensm/osm_sw_info_rcv.c b/osm/opensm/osm_sw_info_rcv.c
index fe3fe9f..013a724 100644
--- a/osm/opensm/osm_sw_info_rcv.c
+++ b/osm/opensm/osm_sw_info_rcv.c
@@ -82,7 +82,7 @@ __osm_si_rcv_get_port_info(
 
   CL_ASSERT( p_sw );
 
-  p_node = osm_switch_get_node_ptr( p_sw );
+  p_node = p_sw->p_node;
   p_smp = osm_madw_get_smp_ptr( p_madw );
 
   CL_ASSERT( osm_node_get_type( p_node ) == IB_NODE_TYPE_SWITCH );
@@ -154,7 +154,7 @@ __osm_si_rcv_get_fwd_tbl(
 
   CL_ASSERT( p_sw );
 
-  p_node = osm_switch_get_node_ptr( p_sw );
+  p_node = p_sw->p_node;
 
   CL_ASSERT( osm_node_get_type( p_node ) == IB_NODE_TYPE_SWITCH );
 
@@ -223,7 +223,7 @@ __osm_si_rcv_get_mcast_fwd_tbl(
 
   CL_ASSERT( p_sw );
 
-  p_node = osm_switch_get_node_ptr( p_sw );
+  p_node = p_sw->p_node;
 
   CL_ASSERT( osm_node_get_type( p_node ) == IB_NODE_TYPE_SWITCH );
 
@@ -393,7 +393,7 @@ __osm_si_rcv_process_new(
     info we just received.
   */
   osm_switch_set_switch_info( p_sw, p_si );
-  osm_switch_discovery_count_inc( p_sw );
+  p_sw->discovery_count++;
 
   /*
     Get the PortInfo attribute for every port.
@@ -505,14 +505,14 @@ __osm_si_rcv_process_existing(
         This is a heavy sweep.  Get information regardless
         of the state change bit.
       */
-      osm_switch_discovery_count_inc( p_sw );
+      p_sw->discovery_count++;
       osm_log( p_rcv->p_log, OSM_LOG_VERBOSE,
                "__osm_si_rcv_process_existing: "
                "discovery_count is:%u\n",
-               osm_switch_discovery_count_get( p_sw ) );
+               p_sw->discovery_count );
 
       /* If this is the first discovery - then get the port_info */
-      if ( osm_switch_discovery_count_get( p_sw ) == 1 )
+      if ( p_sw->discovery_count == 1 )
         __osm_si_rcv_get_port_info( p_rcv, p_sw, p_madw );
       else
       {
diff --git a/osm/opensm/osm_switch.c b/osm/opensm/osm_switch.c
index 7c57398..6db8add 100644
--- a/osm/opensm/osm_switch.c
+++ b/osm/opensm/osm_switch.c
@@ -195,7 +195,7 @@ osm_switch_get_fwd_tbl_block(
   CL_ASSERT( p_block );
 
   p_tbl = osm_switch_get_fwd_tbl_ptr( p_sw );
-  max_lid_ho = osm_switch_get_max_lid_ho( p_sw );
+  max_lid_ho = p_sw->max_lid_ho;
   lids_per_block = osm_fwd_tbl_get_lids_per_block( &p_sw->fwd_tbl );
   base_lid_ho = (uint16_t)(block_id * lids_per_block);
 
@@ -278,7 +278,7 @@ osm_switch_recommend_path(
 
   CL_ASSERT( lid_ho > 0 );
 
-  num_ports = osm_switch_get_num_ports( p_sw );
+  num_ports = p_sw->num_ports;
 
   least_hops = osm_switch_get_least_hops( p_sw, lid_ho );
   if ( least_hops == OSM_NO_PATH )
@@ -532,7 +532,7 @@ osm_switch_recommend_mcast_path(
   CL_ASSERT( lid_ho > 0 );
   CL_ASSERT( mlid_ho >= IB_LID_MCAST_START_HO );
 
-  num_ports = osm_switch_get_num_ports( p_sw );
+  num_ports = p_sw->num_ports;
 
   /*
     If the user wants us to ignore existing multicast routes,
diff --git a/osm/opensm/osm_ucast_file.c b/osm/opensm/osm_ucast_file.c
index a623a26..4de4c02 100644
--- a/osm/opensm/osm_ucast_file.c
+++ b/osm/opensm/osm_ucast_file.c
@@ -93,8 +93,8 @@ static void add_path(osm_opensm_t * p_osm,
 		osm_log(&p_osm->log, OSM_LOG_VERBOSE,
 			"add_path: LID collision is detected on switch "
 			"0x016%" PRIx64 ", will overwrite LID 0x%x entry\n",
-			cl_ntoh64(osm_node_get_node_guid
-				  (osm_switch_get_node_ptr(p_sw))), new_lid);
+			cl_ntoh64(osm_node_get_node_guid(p_sw->p_node)),
+			new_lid);
 	}
 
 	p_osm->sm.ucast_mgr.lft_buf[new_lid] = port_num;
@@ -106,8 +106,7 @@ static void add_path(osm_opensm_t * p_osm,
 		"add_path: route 0x%04x(was 0x%04x) %u 0x%016" PRIx64
 		" is added to switch 0x%016" PRIx64 "\n",
 		new_lid, lid, port_num, cl_ntoh64(port_guid),
-		cl_ntoh64(osm_node_get_node_guid
-			  (osm_switch_get_node_ptr(p_sw))));
+		cl_ntoh64(osm_node_get_node_guid(p_sw->p_node)));
 }
 
 static void add_lid_hops(osm_opensm_t *p_osm, osm_switch_t *p_sw,
@@ -118,8 +117,8 @@ static void add_lid_hops(osm_opensm_t *p_osm, osm_switch_t *p_sw,
 	uint8_t i;
 
 	new_lid = guid ? remap_lid(p_osm, lid, guid) : lid;
-	if (len > osm_switch_get_num_ports(p_sw))
-		len = osm_switch_get_num_ports(p_sw);
+	if (len > p_sw->num_ports)
+		len = p_sw->num_ports;
 
 	for (i = 0 ; i < len ; i++)
 		osm_switch_set_hops(p_sw, lid, i, hops[i]);
diff --git a/osm/opensm/osm_ucast_ftree.c b/osm/opensm/osm_ucast_ftree.c
index 61db1d7..ac8302b 100644
--- a/osm/opensm/osm_ucast_ftree.c
+++ b/osm/opensm/osm_ucast_ftree.c
@@ -579,7 +579,7 @@ __osm_ftree_sw_create(
    uint8_t ports_num;
 
    /* make sure that the switch has ports */
-   if (osm_switch_get_num_ports(p_osm_sw) == 1)
+   if (p_osm_sw->num_ports == 1)
       return NULL;
 
    p_sw = (ftree_sw_t *)malloc(sizeof(ftree_sw_t));
@@ -591,9 +591,9 @@ __osm_ftree_sw_create(
    p_sw->rank = 0xFF;
    __osm_ftree_tuple_init(p_sw->tuple);
 
-   p_sw->base_lid = osm_node_get_base_lid(osm_switch_get_node_ptr(p_sw->p_osm_sw),0);
+   p_sw->base_lid = osm_node_get_base_lid(p_sw->p_osm_sw->p_node,0);
 
-   ports_num = osm_node_get_num_physp(osm_switch_get_node_ptr(p_sw->p_osm_sw));
+   ports_num = osm_node_get_num_physp(p_sw->p_osm_sw->p_node);
    p_sw->down_port_groups = 
       (ftree_port_group_t **) malloc(ports_num * sizeof(ftree_port_group_t *));
    p_sw->up_port_groups = 
@@ -657,7 +657,7 @@ __osm_ftree_sw_dump(
            "__osm_ftree_sw_dump: "
            "Switch index: %s, GUID: 0x%016" PRIx64 ", Ports: %u DOWN, %u UP\n",
           __osm_ftree_tuple_to_str(p_sw->tuple),
-          cl_ntoh64(osm_node_get_node_guid(osm_switch_get_node_ptr(p_sw->p_osm_sw))), 
+          cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
           p_sw->down_port_groups_num, 
           p_sw->up_port_groups_num);
 
@@ -1214,7 +1214,7 @@ __osm_ftree_fabric_dump_general_info(
                osm_log(&p_ftree->p_osm->log, OSM_LOG_VERBOSE,
                        "__osm_ftree_fabric_dump_general_info: "
                        "      GUID: 0x%016" PRIx64 ", LID: 0x%x, Index %s\n",
-                       cl_ntoh64(osm_node_get_node_guid(osm_switch_get_node_ptr(p_sw->p_osm_sw))),
+                       cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
                        cl_ntoh16(p_sw->base_lid),
                        __osm_ftree_tuple_to_str(p_sw->tuple));
       }
@@ -1228,8 +1228,7 @@ __osm_ftree_fabric_dump_general_info(
                     "__osm_ftree_fabric_dump_general_info: "
                     "      GUID: 0x%016" PRIx64 ", LID: 0x%x, Index %s\n",
                     cl_ntoh64(osm_node_get_node_guid(
-                                 osm_switch_get_node_ptr(
-                                    p_ftree->leaf_switches[i]->p_osm_sw))),
+                              p_ftree->leaf_switches[i]->p_osm_sw->p_node)),
                     cl_ntoh16(p_ftree->leaf_switches[i]->base_lid),
                     __osm_ftree_tuple_to_str(p_ftree->leaf_switches[i]->tuple));
       }
@@ -1443,7 +1442,7 @@ __osm_ftree_fabric_make_indexing(
            p_sw->rank,
            __osm_ftree_tuple_to_str(p_sw->tuple),
            cl_ntoh16(p_sw->base_lid),
-           cl_ntoh64(osm_node_get_node_guid(osm_switch_get_node_ptr(p_sw->p_osm_sw))));
+           cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)));
 
    /* 
     * Now run BFS and assign indexes to all switches
@@ -1617,11 +1616,11 @@ __osm_ftree_fabric_validate_topology(
                     "ERR AB09: Different number of upward port groups on switches:\n"
                     "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u groups\n"
                     "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u groups\n",
-                    cl_ntoh64(osm_node_get_node_guid(osm_switch_get_node_ptr(reference_sw_arr[p_sw->rank]->p_osm_sw))),
+                    cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)),
                     cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid),
                     __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple),
                     reference_sw_arr[p_sw->rank]->up_port_groups_num,
-                    cl_ntoh64(osm_node_get_node_guid(osm_switch_get_node_ptr(p_sw->p_osm_sw))),
+                    cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
                     cl_ntoh16(p_sw->base_lid),
                     __osm_ftree_tuple_to_str(p_sw->tuple),
                     p_sw->up_port_groups_num);
@@ -1638,11 +1637,11 @@ __osm_ftree_fabric_validate_topology(
                     "ERR AB0A: Different number of downward port groups on switches:\n"
                     "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u port groups\n"
                     "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u port groups\n",
-                    cl_ntoh64(osm_node_get_node_guid(osm_switch_get_node_ptr(reference_sw_arr[p_sw->rank]->p_osm_sw))),
+                    cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)),
                     cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid),
                     __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple),
                     reference_sw_arr[p_sw->rank]->down_port_groups_num,
-                    cl_ntoh64(osm_node_get_node_guid(osm_switch_get_node_ptr(p_sw->p_osm_sw))),
+                    cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
                     cl_ntoh16(p_sw->base_lid),
                     __osm_ftree_tuple_to_str(p_sw->tuple),
                     p_sw->down_port_groups_num);
@@ -1663,11 +1662,11 @@ __osm_ftree_fabric_validate_topology(
                            "ERR AB0B: Different number of ports in an upward port group on switches:\n"
                            "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n"
                            "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n",
-                           cl_ntoh64(osm_node_get_node_guid(osm_switch_get_node_ptr(reference_sw_arr[p_sw->rank]->p_osm_sw))),
+                           cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)),
                            cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid),
                            __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple),
                            cl_ptr_vector_get_size(&p_ref_group->ports),
-                           cl_ntoh64(osm_node_get_node_guid(osm_switch_get_node_ptr(p_sw->p_osm_sw))),
+                           cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
                            cl_ntoh16(p_sw->base_lid),
                            __osm_ftree_tuple_to_str(p_sw->tuple),
                            cl_ptr_vector_get_size(&p_group->ports));
@@ -1691,11 +1690,11 @@ __osm_ftree_fabric_validate_topology(
                            "ERR AB0C: Different number of ports in an downward port group on switches:\n"
                            "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n"
                            "       GUID 0x%016" PRIx64 ", LID 0x%x, Index %s - %u ports\n",
-                           cl_ntoh64(osm_node_get_node_guid(osm_switch_get_node_ptr(reference_sw_arr[p_sw->rank]->p_osm_sw))),
+                           cl_ntoh64(osm_node_get_node_guid(reference_sw_arr[p_sw->rank]->p_osm_sw->p_node)),
                            cl_ntoh16(reference_sw_arr[p_sw->rank]->base_lid),
                            __osm_ftree_tuple_to_str(reference_sw_arr[p_sw->rank]->tuple),
                            cl_ptr_vector_get_size(&p_ref_group->ports),
-                           cl_ntoh64(osm_node_get_node_guid(osm_switch_get_node_ptr(p_sw->p_osm_sw))),
+                           cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
                            cl_ntoh16(p_sw->base_lid),
                            __osm_ftree_tuple_to_str(p_sw->tuple),
                            cl_ptr_vector_get_size(&p_group->ports));
@@ -2439,7 +2438,7 @@ __osm_ftree_rank_from_switch(
       p_sw = p_sw_tbl_element->p_sw;
       __osm_ftree_sw_tbl_element_destroy(p_sw_tbl_element);
 
-      p_node = osm_switch_get_node_ptr(p_sw->p_osm_sw);
+      p_node = p_sw->p_osm_sw->p_node;
 
       /* note: skipping port 0 on switches */
       for (i = 1; i < osm_node_get_num_physp(p_node); i++)
@@ -2550,7 +2549,7 @@ __osm_ftree_rank_switches_from_hca(
               "                                            - Switch guid: 0x%016" PRIx64 "\n"
               "                                            - Switch LID : 0x%x\n",
               cl_ntoh64(osm_node_get_node_guid(p_hca->p_osm_node)),
-              cl_ntoh64(osm_node_get_node_guid(osm_switch_get_node_ptr(p_sw->p_osm_sw))),
+              cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
               cl_ntoh16(p_sw->base_lid));
       __osm_ftree_rank_from_switch(p_ftree, p_sw);
    }
@@ -2672,7 +2671,7 @@ __osm_ftree_fabric_construct_sw_ports(
 {
    ftree_hca_t       * p_remote_hca;
    ftree_sw_t        * p_remote_sw;
-   osm_node_t        * p_node = osm_switch_get_node_ptr(p_sw->p_osm_sw);
+   osm_node_t        * p_node = p_sw->p_osm_sw->p_node;
    osm_node_t        * p_remote_node;
    ib_net16_t          remote_base_lid;
    uint8_t             remote_node_type;
@@ -2740,10 +2739,10 @@ __osm_ftree_fabric_construct_sw_ports(
                        "       GUID 0x%016" PRIx64 ", LID 0x%x, rank %u\n",
                        p_sw->rank,
                        p_remote_sw->rank,
-                       cl_ntoh64(osm_node_get_node_guid(osm_switch_get_node_ptr(p_sw->p_osm_sw))),
+                       cl_ntoh64(osm_node_get_node_guid(p_sw->p_osm_sw->p_node)),
                        cl_ntoh16(p_sw->base_lid),
                        p_sw->rank,
-                       cl_ntoh64(osm_node_get_node_guid(osm_switch_get_node_ptr(p_remote_sw->p_osm_sw))),
+                       cl_ntoh64(osm_node_get_node_guid(p_remote_sw->p_osm_sw->p_node)),
                        cl_ntoh16(p_remote_sw->base_lid),
                        p_remote_sw->rank);
                res = -1;
diff --git a/osm/opensm/osm_ucast_lash.c b/osm/opensm/osm_ucast_lash.c
index f7ce5cd..2ce334a 100644
--- a/osm/opensm/osm_ucast_lash.c
+++ b/osm/opensm/osm_ucast_lash.c
@@ -1172,7 +1172,7 @@ static void populate_fwd_tbls(lash_t *p_lash)
       p_sw = p_next_sw;
       p_next_sw = (osm_switch_t*)cl_qmap_next( &p_sw->map_item );
 
-      max_lid_ho = osm_switch_get_max_lid_ho(p_sw);
+      max_lid_ho = p_sw->max_lid_ho;
       current_guid = p_sw->p_node->node_info.port_guid;
       sw = p_sw->priv;
 
@@ -1223,7 +1223,7 @@ static void print_fwd_table(IN const osm_switch_t *p_sw)
   uint16_t max_lid_ho, lid_ho;
   uint64_t switch_guid = osm_lash_get_switch_guid(p_sw);
 
-  max_lid_ho = osm_switch_get_max_lid_ho(p_sw);
+  max_lid_ho = p_sw->max_lid_ho;
   printf("FWDTBL: 0x%016" PRIx64 " max LID 0x%04X\n", cl_ntoh64(switch_guid), max_lid_ho);
 
   // starting at 1, not 0. Assuming no LID with an ID of 0
diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c
index 93cafae..15dda55 100644
--- a/osm/opensm/osm_ucast_mgr.c
+++ b/osm/opensm/osm_ucast_mgr.c
@@ -190,8 +190,8 @@ __osm_ucast_mgr_dump_path_distribution(
 
   OSM_LOG_ENTER( p_mgr->p_log, __osm_ucast_mgr_dump_path_distribution );
 
-  p_node = osm_switch_get_node_ptr( p_sw );
-  num_ports = osm_switch_get_num_ports( p_sw );
+  p_node = p_sw->p_node;
+  num_ports = p_sw->num_ports;
 
   osm_log_printf( p_mgr->p_log, OSM_LOG_DEBUG,
                   "__osm_ucast_mgr_dump_path_distribution: "
@@ -260,9 +260,9 @@ __osm_ucast_mgr_dump_ucast_routes(
 
   OSM_LOG_ENTER( p_mgr->p_log, __osm_ucast_mgr_dump_ucast_routes );
 
-  p_node = osm_switch_get_node_ptr( p_sw );
+  p_node = p_sw->p_node;
 
-  max_lid_ho = osm_switch_get_max_lid_ho( p_sw );
+  max_lid_ho = p_sw->max_lid_ho;
 
   fprintf( file, "__osm_ucast_mgr_dump_ucast_routes: "
            "Switch 0x%016" PRIx64 "\n"
@@ -325,9 +325,9 @@ ucast_mgr_dump_lid_matrix(cl_map_item_t *p_map_item, void *cxt)
 	osm_switch_t* p_sw = (osm_switch_t *)p_map_item;
 	osm_ucast_mgr_t* p_mgr = ((struct ucast_mgr_dump_context *)cxt)->p_mgr;
 	FILE *file = ((struct ucast_mgr_dump_context *)cxt)->file;
-	osm_node_t *p_node = osm_switch_get_node_ptr(p_sw);
-	unsigned max_lid = osm_switch_get_max_lid_ho(p_sw);
-	unsigned max_port = osm_switch_get_num_ports(p_sw);
+	osm_node_t *p_node = p_sw->p_node;
+	unsigned max_lid = p_sw->max_lid_ho;
+	unsigned max_port = p_sw->num_ports;
 	uint16_t lid;
 	uint8_t port;
 
@@ -356,9 +356,9 @@ ucast_mgr_dump_lfts(cl_map_item_t *p_map_item, void *cxt)
 	osm_switch_t* p_sw = (osm_switch_t *)p_map_item;
 	osm_ucast_mgr_t* p_mgr = ((struct ucast_mgr_dump_context *)cxt)->p_mgr;
 	FILE *file = ((struct ucast_mgr_dump_context *)cxt)->file;
-	osm_node_t *p_node = osm_switch_get_node_ptr(p_sw);
-	unsigned max_lid = osm_switch_get_max_lid_ho(p_sw);
-	unsigned max_port = osm_switch_get_num_ports(p_sw);
+	osm_node_t *p_node = p_sw->p_node;
+	unsigned max_lid = p_sw->max_lid_ho;
+	unsigned max_port = p_sw->num_ports;
 	uint16_t lid;
 	uint8_t port;
 
@@ -496,8 +496,8 @@ __osm_ucast_mgr_process_neighbor(
   CL_ASSERT( port_num );
   CL_ASSERT( remote_port_num );
 
-  p_node = osm_switch_get_node_ptr( p_sw );
-  p_remote_node = osm_switch_get_node_ptr( p_remote_sw );
+  p_node = p_sw->p_node;
+  p_remote_node = p_remote_sw->p_node;
 
   CL_ASSERT( p_node );
   CL_ASSERT( p_remote_node );
@@ -519,7 +519,7 @@ __osm_ucast_mgr_process_neighbor(
   /*
     Iterate through all the LIDs in the neighbor switch.
   */
-  max_lid_ho = osm_switch_get_max_lid_ho( p_remote_sw );
+  max_lid_ho = p_remote_sw->max_lid_ho;
 
   hops = OSM_NO_PATH;
   for( lid_ho = 1; lid_ho <= max_lid_ho; lid_ho++ )
@@ -773,7 +773,7 @@ __osm_ucast_mgr_process_port(
   */
   CL_ASSERT( max_lid_ho < osm_switch_get_fwd_tbl_size( p_sw ) );
 
-  node_guid = osm_node_get_node_guid(osm_switch_get_node_ptr( p_sw ) );
+  node_guid = osm_node_get_node_guid( p_sw->p_node );
 
   /*
     The lid matrix contains the number of hops to each
@@ -887,7 +887,7 @@ osm_ucast_mgr_set_fwd_table(
 
   CL_ASSERT( p_sw );
 
-  p_node = osm_switch_get_node_ptr( p_sw );
+  p_node = p_sw->p_node;
 
   CL_ASSERT( p_node );
 
@@ -899,7 +899,7 @@ osm_ucast_mgr_set_fwd_table(
     Set the top of the unicast forwarding table.
   */
   si = p_sw->switch_info;
-  lin_top = cl_hton16( osm_switch_get_max_lid_ho( p_sw ) );
+  lin_top = cl_hton16( p_sw->max_lid_ho );
   if (lin_top != si.lin_top)
   {
     set_swinfo_require = TRUE;
@@ -927,7 +927,7 @@ osm_ucast_mgr_set_fwd_table(
       osm_log( p_mgr->p_log, OSM_LOG_DEBUG,
                "osm_ucast_mgr_set_fwd_table: "
                "Setting switch FT top to LID 0x%X\n",
-               osm_switch_get_max_lid_ho( p_sw ) );
+               p_sw->max_lid_ho );
     }
     
     context.si_context.light_sweep = FALSE;
diff --git a/osm/opensm/osm_ucast_updn.c b/osm/opensm/osm_ucast_updn.c
index 950bcb4..05b7347 100644
--- a/osm/opensm/osm_ucast_updn.c
+++ b/osm/opensm/osm_ucast_updn.c
@@ -267,7 +267,7 @@ __updn_bfs_by_node(
              "Visiting port GUID 0x%" PRIx64 "\n",
              cl_ntoh64(current_guid) );
     /* Go over all ports of the switch and find unvisited remote nodes */
-    for ( pn = 0; pn < osm_switch_get_num_ports(u->sw); pn++ )
+    for ( pn = 1; pn < u->sw->num_ports; pn++ )
     {
       osm_node_t *p_remote_node;
       struct updn_node *rem_u;
@@ -549,7 +549,7 @@ updn_subn_rank(
     u = (struct updn_node *)cl_qlist_remove_head(&list);
     /* Go over all remote nodes and rank them (if not already visited) */
     p_sw = u->sw;
-    num_ports = osm_switch_get_num_ports(p_sw);
+    num_ports = p_sw->num_ports;
     osm_log( p_log, OSM_LOG_DEBUG,
              "updn_subn_rank: "
              "Handling switch GUID 0x%" PRIx64 "\n",
@@ -743,7 +743,7 @@ expand_lid_matrices_for_lmc(
     {
       p_sw = (osm_switch_t *)p_next_sw;
       p_next_sw = cl_qmap_next(p_next_sw);
-      num_ports = osm_switch_get_num_ports(p_sw);
+      num_ports = p_sw->num_ports;
       for (port = 0; port < num_ports; port++) {
         hops = osm_switch_get_hop_count(p_sw, min_lid, port);
         for (lid = min_lid + 1 ; lid <= max_lid; lid++)
@@ -973,7 +973,7 @@ __osm_updn_find_root_nodes_by_min_hop(
 
     /* Clear Min Hop Table && FWD Tbls - This should caused opensm to
        rebuild it's FWD tables, post setting Min Hop Tables */
-    max_lid_ho = osm_switch_get_max_lid_ho(p_sw);
+    max_lid_ho = p_sw->max_lid_ho;
     /* Get base lid of switch by retrieving port 0 lid of node pointer */
     self_lid_ho = cl_ntoh16( osm_node_get_base_lid( p_sw->p_node, 0 ) );
     osm_log( &p_osm->log, OSM_LOG_DEBUG,
-- 
1.5.0.1.26.gf5a92


From kliteyn at dev.mellanox.co.il  Sun Feb 25 22:20:56 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Mon, 26 Feb 2007 08:20:56 +0200
Subject: [openib-general] [PATCH] osm: Flushing log file after
 OSM_SYS_LOG message
In-Reply-To: <20070225195837.GC11957@sashak.voltaire.com>
References: <45E19BE2.2070704@dev.mellanox.co.il>
	<20070225195837.GC11957@sashak.voltaire.com>
Message-ID: <45E27C48.8010700@dev.mellanox.co.il>


Sasha Khapyorsky wrote:
> On 16:23 Sun 25 Feb     , Yevgeny Kliteynik wrote:
>> Hi Hal,
>>
>> OSM log should be flushed when OSM_SYS_LOG message is
>> printed. We had this once, but somehow it has disappeared.
>>
>> This fix has to go both to trunk and to 1.2.
>>
>> Thanks,
>>
>> --Yevgeny
>>
>> Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
>> ---
>>  osm/opensm/osm_log.c |    3 ++-
>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>
>> diff --git a/osm/opensm/osm_log.c b/osm/opensm/osm_log.c
>> index d76031d..f95ed85 100644
>> --- a/osm/opensm/osm_log.c
>> +++ b/osm/opensm/osm_log.c
>> @@ -204,7 +204,8 @@ osm_log(
>>  #endif
>>   
>>      /*  flush log */
>> -    if (ret > 0 && (p_log->flush || (verbosity & OSM_LOG_ERROR)) &&
>> +    if ( ret > 0 && 
>> +        (p_log->flush || (verbosity & OSM_LOG_ERROR) || (verbosity & OSM_LOG_SYS)) &&
> 
> verbosity & (OSM_LOG_ERROR|OSM_LOG_SYS)?

Sure - why not

-- Yevgeny
 
> Sasha
> 
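
For reference, a minimal sketch of why the two spellings of the flush test discussed above select the same messages. The flag values and the names EX_LOG_ERROR, EX_LOG_SYS, flush_long_form, flush_short_form and check_flush_forms_agree are hypothetical and exist only for this illustration; the real flags are defined in the OpenSM headers, and the quoted condition continues after the trailing "&&" with a test that is not shown in the patch excerpt and is not modelled here.

```c
#include <assert.h>

/* Hypothetical single-bit flag values, assumed for illustration only. */
#define EX_LOG_ERROR 0x01u
#define EX_LOG_SYS   0x80u

/* Form used in the posted patch: one bit test per flag. */
static int flush_long_form(int ret, int flush, unsigned verbosity)
{
	return ret > 0 &&
	       (flush || (verbosity & EX_LOG_ERROR) || (verbosity & EX_LOG_SYS));
}

/* Form suggested in the review: one test against the combined mask. */
static int flush_short_form(int ret, int flush, unsigned verbosity)
{
	return ret > 0 &&
	       (flush || (verbosity & (EX_LOG_ERROR | EX_LOG_SYS)));
}

/* Compare both forms over every 8-bit verbosity value, for a few ret
 * values and both flush settings. */
static void check_flush_forms_agree(void)
{
	int ret, flush;
	unsigned v;

	for (ret = -1; ret <= 1; ret++)
		for (flush = 0; flush <= 1; flush++)
			for (v = 0; v <= 0xFFu; v++)
				assert(flush_long_form(ret, flush, v) ==
				       flush_short_form(ret, flush, v));
}
```

The short form is only a readability change: (v & A) || (v & B) is true exactly when v & (A | B) is non-zero, whatever bits A and B happen to be.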


From sweitzen at cisco.com  Sun Feb 25 23:34:04 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Sun, 25 Feb 2007 23:34:04 -0800
Subject: [openib-general] bugs filed for problems compiling OFED 1.2 alpha1
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA30314EBC5@xmb-sjc-216.amer.cisco.com>

Please fix these bugs for beta.  
 
I've compiled for RHEL4 and SLES10 on x86_64, i686, ia64, and ppc64.  I
compiled all MPIs with GNU, Intel, and PGI compilers.

*	380     OFED 1.2 alpha1 gcc MVAPICH won't compile on RHEL4 IA64
*	381     OFED 1.2 alpha1 MVAPICH2 won't compile on RHEL4 IA64 with Intel compiler
*	382     OFED 1.2 alpha1 mpitests won't compile with Intel compiler for Open MPI (RHEL4 IA64)
*	383     OFED 1.2 alpha1 core/addr.c won't compile on SLES10 IA64
*	384     OFED 1.2 alpha1 ib-bonding won't compile on RHEL4 U3 ppc64
*	386     OFED 1.2 alpha1 gcc MVAPICH2 won't compile on RHEL4 ppc64 (add -m64)
*	387     OFED 1.2 alpha1 Open MPI won't compile on SLES10 ppc64

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 

From sweitzen at cisco.com  Sun Feb 25 23:37:36 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Sun, 25 Feb 2007 23:37:36 -0800
Subject: [openib-general] bugs filed for OFED 1.2 alpha1 MPI compiler support
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA30314EBC6@xmb-sjc-216.amer.cisco.com>

Please fix these bugs for beta.  
 
I've compiled for RHEL4 and SLES10 on x86_64, i686, ia64, and ppc64.  I
compiled all MPIs with GNU, Intel, and PGI compilers, and tried
compiling and running C, C++, Fortran 77, and Fortran 90 programs with
each combo.

*	370     OFED 1.2 alpha1 MVAPICH does not have Intel Fortran support
*	372     MVAPICH2 GNU mpif90 uses PGI not GNU compiler
*	373     MVAPICH2 Intel mpif90 does not include -rpath like mpif77 does
*	374     MVAPICH2 PGI mpif90 link failure: undefined reference ..Dm_mpi
*	375     Open MPI PGI C++ failure at runtime

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 

From mst at mellanox.co.il  Sun Feb 25 23:47:39 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 26 Feb 2007 09:47:39 +0200
Subject: [openib-general] bugs filed for problems compiling OFED 1.2
	alpha1
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA30314EBC5@xmb-sjc-216.amer.cisco.com>
References: <A15335FBE9BD2449AF2C9EF3D1EB8EA30314EBC5@xmb-sjc-216.amer.cisco.com>
Message-ID: <20070226074739.GA27677@mellanox.co.il>

> Quoting Scott Weitzenkamp (sweitzen) <sweitzen at cisco.com>:
> Subject: bugs filed for problems compiling OFED 1.2 alpha1
> 
> Please fix these bugs for beta. 
>  

Scott, you have assigned all bugs to bugzilla at openib.org.
To have the bugs resolved, please assign them to maintainers of
appropriate module.

For example, bonding module owner is Moni Shoua <monis at voltaire.com>,
so I think bug 384 should be assigned to him.

-- 
MST


From bugzilla-daemon at lists.openfabrics.org  Sun Feb 25 23:49:20 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Sun, 25 Feb 2007 23:49:20 -0800 (PST)
Subject: [openib-general] [Bug 384] OFED 1.2 alpha1 ib-bonding won't compile
 on RHEL4 U3 ppc64
In-Reply-To: <bug-384-1@https.bugs.openfabrics.org/>
Message-ID: <20070226074920.7BFACE6080D@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=384


sweitzen at cisco.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|bugzilla at openib.org         |monis at voltaire.com


------- Comment #1 from sweitzen at cisco.com  2007-02-25 23:49 -------
Scott, you have assigned all bugs to bugzilla at openib.org.
To have the bugs resolved, please assign them to maintainers of
appropriate module.

For example, bonding module owner is Moni Shoua <monis at voltaire.com>,
so I think bug 384 should be assigned to him.

-- 
MST


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From sweitzen at cisco.com  Sun Feb 25 23:50:42 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Sun, 25 Feb 2007 23:50:42 -0800
Subject: [openib-general] bugs filed for problems compiling OFED 1.2
	alpha1
In-Reply-To: <20070226074739.GA27677@mellanox.co.il>
References: <A15335FBE9BD2449AF2C9EF3D1EB8EA30314EBC5@xmb-sjc-216.amer.cisco.com>
	<20070226074739.GA27677@mellanox.co.il>
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA30314EBC8@xmb-sjc-216.amer.cisco.com>


> Scott, you have assigned all bugs to bugzilla at openib.org.
> To have the bugs resolved, please assign them to maintainers of
> appropriate module.

Not sure what you mean by "all", only 384 was not assigned to a specific
person.

Scott


From mst at mellanox.co.il  Sun Feb 25 23:50:59 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 26 Feb 2007 09:50:59 +0200
Subject: [openib-general] bugs filed for problems compiling OFED 1.2
	alpha1
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA30314EBC5@xmb-sjc-216.amer.cisco.com>
References: <A15335FBE9BD2449AF2C9EF3D1EB8EA30314EBC5@xmb-sjc-216.amer.cisco.com>
Message-ID: <20070226075059.GB27677@mellanox.co.il>

> Quoting Scott Weitzenkamp (sweitzen) <sweitzen at cisco.com>:
> Subject: bugs filed for problems compiling OFED 1.2 alpha1
> 
> Please fix these bugs for beta. 
>  
> I've compiled for RHEL4 and SLES10 on x86_64, i686, ia64, and ppc64.  I
> compiled all MPIs with GNU, Intel, and PGI compilers.
> 
>   • 380     OFED 1.2 alpha1 gcc MVAPICH won't compile on RHEL4 IA64
>   • 381     OFED 1.2 alpha1 MVAPICH2 won't compile on RHEL4 IA64 with Intel
>     compiler
>   • 382     OFED 1.2 alpha1 mpitests won't compile with Intel compiler for Open
>     MPI (RHEL4 IA64)
>   • 383     OFED 1.2 alpha1 core/addr.c won't compile on SLES10 IA64
>   • 384     OFED 1.2 alpha1 ib-bonding won't compile on RHEL4 U3 ppc64
>   • 386     OFED 1.2 alpha1 gcc MVAPICH2 won't compile on RHEL4 ppc64 (add
>     -m64)
>   • 387     OFED 1.2 alpha1 Open MPI won't compile on SLES10 ppc64

Some of these might be fixed in recent nightly builds.
Specifically I know 383 was fixed yesterday. Please check this and let us know.

-- 
MST


From mst at mellanox.co.il  Sun Feb 25 23:53:03 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 26 Feb 2007 09:53:03 +0200
Subject: [openib-general] bugs filed for problems compiling OFED 1.2
	alpha1
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA30314EBC8@xmb-sjc-216.amer.cisco.com>
References: <A15335FBE9BD2449AF2C9EF3D1EB8EA30314EBC5@xmb-sjc-216.amer.cisco.com>
	<20070226074739.GA27677@mellanox.co.il>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA30314EBC8@xmb-sjc-216.amer.cisco.com>
Message-ID: <20070226075303.GC27677@mellanox.co.il>

> Quoting Scott Weitzenkamp (sweitzen) <sweitzen at cisco.com>:
> Subject: RE: bugs filed for problems compiling OFED 1.2 alpha1
> 
> 
> > Scott, you have assigned all bugs to bugzilla at openib.org.
> > To have the bugs resolved, please assign them to maintainers of
> > appropriate module.
> 
> Not sure what you mean by "all", only 384 was not assigned to a specific
> person.

Correct. Sorry about that.

-- 
MST


From vlad at lists.openfabrics.org  Mon Feb 26 02:26:28 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Mon, 26 Feb 2007 02:26:28 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070226-0200 daily build status
Message-ID: <20070226102629.CDC67E6080A@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod --with-vnic-mod 

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.18-1.2798.fc6

Failed:
Build failed on i686 with linux-2.6.12
Build failed on i686 with linux-2.6.13
Build failed on i686 with linux-2.6.14
Build failed on powerpc with linux-2.6.19
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: error: implicit declaration of function ‘vmalloc’
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: warning: assignment makes pointer from integer without a cast
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1440: error: implicit declaration of function ‘vfree’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_powerpc_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_powerpc_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_powerpc_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/powerpc/linux-2.6.19'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on powerpc with linux-2.6.18
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: error: implicit declaration of function ‘vmalloc’
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: warning: assignment makes pointer from integer without a cast
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1440: error: implicit declaration of function ‘vfree’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_powerpc_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_powerpc_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_powerpc_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/powerpc/linux-2.6.18'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on ppc64 with linux-2.6.19
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘__be64’
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_ppc64_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_ppc64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_ppc64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ppc64/linux-2.6.19'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
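
Note on the '%llx' warnings in the log above: on some 64-bit architectures the 64-bit integer type is 'unsigned long' rather than 'unsigned long long', so passing it straight to a '%llx' conversion triggers the format warning. A common way to make the call portable is an explicit cast at the call site. The following is a user-space sketch only, using uint64_t and snprintf in place of the kernel's u64 and printk; format_u64_hex and check_format_u64_hex are hypothetical names and are not taken from vnic_control.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a 64-bit value as hex. The cast makes the argument type match
 * '%llx' regardless of how uint64_t is defined on the target. */
static int format_u64_hex(char *buf, size_t len, uint64_t value)
{
	return snprintf(buf, len, "%llx", (unsigned long long)value);
}

/* Small self-check of the helper above. */
static void check_format_u64_hex(void)
{
	char buf[32];
	int n = format_u64_hex(buf, sizeof(buf), UINT64_C(0x1234abcd));

	assert(n == 8);
	assert(strcmp(buf, "1234abcd") == 0);
}
```

The cast costs nothing at run time where both types are 64 bits wide; it only tells the compiler that the argument has the type the conversion expects.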
Build failed on powerpc with linux-2.6.15
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: error: implicit declaration of function ‘vmalloc’
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: warning: assignment makes pointer from integer without a cast
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1440: error: implicit declaration of function ‘vfree’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_powerpc_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_powerpc_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_powerpc_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/powerpc/linux-2.6.15'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on powerpc with linux-2.6.13
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: error: implicit declaration of function ‘vmalloc’
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: warning: assignment makes pointer from integer without a cast
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1440: error: implicit declaration of function ‘vfree’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_powerpc_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_powerpc_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_powerpc_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/powerpc/linux-2.6.13'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on ia64 with linux-2.6.18
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_ia64_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.18'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on ia64 with linux-2.6.19
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_ia64_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.19_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.19'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on ppc64 with linux-2.6.12
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘__be64’
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_ppc64_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_ppc64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_ppc64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ppc64/linux-2.6.12'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on ppc64 with linux-2.6.18
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘__be64’
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_ppc64_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_ppc64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.18_ppc64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ppc64/linux-2.6.18'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on powerpc with linux-2.6.17
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: error: implicit declaration of function ‘vmalloc’
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: warning: assignment makes pointer from integer without a cast
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1440: error: implicit declaration of function ‘vfree’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_powerpc_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_powerpc_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_powerpc_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/powerpc/linux-2.6.17'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on ia64 with linux-2.6.13
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_ia64_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.13'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on powerpc with linux-2.6.14
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: error: implicit declaration of function ‘vmalloc’
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: warning: assignment makes pointer from integer without a cast
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1440: error: implicit declaration of function ‘vfree’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_powerpc_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_powerpc_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_powerpc_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/powerpc/linux-2.6.14'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.13
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c:443: error: ‘struct class_device’ has no member named ‘parent’
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c: In function ‘setup_path_class_files’:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c:645: error: ‘struct class_device’ has no member named ‘parent’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_x86_64_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_x86_64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.13'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.12
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c:443: error: ‘struct class_device’ has no member named ‘parent’
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c: In function ‘setup_path_class_files’:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c:645: error: ‘struct class_device’ has no member named ‘parent’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_x86_64_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_x86_64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.12'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on ia64 with linux-2.6.12
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_ia64_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.12'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on ppc64 with linux-2.6.15
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘__be64’
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_ppc64_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_ppc64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_ppc64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ppc64/linux-2.6.15'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on ia64 with linux-2.6.14
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_ia64_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.14'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on ia64 with linux-2.6.15
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_ia64_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.15_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.15'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on ia64 with linux-2.6.17
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_ia64_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.17'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on ppc64 with linux-2.6.16
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘__be64’
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_ppc64_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_ppc64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_ppc64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ppc64/linux-2.6.16'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.14
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c:443: error: ‘struct class_device’ has no member named ‘parent’
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c: In function ‘setup_path_class_files’:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c:645: error: ‘struct class_device’ has no member named ‘parent’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_x86_64_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_x86_64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.14'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on powerpc with linux-2.6.16
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: error: implicit declaration of function ‘vmalloc’
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: warning: assignment makes pointer from integer without a cast
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1440: error: implicit declaration of function ‘vfree’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_powerpc_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_powerpc_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_powerpc_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/powerpc/linux-2.6.16'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on powerpc with linux-2.6.12
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: error: implicit declaration of function ‘vmalloc’
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: warning: assignment makes pointer from integer without a cast
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1440: error: implicit declaration of function ‘vfree’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_powerpc_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_powerpc_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.12_powerpc_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/powerpc/linux-2.6.12'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on ppc64 with linux-2.6.14
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘__be64’
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_ppc64_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_ppc64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.14_ppc64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ppc64/linux-2.6.14'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on ia64 with linux-2.6.16
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_ia64_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on ppc64 with linux-2.6.13
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘__be64’
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_ppc64_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_ppc64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.13_ppc64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ppc64/linux-2.6.13'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on ppc64 with linux-2.6.17
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘__be64’
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_ppc64_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_ppc64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.17_ppc64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ppc64/linux-2.6.17'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.16.21-0.8-smp
Log:
In file included from /home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16.21-0.8-smp_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c:33:
include/linux/parser.h:34: error: expected declaration specifiers or ‘...’ before ‘u64’
include/linux/parser.h:35: error: expected declaration specifiers or ‘...’ before ‘s64’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16.21-0.8-smp_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16.21-0.8-smp_x86_64_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16.21-0.8-smp_x86_64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16.21-0.8-smp_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.16.21-0.8-smp'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.5-7.244-smp
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.5-7.244-smp_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c:443: error: 'struct class_device' has no member named 'parent'
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.5-7.244-smp_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c: In function 'setup_path_class_files':
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.5-7.244-smp_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c:645: error: 'struct class_device' has no member named 'parent'
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.5-7.244-smp_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.5-7.244-smp_x86_64_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.5-7.244-smp_x86_64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.5-7.244-smp_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.5-7.244-smp'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.9-22.ELsmp
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c:443: error: ‘struct class_device’ has no member named ‘parent’
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c: In function ‘setup_path_class_files’:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c:645: error: ‘struct class_device’ has no member named ‘parent’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.9-22.ELsmp_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-22.ELsmp'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.9-34.ELsmp
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘add_adapter’:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1061: error: ‘adapter_list_lock’ undeclared (first use in this function)
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘remove_adapter’:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1068: error: ‘adapter_list_lock’ undeclared (first use in this function)
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.o] Error 1
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.9-34.ELsmp_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-34.ELsmp'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on ia64 with linux-2.6.16.21-0.8-default
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’:
/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070226-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------


From kliteyn at dev.mellanox.co.il  Mon Feb 26 03:20:06 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Mon, 26 Feb 2007 13:20:06 +0200
Subject: [openib-general] [PATCH] osm: trivial data type change to remove
 compilation warning
Message-ID: <45E2C266.5000503@dev.mellanox.co.il>

Hi Hal

Trivial data type change to remove compilation warning.
Please apply to the trunk and to the 1.2 branch.

Thanks.

Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
---
 osm/opensm/osm_ucast_updn.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/osm/opensm/osm_ucast_updn.c b/osm/opensm/osm_ucast_updn.c
index 8b86958..70ae10f 100644
--- a/osm/opensm/osm_ucast_updn.c
+++ b/osm/opensm/osm_ucast_updn.c
@@ -1005,8 +1005,8 @@ static void expand_lid_matrices_for_lmc(
   cl_map_item_t *p_next_port, *p_next_sw;
   osm_port_t *p_port;
   osm_switch_t *p_sw;
-  uint16_t lid, min_lid, max_lid, hops;
-  uint8_t port, num_ports;
+  uint16_t lid, min_lid, max_lid;
+  uint8_t port, num_ports, hops;
 
   p_next_port = cl_qmap_head( &p_subn->port_guid_tbl );
   while (p_next_port != cl_qmap_end(&p_subn->port_guid_tbl)) {
-- 
1.4.4.1.GIT


From halr at voltaire.com  Mon Feb 26 05:45:11 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 26 Feb 2007 08:45:11 -0500
Subject: [openib-general] [PATCH] osm: Flushing log file after
	OSM_SYS_LOG message
In-Reply-To: <45E19BE2.2070704@dev.mellanox.co.il>
References: <45E19BE2.2070704@dev.mellanox.co.il>
Message-ID: <1172497508.4102.267757.camel@hal.voltaire.com>

On Sun, 2007-02-25 at 09:23, Yevgeny Kliteynik wrote:
> Hi Hal,
> 
> OSM log should be flushed when OSM_SYS_LOG message is
> printed. We had this once, but somehow it has disappeared.
> 
> This fix has to go both to trunk and to 1.2.
> 
> Thanks,
> 
> --Yevgeny
> 
> Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>

Thanks. Applied (to both master and ofed_1_2).

-- Hal


From halr at voltaire.com  Mon Feb 26 05:55:49 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 26 Feb 2007 08:55:49 -0500
Subject: [openib-general] [PATCH] opensm: updn performance improvements
In-Reply-To: <20070224201342.GB9147@sashak.voltaire.com>
References: <20070224201342.GB9147@sashak.voltaire.com>
Message-ID: <1172498135.4102.268407.camel@hal.voltaire.com>

On Sat, 2007-02-24 at 15:13, Sasha Khapyorsky wrote:
> There are various performance improvements for up/down routing engine:
> - updn_node object which is referenced by switch's priv pointer
> - ranking for switches only
> - replace time consuming cl_list by cl_qlist
> - reuse already collected up/down related information (in updn_node
>   structure) instead of rediscovering
> - eliminate many inner loops
> - mask time consuming logging
> - eliminate using two lists with BFS
> - minor cleanups
> 
> Now up/down looks 5-6 times faster.

Nice work!

> Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>

Thanks. Applied (to master only at least for right now; will get to
ofed_1_2 in a bit).

-- Hal


From dy.manju at gmail.com  Mon Feb 26 06:35:37 2007
From: dy.manju at gmail.com (manju y)
Date: Mon, 26 Feb 2007 20:05:37 +0530
Subject: [openib-general] How to enable fast registration.
Message-ID: <ba89687d0702260635k7b815a17j34653087a178453a@mail.gmail.com>

Hi
Can anyone suggest how to enable the fast registration bit when
creating a queue pair?

Thanks
 manju


From tziporet at mellanox.co.il  Mon Feb 26 07:10:59 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Mon, 26 Feb 2007 17:10:59 +0200
Subject: [openib-general] reminder: OFED 1.2 coordination meeting today
 (Monday Feb-26) at 9amPST
Message-ID: <45E2F883.7000001@mellanox.co.il>

Hi all,

I wish to remind you that we have the OFED 1.2 coordination meeting today (Monday Feb-26) at 9am PST.

Agenda:
1. Status update toward beta next week


Tziporet

Bridge info:

Meeting ID:              2106670
Meeting Password:

Global Access Numbers:
http://cisco.com/en/US/about/doing_business/conferencing/index.html

     US/Canada:  +1.866.432.9903    United Kingdom:   +44.20.8824.0117
     India:      +91.80.4103.3979   Germany:          +49.619.6773.9002
     Japan:      +81.3.5763.9394    China:            +86.10.8515.5666

for world-wide access numbers see:

http://openib.org/pipermail/openib-general/2007-January/031282.html




From vlad at dev.mellanox.co.il  Mon Feb 26 07:07:45 2007
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Mon, 26 Feb 2007 17:07:45 +0200
Subject: [openib-general] HOWTO check ofa_kernel build from your git tree
Message-ID: <1172502465.21382.44.camel@vladsk-laptop>


On ssh.openfabrics.org:
Run
env git_url=/home/mst/scm/ofed_1_2_devel.git git_branch=ofed_1_2 \
	CHECK_LOCAL=yes \
	CHECK_KERNEL_ORG=yes \
	CHECK_CROSS=yes /home/vlad/scripts/build_ofa_kernel.sh

-- 
Vladimir Sokolovsky <vlad at dev.mellanox.co.il>
Mellanox Technologies Ltd.


From halr at voltaire.com  Mon Feb 26 07:23:58 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 26 Feb 2007 10:23:58 -0500
Subject: [openib-general] ipoib & the partial pkey
In-Reply-To: <45E1697E.6050007@voltaire.com>
References: <000401c756da$1f9387d0$8698070a@amr.corp.intel.com>
	<45E1697E.6050007@voltaire.com>
Message-ID: <1172503433.4102.273563.camel@hal.voltaire.com>

On Sun, 2007-02-25 at 05:48, Or Gerlitz wrote:
> Sean Hefty wrote:
> > I looked into this more...
> > RFC 4391 states (middle of page 5):
> > For a node to join a partition, one of its ports must be assigned the relevant
> > P_Key by the SM [RFC4392].
> 
> > Jumping to RFC 4392 (top of page 4):
> 
> Just to have us agree on the quote, it is from section 4 of rfc 4392 
> (page 14) eg in http://www.ietf.org/rfc/rfc4392.txt
> 
> > at the time of creating an IB multicast group, multiple values such as the
> > P_Key, Q_Key, Service Level, Hop Limit, Flow ID, TClass, MTU, etc.  have to be
> > specified.  These values should be such that all potential members of the IB
> > multicast group are able to communicate with one another when using them.
> 
> OK, I suggest to remove this spec limitation,

IMO you would need to get the IB spec changed first in order to do this.

> as it does not allow the 
> use case of a server using a partition for which inter-client 
> communication is not allowed.

> Actually since it does not let people use partial membership 
> partitioning with IPoIB as every ipoib device needs to join the 
> broadcast group, it is probably a spec bug and not a limitation done on 
> purpose.

I'm pretty sure this was done on purpose (a conscious choice) as it is
based on what the IBA spec requires.

The flip side of this approach is the partial connectivity issues which
Sean mentioned; these will be reported as SM failures (e.g. more
support issues).

> A simple real-life example is I/O target, the system admin wants IB 
> block and/or file storage traffic to use a partition, but he does not 
> want initiators to communicate among themselves on this partition.
> 
> To achieve that the SM is configured to assign the partial pkey to the 
> initiator nodes and the full pkey to the target ports.
> 
> The current implementation of IPoIB and core perfectly (and 
> transparently...) supports that.

and is currently non compliant in its behavior.
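
The membership rule under discussion can be sketched in a few lines of C.
This is only an illustration, assuming the usual IBA encoding in which the
top bit of the 16-bit P_Key marks full membership; the helper names and the
P_Key values are hypothetical and are not taken from IPoIB or OpenSM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER 0x8000u /* membership bit: 1 = full, 0 = partial */
#define PKEY_BASE_MASK   0x7fffu /* low 15 bits identify the partition */

static inline bool pkey_is_full(uint16_t pkey)
{
    return (pkey & PKEY_FULL_MEMBER) != 0;
}

/* Two ports can talk only if they share the partition base and at
 * least one of them is a full member. */
static inline bool pkeys_can_communicate(uint16_t a, uint16_t b)
{
    if ((a & PKEY_BASE_MASK) != (b & PKEY_BASE_MASK))
        return false;
    return pkey_is_full(a) || pkey_is_full(b);
}

/* Usage sketch mirroring the I/O target example quoted above
 * (hypothetical P_Key values). */
static inline void pkey_example(void)
{
    uint16_t target = 0x8001;      /* full member */
    uint16_t initiator_a = 0x0001; /* partial member */
    uint16_t initiator_b = 0x0001; /* partial member */

    assert(pkeys_can_communicate(target, initiator_a));
    assert(!pkeys_can_communicate(initiator_a, initiator_b));
}
```

With these values each initiator reaches the target but the two initiators
cannot reach each other, which is why a multicast group joined by partial
members cannot satisfy the "all members can communicate" wording.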

-- Hal

> Or.
> 
> 
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> 


From ogerlitz at voltaire.com  Mon Feb 26 07:37:38 2007
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Mon, 26 Feb 2007 17:37:38 +0200
Subject: [openib-general] ipoib & the partial pkey
In-Reply-To: <1172503433.4102.273563.camel@hal.voltaire.com>
References: <000401c756da$1f9387d0$8698070a@amr.corp.intel.com>
	<45E1697E.6050007@voltaire.com>
	<1172503433.4102.273563.camel@hal.voltaire.com>
Message-ID: <45E2FEC2.6010708@voltaire.com>

Hal Rosenstock wrote:
> On Sun, 2007-02-25 at 05:48, Or Gerlitz wrote:

>> Just to have us agree on the quote, it is from section 4 of rfc 4392 
>> (page 14) eg in http://www.ietf.org/rfc/rfc4392.txt

>>> at the time of creating an IB multicast group, multiple values such as the
>>> P_Key, Q_Key, Service Level, Hop Limit, Flow ID, TClass, MTU, etc.  have to be
>>> specified.  These values should be such that all potential members of the IB
>>> multicast group are able to communicate with one another when using them.

>> OK, I suggest to remove this spec limitation,

> IMO you would need to get the IB spec changed first in order to do this.

Do you refer to this?

> What about the description of P_Key in MCMemberRecord (table 210 on p.
> 908 which is compliance) which states:
> 
> "All members of the multicast group shall have full membership in the
> partition indicated by the partition key."

If yes, indeed, this also has to be changed.

Or.


From halr at voltaire.com  Mon Feb 26 08:25:07 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 26 Feb 2007 11:25:07 -0500
Subject: [openib-general] ipoib & the partial pkey
In-Reply-To: <45E2FEC2.6010708@voltaire.com>
References: <000401c756da$1f9387d0$8698070a@amr.corp.intel.com>
	<45E1697E.6050007@voltaire.com>
	<1172503433.4102.273563.camel@hal.voltaire.com>
	<45E2FEC2.6010708@voltaire.com>
Message-ID: <1172507101.4102.277140.camel@hal.voltaire.com>

On Mon, 2007-02-26 at 10:37, Or Gerlitz wrote:
> Hal Rosenstock wrote:
> > On Sun, 2007-02-25 at 05:48, Or Gerlitz wrote:
> 
> >> Just to have us agree on the quote, it is from section 4 of rfc 4392 
> >> (page 14) eg in http://www.ietf.org/rfc/rfc4392.txt
> 
> >>> at the time of creating an IB multicast group, multiple values such as the
> >>> P_Key, Q_Key, Service Level, Hop Limit, Flow ID, TClass, MTU, etc.  have to be
> >>> specified.  These values should be such that all potential members of the IB
> >>> multicast group are able to communicate with one another when using them.
> 
> >> OK, I suggest to remove this spec limitation,
> 
> > IMO you would need to get the IB spec changed first in order to do this.
> 
> Do you refer to this?
> 
> > What about the description of P_Key in MCMemberRecord (table 210 on p.
> > 908 which is compliance) which states:
> > 
> > "All members of the multicast group shall have full membership in the
> > partition indicated by the partition key."
> 
> If yes, indeed, this also has to be changed.

Yes, for one. There may be others; I didn't look exhaustively at the
spec for this.

-- Hal

> Or.
> 


From rdreier at cisco.com  Mon Feb 26 08:42:20 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 26 Feb 2007 08:42:20 -0800
Subject: [openib-general] [PATCH for-2.6.21] IPoIB/cm: improve small
 message bandwidth
In-Reply-To: <20070225122211.GD5331@mellanox.co.il> (Michael S.
	Tsirkin's message of "Sun, 25 Feb 2007 14:22:11 +0200")
References: <adaslcxvpuy.fsf@cisco.com> <20070225122211.GD5331@mellanox.co.il>
Message-ID: <adaodng50r7.fsf@cisco.com>

nope, doesn't seem to make a difference.


From sweitzen at cisco.com  Mon Feb 26 08:49:59 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Mon, 26 Feb 2007 08:49:59 -0800
Subject: [openib-general] bugs filed for problems compiling OFED 1.2
	alpha1
In-Reply-To: <20070226075059.GB27677@mellanox.co.il>
References: <A15335FBE9BD2449AF2C9EF3D1EB8EA30314EBC5@xmb-sjc-216.amer.cisco.com>
	<20070226075059.GB27677@mellanox.co.il>
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA30314EC78@xmb-sjc-216.amer.cisco.com>

> Some of these might be fixed in recent nightly builds.
> Specifically I know 383 was fixed yesterday. Please check 
> this and let us know.

Thanks, what is the URL for the nightly builds?

Scott


From vlad at dev.mellanox.co.il  Mon Feb 26 08:59:22 2007
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Mon, 26 Feb 2007 18:59:22 +0200
Subject: [openib-general] [ewg] RE: bugs filed for problems compiling
	OFED 1.2 alpha1
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA30314EC78@xmb-sjc-216.amer.cisco.com>
References: <A15335FBE9BD2449AF2C9EF3D1EB8EA30314EBC5@xmb-sjc-216.amer.cisco.com>
	<20070226075059.GB27677@mellanox.co.il>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA30314EC78@xmb-sjc-216.amer.cisco.com>
Message-ID: <1172509162.21382.45.camel@vladsk-laptop>

On Mon, 2007-02-26 at 08:49 -0800, Scott Weitzenkamp (sweitzen) wrote:
> > Some of these might be fixed in recent nightly builds.
> > Specifically I know 383 was fixed yesterday. Please check 
> > this and let us know.
> 
> Thanks, what is the URL for the nightly builds?
> 
> Scott
> 
> _______________________________________________
> ewg mailing list
> ewg at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg

http://www.openfabrics.org/builds/ofa_1_2_kernel/

The latest:
http://www.openfabrics.org/builds/ofa_1_2_kernel/ofa_1_2_kernel-20070226-0405.tgz


-- 
Vladimir Sokolovsky <vlad at dev.mellanox.co.il>
Mellanox Technologies Ltd.


From sweitzen at cisco.com  Mon Feb 26 09:00:49 2007
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Mon, 26 Feb 2007 09:00:49 -0800
Subject: [openib-general] [ewg] RE: bugs filed for problems compiling
	OFED 1.2 alpha1
In-Reply-To: <1172509162.21382.45.camel@vladsk-laptop>
References: <A15335FBE9BD2449AF2C9EF3D1EB8EA30314EBC5@xmb-sjc-216.amer.cisco.com>
	<20070226075059.GB27677@mellanox.co.il>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA30314EC78@xmb-sjc-216.amer.cisco.com>
	<1172509162.21382.45.camel@vladsk-laptop>
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA30314EC95@xmb-sjc-216.amer.cisco.com>

I want a full OFED build, please.  This was agreed to in one of the OFED
bi-weekly calls.

Scott 

> -----Original Message-----
> From: Vladimir Sokolovsky [mailto:vlad at dev.mellanox.co.il] 
> Sent: Monday, February 26, 2007 8:59 AM
> To: Scott Weitzenkamp (sweitzen)
> Cc: Michael S. Tsirkin; Openfabrics-ewg at openib.org; OPENIB
> Subject: Re: [ewg] RE: bugs filed for problems compiling OFED 
> 1.2 alpha1
> 
> On Mon, 2007-02-26 at 08:49 -0800, Scott Weitzenkamp (sweitzen) wrote:
> > > Some of these might be fixed in recent nightly builds.
> > > Specifically I know 383 was fixed yesterday. Please check 
> > > this and let us know.
> > 
> > Thanks, what is the URL for the nightly builds?
> > 
> > Scott
> 
> http://www.openfabrics.org/builds/ofa_1_2_kernel/
> 
> The latest:
> http://www.openfabrics.org/builds/ofa_1_2_kernel/ofa_1_2_kernel-20070226-0405.tgz
> 
> 
> -- 
> Vladimir Sokolovsky <vlad at dev.mellanox.co.il>
> Mellanox Technologies Ltd.
> 


From jsquyres at cisco.com  Mon Feb 26 09:05:30 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Mon, 26 Feb 2007 12:05:30 -0500
Subject: [openib-general] Fwd: Address List Change Now Scheduled for
	Wednesday, 2/28/2007
References: <3D84A59A1AD3584DA02AEAD240E8863F03BC471E@ES22SNLNT.srn.sandia.gov>
Message-ID: <B53C816F-F33B-4A88-98BE-E7F20AF6833B@cisco.com>

FYI.  In case you missed it the Nth time: THIS LIST IS CHANGING ON  
WEDNESDAY 2/28/2007 (2 days from now).  Really.  For sure this time.   
Trust me.  Honest.

Please update your addressbooks!


Begin forwarded message:

> From: "Lee, Michael Paichi" <mplee at sandia.gov>
> Date: February 22, 2007 11:44:25 AM EST
> To: "Jeff Squyres" <jsquyres at cisco.com>, "Michael S. Tsirkin"  
> <mst at mellanox.co.il>
> Cc: "OpenFabrics General" <openib-general at openib.org>
> Subject: Address List Change Now Scheduled for Wednesday, 2/28/2007
>
> The list will now be migrated on Wednesday, 2/28/2007.
>
> List address:         general at lists.openfabrics.org
> Updated change-date:  Wednesday, 2/28/2007
>
> Michael


-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


From monis at voltaire.com  Mon Feb 26 09:07:30 2007
From: monis at voltaire.com (Moni Shoua)
Date: Mon, 26 Feb 2007 19:07:30 +0200
Subject: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to
	IPoIB
Message-ID: <45E313D2.70909@voltaire.com>

Hi,

This post follows a previous one, regarding required changes to IPoIB to enable
it to work with bonding. Please find it here: http://openib.org/pipermail/openib-general/2007-February/032598.html

This patch version adds fixes to the comments from Michael Tsirkin from the last post.

IPoIB uses a two layer neighboring scheme, such that for each struct neighbour
whose device is an ipoib one, there is a struct ipoib_neigh buddy which is
created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour)
call.

When using the bonding driver, neighbours are created by the net stack on behalf
of the bonding (master) device. On the tx flow the bonding code gets an skb such
that skb->dev points to the master device, it changes this skb to point on the
slave device and calls the slave hard_start_xmit function.

Combining these two flows, there is a hole if some code in ipoib
(ipoib_neigh_destructor) assumes that for each struct neighbour it gets, n->dev
is an ipoib device so for example netdev_priv(n->dev) would be of type struct
ipoib_dev_priv.

To fix it, this patch adds a dev field to struct ipoib_neigh which is used
instead of the struct neighbour dev one.

In addition, if an IPoIB device is removed before bonding is unloaded, bond0
neighbours (neighbours that point to bond0) may still exist after the IPoIB
device no longer exists. This is why a neighbour cleanup is required during
device cleanup. The cleanup scans the arp cache and the ndisc cache for
neighbours of bond0 which also refer to the relevant ibX. Also, when the
ib_ipoib module is unloaded, the neighbour destructor must be set to NULL
because the destructor function lives in ib_ipoib.
For this neigh table cleanup, the symbol nd_tbl must be exported, just as the
symbol arp_tbl already is.

During my tests I found that when running 

	1. modprobe -r ib_mthca (to delete IPoIB interfaces)
	2. ping somewhere on the subnet of bond0

I get this stack dump (which ends with kernel death)
	 [<ffffffff8037ff32>] skb_under_panic+0x5c/0x60
	 [<ffffffff882e00c2>] :ib_ipoib:ipoib_hard_header+0xa6/0xc0
	 [<ffffffff803c3c98>] arp_create+0x120/0x226
	 [<ffffffff803c3dc3>] arp_send+0x25/0x3b
	 [<ffffffff803c466a>] arp_solicit+0x186/0x195
	 [<ffffffff8038c0ac>] neigh_timer_handler+0x2b5/0x309
	 [<ffffffff8038bdf7>] neigh_timer_handler+0x0/0x309
	 [<ffffffff80239599>] run_timer_softirq+0x130/0x19e
	 [<ffffffff80235fcc>] __do_softirq+0x55/0xc3
	 [<ffffffff8020acac>] call_softirq+0x1c/0x28
	 [<ffffffff8020c02b>] do_softirq+0x2c/0x7d
	 [<ffffffff8021864a>] smp_apic_timer_interrupt+0x57/0x6a
	 [<ffffffff80208e19>] mwait_idle+0x0/0x45
	 [<ffffffff8020a756>] apic_timer_interrupt+0x66/0x70
	 <EOI>  [<ffffffff80208e5b>] mwait_idle+0x42/0x45
	 [<ffffffff80208db1>] cpu_idle+0x8b/0xae
	 [<ffffffff80217d60>] start_secondary+0x47f/0x48f

The only way I found to avoid this (for now) is to check skb headroom in
ipoib_hard_header. I guess that this safety check doesn't harm regular IPoIB 
operation and it seems to solve my problem. However, I would be happy to hear what
others think of this last issue.
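
To make the idea of the headroom check concrete, here is a small userspace
model of it. It is only a sketch: struct toy_buf and the toy_* helpers are
hypothetical stand-ins for struct sk_buff, skb_headroom() and skb_push(),
not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: only the fields needed here. */
struct toy_buf {
    unsigned char *head; /* start of the allocation */
    unsigned char *data; /* start of the current payload */
};

/* Bytes available in front of the payload (models skb_headroom()). */
static inline size_t toy_headroom(const struct toy_buf *b)
{
    assert(b->data >= b->head);
    return (size_t)(b->data - b->head);
}

/* Prepend len bytes, or return NULL when the headroom is too small.
 * The kernel's skb_push() has no such failure path; it ends up in
 * skb_under_panic(), which is what the stack dump above shows. */
static inline unsigned char *toy_push_checked(struct toy_buf *b, size_t len)
{
    if (toy_headroom(b) < len)
        return NULL;
    b->data -= len;
    return b->data;
}
```

A caller that gets NULL back can drop the packet and return an error, which
is the behaviour the check in ipoib_hard_header aims for.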

I would really appreciate comments.

thanks

 -MoniS

------------------------------------------------------------------------------
diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 07deee8..31bc6d8 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -216,6 +216,7 @@ struct ipoib_neigh {
 	struct sk_buff_head queue;
 
 	struct neighbour   *neighbour;
+	struct net_device *dev;
 
 	struct list_head    list;
 };
@@ -232,7 +233,8 @@ static inline struct ipoib_neigh **to_ip
 				     INFINIBAND_ALEN, sizeof(void *));
 }
 
-struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh);
+struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh,
+				      struct net_device *dev);
 void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh);
 
 extern struct workqueue_struct *ipoib_workqueue;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 705eb1d..0e3953e 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -48,6 +48,8 @@ #include <linux/ip.h>
 #include <linux/in.h>
 
 #include <net/dst.h>
+#include <net/arp.h>
+#include <net/ndisc.h>
 
 #define IPOIB_QPN(ha) (be32_to_cpup((__be32 *) ha) & 0xffffff)
 
@@ -70,6 +72,7 @@ module_param_named(debug_level, ipoib_de
 MODULE_PARM_DESC(debug_level, "Enable debug tracing if > 0");
 #endif
 
+static int ipoib_at_exit = 0;
 struct ipoib_path_iter {
 	struct net_device *dev;
 	struct ipoib_path  path;
@@ -490,7 +493,7 @@ static void neigh_add_path(struct sk_buf
 	struct ipoib_path *path;
 	struct ipoib_neigh *neigh;
 
-	neigh = ipoib_neigh_alloc(skb->dst->neighbour);
+	neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev);
 	if (!neigh) {
 		++priv->stats.tx_dropped;
 		dev_kfree_skb_any(skb);
@@ -735,6 +738,9 @@ static int ipoib_hard_header(struct sk_b
 {
 	struct ipoib_header *header;
 
+	if (skb_headroom(skb) < sizeof *header) {
+		return -1;
+	}
 	header = (struct ipoib_header *) skb_push(skb, sizeof *header);
 
 	header->proto = htons(type);
@@ -746,8 +752,11 @@ static int ipoib_hard_header(struct sk_b
 	 * figure out where to send the packet later.
 	 */
 	if ((!skb->dst || !skb->dst->neighbour) && daddr) {
-		struct ipoib_pseudoheader *phdr =
-			(struct ipoib_pseudoheader *) skb_push(skb, sizeof *phdr);
+		struct ipoib_pseudoheader *phdr = NULL;
+		if (skb_headroom(skb) < sizeof *phdr) {
+			return -1;
+		}
+		phdr = (struct ipoib_pseudoheader *) skb_push(skb, sizeof *phdr);
 		memcpy(phdr->hwaddr, daddr, INFINIBAND_ALEN);
 	}
 
@@ -769,32 +778,69 @@ static void ipoib_set_mcast_list(struct 
 static void ipoib_neigh_destructor(struct neighbour *n)
 {
 	struct ipoib_neigh *neigh;
-	struct ipoib_dev_priv *priv = netdev_priv(n->dev);
+	struct ipoib_dev_priv *priv;
 	unsigned long flags;
 	struct ipoib_ah *ah = NULL;
 
-	ipoib_dbg(priv,
-		  "neigh_destructor for %06x " IPOIB_GID_FMT "\n",
-		  IPOIB_QPN(n->ha),
-		  IPOIB_GID_RAW_ARG(n->ha + 4));
-
-	spin_lock_irqsave(&priv->lock, flags);
 
 	neigh = *to_ipoib_neigh(n);
 	if (neigh) {
+		priv = netdev_priv(neigh->dev);
+		ipoib_dbg(priv,
+			  "neigh_destructor for %06x " IPOIB_GID_FMT "\n",
+			  IPOIB_QPN(n->ha),
+			  IPOIB_GID_RAW_ARG(n->ha + 4));
+
+		spin_lock_irqsave(&priv->lock, flags);
 		if (neigh->ah)
 			ah = neigh->ah;
 		list_del(&neigh->list);
 		ipoib_neigh_free(n->dev, neigh);
+		spin_unlock_irqrestore(&priv->lock, flags);
 	}
-
-	spin_unlock_irqrestore(&priv->lock, flags);
-
 	if (ah)
 		ipoib_put_ah(ah);
 }
 
-struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour)
+static void ipoib_neigh_tbl_cleanup_master(struct neigh_table *tbl,
+					   struct net_device* master,
+					   struct net_device* slave)
+{
+	int i;
+	struct ipoib_neigh *neigh;
+
+	write_lock_bh(&tbl->lock);
+	for (i = 0; i <= tbl->hash_mask; i++) {
+		struct neighbour *n, **np;
+
+		np = &tbl->hash_buckets[i];
+		while ((n = *np) != NULL) {
+			write_lock(&n->lock);
+			if (n->dev == master) {
+				neigh = *to_ipoib_neigh(n);
+				if (neigh && (neigh->dev == slave)){
+					if (ipoib_at_exit)
+						n->parms->neigh_destructor = NULL;
+					ipoib_neigh_destructor(n);
+				}
+			}
+			write_unlock(&n->lock);
+			np = &n->next;
+		}
+	}
+	write_unlock_bh(&tbl->lock);
+}
+
+static void ipoib_neigh_cleanup_by_master(struct net_device* master,struct net_device* slave){
+	netif_stop_queue(slave);
+	if (master) {
+		ipoib_neigh_tbl_cleanup_master(&arp_tbl,master, slave);
+		ipoib_neigh_tbl_cleanup_master(&nd_tbl,master, slave);
+	}
+}
+
+struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour,
+				      struct net_device *dev)
 {
 	struct ipoib_neigh *neigh;
 
@@ -803,6 +849,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(st
 		return NULL;
 
 	neigh->neighbour = neighbour;
+	neigh->dev = dev;
 	*to_ipoib_neigh(neighbour) = neigh;
 	skb_queue_head_init(&neigh->queue);
 
@@ -874,6 +921,7 @@ void ipoib_dev_cleanup(struct net_device
 
 	/* Delete any child interfaces first */
 	list_for_each_entry_safe(cpriv, tcpriv, &priv->child_intfs, list) {
+		ipoib_neigh_cleanup_by_master(cpriv->dev->master, cpriv->dev);
 		unregister_netdev(cpriv->dev);
 		ipoib_dev_cleanup(cpriv->dev);
 		free_netdev(cpriv->dev);
@@ -1159,6 +1207,7 @@ static void ipoib_remove_one(struct ib_d
 		ib_unregister_event_handler(&priv->event_handler);
 		flush_scheduled_work();
 
+		ipoib_neigh_cleanup_by_master(priv->dev->master, priv->dev);
 		unregister_netdev(priv->dev);
 		ipoib_dev_cleanup(priv->dev);
 		free_netdev(priv->dev);
@@ -1217,6 +1266,8 @@ err_fs:
 
 static void __exit ipoib_cleanup_module(void)
 {
+	ipoib_at_exit = 1;
+
 	ib_unregister_client(&ipoib_client);
 	ib_sa_unregister_client(&ipoib_sa_client);
 	ipoib_unregister_debugfs();
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index b04b72c..a41a949 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -774,7 +774,7 @@ out:
 		if (skb->dst            &&
 		    skb->dst->neighbour &&
 		    !*to_ipoib_neigh(skb->dst->neighbour)) {
-			struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb->dst->neighbour);
+			struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev);
 
 			if (neigh) {
 				kref_get(&mcast->ah->ref);
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 6a9f616..557be98 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -153,6 +153,7 @@ struct neigh_table nd_tbl = {
 	.gc_thresh2 =	 512,
 	.gc_thresh3 =	1024,
 };
+EXPORT_SYMBOL(nd_tbl);
 
 /* ND options */
 struct ndisc_options {


From vlad at dev.mellanox.co.il  Mon Feb 26 09:24:00 2007
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Mon, 26 Feb 2007 19:24:00 +0200
Subject: [openib-general] [ewg] RE: bugs filed for problems compiling
	OFED 1.2 alpha1
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA30314EC95@xmb-sjc-216.amer.cisco.com>
References: <A15335FBE9BD2449AF2C9EF3D1EB8EA30314EBC5@xmb-sjc-216.amer.cisco.com>
	<20070226075059.GB27677@mellanox.co.il>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA30314EC78@xmb-sjc-216.amer.cisco.com>
	<1172509162.21382.45.camel@vladsk-laptop>
	<A15335FBE9BD2449AF2C9EF3D1EB8EA30314EC95@xmb-sjc-216.amer.cisco.com>
Message-ID: <1172510640.21382.48.camel@vladsk-laptop>

On Mon, 2007-02-26 at 09:00 -0800, Scott Weitzenkamp (sweitzen) wrote:
> I want a full OFED build, please.  This was agreed to in one of the OFED
> bi-weekly calls.
> 
> Scott 
> 


http://www.openfabrics.org/builds/ofed-1.2/OFED-1.2-20070226-1758.tgz

Regards,
Vladimir


From mshefty at ichips.intel.com  Mon Feb 26 09:46:50 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 26 Feb 2007 09:46:50 -0800
Subject: [openib-general] [PATCH] for OFED 1.2
In-Reply-To: <1172394057.12388.3.camel@vladsk-laptop>
References: <000001c75787$50ff0440$ff0da8c0@amr.corp.intel.com>
	<1172394057.12388.3.camel@vladsk-laptop>
Message-ID: <45E31D0A.20400@ichips.intel.com>

Vladimir Sokolovsky wrote:
> On Fri, 2007-02-23 at 12:15 -0800, Sean Hefty wrote:
>  > I would like these fixes in OFED 1.2 as well.  What git tree / branch
>  > do I generate a patch against?
>  >
>  > - Sean
> 
> git://git.openfabrics.org/~vlad/ofed_1_2/.git
> branch: ofed_1_2

Can you try pulling from:

git://git.openfabrics.org/~shefty/ofed_1_2.git   ofed_1_2

- Sean


From mshefty at ichips.intel.com  Mon Feb 26 10:15:41 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 26 Feb 2007 10:15:41 -0800
Subject: [openib-general] [PATCH] IB/core: Set static rate in
 ib_init_ah_from_path()
In-Reply-To: <45E19730.7010008@dev.mellanox.co.il>
References: <000401c75223$29e86ea0$e598070a@amr.corp.intel.com>
	<1431.85.65.224.140.1171732569.squirrel@dev.mellanox.co.il>
	<adar6sn74fq.fsf@cisco.com> <45E19730.7010008@dev.mellanox.co.il>
Message-ID: <45E323CD.3080800@ichips.intel.com>

> int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
>                          struct ib_sa_path_rec *rec,
>                          struct ib_ah_attr *ah_attr)
> {
>         int ret;
>         u16 gid_index;
> 
>         memset(ah_attr, 0, sizeof *ah_attr);
>         ah_attr->dlid = be16_to_cpu(rec->dlid);
>         ah_attr->sl = rec->sl;
>         ah_attr->src_path_bits = be16_to_cpu(rec->slid) & 0x7f;

I'm not sure about the '& 0x7f', but...

> I have a feeling that this function doesn't handle the src_path_bits as
> it should, because it doesn't take the LMC value of the SLID into account
> (I think that if the LMC is < 8, wrong bits may be set in the
> src_path_bits).

Wouldn't the function simply include the port's base LID in the source path 
bits?  I would think that the LMC would mask out those bits in the address 
vector before ORing the base LID back in to form the SLID.  But even if the 
bits weren't masked out, ORing the source path bits with the base LID should 
produce the same result.

If I'm not seeing this correctly, can you describe the problem more?
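
As a rough illustration of the masking described above, the sketch below
derives the path bits from an SLID and rebuilds the SLID from a base LID.
It assumes the base LID has its low LMC bits clear and that the combination
is a bitwise OR; the helper names are hypothetical and this is not the
ib_sa or mthca code.

```c
#include <assert.h>
#include <stdint.h>

/* Mask selecting the low `lmc` bits of a LID (LMC ranges from 0 to 7). */
static inline uint16_t lmc_mask(uint8_t lmc)
{
    assert(lmc <= 7);
    return (uint16_t)((1u << lmc) - 1u);
}

/* Path bits as a function of the SLID and the port's LMC. */
static inline uint8_t src_path_bits(uint16_t slid, uint8_t lmc)
{
    return (uint8_t)(slid & lmc_mask(lmc));
}

/* Rebuild the SLID by ORing the path bits into the port's base LID,
 * keeping only the bits the LMC allows. */
static inline uint16_t slid_from_path_bits(uint16_t base_lid, uint8_t lmc,
                                           uint8_t path_bits)
{
    return (uint16_t)(base_lid | (path_bits & lmc_mask(lmc)));
}
```

For example, with base LID 0x10 and LMC 2, an SLID of 0x12 gives path bits
0x2. Passing the unmasked value (0x12 & 0x7f) instead still rebuilds 0x12,
because the extra bits are either masked off or already set in the base LID.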

- Sean


From or.gerlitz at gmail.com  Mon Feb 26 11:05:48 2007
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Mon, 26 Feb 2007 21:05:48 +0200
Subject: [openib-general] failure to create an FMR mapping 1K pages on
	memfree
In-Reply-To: <15ddcffd0702261104x6df977b6g9e4ca0071c8489ad@mail.gmail.com>
References: <15ddcffd0702261104x6df977b6g9e4ca0071c8489ad@mail.gmail.com>
Message-ID: <15ddcffd0702261105s377ad165h7bfe258f69ede152@mail.gmail.com>

Oops - I forgot to CC openib-general.

On 2/26/07, Or Gerlitz <or.gerlitz at gmail.com> wrote:
> Hi Roland,
>
> I have got a report on failure to create FMR mapping 1K pages (that is
> 4MB) on memfree.
>
> I don't have the exact details (ie if Arbel/Sinai / what FW  / etc)
> nor which exact check fails in
> mthca_fmr_alloc, but what's clear is that the latter function returns
> -ENOMEM when attr.max_pages is 1024 and it works fine when
> attr.max_pages is 256.
>
> Is this failure clear to you? If yes, is a HW or FW limit being
> hit, or is it a driver design issue?
>
> Or.
>


From bugzilla-daemon at lists.openfabrics.org  Mon Feb 26 12:14:54 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Mon, 26 Feb 2007 12:14:54 -0800 (PST)
Subject: [openib-general] [Bug 390] New: perftools don't work on alpha1
Message-ID: <bug-390-1@https.bugs.openfabrics.org/>

https://bugs.openfabrics.org/show_bug.cgi?id=390

           Summary: perftools don't work on alpha1
           Product: OpenFabrics Linux
           Version: 1.2alpha1
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: blocker
          Priority: P1
         Component: Verbs
        AssignedTo: bugzilla at openib.org
        ReportedBy: swise at opengridcomputing.com
                CC: mst at mellanox.co.il


There is no correct component so I assigned it to Verbs.

But ib_rdma_bw --cma doesn't seem to work.  It just exits immediately after
displaying the params:

[mpi at r1-iw ~]$ /usr/local/ofed/bin/ib_rdma_bw  --cma
5915: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 |
duplex=0 | cma=1 |
[mpi at r1-iw ~]$ /usr/local/ofed/bin/ib_rdma_bw --cma
5916: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 |
duplex=0 | cma=1 |
[mpi at r1-iw ~]$ /usr/local/ofed/bin/ib_rdma_bw --cma  --iters=10
5917: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=10 |
duplex=0 | cma=1 |




From bugzilla-daemon at lists.openfabrics.org  Mon Feb 26 12:15:18 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Mon, 26 Feb 2007 12:15:18 -0800 (PST)
Subject: [openib-general] [Bug 390] perftools don't work on alpha1
In-Reply-To: <bug-390-1@https.bugs.openfabrics.org/>
Message-ID: <20070226201518.2E5C3E60803@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=390


------- Comment #1 from swise at opengridcomputing.com  2007-02-26 12:15 -------
ib_rdma_lat works fine.




From sean.hefty at intel.com  Mon Feb 26 12:17:30 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Mon, 26 Feb 2007 12:17:30 -0800
Subject: [openib-general] [RFC] [PATCH] ib_cache: do not mask upper bit when
 searching for a pkey
In-Reply-To: <1172507101.4102.277140.camel@hal.voltaire.com>
Message-ID: <000201c759e3$24828410$55d8180a@amr.corp.intel.com>

I think the following patch would make ipoib spec compliant.
ib_find_cached_pkey is called by ib_cm, rdma_cm, ib_srp, and ib_ipoib.
I'm not certain what this change would do to SRP, but the ib_cm and
rdma_cm look okay, given that non-reversible paths aren't supported
yet anyway.
--

ib_find_cached_pkey masks off the upper-bit of the PKey when searching
for a match.  The upper bit indicates partial or full membership.  Ignoring
the upper bit can result in a full membership PKey matching with a partial
membership PKey.  For ipoib, this can result in joining a multicast group
that disallows communication between all members.

Signed-off-by: Sean Hefty <sean.hefty at intel.com>
---
 drivers/infiniband/core/cache.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 558c9a0..6f366c3 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -179,7 +179,7 @@ int ib_find_cached_pkey(struct ib_device *device,
 	*index = -1;
 
 	for (i = 0; i < cache->table_len; ++i)
-		if ((cache->table[i] & 0x7fff) == (pkey & 0x7fff)) {
+		if (cache->table[i] == pkey) {
 			*index = i;
 			ret = 0;
 			break;
-- 
1.4.4.3
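
To illustrate the behaviour change described in the commit message above,
here is a minimal userspace sketch, not the kernel code itself. The function
names find_pkey_masked and find_pkey_exact and the plain array standing in
for the cached table are assumptions made for illustration. It relies only
on the fact that bit 15 of a PKey is the membership bit (set for full
membership, clear for partial).

```c
#include <stdint.h>

#define PKEY_BASE_MASK 0x7fffu	/* low 15 bits: the key without membership */

/* Pre-patch behaviour: the membership bit is ignored when comparing. */
static int find_pkey_masked(const uint16_t *table, int len, uint16_t pkey)
{
	for (int i = 0; i < len; ++i)
		if ((table[i] & PKEY_BASE_MASK) == (pkey & PKEY_BASE_MASK))
			return i;
	return -1;
}

/* Post-patch behaviour: all 16 bits, including membership, must match. */
static int find_pkey_exact(const uint16_t *table, int len, uint16_t pkey)
{
	for (int i = 0; i < len; ++i)
		if (table[i] == pkey)
			return i;
	return -1;
}
```

For a table holding 0x7fff (partial) at index 0 and 0xffff (full) at index 1,
a lookup of 0xffff returns index 0 with the masked comparison and index 1
with the exact one.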


From mst at mellanox.co.il  Mon Feb 26 13:01:11 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 26 Feb 2007 23:01:11 +0200
Subject: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support
	to IPoIB
In-Reply-To: <45E313D2.70909@voltaire.com>
References: <45E313D2.70909@voltaire.com>
Message-ID: <20070226210111.GC12919@mellanox.co.il>

> 
> During my tests I found that when running 
> 
> 	1. modprobe -r ib_mthca (to delete IPoIB interfaces)
> 	2. ping somewhere on the subnet of bond0
> 
> I get this stack dump (which ends with kernel death)
> 	 [<ffffffff8037ff32>] skb_under_panic+0x5c/0x60
> 	 [<ffffffff882e00c2>] :ib_ipoib:ipoib_hard_header+0xa6/0xc0
> 	 [<ffffffff803c3c98>] arp_create+0x120/0x226
> 	 [<ffffffff803c3dc3>] arp_send+0x25/0x3b
> 	 [<ffffffff803c466a>] arp_solicit+0x186/0x195
> 	 [<ffffffff8038c0ac>] neigh_timer_handler+0x2b5/0x309
> 	 [<ffffffff8038bdf7>] neigh_timer_handler+0x0/0x309
> 	 [<ffffffff80239599>] run_timer_softirq+0x130/0x19e
> 	 [<ffffffff80235fcc>] __do_softirq+0x55/0xc3
> 	 [<ffffffff8020acac>] call_softirq+0x1c/0x28
> 	 [<ffffffff8020c02b>] do_softirq+0x2c/0x7d
> 	 [<ffffffff8021864a>] smp_apic_timer_interrupt+0x57/0x6a
> 	 [<ffffffff80208e19>] mwait_idle+0x0/0x45
> 	 [<ffffffff8020a756>] apic_timer_interrupt+0x66/0x70
> 	 <EOI>  [<ffffffff80208e5b>] mwait_idle+0x42/0x45
> 	 [<ffffffff80208db1>] cpu_idle+0x8b/0xae
> 	 [<ffffffff80217d60>] start_secondary+0x47f/0x48f
> 
> The only way I found to avoid this (for now) is to check skb headroom in
> ipoib_hard_header. I guess that this safety check doesn't harm regular IPoIB 
> operation and it seems to solve my problem. However, I would be happy to hear what
> others think of this last issue.

This seems to mean that hard_header_len is not copied from slave to master
device. Right? Maybe that's what needs to be fixed.
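
For readers unfamiliar with the failure mode, here is a small userspace model
of the headroom check being discussed. It is only a sketch: struct pkt_buf,
pkt_headroom and pkt_push_checked are hypothetical stand-ins for the kernel's
sk_buff, skb_headroom() and skb_push(), and the real fix may well belong in
the bonding code rather than in a check like this.

```c
#include <stddef.h>

/* Hypothetical stand-in for the part of sk_buff that matters here. */
struct pkt_buf {
	unsigned char *head;	/* start of the allocated buffer */
	unsigned char *data;	/* start of the current packet data */
};

/* Bytes available in front of the packet data for prepending headers. */
static size_t pkt_headroom(const struct pkt_buf *b)
{
	return (size_t)(b->data - b->head);
}

/*
 * Prepend len bytes of header space. Returns the new data pointer, or
 * NULL when the headroom is too small, instead of moving data before head
 * (the condition that skb_under_panic reports in the kernel).
 */
static unsigned char *pkt_push_checked(struct pkt_buf *b, size_t len)
{
	if (pkt_headroom(b) < len)
		return NULL;
	b->data -= len;
	return b->data;
}
```

If the master device reserved less headroom than the slave's hard header
needs, the first push in this model fails cleanly rather than underrunning
the buffer.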

-- 
MST


From rdreier at cisco.com  Mon Feb 26 13:05:30 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 26 Feb 2007 13:05:30 -0800
Subject: [openib-general] [GIT PULL] please pull infiniband.git
Message-ID: <adafy8s3a05.fsf@cisco.com>

Linus, please pull from

    master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This will get various post-rc1 cleanups and fixes:

Adrian Bunk (2):
      IB/mthca: Make 2 functions static
      RDMA/cxgb3: cleanups

Michael S. Tsirkin (1):
      IPoIB/cm: Improve small message bandwidth

Roland Dreier (3):
      IPoIB: Remove unused local_rate tracking
      IB/uverbs: Return correct error for invalid PD in register MR
      IPoIB: Correct debugging output when path record lookup fails

Sean Hefty (4):
      IB/core: Set hop limit in ib_init_ah_from_wc correctly
      RDMA/cma: Request reversible paths only
      IB/cm: Remove ca_guid from cm_device structure
      RDMA/cma: Remove unused node_guid from cma_device structure

Steve Wise (1):
      RDMA/cxgb3: Stop the EP Timer on BAD CLOSE

 drivers/infiniband/core/cm.c                   |   10 ++---
 drivers/infiniband/core/cma.c                  |    6 ++--
 drivers/infiniband/core/uverbs_cmd.c           |    4 ++-
 drivers/infiniband/core/verbs.c                |    2 +-
 drivers/infiniband/hw/cxgb3/Makefile           |    1 -
 drivers/infiniband/hw/cxgb3/cxio_hal.c         |   31 +++++-----------
 drivers/infiniband/hw/cxgb3/cxio_hal.h         |    5 ---
 drivers/infiniband/hw/cxgb3/cxio_resource.c    |   14 +------
 drivers/infiniband/hw/cxgb3/iwch_cm.c          |    6 ++--
 drivers/infiniband/hw/cxgb3/iwch_provider.c    |    2 +-
 drivers/infiniband/hw/cxgb3/iwch_provider.h    |    1 -
 drivers/infiniband/hw/cxgb3/iwch_qp.c          |   29 +++++++--------
 drivers/infiniband/hw/mthca/mthca_mr.c         |   10 +++--
 drivers/infiniband/ulp/ipoib/ipoib.h           |    1 -
 drivers/infiniband/ulp/ipoib/ipoib_cm.c        |   46 ++++++++++++++----------
 drivers/infiniband/ulp/ipoib/ipoib_main.c      |    2 +-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |    8 ++---
 17 files changed, 76 insertions(+), 102 deletions(-)


diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index d446998..842cd0b 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -88,7 +88,6 @@ struct cm_port {
 struct cm_device {
 	struct list_head list;
 	struct ib_device *device;
-	__be64 ca_guid;
 	struct cm_port port[0];
 };
 
@@ -739,8 +738,8 @@ retest:
 		ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
 		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
 		ib_send_cm_rej(cm_id, IB_CM_REJ_TIMEOUT,
-			       &cm_id_priv->av.port->cm_dev->ca_guid,
-			       sizeof cm_id_priv->av.port->cm_dev->ca_guid,
+			       &cm_id_priv->id.device->node_guid,
+			       sizeof cm_id_priv->id.device->node_guid,
 			       NULL, 0);
 		break;
 	case IB_CM_REQ_RCVD:
@@ -883,7 +882,7 @@ static void cm_format_req(struct cm_req_msg *req_msg,
 
 	req_msg->local_comm_id = cm_id_priv->id.local_id;
 	req_msg->service_id = param->service_id;
-	req_msg->local_ca_guid = cm_id_priv->av.port->cm_dev->ca_guid;
+	req_msg->local_ca_guid = cm_id_priv->id.device->node_guid;
 	cm_req_set_local_qpn(req_msg, cpu_to_be32(param->qp_num));
 	cm_req_set_resp_res(req_msg, param->responder_resources);
 	cm_req_set_init_depth(req_msg, param->initiator_depth);
@@ -1442,7 +1441,7 @@ static void cm_format_rep(struct cm_rep_msg *rep_msg,
 	cm_rep_set_flow_ctrl(rep_msg, param->flow_control);
 	cm_rep_set_rnr_retry_count(rep_msg, param->rnr_retry_count);
 	cm_rep_set_srq(rep_msg, param->srq);
-	rep_msg->local_ca_guid = cm_id_priv->av.port->cm_dev->ca_guid;
+	rep_msg->local_ca_guid = cm_id_priv->id.device->node_guid;
 
 	if (param->private_data && param->private_data_len)
 		memcpy(rep_msg->private_data, param->private_data,
@@ -3385,7 +3384,6 @@ static void cm_add_one(struct ib_device *device)
 		return;
 
 	cm_dev->device = device;
-	cm_dev->ca_guid = device->node_guid;
 
 	set_bit(IB_MGMT_METHOD_SEND, reg_req.method_mask);
 	for (i = 1; i <= device->phys_port_cnt; i++) {
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index f8d69b3..d441815 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -77,7 +77,6 @@ static int next_port;
 struct cma_device {
 	struct list_head	list;
 	struct ib_device	*device;
-	__be64			node_guid;
 	struct completion	comp;
 	atomic_t		refcount;
 	struct list_head	id_list;
@@ -1492,11 +1491,13 @@ static int cma_query_ib_route(struct rdma_id_private *id_priv, int timeout_ms,
 	ib_addr_get_dgid(addr, &path_rec.dgid);
 	path_rec.pkey = cpu_to_be16(ib_addr_get_pkey(addr));
 	path_rec.numb_path = 1;
+	path_rec.reversible = 1;
 
 	id_priv->query_id = ib_sa_path_rec_get(&sa_client, id_priv->id.device,
 				id_priv->id.port_num, &path_rec,
 				IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID |
-				IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH,
+				IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH |
+				IB_SA_PATH_REC_REVERSIBLE,
 				timeout_ms, GFP_KERNEL,
 				cma_query_handler, work, &id_priv->query);
 
@@ -2672,7 +2673,6 @@ static void cma_add_one(struct ib_device *device)
 		return;
 
 	cma_dev->device = device;
-	cma_dev->node_guid = device->node_guid;
 
 	init_completion(&cma_dev->comp);
 	atomic_set(&cma_dev->refcount, 1);
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index df1efbc..4fd75af 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -622,8 +622,10 @@ ssize_t ib_uverbs_reg_mr(struct ib_uverbs_file *file,
 	obj->umem.virt_base = cmd.hca_va;
 
 	pd = idr_read_pd(cmd.pd_handle, file->ucontext);
-	if (!pd)
+	if (!pd) {
+		ret = -EINVAL;
 		goto err_release;
+	}
 
 	mr = pd->device->reg_user_mr(pd, &obj->umem, cmd.access_flags, &udata);
 	if (IS_ERR(mr)) {
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 8b5dd36..ccdf93d 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -167,7 +167,7 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
 		ah_attr->grh.sgid_index = (u8) gid_index;
 		flow_class = be32_to_cpu(grh->version_tclass_flow);
 		ah_attr->grh.flow_label = flow_class & 0xFFFFF;
-		ah_attr->grh.hop_limit = grh->hop_limit;
+		ah_attr->grh.hop_limit = 0xFF;
 		ah_attr->grh.traffic_class = (flow_class >> 20) & 0xFF;
 	}
 	return 0;
diff --git a/drivers/infiniband/hw/cxgb3/Makefile b/drivers/infiniband/hw/cxgb3/Makefile
index 0e110f3..36b9898 100644
--- a/drivers/infiniband/hw/cxgb3/Makefile
+++ b/drivers/infiniband/hw/cxgb3/Makefile
@@ -8,5 +8,4 @@ iw_cxgb3-y :=  iwch_cm.o iwch_ev.o iwch_cq.o iwch_qp.o iwch_mem.o \
 
 ifdef CONFIG_INFINIBAND_CXGB3_DEBUG
 EXTRA_CFLAGS += -DDEBUG
-iw_cxgb3-y += cxio_dbg.o
 endif
diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.c b/drivers/infiniband/hw/cxgb3/cxio_hal.c
index 114ac3b..d737c73 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_hal.c
+++ b/drivers/infiniband/hw/cxgb3/cxio_hal.c
@@ -45,7 +45,7 @@
 static LIST_HEAD(rdev_list);
 static cxio_hal_ev_callback_func_t cxio_ev_cb = NULL;
 
-static inline struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name)
+static struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name)
 {
 	struct cxio_rdev *rdev;
 
@@ -55,8 +55,7 @@ static inline struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name)
 	return NULL;
 }
 
-static inline struct cxio_rdev *cxio_hal_find_rdev_by_t3cdev(struct t3cdev
-							     *tdev)
+static struct cxio_rdev *cxio_hal_find_rdev_by_t3cdev(struct t3cdev *tdev)
 {
 	struct cxio_rdev *rdev;
 
@@ -118,7 +117,7 @@ int cxio_hal_cq_op(struct cxio_rdev *rdev_p, struct t3_cq *cq,
 	return 0;
 }
 
-static inline int cxio_hal_clear_cq_ctx(struct cxio_rdev *rdev_p, u32 cqid)
+static int cxio_hal_clear_cq_ctx(struct cxio_rdev *rdev_p, u32 cqid)
 {
 	struct rdma_cq_setup setup;
 	setup.id = cqid;
@@ -130,7 +129,7 @@ static inline int cxio_hal_clear_cq_ctx(struct cxio_rdev *rdev_p, u32 cqid)
 	return (rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_SETUP, &setup));
 }
 
-int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev_p, u32 qpid)
+static int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev_p, u32 qpid)
 {
 	u64 sge_cmd;
 	struct t3_modify_qp_wr *wqe;
@@ -425,7 +424,7 @@ void cxio_flush_hw_cq(struct t3_cq *cq)
 	}
 }
 
-static inline int cqe_completes_wr(struct t3_cqe *cqe, struct t3_wq *wq)
+static int cqe_completes_wr(struct t3_cqe *cqe, struct t3_wq *wq)
 {
 	if (CQE_OPCODE(*cqe) == T3_TERMINATE)
 		return 0;
@@ -760,17 +759,6 @@ ret:
 	return err;
 }
 
-/* IN : stag key, pdid, pbl_size
- * Out: stag index, actaul pbl_size, and pbl_addr allocated.
- */
-int cxio_allocate_stag(struct cxio_rdev *rdev_p, u32 * stag, u32 pdid,
-		       enum tpt_mem_perm perm, u32 * pbl_size, u32 * pbl_addr)
-{
-	*stag = T3_STAG_UNSET;
-	return (__cxio_tpt_op(rdev_p, 0, stag, 0, pdid, TPT_NON_SHARED_MR,
-			      perm, 0, 0ULL, 0, 0, NULL, pbl_size, pbl_addr));
-}
-
 int cxio_register_phys_mem(struct cxio_rdev *rdev_p, u32 *stag, u32 pdid,
 			   enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len,
 			   u8 page_size, __be64 *pbl, u32 *pbl_size,
@@ -1029,7 +1017,7 @@ void __exit cxio_hal_exit(void)
 	cxio_hal_destroy_rhdl_resource();
 }
 
-static inline void flush_completed_wrs(struct t3_wq *wq, struct t3_cq *cq)
+static void flush_completed_wrs(struct t3_wq *wq, struct t3_cq *cq)
 {
 	struct t3_swsq *sqp;
 	__u32 ptr = wq->sq_rptr;
@@ -1058,9 +1046,8 @@ static inline void flush_completed_wrs(struct t3_wq *wq, struct t3_cq *cq)
 			break;
 }
 
-static inline void create_read_req_cqe(struct t3_wq *wq,
-				       struct t3_cqe *hw_cqe,
-				       struct t3_cqe *read_cqe)
+static void create_read_req_cqe(struct t3_wq *wq, struct t3_cqe *hw_cqe,
+				struct t3_cqe *read_cqe)
 {
 	read_cqe->u.scqe.wrid_hi = wq->oldest_read->sq_wptr;
 	read_cqe->len = wq->oldest_read->read_len;
@@ -1073,7 +1060,7 @@ static inline void create_read_req_cqe(struct t3_wq *wq,
 /*
  * Return a ptr to the next read wr in the SWSQ or NULL.
  */
-static inline void advance_oldest_read(struct t3_wq *wq)
+static void advance_oldest_read(struct t3_wq *wq)
 {
 
 	u32 rptr = wq->oldest_read - wq->sq + 1;
diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.h b/drivers/infiniband/hw/cxgb3/cxio_hal.h
index 8ab04a7..99543d6 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_hal.h
+++ b/drivers/infiniband/hw/cxgb3/cxio_hal.h
@@ -143,7 +143,6 @@ int cxio_rdev_open(struct cxio_rdev *rdev);
 void cxio_rdev_close(struct cxio_rdev *rdev);
 int cxio_hal_cq_op(struct cxio_rdev *rdev, struct t3_cq *cq,
 		   enum t3_cq_opcode op, u32 credit);
-int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev, u32 qpid);
 int cxio_create_cq(struct cxio_rdev *rdev, struct t3_cq *cq);
 int cxio_destroy_cq(struct cxio_rdev *rdev, struct t3_cq *cq);
 int cxio_resize_cq(struct cxio_rdev *rdev, struct t3_cq *cq);
@@ -154,8 +153,6 @@ int cxio_create_qp(struct cxio_rdev *rdev, u32 kernel_domain, struct t3_wq *wq,
 int cxio_destroy_qp(struct cxio_rdev *rdev, struct t3_wq *wq,
 		    struct cxio_ucontext *uctx);
 int cxio_peek_cq(struct t3_wq *wr, struct t3_cq *cq, int opcode);
-int cxio_allocate_stag(struct cxio_rdev *rdev, u32 * stag, u32 pdid,
-		       enum tpt_mem_perm perm, u32 * pbl_size, u32 * pbl_addr);
 int cxio_register_phys_mem(struct cxio_rdev *rdev, u32 * stag, u32 pdid,
 			   enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len,
 			   u8 page_size, __be64 *pbl, u32 *pbl_size,
@@ -171,8 +168,6 @@ int cxio_deallocate_window(struct cxio_rdev *rdev, u32 stag);
 int cxio_rdma_init(struct cxio_rdev *rdev, struct t3_rdma_init_attr *attr);
 void cxio_register_ev_cb(cxio_hal_ev_callback_func_t ev_cb);
 void cxio_unregister_ev_cb(cxio_hal_ev_callback_func_t ev_cb);
-u32 cxio_hal_get_rhdl(void);
-void cxio_hal_put_rhdl(u32 rhdl);
 u32 cxio_hal_get_pdid(struct cxio_hal_resource *rscp);
 void cxio_hal_put_pdid(struct cxio_hal_resource *rscp, u32 pdid);
 int __init cxio_hal_init(void);
diff --git a/drivers/infiniband/hw/cxgb3/cxio_resource.c b/drivers/infiniband/hw/cxgb3/cxio_resource.c
index 65bf577..d3095ae 100644
--- a/drivers/infiniband/hw/cxgb3/cxio_resource.c
+++ b/drivers/infiniband/hw/cxgb3/cxio_resource.c
@@ -179,7 +179,7 @@ tpt_err:
 /*
  * returns 0 if no resource available
  */
-static inline u32 cxio_hal_get_resource(struct kfifo *fifo)
+static u32 cxio_hal_get_resource(struct kfifo *fifo)
 {
 	u32 entry;
 	if (kfifo_get(fifo, (unsigned char *) &entry, sizeof(u32)))
@@ -188,21 +188,11 @@ static inline u32 cxio_hal_get_resource(struct kfifo *fifo)
 		return 0;	/* fifo emptry */
 }
 
-static inline void cxio_hal_put_resource(struct kfifo *fifo, u32 entry)
+static void cxio_hal_put_resource(struct kfifo *fifo, u32 entry)
 {
 	BUG_ON(kfifo_put(fifo, (unsigned char *) &entry, sizeof(u32)) == 0);
 }
 
-u32 cxio_hal_get_rhdl(void)
-{
-	return cxio_hal_get_resource(rhdl_fifo);
-}
-
-void cxio_hal_put_rhdl(u32 rhdl)
-{
-	cxio_hal_put_resource(rhdl_fifo, rhdl);
-}
-
 u32 cxio_hal_get_stag(struct cxio_hal_resource *rscp)
 {
 	return cxio_hal_get_resource(rscp->tpt_fifo);
diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c
index e5442e3..b21fde8 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cm.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c
@@ -209,8 +209,7 @@ static enum iwch_ep_state state_read(struct iwch_ep_common *epc)
 	return state;
 }
 
-static inline void __state_set(struct iwch_ep_common *epc,
-			       enum iwch_ep_state new)
+static void __state_set(struct iwch_ep_common *epc, enum iwch_ep_state new)
 {
 	epc->state = new;
 }
@@ -1459,7 +1458,7 @@ static int peer_close(struct t3cdev *tdev, struct sk_buff *skb, void *ctx)
 /*
  * Returns whether an ABORT_REQ_RSS message is a negative advice.
  */
-static inline int is_neg_adv_abort(unsigned int status)
+static int is_neg_adv_abort(unsigned int status)
 {
 	return status == CPL_ERR_RTX_NEG_ADVICE ||
 	       status == CPL_ERR_PERSIST_NEG_ADVICE;
@@ -1635,6 +1634,7 @@ static int ec_status(struct t3cdev *tdev, struct sk_buff *skb, void *ctx)
 
 		printk(KERN_ERR MOD "%s BAD CLOSE - Aborting tid %u\n",
 		       __FUNCTION__, ep->hwtid);
+		stop_ep_timer(ep);
 		attrs.next_state = IWCH_QP_STATE_ERROR;
 		iwch_modify_qp(ep->com.qp->rhp,
 			       ep->com.qp, IWCH_QP_ATTR_NEXT_STATE,
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 2aef122..9947a14 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -948,7 +948,7 @@ void iwch_qp_rem_ref(struct ib_qp *qp)
 	        wake_up(&(to_iwch_qp(qp)->wait));
 }
 
-struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn)
+static struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn)
 {
 	PDBG("%s ib_dev %p qpn 0x%x\n", __FUNCTION__, dev, qpn);
 	return (struct ib_qp *)get_qhp(to_iwch_dev(dev), qpn);
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h
index 2af3e93..de0fe1b 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.h
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h
@@ -178,7 +178,6 @@ static inline struct iwch_qp *to_iwch_qp(struct ib_qp *ibqp)
 
 void iwch_qp_add_ref(struct ib_qp *qp);
 void iwch_qp_rem_ref(struct ib_qp *qp);
-struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn);
 
 struct iwch_ucontext {
 	struct ib_ucontext ibucontext;
diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c
index 4dda2f6..9ea00cc 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_qp.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c
@@ -36,8 +36,8 @@
 
 #define NO_SUPPORT -1
 
-static inline int iwch_build_rdma_send(union t3_wr *wqe, struct ib_send_wr *wr,
-				       u8 * flit_cnt)
+static int iwch_build_rdma_send(union t3_wr *wqe, struct ib_send_wr *wr,
+				u8 * flit_cnt)
 {
 	int i;
 	u32 plen;
@@ -96,8 +96,8 @@ static inline int iwch_build_rdma_send(union t3_wr *wqe, struct ib_send_wr *wr,
 	return 0;
 }
 
-static inline int iwch_build_rdma_write(union t3_wr *wqe, struct ib_send_wr *wr,
-					u8 *flit_cnt)
+static int iwch_build_rdma_write(union t3_wr *wqe, struct ib_send_wr *wr,
+				 u8 *flit_cnt)
 {
 	int i;
 	u32 plen;
@@ -137,8 +137,8 @@ static inline int iwch_build_rdma_write(union t3_wr *wqe, struct ib_send_wr *wr,
 	return 0;
 }
 
-static inline int iwch_build_rdma_read(union t3_wr *wqe, struct ib_send_wr *wr,
-				       u8 *flit_cnt)
+static int iwch_build_rdma_read(union t3_wr *wqe, struct ib_send_wr *wr,
+				u8 *flit_cnt)
 {
 	if (wr->num_sge > 1)
 		return -EINVAL;
@@ -158,9 +158,8 @@ static inline int iwch_build_rdma_read(union t3_wr *wqe, struct ib_send_wr *wr,
 /*
  * TBD: this is going to be moved to firmware. Missing pdid/qpid check for now.
  */
-static inline int iwch_sgl2pbl_map(struct iwch_dev *rhp,
-				   struct ib_sge *sg_list, u32 num_sgle,
-				   u32 * pbl_addr, u8 * page_size)
+static int iwch_sgl2pbl_map(struct iwch_dev *rhp, struct ib_sge *sg_list,
+			    u32 num_sgle, u32 * pbl_addr, u8 * page_size)
 {
 	int i;
 	struct iwch_mr *mhp;
@@ -206,9 +205,8 @@ static inline int iwch_sgl2pbl_map(struct iwch_dev *rhp,
 	return 0;
 }
 
-static inline int iwch_build_rdma_recv(struct iwch_dev *rhp,
-						    union t3_wr *wqe,
-						    struct ib_recv_wr *wr)
+static int iwch_build_rdma_recv(struct iwch_dev *rhp, union t3_wr *wqe,
+				struct ib_recv_wr *wr)
 {
 	int i, err = 0;
 	u32 pbl_addr[4];
@@ -473,8 +471,7 @@ int iwch_bind_mw(struct ib_qp *qp,
 	return err;
 }
 
-static inline void build_term_codes(int t3err, u8 *layer_type, u8 *ecode,
-				    int tagged)
+static void build_term_codes(int t3err, u8 *layer_type, u8 *ecode, int tagged)
 {
 	switch (t3err) {
 	case TPT_ERR_STAG:
@@ -672,7 +669,7 @@ static void __flush_qp(struct iwch_qp *qhp, unsigned long *flag)
 	spin_lock_irqsave(&qhp->lock, *flag);
 }
 
-static inline void flush_qp(struct iwch_qp *qhp, unsigned long *flag)
+static void flush_qp(struct iwch_qp *qhp, unsigned long *flag)
 {
 	if (t3b_device(qhp->rhp))
 		cxio_set_wq_in_error(&qhp->wq);
@@ -684,7 +681,7 @@ static inline void flush_qp(struct iwch_qp *qhp, unsigned long *flag)
 /*
  * Return non zero if at least one RECV was pre-posted.
  */
-static inline int rqes_posted(struct iwch_qp *qhp)
+static int rqes_posted(struct iwch_qp *qhp)
 {
 	return fw_riwrh_opcode((struct fw_riwrh *)qhp->wq.queue) == T3_WR_RCV;
 }
diff --git a/drivers/infiniband/hw/mthca/mthca_mr.c b/drivers/infiniband/hw/mthca/mthca_mr.c
index 6037dd3..8e4846b 100644
--- a/drivers/infiniband/hw/mthca/mthca_mr.c
+++ b/drivers/infiniband/hw/mthca/mthca_mr.c
@@ -310,8 +310,9 @@ int mthca_write_mtt_size(struct mthca_dev *dev)
 	return mthca_is_memfree(dev) ? (PAGE_SIZE / sizeof (u64)) : 0x7ffffff;
 }
 
-void mthca_tavor_write_mtt_seg(struct mthca_dev *dev, struct mthca_mtt *mtt,
-			      int start_index, u64 *buffer_list, int list_len)
+static void mthca_tavor_write_mtt_seg(struct mthca_dev *dev,
+				      struct mthca_mtt *mtt, int start_index,
+				      u64 *buffer_list, int list_len)
 {
 	u64 __iomem *mtts;
 	int i;
@@ -323,8 +324,9 @@ void mthca_tavor_write_mtt_seg(struct mthca_dev *dev, struct mthca_mtt *mtt,
 				  mtts + i);
 }
 
-void mthca_arbel_write_mtt_seg(struct mthca_dev *dev, struct mthca_mtt *mtt,
-			      int start_index, u64 *buffer_list, int list_len)
+static void mthca_arbel_write_mtt_seg(struct mthca_dev *dev,
+				      struct mthca_mtt *mtt, int start_index,
+				      u64 *buffer_list, int list_len)
 {
 	__be64 *mtts;
 	dma_addr_t dma_handle;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 2594db2..fd55826 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -219,7 +219,6 @@ struct ipoib_dev_priv {
 
 	union ib_gid local_gid;
 	u16          local_lid;
-	u8           local_rate;
 
 	unsigned int admin_mtu;
 	unsigned int mcast_mtu;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 4d59682..3484e8b 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -65,14 +65,14 @@ struct ipoib_cm_id {
 static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id,
 			       struct ib_cm_event *event);
 
-static void ipoib_cm_dma_unmap_rx(struct ipoib_dev_priv *priv,
+static void ipoib_cm_dma_unmap_rx(struct ipoib_dev_priv *priv, int frags,
 				  u64 mapping[IPOIB_CM_RX_SG])
 {
 	int i;
 
 	ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_CM_HEAD_SIZE, DMA_FROM_DEVICE);
 
-	for (i = 0; i < IPOIB_CM_RX_SG - 1; ++i)
+	for (i = 0; i < frags; ++i)
 		ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE, DMA_FROM_DEVICE);
 }
 
@@ -90,7 +90,8 @@ static int ipoib_cm_post_receive(struct net_device *dev, int id)
 	ret = ib_post_srq_recv(priv->cm.srq, &priv->cm.rx_wr, &bad_wr);
 	if (unlikely(ret)) {
 		ipoib_warn(priv, "post srq failed for buf %d (%d)\n", id, ret);
-		ipoib_cm_dma_unmap_rx(priv, priv->cm.srq_ring[id].mapping);
+		ipoib_cm_dma_unmap_rx(priv, IPOIB_CM_RX_SG - 1,
+				      priv->cm.srq_ring[id].mapping);
 		dev_kfree_skb_any(priv->cm.srq_ring[id].skb);
 		priv->cm.srq_ring[id].skb = NULL;
 	}
@@ -98,8 +99,8 @@ static int ipoib_cm_post_receive(struct net_device *dev, int id)
 	return ret;
 }
 
-static int ipoib_cm_alloc_rx_skb(struct net_device *dev, int id,
-				 u64 mapping[IPOIB_CM_RX_SG])
+static struct sk_buff *ipoib_cm_alloc_rx_skb(struct net_device *dev, int id, int frags,
+					     u64 mapping[IPOIB_CM_RX_SG])
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct sk_buff *skb;
@@ -107,7 +108,7 @@ static int ipoib_cm_alloc_rx_skb(struct net_device *dev, int id,
 
 	skb = dev_alloc_skb(IPOIB_CM_HEAD_SIZE + 12);
 	if (unlikely(!skb))
-		return -ENOMEM;
+		return NULL;
 
 	/*
 	 * IPoIB adds a 4 byte header. So we need 12 more bytes to align the
@@ -119,10 +120,10 @@ static int ipoib_cm_alloc_rx_skb(struct net_device *dev, int id,
 				       DMA_FROM_DEVICE);
 	if (unlikely(ib_dma_mapping_error(priv->ca, mapping[0]))) {
 		dev_kfree_skb_any(skb);
-		return -EIO;
+		return NULL;
 	}
 
-	for (i = 0; i < IPOIB_CM_RX_SG - 1; i++) {
+	for (i = 0; i < frags; i++) {
 		struct page *page = alloc_page(GFP_ATOMIC);
 
 		if (!page)
@@ -136,7 +137,7 @@ static int ipoib_cm_alloc_rx_skb(struct net_device *dev, int id,
 	}
 
 	priv->cm.srq_ring[id].skb = skb;
-	return 0;
+	return skb;
 
 partial_error:
 
@@ -146,7 +147,7 @@ partial_error:
 		ib_dma_unmap_single(priv->ca, mapping[i + 1], PAGE_SIZE, DMA_FROM_DEVICE);
 
 	dev_kfree_skb_any(skb);
-	return -ENOMEM;
+	return NULL;
 }
 
 static struct ib_qp *ipoib_cm_create_rx_qp(struct net_device *dev,
@@ -309,7 +310,7 @@ static int ipoib_cm_rx_handler(struct ib_cm_id *cm_id,
 }
 /* Adjust length of skb with fragments to match received data */
 static void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space,
-			  unsigned int length)
+			  unsigned int length, struct sk_buff *toskb)
 {
 	int i, num_frags;
 	unsigned int size;
@@ -326,7 +327,7 @@ static void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space,
 
 		if (length == 0) {
 			/* don't need this page */
-			__free_page(frag->page);
+			skb_fill_page_desc(toskb, i, frag->page, 0, PAGE_SIZE);
 			--skb_shinfo(skb)->nr_frags;
 		} else {
 			size = min(length, (unsigned) PAGE_SIZE);
@@ -344,10 +345,11 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	unsigned int wr_id = wc->wr_id & ~IPOIB_CM_OP_SRQ;
-	struct sk_buff *skb;
+	struct sk_buff *skb, *newskb;
 	struct ipoib_cm_rx *p;
 	unsigned long flags;
 	u64 mapping[IPOIB_CM_RX_SG];
+	int frags;
 
 	ipoib_dbg_data(priv, "cm recv completion: id %d, op %d, status: %d\n",
 		       wr_id, wc->opcode, wc->status);
@@ -383,7 +385,11 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 		}
 	}
 
-	if (unlikely(ipoib_cm_alloc_rx_skb(dev, wr_id, mapping))) {
+	frags = PAGE_ALIGN(wc->byte_len - min(wc->byte_len,
+					      (unsigned)IPOIB_CM_HEAD_SIZE)) / PAGE_SIZE;
+
+	newskb = ipoib_cm_alloc_rx_skb(dev, wr_id, frags, mapping);
+	if (unlikely(!newskb)) {
 		/*
 		 * If we can't allocate a new RX buffer, dump
 		 * this packet and reuse the old buffer.
@@ -393,13 +399,13 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 		goto repost;
 	}
 
-	ipoib_cm_dma_unmap_rx(priv, priv->cm.srq_ring[wr_id].mapping);
-	memcpy(priv->cm.srq_ring[wr_id].mapping, mapping, sizeof mapping);
+	ipoib_cm_dma_unmap_rx(priv, frags, priv->cm.srq_ring[wr_id].mapping);
+	memcpy(priv->cm.srq_ring[wr_id].mapping, mapping, (frags + 1) * sizeof *mapping);
 
 	ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n",
 		       wc->byte_len, wc->slid);
 
-	skb_put_frags(skb, IPOIB_CM_HEAD_SIZE, wc->byte_len);
+	skb_put_frags(skb, IPOIB_CM_HEAD_SIZE, wc->byte_len, newskb);
 
 	skb->protocol = ((struct ipoib_header *) skb->data)->proto;
 	skb->mac.raw = skb->data;
@@ -1193,7 +1199,8 @@ int ipoib_cm_dev_init(struct net_device *dev)
 	priv->cm.rx_wr.num_sge = IPOIB_CM_RX_SG;
 
 	for (i = 0; i < ipoib_recvq_size; ++i) {
-		if (ipoib_cm_alloc_rx_skb(dev, i, priv->cm.srq_ring[i].mapping)) {
+		if (!ipoib_cm_alloc_rx_skb(dev, i, IPOIB_CM_RX_SG - 1,
+					   priv->cm.srq_ring[i].mapping)) {
 			ipoib_warn(priv, "failed to allocate receive buffer %d\n", i);
 			ipoib_cm_dev_cleanup(dev);
 			return -ENOMEM;
@@ -1228,7 +1235,8 @@ void ipoib_cm_dev_cleanup(struct net_device *dev)
 		return;
 	for (i = 0; i < ipoib_recvq_size; ++i)
 		if (priv->cm.srq_ring[i].skb) {
-			ipoib_cm_dma_unmap_rx(priv, priv->cm.srq_ring[i].mapping);
+			ipoib_cm_dma_unmap_rx(priv, IPOIB_CM_RX_SG - 1,
+					      priv->cm.srq_ring[i].mapping);
 			dev_kfree_skb_any(priv->cm.srq_ring[i].skb);
 			priv->cm.srq_ring[i].skb = NULL;
 		}
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 18d27fd..f9dbc6f 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -385,7 +385,7 @@ static void path_rec_completion(int status,
 	struct sk_buff *skb;
 	unsigned long flags;
 
-	if (pathrec)
+	if (!status)
 		ipoib_dbg(priv, "PathRec LID 0x%04x for GID " IPOIB_GID_FMT "\n",
 			  be16_to_cpu(pathrec->dlid), IPOIB_GID_ARG(pathrec->dgid));
 	else
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index b303ce6..bb2e3d5 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -527,11 +527,9 @@ void ipoib_mcast_join_task(struct work_struct *work)
 	{
 		struct ib_port_attr attr;
 
-		if (!ib_query_port(priv->ca, priv->port, &attr)) {
-			priv->local_lid  = attr.lid;
-			priv->local_rate = attr.active_speed *
-				ib_width_enum_to_int(attr.active_width);
-		} else
+		if (!ib_query_port(priv->ca, priv->port, &attr))
+			priv->local_lid = attr.lid;
+		else
 			ipoib_warn(priv, "ib_query_port failed\n");
 	}
 

From Ashish.Batwara at lsi.com  Mon Feb 26 13:04:41 2007
From: Ashish.Batwara at lsi.com (Batwara, Ashish)
Date: Mon, 26 Feb 2007 14:04:41 -0700
Subject: [openib-general] opensm issue
Message-ID: <01B9E81EECACE94DBBD0A556E768FB8A013A9361@NAMAIL2.ad.lsil.com>

Hi,
I am trying to bring up opensm, but it is not letting me. When I look at
/var/log/messages, I see that it comes UP for a moment and then
goes down again. Look for " SUBNET UP  " in the logs below. Does anyone
know what the problem is? I am using OFED-1.1.1 with patches from almost 1
month ago.

Thanks
Ashish


Feb 26 14:38:37 p49 run_srp_daemon[7640]: failed srp_daemon:
[HCA=mthca0] [port=2] [exit status=0]
Feb 26 14:38:37 p49 run_srp_daemon[7642]: failed srp_daemon:
[HCA=mthca0] [port=1] [exit status=0]
Feb 26 14:38:46 p49 OpenSM[7433]: SM port is down  
Feb 26 14:38:53 p49 run_srp_daemon[7653]: starting srp_daemon:
[HCA=mthca0] [port=2]
Feb 26 14:38:53 p49 run_srp_daemon[7658]: starting srp_daemon:
[HCA=mthca0] [port=1]
Feb 26 14:38:56 p49 OpenSM[7433]: SM port is down  
Feb 26 14:38:56 p49 run_srp_daemon[7675]: failed srp_daemon:
[HCA=mthca0] [port=2] [exit status=0]
Feb 26 14:38:56 p49 run_srp_daemon[7680]: failed srp_daemon:
[HCA=mthca0] [port=1] [exit status=0]
Feb 26 14:39:06 p49 OpenSM[7433]: SM port is down  
Feb 26 14:39:26 p49 last message repeated 2 times
Feb 26 14:39:26 p49 run_srp_daemon[7691]: starting srp_daemon:
[HCA=mthca0] [port=1]
Feb 26 14:39:26 p49 run_srp_daemon[7692]: starting srp_daemon:
[HCA=mthca0] [port=2]
Feb 26 14:39:29 p49 run_srp_daemon[7715]: failed srp_daemon:
[HCA=mthca0] [port=1] [exit status=0]
Feb 26 14:39:29 p49 run_srp_daemon[7716]: failed srp_daemon:
[HCA=mthca0] [port=2] [exit status=0]
Feb 26 14:39:36 p49 OpenSM[7433]: SM port is down  
Feb 26 14:39:56 p49 last message repeated 2 times
Feb 26 14:39:59 p49 run_srp_daemon[7728]: starting srp_daemon:
[HCA=mthca0] [port=1]
Feb 26 14:39:59 p49 run_srp_daemon[7727]: starting srp_daemon:
[HCA=mthca0] [port=2]
Feb 26 14:40:02 p49 run_srp_daemon[7752]: failed srp_daemon:
[HCA=mthca0] [port=1] [exit status=0]
Feb 26 14:40:02 p49 run_srp_daemon[7751]: failed srp_daemon:
[HCA=mthca0] [port=2] [exit status=0]
Feb 26 14:40:06 p49 OpenSM[7433]: SM port is down  
Feb 26 14:40:26 p49 last message repeated 2 times
Feb 26 14:40:32 p49 run_srp_daemon[7791]: starting srp_daemon:
[HCA=mthca0] [port=2]
Feb 26 14:40:32 p49 run_srp_daemon[7792]: starting srp_daemon:
[HCA=mthca0] [port=1]
Feb 26 14:40:35 p49 run_srp_daemon[7812]: failed srp_daemon:
[HCA=mthca0] [port=1] [exit status=0]
Feb 26 14:40:35 p49 run_srp_daemon[7817]: failed srp_daemon:
[HCA=mthca0] [port=2] [exit status=0]
Feb 26 14:40:36 p49 OpenSM[7433]: SM port is down  
Feb 26 14:40:46 p49 OpenSM[7433]: SM port is down  
Feb 26 14:40:56 p49 OpenSM[7433]: Entering MASTER state  
Feb 26 14:40:56 p49 OpenSM[7433]: SUBNET UP  
Feb 26 14:41:05 p49 run_srp_daemon[7823]: starting srp_daemon:
[HCA=mthca0] [port=1]
Feb 26 14:41:05 p49 run_srp_daemon[7832]: starting srp_daemon:
[HCA=mthca0] [port=2]
Feb 26 14:41:06 p49 OpenSM[7433]: SM port is down  
Feb 26 14:41:08 p49 run_srp_daemon[7847]: failed srp_daemon:
[HCA=mthca0] [port=2] [exit status=0]
Feb 26 14:41:14 p49 run_srp_daemon[7853]: failed srp_daemon:
[HCA=mthca0] [port=1] [exit status=0]
Feb 26 14:41:16 p49 OpenSM[7433]: SM port is down  


From halr at voltaire.com  Mon Feb 26 13:25:28 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 26 Feb 2007 16:25:28 -0500
Subject: [openib-general] opensm issue
In-Reply-To: <01B9E81EECACE94DBBD0A556E768FB8A013A9361@NAMAIL2.ad.lsil.com>
References: <01B9E81EECACE94DBBD0A556E768FB8A013A9361@NAMAIL2.ad.lsil.com>
Message-ID: <1172525125.4102.295158.camel@hal.voltaire.com>

Hi Ashish,

On Mon, 2007-02-26 at 16:04, Batwara, Ashish wrote:
> Hi,
> I am trying to bring up opensm, but it is not letting me. When I look at
> /var/log/messages, I see that it comes UP for a moment and then
> goes down again. Look for " SUBNET UP  " in the logs below. Does anyone
> know what the problem is? I am using OFED-1.1.1 with patches from almost 1
> month ago.
> 
> Thanks
> Ashish
> 
> 
> Feb 26 14:38:37 p49 run_srp_daemon[7640]: failed srp_daemon:
> [HCA=mthca0] [port=2] [exit status=0]
> Feb 26 14:38:37 p49 run_srp_daemon[7642]: failed srp_daemon:
> [HCA=mthca0] [port=1] [exit status=0]
> Feb 26 14:38:46 p49 OpenSM[7433]: SM port is down  
> Feb 26 14:38:53 p49 run_srp_daemon[7653]: starting srp_daemon:
> [HCA=mthca0] [port=2]
> Feb 26 14:38:53 p49 run_srp_daemon[7658]: starting srp_daemon:
> [HCA=mthca0] [port=1]
> Feb 26 14:38:56 p49 OpenSM[7433]: SM port is down  
> Feb 26 14:38:56 p49 run_srp_daemon[7675]: failed srp_daemon:
> [HCA=mthca0] [port=2] [exit status=0]
> Feb 26 14:38:56 p49 run_srp_daemon[7680]: failed srp_daemon:
> [HCA=mthca0] [port=1] [exit status=0]
> Feb 26 14:39:06 p49 OpenSM[7433]: SM port is down  
> Feb 26 14:39:26 p49 last message repeated 2 times
> Feb 26 14:39:26 p49 run_srp_daemon[7691]: starting srp_daemon:
> [HCA=mthca0] [port=1]
> Feb 26 14:39:26 p49 run_srp_daemon[7692]: starting srp_daemon:
> [HCA=mthca0] [port=2]
> Feb 26 14:39:29 p49 run_srp_daemon[7715]: failed srp_daemon:
> [HCA=mthca0] [port=1] [exit status=0]
> Feb 26 14:39:29 p49 run_srp_daemon[7716]: failed srp_daemon:
> [HCA=mthca0] [port=2] [exit status=0]
> Feb 26 14:39:36 p49 OpenSM[7433]: SM port is down  
> Feb 26 14:39:56 p49 last message repeated 2 times
> Feb 26 14:39:59 p49 run_srp_daemon[7728]: starting srp_daemon:
> [HCA=mthca0] [port=1]
> Feb 26 14:39:59 p49 run_srp_daemon[7727]: starting srp_daemon:
> [HCA=mthca0] [port=2]
> Feb 26 14:40:02 p49 run_srp_daemon[7752]: failed srp_daemon:
> [HCA=mthca0] [port=1] [exit status=0]
> Feb 26 14:40:02 p49 run_srp_daemon[7751]: failed srp_daemon:
> [HCA=mthca0] [port=2] [exit status=0]
> Feb 26 14:40:06 p49 OpenSM[7433]: SM port is down  
> Feb 26 14:40:26 p49 last message repeated 2 times
> Feb 26 14:40:32 p49 run_srp_daemon[7791]: starting srp_daemon:
> [HCA=mthca0] [port=2]
> Feb 26 14:40:32 p49 run_srp_daemon[7792]: starting srp_daemon:
> [HCA=mthca0] [port=1]
> Feb 26 14:40:35 p49 run_srp_daemon[7812]: failed srp_daemon:
> [HCA=mthca0] [port=1] [exit status=0]
> Feb 26 14:40:35 p49 run_srp_daemon[7817]: failed srp_daemon:
> [HCA=mthca0] [port=2] [exit status=0]
> Feb 26 14:40:36 p49 OpenSM[7433]: SM port is down  
> Feb 26 14:40:46 p49 OpenSM[7433]: SM port is down  
> Feb 26 14:40:56 p49 OpenSM[7433]: Entering MASTER state  
> Feb 26 14:40:56 p49 OpenSM[7433]: SUBNET UP  
> Feb 26 14:41:05 p49 run_srp_daemon[7823]: starting srp_daemon:
> [HCA=mthca0] [port=1]
> Feb 26 14:41:05 p49 run_srp_daemon[7832]: starting srp_daemon:
> [HCA=mthca0] [port=2]
> Feb 26 14:41:06 p49 OpenSM[7433]: SM port is down  
> Feb 26 14:41:08 p49 run_srp_daemon[7847]: failed srp_daemon:
> [HCA=mthca0] [port=2] [exit status=0]
> Feb 26 14:41:14 p49 run_srp_daemon[7853]: failed srp_daemon:
> [HCA=mthca0] [port=1] [exit status=0]
> Feb 26 14:41:16 p49 OpenSM[7433]: SM port is down  

It appears your SM port to some switch (?) is losing physical
connectivity. Try a different (known good) cable.

-- Hal



From xma at us.ibm.com  Mon Feb 26 14:00:52 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Mon, 26 Feb 2007 14:00:52 -0800
Subject: [openib-general] ib0 interface up but can't ping
In-Reply-To: <327816.12892.qm@web35105.mail.mud.yahoo.com>
Message-ID: <OF84669D1B.F9D0FBD4-ON8725728E.0078BB23-8825728E.004CFF3A@us.ibm.com>


If your subnet already has an SM running, please look at the ifconfig
output. If the interface ib0 is UP but not RUNNING, you can't ping because
the carrier is not on. Also look at /var/log/messages to see whether there
are any errors.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

From xma at us.ibm.com  Mon Feb 26 14:04:53 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Mon, 26 Feb 2007 14:04:53 -0800
Subject: [openib-general] IPOIB NAPI
In-Reply-To: <adahctdu6c8.fsf@cisco.com>
Message-ID: <OF0044A863.BC16B4C9-ON8725728E.0079158C-8825728E.004D5D86@us.ibm.com>


Roland,

Yes. It would be good to reduce the number of interrupts by changing all upper
layer protocols to use:

poll CQ
notify CQ, rotting packet notification
poll again

instead of
notify CQ
poll CQ

If possible, can this be in OFED-1.2?
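
The poll/notify/poll ordering above can be sketched with a toy completion
queue. Everything here (struct toy_cq and the toy_* helpers) is a hypothetical
userspace stand-in, not the verbs API; the "late" field only simulates
completions that race in between the last poll and the re-arm.

```c
#include <stdbool.h>

/* Toy stand-in for a completion queue (not the verbs API). */
struct toy_cq {
	int pending;  /* completions waiting to be polled */
	int late;     /* completions that arrive just before the re-arm */
	bool armed;   /* true: the next completion raises an interrupt */
};

/* Drain up to budget completions; returns how many were handled. */
static int toy_poll(struct toy_cq *cq, int budget)
{
	int n = cq->pending < budget ? cq->pending : budget;

	cq->pending -= n;
	return n;
}

/* Arm the CQ. Returns true if completions slipped in before the arm
 * (the "rotting packet" case), so the caller must poll again. */
static bool toy_arm(struct toy_cq *cq)
{
	cq->pending += cq->late;  /* simulate the race window */
	cq->late = 0;
	cq->armed = true;
	return cq->pending > 0;
}

/* poll CQ, notify CQ, poll again if anything was missed. */
static int toy_napi_poll(struct toy_cq *cq, int budget)
{
	int done = 0;

	for (;;) {
		done += toy_poll(cq, budget - done);
		if (done == budget)
			return done;    /* budget used up: stay in polling mode */
		if (!toy_arm(cq))
			return done;    /* empty and armed: wait for an interrupt */
		cq->armed = false;      /* missed completions: keep polling */
	}
}
```

With the notify-then-poll ordering, every batch costs an interrupt; here an
interrupt is only needed once the queue has been seen empty after arming.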

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

From rdreier at cisco.com  Mon Feb 26 14:09:48 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 26 Feb 2007 14:09:48 -0800
Subject: [openib-general] [RFC/BUG] DMA vs. CQ race
In-Reply-To: <OF800955F1.F79D78A8-ON8725728E.00798436-8825728E.004DB912@us.ibm.com>
	(Shirley Ma's message of "Mon, 26 Feb 2007 14:08:48 -0800")
References: <OF800955F1.F79D78A8-ON8725728E.00798436-8825728E.004DB912@us.ibm.com>
Message-ID: <ada7iu41sgj.fsf@cisco.com>

 > That would be great. We hit a similar problem in our cluster test -- data
 > corruption because of this race.

On what platform?

 - R.


From xma at us.ibm.com  Mon Feb 26 14:08:48 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Mon, 26 Feb 2007 14:08:48 -0800
Subject: [openib-general] [RFC/BUG] DMA vs. CQ race
In-Reply-To: <adabqjmypng.fsf@cisco.com>
Message-ID: <OF800955F1.F79D78A8-ON8725728E.00798436-8825728E.004DB912@us.ibm.com>


> Hmm, OK.  Then I will do my best to make sure we get a fix for this
> into 2.6.22.

That would be great. We hit a similar problem in our cluster test -- data
corruption because of this race.

Thanks
Shirley Ma

From xma at us.ibm.com  Mon Feb 26 14:20:56 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Mon, 26 Feb 2007 14:20:56 -0800
Subject: [openib-general] [RFC/BUG] DMA vs. CQ race
In-Reply-To: <ada7iu41sgj.fsf@cisco.com>
Message-ID: <OFDFE7B645.5CBDA3F0-ON8725728E.007A8C1A-8825728E.004ED577@us.ibm.com>


Roland Dreier <rdreier at cisco.com> wrote on 02/26/2007 02:09:48 PM:
>  > That would be great. We hit a similar problem in our cluster test -- data
>  > corruption because of this race.
>
> On what platform?
>
>  - R.

On our cell blade + PCI-e Mellanox.

Thanks
Shirley Ma

From rdreier at cisco.com  Mon Feb 26 14:27:42 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 26 Feb 2007 14:27:42 -0800
Subject: [openib-general] [RFC/BUG] DMA vs. CQ race
In-Reply-To: <OFDFE7B645.5CBDA3F0-ON8725728E.007A8C1A-8825728E.004ED577@us.ibm.com>
	(Shirley Ma's message of "Mon, 26 Feb 2007 14:20:56 -0800")
References: <OFDFE7B645.5CBDA3F0-ON8725728E.007A8C1A-8825728E.004ED577@us.ibm.com>
Message-ID: <adaodngzh9d.fsf@cisco.com>

 > On our cell blade + PCI-e Mellanox.

I don't see anything in arch/powerpc that looks like
dma_alloc_coherent() will do anything other than allocate some memory
and map it with DMA_BIDIRECTIONAL.  So how does this altix fix help in
your situation?  Am I misreading the Cell IOMMU code?

 - R.


From rdreier at cisco.com  Mon Feb 26 14:36:26 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 26 Feb 2007 14:36:26 -0800
Subject: [openib-general] IPOIB NAPI
In-Reply-To: <OF0044A863.BC16B4C9-ON8725728E.0079158C-8825728E.004D5D86@us.ibm.com>
	(Shirley Ma's message of "Mon, 26 Feb 2007 14:04:53 -0800")
References: <OF0044A863.BC16B4C9-ON8725728E.0079158C-8825728E.004D5D86@us.ibm.com>
Message-ID: <adad53wzgut.fsf@cisco.com>

 > Yes. It would be good to reduce number of interrupts by changing all upper
 > layer protocols to use:
 > 
 > poll CQ
 > notify CQ, rotting packet notification
 > poll again
 > 
 > instead of
 > notify CQ
 > poll CQ
 > 
 > If possible this can be in OFED-1.2?

No way, it's way too late at this point to change the kernel-user ABI,
let alone change all ULPs.

 - R.


From halr at voltaire.com  Mon Feb 26 14:47:58 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 26 Feb 2007 17:47:58 -0500
Subject: [openib-general] [PATCH] opensm: faster min hops
In-Reply-To: <20070225214845.GF11957@sashak.voltaire.com>
References: <20070225214845.GF11957@sashak.voltaire.com>
Message-ID: <1172530075.4102.299979.camel@hal.voltaire.com>

On Sun, 2007-02-25 at 16:48, Sasha Khapyorsky wrote:
> After analyzing gprof output, I noticed that the current lmx (switch's lid
> matrix) implementation is extremely slow. This simple hops matrix
> reimplementation makes the lid matrix build process two times faster.

Excellent!

> Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>

Thanks! Applied (to master only right now).

-- Hal


From krause at cup.hp.com  Mon Feb 26 15:34:39 2007
From: krause at cup.hp.com (Michael Krause)
Date: Mon, 26 Feb 2007 15:34:39 -0800
Subject: [openib-general] IB routing discussion summary
In-Reply-To: <000201c755f1$727618d0$8698070a@amr.corp.intel.com>
References: <6.2.0.14.2.20070220103929.02953a20@esmail.cup.hp.com>
	<000201c755f1$727618d0$8698070a@amr.corp.intel.com>
Message-ID: <6.2.0.14.2.20070226152634.026a85a8@esmail.cup.hp.com>

At 11:49 AM 2/21/2007, Sean Hefty wrote:
>I sent a message on this topic to the IBTA several days ago, but I am still
>awaiting details (likely early next week).

Unclear if that will occur.  I just responded to some e-mail in the IBTA on 
the router subject as well.    Given that discussion, I suspect it will be 
some time coming to fully answer the router dilemma.


> >It should not be carried in the CM REQ.  The SLID / DLID of the router
> >ports should be derived through local subnet SA / SM query.  When a CM REQ
> >traverses one or more subnets there will be potentially many SLID / DLID
> >involved in the communication.   Each router should be populating its
> >routing tables in order to build the new LRH attached to the GRH / CM REQ
> >that it is forwarding to the next hop.
>
>I'm referring to configuration of the QP, not the operation of the routers.
>
>To establish a connection, the passive side QP needs to transition from Init
>to RTR.  As part of that transition, the modify QP verb needs as input the
>Destination LID of its local router.  It sounds like you expect the passive
>side to perform an SA query to obtain its own local routing information,
>which would essentially invalidate the data carried in the primary and
>alternate path fields in the CM REQ.

The source always queries to obtain a subnet-local router Port.   A sink 
can simply reflect back the LRH with source / destination LID reversed 
assuming it had such information or it can query to find the optimal / 
preferred subnet-local router Port.
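
The "reflect back" option can be sketched as a simple swap. This is an
illustration only: struct toy_lids is a hypothetical two-field view, whereas
the real LRH carries additional fields, and LIDs are 16-bit values.

```c
#include <stdint.h>

/* Hypothetical, simplified view of the two LID fields of an LRH. */
struct toy_lids {
	uint16_t dlid;  /* destination LID */
	uint16_t slid;  /* source LID */
};

/* Build the LIDs for a reply by reversing those of the received packet:
 * the subnet-local router port that delivered the request becomes the
 * destination of the response. */
static struct toy_lids reflect_lids(struct toy_lids rx)
{
	struct toy_lids tx = { .dlid = rx.slid, .slid = rx.dlid };

	return tx;
}
```

The alternative mentioned above, querying for the preferred subnet-local
router port, would replace the swap with a lookup.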


>From reading 12.7.11, 13.5.1, and 17.4, I do not believe that such a
>requirement was expected to be placed on the passive side of a connection.
>The initial response I received agreed with this.
>
> >I'd need to go back but the architecture is predicated that the SM and SA
> >are strictly local and for security purposes their communication should
> >remain local.  Higher level management entities built to communicate with
> >SM and SA are responsible for cross subnet communications without exposing
> >the SA or SM to direct interaction.  P_Key and Q_Key management across
> >subnets is an example of such communication across subnets that would not
> >be exposed to the SA and SM.
>
>My initial thoughts are that this sounds like a good idea.  It's not
>eliminating the need for interacting with a remote SA, so much as it
>abstracts it to another entity.
>
>My hope is that we can reach an agreement on the CM REQ.  Depending on that,
>it still needs to determine if the existing SA attributes are sufficient to
>allow forming inter-subnet connections, and if they are, can such attributes
>be obtained.

A lot of discussion will be required within the IBTA to nail anything 
down.   As I noted above, I just provided answers to a number of questions 
posed as well as opened up perhaps a few more.   I am not aware of a TTM to 
complete this work but clearly some amount of standardization is required 
and it will take a bit to define the scope so that the specification does 
not become so large that it will take significant amount of time to develop 
and more importantly, significant resources and time to validate that the 
routing protocol is solid.   Routing protocols are not as simple as some 
may think - they vary as a function of the functional robustness and 
scalability provided.

For now, I'll assume this discussion is on hold until the IBTA gets its act 
together.

Mike


From nimrodg at mellanox.com  Mon Feb 26 17:02:09 2007
From: nimrodg at mellanox.com (Nimrod Gindi)
Date: Mon, 26 Feb 2007 17:02:09 -0800
Subject: [openib-general] OFED release testing Task force meeting minutes
Message-ID: <1E3DCD1C63492545881FACB6063A57C1D4C8D8@mtiexch01.mti.com>

Meeting took place on Wednesday, Feb. 21st, 2007, 8:30 AM (PST)

Agenda:

1. Review combined report summary (as sent by Nimrod G., Mellanox) and
   vote for approval
2. Next steps
3. Open discussion

Attending companies: Qlogic, Mellanox, NetEffect, Voltaire,
SystemFabricWorks

Discussion Items and Action Items:

1.	Reviewed the new report structure
2.	The spreadsheet was voted on and agreed upon, with 2 minor changes to
	make it rev 2:
	a.	AI 1: Nimrod G. - move all items from sheet 3 to sheet 2
		in the drop-down format and remove sheet 3
	b.	AI 2: Nimrod G. - add RHEL4 up4 and up3 to the supported
		section per the latest decisions of the EWG.
3.	Next agreed steps:
	a.	Start using the spreadsheet towards the later Alpha build
		of OFED 1.2 to assist with visibility into testing done by members
	b.	Add tests from member companies to the shared OFED
		repository.
		i.	AI 3: Amit K. - send out a pointer to tests which are
			already posted by Mellanox in OFED.
	c.	Start considering ULP owners under the following
		understanding of responsibilities:
		i.	The ULP owner will be in charge of approving tests of
			the ULP for entry into the list/repository
		ii.	The ULP owner is to flag the task force when the ULP
			under his responsibility is falling behind on testing
			in the community.

Follow-up meeting will be scheduled for 7th-March 2007 8:30am
PDT=11:30am EDT=6:30pm Israel.

 
Nimrod  Gindi

Mellanox Technologies Ltd.

mail  :  nimrodg at mellanox.com

Cell  :  +1-408-750-4801

Office:  +1-347-342-0011

Fax   :  +1-212-987-0275

 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OFED testing report format rev2.xls
Type: application/vnd.ms-excel
Size: 48640 bytes
Desc: OFED testing report format rev2.xls
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070226/d97d657b/attachment.xls>

From rdreier at cisco.com  Mon Feb 26 20:32:40 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 26 Feb 2007 20:32:40 -0800
Subject: [openib-general] failure to create an FMR mapping 1K pages on
	memfree
In-Reply-To: <15ddcffd0702261105s377ad165h7bfe258f69ede152@mail.gmail.com>
	(Or Gerlitz's message of "Mon, 26 Feb 2007 21:05:48 +0200")
References: <15ddcffd0702261104x6df977b6g9e4ca0071c8489ad@mail.gmail.com>
	<15ddcffd0702261105s377ad165h7bfe258f69ede152@mail.gmail.com>
Message-ID: <adawt24w787.fsf@cisco.com>

 > I have got a report on failure to create FMR mapping 1K pages (that is
 > 4MB) on memfree.
 >
 > I don't have the exact details (ie if Arbel/Sinai / what FW  / etc)
 > nor which exact check fails in
 > mthca_fmr_alloc, but what's clear is that the latter function returns
 > -ENOMEM when attr.max_pages is 1024 and it works fine when
 > attr.max_pages is 256.
 >
 > Is this failure clear to you? If yes, is a HW or FW limit being
 > hit, or is it a driver design issue?

Is it really returning -ENOMEM?  It seems much more likely that you
are hitting the code

	/* For Arbel, all MTTs must fit in the same page. */
	if (mthca_is_memfree(dev) &&
	    mr->attr.max_pages * sizeof *mr->mem.arbel.mtts > PAGE_SIZE)
		return -EINVAL;

I guess you could call this limit a driver design issue.
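
To make the limit concrete, here is a small sketch of the same arithmetic. It
assumes each MTT entry is a 64-bit value (8 bytes) and a 4 KB PAGE_SIZE,
neither of which is stated in this thread; mtts_fit_in_page() is a
hypothetical helper. Under those assumptions at most 512 pages fit, so
max_pages = 1024 fails the check while 256 passes, which would match the
report.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: one MTT entry is a 64-bit value. */
#define TOY_MTT_ENTRY_SIZE sizeof(uint64_t)

/* Same shape as the quoted check, inverted: returns nonzero when the MTTs
 * for max_pages pages fit in one page of page_size bytes. */
static int mtts_fit_in_page(size_t max_pages, size_t page_size)
{
	return max_pages * TOY_MTT_ENTRY_SIZE <= page_size;
}
```

For example, mtts_fit_in_page(1024, 4096) is false because 1024 * 8 = 8192
bytes exceeds 4096, while mtts_fit_in_page(256, 4096) is true (2048 bytes).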

 - R.


From mst at mellanox.co.il  Mon Feb 26 22:02:45 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 27 Feb 2007 08:02:45 +0200
Subject: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support
	to IPoIB
In-Reply-To: <45E313D2.70909@voltaire.com>
References: <45E313D2.70909@voltaire.com>
Message-ID: <20070227060245.GI12919@mellanox.co.il>

> When using the bonding driver, neighbours are created by the net stack on behalf
> of the bonding (master) device. On the tx flow the bonding code gets an skb such
> that skb->dev points to the master device, it changes this skb to point on the
> slave device and calls the slave hard_start_xmit function.
> 
> 
> Combining these two flows, there is a hole if some code at ipoib
> (ipoib_neigh_destructor) assumes that for each struct neighbour it gets, n->dev
> is an ipoib device so for example netdev_priv(n->dev) would be of type struct
> ipoib_dev_priv.
> 
> To fix it, this patch adds a dev field to struct ipoib_neigh which is used
> instead of the struct neighbour dev one.

It seems that in this design, if multiple ipoib interfaces are present, we might
get an skb such that skb->dev will be different from the new dev field in struct
ipoib_neigh.

If so, the packet would be sent on the wrong interface.
Right?

> In addition, if an IPoIB device is removed before bonding is unloaded, it may
> cause bond0 neighbours (neighbours that point to bond0) to exist after the IPoIB
> device no longer exists. This is why a neighbour cleanup is required during device
> cleanup. This cleanup scans the arp cache and the ndisc cache to find the
> neighbours of bond0 which also refer to the relevant ibX. Also, when the ib_ipoib module is
> unloaded, the neighbour destructor must be set to NULL because the destructor function is in
> ib_ipoib.
> For this neigh table cleanup, it is required to export the symbol nd_tbl, just as the symbol arp_tbl is.

I wonder about this: is it really true that any allocated neighbour is always in
either arp_tbl or nd_tbl? For example, could some code have called neigh_hold
and retained a neighbour that is not in either one of these tables?

> During my tests I found that when running 
> 
> 	1. modprobe -r ib_mthca (to delete IPoIB interfaces)
> 	2. ping somewhere on the subnet of bond0
> 
> I get this stack dump (which ends with kernel death)
> 	 [<ffffffff8037ff32>] skb_under_panic+0x5c/0x60
> 	 [<ffffffff882e00c2>] :ib_ipoib:ipoib_hard_header+0xa6/0xc0
> 	 [<ffffffff803c3c98>] arp_create+0x120/0x226
> 	 [<ffffffff803c3dc3>] arp_send+0x25/0x3b
> 	 [<ffffffff803c466a>] arp_solicit+0x186/0x195
> 	 [<ffffffff8038c0ac>] neigh_timer_handler+0x2b5/0x309
> 	 [<ffffffff8038bdf7>] neigh_timer_handler+0x0/0x309
> 	 [<ffffffff80239599>] run_timer_softirq+0x130/0x19e
> 	 [<ffffffff80235fcc>] __do_softirq+0x55/0xc3
> 	 [<ffffffff8020acac>] call_softirq+0x1c/0x28
> 	 [<ffffffff8020c02b>] do_softirq+0x2c/0x7d
> 	 [<ffffffff8021864a>] smp_apic_timer_interrupt+0x57/0x6a
> 	 [<ffffffff80208e19>] mwait_idle+0x0/0x45
> 	 [<ffffffff8020a756>] apic_timer_interrupt+0x66/0x70
> 	 <EOI>  [<ffffffff80208e5b>] mwait_idle+0x42/0x45
> 	 [<ffffffff80208db1>] cpu_idle+0x8b/0xae
> 	 [<ffffffff80217d60>] start_secondary+0x47f/0x48f
> 
> The only way I found to avoid this (for now) is to check skb headroom in
> ipoib_hard_header. I guess that this safety check doesn't harm regular IPoIB 
> operation and it seems to solve my problem. However, I would be happy to hear what
> others think of this last issue.

As I said, this seems to indicate a problem in the bonding code.
But what will happen after you error out in ipoib_hard_header?
Is the packet dropped? What might break as a result?
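
For readers unfamiliar with the failure mode, the following toy model sketches
what a headroom check guards against. struct toy_skb and the toy_* helpers are
hypothetical stand-ins, not kernel code: prepending a header moves the data
pointer toward the start of the buffer, and pushing past the start is the
condition that the skb_under_panic in the trace reports.

```c
#include <stddef.h>

/* Toy model of an skb: data may only move down as far as head. */
struct toy_skb {
	unsigned char *head;  /* start of the allocated buffer */
	unsigned char *data;  /* start of the current packet data */
};

/* Bytes available in front of the packet data. */
static size_t toy_headroom(const struct toy_skb *skb)
{
	return (size_t)(skb->data - skb->head);
}

/* Prepend len bytes. Returns the new data pointer, or NULL when the
 * headroom is too small; the buffer is left untouched in that case. */
static unsigned char *toy_push_checked(struct toy_skb *skb, size_t len)
{
	if (toy_headroom(skb) < len)
		return NULL;
	skb->data -= len;
	return skb->data;
}
```

The sketch only shows the check itself; what the caller does with the
rejected packet is exactly the open question raised above.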

> I would really appreciate comments.
> 
> thanks
> 
>  -MoniS

------------------------------------------------------------------------------
diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 07deee8..31bc6d8 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -216,6 +216,7 @@ struct ipoib_neigh {
 	struct sk_buff_head queue;
 
 	struct neighbour   *neighbour;
+	struct net_device *dev;
 
 	struct list_head    list;
 };
@@ -232,7 +233,8 @@ static inline struct ipoib_neigh **to_ip
 				     INFINIBAND_ALEN, sizeof(void *));
 }
 
-struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh);
+struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh,
+				      struct net_device *dev);
 void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh);
 
 extern struct workqueue_struct *ipoib_workqueue;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 705eb1d..0e3953e 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -48,6 +48,8 @@ #include <linux/ip.h>
 #include <linux/in.h>
 
 #include <net/dst.h>
+#include <net/arp.h>
+#include <net/ndisc.h>
 
 #define IPOIB_QPN(ha) (be32_to_cpup((__be32 *) ha) & 0xffffff)
 
@@ -70,6 +72,7 @@ module_param_named(debug_level, ipoib_de
 MODULE_PARM_DESC(debug_level, "Enable debug tracing if > 0");
 #endif
 
+static int ipoib_at_exit = 0;
 struct ipoib_path_iter {
 	struct net_device *dev;
 	struct ipoib_path  path;
@@ -490,7 +493,7 @@ static void neigh_add_path(struct sk_buf
 	struct ipoib_path *path;
 	struct ipoib_neigh *neigh;
 
-	neigh = ipoib_neigh_alloc(skb->dst->neighbour);
+	neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev);
 	if (!neigh) {
 		++priv->stats.tx_dropped;
 		dev_kfree_skb_any(skb);
@@ -735,6 +738,9 @@ static int ipoib_hard_header(struct sk_b
 {
 	struct ipoib_header *header;
 
+	if (skb_headroom(skb) < sizeof *header) {
+		return -1;
+	}
 	header = (struct ipoib_header *) skb_push(skb, sizeof *header);
 
 	header->proto = htons(type);
@@ -746,8 +752,11 @@ static int ipoib_hard_header(struct sk_b
 	 * figure out where to send the packet later.
 	 */
 	if ((!skb->dst || !skb->dst->neighbour) && daddr) {
-		struct ipoib_pseudoheader *phdr =
-			(struct ipoib_pseudoheader *) skb_push(skb, sizeof *phdr);
+		struct ipoib_pseudoheader *phdr = NULL;
+		if (skb_headroom(skb) < sizeof *phdr) {
+			return -1;
+		}
+		phdr = (struct ipoib_pseudoheader *) skb_push(skb, sizeof *phdr);
 		memcpy(phdr->hwaddr, daddr, INFINIBAND_ALEN);
 	}
 
@@ -769,32 +778,69 @@ static void ipoib_set_mcast_list(struct 
 static void ipoib_neigh_destructor(struct neighbour *n)
 {
 	struct ipoib_neigh *neigh;
-	struct ipoib_dev_priv *priv = netdev_priv(n->dev);
+	struct ipoib_dev_priv *priv;
 	unsigned long flags;
 	struct ipoib_ah *ah = NULL;
 
-	ipoib_dbg(priv,
-		  "neigh_destructor for %06x " IPOIB_GID_FMT "\n",
-		  IPOIB_QPN(n->ha),
-		  IPOIB_GID_RAW_ARG(n->ha + 4));
-
-	spin_lock_irqsave(&priv->lock, flags);
 
 	neigh = *to_ipoib_neigh(n);
 	if (neigh) {
+		priv = netdev_priv(neigh->dev);
+		ipoib_dbg(priv,
+			  "neigh_destructor for %06x " IPOIB_GID_FMT "\n",
+			  IPOIB_QPN(n->ha),
+			  IPOIB_GID_RAW_ARG(n->ha + 4));
+
+		spin_lock_irqsave(&priv->lock, flags);
 		if (neigh->ah)
 			ah = neigh->ah;
 		list_del(&neigh->list);
 		ipoib_neigh_free(n->dev, neigh);
+		spin_unlock_irqrestore(&priv->lock, flags);
 	}
-
-	spin_unlock_irqrestore(&priv->lock, flags);
-
 	if (ah)
 		ipoib_put_ah(ah);
 }
 
-struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour)
+static void ipoib_neigh_tbl_cleanup_master(struct neigh_table *tbl,
+					   struct net_device* master,
+					   struct net_device* slave)
+{
+	int i;
+	struct ipoib_neigh *neigh;
+
+	write_lock_bh(&tbl->lock);
+	for (i = 0; i <= tbl->hash_mask; i++) {
+		struct neighbour *n, **np;
+
+		np = &tbl->hash_buckets[i];
+		while ((n = *np) != NULL) {
+			write_lock(&n->lock);
+			if (n->dev == master) {
+				neigh = *to_ipoib_neigh(n);
+				if (neigh && (neigh->dev == slave)){
+					if (ipoib_at_exit)
+						n->parms->neigh_destructor = NULL;
+					ipoib_neigh_destructor(n);
+				}
+			}
+			write_unlock(&n->lock);
+			np = &n->next;
+		}
+	}
+	write_unlock_bh(&tbl->lock);
+}
+
+static void ipoib_neigh_cleanup_by_master(struct net_device* master,struct net_device* slave){
+	netif_stop_queue(slave);
+	if (master) {
+		ipoib_neigh_tbl_cleanup_master(&arp_tbl,master, slave);
+		ipoib_neigh_tbl_cleanup_master(&nd_tbl,master, slave);
+	}
+}
+
+struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour,
+				      struct net_device *dev)
 {
 	struct ipoib_neigh *neigh;
 
@@ -803,6 +849,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(st
 		return NULL;
 
 	neigh->neighbour = neighbour;
+	neigh->dev = dev;
 	*to_ipoib_neigh(neighbour) = neigh;
 	skb_queue_head_init(&neigh->queue);
 
@@ -874,6 +921,7 @@ void ipoib_dev_cleanup(struct net_device
 
 	/* Delete any child interfaces first */
 	list_for_each_entry_safe(cpriv, tcpriv, &priv->child_intfs, list) {
+		ipoib_neigh_cleanup_by_master(cpriv->dev->master, cpriv->dev);
 		unregister_netdev(cpriv->dev);
 		ipoib_dev_cleanup(cpriv->dev);
 		free_netdev(cpriv->dev);
@@ -1159,6 +1207,7 @@ static void ipoib_remove_one(struct ib_d
 		ib_unregister_event_handler(&priv->event_handler);
 		flush_scheduled_work();
 
+		ipoib_neigh_cleanup_by_master(priv->dev->master, priv->dev);
 		unregister_netdev(priv->dev);
 		ipoib_dev_cleanup(priv->dev);
 		free_netdev(priv->dev);
@@ -1217,6 +1266,8 @@ err_fs:
 
 static void __exit ipoib_cleanup_module(void)
 {
+	ipoib_at_exit = 1;
+
 	ib_unregister_client(&ipoib_client);
 	ib_sa_unregister_client(&ipoib_sa_client);
 	ipoib_unregister_debugfs();
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index b04b72c..a41a949 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -774,7 +774,7 @@ out:
 		if (skb->dst            &&
 		    skb->dst->neighbour &&
 		    !*to_ipoib_neigh(skb->dst->neighbour)) {
-			struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb->dst->neighbour);
+			struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev);
 
 			if (neigh) {
 				kref_get(&mcast->ah->ref);
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 6a9f616..557be98 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -153,6 +153,7 @@ struct neigh_table nd_tbl = {
 	.gc_thresh2 =	 512,
 	.gc_thresh3 =	1024,
 };
+EXPORT_SYMBOL(nd_tbl);
 
 /* ND options */
 struct ndisc_options {


-- 
MST


From diego.guella at sircomtech.com  Mon Feb 26 23:10:50 2007
From: diego.guella at sircomtech.com (Diego Guella)
Date: Tue, 27 Feb 2007 08:10:50 +0100
Subject: [openib-general] Fwd: Address List Change Now Scheduled for
 Wednesday, 2/28/2007
References: <3D84A59A1AD3584DA02AEAD240E8863F03BC471E@ES22SNLNT.srn.sandia.gov>
	<B53C816F-F33B-4A88-98BE-E7F20AF6833B@cisco.com>
Message-ID: <009301c75a3e$7165aef0$05c8a8c0@DIEGO>

Do I need to do anything to be subscribed to the new mailing list, or will I be 
subscribed automatically?
The only change is that I have to send messages to 
general at lists.openfabrics.org, correct?


----- Original Message ----- 
From: "Jeff Squyres" <jsquyres at cisco.com>
To: "OpenFabrics General" <openib-general at openib.org>
Sent: Monday, February 26, 2007 6:05 PM
Subject: [openib-general] Fwd: Address List Change Now Scheduled for 
Wednesday, 2/28/2007


> FYI.  In case you missed it the Nth time: THIS LIST IS CHANGING ON
> WEDNESDAY 2/28/2007 (2 days from now).  Really.  For sure this time.
> Trust me.  Honest.
>
> Please update your addressbooks!
>
>
>
> Begin forwarded message:
>
>> From: "Lee, Michael Paichi" <mplee at sandia.gov>
>> Date: February 22, 2007 11:44:25 AM EST
>> To: "Jeff Squyres" <jsquyres at cisco.com>, "Michael S. Tsirkin"
>> <mst at mellanox.co.il>
>> Cc: "OpenFabrics General" <openib-general at openib.org>
>> Subject: Address List Change Now Scheduled for Wednesday, 2/28/2007
>>
>> The list will now be migrated on Wednesday, 2/28/2007.
>>
>> List address:         general at lists.openfabrics.org
>> Updated change-date:  Wednesday, 2/28/2007
>>
>> Michael
>
>
> -- 
> Jeff Squyres
> Server Virtualization Business Unit
> Cisco Systems
>
>
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit 
> http://openib.org/mailman/listinfo/openib-general
> 


From philippe_bernadat at hp.com  Tue Feb 27 00:33:11 2007
From: philippe_bernadat at hp.com (Bernadat, Philippe)
Date: Tue, 27 Feb 2007 09:33:11 +0100
Subject: [openib-general] failure to create an FMR mapping 1K pages on
	memfree
In-Reply-To: <adawt24w787.fsf@cisco.com>
References: <15ddcffd0702261104x6df977b6g9e4ca0071c8489ad@mail.gmail.com><15ddcffd0702261105s377ad165h7bfe258f69ede152@mail.gmail.com>
	<adawt24w787.fsf@cisco.com>
Message-ID: <3F3894AC7A13B04E83CEBC95CFD3047E05B06D13@idaexc03.emea.cpqcorp.net>

Roland is right; I checked where mthca_fmr_alloc() was failing.
The MTTs must fit in one page of pointers, so the maximum is 512.
It does work with 512.
I checked: mthca_alloc_fmr returns EINVAL, and then ib_create_fmr_pool
returns ENOMEM.
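
The arithmetic behind the 512-page ceiling can be sketched in plain C.
This is a user-space illustration only, not driver code. It assumes
4096-byte pages and 8-byte MTT entries; the names sketch_fmr_max_pages
and sketch_fmr_check_pages are hypothetical and do not exist in mthca.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for illustration: 4 KB pages. */
#define SKETCH_PAGE_SIZE 4096u

/* Largest max_pages for which every MTT entry fits in a single page. */
unsigned int sketch_fmr_max_pages(size_t page_size, size_t entry_size)
{
	assert(entry_size > 0);
	return (unsigned int)(page_size / entry_size);
}

/*
 * Same shape as the memfree check quoted below: reject the request
 * when the MTT array would spill past one page.
 * Assumption: each MTT entry is a 64-bit value.
 */
int sketch_fmr_check_pages(unsigned int max_pages)
{
	if ((size_t)max_pages * sizeof(uint64_t) > SKETCH_PAGE_SIZE)
		return -EINVAL;
	return 0;
}
```

Under these assumptions a request for 256 or 512 pages passes the check,
while 1024 pages needs 8192 bytes of MTT entries and is rejected with
-EINVAL, which matches the behaviour reported above.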

So this isn't a hardware limitation, since the Voltaire stack managed to
handle 1024 pages on the same board.
Is there a way to fix this in OFED?

Philippe

> -----Original Message-----
> From: Roland Dreier [mailto:rdreier at cisco.com] 
> Sent: Tuesday, February 27, 2007 5:33 AM
> To: Or Gerlitz
> Cc: Bernadat, Philippe; openib
> Subject: Re: failure to create an FMR mapping 1K pages on memfree
> 
>  > I have got a report on failure to create FMR mapping 1K 
> pages (that is
>  > 4MB) on memfree.
>  >
>  > I don't have the exact details (ie if Arbel/Sinai / what FW  / etc)
>  > nor which exact check fails in
>  > mthca_fmr_alloc, but what's clear is that the latter 
> function returns
>  > -ENOMEM when attr.max_pages is 1024 and it works fine when
>  > attr.max_pages is 256.
>  >
>  > Is this failure clear to you? if yes, does a HW or FW 
> limit is being
>  > hit or its a driver design issue?
> 
> Is it really returning -ENOMEM?  It seems much more likely that you
> are hitting the code
> 
> 	/* For Arbel, all MTTs must fit in the same page. */
> 	if (mthca_is_memfree(dev) &&
> 	    mr->attr.max_pages * sizeof *mr->mem.arbel.mtts > PAGE_SIZE)
> 		return -EINVAL;
> 
> I guess you could call this limit a driver design issue.
> 
>  - R.
> 


From vlad at lists.openfabrics.org  Tue Feb 27 02:29:38 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Tue, 27 Feb 2007 02:29:38 -0800 (PST)
Subject: [openib-general] ofa_1_2_kernel 20070227-0200 daily build status
Message-ID: <20070227102938.74A69E60803@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod --with-cxgb3-mod 

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.20
Passed on powerpc with linux-2.6.19
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.19
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.16
Passed on ppc64 with linux-2.6.19
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.17
Passed on ia64 with linux-2.6.18
Passed on powerpc with linux-2.6.16
Passed on x86_64 with linux-2.6.13
Passed on ia64 with linux-2.6.19
Passed on ppc64 with linux-2.6.12
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.16
Passed on powerpc with linux-2.6.12
Passed on powerpc with linux-2.6.14
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on ppc64 with linux-2.6.17
Passed on ia64 with linux-2.6.17
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.14
Passed on x86_64 with linux-2.6.5-7.244-smp
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.16
Passed on ppc64 with linux-2.6.15
Passed on powerpc with linux-2.6.15
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.13
Passed on ppc64 with linux-2.6.13
Passed on ia64 with linux-2.6.14
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on ia64 with linux-2.6.16.21-0.8-default

Failed:
Build failed on x86_64 with linux-2.6.9-22.ELsmp
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070227-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: ‘ADVERTISE_PAUSE_CAP’ undeclared (first use in this function)
/home/vlad/tmp/ofa_1_2_kernel-20070227-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: (Each undeclared identifier is reported only once
/home/vlad/tmp/ofa_1_2_kernel-20070227-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: for each function it appears in.)
/home/vlad/tmp/ofa_1_2_kernel-20070227-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:170: error: ‘ADVERTISE_PAUSE_ASYM’ undeclared (first use in this function)
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070227-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.o] Error 1
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070227-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070227-0200_linux-2.6.9-22.ELsmp_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-22.ELsmp'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.9-34.ELsmp
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070227-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘add_adapter’:
/home/vlad/tmp/ofa_1_2_kernel-20070227-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1061: error: ‘adapter_list_lock’ undeclared (first use in this function)
/home/vlad/tmp/ofa_1_2_kernel-20070227-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘remove_adapter’:
/home/vlad/tmp/ofa_1_2_kernel-20070227-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1068: error: ‘adapter_list_lock’ undeclared (first use in this function)
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070227-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.o] Error 1
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070227-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070227-0200_linux-2.6.9-34.ELsmp_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-34.ELsmp'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------


From jsquyres at cisco.com  Tue Feb 27 03:08:31 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Tue, 27 Feb 2007 06:08:31 -0500
Subject: [openib-general] Fwd: Address List Change Now Scheduled for
 Wednesday, 2/28/2007
In-Reply-To: <009301c75a3e$7165aef0$05c8a8c0@DIEGO>
References: <3D84A59A1AD3584DA02AEAD240E8863F03BC471E@ES22SNLNT.srn.sandia.gov>
	<B53C816F-F33B-4A88-98BE-E7F20AF6833B@cisco.com>
	<009301c75a3e$7165aef0$05c8a8c0@DIEGO>
Message-ID: <FF3ECCDD-D976-46EF-9F97-644B94C6E32D@cisco.com>

On Feb 27, 2007, at 2:10 AM, Diego Guella wrote:

> Do I need to do anything to be subscribed to the new mailing list, or  
> will I be subscribed automatically?

There is nothing that you need to do; the list is simply being  
migrated from one server to another and changing names in the process.

> The only change is that I have to write messages to  
> general at lists.openfabrics.org, correct?

Correct.  There will be aliases in place to redirect messages from  
the old name to the new name, too.  So the warning is more about  
updating e-mail client filters, etc.


>
>
>
> ----- Original Message ----- From: "Jeff Squyres" <jsquyres at cisco.com>
> To: "OpenFabrics General" <openib-general at openib.org>
> Sent: Monday, February 26, 2007 6:05 PM
> Subject: [openib-general] Fwd: Address List Change Now Scheduled  
> for Wednesday, 2/28/2007
>
>
>> FYI.  In case you missed it the Nth time: THIS LIST IS CHANGING ON
>> WEDNESDAY 2/28/2007 (2 days from now).  Really.  For sure this time.
>> Trust me.  Honest.
>>
>> Please update your addressbooks!
>>
>>
>>
>> Begin forwarded message:
>>
>>> From: "Lee, Michael Paichi" <mplee at sandia.gov>
>>> Date: February 22, 2007 11:44:25 AM EST
>>> To: "Jeff Squyres" <jsquyres at cisco.com>, "Michael S. Tsirkin"
>>> <mst at mellanox.co.il>
>>> Cc: "OpenFabrics General" <openib-general at openib.org>
>>> Subject: Address List Change Now Scheduled for Wednesday, 2/28/2007
>>>
>>> The list will now be migrated on Wednesday, 2/28/2007.
>>>
>>> List address:         general at lists.openfabrics.org
>>> Updated change-date:  Wednesday, 2/28/2007
>>>
>>> Michael
>>
>>
>> -- 
>> Jeff Squyres
>> Server Virtualization Business Unit
>> Cisco Systems
>>
>>


-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


From cppbala at yahoo.com  Tue Feb 27 03:30:48 2007
From: cppbala at yahoo.com (Bala)
Date: Tue, 27 Feb 2007 03:30:48 -0800 (PST)
Subject: [openib-general] ib0 shows MAC address as 00-00-00.... is it
	normal??
Message-ID: <87194.51250.qm@web35102.mail.mud.yahoo.com>

Hi All,
       We have built and installed OFED-1.1 on
RHEL-4. Using IPoIB, we set the IPs
for the interfaces and they are able to ping each other,
but ifconfig shows the ib0 MAC address as
shown below:
"00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00"

--------------
ib0       Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          inet addr:192.168.0.1  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:271465 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1444336 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:15664386 (14.9 MiB)  TX bytes:2718736764 (2.5 GiB)
-------------------

Please let me know whether this is normal, and whether there is any way
to get the real hw/MAC address.

regards,
Bala.




From cppbala at yahoo.com  Tue Feb 27 03:35:03 2007
From: cppbala at yahoo.com (Bala)
Date: Tue, 27 Feb 2007 03:35:03 -0800 (PST)
Subject: [openib-general] mpi over IB
Message-ID: <656981.21379.qm@web35115.mail.mud.yahoo.com>

Hi All,
       We have built and installed OFED-1.1
on RHEL-4 machines and selected MPI support
while compiling. Please shed some light on how
to use MPI over the IB interface, which
modules to use, etc. Or do we need to install
separate MPI software?

thanks in advance,
-bala-




From jsquyres at cisco.com  Tue Feb 27 03:43:36 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Tue, 27 Feb 2007 06:43:36 -0500
Subject: [openib-general] mpi over IB
In-Reply-To: <656981.21379.qm@web35115.mail.mud.yahoo.com>
References: <656981.21379.qm@web35115.mail.mud.yahoo.com>
Message-ID: <E9DBD764-273A-4A19-A28A-4A4E7CBFAACA@cisco.com>

During the installation process, the OFED installer should have asked  
you if you wanted to install Open MPI and/or MVAPICH.  Both of these  
MPI implementations are capable of communicating natively over the IB  
interface.

Running MPI applications with Open MPI should natively choose the IB  
interface at run time if your IB network is up and running properly  
(e.g., try running ibv_devinfo to ensure that ports are listed in the  
PORT_ACTIVE state, etc.).  I assume that the same is true with  
MVAPICH as well.


On Feb 27, 2007, at 6:35 AM, Bala wrote:

> Hi All,
>        We have built and installed OFED-1.1
> on RHEL-4 machines and selected MPI support
> while compiling. Please shed some light on how
> to use MPI over the IB interface, which
> modules to use, etc. Or do we need to install
> separate MPI software?
>
> thanks in advance,
> -bala-
>
>
>


-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


From halr at voltaire.com  Tue Feb 27 03:38:08 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 27 Feb 2007 06:38:08 -0500
Subject: [openib-general] ib0 shows MAC address as 00-00-00.... is it
 normal??
In-Reply-To: <87194.51250.qm@web35102.mail.mud.yahoo.com>
References: <87194.51250.qm@web35102.mail.mud.yahoo.com>
Message-ID: <1172576284.4102.346987.camel@hal.voltaire.com>

On Tue, 2007-02-27 at 06:30, Bala wrote:
> Hi All,
>        We have built and installed OFED-1.1 on
> RHEL-4. Using IPoIB, we set the IPs
> for the interfaces and they are able to ping each other,
> but ifconfig shows the ib0 MAC address as
> shown below:
> "00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00"
> 
> --------------
> ib0       Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>           inet addr:192.168.0.1  Bcast:192.168.0.255  Mask:255.255.255.0
>           UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
>           RX packets:271465 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:1444336 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:128
>           RX bytes:15664386 (14.9 MiB)  TX bytes:2718736764 (2.5 GiB)
> -------------------
> 
> pls let me know is it normal,

Depends on the (truncated) guid for the HCA port.

>  is there any way
> to get the real hw/mac address.

ip addr show ib0
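
Once the full 20-byte link-layer address is in hand, it can be split
into its parts. The sketch below is a user-space illustration and
assumes the layout implied by the IPOIB_QPN(n->ha) and n->ha + 4 uses
in the ipoib code: one reserved/flags byte, a 24-bit QPN, then the
16-byte port GID. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_IB_ALEN 20	/* assumed IPoIB hardware address length */
#define SKETCH_GID_LEN 16	/* assumed GID length */

struct sketch_hwaddr {
	uint32_t qpn;			/* low 24 bits of the first 4 bytes */
	uint8_t  gid[SKETCH_GID_LEN];	/* bytes 4..19 of the address */
};

/* Split a raw 20-byte IPoIB address into QPN and GID. */
void sketch_split_hwaddr(const uint8_t *ha, struct sketch_hwaddr *out)
{
	assert(ha != NULL && out != NULL);
	/* The first word is big-endian; byte 0 is dropped, leaving 24 bits. */
	out->qpn = ((uint32_t)ha[1] << 16) | ((uint32_t)ha[2] << 8) | ha[3];
	memcpy(out->gid, ha + 4, SKETCH_GID_LEN);
}
```

The lower 8 bytes of the GID carry the port GUID, which is the closest
thing an IB port has to a "real" hardware address.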

-- Hal

> regards,
> Bala.
> 
> 
>  
> 


From monis at voltaire.com  Tue Feb 27 03:54:59 2007
From: monis at voltaire.com (Moni Shoua)
Date: Tue, 27 Feb 2007 13:54:59 +0200
Subject: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support
	to IPoIB
In-Reply-To: <20070227060245.GI12919@mellanox.co.il>
References: <45E313D2.70909@voltaire.com>
	<20070227060245.GI12919@mellanox.co.il>
Message-ID: <45E41C13.8090300@voltaire.com>


Thanks for the comments 

>> To fix it, this patch adds a dev field to struct ipoib_neigh which is used
>> instead of the struct neighbour dev one.
> 
> It seems that in this design, if multiple ipoib interfaces are present, we might
> get an skb such that skb->dev will be different from the new dev field in struct
> ipoib_neigh.
> 
> It seems that the result will be that the packet will be sent on a wrong interface.
> Right?
> 
I don't see how. The dev field in ipoib_neigh doesn't take part in interface selection.
As I see it, the skb travels this path:
1. It is passed to bond_dev->hard_start_xmit.
2. bond_dev->hard_start_xmit chooses the current active interface, changes skb->dev and re-queues the skb for transmission.

>> In addition, if an IPoIB device is removed before bonding is unloaded it may 
>> cause bond0 neighbours (neighbours that point to bond0) to exist after the IPoIB
>> device no longer exist. This is why a neighbour cleanup is required during device 
>> cleanup. This cleanup scans the arp cache and the ndisc cache to find there 
>> neighbours of bond0 which refer also to the relevant ibX. Also, when ib_ipoib module is
>> unloaded, the neighbour destructor must be set to NULL because the neighbour function is in
>> ib_ipoib.
>> For this neigh table cleanup, it is required to export the symbol nd_tbl just like the symbol arp_tbl is.
> 
> I wonder about this: is it really true that any allocated neighbour is always in
> either arp_tbl or nd_tbl? For example, could some code have called neigh_hold
> and retained a neighbour that is not in either one of these tables?
> 
My assumption that neighbours live in one of these 2 tables comes from observation and code reading.
I preferred that over keeping track of all ipoib_neighs and putting them in a list. However, I could 
do that instead of scanning the neigh_table. Do you think it's better?
As for the example, I didn't understand it. Could you please explain?

>> During my tests I found that when running 
>>
>> 	1. modprobe -r ib_mthca (to delete IPoIB interfaces)
>> 	2. ping somewhere on the subnet of bond0
>>
>> I get this stack dump (which ends with kernel death)
>> 	 [<ffffffff8037ff32>] skb_under_panic+0x5c/0x60
>> 	 [<ffffffff882e00c2>] :ib_ipoib:ipoib_hard_header+0xa6/0xc0
>> 	 [<ffffffff803c3c98>] arp_create+0x120/0x226
>> 	 [<ffffffff803c3dc3>] arp_send+0x25/0x3b
>> 	 [<ffffffff803c466a>] arp_solicit+0x186/0x195
>> 	 [<ffffffff8038c0ac>] neigh_timer_handler+0x2b5/0x309
>> 	 [<ffffffff8038bdf7>] neigh_timer_handler+0x0/0x309
>> 	 [<ffffffff80239599>] run_timer_softirq+0x130/0x19e
>> 	 [<ffffffff80235fcc>] __do_softirq+0x55/0xc3
>> 	 [<ffffffff8020acac>] call_softirq+0x1c/0x28
>> 	 [<ffffffff8020c02b>] do_softirq+0x2c/0x7d
>> 	 [<ffffffff8021864a>] smp_apic_timer_interrupt+0x57/0x6a
>> 	 [<ffffffff80208e19>] mwait_idle+0x0/0x45
>> 	 [<ffffffff8020a756>] apic_timer_interrupt+0x66/0x70
>> 	 <EOI>  [<ffffffff80208e5b>] mwait_idle+0x42/0x45
>> 	 [<ffffffff80208db1>] cpu_idle+0x8b/0xae
>> 	 [<ffffffff80217d60>] start_secondary+0x47f/0x48f
>>
>> The only way I found to avoid this (for now) is to check skb headroom in
>> ipoib_hard_header. I guess that this safety check doesn't harm regular IPoIB 
>> operation and it seems to solve my problem. However, I would be happy to hear what
>> others think of this last issue.
> 
> As I said, this seems to indicate a problem in the bonding code.
> But what will happen after you error out in ipoib_hard_header?
> Is the packet dropped? What might break as a result?
> 
I will check the hard_header_len issue in the bonding code more carefully. From first look
it seems that bonding does borrow the hard_header_len.
Also, my checks show that it is safe to return with error from hard_header().
For example,  in neigh_connected_output:

        err = dev->hard_header(skb, dev, ntohs(skb->protocol),
                               neigh->ha, NULL, skb->len);
        read_unlock_bh(&neigh->lock);
        if (err >= 0)
                err = neigh->ops->queue_xmit(skb);
        else {
                err = -EINVAL;
                kfree_skb(skb);
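
The headroom check discussed above can be modeled in user space. This
is only a sketch: struct sketch_buf and its helpers are hypothetical
stand-ins for struct sk_buff, skb_headroom() and skb_push(), and the
20-byte pseudo-header size is an assumption for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PHDR_LEN 20	/* assumed pseudo-header size */

struct sketch_buf {
	unsigned char storage[64];
	unsigned char *data;	/* start of the payload inside storage */
};

/* Bytes available in front of the payload. */
size_t sketch_headroom(const struct sketch_buf *b)
{
	return (size_t)(b->data - b->storage);
}

/*
 * Prepend a pseudo-header holding the destination address.
 * Returns -1 instead of running off the front of the buffer,
 * which is the case that ends in skb_under_panic in the kernel.
 */
int sketch_hard_header(struct sketch_buf *b, const unsigned char *daddr)
{
	assert(b != NULL && daddr != NULL);
	if (sketch_headroom(b) < SKETCH_PHDR_LEN)
		return -1;
	b->data -= SKETCH_PHDR_LEN;
	memcpy(b->data, daddr, SKETCH_PHDR_LEN);
	return 0;
}
```

A caller that sees the error simply drops the buffer, as
neigh_connected_output does with kfree_skb() in the excerpt above.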
 
>> I would really appreciate comments.
>>
>> thanks
>>
>>  -MoniS
> 


From monil at voltaire.com  Tue Feb 27 05:02:50 2007
From: monil at voltaire.com (Moni Levy)
Date: Tue, 27 Feb 2007 15:02:50 +0200
Subject: [openib-general] [RFC] IB/ipoib: Asynchronous events delivered
 without port parameter.
Message-ID: <6a122cc00702270502h27d90515k117bf23ea3f31f4d@mail.gmail.com>

Hello,
    I did a short code review of the ipoib code, concentrating on
partitioning support, and I noticed that the asynchronous event
handler in the ipoib code does not take the port number reported in
the event record into consideration. The effect is that all of
the ib# devices related to that specific HCA are flushed, when it seems
to me that only the ones on the relevant port should be. Is that done on
purpose, or am I missing something?
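
To make the question concrete, per-port filtering could look roughly
like the sketch below. It is a user-space model, not ipoib code: every
type and function name is hypothetical, and it only illustrates the
idea of comparing the event's port number with the interface's port
before scheduling a flush.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_event_type {
	SKETCH_PORT_ACTIVE,
	SKETCH_PORT_ERR,
	SKETCH_PKEY_CHANGE
};

struct sketch_event {
	enum sketch_event_type type;
	int port_num;		/* port the event was reported for */
};

struct sketch_intf {
	int port;		/* port this ib# interface is bound to */
	int flush_count;	/* flushes scheduled so far */
};

/*
 * Schedule a flush only on interfaces bound to the reporting port.
 * Returns the number of interfaces that were flushed.
 */
int sketch_dispatch_event(struct sketch_intf *intfs, size_t n,
			  const struct sketch_event *ev)
{
	size_t i;
	int flushed = 0;

	assert(intfs != NULL && ev != NULL);
	for (i = 0; i < n; i++) {
		if (intfs[i].port != ev->port_num)
			continue;	/* event belongs to another port */
		intfs[i].flush_count++;
		flushed++;
	}
	return flushed;
}
```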

Thanks,
Moni

p.s. I'm working on a patch that should solve another issue caused by
PKEY reordering & ipoib behavior and the above issue further
complicates things for me.


From mst at mellanox.co.il  Tue Feb 27 05:51:46 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 27 Feb 2007 15:51:46 +0200
Subject: [openib-general] [RFC] IB/ipoib: Asynchronous events delivered
 without port parameter.
In-Reply-To: <6a122cc00702270502h27d90515k117bf23ea3f31f4d@mail.gmail.com>
References: <6a122cc00702270502h27d90515k117bf23ea3f31f4d@mail.gmail.com>
Message-ID: <20070227135131.GA4437@mellanox.co.il>

> Quoting Moni Levy <monil at voltaire.com>:
> Subject: [RFC] IB/ipoib: Asynchronous events delivered without port parameter.
> 
> Hello,
>     I did a short code review of the ipoib code, concentrating on
> partitioning support, and I noticed that the asynchronous event
> handler in the ipoib code does not take the port number reported in
> the event record into consideration. The effect is that all of
> the ib# devices related to that specific HCA are flushed, when it seems
> to me that only the ones on the relevant port should be. Is that done on
> purpose, or am I missing something?
> 
> Thanks,
> Moni
> 
> p.s. I'm working on a patch that should solve another issue caused by
> PKEY reordering & ipoib behavior and the above issue further
> complicates things for me.

If true, why is this a problem?

-- 
MST


From swise at opengridcomputing.com  Tue Feb 27 06:23:30 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 27 Feb 2007 08:23:30 -0600
Subject: [openib-general] HOWTO check ofa_kernel build from your git tree
In-Reply-To: <1172502465.21382.44.camel@vladsk-laptop>
References: <1172502465.21382.44.camel@vladsk-laptop>
Message-ID: <1172586210.11870.16.camel@stevo-desktop>

Where are all the kernel src trees on ssh.openfabrics.org?

I would like to build against specific trees that are failing with
cxgb3...

Also:  

what RH distro ships:

linux-2.6.9-22.ELsmp

and

linux-2.6.9-34.ELsmp


Thanks,

Steve.


On Mon, 2007-02-26 at 17:07 +0200, Vladimir Sokolovsky wrote:
> On ssh.openfabrics.org:
> Run
> env git_url=/home/mst/scm/ofed_1_2_devel.git git_branch=ofed_1_2 \
> 	CHECK_LOCAL=yes \
> 	CHECK_KERNEL_ORG=yes \
> 	CHECK_CROSS=yes /home/vlad/scripts/build_ofa_kernel.sh
> 


From monil at voltaire.com  Tue Feb 27 06:29:56 2007
From: monil at voltaire.com (Moni Levy)
Date: Tue, 27 Feb 2007 16:29:56 +0200
Subject: [openib-general] [PATCHv2] IB/ipoib: Fix ipoib handling for pkey
	reordering
Message-ID: <45E44064.4020407@voltaire.com>

This issue was found during partitioning & SM fail over testing. The fix was tested over the weekend with pkey reshuffling, removal and addition every few seconds concurrent with OFED restart. The patch applies on Roland's git tree. 

Changes from v1: 
	* added a flush flag to ipoib_ib_dev_stop(), like the one ipoib_ib_dev_down() already has
	* fixed a bug in extracting the device from the work struct
	* removed some warnings that are triggered by a missing PKEY, since this now seems to be a valid flow.

SM reconfiguration or failover possibly causes a shuffling of the values in the port pkey table. The current implementation only queries for the index of the pkey once, when it creates the device QP and after that moves it into working state, and hence does not address this scenario. Fix this by using the PKEY_CHANGE event as a trigger to reconfigure the device QP.
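
The stale-index problem can be illustrated with a small user-space
model. This is a sketch under stated assumptions, not the patch itself:
struct sketch_qp and sketch_refresh_pkey_index are hypothetical names,
and P_Keys are compared by exact value for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_qp {
	uint16_t pkey;		/* partition key the interface uses */
	int pkey_index;		/* cached position in the port table */
};

/*
 * Re-resolve the cached index against the current P_Key table.
 * Returns 1 if the index moved (the QP needs reconfiguring),
 * 0 if it is still valid, and -1 if the P_Key is no longer present.
 */
int sketch_refresh_pkey_index(struct sketch_qp *qp,
			      const uint16_t *table, size_t len)
{
	size_t i;

	assert(qp != NULL && table != NULL);
	for (i = 0; i < len; i++) {
		if (table[i] != qp->pkey)
			continue;
		if ((int)i == qp->pkey_index)
			return 0;
		qp->pkey_index = (int)i;
		return 1;
	}
	return -1;
}
```

Calling such a helper from the PKEY_CHANGE path, rather than only once
at QP creation, is the idea the patch below implements with
ipoib_ib_dev_flush_restart_qp().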

Signed-off-by: Moni Levy <monil at voltaire.com>
---
 ipoib.h           |    4 +++-
 ipoib_ib.c        |   51 +++++++++++++++++++++++++++++++++++++++++----------
 ipoib_main.c      |    5 +++--
 ipoib_multicast.c |   11 ++++++-----
 ipoib_verbs.c     |    8 +++++++-
 5 files changed, 60 insertions(+), 19 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 2594db2..d08ecca 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -205,6 +205,7 @@ struct ipoib_dev_priv {
 	struct delayed_work pkey_task;
 	struct delayed_work mcast_task;
 	struct work_struct flush_task;
+	struct work_struct flush_restart_qp_task;
 	struct work_struct restart_task;
 	struct delayed_work ah_reap_task;
 
@@ -334,12 +335,13 @@ struct ipoib_dev_priv *ipoib_intf_alloc(
 
 int ipoib_ib_dev_init(struct net_device *dev, struct ib_device *ca, int port);
 void ipoib_ib_dev_flush(struct work_struct *work);
+void ipoib_ib_dev_flush_restart_qp(struct work_struct *work);
 void ipoib_ib_dev_cleanup(struct net_device *dev);
 
 int ipoib_ib_dev_open(struct net_device *dev);
 int ipoib_ib_dev_up(struct net_device *dev);
 int ipoib_ib_dev_down(struct net_device *dev, int flush);
-int ipoib_ib_dev_stop(struct net_device *dev);
+int ipoib_ib_dev_stop(struct net_device *dev, int flush);
 
 int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port);
 void ipoib_dev_cleanup(struct net_device *dev);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index f2aa923..b0287c1 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -415,21 +415,22 @@ int ipoib_ib_dev_open(struct net_device 
 
 	ret = ipoib_init_qp(dev);
 	if (ret) {
-		ipoib_warn(priv, "ipoib_init_qp returned %d\n", ret);
+		if (ret != -ENOENT)
+			ipoib_warn(priv, "ipoib_init_qp returned %d\n", ret);
 		return -1;
 	}
 
 	ret = ipoib_ib_post_receives(dev);
 	if (ret) {
 		ipoib_warn(priv, "ipoib_ib_post_receives returned %d\n", ret);
-		ipoib_ib_dev_stop(dev);
+		ipoib_ib_dev_stop(dev, 1);
 		return -1;
 	}
 
 	ret = ipoib_cm_dev_open(dev);
 	if (ret) {
 		ipoib_warn(priv, "ipoib_ib_post_receives returned %d\n", ret);
-		ipoib_ib_dev_stop(dev);
+		ipoib_ib_dev_stop(dev, 1);
 		return -1;
 	}
 
@@ -508,7 +509,7 @@ static int recvs_pending(struct net_devi
 	return pending;
 }
 
-int ipoib_ib_dev_stop(struct net_device *dev)
+int ipoib_ib_dev_stop(struct net_device *dev, int flush)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct ib_qp_attr qp_attr;
@@ -581,7 +582,8 @@ timeout:
 	/* Wait for all AHs to be reaped */
 	set_bit(IPOIB_STOP_REAPER, &priv->flags);
 	cancel_delayed_work(&priv->ah_reap_task);
-	flush_workqueue(ipoib_workqueue);
+	if (flush)
+		flush_workqueue(ipoib_workqueue);
 
 	begin = jiffies;
 
@@ -622,13 +624,17 @@ int ipoib_ib_dev_init(struct net_device 
 	return 0;
 }
 
-void ipoib_ib_dev_flush(struct work_struct *work)
+static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv, int restart_qp)
 {
-	struct ipoib_dev_priv *cpriv, *priv =
-		container_of(work, struct ipoib_dev_priv, flush_task);
+	struct ipoib_dev_priv *cpriv;
 	struct net_device *dev = priv->dev;
 
-	if (!test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) ) {
+	/*
+	 * ipoib_ib_dev_stop() below may not find the PKey and leave the
+	 * IPOIB_FLAG_INITIALIZED flag off so flush in that case with restart_qp
+	 * flag on is Ok.
+	 */
+	if (!test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) && !restart_qp) {
 		ipoib_dbg(priv, "Not flushing - IPOIB_FLAG_INITIALIZED not set.\n");
 		return;
 	}
@@ -641,6 +647,13 @@ void ipoib_ib_dev_flush(struct work_stru
 	ipoib_dbg(priv, "flushing\n");
 
 	ipoib_ib_dev_down(dev, 0);
+
+	if (restart_qp) {
+		ipoib_dbg(priv, "restarting the device QP\n");
+		if (test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) )
+			ipoib_ib_dev_stop(dev, 0);
+		ipoib_ib_dev_open(dev);
+	}
 
 	/*
 	 * The device could have been brought down between the start and when
@@ -655,11 +668,29 @@ void ipoib_ib_dev_flush(struct work_stru
 
 	/* Flush any child interfaces too */
 	list_for_each_entry(cpriv, &priv->child_intfs, list)
-		ipoib_ib_dev_flush(&cpriv->flush_task);
+		__ipoib_ib_dev_flush(cpriv, restart_qp);
 
 	mutex_unlock(&priv->vlan_mutex);
 }
 
+void ipoib_ib_dev_flush(struct work_struct *work)
+{
+	struct ipoib_dev_priv *priv =
+		container_of(work, struct ipoib_dev_priv, flush_task);
+	/* We only restart the QP in case of PKEY change event */
+	ipoib_dbg(priv, "Flushing %s\n", priv->dev->name);
+	__ipoib_ib_dev_flush(priv, 0);
+}
+
+void ipoib_ib_dev_flush_restart_qp(struct work_struct *work)
+{
+	struct ipoib_dev_priv *priv =
+		container_of(work, struct ipoib_dev_priv, flush_restart_qp_task);
+	/* We only restart the QP in case of PKEY change event */
+	ipoib_dbg(priv, "Flushing %s and restarting its QP\n", priv->dev->name);
+	__ipoib_ib_dev_flush(priv, 1);
+}
+
 void ipoib_ib_dev_cleanup(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 18d27fd..2eab846 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -107,7 +107,7 @@ int ipoib_open(struct net_device *dev)
 		return -EINVAL;
 
 	if (ipoib_ib_dev_up(dev)) {
-		ipoib_ib_dev_stop(dev);
+		ipoib_ib_dev_stop(dev, 1);
 		return -EINVAL;
 	}
 
@@ -152,7 +152,7 @@ static int ipoib_stop(struct net_device 
 	flush_workqueue(ipoib_workqueue);
 
 	ipoib_ib_dev_down(dev, 1);
-	ipoib_ib_dev_stop(dev);
+	ipoib_ib_dev_stop(dev, 1);
 
 	if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
 		struct ipoib_dev_priv *cpriv;
@@ -993,6 +993,7 @@ static void ipoib_setup(struct net_devic
 	INIT_DELAYED_WORK(&priv->pkey_task,    ipoib_pkey_poll);
 	INIT_DELAYED_WORK(&priv->mcast_task,   ipoib_mcast_join_task);
 	INIT_WORK(&priv->flush_task,   ipoib_ib_dev_flush);
+	INIT_WORK(&priv->flush_restart_qp_task, ipoib_ib_dev_flush_restart_qp);
 	INIT_WORK(&priv->restart_task, ipoib_mcast_restart_task);
 	INIT_DELAYED_WORK(&priv->ah_reap_task, ipoib_reap_ah);
 }
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index b303ce6..27d6fd4 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -232,9 +232,10 @@ static int ipoib_mcast_join_finish(struc
 		ret = ipoib_mcast_attach(dev, be16_to_cpu(mcast->mcmember.mlid),
 					 &mcast->mcmember.mgid);
 		if (ret < 0) {
-			ipoib_warn(priv, "couldn't attach QP to multicast group "
-				   IPOIB_GID_FMT "\n",
-				   IPOIB_GID_ARG(mcast->mcmember.mgid));
+			if (ret != -ENXIO) /* No PKEY found */
+				ipoib_warn(priv, "couldn't attach QP to multicast group "
+					   IPOIB_GID_FMT "\n",
+					   IPOIB_GID_ARG(mcast->mcmember.mgid));
 
 			clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags);
 			return ret;
@@ -312,7 +313,7 @@ ipoib_mcast_sendonly_join_complete(int s
 		status = ipoib_mcast_join_finish(mcast, &multicast->rec);
 
 	if (status) {
-		if (mcast->logcount++ < 20)
+		if (mcast->logcount++ < 20 && status != -ENXIO)
 			ipoib_dbg_mcast(netdev_priv(dev), "multicast join failed for "
 					IPOIB_GID_FMT ", status %d\n",
 					IPOIB_GID_ARG(mcast->mcmember.mgid), status);
@@ -416,7 +417,7 @@ static int ipoib_mcast_join_complete(int
 					", status %d\n",
 					IPOIB_GID_ARG(mcast->mcmember.mgid),
 					status);
-		} else {
+		} else if (status != -ENXIO) {
 			ipoib_warn(priv, "multicast join failed for "
 				   IPOIB_GID_FMT ", status %d\n",
 				   IPOIB_GID_ARG(mcast->mcmember.mgid),
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
index 3cb551b..d0384ea 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
@@ -52,8 +52,10 @@ int ipoib_mcast_attach(struct net_device
 	if (ib_find_cached_pkey(priv->ca, priv->port, priv->pkey, &pkey_index)) {
 		clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
 		ret = -ENXIO;
+		ipoib_dbg(priv, "PKEY %X not found\n", priv->pkey);
 		goto out;
 	}
+	ipoib_dbg(priv, "PKEY %X found at index %d\n", priv->pkey, pkey_index);
 	set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
 
 	/* set correct QKey for QP */
@@ -105,9 +107,11 @@ int ipoib_init_qp(struct net_device *dev
 	 */
 	ret = ib_find_cached_pkey(priv->ca, priv->port, priv->pkey, &pkey_index);
 	if (ret) {
+		ipoib_dbg(priv, "PKEY %X not found.\n", priv->pkey);
 		clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
 		return ret;
 	}
+	ipoib_dbg(priv, "PKEY %X found at index %d.\n", priv->pkey, pkey_index);
 	set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
 
 	qp_attr.qp_state = IB_QPS_INIT;
@@ -260,12 +264,14 @@ void ipoib_event(struct ib_event_handler
 		container_of(handler, struct ipoib_dev_priv, event_handler);
 
 	if (record->event == IB_EVENT_PORT_ERR    ||
-	    record->event == IB_EVENT_PKEY_CHANGE ||
 	    record->event == IB_EVENT_PORT_ACTIVE ||
 	    record->event == IB_EVENT_LID_CHANGE  ||
 	    record->event == IB_EVENT_SM_CHANGE   ||
 	    record->event == IB_EVENT_CLIENT_REREGISTER) {
 		ipoib_dbg(priv, "Port state change event\n");
 		queue_work(ipoib_workqueue, &priv->flush_task);
+	} else if (record->event == IB_EVENT_PKEY_CHANGE) {
+		ipoib_dbg(priv, "PKEY change event\n");
+		queue_work(ipoib_workqueue, &priv->flush_restart_qp_task);
 	}
 }
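
The flush rework in this patch follows a common pattern: two work handlers
recover the same private structure from different embedded work items and
hand off to one helper that takes a flag. A minimal, self-contained sketch
of that pattern, assuming toy types: toy_container_of, struct toy_work,
struct toy_priv and the two counters are illustrative names only, not the
kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for the kernel's container_of(). */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical work item; the real struct work_struct is richer. */
struct toy_work {
	int pending;
};

/* Hypothetical private data with one work item per flush flavour. */
struct toy_priv {
	struct toy_work flush_task;
	struct toy_work flush_restart_qp_task;
	int flushes;      /* how many flushes ran */
	int qp_restarts;  /* how many of them also restarted the QP */
};

/* Shared helper: the flag selects the extra QP restart step. */
void toy_do_flush(struct toy_priv *priv, int restart_qp)
{
	assert(priv != NULL);
	priv->flushes++;
	if (restart_qp)
		priv->qp_restarts++;
}

/* Handler for ordinary port events: flush only. */
void toy_flush(struct toy_work *work)
{
	toy_do_flush(toy_container_of(work, struct toy_priv, flush_task), 0);
}

/* Handler for P_Key change events: flush and restart the QP. */
void toy_flush_restart_qp(struct toy_work *work)
{
	toy_do_flush(toy_container_of(work, struct toy_priv,
				      flush_restart_qp_task), 1);
}
```

Each handler must name the member its work item is embedded in; passing
flush_restart_qp_task to toy_flush() would compute the wrong base pointer.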


From mst at mellanox.co.il  Tue Feb 27 06:51:14 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 27 Feb 2007 16:51:14 +0200
Subject: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support
	to IPoIB
In-Reply-To: <45E41C13.8090300@voltaire.com>
References: <45E313D2.70909@voltaire.com>
	<20070227060245.GI12919@mellanox.co.il> <45E41C13.8090300@voltaire.com>
Message-ID: <20070227145114.GC4437@mellanox.co.il>

> Quoting Moni Shoua <monis at voltaire.com>:
> Subject: Re: [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB
> 
> 
> Thanks for the comments 
> 
> >> To fix it, this patch adds a dev field to struct ipoib_neigh which is used
> >> instead of the struct neighbour dev one.
> > 
> > It seems that in this design, if multiple ipoib interfaces are present, we might
> > get an skb such that skb->dev will be different from the new dev field in struct
> > ipoib_neigh.
> > 
> > It seems that the result will be that the packet will be sent on a wrong interface.
> > Right?
> > 
> I don't see how. The field dev in ipoib_neigh doesn't take part in interface selection.
> As I see it, skb travels this path:
> 1. Passed to bond_dev->hard_start_xmit
> 2. bond_dev->hard_start_xmit chooses the current active interface, changes skb->dev and requeues it for transmission.

The ah field of ipoib_neigh holds a struct ib_ah *.
That address handle fixes important parameters which depend on both the
packet's source and destination interfaces.

I think the right thing might be to compare ipoib_neigh dev and skb->dev,
and destroy ipoib_neigh if these do not match.
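
A minimal sketch of the comparison suggested above, assuming toy stand-ins
for the kernel structures. The valid flag and toy_neigh_usable_for_skb() are
hypothetical names used for illustration; in the driver the mismatch branch
would free the ipoib_neigh and let the transmit path rebuild it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel types; only the fields used here. */
struct toy_net_device {
	int ifindex;
};

struct toy_skb {
	struct toy_net_device *dev;   /* egress device chosen for this packet */
};

struct toy_ipoib_neigh {
	struct toy_net_device *dev;   /* device the cached entry was built for */
	bool valid;                   /* hypothetical: false once destroyed */
};

/*
 * Return true if the cached neigh may be used for this skb.
 * On a device mismatch the entry is invalidated, which stands in for
 * destroying it so that it gets rebuilt for the new egress device.
 */
bool toy_neigh_usable_for_skb(struct toy_ipoib_neigh *neigh,
			      const struct toy_skb *skb)
{
	assert(neigh != NULL && skb != NULL);

	if (!neigh->valid)
		return false;
	if (neigh->dev != skb->dev) {
		neigh->valid = false;
		return false;
	}
	return true;
}
```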

> >> In addition, if an IPoIB device is removed before bonding is unloaded it may 
> >> cause bond0 neighbours (neighbours that point to bond0) to exist after the IPoIB
> >> device no longer exists. This is why a neighbour cleanup is required during device 
> >> cleanup. This cleanup scans the arp cache and the ndisc cache to find the 
> >> neighbours of bond0 which also refer to the relevant ibX. Also, when the ib_ipoib module is
> >> unloaded, the neighbour destructor must be set to NULL because the neighbour function is in
> >> ib_ipoib.
> >> For this neigh table cleanup, it is required to export the symbol nd_tbl just like the symbol arp_tbl is.
> > 
> > I wonder about this: is it really true that any allocated neighbour is always in
> > either arp_tbl or nd_tbl? For example, could some code have called neigh_hold
> > and retained a neighbour that is not in either one of these tables?
> > 
> I got the assumption about neighbours living in one of these 2 tables from
> observation and code reading.  I preferred that to keeping track of all
> ipoib_neighs and putting them in a list. However, I could do that instead of
> neigh_table scanning. Do you think it's better?

If some neighbours are not on any tables, it seems using our own lists
(e.g. lists we have in ipoib_path) is the only option, no?

> For the example... I didn't
> understand it. Could you please explain?

grep for neigh_hold. A neighbour is only destroyed when its reference count
drops to 0. If some code calls neigh_hold, it seems the neighbour could be
removed from the table while its destructor has not yet been called.
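
The window described above can be shown with a toy reference count. This is
a sketch under simplifying assumptions: struct toy_neigh and the toy_*
helpers are made-up names, there is no locking, and the table is reduced to
a flag that owns one reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy neighbour: the table itself owns one reference while in_table is set. */
struct toy_neigh {
	int  refcnt;
	bool in_table;
	bool destroyed;   /* set where the real destructor would run */
};

/* Take an extra reference, as neigh_hold() does in the kernel. */
void toy_neigh_hold(struct toy_neigh *n)
{
	assert(!n->destroyed);
	n->refcnt++;
}

/* Drop a reference; the destructor runs only when the count reaches 0. */
void toy_neigh_release(struct toy_neigh *n)
{
	assert(n->refcnt > 0);
	if (--n->refcnt == 0)
		n->destroyed = true;
}

/* Remove the entry from the table and drop the table's reference. */
void toy_table_remove(struct toy_neigh *n)
{
	assert(n->in_table);
	n->in_table = false;
	toy_neigh_release(n);
}
```

With one outside holder, toy_table_remove() leaves the entry out of the
table yet not destroyed, so a scan of the tables alone would miss it.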

> >> During my tests I found that when running 
> >>
> >> 	1. modprobe -r ib_mthca (to delete IPoIB interfaces)
> >> 	2. ping somewhere on the subnet of bond0
> >>
> >> I get this stack dump (which ends with kernel death)
> >> 	 [<ffffffff8037ff32>] skb_under_panic+0x5c/0x60
> >> 	 [<ffffffff882e00c2>] :ib_ipoib:ipoib_hard_header+0xa6/0xc0
> >> 	 [<ffffffff803c3c98>] arp_create+0x120/0x226
> >> 	 [<ffffffff803c3dc3>] arp_send+0x25/0x3b
> >> 	 [<ffffffff803c466a>] arp_solicit+0x186/0x195
> >> 	 [<ffffffff8038c0ac>] neigh_timer_handler+0x2b5/0x309
> >> 	 [<ffffffff8038bdf7>] neigh_timer_handler+0x0/0x309
> >> 	 [<ffffffff80239599>] run_timer_softirq+0x130/0x19e
> >> 	 [<ffffffff80235fcc>] __do_softirq+0x55/0xc3
> >> 	 [<ffffffff8020acac>] call_softirq+0x1c/0x28
> >> 	 [<ffffffff8020c02b>] do_softirq+0x2c/0x7d
> >> 	 [<ffffffff8021864a>] smp_apic_timer_interrupt+0x57/0x6a
> >> 	 [<ffffffff80208e19>] mwait_idle+0x0/0x45
> >> 	 [<ffffffff8020a756>] apic_timer_interrupt+0x66/0x70
> >> 	 <EOI>  [<ffffffff80208e5b>] mwait_idle+0x42/0x45
> >> 	 [<ffffffff80208db1>] cpu_idle+0x8b/0xae
> >> 	 [<ffffffff80217d60>] start_secondary+0x47f/0x48f
> >>
> >> The only way I found to avoid this (for now) is to check skb headroom in
> >> ipoib_hard_header. I guess that this safety check doesn't harm regular IPoIB 
> >> operation and it seems to solve my problem. However, I would be happy to hear what
> >> others think of this last issue.
> > 
> > As I said, this seems to indicate a problem in the bonding code.
> > But what will happen after you error out in ipoib_hard_header?
> > Is the packet dropped? What might break as a result?
> > 
> I will check the hard_header_len issue in the bonding code more carefully.
> From first look it seems that bonding does borrow the hard_header_len.

So where does a shorter message come from?

> Also,
> my checks show that it is safe to return with error from
> hard_header().  For example,  in neigh_connected_output:
> 
>         err = dev->hard_header(skb, dev, ntohs(skb->protocol),
>                                neigh->ha, NULL, skb->len);
>         read_unlock_bh(&neigh->lock);
>         if (err >= 0)
>                 err = neigh->ops->queue_xmit(skb);
>         else {
>                 err = -EINVAL;
>                 kfree_skb(skb);
>  
> >> I would really appreciate comments.
> >>
> >> thanks
> >>
> >>  -MoniS
> > 
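
The headroom check discussed above could look roughly like the following.
This is a sketch with a toy buffer type, not the driver code: struct
toy_skb, toy_skb_headroom() and toy_hard_header() are illustrative names,
and the negative return mirrors the error path quoted from
neigh_connected_output, where the caller drops the packet.

```c
#include <assert.h>
#include <stddef.h>

/* Toy packet buffer: head is the start of the allocation, data the payload. */
struct toy_skb {
	unsigned char *head;
	unsigned char *data;
};

/* Bytes available in front of the payload for prepending headers. */
size_t toy_skb_headroom(const struct toy_skb *skb)
{
	assert(skb->data >= skb->head);
	return (size_t)(skb->data - skb->head);
}

/*
 * Prepend hdr_len bytes of link-layer header.
 * Returns hdr_len on success, or -1 without touching the buffer when
 * there is not enough headroom (instead of running off the front).
 */
int toy_hard_header(struct toy_skb *skb, size_t hdr_len)
{
	if (toy_skb_headroom(skb) < hdr_len)
		return -1;
	skb->data -= hdr_len;
	return (int)hdr_len;
}
```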

-- 
MST


From mst at mellanox.co.il  Tue Feb 27 07:12:12 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 27 Feb 2007 17:12:12 +0200
Subject: [openib-general] [PATCHv2] IB/ipoib: Fix ipoib handling for
	pkey reordering
In-Reply-To: <45E44064.4020407@voltaire.com>
References: <45E44064.4020407@voltaire.com>
Message-ID: <20070227151212.GD4437@mellanox.co.il>

I just gave this a cursory glance.
A suggestion: would it not be much simpler to modify the QP from RTS to RTS on pkey
change?

> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> index f2aa923..b0287c1 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> @@ -415,21 +415,22 @@ int ipoib_ib_dev_open(struct net_device 
>  
>  	ret = ipoib_init_qp(dev);
>  	if (ret) {
> -		ipoib_warn(priv, "ipoib_init_qp returned %d\n", ret);
> +		if (ret != -ENOENT)
> +			ipoib_warn(priv, "ipoib_init_qp returned %d\n", ret);
>  		return -1;
>  	}
 
What's the reason for this?

> @@ -993,6 +993,7 @@ static void ipoib_setup(struct net_devic
>  	INIT_DELAYED_WORK(&priv->pkey_task,    ipoib_pkey_poll);
>  	INIT_DELAYED_WORK(&priv->mcast_task,   ipoib_mcast_join_task);
>  	INIT_WORK(&priv->flush_task,   ipoib_ib_dev_flush);
> +	INIT_WORK(&priv->flush_restart_qp_task, ipoib_ib_dev_flush_restart_qp);
>  	INIT_WORK(&priv->restart_task, ipoib_mcast_restart_task);
>  	INIT_DELAYED_WORK(&priv->ah_reap_task, ipoib_reap_ah);
>  }

Shorter name?

> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> index b303ce6..27d6fd4 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> @@ -232,9 +232,10 @@ static int ipoib_mcast_join_finish(struc
>  		ret = ipoib_mcast_attach(dev, be16_to_cpu(mcast->mcmember.mlid),
>  					 &mcast->mcmember.mgid);
>  		if (ret < 0) {
> -			ipoib_warn(priv, "couldn't attach QP to multicast group "
> -				   IPOIB_GID_FMT "\n",
> -				   IPOIB_GID_ARG(mcast->mcmember.mgid));
> +			if (ret != -ENXIO) /* No PKEY found */
> +				ipoib_warn(priv, "couldn't attach QP to multicast group "
> +					   IPOIB_GID_FMT "\n",
> +					   IPOIB_GID_ARG(mcast->mcmember.mgid));
>  
>  			clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags);
>  			return ret;
> @@ -312,7 +313,7 @@ ipoib_mcast_sendonly_join_complete(int s
>  		status = ipoib_mcast_join_finish(mcast, &multicast->rec);
>  
>  	if (status) {
> -		if (mcast->logcount++ < 20)
> +		if (mcast->logcount++ < 20 && status != -ENXIO)
>  			ipoib_dbg_mcast(netdev_priv(dev), "multicast join failed for "
>  					IPOIB_GID_FMT ", status %d\n",
>  					IPOIB_GID_ARG(mcast->mcmember.mgid), status);
> @@ -416,7 +417,7 @@ static int ipoib_mcast_join_complete(int
>  					", status %d\n",
>  					IPOIB_GID_ARG(mcast->mcmember.mgid),
>  					status);
> -		} else {
> +		} else if (status != -ENXIO) {
>  			ipoib_warn(priv, "multicast join failed for "
>  				   IPOIB_GID_FMT ", status %d\n",
>  				   IPOIB_GID_ARG(mcast->mcmember.mgid),
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
> index 3cb551b..d0384ea 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
> @@ -52,8 +52,10 @@ int ipoib_mcast_attach(struct net_device
>  	if (ib_find_cached_pkey(priv->ca, priv->port, priv->pkey, &pkey_index)) {
>  		clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
>  		ret = -ENXIO;
> +		ipoib_dbg(priv, "PKEY %X not found\n", priv->pkey);
>  		goto out;
>  	}
> +	ipoib_dbg(priv, "PKEY %X found at index %d\n", priv->pkey, pkey_index);
>  	set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
>  
>  	/* set correct QKey for QP */

Make it PKey or pkey: no text in uppercase in log messages please.

> @@ -105,9 +107,11 @@ int ipoib_init_qp(struct net_device *dev
>  	 */
>  	ret = ib_find_cached_pkey(priv->ca, priv->port, priv->pkey, &pkey_index);
>  	if (ret) {
> +		ipoib_dbg(priv, "PKEY %X not found.\n", priv->pkey);
>  		clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
>  		return ret;
>  	}
> +	ipoib_dbg(priv, "PKEY %X found at index %d.\n", priv->pkey, pkey_index);
>  	set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
>  
>  	qp_attr.qp_state = IB_QPS_INIT;

This is going a bit overboard on the number of debug messages.

> @@ -260,12 +264,14 @@ void ipoib_event(struct ib_event_handler
>  		container_of(handler, struct ipoib_dev_priv, event_handler);
>  
>  	if (record->event == IB_EVENT_PORT_ERR    ||
> -	    record->event == IB_EVENT_PKEY_CHANGE ||
>  	    record->event == IB_EVENT_PORT_ACTIVE ||
>  	    record->event == IB_EVENT_LID_CHANGE  ||
>  	    record->event == IB_EVENT_SM_CHANGE   ||
>  	    record->event == IB_EVENT_CLIENT_REREGISTER) {
>  		ipoib_dbg(priv, "Port state change event\n");
>  		queue_work(ipoib_workqueue, &priv->flush_task);
> +	} else if (record->event == IB_EVENT_PKEY_CHANGE) {
> +		ipoib_dbg(priv, "PKEY change event\n");
> +		queue_work(ipoib_workqueue, &priv->flush_restart_qp_task);
>  	}
>  }


-- 
MST


From rdreier at cisco.com  Tue Feb 27 07:30:44 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 27 Feb 2007 07:30:44 -0800
Subject: [openib-general] [PATCHv2] IB/ipoib: Fix ipoib handling for
	pkey reordering
In-Reply-To: <20070227151212.GD4437@mellanox.co.il> (Michael S.
	Tsirkin's message of "Tue, 27 Feb 2007 17:12:12 +0200")
References: <45E44064.4020407@voltaire.com>
	<20070227151212.GD4437@mellanox.co.il>
Message-ID: <adaodnfwrbv.fsf@cisco.com>

 > I just gave this a cursory glance.

I haven't really read it except to think "why is this so complicated"?

 > A suggestion: would it not be much simpler to modify the QP from RTS to RTS on pkey
 > change?

Changing the P_Key index is not allowed for RTS->RTS.  You would have
to modify the QP RTS->SQD, wait for the SQ to drain, then modify the
P_Key index with SQD->SQD, and finally go SQD->RTS.

 - R.
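
The sequence above can be modelled with a toy state machine. This is only a
sketch: it covers just the RTS and SQD states named in the message, the
toy_* names are made up, and the rule that the send queue must be drained
before the index changes is a simplifying assumption of this model.

```c
#include <assert.h>
#include <stddef.h>

enum toy_qp_state { TOY_QPS_RTS, TOY_QPS_SQD };

struct toy_qp {
	enum toy_qp_state state;
	unsigned pkey_index;
	unsigned sq_outstanding;   /* send work requests not yet completed */
};

/*
 * Move the QP to 'next'. A non-NULL new_pkey_index asks to change the
 * P_Key index as part of the transition, which this model accepts only
 * for SQD->SQD with a drained send queue. Returns 0 on success, -1 if
 * the request is rejected (the QP is then left untouched).
 */
int toy_modify_qp(struct toy_qp *qp, enum toy_qp_state next,
		  const unsigned *new_pkey_index)
{
	assert(qp != NULL);

	if (new_pkey_index) {
		if (qp->state != TOY_QPS_SQD || next != TOY_QPS_SQD)
			return -1;
		if (qp->sq_outstanding)
			return -1;
		qp->pkey_index = *new_pkey_index;
	}
	qp->state = next;
	return 0;
}

/* RTS->SQD, wait for drain, SQD->SQD with the new index, SQD->RTS. */
int toy_change_pkey_index(struct toy_qp *qp, unsigned idx)
{
	if (toy_modify_qp(qp, TOY_QPS_SQD, NULL))
		return -1;
	qp->sq_outstanding = 0;   /* stands in for waiting for the SQ to drain */
	if (toy_modify_qp(qp, TOY_QPS_SQD, &idx))
		return -1;
	return toy_modify_qp(qp, TOY_QPS_RTS, NULL);
}
```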


From mst at mellanox.co.il  Tue Feb 27 07:36:10 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 27 Feb 2007 17:36:10 +0200
Subject: [openib-general] [PATCHv2] IB/ipoib: Fix ipoib handling for
	pkey reordering
In-Reply-To: <adaodnfwrbv.fsf@cisco.com>
References: <20070227151212.GD4437@mellanox.co.il> <adaodnfwrbv.fsf@cisco.com>
Message-ID: <20070227153610.GI4437@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCHv2] IB/ipoib: Fix ipoib handling for pkey reordering
> 
>  > I just gave this a cursory glance.
> 
> I haven't really read it except to think "why is this so complicated"?
> 
>  > A suggestion: would it not be much simpler to modify the QP from RTS to RTS on pkey
>  > change?
> 
> Changing the P_Key index is not allowed for RTS->RTS.  You would have
> to modify the QP RTS->SQD, wait for the SQ to drain, then modify the
> P_Key index with SQD->SQD, and finally go SQD->RTS.

True, I misread the spec.

-- 
MST


From rdreier at cisco.com  Tue Feb 27 07:38:08 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 27 Feb 2007 07:38:08 -0800
Subject: [openib-general] [RFC] IB/ipoib: Asynchronous events delivered
 without port parameter.
In-Reply-To: <6a122cc00702270502h27d90515k117bf23ea3f31f4d@mail.gmail.com>
	(Moni Levy's message of "Tue, 27 Feb 2007 15:02:50 +0200")
References: <6a122cc00702270502h27d90515k117bf23ea3f31f4d@mail.gmail.com>
Message-ID: <adak5y3wqzj.fsf@cisco.com>

 >    I did a short code review of the ipoib code concentrating on
 > partitioning support and I noticed that the asynchronous events
 > handler in the ipoib code does not take the port number reported in
 > the event record into consideration. The effect of that is that all of
 > the ib# devices related to that specific HCA are flushed when it seems
 > to me that only the relevant port one should be. Is that done on
 > purpose, or am I missing something ?

I don't think there's any particular reason the code is that way
except for the oversight never being corrected.  But it looks trivial
to fix, like the patch below.  Does that look right to you?

 > p.s. I'm working on a patch that should solve another issue caused by
 > PKEY reordering & ipoib behavior and the above issue further
 > complicates things for me.

Why not fix the issue first then?

commit a27cbe878203076247c1b5287f5ab59ed143b560
Author: Roland Dreier <rolandd at cisco.com>
Date:   Tue Feb 27 07:37:49 2007 -0800

    IPoIB: Only handle async events for one port
    
    An asynchronous event carries the port number that the event occurred
    on, so there's no reason for an IPoIB interface to process an event
    associated with a different local HCA port.
    
    Signed-off-by: Roland Dreier <rolandd at cisco.com>

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
index 3cb551b..7f3ec20 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
@@ -259,12 +259,13 @@ void ipoib_event(struct ib_event_handler *handler,
 	struct ipoib_dev_priv *priv =
 		container_of(handler, struct ipoib_dev_priv, event_handler);
 
-	if (record->event == IB_EVENT_PORT_ERR    ||
-	    record->event == IB_EVENT_PKEY_CHANGE ||
-	    record->event == IB_EVENT_PORT_ACTIVE ||
-	    record->event == IB_EVENT_LID_CHANGE  ||
-	    record->event == IB_EVENT_SM_CHANGE   ||
-	    record->event == IB_EVENT_CLIENT_REREGISTER) {
+	if ((record->event == IB_EVENT_PORT_ERR    ||
+	     record->event == IB_EVENT_PKEY_CHANGE ||
+	     record->event == IB_EVENT_PORT_ACTIVE ||
+	     record->event == IB_EVENT_LID_CHANGE  ||
+	     record->event == IB_EVENT_SM_CHANGE   ||
+	     record->event == IB_EVENT_CLIENT_REREGISTER) &&
+	    record->element.port_num == priv->port) {
 		ipoib_dbg(priv, "Port state change event\n");
 		queue_work(ipoib_workqueue, &priv->flush_task);
 	}
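
The condition added by the patch can be exercised in isolation with a small
model. The enum, struct and function below are toy stand-ins invented for
this sketch (only three of the six port events are listed); they are not
the ib_verbs definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy subset of asynchronous events. */
enum toy_event {
	TOY_EVENT_PORT_ERR,
	TOY_EVENT_PKEY_CHANGE,
	TOY_EVENT_PORT_ACTIVE,
	TOY_EVENT_QP_FATAL      /* an event the handler does not flush on */
};

struct toy_event_record {
	enum toy_event event;
	unsigned char  port_num;   /* port the event occurred on */
};

/* Flush only for port-related events on the interface's own port. */
bool toy_should_flush(const struct toy_event_record *rec,
		      unsigned char my_port)
{
	bool port_event;

	assert(rec != NULL);
	port_event = rec->event == TOY_EVENT_PORT_ERR    ||
		     rec->event == TOY_EVENT_PKEY_CHANGE ||
		     rec->event == TOY_EVENT_PORT_ACTIVE;

	return port_event && rec->port_num == my_port;
}
```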


From monil at voltaire.com  Tue Feb 27 07:44:29 2007
From: monil at voltaire.com (Moni Levy)
Date: Tue, 27 Feb 2007 17:44:29 +0200
Subject: [openib-general] [PATCHv2] IB/ipoib: Fix ipoib handling for
 pkey reordering
In-Reply-To: <adaodnfwrbv.fsf@cisco.com>
References: <45E44064.4020407@voltaire.com>
	<20070227151212.GD4437@mellanox.co.il> <adaodnfwrbv.fsf@cisco.com>
Message-ID: <6a122cc00702270744u43f55c37r15311ef8ba80f4f9@mail.gmail.com>

On 2/27/07, Roland Dreier <rdreier at cisco.com> wrote:
>  > I just gave this a cursory glance.
>
> I haven't really read it except to think "why is this so complicated"?

Do you mean the complexity of the patch or of the issue?

>
>  > A suggestion: would it not be much simpler to modify the QP from RTS to RTS on pkey
>  > change?
>
> Changing the P_Key index is not allowed for RTS->RTS.  You would have
> to modify the QP RTS->SQD, wait for the SQ to drain, then modify the
> P_Key index with SQD->SQD, and finally go SQD->RTS.

Do you think solving it that way would be a significant simplification?
We would still have to reuse the missed-completion handling that is
currently implemented in ipoib_ib_dev_stop, and we would still need an
additional work element.

-- Moni

>
>  - R.
>


From mst at mellanox.co.il  Tue Feb 27 07:44:19 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 27 Feb 2007 17:44:19 +0200
Subject: [openib-general] [RFC] IB/ipoib: Asynchronous events delivered
 without port parameter.
In-Reply-To: <adak5y3wqzj.fsf@cisco.com>
References: <6a122cc00702270502h27d90515k117bf23ea3f31f4d@mail.gmail.com>
	<adak5y3wqzj.fsf@cisco.com>
Message-ID: <20070227154419.GJ4437@mellanox.co.il>

> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [RFC] IB/ipoib: Asynchronous events delivered without port parameter.
> 
>  >    I did a short code review of the ipoib code concentrating on
>  > partitioning support and I noticed that the asynchronous events
>  > handler in the ipoib code does not take the port number reported in
>  > the event record into consideration. The effect of that is that all of
>  > the ib# devices related to that specific HCA are flushed when it seems
>  > to me that only the relevant port one should be. Is that done on
>  > purpose, or am I missing something ?
> 
> I don't think there's any particular reason the code is that way
> except for the oversight never being corrected.  But it looks trivial
> to fix, like the patch below.  Does that look right to you?
> 
>  > p.s. I'm working on a patch that should solve another issue caused by
>  > PKEY reordering & ipoib behavior and the above issue further
>  > complicates things for me.
> 
> Why not fix the issue first then?
> 
> commit a27cbe878203076247c1b5287f5ab59ed143b560
> Author: Roland Dreier <rolandd at cisco.com>
> Date:   Tue Feb 27 07:37:49 2007 -0800
> 
>     IPoIB: Only handle async events for one port
>     
>     An asynchronous event carries the port number that the event occurred
>     on, so there's no reason for an IPoIB interface to process an event
>     associated with a different local HCA port.
>     
>     Signed-off-by: Roland Dreier <rolandd at cisco.com>
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
> index 3cb551b..7f3ec20 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
> @@ -259,12 +259,13 @@ void ipoib_event(struct ib_event_handler *handler,
>  	struct ipoib_dev_priv *priv =
>  		container_of(handler, struct ipoib_dev_priv, event_handler);
>  
> -	if (record->event == IB_EVENT_PORT_ERR    ||
> -	    record->event == IB_EVENT_PKEY_CHANGE ||
> -	    record->event == IB_EVENT_PORT_ACTIVE ||
> -	    record->event == IB_EVENT_LID_CHANGE  ||
> -	    record->event == IB_EVENT_SM_CHANGE   ||
> -	    record->event == IB_EVENT_CLIENT_REREGISTER) {
> +	if ((record->event == IB_EVENT_PORT_ERR    ||
> +	     record->event == IB_EVENT_PKEY_CHANGE ||
> +	     record->event == IB_EVENT_PORT_ACTIVE ||
> +	     record->event == IB_EVENT_LID_CHANGE  ||
> +	     record->event == IB_EVENT_SM_CHANGE   ||
> +	     record->event == IB_EVENT_CLIENT_REREGISTER) &&
> +	    record->element.port_num == priv->port) {
>  		ipoib_dbg(priv, "Port state change event\n");
>  		queue_work(ipoib_workqueue, &priv->flush_task);
>  	}

Looks good.

-- 
MST


From monil at voltaire.com  Tue Feb 27 07:47:59 2007
From: monil at voltaire.com (Moni Levy)
Date: Tue, 27 Feb 2007 17:47:59 +0200
Subject: [openib-general] [RFC] IB/ipoib: Asynchronous events delivered
 without port parameter.
In-Reply-To: <adak5y3wqzj.fsf@cisco.com>
References: <6a122cc00702270502h27d90515k117bf23ea3f31f4d@mail.gmail.com>
	<adak5y3wqzj.fsf@cisco.com>
Message-ID: <6a122cc00702270747l26c1adavd57de9ba2d9a472b@mail.gmail.com>

On 2/27/07, Roland Dreier <rdreier at cisco.com> wrote:
>  >    I did a short code review of the ipoib code concentrating on
>  > partitioning support and I noticed that the asynchronous events
>  > handler in the ipoib code does not take the port number reported in
>  > the event record into consideration. The effect of that is that all of
>  > the ib# devices related to that specific HCA are flushed when it seems
>  > to me that only the relevant port one should be. Is that done on
>  > purpose, or am I missing something ?
>
> I don't think there's any particular reason the code is that way
> except for the oversight never being corrected.  But it looks trivial
> to fix, like the patch below.  Does that look right to you?
>
>  > p.s. I'm working on a patch that should solve another issue caused by
>  > PKEY reordering & ipoib behavior and the above issue further
>  > complicates things for me.
>
> Why not fix the issue first then?
>
> commit a27cbe878203076247c1b5287f5ab59ed143b560
> Author: Roland Dreier <rolandd at cisco.com>
> Date:   Tue Feb 27 07:37:49 2007 -0800
>
>     IPoIB: Only handle async events for one port
>
>     An asynchronous event carries the port number that the event occurred
>     on, so there's no reason for an IPoIB interface to process an event
>     associated with a different local HCA port.
>
>     Signed-off-by: Roland Dreier <rolandd at cisco.com>
>
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
> index 3cb551b..7f3ec20 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
> @@ -259,12 +259,13 @@ void ipoib_event(struct ib_event_handler *handler,
>         struct ipoib_dev_priv *priv =
>                 container_of(handler, struct ipoib_dev_priv, event_handler);
>
> -       if (record->event == IB_EVENT_PORT_ERR    ||
> -           record->event == IB_EVENT_PKEY_CHANGE ||
> -           record->event == IB_EVENT_PORT_ACTIVE ||
> -           record->event == IB_EVENT_LID_CHANGE  ||
> -           record->event == IB_EVENT_SM_CHANGE   ||
> -           record->event == IB_EVENT_CLIENT_REREGISTER) {
> +       if ((record->event == IB_EVENT_PORT_ERR    ||
> +            record->event == IB_EVENT_PKEY_CHANGE ||
> +            record->event == IB_EVENT_PORT_ACTIVE ||
> +            record->event == IB_EVENT_LID_CHANGE  ||
> +            record->event == IB_EVENT_SM_CHANGE   ||
> +            record->event == IB_EVENT_CLIENT_REREGISTER) &&
> +           record->element.port_num == priv->port) {
>                 ipoib_dbg(priv, "Port state change event\n");
>                 queue_work(ipoib_workqueue, &priv->flush_task);
>         }
>

That's exactly what I intended to post.

--Moni


From rdreier at cisco.com  Tue Feb 27 07:47:10 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 27 Feb 2007 07:47:10 -0800
Subject: [openib-general] [PATCHv2] IB/ipoib: Fix ipoib handling for
 pkey reordering
In-Reply-To: <6a122cc00702270744u43f55c37r15311ef8ba80f4f9@mail.gmail.com>
	(Moni Levy's message of "Tue, 27 Feb 2007 17:44:29 +0200")
References: <45E44064.4020407@voltaire.com>
	<20070227151212.GD4437@mellanox.co.il> <adaodnfwrbv.fsf@cisco.com>
	<6a122cc00702270744u43f55c37r15311ef8ba80f4f9@mail.gmail.com>
Message-ID: <adafy8rwqkh.fsf@cisco.com>

 > > I haven't really read it except to think "why is this so complicated"?
 > 
 > Do you mean the complexity of the patch or of the issue?

the patch.

 > > Changing the P_Key index is not allowed for RTS->RTS.  You would have
 > > to modify the QP RTS->SQD, wait for the SQ to drain, then modify the
 > > P_Key index with SQD->SQD, and finally go SQD->RTS.
 > 
 > Do you think solving it that way would be a significant simplification?
 > We would still have to reuse the missed-completion handling that is
 > currently implemented in ipoib_ib_dev_stop, and we would still need an
 > additional work element.

no, I don't think SQD is really useful in practice.


From vlad at dev.mellanox.co.il  Tue Feb 27 07:48:03 2007
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Tue, 27 Feb 2007 17:48:03 +0200
Subject: [openib-general] HOWTO check ofa_kernel build from your git tree
In-Reply-To: <1172586210.11870.16.camel@stevo-desktop>
References: <1172502465.21382.44.camel@vladsk-laptop>
	<1172586210.11870.16.camel@stevo-desktop>
Message-ID: <1172591283.21382.84.camel@vladsk-laptop>

On Tue, 2007-02-27 at 08:23 -0600, Steve Wise wrote:
> Where are all the kernel src trees on ssh.openfabrics.org?
> 
> I would like to build against specific trees that are failing with
> cxgb3...
> 
/home/vlad/kernel.org/<arch>/<kernel>

> Also:  
> 
> what RH distro ships:
> 
> linux-2.6.9-22.ELsmp
> 
RHEL4.0U2
> 
> and
> 
> linux-2.6.9-34.ELsmp
> 
RHEL4.0U3
> 
> Thanks,
> 
> Steve.
> 
> 
> 
> On Mon, 2007-02-26 at 17:07 +0200, Vladimir Sokolovsky wrote:
> > On ssh.openfabrics.org:
> > Run
> > env git_url=/home/mst/scm/ofed_1_2_devel.git git_branch=ofed_1_2 \
> > 	CHECK_LOCAL=yes \
> > 	CHECK_KERNEL_ORG=yes \
> > 	CHECK_CROSS=yes /home/vlad/scripts/build_ofa_kernel.sh
> > 


From monil at voltaire.com  Tue Feb 27 07:52:09 2007
From: monil at voltaire.com (Moni Levy)
Date: Tue, 27 Feb 2007 17:52:09 +0200
Subject: [openib-general] [PATCHv2] IB/ipoib: Fix ipoib handling for
 pkey reordering
In-Reply-To: <adafy8rwqkh.fsf@cisco.com>
References: <45E44064.4020407@voltaire.com>
	<20070227151212.GD4437@mellanox.co.il> <adaodnfwrbv.fsf@cisco.com>
	<6a122cc00702270744u43f55c37r15311ef8ba80f4f9@mail.gmail.com>
	<adafy8rwqkh.fsf@cisco.com>
Message-ID: <6a122cc00702270752i391a9e90ubf70569993f1f6d1@mail.gmail.com>

On 2/27/07, Roland Dreier <rdreier at cisco.com> wrote:
>  > > I haven't really read it except to think "why is this so complicated"?
>  >
>  > Do you mean the complexity of the patch or of the issue?
>
> the patch.

Please advise and I'll change it.

>
>  > > Changing the P_Key index is not allowed for RTS->RTS.  You would have
>  > > to modify the QP RTS->SQD, wait for the SQ to drain, then modify the
>  > > P_Key index with SQD->SQD, and finally go SQD->RTS.
>  >
>  > Do you think solving it that way would be a significant
>  > simplification? We would still have to reuse the handling for missed
>  > completions that is currently implemented in ipoib_ib_dev_stop, and
>  > we would still need an additional work element.
>
> no, I don't think SQD is really useful in practice.
>
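
The sequence Roland describes above can be written down as a small
user-space model. This is only a sketch: the type and function names are
made up for illustration and are not the verbs API, and only the two rules
stated in the thread are encoded (a P_Key index change is rejected for
RTS->RTS and accepted for SQD->SQD). Every other transition is
conservatively treated as not allowing the change.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two QP states discussed in the thread. */
enum qp_state_model { QPS_RTS, QPS_SQD };

struct qp_step {
	enum qp_state_model from, to;
	bool sets_pkey_index;
};

/* Only SQD->SQD is modeled as allowing a P_Key index change. */
bool pkey_index_change_allowed(enum qp_state_model from,
			       enum qp_state_model to)
{
	return from == QPS_SQD && to == QPS_SQD;
}

/*
 * Fill 'steps' with the three modify-QP steps needed to change the P_Key
 * index of a QP that starts in RTS.  Returns the number of steps written,
 * or -1 if 'max_steps' is too small.
 */
int plan_pkey_index_change(struct qp_step *steps, int max_steps)
{
	const struct qp_step plan[] = {
		{ QPS_RTS, QPS_SQD, false },	/* then wait for the SQ to drain */
		{ QPS_SQD, QPS_SQD, true },	/* the P_Key index changes here */
		{ QPS_SQD, QPS_RTS, false },
	};
	const int n = (int)(sizeof(plan) / sizeof(plan[0]));
	int i;

	if (max_steps < n)
		return -1;
	for (i = 0; i < n; i++) {
		/* A step may touch the P_Key index only where that is allowed. */
		assert(!plan[i].sets_pkey_index ||
		       pkey_index_change_allowed(plan[i].from, plan[i].to));
		steps[i] = plan[i];
	}
	return n;
}
```

The model says nothing about how long the drain takes or how missed
completions are handled, which is the part Moni expects to remain.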


From swise at opengridcomputing.com  Tue Feb 27 07:59:53 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 27 Feb 2007 09:59:53 -0600
Subject: [openib-general] [PATCH  0/6] ofed_1_2: cxgb3 bug fixes
Message-ID: <20070227155953.21615.96154.stgit@dell3.ogc.int>


Hey Vlad,

These fixes need to be pulled into ofed_1_2 for the Chelsio Ethernet
driver.

You can pull them directly from my ofa git tree:

git://staging.openfabrics.org/~swise/ofed_1_2 cxgb3_fixes

Thanks,

Steve.


From swise at opengridcomputing.com  Tue Feb 27 07:59:55 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 27 Feb 2007 09:59:55 -0600
Subject: [openib-general] [PATCH 1/6] sysfs attributes are now managed per
 port, no longer per adapter.
In-Reply-To: <20070227155953.21615.96154.stgit@dell3.ogc.int>
References: <20070227155953.21615.96154.stgit@dell3.ogc.int>
Message-ID: <20070227155955.21615.20784.stgit@dell3.ogc.int>


sysfs attributes are now managed per port, no longer per adapter.

Signed-off-by: Divy Le Ray <divy at chelsio.com>
---

 drivers/net/cxgb3/cxgb3_main.c |   21 ++++++++++++---------
 1 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c
index dfa035a..638b0ab 100755
--- a/drivers/net/cxgb3/cxgb3_main.c
+++ b/drivers/net/cxgb3/cxgb3_main.c
@@ -435,26 +435,24 @@ static int setup_sge_qsets(struct adapte
 }
 
 static ssize_t attr_show(struct class_device *cd, char *buf,
-			 ssize_t(*format) (struct adapter *, char *))
+			 ssize_t(*format) (struct net_device *, char *))
 {
 	ssize_t len;
-	struct adapter *adap = to_net_dev(cd)->priv;
 
 	/* Synchronize with ioctls that may shut down the device */
 	rtnl_lock();
-	len = (*format) (adap, buf);
+	len = (*format) (to_net_dev(cd), buf);
 	rtnl_unlock();
 	return len;
 }
 
 static ssize_t attr_store(struct class_device *cd, const char *buf, size_t len,
-			  ssize_t(*set) (struct adapter *, unsigned int),
+			  ssize_t(*set) (struct net_device *, unsigned int),
 			  unsigned int min_val, unsigned int max_val)
 {
 	char *endp;
 	ssize_t ret;
 	unsigned int val;
-	struct adapter *adap = to_net_dev(cd)->priv;
 
 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
@@ -464,7 +462,7 @@ static ssize_t attr_store(struct class_d
 		return -EINVAL;
 
 	rtnl_lock();
-	ret = (*set) (adap, val);
+	ret = (*set) (to_net_dev(cd), val);
 	if (!ret)
 		ret = len;
 	rtnl_unlock();
@@ -472,8 +470,9 @@ static ssize_t attr_store(struct class_d
 }
 
 #define CXGB3_SHOW(name, val_expr) \
-static ssize_t format_##name(struct adapter *adap, char *buf) \
+static ssize_t format_##name(struct net_device *dev, char *buf) \
 { \
+	struct adapter *adap = dev->priv; \
 	return sprintf(buf, "%u\n", val_expr); \
 } \
 static ssize_t show_##name(struct class_device *cd, char *buf) \
@@ -481,8 +480,10 @@ static ssize_t show_##name(struct class_
 	return attr_show(cd, buf, format_##name); \
 }
 
-static ssize_t set_nfilters(struct adapter *adap, unsigned int val)
+static ssize_t set_nfilters(struct net_device *dev, unsigned int val)
 {
+	struct adapter *adap = dev->priv;
+
 	if (adap->flags & FULL_INIT_DONE)
 		return -EBUSY;
 	if (val && adap->params.rev == 0)
@@ -499,8 +500,10 @@ static ssize_t store_nfilters(struct cla
 	return attr_store(cd, buf, len, set_nfilters, 0, ~0);
 }
 
-static ssize_t set_nservers(struct adapter *adap, unsigned int val)
+static ssize_t set_nservers(struct net_device *dev, unsigned int val)
 {
+	struct adapter *adap = dev->priv;
+
 	if (adap->flags & FULL_INIT_DONE)
 		return -EBUSY;
 	if (val > t3_mc5_size(&adap->mc5) - adap->params.mc5.nfilters)
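
The shape of this change can be seen outside the kernel with a small model:
the generic show helper passes the port (net_device) through, and each
formatter derives the adapter from it. A minimal sketch, assuming stand-in
types with a _model suffix (hypothetical names, not the driver's) and with
the rtnl locking left out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-ins for the kernel types; only the fields used here are modeled. */
struct adapter_model { unsigned int nfilters; };
struct net_device_model { void *priv; };

typedef int (*format_fn)(struct net_device_model *, char *, size_t);

/*
 * Per-port formatter: derives the adapter from the port, as the patched
 * CXGB3_SHOW macro does with dev->priv.
 */
int format_nfilters(struct net_device_model *dev, char *buf, size_t len)
{
	struct adapter_model *adap = dev->priv;

	return snprintf(buf, len, "%u\n", adap->nfilters);
}

/*
 * Generic show helper: hands the port to the formatter unchanged.  The
 * driver also holds rtnl_lock() around the callback; that is omitted here.
 */
int attr_show_model(struct net_device_model *dev, char *buf, size_t len,
		    format_fn format)
{
	assert(dev != NULL && format != NULL);
	return format(dev, buf, len);
}
```

Keeping the adapter lookup inside the formatter means the helper no longer
needs to know what dev->priv points to.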


From swise at opengridcomputing.com  Tue Feb 27 07:59:57 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 27 Feb 2007 09:59:57 -0600
Subject: [openib-general] [PATCH 2/6] Clean up some private ioctls.
In-Reply-To: <20070227155953.21615.96154.stgit@dell3.ogc.int>
References: <20070227155953.21615.96154.stgit@dell3.ogc.int>
Message-ID: <20070227155957.21615.98689.stgit@dell3.ogc.int>


Clean up some private ioctls.

Signed-off-by: Divy Le Ray <divy at chelsio.com>
---

 drivers/net/cxgb3/cxgb3_ioctl.h |   33 +++++++++------------------
 drivers/net/cxgb3/cxgb3_main.c  |   48 +++------------------------------------
 2 files changed, 15 insertions(+), 66 deletions(-)

diff --git a/drivers/net/cxgb3/cxgb3_ioctl.h b/drivers/net/cxgb3/cxgb3_ioctl.h
old mode 100755
new mode 100644
index a942818..0a82fcd
--- a/drivers/net/cxgb3/cxgb3_ioctl.h
+++ b/drivers/net/cxgb3/cxgb3_ioctl.h
@@ -36,28 +36,17 @@ #define __CHIOCTL_H__
  * Ioctl commands specific to this driver.
  */
 enum {
-	CHELSIO_SETREG = 1024,
-	CHELSIO_GETREG,
-	CHELSIO_SETTPI,
-	CHELSIO_GETTPI,
-	CHELSIO_GETMTUTAB,
-	CHELSIO_SETMTUTAB,
-	CHELSIO_GETMTU,
-	CHELSIO_SET_PM,
-	CHELSIO_GET_PM,
-	CHELSIO_GET_TCAM,
-	CHELSIO_SET_TCAM,
-	CHELSIO_GET_TCB,
-	CHELSIO_GET_MEM,
-	CHELSIO_LOAD_FW,
-	CHELSIO_GET_PROTO,
-	CHELSIO_SET_PROTO,
-	CHELSIO_SET_TRACE_FILTER,
-	CHELSIO_SET_QSET_PARAMS,
-	CHELSIO_GET_QSET_PARAMS,
-	CHELSIO_SET_QSET_NUM,
-	CHELSIO_GET_QSET_NUM,
-	CHELSIO_SET_PKTSCHED,
+	CHELSIO_GETMTUTAB 		= 1029,
+	CHELSIO_SETMTUTAB 		= 1030,
+	CHELSIO_SET_PM 			= 1032,
+	CHELSIO_GET_PM			= 1033,
+	CHELSIO_GET_MEM			= 1038,
+	CHELSIO_LOAD_FW			= 1041,
+	CHELSIO_SET_TRACE_FILTER	= 1044,
+	CHELSIO_SET_QSET_PARAMS		= 1045,
+	CHELSIO_GET_QSET_PARAMS		= 1046,
+	CHELSIO_SET_QSET_NUM		= 1047,
+	CHELSIO_GET_QSET_NUM		= 1048,
 };
 
 struct ch_reg {
diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c
old mode 100755
new mode 100644
index 638b0ab..0e84c4e
--- a/drivers/net/cxgb3/cxgb3_main.c
+++ b/drivers/net/cxgb3/cxgb3_main.c
@@ -1547,32 +1547,6 @@ static int cxgb_extension_ioctl(struct n
 		return -EFAULT;
 
 	switch (cmd) {
-	case CHELSIO_SETREG:{
-		struct ch_reg edata;
-
-		if (!capable(CAP_NET_ADMIN))
-			return -EPERM;
-		if (copy_from_user(&edata, useraddr, sizeof(edata)))
-			return -EFAULT;
-		if ((edata.addr & 3) != 0
-			|| edata.addr >= adapter->mmio_len)
-			return -EINVAL;
-		writel(edata.val, adapter->regs + edata.addr);
-		break;
-	}
-	case CHELSIO_GETREG:{
-		struct ch_reg edata;
-
-		if (copy_from_user(&edata, useraddr, sizeof(edata)))
-			return -EFAULT;
-		if ((edata.addr & 3) != 0
-			|| edata.addr >= adapter->mmio_len)
-			return -EINVAL;
-		edata.val = readl(adapter->regs + edata.addr);
-		if (copy_to_user(useraddr, &edata, sizeof(edata)))
-			return -EFAULT;
-		break;
-	}
 	case CHELSIO_SET_QSET_PARAMS:{
 		int i;
 		struct qset_params *q;
@@ -1836,10 +1810,10 @@ static int cxgb_extension_ioctl(struct n
 			return -EINVAL;
 
 		/*
-			* Version scheme:
-			* bits 0..9: chip version
-			* bits 10..15: chip revision
-			*/
+		 * Version scheme:
+		 * bits 0..9: chip version
+		 * bits 10..15: chip revision
+		 */
 		t.version = 3 | (adapter->params.rev << 10);
 		if (copy_to_user(useraddr, &t, sizeof(t)))
 			return -EFAULT;
@@ -1888,20 +1862,6 @@ static int cxgb_extension_ioctl(struct n
 						t.trace_rx);
 		break;
 	}
-	case CHELSIO_SET_PKTSCHED:{
-		struct ch_pktsched_params p;
-
-		if (!capable(CAP_NET_ADMIN))
-				return -EPERM;
-		if (!adapter->open_device_map)
-				return -EAGAIN;	/* uP and SGE must be running */
-		if (copy_from_user(&p, useraddr, sizeof(p)))
-				return -EFAULT;
-		send_pktsched_cmd(adapter, p.sched, p.idx, p.min, p.max,
-				  p.binding);
-		break;
-			
-	}
 	default:
 		return -EOPNOTSUPP;
 	}
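
One detail of the enum hunk is easy to miss: the explicit values given to
the surviving commands are not the values those names had under the old
implicit numbering. The sketch below reproduces both enums so the compiler
can do the counting. The OLD_ prefix and the ioctl_shift() helper are
hypothetical and exist only so both numberings can be compared in one file.
The thread does not say why the numbers changed.

```c
#include <assert.h>

/* Old numbering, taken from the removed lines; values are implicit. */
enum {
	OLD_CHELSIO_SETREG = 1024,
	OLD_CHELSIO_GETREG,
	OLD_CHELSIO_SETTPI,
	OLD_CHELSIO_GETTPI,
	OLD_CHELSIO_GETMTUTAB,
	OLD_CHELSIO_SETMTUTAB,
	OLD_CHELSIO_GETMTU,
	OLD_CHELSIO_SET_PM,
	OLD_CHELSIO_GET_PM,
	OLD_CHELSIO_GET_TCAM,
	OLD_CHELSIO_SET_TCAM,
	OLD_CHELSIO_GET_TCB,
	OLD_CHELSIO_GET_MEM,
	OLD_CHELSIO_LOAD_FW,
	OLD_CHELSIO_GET_PROTO,
	OLD_CHELSIO_SET_PROTO,
	OLD_CHELSIO_SET_TRACE_FILTER,
	OLD_CHELSIO_SET_QSET_PARAMS,
	OLD_CHELSIO_GET_QSET_PARAMS,
	OLD_CHELSIO_SET_QSET_NUM,
	OLD_CHELSIO_GET_QSET_NUM,
	OLD_CHELSIO_SET_PKTSCHED,
};

/* New numbering, copied from the added lines. */
enum {
	CHELSIO_GETMTUTAB		= 1029,
	CHELSIO_SETMTUTAB		= 1030,
	CHELSIO_SET_PM			= 1032,
	CHELSIO_GET_PM			= 1033,
	CHELSIO_GET_MEM			= 1038,
	CHELSIO_LOAD_FW			= 1041,
	CHELSIO_SET_TRACE_FILTER	= 1044,
	CHELSIO_SET_QSET_PARAMS		= 1045,
	CHELSIO_GET_QSET_PARAMS		= 1046,
	CHELSIO_SET_QSET_NUM		= 1047,
	CHELSIO_GET_QSET_NUM		= 1048,
};

/* How far a command number moved between the two enums. */
int ioctl_shift(int old_cmd, int new_cmd)
{
	assert(old_cmd >= OLD_CHELSIO_SETREG &&
	       old_cmd <= OLD_CHELSIO_SET_PKTSCHED);
	return new_cmd - old_cmd;
}
```

A user-space tool built against the old header would therefore issue
different command numbers than a driver built with this patch expects.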


From swise at opengridcomputing.com  Tue Feb 27 07:59:59 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 27 Feb 2007 09:59:59 -0600
Subject: [openib-general] [PATCH 3/6] Update FW version to 3.2
In-Reply-To: <20070227155953.21615.96154.stgit@dell3.ogc.int>
References: <20070227155953.21615.96154.stgit@dell3.ogc.int>
Message-ID: <20070227155959.21615.25648.stgit@dell3.ogc.int>


Update FW version to 3.2

Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---

 drivers/net/cxgb3/t3_hw.c   |    6 ++++--
 drivers/net/cxgb3/version.h |    2 ++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/cxgb3/t3_hw.c b/drivers/net/cxgb3/t3_hw.c
old mode 100755
new mode 100644
index 365a7f5..eaa7a2e
--- a/drivers/net/cxgb3/t3_hw.c
+++ b/drivers/net/cxgb3/t3_hw.c
@@ -884,11 +884,13 @@ int t3_check_fw_version(struct adapter *
 	major = G_FW_VERSION_MAJOR(vers);
 	minor = G_FW_VERSION_MINOR(vers);
 
-	if (type == FW_VERSION_T3 && major == 3 && minor == 1)
+	if (type == FW_VERSION_T3 && major == FW_VERSION_MAJOR &&
+	    minor == FW_VERSION_MINOR)
 		return 0;
 
 	CH_ERR(adapter, "found wrong FW version(%u.%u), "
-	       "driver needs version 3.1\n", major, minor);
+	       "driver needs version %u.%u\n", major, minor,
+	       FW_VERSION_MAJOR, FW_VERSION_MINOR);
 	return -EINVAL;
 }
 
diff --git a/drivers/net/cxgb3/version.h b/drivers/net/cxgb3/version.h
old mode 100755
new mode 100644
index 2b67dd5..782a6cf
--- a/drivers/net/cxgb3/version.h
+++ b/drivers/net/cxgb3/version.h
@@ -36,4 +36,6 @@ #define DRV_DESC "Chelsio T3 Network Dri
 #define DRV_NAME "cxgb3"
 /* Driver version */
 #define DRV_VERSION "1.0"
+#define FW_VERSION_MAJOR 3
+#define FW_VERSION_MINOR 2
 #endif				/* __CHELSIO_VERSION_H */
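
The effect of moving the expected version into macros can be tried in user
space. A minimal sketch, assuming a hypothetical check_fw_version_model()
that takes the already decoded major and minor numbers. The firmware type
check and the register decoding done by the real function are left out, and
the message is written to a caller buffer instead of going through CH_ERR.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define FW_VERSION_MAJOR 3
#define FW_VERSION_MINOR 2

/*
 * Returns 0 if the firmware version matches the one the driver was built
 * for, -EINVAL otherwise.  On mismatch 'msg' receives the same text the
 * patched driver logs; on success it is set to the empty string.
 */
int check_fw_version_model(unsigned int major, unsigned int minor,
			   char *msg, size_t len)
{
	assert(msg != NULL);

	if (major == FW_VERSION_MAJOR && minor == FW_VERSION_MINOR) {
		if (len)
			msg[0] = '\0';
		return 0;
	}

	snprintf(msg, len, "found wrong FW version(%u.%u), "
		 "driver needs version %u.%u\n", major, minor,
		 (unsigned int)FW_VERSION_MAJOR,
		 (unsigned int)FW_VERSION_MINOR);
	return -EINVAL;
}
```

With the macros in version.h, the next firmware bump only touches the two
defines; the comparison and the error text follow automatically.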


From swise at opengridcomputing.com  Tue Feb 27 08:00:01 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 27 Feb 2007 10:00:01 -0600
Subject: [openib-general] [PATCH 4/6] Offload packets may be DMAed long
 after their SGE Tx descriptors are done
In-Reply-To: <20070227155953.21615.96154.stgit@dell3.ogc.int>
References: <20070227155953.21615.96154.stgit@dell3.ogc.int>
Message-ID: <20070227160001.21615.66513.stgit@dell3.ogc.int>


Offload packets may be DMAed long after their SGE Tx descriptors are done,
so they must remain mapped until they are freed rather than until their
descriptors are freed.  Unmap such packets through an skb destructor.

Signed-off-by: Divy Le Ray <divy at chelsio.com>
---

 drivers/net/cxgb3/sge.c |   63 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 61 insertions(+), 2 deletions(-)

diff --git a/drivers/net/cxgb3/sge.c b/drivers/net/cxgb3/sge.c
old mode 100755
new mode 100644
index 3f2cf8a..822a598
--- a/drivers/net/cxgb3/sge.c
+++ b/drivers/net/cxgb3/sge.c
@@ -105,6 +105,15 @@ struct unmap_info {		/* packet unmapping
 };
 
 /*
+ * Holds unmapping information for Tx packets that need deferred unmapping.
+ * This structure lives at skb->head and must be allocated by callers.
+ */
+struct deferred_unmap_info {
+	struct pci_dev *pdev;
+	dma_addr_t addr[MAX_SKB_FRAGS + 1];
+};
+
+/*
  * Maps a number of flits to the number of Tx descriptors that can hold them.
  * The formula is
  *
@@ -252,10 +261,13 @@ static void free_tx_desc(struct adapter 
 	struct pci_dev *pdev = adapter->pdev;
 	unsigned int cidx = q->cidx;
 
+	const int need_unmap = need_skb_unmap() &&
+			       q->cntxt_id >= FW_TUNNEL_SGEEC_START;
+
 	d = &q->sdesc[cidx];
 	while (n--) {
 		if (d->skb) {	/* an SGL is present */
-			if (need_skb_unmap())
+			if (need_unmap)
 				unmap_skb(d->skb, q, cidx, pdev);
 			if (d->skb->priority == cidx)
 				kfree_skb(d->skb);
@@ -1227,6 +1239,50 @@ int t3_mgmt_tx(struct adapter *adap, str
 }
 
 /**
+ *	deferred_unmap_destructor - unmap a packet when it is freed
+ *	@skb: the packet
+ *
+ *	This is the packet destructor used for Tx packets that need to remain
+ *	mapped until they are freed rather than until their Tx descriptors are
+ *	freed.
+ */
+static void deferred_unmap_destructor(struct sk_buff *skb)
+{
+	int i;
+	const dma_addr_t *p;
+	const struct skb_shared_info *si;
+	const struct deferred_unmap_info *dui;
+	const struct unmap_info *ui = (struct unmap_info *)skb->cb;
+
+	dui = (struct deferred_unmap_info *)skb->head;
+	p = dui->addr;
+
+	if (ui->len)
+		pci_unmap_single(dui->pdev, *p++, ui->len, PCI_DMA_TODEVICE);
+
+	si = skb_shinfo(skb);
+	for (i = 0; i < si->nr_frags; i++)
+		pci_unmap_page(dui->pdev, *p++, si->frags[i].size,
+			       PCI_DMA_TODEVICE);
+}
+
+static void setup_deferred_unmapping(struct sk_buff *skb, struct pci_dev *pdev,
+				     const struct sg_ent *sgl, int sgl_flits)
+{
+	dma_addr_t *p;
+	struct deferred_unmap_info *dui;
+
+	dui = (struct deferred_unmap_info *)skb->head;
+	dui->pdev = pdev;
+	for (p = dui->addr; sgl_flits >= 3; sgl++, sgl_flits -= 3) {
+		*p++ = be64_to_cpu(sgl->addr[0]);
+		*p++ = be64_to_cpu(sgl->addr[1]);
+	}
+	if (sgl_flits)
+		*p = be64_to_cpu(sgl->addr[0]);
+}
+
+/**
  *	write_ofld_wr - write an offload work request
  *	@adap: the adapter
  *	@skb: the packet to send
@@ -1262,8 +1318,11 @@ static void write_ofld_wr(struct adapter
 	sgp = ndesc == 1 ? (struct sg_ent *)&d->flit[flits] : sgl;
 	sgl_flits = make_sgl(skb, sgp, skb->h.raw, skb->tail - skb->h.raw,
 			     adap->pdev);
-	if (need_skb_unmap())
+	if (need_skb_unmap()) {
+		setup_deferred_unmapping(skb, adap->pdev, sgp, sgl_flits);
+		skb->destructor = deferred_unmap_destructor;
 		((struct unmap_info *)skb->cb)->len = skb->tail - skb->h.raw;
+	}
 
 	write_wr_hdr_sgl(ndesc, skb, d, pidx, q, sgl, flits, sgl_flits,
 			 gen, from->wr_hi, from->wr_lo);
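
The address-saving loop in setup_deferred_unmapping() walks the
scatter/gather list three flits at a time and stores two addresses per full
entry, plus one more if a partial entry remains. A user-space sketch of
just that walk follows. The struct layout (two 32-bit lengths followed by
two 64-bit addresses, i.e. three 8-byte flits) is an assumption chosen to
match how the patch indexes the list, the names carry a _model suffix or
are otherwise hypothetical, and the big-endian conversion is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of one scatter/gather entry: three 8-byte flits. */
struct sg_ent_model {
	uint32_t len[2];
	uint64_t addr[2];
};

/*
 * Copy the DMA addresses described by 'sgl_flits' flits of 'sgl' into
 * 'out' and return how many were stored.  'out' must have room for two
 * addresses per full entry plus one for a trailing partial entry.
 */
size_t collect_sgl_addrs(const struct sg_ent_model *sgl, int sgl_flits,
			 uint64_t *out)
{
	uint64_t *p = out;

	assert(sgl != NULL && out != NULL && sgl_flits >= 0);

	for (; sgl_flits >= 3; sgl++, sgl_flits -= 3) {
		*p++ = sgl->addr[0];
		*p++ = sgl->addr[1];
	}
	if (sgl_flits)		/* partial entry: only the first address */
		*p++ = sgl->addr[0];

	return (size_t)(p - out);
}
```

The destructor later consumes the saved addresses in the same order: the
linear part first if its length is non-zero, then one address per page
fragment.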


From swise at opengridcomputing.com  Tue Feb 27 08:00:04 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 27 Feb 2007 10:00:04 -0600
Subject: [openib-general] [PATCH 5/6] Improve the traffic recovery after the
 HW ran out of response queue entries.
In-Reply-To: <20070227155953.21615.96154.stgit@dell3.ogc.int>
References: <20070227155953.21615.96154.stgit@dell3.ogc.int>
Message-ID: <20070227160003.21615.34378.stgit@dell3.ogc.int>


Improve the traffic recovery after the HW ran out of response queue entries.

Signed-off-by: Divy Le Ray <divy at chelsio.com>
---

 drivers/net/cxgb3/adapter.h |    2 ++
 drivers/net/cxgb3/sge.c     |   15 ++++++++++++++-
 2 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/drivers/net/cxgb3/adapter.h b/drivers/net/cxgb3/adapter.h
old mode 100755
new mode 100644
index 5c97a64..01b99b9
--- a/drivers/net/cxgb3/adapter.h
+++ b/drivers/net/cxgb3/adapter.h
@@ -121,6 +121,8 @@ struct sge_rspq {		/* state for an SGE r
 	unsigned long empty;	/* # of times queue ran out of credits */
 	unsigned long nomem;	/* # of responses deferred due to no mem */
 	unsigned long unhandled_irqs;	/* # of spurious intrs */
+	unsigned long starved;
+	unsigned long restarted;
 };
 
 struct tx_desc;
diff --git a/drivers/net/cxgb3/sge.c b/drivers/net/cxgb3/sge.c
index 822a598..4ff0ab6 100644
--- a/drivers/net/cxgb3/sge.c
+++ b/drivers/net/cxgb3/sge.c
@@ -2376,13 +2376,26 @@ static void sge_timer_cb(unsigned long d
 		spin_unlock(&qs->txq[TXQ_OFLD].lock);
 	}
 	lock = (adap->flags & USING_MSIX) ? &qs->rspq.lock :
-	    &adap->sge.qs[0].rspq.lock;
+					    &adap->sge.qs[0].rspq.lock;
 	if (spin_trylock_irq(lock)) {
 		if (!napi_is_scheduled(qs->netdev)) {
+			u32 status = t3_read_reg(adap, A_SG_RSPQ_FL_STATUS);
+
 			if (qs->fl[0].credits < qs->fl[0].size)
 				__refill_fl(adap, &qs->fl[0]);
 			if (qs->fl[1].credits < qs->fl[1].size)
 				__refill_fl(adap, &qs->fl[1]);
+
+			if (status & (1 << qs->rspq.cntxt_id)) {
+				qs->rspq.starved++;
+				if (qs->rspq.credits) {
+					refill_rspq(adap, &qs->rspq, 1);
+					qs->rspq.credits--;
+					qs->rspq.restarted++;
+					t3_write_reg(adap, A_SG_RSPQ_FL_STATUS, 
+						     1 << qs->rspq.cntxt_id);
+				}
+			}
 		}
 		spin_unlock_irq(lock);
 	}
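
The recovery logic added to sge_timer_cb() is small enough to model
directly. A sketch under stated assumptions: the struct and function names
are hypothetical, the status register is passed in as a plain value, and
the function returns the mask the driver would write back to
A_SG_RSPQ_FL_STATUS instead of touching hardware. Handing the credit back
(refill_rspq() in the patch) is represented only by the counter change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Only the response-queue fields used by the recovery path. */
struct rspq_model {
	unsigned int cntxt_id;
	unsigned int credits;
	unsigned long starved;
	unsigned long restarted;
};

/*
 * If the status word flags this queue as starved, count it and, when a
 * credit is available, give one back and report the bit to write to the
 * status register.  Returns 0 when nothing needs to be written.
 */
uint32_t recover_starved_rspq(struct rspq_model *q, uint32_t status)
{
	uint32_t bit;

	assert(q != NULL && q->cntxt_id < 32);
	bit = (uint32_t)1 << q->cntxt_id;

	if (!(status & bit))
		return 0;

	q->starved++;
	if (!q->credits)
		return 0;	/* nothing to hand back yet */

	q->credits--;		/* refill_rspq(adap, &qs->rspq, 1) in the patch */
	q->restarted++;
	return bit;
}
```

Comparing the two counters afterwards shows how often starvation was seen
versus how often the queue could actually be restarted.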


From swise at opengridcomputing.com  Tue Feb 27 08:00:06 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 27 Feb 2007 10:00:06 -0600
Subject: [openib-general] [PATCH 6/6] Populate Rx free list with pages.
In-Reply-To: <20070227155953.21615.96154.stgit@dell3.ogc.int>
References: <20070227155953.21615.96154.stgit@dell3.ogc.int>
Message-ID: <20070227160006.21615.53181.stgit@dell3.ogc.int>


Populate Rx free list with pages.

Signed-off-by: Divy Le Ray <divy at chelsio.com>
---

 drivers/net/cxgb3/adapter.h |    9 +
 drivers/net/cxgb3/sge.c     |  318 +++++++++++++++++++++++++++++++------------
 2 files changed, 235 insertions(+), 92 deletions(-)

diff --git a/drivers/net/cxgb3/adapter.h b/drivers/net/cxgb3/adapter.h
index 01b99b9..80c3d8f 100644
--- a/drivers/net/cxgb3/adapter.h
+++ b/drivers/net/cxgb3/adapter.h
@@ -74,6 +74,11 @@ enum {				/* adapter flags */
 struct rx_desc;
 struct rx_sw_desc;
 
+struct sge_fl_page {
+	struct skb_frag_struct frag;
+	unsigned char *va;
+};
+
 struct sge_fl {			/* SGE per free-buffer list state */
 	unsigned int buf_size;	/* size of each Rx buffer */
 	unsigned int credits;	/* # of available Rx buffers */
@@ -81,11 +86,13 @@ struct sge_fl {			/* SGE per free-buffer
 	unsigned int cidx;	/* consumer index */
 	unsigned int pidx;	/* producer index */
 	unsigned int gen;	/* free list generation */
+	unsigned int cntxt_id;	/* SGE context id for the free list */
+	struct sge_fl_page page;
 	struct rx_desc *desc;	/* address of HW Rx descriptor ring */
 	struct rx_sw_desc *sdesc;	/* address of SW Rx descriptor ring */
 	dma_addr_t phys_addr;	/* physical address of HW ring start */
-	unsigned int cntxt_id;	/* SGE context id for the free list */
 	unsigned long empty;	/* # of times queue ran out of buffers */
+	unsigned long alloc_failed; /* # of times buffer allocation failed */
 };
 
 /*
diff --git a/drivers/net/cxgb3/sge.c b/drivers/net/cxgb3/sge.c
index 4ff0ab6..c237834 100644
--- a/drivers/net/cxgb3/sge.c
+++ b/drivers/net/cxgb3/sge.c
@@ -45,9 +45,25 @@ #include "firmware_exports.h"
 #define USE_GTS 0
 
 #define SGE_RX_SM_BUF_SIZE 1536
+
+/*
+ * If USE_RX_PAGE is defined, the small freelist is populated with (partial)
+ * pages instead of skbs. Pages are carved up into RX_PAGE_SIZE chunks (the
+ * host page size must be a multiple of RX_PAGE_SIZE).
+ */
+#define USE_RX_PAGE
+#define RX_PAGE_SIZE 2048
+
+/*
+ * skb freelist packets are copied into a new skb (and the freelist one is
+ * reused) if their len is <= SGE_RX_COPY_THRES.
+ */
 #define SGE_RX_COPY_THRES  256
 
-# define SGE_RX_DROP_THRES 16
+/*
+ * Minimum number of freelist entries before we start dropping TUNNEL frames.
+ */
+#define SGE_RX_DROP_THRES 16
 
 /*
  * Period of the Tx buffer reclaim timer.  This timer does not need to run
@@ -85,7 +101,10 @@ struct tx_sw_desc {		/* SW state per Tx 
 };
 
 struct rx_sw_desc {		/* SW state per Rx descriptor */
-	struct sk_buff *skb;
+	union {
+		struct sk_buff *skb;
+		struct sge_fl_page page;
+	} t;
 	 DECLARE_PCI_UNMAP_ADDR(dma_addr);
 };
 
@@ -332,16 +351,27 @@ static void free_rx_bufs(struct pci_dev 
 
 		pci_unmap_single(pdev, pci_unmap_addr(d, dma_addr),
 				 q->buf_size, PCI_DMA_FROMDEVICE);
-		kfree_skb(d->skb);
-		d->skb = NULL;
+
+		if (q->buf_size != RX_PAGE_SIZE) {
+			kfree_skb(d->t.skb);
+			d->t.skb = NULL;
+		} else {
+			if (d->t.page.frag.page)
+				put_page(d->t.page.frag.page);
+			d->t.page.frag.page = NULL;
+		}
 		if (++cidx == q->size)
 			cidx = 0;
 	}
+
+	if (q->page.frag.page)
+		put_page(q->page.frag.page);
+	q->page.frag.page = NULL;
 }
 
 /**
  *	add_one_rx_buf - add a packet buffer to a free-buffer list
- *	@skb: the buffer to add
+ *	@va: va of the buffer to add
  *	@len: the buffer length
  *	@d: the HW Rx descriptor to write
  *	@sd: the SW Rx descriptor to write
@@ -351,14 +381,13 @@ static void free_rx_bufs(struct pci_dev 
  *	Add a buffer of the given length to the supplied HW and SW Rx
  *	descriptors.
  */
-static inline void add_one_rx_buf(struct sk_buff *skb, unsigned int len,
+static inline void add_one_rx_buf(unsigned char *va, unsigned int len,
 				  struct rx_desc *d, struct rx_sw_desc *sd,
 				  unsigned int gen, struct pci_dev *pdev)
 {
 	dma_addr_t mapping;
 
-	sd->skb = skb;
-	mapping = pci_map_single(pdev, skb->data, len, PCI_DMA_FROMDEVICE);
+	mapping = pci_map_single(pdev, va, len, PCI_DMA_FROMDEVICE);
 	pci_unmap_addr_set(sd, dma_addr, mapping);
 
 	d->addr_lo = cpu_to_be32(mapping);
@@ -383,14 +412,47 @@ static void refill_fl(struct adapter *ad
 {
 	struct rx_sw_desc *sd = &q->sdesc[q->pidx];
 	struct rx_desc *d = &q->desc[q->pidx];
+	struct sge_fl_page *p = &q->page;
 
 	while (n--) {
-		struct sk_buff *skb = alloc_skb(q->buf_size, gfp);
+		unsigned char *va;
 
-		if (!skb)
-			break;
+		if (unlikely(q->buf_size != RX_PAGE_SIZE)) {
+			struct sk_buff *skb = alloc_skb(q->buf_size, gfp);
+
+			if (!skb) {
+				q->alloc_failed++;
+				break;
+			}
+			va = skb->data;
+			sd->t.skb = skb;
+		} else {
+			if (!p->frag.page) {
+				p->frag.page = alloc_pages(gfp, 0);
+				if (unlikely(!p->frag.page)) {
+					q->alloc_failed++;
+					break;
+				} else {
+					p->frag.size = RX_PAGE_SIZE;
+					p->frag.page_offset = 0;
+					p->va = page_address(p->frag.page);
+				}
+			}
+
+			memcpy(&sd->t, p, sizeof(*p));
+			va = p->va;
+
+			p->frag.page_offset += RX_PAGE_SIZE;
+			BUG_ON(p->frag.page_offset > PAGE_SIZE);
+			p->va += RX_PAGE_SIZE;
+			if (p->frag.page_offset == PAGE_SIZE)
+				p->frag.page = NULL;
+			else
+				get_page(p->frag.page);
+		}
+
+		add_one_rx_buf(va, q->buf_size, d, sd, q->gen, adap->pdev);
 
-		add_one_rx_buf(skb, q->buf_size, d, sd, q->gen, adap->pdev);
 		d++;
 		sd++;
 		if (++q->pidx == q->size) {
@@ -425,7 +487,7 @@ static void recycle_rx_buf(struct adapte
 	struct rx_desc *from = &q->desc[idx];
 	struct rx_desc *to = &q->desc[q->pidx];
 
-	q->sdesc[q->pidx] = q->sdesc[idx];
+	memcpy(&q->sdesc[q->pidx], &q->sdesc[idx], sizeof(struct rx_sw_desc));
 	to->addr_lo = from->addr_lo;	/* already big endian */
 	to->addr_hi = from->addr_hi;	/* likewise */
 	wmb();
@@ -458,7 +520,7 @@ static void recycle_rx_buf(struct adapte
  *	of the SW ring.
  */
 static void *alloc_ring(struct pci_dev *pdev, size_t nelem, size_t elem_size,
-			size_t sw_size, dma_addr_t *phys, void *metadata)
+			size_t sw_size, dma_addr_t * phys, void *metadata)
 {
 	size_t len = nelem * elem_size;
 	void *s = NULL;
@@ -588,61 +650,6 @@ static inline unsigned int flits_to_desc
 }
 
 /**
- *	get_packet - return the next ingress packet buffer from a free list
- *	@adap: the adapter that received the packet
- *	@fl: the SGE free list holding the packet
- *	@len: the packet length including any SGE padding
- *	@drop_thres: # of remaining buffers before we start dropping packets
- *
- *	Get the next packet from a free list and complete setup of the
- *	sk_buff.  If the packet is small we make a copy and recycle the
- *	original buffer, otherwise we use the original buffer itself.  If a
- *	positive drop threshold is supplied packets are dropped and their
- *	buffers recycled if (a) the number of remaining buffers is under the
- *	threshold and the packet is too big to copy, or (b) the packet should
- *	be copied but there is no memory for the copy.
- */
-static struct sk_buff *get_packet(struct adapter *adap, struct sge_fl *fl,
-				  unsigned int len, unsigned int drop_thres)
-{
-	struct sk_buff *skb = NULL;
-	struct rx_sw_desc *sd = &fl->sdesc[fl->cidx];
-
-	prefetch(sd->skb->data);
-
-	if (len <= SGE_RX_COPY_THRES) {
-		skb = alloc_skb(len, GFP_ATOMIC);
-		if (likely(skb != NULL)) {
-			__skb_put(skb, len);
-			pci_dma_sync_single_for_cpu(adap->pdev,
-						    pci_unmap_addr(sd,
-								   dma_addr),
-						    len, PCI_DMA_FROMDEVICE);
-			memcpy(skb->data, sd->skb->data, len);
-			pci_dma_sync_single_for_device(adap->pdev,
-						       pci_unmap_addr(sd,
-								      dma_addr),
-						       len, PCI_DMA_FROMDEVICE);
-		} else if (!drop_thres)
-			goto use_orig_buf;
-	      recycle:
-		recycle_rx_buf(adap, fl, fl->cidx);
-		return skb;
-	}
-
-	if (unlikely(fl->credits < drop_thres))
-		goto recycle;
-
-      use_orig_buf:
-	pci_unmap_single(adap->pdev, pci_unmap_addr(sd, dma_addr),
-			 fl->buf_size, PCI_DMA_FROMDEVICE);
-	skb = sd->skb;
-	skb_put(skb, len);
-	__refill_fl(adap, fl);
-	return skb;
-}
-
-/**
  *	get_imm_packet - return the next ingress packet buffer from a response
  *	@resp: the response descriptor containing the packet data
  *
@@ -1676,7 +1683,6 @@ static void rx_eth(struct adapter *adap,
 	struct cpl_rx_pkt *p = (struct cpl_rx_pkt *)(skb->data + pad);
 	struct port_info *pi;
 
-	rq->eth_pkts++;
 	skb_pull(skb, sizeof(*p) + pad);
 	skb->dev = adap->port[p->iff];
 	skb->dev->last_rx = jiffies;
@@ -1704,6 +1710,85 @@ static void rx_eth(struct adapter *adap,
 		netif_rx(skb);
 }
 
+#define SKB_DATA_SIZE 128
+
+static void skb_data_init(struct sk_buff *skb, struct sge_fl_page *p,
+			  unsigned int len)
+{
+	skb->len = len;
+	if (len <= SKB_DATA_SIZE) {
+		memcpy(skb->data, p->va, len);
+		skb->tail += len;
+		put_page(p->frag.page);
+	} else {
+		memcpy(skb->data, p->va, SKB_DATA_SIZE);
+		skb_shinfo(skb)->frags[0].page = p->frag.page;
+		skb_shinfo(skb)->frags[0].page_offset =
+		    p->frag.page_offset + SKB_DATA_SIZE;
+		skb_shinfo(skb)->frags[0].size = len - SKB_DATA_SIZE;
+		skb_shinfo(skb)->nr_frags = 1;
+		skb->data_len = len - SKB_DATA_SIZE;
+		skb->tail += SKB_DATA_SIZE;
+		skb->truesize += skb->data_len;
+	}
+}
+
+/**
+*      get_packet - return the next ingress packet buffer from a free list
+*      @adap: the adapter that received the packet
+*      @fl: the SGE free list holding the packet
+*      @len: the packet length including any SGE padding
+*      @drop_thres: # of remaining buffers before we start dropping packets
+*
+*      Get the next packet from a free list and complete setup of the
+*      sk_buff.  If the packet is small we make a copy and recycle the
+*      original buffer, otherwise we use the original buffer itself.  If a
+*      positive drop threshold is supplied packets are dropped and their
+*      buffers recycled if (a) the number of remaining buffers is under the
+*      threshold and the packet is too big to copy, or (b) the packet should
+*      be copied but there is no memory for the copy.
+*/
+static struct sk_buff *get_packet(struct adapter *adap, struct sge_fl *fl,
+				  unsigned int len, unsigned int drop_thres)
+{
+	struct sk_buff *skb = NULL;
+	struct rx_sw_desc *sd = &fl->sdesc[fl->cidx];
+
+	prefetch(sd->t.skb->data);
+
+	if (len <= SGE_RX_COPY_THRES) {
+		skb = alloc_skb(len, GFP_ATOMIC);
+		if (likely(skb != NULL)) {
+			struct rx_desc *d = &fl->desc[fl->cidx];
+			dma_addr_t mapping =
+			    (dma_addr_t)((u64) be32_to_cpu(d->addr_hi) << 32 |
+					 be32_to_cpu(d->addr_lo));
+
+			__skb_put(skb, len);
+			pci_dma_sync_single_for_cpu(adap->pdev, mapping, len,
+						    PCI_DMA_FROMDEVICE);
+			memcpy(skb->data, sd->t.skb->data, len);
+			pci_dma_sync_single_for_device(adap->pdev, mapping, len,
+						       PCI_DMA_FROMDEVICE);
+		} else if (!drop_thres)
+			goto use_orig_buf;
+recycle:
+		recycle_rx_buf(adap, fl, fl->cidx);
+		return skb;
+	}
+
+	if (unlikely(fl->credits < drop_thres))
+		goto recycle;
+
+use_orig_buf:
+	pci_unmap_single(adap->pdev, pci_unmap_addr(sd, dma_addr),
+			 fl->buf_size, PCI_DMA_FROMDEVICE);
+	skb = sd->t.skb;
+	skb_put(skb, len);
+	__refill_fl(adap, fl);
+	return skb;
+}
+
 /**
  *	handle_rsp_cntrl_info - handles control information in a response
  *	@qs: the queue set corresponding to the response
@@ -1826,7 +1911,7 @@ static int process_responses(struct adap
 	q->next_holdoff = q->holdoff_tmr;
 
 	while (likely(budget_left && is_new_response(r, q))) {
-		int eth, ethpad = 0;
+		int eth, ethpad = 2;
 		struct sk_buff *skb = NULL;
 		u32 len, flags = ntohl(r->flags);
 		u32 rss_hi = *(const u32 *)r, rss_lo = r->rss_hdr.rss_hash_val;
@@ -1853,18 +1938,56 @@ static int process_responses(struct adap
 				break;
 			}
 			q->imm_data++;
+			ethpad = 0;
 		} else if ((len = ntohl(r->len_cq)) != 0) {
-			struct sge_fl *fl;
+			struct sge_fl *fl =
+			    (len & F_RSPD_FLQ) ? &qs->fl[1] : &qs->fl[0];
+
+			if (fl->buf_size == RX_PAGE_SIZE) {
+				struct rx_sw_desc *sd = &fl->sdesc[fl->cidx];
+				struct sge_fl_page *p = &sd->t.page;
+
+				prefetch(p->va);
+				prefetch(p->va + L1_CACHE_BYTES);
+
+				__refill_fl(adap, fl);
+
+				pci_unmap_single(adap->pdev,
+						 pci_unmap_addr(sd, dma_addr),
+						 fl->buf_size,
+						 PCI_DMA_FROMDEVICE);
+
+				if (eth) {
+					if (unlikely(fl->credits <
+						     SGE_RX_DROP_THRES))
+						goto eth_recycle;
+
+					skb = alloc_skb(SKB_DATA_SIZE,
+							GFP_ATOMIC);
+					if (unlikely(!skb)) {
+eth_recycle:
+						q->rx_drops++;
+						recycle_rx_buf(adap, fl,
+							       fl->cidx);
+						goto eth_done;
+					}
+				} else {
+					skb = alloc_skb(SKB_DATA_SIZE,
+							GFP_ATOMIC);
+					if (unlikely(!skb))
+						goto no_mem;
+				}
+
+				skb_data_init(skb, p, G_RSPD_LEN(len));
+eth_done:
+				fl->credits--;
+				q->eth_pkts++;
+			} else {
+				fl->credits--;
+				skb = get_packet(adap, fl, G_RSPD_LEN(len),
+						 eth ? SGE_RX_DROP_THRES : 0);
+			}
 
-			fl = (len & F_RSPD_FLQ) ? &qs->fl[1] : &qs->fl[0];
-			fl->credits--;
-			skb = get_packet(adap, fl, G_RSPD_LEN(len),
-					 eth ? SGE_RX_DROP_THRES : 0);
-			if (!skb)
-				q->rx_drops++;
-			else if (r->rss_hdr.opcode == CPL_TRACE_PKT)
-				__skb_pull(skb, 2);
-			ethpad = 2;
 			if (++fl->cidx == fl->size)
 				fl->cidx = 0;
 		} else
@@ -1888,18 +2011,23 @@ static int process_responses(struct adap
 			q->credits = 0;
 		}
 
-		if (likely(skb != NULL)) {
+		if (skb) {
+			/* Preserve the RSS info in csum & priority */
+			skb->csum = rss_hi;
+			skb->priority = rss_lo;
+
 			if (eth)
 				rx_eth(adap, q, skb, ethpad);
 			else {
-				/* Preserve the RSS info in csum & priority */
-				skb->csum = rss_hi;
-				skb->priority = rss_lo;
-				ngathered = rx_offload(&adap->tdev, q, skb,
-						       offload_skbs, ngathered);
+				if (unlikely(r->rss_hdr.opcode ==
+					     CPL_TRACE_PKT))
+					__skb_pull(skb, ethpad);
+
+				ngathered = rx_offload(&adap->tdev, q,
+						       skb, offload_skbs,
+						       ngathered);
 			}
 		}
-
 		--budget_left;
 	}
 
@@ -2376,7 +2504,7 @@ static void sge_timer_cb(unsigned long d
 		spin_unlock(&qs->txq[TXQ_OFLD].lock);
 	}
 	lock = (adap->flags & USING_MSIX) ? &qs->rspq.lock :
-					    &adap->sge.qs[0].rspq.lock;
+	    &adap->sge.qs[0].rspq.lock;
 	if (spin_trylock_irq(lock)) {
 		if (!napi_is_scheduled(qs->netdev)) {
 			u32 status = t3_read_reg(adap, A_SG_RSPQ_FL_STATUS);
@@ -2392,7 +2520,7 @@ static void sge_timer_cb(unsigned long d
 					refill_rspq(adap, &qs->rspq, 1);
 					qs->rspq.credits--;
 					qs->rspq.restarted++;
-					t3_write_reg(adap, A_SG_RSPQ_FL_STATUS, 
+					t3_write_reg(adap, A_SG_RSPQ_FL_STATUS,
 						     1 << qs->rspq.cntxt_id);
 				}
 			}
@@ -2504,13 +2632,21 @@ int t3_sge_alloc_qset(struct adapter *ad
 	    flits_to_desc(sgl_len(MAX_SKB_FRAGS + 1) + 3);
 
 	if (ntxq == 1) {
+#ifdef USE_RX_PAGE
+		q->fl[0].buf_size = RX_PAGE_SIZE;
+#else
 		q->fl[0].buf_size = SGE_RX_SM_BUF_SIZE + 2 +
 		    sizeof(struct cpl_rx_pkt);
+#endif
 		q->fl[1].buf_size = MAX_FRAME_SIZE + 2 +
 		    sizeof(struct cpl_rx_pkt);
 	} else {
+#ifdef USE_RX_PAGE
+		q->fl[0].buf_size = RX_PAGE_SIZE;
+#else
 		q->fl[0].buf_size = SGE_RX_SM_BUF_SIZE +
 		    sizeof(struct cpl_rx_data);
+#endif
 		q->fl[1].buf_size = (16 * 1024) -
 		    SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 	}
@@ -2704,7 +2840,7 @@ void __devinit t3_sge_prep(struct adapte
 		q->polling = adap->params.rev > 0;
 		q->coalesce_usecs = 5;
 		q->rspq_size = 1024;
-		q->fl_size = 4096;
+		q->fl_size = 1024;
 		q->jumbo_size = 512;
 		q->txq_size[TXQ_ETH] = 1024;
 		q->txq_size[TXQ_OFLD] = 1024;
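
The page-carving part of refill_fl() is the least obvious piece of this
patch, so here is a user-space model of its bookkeeping. It is a sketch,
not the driver code: the names are hypothetical, a 4096-byte host page is
assumed, the caller supplies the "freshly allocated" page, and a plain
integer stands in for the page reference count.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE	4096u	/* assumed host page size */
#define RX_PAGE_SIZE	2048u

struct page_model { int refcount; };

struct fl_page_model {
	struct page_model *page;	/* page being carved, or NULL */
	unsigned int offset;		/* offset of the next free chunk */
};

/*
 * Hand out the next RX_PAGE_SIZE chunk.  'fresh' is used only when no page
 * is in progress.  Returns the chunk's offset within its page and stores
 * the owning page in '*owner'.
 */
unsigned int take_chunk(struct fl_page_model *p, struct page_model *fresh,
			struct page_model **owner)
{
	unsigned int chunk;

	if (!p->page) {
		fresh->refcount = 1;	/* a new page starts with one reference */
		p->page = fresh;
		p->offset = 0;
	}

	*owner = p->page;
	chunk = p->offset;

	p->offset += RX_PAGE_SIZE;
	assert(p->offset <= MODEL_PAGE_SIZE);	/* BUG_ON() in the patch */
	if (p->offset == MODEL_PAGE_SIZE)
		p->page = NULL;		/* last chunk keeps the remaining reference */
	else
		p->page->refcount++;	/* get_page(): another chunk will follow */

	return chunk;
}
```

After both chunks of a page have been handed out, the reference count
equals the number of chunks, so each put_page() on the receive or free path
releases exactly one chunk's share of the page.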


From vlad at mellanox.co.il  Tue Feb 27 08:36:46 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Tue, 27 Feb 2007 18:36:46 +0200
Subject: [openib-general] [PATCH] for OFED 1.2
In-Reply-To: <45E31D0A.20400@ichips.intel.com>
References: <000001c75787$50ff0440$ff0da8c0@amr.corp.intel.com>
	<1172394057.12388.3.camel@vladsk-laptop>
	<45E31D0A.20400@ichips.intel.com>
Message-ID: <1172594206.21382.90.camel@vladsk-laptop>

On Mon, 2007-02-26 at 09:46 -0800, Sean Hefty wrote:
> Vladimir Sokolovsky wrote:
> > On Fri, 2007-02-23 at 12:15 -0800, Sean Hefty wrote:
> >  > I would like these fixes in OFED 1.2 as well.  What git tree / branch 
> > do I
> >  > generate a patch against?
> >  >
> >  > - Sean
> > 
> > git://git.openfabrics.org/~vlad/ofed_1_2/.git
> > branch: ofed_1_2
> 
> Can you try pulling from:
> 
> git://git.openfabrics.org/~shefty/ofed_1_2.git   ofed_1_2
> 
> - Sean

Sean,
Please send patches that will be added to kernel_patches/fixes.

Please update your git tree from
git://git.openfabrics.org/~vlad/ofed_1_2/.git  ofed_1_2


-- 
Vladimir Sokolovsky <vlad at mellanox.co.il>
Mellanox Technologies Ltd.


From monil at voltaire.com  Tue Feb 27 08:43:21 2007
From: monil at voltaire.com (Moni Levy)
Date: Tue, 27 Feb 2007 18:43:21 +0200
Subject: [openib-general] [RFC] IB/ipoib: Asynchronous events delivered
 without port parameter.
In-Reply-To: <6a122cc00702270747l26c1adavd57de9ba2d9a472b@mail.gmail.com>
References: <6a122cc00702270502h27d90515k117bf23ea3f31f4d@mail.gmail.com>
	<adak5y3wqzj.fsf@cisco.com>
	<6a122cc00702270747l26c1adavd57de9ba2d9a472b@mail.gmail.com>
Message-ID: <6a122cc00702270843h34e407bek9f8757e9fb309fc6@mail.gmail.com>

On 2/27/07, Moni Levy <monil at voltaire.com> wrote:
> On 2/27/07, Roland Dreier <rdreier at cisco.com> wrote:
> >  >    I did a short code review of the ipoib code concentrating on
> >  > partitioning support and I mentioned that the asynchronous events
> >  > handler in the ipoib code does not take the port number reported in
> >  > the event record into consideration. The effect of that is that all of
> >  > the ib# devices related to that specific HCA are flushed when it seems
> >  > to me that only the relevant port one should be. Is that done on
> >  > purpose, or am I missing something ?
> >
> > I don't think there's any particular reason the code is that way
> > except for the oversight never being corrected.  But it looks trivial
> > to fix, like the patch below.  Does that look right to you?
> >
> >  > p.s. I'm working on a patch that should solve another issue caused by
> >  > PKEY reordering & ipoib behavior and the above issue further
> >  > complicates things for me.
> >
> > Why not fix the issue first then?
> >
> > commit a27cbe878203076247c1b5287f5ab59ed143b560
> > Author: Roland Dreier <rolandd at cisco.com>
> > Date:   Tue Feb 27 07:37:49 2007 -0800
> >
> >     IPoIB: Only handle async events for one port
> >
> >     An asynchronous event carries the port number that the event occurred
> >     on, so there's no reason for an IPoIB interface to process an event
> >     associated with a different local HCA port.
> >
> >     Signed-off-by: Roland Dreier <rolandd at cisco.com>
> >
> > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
> > index 3cb551b..7f3ec20 100644
> > --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
> > +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
> > @@ -259,12 +259,13 @@ void ipoib_event(struct ib_event_handler *handler,
> >         struct ipoib_dev_priv *priv =
> >                 container_of(handler, struct ipoib_dev_priv, event_handler);
> >
> > -       if (record->event == IB_EVENT_PORT_ERR    ||
> > -           record->event == IB_EVENT_PKEY_CHANGE ||
> > -           record->event == IB_EVENT_PORT_ACTIVE ||
> > -           record->event == IB_EVENT_LID_CHANGE  ||
> > -           record->event == IB_EVENT_SM_CHANGE   ||
> > -           record->event == IB_EVENT_CLIENT_REREGISTER) {
> > +       if ((record->event == IB_EVENT_PORT_ERR    ||
> > +            record->event == IB_EVENT_PKEY_CHANGE ||
> > +            record->event == IB_EVENT_PORT_ACTIVE ||
> > +            record->event == IB_EVENT_LID_CHANGE  ||
> > +            record->event == IB_EVENT_SM_CHANGE   ||
> > +            record->event == IB_EVENT_CLIENT_REREGISTER) &&
> > +           record->element.port_num == priv->port) {
> >                 ipoib_dbg(priv, "Port state change event\n");
> >                 queue_work(ipoib_workqueue, &priv->flush_task);
> >         }
> >
>
> That's exactly what I intended to post.

On second thought, since on a two-port HCA half of the delivered events
will be for the other port, I would move the new condition so that it is
evaluated first. I apologize if this is too much of a
micro-optimization. What do you think?

--Moni

>
> --Moni
>


From sean.hefty at intel.com  Tue Feb 27 08:45:41 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 27 Feb 2007 08:45:41 -0800
Subject: [openib-general] [PATCH] for OFED 1.2
In-Reply-To: <1172594206.21382.90.camel@vladsk-laptop>
Message-ID: <000001c75a8e$b7a2e3b0$c6d8180a@amr.corp.intel.com>

>Please send patches that will be added to kernel_patches/fixes.
>
>Please update your git tree from
>git://git.openfabrics.org/~vlad/ofed_1_2/.git  ofed_1_2

You want me to create a patch that adds a file that contains the actual patches?

Why not apply the patches directly?


From rdreier at cisco.com  Tue Feb 27 08:51:19 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 27 Feb 2007 08:51:19 -0800
Subject: [openib-general] [RFC] IB/ipoib: Asynchronous events delivered
 without port parameter.
In-Reply-To: <6a122cc00702270843h34e407bek9f8757e9fb309fc6@mail.gmail.com>
	(Moni Levy's message of "Tue, 27 Feb 2007 18:43:21 +0200")
References: <6a122cc00702270502h27d90515k117bf23ea3f31f4d@mail.gmail.com>
	<adak5y3wqzj.fsf@cisco.com>
	<6a122cc00702270747l26c1adavd57de9ba2d9a472b@mail.gmail.com>
	<6a122cc00702270843h34e407bek9f8757e9fb309fc6@mail.gmail.com>
Message-ID: <ada3b4rwnlk.fsf@cisco.com>

 > On a second thought based on the fact that on a two port HCA we'll
 > have a 50% miss on the events being delivered, I would move the new
 > condition to be evaluated first. I apologize if this is too much of
 > micro optimization. What do you think ?

That wouldn't really be correct since element.port_num isn't valid
unless we already know it's a port-related event.

And it's not worth worrying about this since it's not remotely a hot path.

 - R.
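
A minimal sketch of the ordering point above, assuming hypothetical
stand-in types (sketch_event, sketch_should_flush and friends are
illustrative names, not the kernel's ib_event definitions). The port
number shares storage with other per-event data, so it is only read
after the event type has been confirmed to be port-related:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event types; only the last two are port-related. */
enum sketch_event_type {
	SKETCH_EVENT_QP_FATAL,
	SKETCH_EVENT_PORT_ACTIVE,
	SKETCH_EVENT_PORT_ERR,
};

/* Hypothetical event record: which union member is valid depends on type. */
struct sketch_event {
	enum sketch_event_type type;
	union {
		unsigned int  qp_num;   /* valid for QP events only */
		unsigned char port_num; /* valid for port events only */
	} element;
};

static bool sketch_is_port_event(enum sketch_event_type type)
{
	return type == SKETCH_EVENT_PORT_ACTIVE ||
	       type == SKETCH_EVENT_PORT_ERR;
}

/*
 * Returns true when an interface bound to my_port should flush.
 * The type test comes first; && short-circuits, so element.port_num
 * is never interpreted for an event that does not carry a port number.
 */
static bool sketch_should_flush(const struct sketch_event *ev,
				unsigned char my_port)
{
	assert(ev != NULL);
	return sketch_is_port_event(ev->type) &&
	       ev->element.port_num == my_port;
}
```

With this ordering, a QP event whose qp_num happens to alias the local
port number in memory is still rejected, because the type test fails
before the union is looked at.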


From vlad at mellanox.co.il  Tue Feb 27 08:53:51 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Tue, 27 Feb 2007 18:53:51 +0200
Subject: [openib-general] [PATCH] for OFED 1.2
In-Reply-To: <000001c75a8e$b7a2e3b0$c6d8180a@amr.corp.intel.com>
References: <000001c75a8e$b7a2e3b0$c6d8180a@amr.corp.intel.com>
Message-ID: <1172595231.21382.96.camel@vladsk-laptop>

On Tue, 2007-02-27 at 08:45 -0800, Sean Hefty wrote:
> >Please send patches that will be added to kernel_patches/fixes.
> >
> >Please update your git tree from
> >git://git.openfabrics.org/~vlad/ofed_1_2/.git  ofed_1_2
> 
> You want me to create a patch that adds a file that contains the actual patches?
Yes, actual patches should be created under kernel_patches/fixes.

Please update your git tree because the following patch fails:

        From 2e7e33936de5f92656c0565ce88f97e796367dae Mon Sep 17 00:00:00 2001
        From: Sean Hefty <sean.hefty at intel.com>
        Date: Fri, 23 Feb 2007 12:35:43 -0800
        Subject: [PATCH] rdma_cm: request reversible paths only
        
        The rdma_cm requires that path records be reversible.  Set the reversible
        bit when issuing a path record query.
        
        Signed-off-by: Sean Hefty <sean.hefty at intel.com>
        
        diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
        index 9e0ab04..171cce9 100644
        --- a/drivers/infiniband/core/cma.c
        +++ b/drivers/infiniband/core/cma.c
        @@ -1396,11 +1396,13 @@ static int cma_query_ib_route(struct rdma_id_private *id_priv, int timeout_ms,
                ib_addr_get_dgid(addr, &path_rec.dgid);
                path_rec.pkey = cpu_to_be16(ib_addr_get_pkey(addr));
                path_rec.numb_path = 1;
        +       path_rec.reversible = 1;
        
                id_priv->query_id = ib_sa_path_rec_get(&sa_client, id_priv->id.device,
                                        id_priv->id.port_num, &path_rec,
                                        IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID |
        -                               IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH,
        +                               IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH |
        +                               IB_SA_PATH_REC_REVERSIBLE,
                                        timeout_ms, GFP_KERNEL,
                                        cma_query_handler, work, &id_priv->query);
        

> 
> Why not apply the patches directly?
> 
To be consistent with 2.6.20 kernel.


-- 
Vladimir Sokolovsky <vlad at mellanox.co.il>
Mellanox Technologies Ltd.


From mst at mellanox.co.il  Tue Feb 27 08:55:32 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 27 Feb 2007 18:55:32 +0200
Subject: [openib-general] [PATCH] for OFED 1.2
In-Reply-To: <000001c75a8e$b7a2e3b0$c6d8180a@amr.corp.intel.com>
References: <1172594206.21382.90.camel@vladsk-laptop>
	<000001c75a8e$b7a2e3b0$c6d8180a@amr.corp.intel.com>
Message-ID: <20070227165532.GB10245@mellanox.co.il>

> Quoting Sean Hefty <sean.hefty at intel.com>:
> Subject: Re: [PATCH] for OFED 1.2
> 
> >Please send patches that will be added to kernel_patches/fixes.
> >
> >Please update your git tree from
> >git://git.openfabrics.org/~vlad/ofed_1_2/.git  ofed_1_2
> 
> You want me to create a patch that adds a file that contains the actual patches?
> 
> Why not apply the patches directly?

That's the OFED structure; this was discussed multiple times already.
The point is to keep all changes to upstream components separate,
to make updating to the upstream kernel trivial in the future.

Worked quite well for the OFED 1.1 -> 1.2 transition.

-- 
MST


From monil at voltaire.com  Tue Feb 27 08:57:31 2007
From: monil at voltaire.com (Moni Levy)
Date: Tue, 27 Feb 2007 18:57:31 +0200
Subject: [openib-general] [RFC] [PATCH] ib_cache: do not mask upper bit
 when searching for a pkey
In-Reply-To: <000201c759e3$24828410$55d8180a@amr.corp.intel.com>
References: <1172507101.4102.277140.camel@hal.voltaire.com>
	<000201c759e3$24828410$55d8180a@amr.corp.intel.com>
Message-ID: <6a122cc00702270857o41d36732sef607282f013a4b4@mail.gmail.com>

Sean,
On 2/26/07, Sean Hefty <sean.hefty at intel.com> wrote:
> I think the following patch would make ipoib spec compliant.
> ib_find_cached_pkey is called by ib_cm, rdma_cm, ib_srp, and ib_ipoib.
> I'm not certain what this change would do to SRP, but the ib_cm and
> rdma_cm look okay, given that non-reversible paths aren't supported
> yet anyway.

Sorry for jumping into this thread, but although this patch will make
things more spec compliant, it will break functionality we depend on.
I suggest that we first find an alternate way to enable usage of
partial partition membership before disabling that functionality
altogether.

--Moni

> --
>
> ib_find_cached_pkey masks off the upper-bit of the PKey when searching
> for a match.  The upper bit indicates partial or full membership.  Ignoring
> the upper bit can result in a full membership PKey matching with a partial
> membership PKey.  For ipoib, this can result in joining a multicast group
> that disallows communication between all members.
>
> Signed-off-by: Sean Hefty <sean.hefty at intel.com>
> ---
>  drivers/infiniband/core/cache.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
> index 558c9a0..6f366c3 100644
> --- a/drivers/infiniband/core/cache.c
> +++ b/drivers/infiniband/core/cache.c
> @@ -179,7 +179,7 @@ int ib_find_cached_pkey(struct ib_device *device,
>         *index = -1;
>
>         for (i = 0; i < cache->table_len; ++i)
> -               if ((cache->table[i] & 0x7fff) == (pkey & 0x7fff)) {
> +               if (cache->table[i] == pkey) {
>                         *index = i;
>                         ret = 0;
>                         break;
> --
> 1.4.4.3
>
>
>
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>
>


From monil at voltaire.com  Tue Feb 27 09:00:01 2007
From: monil at voltaire.com (Moni Levy)
Date: Tue, 27 Feb 2007 19:00:01 +0200
Subject: [openib-general] [RFC] IB/ipoib: Asynchronous events delivered
 without port parameter.
In-Reply-To: <ada3b4rwnlk.fsf@cisco.com>
References: <6a122cc00702270502h27d90515k117bf23ea3f31f4d@mail.gmail.com>
	<adak5y3wqzj.fsf@cisco.com>
	<6a122cc00702270747l26c1adavd57de9ba2d9a472b@mail.gmail.com>
	<6a122cc00702270843h34e407bek9f8757e9fb309fc6@mail.gmail.com>
	<ada3b4rwnlk.fsf@cisco.com>
Message-ID: <6a122cc00702270900q43b6e3fo7008aeaf64236d38@mail.gmail.com>

On 2/27/07, Roland Dreier <rdreier at cisco.com> wrote:
>  > On a second thought based on the fact that on a two port HCA we'll
>  > have a 50% miss on the events being delivered, I would move the new
>  > condition to be evaluated first. I apologize if this is too much of
>  > micro optimization. What do you think ?
>
> That wouldn't really be correct since element.port_num isn't valid
> unless we already know it's a port-related event.

You're perfectly right, sorry.

>
> And it's not worth worrying about this since it's not remotely a hot path.

Ok.

--Moni

>
>  - R.
>


From sean.hefty at intel.com  Tue Feb 27 09:00:41 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 27 Feb 2007 09:00:41 -0800
Subject: [openib-general] [PATCH] for OFED 1.2
In-Reply-To: <1172595231.21382.96.camel@vladsk-laptop>
Message-ID: <000101c75a90$d04f31f0$c6d8180a@amr.corp.intel.com>

>Yes, actual patches should be created under kernel_patches/fixes.
>
>Please update your git tree because the following patch fails:

Can you explain how the patch fails?  I don't see how putting the patch into a
file helps.

>> Why not apply the patches directly?
>>
>To be consistent with 2.6.20 kernel.

You can check out stock 2.6.20 using a tag.  Why maintain the ofed code in git
if you don't use it to track patches?

- Sean


From mshefty at ichips.intel.com  Tue Feb 27 09:06:53 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 27 Feb 2007 09:06:53 -0800
Subject: [openib-general] [RFC] [PATCH] ib_cache: do not mask upper bit
 when searching for a pkey
In-Reply-To: <6a122cc00702270857o41d36732sef607282f013a4b4@mail.gmail.com>
References: <1172507101.4102.277140.camel@hal.voltaire.com>
	<000201c759e3$24828410$55d8180a@amr.corp.intel.com>
	<6a122cc00702270857o41d36732sef607282f013a4b4@mail.gmail.com>
Message-ID: <45E4652D.2070704@ichips.intel.com>

> Sorry for jumping into that thread, but although this patch will make
> things more spec compliant, it will break functionality we depend one.
> I suggest that we first find an alternate way to enable usage of
> partial partition membership before disabling that functionality at
> all.

Can you clarify the functionality you depend on?  Are you reliant on ipoib being 
able to join a multicast group from partial partition membership?  If so, do all 
SA's and switches support this?

- Sean
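
To make the two lookup behaviours discussed in this thread concrete,
here is a minimal sketch, assuming a plain array of 16-bit PKeys in host
byte order. The function and macro names are hypothetical and this is
not the ib_cache code itself. The top bit of a PKey is the membership
bit (set for full, clear for partial membership) and the low 15 bits
identify the partition:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PKEY_BASE_MASK 0x7fff /* low 15 bits: partition number */

/* Masked lookup: ignores the membership bit, so full matches partial. */
static int sketch_find_pkey_masked(const uint16_t *table, size_t len,
				   uint16_t pkey)
{
	size_t i;

	assert(table != NULL);
	for (i = 0; i < len; ++i)
		if ((table[i] & SKETCH_PKEY_BASE_MASK) ==
		    (pkey & SKETCH_PKEY_BASE_MASK))
			return (int)i;
	return -1; /* not found */
}

/* Exact lookup: the membership bit must match as well. */
static int sketch_find_pkey_exact(const uint16_t *table, size_t len,
				  uint16_t pkey)
{
	size_t i;

	assert(table != NULL);
	for (i = 0; i < len; ++i)
		if (table[i] == pkey)
			return (int)i;
	return -1; /* not found */
}
```

For a table holding only the partial-membership entry 0x7fff, a search
for the full-membership PKey 0xffff succeeds with the masked lookup and
fails with the exact one, which is the behaviour change being debated.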


From mst at mellanox.co.il  Tue Feb 27 09:18:14 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 27 Feb 2007 19:18:14 +0200
Subject: [openib-general] [PATCH] for OFED 1.2
In-Reply-To: <000101c75a90$d04f31f0$c6d8180a@amr.corp.intel.com>
References: <1172595231.21382.96.camel@vladsk-laptop>
	<000101c75a90$d04f31f0$c6d8180a@amr.corp.intel.com>
Message-ID: <20070227171814.GD10245@mellanox.co.il>

> Quoting Sean Hefty <sean.hefty at intel.com>:
> Subject: Re: [PATCH] for OFED 1.2
> 
> >Yes, actual patches should be created under kernel_patches/fixes.
> >
> >Please update your git tree because the following patch fails:
> 
> Can you explain how the patch fails?  I don't see how putting the patch into a
> file helps.

Try applying it?

> >> Why not apply the patches directly?
> >>
> >To be consistent with 2.6.20 kernel.
> 
> You can check out stock 2.6.20 using a tag.  Why maintain the ofed code in git
> if you don't use it to track patches?

Basically so that conflicts in future merges from upstream are easy to resolve.
If you like, let's reopen this for 1.3. OFED 1.2 is already past its freeze.

-- 
MST


From swise at opengridcomputing.com  Tue Feb 27 09:20:31 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 27 Feb 2007 11:20:31 -0600
Subject: [openib-general] [PATCH] for OFED 1.2
In-Reply-To: <20070227165532.GB10245@mellanox.co.il>
References: <1172594206.21382.90.camel@vladsk-laptop>
	<000001c75a8e$b7a2e3b0$c6d8180a@amr.corp.intel.com>
	<20070227165532.GB10245@mellanox.co.il>
Message-ID: <1172596831.11870.51.camel@stevo-desktop>

On Tue, 2007-02-27 at 18:55 +0200, Michael S. Tsirkin wrote:
> > Quoting Sean Hefty <sean.hefty at intel.com>:
> > Subject: Re: [PATCH] for OFED 1.2
> > 
> > >Please send patches that will be added to kernel_patches/fixes.
> > >
> > >Please update your git tree from
> > >git://git.openfabrics.org/~vlad/ofed_1_2/.git  ofed_1_2
> > 
> > You want me to create a patch that adds a file that contains the actual patches?
> > 
> > Why not apply the patches directly?
> 
> That's the ofed structure, this was discussed multiple times already.
> The point is to keep all changes to upstream components separate,
> to make updating to upstream kernel trivial in the future.
> 
> Worked quite well for OFED 1.1 -> 1.2 transition.
> 

Having these patches as files is painful for every developer because
they cannot create a patch against either ofed_1_2/drivers/infiniband/*
or the kernel.org upstream tree.  They need to apply all the current
patches and then create a patch on top of that, or hope the patch
applies with fuzz.

I think with stacked git or just git and rebasing at key times, you
could keep an ofed_1_2 tree that folks can easily apply patches to...

It's too late to change this for 1.2, but you might want to reconsider
the design for 1.3.


my 2 cents...


From monil at voltaire.com  Tue Feb 27 09:25:27 2007
From: monil at voltaire.com (Moni Levy)
Date: Tue, 27 Feb 2007 19:25:27 +0200
Subject: [openib-general] [RFC] [PATCH] ib_cache: do not mask upper bit
 when searching for a pkey
In-Reply-To: <45E4652D.2070704@ichips.intel.com>
References: <1172507101.4102.277140.camel@hal.voltaire.com>
	<000201c759e3$24828410$55d8180a@amr.corp.intel.com>
	<6a122cc00702270857o41d36732sef607282f013a4b4@mail.gmail.com>
	<45E4652D.2070704@ichips.intel.com>
Message-ID: <6a122cc00702270925j47a79e8ey82708c4ef8038480@mail.gmail.com>

On 2/27/07, Sean Hefty <mshefty at ichips.intel.com> wrote:
> > Sorry for jumping into that thread, but although this patch will make
> > things more spec compliant, it will break functionality we depend one.
> > I suggest that we first find an alternate way to enable usage of
> > partial partition membership before disabling that functionality at
> > all.
>
> Can you clarify the functionality you depend on?  Are you reliant on ipoib being
> able to join a multicast group from partial partition membership?

Exactly.

> If so, do all SA's and switches support this?

I can't vouch for all the SAs and switches.

-- Moni

>
> - Sean
>


From halr at voltaire.com  Tue Feb 27 09:19:45 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 27 Feb 2007 12:19:45 -0500
Subject: [openib-general] [RFC] [PATCH] ib_cache: do not mask upper bit
 when searching for a pkey
In-Reply-To: <45E4652D.2070704@ichips.intel.com>
References: <1172507101.4102.277140.camel@hal.voltaire.com>
	<000201c759e3$24828410$55d8180a@amr.corp.intel.com>
	<6a122cc00702270857o41d36732sef607282f013a4b4@mail.gmail.com>
	<45E4652D.2070704@ichips.intel.com>
Message-ID: <1172596773.4102.367435.camel@hal.voltaire.com>

On Tue, 2007-02-27 at 12:06, Sean Hefty wrote:
> > Sorry for jumping into that thread, but although this patch will make
> > things more spec compliant, it will break functionality we depend one.
> > I suggest that we first find an alternate way to enable usage of
> > partial partition membership before disabling that functionality at
> > all.
> 
> Can you clarify the functionality you depend on?  Are you reliant on ipoib being 
> able to join a multicast group from partial partition membership?  If so, do all 
> SA's and switches support this?

I'm not sure anyone can speak for all SAs, and the vendor SAs would not
necessarily indicate this. From a quick code inspection of OpenSM, it
appears not to enforce compliance properly.

Switches do whatever they are told to do by the SM.

-- Hal

> - Sean
> 


From mst at mellanox.co.il  Tue Feb 27 09:31:22 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 27 Feb 2007 19:31:22 +0200
Subject: [openib-general] [PATCH] for OFED 1.2
In-Reply-To: <1172596831.11870.51.camel@stevo-desktop>
References: <1172594206.21382.90.camel@vladsk-laptop>
	<000001c75a8e$b7a2e3b0$c6d8180a@amr.corp.intel.com>
	<20070227165532.GB10245@mellanox.co.il>
	<1172596831.11870.51.camel@stevo-desktop>
Message-ID: <20070227173122.GE10245@mellanox.co.il>

> Quoting Steve Wise <swise at opengridcomputing.com>:
> Subject: Re: [openib-general] [PATCH] for OFED 1.2
> 
> On Tue, 2007-02-27 at 18:55 +0200, Michael S. Tsirkin wrote:
> > > Quoting Sean Hefty <sean.hefty at intel.com>:
> > > Subject: Re: [PATCH] for OFED 1.2
> > > 
> > > >Please send patches that will be added to kernel_patches/fixes.
> > > >
> > > >Please update your git tree from
> > > >git://git.openfabrics.org/~vlad/ofed_1_2/.git  ofed_1_2
> > > 
> > > You want me to create a patch that adds a file that contains the actual patches?
> > > 
> > > Why not apply the patches directly?
> > 
> > That's the ofed structure, this was discussed multiple times already.
> > The point is to keep all changes to upstream components separate,
> > to make updating to upstream kernel trivial in the future.
> > 
> > Worked quite well for OFED 1.1 -> 1.2 transition.
> > 
> 
> Having these patches as files is painful for every developer because
> they cannot create a patch against ofed_1_2/drivers/infiniband/* nor the
> kernel.org upstream tree.

Did you try using quilt, which makes managing patch stacks quite easy?
If you have quilt installed, the OFED scripts actually use it
to apply patches, so things are easy.

> They need to apply all the current patches
> and then create a patch on top of that. Or hope the patch applies
> fuzzily.  

One point I can't stress enough: however you create a patch,
you are expected to build and test it in the OFED environment
before posting.

> I think with stacked git or just git and rebasing at key times, you
> could keep an ofed_1_2 tree that folks can easily apply patches to...
> 
> Its too late to change this for 1.2, but you might want to reconsider
> the design for 1.3.

Well, I experimented with git rebase and it is unfortunately still
fragile at this point.

I agree using stacked git might be a good idea; I just have not
had the chance to experiment with it enough. I had the impression
that publishing an stg-managed branch creates problems for whoever
attempts to track it, but I might be wrong.


-- 
MST


From sean.hefty at intel.com  Tue Feb 27 09:30:02 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 27 Feb 2007 09:30:02 -0800
Subject: [openib-general] [PATCH] for OFED 1.2
In-Reply-To: <1172596831.11870.51.camel@stevo-desktop>
Message-ID: <000201c75a94$e9c12f40$c6d8180a@amr.corp.intel.com>

>I think with stacked git or just git and rebasing at key times, you
>could keep an ofed_1_2 tree that folks can easily apply patches to...
>
>Its too late to change this for 1.2, but you might want to reconsider
>the design for 1.3.

Can't we just create a new branch (ofed_1_2_patched) with these patches already
applied and in the correct order?  

Maybe I'm just not understanding the work flow here...

- Sean


From jsquyres at cisco.com  Tue Feb 27 09:39:18 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Tue, 27 Feb 2007 12:39:18 -0500
Subject: [openib-general] [PATCH] for OFED 1.2
In-Reply-To: <20070227173122.GE10245@mellanox.co.il>
References: <1172594206.21382.90.camel@vladsk-laptop>
	<000001c75a8e$b7a2e3b0$c6d8180a@amr.corp.intel.com>
	<20070227165532.GB10245@mellanox.co.il>
	<1172596831.11870.51.camel@stevo-desktop>
	<20070227173122.GE10245@mellanox.co.il>
Message-ID: <72A5229F-E8E8-4548-BADC-2E33263CF5B1@cisco.com>

It would be great if all of this knowledge were posted to the wiki to
avoid repeating this conversation in the future (or one of countless
variations of this conversation).  For example, I admit to not paying
close attention to many of the threads on this list, but this was the
first time I'd heard of "quilt".

Specifically: if there are tools and methods that are helpful for OFA/ 
OFED development, they should be detailed on the wiki.  The wiki is  
where all permanent knowledge should be posted.

This is just my $0.000001...


On Feb 27, 2007, at 12:31 PM, Michael S. Tsirkin wrote:

>> Quoting Steve Wise <swise at opengridcomputing.com>:
>> Subject: Re: [openib-general] [PATCH] for OFED 1.2
>>
>> On Tue, 2007-02-27 at 18:55 +0200, Michael S. Tsirkin wrote:
>>>> Quoting Sean Hefty <sean.hefty at intel.com>:
>>>> Subject: Re: [PATCH] for OFED 1.2
>>>>
>>>>> Please send patches that will be added to kernel_patches/fixes.
>>>>>
>>>>> Please update your git tree from
>>>>> git://git.openfabrics.org/~vlad/ofed_1_2/.git  ofed_1_2
>>>>
>>>> You want me to create a patch that adds a file that contains the  
>>>> actual patches?
>>>>
>>>> Why not apply the patches directly?
>>>
>>> That's the ofed structure, this was discussed multiple times  
>>> already.
>>> The point is to keep all changes to upstream components separate,
>>> to make updating to upstream kernel trivial in the future.
>>>
>>> Worked quite well for OFED 1.1 -> 1.2 transition.
>>>
>>
>> Having these patches as files is painful for every developer because
>> they cannot create a patch against ofed_1_2/drivers/infiniband/*  
>> nor the
>> kernel.org upstream tree.
>
> Did you try using quilt which makes managing patch stacks quite easy?
> If you have quilt installed, OFED scripts actually use it
> to apply patches, so things are easy.
>
>> They need to apply all the current patches
>> and then create a patch on top of that. Or hope the patch applies
>> fuzzily.
>
> One point I can't stress enough: whatever way you create a patch,
> developers are expected to build and test it in OFED environment
> before posting.
>
>> I think with stacked git or just git and rebasing at key times, you
>> could keep an ofed_1_2 tree that folks can easily apply patches to...
>>
>> Its too late to change this for 1.2, but you might want to reconsider
>> the design for 1.3.
>
> Well, I experimented with git rebase and it is unfortunately still
> fragile at this point.
>
> I agree using stacked git might be a good idea, I just did not
> have the chance to experiment with it enough. I had an impression
> that publishing stg managed branch creates problems for whoever
> attempts to track it, but I might be wrong.
>
>
> -- 
> MST
>


-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


From mst at mellanox.co.il  Tue Feb 27 09:44:26 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 27 Feb 2007 19:44:26 +0200
Subject: [openib-general] [PATCH] for OFED 1.2
In-Reply-To: <000201c75a94$e9c12f40$c6d8180a@amr.corp.intel.com>
References: <1172596831.11870.51.camel@stevo-desktop>
	<000201c75a94$e9c12f40$c6d8180a@amr.corp.intel.com>
Message-ID: <20070227174426.GF10245@mellanox.co.il>

> Quoting Sean Hefty <sean.hefty at intel.com>:
> Subject: Re: [PATCH] for OFED 1.2
> 
> >I think with stacked git or just git and rebasing at key times, you
> >could keep an ofed_1_2 tree that folks can easily apply patches to...
> >
> >Its too late to change this for 1.2, but you might want to reconsider
> >the design for 1.3.
> 
> Can't we just create a new branch (ofed_1_2_patched) with these patches already
> applied and in the correct order?  

Then what do we do when we want to update to a new upstream? Throw this branch away?
As it is, I just pull, then build, and remove patches that conflict.

By the way, there are backport patches, etc., so it is still incorrect
to say that you would be able to generate a patch out of git
and know it's a good one without a test build.

> Maybe I'm just not understanding the work flow here...

Sean, please install quilt and try using it for working with the system.
Adding a new patch is usually done this way:
quilt new <patch>
quilt add <files>
edit
quilt refresh

cp patches/<patch> kernel_patches/fixes/
git add kernel_patches/fixes/<patch>
git commit kernel_patches/fixes/<patch>


-- 
MST


From mst at mellanox.co.il  Tue Feb 27 09:45:53 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 27 Feb 2007 19:45:53 +0200
Subject: [openib-general] [PATCH] for OFED 1.2
In-Reply-To: <72A5229F-E8E8-4548-BADC-2E33263CF5B1@cisco.com>
References: <1172594206.21382.90.camel@vladsk-laptop>
	<000001c75a8e$b7a2e3b0$c6d8180a@amr.corp.intel.com>
	<20070227165532.GB10245@mellanox.co.il>
	<1172596831.11870.51.camel@stevo-desktop>
	<20070227173122.GE10245@mellanox.co.il>
	<72A5229F-E8E8-4548-BADC-2E33263CF5B1@cisco.com>
Message-ID: <20070227174553.GG10245@mellanox.co.il>

Lots of stuff *is* in the wiki already - did you look at the pages Vlad created?
Things can always be improved, and you can add stuff too.


Quoting Jeff Squyres <jsquyres at cisco.com>:
Subject: Re: [PATCH] for OFED 1.2

It would be great if all of this knowledge is posted to the wiki to  
avoid repeating this conversation in the future (or one of countless  
variations of this conversation).  For example, I admit to not paying  
close attention to many of the threads on this list, but this was the  
first time I'd head of "quilt".

Specifically: if there are tools and methods that are helpful for OFA/ 
OFED development, they should be detailed on the wiki.  The wiki is  
where all permanent knowledge should be posted.

This is just my $0.000001...


On Feb 27, 2007, at 12:31 PM, Michael S. Tsirkin wrote:

>> Quoting Steve Wise <swise at opengridcomputing.com>:
>> Subject: Re: [openib-general] [PATCH] for OFED 1.2
>>
>> On Tue, 2007-02-27 at 18:55 +0200, Michael S. Tsirkin wrote:
>>>> Quoting Sean Hefty <sean.hefty at intel.com>:
>>>> Subject: Re: [PATCH] for OFED 1.2
>>>>
>>>>> Please send patches that will be added to kernel_patches/fixes.
>>>>>
>>>>> Please update your git tree from
>>>>> git://git.openfabrics.org/~vlad/ofed_1_2/.git  ofed_1_2
>>>>
>>>> You want me to create a patch that adds a file that contains the  
>>>> actual patches?
>>>>
>>>> Why not apply the patches directly?
>>>
>>> That's the ofed structure, this was discussed multiple times  
>>> already.
>>> The point is to keep all changes to upstream components separate,
>>> to make updating to upstream kernel trivial in the future.
>>>
>>> Worked quite well for OFED 1.1 -> 1.2 transition.
>>>
>>
>> Having these patches as files is painful for every developer because
>> they cannot create a patch against ofed_1_2/drivers/infiniband/*  
>> nor the
>> kernel.org upstream tree.
>
> Did you try using quilt which makes managing patch stacks quite easy?
> If you have quilt installed, OFED scripts actually use it
> to apply patches, so things are easy.
>
>> They need to apply all the current patches
>> and then create a patch on top of that. Or hope the patch applies
>> fuzzily.
>
> One point I can't stress enough: whatever way you create a patch,
> developers are expected to build and test it in OFED environment
> before posting.
>
>> I think with stacked git or just git and rebasing at key times, you
>> could keep an ofed_1_2 tree that folks can easily apply patches to...
>>
>> Its too late to change this for 1.2, but you might want to reconsider
>> the design for 1.3.
>
> Well, I experimented with git rebase and it is unfortunately still
> fragile at this point.
>
> I agree using stacked git might be a good idea, I just did not
> have the chance to experiment with it enough. I had an impression
> that publishing stg managed branch creates problems for whoever
> attempts to track it, but I might be wrong.
>
>
> -- 
> MST
>
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/ 
> openib-general


-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


-- 
MST


From mst at mellanox.co.il  Tue Feb 27 09:47:18 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 27 Feb 2007 19:47:18 +0200
Subject: [openib-general] [PATCH] for OFED 1.2
In-Reply-To: <72A5229F-E8E8-4548-BADC-2E33263CF5B1@cisco.com>
References: <1172594206.21382.90.camel@vladsk-laptop>
	<000001c75a8e$b7a2e3b0$c6d8180a@amr.corp.intel.com>
	<20070227165532.GB10245@mellanox.co.il>
	<1172596831.11870.51.camel@stevo-desktop>
	<20070227173122.GE10245@mellanox.co.il>
	<72A5229F-E8E8-4548-BADC-2E33263CF5B1@cisco.com>
Message-ID: <20070227174718.GH10245@mellanox.co.il>

> This is just my $0.000001...

Thanks for the suggestions, but what does $0.000001 buy one in US today?

-- 
MST


From swise at opengridcomputing.com  Tue Feb 27 09:55:52 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 27 Feb 2007 11:55:52 -0600
Subject: [openib-general] [PATCH] for OFED 1.2
In-Reply-To: <20070227174426.GF10245@mellanox.co.il>
References: <1172596831.11870.51.camel@stevo-desktop>
	<000201c75a94$e9c12f40$c6d8180a@amr.corp.intel.com>
	<20070227174426.GF10245@mellanox.co.il>
Message-ID: <1172598952.11870.74.camel@stevo-desktop>

On Tue, 2007-02-27 at 19:44 +0200, Michael S. Tsirkin wrote:
> > Quoting Sean Hefty <sean.hefty at intel.com>:
> > Subject: Re: [PATCH] for OFED 1.2
> > 
> > >I think with stacked git or just git and rebasing at key times, you
> > >could keep an ofed_1_2 tree that folks can easily apply patches to...
> > >
> > >Its too late to change this for 1.2, but you might want to reconsider
> > >the design for 1.3.
> > 
> > Can't we just create a new branch (ofed_1_2_patched) with these patches already
> > applied and in the correct order?  
> 
> Then what we do when we want to update to new upstream? Throw this branch away?
> As it is, I just pull then build and remove patches that conflict.
> 
> By the way, there are backport patches, etc - it is still incorrect
> to say that you would be able to generate a patch out of git
> and know it's a good one without test-build.
> 
> > Maybe I'm just not understanding the work flow here...
> 
> Sean, please install quilt and try using it for working with the system.
> Adding new patch is usually done in this way
> quilt new <patch>
> quilt add <files>
> edit
> quilt refresh
> 
> cp patches/<patch> kernel_patches/fixes/
> git add kernel_patches/fixes/<patch>
> git commit kernel_patches/fixes/<patch>

NOTE: The key to the above process is the assumption that the developer
maintains _all_ of the existing patches from kernel_patches/ on top of
the ofed_1_2 tree using quilt or stg.  Otherwise quilt/stg isn't buying
you anything.

And this doesn't take into account backports.

Regardless, you need to build, install and test any ofed patch on an
ofed system, so you're gonna have extra work:

1) create ofed-specific patch
   build/test it on ofed
   post it to openib-general/ewg

2) create kernel.org patch
   build/test it on kernel.org
   post it to openib-general/lkml/netdev


My .27 cents...


From jsquyres at cisco.com  Tue Feb 27 10:11:18 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Tue, 27 Feb 2007 13:11:18 -0500
Subject: [openib-general] [PATCH] for OFED 1.2
In-Reply-To: <20070227174553.GG10245@mellanox.co.il>
References: <1172594206.21382.90.camel@vladsk-laptop>
	<000001c75a8e$b7a2e3b0$c6d8180a@amr.corp.intel.com>
	<20070227165532.GB10245@mellanox.co.il>
	<1172596831.11870.51.camel@stevo-desktop>
	<20070227173122.GE10245@mellanox.co.il>
	<72A5229F-E8E8-4548-BADC-2E33263CF5B1@cisco.com>
	<20070227174553.GG10245@mellanox.co.il>
Message-ID: <BFAAFF92-8312-4933-9AD2-03D882F641B5@cisco.com>

On Feb 27, 2007, at 12:45 PM, Michael S. Tsirkin wrote:

> Lots of stuff *is* in the wiki already - did you look at the pages Vlad  
> created?

A search for "quilt" on the wiki turns up nothing (I checked before I  
posted :-) ).

And yes, I have [thoroughly] read the pages Vlad created.  But the  
very fact that this conversation is occurring is because either the  
information is not on the wiki or what is on the wiki is not clear.   
Otherwise, I suspect that you simply would have pointed Steve to the  
wiki and said "Please read the fine manual at http://....".

Don't get me wrong; what has already been posted is great.  I'm just  
saying: keep it coming!  The wiki should be a living document that  
changes as our procedures and collective wisdom changes.  It saves us  
*all* time over the long run.  A one-time dump of information is not  
nearly as useful as an ever-updated document.

> Things can always be improved, you can add stuff too.

https://wiki.openfabrics.org/tiki-lastchanges.php?days=31 shows that  
only Tziporet and myself have changed the OFED portion of the wiki  
over the past month.

So -- *you* can add stuff to the wiki, too.  :-)

> This is just my $0.000001...

It buys very little, if anything.  In fact, a whole $0.02 also buys  
very little, if anything.  So take my comments for what they're worth.

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


From mst at mellanox.co.il  Tue Feb 27 10:14:08 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 27 Feb 2007 20:14:08 +0200
Subject: [openib-general] [PATCH] for OFED 1.2
In-Reply-To: <1172598952.11870.74.camel@stevo-desktop>
References: <1172596831.11870.51.camel@stevo-desktop>
	<000201c75a94$e9c12f40$c6d8180a@amr.corp.intel.com>
	<20070227174426.GF10245@mellanox.co.il>
	<1172598952.11870.74.camel@stevo-desktop>
Message-ID: <20070227181353.GI10245@mellanox.co.il>

> Quoting Steve Wise <swise at opengridcomputing.com>:
> Subject: Re: [PATCH] for OFED 1.2
> 
> On Tue, 2007-02-27 at 19:44 +0200, Michael S. Tsirkin wrote:
> > > Quoting Sean Hefty <sean.hefty at intel.com>:
> > > Subject: Re: [PATCH] for OFED 1.2
> > > 
> > > >I think with stacked git or just git and rebasing at key times, you
> > > >could keep an ofed_1_2 tree that folks can easily apply patches to...
> > > >
> > > >Its too late to change this for 1.2, but you might want to reconsider
> > > >the design for 1.3.
> > > 
> > > Can't we just create a new branch (ofed_1_2_patched) with these patches already
> > > applied and in the correct order?  
> > 
> > Then what we do when we want to update to new upstream? Throw this branch away?
> > As it is, I just pull then build and remove patches that conflict.
> > 
> > By the way, there are backport patches, etc - it is still incorrect
> > to say that you would be able to generate a patch out of git
> > and know it's a good one without test-build.
> > 
> > > Maybe I'm just not understanding the work flow here...
> > 
> > Sean, please install quilt and try using it for working with the system.
> > Adding new patch is usually done in this way
> > quilt new <patch>
> > quilt add <files>
> > edit
> > quilt refresh
> > 
> > cp patches/<patch> kernel_patches/fixes/
> > git add kernel_patches/fixes/<patch>
> > git commit kernel_patches/fixes/<patch>
> 
> NOTE: The key to the above process is the assumption that the developer
> maintains _all_ of the existing patches from kernel_patches/ on top of
> the ofed_1_2 tree using quilt or stg.  Otherwise quilt/stg isn't buying
> you anything.

OFED will do this automatically.

> And this doesn't take into account backports.

The process works with backport patches too: you just have to do this

> quilt pop -a
> 
> > > quilt new <patch>
> > > quilt add <files>
> > > edit
> > > quilt refresh
> 
> quilt push -a
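
For readers unfamiliar with patch stacks, here is a minimal sketch of why the
sequence above (pop everything, create the fix, push everything back) puts the
new fix underneath the backport patches. It is a toy model written for this
note, not quilt itself: the PatchStack class and the patch names are
hypothetical, and it assumes that "quilt new" inserts the new patch right after
the topmost applied patch.

```python
# Toy model of a quilt series file; it only tracks ordering, not patch content.
class PatchStack:
    def __init__(self, series):
        self.series = list(series)  # full ordered list of known patches
        self.applied = 0            # how many patches from the top of the list are applied

    def pop_all(self):
        """Models `quilt pop -a`: unapply every patch."""
        self.applied = 0

    def push_all(self):
        """Models `quilt push -a`: apply every patch in series order."""
        self.applied = len(self.series)

    def new(self, name):
        """Models `quilt new <patch>` (assumed: insert after topmost applied patch)."""
        self.series.insert(self.applied, name)
        self.applied += 1

    def applied_patches(self):
        return self.series[:self.applied]


# Hypothetical patch names, standing in for whatever configure applied.
stack = PatchStack(["fixes/existing_fix.patch", "backport/2.6.9/compat.patch"])
stack.push_all()                    # state after running configure
stack.pop_all()                     # quilt pop -a
stack.new("fixes/my_fix.patch")     # quilt new / quilt add / edit / quilt refresh
stack.push_all()                    # quilt push -a
print(stack.applied_patches())
```

In this model the new fix ends up first in the series, so it is created against
the unpatched tree and everything else is re-applied on top of it.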


-- 
MST


From mst at mellanox.co.il  Tue Feb 27 10:15:28 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 27 Feb 2007 20:15:28 +0200
Subject: [openib-general] [PATCH] for OFED 1.2
In-Reply-To: <BFAAFF92-8312-4933-9AD2-03D882F641B5@cisco.com>
References: <1172594206.21382.90.camel@vladsk-laptop>
	<000001c75a8e$b7a2e3b0$c6d8180a@amr.corp.intel.com>
	<20070227165532.GB10245@mellanox.co.il>
	<1172596831.11870.51.camel@stevo-desktop>
	<20070227173122.GE10245@mellanox.co.il>
	<72A5229F-E8E8-4548-BADC-2E33263CF5B1@cisco.com>
	<20070227174553.GG10245@mellanox.co.il>
	<BFAAFF92-8312-4933-9AD2-03D882F641B5@cisco.com>
Message-ID: <20070227181528.GJ10245@mellanox.co.il>

> > This is just my $0.000001...
> 
> It buys very little, if anything.  In fact, a whole $0.02 also buys  
> very little, if anything.  So take my comments for what they're worth.

Oh, good, I thought deflation is getting out of hand ...

-- 
MST


From mst at mellanox.co.il  Tue Feb 27 10:16:53 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 27 Feb 2007 20:16:53 +0200
Subject: [openib-general] [PATCH] for OFED 1.2
In-Reply-To: <BFAAFF92-8312-4933-9AD2-03D882F641B5@cisco.com>
References: <1172594206.21382.90.camel@vladsk-laptop>
	<000001c75a8e$b7a2e3b0$c6d8180a@amr.corp.intel.com>
	<20070227165532.GB10245@mellanox.co.il>
	<1172596831.11870.51.camel@stevo-desktop>
	<20070227173122.GE10245@mellanox.co.il>
	<72A5229F-E8E8-4548-BADC-2E33263CF5B1@cisco.com>
	<20070227174553.GG10245@mellanox.co.il>
	<BFAAFF92-8312-4933-9AD2-03D882F641B5@cisco.com>
Message-ID: <20070227181653.GK10245@mellanox.co.il>

> > Lots of stuff *is* in the wiki already - did you look at the pages Vlad  
> > created?
> 
> A search for "quilt" on the wiki turns up nothing (I checked before I  
> posted :-) ).
> 
> And yes, I have [thoroughly] read the pages Vlad created.  But the  
> very fact that this conversation is occurring is because either the  
> information is not on the wiki or what is on the wiki is not clear.   
> Otherwise, I suspect that you simply would have pointed Steve to the  
> wiki and said "Please read the fine manual at http://....".

You are right about that, I don't dispute it.
Thanks for the suggestion, I'll try to find the time to add this to the wiki.

-- 
MST


From swise at opengridcomputing.com  Tue Feb 27 10:43:18 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 27 Feb 2007 12:43:18 -0600
Subject: [openib-general] [PATCH] for OFED 1.2
In-Reply-To: <20070227181353.GI10245@mellanox.co.il>
References: <1172596831.11870.51.camel@stevo-desktop>
	<000201c75a94$e9c12f40$c6d8180a@amr.corp.intel.com>
	<20070227174426.GF10245@mellanox.co.il>
	<1172598952.11870.74.camel@stevo-desktop>
	<20070227181353.GI10245@mellanox.co.il>
Message-ID: <1172601798.11870.103.camel@stevo-desktop>

> > > 
> > > Sean, please install quilt and try using it for working with the system.
> > > Adding new patch is usually done in this way
> > > quilt new <patch>
> > > quilt add <files>
> > > edit
> > > quilt refresh
> > > 
> > > cp patches/<patch> kernel_patches/fixes/
> > > git add kernel_patches/fixes/<patch>
> > > git commit kernel_patches/fixes/<patch>
> > 
> > NOTE: The key to the above process is the assumption that the developer
> > maintains _all_ of the existing patches from kernel_patches/ on top of
> > the ofed_1_2 tree using quilt or stg.  Otherwise quilt/stg isn't buying
> > you anything.
> 
> OFED will do this automatically.
> 

uh, can you explain this?  Given I have a freshly cloned ofed_1_2 git
tree, and I want to change cma.c (a good one cuz there are patches).
What do I do?  There's no quilt stack at all at this point.  Right?  


> > And this doesn't take into account backports.
> 
> The process works with backport patches too: you just have to do this
> 
> > quilt pop -a
> > 
> > > > quilt new <patch>
> > > > quilt add <files>
> > > > edit
> > > > quilt refresh
> > 
> > quilt push -a


But you cannot keep a stack for more than one backport pushed, right?
So you still need to be slapping the stacks of patches around for each
backport.  

Or maybe I'm confused?


From sean.hefty at intel.com  Tue Feb 27 10:49:08 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 27 Feb 2007 10:49:08 -0800
Subject: [openib-general] [PATCH] for OFED 1.2
In-Reply-To: <1172601798.11870.103.camel@stevo-desktop>
Message-ID: <000301c75a9f$f6d28480$c6d8180a@amr.corp.intel.com>

>But you cannot keep a stack for more than one backport pushed, right?
>So you still need to be slapping the stacks of patches around for each
>backport.

Why not have separate branches for each kernel too?


From mst at mellanox.co.il  Tue Feb 27 10:51:07 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 27 Feb 2007 20:51:07 +0200
Subject: [openib-general] [PATCH] for OFED 1.2
In-Reply-To: <1172601798.11870.103.camel@stevo-desktop>
References: <1172596831.11870.51.camel@stevo-desktop>
	<000201c75a94$e9c12f40$c6d8180a@amr.corp.intel.com>
	<20070227174426.GF10245@mellanox.co.il>
	<1172598952.11870.74.camel@stevo-desktop>
	<20070227181353.GI10245@mellanox.co.il>
	<1172601798.11870.103.camel@stevo-desktop>
Message-ID: <20070227185107.GL10245@mellanox.co.il>

> Quoting Steve Wise <swise at opengridcomputing.com>:
> Subject: Re: [PATCH] for OFED 1.2
> 
> > > > 
> > > > Sean, please install quilt and try using it for working with the system.
> > > > Adding new patch is usually done in this way
> > > > quilt new <patch>
> > > > quilt add <files>
> > > > edit
> > > > quilt refresh
> > > > 
> > > > cp patches/<patch> kernel_patches/fixes/
> > > > git add kernel_patches/fixes/<patch>
> > > > git commit kernel_patches/fixes/<patch>
> > > 
> > > NOTE: The key to the above process is the assumption that the developer
> > > maintains _all_ of the existing patches from kernel_patches/ on top of
> > > the ofed_1_2 tree using quilt or stg.  Otherwise quilt/stg isn't buying
> > > you anything.
> > 
> > OFED will do this automatically.
> > 
> 
> uh, can you explain this?  Given I have a freshly cloned ofed_1_2 git
> tree, and I want to change cma.c (a good one cuz there are patches).
> What do I do?  There's no quilt stack at all at this point.  Right?  

Try running the configure script.
After this, 'quilt applied' will show which patches are applied.

> > > And this doesn't take into account backports.
> > 
> > The process works with backport patches too: you just have to do this
> > 
> > > quilt pop -a
> > > 
> > > > > quilt new <patch>
> > > > > quilt add <files>
> > > > > edit
> > > > > quilt refresh
> > > 
> > > quilt push -a
> 
> 
> But you cannot keep a stack for more than one backport pushed, right?
> So you still need to be slapping the stacks of patches around for each
> backport.  
> 
> Or maybe I'm confused?

Yes.

Fortunately it's not too hard: you can do
quilt pop -a
and re-run configure for another kernel.

Of course for testing the patch, it is easier to commit the change in your tree
and then to use openfabrics cross-build functionality that will clone this
tree and build for multiple arches/kernels.


-- 
MST


From mst at mellanox.co.il  Tue Feb 27 10:53:26 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 27 Feb 2007 20:53:26 +0200
Subject: [openib-general] [PATCH] for OFED 1.2
In-Reply-To: <000301c75a9f$f6d28480$c6d8180a@amr.corp.intel.com>
References: <1172601798.11870.103.camel@stevo-desktop>
	<000301c75a9f$f6d28480$c6d8180a@amr.corp.intel.com>
Message-ID: <20070227185326.GM10245@mellanox.co.il>

> Quoting Sean Hefty <sean.hefty at intel.com>:
> Subject: RE: [PATCH] for OFED 1.2
> 
> >But you cannot keep a stack for more than one backport pushed, right?
> >So you still need to be slapping the stacks of patches around for each
> >backport.
> 
> Why not have separate branches for each kernel too?

I think it'll be much more work to maintain all these branches.
And again, there will be conflicts, and it's too easy to get confused when
resolving a conflict.

With patches we have scripts to automate this.


-- 
MST


From troy at scl.ameslab.gov  Tue Feb 27 11:03:16 2007
From: troy at scl.ameslab.gov (Troy Benjegerdes)
Date: Tue, 27 Feb 2007 13:03:16 -0600
Subject: [openib-general] remove www.openfabrics.org SVN links..
Message-ID: <20070227190316.GA12092@minbar-g5.scl.ameslab.gov>

Can someone please update the main www.openfabrics.org web page to
remove all references to subversion, and link to a wiki page on how to
get the latest source?

Thanks.


From mshefty at ichips.intel.com  Tue Feb 27 11:10:17 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 27 Feb 2007 11:10:17 -0800
Subject: [openib-general] [PATCH] for OFED 1.2
In-Reply-To: <20070227185326.GM10245@mellanox.co.il>
References: <1172601798.11870.103.camel@stevo-desktop>
	<000301c75a9f$f6d28480$c6d8180a@amr.corp.intel.com>
	<20070227185326.GM10245@mellanox.co.il>
Message-ID: <45E48219.7030904@ichips.intel.com>

> I think it'll be much more work to maintain all these branches.
> And again, there will be conflicts, and it's too easy to get confused when
> resolving a conflict.

Storing patches in a directory seems confusing to me.  They must be applied in a 
specific order for everything to work, and that knowledge is not captured. 
Conflicts need to be resolved anyway.

If someone wants to use scripts to make their life easier, that's fine, but they 
shouldn't be a necessity for checking out code and creating patches using git. 
For OFED they are.


From sashak at voltaire.com  Tue Feb 27 12:11:39 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Tue, 27 Feb 2007 22:11:39 +0200
Subject: [openib-general] [PATCH] for OFED 1.2
In-Reply-To: <20070227174426.GF10245@mellanox.co.il>
References: <1172596831.11870.51.camel@stevo-desktop>
	<000201c75a94$e9c12f40$c6d8180a@amr.corp.intel.com>
	<20070227174426.GF10245@mellanox.co.il>
Message-ID: <20070227201139.GB13938@sashak.voltaire.com>

On 19:44 Tue 27 Feb     , Michael S. Tsirkin wrote:
> > Quoting Sean Hefty <sean.hefty at intel.com>:
> > Subject: Re: [PATCH] for OFED 1.2
> > 
> > >I think with stacked git or just git and rebasing at key times, you
> > >could keep an ofed_1_2 tree that folks can easily apply patches to...
> > >
> > >Its too late to change this for 1.2, but you might want to reconsider
> > >the design for 1.3.
> > 
> > Can't we just create a new branch (ofed_1_2_patched) with these patches already
> > applied and in the correct order?  
> 
> Then what we do when we want to update to new upstream? Throw this branch away?
> As it is, I just pull then build and remove patches that conflict.

You can save this branch as <branch-name>-<upstream-name> (or better)
and then rebase <branch-name> onto the new upstream.

> By the way, there are backport patches, etc - it is still incorrect
> to say that you would be able to generate a patch out of git
> and know it's a good one without test-build.

In a similar way you can track backport patch sets as branches.

> > Maybe I'm just not understanding the work flow here...
> 
> Sean, please install quilt and try using it for working with the system.
> Adding new patch is usually done in this way
> quilt new <patch>
> quilt add <files>
> edit
> quilt refresh
> 
> cp patches/<patch> kernel_patches/fixes/
> git add kernel_patches/fixes/<patch>
> git commit kernel_patches/fixes/<patch>

It looks strange to me to track patches against patches...

Sasha


From rowland at cse.ohio-state.edu  Tue Feb 27 11:57:03 2007
From: rowland at cse.ohio-state.edu (Shaun Rowland)
Date: Tue, 27 Feb 2007 14:57:03 -0500
Subject: [openib-general] ofed_1_2_scripts for bug 372
Message-ID: <45E48D0F.8070403@cse.ohio-state.edu>

Hi Vladimir. I've attached a small patch to the ofed_1_2_scripts
build.sh file for the mvapich2() function. This fixes bug 372 where the
F90 compiler was not being set properly for the GNU compiler case and
other possible compilers in the path were being found. This patch is
against the latest ofed_1_2_scripts git.
-- 
Shaun Rowland	rowland at cse.ohio-state.edu
http://www.cse.ohio-state.edu/~rowland/
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: bug-372.patch
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070227/eee6450e/attachment.ksh>

From mst at mellanox.co.il  Tue Feb 27 12:23:31 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 27 Feb 2007 22:23:31 +0200
Subject: [openib-general] [PATCH] for OFED 1.2
In-Reply-To: <20070227201139.GB13938@sashak.voltaire.com>
References: <1172596831.11870.51.camel@stevo-desktop>
	<000201c75a94$e9c12f40$c6d8180a@amr.corp.intel.com>
	<20070227174426.GF10245@mellanox.co.il>
	<20070227201139.GB13938@sashak.voltaire.com>
Message-ID: <20070227202331.GP10245@mellanox.co.il>

> Quoting Sasha Khapyorsky <sashak at voltaire.com>:
> Subject: Re: [PATCH] for OFED 1.2
> 
> On 19:44 Tue 27 Feb     , Michael S. Tsirkin wrote:
> > > Quoting Sean Hefty <sean.hefty at intel.com>:
> > > Subject: Re: [PATCH] for OFED 1.2
> > > 
> > > >I think with stacked git or just git and rebasing at key times, you
> > > >could keep an ofed_1_2 tree that folks can easily apply patches to...
> > > >
> > > >Its too late to change this for 1.2, but you might want to reconsider
> > > >the design for 1.3.
> > > 
> > > Can't we just create a new branch (ofed_1_2_patched) with these patches already
> > > applied and in the correct order?  
> > 
> > Then what we do when we want to update to new upstream? Throw this branch away?
> > As it is, I just pull then build and remove patches that conflict.
> 
> You can save this branch as <branch-name>-<upstream-name> (or better)
> and then rebase <branch-name> onto the new upstream.

rebase does not seem to be too robust when run on such a large repository as the
linux kernel.  Maybe stacked git will work.

> > By the way, there are backport patches, etc - it is still incorrect
> > to say that you would be able to generate a patch out of git
> > and know it's a good one without test-build.
> 
> In a similar way you can track backport patch sets as branches.

At the moment it seems like a lot of work. Again, maybe stg makes it easy,
I know it's hard with plain git.

And I think lots of people (including me) will be confused if we have a ton of branches.

> > > Maybe I'm just not understanding the work flow here...
> > 
> > Sean, please install quilt and try using it for working with the system.
> > Adding new patch is usually done in this way
> > quilt new <patch>
> > quilt add <files>
> > edit
> > quilt refresh
> > 
> > cp patches/<patch> kernel_patches/fixes/
> > git add kernel_patches/fixes/<patch>
> > git commit kernel_patches/fixes/<patch>
> 
> It looks strange to me to track patches against patches...

One gets used to it :)
Seriously, we have these patches, and we want to version them together
with source they are intended to apply to.

-- 
MST


From or.gerlitz at gmail.com  Tue Feb 27 12:26:32 2007
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Tue, 27 Feb 2007 22:26:32 +0200
Subject: [openib-general] failure to create an FMR mapping 1K pages on
	memfree
In-Reply-To: <adawt24w787.fsf@cisco.com>
References: <15ddcffd0702261104x6df977b6g9e4ca0071c8489ad@mail.gmail.com>
	<15ddcffd0702261105s377ad165h7bfe258f69ede152@mail.gmail.com>
	<adawt24w787.fsf@cisco.com>
Message-ID: <15ddcffd0702271226m6c54fa66x3328129f7a7e608@mail.gmail.com>

On 2/27/07, Roland Dreier <rdreier at cisco.com> wrote:
> Is it really returning -ENOMEM?  It seems much more likely that you
> are hitting the code
>
>         /* For Arbel, all MTTs must fit in the same page. */
>         if (mthca_is_memfree(dev) &&
>             mr->attr.max_pages * sizeof *mr->mem.arbel.mtts > PAGE_SIZE)
>                 return -EINVAL;
>
> I guess you could call this limit a driver design issue.

Indeed, sorry for the inaccurate description: mthca_fmr_alloc returns
-EINVAL and the fmr pool code returns -ENOMEM. Thanks for the
clarification.
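
As a worked example of the limit Roland quotes, the arithmetic can be sketched
as follows. This is an illustration only, not driver code: it assumes each MTT
entry is a 64-bit (8-byte) value and that PAGE_SIZE is 4096, and the function
names are made up for this note.

```python
# Assumptions (not stated in the thread): 8-byte MTT entries, 4 KiB pages.
MTT_ENTRY_SIZE = 8
PAGE_SIZE = 4096


def max_fmr_pages(page_size=PAGE_SIZE, entry_size=MTT_ENTRY_SIZE):
    """Largest max_pages for which all MTT entries still fit in one page."""
    return page_size // entry_size


def fmr_fits(max_pages, page_size=PAGE_SIZE, entry_size=MTT_ENTRY_SIZE):
    """Mirrors the quoted check: the driver rejects the request with -EINVAL
    when max_pages * entry_size > page_size."""
    return max_pages * entry_size <= page_size


print(max_fmr_pages())   # 512 under the assumptions above
print(fmr_fits(1024))    # False: 1024 * 8 = 8192 bytes, more than one page
print(fmr_fits(512))     # True: 512 * 8 = 4096 bytes, exactly one page
```

Under these assumptions an FMR mapping 1K pages needs 8192 bytes of MTT
entries, which is why the request is rejected on memfree hardware.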

Or.


From mst at mellanox.co.il  Tue Feb 27 13:29:24 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 27 Feb 2007 23:29:24 +0200
Subject: [openib-general] Fwd: [ANNOUNCE] GIT 1.5.0.2
Message-ID: <20070227212924.GB24555@mellanox.co.il>

FYI.

----- Forwarded message from Junio C Hamano <junkio at cox.net> -----

Subject: [ANNOUNCE] GIT 1.5.0.2
Date: Tue, 27 Feb 2007 10:58:22 +0200
In-Reply-To: <7vwt2ec32p.fsf at assigned-by-dhcp.cox.net> (Junio C. Hamano's message of "Sun, 18 Feb 2007 18:07:42 -0800")
References: <7vwt2ec32p.fsf at assigned-by-dhcp.cox.net>
From: Junio C Hamano <junkio at cox.net>

The latest maintenance release GIT 1.5.0.2 is available at the
usual places:

  http://www.kernel.org/pub/software/scm/git/

  git-1.5.0.2.tar.{gz,bz2}			(tarball)
  git-htmldocs-1.5.0.2.tar.{gz,bz2}		(preformatted docs)
  git-manpages-1.5.0.2.tar.{gz,bz2}		(preformatted docs)
  RPMS/$arch/git-*-1.5.0.2-1.$arch.rpm	(RPM)


GIT v1.5.0.2 Release Notes
==========================

Fixes since v1.5.0.1
--------------------

* Bugfixes

  - Automated merge conflict handling when changes to symbolic
    links conflicted were completely broken.  The merge-resolve
    strategy created a regular file with conflict markers in it
    in place of the symbolic link.  The default strategy,
    merge-recursive was even more broken.  It removed the path
    that was pointed at by the symbolic link.  Both of these
    problems have been fixed.

  - 'git diff maint master next' did not correctly give combined
    diff across three trees.

  - 'git fast-import' portability fix for Solaris.

  - 'git show-ref --verify' without arguments did not error out
    but segfaulted.

  - 'git diff :tracked-file `pwd`/an-untracked-file' gave an extra
    slashes after a/ and b/.

  - 'git format-patch' produced too long filenames if the commit
    message had too long line at the beginning.

  - Running 'make all' and then without changing anything
    running 'make install' still rebuilt some files.  This
    was inconvenient when building as yourself and then
    installing as root (especially problematic when the source
    directory is on NFS and root is mapped to nobody).

  - 'git-rerere' failed to deal with two unconflicted paths that
    sorted next to each other.

  - 'git-rerere' attempted to open(2) a symlink and failed if
    there was a conflict.  Since a conflicting change to a
    symlink would not benefit from rerere anyway, the command
    now ignores conflicting changes to symlinks.

  - 'git-repack' did not like to pass more than 64 arguments
    internally to underlying 'rev-list' logic, which made it
    impossible to repack after accumulating many (small) packs
    in the repository.

  - 'git-diff' to review the combined diff during a conflicted
    merge were not reading the working tree version correctly
    when changes to a symbolic link conflicted.  It should have
    read the data using readlink(2) but read from the regular
    file the symbolic link pointed at.

  - 'git-remote' did not like period in a remote's name.

* Documentation updates

  - added and clarified core.bare, core.legacyheaders configurations.

  - updated "git-clone --depth" documentation.


* Assorted git-gui fixes.

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

----- End forwarded message -----

-- 
MST


From rdreier at cisco.com  Tue Feb 27 13:40:36 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 27 Feb 2007 13:40:36 -0800
Subject: [openib-general] [RFC/BUG] DMA vs. CQ race
In-Reply-To: <adaodngzh9d.fsf@cisco.com> (Roland Dreier's message of
	"Mon, 26 Feb 2007 14:27:42 -0800")
References: <OFDFE7B645.5CBDA3F0-ON8725728E.007A8C1A-8825728E.004ED577@us.ibm.com>
	<adaodngzh9d.fsf@cisco.com>
Message-ID: <adaodnfth2j.fsf@cisco.com>

 >  > On our cell blade + PCI-e Mellanox.
 > 
 > I don't see anything in arch/powerpc that looks like
 > dma_alloc_coherent() will do anything other than allocate some memory
 > and map it with DMA_BIDIRECTIONAL.  So how does this altix fix help in
 > your situation?  Am I misreading the Cell IOMMU code?

Shirley, can you clarify why doing dma_alloc_coherent() in the kernel
helps on your Cell blade?  It really seems that dma_alloc_coherent()
just allocates some memory and then does dma_map(DMA_BIDIRECTIONAL),
which would be exactly the same as allocating the CQ buffer in
userspace and using ib_umem_get() to map it into the kernel.

I'm looking at a possibly cleaner solution to the Altix issue, so I
would like to make sure it fixes whatever the bug on Cell is as well.
So any details you can provide about the problem you see on Cell would
help a lot.

Thanks...


From hozer at hozed.org  Tue Feb 27 13:47:43 2007
From: hozer at hozed.org (Troy Benjegerdes)
Date: Tue, 27 Feb 2007 15:47:43 -0600
Subject: [openib-general] Port error rate detection
In-Reply-To: <45DA0E50.7010002@ornl.gov>
References: <45DA0E50.7010002@ornl.gov>
Message-ID: <20070227214739.GZ21482@narn.hozed.org>

On Mon, Feb 19, 2007 at 03:53:36PM -0500, Steven Carter wrote:
> I have a Nagios module that alerts on connectivity, port errors, 
> speed/width problems.  I would like to give it the ability to change the 
> severity of the alert depending on whether errors are just present or if 
> they are increasing faster than a specified rate.  The intent is to 
> equip the module to keep the state of the last query and possibly 
> history, but I wanted to make sure that I was not re-inventing the wheel 
> first.  Is there an attribute or utility that I am overlooking that will 
> help me do this?
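
The state-keeping described above (remember the last query, then alert
differently depending on whether errors are merely present or growing faster
than a given rate) could be sketched roughly like this. It is a hypothetical
helper, not part of any existing module: the function name, the sample format
and the mapping to the usual Nagios states are assumptions made for
illustration.

```python
def classify(prev, curr, max_rate):
    """Pick an alert level from two (timestamp_seconds, error_count) samples.

    max_rate is the tolerated growth in errors per second.
    """
    (t0, c0), (t1, c1) = prev, curr
    if c1 == 0:
        return "OK"                 # no errors recorded at all
    elapsed = t1 - t0
    delta = c1 - c0
    if elapsed <= 0 or delta < 0:
        return "UNKNOWN"            # counter was reset or the clock went backwards
    rate = delta / elapsed
    # Errors exist: escalate only if they are increasing faster than allowed.
    return "CRITICAL" if rate > max_rate else "WARNING"


print(classify((0, 10), (60, 10), 0.1))   # errors present but not increasing
print(classify((0, 10), (60, 70), 0.1))   # one new error per second
```

Keeping a longer history instead of a single previous sample would smooth out
bursts, at the cost of storing more state between queries.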

One other thing you might want to take a look at is the Fountain/Goanna
node monitoring setup... It's not really anything like the proposed
performance manager, but it might get you what you need. (And we'd like
some feedback on what it should do differently ;)

http://www.scl.ameslab.gov/Projects/Monitor/


From xma at us.ibm.com  Tue Feb 27 14:14:51 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 27 Feb 2007 14:14:51 -0800
Subject: [openib-general] [RFC/BUG] DMA vs. CQ race
In-Reply-To: <adaodnfth2j.fsf@cisco.com>
Message-ID: <OFB4C20655.C907A319-ON8725728F.00798763-8825728F.004E4808@us.ibm.com>


Roland Dreier <rdreier at cisco.com> wrote on 02/27/2007 01:40:36 PM:

> Shirley, can you clarify why doing dma_alloc_coherent() in the kernel
> helps on your Cell blade?  It really seems that dma_alloc_coherent()
> just allocates some memory and then does dma_map(DMA_BIDIRECTIONAL),
> which would be exactly the same as allocating the CQ buffer in
> userspace and using ib_umem_get() to map it into the kernel.
>
> I'm looking at a possibly cleaner solution to the Altix issue, so I
> would like to make sure it fixes whatever the bug on Cell is as well.
> So any details you can provide about the problem you see on Cell would
> help a lot.
>
> Thanks...

Thanks, Roland. After reviewing the whole thread, I think the failure on Cell
is different from the Altix issue, so this fix might not help Cell. The
problem I have might be related to multiple DMAs mapping to the same CQ, or
the sync might be getting lost somewhere else.

Thanks
Shirley Ma

From xma at us.ibm.com  Tue Feb 27 14:28:54 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 27 Feb 2007 15:28:54 -0700
Subject: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast join
 finish
Message-ID: <OF3BA2F709.CD937B1E-ON8725728F.007AFBBF-8825728F.004F9115@us.ibm.com>


Hello Roland,

      Sorry to bother you again. Could you please review the patch below to
see whether it can go upstream soon? IPoIB nodes can't ping each other if the
broadcast join succeeds but any other IB multicast join fails (such as the
IB multicast group join for the default IPv6 link-local solicited address)
when bringing the interface up. This does impact IPoIB usability in
large-node clusters where MCG LIDs are limited.

Thanks
Shirley Ma


----- Forwarded by Shirley Ma/Beaverton/IBM on 02/27/07 06:23 AM -----
                                                                           
From: Shirley Ma/Beaverton/IBM@IBMUS
Sent by: openib-general-bounces at openib.org
Date: 02/05/07 06:50 AM
To: "Roland Dreier" <rdreier at cisco.com>
cc: openib-general at openib.org
Subject: [openib-general] [PATCH] enable IPoIB only if broadcast join finish

Hi, Roland,

Please review this patch. According to IPoIB RFC 4391 section 5, once the
IPoIB broadcast group has been joined, the interface should be ready for data
transfer. In the current IPoIB implementation, the interface is UP and RUNNING
only when all default multicast joins are successful. We hit a problem where
the broadcast join finished successfully but the all-hosts multicast join
failed.

Here is the patch. If possible, please give your input asap; we have an
urgent customer issue that needs to be resolved:

diff -urpN ipoib/ipoib_multicast.c ipoib-multicast/ipoib_multicast.c
--- ipoib/ipoib_multicast.c 2006-11-29 13:57:37.000000000 -0800
+++ ipoib-multicast/ipoib_multicast.c 2007-02-04 22:34:16.000000000 -0800
@@ -402,6 +402,11 @@ static void ipoib_mcast_join_complete(in
 			queue_work(ipoib_workqueue, &priv->mcast_task);
 		mutex_unlock(&mcast_mutex);
 		complete(&mcast->done);
+		/*
+		 * broadcast join finished, enable carrier
+		 */
+		if (mcast == priv->broadcast)
+			netif_carrier_on(dev);
 		return;
 	}

@@ -599,7 +604,6 @@ void ipoib_mcast_join_task(void *dev_ptr
 	ipoib_dbg_mcast(priv, "successfully joined all multicast groups\n");

 	clear_bit(IPOIB_MCAST_RUN, &priv->flags);
-	netif_carrier_on(dev);
 }

 int ipoib_mcast_start_thread(struct net_device *dev)

(See attached file: ipoib-multicast.patch)
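
The behavioral difference the patch above is after can be illustrated with a
small stand-alone model. This is only a sketch under stated assumptions: the
toy_* types and both function names are hypothetical and do not exist in the
IPoIB driver, and the convention that groups[0] is the broadcast group is made
up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one multicast group and its join state. */
struct toy_group {
	bool joined;
};

/* Hypothetical stand-in for the interface's list of default groups. */
struct toy_ipoib {
	struct toy_group *groups; /* assumption: groups[0] is broadcast */
	size_t ngroups;
};

/* Behavior before the patch: carrier only once every group has joined. */
bool carrier_after_all_joins(const struct toy_ipoib *p)
{
	assert(p);
	for (size_t i = 0; i < p->ngroups; i++)
		if (!p->groups[i].joined)
			return false;
	return true;
}

/* Behavior after the patch: carrier as soon as broadcast has joined. */
bool carrier_after_broadcast_join(const struct toy_ipoib *p)
{
	assert(p);
	return p->ngroups > 0 && p->groups[0].joined;
}
```

With the broadcast group joined and one other group failing, the first
function keeps the link down while the second reports it up, which is the
situation described in the report.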

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ipoib-multicast.patch
Type: application/octet-stream
Size: 777 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070227/d60f6b87/attachment.obj>

From rdreier at cisco.com  Tue Feb 27 14:35:34 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 27 Feb 2007 14:35:34 -0800
Subject: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast
 join finish
In-Reply-To: <OF3BA2F709.CD937B1E-ON8725728F.007AFBBF-8825728F.004F9115@us.ibm.com>
	(Shirley Ma's message of "Tue, 27 Feb 2007 15:28:54 -0700")
References: <OF3BA2F709.CD937B1E-ON8725728F.007AFBBF-8825728F.004F9115@us.ibm.com>
Message-ID: <adak5y3teix.fsf@cisco.com>

I don't think this applies any more since Sean's multicast stuff was
merged.  I didn't realize you wanted to get this merged upstream --
anyway, can you please regenerate the patch against the latest kernel?

Thanks


From xma at us.ibm.com  Tue Feb 27 14:38:55 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 27 Feb 2007 14:38:55 -0800
Subject: [openib-general] IPOIB NAPI
In-Reply-To: <adad53wzgut.fsf@cisco.com>
Message-ID: <OFE898EECD.13A1B3FB-ON8725728F.007BDC79-8825728F.00507C3E@us.ibm.com>


Roland Dreier <rdreier at cisco.com> wrote on 02/26/2007 02:36:26 PM:
> No way, it's way too late at this point to change the kernel-user ABI,
> let alone change all ULPs.
>
>  - R.

Hello Roland,

So the IBV_CQ_REPORT_MISSED_EVENTS has been part of OFED-1.2 already? I can
generate the patch for all ULPs to use this for review. Do you need me to
do that?

Thanks
Shirley Ma

From rdreier at cisco.com  Tue Feb 27 14:41:44 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 27 Feb 2007 14:41:44 -0800
Subject: [openib-general] IPOIB NAPI
In-Reply-To: <OFE898EECD.13A1B3FB-ON8725728F.007BDC79-8825728F.00507C3E@us.ibm.com>
	(Shirley Ma's message of "Tue, 27 Feb 2007 14:38:55 -0800")
References: <OFE898EECD.13A1B3FB-ON8725728F.007BDC79-8825728F.00507C3E@us.ibm.com>
Message-ID: <adafy8rte8n.fsf@cisco.com>

 > So the IBV_CQ_REPORT_MISSED_EVENTS has been part of OFED-1.2 already? I can
 > generate the patch for all ULPs to use this for review. Do you need me to
 > do that?

No, it's not in OFED 1.2 or the upstream kernel.  And no one has
implemented it for userspace (and I'm somewhat reluctant to break the
ABI at this point without some performance numbers to motivate making
this API change).

Have the NAPI performance problems with ehca been resolved?  We could
probably merge IPoIB NAPI for 2.6.22 then, which would pull in the
kernel changes at least.

 - R.


From swise at opengridcomputing.com  Tue Feb 27 14:43:51 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 27 Feb 2007 16:43:51 -0600
Subject: [openib-general] cannot instal ofed-1.2 kernel rpm on 2.6.20.1
Message-ID: <1172616231.11870.142.camel@stevo-desktop>

I built the ofed 1.2 rpms from the OFED-1.2-20070227-0602 build and the
kernel rpm fails to install on a 2.6.20.1 kernel:

vic13:/usr/local/src/OFED-1.2-20070227-0602/RPMS/sles-release-10-15.2 # rpm -U kernel-ib-1.2-2.6.20.1.x86_64.rpm
error: Failed dependencies:
        ksym(schedule) = 1000e51 is needed by kernel-ib-1.2-2.6.20.1.x86_64
        ksym(__up_wakeup) = 1042cbb5 is needed by kernel-ib-1.2-2.6.20.1.x86_64
        ksym(pci_request_region) = 10cc2981 is needed by kernel-ib-1.2-2.6.20.1.x86_64
        ksym(skb_dequeue) = 10fc721b is needed by kernel-ib-1.2-2.6.20.1.x86_64
        ksym(mod_timer) = 14777d07 is needed by kernel-ib-1.2-2.6.20.1.x86_64
        ksym(remap_pfn_range) = 155834a8 is needed by kernel-ib-1.2-2.6.20.1.x86_64
        ksym(unregister_netevent_notifier) = 1598dc9d is needed by kernel-ib-1.2-2.6.20.1.x86_64
        ksym(bad_dma_address) = 1675606f is needed by kernel-ib-1.2-2.6.20.1.x86_64
        ksym(dev_get_by_name) = 16ab1a6b is needed by kernel-ib-1.2-2.6.20.1.x86_64

...

<many more of these deleted>

Anybody seen this?


From xma at us.ibm.com  Tue Feb 27 14:46:25 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 27 Feb 2007 14:46:25 -0800
Subject: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast
 join finish
In-Reply-To: <adak5y3teix.fsf@cisco.com>
Message-ID: <OF0CC9E20D.85FCD0B8-ON8725728F.007D087A-8825728F.00512BC3@us.ibm.com>


Roland Dreier <rdreier at cisco.com> wrote on 02/27/2007 02:35:34 PM:

> I don't think this applies any more since Sean's multicast stuff was
> merged.  I didn't realize you wanted to get this merged upstream --
> anyway, can you please regenerate the patch against the latest kernel?
>
> Thanks
Sure. I will generate a new patch.

Thanks
Shirley Ma

From halr at voltaire.com  Tue Feb 27 14:48:15 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 27 Feb 2007 17:48:15 -0500
Subject: [openib-general] [PATCH] osm: trivial data type change to
 remove compilation warning
In-Reply-To: <45E2C266.5000503@dev.mellanox.co.il>
References: <45E2C266.5000503@dev.mellanox.co.il>
Message-ID: <1172616493.31770.10684.camel@hal.voltaire.com>

On Mon, 2007-02-26 at 06:20, Yevgeny Kliteynik wrote: 
> Hi Hal
> 
> Trivial data type change to remove compilation warning.
> Please apply to the trunk and to the 1.2 branch.
> 
> Thanks.
> 
> Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>

Thanks. Applied (to both master and ofed_1_2).

-- Hal


From xma at us.ibm.com  Tue Feb 27 14:54:27 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 27 Feb 2007 14:54:27 -0800
Subject: [openib-general] IPOIB NAPI
In-Reply-To: <adafy8rte8n.fsf@cisco.com>
Message-ID: <OF0FE519D5.C031F5B5-ON8725728F.007D2FAA-8825728F.0051E800@us.ibm.com>


Roland Dreier <rdreier at cisco.com> wrote on 02/27/2007 02:41:44 PM:

>  > So the IBV_CQ_REPORT_MISSED_EVENTS has been part of OFED-1.2 already? I can
>  > generate the patch for all ULPs to use this for review. Do you need me to
>  > do that?
>
> No, it's not in OFED 1.2 or the upstream kernel.  And no one has
> implemented it for userspace (and I'm somewhat reluctant to break the
> ABI at this point without some performance numbers to motivate making
> this API change).
>
> Have the NAPI performance problems with ehca been resolved?  We could
> probably merge IPoIB NAPI for 2.6.22 then, which would pull in the
> kernel changes at least.
>
>  - R.

We have addressed the NAPI performance issues with the ehca driver. I believe
the patches have gone upstream. However, the test results show that it's
better to delay polling again until the next NAPI interval, something like this:

poll-cq
notify-cq, if missed_event && netif_rx_reschedule()
return 1

vs.
poll-cq,
notify-cq, if missed_event && netif_rx_reschedule()
poll again
return 0

It seems ehca delivers packets much faster than other HCAs, so the "poll
again" variant would stay in the loop for many iterations. Since the above
change doesn't impact other HCAs, I would recommend it. I have seen the same
approach in other Ethernet drivers.

Thanks
Shirley Ma

From swise at opengridcomputing.com  Tue Feb 27 15:05:37 2007
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 27 Feb 2007 17:05:37 -0600
Subject: [openib-general] cannot instal ofed-1.2 kernel rpm on 2.6.20.1
In-Reply-To: <1172616231.11870.142.camel@stevo-desktop>
References: <1172616231.11870.142.camel@stevo-desktop>
Message-ID: <1172617537.11870.143.camel@stevo-desktop>

I opened bug 399 to track this.

I also opened bug 398 because I got an error installing opensm with this
same OFED-1.2 build.


Steve.


On Tue, 2007-02-27 at 16:43 -0600, Steve Wise wrote:
> I built the ofed 1.2 rpms from the OFED-1.2-20070227-0602 build and the
> kernel rpm fails to install on a 2.6.20.1 kernel:
> 
> vic13:/usr/local/src/OFED-1.2-20070227-0602/RPMS/sles-release-10-15.2 # rpm -U kernel-ib-1.2-2.6.20.1.x86_64.rpm
> error: Failed dependencies:
>         ksym(schedule) = 1000e51 is needed by kernel-ib-1.2-2.6.20.1.x86_64
>         ksym(__up_wakeup) = 1042cbb5 is needed by kernel-ib-1.2-2.6.20.1.x86_64
>         ksym(pci_request_region) = 10cc2981 is needed by kernel-ib-1.2-2.6.20.1.x86_64
>         ksym(skb_dequeue) = 10fc721b is needed by kernel-ib-1.2-2.6.20.1.x86_64
>         ksym(mod_timer) = 14777d07 is needed by kernel-ib-1.2-2.6.20.1.x86_64
>         ksym(remap_pfn_range) = 155834a8 is needed by kernel-ib-1.2-2.6.20.1.x86_64
>         ksym(unregister_netevent_notifier) = 1598dc9d is needed by kernel-ib-1.2-2.6.20.1.x86_64
>         ksym(bad_dma_address) = 1675606f is needed by kernel-ib-1.2-2.6.20.1.x86_64
>         ksym(dev_get_by_name) = 16ab1a6b is needed by kernel-ib-1.2-2.6.20.1.x86_64
> 
> ...
> 
> <many more of these deleted>
> 
> Anybody seen this?
> 


From xma at us.ibm.com  Tue Feb 27 15:59:23 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 27 Feb 2007 16:59:23 -0700
Subject: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast
 join finish
In-Reply-To: <OF0CC9E20D.85FCD0B8-ON8725728F.007D087A-8825728F.00512BC3@us.ibm.com>
Message-ID: <OF33C71E03.98C43D68-ON8725728F.008355F5-8825728F.0057D9E3@us.ibm.com>


Hello Roland,

Here is the new patch against the 2.6.20-rc1 kernel. Please review it.

diff -urpN ipoib/ipoib_multicast.c ipoib-link/ipoib_multicast.c
--- ipoib/ipoib_multicast.c   2007-02-27 07:21:50.000000000 -0800
+++ ipoib-link/ipoib_multicast.c    2007-02-27 07:52:10.000000000 -0800
@@ -407,6 +407,11 @@ static int ipoib_mcast_join_complete(int
                  queue_delayed_work(ipoib_workqueue,
                                 &priv->mcast_task, 0);
            mutex_unlock(&mcast_mutex);
+           /*
+            * broadcast join finished, enable carrier
+            */
+           if (unlikely(mcast == priv->broadcast))
+                 netif_carrier_on(dev);
            return 0;
      }

@@ -596,7 +601,6 @@ void ipoib_mcast_join_task(struct work_s
      ipoib_dbg_mcast(priv, "successfully joined all multicast groups\n");

      clear_bit(IPOIB_MCAST_RUN, &priv->flags);
-     netif_carrier_on(dev);
 }

 int ipoib_mcast_start_thread(struct net_device *dev)

(See attached file: ipoib-link.patch)

Thanks
Shirley Ma
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ipoib-link.patch
Type: application/octet-stream
Size: 772 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070227/12450971/attachment.obj>

From bugzilla-daemon at lists.openfabrics.org  Tue Feb 27 21:00:29 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Tue, 27 Feb 2007 21:00:29 -0800 (PST)
Subject: [openib-general] [Bug 263] OFED 1.1 rc6: IPoIB Oops during IPoIB
	failover loop
In-Reply-To: <bug-263-1@https.bugs.openfabrics.org/>
Message-ID: <20070228050029.2EF4CE602D9@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=263


sweitzen at cisco.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED


------- Comment #14 from sweitzen at cisco.com  2007-02-27 21:00 -------
With OFED 1.2 alpha1, I was able to failover/failback an IB port every 10
seconds for 8 hours on RHEL4 x86_64 LionMini SDR and DDR.  Will keep testing on
other platforms.


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From mst at mellanox.co.il  Tue Feb 27 21:05:09 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 28 Feb 2007 07:05:09 +0200
Subject: [openib-general] IPOIB NAPI
In-Reply-To: <OF0FE519D5.C031F5B5-ON8725728F.007D2FAA-8825728F.0051E800@us.ibm.com>
References: <adafy8rte8n.fsf@cisco.com>
	<OF0FE519D5.C031F5B5-ON8725728F.007D2FAA-8825728F.0051E800@us.ibm.com>
Message-ID: <20070228050509.GB26317@mellanox.co.il>

> Quoting Shirley Ma <xma at us.ibm.com>:
> Subject: Re: [openib-general] IPOIB NAPI
> 
> Roland Dreier <rdreier at cisco.com> wrote on 02/27/2007 02:41:44 PM:
> 
> >  > So the IBV_CQ_REPORT_MISSED_EVENTS has been part of OFED-1.2 already? I can
> >  > generate the patch for all ULPs to use this for review. Do you need me to
> >  > do that?
> > 
> > No, it's not in OFED 1.2 or the upstream kernel.  And no one has
> > implemented it for userspace (and I'm somewhat reluctant to break the
> > ABI at this point without some performance numbers to motivate making
> > this API change).
> > 
> > Have the NAPI performance problems with ehca been resolved?  We could
> > probably merge IPoIB NAPI for 2.6.22 then, which would pull in the
> > kernel changes at least.
> > 
> >  - R.
> We have addressed the NAPI performance issues with the ehca driver. I believe
> the patches have gone upstream. However, the test results show that it's
> better to delay polling again until the next NAPI interval, something like
> this:
> 
> poll-cq
> notify-cq, if missed_event && netif_rx_reschedule()
> return 1
> 
> vs.
> poll-cq,
> notify-cq, if missed_event && netif_rx_reschedule()
> poll again
> return 0
> 
> It seems ehca delivers packets much faster than other HCAs, so the "poll
> again" variant would stay in the loop for many iterations. Since the above
> change doesn't impact other HCAs, I would recommend it. I have seen the same
> approach in other Ethernet drivers.

I'm confused. Which one is faster?

-- 
MST


From bugzilla-daemon at lists.openfabrics.org  Tue Feb 27 21:15:07 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Tue, 27 Feb 2007 21:15:07 -0800 (PST)
Subject: [openib-general] [Bug 400] New: OFED 1.2 alpha1 IPoIB HA failover
 gets QP warnings
Message-ID: <bug-400-1@https.bugs.openfabrics.org/>

https://bugs.openfabrics.org/show_bug.cgi?id=400

           Summary: OFED 1.2 alpha1 IPoIB HA failover gets QP warnings
           Product: OpenFabrics Linux
           Version: 1.2alpha1
          Platform: X86-64
        OS/Version: RHEL 4
            Status: NEW
          Severity: normal
          Priority: P3
         Component: IPoIB
        AssignedTo: bugzilla at openib.org
        ReportedBy: sweitzen at cisco.com


OFED 1.2 alpha1 on RHEL4 U4 x86_64, LionMini DDR HCA.

I have IPoIB HA configured, running traffic via netperf, and bringing up/down a
different host IB port every 10 seconds.

This has been working for several hours, but I see warnings in dmesg, more of
them on the server side.

Client dmesg:

ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib_mthca 0000:04:00.0: QP 000405 not found in MGM
ib1: ib_detach_mcast failed (result = -22)
ib1: ipoib_mcast_detach failed (result = -22)
ib_mthca 0000:04:00.0: QP 000404 not found in MGM
ib0: ib_detach_mcast failed (result = -22)
ib0: ipoib_mcast_detach failed (result = -22)
[root at svbu-qa-dl145-1 log]#

Server dmesg:

ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib_mthca 0000:04:00.0: QP 000405 not found in MGM
ib1: ib_detach_mcast failed (result = -22)
ib1: ipoib_mcast_detach failed (result = -22)
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib_mthca 0000:04:00.0: QP 000405 not found in MGM
ib1: ib_detach_mcast failed (result = -22)
ib1: ipoib_mcast_detach failed (result = -22)
ib_mthca 0000:04:00.0: QP 000405 not found in MGM
ib1: ib_detach_mcast failed (result = -22)
ib1: ipoib_mcast_detach failed (result = -22)
ib0: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib_mthca 0000:04:00.0: QP 000405 not found in MGM
ib1: ib_detach_mcast failed (result = -22)
ib1: ipoib_mcast_detach failed (result = -22)
ib0: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib_mthca 0000:04:00.0: QP 000405 not found in MGM
ib1: ib_detach_mcast failed (result = -22)
ib1: ipoib_mcast_detach failed (result = -22)
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib_mthca 0000:04:00.0: QP 000405 not found in MGM
ib1: ib_detach_mcast failed (result = -22)
ib1: ipoib_mcast_detach failed (result = -22)
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib_mthca 0000:04:00.0: QP 000405 not found in MGM
ib1: ib_detach_mcast failed (result = -22)
ib1: ipoib_mcast_detach failed (result = -22)
ib_mthca 0000:04:00.0: QP 000405 not found in MGM
ib1: ib_detach_mcast failed (result = -22)
ib1: ipoib_mcast_detach failed (result = -22)
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
[root at svbu-qa-dl145-2 log]#


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at lists.openfabrics.org  Tue Feb 27 21:18:07 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Tue, 27 Feb 2007 21:18:07 -0800 (PST)
Subject: [openib-general] [Bug 400] OFED 1.2 alpha1 IPoIB HA failover gets
	QP warnings
In-Reply-To: <bug-400-1@https.bugs.openfabrics.org/>
Message-ID: <20070228051807.9DD61E603C6@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=400


sweitzen at cisco.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|bugzilla at openib.org         |rolandd at cisco.com


------- Comment #1 from sweitzen at cisco.com  2007-02-27 21:18 -------
Roland, can you take a look at this, please?


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From xma at us.ibm.com  Tue Feb 27 22:06:35 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 27 Feb 2007 22:06:35 -0800
Subject: [OFA General] Re: [openib-general] IPOIB NAPI
In-Reply-To: <20070228050509.GB26317@mellanox.co.il>
Message-ID: <OF90EA3B8E.9F5F3E9C-ON87257290.00217A24-8825728F.00797824@us.ibm.com>


>I'm confused. Which one is faster?
Sorry for the confusion, Michael. The one with return 1 has better
throughput.

Thanks
Shirley Ma

From bugzilla-daemon at lists.openfabrics.org  Tue Feb 27 22:18:53 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Tue, 27 Feb 2007 22:18:53 -0800 (PST)
Subject: [OFA General] [Bug 371] IPoIB HA not working properly with
	OFED1.2-alpha
In-Reply-To: <bug-371-1@https.bugs.openfabrics.org/>
Message-ID: <20070228061853.784E3E60812@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=371


sweitzen at cisco.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sweitzen at cisco.com


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at lists.openfabrics.org  Tue Feb 27 23:08:55 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Tue, 27 Feb 2007 23:08:55 -0800 (PST)
Subject: [OFA General] [Bug 371] IPoIB HA not working properly with
	OFED1.2-alpha
In-Reply-To: <bug-371-1@https.bugs.openfabrics.org/>
Message-ID: <20070228070856.0A44EE60810@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=371


mst at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|bugzilla at openib.org         |vlad at mellanox.co.il


------- Comment #2 from mst at mellanox.co.il  2007-02-27 23:08 -------
Assigned to Vlad.


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From mplee at sandia.gov  Tue Feb 27 23:17:34 2007
From: mplee at sandia.gov (Lee, Michael Paichi)
Date: Wed, 28 Feb 2007 00:17:34 -0700
Subject: [OFA General] List Address Change Completed
Message-ID: <3D84A59A1AD3584DA02AEAD240E8863F0366949A@ES22SNLNT.srn.sandia.gov>

This list has been migrated to the new server, lists.openfabrics.org.  Please update any address book or filter settings to reflect the new mailing list address.  Future messages and replies should be sent to this address:

general at lists.openfabrics.org

The new web address for this list is:

http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

If you have any questions, please contact me at mplee at sandia.gov	

Regards,
Michael

From mst at mellanox.co.il  Tue Feb 27 23:17:06 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 28 Feb 2007 09:17:06 +0200
Subject: [OFA General] Re: IPOIB NAPI
In-Reply-To: <OF0FE519D5.C031F5B5-ON8725728F.007D2FAA-8825728F.0051E800@us.ibm.com>
References: <adafy8rte8n.fsf@cisco.com>
	<OF0FE519D5.C031F5B5-ON8725728F.007D2FAA-8825728F.0051E800@us.ibm.com>
Message-ID: <20070228071706.GA22246@mellanox.co.il>

> Quoting Shirley Ma <xma at us.ibm.com>:
> Subject: Re: IPOIB NAPI
> 
> Roland Dreier <rdreier at cisco.com> wrote on 02/27/2007 02:41:44 PM:
> 
> >  > So the IBV_CQ_REPORT_MISSED_EVENTS has been part of OFED-1.2 already? I can
> >  > generate the patch for all ULPs to use this for review. Do you need me to
> >  > do that?
> >
> > No, it's not in OFED 1.2 or the upstream kernel.  And no one has
> > implemented it for userspace (and I'm somewhat reluctant to break the
> > ABI at this point without some performance numbers to motivate making
> > this API change).
> >
> > Have the NAPI performance problems with ehca been resolved?  We could
> > probably merge IPoIB NAPI for 2.6.22 then, which would pull in the
> > kernel changes at least.
> >
> >  - R.
> We have addressed the NAPI performance issues with the ehca driver. I believe
> the patches have gone upstream. However, the test results show that it's
> better to delay polling again until the next NAPI interval, something like
> this:
> 
> poll-cq
> notify-cq, if missed_event && netif_rx_reschedule()
> return 1
> 
> vs.
> poll-cq,
> notify-cq, if missed_event && netif_rx_reschedule()
> poll again
> return 0
> 
> It seems ehca delivers packets much faster than other HCAs, so the "poll
> again" variant would stay in the loop for many iterations. Since the above
> change doesn't impact other HCAs, I would recommend it. I have seen the same
> approach in other Ethernet drivers.

I have not benchmarked this, but the "return 1" version actually makes sense
to me too: since a new completion was observed after notify-cq, the HCA is
likely writing new completions into the CQ at a high rate right now, so it
makes sense to delay polling by a few cycles and reduce the number of
interrupts that way.

Right?
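
To make the comparison concrete, here is a toy model of the two variants
quoted above. It is only a sketch under stated assumptions: the toy_cq struct
and all function names are hypothetical, the "completions arrive while we
poll" behavior is simulated, and the NAPI budget and the
netif_rx_reschedule() call from the pseudocode are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a completion queue; not a real verbs CQ. */
struct toy_cq {
	int pending;         /* completions waiting to be polled */
	int arrive_per_poll; /* completions that land during one poll pass */
	int remaining;       /* completions that have not arrived yet */
};

/* Drain what is pending, then simulate new arrivals racing with us. */
static int toy_poll_cq(struct toy_cq *cq)
{
	int done = cq->pending;
	int n = cq->arrive_per_poll < cq->remaining ?
		cq->arrive_per_poll : cq->remaining;

	cq->pending = n;
	cq->remaining -= n;
	return done;
}

/* Re-arm notification; true means completions slipped in (missed event). */
static bool toy_notify_cq_missed(const struct toy_cq *cq)
{
	return cq->pending > 0;
}

/* "return 1" variant: one pass, then ask to be polled again later. */
int poll_return_1(struct toy_cq *cq, int *processed)
{
	assert(cq && processed);
	*processed += toy_poll_cq(cq);
	return toy_notify_cq_missed(cq) ? 1 : 0;
}

/* "poll again" variant: keep looping inside one call until nothing is missed. */
int poll_again(struct toy_cq *cq, int *processed, int *loops)
{
	assert(cq && processed && loops);
	do {
		*processed += toy_poll_cq(cq);
		(*loops)++;
	} while (toy_notify_cq_missed(cq));
	return 0;
}
```

In this model both variants process the same completions; the difference is
that "poll again" does all of its passes inside a single call when the HCA
keeps refilling the CQ, while "return 1" hands control back after every pass.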

-- 
MST


From mst at mellanox.co.il  Tue Feb 27 23:23:41 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 28 Feb 2007 09:23:41 +0200
Subject: [OFA General] List Address Change Completed
In-Reply-To: <3D84A59A1AD3584DA02AEAD240E8863F0366949A@ES22SNLNT.srn.sandia.gov>
References: <3D84A59A1AD3584DA02AEAD240E8863F0366949A@ES22SNLNT.srn.sandia.gov>
Message-ID: <20070228072341.GB22246@mellanox.co.il>

> Quoting Lee, Michael Paichi <mplee at sandia.gov>:
> Subject: [OFA General] List Address Change Completed
> 
> This list has been migrated to the new server, lists.openfabrics.org.  Please update any address book or filter settings to reflect the new mailing list address.  Future messages and replies should be sent to this address:
> 
> general at lists.openfabrics.org
> 
> The new web address for this list is:
> 
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> 
> If you have any questions, please contact me at mplee at sandia.gov       

Can the subject prefix be made all lower-case, with dash, please?
OFA General -> ofa-general?

Upper-case words look like shouting to me, and e.g. Exchange rules
are limited in coping with spaces.

-- 
MST


From vlad at mellanox.co.il  Tue Feb 27 23:28:25 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Wed, 28 Feb 2007 09:28:25 +0200
Subject: [OFA General] Re: [PATCH  0/6] ofed_1_2: cxgb3 bug fixes
In-Reply-To: <20070227155953.21615.96154.stgit@dell3.ogc.int>
References: <20070227155953.21615.96154.stgit@dell3.ogc.int>
Message-ID: <1172647705.21382.101.camel@vladsk-laptop>

On Tue, 2007-02-27 at 09:59 -0600, Steve Wise wrote:
> Hey Vlad,
> 
> These fixes need to be pulled into ofed_1_2 for the Chelsio Ethernet
> driver.
> 
> You can pull them directly from my ofa git tree:
> 
> git://staging.openfabrics.org/~swise/ofed_1_2 cxgb3_fixes
> 
> Thanks,
> 
> Steve.

Applied.

-- 
Vladimir Sokolovsky <vlad at mellanox.co.il>
Mellanox Technologies Ltd.


From mplee at sandia.gov  Tue Feb 27 23:32:10 2007
From: mplee at sandia.gov (Lee, Michael Paichi)
Date: Wed, 28 Feb 2007 00:32:10 -0700
Subject: [ofa-general] RE: [OFA General] List Address Change Completed
References: <3D84A59A1AD3584DA02AEAD240E8863F0366949A@ES22SNLNT.srn.sandia.gov>
	<20070228072341.GB22246@mellanox.co.il>
Message-ID: <3D84A59A1AD3584DA02AEAD240E8863F0366949B@ES22SNLNT.srn.sandia.gov>

Done


-----Original Message-----
From: Michael S. Tsirkin [mailto:mst at mellanox.co.il]
Sent: Tue 2/27/2007 11:23 PM
To: Lee, Michael Paichi
Cc: general at lists.openfabrics.org; openib-general at openib.org
Subject: Re: [OFA General] List Address Change Completed
 
> Quoting Lee, Michael Paichi <mplee at sandia.gov>:
> Subject: [OFA General] List Address Change Completed
> 
> This list has been migrated to the new server, lists.openfabrics.org.  Please update any address book or filter settings to reflect the new mailing list address.  Future messages and replies should be sent to this address:
> 
> general at lists.openfabrics.org
> 
> The new web address for this list is:
> 
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> 
> If you have any questions, please contact me at mplee at sandia.gov       

Can the subject prefix be made all lower-case, with dash, please?
OFA General -> ofa-general?

Upper-case words look like shouting to me, and e.g. Exchange rules
are limited in coping with spaces.

-- 
MST


From kliteyn at dev.mellanox.co.il  Wed Feb 28 01:07:31 2007
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Wed, 28 Feb 2007 11:07:31 +0200
Subject: [ofa-general] [PATCH] osm: Trivial changes for compilation on
	windows
Message-ID: <45E54653.6010300@dev.mellanox.co.il>

Hi Hal.

This patch makes trivial data type changes and redefines a macro so that the
code compiles on Windows.


BTW, Sasha, do we still need this macro (NOISE_L in osm_ucast_updn.c)?

Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
---
 osm/include/opensm/osm_switch.h |    4 ++--
 osm/opensm/osm_ucast_mgr.c      |    2 +-
 osm/opensm/osm_ucast_updn.c     |    6 +++++-
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/osm/include/opensm/osm_switch.h b/osm/include/opensm/osm_switch.h
index 1b3c35d..36c531c 100644
--- a/osm/include/opensm/osm_switch.h
+++ b/osm/include/opensm/osm_switch.h
@@ -105,8 +105,8 @@ typedef struct _osm_switch
 	osm_node_t				*p_node;
 	ib_switch_info_t			switch_info;
 	uint16_t				max_lid_ho;
-	unsigned				num_ports;
-	unsigned				num_hops;
+	uint8_t					num_ports;
+	uint16_t				num_hops;
 	uint8_t					**hops;
 	osm_port_profile_t			*p_prof;
 	osm_fwd_tbl_t				fwd_tbl;
diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c
index 473baa6..2ab1c3b 100644
--- a/osm/opensm/osm_ucast_mgr.c
+++ b/osm/opensm/osm_ucast_mgr.c
@@ -411,7 +411,7 @@ __osm_ucast_mgr_setup_switch(
   IN cl_map_item_t* const  p_map_item,
   IN void* cxt )
 {
-  uint16_t lids = cl_ptr_vector_get_size(&((osm_subn_t *)cxt)->port_lid_tbl);
+  uint16_t lids = (uint16_t)cl_ptr_vector_get_size(&((osm_subn_t *)cxt)->port_lid_tbl);
 
   osm_switch_prepare_path_rebuild((osm_switch_t *)p_map_item,
                                   lids ? lids - 1 : 0);
diff --git a/osm/opensm/osm_ucast_updn.c b/osm/opensm/osm_ucast_updn.c
index 93f54f4..679346e 100644
--- a/osm/opensm/osm_ucast_updn.c
+++ b/osm/opensm/osm_ucast_updn.c
@@ -97,7 +97,11 @@ struct updn_node {
   unsigned visited;
 };
 
+#ifndef WIN32
 #define NOISE_L(log, fmt, arg...)
+#else
+#define NOISE_L
+#endif
 
 /* ///////////////////////////////// */
 /*  Statics                          */
@@ -294,7 +298,7 @@ __updn_bfs_by_node(
                "move from 0x%016" PRIx64 " rank: %u "
                "to 0x%016" PRIx64" rank: %u\n",
                cl_ntoh64(current_guid), u->rank,
-               cl_ntoh64(remote_guid), rem->rank );
+               cl_ntoh64(remote_guid), rem_u->rank );
       /* Check if this is a legal step : the only illegal step is going
          from DOWN to UP */
       if ((current_dir == DOWN) && (next_dir == UP))
-- 
1.4.4.1.GIT
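
A note on the two NOISE_L forms in the patch above, as a minimal sketch in
plain C. The names NOISE_VARIADIC, NOISE_EMPTY, side_effect() and the helper
functions are hypothetical and not part of OpenSM. The point is only that an
object-like empty macro removes the macro name but leaves the parenthesized
argument list in the code as a comma expression, so the arguments are still
compiled and evaluated, whereas a function-like variadic no-op discards them.
This may also explain the rem->rank to rem_u->rank fix in the same patch,
though that is a guess.

```c
#include <assert.h>

/* C99 variadic form: the invocation and all its arguments vanish. */
#define NOISE_VARIADIC(...) ((void)0)

/* Object-like form, as in the WIN32 branch: only the name is removed,
 * so NOISE_EMPTY (a, b) is left as the comma expression (a, b). */
#define NOISE_EMPTY

static int calls;

static int side_effect(void)
{
	return ++calls;
}

/* Returns how many arguments were evaluated under the variadic form. */
static int evaluations_variadic(void)
{
	calls = 0;
	NOISE_VARIADIC(side_effect(), side_effect());
	return calls;
}

/* Returns how many arguments were evaluated under the object-like form. */
static int evaluations_empty(void)
{
	calls = 0;
	NOISE_EMPTY (side_effect(), side_effect());
	return calls;
}

/* The variadic form evaluates nothing; the object-like form evaluates both. */
static void self_check(void)
{
	assert(evaluations_variadic() == 0);
	assert(evaluations_empty() == 2);
}
```

On compilers without variadic macro support the object-like form is a
reasonable workaround, as long as the arguments are valid expressions and
cheap to evaluate.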


From tziporet at mellanox.co.il  Wed Feb 28 02:01:32 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Wed, 28 Feb 2007 12:01:32 +0200
Subject: [ofa-general] IPoIB caused a kernel: BUG: soft lockup detected on
	CPU#0!
Message-ID: <45E552FC.4040305@mellanox.co.il>

Hi Roland,

When running stress tests over IPoIB CM a kernel bug occurred (with 
kernel 2.6.20):

Feb 27 17:47:52 sw169 kernel: BUG: soft lockup detected on CPU#0!
Feb 27 17:47:52 sw169 kernel:
Feb 27 17:47:52 sw169 kernel: Call Trace:
Feb 27 17:47:52 sw169 kernel:  <IRQ>  [<ffffffff80252eef>] softlockup_tick+0xd2/0xe4
Feb 27 17:47:52 sw169 kernel:  [<ffffffff802399a4>] update_process_times+0x42/0x68
Feb 27 17:47:52 sw169 kernel:  [<ffffffff80218260>] smp_local_timer_interrupt+0x31/0x52
Feb 27 17:47:52 sw169 kernel:  [<ffffffff802182d0>] smp_apic_timer_interrupt+0x4f/0x66
Feb 27 17:47:52 sw169 kernel:  [<ffffffff8020a166>] apic_timer_interrupt+0x66/0x70
Feb 27 17:47:52 sw169 kernel:  [<ffffffff8053aaf1>] _spin_lock_irqsave+0x15/0x24
Feb 27 17:47:52 sw169 kernel:  [<ffffffff88067a23>] :ib_ipoib:ipoib_neigh_destructor+0xc2/0x139
Feb 27 17:47:52 sw169 kernel:  [<ffffffff804c216b>] neigh_destroy+0xc2/0x10e
Feb 27 17:47:52 sw169 kernel:  [<ffffffff804c198c>] dst_destroy+0x5f/0xd6
Feb 27 17:47:52 sw169 kernel:  [<ffffffff804c1a6f>] dst_run_gc+0x6c/0x12a
Feb 27 17:47:52 sw169 kernel:  [<ffffffff804c1a03>] dst_run_gc+0x0/0x12a
Feb 27 17:47:52 sw169 kernel:  [<ffffffff802398fe>] run_timer_softirq+0x14f/0x1a0
Feb 27 17:47:52 sw169 kernel:  [<ffffffff80235d89>] __do_softirq+0x50/0xbb
Feb 27 17:47:52 sw169 kernel:  [<ffffffff8020a6bc>] call_softirq+0x1c/0x28
Feb 27 17:47:52 sw169 kernel:  [<ffffffff8020bb0f>] do_softirq+0x2e/0x97
Feb 27 17:47:52 sw169 kernel:  [<ffffffff802182d5>] smp_apic_timer_interrupt+0x54/0x66
Feb 27 17:47:52 sw169 kernel:  [<ffffffff80207e66>] mwait_idle+0x0/0x42
Feb 27 17:47:52 sw169 kernel:  [<ffffffff8020a166>] apic_timer_interrupt+0x66/0x70
Feb 27 17:47:52 sw169 kernel:  <EOI>  [<ffffffff80207ea5>] mwait_idle+0x3f/0x42
Feb 27 17:47:52 sw169 kernel:  [<ffffffff80207e01>] cpu_idle+0x8b/0xae
Feb 27 17:47:52 sw169 kernel:  [<ffffffff80744709>] start_kernel+0x212/0x214
Feb 27 17:47:52 sw169 kernel:  [<ffffffff80744175>] _sinittext+0x175/0x179


To reproduce:
You need two machines connected back-to-back (A and B), with opensm installed
on machine B.

On machine A, run:
ping B (its ib0 address)

On machine B:
Copy the scripts from http://www.openfabrics.org/~tziporet/ipoib_scripts/ to a
local directory and edit them to include the correct ib0 IP address of machine A.

Run: runscripts.sh

Tziporet


From rf at q-leap.de  Wed Feb 28 02:20:16 2007
From: rf at q-leap.de (Roland Fehrenbacher)
Date: Wed, 28 Feb 2007 11:20:16 +0100
Subject: [ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2
Message-ID: <17893.22368.748298.755523@gargle.gargle.HOWL>

Hi,

I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED 1.1, and saw
some unpleasant performance drops when using OFED 1.1 (kernel 2.6.20.1
with included IB drivers). The main drop is in throughput as measured
by the OSU MPI bandwidth benchmark. However, the latency for large
packet sizes is also worse (see results below). I tried with and
without "options ib_mthca msi_x=1" (using IBGD, disabling msi_x makes
a significant performance difference of approx. 10%). The IB card is a
Mellanox MHGS18-XT (PCIe/DDR Firmware 1.2.0) running on an Opteron
with nForce4 2200 Professional chipset.

Does anybody have an explanation or even better a solution to this
issue?

Thanks,

Roland

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: osu_bench.result
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070228/d91569cb/attachment.ksh>

From mst at mellanox.co.il  Wed Feb 28 02:31:31 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 28 Feb 2007 12:31:31 +0200
Subject: [ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2
In-Reply-To: <17893.22368.748298.755523@gargle.gargle.HOWL>
References: <17893.22368.748298.755523@gargle.gargle.HOWL>
Message-ID: <20070228103131.GC28054@mellanox.co.il>

> Quoting Roland Fehrenbacher <rf at q-leap.de>:
> Subject: [ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2
> 
> Content-Description: message body text
> Hi,
> 
> I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED 1.1, and saw
> some unpleasant performance drops when using OFED 1.1 (kernel 2.6.20.1
> with included IB drivers). The main drop is in throughput as measured
> by the OSU MPI bandwidth benchmark. However, the latency for large
> packet sizes is also worse (see results below). I tried with and
> without "options ib_mthca msi_x=1" (using IBGD, disabling msi_x makes
> a significant performance difference of approx. 10%). The IB card is a
> Mellanox MHGS18-XT (PCIe/DDR Firmware 1.2.0) running on an Opteron
> with nForce4 2200 Professional chipset.
> 
> Does anybody have an explanation or even better a solution to this
> issue?

Could be a BIOS bug. Try setting the ib_mthca module parameter tune_pci=1. If
this helps, contact your BIOS vendor. Here's an explanation of what this
parameter does:

http://www.mail-archive.com/openib-general@openib.org/msg25305.html

-- 
MST


From rf at q-leap.de  Wed Feb 28 03:00:02 2007
From: rf at q-leap.de (Roland Fehrenbacher)
Date: Wed, 28 Feb 2007 12:00:02 +0100
Subject: [ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2
In-Reply-To: <20070228103131.GC28054@mellanox.co.il>
References: <17893.22368.748298.755523@gargle.gargle.HOWL>
	<20070228103131.GC28054@mellanox.co.il>
Message-ID: <17893.24754.773054.426451@gargle.gargle.HOWL>

>>>>> "MST" == Michael S Tsirkin <mst at mellanox.co.il> writes:

    >> Quoting Roland Fehrenbacher <rf at q-leap.de>: Subject:
    >> [ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2
    >> 
    >> Hi,
    >> 
    >> I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED 1.1,
    >> and saw some unpleasant performance drops when using OFED 1.1
    >> (kernel 2.6.20.1 with included IB drivers). The main drop is in
    >> throughput as measured by the OSU MPI bandwidth
    >> benchmark. However, the latency for large packet sizes is also
    >> worse (see results below). I tried with and without "options
    >> ib_mthca msi_x=1" (using IBGD, disabling msi_x makes a
    >> significant performance difference of approx. 10%). The IB card
    >> is a Mellanox MHGS18-XT (PCIe/DDR Firmware 1.2.0) running on an
    >> Opteron with nForce4 2200 Professional chipset.
    >> 
    >> Does anybody have an explanation or even better a solution to
    >> this issue?

    MST> Could be a BIOS bug. Try setting tune_pci=1. If this helps,
    MST> contact your BIOS vendor: here's an explanation about what
    MST> this parameter does:

    MST> http://www.mail-archive.com/openib-general at openib.org/msg25305.html

I tried this with no effect. Just to make sure the settings are in
effect, is there a way I can check this after booting?

Roland


From vlad at lists.openfabrics.org  Wed Feb 28 02:59:18 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Wed, 28 Feb 2007 02:59:18 -0800 (PST)
Subject: [ofa-general] ofa_1_2_kernel 20070228-0200 daily build status
Message-ID: <20070228105918.D9287E603C6@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.15
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.13
Passed on powerpc with linux-2.6.17
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.17
Passed on x86_64 with linux-2.6.15
Passed on ia64 with linux-2.6.16
Passed on x86_64 with linux-2.6.12
Passed on ppc64 with linux-2.6.12
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.19
Passed on powerpc with linux-2.6.12
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.18
Passed on powerpc with linux-2.6.16
Passed on x86_64 with linux-2.6.17
Passed on powerpc with linux-2.6.13
Passed on ia64 with linux-2.6.15
Passed on powerpc with linux-2.6.15
Passed on ppc64 with linux-2.6.18
Passed on ia64 with linux-2.6.14
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.14
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.13
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.17
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.18-1.2798.fc6

Failed:
Build failed on powerpc with linux-2.6.19
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: error: implicit declaration of function ‘vmalloc’
/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1404: warning: assignment makes pointer from integer without a cast
/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.c:1440: error: implicit declaration of function ‘vfree’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_powerpc_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_powerpc_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_powerpc_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_powerpc_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/powerpc/linux-2.6.19'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on ia64 with linux-2.6.19
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’
/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’:
/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_ia64_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.19'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on ppc64 with linux-2.6.19
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘__be64’
/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’:
/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_ppc64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_ppc64_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_ppc64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.19_ppc64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ppc64/linux-2.6.19'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.5-7.244-smp
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c: In function 'rds_exit':
/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c:468: error: implicit declaration of function 'proto_unregister'
/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c: In function 'rds_init':
/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c:517: error: implicit declaration of function 'proto_register'
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.o] Error 1
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.5-7.244-smp_x86_64_check/net/rds] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.5-7.244-smp_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.5-7.244-smp'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.16.21-0.8-smp
Log:
In file included from /home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.16.21-0.8-smp_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.c:33:
include/linux/parser.h:34: error: expected declaration specifiers or ‘...’ before ‘u64’
include/linux/parser.h:35: error: expected declaration specifiers or ‘...’ before ‘s64’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.16.21-0.8-smp_x86_64_check/drivers/infiniband/ulp/vnic/vnic_sys.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.16.21-0.8-smp_x86_64_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.16.21-0.8-smp_x86_64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.16.21-0.8-smp_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.16.21-0.8-smp'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on ia64 with linux-2.6.16.21-0.8-default
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1736: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’
/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c: In function ‘control_log_data_path_pkt’:
/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.c:1751: warning: format ‘%llx’ expects type ‘long long unsigned int’, but argument 2 has type ‘u64’
make[4]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/ulp/vnic/vnic_control.o] Error 1
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband/ulp/vnic] Error 2
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.16.21-0.8-default_ia64_check/drivers/infiniband] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.16.21-0.8-default_ia64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/ia64/linux-2.6.16.21-0.8-default'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.9-22.ELsmp
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: ‘ADVERTISE_PAUSE_CAP’ undeclared (first use in this function)
/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: (Each undeclared identifier is reported only once
/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: for each function it appears in.)
/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:170: error: ‘ADVERTISE_PAUSE_ASYM’ undeclared (first use in this function)
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.o] Error 1
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.9-22.ELsmp_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-22.ELsmp'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.9-34.ELsmp
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘add_adapter’:
/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1061: error: ‘adapter_list_lock’ undeclared (first use in this function)
/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function ‘remove_adapter’:
/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1068: error: ‘adapter_list_lock’ undeclared (first use in this function)
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.o] Error 1
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070228-0200_linux-2.6.9-34.ELsmp_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-34.ELsmp'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
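
Several of the logs above warn that ‘%llx’ is being passed a u64 or __be64
argument. The usual way to keep such a format string portable across
architectures where the 64-bit type is either unsigned long or unsigned long
long is an explicit cast. The following is only a user-space sketch under
that assumption: uint64_t stands in for the kernel's u64, and
format_u64_hex() is a hypothetical helper, not a function from the vnic
driver. Whether these warnings are what actually fails the build is not
visible in the logs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit value as hex. The cast makes the argument match %llx
 * whether uint64_t is unsigned long or unsigned long long. */
static int format_u64_hex(char *buf, size_t len, uint64_t v)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%llx", (unsigned long long)v);
}
```

For a big-endian typed value such as __be64, kernel code would normally
convert to CPU byte order first and then apply the same cast.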


From mst at mellanox.co.il  Wed Feb 28 03:50:47 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 28 Feb 2007 13:50:47 +0200
Subject: [ofa-general] Re: Performance penalty of OFED 1.1 versus IBGD 1.8.2
In-Reply-To: <17893.24754.773054.426451@gargle.gargle.HOWL>
References: <17893.22368.748298.755523@gargle.gargle.HOWL>
	<20070228103131.GC28054@mellanox.co.il>
	<17893.24754.773054.426451@gargle.gargle.HOWL>
Message-ID: <20070228115047.GE28054@mellanox.co.il>

> Quoting Roland Fehrenbacher <rf at q-leap.de>:
> Subject: Re: Performance penalty of OFED 1.1 versus IBGD 1.8.2
> 
> >>>>> "MST" == Michael S Tsirkin <mst at mellanox.co.il> writes:
> 
>     >> Quoting Roland Fehrenbacher <rf at q-leap.de>: Subject:
>     >> [ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2
>     >> 
>     >> Hi,
>     >> 
>     >> I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED 1.1,
>     >> and saw some unpleasant performance drops when using OFED 1.1
>     >> (kernel 2.6.20.1 with included IB drivers). The main drop is in
>     >> throughput as measured by the OSU MPI bandwidth
>     >> benchmark. However, the latency for large packet sizes is also
>     >> worse (see results below). I tried with and without "options
>     >> ib_mthca msi_x=1" (using IBGD, disabling msi_x makes a
>     >> siginficant performance difference of approx. 10%). The IB card
>     >> is a Mellanox MHGS18-XT (PCIe/DDR Firmware 1.2.0) running on an
>     >> Opteron with nForce4 2200 Professional chipset.
>     >> 
>     >> Does anybody have an explanation or even better a solution to
>     >> this issue?
> 
>     MST> Could be a BIOS bug. Try setting tune_pci=1. If this helps,
>     MST> contact your BIOS vendor: here's an explanation about what
>     MST> this parameter does:
> 
>     MST> http://www.mail-archive.com/openib-general at openib.org/msg25305.html
> 
> I tried this with no effect. Just to make sure the settings are in
> effect, is there a way I can check this after booting?

cat /sys/module/ib_mthca/parameters/tune_pci
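
If the check needs to be done from a program rather than a shell, a minimal
sketch in C follows. It assumes the standard sysfs layout
/sys/module/<module>/parameters/<parameter> and that the parameter is
exported as readable; read_param() is a hypothetical helper name, not part
of any library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read the first line of a parameter file into buf, without the newline.
 * Returns 0 on success, -1 if the file cannot be opened or read. */
static int read_param(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(buf != NULL && len > 0);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0'; /* strip the trailing newline */
	return 0;
}
```

Calling read_param("/sys/module/ib_mthca/parameters/tune_pci", buf, sizeof buf)
would then leave the current value in buf, or return -1 if the module is not
loaded or does not export the parameter.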


-- 
MST


From mst at mellanox.co.il  Wed Feb 28 03:58:46 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 28 Feb 2007 13:58:46 +0200
Subject: [ofa-general] [PATCH] vnic: include linux/vmalloc.h explicitly
Message-ID: <20070228115846.GF28054@mellanox.co.il>

Some VNIC files use vmalloc() and vfree(). These should include
linux/vmalloc.h explicitly.

Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>
    
---

This has been applied to OFED git. It fixes the build on 2.6.19, and I think
it's a good idea generally.

diff --git a/drivers/infiniband/ulp/vnic/vnic_control.c b/drivers/infiniband/ulp/vnic/vnic_control.c
index 2c55540..a199380 100644
--- a/drivers/infiniband/ulp/vnic/vnic_control.c
+++ b/drivers/infiniband/ulp/vnic/vnic_control.c
@@ -32,6 +32,7 @@
 
 #include <linux/netdevice.h>
 #include <linux/list.h>
+#include <linux/vmalloc.h>
 
 #include "vnic_util.h"
 #include "vnic_main.h"
diff --git a/drivers/infiniband/ulp/vnic/vnic_data.c b/drivers/infiniband/ulp/vnic/vnic_data.c
index c1d056a..33fa914 100644
--- a/drivers/infiniband/ulp/vnic/vnic_data.c
+++ b/drivers/infiniband/ulp/vnic/vnic_data.c
@@ -33,6 +33,7 @@
 #include <net/inet_sock.h>
 #include <linux/ip.h>
 #include <linux/if_ether.h>
+#include <linux/vmalloc.h>
 
 #include "vnic_util.h"
 #include "vnic_viport.h"

-- 
MST


From rf at q-leap.de  Wed Feb 28 04:23:29 2007
From: rf at q-leap.de (Roland Fehrenbacher)
Date: Wed, 28 Feb 2007 13:23:29 +0100
Subject: [ofa-general] Re: Performance penalty of OFED 1.1 versus IBGD 1.8.2
In-Reply-To: <20070228115047.GE28054@mellanox.co.il>
References: <17893.22368.748298.755523@gargle.gargle.HOWL>
	<20070228103131.GC28054@mellanox.co.il>
	<17893.24754.773054.426451@gargle.gargle.HOWL>
	<20070228115047.GE28054@mellanox.co.il>
Message-ID: <17893.29761.695854.496211@gargle.gargle.HOWL>

>>>>> "MST" == Michael S Tsirkin <mst at mellanox.co.il> writes:

    >> Quoting Roland Fehrenbacher <rf at q-leap.de>: Subject: Re:
    >> Performance penalty of OFED 1.1 versus IBGD 1.8.2

    >> I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED 1.1,
    >> and saw some unpleasant performance drops when using OFED 1.1
    >> (kernel 2.6.20.1 with included IB drivers). The main drop is in
    >> throughput as measured by the OSU MPI bandwidth
    >> benchmark. However, the latency for large packet sizes is also
    >> worse (see results below). I tried with and without "options
    >> ib_mthca msi_x=1" (using IBGD, disabling msi_x makes a
    >> significant performance difference of approx. 10%). The IB card
    >> is a Mellanox MHGS18-XT (PCIe/DDR Firmware 1.2.0) running on an
    >> Opteron with nForce4 2200 Professional chipset.
    >>
    >> Does anybody have an explanation or even better a solution
    >> to this issue?

    MST> Could be a BIOS bug. Try setting tune_pci=1. If this helps,
    MST> contact your BIOS vendor: here's an explanation about what
    MST> this parameter does:

    MST> http://www.mail-archive.com/openib-general at openib.org/msg25305.html

    >> I tried this with no effect. Just to make sure the settings are
    >> in effect, is there a way I can check this after booting?

    MST> cat /sys/module/ib_mthca/parameters/tune_pci

Ok, the settings are active, but have zero effect. Anything else I
could check?

Roland
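
For reference, the same check can be done from a program. The following is only a minimal sketch: it assumes the ib_mthca module is loaded and exposes tune_pci under /sys/module/ib_mthca/parameters/ (note the singular "module"), and the helper name read_param is hypothetical, not part of any OFED tool.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of a sysfs-style parameter file into buf and
 * strip the trailing newline. Returns 0 on success, -1 on failure.
 * read_param is a hypothetical helper name used for illustration.
 */
static int read_param(const char *path, char *buf, size_t len)
{
    FILE *f;

    if (len == 0)
        return -1;

    f = fopen(path, "r");
    if (!f)
        return -1;              /* module not loaded, or no such parameter */

    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

Calling read_param("/sys/module/ib_mthca/parameters/tune_pci", buf, sizeof buf) and printing buf would show the value the driver is actually using. A return of -1 suggests the parameter file is not there at all.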


From mst at mellanox.co.il  Wed Feb 28 04:25:35 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 28 Feb 2007 14:25:35 +0200
Subject: [ofa-general] Re: Performance penalty of OFED 1.1 versus IBGD 1.8.2
In-Reply-To: <17893.29761.695854.496211@gargle.gargle.HOWL>
References: <17893.22368.748298.755523@gargle.gargle.HOWL>
	<20070228103131.GC28054@mellanox.co.il>
	<17893.24754.773054.426451@gargle.gargle.HOWL>
	<20070228115047.GE28054@mellanox.co.il>
	<17893.29761.695854.496211@gargle.gargle.HOWL>
Message-ID: <20070228122535.GA3576@mellanox.co.il>

> Quoting Roland Fehrenbacher <rf at q-leap.de>:
> Subject: Re: Performance penalty of OFED 1.1 versus IBGD 1.8.2
> 
> >>>>> "MST" == Michael S Tsirkin <mst at mellanox.co.il> writes:
> 
>     >> Quoting Roland Fehrenbacher <rf at q-leap.de>: Subject: Re:
>     >> Performance penalty of OFED 1.1 versus IBGD 1.8.2
> 
>     >> I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED 1.1,
>     >> and saw some unpleasant performance drops when using OFED 1.1
>     >> (kernel 2.6.20.1 with included IB drivers). The main drop is in
>     >> throughput as measured by the OSU MPI bandwidth
>     >> benchmark. However, the latency for large packet sizes is also
>     >> worse (see results below). I tried with and without "options
>     >> ib_mthca msi_x=1" (using IBGD, disabling msi_x makes a
>     >> significant performance difference of approx. 10%). The IB card
>     >> is a Mellanox MHGS18-XT (PCIe/DDR Firmware 1.2.0) running on an
>     >> Opteron with nForce4 2200 Professional chipset.
>     >>
>     >> Does anybody have an explanation or even better a solution
>     >> to this issue?
> 
>     MST> Could be a BIOS bug. Try setting tune_pci=1. If this helps,
>     MST> contact your BIOS vendor: here's an explanation about what
>     MST> this parameter does:
> 
>     MST> http://www.mail-archive.com/openib-general at openib.org/msg25305.html
> 
>     >> I tried this with no effect. Just to make sure the settings are
>     >> in effect, is there a way I can check this after booting?
> 
>     MST> cat /sys/module/ib_mthca/parameters/tune_pci
> 
> Ok, the settings are active, but have zero effect. Anything else I
> could check?

No idea. Could be an MPI issue?

-- 
MST


From halr at voltaire.com  Wed Feb 28 04:25:57 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 28 Feb 2007 07:25:57 -0500
Subject: [ofa-general] Re: [PATCH] opensm: remove osm_matrix.* files
In-Reply-To: <20070225221943.GG11957@sashak.voltaire.com>
References: <20070225214845.GF11957@sashak.voltaire.com>
	<20070225221943.GG11957@sashak.voltaire.com>
Message-ID: <1172665541.31770.60792.camel@hal.voltaire.com>

On Sun, 2007-02-25 at 17:19, Sasha Khapyorsky wrote:
> Following previously submitted min hops reimplementation this removes
> unused osm_matrix.* files.
> 
> Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>

Thanks. Applied (to both master and ofed_1_2).

-- Hal


From hnguyen at linux.vnet.ibm.com  Wed Feb 28 04:50:03 2007
From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen)
Date: Wed, 28 Feb 2007 13:50:03 +0100
Subject: [ofa-general] IPoIB caused a kernel: BUG: soft lockup detected on
	CPU#0!
Message-ID: <200702281350.03788.hnguyen@linux.vnet.ibm.com>

Hi,
I have also seen this under heavy bidirectional ipoib traffic between two
nodes over 4 links (ppc64, ehca on 2.6.20). Here is a snippet
of the backtraces:

BUG: soft lockup detected on CPU#23!
Call Trace:
[C00000000F5DB470] [C00000000000FC8C] .show_stack+0x5c/0x1cc (unreliable)
[C00000000F5DB520] [C00000000008731C] .softlockup_tick+0x114/0x14c
[C00000000F5DB5E0] [C000000000063210] .run_local_timers+0x1c/0x30
[C00000000F5DB660] [C000000000024244] .timer_interrupt+0xec/0x504
[C00000000F5DB750] [C000000000003570] decrementer_common+0xf0/0x100
--- Exception: 901 at .tcp_v4_rcv+0x964/0xd04
    LR = .tcp_v4_rcv+0x938/0xd04
[C00000000F5DBB30] [C00000000035A328] .ip_local_deliver+0x1ac/0x400
[C00000000F5DBBC0] [C000000000359B04] .ip_rcv+0x378/0x690
[C00000000F5DBC70] [C00000000032D5EC] .netif_receive_skb+0x550/0x574
[C00000000F5DBD20] [C00000000032D718] .process_backlog+0x108/0x250
[C00000000F5DBE00] [C00000000032B434] .net_rx_action+0x198/0x2f4
[C00000000F5DBED0] [C00000000005CB58] .__do_softirq+0xd8/0x1a0
[C00000000F5DBF90] [C00000000002761C] .call_do_softirq+0x14/0x24
[C0000003B4E23BA0] [C00000000000CE68] .do_softirq+0xb4/0xc0
[C0000003B4E23C30] [C00000000032DC78] .netif_rx_ni+0x58/0x78
[C0000003B4E23CB0] [D00000000013F638] .ipoib_ib_completion+0x2a4/0x6dc [ib_ipoib]
[C0000003B4E23DB0] [D00000000069EB94] .comp_task+0x340/0x424 [ib_ehca]
[C0000003B4E23ED0] [C00000000007338C] .kthread+0x170/0x1c0
[C0000003B4E23F90] [C0000000000277D8] .kernel_thread+0x4c/0x68

The above trace occurred on all 32 CPUs multiple times.
The reason is that the softlockup watchdog thread did not get the CPU for
more than 10 secs (see kernel/softlockup.c), apparently because
ipoib_ib_completion() keeps polling the CQ at a high rate. The following
patch would help:

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index f2aa923..97ea26f 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -301,6 +301,7 @@ void ipoib_ib_completion(struct ib_cq *c
 		n = ib_poll_cq(cq, IPOIB_NUM_WC, priv->ibwc);
 		for (i = 0; i < n; ++i)
 			ipoib_ib_handle_wc(dev, priv->ibwc + i);
+		cond_resched();
 	} while (n == IPOIB_NUM_WC);
 }

However, I still saw the BUG trace on 3-4 CPUs after several hours.
I should also mention that the systems remain functional.

Regards
Nam
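
The idea behind the patch can be illustrated outside the kernel. This is only a userspace sketch under stated assumptions: fake_cq, fake_poll_cq, drain_with_yield and BATCH are hypothetical stand-ins for the CQ, ib_poll_cq(), the loop in ipoib_ib_completion() and IPOIB_NUM_WC, and the yield callback plays the role of cond_resched(). In the kernel, cond_resched() is only legal in process context, which matches the trace above where the handler runs from the ehca comp_task kthread.

```c
#include <sched.h>
#include <stddef.h>

#define BATCH 4                 /* hypothetical stand-in for IPOIB_NUM_WC */

struct fake_cq {
    int pending;                /* completions still waiting in the queue */
};

/* Take up to max completions off the queue; returns how many were taken. */
static int fake_poll_cq(struct fake_cq *cq, int max)
{
    int n = cq->pending < max ? cq->pending : max;

    cq->pending -= n;
    return n;
}

/*
 * Drain the queue in batches and offer the CPU to other tasks after
 * each batch, so a steady stream of completions cannot monopolize it.
 * Returns the number of yield points that were reached.
 */
static int drain_with_yield(struct fake_cq *cq, int (*yield_fn)(void))
{
    int n, yields = 0;

    do {
        n = fake_poll_cq(cq, BATCH);
        /* ... each of the n completions would be handled here ... */
        if (yield_fn)
            yield_fn();         /* userspace analogue of cond_resched() */
        ++yields;
    } while (n == BATCH);       /* a full batch means more may be waiting */

    return yields;
}
```

Passing sched_yield as yield_fn gives the closest userspace equivalent. A yield point only helps if the loop is reached in a context that may sleep, and the loop can still run for a long time when completions arrive as fast as they are drained, which may be why the trace still showed up occasionally.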


From wombat2 at us.ibm.com  Wed Feb 28 04:52:10 2007
From: wombat2 at us.ibm.com (Bernard King-Smith)
Date: Wed, 28 Feb 2007 07:52:10 -0500
Subject: [ofa-general] Re: [OFA General] List Address Change Completed
In-Reply-To: <3D84A59A1AD3584DA02AEAD240E8863F0366949A@ES22SNLNT.srn.sandia.gov>
Message-ID: <OFF4A77187.E93B333B-ON85257290.004668F0-85257290.0046B3FE@us.ibm.com>

Michael,

It looks like the migration of the mailing list deleted all subscriber 
settings for digest mode. Before the migration I received postings in 
digest form; now I get individual postings. Can you restore the digest 
setting for the subscribers who had it before?

Regards.

Bernie King-Smith 
IBM Corporation
Server Group
Cluster System Performance 
wombat2 at us.ibm.com    (845)433-8483
Tie. 293-8483 or wombat2 on NOTES 

"We are not responsible for the world we are born into, only for the world 
we leave when we die.
So we have to accept what has gone before us and work to change the only 
thing we can,
-- The Future." William Shatner

general-bounces at lists.openfabrics.org wrote on 02/28/2007 02:17:34 AM:

> This list has been migrated to the new server, lists.openfabrics.org.
> Please update any address book or filter settings to reflect 
> the new mailing list address.  Future messages and replies should be
> sent to this address:
> 
> general at lists.openfabrics.org
> 
> The new web address for this list is:
> 
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> 
> If you have any questions, please contact me at mplee at sandia.gov 
> 
> Regards,
> Michael
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> 
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general

From bugzilla-daemon at lists.openfabrics.org  Wed Feb 28 05:00:00 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Wed, 28 Feb 2007 05:00:00 -0800 (PST)
Subject: [ofa-general] [Bug 390] perftools don't work on alpha1
In-Reply-To: <bug-390-1@https.bugs.openfabrics.org/>
Message-ID: <20070228130000.8F2A1E60837@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=390


mst at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX


------- Comment #2 from mst at mellanox.co.il  2007-02-28 04:59 -------
As far as I know, most perftools do not support CMA.
Here is what I get:

# ib_write_lat --cma
ib_write_lat: unrecognized option `--cma'
Usage:
  ib_write_lat            start a server and wait for connection
  ib_write_lat <host>     connect to server at <host>

Options:
  -p, --port=<port>            listen on/connect to port <port> (default 18515)
  -c, --connection=<RC/UC>     connection type RC/UC (default RC)
  -m, --mtu=<mtu>              mtu size (default 1024)
  -d, --ib-dev=<dev>           use IB device <dev> (default first device found)
  -i, --ib-port=<port>         use port <port> of IB device (default 1)
  -s, --size=<size>            size of message to exchange (default 1)
  -a, --all                    Run sizes from 2 till 2^23
  -t, --tx-depth=<dep>         size of tx queue (default 50)
  -n, --iters=<iters>          number of exchanges (at least 2, default 1000)
  -C, --report-cycles          report times in cpu cycle units (default microseconds)
  -H, --report-histogram       print out all results (default print summary only)
  -U, --report-unsorted        (implies -H) print out unsorted results (default sorted)
  -V, --version                display version number


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From mst at mellanox.co.il  Wed Feb 28 05:11:38 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 28 Feb 2007 15:11:38 +0200
Subject: [ofa-general] ofed 1.2: backport changes
Message-ID: <20070228131138.GA4715@mellanox.co.il>

Hi!
To fix bug 247, I have moved the backport implementation of the
struct class sysfs functions from individual backport patches
to kernel_addons.

I then removed this code from the individual core and ipath backport
patches for RHEL4 and SLES9 kernels, to avoid conflicts.
With 8e99564fab97570e82212000cfd78ada7bcf45fe, core and ipath pass the build.

However, since I do not own ipath hardware, please do DOA testing
on RHEL4 and SLES9 kernels.

Thanks,

-- 
MST


From pasha at dev.mellanox.co.il  Wed Feb 28 05:25:49 2007
From: pasha at dev.mellanox.co.il (Pavel Shamis (Pasha))
Date: Wed, 28 Feb 2007 15:25:49 +0200
Subject: [ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2
In-Reply-To: <17893.22368.748298.755523@gargle.gargle.HOWL>
References: <17893.22368.748298.755523@gargle.gargle.HOWL>
Message-ID: <45E582DD.8010206@dev.mellanox.co.il>

Hi Roland,

> I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED 1.1, and saw
> some unpleasant performance drops when using OFED 1.1 (kernel 2.6.20.1
> with included IB drivers). The main drop is in throughput as measured
> by the OSU MPI bandwidth benchmark. However, the latency for large
> packet sizes is also worse (see results below). I tried with and
> without "options ib_mthca msi_x=1" (using IBGD, disabling msi_x makes
> a significant performance difference of approx. 10%). The IB card is a
> Mellanox MHGS18-XT (PCIe/DDR Firmware 1.2.0) running on an Opteron
> with nForce4 2200 Professional chipset.
> 
> Does anybody have an explanation or even better a solution to this
> issue?

Please try adding the following mvapich parameter: VIADEV_DEFAULT_MTU=MTU2048

Regards,
Pasha.

> 
> Thanks,
> 
> Roland
> 
> 
> 
> ------------------------------------------------------------------------
> 
> IBGD
> --------
> 
> # OSU MPI Bandwidth Test (Version 2.1)
> # Size          Bandwidth (MB/s)
> 1               0.830306
> 2               1.642710
> 4               3.307494
> 8               6.546477
> 16              13.161954
> 32              26.395154
> 64              52.913060
> 128             101.890547
> 256             172.227478
> 512             383.296292
> 1024            611.172247
> 2048            830.147571
> 4096            1068.057366
> 8192            1221.262520
> 16384           1271.771983
> 32768           1369.702828
> 65536           1426.124683
> 131072          1453.781151
> 262144          1457.297992
> 524288          1464.625860
> 1048576         1468.953875
> 2097152         1470.614903
> 4194304         1471.607758
> 
> # OSU MPI Latency Test (Version 2.1)
> # Size          Latency (us)
> 0               3.03
> 1               3.03
> 2               3.04
> 4               3.03
> 8               3.03
> 16              3.04
> 32              3.11
> 64              3.23
> 128             3.49
> 256             3.83
> 512             4.88
> 1024            6.31
> 2048            8.60
> 4096            11.02
> 8192            15.78
> 16384           28.85
> 32768           39.82
> 65536           60.30
> 131072          106.65
> 262144          196.47
> 524288          374.62
> 1048576         730.79
> 2097152         1442.32
> 4194304         2864.80
> 
> OFED 1.1
> ---------
> 
> # OSU MPI Bandwidth Test (Version 2.2)
> # Size          Bandwidth (MB/s)
> 1               0.698614
> 2               1.463192
> 4               2.941852
> 8               5.859464
> 16              11.697510
> 32              23.339031
> 64              46.403081
> 128             92.013928
> 256             182.918388
> 512             315.076923
> 1024            500.083937
> 2048            765.294564
> 4096            1003.652513
> 8192            1147.640312
> 16384           1115.803139
> 32768           1221.120298
> 65536           1282.328447
> 131072          1315.715608
> 262144          1331.456393
> 524288          1340.691793
> 1048576         1345.650404
> 2097152         1349.279211
> 4194304         1350.489883
> 
> # OSU MPI Latency Test (Version 2.2)
> # Size          Latency (us)
> 0               2.99
> 1               3.03
> 2               3.06
> 4               3.03
> 8               3.03
> 16              3.04
> 32              3.12
> 64              3.27
> 128             3.96
> 256             4.29
> 512             4.99
> 1024            6.53
> 2048            9.08
> 4096            11.92
> 8192            17.39
> 16384           31.05
> 32768           43.47
> 65536           67.17
> 131072          115.30
> 262144          212.33
> 524288          405.20
> 1048576         790.45
> 2097152         1558.88
> 4194304         3095.17
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
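
To put a number on the regression, the relative change between the two stacks can be computed from the OSU tables quoted above. This is a minimal sketch; percent_change is a hypothetical helper name, and the inputs in the usage note are the 4194304-byte rows from the IBGD and OFED 1.1 tables.

```c
/*
 * Relative change of observed versus baseline, in percent.
 * Negative means observed is lower than baseline.
 * percent_change is a hypothetical helper used for illustration.
 */
static double percent_change(double baseline, double observed)
{
    return (observed - baseline) / baseline * 100.0;
}
```

With the 4 MB bandwidth figures (1471.607758 MB/s on IBGD, 1350.489883 MB/s on OFED 1.1) this gives a drop of roughly 8%, and the 4 MB latency figures (2864.80 us versus 3095.17 us) give an increase of roughly 8% as well.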


From mst at mellanox.co.il  Wed Feb 28 05:24:40 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 28 Feb 2007 15:24:40 +0200
Subject: [ofa-general] Re: ofed 1.2: backport changes
In-Reply-To: <20070228131138.GA4715@mellanox.co.il>
References: <20070228131138.GA4715@mellanox.co.il>
Message-ID: <20070228132440.GC4715@mellanox.co.il>

> Quoting Michael S. Tsirkin <mst at mellanox.co.il>:
> Subject: ofed 1.2: backport changes
> 
> Hi!
> To fix bug 247,

Should have been: bug 347.

> I have moved backport implementation for
> struct class sysfs functions from individual backport patches
> to kernel_addons.
> 
> I then removed this code from individual core and ipath backport patches
> for RHEL4 and SLES9 kernels, to avoid conflict.
> With 8e99564fab97570e82212000cfd78ada7bcf45fe, core and ipath passes build.
> 
> However, since I do not own ipath hardware, please do DOA testing
> on RHEL4 and SLES9 kernels.

-- 
MST


From halr at voltaire.com  Wed Feb 28 05:31:42 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 28 Feb 2007 08:31:42 -0500
Subject: [ofa-general] Re: [PATCH] osm: Trivial changes for compilation on
	windows
In-Reply-To: <45E54653.6010300@dev.mellanox.co.il>
References: <45E54653.6010300@dev.mellanox.co.il>
Message-ID: <1172669491.31770.64611.camel@hal.voltaire.com>

On Wed, 2007-02-28 at 04:07, Yevgeny Kliteynik wrote:
> Hi Hal.
> 
> This patch makes trivial data type changes and redefines a macro.
> 
> 
> BTW, Sasha, do we still need this macro (NOISE_L in osm_ucast_updn.c)?
> 
> Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>

Thanks. Applied (to both master and ofed_1_2).

-- Hal


From rf at q-leap.de  Wed Feb 28 05:44:41 2007
From: rf at q-leap.de (Roland Fehrenbacher)
Date: Wed, 28 Feb 2007 14:44:41 +0100
Subject: [ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2
In-Reply-To: <45E582DD.8010206@dev.mellanox.co.il>
References: <17893.22368.748298.755523@gargle.gargle.HOWL>
	<45E582DD.8010206@dev.mellanox.co.il>
Message-ID: <17893.34633.644064.978253@gargle.gargle.HOWL>

>>>>> "Pavel" == Pavel Shamis (Pasha) <pasha at dev.mellanox.co.il> writes:

    Pavel> Hi Roland,
    >> I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED 1.1,
    >> and saw some unpleasant performance drops when using OFED 1.1
    >> (kernel 2.6.20.1 with included IB drivers). The main drop is in
    >> throughput as measured by the OSU MPI bandwidth
    >> benchmark. However, the latency for large packet sizes is also
    >> worse (see results below). I tried with and without "options
    >> ib_mthca msi_x=1" (using IBGD, disabling msi_x makes a
    >> significant performance difference of approx. 10%). The IB card
    >> is a Mellanox MHGS18-XT (PCIe/DDR Firmware 1.2.0) running on an
    >> Opteron with nForce4 2200 Professional chipset.
    >> 
    >> Does anybody have an explanation or even better a solution to
    >> this issue?

    Pavel> Please try adding the following mvapich parameter:
    Pavel> VIADEV_DEFAULT_MTU=MTU2048

Thanks for the suggestion. Unfortunately, it didn't improve the simple
bandwidth results. Bi-directional bandwidth increased by 3%
though. Any more ideas?

Roland



From halr at voltaire.com  Wed Feb 28 05:42:00 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 28 Feb 2007 08:42:00 -0500
Subject: [ofa-general] [PATCH][MINOR] OpenSM/osm_sa.c: Add osm_log error
	message when osm_sa_mad_ctrl_bind fails
Message-ID: <1172670114.31770.65268.camel@hal.voltaire.com>

OpenSM/osm_sa.c: Add osm_log error message when osm_sa_mad_ctrl_bind
fails

Signed-off-by: Hal Rosenstock <halr at voltaire.com>

diff --git a/osm/opensm/osm_sa.c b/osm/opensm/osm_sa.c
index 42a38aa..d74d875 100644
--- a/osm/opensm/osm_sa.c
+++ b/osm/opensm/osm_sa.c
@@ -505,6 +505,16 @@ osm_sa_bind(
 
   status = osm_sa_mad_ctrl_bind( &p_sa->mad_ctrl, port_guid );
 
+  if( status != IB_SUCCESS )
+  {
+    osm_log( p_sa->p_log, OSM_LOG_ERROR,
+             "osm_sa_bind: ERR 4C03: "
+             "SA MAD Controller bind failed (%s)\n",
+             ib_get_err_str( status ) );
+    goto Exit;
+  }
+
+ Exit:
   OSM_LOG_EXIT( p_sa->p_log );
   return( status );
 }


From vlad at lists.openfabrics.org  Wed Feb 28 05:57:25 2007
From: vlad at lists.openfabrics.org (vlad at lists.openfabrics.org)
Date: Wed, 28 Feb 2007 05:57:25 -0800 (PST)
Subject: [ofa-general] ofa_1_2_kernel 20070228-0525 daily build status
Message-ID: <20070228135725.31E24E60842@openfabrics.org>

This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.16
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.16
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.18
Passed on powerpc with linux-2.6.19
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.15
Passed on powerpc with linux-2.6.17
Passed on powerpc with linux-2.6.16
Passed on ppc64 with linux-2.6.14
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.17
Passed on powerpc with linux-2.6.13
Passed on powerpc with linux-2.6.12
Passed on powerpc with linux-2.6.15
Passed on ppc64 with linux-2.6.12
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.13
Passed on ppc64 with linux-2.6.19
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.18
Passed on ia64 with linux-2.6.19
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on x86_64 with linux-2.6.9-42.ELsmp

Failed:
Build failed on x86_64 with linux-2.6.5-7.244-smp
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c: In function 'rds_exit':
/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c:468: error: implicit declaration of function 'proto_unregister'
/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c: In function 'rds_init':
/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.c:517: error: implicit declaration of function 'proto_register'
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.5-7.244-smp_x86_64_check/net/rds/af_rds.o] Error 1
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.5-7.244-smp_x86_64_check/net/rds] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.5-7.244-smp_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.5-7.244-smp'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.9-22.ELsmp
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: 'ADVERTISE_PAUSE_CAP' undeclared (first use in this function)
/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: (Each undeclared identifier is reported only once
/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:167: error: for each function it appears in.)
/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.c:170: error: 'ADVERTISE_PAUSE_ASYM' undeclared (first use in this function)
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3/vsc8211.o] Error 1
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.9-22.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.9-22.ELsmp_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-22.ELsmp'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------
Build failed on x86_64 with linux-2.6.9-34.ELsmp
Log:
/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function 'add_adapter':
/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1061: error: 'adapter_list_lock' undeclared (first use in this function)
/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c: In function 'remove_adapter':
/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.c:1068: error: 'adapter_list_lock' undeclared (first use in this function)
make[3]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3/cxgb3_offload.o] Error 1
make[2]: *** [/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.9-34.ELsmp_x86_64_check/drivers/net/cxgb3] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_2_kernel-20070228-0525_linux-2.6.9-34.ELsmp_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.9-34.ELsmp'
make: *** [kernel] Error 2
----------------------------------------------------------------------------------


From pasha at dev.mellanox.co.il  Wed Feb 28 06:03:36 2007
From: pasha at dev.mellanox.co.il (Pavel Shamis (Pasha))
Date: Wed, 28 Feb 2007 16:03:36 +0200
Subject: [ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2
In-Reply-To: <17893.34633.644064.978253@gargle.gargle.HOWL>
References: <17893.22368.748298.755523@gargle.gargle.HOWL>	<45E582DD.8010206@dev.mellanox.co.il>
	<17893.34633.644064.978253@gargle.gargle.HOWL>
Message-ID: <45E58BB8.4020902@dev.mellanox.co.il>

>     Pavel> Hi Roland,
>     >> I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED 1.1,
>     >> and saw some unpleasant performance drops when using OFED 1.1
>     >> (kernel 2.6.20.1 with included IB drivers). The main drop is in
>     >> throughput as measured by the OSU MPI bandwidth
>     >> benchmark. However, the latency for large packet sizes is also
>     >> worse (see results below). I tried with and without "options
>     >> ib_mthca msi_x=1" (using IBGD, disabling msi_x makes a
>     >> siginficant performance difference of approx. 10%). The IB card
>     >> is a Mellanox MHGS18-XT (PCIe/DDR Firmware 1.2.0) running on an
>     >> Opteron with nForce4 2200 Professional chipset.
>     >> 
>     >> Does anybody have an explanation or even better a solution to
>     >> this issue?
> 
>     Pavel> Please try to add follow mvapich parameter :
>     Pavel> VIADEV_DEFAULT_MTU=MTU2048
> 
> Thanks for the suggestion. Unfortunately, it didn't improve the simple
> bandwidth results. Bi-directional bandwidth increased by 3%
> though. Any more ideas?
3% is a good start :-)
Please also try to add this one:
VIADEV_MAX_RDMA_SIZE=4194304

-Pasha

> 
> Roland
> 
>> ------------------------------------------------------------------------
>>
>> IBGD
>> --------
>>
>> # OSU MPI Bandwidth Test (Version 2.1)
>> # Size          Bandwidth (MB/s)
>> 1               0.830306
>> 2               1.642710
>> 4               3.307494
>> 8               6.546477
>> 16              13.161954
>> 32              26.395154
>> 64              52.913060
>> 128             101.890547
>> 256             172.227478
>> 512             383.296292
>> 1024            611.172247
>> 2048            830.147571
>> 4096            1068.057366
>> 8192            1221.262520
>> 16384           1271.771983
>> 32768           1369.702828
>> 65536           1426.124683
>> 131072          1453.781151
>> 262144          1457.297992
>> 524288          1464.625860
>> 1048576         1468.953875
>> 2097152         1470.614903
>> 4194304         1471.607758
>>
>> # OSU MPI Latency Test (Version 2.1)
>> # Size          Latency (us)
>> 0               3.03
>> 1               3.03
>> 2               3.04
>> 4               3.03
>> 8               3.03
>> 16              3.04
>> 32              3.11
>> 64              3.23
>> 128             3.49
>> 256             3.83
>> 512             4.88
>> 1024            6.31
>> 2048            8.60
>> 4096            11.02
>> 8192            15.78
>> 16384           28.85
>> 32768           39.82
>> 65536           60.30
>> 131072          106.65
>> 262144          196.47
>> 524288          374.62
>> 1048576         730.79
>> 2097152         1442.32
>> 4194304         2864.80
>>
>> OFED 1.1
>> ---------
>>
>> # OSU MPI Bandwidth Test (Version 2.2)
>> # Size          Bandwidth (MB/s)
>> 1               0.698614
>> 2               1.463192
>> 4               2.941852
>> 8               5.859464
>> 16              11.697510
>> 32              23.339031
>> 64              46.403081
>> 128             92.013928
>> 256             182.918388
>> 512             315.076923
>> 1024            500.083937
>> 2048            765.294564
>> 4096            1003.652513
>> 8192            1147.640312
>> 16384           1115.803139
>> 32768           1221.120298
>> 65536           1282.328447
>> 131072          1315.715608
>> 262144          1331.456393
>> 524288          1340.691793
>> 1048576         1345.650404
>> 2097152         1349.279211
>> 4194304         1350.489883
>>
>> # OSU MPI Latency Test (Version 2.2)
>> # Size          Latency (us)
>> 0               2.99
>> 1               3.03
>> 2               3.06
>> 4               3.03
>> 8               3.03
>> 16              3.04
>> 32              3.12
>> 64              3.27
>> 128             3.96
>> 256             4.29
>> 512             4.99
>> 1024            6.53
>> 2048            9.08
>> 4096            11.92
>> 8192            17.39
>> 16384           31.05
>> 32768           43.47
>> 65536           67.17
>> 131072          115.30
>> 262144          212.33
>> 524288          405.20
>> 1048576         790.45
>> 2097152         1558.88
>> 4194304         3095.17
>>
>>
>> ------------------------------------------------------------------------
>>


From tziporet at mellanox.co.il  Wed Feb 28 06:10:02 2007
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Wed, 28 Feb 2007 16:10:02 +0200
Subject: [ofa-general] OFED 1.2 Feb-26 meeting summary
Message-ID: <45E58D3A.8060906@mellanox.co.il>

The meeting summary is also available on the Wiki:
https://wiki.openfabrics.org/tiki-index.php?page=Teleconf+02-26-2007


  This is the OFED 1.2 Feb-26 meeting summary on alpha status:


    Abbreviated minutes / summary:

    * We will not build any alpha2 package. Anyone can use the full
      packages that Vlad provides.
    * The cut date for Beta changes is end of this week (Saturday Mar-3)
    * Next milestone is the Beta release - on March-7
    * Each maintainer should fix the bugs assigned to him in bugzilla
    * Documents will stay in the same way as in OFED 1.1 (one directory
      with all docs)
    * Improved RPM usage by the install will not be part of OFED 1.2


    Action Items:

   1. Daily build of full OFED package - Vlad
   2. Fix bugs sent by Scott for the beta - all
   3. Send list of bugs that must be fixed for the beta - Tziporet
   4. Schedule a developers session on OFA developers at Sonoma - Tziporet
   5. Fix ipath driver compilation issues - Bryan
   6. Support MPI selection by MVAPICH2 - Shaun


    Detailed Minutes:

    * RPM and install: The RPMs are built today in a non-standard way.
          o We are not going to do any change for OFED 1.2 since it will
            delay the release significantly.
          o The RPM usage will be enhanced for the next (1.3) release
            and we will decide on the correct way in Sonoma.
    * MPI selection:
          o Implemented by Open MPI and MVAPICH
          o Needs support from MVAPICH2
          o Jeff will publish usage to the full list once the MVAPICH
            package supports it.
          o Scott said Cisco will test it

Tziporet


From pasha at dev.mellanox.co.il  Wed Feb 28 06:12:40 2007
From: pasha at dev.mellanox.co.il (Pavel Shamis (Pasha))
Date: Wed, 28 Feb 2007 16:12:40 +0200
Subject: [ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2
In-Reply-To: <45E58BB8.4020902@dev.mellanox.co.il>
References: <17893.22368.748298.755523@gargle.gargle.HOWL>	<45E582DD.8010206@dev.mellanox.co.il>	<17893.34633.644064.978253@gargle.gargle.HOWL>
	<45E58BB8.4020902@dev.mellanox.co.il>
Message-ID: <45E58DD8.90306@dev.mellanox.co.il>

Also please run : mpirun_rsh -v
I want to check which version of mvapich you have.

Pavel Shamis (Pasha) wrote:
>>     Pavel> Hi Roland,
>>     >> I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED 1.1,
>>     >> and saw some unpleasant performance drops when using OFED 1.1
>>     >> (kernel 2.6.20.1 with included IB drivers). The main drop is in
>>     >> throughput as measured by the OSU MPI bandwidth
>>     >> benchmark. However, the latency for large packet sizes is also
>>     >> worse (see results below). I tried with and without "options
>>     >> ib_mthca msi_x=1" (using IBGD, disabling msi_x makes a
>>     >> siginficant performance difference of approx. 10%). The IB card
>>     >> is a Mellanox MHGS18-XT (PCIe/DDR Firmware 1.2.0) running on an
>>     >> Opteron with nForce4 2200 Professional chipset.
>>     >>     >> Does anybody have an explanation or even better a 
>> solution to
>>     >> this issue?
>>
>>     Pavel> Please try to add follow mvapich parameter :
>>     Pavel> VIADEV_DEFAULT_MTU=MTU2048
>>
>> Thanks for the suggestion. Unfortunately, it didn't improve the simple
>> bandwidth results. Bi-directional bandwidth increased by 3%
>> though. Any more ideas?
> 3% is good start :-)
> Please also try to add this one:
> VIADEV_MAX_RDMA_SIZE=4194304
> 
> -Pasha
> 
>>
>> Roland
>>
>>> ------------------------------------------------------------------------
>>>
>>> IBGD
>>> --------
>>>
>>> # OSU MPI Bandwidth Test (Version 2.1)
>>> # Size          Bandwidth (MB/s)
>>> 1               0.830306
>>> 2               1.642710
>>> 4               3.307494
>>> 8               6.546477
>>> 16              13.161954
>>> 32              26.395154
>>> 64              52.913060
>>> 128             101.890547
>>> 256             172.227478
>>> 512             383.296292
>>> 1024            611.172247
>>> 2048            830.147571
>>> 4096            1068.057366
>>> 8192            1221.262520
>>> 16384           1271.771983
>>> 32768           1369.702828
>>> 65536           1426.124683
>>> 131072          1453.781151
>>> 262144          1457.297992
>>> 524288          1464.625860
>>> 1048576         1468.953875
>>> 2097152         1470.614903
>>> 4194304         1471.607758
>>>
>>> # OSU MPI Latency Test (Version 2.1)
>>> # Size          Latency (us)
>>> 0               3.03
>>> 1               3.03
>>> 2               3.04
>>> 4               3.03
>>> 8               3.03
>>> 16              3.04
>>> 32              3.11
>>> 64              3.23
>>> 128             3.49
>>> 256             3.83
>>> 512             4.88
>>> 1024            6.31
>>> 2048            8.60
>>> 4096            11.02
>>> 8192            15.78
>>> 16384           28.85
>>> 32768           39.82
>>> 65536           60.30
>>> 131072          106.65
>>> 262144          196.47
>>> 524288          374.62
>>> 1048576         730.79
>>> 2097152         1442.32
>>> 4194304         2864.80
>>>
>>> OFED 1.1
>>> ---------
>>>
>>> # OSU MPI Bandwidth Test (Version 2.2)
>>> # Size          Bandwidth (MB/s)
>>> 1               0.698614
>>> 2               1.463192
>>> 4               2.941852
>>> 8               5.859464
>>> 16              11.697510
>>> 32              23.339031
>>> 64              46.403081
>>> 128             92.013928
>>> 256             182.918388
>>> 512             315.076923
>>> 1024            500.083937
>>> 2048            765.294564
>>> 4096            1003.652513
>>> 8192            1147.640312
>>> 16384           1115.803139
>>> 32768           1221.120298
>>> 65536           1282.328447
>>> 131072          1315.715608
>>> 262144          1331.456393
>>> 524288          1340.691793
>>> 1048576         1345.650404
>>> 2097152         1349.279211
>>> 4194304         1350.489883
>>>
>>> # OSU MPI Latency Test (Version 2.2)
>>> # Size          Latency (us)
>>> 0               2.99
>>> 1               3.03
>>> 2               3.06
>>> 4               3.03
>>> 8               3.03
>>> 16              3.04
>>> 32              3.12
>>> 64              3.27
>>> 128             3.96
>>> 256             4.29
>>> 512             4.99
>>> 1024            6.53
>>> 2048            9.08
>>> 4096            11.92
>>> 8192            17.39
>>> 16384           31.05
>>> 32768           43.47
>>> 65536           67.17
>>> 131072          115.30
>>> 262144          212.33
>>> 524288          405.20
>>> 1048576         790.45
>>> 2097152         1558.88
>>> 4194304         3095.17
>>>
>>>
>>> ------------------------------------------------------------------------
>>>


From bugzilla-daemon at lists.openfabrics.org  Wed Feb 28 06:22:31 2007
From: bugzilla-daemon at lists.openfabrics.org (bugzilla-daemon at lists.openfabrics.org)
Date: Wed, 28 Feb 2007 06:22:31 -0800 (PST)
Subject: [ofa-general] [Bug 390] perftools don't work on alpha1
In-Reply-To: <bug-390-1@https.bugs.openfabrics.org/>
Message-ID: <20070228142231.EEAF2E60823@openfabrics.org>

https://bugs.openfabrics.org/show_bug.cgi?id=390


swise at opengridcomputing.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|WONTFIX                     |


------- Comment #3 from swise at opengridcomputing.com  2007-02-28 06:22 -------
ib_rdma_bw, not ib_write_bw.

ib_rdma_bw and ib_rdma_lat both support the --cma flag.  ib_rdma_lat works,
ib_rdma_bw doesn't.

You don't want this fixed?


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From dlezcano at fr.ibm.com  Wed Feb 28 06:35:41 2007
From: dlezcano at fr.ibm.com (Daniel Lezcano)
Date: Wed, 28 Feb 2007 15:35:41 +0100
Subject: [ofa-general] Re: [PATCH RFC 18/31] net: Implment network device
 movement between namespaces
In-Reply-To: <11697516372179-git-send-email-ebiederm@xmission.com>
References: <m13b5zym0n.fsf@ebiederm.dsl.xmission.com>
	<11697516372179-git-send-email-ebiederm@xmission.com>
Message-ID: <45E5933D.4070304@fr.ibm.com>

Eric W. Biederman wrote:
> From: Eric W. Biederman <ebiederm at xmission.com> - unquoted
>
> This patch introduces NETIF_F_NETNS_LOCAL a flag to indicate
> a network device is local to a single network namespace and
> should never be moved.  Useful for pseudo devices that we
> need an instance in each network namespace (like the loopback
> device) and for any device we find that cannot handle multiple
> network namespaces so we may trap them in the initial network
> namespace.
>
> This patch introduces the function dev_change_net_namespace
> a function used to move a network device from one network
> namespace to another.  To the network device nothing
> special appears to happen, to the components of the network
> stack it appears as if the network device was unregistered
> in the network namespace it is in, and a new device
> was registered in the network namespace the device
> was moved to.
>
> This patch sets up a namespace device destructor that
> upon the exit of a network namespace moves all of the
> movable network devices  to the initial network namespace
> so they are not lost.
>   
If you:
 * create etun0/etun1
 * create a namespace
 * move etun1 to this namespace
 *  rename the etun1 to eth0
 *  kill the namespace

the former network device etun1 will be lost if you have in your parent 
namespace an interface eth0 because it will conflict.
Perhaps, the first name should be restored before moving the device back 
to the initial network namespace ?

  -- Daniel

ps : nice patchset


From dlezcano at fr.ibm.com  Wed Feb 28 06:42:08 2007
From: dlezcano at fr.ibm.com (Daniel Lezcano)
Date: Wed, 28 Feb 2007 15:42:08 +0100
Subject: [ofa-general] Re: [PATCH RFC 22/31] net: Add network namespace clone
	support.
In-Reply-To: <11697516373288-git-send-email-ebiederm@xmission.com>
References: <m13b5zym0n.fsf@ebiederm.dsl.xmission.com>
	<11697516373288-git-send-email-ebiederm@xmission.com>
Message-ID: <45E594C0.6090009@fr.ibm.com>

Eric W. Biederman wrote:
> From: Eric W. Biederman <ebiederm at xmission.com> - unquoted
>
> This patch allows you to create a new network namespace
> using sys_clone(...).
>
> Signed-off-by: Eric W. Biederman <ebiederm at xmission.com>
> ---
>  include/linux/sched.h    |    1 +
>  kernel/nsproxy.c         |   11 +++++++++++
>  net/core/net_namespace.c |   38 ++++++++++++++++++++++++++++++++++++++
>  3 files changed, 50 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 4463735..9e0f91a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -26,6 +26,7 @@
>  #define CLONE_STOPPED		0x02000000	/* Start in stopped state */
>  #define CLONE_NEWUTS		0x04000000	/* New utsname group? */
>  #define CLONE_NEWIPC		0x08000000	/* New ipcs */
> +#define CLONE_NEWNET		0x20000000	/* New network namespace */
>
>  /*
>   * Scheduling policies
> diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> index 4f3c95a..7861c4c 100644
> --- a/kernel/nsproxy.c
> +++ b/kernel/nsproxy.c
> @@ -20,6 +20,7 @@
>  #include <linux/mnt_namespace.h>
>  #include <linux/utsname.h>
>  #include <linux/pid_namespace.h>
> +#include <net/net_namespace.h>
>
>  struct nsproxy init_nsproxy = INIT_NSPROXY(init_nsproxy);
>  EXPORT_SYMBOL_GPL(init_nsproxy);
> @@ -70,6 +71,7 @@ struct nsproxy *dup_namespaces(struct nsproxy *orig)
>  			get_ipc_ns(ns->ipc_ns);
>  		if (ns->pid_ns)
>  			get_pid_ns(ns->pid_ns);
> +		get_net(ns->net_ns);
>  	}
>
>  	return ns;
> @@ -117,10 +119,18 @@ int copy_namespaces(int flags, struct task_struct *tsk)
>  	if (err)
>  		goto out_pid;
>
> +	err = copy_net(flags, tsk);
> +	if (err)
> +		goto out_net;
> +
>  out:
>  	put_nsproxy(old_ns);
>  	return err;
>
> +out_net:
> +	if (new_ns->pid_ns)
> +		put_pid_ns(new_ns->pid_ns);
> +
>  out_pid:
>  	if (new_ns->ipc_ns)
>  		put_ipc_ns(new_ns->ipc_ns);
> @@ -146,5 +156,6 @@ void free_nsproxy(struct nsproxy *ns)
>  		put_ipc_ns(ns->ipc_ns);
>  	if (ns->pid_ns)
>  		put_pid_ns(ns->pid_ns);
> +	put_net(ns->net_ns);
>  	kfree(ns);
>  }
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index 93e3879..cc56105 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -175,6 +175,44 @@ out_undo:
>  	goto out;
>  }
>
> +int copy_net(int flags, struct task_struct *tsk)
> +{
> +	net_t old_net = tsk->nsproxy->net_ns;
> +	net_t new_net;
> +	int err;
> +
> +	get_net(old_net);
> +
> +	if (!(flags & CLONE_NEWNET))
> +		return 0;
> +
> +	err = -EPERM;
> +	if (!capable(CAP_SYS_ADMIN))
> +		goto out;
> +
> +	err = -ENOMEM;
> +	new_net = net_alloc();
> +	if (null_net(new_net))
> +		goto out;
> +
> +	mutex_lock(&net_mutex);
> +	err = setup_net(new_net);
> +	if (err)
> +		goto out_unlock;
>   
Should we "net_free" in case of error ?
> +
> +	net_lock();
> +	net_list_append(new_net);
> +	net_unlock();
> +
> +	tsk->nsproxy->net_ns = new_net;
> +
> +out_unlock:
> +	mutex_unlock(&net_mutex);
> +out:
> +	put_net(old_net);
> +	return err;
> +}
> +
>  void pernet_modcopy(void *pnetdst, const void *src, unsigned long size)
>  {
>  	net_t net;
>   


From ebiederm at xmission.com  Wed Feb 28 07:05:13 2007
From: ebiederm at xmission.com (ebiederm at xmission.com)
Date: Wed, 28 Feb 2007 08:05:13 -0700
Subject: [ofa-general] Re: [PATCH RFC 22/31] net: Add network namespace clone
	support.
In-Reply-To: <45E594C0.6090009@fr.ibm.com> (Daniel Lezcano's message of
	"Wed, 28 Feb 2007 15:42:08 +0100")
References: <m13b5zym0n.fsf@ebiederm.dsl.xmission.com>
	<11697516373288-git-send-email-ebiederm@xmission.com>
	<45E594C0.6090009@fr.ibm.com>
Message-ID: <m1k5y2gw5y.fsf@ebiederm.dsl.xmission.com>

Daniel Lezcano <dlezcano at fr.ibm.com> writes:


>> +
>> +	mutex_lock(&net_mutex);
>> +	err = setup_net(new_net);
>> +	if (err)
>> +		goto out_unlock;
>>
> Should we "net_free" in case of error ?

Oops.  Yes we should.
Thanks.

>> +	net_lock();
>> +	net_list_append(new_net);
>> +	net_unlock();
>> +
>> +	tsk->nsproxy->net_ns = new_net;
>> +
>> +out_unlock:
>> +	mutex_unlock(&net_mutex);
	net_free(new_net);
>> +out:
>> +	put_net(old_net);
>> +	return err;
>> +}
>> +
>>
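
Read literally, the net_free() line above sits on the out_unlock path,
which the success case also falls through. A minimal user-space sketch of
the intended shape, freeing only on the failure branch, could look like
the following. Every name ending in _sketch is a hypothetical stand-in and
not the kernel API, locking is left out, and live_objects exists only so
the behaviour can be checked.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a network namespace object. */
struct net {
	int configured;
};

/* Allocations not yet freed; only here to make leaks observable. */
static int live_objects;

static struct net *net_alloc_sketch(void)
{
	struct net *n = calloc(1, sizeof(*n));

	if (n)
		live_objects++;
	return n;
}

static void net_free_sketch(struct net *n)
{
	if (n) {
		free(n);
		live_objects--;
	}
}

/* Pretend setup step; fails on request so the error path can be exercised. */
static int setup_net_sketch(struct net *n, int fail)
{
	if (fail)
		return -EINVAL;
	n->configured = 1;
	return 0;
}

/*
 * Returns 0 and hands the new object to the caller through *out, or
 * returns a negative errno after releasing whatever was allocated.
 */
static int copy_net_sketch(int fail_setup, struct net **out)
{
	struct net *new_net;
	int err;

	*out = NULL;

	err = -ENOMEM;
	new_net = net_alloc_sketch();
	if (!new_net)
		goto out;

	err = setup_net_sketch(new_net, fail_setup);
	if (err)
		goto out_free;

	*out = new_net;		/* success: ownership moves to the caller */
	return 0;

out_free:
	net_free_sketch(new_net);	/* reached only when setup failed */
out:
	return err;
}
```

Keeping the success return ahead of the cleanup labels is one way to make
sure the free cannot run on the good path; an explicit "if (err)" guard on
a shared exit path would work as well.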

Eric


From ebiederm at xmission.com  Wed Feb 28 07:12:16 2007
From: ebiederm at xmission.com (ebiederm at xmission.com)
Date: Wed, 28 Feb 2007 08:12:16 -0700
Subject: [ofa-general] Re: [PATCH RFC 18/31] net: Implment network device
 movement between namespaces
In-Reply-To: <45E5933D.4070304@fr.ibm.com> (Daniel Lezcano's message of
	"Wed, 28 Feb 2007 15:35:41 +0100")
References: <m13b5zym0n.fsf@ebiederm.dsl.xmission.com>
	<11697516372179-git-send-email-ebiederm@xmission.com>
	<45E5933D.4070304@fr.ibm.com>
Message-ID: <m1fy8qgvu7.fsf@ebiederm.dsl.xmission.com>

Daniel Lezcano <dlezcano at fr.ibm.com> writes:

> Eric W. Biederman wrote:
>> From: Eric W. Biederman <ebiederm at xmission.com> - unquoted
>>
>> This patch introduces NETIF_F_NETNS_LOCAL a flag to indicate
>> a network device is local to a single network namespace and
>> should never be moved.  Useful for pseudo devices that we
>> need an instance in each network namespace (like the loopback
>> device) and for any device we find that cannot handle multiple
>> network namespaces so we may trap them in the initial network
>> namespace.
>>
>> This patch introduces the function dev_change_net_namespace
>> a function used to move a network device from one network
>> namespace to another.  To the network device nothing
>> special appears to happen, to the components of the network
>> stack it appears as if the network device was unregistered
>> in the network namespace it is in, and a new device
>> was registered in the network namespace the device
>> was moved to.
>>
>> This patch sets up a namespace device destructor that
>> upon the exit of a network namespace moves all of the
>> movable network devices  to the initial network namespace
>> so they are not lost.
>>
> If you:
> * create etun0/etun1
> * create a namespace
> * move etun1 to this namespace
> *  rename the etun1 to eth0
> *  kill the namespace
>
> the former network device etun1 will be lost if you have in your parent
> namespace an interface eth0 because it will conflict.
> Perhaps, the first name should be restored before moving the device back to the
> initial network namespace ?

Restoration of a previous name is no guarantee of anything.  Someone may have
renamed some other interface to etun1 in the original network namespace.

However, if you look closely at the code, you will discover that if it can't
keep the same name it will rename the device as it switches namespaces.
In particular it will become devN where N is replaced by some unused number.

That is what the pat parameter to dev_change_net_namespace is about.

I'm not exactly thrilled about the generic name but the code should work,
and I don't know if there is a name that makes better sense.
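
As a rough illustration of that fallback, here is a small user-space
sketch: keep the wanted name if it is free in the target namespace,
otherwise take the first free name built from a prefix plus a number.
pick_name, name_taken, NAME_LEN and the 1000-try limit are hypothetical
and are not taken from the patch; the prefix argument stands in for the
pat parameter, and starting the search at 0 is an assumption.

```c
#include <stdio.h>
#include <string.h>

#define NAME_LEN 16	/* illustrative buffer size for a device name */

/* Returns 1 if `name` is already used in the target namespace. */
static int name_taken(const char *name, const char *const *taken, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(name, taken[i]) == 0)
			return 1;
	return 0;
}

/*
 * Writes the name the device would end up with into `out`.
 * Returns 0 on success, -1 if no free name was found.
 */
static int pick_name(const char *wanted, const char *prefix,
		     const char *const *taken, size_t n,
		     char out[NAME_LEN])
{
	if (!name_taken(wanted, taken, n)) {
		snprintf(out, NAME_LEN, "%s", wanted);
		return 0;
	}

	/* Name conflicts: try prefix0, prefix1, ... until one is free. */
	for (int i = 0; i < 1000; i++) {
		snprintf(out, NAME_LEN, "%s%d", prefix, i);
		if (!name_taken(out, taken, n))
			return 0;
	}
	return -1;
}
```

With lo, eth0 and dev0 already present in the initial namespace, a
returning device called eth0 would come back as dev1 in this sketch, while
one still called etun1 would keep its name.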


>  -- Daniel
>
> ps : nice patchset

Thanks.

Eric


From Roland.Fehrenbacher at transtec.de  Wed Feb 28 07:21:00 2007
From: Roland.Fehrenbacher at transtec.de (Roland Fehrenbacher)
Date: Wed, 28 Feb 2007 16:21:00 +0100
Subject: [ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2
In-Reply-To: <45E58BB8.4020902@dev.mellanox.co.il>
References: <17893.22368.748298.755523@gargle.gargle.HOWL>
	<45E582DD.8010206@dev.mellanox.co.il>
	<17893.34633.644064.978253@gargle.gargle.HOWL>
	<45E58BB8.4020902@dev.mellanox.co.il>
Message-ID: <17893.40412.365196.423575@gargle.gargle.HOWL>

>>>>> "Pavel" == Pavel Shamis <(Pasha)" <pasha at dev.mellanox.co.il>> writes:

    Pavel> Hi Roland,
    >> I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED 1.1,
    >> and saw some unpleasant performance drops when using OFED 1.1
    >> (kernel 2.6.20.1 with included IB drivers). The main drop is in
    >> throughput as measured by the OSU MPI bandwidth
    >> benchmark. However, the latency for large packet sizes is also
    >> worse (see results below). I tried with and without "options
    >> ib_mthca msi_x=1" (using IBGD, disabling msi_x makes a
    >> significant performance difference of approx. 10%). The IB card
    >> is a Mellanox MHGS18-XT (PCIe/DDR Firmware 1.2.0) running on an
    >> Opteron with nForce4 2200 Professional chipset.
    >>
    >> Does anybody have an explanation or even better a solution to
    >> this issue?
    >> 

    Pavel> Please try to add follow mvapich parameter :
    Pavel> VIADEV_DEFAULT_MTU=MTU2048
    >> Thanks for the suggestion. Unfortunately, it didn't improve the
    >> simple bandwidth results. Bi-directional bandwidth increased by
    >> 3% though. Any more ideas?

    Pavel> 3% is good start :-) Please also try to add this one:
    Pavel> VIADEV_MAX_RDMA_SIZE=4194304

This brought another 2% in bi-directional bandwidth, but still nothing
in uni-directional bandwidth.
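
To put a number on the uni-directional gap, the relative change can be
worked out from the OSU results posted earlier in this thread. The sketch
below uses three rows copied from those bandwidth tables; pct_change,
struct sample and print_bw_changes are names made up for this example.

```c
#include <stdio.h>

/* Relative change from `before` to `after` in percent (negative = drop). */
static double pct_change(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* Message size in bytes with the IBGD and OFED 1.1 bandwidth in MB/s. */
struct sample {
	long size;
	double ibgd;
	double ofed;
};

/* Rows taken from the OSU MPI bandwidth results quoted in this thread. */
static const struct sample bw[] = {
	{ 1024,     611.172247,  500.083937 },
	{ 65536,   1426.124683, 1282.328447 },
	{ 4194304, 1471.607758, 1350.489883 },
};

/* Prints one line per message size with the signed percentage change. */
static void print_bw_changes(void)
{
	for (size_t i = 0; i < sizeof(bw) / sizeof(bw[0]); i++)
		printf("%8ld bytes: %+.1f%%\n", bw[i].size,
		       pct_change(bw[i].ibgd, bw[i].ofed));
}
```

For the 4 MB row this comes to a drop of a little over 8%, so the 3% and
2% bi-directional gains reported so far do not close the gap.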

mvapich version is 0.9.8

Roland


From mplee at sandia.gov  Wed Feb 28 07:25:38 2007
From: mplee at sandia.gov (Lee, Michael Paichi)
Date: Wed, 28 Feb 2007 08:25:38 -0700
Subject: [ofa-general] RE: [OFA General] List Address Change Completed
References: <OFF4A77187.E93B333B-ON85257290.004668F0-85257290.0046B3FE@us.ibm.com>
Message-ID: <3D84A59A1AD3584DA02AEAD240E8863F0366949D@ES22SNLNT.srn.sandia.gov>

Done.  If I missed anyone, please send me an email.

Michael


-----Original Message-----
From: Bernard King-Smith [mailto:wombat2 at us.ibm.com]
Sent: Wed 2/28/2007 4:52 AM
To: Lee, Michael Paichi
Cc: general at lists.openfabrics.org; openib-general at openib.org
Subject: Re: [OFA General] List Address Change Completed
 
Michael,

It looks like the migration of the mailing list deleted all subscriber 
settings for  using digest mode. Before the migration I used to get 
postings in digest form, now I get individual postings. Can you restore 
the subscriber settings  for those who used to have digest mode to getting 
digest again?

Regards.

Bernie King-Smith 
IBM Corporation
Server Group
Cluster System Performance 
wombat2 at us.ibm.com    (845)433-8483
Tie. 293-8483 or wombat2 on NOTES 

"We are not responsible for the world we are born into, only for the world 
we leave when we die.
So we have to accept what has gone before us and work to change the only 
thing we can,
-- The Future." William Shatner

general-bounces at lists.openfabrics.org wrote on 02/28/2007 02:17:34 AM:

> This list has been migrated to the new server, lists.openfabrics.
> org.  Please update any address book or filter settings to reflect 
> the new mailing list address.  Future messages and replies should be
> sent to this address:
> 
> general at lists.openfabrics.org
> 
> The new web address for this list is:
> 
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> 
> If you have any questions, please contact me at mplee at sandia.gov 
> 
> Regards,
> Michael _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> 
> To unsubscribe, please visit 
http://openib.org/mailman/listinfo/openib-general


From monis at voltaire.com  Wed Feb 28 07:38:13 2007
From: monis at voltaire.com (Moni Shoua)
Date: Wed, 28 Feb 2007 17:38:13 +0200
Subject: [ofa-general] Re: [RFC] [PATCH v2] IB/ipoib: Add bonding support to
	IPoIB
In-Reply-To: <20070227145114.GC4437@mellanox.co.il>
References: <45E313D2.70909@voltaire.com>
	<20070227060245.GI12919@mellanox.co.il> <45E41C13.8090300@voltaire.com>
	<20070227145114.GC4437@mellanox.co.il>
Message-ID: <45E5A1E5.4000201@voltaire.com>

Hi,
I took some comments from this discussion and I'll refer to them when I write a new version for this patch.
I'll post it soon.

thanks

 -MoniS

Michael S. Tsirkin wrote:

>> I got the assumption about neighbours living in one of these 2 tables from
>> observation and code reading.  I preferred that that on keeping track of all
>> ipoib_neighs and putting them in a list. However, I could do that instead of
>> neigh_table scanning. Do you think it's better?
> 
> If some neighbours are not on any tables, it seems using our own lists
> (e.g. lists we have in ipoib_path) is the only option, no?
OK, I see what you mean. I'll use my own list to keep track of ipoib_neighs.

>>>> The only way I found to avoid this (for now) is to check skb headroom in
>>>> ipoib_hard_header. I guess that this safety check doesn't harm regular IPoIB 
>>>> operation and it seems to solve my problem. However, I would be happy to hear what
>>>> others think of this last issue.
>>> As I said, this seems to indicate a problem in the bonding code.
>>> But what will happen after you error out in ipoib_hard_header?
>>> Is the packet dropped? What might break as a result?

Michael, your tip about hard_header_len helped. I found what was wrong in the bonding code.
Now the skb_under_panic() issue is gone. 
I will remove the part of checking for headroom from the patch.
Thanks

> 


From pasha at dev.mellanox.co.il  Wed Feb 28 07:44:27 2007
From: pasha at dev.mellanox.co.il (Pavel Shamis (Pasha))
Date: Wed, 28 Feb 2007 17:44:27 +0200
Subject: [ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2
In-Reply-To: <17893.40412.365196.423575@gargle.gargle.HOWL>
References: <17893.22368.748298.755523@gargle.gargle.HOWL>	<45E582DD.8010206@dev.mellanox.co.il>	<17893.34633.644064.978253@gargle.gargle.HOWL>	<45E58BB8.4020902@dev.mellanox.co.il>
	<17893.40412.365196.423575@gargle.gargle.HOWL>
Message-ID: <45E5A35B.8000200@dev.mellanox.co.il>

Roland Fehrenbacher wrote:
>>>>>> "Pavel" == Pavel Shamis <(Pasha)" <pasha at dev.mellanox.co.il>> writes:
> 
>     Pavel> Hi Roland,
>     >> I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED 1.1,
>     >> and saw some unpleasant performance drops when using OFED 1.1
>     >> (kernel 2.6.20.1 with included IB drivers). The main drop is in
>     >> throughput as measured by the OSU MPI bandwidth
>     >> benchmark. However, the latency for large packet sizes is also
>     >> worse (see results below). I tried with and without "options
>     >> ib_mthca msi_x=1" (using IBGD, disabling msi_x makes a
>     >> significant performance difference of approx. 10%). The IB card
>     >> is a Mellanox MHGS18-XT (PCIe/DDR Firmware 1.2.0) running on an
>     >> Opteron with nForce4 2200 Professional chipset.
>     >>
>     >> Does anybody have an explanation or even better a solution to
>     >> this issue?
>     >> 
> 
>     Pavel> Please try to add follow mvapich parameter :
>     Pavel> VIADEV_DEFAULT_MTU=MTU2048
>     >> Thanks for the suggestion. Unfortunately, it didn't improve the
>     >> simple bandwidth results. Bi-directional bandwidth increased by
>     >> 3% though. Any more ideas?
> 
>     Pavel> 3% is good start :-) Please also try to add this one:
>     Pavel> VIADEV_MAX_RDMA_SIZE=4194304
> 
> This brought another 2% in bi-directional bandwidth, but still nothing
> in uni-directional bandwidth.
> 
> mvapich version is 0.9.8
0.9.8 was not distributed (or tested) with OFED 1.1 :-(
Please try the mvapich package that is distributed with OFED 1.1.

Pasha.


From vlad at mellanox.co.il  Wed Feb 28 08:18:46 2007
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Wed, 28 Feb 2007 18:18:46 +0200
Subject: [ofa-general] [PATCH] Add dapltest headers to Makefile.am
Message-ID: <1172679526.21382.114.camel@vladsk-laptop>

Hi Arlin,
The following patch fixes dapltest compilation after 'make dist':

Add dapltest headers to EXTRA_DIST

Signed-off-by: Vladimir Sokolovsky <vlad at mellanox.co.il>

diff --git a/Makefile.am b/Makefile.am
index e2bf4dc..98bcf70 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -231,7 +231,35 @@ EXTRA_DIST = dat/common/dat_dictionary.h \
             doc/dat.conf \
             dapl/udapl/libdaplcma.map \
             libdat.spec.in \
-            $(man_MANS)
+            $(man_MANS) \
+            test/dapltest/include/dapl_bpool.h \
+            test/dapltest/include/dapl_client_info.h \
+            test/dapltest/include/dapl_common.h \
+            test/dapltest/include/dapl_execute.h \
+            test/dapltest/include/dapl_fft_cmd.h \
+            test/dapltest/include/dapl_fft_util.h \
+            test/dapltest/include/dapl_getopt.h \
+            test/dapltest/include/dapl_global.h \
+            test/dapltest/include/dapl_limit_cmd.h \
+            test/dapltest/include/dapl_mdep.h \
+            test/dapltest/include/dapl_memlist.h \
+            test/dapltest/include/dapl_params.h \
+            test/dapltest/include/dapl_performance_cmd.h \
+            test/dapltest/include/dapl_performance_stats.h \
+            test/dapltest/include/dapl_performance_test.h \
+            test/dapltest/include/dapl_proto.h \
+            test/dapltest/include/dapl_quit_cmd.h \
+            test/dapltest/include/dapl_server_cmd.h \
+            test/dapltest/include/dapl_server_info.h \
+            test/dapltest/include/dapl_tdep.h \
+            test/dapltest/include/dapl_tdep_print.h \
+            test/dapltest/include/dapl_test_data.h \
+            test/dapltest/include/dapl_transaction_cmd.h \
+            test/dapltest/include/dapl_transaction_stats.h \
+            test/dapltest/include/dapl_transaction_test.h \
+            test/dapltest/include/dapl_version.h \
+            test/dapltest/mdep/linux/dapl_mdep_user.h
+

 dist-hook: libdat.spec
        cp libdat.spec $(distdir)


From xma at us.ibm.com  Wed Feb 28 08:21:15 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Wed, 28 Feb 2007 08:21:15 -0800
Subject: [ofa-general] Re: IPOIB NAPI
In-Reply-To: <20070228071706.GA22246@mellanox.co.il>
Message-ID: <OF6937543E.D3A5AAAE-ON87257290.005984AF-88257290.002DE8B7@us.ibm.com>


Michael,

>I have not benchmarked this, but actually the "return 1" version makes sense
>to me too: since a new completion was observed after notify-cq, we likely
>currently have HCA writing new completions into the CQ at a high rate, so it
>makes sense to delay polling by a few cycles, and reduce the number of
>interrupts in this way.

>Right?

Agreed. Another question: have you benchmarked IPoIB NAPI against a
missed-event-only mode, i.e. just changing the ipoib completion handling from
notify-cq, poll-cq to poll-cq, notify-cq, and polling again if there was a
missed event? I am going to try this to see the performance difference.
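
To make the ordering in question concrete, here is a toy simulation of the
"poll-cq, then notify-cq, poll again on a missed event" loop. It is only a
sketch: FakeCQ, request_notify() and drain() are made-up names for
illustration, not the verbs API and not the IPoIB code, and the "late"
completions are an assumption used to simulate entries that arrive just
before the CQ is re-armed.

```python
from collections import deque


class FakeCQ:
    """Toy completion queue for illustration; not the real verbs API."""

    def __init__(self, ready, late=()):
        self.queue = deque(ready)
        self.late = deque(late)   # completions that race with the re-arm

    def poll(self, budget):
        """Return up to `budget` completions, like a poll-cq call."""
        out = []
        while self.queue and len(out) < budget:
            out.append(self.queue.popleft())
        return out

    def request_notify(self):
        """Re-arm the CQ; return True if completions were missed."""
        # Simulate the hardware writing new entries just before the re-arm.
        self.queue.extend(self.late)
        self.late.clear()
        return bool(self.queue)


def drain(cq, handle, budget=4):
    """Poll first, then re-arm, and poll again only on a missed event."""
    rounds = 0
    while True:
        rounds += 1
        while True:
            batch = cq.poll(budget)
            for wc in batch:
                handle(wc)
            if len(batch) < budget:
                break
        if not cq.request_notify():
            return rounds


seen = []
cq = FakeCQ(ready=range(6), late=[6, 7])
rounds = drain(cq, seen.append)
print(seen, rounds)
```

In this toy run the two late completions are picked up by a second polling
round rather than by a new interrupt, which is the effect the question is
about. Whether that wins on real hardware is exactly what a benchmark would
have to show.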

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

From surs at cse.ohio-state.edu  Wed Feb 28 08:40:40 2007
From: surs at cse.ohio-state.edu (Sayantan Sur)
Date: Wed, 28 Feb 2007 11:40:40 -0500
Subject: [ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2
In-Reply-To: <45E5A35B.8000200@dev.mellanox.co.il>
References: <17893.22368.748298.755523@gargle.gargle.HOWL>
	<45E582DD.8010206@dev.mellanox.co.il>
	<17893.34633.644064.978253@gargle.gargle.HOWL>
	<45E58BB8.4020902@dev.mellanox.co.il>
	<17893.40412.365196.423575@gargle.gargle.HOWL>
	<45E5A35B.8000200@dev.mellanox.co.il>
Message-ID: <20070228164038.GA28118@cse.ohio-state.edu>

Hi Roland,

* On Feb,2 Pavel Shamis (Pasha)<pasha at dev.mellanox.co.il> wrote :
> Roland Fehrenbacher wrote:
> >>>>>>"Pavel" == Pavel Shamis <(Pasha)" <pasha at dev.mellanox.co.il>> writes:
> >
> >    Pavel> Hi Roland,
> >    >> >> I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED 1.1,
> >    >> >> and saw some unpleasant performance drops when using OFED 1.1
> >    >> >> (kernel 2.6.20.1 with included IB drivers). The main drop is in
> >    >> >> throughput as measured by the OSU MPI bandwidth benchmark.
> >    >> >> However, the latency for large packet sizes is also worse (see
> >    >> >> results below). I tried with and without "options ib_mthca
> >    >> >> msi_x=1" (using IBGD, disabling msi_x makes a significant
> >    >> >> performance difference of approx. 10%). The IB card is a
> >    >> >> Mellanox MHGS18-XT (PCIe/DDR Firmware 1.2.0) running on an
> >    >> >> Opteron with nForce4 2200 Professional chipset.
> >    >> >>
> >    >> >> Does anybody have an explanation or even better a solution to
> >    >> >> this issue?
> >    >>
> >
> >    Pavel> Please try to add follow mvapich parameter :
> >    Pavel> VIADEV_DEFAULT_MTU=MTU2048
> >    >> Thanks for the suggestion. Unfortunately, it didn't improve the
> >    >> simple bandwidth results. Bi-directional bandwidth increased by
> >    >> 3% though. Any more ideas?
> >
> >    Pavel> 3% is good start :-) Please also try to add this one:
> >    Pavel> VIADEV_MAX_RDMA_SIZE=4194304
> >
> >This brought another 2% in bi-directional bandwidth, but still nothing
> >in uni-directional bandwidth.
> >
> >mvapich version is 0.9.8
> 0.9.8 was not distributed (and tested) with OFED 1.1 :-(
> Please try to use package distributed with OFED 1.1 version.

MVAPICH-0.9.8 was tested by the MVAPICH team on OFED 1.1. It is being
used at several production clusters with OFED 1.1.

I ran the bandwidth test on our Opteron nodes, AMD Processor 254 (2.8
GHz), with Mellanox dual-port DDR cards. I can see a peak bandwidth of
1402 MillionBytes/sec as reported by OSU Bandwidth test. On the same
machines, I ran ib_rdma_bw (in the perftest module of OFED-1.1), which
reports lower-level Gen2 performance numbers. The peak bw reported by
ib_rdma_bw is 1307.09 MegaBytes/sec (=1338.09*1.048 = 1402
MillionBytes/sec). So, the lower level numbers match up to what is
reported by MPI.

I'm wondering what your lower-level ib_rdma_bw numbers look like. Do
they match what the OSU BW test reports? If they do, then the issue is
likely somewhere other than MPI.

We also have a MVAPICH-0.9.9 beta version out. You could give that a try
too, if you want. We will be making the full release soon.

Thanks,
Sayantan.

-- 
http://www.cse.ohio-state.edu/~surs


From rdreier at cisco.com  Wed Feb 28 08:42:41 2007
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 28 Feb 2007 08:42:41 -0800
Subject: [ofa-general] IPoIB caused a kernel: BUG: soft lockup detected on
	CPU#0!
In-Reply-To: <200702281350.03788.hnguyen@linux.vnet.ibm.com> (Hoang-Nam
	Nguyen's message of "Wed, 28 Feb 2007 13:50:03 +0100")
References: <200702281350.03788.hnguyen@linux.vnet.ibm.com>
Message-ID: <adatzx6s072.fsf@cisco.com>

I guess the solution is to merge IPoIB NAPI to avoid overloading the
system with interrupts.  I'll fix up a few last things with my NAPI
patch and we can try to get it in shape to merge for 2.6.22.

 > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
 > index f2aa923..97ea26f 100644
 > --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
 > +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
 > @@ -301,6 +301,7 @@ void ipoib_ib_completion(struct ib_cq *c
 >  		n = ib_poll_cq(cq, IPOIB_NUM_WC, priv->ibwc);
 >  		for (i = 0; i < n; ++i)
 >  			ipoib_ib_handle_wc(dev, priv->ibwc + i);
 > +		cond_resched();

obviously this is wrong because ipoib_ib_completion() is not
necessarily called in process context (in fact the ehca scaling hack
is probably the only driver that does call it when it's safe to
reschedule).

 >  	} while (n == IPOIB_NUM_WC);
 >  }
 > 
 > However I still saw that BUG trace occurred on 3-4 cpus after several hrs. 

Right, because this patch is not really doing anything to reduce the
interrupt load.


From surs at cse.ohio-state.edu  Wed Feb 28 08:46:46 2007
From: surs at cse.ohio-state.edu (Sayantan Sur)
Date: Wed, 28 Feb 2007 11:46:46 -0500
Subject: [ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2
In-Reply-To: <20070228164038.GA28118@cse.ohio-state.edu>
References: <17893.22368.748298.755523@gargle.gargle.HOWL>
	<45E582DD.8010206@dev.mellanox.co.il>
	<17893.34633.644064.978253@gargle.gargle.HOWL>
	<45E58BB8.4020902@dev.mellanox.co.il>
	<17893.40412.365196.423575@gargle.gargle.HOWL>
	<45E5A35B.8000200@dev.mellanox.co.il>
	<20070228164038.GA28118@cse.ohio-state.edu>
Message-ID: <20070228164645.GA22595@cse.ohio-state.edu>

Hi,

* On Feb,3 Sayantan Sur<surs at cse.ohio-state.edu> wrote :
> Hi Roland,
> 
> * On Feb,2 Pavel Shamis (Pasha)<pasha at dev.mellanox.co.il> wrote :
> > Roland Fehrenbacher wrote:
> > >>>>>>"Pavel" == Pavel Shamis <(Pasha)" <pasha at dev.mellanox.co.il>> writes:
> > >
> > >    Pavel> Hi Roland,
> > >    >> >> I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED 1.1,
> > >    >> >> and saw some unpleasant performance drops when using OFED 1.1
> > >    >> >> (kernel 2.6.20.1 with included IB drivers). The main drop is in
> > >    >> >> throughput as measured by the OSU MPI bandwidth benchmark.
> > >    >> >> However, the latency for large packet sizes is also worse (see
> > >    >> >> results below). I tried with and without "options ib_mthca
> > >    >> >> msi_x=1" (using IBGD, disabling msi_x makes a significant
> > >    >> >> performance difference of approx. 10%). The IB card is a
> > >    >> >> Mellanox MHGS18-XT (PCIe/DDR Firmware 1.2.0) running on an
> > >    >> >> Opteron with nForce4 2200 Professional chipset.
> > >    >> >>
> > >    >> >> Does anybody have an explanation or even better a solution to
> > >    >> >> this issue?
> > >    >>
> > >
> > >    Pavel> Please try to add follow mvapich parameter :
> > >    Pavel> VIADEV_DEFAULT_MTU=MTU2048
> > >    >> Thanks for the suggestion. Unfortunately, it didn't improve the
> > >    >> simple bandwidth results. Bi-directional bandwidth increased by
> > >    >> 3% though. Any more ideas?
> > >
> > >    Pavel> 3% is good start :-) Please also try to add this one:
> > >    Pavel> VIADEV_MAX_RDMA_SIZE=4194304
> > >
> > >This brought another 2% in bi-directional bandwidth, but still nothing
> > >in uni-directional bandwidth.
> > >
> > >mvapich version is 0.9.8
> > 0.9.8 was not distributed (and tested) with OFED 1.1 :-(
> > Please try to use package distributed with OFED 1.1 version.
> 
> MVAPICH-0.9.8 was tested by the MVAPICH team on OFED 1.1. It is being
> used at several production clusters with OFED 1.1.
> 
> I ran the bandwidth test on our Opteron nodes, AMD Processor 254 (2.8
> GHz), with Mellanox dual-port DDR cards. I can see a peak bandwidth of
> 1402 MillionBytes/sec as reported by OSU Bandwidth test. On the same
> machines, I ran ib_rdma_bw (in the perftest module of OFED-1.1), which
> reports lower Gen2 level performance numbers. The peak bw reported by
> ib_rdma_bw is 1307.09 MegaBytes/sec (=1338.09*1.048 = 1402
> MillionBytes/sec). So, the lower level numbers match up to what is
> reported by MPI.

The above was done with OFED-1.1. I also ran the test with IBGD-1.8.2 on
the same machines and saw 1402 MillionBytes/sec peak bandwidth. This is
the same as the OFED-1.1 result.

> I'm wondering how your lower-level ib_rdma_bw numbers look like? Are
> they matching up with what OSU BW test reports? If they are, then it is
> likely some other issue than MPI.
> 
> We also have a MVAPICH-0.9.9 beta version out. You could give that a try
> too, if you want. We will be making the full release soon.

In addition, you can check the following URL w.r.t. performance
numbers.

http://nowlab.cse.ohio-state.edu/projects/mpi-iba/performance/mvapich/opteron/MVAPICH-0.9.8-opteron-gen2-DDR.html

Thanks,
Sayantan.

> 
> Thanks,
> Sayantan.
> 
> -- 
> http://www.cse.ohio-state.edu/~surs

-- 
http://www.cse.ohio-state.edu/~surs


From hnguyen at linux.vnet.ibm.com  Wed Feb 28 09:01:02 2007
From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen)
Date: Wed, 28 Feb 2007 18:01:02 +0100
Subject: [ofa-general] [PATCH 2.6.21-rc2] ehca: fix mismatched sync between
	completion handler and destroy cq
Message-ID: <200702281801.02747.hnguyen@linux.vnet.ibm.com>

This patch fixes two issues reported by Roland and Christoph H.:
- Mismatched sync/locking between completion handler and destroy cq
  We introduced a per-cq counter, nr_events, to track the number of irq
  events seen. This counter is incremented when an event queue
  entry is seen and decremented after the completion handler has been
  called, regardless of whether the scaling code is active. Note that
  nr_callbacks tracks the number of events assigned to a cpu, so the
  two counters can potentially diverge.
  The sync between a running completion handler and destroy cq
  is done using the global spin lock ehca_cq_idr_lock.
- Replace yield() with wait_event(), which waits for the counter above
  to become zero
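
The counting scheme described above can be sketched in userspace as follows.
This is only an analogy under stated assumptions, not the driver code: ToyCQ
and its methods are hypothetical names, a threading.Condition stands in for
both ehca_cq_idr_lock and the wait_completion wait queue, and plain threads
stand in for the interrupt and completion-task contexts.

```python
import threading


class ToyCQ:
    """Userspace analogy only; the names mirror the patch description."""

    def __init__(self):
        lock = threading.Lock()                      # plays ehca_cq_idr_lock
        self.wait_completion = threading.Condition(lock)
        self.nr_events = 0
        self.handled = 0

    def event_seen(self):
        # An event queue entry for this CQ was observed.
        with self.wait_completion:
            self.nr_events += 1

    def run_handler(self):
        # The completion handler itself would run here, outside the lock.
        with self.wait_completion:
            self.handled += 1
            self.nr_events -= 1
            if self.nr_events == 0:
                self.wait_completion.notify_all()    # like wake_up()

    def destroy(self, timeout=5.0):
        # Sleep until no handler is outstanding, instead of spinning on yield.
        with self.wait_completion:
            return self.wait_completion.wait_for(
                lambda: self.nr_events == 0, timeout=timeout)


cq = ToyCQ()
for _ in range(3):
    cq.event_seen()
workers = [threading.Thread(target=cq.run_handler) for _ in range(3)]
for w in workers:
    w.start()
drained = cq.destroy()
for w in workers:
    w.join()
print(drained, cq.nr_events, cq.handled)
```

The point of the sketch is that destroy() sleeps until the last outstanding
handler has decremented the counter, rather than repeatedly yielding and
re-checking.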


Signed-off-by: Hoang-Nam Nguyen <hnguyen at de.ibm.com>
---


 ehca_classes.h |    6 ++++-
 ehca_cq.c      |   16 +++++++++++++--
 ehca_irq.c     |   59 +++++++++++++++++++++++++++++++++++++--------------------
 ehca_main.c    |    4 +--
 4 files changed, 60 insertions(+), 25 deletions(-)


diff --git a/drivers/infiniband/hw/ehca/ehca_classes.h b/drivers/infiniband/hw/ehca/ehca_classes.h
index 40404c9..85fe741 100644
--- a/drivers/infiniband/hw/ehca/ehca_classes.h
+++ b/drivers/infiniband/hw/ehca/ehca_classes.h
@@ -52,6 +52,8 @@ struct ehca_mw;
 struct ehca_pd;
 struct ehca_av;
 
+#include <linux/completion.h>
+
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_user_verbs.h>
 
@@ -153,7 +155,9 @@ struct ehca_cq {
 	spinlock_t cb_lock;
 	struct hlist_head qp_hashtab[QP_HASHTAB_LEN];
 	struct list_head entry;
-	u32 nr_callbacks;
+	u32 nr_callbacks; /* #events assigned to cpu by scaling code */
+	u32 nr_events;    /* #events seen */
+	wait_queue_head_t wait_completion;
 	spinlock_t task_lock;
 	u32 ownpid;
 	/* mmap counter for resources mapped into user space */
diff --git a/drivers/infiniband/hw/ehca/ehca_cq.c b/drivers/infiniband/hw/ehca/ehca_cq.c
index 6ebfa27..e2cdc1a 100644
--- a/drivers/infiniband/hw/ehca/ehca_cq.c
+++ b/drivers/infiniband/hw/ehca/ehca_cq.c
@@ -146,6 +146,7 @@ struct ib_cq *ehca_create_cq(struct ib_d
 	spin_lock_init(&my_cq->spinlock);
 	spin_lock_init(&my_cq->cb_lock);
 	spin_lock_init(&my_cq->task_lock);
+	init_waitqueue_head(&my_cq->wait_completion);
 	my_cq->ownpid = current->tgid;
 
 	cq = &my_cq->ib_cq;
@@ -302,6 +303,16 @@ create_cq_exit1:
 	return cq;
 }
 
+static int get_cq_nr_events(struct ehca_cq *my_cq)
+{
+	int ret;
+	unsigned long flags;
+	spin_lock_irqsave(&ehca_cq_idr_lock, flags);
+	ret = my_cq->nr_events;
+	spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
+	return ret;
+}
+
 int ehca_destroy_cq(struct ib_cq *cq)
 {
 	u64 h_ret;
@@ -329,10 +340,11 @@ int ehca_destroy_cq(struct ib_cq *cq)
 	}
 
 	spin_lock_irqsave(&ehca_cq_idr_lock, flags);
-	while (my_cq->nr_callbacks) {
+	while (my_cq->nr_events) {
 		spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
-		yield();
+		wait_event(my_cq->wait_completion, !get_cq_nr_events(my_cq));
 		spin_lock_irqsave(&ehca_cq_idr_lock, flags);
+		/* recheck nr_events to assure no cqe has just arrived */
 	}
 
 	idr_remove(&ehca_cq_idr, my_cq->token);
diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c b/drivers/infiniband/hw/ehca/ehca_irq.c
index 3ec53c6..7d8b795 100644
--- a/drivers/infiniband/hw/ehca/ehca_irq.c
+++ b/drivers/infiniband/hw/ehca/ehca_irq.c
@@ -404,10 +403,11 @@ static inline void process_eqe(struct eh
 	u32 token;
 	unsigned long flags;
 	struct ehca_cq *cq;
+
 	eqe_value = eqe->entry;
 	ehca_dbg(&shca->ib_device, "eqe_value=%lx", eqe_value);
 	if (EHCA_BMASK_GET(EQE_COMPLETION_EVENT, eqe_value)) {
-		ehca_dbg(&shca->ib_device, "... completion event");
+		ehca_dbg(&shca->ib_device, "Got completion event");
 		token = EHCA_BMASK_GET(EQE_CQ_TOKEN, eqe_value);
 		spin_lock_irqsave(&ehca_cq_idr_lock, flags);
 		cq = idr_find(&ehca_cq_idr, token);
@@ -419,16 +419,20 @@ static inline void process_eqe(struct eh
 			return;
 		}
 		reset_eq_pending(cq);
-		if (ehca_scaling_code) {
+		cq->nr_events++;
+		spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
+		if (ehca_scaling_code)
 			queue_comp_task(cq);
-			spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
-		} else {
-			spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
+		else {
 			comp_event_callback(cq);
+			spin_lock_irqsave(&ehca_cq_idr_lock, flags);
+			cq->nr_events--;
+			if (!cq->nr_events)
+				wake_up(&cq->wait_completion);
+			spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
 		}
 	} else {
-		ehca_dbg(&shca->ib_device,
-			 "Got non completion event");
+		ehca_dbg(&shca->ib_device, "Got non completion event");
 		parse_identifier(shca, eqe_value);
 	}
 }
@@ -478,6 +482,7 @@ void ehca_process_eq(struct ehca_shca *s
 					 "token=%x", token);
 				continue;
 			}
+			eqe_cache[eqe_cnt].cq->nr_events++;
 			spin_unlock(&ehca_cq_idr_lock);
 		} else
 			eqe_cache[eqe_cnt].cq = NULL;
@@ -504,12 +509,18 @@ void ehca_process_eq(struct ehca_shca *s
 	/* call completion handler for cached eqes */
 	for (i = 0; i < eqe_cnt; i++)
 		if (eq->eqe_cache[i].cq) {
-			if (ehca_scaling_code) {
-				spin_lock(&ehca_cq_idr_lock);
+			if (ehca_scaling_code)
 				queue_comp_task(eq->eqe_cache[i].cq);
-				spin_unlock(&ehca_cq_idr_lock);
-			} else
-				comp_event_callback(eq->eqe_cache[i].cq);
+			else {
+				struct ehca_cq *cq = eq->eqe_cache[i].cq;
+				comp_event_callback(cq);
+				spin_lock_irqsave(&ehca_cq_idr_lock, flags);
+				cq->nr_events--;
+				if (!cq->nr_events)
+					wake_up(&cq->wait_completion);
+				spin_unlock_irqrestore(&ehca_cq_idr_lock,
+						       flags);
+			}
 		} else {
 			ehca_dbg(&shca->ib_device, "Got non completion event");
 			parse_identifier(shca, eq->eqe_cache[i].eqe->entry);
@@ -523,7 +534,6 @@ void ehca_process_eq(struct ehca_shca *s
 		if (!eqe)
 			break;
 		process_eqe(shca, eqe);
-		eqe_cnt++;
 	} while (1);
 
 unlock_irq_spinlock:
@@ -567,8 +577,7 @@ static void __queue_comp_task(struct ehc
 		list_add_tail(&__cq->entry, &cct->cq_list);
 		cct->cq_jobs++;
 		wake_up(&cct->wait_queue);
-	}
-	else
+	} else
 		__cq->nr_callbacks++;
 
 	spin_unlock(&__cq->task_lock);
@@ -577,18 +586,21 @@ static void __queue_comp_task(struct ehc
 
 static void queue_comp_task(struct ehca_cq *__cq)
 {
-	int cpu;
 	int cpu_id;
 	struct ehca_cpu_comp_task *cct;
+	int cq_jobs;
+	unsigned long flags;
 
-	cpu = get_cpu();
 	cpu_id = find_next_online_cpu(pool);
 	BUG_ON(!cpu_online(cpu_id));
 
 	cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu_id);
 	BUG_ON(!cct);
 
-	if (cct->cq_jobs > 0) {
+	spin_lock_irqsave(&cct->task_lock, flags);
+	cq_jobs = cct->cq_jobs;
+	spin_unlock_irqrestore(&cct->task_lock, flags);
+	if (cq_jobs > 0) {
 		cpu_id = find_next_online_cpu(pool);
 		cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu_id);
 		BUG_ON(!cct);
@@ -608,11 +620,17 @@ static void run_comp_task(struct ehca_cp
 		cq = list_entry(cct->cq_list.next, struct ehca_cq, entry);
 		spin_unlock_irqrestore(&cct->task_lock, flags);
 		comp_event_callback(cq);
-		spin_lock_irqsave(&cct->task_lock, flags);
 
+		spin_lock_irqsave(&ehca_cq_idr_lock, flags);
+		cq->nr_events--;
+		if (!cq->nr_events)
+			wake_up(&cq->wait_completion);
+		spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
+
+		spin_lock_irqsave(&cct->task_lock, flags);
 		spin_lock(&cq->task_lock);
 		cq->nr_callbacks--;
-		if (cq->nr_callbacks == 0) {
+		if (!cq->nr_callbacks) {
 			list_del_init(cct->cq_list.next);
 			cct->cq_jobs--;
 		}
diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c
index c183512..a5e564a 100644
--- a/drivers/infiniband/hw/ehca/ehca_main.c
+++ b/drivers/infiniband/hw/ehca/ehca_main.c
@@ -52,7 +52,7 @@ #include "hcp_if.h"
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_AUTHOR("Christoph Raisch <raisch at de.ibm.com>");
 MODULE_DESCRIPTION("IBM eServer HCA InfiniBand Device Driver");
-MODULE_VERSION("SVNEHCA_0021");
+MODULE_VERSION("SVNEHCA_0022");
 
 int ehca_open_aqp1     = 0;
 int ehca_debug_level   = 0;
@@ -810,7 +809,7 @@ int __init ehca_module_init(void)
 	int ret;
 
 	printk(KERN_INFO "eHCA Infiniband Device Driver "
-	       "(Rel.: SVNEHCA_0021)\n");
+	       "(Rel.: SVNEHCA_0022)\n");
 	idr_init(&ehca_qp_idr);
 	idr_init(&ehca_cq_idr);
 	spin_lock_init(&ehca_qp_idr_lock);


From rf at q-leap.de  Wed Feb 28 09:14:37 2007
From: rf at q-leap.de (Roland Fehrenbacher)
Date: Wed, 28 Feb 2007 18:14:37 +0100
Subject: [ofa-general] Performance penalty of OFED 1.1 versus IBGD 1.8.2
In-Reply-To: <20070228164038.GA28118@cse.ohio-state.edu>
References: <17893.22368.748298.755523@gargle.gargle.HOWL>
	<45E582DD.8010206@dev.mellanox.co.il>
	<17893.34633.644064.978253@gargle.gargle.HOWL>
	<45E58BB8.4020902@dev.mellanox.co.il>
	<17893.40412.365196.423575@gargle.gargle.HOWL>
	<45E5A35B.8000200@dev.mellanox.co.il>
	<20070228164038.GA28118@cse.ohio-state.edu>
Message-ID: <17893.47229.704258.287392@gargle.gargle.HOWL>

>>>>> "Sayantan" == Sayantan Sur <surs at cse.ohio-state.edu> writes:

    Roland> I'm migrating from IBGD 1.8.2 (kernel 2.6.15.7) to OFED
    Roland> 1.1, and saw some unpleasant performance drops when using
    Roland> OFED 1.1 (kernel 2.6.20.1 with included IB drivers). The
    Roland> main drop is in throughput as measured by the OSU MPI
    Roland> bandwidth benchmark. However, the latency for large packet
    Roland> sizes is also worse (see results below). I tried with and
    Roland> without "options ib_mthca msi_x=1" (using IBGD, disabling
    Roland> msi_x makes a significant performance difference of
    Roland> approx. 10%). The IB card is a Mellanox MHGS18-XT
    Roland> (PCIe/DDR Firmware 1.2.0) running on an Opteron with
    Roland> nForce4 2200 Professional chipset.
    Roland> 
    Roland> Does anybody have an explanation or even better a solution
    Roland> to this issue?

    Pavel> Please try to add follow mvapich parameter :
    Pavel> VIADEV_DEFAULT_MTU=MTU2048

    Roland> Thanks for the suggestion. Unfortunately, it didn't
    Roland> improve the simple bandwidth results. Bi-directional
    Roland> bandwidth increased by 3% though. Any more ideas?

    Pavel> 3% is good start :-) Please also try to add this one:
    Pavel> VIADEV_MAX_RDMA_SIZE=4194304

    Roland> This brought another 2% in bi-directional bandwidth, but
    Roland> still nothing in uni-directional bandwidth.

    Roland> mvapich version is 0.9.8

    Pavel>  0.9.8 was not distributed (and tested) with OFED 1.1 :-(
    Pavel> Please try to use package distributed with OFED 1.1
    Pavel> version.

    Sayantan> MVAPICH-0.9.8 was tested by the MVAPICH team on OFED
    Sayantan> 1.1. It is being used at several production clusters
    Sayantan> with OFED 1.1.

    Sayantan> I ran the bandwidth test on our Opteron nodes, AMD
    Sayantan> Processor 254 (2.8 GHz), with Mellanox dual-port DDR
    Sayantan> cards. I can see a peak bandwidth of 1402
    Sayantan> MillionBytes/sec as reported by OSU Bandwidth test. On
    Sayantan> the same machines, I ran ib_rdma_bw (in the perftest
    Sayantan> module of OFED-1.1), which reports lower Gen2 level
    Sayantan> performance numbers. The peak bw reported by ib_rdma_bw
    Sayantan> is 1307.09 MegaBytes/sec (=1338.09*1.048 = 1402
    Sayantan> MillionBytes/sec). So, the lower level numbers match up
    Sayantan> to what is reported by MPI.

    Sayantan> I'm wondering how your lower-level ib_rdma_bw numbers
    Sayantan> look like?

I get:

3802: Bandwidth peak (#0 to #989): 1288.55 MB/sec
3802: Bandwidth average: 1288.54 MB/sec
3802: Service Demand peak (#0 to #989): 1818 cycles/KB
3802: Service Demand Avg  : 1818 cycles/KB

So 1288.55 MB/sec * 1.048 = 1350 MillionBytes/sec, which also matches up
exactly with the MPI results (see results below).
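
For reference, the unit conversion used here can be written out as below.
This is a sketch under an assumption: that the MB/sec printed by ib_rdma_bw
counts 2^20 bytes per MB while the OSU test counts 10^6 bytes, which is what
the 1.048 factor suggests. The helper name is made up for illustration.

```python
MIB = 2 ** 20          # bytes per "MB" as ib_rdma_bw is assumed to count them
MILLION = 10 ** 6      # bytes per "MB" as the OSU test is assumed to count them


def mib_to_million_bytes(rate_mib_per_s):
    """Convert a rate in 2^20-byte units to millions of bytes per second."""
    return rate_mib_per_s * MIB / MILLION


peak = mib_to_million_bytes(1288.55)
print(f"{peak:.1f} million bytes/sec")
```

With the exact factor 1.048576 the result comes out at about 1351, marginally
above the 1350 obtained with the rounded factor 1.048.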

    Sayantan> Are they matching up with what OSU BW test reports? If
    Sayantan> they are, then it is likely some other issue than MPI.

Looks like it's not MPI then. What else could be wrong? Why is IBGD so
much better in my case?

    Sayantan> We also have a MVAPICH-0.9.9 beta version out. You could
    Sayantan> give that a try too, if you want. We will be making the
    Sayantan> full release soon.

Probably won't help in this case.

Roland

> ------------------------------------------------------------------------
> 
> IBGD
> --------
> 
> # OSU MPI Bandwidth Test (Version 2.1)
> # Size          Bandwidth (MB/s)
> 1               0.830306
> 2               1.642710
> 4               3.307494
> 8               6.546477
> 16              13.161954
> 32              26.395154
> 64              52.913060
> 128             101.890547
> 256             172.227478
> 512             383.296292
> 1024            611.172247
> 2048            830.147571
> 4096            1068.057366
> 8192            1221.262520
> 16384           1271.771983
> 32768           1369.702828
> 65536           1426.124683
> 131072          1453.781151
> 262144          1457.297992
> 524288          1464.625860
> 1048576         1468.953875
> 2097152         1470.614903
> 4194304         1471.607758
> 
> # OSU MPI Latency Test (Version 2.1)
> # Size          Latency (us)
> 0               3.03
> 1               3.03
> 2               3.04
> 4               3.03
> 8               3.03
> 16              3.04
> 32              3.11
> 64              3.23
> 128             3.49
> 256             3.83
> 512             4.88
> 1024            6.31
> 2048            8.60
> 4096            11.02
> 8192            15.78
> 16384           28.85
> 32768           39.82
> 65536           60.30
> 131072          106.65
> 262144          196.47
> 524288          374.62
> 1048576         730.79
> 2097152         1442.32
> 4194304         2864.80
> 
> OFED 1.1
> ---------
> 
> # OSU MPI Bandwidth Test (Version 2.2)
> # Size          Bandwidth (MB/s)
> 1               0.698614
> 2               1.463192
> 4               2.941852
> 8               5.859464
> 16              11.697510
> 32              23.339031
> 64              46.403081
> 128             92.013928
> 256             182.918388
> 512             315.076923
> 1024            500.083937
> 2048            765.294564
> 4096            1003.652513
> 8192            1147.640312
> 16384           1115.803139
> 32768           1221.120298
> 65536           1282.328447
> 131072          1315.715608
> 262144          1331.456393
> 524288          1340.691793
> 1048576         1345.650404
> 2097152         1349.279211
> 4194304         1350.489883
> 
> # OSU MPI Latency Test (Version 2.2)
> # Size          Latency (us)
> 0               2.99
> 1               3.03
> 2               3.06
> 4               3.03
> 8               3.03
> 16              3.04
> 32              3.12
> 64              3.27
> 128             3.96
> 256             4.29
> 512             4.99
> 1024            6.53
> 2048            9.08
> 4096            11.92
> 8192            17.39
> 16384           31.05
> 32768           43.47
> 65536           67.17
> 131072          115.30
> 262144          212.33
> 524288          405.20
> 1048576         790.45
> 2097152         1558.88
> 4194304         3095.17
> 
> 
> ------------------------------------------------------------------------
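
To put a number on the gap, the relative difference between the two stacks
can be computed from the largest message size (4194304 bytes) in the tables
above. This is just arithmetic on the values already listed; the helper name
is made up for illustration.

```python
# Values copied from the 4194304-byte rows of the tables above.
ibgd_bw, ofed_bw = 1471.607758, 1350.489883      # MB/s
ibgd_lat, ofed_lat = 2864.80, 3095.17            # us


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


bw_change = percent_change(ibgd_bw, ofed_bw)
lat_change = percent_change(ibgd_lat, ofed_lat)
print(f"bandwidth: {bw_change:+.1f}%  latency: {lat_change:+.1f}%")
```

Both come out at roughly 8%: bandwidth is about 8% lower and large-message
latency about 8% higher with OFED 1.1 on this setup.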


From dledford at redhat.com  Wed Feb 28 09:56:58 2007
From: dledford at redhat.com (Doug Ledford)
Date: Wed, 28 Feb 2007 12:56:58 -0500
Subject: [ofa-general] OFED 1.2 Feb-26 meeting summary
In-Reply-To: <45E58D3A.8060906@mellanox.co.il>
References: <45E58D3A.8060906@mellanox.co.il>
Message-ID: <1172685419.4777.145.camel@fc6.xsintricity.com>

On Wed, 2007-02-28 at 16:10 +0200, Tziporet Koren wrote:
>       * Improved RPM usage by the install will not be part of OFED
>         1.2 

Since I first brought this up, you have added new libraries, iWARP
support, etc.  These constitute new RPMs.  And, because you guys have
been doing things contrary to standards like the file hierarchy standard
in the original RPMs, it's been carried forward to these new RPMs.  This
is a snowball, and the longer you put off fixing it, the harder it gets
to change.  And not just in your RPMs either.  The longer you put off
coming up with a reasonable standard for MPI library and executable file
locations, the longer customers will hand roll their own site specific
setups, and the harder it will be to get them to switch over to the
standard once you *do* implement it.  You may end up dooming Jeff to
maintaining those custom file location hacks in the OpenMPI spec
forever.

Not to mention that interoperability is about more than one machine
talking to another machine.  It's also about a customer's application
building properly on different versions of the stack, without the
customer needing to change all the include file locations and link
parameters.  It's also about a customer being able to rest assured that
if they tried to install two conflicting copies of libibverbs, it would
in fact cause RPM to throw conflict errors (which it doesn't now because
your libibverbs is in /usr/local, where I'm not allowed to put ours, so
since the files are in different locations, rpm will happily let the
user install both your libibverbs and my libibverbs without a conflict,
and a customer could waste large amounts of time trying to track down a
bug in one library only to find out their application is linking against
the other).

>               * The RPM usage will be enhanced for the next (1.3)
>                 release and we will decide on the correct way in
>                 Sonoma.


There's not really much to decide.  Either the stack is compliant with
the Linux Filesystem Hierarchy Standard or it isn't.  The only leeway for
decisions allowed by the standard is on things like where in /etc to put
the config files.  Since you guys are striving to be a generic RDMA
stack, not just an IB stack, I would suggest that all RDMA related config
files go into /etc/rdma.  For those applications that can reasonably be
run absent RDMA technology, like OpenMPI, I would separate their config
files off into either /etc or /etc/openmpi.  Ditto for the include
directories: /usr/include/rdma for the generic non-IB specific stuff, and
possibly /usr/include/rdma/infiniband for IB specific stuff, or you could
put the IB stuff under /usr/include/infiniband, either way.

The biggest variation from the spec that needs to be dealt with is the
need for multiple MPI installations, which is problematic if you just
use generic locations as it stands today, but with a few modifications
to the MPI stack it could be worked around.


-- 
Doug Ledford <dledford at redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband

From mst at mellanox.co.il  Wed Feb 28 10:39:54 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 28 Feb 2007 20:39:54 +0200
Subject: [ofa-general] Re: IPOIB NAPI
In-Reply-To: <OF6937543E.D3A5AAAE-ON87257290.005984AF-88257290.002DE8B7@us.ibm.com>
References: <20070228071706.GA22246@mellanox.co.il>
	<OF6937543E.D3A5AAAE-ON87257290.005984AF-88257290.002DE8B7@us.ibm.com>
Message-ID: <20070228183922.GB10826@mellanox.co.il>

> Quoting Shirley Ma <xma at us.ibm.com>:
> Subject: Re: IPOIB NAPI
> 
> Michael,
> 
> >I have not benchmarked this, but actually the "return 1" version makes sense
> >to me too: since a new completion was observed after notify-cq, we likely
> >currently have HCA writing new completions into the CQ at a high rate, so it
> >makes sense to delay polling by a few cycles, and reduce the number of
> >interrupts in this way.
> 
> >Right?
> 
> Agree. Another question: have you benchmarked IPoIB NAPI vs. missed-event-only
> mode, i.e. changing the ipoib completion handling from notify-cq, poll-cq to
> poll-cq, notify-cq and, if any event was missed, poll again? I am going to try
> this to see the performance difference.

At some point, I think I compared req notif + poll against
poll + req notif + poll if missed (both without NAPI), and did not see
any speed difference.

NAPI was also benchmarked and it was a win as compared to non-NAPI,
especially with multiple sockets tests.
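
For readers following the thread, the two orderings compared above can be
sketched roughly as below. This is only a toy model: struct mock_cq and the
mock_* helpers are hypothetical stand-ins, not the verbs API, and the
`racing` parameter is an invented way to model completions that arrive
between the first drain and the arming of the CQ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mock of a completion queue; this is not the verbs API. */
struct mock_cq {
	int pending;	/* completions waiting to be polled */
	int armed;	/* set once a notification has been requested */
};

/* Consume one completion; returns 1 if one was consumed, 0 if empty. */
static int mock_poll_cq(struct mock_cq *cq)
{
	assert(cq != NULL);
	if (cq->pending == 0)
		return 0;
	cq->pending--;
	return 1;
}

/* Arm the CQ; returns 1 if completions are already queued ("missed"). */
static int mock_req_notify_cq(struct mock_cq *cq)
{
	assert(cq != NULL);
	cq->armed = 1;
	return cq->pending > 0;
}

/* Ordering 1: req notif + poll. */
static int notify_then_poll(struct mock_cq *cq)
{
	int handled = 0;

	mock_req_notify_cq(cq);
	while (mock_poll_cq(cq))
		handled++;
	return handled;
}

/*
 * Ordering 2: poll + req notif + poll if missed.
 * `racing` models completions that land after the first drain
 * but before the CQ is armed.
 */
static int poll_notify_repoll(struct mock_cq *cq, int racing)
{
	int handled = 0;

	while (mock_poll_cq(cq))
		handled++;
	cq->pending += racing;
	if (mock_req_notify_cq(cq))
		while (mock_poll_cq(cq))
			handled++;
	return handled;
}
```

In this model both orderings drain everything that is queued; a real handler
would also have to cope with completions that arrive after the final poll,
which the sketch does not attempt to show.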

-- 
MST


From or.gerlitz at gmail.com  Wed Feb 28 10:43:51 2007
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Wed, 28 Feb 2007 20:43:51 +0200
Subject: [ofa-general] Re: [openib-general] [RFC] [PATCH] ib_cache: do not
 mask upper bit when searching for a pkey
In-Reply-To: <000201c759e3$24828410$55d8180a@amr.corp.intel.com>
References: <1172507101.4102.277140.camel@hal.voltaire.com>
	<000201c759e3$24828410$55d8180a@amr.corp.intel.com>
Message-ID: <15ddcffd0702281043h52ca49e7t110bc75e3ad2a832@mail.gmail.com>

On 2/26/07, Sean Hefty <sean.hefty at intel.com> wrote:
> I think the following patch would make ipoib spec compliant.
> ib_find_cached_pkey is called by ib_cm, rdma_cm, ib_srp, and ib_ipoib.
> I'm not certain what this change would do to SRP, but the ib_cm and
> rdma_cm look okay, given that non-reversible paths aren't supported
> yet anyway.

Sean,

As Moni stated, we need this functionality. Among other scenarios,
the use case I mentioned in this discussion is an I/O target that is
a full member of a partition while the initiators connected to it are
partial members, since they need not and should not talk among
themselves.

The connection may be implemented over TCP/UDP on top of IPoIB (eg
iscsi / nfs / some cluster file system) or over the RDMA CM and the
VERBS (iSER / rNFS / native implementation of cluster file systems) or
over the IB CM and the VERBS (srp).

For all the above cases except SRP, IPoIB is used as the ARP
provider, which means that the nodes with partial membership must
join the "IPv4 broadcast" IB multicast group. This works fine
with the openib IPoIB and core implementation running against the
Voltaire SA/SM and, as Hal commented (Hal - can you verify it? see (*)
below), also against the open SM/SA. My guess is that this also works
fine with the TopSpin/Cisco SM/SA.

(*) simply configure the SM to allocate 0xffff (index 0) and 0x8001
(index 1) to node A, then 0x7fff (index 0) and 0x0001 (index 1) to
node B. Now, configure ib0 of both nodes to subnet X, create an 0x8001
ib0 child on both and configure ib0.8001 to subnet Y, make sure you
have pings on both subnets - thanks!
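
To make the difference concrete, here is a minimal sketch of the two lookup
styles under discussion, using node B's table from the setup above. The
function names and macros are hypothetical and are not the ib_cache code;
the only fact assumed is that the top bit of a P_Key marks full membership
and the low 15 bits identify the partition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Top bit of a P_Key: 1 = full member, 0 = partial member. */
#define PKEY_FULL_MEMBER_BIT	0x8000u
#define PKEY_BASE_MASK		0x7fffu

/* Match on the low 15 bits only, ignoring the membership bit. */
static int find_pkey_masked(const uint16_t *table, size_t len, uint16_t pkey)
{
	size_t i;

	assert(table != NULL);
	for (i = 0; i < len; i++)
		if ((table[i] & PKEY_BASE_MASK) == (pkey & PKEY_BASE_MASK))
			return (int)i;
	return -1;
}

/* Match on all 16 bits, membership bit included. */
static int find_pkey_exact(const uint16_t *table, size_t len, uint16_t pkey)
{
	size_t i;

	assert(table != NULL);
	for (i = 0; i < len; i++)
		if (table[i] == pkey)
			return (int)i;
	return -1;
}
```

With node B's table {0x7fff, 0x0001}, a search for 0x8001 finds index 1 with
the masked lookup and nothing with the exact lookup, which is why a partial
member can only bring up the ib0.8001 child when the upper bit is ignored.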

My suggestion is that we act to have the spec changed to match this
real need, rather than remove this code to match the spec (my guess is
that the code has been there from day one; Roland can tell).

Or.


From sashak at voltaire.com  Wed Feb 28 12:04:26 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Wed, 28 Feb 2007 22:04:26 +0200
Subject: [ofa-general] Re: [PATCH] osm: Trivial changes for compilation on
	windows
In-Reply-To: <45E54653.6010300@dev.mellanox.co.il>
References: <45E54653.6010300@dev.mellanox.co.il>
Message-ID: <20070228200426.GC30973@sashak.voltaire.com>

On 11:07 Wed 28 Feb     , Yevgeny Kliteynik wrote:
> Hi Hal.
> 
> This patch has trivial data types changes and redefining a macro.
> 
> 
> BTW, Sasha, do we still need this macro (NOISE_L in osm_ucast_updn.c)?

For me it is perfectly fine to remove this completely.

Sasha


From halr at voltaire.com  Wed Feb 28 12:36:24 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 28 Feb 2007 15:36:24 -0500
Subject: [ofa-general] Re: [PATCH] opensm: remove some unneeded osm_switch
	functions
In-Reply-To: <20070225234705.GH11957@sashak.voltaire.com>
References: <20070225214845.GF11957@sashak.voltaire.com>
	<20070225234705.GH11957@sashak.voltaire.com>
Message-ID: <1172694979.31770.89937.camel@hal.voltaire.com>

On Sun, 2007-02-25 at 18:47, Sasha Khapyorsky wrote:
> Following introduced simplification this patch removes single field
> access functions from osm_switch.
> 
> Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>

Thanks. Applied (to both master and ofed_1_2).

-- Hal


From mst at mellanox.co.il  Wed Feb 28 13:02:35 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 28 Feb 2007 23:02:35 +0200
Subject: [ofa-general] [PATCH] IB/mthca: recv poll cq optimization
Message-ID: <20070228210235.GC8564@mellanox.co.il>

All good recv work requests generate HW completions in FIFO order, so we can use
rq->tail rather than hardware data. In this way, we save a branch on data path
for recv completions (branch is still there for send completions).

Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

---

Roland, what do you think? This increases the overall code size but I think the
extra code is on the error CQE handling path.  BTW, since most kernel QPs seem
not to use selective signaling, it might be worth it to optimize send
completions in a similar way in case selective signaling is disabled on QP.
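
As an illustration only, the index arithmetic behind this change can be
checked in isolation with the sketch below. struct work_queue is a
hypothetical cut-down stand-in for the driver's queue structure, and the
sketch assumes the ring size is a power of two and that last_comp starts
at max - 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, cut-down queue state; not the driver's structure. */
struct work_queue {
	uint32_t max;		/* ring size, assumed to be a power of two */
	uint32_t tail;		/* free-running count of completed entries */
	uint32_t last_comp;	/* ring index of the last completion */
};

/* Old path: advance tail by the distance to the HW-reported index. */
static void complete_by_reported_index(struct work_queue *wq,
				       uint32_t wqe_index)
{
	if (wq->last_comp < wqe_index)
		wq->tail += wqe_index - wq->last_comp;
	else
		wq->tail += wqe_index + wq->max - wq->last_comp;
	wq->last_comp = wqe_index;
}

/* New path: completions are in FIFO order, so derive the index from tail. */
static uint32_t complete_in_order(struct work_queue *wq)
{
	assert(wq->max != 0 && (wq->max & (wq->max - 1)) == 0);
	wq->last_comp = wq->tail++ & (wq->max - 1);
	return wq->last_comp;
}
```

Feeding both helpers the same in-order sequence of completions leaves tail
and last_comp identical, including across the wrap from max - 1 back to 0.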

diff --git a/drivers/infiniband/hw/mthca/mthca_cq.c b/drivers/infiniband/hw/mthca/mthca_cq.c
index efd79ef..78f8069 100644
--- a/drivers/infiniband/hw/mthca/mthca_cq.c
+++ b/drivers/infiniband/hw/mthca/mthca_cq.c
@@ -542,38 +542,37 @@ static inline int mthca_poll_one(struct mthca_dev *dev,
 			     >> wq->wqe_shift);
 		entry->wr_id = (*cur_qp)->wrid[wqe_index +
 					       (*cur_qp)->rq.max];
+		if (wq->last_comp < wqe_index)
+			wq->tail += wqe_index - wq->last_comp;
+		else
+			wq->tail += wqe_index + wq->max - wq->last_comp;
+
+		wq->last_comp = wqe_index;
 	} else if ((*cur_qp)->ibqp.srq) {
 		struct mthca_srq *srq = to_msrq((*cur_qp)->ibqp.srq);
 		u32 wqe = be32_to_cpu(cqe->wqe);
-		wq = NULL;
 		wqe_index = wqe >> srq->wqe_shift;
 		entry->wr_id = srq->wrid[wqe_index];
 		mthca_free_srq_wqe(srq, wqe);
 	} else {
-		s32 wqe;
 		wq = &(*cur_qp)->rq;
-		wqe = be32_to_cpu(cqe->wqe);
-		wqe_index = wqe >> wq->wqe_shift;
-		/*
-		 * WQE addr == base - 1 might be reported in receive completion
-		 * with error instead of (rq size - 1) by Sinai FW 1.0.800 and
-		 * Arbel FW 5.1.400.  This bug should be fixed in later FW revs.
-		 */
-		if (unlikely(wqe_index < 0))
-			wqe_index = wq->max - 1;
-		entry->wr_id = (*cur_qp)->wrid[wqe_index];
+		wq->last_comp = wq->tail++ & (wq->max - 1);
+		entry->wr_id = (*cur_qp)->wrid[wq->last_comp];
 	}
 
-	if (wq) {
-		if (wq->last_comp < wqe_index)
-			wq->tail += wqe_index - wq->last_comp;
-		else
-			wq->tail += wqe_index + wq->max - wq->last_comp;
-
-		wq->last_comp = wqe_index;
-	}
+	if (unlikely(is_error)) {
+		if (!is_send && !(*cur_qp)->ibqp.srq) {
+			s32 wqe = be32_to_cpu(cqe->wqe);
+			wqe_index = wqe >> wq->wqe_shift;
+			/*
+			 * WQE addr == base - 1 might be reported in receive completion
+			 * with error instead of (rq size - 1) by Sinai FW 1.0.800 and
+			 * Arbel FW 5.1.400.  This bug should be fixed in later FW revs.
+			 */
+			if (unlikely(wqe_index < 0))
+				wqe_index = wq->max - 1;
+		}
 
-	if (is_error) {
 		handle_error_cqe(dev, cq, *cur_qp, wqe_index, is_send,
 				 (struct mthca_err_cqe *) cqe,
 				 entry, &free_cqe);
-- 
MST


From sashak at voltaire.com  Wed Feb 28 13:21:03 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Wed, 28 Feb 2007 23:21:03 +0200
Subject: [ofa-general] [PATCH] opensm: switch pre-routing preparation status
	check
Message-ID: <20070228212103.GE30973@sashak.voltaire.com>


osm_switch_prepare_path_rebuild() now returns a status value, which is
needed in order to track switch pre-routing preparation properly. Also
a tiny p_sw->hops rework to allow potentially lockless access to
p_sw->hops.

Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
---
 osm/include/opensm/osm_switch.h |    4 +-
 osm/opensm/osm_switch.c         |   32 +++++++++++++++++------------
 osm/opensm/osm_ucast_mgr.c      |   42 ++++++++++++++++++++++++--------------
 3 files changed, 47 insertions(+), 31 deletions(-)
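
The "build the new table locally, then publish the pointer" idea from the
description can be sketched on its own as below. struct hop_table and
hop_table_resize() are hypothetical names used for illustration; the sketch
uses calloc() where the patch uses malloc() plus memset(), and it does not
by itself make readers lockless (that would also need memory ordering and
deferred freeing of the old array).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the hops-related fields of a switch object. */
struct hop_table {
	uint8_t **hops;
	unsigned num_hops;
};

/* Grow the table to new_num entries; returns 0 on success, -1 on error. */
static int hop_table_resize(struct hop_table *t, unsigned new_num)
{
	uint8_t **hops, **old_hops;

	assert(t != NULL);
	if (new_num <= t->num_hops)
		return 0;	/* never shrink */

	/* Build the replacement fully before touching the live pointer. */
	hops = calloc(new_num, sizeof(hops[0]));
	if (!hops)
		return -1;	/* the old table stays valid and untouched */
	if (t->hops)
		memcpy(hops, t->hops, t->num_hops * sizeof(hops[0]));

	/* Publish the new table, then release the old one. */
	old_hops = t->hops;
	t->hops = hops;
	t->num_hops = new_num;
	free(old_hops);
	return 0;
}
```

The point of the ordering is that on allocation failure the caller sees -1
while the structure still holds its previous, consistent table.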

diff --git a/osm/include/opensm/osm_switch.h b/osm/include/opensm/osm_switch.h
index c3ef865..4270904 100644
--- a/osm/include/opensm/osm_switch.h
+++ b/osm/include/opensm/osm_switch.h
@@ -1153,7 +1153,7 @@ osm_switch_path_count_get(
 *
 * SYNOPSIS
 */
-void
+int
 osm_switch_prepare_path_rebuild(
 	IN osm_switch_t* p_sw,
 	IN uint16_t max_lids );
@@ -1166,7 +1166,7 @@ osm_switch_prepare_path_rebuild(
 *		[in] Max number of lids in the subnet.
 *
 * RETURN VALUE
-*	None.
+*	Returns zero on success, or negative value if an error occurred.
 *
 * NOTES
 *
diff --git a/osm/opensm/osm_switch.c b/osm/opensm/osm_switch.c
index f258dbc..3a98a63 100644
--- a/osm/opensm/osm_switch.c
+++ b/osm/opensm/osm_switch.c
@@ -489,37 +489,43 @@ osm_switch_clear_hops(
 
 /**********************************************************************
  **********************************************************************/
-void
+int
 osm_switch_prepare_path_rebuild(
   IN osm_switch_t* p_sw,
   IN uint16_t max_lids )
 {
+  uint8_t **hops;
   unsigned i;
 
   for ( i = 0; i < p_sw->num_ports; i++ )
     osm_port_prof_construct( &p_sw->p_prof[i] );
   if (!p_sw->hops)
   {
-    p_sw->hops = malloc((max_lids + 1)*sizeof(p_sw->hops[0]));
-    if (!p_sw->hops)
-      return;
-    memset(p_sw->hops, 0, (max_lids + 1)*sizeof(p_sw->hops[0]));
+    hops = malloc((max_lids + 1)*sizeof(hops[0]));
+    if (!hops)
+      return -1;
+    memset(hops, 0, (max_lids + 1)*sizeof(hops[0]));
+    p_sw->hops = hops;
     p_sw->num_hops = max_lids + 1;
   }
   else if (max_lids + 1 > p_sw->num_hops)
   {
-    uint8_t **old_hops = p_sw->hops;
-
-    p_sw->hops = malloc((max_lids + 1)*sizeof(p_sw->hops[0]));
-    if (!p_sw->hops)
-      return;
-    memcpy(p_sw->hops, old_hops, p_sw->num_hops*sizeof(p_sw->hops[0]));
-    memset(p_sw->hops + p_sw->num_hops, 0,
-           (max_lids + 1 - p_sw->num_hops)*sizeof(p_sw->hops[0]));
+    uint8_t **old_hops;
+
+    hops = malloc((max_lids + 1)*sizeof(hops[0]));
+    if (!hops)
+      return -1;
+    memcpy(hops, p_sw->hops, p_sw->num_hops*sizeof(hops[0]));
+    memset(hops + p_sw->num_hops, 0,
+           (max_lids + 1 - p_sw->num_hops)*sizeof(hops[0]));
+    old_hops = p_sw->hops;
+    p_sw->hops = hops;
     p_sw->num_hops = max_lids + 1;
     free(old_hops);
   }
   p_sw->max_lid_ho = max_lids;
+
+  return 0;
 }
 
 /**********************************************************************
diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c
index f02bae9..5b4ce45 100644
--- a/osm/opensm/osm_ucast_mgr.c
+++ b/osm/opensm/osm_ucast_mgr.c
@@ -404,20 +404,6 @@ static void __osm_ucast_mgr_dump_tables(osm_ucast_mgr_t *p_mgr)
 }
 
 /**********************************************************************
- Starting a rebuild, so notify the switch so it can clear tables, etc...
-**********************************************************************/
-static void
-__osm_ucast_mgr_setup_switch(
-  IN cl_map_item_t* const  p_map_item,
-  IN void* cxt )
-{
-  uint16_t lids = (uint16_t)cl_ptr_vector_get_size(&((osm_subn_t *)cxt)->port_lid_tbl);
-
-  osm_switch_prepare_path_rebuild((osm_switch_t *)p_map_item,
-                                  lids ? lids - 1 : 0);
-}
-
-/**********************************************************************
  Add each switch's own LID(s) to its LID matrix.
 **********************************************************************/
 static void
@@ -1195,6 +1181,30 @@ osm_ucast_mgr_build_lid_matrices(
 
 /**********************************************************************
  **********************************************************************/
+static int
+ucast_mgr_setup_all_switches(osm_subn_t *p_subn)
+{
+  osm_switch_t *p_sw;
+  uint16_t lids;
+
+  lids = (uint16_t)cl_ptr_vector_get_size(&p_subn->port_lid_tbl);
+  lids = lids ? lids - 1 : 0;
+
+  for (p_sw = (osm_switch_t*)cl_qmap_head(&p_subn->sw_guid_tbl);
+       p_sw != (osm_switch_t*)cl_qmap_end(&p_subn->sw_guid_tbl);
+       p_sw = (osm_switch_t*)cl_qmap_next(&p_sw->map_item))
+  if (osm_switch_prepare_path_rebuild(p_sw, lids)) {
+    osm_log(&p_subn->p_osm->log, OSM_LOG_ERROR,
+            "ucast_mgr_setup_all_switches: cannot setup switch 0x%016" PRIx64
+            "\n", cl_ntoh64(osm_node_get_node_guid(p_sw->p_node)));
+    return -1;
+  }
+
+  return 0;
+}
+
+/**********************************************************************
+ **********************************************************************/
 osm_signal_t
 osm_ucast_mgr_process(
   IN osm_ucast_mgr_t* const p_mgr )
@@ -1214,11 +1224,11 @@ osm_ucast_mgr_process(
   /*
     If there are no switches in the subnet, we are done.
   */
-  if (cl_qmap_count( p_sw_guid_tbl ) == 0)
+  if (cl_qmap_count( p_sw_guid_tbl ) == 0 ||
+      ucast_mgr_setup_all_switches(p_mgr->p_subn) < 0)
     goto Exit;
 
   p_mgr->any_change = FALSE;
-  cl_qmap_apply_func(p_sw_guid_tbl, __osm_ucast_mgr_setup_switch, p_mgr->p_subn);
 
   if (!p_routing_eng->build_lid_matrices ||
       p_routing_eng->build_lid_matrices(p_routing_eng->context) != 0)
-- 
1.5.0.1.40.gb40d


From sashak at voltaire.com  Wed Feb 28 13:52:50 2007
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Wed, 28 Feb 2007 23:52:50 +0200
Subject: [ofa-general] [PATCH TRIVIAL] opensm: remove NOISE_L macros from
	osm_ucast_updn.c
In-Reply-To: <20070228200426.GC30973@sashak.voltaire.com>
References: <45E54653.6010300@dev.mellanox.co.il>
	<20070228200426.GC30973@sashak.voltaire.com>
Message-ID: <20070228215250.GG30973@sashak.voltaire.com>


This removes NOISE_L macros completely.

Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
---
 osm/opensm/osm_ucast_updn.c |   38 --------------------------------------
 1 files changed, 0 insertions(+), 38 deletions(-)

diff --git a/osm/opensm/osm_ucast_updn.c b/osm/opensm/osm_ucast_updn.c
index b8dd61c..72d943b 100644
--- a/osm/opensm/osm_ucast_updn.c
+++ b/osm/opensm/osm_ucast_updn.c
@@ -97,12 +97,6 @@ struct updn_node {
   unsigned visited;
 };
 
-#ifndef WIN32
-#define NOISE_L(log, fmt, arg...)
-#else
-#define NOISE_L
-#endif
-
 /* ///////////////////////////////// */
 /*  Statics                          */
 /* ///////////////////////////////// */
@@ -258,19 +252,10 @@ __updn_bfs_by_node(
   {
     ib_net64_t remote_guid, current_guid;
 
-    NOISE_L( p_log, OSM_LOG_DEBUG,
-             "__updn_bfs_by_node: "
-             "Starting a new iteration with %zu elements in current list\n",
-             cl_qlist_count(&list) );
-
     u = (struct updn_node *)cl_qlist_remove_head(&list);
     u->visited = 0; /* cleanup */
     current_dir = u->dir;
     current_guid = osm_node_get_node_guid(u->sw->p_node);
-    NOISE_L( p_log, OSM_LOG_DEBUG,
-             "__updn_bfs_by_node: "
-             "Visiting port GUID 0x%" PRIx64 "\n",
-             cl_ntoh64(current_guid) );
     /* Go over all ports of the switch and find unvisited remote nodes */
     for ( pn = 1; pn < u->sw->num_ports; pn++ )
     {
@@ -293,12 +278,6 @@ __updn_bfs_by_node(
                                 current_guid, remote_guid,
                                 u->is_root, rem_u->is_root);
 
-      NOISE_L( p_log, OSM_LOG_DEBUG,
-               "__updn_bfs_by_node: "
-               "move from 0x%016" PRIx64 " rank: %u "
-               "to 0x%016" PRIx64" rank: %u\n",
-               cl_ntoh64(current_guid), u->rank,
-               cl_ntoh64(remote_guid), rem_u->rank );
       /* Check if this is a legal step : the only illegal step is going
          from DOWN to UP */
       if ((current_dir == DOWN) && (next_dir == UP))
@@ -317,15 +296,6 @@ __updn_bfs_by_node(
       remote_min_hop = osm_switch_get_hop_count(p_remote_sw, root_lid, pn_rem);
       if (current_min_hop + 1 < remote_min_hop)
       {
-        NOISE_L( p_log, OSM_LOG_DEBUG,
-                 "__updn_bfs_by_node (less): "
-                 "Setting Min Hop Table of switch: 0x%" PRIx64
-                 "\n\t\tCurrent hop count is: %d, next hop count: %d"
-                 "\n\tlid to set: 0x%x"
-                 "\n\tport number: 0x%X"
-                 "\n\thops number: %d\n",
-                 cl_ntoh64(remote_guid), remote_min_hop,current_min_hop + 1,
-                 root_lid, pn_rem, current_min_hop + 1 );
         set_hop_return_value = osm_switch_set_hops(p_remote_sw, root_lid, pn_rem, current_min_hop + 1);
         if (set_hop_return_value)
         {
@@ -340,11 +310,6 @@ __updn_bfs_by_node(
           /* Insert updn_switch item into the list */
           rem_u->dir = next_dir;
           rem_u->visited = 1;
-          NOISE_L( p_log, OSM_LOG_DEBUG,
-                   "__updn_bfs_by_node: "
-                   "Inserting new element to the next list: guid=0x%" PRIx64 " %s\n",
-                   cl_ntoh64(rem_u->sw->p_node->node_info.port_guid),
-                   (rem_u->dir == UP ? "UP" : "DOWN"));
           cl_qlist_insert_tail(&list, &rem_u->list);
         }
       }
@@ -578,9 +543,6 @@ updn_subn_rank(
       {
         remote_u = p_remote_physp->p_node->sw->priv;
         port_guid = p_remote_physp->port_guid;
-        NOISE_L( p_log, OSM_LOG_DEBUG,
-                 "updn_subn_rank: "
-                 "Ranking port GUID 0x%" PRIx64 "\n", cl_ntoh64(port_guid) );
         did_cause_update = __updn_update_rank(remote_u, rank);
 
         osm_log( p_log, OSM_LOG_DEBUG,
-- 
1.5.0.1.40.gb40d


From halr at voltaire.com  Wed Feb 28 14:01:31 2007
From: halr at voltaire.com (Hal Rosenstock)
Date: 28 Feb 2007 17:01:31 -0500
Subject: [ofa-general] Re: [PATCH TRIVIAL] opensm: remove NOISE_L macros from
	osm_ucast_updn.c
In-Reply-To: <20070228215250.GG30973@sashak.voltaire.com>
References: <45E54653.6010300@dev.mellanox.co.il>
	<20070228200426.GC30973@sashak.voltaire.com>
	<20070228215250.GG30973@sashak.voltaire.com>
Message-ID: <1172700046.31770.94902.camel@hal.voltaire.com>

On Wed, 2007-02-28 at 16:52, Sasha Khapyorsky wrote:
> This removes NOISE_L macros completely.
> 
> Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>

Thanks. Applied (to both master and ofed_1_2).

-- Hal


From arlin.r.davis at intel.com  Wed Feb 28 14:32:08 2007
From: arlin.r.davis at intel.com (Arlin Davis)
Date: Wed, 28 Feb 2007 14:32:08 -0800
Subject: [ofa-general] RE: [Bug 396] OFED 1.2 alpha DAPL failures using Intel
	MPI 3.0.33, kernel patching issues
In-Reply-To: <bug-396-108@https.bugs.openfabrics.org/>
Message-ID: <000001c75b88$48996200$9f97070a@amr.corp.intel.com>


>
>I'm seeing some problems using Intel MPI 3.0.33 with OFED 1.2 alpha, and
>Intel confirmed they are seeing it, too.
>
>[3:192.168.1.202] unexpected DAPL event 4008 from 0:192.168.1.201


This is the result of incorrect timeout values being used as a result of
sean_cm_limit_mra_timeout_patch. Can someone tell me the purpose of this
patch and how it became part of the OFED 1.2 build?

arlin


From sean.hefty at intel.com  Wed Feb 28 14:37:44 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Wed, 28 Feb 2007 14:37:44 -0800
Subject: [ofa-general] RE: [Bug 396] OFED 1.2 alpha DAPL failures using
	IntelMPI 3.0.33, kernel patching issues
In-Reply-To: <000001c75b88$48996200$9f97070a@amr.corp.intel.com>
Message-ID: <000001c75b89$108c0f10$ff0da8c0@amr.corp.intel.com>

>This is the result of incorrect timeout values being used as a result of
>sean_cm_limit_mra_timeout_patch. Can someone tell me the purpose of this patch
>and how it became part of the OFED 1.2 build?

This patch sets the timeout values incorrectly and needs to be removed from
OFED.  The purpose was to work-around a storage target firmware bug, which I
believe now has a fix.

- Sean


From xma at us.ibm.com  Wed Feb 28 14:54:53 2007
From: xma at us.ibm.com (Shirley Ma)
Date: Wed, 28 Feb 2007 14:54:53 -0800
Subject: [ofa-general] Re: [openib-general] Fw: [PATCH] enable IPoIB only if
 broadcast join finish
In-Reply-To: <OF33C71E03.98C43D68-ON8725728F.008355F5-8825728F.0057D9E3@us.ibm.com>
Message-ID: <OF6AA47BA3.7FF51354-ON87257290.007DCF65-88257290.007DD9B4@us.ibm.com>


Hello, Roland,

      Can we push this patch into OFED-1.2 as well?

Thanks
Shirley

From sean.hefty at intel.com  Wed Feb 28 15:01:12 2007
From: sean.hefty at intel.com (Sean Hefty)
Date: Wed, 28 Feb 2007 15:01:12 -0800
Subject: [ofa-general] preparing releases for librdmacm and libibcm
Message-ID: <000101c75b8c$57778c30$ff0da8c0@amr.corp.intel.com>

I've updated the librdmacm and libibcm libraries in preparation of creating 1.0
releases.  The build environments were updated on both, and additional API
documentation was added to the librdmacm.

Tar files for both have been placed at:

http://www.openfabrics.org/~shefty

Because of dependencies on libibverbs features, final 1.0 releases will wait
until libibverbs 1.1 is available.

- Sean


From mst at mellanox.co.il  Wed Feb 28 15:04:42 2007
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 1 Mar 2007 01:04:42 +0200
Subject: [ofa-general] Re: [openib-general] Fw: [PATCH] enable IPoIB only if
 broadcast join finish
In-Reply-To: <OF6AA47BA3.7FF51354-ON87257290.007DCF65-88257290.007DD9B4@us.ibm.com>
References: <OF33C71E03.98C43D68-ON8725728F.008355F5-8825728F.0057D9E3@us.ibm.com>
	<OF6AA47BA3.7FF51354-ON87257290.007DCF65-88257290.007DD9B4@us.ibm.com>
Message-ID: <20070228230442.GB25842@mellanox.co.il>

>Quoting Shirley Ma <xma at us.ibm.com>:
>Subject: Re: [openib-general] Fw: [PATCH] enable IPoIB only if broadcast join finish
>
>Hello, Roland,
>
>Can we push this patch into OFED-1.2 as well?
>
>Thanks
>Shirley

I'd like to see Roland's opinion first.

-- 
MST


From arlin.r.davis at intel.com  Wed Feb 28 16:55:17 2007
From: arlin.r.davis at intel.com (Arlin Davis)
Date: Wed, 28 Feb 2007 16:55:17 -0800
Subject: [ofa-general] [PATCH] udapl dtest, add delay option before accept
Message-ID: <000101c75b9c$47736880$9f97070a@amr.corp.intel.com>

Added optional delay feature to dtest before accept, which is used to
validate rdma_cm timeout settings.

Signed-off-by: Arlin Davis <ardavis at ichips.intel.com>


diff --git a/test/dtest/dtest.c b/test/dtest/dtest.c
index fe8a366..86b70cc 100644
--- a/test/dtest/dtest.c
+++ b/test/dtest/dtest.c
@@ -405,9 +405,6 @@ cleanup:
        else
                LOGPRINTF("%d unregister_rdma_memory done\n", getpid());
 
-
-       if (delay) sleep(delay);
-
        /* Free protection domain */
        LOGPRINTF("%d Freeing pz\n",getpid());
        start = get_time();
@@ -699,6 +696,8 @@ connect_ep( char *hostname, int conn_id )
            return( DAT_ABORT );
        }
 
+       if (delay) sleep(delay); /* use to test rdma_cma timeout logic */
+
         /* accept connect request from client */
        h_cr = event.event_data.cr_arrival_event_data.cr_handle;
         LOGPRINTF("%d Accepting connect request from client\n",getpid());
@@ -1799,7 +1798,7 @@ void print_usage()
     printf("c: use cno\n");
     printf("v: verbose\n");
     printf("p: polling\n");
-    printf("d: delay before close\n");
+    printf("d: delay before accept\n");
     printf("b: buf length to allocate\n");
     printf("B: burst count, rdma and msgs \n");
     printf("h: hostname\n");


From mshefty at ichips.intel.com  Wed Feb 28 17:06:41 2007
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 28 Feb 2007 17:06:41 -0800
Subject: [ofa-general] Re: [openib-general] [PATCH] IB/core: Set static rate
 in ib_init_ah_from_path()
In-Reply-To: <45E19730.7010008@dev.mellanox.co.il>
References: <000401c75223$29e86ea0$e598070a@amr.corp.intel.com>
	<1431.85.65.224.140.1171732569.squirrel@dev.mellanox.co.il>
	<adar6sn74fq.fsf@cisco.com> <45E19730.7010008@dev.mellanox.co.il>
Message-ID: <45E62721.1030605@ichips.intel.com>

> int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
>                          struct ib_sa_path_rec *rec,
>                          struct ib_ah_attr *ah_attr)
> {
>         int ret;
>         u16 gid_index;
> 
>         memset(ah_attr, 0, sizeof *ah_attr);
>         ah_attr->dlid = be16_to_cpu(rec->dlid);
>         ah_attr->sl = rec->sl;
>         ah_attr->src_path_bits = be16_to_cpu(rec->slid) & 0x7f;

I looked at this more, and to be technically correct here, what we can do is:

in update_sm_ah:
    use port_attr.LMC to record a src_path_mask with ib_sa_port

in ib_init_ah_from_path:
    use the src_path_mask from ib_sa_port to set src_path_bits

However, I'm not completely convinced that masking off the upper bits of the 
SLID is necessary when setting the src_path_bits, which means that the mask used 
above could be removed.
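
A minimal sketch of the LMC-derived mask described above, assuming only
that LMC is a 3-bit value (0..7) giving the number of low LID bits that
select a path. The helper names are hypothetical and this is not the
ib_sa code.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering the low `lmc` bits of a LID; LMC is at most 7. */
static uint8_t src_path_mask_from_lmc(uint8_t lmc)
{
	assert(lmc <= 7);
	return (uint8_t)((1u << lmc) - 1);
}

/* Path bits are the part of the (host byte order) SLID under the mask. */
static uint8_t src_path_bits(uint16_t slid, uint8_t mask)
{
	return (uint8_t)(slid & mask);
}
```

With LMC 7 the mask is 0x7f, which matches the constant in the quoted code;
with a smaller LMC the mask is narrower, so the fixed 0x7f would keep bits
that belong to the base LID.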

- Sean


From rowland at cse.ohio-state.edu  Wed Feb 28 18:09:43 2007
From: rowland at cse.ohio-state.edu (Shaun Rowland)
Date: Wed, 28 Feb 2007 21:09:43 -0500
Subject: [ofa-general] ofed_1_2_scripts update patchs
Message-ID: <45E635E7.5030401@cse.ohio-state.edu>

I've uploaded a new MVAPICH2 SRPM: mvapich2-0.9.8-5.src.rpm. I will have
to upload a new version again before the beta release, but I wanted to
get these patches out and a new SRPM uploaded ASAP. I've attached the
following patches done against the latest update for the
ofed_1_2_scripts GIT repository:

mvapich2.patch
--------------
- fix for bug 386
- adds mpi-selector support to MVAPICH2
- changes one default setting for MVAPICH2 in the case the user has not
  specified build options

mpi-selector.patch
------------------
- fixes an ordering problem in install.sh around line 130 (see below)

The mpi-selector patch fixes an ordering issue with the mpi-selector
package removal in install.sh around line 130. Since mpi-selector is a
requirement for the MPI packages, it should be removed after the MPI
packages instead of before. This patch changes the order to match what
is done in the uninstall.sh script. I mentioned this to Jeff, so if he's
doing maintenance on this part of the script - perhaps it would be
better for him to review this patch... or whatever needs to happen. It
is just something I noticed. I thought I'd pass along this patch too
since I was trying the latest scripts with the OFED-1.2-20070226-1758
files and it solved the uninstall problem I was having.

In the course of looking into this, I also noticed that the MVAPICH SRPM
is not actually adding mpi-selector as a requirement like the MVAPICH2
and OMPI RPMs are doing. The %use_mpi_selector macro is there, but it is
not defined as 1 until after it is used in the %if condition where
Requires: is set. The mpi-selector program is used, but it's not set in
Requires because of this ordering.
-- 
Shaun Rowland	rowland at cse.ohio-state.edu
http://www.cse.ohio-state.edu/~rowland/
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: mvapich2.patch
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070228/29c73ec2/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: mpi-selector.patch
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070228/29c73ec2/attachment-0001.ksh>

From jsquyres at cisco.com  Wed Feb 28 20:05:35 2007
From: jsquyres at cisco.com (Jeff Squyres)
Date: Wed, 28 Feb 2007 23:05:35 -0500
Subject: [ofa-general] Re: ofed_1_2_scripts update patchs
In-Reply-To: <45E635E7.5030401@cse.ohio-state.edu>
References: <45E635E7.5030401@cse.ohio-state.edu>
Message-ID: <F11128C7-D73A-415B-AED5-F58EEF59EFF4@cisco.com>

On Feb 28, 2007, at 9:09 PM, Shaun Rowland wrote:

> The mpi-selector patch fixes an ordering issue with the mpi-selector
> package removal in install.sh around line 130. Since mpi-selector is a
> requirement for the MPI packages, it should be removed after the MPI
> packages instead of before. This patch changes the order to match what
> is done in the uninstall.sh script. I mentioned this to Jeff, so if he's
> doing maintenance on this part of the script - perhaps it would be
> better for him to review this patch... or whatever needs to happen. It

Looks perfect to me.  Thanks!

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


From or.gerlitz at gmail.com  Wed Feb 28 21:42:57 2007
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Thu, 1 Mar 2007 07:42:57 +0200
Subject: [ofa-general] Re: [openib-general] [RFC] [PATCH] ib_cache: do not
	mask upper bit when searching for a pkey
In-Reply-To: <15ddcffd0702281043h52ca49e7t110bc75e3ad2a832@mail.gmail.com>
References: <1172507101.4102.277140.camel@hal.voltaire.com>
	<000201c759e3$24828410$55d8180a@amr.corp.intel.com>
	<15ddcffd0702281043h52ca49e7t110bc75e3ad2a832@mail.gmail.com>
Message-ID: <15ddcffd0702282142i3213a922s106246dd18a42930@mail.gmail.com>

resent - with a CC to general at lists.openfabrics.org so the message
will get to the list...

On 2/28/07, Or Gerlitz <or.gerlitz at gmail.com> wrote:
> On 2/26/07, Sean Hefty <sean.hefty at intel.com> wrote:
> > I think the following patch would make ipoib spec compliant.
> > ib_find_cached_pkey is called by ib_cm, rdma_cm, ib_srp, and ib_ipoib.
> > I'm not certain what this change would do to SRP, but the ib_cm and
> > rdma_cm look okay, given that non-reversible paths aren't supported
> > yet anyway.
>
> Sean,
>
> As Moni stated, we need this functionality. Among other scenarios,
> the use case I mentioned in this discussion is an I/O target that is
> a full member of a partition while the initiators connected to it are
> partial members, since they need not and should not talk among
> themselves.
>
> The connection may be implemented over TCP/UDP on top of IPoIB (eg
> iscsi / nfs / some cluster file system) or over the RDMA CM and the
> VERBS (iSER / rNFS / native implementation of cluster file systems) or
> over the IB CM and the VERBS (srp).
>
> For all the above cases except SRP, IPoIB is used as the ARP
> provider, which means that the nodes with partial membership must
> join the "IPv4 broadcast" IB multicast group. This works fine
> with the openib IPoIB and core implementation running against the
> Voltaire SA/SM and, as Hal commented (Hal - can you verify it? see (*)
> below), also against the open SM/SA. My guess is that this also works
> fine with the TopSpin/Cisco SM/SA.
>
> (*) Simply configure the SM to allocate 0xffff (index 0) and 0x8001
> (index 1) to node A, then 0x7fff (index 0) and 0x0001 (index 1) to
> node B. Now configure ib0 of both nodes to subnet X, create a 0x8001
> ib0 child on both and configure ib0.8001 to subnet Y, then make sure
> ping works on both subnets - thanks!
>
> My suggestion is that we act to have the spec changed to match this
> real need, rather than remove this code to match the spec. (My guess
> is that the code has been there from day one; Roland can probably
> tell.)
>
> Or.
>
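
To make the difference concrete, here is a minimal sketch of the two
lookup behaviors under discussion, using the P_Key tables from the (*)
example above. The helper names are hypothetical and this is not the
ib_cache code itself; it only assumes that bit 15 of a P_Key is the
membership bit and the low 15 bits are the partition number.

```c
#include <stddef.h>
#include <stdint.h>

/* Low 15 bits: partition number. Bit 15: set for full membership. */
#define PKEY_BASE_MASK 0x7fff

/* Masked search: ignores the membership bit on both sides.
 * Returns the table index of the first match, or -1 if none. */
static int find_pkey_masked(const uint16_t *table, size_t len, uint16_t pkey)
{
    for (size_t i = 0; i < len; i++)
        if ((table[i] & PKEY_BASE_MASK) == (pkey & PKEY_BASE_MASK))
            return (int)i;
    return -1;
}

/* Exact search: the membership bit must match as well.
 * Returns the table index of the first match, or -1 if none. */
static int find_pkey_exact(const uint16_t *table, size_t len, uint16_t pkey)
{
    for (size_t i = 0; i < len; i++)
        if (table[i] == pkey)
            return (int)i;
    return -1;
}
```

With node B's table {0x7fff, 0x0001}, a search for 0x8001 finds index 1
with the masked lookup and fails (-1) with the exact one, which is why
a partial member could no longer resolve the ib0.8001 child in the
setup above if the mask were dropped.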


From dotanb at dev.mellanox.co.il  Wed Feb 28 22:57:39 2007
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Thu, 01 Mar 2007 08:57:39 +0200
Subject: [ofa-general] Re: [openib-general] [PATCH] IB/core: Set static rate
 in ib_init_ah_from_path()
In-Reply-To: <45E62721.1030605@ichips.intel.com>
References: <000401c75223$29e86ea0$e598070a@amr.corp.intel.com>
	<1431.85.65.224.140.1171732569.squirrel@dev.mellanox.co.il>
	<adar6sn74fq.fsf@cisco.com> <45E19730.7010008@dev.mellanox.co.il>
	<45E62721.1030605@ichips.intel.com>
Message-ID: <45E67963.1070801@dev.mellanox.co.il>

Sean Hefty wrote:
>> int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
>>                          struct ib_sa_path_rec *rec,
>>                          struct ib_ah_attr *ah_attr)
>> {
>>         int ret;
>>         u16 gid_index;
>>
>>         memset(ah_attr, 0, sizeof *ah_attr);
>>         ah_attr->dlid = be16_to_cpu(rec->dlid);
>>         ah_attr->sl = rec->sl;
>>         ah_attr->src_path_bits = be16_to_cpu(rec->slid) & 0x7f;
>
> I looked at this more, and to be technically correct here, what we can 
> do is:
>
> in update_sm_ah:
>    use port_attr.LMC to record a src_path_mask with ib_sa_port
>
> in ib_init_ah_from_path:
>    use the src_path_mask from ib_sa_port to set src_path_bits
>
> However, I'm not completely convinced that masking off the upper bits 
> of the SLID is necessary when setting the src_path_bits, which means 
> that the mask used above could be removed.
I think that this behavior is much better than the current behavior.
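
For reference, a minimal sketch of the LMC-based mask described above,
assuming the SLID is already in host byte order and that the port's LMC
(0 to 7) gives the number of low LID bits that vary per path. The
helper names are hypothetical, not kernel API; the fixed 0x7f in the
quoted code corresponds to the LMC = 7 case.

```c
#include <stdint.h>

/* Mask covering the low 'lmc' bits of a LID (lmc is 0..7). */
static uint8_t src_path_mask_from_lmc(uint8_t lmc)
{
    return (uint8_t)((1u << lmc) - 1);
}

/* Source path bits: the part of the SLID selected by the LMC mask. */
static uint8_t src_path_bits_from_slid(uint16_t slid, uint8_t lmc)
{
    return (uint8_t)(slid & src_path_mask_from_lmc(lmc));
}
```

In the scheme Sean outlines, the mask would be computed once in
update_sm_ah and stored with ib_sa_port, then applied in
ib_init_ah_from_path.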

thanks
Dotan